Patterns in Machine Learning: A new Parallelization Work-Flow for Machine Learning Methods
by Holger Schöner, Michael Roßbory
Abstract:
Parallelization of Machine Learning methods is an active research area, fuelled by the need for acceleration of complex computations, and the constant growth of numbers of samples and features in available data sets. Because several Machine Learning methods are general in the sense that they can be reused again and again for new learning tasks, it is common to collect these methods in libraries, e.g. the library mlpp at SCCH. Such libraries are intended to be used by several users on different hardware platforms. As a result, it is important that their parallelization does not introduce dependence on a restricted set of deployment environments. The ParaPhrase approach, besides having advantages in the modelling of parallelism and the parallelization process, promises to provide the needed flexibility with respect to supported hardware, by targeting multicore machines, distributed clusters, and hardware accelerators like GPGPUs. The process of parallelizing one method in mlpp, Coordinate Descent, is illustrated in the following.
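To give a concrete flavour of the use case named in the abstract, below is a minimal C++ sketch of cyclic Coordinate Descent for least squares, with the per-coordinate dot products parallelized over samples in a map/reduce style. This is an illustrative sketch only, not the mlpp or ParaPhrase implementation described in the paper (ParaPhrase would express the parallelism through pattern frameworks rather than raw std::async); all names such as parallel_dot and cd_sweep are hypothetical.

#include <algorithm>
#include <cstddef>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Dot product of two equally sized vectors, split across hardware threads.
// This stands in for a data-parallel "map + reduce" pattern.
double parallel_dot(const std::vector<double>& a,
                    const std::vector<double>& b) {
    const std::size_t n = a.size();
    const unsigned workers =
        std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::future<double>> parts;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t lo = w * chunk;
        const std::size_t hi = std::min(n, lo + chunk);
        if (lo >= hi) break;
        parts.push_back(std::async(std::launch::async, [&a, &b, lo, hi] {
            return std::inner_product(a.begin() + lo, a.begin() + hi,
                                      b.begin() + lo, 0.0);
        }));
    }
    double sum = 0.0;
    for (auto& p : parts) sum += p.get();
    return sum;
}

// One sweep of cyclic coordinate descent for min_w ||X w - y||^2.
// X is stored column-major: cols[j] holds feature j for all samples,
// and residual is kept equal to y - X w throughout.
void cd_sweep(const std::vector<std::vector<double>>& cols,
              std::vector<double>& w,
              std::vector<double>& residual) {
    for (std::size_t j = 0; j < cols.size(); ++j) {
        const double col_sq = parallel_dot(cols[j], cols[j]);
        if (col_sq == 0.0) continue;  // skip all-zero columns
        // Exact minimizer for coordinate j with all others held fixed.
        const double delta = parallel_dot(cols[j], residual) / col_sq;
        w[j] += delta;
        for (std::size_t i = 0; i < residual.size(); ++i)
            residual[i] -= delta * cols[j][i];
    }
}

The point of the sketch is that each coordinate update is itself a reduction over all samples, so the inner dot products are the natural place to apply a reusable parallel pattern while the outer coordinate loop stays sequential.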
Reference:
Patterns in Machine Learning: A new Parallelization Work-Flow for Machine Learning Methods (Holger Schöner, Michael Roßbory), In Proceedings of HLPGPU, 2013.
Bibtex Entry:
@inproceedings{schoner_patterns_2013,
	address = {Berlin, Germany},
	title = {Patterns in Machine Learning: A new Parallelization Work-Flow for Machine Learning Methods},
	abstract = {Parallelization of Machine Learning methods is an active research area, fuelled by the need for acceleration of complex computations, and the constant growth of numbers of samples and features in available data sets. Because several Machine Learning methods are general in the sense that they can be reused again and again for new learning tasks, it is common to collect these methods in libraries, e.g. the library mlpp at {SCCH}. Such libraries are intended to be used by several users on different hardware platforms. As a result, it is important that their parallelization does not introduce dependence on a restricted set of deployment environments. The {ParaPhrase} approach, besides having advantages in the modelling of parallelism and the parallelization process, promises to provide the needed flexibility with respect to supported hardware, by targeting multicore machines, distributed clusters, and hardware accelerators like {GPGPUs}. The process of parallelizing one method in mlpp, Coordinate Descent, is illustrated in the following.},
	booktitle = {Proceedings of {HLPGPU}},
	author = {Schöner, Holger and Roßbory, Michael},
	month = jan,
	year = {2013},
	file = {download/publications/HLPGPU2013-UseCasesML.pdf}
}