Contributed by Cory Schlesener, B.S.
Machine learning (ML) enables powerful analysis of data to formulate models. These models can be used in applications or to explore the data for insight into relationships between features. There are many varieties of models and of the algorithms that create them, and different core learning algorithms are better suited to some types of datasets and questions than to others. The complexity of these decisions extends beyond selecting an algorithm or model type and fans out into many possible sub-choices. These additional choices, made when selecting parameter values, determine how an algorithm carries out learning and therefore affect the resulting model. Trial-and-error testing, combined with insight into how the learning is carried out, can be used to assess and fine-tune the process. However, with the aid of further automation, acquiring detailed expertise in the various ML subcategories is not required.
Automated machine learning (autoML) can test multiple algorithms under a variety of parameter settings and repeat the testing in an iterative process. Running an autoML algorithm gradually narrows in on the ML algorithms and parameter values that produce the best model(s) to fit the dataset and address the assigned objective. Automating this model selection and tuning process makes ML usable by a wider population, and this goal is being pursued by a large variety of teams and platforms. One of the more popular autoML tools is auto-sklearn, built on the widely used open-source machine learning toolkit scikit-learn. Auto-sklearn draws on a wide variety of algorithms, allowing for broader use, and is entering version 2.0 with improvements in efficiency.
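The idea of trying several algorithms under several parameter settings and keeping the best performer can be sketched with plain scikit-learn. This is a minimal illustration only, not auto-sklearn's own method: it swaps in an exhaustive grid search with cross-validation for the more sophisticated search that autoML tools use. The names `SEARCH_SPACE` and `select_best_model`, the three candidate algorithms, their parameter values, and the iris example dataset are all assumptions made for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical search space: each candidate algorithm is paired with a
# small grid of parameter values to try (values chosen for illustration).
SEARCH_SPACE = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "k_nearest_neighbors": (
        KNeighborsClassifier(),
        {"n_neighbors": [3, 5, 7]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [25, 50], "max_depth": [3, None]},
    ),
}


def select_best_model(X, y, search_space=SEARCH_SPACE, cv=5):
    """Tune every candidate with cross-validation and report the best one."""
    results = {}
    for name, (estimator, param_grid) in search_space.items():
        # GridSearchCV fits a clone of the estimator for every parameter
        # combination and scores each with k-fold cross-validation.
        search = GridSearchCV(estimator, param_grid, cv=cv)
        search.fit(X, y)
        results[name] = (
            search.best_score_,      # mean cross-validated accuracy
            search.best_params_,     # parameter values that achieved it
            search.best_estimator_,  # model refit on all data with them
        )
    # Keep the algorithm whose tuned version scored highest.
    best_name = max(results, key=lambda n: results[n][0])
    return best_name, results


X, y = load_iris(return_X_y=True)
best_name, results = select_best_model(X, y)
best_score, best_params, best_model = results[best_name]
print(best_name, round(best_score, 3), best_params)
```

An exhaustive grid like this grows quickly as algorithms and parameters are added, which is one reason autoML tools rely on smarter search strategies rather than trying every combination.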
Feurer, Matthias, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, and Frank Hutter. “Auto-sklearn 2.0: The next generation.” arXiv preprint arXiv:2007.04074 (2020).