Choosing an ML.NET algorithm

A typical question when we approach a machine learning problem is: which algorithm fits my dataset best?

In the previous post we saw the possible classifications of the ML.NET algorithms; this is the first step in narrowing down the choices.

Now we can consider what our priorities are in order to make the right choice. Accuracy is the first parameter to consider: sometimes a certain level of approximation is sufficient, sometimes it is not. Another parameter is the time we want to spend on training; it varies a lot from one algorithm to another and is closely related to accuracy. Third, linearity: if the trend of our data follows a straight line, we can choose a linear algorithm. Fourth, the number of parameters, which can affect the training time; we could have a huge number of parameters compared to the number of data points, or vice versa.
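
One rough way to judge the linearity of a dataset is to fit a straight line and look at the coefficient of determination (R squared). The sketch below is plain Python, not ML.NET; the helper name r_squared and the sample data are assumptions made up for illustration.

```python
from statistics import mean


def r_squared(xs, ys):
    """Fit y = slope * x + intercept by least squares and return R^2."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual and total sums of squares around the fitted line / the mean.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


xs = [0, 1, 2, 3, 4, 5, 6]
linear_ys = [2 * x + 1 for x in xs]      # points lying exactly on a line
curved_ys = [(x - 3) ** 2 for x in xs]   # symmetric parabola, no linear trend

linear_score = r_squared(xs, linear_ys)
curved_score = r_squared(xs, curved_ys)
```

A score close to 1 suggests that a linear algorithm is a reasonable first candidate; a score close to 0 suggests looking at a non-linear family instead. With many features this single-variable check is only a first hint.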

That said, we can analyze the behaviour of the algorithms in order to understand how they fit our machine learning problem.

Regression algorithms

There are different types of regression depending on the distribution of our data, and we can choose the right algorithm based on its characteristics and the parameters discussed above.

Two algorithms belong to the linear family: linear and Bayesian linear. As you can imagine, these algorithms give better performance and results with linear data; the only difference is that the second one offers better quality at the cost of slower performance than the first one. StochasticDualCoordinateAscentRegressor and OnlineGradientDescentRegressor are examples of linear algorithms.

Another family is the decision tree. Algorithms of this family work on non-linear datasets, and their main strength is accuracy with acceptable performance. Examples are FastTreeRegressor, FastTreeTweedieRegressor, LightGbmRegressor.

The decision forest family derives from decision trees but limits some of the overfitting problems related to them. By splitting the dataset into random subsets, this algorithm builds multiple decision trees; every tree returns a predicted value, and the forest combines these values into the final result: the value with the most votes in classification, or typically the average of the trees' values in regression. FastForestRegressor is an example.
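
To make the idea concrete, here is a minimal sketch of a regression forest in plain Python. It is not how FastForestRegressor is implemented: each "tree" is a single-split stump, the random subsets are bootstrap samples, and the names fit_stump and fit_forest are hypothetical.

```python
import random
from statistics import mean


def fit_stump(points):
    """Fit a one-split regression tree (a stump) on (x, y) pairs."""
    points = sorted(points)
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        threshold = (points[i - 1][0] + points[i][0]) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        # Squared error of predicting each side's mean.
        error = sum((y - left_mean) ** 2 for y in left) + sum(
            (y - right_mean) ** 2 for y in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: predict the mean everywhere
        overall = mean(y for _, y in points)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(points, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    trees = [
        fit_stump([rng.choice(points) for _ in points]) for _ in range(n_trees)
    ]
    return lambda x: mean(tree(x) for tree in trees)


# Step-shaped toy data: y is 1 for x < 5 and 10 for x >= 5.
data = [(x, 1.0 if x < 5 else 10.0) for x in range(10)]
forest = fit_forest(data)
low, high = forest(1), forest(8)
```

Because every stump sees a slightly different sample, no single tree decides the result on its own, which is what reduces overfitting.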

Poisson is the family of algorithms used to predict counts, such as the number of times an event occurs, rather than arbitrary values. ML.NET provides the PoissonRegressor algorithm.

Binary classification algorithms

As with regression algorithms, some belong to the linear family: StochasticDualCoordinateAscentBinaryClassifier, StochasticGradientDescentBinaryClassifier, LinearSvmBinaryClassifier.
Likewise, FastTreeBinaryClassifier and FastForestBinaryClassifier belong to the decision tree family.
Logistic regression is a family that deals with analyses where the dependent variable is binary; LogisticRegressionBinaryClassifier is an algorithm of this type.
AveragedPerceptronBinaryClassifier belongs to the averaged perceptron family and its prediction is based on a linear function; these algorithms deal with linear problems and have good accuracy and performance.
FieldAwareFactorizationMachineBinaryClassifier is an algorithm based on the Advanced Stochastic Gradient Method; it is particularly useful for high-dimensional arrays with sparse data.
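
The averaged perceptron mentioned above can be sketched in a few lines of plain Python. This is a textbook version under simple assumptions (labels are +1 and -1, no learning rate or regularization), not the ML.NET implementation, and the function names are chosen for this example only.

```python
def train_averaged_perceptron(samples, epochs=10):
    """Train on (features, label) pairs with labels +1 / -1.

    Returns the weights and bias averaged over every training step,
    which is what distinguishes the averaged perceptron from the plain one.
    """
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    weight_sums = [0.0] * n_features
    bias_sum = 0.0
    steps = 0
    for _ in range(epochs):
        for features, label in samples:
            score = sum(w * f for w, f in zip(weights, features)) + bias
            if label * score <= 0:  # misclassified: move the boundary
                weights = [w + label * f for w, f in zip(weights, features)]
                bias += label
            # Accumulate the current model after every sample.
            weight_sums = [s + w for s, w in zip(weight_sums, weights)]
            bias_sum += bias
            steps += 1
    return [s / steps for s in weight_sums], bias_sum / steps


def predict(model, features):
    """Classify by the sign of the linear function w . x + b."""
    weights, bias = model
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if score > 0 else -1


# Toy, linearly separable data: the label is the sign of (x1 - x2).
training = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], -1), ([1.0, 3.0], -1)]
model = train_averaged_perceptron(training)
predictions = [predict(model, features) for features, _ in training]
```

Averaging the weights over all steps makes the final model less sensitive to the last few samples it happened to see.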

Multiclass classification algorithms

We have algorithms similar to those for binary classification, like LogisticRegressionClassifier and StochasticDualCoordinateAscentClassifier.

NaiveBayesClassifier is a Bayesian algorithm, so it is recommended when we have several features and a relatively small dataset.


As you can see, if you know the behaviour of your data and how the algorithms are organized, you can choose an initial subset of them. Microsoft also provides a cheat sheet, and an exhaustive article is available about how to choose an algorithm.
In any case, the final recommendation is to choose an initial subset of algorithms with the rules discussed, then try all of them on the dataset and compare the results. The algorithm with the best results is the one to choose.
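
The "try them all and compare" step can be sketched as a small loop. The example is plain Python with two made-up candidates (a mean baseline and a least-squares line) and a hypothetical dataset; in ML.NET the candidates would be the trainers discussed above and the score would come from its evaluation tools.

```python
from statistics import mean


def mean_model(train):
    """Baseline candidate: always predict the mean of the training targets."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def line_model(train):
    """Linear candidate: least-squares line through the training points."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx
    return lambda x: my + slope * (x - mx)


def rmse(model, rows):
    """Root mean squared error of a fitted model on held-out rows."""
    return mean((model(x) - y) ** 2 for x, y in rows) ** 0.5


# Hypothetical dataset: y = 3x + 2, split into training and test rows.
rows = [(x, 3 * x + 2) for x in range(12)]
train, test = rows[::2], rows[1::2]

candidates = {"mean": mean_model, "line": line_model}
scores = {name: rmse(fit(train), test) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest test error
```

The important detail is that every candidate is scored on rows it did not see during training; otherwise the comparison would reward overfitting.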
In the next posts we’ll see how to train an algorithm with a dataset and how to evaluate the results.
The technical content of this article is available on a GitHub project.
