A typical question when we approach a machine learning problem is: which algorithm fits my dataset best?
In the previous post we saw the possible classifications of the ML.NET algorithms, and this is the first step in narrowing down the possible choices.
Now we can consider our priorities in order to make the right choice. Accuracy is one of the parameters to consider: sometimes a certain level of approximation is sufficient, sometimes it is not. Another parameter is the time we want to spend on training; it varies a lot from one algorithm to another and is strictly related to accuracy. Third, linearity: if the trend of our data follows a straight line, we can choose a linear algorithm. Fourth, the number of parameters, which can affect the training time; we could have a huge number of parameters compared to data points, or vice versa.
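As a rough illustration of the linearity check, the sketch below fits a least-squares line and reports the coefficient of determination (R²): a value close to 1 suggests the data follow a straight line, a value close to 0 suggests they do not. The function name `r_squared` and the sample values are hypothetical, and this is only one possible way to judge linearity, not something ML.NET does for you.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form slope and intercept of the least-squares line.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Compare the residual error of the line with the total variance of y.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up samples: one roughly linear, one clearly curved (y = x^2).
linear = r_squared([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.9])
curved = r_squared([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(f"linear sample R^2: {linear:.3f}")
print(f"curved sample R^2: {curved:.3f}")
```

A low score on data like the second sample is a hint to look at the non-linear families instead.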
That said, we can analyze the behaviour of the algorithms in order to understand how they fit our machine learning problem.
We can have different types of regression depending on the distribution of our data, and we can choose the right algorithm based on its characteristics and the parameters discussed above.
Two algorithms belong to the linear family: linear and Bayesian linear. As you can imagine, these algorithms give their best performance and results on linear data; the only difference is that the second one gives better-quality results but is slower than the first. StochasticDualCoordinateAscentRegressor and OnlineGradientDescentRegressor are examples of linear algorithms.
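To give an idea of what a linear regressor trained by online gradient descent does, here is a minimal sketch for a single feature. It is not the ML.NET implementation: the function name `fit_linear_sgd`, the learning rate and the number of epochs are assumptions chosen only for this toy example.

```python
def fit_linear_sgd(points, learning_rate=0.01, epochs=2000):
    """Fit y = w * x + b by updating the weights after every single example."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in points:
            error = (w * x + b) - y
            # Gradient step on the squared error of this one example.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


# Toy data generated from y = 2x + 1.
w, b = fit_linear_sgd([(0, 1), (1, 3), (2, 5), (3, 7)])
print(f"w = {w:.3f}, b = {b:.3f}")
```

Because the model is just a slope and an intercept, training is cheap, which is why this family is a good first choice when the data are close to linear.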
Another family is decision trees. Algorithms of this family work on non-linear datasets, and their main quality is accuracy with acceptable performance. Examples are FastTreeRegressor, FastTreeTweedieRegressor and LightGbmRegressor.
The decision forest family derives from decision trees but limits some of their overfitting problems. By splitting the dataset into random subsets, this algorithm builds multiple decision trees; every tree returns a predicted value, and the forest combines them into the final result: the value with the most votes for classification, or typically the average of the predictions for regression. FastForestRegressor is an example.
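The idea of a forest can be sketched in a few lines. The example below is a deliberate simplification and not how FastForestRegressor is implemented: each "tree" is a single-split stump trained on a random resample of the data, and the forest averages the stump predictions. The names `fit_stump` and `fit_forest`, the number of trees and the seed are all hypothetical.

```python
import random


def fit_stump(sample):
    """One-split regression tree: pick the threshold with the lowest squared error."""
    best = None
    xs = sorted({x for x, _ in sample})
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # Every x in the sample is identical: fall back to the sample mean.
        mean = sum(y for _, y in sample) / len(sample)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_forest(data, n_trees=25, seed=0):
    """Train each stump on a random resample and average their predictions."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


# Toy non-linear data: a step from 0 to 10 at x = 5.
data = [(x, 0.0 if x < 5 else 10.0) for x in range(10)]
predict = fit_forest(data)
print(predict(0), predict(9))
```

Since every tree sees a slightly different subset, a single noisy point influences only some of them, which is where the reduced overfitting comes from.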
Binary classification algorithms
As with the regression algorithms, some belong to the linear family: StochasticDualCoordinateAscentBinaryClassifier, StochasticGradientDescentBinaryClassifier and LinearSvmBinaryClassifier.
Likewise, FastTreeBinaryClassifier and FastForestBinaryClassifier belong to the decision tree and decision forest families respectively.
Logistic regression is a family that deals with analyses where the dependent variable is binary; LogisticRegressionBinaryClassifier is an algorithm of this type.
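The core of a logistic regression prediction can be sketched as a linear score passed through the logistic (sigmoid) function, which turns it into a probability between 0 and 1. The function names, the weights and the 0.5 threshold below are hypothetical and only illustrate the prediction step, not how LogisticRegressionBinaryClassifier trains its model.

```python
import math


def predict_probability(weights, bias, features):
    """Probability of the positive class for one example."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def classify(weights, bias, features, threshold=0.5):
    """Binary decision: True when the probability reaches the threshold."""
    return predict_probability(weights, bias, features) >= threshold


# Made-up model with two features.
weights, bias = [1.5, -0.5], -1.0
print(predict_probability(weights, bias, [2.0, 1.0]))
print(classify(weights, bias, [2.0, 1.0]))
```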
AveragedPerceptronBinaryClassifier belongs to the averaged perceptron family and its prediction is based on a linear function; these algorithms deal with linear problems and have good accuracy and performance.
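A minimal sketch of the averaged perceptron idea follows: the weights are corrected only when an example is misclassified, and the final model is the average of the weights seen after every step, which makes it more stable than the last weights alone. This is a textbook-style illustration, not the ML.NET code; the function names, the number of epochs and the toy data are assumptions.

```python
def train_averaged_perceptron(examples, epochs=10):
    """examples: list of (features, label) pairs with label in {-1, +1}."""
    n_features = len(examples[0][0])
    w, b = [0.0] * n_features, 0.0
    w_sum, b_sum, steps = [0.0] * n_features, 0.0, 0
    for _ in range(epochs):
        for features, label in examples:
            score = sum(wi * xi for wi, xi in zip(w, features)) + b
            if label * score <= 0:
                # Misclassified (or on the boundary): move towards the label.
                w = [wi + label * xi for wi, xi in zip(w, features)]
                b += label
            # Accumulate the current weights after every example.
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [s / steps for s in w_sum], b_sum / steps


def classify(weights, bias, features):
    """The prediction is the sign of a linear function of the features."""
    score = sum(wi * xi for wi, xi in zip(weights, features)) + bias
    return 1 if score > 0 else -1


# Toy linearly separable data.
examples = [([2.0, 1.0], 1), ([3.0, 2.0], 1), ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
weights, bias = train_averaged_perceptron(examples)
print([classify(weights, bias, x) for x, _ in examples])
```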
FieldAwareFactorizationMachineBinaryClassifier is an algorithm based on an advanced stochastic gradient method; it is particularly useful for high-dimensional arrays with sparse data.