A typical question when we approach a machine learning problem is: which algorithm fits my dataset best?

In the previous post we saw the possible classifications of the ML.NET algorithms; this is the first step in narrowing down the possible choices.

Now we can consider what our priorities are in order to make the right choice. **Precision** is one of the parameters we need to consider: sometimes a certain level of approximation is sufficient, sometimes it isn't. Another parameter is the **time** we want to spend on training; it varies greatly from one algorithm to another and is strictly related to its precision. Third, **linearity**: if the trend of our data follows a straight line, we can choose an algorithm of this type. Fourth, the **number of parameters**, which can affect the training time; we could have a huge number of parameters compared to data points, or vice versa.
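A quick way to get a feeling for the linearity of our data is to measure how strongly two columns are linearly correlated. The following is a minimal sketch in plain Python (not ML.NET code); the `pearson_r` helper and the sample data are assumptions made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, without the common 1/n factor (it cancels out)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [1, 2, 3, 4, 5]
linear = [2 * x + 1 for x in xs]     # points on a straight line
curved = [(x - 3) ** 2 for x in xs]  # symmetric parabola, no linear trend

print(pearson_r(xs, linear))  # magnitude near 1: a linear model is a good candidate
print(pearson_r(xs, curved))  # near 0: a linear model will likely underfit
```

A value close to 1 or -1 suggests that a linear algorithm is worth trying first; a value near 0 does not mean the columns are unrelated, only that the relation is not a straight line.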

That said, we can analyze the behaviour of the algorithms in order to understand how they fit our machine learning problem.

### Regression algorithms

We can have different types of regression based on the distribution of our data, and we can choose the right algorithm based on its characteristics and the parameters discussed above.

Two algorithms belong to the linear family: **linear and Bayesian linear**. As you can imagine, these algorithms give better performance and results with linear data; the only difference is that the second one offers better quality at the cost of poorer performance than the first. **StochasticDualCoordinateAscentRegressor** and **OnlineGradientDescentRegressor** are examples of linear algorithms.
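To give an idea of what an online gradient descent regressor does, here is a minimal sketch in plain Python. It is not the ML.NET implementation: the `fit_online_gd` function, its learning rate, and the number of epochs are assumptions chosen for this toy example, which fits a single-feature line by correcting the weights after every sample.

```python
def fit_online_gd(samples, learning_rate=0.01, epochs=2000):
    """Fit y = w * x + b, updating the parameters after every single sample."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = (w * x + b) - y         # prediction error for this sample
            w -= learning_rate * error * x  # gradient of 0.5 * error**2 w.r.t. w
            b -= learning_rate * error      # gradient of 0.5 * error**2 w.r.t. b
    return w, b

# Noiseless points on the line y = 3x + 2 (illustrative data)
samples = [(float(x), 3.0 * x + 2.0) for x in range(5)]
w, b = fit_online_gd(samples)
print(round(w, 3), round(b, 3))
```

Because the data really lies on a straight line, the learned `w` and `b` approach the true slope and intercept; on nonlinear data the same model would keep a systematic error, which is where the families below come in.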

Another family is **decision tree**. Algorithms of this family work on nonlinear datasets, and their main quality is accuracy with acceptable performance. Examples are **FastTreeRegressor**, **FastTreeTweedieRegressor**, **LightGbmRegressor**.

The **decision forest** family derives from decision trees but limits some of the overfitting problems related to them. By splitting the dataset into random subsets, this algorithm builds multiple decision trees; every tree then returns a predicted value, and the forest combines these values into a single result (typically by averaging them for regression, or by taking the value with the most votes for classification). **FastForestRegressor** is an example.
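The two ingredients of a decision forest, random subsets and the combination of the per-tree predictions, can be sketched as follows. This is plain Python for illustration, not ML.NET code: the function names are hypothetical, sampling with replacement (bootstrap) is an assumption about how the random subsets are drawn, and the trees themselves are left out.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, seed):
    """Draw a random subset of the same size, sampling with replacement."""
    rng = random.Random(seed)
    return rng.choices(rows, k=len(rows))

def aggregate_regression(tree_predictions):
    """For regression, the forest typically averages the per-tree values."""
    return mean(tree_predictions)

def aggregate_classification(tree_votes):
    """For classification, the label with the most votes wins."""
    return Counter(tree_votes).most_common(1)[0][0]

rows = list(range(10))
print(bootstrap_sample(rows, seed=1))                    # training rows for one tree
print(aggregate_regression([2.0, 3.0, 4.0]))             # average of three trees
print(aggregate_classification(["cat", "dog", "cat"]))   # majority vote
```

Each tree sees a slightly different dataset, so their individual errors tend to differ; combining them is what reduces the overfitting of a single tree.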

**PoissonRegressor** is the Poisson regression algorithm in ML.NET.

### Binary classification algorithms

As with regression algorithms, some of them belong to the linear family: **StochasticDualCoordinateAscentBinaryClassifier**, **StochasticGradientDescentBinaryClassifier**, **LinearSvmBinaryClassifier**.

Likewise, **FastTreeBinaryClassifier** and **FastForestBinaryClassifier** belong to the decision tree family.

Logistic regression is a family that deals with analyses where the dependent variable is binary; **LogisticRegressionBinaryClassifier** is an algorithm of this type.
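The core idea of logistic regression is to squash a linear score into a probability between 0 and 1 and then apply a threshold. Below is a minimal prediction-only sketch in plain Python; the weights, bias, and threshold are made-up values for illustration, and the training step that would normally learn them is omitted.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(features, weights, bias):
    """Probability that the binary dependent variable is 1."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a 0/1 label."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

weights, bias = [1.0, -1.0], 0.0  # hypothetical, already "trained" parameters
print(predict_proba([2.0, 1.0], weights, bias), predict_label([2.0, 1.0], weights, bias))
print(predict_proba([0.0, 3.0], weights, bias), predict_label([0.0, 3.0], weights, bias))
```

Moving the threshold away from 0.5 is a simple way to trade false positives against false negatives without retraining.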

**AveragedPerceptronBinaryClassifier** belongs to the averaged perceptron family, and its prediction is based on a linear function; these algorithms deal with linear problems and have good accuracy and performance.
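As an illustration of the averaged perceptron idea, here is a minimal sketch in plain Python. It is not the ML.NET trainer: the function names, the `-1`/`+1` label encoding, and the tiny dataset are assumptions. The weights are corrected only when a sample is misclassified, and the final model is the average of the weights over all steps.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_averaged_perceptron(samples, epochs=10):
    """samples: list of (features, label) pairs with label in {-1, +1}."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0          # current weights and bias
    w_sum, b_sum = [0.0] * n, 0.0  # running sums used for the average
    steps = 0
    for _ in range(epochs):
        for x, y in samples:
            if y * (dot(w, x) + b) <= 0:  # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
            # Accumulate after every sample, whether or not an update happened
            w_sum = [s + wi for s, wi in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [s / steps for s in w_sum], b_sum / steps

def predict(x, w, b):
    return 1 if dot(w, x) + b > 0 else -1

# Linearly separable toy data (illustrative)
samples = [([2.0, 1.0], 1), ([1.0, 3.0], 1), ([-1.0, -2.0], -1), ([-2.0, -1.0], -1)]
w, b = train_averaged_perceptron(samples)
print(w, b, [predict(x, w, b) for x, _ in samples])
```

Averaging makes the final model less sensitive to the last few updates than a plain perceptron, which is one reason this family behaves well on linear problems.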

**FieldAwareFactorizationMachineBinaryClassifier** is an algorithm based on an advanced stochastic gradient method; it is particularly useful for high-dimensional arrays with sparse data.

### Multiclass classification algorithms

We have algorithms similar to those for binary classification, such as **LogisticRegressionClassifier** and **StochasticDualCoordinateAscentClassifier**.

**NaiveBayesClassifier** is a Bayesian linear algorithm, so, as discussed above, it's recommended when we have a large number of features and a relatively small dataset.
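To show why a naive Bayes classifier can work with few rows, here is a minimal sketch in plain Python for categorical features. It is an illustration under stated assumptions, not the ML.NET implementation: the function names, the add-one (Laplace) smoothing, and the toy dataset are all made up for this example. The model only has to count how often each value appears per class.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """rows: list of tuples of categorical values; labels: one class per row."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict_naive_bayes(model, row):
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for i, value in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product
            numerator = value_counts[(label, i)][value] + 1
            denominator = count + len(vocab[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["out", "out", "in", "in"]
model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("sunny", "mild")))
print(predict_naive_bayes(model, ("rainy", "cool")))
```

Since every feature is treated independently, adding features only adds more counters, not more interactions to learn, which keeps the model usable even when the dataset is small.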
