Machine Learning algorithms tipologies in ML.NET

ML.NET is a new open source project released by Microsoft few months ago implemented for .NET applications and available as a NuGet package. Now in version 0.4 is growing rapidly but it already has a significative number of algorithms that show the power of the framework.

It’s a group of libraries used internally by microsoft developers team for years and now they have been collected in a project available for all users; differently from other libraries like TensorFlow, it’s an higher level library and we can start using this framework without deeply knowledge of mathematic and machine learning algorithms.

Generally, a ML.NET algorithm can belong to one of these types:

  • Regression algorithms
  • Two-Class Classification algorithms
  • Multi-Class Classification algorithms
  • Anomaly detection

In order to understand these tipologies we need to introduce some concepts about machine learning and the ML.NET models.

The model

The model is the representation of our dataset, therefore for every column in our dataset we have a property in the model.

So, if we have a csv file with rows data structured like this:


CMT,1,1,1271,3.8,CRD,17.5

that is an example of taxy fare data, our model should looks like this:


public class TaxyTrip
{
[Column("0")]
public string VendorId;

[Column("1")]
public string RateCode;

[Column("2")]
public float PassengerCount;

[Column("3")]
public float TripTime;

[Column("4")]
public float TripDistance;

[Column("5")]
public string PaymentType;

[Column("6", "Label")]
public float FareAmount;
}

We need to give an index to all the columns of the dataset; these columns are named features, the algorithm will use them to learn and predict the values. I’ve assigned a name Label to the last column, that is the column with the value we’ll want to predict. As we’ll see, this model will be used to train the algorithm with a set of features and relative predicted value, the label. Based on this training, the algorithm will be able to predict values for dataset where the label value is unknown.

Then, ML.NET will use a prediction model with a column named Score to store the predicted values, that should looks like this:


public class TaxiTripFarePrediction
{
[ColumnName("Score")]
public float FareAmount;
}

Now that we knows what are a feature, a label and a score, we can discuss about the algorithms types.

Regression algorithms

We use these algorithms when we need to predict numeric values. For example, based on a series of feature columns we want to predict the taxy fare (the score). In ML.NET the list of the algorithms is under the Microsoft.ML.Trainers namespace. You can distinguish the regression algorithms by the Regressor word at the end of the name, for example FastTreeRegressor.

Two-Class Classification algorithms

If we have to classify some feature values in two different categories, we use a Two-Class classification algorithm. Establish if the sentiment is positive or negative is a Two-Class classification problem. Like the regression algorithms, the Two-Class classification algorithms are under the Microsoft.ML.Trainers namespace and you can distinguish them by the BinaryClassifier word. FastTreeBinaryClassifier is one of them.

Multi-Class Classification algorithms

In case we have more than two categories we have to use a Multi-Class classification algorithm. An example is the grouping a set of flowers in different categories based on their features columns. Like the previous algorithms, these algorithms are under the Microsoft.ML.Trainers namespace and you can distinguish them by Classifier word, like LogisticRegressionClassifier.

Anomaly detection algorithms

When we want to identify unusual data point, we choose this type of algorithm. A classic case are fraud detections like unusual activities on credit cards. These algorithms can be identified by the convention AnomalyDetector at the end of the name. An algorithm of this type is PcaAnomalyDetector.

Summary

We have seen the main algorithms available in ML.NET and how they can be classified; the next step is understand how we can choose one algorithm instead of another, that will be the argument of the next post.

An completed project of ML.NET and the source code of this post are available on GitHub.

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s