Training, prediction and evaluation with ML.NET

Training an ML.NET algorithm is what we do when we want to prepare an algorithm to predict values. As we'll see, we have to build a pipeline with some operations that are preparatory to the training, like loading the dataset, converting alphanumeric columns and so on.
Now that we have seen the algorithm families and how to choose among them, we can start to configure the learning pipeline.

The model

For convenience, here is a summary of the model described previously:

public class GlassData
{
    public float IdNumber;
    public float RefractiveIndex;
    public float Sodium;
    public float Magnesium;
    public float Aluminium;
    public float Silicon;
    public float Potassium;
    public float Calcium;
    public float Barium;
    public float Iron;

    [Column("10", "Label")]
    public string Type;
}

public class GlassTypePrediction
{
    public string PredictedLabel;
}

So we have some feature columns, the chemical properties of the glass, and a label column, the glass type.

We also have a prediction model with the predicted label field.


The first step is to instantiate the learning pipeline and load the dataset:

var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader(@"traindata\glass.csv").CreateFrom<GlassData>(separator: ','));

In this code we are loading the data from a CSV file, but we can load data from a list as well:

var pipeline = new LearningPipeline();
var data = new List<GlassData>() {
    new GlassData { IdNumber = 1, RefractiveIndex = 1.52101f, Sodium = 13.64f /* .... */ },
    new GlassData { IdNumber = 2, RefractiveIndex = 1.51761f, Sodium = 13.89f /* .... */ }
};
var collection = CollectionDataSource.Create(data);

Obviously we can fill the list however we prefer, for example from a database.
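As a sketch, assuming a SQL Server table named Glass whose columns match the model (the table name, column set and connection string here are all hypothetical), the list could be filled like this:

```csharp
// Hypothetical sketch: load GlassData rows from a database table.
// The "Glass" table and the connection string are assumptions.
using System.Collections.Generic;
using System.Data.SqlClient;

public static List<GlassData> LoadFromDatabase(string connectionString)
{
    var data = new List<GlassData>();
    using (var connection = new SqlConnection(connectionString))
    {
        connection.Open();
        var command = new SqlCommand(
            "SELECT IdNumber, RefractiveIndex, Sodium FROM Glass", connection);
        using (var reader = command.ExecuteReader())
        {
            while (reader.Read())
            {
                data.Add(new GlassData
                {
                    IdNumber = (float)reader.GetDouble(0),
                    RefractiveIndex = (float)reader.GetDouble(1),
                    Sodium = (float)reader.GetDouble(2)
                    // ...and so on for the remaining feature columns
                });
            }
        }
    }
    return data;
}
```

The resulting list can then be wrapped with CollectionDataSource.Create and added to the pipeline exactly as shown above.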
The next operations in the pipeline concern the data transformations, that is, what we can do with the input dataset. If we have a string label, we have to dictionarize it in order to convert the value into an index.
Because the algorithms require numeric values, we have to convert all alphanumeric text into numbers with the CategoricalOneHotVectorizer method. Another step is to concatenate all the feature columns into a single vector with ColumnConcatenator.
So the pipeline should look like this:

var pipeline = new LearningPipeline();
pipeline.Add(new TextLoader(@"traindata\glass.csv").CreateFrom<GlassData>(separator: ','));
pipeline.Add(new Dictionarizer("Label"));
pipeline.Add(new ColumnConcatenator("Features", new[] { "IdNumber", "RefractiveIndex", "Sodium", "Magnesium", "Aluminium", "Silicon", "Potassium", "Calcium", "Barium", "Iron" }));

Then we are ready to add the algorithm:

pipeline.Add(new LogisticRegressionClassifier());

If necessary, another step of the pipeline is to convert the predicted label (a number) back to its original value. In this case we can use the PredictedLabelColumnOriginalValueConverter:

pipeline.Add(new PredictedLabelColumnOriginalValueConverter { PredictedLabelColumn = "PredictedLabel" });


We are ready for the model training and we can do that with the Train method; it's a generic method and we have to pass our model and prediction classes. The training time is strictly tied to the chosen algorithm. The result of this operation is a function (the model) that the framework will use to predict values; we can store the model in a zip file:

var model = pipeline.Train<GlassData, GlassTypePrediction>();
await model.WriteAsync(modelPath);

Thus the prediction phase will be very fast: the framework will load the function from the zip file, apply the input values to it and return the calculated result.
Saving the model to disk is not mandatory but is highly recommended; otherwise you would have to train the algorithm before every prediction.


Once the algorithm is trained, we can evaluate it with a set of data. Conceptually, given the dataset that we'll use to train the algorithm, we can break it down into two parts: we'll use the bigger subset to train the algorithm and the smaller subset to evaluate it. The evaluator will use the resulting function to calculate a value, then it will compare this value with the labeled value in the dataset and generate some metrics that describe the deviation from the expected value. Based on the type of algorithm (regression, binary classification, multiclass classification) we have different evaluators: RegressionEvaluator, BinaryClassificationEvaluator, ClassificationEvaluator.

So, if we saved the model after the training process, we have to load it and run the evaluation with one of these classes:

var textLoader = new TextLoader(@"testdata\glass.csv").CreateFrom<GlassData>(separator: ',');
var model = await PredictionModel.ReadAsync<GlassData, GlassTypePrediction>(@"models\");
var classificationEvaluator = new ClassificationEvaluator();
var classificationMetrics = classificationEvaluator.Evaluate(model, textLoader);

The result is of type ClassificationMetrics and there are several metrics we can check:

Console.WriteLine($"------------- {algorithm} - EVALUATION RESULTS -------------");
Console.WriteLine($"AccuracyMacro = {classificationMetrics.AccuracyMacro}");
Console.WriteLine($"AccuracyMicro = {classificationMetrics.AccuracyMicro}");
Console.WriteLine($"LogLoss = {classificationMetrics.LogLoss}");
Console.WriteLine($"LogLossReduction = {classificationMetrics.LogLossReduction}");
Console.WriteLine($"------------- {algorithm} - END EVALUATION -------------");

If we run the program from a test method, we'll get the evaluation results printed for each algorithm.

As you can see, with these metrics we can understand which algorithm fits our problem better. The log loss is more reliable than the accuracy, so a lower log loss means better results.
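Comparing algorithms this way can be automated: evaluate each trained model on the same test data and keep the one with the lowest log loss. A minimal sketch (the trainedModels collection, holding models trained with different algorithms, is an assumption):

```csharp
// Hypothetical sketch: evaluate several trained models against the same
// test set and keep the one with the lowest log loss (lower is better).
var testData = new TextLoader(@"testdata\glass.csv").CreateFrom<GlassData>(separator: ',');
var evaluator = new ClassificationEvaluator();

PredictionModel<GlassData, GlassTypePrediction> bestModel = null;
double bestLogLoss = double.MaxValue;

foreach (var model in trainedModels) // assumed: one model per algorithm
{
    var metrics = evaluator.Evaluate(model, testData);
    if (metrics.LogLoss < bestLogLoss)
    {
        bestLogLoss = metrics.LogLoss;
        bestModel = model;
    }
}
```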


Once we have trained our algorithms and identified the best one with the evaluation metrics, we can start with predictions. We can use the model to predict unknown values, and the steps are very similar to the evaluation:

var model = await PredictionModel.ReadAsync<GlassData, GlassTypePrediction>(@"models\");
var result = model.Predict(data);
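For example, a single prediction could look like this (the feature values below are made up for illustration):

```csharp
// Hypothetical input: these feature values are made up for illustration.
var data = new GlassData
{
    IdNumber = 215,
    RefractiveIndex = 1.51602f,
    Sodium = 14.85f,
    Magnesium = 0f,
    Aluminium = 2.38f,
    Silicon = 73.28f,
    Potassium = 0f,
    Calcium = 8.76f,
    Barium = 0.64f,
    Iron = 0.09f
};

var result = model.Predict(data);
Console.WriteLine($"Predicted glass type: {result.PredictedLabel}");
```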

data is an object of type GlassData with an unknown Type (the label), while result is of type GlassTypePrediction with the predicted value. If you prefer, you can get the list of all possible predicted values with the associated scores; to do that, you have to change the prediction model like this:

public class GlassTypePrediction
{
    public string PredictedLabel;
    public float[] Scores;
}

This way ML.NET knows that we want the list of all available scores. Next we have to add a line to the prediction code:

model.TryGetScoreLabelNames(out string[] scoresLabels);

The scoresLabels variable contains an array of all the score labels, in the same order as the scores contained in the prediction model. So we are able to generate a list of objects with the labels and the associated scores:

var scores = scoresLabels.Select(ls => new ScoreLabel
{
    Label = ls,
    Score = result.Scores[Array.IndexOf(scoresLabels, ls)]
}).ToList();
This could be very useful if we have a requirement like "give me the five most probable values" and we want to show this information to the end user.
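For instance, assuming the scores list built above (with a simple ScoreLabel class exposing Label and Score), the five most probable types can be extracted with LINQ:

```csharp
// Order the label/score pairs by score and take the five most probable.
var topFive = scores
    .OrderByDescending(s => s.Score)
    .Take(5)
    .ToList();

foreach (var item in topFive)
{
    Console.WriteLine($"{item.Label}: {item.Score}");
}
```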


We are at the end of this post, where we have seen how to load data from a CSV file or a list of objects, add data transformation operations to the pipeline, train the algorithm, evaluate it and predict results. In order to make predictions quickly, we have seen that it is highly recommended to save the resulting model into a zip file. We have also seen that we can ask ML.NET to return the complete list of predicted values with their scores, which could be useful in some circumstances.

You can find example datasets at the UCI Machine Learning Repository.

All the code that we have seen is available on a GitHub project.

