Version 1.0 of ML.NET introduces some breaking changes to the syntax used for configuration and training, so any test projects written to practice with earlier versions of the library have to be upgraded (mine included).
There are new classes, interfaces and methods, a new pipeline context, and abstractions for chains, estimators and transformers.
Loading
The first step is loading the dataset from a physical path:
MlContext = new MLContext();
IDataView DataView = MlContext.Data.LoadFromTextFile<T>("data path", separatorChar: ',', hasHeader: false);
So we have to specify the file path, the separator character and whether the file has a header row. The result is an object that implements the IDataView interface and exposes the content of the file; T is the class that describes the columns to load.
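As a minimal sketch of the loading step, the program below writes a tiny comma-separated file to the temp folder and loads it back. The HouseData class, its columns and the file name are hypothetical and only serve the illustration; LoadColumn maps each field to a column index in the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input schema: each field maps to a column index in the file.
public class HouseData
{
    [LoadColumn(0)] public string Neighborhood;
    [LoadColumn(1)] public float Size;
    [LoadColumn(2)] public float Price;
}

public static class LoadingExample
{
    public static List<HouseData> LoadRows()
    {
        // Write a small file without a header row so the sketch runs on its own.
        string path = Path.Combine(Path.GetTempPath(), "houses.csv");
        File.WriteAllLines(path, new[] { "north,80,150", "south,120,210", "north,95,170" });

        var mlContext = new MLContext();
        IDataView dataView = mlContext.Data.LoadFromTextFile<HouseData>(
            path, separatorChar: ',', hasHeader: false);

        // An IDataView is lazy: rows are only read when it is enumerated.
        return mlContext.Data
            .CreateEnumerable<HouseData>(dataView, reuseRowObject: false)
            .ToList();
    }

    public static void Main()
    {
        foreach (HouseData row in LoadRows())
            Console.WriteLine($"{row.Neighborhood} {row.Size} {row.Price}");
    }
}
```

Enumerating the rows like this is mainly useful to check that the column mapping is what we expect before building the pipeline.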
Transform
With transform operations we can copy, convert and concatenate the columns of the dataset, which we have to do before building the model.
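If the column to be predicted is alphanumeric, we have to convert it with the MapValueToKey method. Typing the variable as IEstimator<ITransformer> rather than var lets later steps be appended and assigned back to it:
IEstimator<ITransformer> pipeline = MlContext.Transforms.Conversion.MapValueToKey("column name");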
Alternatively, the predicted column may need a specific type such as boolean, in which case we convert it with ConvertType:
IEstimator<ITransformer> pipeline = MlContext.Transforms.Conversion.ConvertType("column name", outputKind: DataKind.Boolean);
The predicted column has to be copied into a column named Label with the CopyColumns method:
pipeline = pipeline.Append(MlContext.Transforms.CopyColumns("Label", "column name"));
Once the predicted column has been prepared, every alphanumeric column involved in training has to be converted to a numeric one:
pipeline = pipeline.Append(MlContext.Transforms.Categorical.OneHotEncoding("column name"));
Finally, we concatenate all the input columns into the Features column:
pipeline = pipeline.Append(MlContext.Transforms.Concatenate("Features", "columns array"));
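Put together, the transform steps can be sketched as follows. This assumes a regression scenario in which the value to predict is already numeric, so only CopyColumns is needed for the label; the HouseData class and its in-memory rows are hypothetical.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

// Hypothetical schema: one text column, one numeric column, one numeric value to predict.
public class HouseData
{
    public string Neighborhood { get; set; }
    public float Size { get; set; }
    public float Price { get; set; }
}

public static class TransformExample
{
    public static string[] TransformedColumnNames()
    {
        var mlContext = new MLContext();
        var rows = new[]
        {
            new HouseData { Neighborhood = "north", Size = 80f, Price = 150f },
            new HouseData { Neighborhood = "south", Size = 120f, Price = 210f },
            new HouseData { Neighborhood = "north", Size = 95f, Price = 170f },
        };
        IDataView dataView = mlContext.Data.LoadFromEnumerable(rows);

        // Copy the value to predict into the Label column.
        IEstimator<ITransformer> pipeline =
            mlContext.Transforms.CopyColumns("Label", nameof(HouseData.Price));
        // Turn the text column into a numeric vector.
        pipeline = pipeline.Append(
            mlContext.Transforms.Categorical.OneHotEncoding(nameof(HouseData.Neighborhood)));
        // Gather every input column into the single Features vector.
        pipeline = pipeline.Append(
            mlContext.Transforms.Concatenate("Features",
                nameof(HouseData.Neighborhood), nameof(HouseData.Size)));

        // Fitting the transforms and applying them yields a new data view.
        IDataView transformed = pipeline.Fit(dataView).Transform(dataView);
        return transformed.Schema.Select(column => column.Name).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", TransformedColumnNames()));
    }
}
```

Printing the schema of the transformed view is a quick way to confirm that the Label and Features columns exist before a trainer is appended.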
Build Model
Building the model means choosing the algorithm, appending it to the pipeline and then fitting the pipeline to the data view. Fit returns the trained model rather than a pipeline, so the result goes into its own variable:
_model = pipeline.Append(MlContext.Regression.Trainers.FastTree()).Fit(DataView);
Now the model has been built and is ready to be evaluated.
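A common refinement, sketched here under assumptions, is to hold back part of the data before fitting so that the later evaluation is not done on rows the model has already seen. The HouseData class and the synthetic rows are hypothetical, and the FastTree trainer requires the Microsoft.ML.FastTree NuGet package in addition to Microsoft.ML.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical schema: one numeric feature and a numeric value to predict.
public class HouseData
{
    public float Size { get; set; }
    public float Price { get; set; }
}

public static class BuildModelExample
{
    public static RegressionMetrics TrainAndEvaluate()
    {
        var mlContext = new MLContext(seed: 0);

        // Synthetic rows (price is roughly twice the size) so the sketch runs without a file.
        var random = new Random(0);
        var rows = Enumerable.Range(0, 200)
            .Select(i => new HouseData
            {
                Size = 50 + i,
                Price = 2 * (50 + i) + (float)random.NextDouble(),
            })
            .ToArray();
        IDataView dataView = mlContext.Data.LoadFromEnumerable(rows);

        // Hold back 20% of the rows for evaluation.
        var split = mlContext.Data.TrainTestSplit(dataView, testFraction: 0.2);

        var pipeline = mlContext.Transforms.CopyColumns("Label", nameof(HouseData.Price))
            .Append(mlContext.Transforms.Concatenate("Features", nameof(HouseData.Size)))
            .Append(mlContext.Regression.Trainers.FastTree());

        // Fit returns a transformer: the trained model.
        ITransformer model = pipeline.Fit(split.TrainSet);

        // Score the held-out rows and compute the regression metrics on them.
        return mlContext.Regression.Evaluate(model.Transform(split.TestSet));
    }

    public static void Main()
    {
        RegressionMetrics metrics = TrainAndEvaluate();
        Console.WriteLine($"R^2 on the held-out rows: {metrics.RSquared:F3}");
    }
}
```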
Evaluate
The model evaluation is very simple:
var metrics = MlContext.Regression.Evaluate(_model.Transform(DataView));
The result is an object with metrics that describe the accuracy of the model:
public sealed class RegressionMetrics
{
    public double MeanAbsoluteError { get; }
    public double MeanSquaredError { get; }
    public double RootMeanSquaredError { get; }
    public double LossFunction { get; }
    public double RSquared { get; }
}
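To make these numbers easier to read, the sketch below recomputes four of them by hand from a few labels and scores, assuming the usual textbook definitions. It does not use ML.NET at all; the RegressionMetricsSketch class and the sample values are made up for the illustration, and LossFunction is left out because it depends on the loss configured for the evaluation.

```csharp
using System;
using System.Linq;

public static class RegressionMetricsSketch
{
    public static (double Mae, double Mse, double Rmse, double RSquared) Compute(
        double[] labels, double[] scores)
    {
        if (labels.Length != scores.Length || labels.Length == 0)
            throw new ArgumentException("labels and scores must be non-empty and of equal length");

        // Per-row error: actual value minus predicted value.
        double[] errors = labels.Zip(scores, (label, score) => label - score).ToArray();

        double mae = errors.Average(e => Math.Abs(e));   // mean absolute error
        double mse = errors.Average(e => e * e);         // mean squared error
        double rmse = Math.Sqrt(mse);                    // root mean squared error

        // R squared: 1 minus residual sum of squares over total sum of squares.
        double mean = labels.Average();
        double totalSumOfSquares = labels.Sum(l => (l - mean) * (l - mean));
        double rSquared = 1 - errors.Sum(e => e * e) / totalSumOfSquares;

        return (mae, mse, rmse, rSquared);
    }

    public static void Main()
    {
        var m = Compute(new[] { 3.0, 5.0, 7.0, 9.0 }, new[] { 2.0, 5.0, 8.0, 10.0 });
        Console.WriteLine($"MAE={m.Mae} MSE={m.Mse} RMSE={m.Rmse:F4} R2={m.RSquared:F2}");
    }
}
```

For these sample values the errors are 1, 0, -1 and -1, which gives a mean absolute error and a mean squared error of 0.75 and an R squared of about 0.85. Lower error values and an R squared closer to 1 indicate a better fit.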
We can also predict single values with a prediction engine:
var predictionEngine = MlContext.Model.CreatePredictionEngine<TModel, TPredictionModel>(_model);
predictionEngine.Predict(data);
Here TModel is the input class used in the pipeline and TPredictionModel, in the case of a regression problem, is a class like this:
public class RegressionPrediction : IPredictionModel<float>
{
    [ColumnName("Score")]
    public float Score;
}
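A self-contained sketch of the prediction step might look like this. HouseData, PricePrediction and the synthetic training rows are hypothetical, the output class is kept free of any custom interface to stay independent of the sample library, and FastTree again assumes the Microsoft.ML.FastTree package.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input class: one numeric feature and the value to predict.
public class HouseData
{
    public float Size { get; set; }
    public float Price { get; set; }
}

// Output class: the regression trainer writes its prediction to the Score column.
public class PricePrediction
{
    [ColumnName("Score")]
    public float Score { get; set; }
}

public static class PredictionExample
{
    public static float PredictPrice(float size)
    {
        var mlContext = new MLContext(seed: 0);

        // Synthetic rows where the price is exactly twice the size.
        var rows = Enumerable.Range(0, 200)
            .Select(i => new HouseData { Size = 50 + i, Price = 2 * (50 + i) })
            .ToArray();
        IDataView dataView = mlContext.Data.LoadFromEnumerable(rows);

        var pipeline = mlContext.Transforms.CopyColumns("Label", nameof(HouseData.Price))
            .Append(mlContext.Transforms.Concatenate("Features", nameof(HouseData.Size)))
            .Append(mlContext.Regression.Trainers.FastTree());
        ITransformer model = pipeline.Fit(dataView);

        // The engine wraps the model for one-row-at-a-time predictions.
        var engine = mlContext.Model.CreatePredictionEngine<HouseData, PricePrediction>(model);
        return engine.Predict(new HouseData { Size = size }).Score;
    }

    public static void Main()
    {
        Console.WriteLine($"Predicted price for size 100: {PredictPrice(100f)}");
    }
}
```

A PredictionEngine is not thread-safe, so it is best suited to single predictions like this one; for scoring many rows at once, calling Transform on a data view is the more natural fit.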
Summary
The new ML.NET 1.0 pipeline has some breaking changes in how the data file is loaded, how the columns are transformed and how the model is built.
We are able to manage different data types, convert them into a numeric format, and build and evaluate models.
A considerable number of algorithms (regression, classification) are available to predict values and add machine learning features to our applications.
You can find a sample library on GitHub.