I'm attempting to use Named Entity Recognition (NER) (https://github.com/dotnet/machinelearning/issues/630) to predict categories for words/phrases within a large body of text.
Currently using 3 NuGet packages to try to get this working:
Microsoft.ML (3.0.0-preview.23511.1)
Microsoft.ML.TorchSharp (0.21.0-preview.23511.1)
Torchsharp-cpu (0.101.1)
At the point of training the model [estimator.Fit(dataView)], I get the following error:
Field not found: 'TorchSharp.torch.CUDA'.
I may have misunderstood something here, but I should be processing with CPU from the Torchsharp-cpu package and I'm not sure where the CUDA reference is coming from. This also appears to be a package reference rather than a field?
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.TorchSharp;
using System;
using System.Collections.Generic;
using System.Windows.Forms;

namespace NerTester
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private class TestSingleSentenceData
        {
            public string Sentence;
            public string[] Label;
        }

        private class Label
        {
            public string Key { get; set; }
        }

        private void startButton_Click(object sender, EventArgs e)
        {
            try
            {
                var context = new MLContext();
                context.FallbackToCpu = true;
                context.GpuDeviceId = null;

                var labels = context.Data.LoadFromEnumerable(
                    new[] {
                        new Label { Key = "PERSON" },
                        new Label { Key = "CITY" },
                        new Label { Key = "COUNTRY" }
                    });

                var dataView = context.Data.LoadFromEnumerable(
                    new List<TestSingleSentenceData>(new TestSingleSentenceData[] {
                        new TestSingleSentenceData()
                        {   // Testing longer than 512 words.
                            Sentence = "Alice and Bob live in the USA",
                            Label = new string[]{"PERSON", "0", "PERSON", "0", "0", "0", "COUNTRY"}
                        },
                        new TestSingleSentenceData()
                        {
                            Sentence = "Alice and Bob live in the USA",
                            Label = new string[]{"PERSON", "0", "PERSON", "0", "0", "0", "COUNTRY"}
                        },
                    }));

                var chain = new EstimatorChain<ITransformer>();
                var estimator = chain.Append(context.Transforms.Conversion.MapValueToKey("Label", keyData: labels))
                    .Append(context.MulticlassClassification.Trainers.NameEntityRecognition(outputColumnName: "outputColumn"))
                    .Append(context.Transforms.Conversion.MapKeyToValue("outputColumn"));

                var transformer = estimator.Fit(dataView);
                transformer.Dispose();

                MessageBox.Show("Success!");
            }
            catch (Exception ex)
            {
                MessageBox.Show($"Error: {ex.Message}");
            }
        }
    }
}
Application is running on x64 and the documentation for NER appears to be limited.
Any help would be greatly appreciated.
Tried changing the NuGet packages I'm referencing, including the use of libtorch packages.
Attempted running the application in x86 and x64 configuration.
Added code to try to force CPU usage rather than GPU (CUDA).
You will only need to reference 2 packages for that experiment, as Microsoft.ML.TorchSharp contains all the references you will need.
Now the bad news.
At runtime you will get a bunch of errors related to missing files or dlls. I spent a good amount of time trying to figure out what I was missing but, I guess, it is just related to the versions of some libraries.
In the end I cloned the whole repo, compiled it for my platform (Win-x64), and looked for the files whose sizes differed (some don't have a version, so the size was the only option). It boils down to 7 libs.
Those brought in by the compilation are all there ... just not the ones that ML.NET expects.
I replaced the DLLs with the ones from the ML.NET repo, copied them into the folder
\bin\Debug\net7.0\runtimes\win-x64\native
and everything works fine.
Maybe there is a smarter solution, but I couldn't find any.
UPDATE:
As anrouxel suggested on GitHub, the best way is to use libtorch-cpu version 1.13.0.1.
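For reference, here is a rough sketch of what the package references from the UPDATE might look like in the project file. The Microsoft.ML and Microsoft.ML.TorchSharp versions are simply the ones from the question, and combining them with libtorch-cpu 1.13.0.1 in this exact way is my assumption, not something I have verified on every setup. Listing Microsoft.ML explicitly is also an assumption; it may already come in through Microsoft.ML.TorchSharp.

```xml
<!-- Sketch only: goes inside the <Project> element of the .csproj. -->
<ItemGroup>
  <!-- Versions taken from the question (assumed to still apply). -->
  <PackageReference Include="Microsoft.ML" Version="3.0.0-preview.23511.1" />
  <PackageReference Include="Microsoft.ML.TorchSharp" Version="0.21.0-preview.23511.1" />
  <!-- Version suggested in the UPDATE; used here instead of TorchSharp-cpu. -->
  <PackageReference Include="libtorch-cpu" Version="1.13.0.1" />
</ItemGroup>
```

If the "Field not found" error persists after switching packages, cleaning the bin and obj folders before rebuilding may help make sure no stale native DLLs are left in the output folder.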