I trained an NER model following the spaCy Training Quickstart and only enabled the ner pipeline for training, since NER annotations are the only data I have.
Here is the partial config:
[nlp]
lang = "en"
pipeline = ["tok2vec","ner","tagger"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
...
[components.tagger]
source = "en_core_web_sm"
component = "tagger"
replace_listeners = ["model.tok2vec"]
...
[training]
...
frozen_components = ["tagger"]
Now when I get entity predictions, there are no POS tags.
For example, the tokens of an ent in doc.ents have no pos_:
>>> ent
Some Entity
>>> ent.label_
'LABEL_NAME'
>>> [token.pos_ for token in ent]
['', '']
So how do I train only the ner pipeline and still have POS tags predicted by the tagger?
Is there a way to load the POS tag predictions from another model, such as using en_core_web_sm for the tagger and my trained model for the ner?
I am trying to use frozen_components, but it does not seem to work.
Yes, you can "source" a component from a different pipeline. See the sourcing components docs for general information about that, or the double NER project for an example of doing it with two NER components.
Basically you can do this:
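Here is a minimal sketch of the relevant sections, assuming the remaining sections come from the Training Quickstart (it mirrors the partial config in the question; the key parts are sourcing the tagger and freezing it during training):

[nlp]
lang = "en"
pipeline = ["tok2vec","ner","tagger"]

[components.tagger]
# load the tagger and its trained weights from the pretrained pipeline
source = "en_core_web_sm"
component = "tagger"
# keep the sourced tagger's own tok2vec instead of a listener
# pointing at the tok2vec you are training for ner
replace_listeners = ["model.tok2vec"]

[training]
# do not update the sourced tagger's weights while training ner
frozen_components = ["tagger"]

Freezing only affects training; the sourced tagger still runs when the pipeline is applied to text, alongside your newly trained ner component.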
Note that both pipelines need to have the same word vectors or this won't work, as described in the sourcing components docs. In this case the en_core_web_sm model has no word vectors, so this will work as long as your pipeline also has no word vectors.
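If you are unsure whether a pipeline has vectors, one way to check (an illustrative snippet, not from the original post; the output path is hypothetical, so use wherever your trained pipeline was saved) is to compare each vocab's vectors table, whose shape is (0, 0) when there are no vectors:

>>> import spacy
>>> spacy.load("en_core_web_sm").vocab.vectors.shape
(0, 0)
>>> # hypothetical path to your trained pipeline's output directory
>>> spacy.load("training/model-best").vocab.vectors.shape
(0, 0)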