No POS tags in newly trained spaCy NER model, how to enable?


I trained an NER model following the spaCy Training Quickstart and only enabled the ner component for training, since NER annotations are the only data I have.

Here is the relevant part of the config:

[nlp]
lang = "en"
pipeline = ["tok2vec","ner","tagger"]
batch_size = 1000
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
...
[components.tagger]
source = "en_core_web_sm"
component = "tagger"
replace_listeners = ["model.tok2vec"]
...
[training]
...
frozen_components = ["tagger"]

Now when I get entity predictions, there are no POS tags.

For example, the tokens of an ent in doc.ents have an empty pos_:

>>> ent
Some Entity
>>> ent.label_
'LABEL_NAME'
>>> [token.pos_ for token in ent]
['', '']
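A quick way to confirm that no tagger actually ran (using my_ner as a placeholder name for the trained pipeline) is to check which components are present and whether the doc carries any POS annotation at all:

import spacy

nlp = spacy.load("my_ner")  # placeholder for the trained pipeline
print(nlp.pipe_names)  # which components are actually in the pipeline

doc = nlp("Some example text")
# Doc.has_annotation reports whether any token carries the attribute;
# it returns False here if the tagger never assigned POS tags
print(doc.has_annotation("POS"))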

So how do I train only the ner component and still have POS tags predicted by the tagger?

Is there a way to load the POS tag predictions from another model, for example using en_core_web_sm for the tagger and my trained model for the ner?

I am trying to do this with frozen_components (see the config above), but it does not seem to work.


Answered by polm23:

Yes, you can "source" a component from a different pipeline. See the sourcing components docs for general information about that, or the double NER project for an example of doing it with two NER components.

Basically you can do this:

import spacy

nlp = spacy.load("my_ner")
nlp_tagger = spacy.load("en_core_web_sm") # load the base pipeline
# give this component a copy of its own tok2vec
nlp_tagger.replace_listeners("tok2vec", "tagger", ["model.tok2vec"])

# source the tagger (with its now-standalone tok2vec) into the NER pipeline
nlp.add_pipe(
    "tagger",
    name="tagger",
    source=nlp_tagger,
    after="ner",
)
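With the tagger sourced into the pipeline, entities should now carry POS tags. A minimal check (the text is a placeholder, and the entities found depend on your model):

doc = nlp("Some example text mentioning Some Entity")
for ent in doc.ents:
    # pos_ should now be filled in by the sourced tagger
    print(ent.text, ent.label_, [token.pos_ for token in ent])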

Note that both pipelines need to have the same word vectors or this won't work, as described in the sourcing components docs. In this case the sm model has no word vectors, so sourcing from it works as long as your own pipeline also has none.
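If in doubt, you can compare the vectors tables of the two pipelines; this check is not from the original answer, just a quick way to verify the constraint:

# a pipeline without word vectors reports an empty table
print(nlp.vocab.vectors.shape)
print(nlp_tagger.vocab.vectors.shape)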