I want to parse sentences with a spaCy pipeline (using spacy_conll) and then write the resulting docs to a single CoNLL-U file. But with
texts = ["First sentence.", "Second sentence.", "Third sentence."]
from spacy_conll import init_parser

nlp = init_parser(language, parser, include_headers=True)
docs = list(nlp.pipe(texts))
I get multiple docs, each of which I could convert to its own CoNLL-U file with
for doc in docs:
    conll = doc._.conll_str
But I want a single file.
If I merge the docs into one doc with

from spacy.tokens import Doc

concat_doc = Doc.from_docs(docs)
conll = concat_doc._.conll_str
I get the following warning: UserWarning: [W101] Skipping Doc custom extension 'conll_str' while merging docs.
So concat_doc._.conll_str is None, which results in TypeError: write() argument must be str, not None when I try to write conll to a file.
If I instead loop through the docs and append each doc's conll_str to one file, every doc's sentence numbering restarts at 1 (# sent_id = 1), which I don't want either.
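To make the numbering clash concrete, here is a minimal sketch of that naive append. The hardcoded strings are stand-ins for what each doc's conll_str looks like with include_headers=True (producing the real strings requires a loaded model):

```python
# Hypothetical per-doc CoNLL-U output; stands in for doc._.conll_str.
doc_conll_strs = [
    "# sent_id = 1\n# text = First sentence.\n1\tFirst\t...\n\n",
    "# sent_id = 1\n# text = Second sentence.\n1\tSecond\t...\n\n",
]

# Naive append: every doc restarts its sentence numbering at 1,
# so the merged file contains duplicate sent_id headers.
merged = "".join(doc_conll_strs)
print(merged.count("# sent_id = 1"))  # -> 2: both docs claim sent_id 1
```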
Does anyone have an idea how I could manage to parse the sentences and write them to a single conllu file? Thank you very much.