How to speed up sentence tokenization with spaCy

79 Views Asked by dufei At 21 March 2023 at 23:25

I am trying to extract the first sentences from a list of paragraphs with the following function (that I apply in a for loop):

import spacy
nlp = spacy.load("en_core_web_trf")  # the model named below

def extract_first_sentence(text):
  doc = nlp(text)
  return [sent.text for sent in doc.sents][0]

The code does what I want, but it is slow. It seems inefficient to collect the .text of every sentence just to keep the first one, but doc.sents is a generator, so apparently there is no way to index or slice it directly. The tips given in this question mostly do not apply, because I do not need to read and write files as I go. I'm using the model "en_core_web_trf".
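One possible direction, sketched here rather than offered as a verified fix: take only the first item from the doc.sents generator with next(), and process the paragraphs in batches with nlp.pipe instead of calling nlp once per text. The sketch also swaps in a different technique, a blank English pipeline with spaCy's rule-based "sentencizer" component instead of "en_core_web_trf". That is an assumption about what accuracy is acceptable, since rule-based splitting can place boundaries differently from the transformer pipeline. The function name extract_first_sentences and the batch_size value are illustrative choices, not from the question.

```python
import spacy

# Assumption: rule-based sentence splitting is accurate enough for this task.
# spacy.blank("en") creates a tokenizer-only pipeline; no model download needed.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")


def extract_first_sentences(texts, batch_size=256):
    """Return the first sentence of each text ("" if a text has no sentence)."""
    first_sentences = []
    # nlp.pipe processes the texts in batches instead of one nlp() call each.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        # Take only the first sentence span; the remaining sentences
        # are never collected into a list.
        first = next(iter(doc.sents), None)
        first_sentences.append(first.text if first is not None else "")
    return first_sentences


paragraphs = [
    "This is the first sentence. This is the second.",
    "Only one sentence here.",
]
print(extract_first_sentences(paragraphs))
```

If the transformer model's boundaries are required, the same next(iter(doc.sents), None) and nlp.pipe pattern should carry over to a pipeline loaded with spacy.load("en_core_web_trf"), though most of the runtime would then likely remain in the transformer itself.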
