How to get the Span of a conjunct in spaCy?


I use spaCy's token.conjuncts to get the conjuncts of each token.

However, token.conjuncts returns a tuple, and I want a Span instead. For example:

import spacy
nlp = spacy.load("en_core_web_lg")

sentence = "I like to eat food at the lunch time, or even at the time between a lunch and a dinner"
doc = nlp(sentence)
for token in doc:
    conj = token.conjuncts
    print(type(conj))

# output: <class 'tuple'>

Does anyone know how to convert this tuple into a Span?

Or is there a way to get the conjuncts as Span objects directly?

I need a Span because I want to use the conjunct's position to find where it sits in the sentence, for example which noun chunk or split (however I split the sentence) it belongs to.

Currently, I convert the tuple to a str and iterate over all the splits or noun chunks, checking whether each one contains that string.

However, this has a bug: when the text of a conjunct appears in more than one split or noun chunk, I cannot tell which one actually contains it, because I only compare strings and ignore the conjunct's index. With a Span for the conjunct, I could pin down its exact position.

Please feel free to comment, thanks in advance!

krisograbek On BEST ANSWER

token.conjuncts returns a tuple of Token objects. To get a Span for a single conjunct, slice the Doc by that token's index: doc[conj.i : conj.i + 1]

import spacy

nlp = spacy.load('en_core_web_sm')


sentence = "I like oranges and apples and lemons."


doc = nlp(sentence)

for token in doc:
    if token.conjuncts:
        conjuncts = token.conjuncts             # tuple of Token objects
        print("Conjuncts for", token.text)
        for conj in conjuncts:
            # conj is a Token; slicing the Doc by its index gives a one-token Span
            span = doc[conj.i : conj.i + 1]
            print(span.text, type(span))
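
For the underlying goal in the question (finding which noun chunk or split contains a conjunct), it may be simpler to compare token indices than strings. The following is only a minimal sketch: `find_containing_span` and `FakeSpan` are hypothetical names, and `FakeSpan` is a stand-in so the example runs without a model. It relies only on the `.start` and `.end` token offsets that a spaCy Span also exposes (`end` is exclusive), and the chunk boundaries below are written by hand rather than produced by a parser.

```python
from collections import namedtuple

# Stand-in for a spaCy Span (assumption for illustration only).
# A real Span has the same .start / .end token offsets, with .end exclusive.
FakeSpan = namedtuple("FakeSpan", ["start", "end", "text"])


def find_containing_span(token_index, spans):
    """Return the first span whose token range covers token_index, else None."""
    for span in spans:
        if span.start <= token_index < span.end:
            return span
    return None


# Hand-written chunks for "the time between a lunch and a dinner":
# the(0) time(1) between(2) a(3) lunch(4) and(5) a(6) dinner(7)
chunks = [
    FakeSpan(0, 2, "the time"),
    FakeSpan(3, 5, "a lunch"),
    FakeSpan(6, 8, "a dinner"),
]

match = find_containing_span(7, chunks)  # 7 is the index of "dinner"
print(match.text)
```

With a parsed Doc, the same function could presumably be called as find_containing_span(conj.i, doc.noun_chunks), since conj.i and the Span offsets are both token positions in the Doc. Because the lookup uses positions, two chunks with identical text no longer collide.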