I'm very new to Python, spaCy, and even Stack Overflow in general, so forgive me if my question is too vague. Is there a way to tell spaCy that certain words in a sentence are related to certain numbers?
sentence = "The feed rate, aspirator rate, inlet and outlet temperature and air flow rate were approximately 3l/hr, 100%, 120C, 90C, and 357l/hr, respectively."
From the sentence above, we know that the feed rate is 3l/hr, the aspirator rate is 100%, the inlet temperature is 120C, the outlet temperature is 90C, and finally, the air flow rate is 357l/hr.
I would like to extract each parameter name together with the value that belongs to it. Is that possible?
After looking at the dependency parser (thank you for that), I realised that I can use patterns with the entity ruler:
import spacy
# Start from a blank English pipeline (tokenizer only, no trained components)
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
patterns = [
    {"label": "PARAMETER", "pattern": [{"TEXT": {"REGEX": r"(inlet temperature|outlet temperature|inlet|outlet)"}}]},
    {"label": "TEMP", "pattern": [{"TEXT": {"REGEX": r"\d+(C|K|F)"}}]},
    {"label": "aspirator rate", "pattern": [{"TEXT": {"REGEX": r"\d{1,5}"}}, {"LOWER": r"%"}]},
    {"label": "feed rate", "pattern": [{"TEXT": {"REGEX": r"\d{1,5}[ml/min]"}}, {"LOWER": r"ml/min"}]},
]
ruler.add_patterns(patterns)
text = "The feed rate, aspirator rate, inlet and outlet temperature and air flow rate were approximately 3 ml/min, 100%, 120C and 90C and 357 l/hr, respectively."
# Process the text with spaCy
doc = nlp(text)
# Iterate over the entities in the document
for ent in doc.ents:
    print(ent.label_, ent.text)
This gives a rather awkward output:
PARAMETER inlet
PARAMETER outlet
aspirator rate 100%
TEMP 120C
TEMP 90C
It seems that I was unable to match the feed rate or the full parameter name "outlet temperature".
May I ask for help with the following issues?
- I'm only trying things out to see whether I can separate them like this. How can I use REGEX to extract units such as "ml/min" or "L/hr", or anything else that contains special characters?
- Scientific articles phrase things differently: some write "inlet air temperature" while others write "inlet temperature". Can I use REGEX to cover all of these variants, so that each one is labelled the same way (PARAMETER --> outlet air temperature)?
Thank you so much!
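On the two REGEX questions: a REGEX inside a spaCy token pattern is tested against one token at a time, so a multi-word alternative such as "inlet temperature" cannot match there. One way to experiment before going back to the entity ruler is plain Python `re` on the raw string. The following is only a minimal sketch under some assumptions: the `UNITS` list and the function names `find_values` and `find_temperature_names` are made up for illustration, and the canonical name "inlet temperature" is an arbitrary choice. `re.escape` takes care of special characters such as "/" and "%", and an optional group covers the "air" variant.

```python
import re

# Hypothetical unit list for illustration; extend it for your own corpus.
UNITS = ["ml/min", "l/hr", "%", "C", "K", "F"]

# Try longer units first; re.escape protects characters such as "/" and "%".
unit_alternatives = "|".join(
    re.escape(unit) for unit in sorted(UNITS, key=len, reverse=True)
)

# A number, optional whitespace, then a known unit that is not followed by
# another letter (so "3 cats" is not read as 3 degrees C).
VALUE_RE = re.compile(
    rf"(?P<number>\d+(?:\.\d+)?)\s*(?P<unit>{unit_alternatives})(?![A-Za-z])",
    re.IGNORECASE,
)

# Matches "inlet temperature", "outlet air temperature", and so on.
TEMP_NAME_RE = re.compile(
    r"\b(?P<side>inlet|outlet)(?:\s+air)?\s+temperature\b",
    re.IGNORECASE,
)


def find_values(text):
    """Return (number, unit) pairs in the order they appear in the text."""
    return [(m.group("number"), m.group("unit")) for m in VALUE_RE.finditer(text)]


def find_temperature_names(text):
    """Return parameter names normalised to '<side> temperature'."""
    return [
        f"{m.group('side').lower()} temperature"
        for m in TEMP_NAME_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Inlet air temperature was 120C and the feed rate was 3 ml/min."
    print(find_values(sample))
    print(find_temperature_names(sample))
```

Note that this sketch does not handle the coordinated form "inlet and outlet temperature" from the question, where "inlet" shares the word "temperature" with the next item; only the fully written names are matched.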
spaCy comes with a dependency parser that you can use for this kind of thing. Look up dependency trees and try to figure out how exactly they work. Then you can use displaCy to visualise the parse of your sentences and work out how you want to build your setup.
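For the specific "..., respectively" sentence in the question, the names and values also line up by position, so a simple positional pairing can serve as a baseline to compare against whatever you build on top of the parse. This is a rough sketch using only the standard `re` module, not a spaCy feature: the helper names (`split_list`, `expand_shared_head`, `pair_respectively`) are hypothetical, and it assumes a single "was"/"were" separates the names from the values and that both lists have the same length.

```python
import re

# Assumed sentence shape: "<names> was/were [approximately] <values>, respectively."
RESPECTIVELY_RE = re.compile(
    r"^(?P<names>.+?)\s+(?:was|were)\s+(?:approximately\s+)?"
    r"(?P<values>.+?),?\s+respectively\.?$",
    re.IGNORECASE,
)


def split_list(chunk):
    """Split an enumeration on commas and on the word 'and'."""
    parts = re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", chunk)
    return [part.strip() for part in parts if part.strip()]


def expand_shared_head(names):
    """Heuristic: in 'inlet and outlet temperature', the one-word item
    'inlet' borrows the trailing words of the item after it."""
    expanded = list(names)
    for i in range(len(expanded) - 2, -1, -1):
        current = expanded[i].split()
        following = expanded[i + 1].split()
        if len(current) == 1 and len(following) > 1:
            expanded[i] = " ".join(current + following[1:])
    return expanded


def pair_respectively(sentence):
    """Return {name: value}, or {} if the sentence does not fit the pattern."""
    match = RESPECTIVELY_RE.match(sentence.strip())
    if match is None:
        return {}
    # Drop a leading article such as "The" from each name.
    names = [
        re.sub(r"^(?:the|an?)\s+", "", name, flags=re.IGNORECASE)
        for name in split_list(match.group("names"))
    ]
    names = expand_shared_head(names)
    values = split_list(match.group("values"))
    if len(names) != len(values):
        return {}  # lists do not line up; refuse to guess
    return dict(zip(names, values))


if __name__ == "__main__":
    sentence = (
        "The feed rate, aspirator rate, inlet and outlet temperature and "
        "air flow rate were approximately 3l/hr, 100%, 120C, 90C, and "
        "357l/hr, respectively."
    )
    for name, value in pair_respectively(sentence).items():
        print(name, "->", value)
```

Returning an empty dict when the counts differ is a deliberate choice here: a wrong pairing is harder to spot later than a missing one.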