extract structured data from text


Is there an available tool or library (preferably an established commercial product or open source) that can extract structured data from plain text? The plain text usually contains Boolean or math operators such as AND, OR, and BETWEEN.

I like AWS Comprehend but I'm not sure it can be used for this task easily.

vehicle with 2 to 5 wheels
=>
SUBJECT: vehicle
EXPRESSION:
  SUBJECT: wheels
  OPERAND: BETWEEN
    NUMBER: 2
    NUMBER: 5

There is 1 solution below

abhinavatAWS On

Comprehend does not natively convert text to a structured format. However, you can get the part of speech of each token from the Syntax API (the DetectSyntax operation) and build a rule-based structure from there.

https://docs.aws.amazon.com/comprehend/latest/dg/how-syntax.html
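
As a rough illustration, the syntax call could be wrapped like this in Python with boto3. This is only a sketch: `pos_tags` is a hypothetical helper name, and it assumes AWS credentials and a region are already configured for the client you pass in.

```python
def pos_tags(client, text, language_code="en"):
    """Return (token text, part-of-speech tag) pairs for `text`.

    `client` is expected to be a boto3 Comprehend client; `pos_tags`
    itself is a hypothetical helper, not part of boto3.
    """
    response = client.detect_syntax(Text=text, LanguageCode=language_code)
    return [
        (token["Text"], token["PartOfSpeech"]["Tag"])
        for token in response["SyntaxTokens"]
    ]


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# comprehend = boto3.client("comprehend")
# for word, tag in pos_tags(comprehend, "vehicle with 2 to 5 wheels"):
#     print(word, tag)
```

Passing the client in as an argument keeps the helper easy to try out with a stub object before pointing it at the real service.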

For the example above, "vehicle" and "wheels" will be detected as nouns (NOUN), "2" and "5" as numerals (NUM), and "to" and "with" as adpositions (ADP).
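
A rule-based step on top of those tags might look like the sketch below. Everything here is an assumption made for illustration: `build_structure` is a hypothetical function, the two rules (first noun is the subject; number, "to", number means BETWEEN) are toy rules that only cover the example in the question, and the numbers are assumed to be written as digits. The output keys simply mirror the labels used in the question.

```python
def build_structure(tagged):
    """Turn (text, tag) pairs into a nested dict using two toy rules."""
    result = {}
    nouns = [i for i, (_, tag) in enumerate(tagged) if tag == "NOUN"]
    if not nouns:
        return result

    # Rule 1: the first noun is the top-level subject.
    result["SUBJECT"] = tagged[nouns[0]][0]

    # Rule 2: NUM, the word "to", NUM  is read as a BETWEEN range.
    for i in range(len(tagged) - 2):
        (lo, lo_tag), (mid, _), (hi, hi_tag) = tagged[i:i + 3]
        if lo_tag == "NUM" and mid.lower() == "to" and hi_tag == "NUM":
            # The first noun after the range is what the numbers count.
            target = next((tagged[j][0] for j in nouns if j > i + 2), None)
            result["EXPRESSION"] = {
                "SUBJECT": target,
                "OPERAND": "BETWEEN",
                "NUMBER": [int(lo), int(hi)],  # assumes digits, not words
            }
            break
    return result


# Tags as described above for "vehicle with 2 to 5 wheels".
example = [
    ("vehicle", "NOUN"), ("with", "ADP"), ("2", "NUM"),
    ("to", "ADP"), ("5", "NUM"), ("wheels", "NOUN"),
]
print(build_structure(example))
```

The range rule matches the literal word "to" rather than its tag, so it does not depend on which tag the tagger assigns to that token. Real input would need many more rules (OR, AND, "at least", spelled-out numbers, and so on).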