How to display information and numbers from unstructured text in a table with spaCy

I am trying, and so far failing, to find a way to extract textual information with spaCy and present it in a table.

An example text would be:

lines = 'From June 2020 to November 2020 the total rent was 800 Euro. It was composed of a basic rent of 600 Euro, a premium for the Heating of 100 Euro and another premium for the Garage of 100 Euro. From Dezember 2020 to January 2021 the total rent was 1000 Euro, then composed of a basic rent of 800 Euro, a premium for the Heating of 100 Euro and another premium for the Garage of 100 Euro.'

The output I would like to achieve is as follows:

| Period                     | Total Rent | Basic Rent | Heating Premium | Garage Premium |
|----------------------------|------------|------------|-----------------|----------------|
| June 2020-November 2020    | 800 Euro   | 600 Euro   | 100 Euro        | 100 Euro       |
| Dezember 2020-January 2021 | 1000 Euro  | 800 Euro   | 100 Euro        | 100 Euro       |

So far I have tokenized the text, which seems useful. Then I iterated over the tokens and displayed only the nouns, proper nouns, and numbers:

# doc is the result of running a loaded spaCy pipeline on the text, e.g. doc = nlp(lines)
print("Iterate over the tokens and print the predicted part of speech:")
for token in doc:
    # Print the token text and its predicted part-of-speech tag
    if token.pos_ in ("NOUN", "NUM", "PROPN"):
        print(token.text, token.pos_)

The result is:

June PROPN
2020 NUM
November PROPN
2020 NUM
rent NOUN
800 NUM
Euro PROPN
rent NOUN
600 NUM
Euro PROPN
premium NOUN
Heating PROPN
100 NUM
Euro PROPN
premium NOUN
Garage PROPN
100 NUM
Euro PROPN
Dezember PROPN
2020 NUM
January PROPN
2021 NUM
rent NOUN
1000 NUM
Euro PROPN
rent NOUN
800 NUM
Euro PROPN
premium NOUN
Heating PROPN
100 NUM
Euro PROPN
premium NOUN
Garage PROPN
100 NUM
Euro PROPN

This seems helpful because it contains the main pieces that should appear in the table. However, I am not sure there is a way to build the table automatically from it. Does anybody have an idea? Thanks in advance.
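To make the goal concrete, here is a minimal sketch of the kind of rule-based extraction that would produce the table. It assumes every period is worded exactly like the example ("From ... to ...", "total rent was ...", "basic rent of ...", "premium for the ... of ..."). It uses Python's standard `re` module rather than spaCy, and the helper names `extract_rows` and `to_markdown` are made up for illustration.

```python
import re

# Example text copied from the question.
lines = (
    "From June 2020 to November 2020 the total rent was 800 Euro. "
    "It was composed of a basic rent of 600 Euro, a premium for the Heating "
    "of 100 Euro and another premium for the Garage of 100 Euro. "
    "From Dezember 2020 to January 2021 the total rent was 1000 Euro, "
    "then composed of a basic rent of 800 Euro, a premium for the Heating "
    "of 100 Euro and another premium for the Garage of 100 Euro."
)

# Patterns are assumptions tied to the exact wording of the example text.
PERIOD = re.compile(r"From (\w+ \d{4}) to (\w+ \d{4})")
TOTAL = re.compile(r"total rent was (\d+ Euro)")
BASIC = re.compile(r"basic rent of (\d+ Euro)")
PREMIUM = re.compile(r"premium for the (\w+) of (\d+ Euro)")


def extract_rows(text):
    """Return one dict per 'From ... to ...' period found in text."""
    rows = []
    starts = list(PERIOD.finditer(text))
    for i, match in enumerate(starts):
        # A segment runs from the end of this period phrase to the next one.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        segment = text[match.end():end]
        row = {"Period": f"{match.group(1)}-{match.group(2)}"}
        total = TOTAL.search(segment)
        basic = BASIC.search(segment)
        row["Total Rent"] = total.group(1) if total else ""
        row["Basic Rent"] = basic.group(1) if basic else ""
        # Each premium becomes its own column, named after its label.
        for name, amount in PREMIUM.findall(segment):
            row[f"{name} Premium"] = amount
        rows.append(row)
    return rows


def to_markdown(rows):
    """Render the rows as a Markdown table, columns in first-seen order."""
    headers = []
    for row in rows:
        for key in row:
            if key not in headers:
                headers.append(key)
    out = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for row in rows:
        out.append("| " + " | ".join(row.get(h, "") for h in headers) + " |")
    return "\n".join(out)


print(to_markdown(extract_rows(lines)))
```

This breaks as soon as the wording varies, which is why I am hoping for something based on spaCy's tokens or entities rather than fixed patterns.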