NLP, NER --> python extraction of personal informations (like names, surnames, fiscal codes)

37 Views Asked by At

i am working on a project to extract personal information from custom documents. In particular, i have a txt file with a lot of names, surnames and information but i would like to extract names and italian fiscal codes. My actual approach is based on regex but i am not very satisfied because the regex pattern does match always all I need. I was thinking about an NLP approach but i do not know how. I think that actually there are no libraries trained on italian vocabulary. Please, could you kindly help me or give me a few advices? Thank you very much in advance!!

I have tried an approach based on regex which works well on standard documents, on strongly custom documents it often fails.

1

There are 1 best solutions below

0
Itamar Trainin On

I would try prompting ChatGPT directly in Italian to extract this information for you. They have an API you can access with simple python code, and you can tell it what exactly you want it to extract and in what output format (json for example).

In addition there are traditional NER models you could use, mainly for names, such as spacy which also supports Italian (see: https://spacy.io/usage/models) or Google's which costs money.

I believe that reguarding the fiscal information you will get the best results by using a REGEX.