How to improve the results from DBpedia Spotlight?

Question

How to improve the results from DBpedia Spotlight?

526 Views Asked by EmJ At 30 July 2019 at 09:46

I am using DBpedia Spotlight to extract DBpedia resources as follows.

import json
from SPARQLWrapper import SPARQLWrapper, JSON
import requests
import urllib.parse

## initial consts
BASE_URL = 'http://api.dbpedia-spotlight.org/en/annotate?text={text}&confidence={confidence}&support={support}'
TEXT = "Tolerance, safety and efficacy of Hedera helix extract in inflammatory bronchial diseases under clinical practice conditions: a prospective, open, multicentre postmarketing study in 9657 patients.     In this postmarketing study 9657 patients (5181 children) with bronchitis (acute or chronic bronchial inflammatory disease) were treated with a syrup containing dried ivy leaf extract. After 7 days of therapy, 95% of the patients showed improvement or healing of their symptoms. The safety of the therapy was very good with an overall incidence of adverse events of 2.1% (mainly gastrointestinal disorders with 1.5%). In those patients who got concomitant medication as well, it could be shown that the additional application of antibiotics had no benefit respective to efficacy but did increase the relative risk for the occurrence of side effects by 26%. In conclusion, it is to say that the dried ivy leaf extract is effective and well tolerated in patients with bronchitis. In view of the large population considered, future analyses should approach specific issues concerning therapy by age group, concomitant therapy and baseline conditions."
CONFIDENCE = '0.5'
SUPPORT = '10'
REQUEST = BASE_URL.format(
    text=urllib.parse.quote_plus(TEXT), 
    confidence=CONFIDENCE, 
    support=SUPPORT
)
HEADERS = {'Accept': 'application/json'}
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
all_urls = []

r = requests.get(url=REQUEST, headers=HEADERS)
response = r.json()
resources = response['Resources']
for res in resources:
    all_urls.append(res['@URI'])
print(all_urls)

My text is shown below:

Tolerance, safety and efficacy of Hedera helix extract in inflammatory bronchial diseases under clinical practice conditions: a prospective, open, multicentre postmarketing study in 9657 patients. In this postmarketing study 9657 patients (5181 children) with bronchitis (acute or chronic bronchial inflammatory disease) were treated with a syrup containing dried ivy leaf extract. After 7 days of therapy, 95% of the patients showed improvement or healing of their symptoms. The safety of the therapy was very good with an overall incidence of adverse events of 2.1% (mainly gastrointestinal disorders with 1.5%). In those patients who got concomitant medication as well, it could be shown that the additional application of antibiotics had no benefit respective to efficacy but did increase the relative risk for the occurrence of side effects by 26%. In conclusion, it is to say that the dried ivy leaf extract is effective and well tolerated in patients with bronchitis. In view of the large population considered, future analyses should approach specific issues concerning therapy by age group, concomitant therapy and baseline conditions.

The results I got is as follows.

['http://dbpedia.org/resource/Hedera', 
'http://dbpedia.org/resource/Helix', 
'http://dbpedia.org/resource/Bronchitis', 
'http://dbpedia.org/resource/Cough_medicine',
'http://dbpedia.org/resource/Hedera', 
'http://dbpedia.org/resource/After_7',
'http://dbpedia.org/resource/Gastrointestinal_tract',
'http://dbpedia.org/resource/Antibiotics',
'http://dbpedia.org/resource/Relative_risk',
'http://dbpedia.org/resource/Hedera',
'http://dbpedia.org/resource/Bronchitis']

As you can see, the results are not very good.

For example, consider Hedera helix extract in the text mentioned above. Even though DBpedia has a resource for Hedera helix (http://dbpedia.org/resource/Hedera_helix), the Spotlight outputs it as two URIs as http://dbpedia.org/resource/Hedera and http://dbpedia.org/resource/Helix.

According to my dataset, I would like to get the longest term in DBpedia as the results. In that case, what are the improvements I can do to get my desired output?

I am happy to provide more details if needed.

Original Q&A

There are 1 best solutions below

**Sadaf Fraz** · Answer 1 · 2020-08-05T09:36:07.380000

Although I am answering quiet late for this question but you can use Babelnet API in python to obtain dbpedia URI's containing longer texts. I reproduced the problem using the code below:

`from babelpy.babelfy import BabelfyClient

text ="Tolerance, safety and efficacy of Hedera helix extract in inflammatory 
bronchial diseases under clinical practice conditions: a prospective, open, 
multicentre postmarketing study in 9657 patients.     In this postmarketing 
study 9657 patients (5181 children) with bronchitis (acute or chronic 
bronchial inflammatory disease) were treated with a syrup containing dried ivy 
leaf extract. After 7 days of therapy, 95% of the patients showed improvement 
or healing of their symptoms. The safety of the therapy was very good with an 
overall incidence of adverse events of 2.1% (mainly gastrointestinal disorders 
with 1.5%). In those patients who got concomitant medication as well, it could 
be shown that the additional application of antibiotics had no benefit 
respective to efficacy but did increase the relative risk for the occurrence 
of side effects by 26%. In conclusion, it is to say that the dried ivy leaf 
extract is effective and well tolerated in patients with bronchitis. In view 
of the large population considered, future analyses should approach specific 
issues concerning therapy by age group, concomitant therapy and baseline 
conditions."

# Instantiate BabelFy client.
params = dict()
params['lang'] = 'english'
babel_client = BabelfyClient("**Your Registration Code For API**", params)

# Babelfy sentence.
babel_client.babelfy(text)


# Get all merged entities.
babel_client.all_merged_entities'

The output will be in the sample format as shown below for all the merged entities in the text. You can further store and process the dictionary structure to extract the dbpedia URIs.

{'start': 34,
'end': 45,
'text': 'Hedera helix',
'isEntity': True,
'tokenFragment': {'start': 6, 'end': 7},
'charFragment': {'start': 34, 'end': 45},
'babelSynsetID': 'bn:00021109n',
'DBpediaURL': 'http://dbpedia.org/resource/Hedera_helix',
'BabelNetURL': 'http://babelnet.org/rdf/s00021109n',
'score': 1.0,
'coherenceScore': 0.0847457627118644,
'globalScore': 0.0013494092960806407,
'source': 'BABELFY'},

How to improve the results from DBpedia Spotlight?

There are 1 best solutions below

Related Questions in SPARQL

Related Questions in WIKIPEDIA

Related Questions in DBPEDIA

Related Questions in LINKED-DATA

Related Questions in SPOTLIGHT-DBPEDIA

Trending Questions

Popular # Hahtags

Popular Questions