I have a list of Wikipedia URLs, e.g.
"https://en.wikipedia.org/wiki/Peninsular_War"
"https://en.wikipedia.org/wiki/Napoleon_I_of_France"
etc.
Some of them redirect to other pages; for example, https://en.wikipedia.org/wiki/Napoleon_I_of_France redirects directly to https://en.wikipedia.org/wiki/Napoleon.
I want to use the following SPARQL query for Wikidata to obtain the corresponding Wikidata entities:
PREFIX schema: <http://schema.org/>
SELECT ?url ?item WHERE {
  VALUES ?url {
    <https://en.wikipedia.org/wiki/Peninsular_War>
    <https://en.wikipedia.org/wiki/Napoleon_I_of_France>
  }
  ?url schema:about ?item .
}
However, because the Napoleon URL is a redirect, this query is unable to connect that URL with Napoleon's Wikidata entry. Is there any way to resolve this?
Wikipedia's redirects are not handled on Wikidata (except for particular cases), so I think you have to resolve possible redirects by pre-processing your URLs via the MediaWiki API.
In your example, you can use the following query: https://en.wikipedia.org/w/api.php?action=query&titles=Napoleon_I_of_France&redirects
which gives you the binding from Napoleon_I_of_France to Napoleon.
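A minimal Python sketch of this pre-processing step, for illustration only. The helper names (build_redirect_url, resolve_titles, fetch) and the User-Agent string are hypothetical, and the code assumes the API's default JSON layout, in which the query object carries optional normalized and redirects lists of from/to pairs.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def build_redirect_url(titles):
    """Build an action=query URL that asks the API to follow redirects."""
    params = {
        "action": "query",
        "titles": "|".join(titles),  # several titles can share one request
        "redirects": 1,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)


def resolve_titles(titles, payload):
    """Map each requested title to the title it finally resolves to.

    Assumes the response may contain 'normalized' (underscores to spaces,
    capitalisation) and 'redirects' lists, each holding from/to pairs.
    """
    query = payload.get("query", {})
    normalized = {e["from"]: e["to"] for e in query.get("normalized", [])}
    redirects = {e["from"]: e["to"] for e in query.get("redirects", [])}
    resolved = {}
    for title in titles:
        step = normalized.get(title, title)
        resolved[title] = redirects.get(step, step)
    return resolved


def fetch(url):
    """Perform the HTTP request; the User-Agent value is a placeholder."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "redirect-sketch/0.1 (placeholder contact)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With network access, resolve_titles(titles, fetch(build_redirect_url(titles))) should return the resolved titles, which can then be turned back into URLs (replacing spaces with underscores) and placed in the VALUES clause of the SPARQL query.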
But, in this case, I would use the API directly instead of SPARQL to retrieve the Wikidata item IDs.
For example, the query: https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&redirects&titles=Napoleon_I_of_France returns the desired ID Q517.
Note that the titles parameter accepts multiple titles! For example, the query: https://en.wikipedia.org/w/api.php?action=query&prop=pageprops&ppprop=wikibase_item&redirects&titles=Peninsular_War|Napoleon_I_of_France returns both Q152499 and Q517.
This lets you drastically reduce the number of queries, which will be about ceil(N/2048), where N is the total number of characters of your titles and 2048 is the standard maximum number of characters allowed in a single URL.
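The batching idea could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper names are hypothetical, the character budget counts raw title characters (percent-encoding can make the real URL longer), and max_titles=50 reflects my understanding that the API also caps the number of titles per request for ordinary clients, which is worth checking against the API documentation.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def batch_titles(titles, max_titles=50, max_chars=2048):
    """Group titles so each batch stays within a title count and a rough
    character budget (titles joined with '|'). Both limits are assumptions."""
    batches, current, length = [], [], 0
    for title in titles:
        extra = len(title) + (1 if current else 0)  # +1 for the '|' separator
        if current and (len(current) >= max_titles or length + extra > max_chars):
            batches.append(current)
            current, length = [], 0
            extra = len(title)
        current.append(title)
        length += extra
    if current:
        batches.append(current)
    return batches


def items_from_payload(titles, payload):
    """Map each requested title to its Wikidata ID, or None if absent.

    Follows the 'normalized' and 'redirects' from/to pairs, then looks up
    pageprops.wikibase_item on the page with the resulting title.
    """
    query = payload.get("query", {})
    normalized = {e["from"]: e["to"] for e in query.get("normalized", [])}
    redirects = {e["from"]: e["to"] for e in query.get("redirects", [])}
    item_by_title = {
        page.get("title"): page.get("pageprops", {}).get("wikibase_item")
        for page in query.get("pages", {}).values()
    }
    result = {}
    for title in titles:
        final = normalized.get(title, title)
        final = redirects.get(final, final)
        result[title] = item_by_title.get(final)
    return result


def fetch_items(titles):
    """Query the API batch by batch; the User-Agent value is a placeholder."""
    items = {}
    for batch in batch_titles(titles):
        params = {
            "action": "query",
            "prop": "pageprops",
            "ppprop": "wikibase_item",
            "redirects": 1,
            "format": "json",
            "titles": "|".join(batch),
        }
        request = urllib.request.Request(
            API + "?" + urllib.parse.urlencode(params),
            headers={"User-Agent": "wikidata-id-sketch/0.1 (placeholder contact)"},
        )
        with urllib.request.urlopen(request) as response:
            items.update(items_from_payload(batch, json.load(response)))
    return items
```

Calling fetch_items(["Peninsular_War", "Napoleon_I_of_France"]) requires network access; given the responses described above, it should map the two titles to Q152499 and Q517 respectively.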