I want to extract the pages mentioned in the infobox and templates of pages.
E.g., from this page: https://en.wikipedia.org/wiki/DNA
I want to extract all of the links in the infobox, like: "Genetics", "Introduction to Genetics" etc.
I want to do this using the SQL dumps, if possible without parsing the XML of whole pages, and I don't want to use the APIs.
I could not find a way.
While the pagelinks table does include the links from infoboxes, I cannot find a way to separate them from the other links. I thought templatelinks might have that information, but it does not: I could not find the page IDs of the corresponding infobox links there.
- Where is this information stored?
- Or which kind of tables should I look at?
I consulted a previous question ("where can I find the infobox templates used in wiki?") and the MediaWiki reference for the templatelinks table (https://www.mediawiki.org/wiki/Manual:Templatelinks_table#Schema_summary), but could not find a solution.
That is a sidebar rather than an infobox: https://en.wikipedia.org/wiki/Template:Genetics_sidebar
I don't think there's a way of doing it other than parsing the content of the template to extract the links or using the API: e.g. https://en.wikipedia.org/w/api.php?action=query&prop=links&titles=Template:Genetics%20sidebar&pllimit=100&plnamespace=0
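If you go the parsing route, a rough sketch of pulling link targets out of a template's wikitext could look like this. The `sample` text is made up for illustration (it is not the real content of Template:Genetics_sidebar), `extract_links` is a hypothetical helper, and skipping every target that contains a colon is a simplification that would also drop article titles with a colon in them.

```python
import re

# Matches [[Target]] and [[Target|label]]; captures only the target part.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

def extract_links(wikitext):
    """Return link targets in order of first appearance, without duplicates."""
    seen = []
    for match in WIKILINK.finditer(wikitext):
        # Drop any section anchor: [[Page#Section]] -> Page
        target = match.group(1).split("#", 1)[0].strip()
        # Simplification: skip namespaced targets such as File: or Category:
        if not target or ":" in target:
            continue
        # Page titles are case-insensitive in their first letter
        target = target[0].upper() + target[1:]
        if target not in seen:
            seen.append(target)
    return seen

# Made-up wikitext, only for illustration
sample = """{{Sidebar
| title = [[Genetics]]
| content1 = [[Introduction to genetics]] {{·}} [[Heredity|Inheritance]]
| image = [[File:DNA icon.svg|50px]]
}}"""

links = extract_links(sample)
print(links)
```

This only sees links written literally in the template's own wikitext; anything produced by nested templates would need those templates expanded first.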
Something like this should also work but it's not returning any results for me:
https://quarry.wmcloud.org/query/71442
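For the dump-based route, the general idea would be to look up the links recorded for the template page itself instead of for the article. Below is a minimal sketch of that lookup on an in-memory SQLite database with made-up rows. It assumes the newer layout in which pagelinks points at the linktarget table through `pl_target_id`; older dumps store `pl_namespace` and `pl_title` directly in pagelinks, so the join would need adjusting. This is not necessarily what the Quarry query above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy versions of the three tables, reduced to the columns used here.
# All rows are invented for illustration.
conn.executescript("""
CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
CREATE TABLE linktarget (lt_id INTEGER, lt_namespace INTEGER, lt_title TEXT);
CREATE TABLE pagelinks (pl_from INTEGER, pl_target_id INTEGER);
INSERT INTO page VALUES (1, 0, 'DNA'), (2, 10, 'Genetics_sidebar');
INSERT INTO linktarget VALUES
    (100, 0, 'Genetics'),
    (101, 0, 'Introduction_to_genetics'),
    (102, 0, 'Nucleotide');
INSERT INTO pagelinks VALUES (2, 100), (2, 101), (1, 100), (1, 101), (1, 102);
""")

# Namespace 10 is Template:, namespace 0 is the article namespace.
QUERY = """
SELECT lt.lt_title
FROM page AS p
JOIN pagelinks AS pl ON pl.pl_from = p.page_id
JOIN linktarget AS lt ON lt.lt_id = pl.pl_target_id
WHERE p.page_namespace = 10
  AND p.page_title = ?
  AND lt.lt_namespace = 0
ORDER BY lt.lt_title
"""

sidebar_links = [row[0] for row in conn.execute(QUERY, ("Genetics_sidebar",))]
print(sidebar_links)
```

In the toy data the article DNA links to everything the sidebar does plus 'Nucleotide', and filtering on the template's own page row is what keeps only the sidebar's links.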