easyPubMed XML output is not XML


I am trying to get the output of the following code into a dataframe:

library(easyPubMed)

pmid_list=['35566889','33538053', '30848212']

pmxml <- fetch_pubmed_data_by_PMID(pmid_list,format='asn.1')

require(XML)

xml_data <- xmlToList(pmxml)

According to the documentation, the output is XML. However, I get the error:

Error: XML content does not seem to be XML

Any ideas on how I can convert the output to a data frame? Thank you!

Answer by Damiano Fantini

Not sure if you are still looking for an answer to this. Anyway, there are a few things I noticed:

  1. The pmid_list vector needs to be defined with the c() function; unlike Python, R has no [...] list literal.
  2. The 'asn.1' format does not return XML, so the XML parser cannot read it, which is what triggers the error.
  3. easyPubMed includes a function, table_articles_byAuth() (see code below), that casts XML output into a data.frame.
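To make point 2 concrete: the message most likely appears because the ASN.1 text does not start with '<', so the XML package cannot recognize it as XML content. A minimal sketch that checks for this before parsing is shown below; parse_pubmed_xml() is a hypothetical helper, not part of easyPubMed or the XML package, and the "starts with '<'" test is a simple heuristic rather than a full validation.

```r
# Hypothetical helper, not part of easyPubMed or XML: parse a PubMed response
# only if it looks like XML, otherwise fail with a hint about the format argument.
library(XML)

parse_pubmed_xml <- function(txt) {
  # Collapse in case the response arrives as several character elements
  txt <- paste(txt, collapse = "\n")
  # XML (including an <?xml ...?> declaration) starts with '<'; ASN.1 text does not
  if (!grepl("^\\s*<", txt)) {
    stop("Response does not look like XML; was it fetched with format = 'xml'?")
  }
  XML::xmlParse(txt, asText = TRUE)
}

# Usage (requires network access):
# pmxml <- fetch_pubmed_data_by_PMID(pmid_list, format = 'xml', encoding = 'UTF-8')
# xd <- parse_pubmed_xml(pmxml)
```

With format = 'asn.1' the helper stops early with a readable message instead of the parser's generic error.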

Assuming you are working with easyPubMed version 2.22 (available on GitHub at dami82/easyPubMed), you may want to have a look at the following code.

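library(easyPubMed)
# In R, build a character vector with c() (not Python-style [...])
pmid_list <- c('35566889','33538053', '30848212')
# Request XML (not ASN.1) so the result can be parsed as XML
pmxml <- fetch_pubmed_data_by_PMID(pmid_list, format = 'xml', encoding = 'UTF-8')
# Cast the records into a data.frame, keeping only the first author of each article
pmdf <- table_articles_byAuth(pubmed_data = pmxml, included_authors = 'first', getKeywords = TRUE)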

pmdf[, 1:2]
#       pmid                               doi
# 1 35566889             10.3390/polym14091720
# 2 33538053            10.1002/cbdv.202000906
# 3 30848212 10.2174/1871520619666190307115231

If you prefer to parse the XML yourself with the XML package, you can do so as follows.

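# Parse the XML string returned by fetch_pubmed_data_by_PMID()
xd <- XML::xmlParse(pmxml, asText = TRUE)
# Collect every <ArticleTitle> node in the document
els <- XML::getNodeSet(xd, path = '//ArticleTitle')
els[[1]]
#<ArticleTitle>The Use of Branching Agents in the Synthesis of PBAT.</ArticleTitle>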
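Going from node sets to a data.frame with the XML package alone takes one more step. The sketch below uses a hypothetical helper, pubmed_xml_to_df() (not part of easyPubMed), that builds one row per <PubmedArticle> and pulls the PMID and title with XPath paths evaluated relative to each article node. It assumes the standard PubMed element layout (PubmedArticle/MedlineCitation/PMID and PubmedArticle/MedlineCitation/Article/ArticleTitle).

```r
# Hypothetical helper, not part of easyPubMed: one row per <PubmedArticle>,
# with PMID and ArticleTitle extracted via XPath using the XML package.
library(XML)

pubmed_xml_to_df <- function(xml_text) {
  doc <- XML::xmlParse(xml_text, asText = TRUE)
  articles <- XML::getNodeSet(doc, "//PubmedArticle")

  # Evaluate a path relative to one article node ("./" keeps it local);
  # return the first match, or NA if the field is missing.
  first_value <- function(node, path) {
    hits <- XML::xpathSApply(node, path, XML::xmlValue)
    if (length(hits) == 0) NA_character_ else hits[[1]]
  }

  data.frame(
    pmid  = vapply(articles, first_value, character(1),
                   path = "./MedlineCitation/PMID", USE.NAMES = FALSE),
    title = vapply(articles, first_value, character(1),
                   path = "./MedlineCitation/Article/ArticleTitle", USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Usage with the XML fetched above:
# pmdf2 <- pubmed_xml_to_df(pmxml)
```

Extracting fields per article, rather than running '//PMID' and '//ArticleTitle' over the whole document, keeps the columns aligned: a missing title becomes NA instead of shifting later rows, and PMIDs nested elsewhere in a record (for example in comment/correction references) are not picked up.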

Please note that additional help with casting PubMed records into data.frame objects can be found in the package vignette (vignette("getting_started_with_easyPubMed")) and in the help page of the table_articles_byAuth() function (?easyPubMed::table_articles_byAuth).