How to get child node of an XML page using Ruby and REXML


I am using Ruby version 1.9.3. Here is a simple version of the actual XML page that I want to get information from. I need to access it from a secure website which requires login credentials. I can't use Nokogiri because I wasn't able to log into the website using it.

<root>
  <person>
    <name>Jack</name>
    <age>10</age>
  </person>
  <person>
    <name>Jones</name>
  </person>
  <person>
    <name>Jon</name>
    <age>16</age>
  </person>
</root>

As you can see, the age tag is sometimes missing. Using REXML with Ruby, I use the following code:

require 'mechanize'
require 'rexml/document'

agent = Mechanize.new
xml = agent.get("https://securewebsite.com/page.xml")
document = REXML::Document.new(xml.body)

# XPath lives in the REXML namespace, so it is qualified here
name = REXML::XPath.match(document, "//person/name").map { |x| x.text }
# => ["Jack", "Jones", "Jon"]

age = REXML::XPath.match(document, "//person/age").map { |x| x.text }
# => ["10", "16"]

The problem is that I can't associate each age with the correct name because the indexes no longer line up. For example, name[1] is Jones but age[1] is 16, which is wrong because the person tag for Jones has no age tag.

Is there any way to get the age array to come out as ["10", nil, "16"] so that I can associate each name with its corresponding age?

Or is there a better way? Let me know if further explanation is required.

Alexis Andersen (best answer)

The problem is that we are looking at age and name as completely separate collections of information. What we need to do is get information from person as a collection.

require 'nokogiri'

xml = "<your xml here />"
doc = Nokogiri::XML(xml)
persons = doc.xpath("//person")
persons_data = persons.map {|person| 
  {
    name: person.xpath("./name").text,
    age: person.xpath("./age").text
  }
}

This gets the person nodes and then reads the related information from each one, giving this result:

puts persons_data.inspect #=> [
                                {:name=>"Jack", :age=>"10"}, 
                                {:name=>"Jones", :age=>""}, 
                                {:name=>"Jon", :age=>"16"}
                              ]

So to get the name and age of the first person, you would call the following (the hash keys are symbols, not strings):

persons_data[0][:name] #=> "Jack"
persons_data[0][:age]  #=> "10"
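Since the question asks about REXML specifically, the same per-person idea can probably be carried over without Nokogiri. The following is only a sketch under a few assumptions: the XML is inlined as a heredoc instead of being fetched through Mechanize, and the variable names (people, names, ages) are illustrative. It returns nil rather than "" when the age tag is missing.

```ruby
require 'rexml/document'

# Sample XML from the question, inlined so the sketch runs on its own.
# In the real script this string would come from xml.body.
xml = <<EOT
<root>
  <person>
    <name>Jack</name>
    <age>10</age>
  </person>
  <person>
    <name>Jones</name>
  </person>
  <person>
    <name>Jon</name>
    <age>16</age>
  </person>
</root>
EOT

document = REXML::Document.new(xml)

# Walk the person elements and read each child relative to its own parent,
# so a missing age stays attached to the right person.
people = REXML::XPath.match(document, '//person').map do |person|
  age = person.elements['age'] # nil when the person has no age tag
  { name: person.elements['name'].text, age: age ? age.text : nil }
end

names = people.map { |p| p[:name] } # => ["Jack", "Jones", "Jon"]
ages  = people.map { |p| p[:age] }  # => ["10", nil, "16"]
```

Because names and ages are both derived from the same people array, names[1] and ages[1] always refer to the same person.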

the Tin Man

I'd do something like this:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<root>
  <person>
    <name>Jack</name>
    <age>10</age>
  </person>
  <person>
    <name>Jones</name>
  </person>
  <person>
    <name>Jon</name>
    <age>16</age>
  </person>
</root>
EOT

people = doc.search('person').each_with_object({}){ |person, h|
  age = person.at('age')
  h[person.at('name').text] = age ? age.text : nil
}

people # => {"Jack"=>"10", "Jones"=>nil, "Jon"=>"16"}

At that point, if I only want the ages, I'd use values:

people.values # => ["10", nil, "16"]

Retrieving a single person's age is trivial then:

people['Jon'] # => "16"
people['Jack'] # => "10"

I get this error when I'm using the .to_h method: ``block in ': undefined method to_h'

My mistake. to_h is not in older Rubies, but it's not needed because of how I'm generating the hash being returned. I adjusted the code above, which will work in any Ruby that implements each_with_object.
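For Rubies that lack Array#to_h (it arrived in Ruby 2.1, so 1.9.3 does not have it), one alternative is to build name/age pairs and pass them to Hash[]. This is a rough sketch, not the answerer's code: it uses REXML rather than Nokogiri so it runs with the standard library alone, and the names pairs and people are illustrative.

```ruby
require 'rexml/document'

# Compact copy of the question's sample XML
xml = '<root>' \
      '<person><name>Jack</name><age>10</age></person>' \
      '<person><name>Jones</name></person>' \
      '<person><name>Jon</name><age>16</age></person>' \
      '</root>'

document = REXML::Document.new(xml)

# Build [name, age] pairs; age is nil when the tag is absent
pairs = REXML::XPath.match(document, '//person').map do |person|
  age = person.elements['age']
  [person.elements['name'].text, age ? age.text : nil]
end

# Hash[] accepts an array of two-element arrays, so no to_h call is needed
people = Hash[pairs]
people # => {"Jack"=>"10", "Jones"=>nil, "Jon"=>"16"}
```

As with any hash keyed by name, two people sharing a name would collapse into a single entry, so this form only fits data where names are unique.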