How to get data from the TreeView list


http://www.vliz.be/vmdcdata/mangroves/aphia.php?p=browser&id=235056&expand=true#ct (That's the information I am trying to scrape)

I want to scrape this detailed taxonomic tree so that I can manipulate it any way I like.

But there are a few problems in getting this tree data.

  1. I can't fully expand the taxonomic tree. When some nodes expand, others collapse, as the instructions on the page indicate, so saving the full page as an HTML file cannot solve my problem. I could repeat the process several times to get separate files and concatenate them, but that seems like an ugly way to do it.

  2. I am tired of clicking. There are so many "plus" signs, and I have to wait.

Is there a way to solve this using Python?

root On BEST ANSWER

Use Selenium. The script below expands the tree by clicking the "plus" signs, then reads the entire DOM, with all of its elements, once it is done:

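import time

from selenium import webdriver
from selenium.webdriver.common.by import By

# XPath matching the "plus" icons of collapsed tree nodes
PLUS_ICONS = ('.//*[@src="http://www.marinespecies.org/images/aphia/pnode.gif" '
              'or @src="http://www.marinespecies.org/images/aphia/plastnode.gif"]')

browser = webdriver.Chrome()
browser.get('http://www.vliz.be/vmdcdata/mangroves/aphia.php?p=browser&id=235301&expand=true#ct')

while True:
    try:
        # Selenium 4 removed find_elements_by_xpath; find_elements(By.XPATH, ...) replaces it.
        # Click the second matching icon; an IndexError means no such icon is left.
        elem = browser.find_elements(By.XPATH, PLUS_ICONS)[1]
        elem.click()
        time.sleep(2)  # give the newly expanded child nodes time to load
    except Exception:
        # Any failure (no icon left, element gone, ...) ends the expansion loop
        break

# Full DOM of the page after expansion
content = browser.page_source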
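
Once content holds the expanded page, the taxon names still have to be pulled out of the HTML. Below is a minimal sketch using only the standard library. It assumes that each taxon is an <a> link whose href contains "p=taxdetails"; that pattern is a guess and should be checked against the real page source. TaxonLinkParser, extract_taxa and the sample fragment are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class TaxonLinkParser(HTMLParser):
    """Collect (name, href) pairs for links that look like taxon entries."""

    def __init__(self, href_marker="p=taxdetails"):
        super().__init__()
        self.href_marker = href_marker  # ASSUMED URL pattern, verify on the real page
        self.taxa = []                  # collected (name, href) tuples
        self._href = None               # href of the link currently being read
        self._text = []                 # text fragments found inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if self.href_marker in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Join the fragments and collapse runs of whitespace
            name = " ".join("".join(self._text).split())
            if name:
                self.taxa.append((name, self._href))
            self._href = None


def extract_taxa(html):
    """Return the (name, href) pairs found in an HTML string."""
    parser = TaxonLinkParser()
    parser.feed(html)
    parser.close()
    return parser.taxa


# Hypothetical fragment standing in for browser.page_source
sample = (
    '<div><a href="aphia.php?p=taxdetails&amp;id=1"><i>Rhizophora</i></a>'
    '<a href="aphia.php?p=browser&amp;id=1">expand</a>'
    '<a href="aphia.php?p=taxdetails&amp;id=2">Rhizophora  mangle</a></div>'
)
print(extract_taxa(sample))
```

With the real page, extract_taxa(content) would be called instead of using the sample. This only yields a flat list in document order; recovering the parent/child relations would need the nesting of the surrounding elements, which depends on how the page is actually built.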