Converting images included in table data to their alt-text using IMPORTHTML in Google Sheets


I'm attempting to scrape data from spoilers for the new Pokemon Legends: Arceus game, and have run into a complication concerning images contained within the tables. The tables of interest are available here: https://rankedboost.com/pokemon-legends-arceus/pokedex/#National%20Pokedex

First, I attempted to scrape the data with IMPORTXML, referencing the webpage URL stored in cell A1:

=IMPORTXML(A1,"//table")

This captured the data from both tables on the page, but the result was unusable because the data was not organized by column. Next, I used IMPORTHTML:

=IMPORTHTML(A1,"TABLE",1) 

and

=IMPORTHTML(A1,"TABLE",2) 

This produced nearly the result I was hoping for, but the "Type" column was blank. Inspecting the source of the "Type" column on the webpage, I found that the table uses pictures to indicate the Pokemon type, and that each picture has alt-text describing the type in words. The following code is the entry for the first row of the "Type" column on the webpage:

<td class="table-td-data-rb location-poke-css " headers="TH4"> <img width="22" height="22" alt="WATER-Type" class="tier-list-table-types-img" src="https://img.rankedboost.com/wp-content/plugins/pokemon-legends-arceus/assets/icons/WATER.png"> <img width="22" height="22" alt="GHOST-Type" class="tier-list-table-types-img" src="https://img.rankedboost.com/wp-content/plugins/pokemon-legends-arceus/assets/icons/GHOST.png"></td>

The words I'm trying to capture appear in each image's alt attribute (for example, alt="WATER-Type") and again as the file name at the end of each src URL (for example, WATER.png).
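To make clear what I am after, here is a minimal sketch written in Python rather than Sheets, so it is not a solution to the IMPORTHTML problem itself. It parses the cell shown above using only the standard library and pulls the type names from both places. The class name `TypeImgParser`, the helper `strip_type_suffix`, and the "/" separator for the combined value are my own choices for illustration, not anything the site or Sheets provides.

```python
from html.parser import HTMLParser
from posixpath import basename, splitext
from urllib.parse import urlparse

# The "Type" cell copied from the page source above.
ICON_BASE = ("https://img.rankedboost.com/wp-content/plugins/"
             "pokemon-legends-arceus/assets/icons/")
CELL_HTML = (
    '<td class="table-td-data-rb location-poke-css " headers="TH4"> '
    '<img width="22" height="22" alt="WATER-Type" '
    'class="tier-list-table-types-img" src="' + ICON_BASE + 'WATER.png"> '
    '<img width="22" height="22" alt="GHOST-Type" '
    'class="tier-list-table-types-img" src="' + ICON_BASE + 'GHOST.png"></td>'
)


class TypeImgParser(HTMLParser):
    """Collects the alt text and the src file name of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.alts = []       # e.g. "WATER-Type"
        self.src_names = []  # e.g. "WATER" (file name without ".png")

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        if attr.get("alt"):
            self.alts.append(attr["alt"])
        if attr.get("src"):
            path = urlparse(attr["src"]).path
            self.src_names.append(splitext(basename(path))[0])


def strip_type_suffix(alt):
    """Turn "WATER-Type" into "WATER"; leave other strings unchanged."""
    suffix = "-Type"
    return alt[:-len(suffix)] if alt.endswith(suffix) else alt


parser = TypeImgParser()
parser.feed(CELL_HTML)

types_from_alt = [strip_type_suffix(a) for a in parser.alts]
combined = "/".join(types_from_alt)  # one cell holding both types

print(types_from_alt)     # type names taken from the alt attributes
print(parser.src_names)   # type names taken from the src file names
print(combined)
```

Either source gives the same names for this row, so the combined "WATER/GHOST" style value and separate "type 1" / "type 2" columns would both be derivable from it.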

How can I select the alt text for these types? And must I split Pokemon with multiple types into "type 1" and "type 2" columns, or can the types be imported in the combined form in which they are displayed on the webpage?

Thanks for your guidance!
