Trying to pull information from a webpage


I am trying to pull data from a website. In my example, I am running a search on Armorgames.com for the term "idle". From the results, I would like to pull the name of each game and save it to a CSV file for later use. My code:

$SearchResult = Invoke-WebRequest 'http://armorgames.com/search?type=games&q=idle' 
($SearchResult.ParsedHtml.getElementsByTagName('H5') | Where { $_.pathname -like '/play*'})

Unfortunately, that doesn't output any results. I can see the property names using:

$SearchResult.ParsedHtml.getElementsByTagName('H5')

Using the tag 'a', I can find games with a pathname containing 'play', but I am having trouble filtering the results and then writing them to a file.



Answer by Nas:
# filter the anchor elements on their pathname
$SearchResult.ParsedHtml.getElementsByTagName('a') | Where-Object -Property pathname -Like 'play/*'

# select property pathname
$SearchResult.ParsedHtml.getElementsByTagName('a') | 
    Where-Object -Property pathname -Like 'play/*' |
        Select-Object -Property pathname

# select property title
$SearchResult.ParsedHtml.getElementsByTagName('a') | 
    Where-Object -Property pathname -Like 'play/*' |
        Select-Object -Property title -Unique
Answer by ScriptAutomate:

Here is web-scraping code that is compatible with PowerShell Core (v6.0) and should also work in Windows PowerShell. It relies on a regex and the -match operator, because the ParsedHtml property isn't available in PowerShell Core:

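$SearchResult = Invoke-WebRequest 'http://armorgames.com/search?type=games&q=idle'
# Split the raw HTML on '<' so each chunk starts with a tag name, keep the chunks that
# open an <a> tag mentioning 'play' with a title attribute and link text that starts
# with a letter (-match is case-insensitive), then strip everything up to the last '>'
$GameNames = ($SearchResult.Content.split('<') |
    Where-Object {$_ -match '^a href.*play.*\ title=.*>[A-Z].*'}) -replace '.*>'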
$GameNames

The output looks like this:

Artist Idle
Hero Simulator: Idle Adventures
Idle Farmer
Idle Online Universe
Idle Sword
Idle Web Tycoon
Legendary Journey Idle
NGU IDLE
Religious Idle
Zombidle
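To see what each stage of that pipeline does, here is an offline sketch that runs the same split, -match and -replace steps on a small hand-written HTML fragment, so it works without a network connection in either PowerShell Core or Windows PowerShell. The fragment, its /play/101/idle-farmer style URLs and the variable names $Html, $Chunks and $GameLinks are hypothetical; the real Armor Games markup may differ, so treat this as an illustration of the technique rather than a copy of the site's HTML.

```powershell
# Hypothetical HTML fragment standing in for $SearchResult.Content;
# the real Armor Games markup may differ.
$Html = @'
<div class="game"><h5><a href="/play/101/idle-farmer" title="Idle Farmer">Idle Farmer</a></h5></div>
<div class="game"><h5><a href="/play/102/zombidle" title="Zombidle">Zombidle</a></h5></div>
<a href="/search?type=games&q=idle" title="Next page">Next</a>
'@

# Step 1: split on '<' so every chunk begins with a tag name, for example
# 'a href="/play/101/idle-farmer" title="Idle Farmer">Idle Farmer'
$Chunks = $Html.Split('<')

# Step 2: keep only opening <a> tags that mention 'play', carry a title
# attribute and are followed by link text starting with a letter
$GameLinks = $Chunks | Where-Object { $_ -match '^a href.*play.*\ title=.*>[A-Z].*' }

# Step 3: the greedy '.*>' removes everything up to the last '>',
# leaving only the link text (the game name)
$GameNames = $GameLinks -replace '.*>'
$GameNames
```

The "Next page" link is dropped in step 2 because neither its href nor its title contains "play". The filter is only as precise as that regex, so any other link on the real page whose tag mentions "play" would slip through as well.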

Now that you have an array of the names you wanted, you should be able to create a CSV with whatever additional information you need.
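One detail to watch when writing that array out: Export-Csv writes the properties of each input object, and a plain string's only property is Length, so piping $GameNames straight into Export-Csv produces a column of numbers rather than names. A minimal sketch, assuming you want a single Name column in a file called games.csv in the current directory (both the column name and the path are hypothetical choices), wraps each name in an object first:

```powershell
# A few of the names from the output above, standing in for the full $GameNames array
$GameNames = @('Artist Idle', 'Idle Farmer', 'Zombidle')

# Wrap each string in an object with a Name property; exporting the raw strings
# would write their Length property instead of the text
$GameNames |
    ForEach-Object { [pscustomobject]@{ Name = $_ } } |
    Export-Csv -Path .\games.csv -NoTypeInformation

# Read the file back to check the result
Import-Csv -Path .\games.csv
```

To record more information per game, add further keys to the hashtable (for example a hypothetical Url key) and Export-Csv will turn each one into an extra column. In PowerShell 6 and later -NoTypeInformation is already the default, but the parameter is still accepted, so the same line works in Windows PowerShell too.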