Trying to pull information from webpage

123 Views Asked by James Hardy At 12 October 2018 at 19:48

I am trying to pull data from a website. In my example case, I am running a search on Armorgames.com for the search term "idle". From the results I would like to pull the name of each game and put it into a CSV file for later use. My code:

$SearchResult = Invoke-WebRequest 'http://armorgames.com/search?type=games&q=idle' 
($SearchResult.ParsedHtml.getElementsByTagName('H5') | Where { $_.pathname -like '/play*'})

Unfortunately, that won't output any results. I can see the property names using:

$SearchResult.ParsedHtml.getElementsByTagName('H5')

Using the tag 'a', I can find games with a pathname containing 'play'. But I am having trouble filtering the results and then outputting them to a file.

There are 2 best solutions below

Nas On 12 October 2018 at 20:16

# filter the anchors whose pathname starts with play/
$SearchResult.ParsedHtml.getElementsByTagName('a') | Where-Object -Property pathname -Like 'play/*'

# select property pathname
$SearchResult.ParsedHtml.getElementsByTagName('a') | 
    Where-Object -Property pathname -Like 'play/*' |
        Select-Object -Property pathname

# select property title
$SearchResult.ParsedHtml.getElementsByTagName('a') | 
    Where-Object -Property pathname -Like 'play/*' |
        Select-Object -Property title -Unique
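
To cover the "output to a file" part of the question, the same filter-and-select pipeline can be piped into Export-Csv. The following is a minimal sketch rather than a tested solution against the live site: the $Anchors objects are invented stand-ins for what getElementsByTagName('a') would return, and the $CsvPath location is an assumption.

```powershell
# Stand-in objects for the anchors that getElementsByTagName('a') would return.
# The pathname and title values below are invented for illustration.
$Anchors = @(
    [pscustomobject]@{ pathname = 'play/1/idle-sword'; title = 'Idle Sword' }
    [pscustomobject]@{ pathname = 'play/1/idle-sword'; title = 'Idle Sword' }
    [pscustomobject]@{ pathname = 'category/idle';     title = 'Idle Games' }
    [pscustomobject]@{ pathname = 'play/2/zombidle';   title = 'Zombidle' }
)

# Hypothetical output location; change it to wherever the file should go.
$CsvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'games.csv'

# Keep only the game links, drop duplicate titles, and write them to a CSV file.
$Anchors |
    Where-Object -Property pathname -Like 'play/*' |
    Select-Object -Property title -Unique |
    Export-Csv -Path $CsvPath -NoTypeInformation

# Read the file back to check what was written.
Import-Csv -Path $CsvPath
```

With the real page, $Anchors would be replaced by $SearchResult.ParsedHtml.getElementsByTagName('a').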

ScriptAutomate On 15 October 2018 at 22:33

The following web-scraping code is compatible with PowerShell Core (v6.0) and should work with Windows PowerShell too. It relies on a regex with the -match operator, because the ParsedHtml property isn't available on Core:

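# Download the search results page
$SearchResult = Invoke-WebRequest 'http://armorgames.com/search?type=games&q=idle'
# Split the raw HTML at each '<', keep the game links, then strip everything up to the last '>'
$GameNames = ($SearchResult.Content.Split('<') |
    Where-Object { $_ -match '^a href.*play.*\ title=.*>[A-Z].*' }) -replace '.*>'
$GameNames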

The output looks like this:

Artist Idle
Hero Simulator: Idle Adventures
Idle Farmer
Idle Online Universe
Idle Sword
Idle Web Tycoon
Legendary Journey Idle
NGU IDLE
Religious Idle
Zombidle
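
To see how the split, match, and replace steps fit together without hitting the network, here is a small sketch that runs the same expression on a hand-written HTML fragment. The fragment in $SampleHtml is an assumption made up for illustration and is not the real Armor Games markup.

```powershell
# Invented sample markup: two game links and one non-game link.
$SampleHtml = '<ul>' +
    '<li><a href="/play/1/idle-sword" title="Idle Sword">Idle Sword</a></li>' +
    '<li><a href="/category/idle" title="Idle Games">Idle Games</a></li>' +
    '<li><a href="/play/2/zombidle" title="Zombidle">Zombidle</a></li>' +
    '</ul>'

# Step 1: splitting at '<' leaves one fragment per tag, for example:
#   a href="/play/1/idle-sword" title="Idle Sword">Idle Sword
$Fragments = $SampleHtml.Split('<')

# Step 2: keep only the anchor fragments whose href contains "play"
# and that have a title attribute followed by visible text.
$GameLinks = $Fragments |
    Where-Object { $_ -match '^a href.*play.*\ title=.*>[A-Z].*' }

# Step 3: remove everything up to the last '>' so only the link text remains.
$GameNames = $GameLinks -replace '.*>'
$GameNames
```

Note that -match is case-insensitive by default, so [A-Z] also accepts a lowercase first letter. A pattern like this depends on the page's exact markup and may need adjusting if the site changes.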

Now that you have an array of the names you wanted, you should be able to create a CSV with whatever additional information you need.
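
One way to write the names out is sketched below. Piping plain strings straight into Export-Csv exports the strings' own property (Length) rather than their text, so each name is first wrapped in an object with a named property. The column name Name and the $CsvPath location are assumptions, and $GameNames is filled with a few of the names from the output above so the sketch runs on its own.

```powershell
# A few names taken from the output above, standing in for the scraped array.
$GameNames = @('Idle Sword', 'NGU IDLE', 'Zombidle')

# Hypothetical output location; change it to wherever the file should go.
$CsvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'idle-games.csv'

# Wrap each string in an object so the CSV gets a "Name" column.
$GameNames |
    ForEach-Object { [pscustomobject]@{ Name = $_ } } |
    Export-Csv -Path $CsvPath -NoTypeInformation

# Show the resulting file.
Get-Content -Path $CsvPath
```

Additional columns can be added by putting more keys into the hashtable inside ForEach-Object.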
