Can someone please help me get all the href values from https://www.cnoocltd.com/col/col32091/index.html?
I load this URL with Goose and pass the HTML content to BeautifulSoup. Checking the HTML, I found that all the <a href> tags are inside 'datastore', which is a custom tag. How can I extract the hrefs with BeautifulSoup?
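from goose3 import Goose  # assumption: the goose3 package provides Goose here
from bs4 import BeautifulSoup

g = Goose()
article = g.extract(url='https://www.cnoocltd.com/col/col32091/index.html')
soup = BeautifulSoup(article.raw_html, "lxml")
a = soup.findAll("div", {"class": "Introduction"})
for l in a:
    #print(l.findAll('option'))
    if l.find('div'):
        b = l.find('div').find('script')
        custom_values = []
        # This prints []: the <script> body is plain text to BeautifulSoup,
        # so findAll() on it never sees any tags.
        b.findAll(lambda tag: [custom_values.append(a[1]) for a in tag.attrs if a[0].startswith('a href')])
        print(custom_values)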
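One way to see why this attempt prints an empty list: when BeautifulSoup parses a page, everything inside a <script> element is kept as a single raw-text string, not as tags. The sketch below uses a small hypothetical snippet modelled on the description above (the real page markup is not reproduced here) and "html.parser" instead of "lxml"; lxml treats <script> contents the same way.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question: the links sit in a custom
# <datastore> tag that is itself inside a <script> element.
html = """
<div class="Introduction">
  <div>
    <script type="text/xml">
      <datastore><a href="/art/1.html">First</a></datastore>
    </script>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
script = soup.find("div", class_="Introduction").find("div").find("script")

# Everything inside <script> is raw text, so the element has no child tags.
child_tags = script.find_all(True)
print(child_tags)  # []

# The markup is still there, just as one string.
print("<datastore>" in script.string)  # True
```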
The links are encoded inside another <script> tag. First we locate the <script> tag that holds the links, and then load the content of that tag as a second BeautifulSoup object.
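A minimal sketch of that approach. It assumes the links appear as ordinary <a href> markup in the text of a <script> that contains a <datastore> block. The HTML string and the helper name extract_datastore_links are hypothetical; on the real page you would pass article.raw_html instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for article.raw_html.
html = """
<div class="Introduction">
  <div>
    <script type="text/xml">
      <datastore>
        <recordset>
          <record><li><a href="/art/1.html">News one</a></li></record>
          <record><li><a href="/art/2.html">News two</a></li></record>
        </recordset>
      </datastore>
    </script>
  </div>
</div>
"""


def extract_datastore_links(page_html):
    """Return the href values of <a> tags found inside <datastore> scripts."""
    soup = BeautifulSoup(page_html, "html.parser")
    links = []
    for script in soup.find_all("script"):
        text = script.string
        # Only look at scripts whose raw text carries a <datastore> block.
        if not text or "<datastore>" not in text:
            continue
        # Parse the script's text as a second, independent document.
        inner = BeautifulSoup(text, "html.parser")
        links.extend(a["href"] for a in inner.find_all("a", href=True))
    return links


print(extract_datastore_links(html))
```

If the live page wraps each record in a CDATA section, the inner parse will likely keep that markup as a text node rather than as tags. In that case those strings need one more parse before find_all("a") can see the links.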
EDIT: For getting the <option> values:
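A similar sketch for the <option> values. It assumes they are ordinary <option value="..."> elements inside the div with class "Introduction" that the question already selects. The markup, the example.com URLs and the helper name extract_option_values are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a <select> of links inside the "Introduction" div.
html = """
<div class="Introduction">
  <select>
    <option value="">-- choose --</option>
    <option value="https://example.com/a">Site A</option>
    <option value="https://example.com/b">Site B</option>
  </select>
</div>
"""


def extract_option_values(page_html):
    """Return (label, value) pairs for every <option> with a non-empty value."""
    soup = BeautifulSoup(page_html, "html.parser")
    pairs = []
    for div in soup.find_all("div", class_="Introduction"):
        for option in div.find_all("option"):
            value = option.get("value")
            if value:  # skip placeholders such as value=""
                pairs.append((option.get_text(strip=True), value))
    return pairs


print(extract_option_values(html))
# [('Site A', 'https://example.com/a'), ('Site B', 'https://example.com/b')]
```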