How Can I Extract a Value from HTML?

Question

Asked by Bronson888 on 24 November 2023 at 12:35

I am building automation to handle the email alerts we receive. The final step is to extract the username involved in the alert. From my research, it looked like this should be fairly easy to pull from the original email. Below is a snippet of the HTML I am trying to extract the value from.

HTML Snippet:

<tr>
    <td style="border:solid #DBDCDC 1.0pt;padding:3.75pt 3.75pt 3.75pt 3.75pt">
        <p class="MsoNormal" align="right" style="text-align:right">
        <span style="font-size:9.0pt;color:black">source_username
        </span>
        <span style="font-size:9.0pt">
            <o:p></o:p>
        </span>
        </p>
    </td>
    <td width="100%" style="width:100.0%;border:solid #DBDCDC 1.0pt;border-left:none;background:#FAFAFA;padding:3.75pt 3.75pt 3.75pt 3.75pt;max-width:100%">
        <p class="MsoNormal">
        <span style="font-size:9.0pt;color:black">ServicePrincipal_64e90aaf-abe7-4fa8-b0f7-a56db5a780bc
        </span>
        <span style="font-size:9.0pt">
            <o:p></o:p>
        </span>
        </p>
    </td>
</tr>

I have made two attempts, one with Parsel and one with BeautifulSoup, and neither worked.

Parsel attempt:

from parsel import Selector

# html holds the email body containing the snippet above
sel = Selector(text=html)

# Find the td tag that contains the 'source_username' string
source_tag = sel.xpath('//td[contains(.//text(), "source_username")]')[0]
print(source_tag)
# Extract the value from the tag
source_username = source_tag.xpath('./following-sibling::td[1]//text()').get().strip()

print(source_username)

BeautifulSoup attempt:

from bs4 import BeautifulSoup

# html holds the email body containing the snippet above
soup = BeautifulSoup(html, 'html.parser')

# Find the tag that contains the source_username
source_tag = soup.find('td', string='source_username')
print(source_tag)

# Extract the value from the tag
source_username = source_tag.find_next_sibling('td').text.strip()

print(source_username)

Original Q&A

There are 2 best solutions below

**MrXQ** · Answer 1 · 2023-11-24T12:40:51.617000

If you have access to the HTML, you can add a data attribute holding the desired value, something like this:


<span data-username="source_username" style="font-size:9.0pt;color:black">source_username</span>

and then read the value with JavaScript like this:

// Get the span element
const spanElement = document.querySelector('span[data-username]');

// Get the data-username attribute value
const username = spanElement.getAttribute('data-username');
console.log(username);

If you don't have access to the HTML, try something like this:

// Get the span element
const spanElement = document.querySelector('span');

// Get the text content of the span
const username = spanElement.textContent || spanElement.innerText;

// Log the username to the console
console.log(username);

**DRA** · Answer 2 · 2023-12-19T20:31:57.410000

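You can do something like this, passing a function to find() so the match is made on the cell's full text:


def contains_text(tag):
    # Match <td> elements whose combined text includes the label
    return tag.name == 'td' and 'source_username' in tag.get_text(strip=True)

# Locate the label cell, then read the next <td> in the same row
source_tag = soup.find(contains_text)
source_username = source_tag.find_next_sibling('td').text.strip()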
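
For completeness, here is a self-contained sketch of the same idea that can be run as-is. It likely explains why the original `soup.find('td', string='source_username')` returned `None`: the `string` filter is compared against a tag's `.string`, which is `None` for a `<td>` that has several child nodes. The surrounding `<table>` tag, the trimmed `html` sample, and the helper name `is_label_cell` are assumptions made for illustration, not part of the original email.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row from the question; the <table> wrapper is an assumption.
html = """
<table>
<tr>
    <td>
        <p class="MsoNormal" align="right">
        <span style="font-size:9.0pt;color:black">source_username
        </span>
        <span style="font-size:9.0pt"><o:p></o:p></span>
        </p>
    </td>
    <td width="100%">
        <p class="MsoNormal">
        <span style="font-size:9.0pt;color:black">ServicePrincipal_64e90aaf-abe7-4fa8-b0f7-a56db5a780bc
        </span>
        <span style="font-size:9.0pt"><o:p></o:p></span>
        </p>
    </td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")


def is_label_cell(tag):
    """Return True for a <td> whose stripped text is exactly the label."""
    return tag.name == "td" and tag.get_text(strip=True) == "source_username"


# Guard each step so a missing row yields None instead of an AttributeError.
label_cell = soup.find(is_label_cell)
value_cell = label_cell.find_next_sibling("td") if label_cell else None
source_username = value_cell.get_text(strip=True) if value_cell else None

print(source_username)
```

Comparing the stripped text for equality, instead of using `in`, keeps the match from landing on an outer cell if the alert email nests tables inside tables.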
