XPath get text between two "p" tags


I can't get the text between the <p> tags, as in <p> text </p>.

Right now I have:

//span[@class='_39I1Z _2ITlL _3D3LC _3NGVr embTL _1zNVc']/span/p

It finds the span and lists the "p" tags, but what do I do next to get their text?

structure:

<div class="" xpath="1">
<span class="_39I1Z _2ITlL _3D3LC _3NGVr embTL _1zNVc">
    <span>
        <p>Text</p>
        <p>Text</p>
        <p>Text</p>
        <p><b>Text:</b></p>
        <p></p>
            <ul>
                <li>Text</li>
                <li>Text</li>
                <li>Text</li>
            </ul>
    </span>
</span>
</div>

I have tried various combinations, but I'm still confused.


There is 1 best solution below

Answered by North Legion

I solved this problem.

  1. The spider was looking at the wrong page.

  2. On the correct page, I extract the description with:
     description = cleanhtml(response.xpath("//span[@class='_39I1Z _1KwXc _3kzFG _2zux5 lU8Yn _1o3OU']/span").getall())

  3. Because the description comes back dirty (with HTML tags), I cleaned it:

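import re

def cleanhtml(raw_html):
    # getall() returns a list of strings; join it so str() does not add list brackets
    if isinstance(raw_html, list):
        raw_html = ' '.join(raw_html)
    return re.sub(r'<.*?>', '', str(raw_html), flags=re.DOTALL)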
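As an alternative to stripping tags with a regex, the text nodes can be selected directly. The sketch below is only an illustration, not the answerer's method: it uses the standard library's xml.etree.ElementTree (which supports a limited XPath subset) instead of Scrapy, and it runs on the sample markup from the question with a closing </div> added so that it parses as XML. The names SAMPLE and extract_description are hypothetical. In Scrapy, the same idea would most likely be to append //text() to the XPath expression and call .getall() on the result.

```python
import xml.etree.ElementTree as ET

# Sample markup adapted from the question; the closing </div> is an
# assumption added here so the snippet is well-formed XML.
SAMPLE = """
<div class="">
<span class="_39I1Z _2ITlL _3D3LC _3NGVr embTL _1zNVc">
    <span>
        <p>Text</p>
        <p>Text</p>
        <p>Text</p>
        <p><b>Text:</b></p>
        <p></p>
        <ul>
            <li>Text</li>
            <li>Text</li>
            <li>Text</li>
        </ul>
    </span>
</span>
</div>
"""


def extract_description(markup):
    """Return the non-empty text fragments inside the inner <span> (hypothetical helper)."""
    root = ET.fromstring(markup.strip())
    # Same path as in the question, minus the trailing /p, so the <li> items are included too.
    inner = root.find(".//span[@class='_39I1Z _2ITlL _3D3LC _3NGVr embTL _1zNVc']/span")
    if inner is None:
        return []
    # itertext() walks every descendant text node in document order.
    return [text.strip() for text in inner.itertext() if text.strip()]


description = extract_description(SAMPLE)
print(description)
```

Because the text nodes are read directly, nested tags such as <b> need no extra cleanup, and whitespace-only fragments between tags are dropped by the strip() filter.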