I'm trying to extract text containing "<" (the less-than character). On my localhost everything works fine, but on the server the text from "<" onward gets truncated. For example, the source contains:
1) hipoksemia tętnicza (PaO<sub>2</sub>/FiO<sub>2</sub> < 300 )
so I receive:
1) hipoksemia t\u0119tnicza (PaO<sub>2</sub>/FiO<sub>2</sub>
There is no problem with scraping the ">" character. Thank you for your help.
A bare < is invalid HTML. It should be written as &lt;. Scrapy uses Parsel to parse XML/HTML responses, and Parsel in turn uses lxml to parse XML/HTML documents. lxml does not handle broken HTML as well as web browsers and some other parsers do.
There is an open issue for Parsel to handle these scenarios. It will probably require supporting an alternative to lxml in Parsel, which is not trivial to implement, so it may take a while before that issue is solved.
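Until that issue is resolved, one possible workaround is to escape stray "<" characters yourself before the markup reaches the parser. The snippet below is only a heuristic sketch, not something Scrapy or Parsel provides: the helper name `escape_bare_lt` and the regular expression are assumptions made for illustration. It treats any "<" that is not followed by a letter, "/", "!" or "?" as text and rewrites it as &lt;.

```python
import re

# Heuristic (an assumption, not part of Scrapy/Parsel): a "<" that is NOT
# followed by a letter, "/", "!" or "?" cannot start a tag, closing tag,
# comment, doctype or processing instruction, so treat it as literal text.
_BARE_LT = re.compile(r"<(?![A-Za-z/!?])")


def escape_bare_lt(html: str) -> str:
    """Replace stray '<' characters with the '&lt;' entity."""
    return _BARE_LT.sub("&lt;", html)


raw = "1) hipoksemia tętnicza (PaO<sub>2</sub>/FiO<sub>2</sub> < 300 )"
fixed = escape_bare_lt(raw)
print(fixed)
```

In a spider, the cleaned string could then be parsed with something like `Selector(text=escape_bare_lt(response.text))` instead of calling `response.css()` or `response.xpath()` directly. Note the limits of the heuristic: a stray "<" immediately followed by a letter (for example "a<b") is still left alone, and "<" inside script or style blocks would also be rewritten, so check the result against your actual pages.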