Opening a downloaded mht file from Selenium (Help needed)


Long story short, I'm not a coder. My team used to have a coder who wrote this Python/Selenium code to extract information (echocardiography reports) from the Chrome browser and/or from a downloaded mht file (also echocardiography reports).

This code worked fine until recently, when it stopped working. The program still downloads the mht file via Chrome successfully. However, it fails to open the file, so the code continues without extracting any information, resulting in empty extractions.


This is the part I need help figuring out:

                driver.get('chrome://downloads')
                # driver.get('file:///C:/Users/name/Downloads/')

                root1 = driver.find_element_by_tag_name('downloads-manager')
                shadow_root1 = expand_shadow_element(root1)

                time.sleep(2)

                root2 = shadow_root1.find_element_by_css_selector('downloads-item')
                shadow_root2 = expand_shadow_element(root2)

                time.sleep(1.5)

                openEchoFileButton = shadow_root2.find_element_by_id('file-link')
                mhtFileName = openEchoFileButton.text

                driver.get('file:///C:/Users/name/Downloads/' + mhtFileName)  # go to web page
                try:
                    echoDateElement = WebDriverWait(driver, delay).until(
                        EC.presence_of_element_located((By.XPATH, '/html/body/div[3]/p[1]/span[3]')))
                except TimeoutException:
                    print("Loading page took too much time!")

I'm trying to figure out why it suddenly fails to open the downloaded mht files. The last time our team used this code was back in 2020, and it worked then. Were there any updates to Chrome, perhaps?

Help would be immensely appreciated. Thank you so much in advance.

Ross Patterson On

There are three obvious weaknesses in this code. The first two are the two time.sleep() calls, which wait a fixed amount of time for an element to appear and become usable. What if the machine is busy doing something else, and 1.5 seconds isn't enough? The right way to do that is to check repeatedly until the element is ready. You've already got a great example of how to do that with WebDriverWait() in this code.

The third weakness is the locator used in that presence_of_element_located() call. XPath locators rooted at "/html" are notoriously fragile and can be broken by small changes to the web page. Try to find something in the page that you can check via a more stable locator, ideally an element with an id attribute.
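To illustrate the "repeatedly check" idea without needing a browser, here is a minimal sketch of a polling helper in plain Python. It is not Selenium's implementation; WebDriverWait already does this job in the real script. The names `wait_until`, `WaitTimeout`, and `FlakyLookup` are made up for this example, and `FlakyLookup` just stands in for an element lookup that fails until the page is ready.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5, ignored=()):
    """Call condition() repeatedly until it returns something truthy.

    Exceptions listed in `ignored` are treated as "not ready yet".
    Returns the truthy result, or raises WaitTimeout after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # not ready yet, try again
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        # Short pause between checks, not one long fixed sleep.
        time.sleep(poll_interval)


class FlakyLookup:
    """Stand-in for an element lookup that only succeeds on a later attempt."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def __call__(self):
        self.calls += 1
        if self.calls < self.ready_after:
            raise LookupError("element not in the page yet")
        return "file-link"


if __name__ == "__main__":
    lookup = FlakyLookup(ready_after=3)
    found = wait_until(lookup, timeout=5, poll_interval=0.1, ignored=(LookupError,))
    print(found, "found after", lookup.calls, "attempts")
```

On a fast machine this returns as soon as the lookup succeeds, and on a slow one it keeps trying up to the timeout, which is the advantage over a fixed sleep. In the actual script the equivalent would be something like `WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.ID, 'some-id')))`, where `some-id` is a placeholder for whatever stable ID the report page really contains.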