I am trying to scrape a web page and retrieve the URL of an embedded video using the Beautiful Soup and requests modules in Python 3.6. When I inspect the page's HTML in Chrome, I can see the .mp4 link of the video. But when I fetch the page with requests and parse it with Beautiful Soup, I can't find the "video" node. As far as I understand, the video window is a nested HTML document. In particular, I want to scrape this page, http://videolectures.net/icml2015_liang_language_understanding/, and get the video link, http://hydro.ijs.si/v012/6f/n5vruqvdwpj36mdoxxwyxvyg5hje7a4c.mp4. Any help in the right direction will be greatly appreciated. Thank you!
Python 3.6 Beautiful Soup - Trouble to Get Embedded Video URL during Web Scraping
964 Views Asked by user66629 At
There is 1 best solution below
There is an obfuscated configuration file located here:
http://videolectures.net/icml2015_liang_language_understanding/video/1/page.map
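The path of that URL has the shape site_slug/video/video/page.map, where the second "video" is a number. A minimal sketch of assembling it follows. The helper names `extract_fields` and `build_config_url` are hypothetical, and the regular expressions assume the two fields appear in the page source as simple `key: value` pairs; the real page layout may differ, so treat the patterns as a starting point only.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://videolectures.net/"


def extract_fields(page_source):
    """Pull site_slug and video out of the page's inline JavaScript.

    ASSUMPTION: the fields look like  site_slug: "..."  and  video: 1
    (keys optionally quoted). Adjust the patterns to the real page.
    """
    slug = re.search(r'\bsite_slug["\']?\s*:\s*["\']([^"\']+)["\']', page_source)
    video = re.search(r'\bvideo["\']?\s*:\s*["\']?(\d+)', page_source)
    if slug is None or video is None:
        raise ValueError("site_slug or video not found in page source")
    return slug.group(1), int(video.group(1))


def build_config_url(site_slug, video):
    """Assemble the page.map URL from the two fields."""
    return urljoin(BASE_URL, f"{site_slug}/video/{video}/page.map")


# Hypothetical snippet standing in for the real page source.
sample = 'var cfg = {site_slug: "icml2015_liang_language_understanding", video: 1};'
print(build_config_url(*extract_fields(sample)))
```

With the real page, `page_source` would be the `.text` of the response returned by `requests.get` for the lecture URL.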
This URL is itself built from the fields site_slug and video, which are hardcoded in JavaScript in the original page.
The file is decoded by an algorithm implemented in JS. I re-arranged the minified code into a Node.js script.
It works well and decodes the file into valid XML.
The JS part is located in script-player.js and smile.min.js.
The Python script then does the following: it extracts the JSON parameters from the source page, fetches the config, decodes it, and parses the XML to get the video URL.
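As an illustration of only the last of those steps (parsing the decoded XML), here is a minimal sketch. It is not the script from this answer and does not include the decoding algorithm. The function name `find_mp4_urls` and the sample document are hypothetical, and since the structure of the decoded XML is an assumption, the code simply scans every attribute and text node for a value ending in .mp4.

```python
import xml.etree.ElementTree as ET


def find_mp4_urls(xml_text):
    """Return every attribute value or text node that ends in .mp4."""
    root = ET.fromstring(xml_text)
    urls = []
    for elem in root.iter():
        candidates = list(elem.attrib.values())
        if elem.text:
            candidates.append(elem.text.strip())
        for value in candidates:
            # Keep the first occurrence of each URL, in document order.
            if value.lower().endswith(".mp4") and value not in urls:
                urls.append(value)
    return urls


# Hypothetical decoded config; the real element names may differ.
decoded_xml = """
<config>
  <video src="http://hydro.ijs.si/v012/6f/n5vruqvdwpj36mdoxxwyxvyg5hje7a4c.mp4"/>
</config>
"""
print(find_mp4_urls(decoded_xml))
```

Scanning all attributes rather than one named tag keeps the sketch independent of the exact schema, at the cost of possibly returning more than one candidate URL.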