How to pass args to the shouldVisit() method in crawler4j?
I want to pass arguments to the shouldVisit() method in crawler4j. I saw an example on the library's documentation page on GitHub that uses a factory, but I can't understand it. Could someone please provide a simple example showing how to achieve that?
Asked by Ahmed Sakr
Variant 1: Injecting additional parameters as constructor arguments
Additional arguments beyond the method parameters of shouldVisit(...) need to be passed as constructor arguments into every single WebCrawler instance. One way to achieve this is with a factory class. First, give MyWebCrawler.class a constructor that takes your two custom arguments (customArgument1 and customArgument2). Then write a factory that calls this constructor, so that every time a new instance of MyWebCrawler is created, your custom arguments are passed along. Finally, start the crawling process from your CrawlController by handing it the factory instead of the crawler class. A similar working example can be found in the official GitHub repository.
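The three pieces described above (crawler, factory, controller) can be sketched as follows. This is a minimal sketch, assuming crawler4j 4.4.0 is on the classpath, since that version provides the nested CrawlController.WebCrawlerFactory interface and the start(factory, numberOfCrawlers) overload. The names MyCrawlerFactory, CrawlerLauncher and accepts(), the URL prefix, the skip pattern, the storage folder and the crawler count are illustrative placeholders, not part of the original answer.

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

import java.util.regex.Pattern;

// Crawler whose shouldVisit() filter is configured through its constructor.
class MyWebCrawler extends WebCrawler {
    private final String customArgument1;  // illustrative: URL prefix the crawl must stay under
    private final Pattern customArgument2; // illustrative: URLs matching this are skipped

    MyWebCrawler(String customArgument1, Pattern customArgument2) {
        this.customArgument1 = customArgument1;
        this.customArgument2 = customArgument2;
    }

    // Hypothetical helper holding the filter logic, so it can be checked without crawling.
    boolean accepts(String href) {
        return href.startsWith(customArgument1)
                && !customArgument2.matcher(href.toLowerCase()).matches();
    }

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        return accepts(url.getURL());
    }

    @Override
    public void visit(Page page) {
        System.out.println("Visited: " + page.getWebURL().getURL());
    }
}

// Factory the controller calls once per crawler thread; each call builds a
// fresh MyWebCrawler carrying the same custom arguments.
class MyCrawlerFactory implements CrawlController.WebCrawlerFactory<MyWebCrawler> {
    private final String customArgument1;
    private final Pattern customArgument2;

    MyCrawlerFactory(String customArgument1, Pattern customArgument2) {
        this.customArgument1 = customArgument1;
        this.customArgument2 = customArgument2;
    }

    @Override
    public MyWebCrawler newInstance() {
        return new MyWebCrawler(customArgument1, customArgument2);
    }
}

class CrawlerLauncher {
    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawler4j"); // placeholder folder for crawl state

        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtServer robotstxtServer = new RobotstxtServer(new RobotstxtConfig(), pageFetcher);
        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
        controller.addSeed("https://example.com/docs/");

        MyCrawlerFactory factory = new MyCrawlerFactory(
                "https://example.com/docs/",
                Pattern.compile(".*\\.(css|js|gif|jpe?g|png|zip)$"));

        // Pass the factory instead of MyWebCrawler.class; this call blocks until the crawl ends.
        controller.start(factory, 4);
    }
}
```

Because the factory calls the constructor itself, each crawler thread gets its own MyWebCrawler with the arguments already set, so shouldVisit() needs no shared mutable state. Since WebCrawlerFactory declares a single method, a lambda such as () -> new MyWebCrawler(prefix, skipPattern) could replace the named factory class.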
Variant 2: Using CrawlController#getCustomData() (deprecated)
You can use customData on the CrawlController object to inject additional data into your web-crawler objects. However, this approach is deprecated and might be removed in future releases of crawler4j.
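For comparison, the deprecated custom-data route might look like the minimal sketch below. It again assumes crawler4j 4.4.0 on the classpath and uses a String URL prefix as the shared value; the class names CustomDataCrawler and CustomDataLauncher, the prefix and the storage folder are placeholders. The cast in shouldVisit() is needed because the custom data is stored as a plain Object.

```java
import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

// Crawler that reads its filter setting from the controller's custom data.
class CustomDataCrawler extends WebCrawler {
    @Override
    @SuppressWarnings("deprecation") // getCustomData() is deprecated
    public boolean shouldVisit(Page referringPage, WebURL url) {
        // getMyController() returns the CrawlController running this crawler.
        String urlPrefix = (String) getMyController().getCustomData();
        return url.getURL().startsWith(urlPrefix);
    }
}

class CustomDataLauncher {
    @SuppressWarnings("deprecation") // setCustomData() is deprecated
    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawler4j-customdata"); // placeholder folder
        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtServer robotstxtServer = new RobotstxtServer(new RobotstxtConfig(), pageFetcher);
        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
        controller.addSeed("https://example.com/docs/");

        // Any object can be stored here; all crawler threads share this one instance.
        controller.setCustomData("https://example.com/docs/");
        controller.start(CustomDataCrawler.class, 4);
    }
}
```

Since a single object is shared by every crawler thread, anything mutable stored this way has to be thread-safe, which is one more reason to prefer the constructor/factory approach.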