I am running a Crawler4j instance in a Spring Boot application, and my OpenFeign client is always null.
public class MyCrawler extends WebCrawler {

    @Autowired
    HubClient hubClient;

    @Override
    public void visit(Page page) {
        // Lots of crawler code...
        if (page.getParseData() instanceof HtmlParseData) {
            hubClient.send(webPage.toString()); // Throws null pointer exception
        }
    }
}
My HubClient
@FeignClient("hub-worker")
public interface HubClient {
    @RequestMapping(method = RequestMethod.POST, value = "/pages", consumes = "application/json")
    void send(String webPage);
    //void createPage(WebPage webPage);
}
My MainApplication
@EnableEurekaClient
@EnableFeignClients
@SpringBootApplication
public class CrawlerApplication {
    public static void main(String[] args) throws Exception {
        SpringApplication.run(CrawlerApplication.class, args);
    }
}
The stacktrace
Text length: 89106
Html length: 1048334
Number of outgoing links: 158
10:14:38.634 [Crawler 164] WARN e.u.ics.crawler4j.crawler.WebCrawler - Unhandled exception while fetching https://www.cnn.com: null
10:14:38.634 [Crawler 164] INFO e.u.ics.crawler4j.crawler.WebCrawler - Stacktrace:
java.lang.NullPointerException: null
at com.phishspider.crawler.MyCrawler.visit(MyCrawler.java:79)
at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:523)
at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:306)
at java.base/java.lang.Thread.run(Thread.java:834)
Line 79 is the hubClient call. When I factor the hubClient out into another class that I instantiate in the crawler class with something like hubclient hc = new hubclient(), and then call a method such as hc.send(page), the hubClient inside that factored-out class is null as well and throws the same NullPointerException.
In order to inject Spring beans (your client) into your crawler4j Web crawler object, you need to instantiate the Web crawler object via Spring.
For this purpose, you need to write a custom implementation of WebCrawlerFactory that creates Spring-managed web crawler objects. For that to work, your web crawler implementation itself needs to be a Spring bean, i.e. annotated with at least @Component.
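To make the mechanism concrete, here is a minimal, dependency-free sketch of the idea. It is not the real Spring or crawler4j API: the `WebCrawlerFactory` interface below is a simplified stand-in for the factory type that crawler4j provides, the `Supplier` plays the role of the Spring container, and `ContainerBackedCrawlerFactory` and `FactoryDemo` are hypothetical names chosen for illustration. It only shows why an object created with `new` has a null client while one obtained through a container-backed factory does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Stand-in for the Feign client from the question (no HTTP involved here).
interface HubClient {
    void send(String webPage);
}

// Simplified stand-in for crawler4j's crawler factory type (assumption:
// the real one has the same "give me a new crawler" shape).
interface WebCrawlerFactory<T> {
    T newInstance();
}

// Stand-in for the crawler. In the real application this class would extend
// WebCrawler and be registered as a Spring bean.
class MyCrawler {
    HubClient hubClient; // filled in by the container, never by the crawler

    void visit(String webPage) {
        hubClient.send(webPage); // NullPointerException if nothing was injected
    }
}

// The factory delegates to the container instead of constructing the
// crawler itself, so every crawler it returns is fully wired.
class ContainerBackedCrawlerFactory implements WebCrawlerFactory<MyCrawler> {
    private final Supplier<MyCrawler> container;

    ContainerBackedCrawlerFactory(Supplier<MyCrawler> container) {
        this.container = container;
    }

    @Override
    public MyCrawler newInstance() {
        return container.get();
    }
}

class FactoryDemo {
    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        HubClient client = sent::add; // records pages instead of posting them

        // Plain construction: nobody sets the field, so it stays null.
        MyCrawler unmanaged = new MyCrawler();
        System.out.println("unmanaged client is null: " + (unmanaged.hubClient == null));

        // Hypothetical "container": creates a fresh crawler and wires it.
        Supplier<MyCrawler> container = () -> {
            MyCrawler crawler = new MyCrawler();
            crawler.hubClient = client;
            return crawler;
        };

        WebCrawlerFactory<MyCrawler> factory = new ContainerBackedCrawlerFactory(container);
        MyCrawler managed = factory.newInstance();
        managed.visit("<html>...</html>");
        System.out.println("pages sent: " + sent.size());
    }
}
```

In the actual application, the supplier's role could be taken by an injected Spring `ObjectProvider<MyCrawler>` (calling `getObject()`), with `MyCrawler` declared as a prototype-scoped `@Component` so that each crawler thread gets its own instance. The factory would then be handed to the `CrawlController` in place of the crawler class. Treat those details as assumptions to check against the crawler4j and Spring versions you are using.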