I'm trying to scrape a Facebook page in Spring Boot with the jsoup library. The method below returns empty JSON, and I'm out of ideas.
@GetMapping("/test-json")
public String scrapeFacebookPageJson() throws IOException {
    try {
        String facebookPageUrl = "https://www.facebook.com/abcd"; // your URL
        // Fetch and parse the HTML returned by the initial request
        Document doc = Jsoup.connect(facebookPageUrl).get();
        // Select every element that has the CSS class "post"
        Elements posts = doc.select(".post");
        List<String> results = new ArrayList<>();
        for (Element post : posts) {
            results.add(post.text());
        }
        // Serialize the collected texts as a JSON array
        ObjectMapper objectMapper = new ObjectMapper();
        return objectMapper.writeValueAsString(results);
    } catch (IOException e) {
        e.printStackTrace(); // or log the exception
        return "Error";
    }
}
Your idea was to look for elements with the class `post` and fetch their text content. That seems reasonable enough at first sight.
Take a look at a sample Facebook page in the DOM explorer using DevTools (F12). All elements inside `body` have obfuscated class names (e.g. `x78zum5`), so querying for elements with a `post` class won't match anything.
On top of that, the initial page load only returns bare-bones HTML with a GDPR cookie consent dialog. It contains little of the actual page content, since that is loaded afterwards using JavaScript. You'd have to look into dynamic web page scraping, which jsoup can't do because it does not execute JavaScript.
If you want to obtain posts from a FB page programmatically, I think your best bet is their Graph API. This documentation page in particular might be of interest to you: https://developers.facebook.com/docs/graph-api/reference/v19.0/page/feed
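
As a rough illustration of calling the feed endpoint from that documentation page, here is a minimal sketch using only the JDK's built-in HTTP client (Java 11+). It assumes you already have an access token that is allowed to read the page's feed. The class name `PageFeedClient`, the `FB_PAGE_TOKEN` environment variable, and the choice of `fields` are my own placeholders, so check the linked page for the permissions and fields that apply to your app.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is illustrative only.
class PageFeedClient {

    // Host and version as they appear in the documentation link (v19.0).
    private static final String BASE_URL = "https://graph.facebook.com/v19.0";

    /** Builds the URL for GET /{page-id}/feed with a small, assumed set of fields. */
    public static URI buildFeedUri(String pageId, String accessToken) {
        String query = "fields=" + URLEncoder.encode("id,message,created_time", StandardCharsets.UTF_8)
                + "&access_token=" + URLEncoder.encode(accessToken, StandardCharsets.UTF_8);
        String path = "/" + URLEncoder.encode(pageId, StandardCharsets.UTF_8) + "/feed";
        return URI.create(BASE_URL + path + "?" + query);
    }

    /** Sends the request and returns the raw JSON body, or throws on a non-200 status. */
    public static String fetchFeedJson(String pageId, String accessToken)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(buildFeedUri(pageId, accessToken)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Graph API returned HTTP " + response.statusCode() + ": " + response.body());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the token is supplied through an environment variable named FB_PAGE_TOKEN.
        String token = System.getenv("FB_PAGE_TOKEN");
        if (token == null || args.length == 0) {
            System.err.println("Usage: FB_PAGE_TOKEN=... java PageFeedClient <page-id>");
            return;
        }
        System.out.println(fetchFeedJson(args[0], token));
    }
}
```

In your controller you could then return the result of `fetchFeedJson(...)` directly instead of serializing jsoup results, since the response body is already JSON.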