How to scrape a specific URL from multiple URLs on a webpage in Java

I am doing data scraping for the first time. My assignment is to get one specific URL from a webpage that contains multiple links ("help", "click here", etc.). How can I get that URL and ignore the other links? On this page I only want the link "The SEC adopted changes to the exempt offering framework" and want to ignore the rest. How do I do that in Java? I was able to extract all the URLs, but I'm not sure how to pick out the specific one. Below is my code:

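import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// rs is the java.sql.ResultSet from my query; each row holds one page's HTML
Pattern emailPattern = Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+");

while (rs.next()) {
    String content = rs.getString("Content");
    Document doc = Jsoup.parse(content);

    // email extract (pattern is compiled once, outside the loop)
    Matcher matcher = emailPattern.matcher(doc.text());
    Set<String> emails = new HashSet<>();
    while (matcher.find()) {
        emails.add(matcher.group());
    }
    System.out.println(emails);

    // title extract
    System.out.println("Title: " + doc.title());

    // link extract: prints every link on the page; kept inside the loop
    // so every row is processed, not only the last one
    for (Element link : doc.select("a")) {
        System.out.println("\nlink: " + link.attr("href"));
        System.out.println("text: " + link.text());
    }

    // image extract; abs:src resolves relative paths only if jsoup knows
    // the page's base URI, e.g. Jsoup.parse(content, baseUri)
    System.out.println("Getting all the images");
    for (Element img : doc.getElementsByTag("img")) {
        System.out.println("src " + img.attr("abs:src"));
    }
}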
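One way to keep only the wanted link is to filter the anchors by their visible text instead of printing all of them. The sketch below is a minimal illustration, not a definitive solution: the class name `LinkFinder`, the method `findLinkByText`, the sample HTML and the base URI `https://example.com/` are all hypothetical, since the real page and its address are not shown. It assumes a case-insensitive "contains" match on the link text is good enough; jsoup's `a:containsOwn(...)` selector is an alternative if the phrase contains no characters that need escaping in a CSS query.

```java
import java.util.Locale;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkFinder {

    /**
     * Returns the absolute URL of the first link whose visible text contains
     * wantedText (case-insensitive), or null if no link matches.
     * baseUri (hypothetical here) is only used to turn relative hrefs into absolute URLs.
     */
    static String findLinkByText(String html, String baseUri, String wantedText) {
        Document doc = Jsoup.parse(html, baseUri);
        String needle = wantedText.trim().toLowerCase(Locale.ROOT);

        // Only anchors that actually carry an href attribute
        for (Element link : doc.select("a[href]")) {
            // text() gives the link's visible text with whitespace normalized
            String text = link.text().toLowerCase(Locale.ROOT);
            if (text.contains(needle)) {
                return link.absUrl("href");
            }
        }
        return null; // no link with that text on this page
    }

    public static void main(String[] args) {
        // Hypothetical page; in the question the HTML comes from rs.getString("Content")
        String html = "<p><a href=\"/help\">Help</a> "
                + "<a href=\"/news/exempt-offering\">The SEC adopted changes to the exempt offering framework</a> "
                + "<a href=\"/more\">click here</a></p>";

        String url = findLinkByText(html, "https://example.com/",
                "The SEC adopted changes to the exempt offering framework");
        System.out.println(url); // https://example.com/news/exempt-offering
    }
}
```

Inside the `while (rs.next())` loop this would be called with `content` in place of the sample HTML. A short search phrase such as "here" can match several links, so a full phrase (or `equalsIgnoreCase` for an exact match) is the safer choice.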