Need to fetch the email ID and phone number via web scraping

require 'open-uri'
require 'nokogiri'

def scrape(url)
  html = URI.open(url).read
  nokogiri_doc = Nokogiri::HTML(html)
  final_array = []

  nokogiri_doc.search("a").each do |element|
    element = element.text
    final_array << element
  end

  final_array.each { |text| puts text }
end


scrape('http://www.infranetsol.com/')

With this I only get the text of the a tags, but I need to extract the email ID and phone number and save them into an Excel file.



Accepted answer by GPif:

All you have is text, so what you can do is keep only the strings that look like an email address or a phone number.

For instance, if you keep the result of scrape (the array of link texts) in a variable:

a = scrape('http://www.infranetsol.com/')

You can get the elements with an email address (strings containing an '@'):

a.select { |s| s.match(/.*@.*/) }
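If the address sits inside a longer link text (for example "Mail: info@example.com"), select keeps the whole string. A minimal sketch of pulling out just the address with scan instead, assuming a deliberately simple email pattern (not a full RFC 5322 validator) and a hypothetical sample array standing in for the scraped one:

```ruby
# Hypothetical link texts standing in for the array returned by scrape.
texts = ['Home', 'Contact us', 'Mail: info@example.com', 'sales@example.com']

# Simple, assumed pattern: word chars/dots/plus/minus, an '@', then a dotted domain.
# The group is non-capturing, so scan returns the full matched strings.
EMAIL_RE = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

# scan returns an array per string; flat_map flattens them, uniq drops repeats.
emails = texts.flat_map { |s| s.scan(EMAIL_RE) }.uniq
p emails  # => ["info@example.com", "sales@example.com"]
```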

You can get the elements with a phone number (strings with at least 5 consecutive digits):

a.select{ |s| s.match(/\d{5}/) }
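Note that /\d{5}/ needs five digits in a row, so a number written with spaces, such as "+91 22 1234 5678", will not match, while an unrelated number such as an order ID will. A small sketch comparing that check with counting digits anywhere in the string; the sample strings are hypothetical:

```ruby
# Hypothetical link texts standing in for the scraped array.
texts = ['Call +91 22 1234 5678', 'Order 12345', 'About us']

# Heuristic from the answer: five digits in a row.
consecutive = texts.select { |s| s.match?(/\d{5}/) }

# Alternative heuristic: at least five digits anywhere in the string.
# String#count('0-9') counts the characters in the range '0'..'9'.
any_digits = texts.select { |s| s.count('0-9') >= 5 }

p consecutive  # => ["Order 12345"]
p any_digits   # => ["Call +91 22 1234 5678", "Order 12345"]
```

Neither check is a real phone validator: both can also catch order numbers, dates or postcodes, so some filtering by hand may still be needed for a given site.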

The whole code:

require 'open-uri'
require 'nokogiri'

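def scrape(url)
  html = URI.open(url).read   # Kernel#open no longer accepts URLs in Ruby 3.0+
  nokogiri_doc = Nokogiri::HTML(html)
  final_array = []

  # Collect the text of every <a> element on the page.
  nokogiri_doc.search("a").each do |element|
    final_array << element.text
  end

  final_array.each { |text| puts text }
  final_array   # return the array explicitly so the caller can filter it
end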


a = scrape('http://www.infranetsol.com/')
email = a.select { |s| s.match(/.*@.*/) }
phone = a.select{ |s| s.match(/\d{5}/) }

# In your example, `email` will directly contain the email addresses,
# but unfortunately `phone` will contain a more complex string.
# You can use scan to extract the phone numbers from that text, and flat_map
# to get a single array without sub-arrays (strip removes trailing spaces).
# But keep in mind this will only work with this particular text.

phone.flat_map { |elt| elt.scan(/\d[\d ]*/) }.map(&:strip)
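The answer stops at the arrays, but the question also asks for an Excel file. A minimal sketch, assuming a CSV file is acceptable (Excel opens .csv files directly; a native .xlsx file would need a third-party gem), using Ruby's csv library (shipped with Ruby, as a bundled gem since 3.4). The arrays and the file name contacts.csv are hypothetical placeholders for the email and phone results above:

```ruby
require 'csv'

# Hypothetical results standing in for the email and phone arrays computed above.
emails = ['info@example.com']
phones = ['022 1234 5678', '+91 98765 43210']

# One email and one phone per row; the shorter list is padded with nil,
# which CSV writes as an empty cell.
rows = Array.new([emails.size, phones.size].max) { |i| [emails[i], phones[i]] }

csv_text = CSV.generate do |csv|
  csv << %w[email phone]          # header row
  rows.each { |row| csv << row }
end

File.write('contacts.csv', csv_text)  # hypothetical output file name
puts csv_text
```

Writing one row per contact keeps the file easy to sort and filter in Excel; if emails and phones belong to different people on the page, separate columns like this do not imply they are related.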