Another type of scraper is the so-called web scraper, as shown below. It is a very recent addition to humanity's toolset: it has been around for a couple of dozen years, if not less. Its purpose is to scrape websites for data and save the extracted data as needed. It can be argued that, like the first type, it also shapes something: by extracting the desired data and leaving the rest behind, a web scraper helps create a new piece of information. Unlike the first type, however, web scrapers are non-destructive. While a flake scraper modifies the raw material, a web scraper does not. All the data stays on the Internet no matter how much one scrapes it.
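require "mechanize"

# Glossary lookup page listing every headword that matches "Scraper"
url = 'http://www.archaeologywordsmith.com/lookup.php?category=&where=headword&terms=Scraper'

agent = Mechanize.new
# Mechanize loads Nokogiri, so the page body can be parsed directly
html_doc = Nokogiri::HTML(agent.get(url).body)
# Select the <dt class="results"> elements that hold the matching headwords
list = html_doc.xpath("//dt[@class='results']")

# The block form of File.open closes the file once the block finishes
File.open("scraper_list.txt", "w") do |fp|
  fp.write("Types of scrapers:\n\n")
  list.each { |i| fp.write(i.text + "\n") }
end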