Free website scraping tool, data extractor – bulk automation

What this script does: Automates the extraction of selected sections of HTML webpages. This is called batch website scraping or screen scraping. For example, this script can extract the H1, meta tags, and image ALT text from webpages.

How it works: The script scrapes every webpage listed in Column A of the input tab. Columns C, D, E, and so on each hold a CSS selector that identifies the section of the page to extract. You can add as many selector columns as you need; the script is dynamic and only reads columns that contain values. Please read the “readme” tab for further instructions on capabilities and examples. To find the CSS selector for the section you wish to extract, use the Internet Explorer inspector (developer tools). The selectors reported by Internet Explorer can differ slightly from those reported by Chrome and other browsers, so for the most accurate results use Internet Explorer. This script also supports nested elements. Nesting is very useful for complicated webpages that don’t give their elements unique ID attributes.
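The flow described above (one URL per row, one CSS selector per column, nested selectors for pages without unique IDs) can be sketched outside Excel. The example below is a minimal illustration in Python, not the macro's actual code. It assumes a deliberately small selector syntax (tag, #id, .class, and space-separated nesting), and the names extract_text, scrape, and SAMPLE_HTML are hypothetical. Pages are supplied by a fetch function, so the sketch runs without network access.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not kept on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def parse_simple(part):
    """Split a simple selector such as 'div#main.product' into its pieces."""
    tag = re.match(r"[a-zA-Z0-9]*", part).group(0).lower() or None
    ids = re.findall(r"#([\w-]+)", part)
    classes = set(re.findall(r"\.([\w-]+)", part))
    return tag, (ids[0] if ids else None), classes


def matches(element, simple):
    """Check one open element (tag, attrs) against one simple selector."""
    tag, wanted_id, wanted_classes = simple
    el_tag, el_attrs = element
    if tag and el_tag != tag:
        return False
    if wanted_id and el_attrs.get("id") != wanted_id:
        return False
    return wanted_classes <= set((el_attrs.get("class") or "").split())


def stack_matches(stack, parts):
    """True if the innermost element matches the last part and the earlier
    parts match its ancestors in order (nested / descendant selector)."""
    if not stack or not matches(stack[-1], parts[-1]):
        return False
    remaining = list(parts[:-1])
    for ancestor in reversed(stack[:-1]):
        if remaining and matches(ancestor, remaining[-1]):
            remaining.pop()
    return not remaining


class SelectorExtractor(HTMLParser):
    """Collects the text of the first element that matches the selector."""

    def __init__(self, selector):
        super().__init__()
        self.parts = [parse_simple(p) for p in selector.split()]
        self.stack = []
        self.capture_depth = None  # stack depth of the matched element
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, dict(attrs)))
        if (not self.done and self.capture_depth is None
                and stack_matches(self.stack, self.parts)):
            self.capture_depth = len(self.stack)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if self.capture_depth is not None and len(self.stack) < self.capture_depth:
            self.capture_depth = None
            self.done = True

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.chunks.append(data)


def extract_text(html, selector):
    """Return the text of the first match, or '' when nothing matches."""
    parser = SelectorExtractor(selector)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def scrape(urls, selectors, fetch):
    """Build one output row per URL: the URL, then one value per selector."""
    rows = []
    for url in urls:
        html = fetch(url)  # fetch is supplied by the caller
        rows.append([url] + [extract_text(html, s) for s in selectors])
    return rows


# Hypothetical sample page standing in for a real product detail page.
SAMPLE_HTML = """
<html><body>
  <h1>Blue Widget</h1>
  <div class="product"><span class="price">$19.99</span></div>
  <div class="footer"><span class="price">n/a</span></div>
</body></html>
"""

print(scrape(["https://example.com/widget"],
             ["h1", "div.product span.price"],
             lambda url: SAMPLE_HTML))
```

The nested selector "div.product span.price" picks the price inside the product block and skips the one in the footer, which is the kind of case where nesting helps when elements have no unique ID. A real run would replace the lambda with a function that downloads each page.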

Why you would use this: This is a free scraper. Other screen scraping software exists, such as Octoparse and ObservePoint, but those tools are not free and may require a software install or lengthy approvals from your IT department. This tool is also useful if your company restricts the software you can install on your work computer. All you need to use this tool is Microsoft Excel. Yes, this uses macros, but don’t let that intimidate you: the code is concise and readable, and I am only using native Microsoft libraries. Check out the code yourself prior to executing the program. You will have to enable the Developer tab in Excel to view the code. Enjoy, and let me know if you found this tool helpful.

Actual use cases

  • Quality assurance checks. I used this script to confirm that over 100 product detail pages had pricing.
  • Identify pages that have differences. You can identify pages that are missing a disclosure or that use the wrong template.
  • Take a snapshot of the website content. When you run the script, the data you extract is captured along with the date the script was run. This gives you a dated record of your website. It may be helpful to run this script daily so you know when something changes on a page.
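
The use cases above come down to three small operations on the extracted rows: flag pages where a value is missing, stamp each row with the run date, and compare two runs to see what changed. The sketch below shows one way to do that in Python. It is an illustration, not the macro's code; it assumes each row is a list with the URL first and the extracted values after it, and the function names and sample data are hypothetical.

```python
from datetime import date


def missing_values(rows, column):
    """Return the URLs whose value in the given column is empty,
    for example product pages with no price."""
    return [row[0] for row in rows if not row[column].strip()]


def stamp_rows(rows, run_date):
    """Append the run date (ISO format) to every extracted row."""
    return [row + [run_date.isoformat()] for row in rows]


def changed_urls(previous, current):
    """List URLs present in both runs whose extracted values differ.
    Compare rows before they are date-stamped, or every row will differ."""
    before = {row[0]: row[1:] for row in previous}
    return [row[0] for row in current
            if row[0] in before and before[row[0]] != row[1:]]


# Hypothetical results from two daily runs: URL, H1, price.
monday = [
    ["https://example.com/a", "Widget A", "$10"],
    ["https://example.com/b", "Widget B", ""],
]
tuesday = [
    ["https://example.com/a", "Widget A", "$12"],
    ["https://example.com/b", "Widget B", ""],
]

print(missing_values(monday, 2))            # pages with no price
print(changed_urls(monday, tuesday))        # pages that changed between runs
print(stamp_rows(tuesday, date(2024, 1, 16)))
```

Keeping the comparison separate from the date stamp means the date column never triggers a false "changed" result.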

Download free screen scraper tool