Extract values from website dropdown list – using JavaScript




I was recently asked how to extract values from an HTML dropdown. Here’s the code, with due credit below.

Option 1 – only works with dropdowns defined the traditional way, with select and option elements.

var ddlArray = [];
var ddl = document.getElementById('PLACEHOLDER_NAME_OF_SELECT_FORM'); // replace with the id of your select element
for (var i = 0; i < ddl.options.length; i++) {
   ddlArray[i] = ddl.options[i].value; // the value attribute, not the visible label

   console.log(ddlArray[i]+',');
}



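Option 2 – more flexible, because querySelectorAll accepts any CSS selector and so can also be pointed at custom dropdowns that don’t use the traditional select and option HTML elements. Note that it collects the visible text of each item rather than its value attribute.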


var printValue="";

var ddl = document.querySelectorAll("#cars > option"); // swap in the selector that matches your dropdown items
for (var i = 0; i < ddl.length; i++) {
   printValue=printValue + ddl[i].innerText+','; // innerText is the visible label

}
console.log(printValue);

<!--Example HTML to test out code -->

<select id="cars" name="carlist" form="carform">
  <option value="volvo">Volvo</option>
  <option value="saab">Saab</option>
  <option value="opel">Opel</option>
  <option value="audi">Audi</option>
</select>
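
For the example select above, it can be handy to capture both the submitted value and the visible label of each option. The following is only a minimal sketch and is not part of the credited Stack Overflow answer; `optionPairs` and `toCsvLine` are hypothetical helper names. It accepts any array-like list of option elements, so it could be fed `ddl.options` from Option 1.

```javascript
// Hypothetical helper: turn an array-like list of <option> elements
// into plain { value, text } objects.
function optionPairs(options) {
  return Array.from(options, function (opt) {
    return { value: opt.value, text: (opt.text || '').trim() };
  });
}

// Hypothetical helper: join one field of the pairs into a comma-separated line.
function toCsvLine(pairs, field) {
  return pairs.map(function (p) { return p[field]; }).join(',');
}

// In the browser console, with the example HTML on the page:
if (typeof document !== 'undefined' && document.getElementById('cars')) {
  var pairs = optionPairs(document.getElementById('cars').options);
  console.log(toCsvLine(pairs, 'value')); // the value attributes
  console.log(toCsvLine(pairs, 'text'));  // the visible labels
}
```

Keeping the value and the label together avoids having to choose between the two options above when both are needed.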

Code from – https://stackoverflow.com/questions/6378680/get-all-the-values-of-a-dropdownlist-to-an-array-using-javascript

Free website scraping tool, data extractor – bulk automation

What this script does: Automates the extraction of selected sections of HTML webpages. This is called batch website scraping or screen scraping. For example, this script can extract the H1, meta tags, and image ALT tags from webpages.

How it works: The script scrapes every webpage listed in Column A of the input tab. Columns C, D, E, and so on hold CSS selectors that extract sections of each page. You can add as many columns as needed; the script is dynamic and only reads columns with values. Please read the “readme” tab for further instructions on capabilities and examples. You will need to use the Internet Explorer inspector to identify the CSS selector of the section of the page you wish to extract. The CSS selectors will be slightly different in Internet Explorer than in Chrome and other browsers, so for the most accurate results use Internet Explorer. This script also supports nesting elements. Nesting is very useful for complicated webpages that don’t use unique IDs.
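
To make the column-per-selector idea concrete, here is a rough browser-console sketch in JavaScript. It is not the Excel macro's code; `extractRow` and the sample selectors are assumptions for illustration only. Each selector plays the role of one spreadsheet column, and an empty string is returned when nothing matches.

```javascript
// Hypothetical helper: run one CSS selector per "column" against a document
// and return the cleaned text of the first match ('' when nothing matches).
function extractRow(doc, selectors) {
  return selectors.map(function (selector) {
    var el = doc.querySelector(selector);
    if (!el) return '';
    // Meta tags keep their text in the content attribute, images in alt.
    var raw = el.getAttribute('content') || el.getAttribute('alt') || el.textContent || '';
    return raw.replace(/\s+/g, ' ').trim();
  });
}

// Example selectors (assumed): page heading, meta description, first image.
if (typeof document !== 'undefined') {
  console.log(extractRow(document, ['h1', 'meta[name="description"]', 'img']));
}
```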

Why you would use this: This is a free scraper. While there are other screen scraping tools such as Octoparse and ObservePoint, they are not free and may require a software install or lengthy approvals from your IT department. This tool is also useful if your company restricts the software you can install on your work computer. All you need to use this tool is Microsoft Excel. Yes, this uses macros, but don’t let that intimidate you: the code is concise and readable, and I am only using native Microsoft libraries. Check out the code yourself prior to executing the program. You will have to enable the developer tab in Excel to view the code. Enjoy, and let me know if you found this tool helpful.

Actual use cases

  • Quality assurance checks. I used this script to confirm that over 100 product detail pages had pricing.
  • Identify pages that have differences. You can identify pages that do not have a disclosure or that use the wrong template.
  • Take a snapshot of the website content. When you run the script, the data you extract will be captured along with the date the script was run. This is a handy timestamp of your website. It may be helpful to run this script daily to know when something changes on the page.

Download free screen scraper tool

Spying on your competitor’s AB testing program

As a digital marketer, I am always on the lookout for new competitive intelligence tools. There are tools to monitor your competitor’s website traffic, TV ads, and online paid advertising, but none that I am aware of for AB testing. Thanks to Adobe Target’s most recent at.js release, you can now spy on competitors that also use Adobe Target. The caveat is that the competitor must use at.js 2.0 or later. At.js is the core JavaScript library that enables the Adobe Target tool to function. Although there are other ways of spying on your competitors’ AB testing program, I am going to focus this article on exploiting the at.js update.

Prerequisites

  • Adobe Target
  • At.js Target library of 2.0 or higher

Let’s use TD Ameritrade in this example. To validate that the website is using Adobe Target, simply navigate to the website and, using Ghostery (a Google Chrome plugin), look for the Adobe Target tool (screenshot below). Assuming the company has a standard Adobe Target implementation, Ghostery will usually identify the tool. Alternatively, a more accurate way of validating whether a website is using Adobe Target is to use the browser’s network request tool and search for the Adobe Target pixel, which will be on a domain such as “mboxedge35.tt.omtrdc.net”.

Ghostery screenshot showing different website tracking tools including Adobe Target. The tool was run on att.com.

For this example, I visited TD Ameritrade’s homepage. Now open developer tools by selecting “inspect element” in your browser and then click on the “Network” tab. In the filter box, type “omtrdc” or “delivery”. Most companies still use a third-party implementation of Target, so the domain that hosts the Target tool will be something like “mboxedge35.tt.omtrdc.net”. Once you identify this pixel, select it and then click on the “preview” tab.

Chrome browser screenshot of developer tools – network tab showing the Adobe Test & Target pixel. Note, if you don’t see this structure shown below in the Target pixel, it is likely that the version of at.js is earlier than 2.0.
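
The same check can be approximated from the console instead of the Network tab filter. This is only a sketch under the assumption that the site uses the third-party domain pattern mentioned above (a first-party implementation would not match); `findTargetRequests` is a hypothetical helper name.

```javascript
// Hypothetical helper: keep only URLs whose host looks like an Adobe Target
// edge server (*.tt.omtrdc.net), the domain pattern mentioned above.
function findTargetRequests(urls) {
  return urls.filter(function (u) {
    try {
      return new URL(u).hostname.endsWith('.tt.omtrdc.net');
    } catch (e) {
      return false; // skip anything that is not an absolute URL
    }
  });
}

// In the browser console: inspect the resources the page has already loaded.
if (typeof performance !== 'undefined' && typeof document !== 'undefined') {
  var loaded = performance.getEntriesByType('resource').map(function (e) { return e.name; });
  console.log(findTargetRequests(loaded));
}
```

An empty result does not prove that Target is absent; it only means no request to that domain pattern has been recorded so far.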

A wealth of information is housed within this pixel. Expand the “execute” node by clicking on the down arrow or the word “execute”. Continue expanding “pageload” and “options”. Under “options”, you will see the number of A/B tests that are eligible for the current page you are on. Eligibility means that your visitor may need to meet additional criteria, such as originating from a paid advertising source, before they will be included in the test. On initial inspection it appears that there are five A/B tests running on this page (numbered 0 to 4).
Upon expanding the individual items, however, I noticed that all five items had the same activity id of “94382”. This means that they are the same test.

Now to determine the test experience, simply look at the experience id and experience name. In my case I was in experience B.

Chrome browser screenshot of developer tools – network tab showing the Adobe Test & Target pixel and the test user selected
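
Counting distinct tests by hand gets tedious when a page has many options. The sketch below shows one way to collapse the options by activity id. The response shape is an assumption: it presumes each option carries a `responseTokens` object with keys such as `activity.id`, `experience.id` and `experience.name`, which may not match what your preview shows, so adjust the property names to your own payload. `summarizeActivities` is a hypothetical helper name.

```javascript
// Assumed response shape (verify against your own network preview):
// response.execute.pageLoad.options[n].responseTokens holds keys such as
// "activity.id", "experience.id" and "experience.name".
function summarizeActivities(response) {
  var options = (((response || {}).execute || {}).pageLoad || {}).options || [];
  var seen = Object.create(null);
  var result = [];
  options.forEach(function (option) {
    var tokens = option.responseTokens || {};
    var id = tokens['activity.id'];
    if (id === undefined || seen[id]) return; // no id, or activity already counted
    seen[id] = true;
    result.push({
      activityId: id,
      experienceId: tokens['experience.id'],
      experienceName: tokens['experience.name']
    });
  });
  return result;
}
```

To try it, copy the response body from the Network tab and pass the parsed object, for example `summarizeActivities(JSON.parse(copiedText))`, where `copiedText` is whatever you pasted.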

Since I’m already in the test experience, I want to see what the default experience is, so I use Ghostery to block Target.

Ghostery screenshot showing different website tracking tools. Ghostery is blocking all tracking tools in screenshot.

After blocking Target from serving content, you will need to refresh the page to see the default content. Here’s a side-by-side comparison of the default and test experience. You should notice at least four changes.

Noticeable changes between test and default content

  • The notification banner is missing from the test
  • There is a new green “Supporting your investing needs” message on the test
  • A new “Say hello to streamlined trading” component on the test
  • The “online brokerage that makes you a smarter investor” component from the default experience was replaced on the test
Two side by side screenshots of default experience on left and test experience on right. A few areas were replaced with new content.

If you want to further reverse engineer how this test was set up, you can copy the source code that appears under the “content” tab.

Chrome browser screenshot of developer tools – network tab showing the Adobe Test & Target pixel. Highlighted node showing HTML code of test experience.

The code for node 0 mentions that the type is “insertBefore”. This means the test content is inserted before the element matched by the cssSelector. Using the default version of the page, try copying and pasting the code from node 0 into the console to see it in action. Note that you will have to be on the TD Ameritrade homepage and in the default experience for this to work. You can also reuse the code snippet below; however, you will need to remove the newline characters inside the testContent string first, because a quoted JavaScript string cannot span multiple lines. As you can see below, the code in node 0 inserts a green banner at the top of the page.

Screenshot of page changes after implementing the code below in developer tools console.

var cssSelector = document.querySelector( "HTML > BODY > DIV:nth-of-type(1) > DIV:nth-of-type(1) > DIV:nth-of-type(2) > DIV:nth-of-type(1) > DIV:nth-of-type(1) > DIV:nth-of-type(1) > DIV:nth-of-type(1) > DIV:nth-of-type(1) > DIV:nth-of-type(1)");
testContent= "<div id=\"action_insert_16050375677301192\"><style text=\"css\">        .main-header .alert-message .alert-message-container {            display: none !important;        }        .emergency-banner.content-container {            display: inline-block;            padding: 2em;            margin: 20px 0 40px 0;            width: 100%;        }        .emergency-banner.content-container.split-alignment {            display: flex;        }        .emergency-banner.content-container .emergency-banner-content  {            display: inline-block;            font-family: TDASansDisplay, arial, helvetica, sans-serif;            font-weight: 400;            width: 100%;        }        .emergency-banner.content-container.split-alignment .emergency-banner-content.left-align {            display: flex;            padding-right: 2rem;            padding-left: 0;            vertical-align: top;            width: 60%;        }        .emergency-banner.content-container.split-alignment .emergency-banner-content.right-align {            display: flex;            justify-content: flex-end;            margin-left: 8%;            padding-right: 2rem;            vertical-align: top;            width: 30%;        }        .emergency-banner.content-container .emergency-banner-content a {            text-decoration: underline;        }        .emergency-banner.content-container .emergency-banner-content a:hover {            text-decoration: none;        }        .emergency-banner.content-container .emergency-banner-content .emergency-banner-primary-header,        .emergency-banner.content-container .emergency-banner-content .emergency-banner-secondary-header {            margin-bottom: 20px;        }        .emergency-banner.content-container .emergency-banner-content .emergency-banner-primary-header.headline-bold,        .emergency-banner.content-container .emergency-banner-content .emergency-banner-secondary-header.headline-bold {            font-weight: 500;        }      
  .emergency-banner.content-container .emergency-banner-content .emergency-banner-primary-subheader ul,        .emergency-banner.content-container .emergency-banner-content .emergency-banner-secondary-subheader ul {            list-style-type: disc;            padding-left: 1.5em;            padding-right: 0;        }        .emergency-banner.content-container .emergency-banner-content .emergency-banner-primary-subheader p,        .emergency-banner.content-container .emergency-banner-content .emergency-banner-primary-subheader li  {            font-size: 18px;            line-height: 26px;            margin: 0;        }        .emergency-banner.content-container .emergency-banner-content .emergency-banner-secondary-subheader p,        .emergency-banner.content-container .emergency-banner-content .emergency-banner-secondary-subheader li {            font-size: 18px;            line-height: 26px;            margin: 0;        }        /* ebanner copy color config options */        .emergency-banner.content-container .emergency-banner-content .copy-white {            color: #fff !important;        }        .emergency-banner.content-container .emergency-banner-content .copy-black {            color: #000 !important;        }        .emergency-banner.content-container .emergency-banner-content .copy-utility-green {            color: #087900 !important;        }        .emergency-banner.content-container .emergency-banner-content .copy-core-green {            color: #40a829 !important;        }        .emergency-banner.content-container .emergency-banner-content .copy-sky-blue {            color: #b2d7ed !important;        }        .emergency-banner.content-container .emergency-banner-content .copy-daisy {            color: #fdf3cb !important;        }        .emergency-banner.content-container .emergency-banner-content .copy-dark-grey {            color: #666 !important;        }        /* ebanner primary/secondary header size config options */        
.emergency-banner.content-container .emergency-banner-content .ebanner-heading-large {            font-size: 24px;            line-height: 30px;        }        .emergency-banner.content-container .emergency-banner-content .ebanner-heading-medium {            font-size: 24px;            line-height: 30px;        }        .emergency-banner.content-container .emergency-banner-content .ebanner-heading-small {            font-size: 18px;            line-height: 26px;        }        /* ebanner background color config options */        .emergency-banner.white-background-black-text {            background-color: #fff;            color: #000;        }        .emergency-banner.white-background-green-text {            background-color: #fff;            color: #087900;        }        .emergency-banner.pine-background {            background-color: #38635a;            color: #fff;        }        .emergency-banner.dark-green-background {            background-color: #183028;            color: #fff;        }        .emergency-banner.utility-green-background {            background-color: #087900;            color: #fff;        }        .emergency-banner.navy-background {            background-color: #2a5673;            color: #fff;        }        .emergency-banner.off-white-background {            background-color: #f5f1eb;            color: #000;        }        .emergency-banner.cool-grey-eight-background {            background-color: #666;            color: #fff;        }        @media(max-width: 1024px){            .emergency-banner.content-container .emergency-banner-content {                margin-bottom: 30px;            }            .emergency-banner.content-container.split-alignment .emergency-banner-content.left-align {                margin-bottom: 30px;                width: 60%;            }            .emergency-banner.content-container.split-alignment .emergency-banner-content.right-align {                width: 38%;            }            
.emergency-banner.content-container.split-alignment .emergency-banner-content.right-align {                margin-left: 0;            }        }        @media(max-width: 768px) {            .emergency-banner.content-container .emergency-banner-content {                margin-bottom: 30px;            }            .emergency-banner.content-container.split-alignment {                display: inline-block;            }            .emergency-banner.content-container.split-alignment .emergency-banner-content.left-align {                display: inline-block;                width: 100%;            }            .emergency-banner.content-container.split-alignment .emergency-banner-content.right-align {                display: inline-block;                width: 100%;            }        }    </style>    <div class=\"emergency-banner content-container pine-background split-alignment\">        <div class=\"emergency-banner-content left-align\">            <div>                <h2 class=\"copy-daisy emergency-banner-primary-header ebanner-heading-large headline-bold\">Supporting your investing needs – no matter what</h2>                <div class=\"emergency-banner-primary-subheader\">                    <p class=\"copy-white\">We've put together some helpful resources to make it quick and easy to self-service and stay informed. 
If you need to reach us by phone, please understand your wait may be longer than normal due to increased market activity.</p>                </div>            </div>        </div>        <div class=\"emergency-banner-content right-align\">            <div>                <h3 class=\"emergency-banner-secondary-header ebanner-heading-medium headline-bold\">Helpful resources</h3>                <div class=\"emergency-banner-secondary-subheader\">                    <ul>                        <li><a href=\"/why-td-ameritrade/contact-us/top-faqs.page\">Answers to your top questions</a></li>                        <li><a href=\"/education/financial-market-news.page\">Daily market update</a></li>                    </ul>                </div>             </div>        </div>    </div></div>";

var wrapper = document.createElement('div'); // wrapper element that holds the test markup
wrapper.innerHTML = testContent;
cssSelector.parentNode.insertBefore(wrapper, cssSelector); // place it just before the matched element
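
If you want to replay more than one node, the insert step can be wrapped in a small function. This is a minimal sketch, not Adobe's implementation; `applyInsertBefore` is a hypothetical helper, and the action fields (`type`, `cssSelector`, `content`) simply mirror the names visible in the preview, so treat them as assumptions and check your own payload. Unlike the snippet above, it uses `insertAdjacentHTML` and therefore adds no extra wrapper div.

```javascript
// Hypothetical helper: apply one "insertBefore" action to a document.
// Returns true when the markup was inserted, false otherwise.
function applyInsertBefore(doc, action) {
  if (action.type !== 'insertBefore') return false;
  var target = doc.querySelector(action.cssSelector);
  if (!target) return false; // selector did not match, e.g. wrong page or experience
  // 'beforebegin' places the markup immediately before the target element.
  target.insertAdjacentHTML('beforebegin', action.content);
  return true;
}

// Usage in the browser console (values are placeholders):
// applyInsertBefore(document, {
//   type: 'insertBefore',
//   cssSelector: 'HTML > BODY > DIV:nth-of-type(1)',
//   content: '<div>test content</div>'
// });
```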

Get links and email addresses from webpage

There are many useful tools, such as Screaming Frog, that can crawl your website and provide a nice clean report of all the URLs or email addresses that appear on it. Here’s a solution if you are working on an internal website and can’t run your crawling tool for some reason. Simply plug this code into your browser’s console window. As written, it only prints links that contain an email address; comment out the “if(strURL.includes” line if you want to see all links, including emails. Inspiration from TowardsDataScience.

var urls = document.getElementsByTagName('a');
var bolflag=false;

// Indexed loop: for...in would also visit non-element keys such as "length"
for (var i = 0; i < urls.length; i++)
{
	var strURL=urls[i].href;
	if(typeof strURL !== 'undefined')
	{
		if(strURL.includes("@"))//comment me out if you want to see all links including emails
		{
			console.log('%c '+strURL, 'background: #222; color: #bada55');
			bolflag=true;
		}
	}//ends if
}//ends loop

if(!bolflag)
{
	console.log('%c NO Emails! ', 'background: #222; color: #bada55');
}
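
Checking for “@” also catches ordinary links that happen to contain that character. A stricter variant, sketched below, keeps only `mailto:` links and removes duplicates. `extractEmails` is a hypothetical helper and the approach is an assumption about what counts as an email link, not part of the credited article.

```javascript
// Hypothetical helper: pull unique email addresses out of a list of hrefs.
function extractEmails(hrefs) {
  var seen = Object.create(null);
  var emails = [];
  hrefs.forEach(function (href) {
    if (typeof href !== 'string' || href.toLowerCase().indexOf('mailto:') !== 0) return;
    // Drop the scheme and any ?subject=... query string.
    var address = href.slice('mailto:'.length).split('?')[0].trim().toLowerCase();
    if (address && !seen[address]) {
      seen[address] = true;
      emails.push(address);
    }
  });
  return emails;
}

// In the browser console: collect every anchor's href and filter it.
if (typeof document !== 'undefined') {
  var hrefs = Array.from(document.getElementsByTagName('a'), function (a) { return a.href; });
  console.log(extractEmails(hrefs));
}
```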



//Grabbing subsection of a page. Example Navigation only
var uniqueContent = document.getElementById("PLACEHOLDER_value_UNIQ_DIV");//add a new unique ID to the element you want using browser inspect elements if the html doesn't already have a unique DIV
var contentWlink = uniqueContent.querySelectorAll("a");
//can also attempt to combine statements with a more advanced CSS selector. CSS is not my strength however. Example: document.querySelectorAll("ul.vertical-list a");

var myarray = [];
for (var i=0; i<contentWlink.length; i++){
var nametext = contentWlink[i].textContent;
var cleantext = nametext.replace(/\s+/g, ' ').trim();
var cleanlink = contentWlink[i].href;
myarray.push([cleantext,cleanlink]);
}
function make_table() {
    var table = '<table><thead><tr><th>Name</th><th>Links</th></tr></thead><tbody>';
    for (var i=0; i<myarray.length; i++) {
        table += '<tr><td>'+ myarray[i][0] + '</td><td>'+myarray[i][1]+'</td></tr>';
    }
    table += '</tbody></table>';//close the tags opened above

    var w = window.open("");//may return null if the browser blocks pop-ups
    w.document.write(table);
}
make_table();

//Code courtesy of Phil Gorman - https://towardsdatascience.com/quickly-extract-all-links-from-a-web-page-using-javascript-and-the-browser-console-49bb6f48127b
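
If the pop-up window is blocked, or you would rather paste the results into a spreadsheet, the same `myarray` rows can be printed as CSV. This is a minimal sketch and not part of the credited article; `rowsToCsv` is a hypothetical helper name.

```javascript
// Hypothetical helper: convert [name, link] rows into CSV text.
function rowsToCsv(rows) {
  function escapeCell(cell) {
    var text = String(cell);
    // Quote cells containing commas, quotes or line breaks; double embedded quotes.
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  }
  var lines = [['Name', 'Links']].concat(rows).map(function (row) {
    return row.map(escapeCell).join(',');
  });
  return lines.join('\n');
}

// With the script above already run, myarray holds the [text, href] pairs.
if (typeof myarray !== 'undefined') {
  console.log(rowsToCsv(myarray));
}
```

Quoting matters here because link text often contains commas, which would otherwise shift the columns.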