Extract values from website dropdown list – using JavaScript




I was recently asked how to extract values from an HTML dropdown. Here’s the code, with due credit.

Option #1 – only works when the dropdown is a standard select element with option children.

var ddlArray = [];
var ddl = document.getElementById('PLACEHOLDER_NAME_OF_SELECT_FORM'); // id of the select element
for (var i = 0; i < ddl.options.length; i++) {
   ddlArray[i] = ddl.options[i].value; // the value attribute, not the visible text
   console.log(ddlArray[i] + ',');
}



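Option #2 – is more flexible because querySelectorAll accepts any CSS selector. The example below still targets option elements; for a custom dropdown that doesn’t use the traditional select and option HTML elements, change the selector to match its items. Note that this version reads the visible text (innerText) rather than the value attribute.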


var printValue = "";

var ddl = document.querySelectorAll("#cars > option"); // any CSS selector works here
for (var i = 0; i < ddl.length; i++) {
   printValue = printValue + ddl[i].innerText + ',';
}
console.log(printValue);

<!--Example HTML to test out code -->

<select id="cars" name="carlist" form="carform">
  <option value="volvo">Volvo</option>
  <option value="saab">Saab</option>
  <option value="opel">Opel</option>
  <option value="audi">Audi</option>
</select>

Code from – https://stackoverflow.com/questions/6378680/get-all-the-values-of-a-dropdownlist-to-an-array-using-javascript
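
If you need both the submitted value and the visible label, the two options above can be folded into one helper. This is a minimal sketch and not part of the Stack Overflow answer: the names `extractOptions` and `toCsv` are my own, and it assumes each item exposes `value` plus either `text` or `textContent`, as a standard option element does.

```javascript
// Hypothetical helper: collect value/label pairs from a list of option-like items.
// Accepts select.options, a querySelectorAll() result, or a plain array.
function extractOptions(optionList) {
  return Array.from(optionList, function (opt) {
    // Prefer .text (option elements), fall back to .textContent (custom items)
    var label = opt.text != null ? opt.text : (opt.textContent || "");
    return { value: opt.value, label: String(label).trim() };
  });
}

// Join one field of the extracted pairs into a comma-separated string.
function toCsv(pairs, field) {
  return pairs.map(function (pair) { return pair[field]; }).join(",");
}

// Only runs in a browser console; skipped where there is no DOM.
if (typeof document !== "undefined") {
  var select = document.getElementById("cars");
  if (select) {
    var pairs = extractOptions(select.options);
    console.log(toCsv(pairs, "value"));
    console.log(toCsv(pairs, "label"));
  }
}
```

Against the example HTML above, the first line logged would be the values (volvo,saab,opel,audi) and the second the labels (Volvo,Saab,Opel,Audi).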

Free website scraping tool, data extractor – bulk automation

What this script does: Automates the extraction of selected sections of HTML webpages. This is called batch website scraping or screen scraping. For example, this script can extract the H1, meta tags, and image ALT tags from webpages.

How it works: The script scrapes every page listed in Column A of the input tab. Columns C, D, E, and so on hold CSS selectors that pick out sections of each page. You can add as many columns as you need; the script is dynamic and only reads columns that contain values. Please read the “readme” tab for further instructions on capabilities and examples. Use the Internet Explorer inspector to identify the CSS selector for the section of the page you wish to extract. The selectors can differ slightly between Internet Explorer and Chrome or other browsers, so for the most accurate results use Internet Explorer. The script also supports nested elements, which is very useful for complicated webpages that don’t use unique IDs.
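
To make the "one column per CSS selector" idea concrete, here is a rough JavaScript sketch of it. It is not the Excel macro's code (that is written for Excel), and the function name `extractRow` is hypothetical. It only reads the text of the first match for each selector; attributes such as image ALT text would need `getAttribute` instead.

```javascript
// Hypothetical sketch: build one output row for a page, one cell per CSS selector.
// `doc` is anything with a querySelector method, e.g. the browser's `document`.
function extractRow(doc, selectors) {
  return selectors.map(function (selector) {
    if (!selector) return "";              // empty column: nothing to read
    var node = doc.querySelector(selector);
    if (!node) return "";                  // selector matched nothing on this page
    // Collapse runs of whitespace so the cell stays on one line
    return String(node.textContent || "").replace(/\s+/g, " ").trim();
  });
}

// Only runs in a browser console; skipped where there is no DOM.
if (typeof document !== "undefined") {
  console.log(extractRow(document, ["h1", "title"]));
}
```

Returning an empty string for a missing element keeps every row the same width, which makes it easy to spot pages where a section is absent.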

Why you would use this: This is a free scraper. Other screen scraping software such as Octoparse and ObservePoint is not free and may require a software install or lengthy approvals from your IT department. This tool is also useful if your company restricts the software you can install on your work computer. All you need is Microsoft Excel. Yes, it uses macros, but don’t let that intimidate you: the code is concise and readable, and it only uses native Microsoft libraries. Check out the code yourself before running the program. You will have to enable the Developer tab in Excel to view the code. Enjoy, and let me know if you found this tool helpful.

Actual use cases

  • Quality assurance checks. I used this script to confirm that over 100 product detail pages had pricing.
  • Identify pages that have differences. You can identify pages that do not have a disclosure or that use the wrong template.
  • Take a snapshot of the website content. When you run the script, the data you extract will be captured along with the date the script was run. This is a handy timestamp of your website. It may be helpful to run this script daily to know when something changes on the page.
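
The daily-snapshot idea in the last bullet comes down to comparing today's extracted values with yesterday's. Here is a minimal JavaScript sketch of that comparison. The function name `changedPages` and the data shape (page URL mapped to an array of extracted cells) are assumptions for illustration, not how the Excel tool stores its results.

```javascript
// Hypothetical sketch: list the pages whose extracted values differ between two runs.
// Each snapshot maps a page URL to an array of cell values (one per selector).
function changedPages(previous, current) {
  return Object.keys(current).filter(function (url) {
    var before = previous[url];
    var after = current[url];
    // A page counts as changed if it is new or if any cell differs
    return before === undefined || JSON.stringify(before) !== JSON.stringify(after);
  });
}

// Made-up example data for two runs
var yesterday = { "https://example.com/a": ["Widget", "$10"], "https://example.com/b": ["Gadget", "$20"] };
var today = { "https://example.com/a": ["Widget", "$12"], "https://example.com/b": ["Gadget", "$20"] };
console.log(changedPages(yesterday, today));
```

Pages that disappeared from the newer run are not reported by this sketch; it only walks the URLs in the current snapshot.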

Download free screen scraper tool

Get links and email addresses from webpage

There are many useful tools, such as Screaming Frog, that can crawl your website and provide a nice clean report of all the URLs or email addresses that appear on it. Here’s a solution if you are working on an internal website and can’t run your crawling tool for some reason. Simply paste this code into your browser’s console window. Comment out the “if(strURL.includes” line if you want to see all links rather than only email addresses. Inspiration from TowardsDataScience.

var urls = document.getElementsByTagName('a');
var bolflag = false; // set to true once at least one match is printed

for (var i = 0; i < urls.length; i++)
{
	var strURL = urls[i].href;
	if (typeof strURL === 'string')
	{
		if(strURL.includes("@")==true)//comment me out if you just want to see all links, including emails
		{
			console.log('%c '+strURL, 'background: #222; color: #bada55');
			bolflag = true;
		}
	}//ends if
}//ends loop

if (bolflag == false)
{
	console.log('%c NO Emails! ', 'background: #222; color: #bada55');
}



//Grabbing a subsection of a page. Example: navigation only
var uniqueContent = document.getElementById("PLACEHOLDER_value_UNIQ_DIV");//if the HTML doesn't already have a unique ID on the element you want, add one using the browser's inspect-element tool
var contentWlink = uniqueContent.querySelectorAll("a");
//you can also try combining both statements with a more advanced CSS selector (CSS is not my strength, however), e.g. document.querySelectorAll("ul.vertical-list a");

var myarray = [];
for (var i=0; i<contentWlink.length; i++){
    var nametext = contentWlink[i].textContent;
    var cleantext = nametext.replace(/\s+/g, ' ').trim();
    var cleanlink = contentWlink[i].href;
    myarray.push([cleantext,cleanlink]);
}
function make_table() {
    var table = '<table><thead><tr><th>Name</th><th>Links</th></tr></thead><tbody>';
    for (var i=0; i<myarray.length; i++) {
        table += '<tr><td>'+ myarray[i][0] + '</td><td>'+myarray[i][1]+'</td></tr>';
    }
    table += '</tbody></table>';

    var w = window.open("");
    w.document.write(table);
}
make_table();

//Code courtesy of Phil Gorman - https://towardsdatascience.com/quickly-extract-all-links-from-a-web-page-using-javascript-and-the-browser-console-49bb6f48127b
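
One caveat with checking for "@" is that it also matches ordinary URLs that happen to contain that character. A stricter variant is to look for the mailto: scheme. The sketch below is my own addition, not from the article credited above, and the helper name `splitLinks` is hypothetical. It also removes duplicates.

```javascript
// Hypothetical helper: split a list of href strings into e-mail addresses and other links.
function splitLinks(hrefs) {
  var emails = [];
  var links = [];
  hrefs.forEach(function (href) {
    if (typeof href !== "string" || href === "") return; // skip non-string or empty hrefs
    if (href.toLowerCase().indexOf("mailto:") === 0) {
      // Drop the scheme and any ?subject=... query string
      var address = href.slice("mailto:".length).split("?")[0];
      if (address && emails.indexOf(address) === -1) emails.push(address);
    } else if (links.indexOf(href) === -1) {
      links.push(href);
    }
  });
  return { emails: emails, links: links };
}

// Only runs in a browser console; skipped where there is no DOM.
if (typeof document !== "undefined") {
  var hrefs = Array.from(document.getElementsByTagName("a"), function (a) { return a.href; });
  console.log(splitLinks(hrefs));
}
```

Because the helper works on plain strings, you can also feed it a list of links copied from elsewhere rather than from the live page.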

Troubleshooting DLIBDOTNET and FaceRecognitionDotNet CUDA libraries

I’ve been building a facial recognition C# application using the DLIBDOTNET and FaceRecognitionDotNet libraries. I tested a variety of free libraries, and this was the most accurate of those I tried at reliably predicting/recognizing faces. It also does a decent job of differentiating people of color.

Each library comes in two versions: the regular one (for example “FaceRecognitionDotNet”) and a CUDA one. You don’t need CUDA to use these libraries, but recognition will be much faster with the CUDA version.

It’s been a process to get the CUDA version of these two libraries to work but it’s been worth it. Here’s a guide that could help you troubleshoot if you are having difficulties getting this to work.

What you’ll need & Steps

Although not necessary, the setup may work more smoothly if you follow these steps in this order.

  1. 64 bit Intel CPU. Visit the GitHub repository for current supported CPUs.
  2. Nvidia CUDA supported GPU
  3. Visual Studio 2019 & .NET Core 2.0+. I initially wrote my application using old-school Windows Forms in Visual Studio. After banging my head against the wall, I did some research and found that WPF applications have more native support for GPU work. After switching to .NET Core 2.0 and a WPF application, I finally got the CUDA version of this library to work.
  4. Nvidia CUDA development kit. The installer will detect your graphics card and will most likely install an older graphics driver. Let it install the older version, otherwise you will get an error message during the install. After the install completes, reinstall the most current Nvidia drivers for your graphics card. It’s also important to install Visual Studio before the CUDA development kit, because the installer will attempt to add integration tools for Visual Studio. Once installation is complete, confirm CUDA was installed properly by building and running one or more of the sample projects; they should build and run without error messages.
  5. NVIDIA cuDNN – Follow the instructions on NVIDIA’s website to install cuDNN. You are simply copying files into the CUDA installation directory; there is nothing more to the cuDNN installation. I used version 10.0. I also tested with newer versions of CUDA, but those tests were unsuccessful, so I decided to stick with 10.0. Remember, you need to install the CUDA development kit before attempting to install cuDNN.
  6. At least 25 GB of free space on the drive that has Windows. Visual Studio, CUDA and the NVIDIA graphics drivers are bulky.

This is using a barebones Visual Studio installation. The DLIBDOTNET notes on NuGet mention that you need these files: Visual C++ 2017 Redistributable Package, cublas64_92.dll, cudnn64_7.dll, curand64_92.dll and cusolver64_92.dll. I tried installing just the latest Visual C++ 2019 Redistributable, but that alone didn’t work, so I installed a few other C++ and .NET Core components. See below for my barebones install of Visual Studio.

Individual Components

.NET

  • .NET Core 2.1 LTS Runtime
  • .NET Core 3.1 LTS Runtime
  • .NET Core SDK
  • NuGet package manager
  • .NET Framework 4.6.1 targeting pack
  • .NET Framework 4.7.2 SDK
  • .NET Framework 4.7.2 targeting pack
  • .NET Framework 4.8 SDK
  • .NET Native
  • .NET Portable Library targeting pack
  • Development Tools plus .NET Core 2.1

Cloud, database, and server

  • CLR data types for SQL Server

Code tools

  • C# and Visual Basic Roslyn compilers
  • C++ 2019 Redistributable MSMs
  • C++ 2019 Redistributable Update
  • C++ CMake tools for Windows
  • MSBuild
  • MSVC v142 – VS 2019 C++ x64/x86 build tools (v14.25)

Debugging and testing

  • .NET profiling tools
  • C++ AddressSanitizer (Experimental)
  • C++ profiling tools
  • Just-in-Time debugger

Development activities

  • C# and Visual Basic
  • C++ core features
  • F# language support
  • IntelliCode

Games and Graphics

  • Graphics debugger and GPU profiler for DirectX

SDK, libraries and frameworks

  • C++ ATL for latest v142 build tools (x86 & x64)
  • Windows 10 SDK (10.0.18362.0)

After you are done with the above steps, create a simple Windows WPF .NET Core application and call the “TryGetDriverVersion” function. Sample code below. If everything was installed properly, you should see output like this:

“***HELLO CUDA Version:10010”

using System.Diagnostics; // needed for Debug.WriteLine
using DlibDotNet;

//C# example code to check if DLIB CUDA is installed properly. Make sure you are using the CUDA version from NuGet

Cuda.TryGetDriverVersion(out int version);
Debug.WriteLine("***************************************HELLO CUDA Version:" + version.ToString());
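
The number printed is a single integer rather than a dotted version. Assuming the library passes through the value reported by CUDA itself, which encodes versions as 1000 × major + 10 × minor, it can be decoded as in the sketch below. It is written in JavaScript to match the other snippets on this page (the arithmetic is identical in C#), and `decodeCudaVersion` is a hypothetical helper, not part of DLIBDOTNET.

```javascript
// Decode a CUDA version integer, assuming the 1000 * major + 10 * minor encoding.
function decodeCudaVersion(version) {
  var major = Math.floor(version / 1000);
  var minor = Math.floor((version % 1000) / 10);
  return major + "." + minor;
}

console.log(decodeCudaVersion(10010)); // prints "10.1"
```

Under that assumption, the sample output above would correspond to version 10.1.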

I’ve been able to get this to work with a few different CUDA-supported GPUs following the instructions above. Enjoy coding.