How to Extract All Links from a Web Page?

As an SEO, you will sometimes want to extract all the links from a web page so you can run a crawl and check its internal links. This usually means installing a browser extension or writing a Python script with BeautifulSoup. In this guide, I’ll walk you through creating a simple bookmarklet that collects every URL on a page, along with its anchor text, without needing either. And the most convenient part? The data downloads automatically as a tidy CSV file, which makes your analysis more straightforward.

How to Extract Links from a Web Page by Creating a Bookmarklet?

Step 1: Establish Your Bookmarklet

Start by copying the JavaScript code given below. This will serve as the URL of your bookmarklet, enabling the extraction of all links from a selected webpage.
javascript:(function() {var results = [['Url', 'Anchor Text', 'External']]; var links = document.getElementsByTagName('a'); for (var i = 0; i < links.length; i++) { var link = links[i]; var externalLink = link.host !== window.location.host; if(link.href && link.href.indexOf('://')!==-1) results.push([link.href, link.text, externalLink]); } var csvContent = results.map(function(line){ return line.map(function(cell){ if(typeof(cell)==='boolean') return cell ? 'TRUE' : 'FALSE'; if(!cell) return ''; var value = cell.replace(/[\f\n\v]*\n\s*/g, "\n").replace(/[\t\f ]+/g, ' ').trim().replace(/"/g, '""'); return '"' + value + '"'; }).join(','); }).join("\n"); var blob = new Blob([csvContent], {type: 'text/csv;charset=utf-8;'}); var downloadLink = document.createElement('a'); var blobUrl = URL.createObjectURL(blob); downloadLink.href = blobUrl; downloadLink.download = 'data.csv'; document.body.appendChild(downloadLink); downloadLink.click(); document.body.removeChild(downloadLink); })()
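
The bookmarklet has to fit on one line, which makes it hard to read. As a rough sketch, its link-collection step can be unpacked into a small function like the one below. The name collectLinkRows and the sample data are illustrative assumptions, not part of the bookmarklet itself. In a browser you would pass document.getElementsByTagName('a') and window.location.host.

```javascript
// Hypothetical helper: `anchors` is any array-like of objects with
// href, host and text properties (real <a> elements have all three).
function collectLinkRows(anchors, pageHost) {
  var rows = [['Url', 'Anchor Text', 'External']];
  for (var i = 0; i < anchors.length; i++) {
    var a = anchors[i];
    // Skip anchors with no href, and schemes without "://"
    // such as mailto:, tel: or javascript:
    if (!a.href || a.href.indexOf('://') === -1) continue;
    // A link counts as external when its host differs from the page's host
    rows.push([a.href, a.text, a.host !== pageHost]);
  }
  return rows;
}

// Sample data standing in for the anchors on a page (made up for illustration)
var sampleAnchors = [
  { href: 'https://example.com/about', host: 'example.com', text: 'About' },
  { href: 'mailto:hello@example.com', host: '', text: 'Email us' },
  { href: 'https://other.org/', host: 'other.org', text: 'Partner' }
];
var sampleRows = collectLinkRows(sampleAnchors, 'example.com');
console.log(sampleRows);
```

Note that the host comparison is exact, so a link to a subdomain such as blog.example.com would be flagged as external when the page is on example.com.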

Next, you need to create a new bookmark in your web browser. If you’re using Chrome, simply click on the three dots in the upper right corner, navigate to Bookmarks and select Bookmark Manager. Within the Bookmark Manager interface, click on the three dots again and choose “Add New Bookmark”.

– At this point, assign a descriptive name to your bookmark, such as “Extract Links”.

– Now, paste the JavaScript code you copied earlier into the URL field.

– To finish up, save the bookmark.
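
If you ever edit the script and want to turn your own version back into something you can paste into the URL field, a small helper can do the packaging. The sketch below is one possible approach, and the function name toBookmarkletUrl is made up for illustration. It wraps the script in an immediately invoked function so its variables stay off the page, then percent-encodes it, which browsers decode again before running a javascript: URL.

```javascript
// Hypothetical helper: turns a snippet of JavaScript into a bookmarklet URL.
function toBookmarkletUrl(source) {
  // Wrap in an IIFE so variables do not leak into the page's global scope
  var wrapped = '(function(){' + source + '})()';
  // Percent-encode so spaces, quotes and line breaks survive being pasted
  return 'javascript:' + encodeURIComponent(wrapped);
}

// Example with a trivial, made-up script
var exampleUrl = toBookmarkletUrl('var n = document.links.length;\nalert(n + " links");');
console.log(exampleUrl);
```

Pasting the unencoded one-line script, as described above, works too. Encoding mainly helps when the script spans several lines.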

Step 2: Using the ‘Extract Links’ Bookmarklet

First, make sure your Bookmarks Bar is visible in Chrome. If it’s not, press Ctrl + Shift + B (Windows) or Cmd + Shift + B (Mac) to toggle it on. Next, drag the new bookmark you created in the previous step to your Bookmarks Bar for quick access. This allows you to run the tool with just one click whenever you’re browsing a page you want to analyse.

Utilising your newly crafted bookmarklet is simple. Navigate to the webpage you’re interested in analysing, then click on the “Extract Links” bookmarklet you created in the previous step.

Once activated, the bookmarklet will execute the JavaScript code and automatically download a CSV file named ‘data.csv’. This file has three columns: the URL of each link on the page, its anchor text, and an External column set to TRUE for links that point to a different host and FALSE for internal links. Links whose URL doesn’t contain “://”, such as mailto: links, are skipped.
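
To make the file format concrete, here is a minimal sketch of how rows like these can be turned into CSV text. The function names toCsvCell and toCsv and the sample row are assumptions for illustration. The sketch simplifies the clean-up by collapsing all whitespace, including line breaks, into single spaces, and it doubles any embedded double quotes, which is the usual CSV convention for keeping anchor text with quotes inside one cell.

```javascript
// Hypothetical helpers illustrating the CSV layout of data.csv
function toCsvCell(cell) {
  // Booleans become the TRUE/FALSE flags of the External column
  if (typeof cell === 'boolean') return cell ? 'TRUE' : 'FALSE';
  if (!cell) return '';
  var value = String(cell)
    .replace(/\s+/g, ' ')   // collapse runs of whitespace (simplification)
    .trim()
    .replace(/"/g, '""');   // double embedded quotes, per CSV convention
  return '"' + value + '"';
}

function toCsv(rows) {
  return rows.map(function (row) {
    return row.map(toCsvCell).join(',');
  }).join('\n');
}

// Made-up sample: a header row and one internal link
var csvText = toCsv([
  ['Url', 'Anchor Text', 'External'],
  ['https://example.com/about', 'About  "us"', false]
]);
console.log(csvText);
// "Url","Anchor Text","External"
// "https://example.com/about","About ""us""",FALSE
```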

While you can use Python scripts or tools like BuzzStream Link Extractor and HackerTarget Link Finder to extract links, the bookmarklet offers a faster, no-code alternative for quickly extracting all links from a page.
