How to Extract All Links from a Web Page?

As an SEO, you sometimes need to extract every link from a web page, for example to run a crawl or check the internal links. This usually calls for a browser extension. In this guide, however, I’ll walk you through creating a simple bookmarklet that does the job without any extensions, retrieving every URL on the page along with its anchor text. And the most convenient part? The data downloads automatically as a tidy CSV file, which makes your analysis more straightforward.

How to Extract Links from a Web Page by Creating a Bookmarklet?

Step 1: Establish Your Bookmarklet

    1. Start by copying the JavaScript code given below. This will serve as the URL of your bookmarklet, enabling the extraction of all links from a selected webpage.
      javascript:(function() {var results = [['Url', 'Anchor Text', 'External']]; var anchors = document.getElementsByTagName('a'); for (var i = 0; i < anchors.length; i++) { var anchor = anchors[i]; if (typeof anchor.href !== 'string' || anchor.href.indexOf('://') === -1) continue; results.push([anchor.href, anchor.text, anchor.host !== window.location.host]); } var csvContent = results.map(function(line){ return line.map(function(cell){ if(typeof(cell)==='boolean') return cell ? 'TRUE' : 'FALSE'; if(!cell) return ''; var value = cell.replace(/[\f\n\v]*\n\s*/g, "\n").replace(/[\t\f ]+/g, ' ').trim(); return '"' + value.replace(/"/g, '""') + '"'; }).join(','); }).join("\n"); var blob = new Blob([csvContent], {type: 'text/csv;charset=utf-8;'}); var downloadLink = document.createElement('a'); var blobUrl = URL.createObjectURL(blob); downloadLink.href = blobUrl; downloadLink.download = 'data.csv'; document.body.appendChild(downloadLink); downloadLink.click(); document.body.removeChild(downloadLink); })()
    2. Next, you need to create a new bookmark in your web browser. If you’re using Chrome, simply click on the three dots in the upper right corner, navigate to Bookmarks and select Bookmark Manager. Within the Bookmark Manager interface, click on the three dots again, and choose “Add New Bookmark”.
    3. At this point, assign a descriptive name to your bookmark such as “Extract Links”.
    4. Now, paste the JavaScript code you copied earlier into the URL field.
    5. To finish up, save the bookmark.
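Because the bookmarklet has to fit on a single line, it is hard to read. The sketch below spells out its link-collection step in a more readable form. The function name `collectLinks` and its `pageHost` parameter are my own illustrative choices, not part of the bookmarklet itself, and the function accepts any array-like list of objects with `href`, `text` and `host` properties so that it can be tried outside a browser.

```javascript
// Readable sketch of the link-collection step (names are illustrative).
// `anchors` is any array-like of objects with href, text and host,
// such as document.getElementsByTagName('a') in a browser.
function collectLinks(anchors, pageHost) {
  var rows = [['Url', 'Anchor Text', 'External']];
  for (var i = 0; i < anchors.length; i++) {
    var anchor = anchors[i];
    // Skip anchors without a URL containing '://' (no href, mailto:, tel:).
    if (typeof anchor.href !== 'string' || anchor.href.indexOf('://') === -1) {
      continue;
    }
    // A link counts as external when its host differs from the page's host.
    rows.push([anchor.href, anchor.text, anchor.host !== pageHost]);
  }
  return rows;
}
```

In a browser console, you could call it as `collectLinks(document.getElementsByTagName('a'), window.location.host)` to preview the rows before anything is downloaded. Note that a plain host comparison treats subdomains such as `blog.example.com` as external to `example.com`.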

Step 2: Using the ‘Extract Links’ Bookmarklet

    1. Utilising your newly crafted bookmarklet is simple. Navigate to the webpage you’re interested in analysing, then click on the “Extract Links” bookmarklet you created in the previous step.
    2. Once activated, the bookmarklet will execute the JavaScript code and automatically download a CSV file named ‘data.csv’. This file will contain three columns: the URL of every link on the page, its anchor text, and an “External” flag that is TRUE when the link points to a different host than the page itself and FALSE when it is internal.
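To give a feel for what ends up in ‘data.csv’, here is a small sketch of how rows can be turned into CSV text. The helper names `toCsvCell` and `toCsv` and the sample row are assumptions made for illustration. The whitespace clean-up mirrors the bookmarklet’s, and the sketch also doubles any embedded double quotes, which is the usual CSV convention for quoted fields.

```javascript
// Sketch of the CSV-building step (helper names are illustrative).
function toCsvCell(cell) {
  // Booleans become unquoted TRUE/FALSE flags.
  if (typeof cell === 'boolean') return cell ? 'TRUE' : 'FALSE';
  // Empty or missing values become an empty field.
  if (!cell) return '';
  // Collapse blank lines and runs of spaces/tabs in the anchor text.
  var value = cell
    .replace(/[\f\n\v]*\n\s*/g, '\n')
    .replace(/[\t\f ]+/g, ' ')
    .trim();
  // Wrap in quotes, doubling any quotes inside the value.
  return '"' + value.replace(/"/g, '""') + '"';
}

function toCsv(rows) {
  return rows.map(function (line) {
    return line.map(toCsvCell).join(',');
  }).join('\n');
}

// Hypothetical sample data, not taken from a real page.
var sample = toCsv([
  ['Url', 'Anchor Text', 'External'],
  ['https://example.com/about', 'About "us"', false]
]);
console.log(sample);
// "Url","Anchor Text","External"
// "https://example.com/about","About ""us""",FALSE
```

Spreadsheet tools generally read the doubled quotes back as a single quote character, so anchor text containing quotation marks should survive the round trip.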

Equipped with this bookmarklet, you can now quickly extract all links from any web page.
