Technical SEO should be the foundation of any SEO strategy to improve organic search visibility. I started SEO auditing around 2010 and have run more than 100 SEO audits over the past 10 years, both in-house and as one-off projects.
In this post, I am sharing my Technical SEO checklist that you can use to perform a technical audit. I’ll be covering technical SEO concepts such as crawl optimisation, page speed, mobile SEO, JavaScript, international SEO, internal linking, on-page SEO recommendations and more. The recommendations must actually be implemented for you to get value from an SEO audit, so focus on what will make the biggest impact on the site.
- Understanding HTTP Status Codes
- Site Architecture & Internal Links
- URL Structure
- HTTPS Website Security
- Optimise Robots.txt File
- Page Speed Checks & Improvements
- Check your XML Sitemap for Issues
- Check Mobile-First Indexing Best Practices
- Check issues with Canonical Tags
- Check Issues with Pagination Implementation
- Validate Schema Markup
- JavaScript
- A few other things to watch out for
- Monitor Google Search Console
- My Favourite Free & Paid Technical SEO Tools
Understanding HTTP Status Codes
As part of a technical SEO site audit, the first thing you need to pay attention to is the HTTP status codes of your site’s pages and resources. A status code is issued by a server in response to a browser’s request. There are over 60 different status codes, each with its own meaning. The most common ones you will come across during a technical SEO audit are 2xx success, 3xx redirection, and the problematic 4xx client error or 5xx server error codes, as shown in the table below.
| Status Code | What they mean |
|---|---|
| 200 | OK (Success). |
| 301 | Permanent redirect: requested resource moved permanently to another location. |
| 302 | Temporary redirect: requested resource moved temporarily to another location. |
| 400 | Bad Request. |
| 403 | Forbidden: requested resource is forbidden for some reason. |
| 404 | Not Found: requested resource is not found at the location. |
| 410 | Gone: requested resource is permanently gone from the location. |
| 500 | Internal/Generic Server Error. |
| 503 | Service Unavailable: the server is overloaded or undergoing maintenance. |
To identify the different status codes of your website, you can use a number of different methods such as;
- Ayima Redirect Path browser extension to do spot checks of certain pages.
- Or use site crawlers such as Screaming Frog, Botify or Deepcrawl to run a crawl of your entire site.
The problematic status codes to look for when auditing a website for technical SEO issues are 301s, 302s, 404s, 410s, 5xx errors and long redirect chains. 3xx and 4xx errors can usually be resolved by updating internal links to point at the correct final location instead of relying on redirects. Speak to the IT team who maintain the servers if you encounter a large number of server errors.
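If you prefer to script these spot checks, here is a minimal Python sketch (assuming the `requests` library is installed; the URLs are placeholders) that prints the raw status code and any redirect target for a list of URLs;

```python
# Minimal sketch: spot-check HTTP status codes for a list of URLs.
# Assumes the `requests` library is installed; the URLs below are placeholders.
import requests

urls = [
    "https://www.example.com/",
    "https://www.example.com/old-page",
]

for url in urls:
    # allow_redirects=False surfaces 301/302 responses instead of following them
    response = requests.get(url, allow_redirects=False, timeout=10)
    location = response.headers.get("Location", "")
    print(f"{url} -> {response.status_code} {location}")
```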
Site Architecture & Internal Links
Having a clear site structure with optimal internal linking is key for search engines to fully understand and effectively index your site. Optimal site architecture and a good internal linking strategy benefit both users and search engine bots. For users, good information architecture makes the site easier to navigate, helps them discover additional pages and keeps them on the site. It also helps search engine bots crawl the site and understand its structural hierarchy. Internal links spread link equity and PageRank around the website.
Optimise your internal links to reduce the crawl depth of key pages to spread the link equity in the most effective way possible. Follow the below best practices to optimise the flow of your link equity through the site;
- Interlink only pages that are relevant to each other and link together naturally.
- Reduce the amount of duplicate content to ensure link equity isn’t wasted on these duplicate pages.
- Remove low-quality pages from your site to avoid wasting the flow of link equity to these pages.
- Use the rel="nofollow" attribute at the link level to signal to crawlers that the link should be ignored. This stops you from passing link equity to the nofollowed pages.
- Add pages or folder structure to the robots.txt to block them from getting crawled and passing link equity.
- Avoid having too many links per page to prevent diluting your link equity.
- Fix your internal link redirects, especially redirect chains that cause more hops for a bot to reach the final page. This can dilute the spread of link equity to the final page.
- Find pages not in the site structure by combining analysis from your site crawl report, log files, search console and analytics data.
Internal links can appear in the navigation menu, body content, footer, sidebar, the related links section of blog articles or the related products section of product description pages. The navigation menu and body content usually link to the most popular pages of the site. It’s no surprise that the homepage of a site gets the most external links, so it’s important to optimise the internal links on the homepage so they pass link equity down to the deeper pages of the site architecture.
There are a number of ways to review the internal links on a website;
- Manually by clicking through the links on your site to check if everything is ok from a user perspective.
- Use tools like ScreamingFrog, Botify or Deepcrawl to pull the internal links for a large number of your site pages. Ensure all your internal links point to 200 status code pages and don’t go through redirects.
- Use tools like Sitebulb or ScreamingFrog itself to visualise your site’s internal linking structure and crawl depth.
- Review your site’s internal linking structure for orphan pages and include them in your main internal linking architecture if the page is useful. An easy way to spot these is by finding pages that are in your sitemap but not linked to from within the site.
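To automate that orphan-page check, a rough Python sketch could compare the sitemap URLs against the URLs found by your crawler. This assumes the `requests` library, a standard sitemap.xml at a placeholder URL and a one-column CSV export of crawled URLs named crawl_urls.csv;

```python
# Rough sketch: find URLs present in the XML sitemap but missing from a site crawl.
# Assumes `requests` is installed, the sitemap URL is a placeholder and
# crawl_urls.csv is a one-column export of crawled URLs from your crawler.
import csv
import requests
import xml.etree.ElementTree as ET

sitemap_url = "https://www.example.com/sitemap.xml"
namespace = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
sitemap_urls = {loc.text.strip() for loc in root.findall(".//sm:loc", namespace)}

with open("crawl_urls.csv", newline="") as f:
    crawled_urls = {row[0].strip() for row in csv.reader(f) if row}

orphans = sitemap_urls - crawled_urls
print(f"{len(orphans)} potential orphan URLs")
for url in sorted(orphans):
    print(url)
```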
Some examples of sites getting internal linking right;
- The “Find similar items here” section on Sports Direct
- The “Vacation Destinations Near” section on Hometogo
- The “Explore the related categories & searches” section of Etsy
Review unique internal links to your most important pages from time to time. Maximise internal linking to pages with the highest search demand and contribution to revenue. A good internal linking structure to your key pages helps bots decide what pages are most important.
URL Structure
URLs act as a minor ranking factor, so pay attention to your URL structure. Make URLs human-readable so that both search engines and users can easily understand the destination page and the structure of your website. They shouldn’t be cluttered or overly long, and keyword-rich URLs are better for SEO.
Let’s take an example of a faceted URL from the House of Fraser website – https://www.houseoffraser.co.uk/women/dresses/maxi-dresses. The URL makes it clear to both users and search engines what the page is about and what they are likely to find on it. Lay out your URLs in an ordered, hierarchical format to make this as clear as possible to search engines. You can also see the journey the user has followed to reach that maxi dresses page.
When it comes to the importance of URL structure versus click depth, John Mueller has said that click depth determines page importance more than URL structure. Ensure that your key pages are linked as closely to the homepage as possible.
HTTPS Website Security
HTTPS is the secure protocol for transferring data between a browser and a webpage. Google considers a website’s security a ranking factor because it keeps users safe. Your site must run on HTTPS, the secure version, as it improves the trustworthiness of your website. When it comes to HTTPS, you must ensure the following;
- Internal links must point to the HTTPS version.
- Images load via HTTPS.
- Ad networks must load via HTTPS.
- Automatic redirects must be in place from HTTP to HTTPS.
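A quick way to verify that the HTTP to HTTPS redirect is in place is the minimal sketch below, assuming the `requests` library and a placeholder domain;

```python
# Minimal sketch: confirm the HTTP version of the site 301-redirects to HTTPS.
# Assumes `requests` is installed; example.com is a placeholder domain.
import requests

response = requests.get("http://www.example.com/", allow_redirects=False, timeout=10)
print(response.status_code)                  # expect 301
print(response.headers.get("Location", ""))  # expect the https:// version of the URL
```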
Optimise Robots.txt File
A robots.txt file is located in the root folder of your site. Crawlers such as Googlebot, Bingbot and YandexBot request the robots.txt file to understand which parts of the site can or can’t be crawled before accessing other areas of your site. Crawlers may still choose to ignore these instructions.
You can block certain sections of a website from crawlers using robots.txt to avoid wasting crawl budget on sections you don’t want crawled or indexed. The two common directives used in a robots.txt file to instruct user agents are ‘Allow’ and ‘Disallow’. Make sure this file does not exclude any important sections of the site from being crawled. The robots.txt file can also include the XML sitemap URL via the Sitemap directive to aid URL discovery.
Use the robots.txt Tester in Search Console to test your robots.txt markup. Please note that Google does not support noindex directives within robots.txt files.
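If you want to script a quick check of what a given user agent may crawl, Python’s standard library ships a robots.txt parser. A minimal sketch, with placeholder URLs and paths;

```python
# Minimal sketch: test which paths a user agent is allowed to crawl,
# using Python's built-in robots.txt parser. URLs and paths are placeholders.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://www.example.com/robots.txt")
parser.read()

for path in ["https://www.example.com/", "https://www.example.com/checkout/"]:
    allowed = parser.can_fetch("Googlebot", path)
    print(path, "->", "allowed" if allowed else "disallowed")
```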
Page Speed Checks & Improvements
Fast pages convert and rank better, and the benefits go beyond SEO, especially in a mobile-first world. Pages with longer loading times tend to have higher bounce rates, which can have a significant impact on your website’s performance and rankings.
Various online tools can analyse your site speed and provide in-depth recommendations. Common improvements and quick fixes include;
- Choose a good web hosting company.
- Improve server response time.
- Minify HTML, JavaScript and CSS.
- Optimise your images. Having lots of images, or large images, on a page not only affects overall performance for the user but can also affect the page’s ability to rank on Google. Use a lossless optimiser such as ImageOptim or FileOptimizer to compress images so they download faster without losing quality.
- Eliminate render-blocking resources.
- Fix internal redirects.
- Enable server and browser caching, and use a CDN.
- and more…
The most common tools SEOs use to test how fast web pages perform are Google PageSpeed Insights/Lighthouse, GTMetrix, WebPageTest and the Speed report within Google Search Console.
On Google PageSpeed Insights, for example, when you run a test on a webpage, everything flagged green or orange is arguably OK, while the problems you should look to fix are usually highlighted in red. The Speed report within Search Console gives a high-level view of the number of slow, moderate and fast URLs on your site, and you can dig deeper into the exact details of each issue for both desktop and mobile.
These tools present detailed opportunities along with the potential load-time savings to be made by fixing each issue. My recommendation is to use a combination of them to get the fullest possible picture, as each tool can highlight a few unique opportunities.
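You can also pull these scores programmatically rather than via the web UIs. A hedged sketch using the PageSpeed Insights v5 API endpoint is below; it assumes the `requests` library, a placeholder URL, and that light ad-hoc usage works without an API key (add one for anything heavier), and the response structure may differ slightly from what is shown;

```python
# Rough sketch: query the PageSpeed Insights v5 API for a page's mobile performance score.
# Assumes `requests` is installed; the URL is a placeholder; append an API key parameter
# for anything beyond light ad-hoc usage.
import requests

endpoint = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {"url": "https://www.example.com/", "strategy": "mobile"}

data = requests.get(endpoint, params=params, timeout=60).json()
score = data["lighthouseResult"]["categories"]["performance"]["score"]
print(f"Mobile performance score: {score * 100:.0f}")
```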
Check your XML Sitemap for Issues
An XML sitemap is a way of telling search engines about the site URLs (pages, videos, images etc) you wish to have indexed in search results. An HTML sitemap, on the other hand, helps users navigate the site. The XML sitemap must contain the URL and should include the last modified date; it can also contain other optional fields, such as alternate language versions for an international site. A clean sitemap quickly shows search engine crawlers the key pages you want them to discover sooner, which is especially beneficial for large sites.
Ensure your site has a valid XML sitemap or sitemap index containing only indexable, 200 status code, self-canonicalised site URLs and submit it to the search console.
There are a number of tools that can be used to create XML sitemaps. Most CMSs come with out-of-the-box dynamic XML sitemap generators, so the sitemap stays up to date as pages are added or removed. In good news announced in July 2020, WordPress 5.5 gained a built-in XML sitemap feature that will be carried forward in all future updates.
You could use plugins, free online tools or site crawlers to find XML sitemap issues such as inclusions of non-200 status code URLs / non-self canonicalised URLs / non-indexable URLs within your XML sitemap. As a part of auditing the sitemap for standard issues as explained here, you may even come across orphaned URLs present in the XML sitemap that you could then link within your site architecture.
To check if a site has an XML sitemap or sitemap index, check the site’s robots.txt and look for the sitemap declaration.
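A rough sketch combining both checks is below: read the sitemap location from robots.txt, then flag any listed URLs that don’t return a 200 directly. It assumes the `requests` library and a placeholder domain; a sitemap index would need one extra level of parsing, and some servers may not respond to HEAD requests;

```python
# Rough sketch: find the sitemap declared in robots.txt, then flag any listed URLs
# that do not return a 200 without redirecting. Assumes `requests`; example.com is
# a placeholder; a sitemap index would need one extra level of parsing.
import requests
import xml.etree.ElementTree as ET

robots = requests.get("https://www.example.com/robots.txt", timeout=10).text
sitemap_url = next(
    (line.split(":", 1)[1].strip() for line in robots.splitlines()
     if line.lower().startswith("sitemap:")),
    None,
)
print("Sitemap declared:", sitemap_url)

if sitemap_url:
    namespace = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
    for loc in root.findall(".//sm:loc", namespace):
        response = requests.head(loc.text.strip(), allow_redirects=False, timeout=10)
        if response.status_code != 200:
            print(loc.text.strip(), "->", response.status_code)
```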
Check Mobile-First Indexing Best Practices
It is a no-brainer that users are now more active on mobile than on desktop, so it’s important to adopt effective mobile-first SEO strategies and be mobile-friendly. Mobile-first indexing, which Google announced in March 2018, essentially means Google uses the mobile version of a site for indexing and ranking, to better help primarily-mobile users find exactly what they are looking for in the SERPs. In other words, Google will primarily crawl the mobile version of your website using its smartphone user agent, then index and rank you based on that.
Since the 1st of July 2019, all new sites are on the mobile-first index by default, although that alone does not mean a site follows mobile best practices. Sites that existed before this date are evaluated and moved to the mobile-first index once they follow mobile best practices, and site owners are notified through Google Search Console when their site is moved across. Follow the Google Developers documentation link if you want to read the in-depth details on mobile-first indexing best practices.
What are the mobile best practices one should follow?
- Ensure you have the same content on your mobile and desktop site. If you have a good responsive design (Google recommends this), you should be OK. Review the following content on both the desktop and mobile versions of the site to ensure parity;
- Menu Links
- Main body content.
- Links on Footer and Sidebar.
- Schema markup
- Responsiveness: Your website must be responsive so that it can be easily viewed on different devices and browsers. If your site is not responsive and serves different page URLs for the desktop and mobile versions of a page, you need to use a rel="alternate" tag to specify the alternate versions. This is what it looks like when you define a mobile version of a page on the desktop version: <link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.example.com/page.html"/>. On the mobile version of the page (https://m.example.com/page.html), you need to add a canonical tag that points back to the desktop (master) page to establish and maintain the relationship between those two pages.
- Ensure your site is mobile-friendly without any usability issues on mobile devices.
- Use the viewport meta tag. This is what it looks like when it appears within the HTML: <meta name="viewport" content="width=device-width, initial-scale=1">. Using the viewport tag effectively is key to ensuring that users have a good experience when visiting the mobile version of websites.
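One way to sanity-check content parity is to fetch the same URL with a desktop and a smartphone user agent and compare what comes back. A rough sketch is below; it assumes the `requests` library, and the user agent strings and URL are illustrative. It only compares raw HTML, so JavaScript-rendered differences would need a headless browser;

```python
# Rough sketch: compare what a desktop vs smartphone user agent receives for the same URL.
# Assumes `requests`; the user agent strings and URL are illustrative; only raw HTML is compared.
import re
import requests

url = "https://www.example.com/"
user_agents = {
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "mobile": "Mozilla/5.0 (Linux; Android 10; Pixel 3)",
}

for label, ua in user_agents.items():
    html = requests.get(url, headers={"User-Agent": ua}, timeout=10).text
    links = len(re.findall(r"<a\s[^>]*href=", html, flags=re.IGNORECASE))
    words = len(re.findall(r"\w+", re.sub(r"<[^>]+>", " ", html)))
    print(f"{label}: {words} words, {links} links")
```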
Check issues with Canonical Tags
A canonical tag is used to tell search engines the preferred version of a page URL to which content should be attributed. Pages that canonicalise to a different URL are treated as non-indexable duplicates. The canonical can be implemented within the HTTP headers as well as within the HTML head. Because Google treats canonical tags as a hint and not a directive, you can’t completely rely on them. Canonical tags are extremely useful for preventing problems with duplicate pages, effectively telling search engines which version should be indexed in the SERPs.
Canonicals Best Practice
- Implement a self-referential canonical tag on the unique pages you want to index. This tells the search engines that the page itself is the preferred version you want to index.
- On eCommerce sites, to deal with duplication, canonical tags can be used on faceted/filter pages to canonicalise to the category page, stopping filter pages from targeting the same terms as the category page. Check out my SEO guide for eCommerce sites or WooCommerce-based eCommerce sites here.
- Don’t canonicalise a page to a redirect page, a non-indexable page or a non-200 status code page. These provide mixed signals to search engines. Ensure you don’t end up having canonical chains.
- Ensure all your pages within the paginated series contain a self-referential canonical tag. It is a common pitfall on eCommerce sites to see all the paginated pages within the paginated series canonicalise to the 1st page.
- Check if your parametrised page URLs are canonicalised correctly.
To identify the canonical tags for a page, you could;
- Inspect the DOM (right-click and click Inspect). Search for “canonical”.
- Check Page Source. As opposed to the DOM, this is the unrendered HTML.
- You can also crawl a page using a crawler such as Screaming Frog to investigate canonical tags for your site pages. It is easy to review missing self-referential canonical tags, canonical tags pointing to another page or incorrect implementations of a canonical tag through a site crawl.
- You can also use the Search Console to inspect canonical tags either by;
- Inspect a single URL using the Inspect feature.
- Reviewing the excluded URLs in the Coverage report, in particular the “Duplicate without user-selected canonical” section. These reports help identify cases where Google has selected a different canonical URL from the one you intended. Sometimes Google can get it wrong.
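You can also script a quick canonical spot check by pulling the canonical tag out of the raw HTML and flagging pages that don’t self-canonicalise. A minimal sketch, assuming `requests` and `beautifulsoup4` are installed and the URLs are placeholders;

```python
# Minimal sketch: extract the canonical tag from the raw HTML of each page and flag
# non-self-referential canonicals. Assumes `requests` and `beautifulsoup4`; URLs are
# placeholders; canonicals injected by JavaScript will not show up here.
import requests
from bs4 import BeautifulSoup

for url in ["https://www.example.com/", "https://www.example.com/category?colour=red"]:
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    tag = soup.find("link", rel="canonical")
    canonical = tag["href"] if tag else None
    status = "self-referential" if canonical == url else f"points to {canonical}"
    print(f"{url} -> {status}")
```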
Check Issues with Pagination Implementation
Pagination is typically implemented either as traditional numbered pagination or via a ‘Load More’ button, which is my preference as it provides a more intuitive experience for the user. The first thing that comes to mind when we think of pagination is the rel=prev/next markup, which used to be an important consideration for defining a paginated series for Google. However, in early 2019 Google announced that rel=prev/next had not been used for indexing for a while and is no longer crucial. The tags can be left on pages and do no harm if implemented correctly. Incorrect implementation of pagination, however, can cause spider/bot traps.
To identify incorrect rel=prev/next markup implementation, you could;
- Go to the final page of the paginated series and inspect the rel=prev/next meta tag. If the last page contains a value for the next markup, then it is incorrectly implemented.
- You can also use a crawler to check if the Pagination is correctly implemented for a site. On Screaming Frog, for example, you can check the Pagination section once the crawl analysis is complete. Issues such as non-indexable paginated URLs / non-self-referential paginated URLs can be easily identified using Screaming Frog. The latter is a common issue on eCommerce sites where the paginated series of a product lister page usually canonicalise back to page 1.
- A common issue on websites using the ‘Load More’ implementation of pagination is that it is built with JavaScript without an underlying crawlable <a href> link for bots. This can severely limit how far bots can crawl beyond the first page of the paginated series. To check if the ‘Load More’ pagination is implemented correctly, inspect the DOM (right-click on the ‘Load More’ button and click Inspect) to identify whether it contains a real anchor link to the next page of the paginated series.
- Some sites may also implement pagination via infinite scroll. Some infinite scroll implementations may use uncrawlable JavaScript events to load the content after a certain point on the page. For the content to be properly crawlable, pages with infinite scrolls should support paginated loading with unique links to each section. To check whether these unique links exist, you can inspect the DOM of the page and search for the next logical page link. Links to the next & previous pages should be provided for all users, so bots & users that do not have JS enabled can crawl all paginated pages easily.
- Your pagination must be indexable and self-canonicalised. A common mistake website owners make is to canonicalise paginated URLs back to page 1. This is an incorrect usage of the canonical element.
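To spot the canonical-to-page-1 mistake at a glance, a rough sketch could walk the first few pages of a series and report each page’s canonical. This assumes `requests` and `beautifulsoup4` are installed and the paginated URL pattern is a placeholder;

```python
# Rough sketch: walk the first few pages of a paginated series and report each page's
# canonical. Every page should self-canonicalise, not point back to page 1.
# Assumes `requests` and `beautifulsoup4`; the URL pattern is a placeholder.
import requests
from bs4 import BeautifulSoup

base = "https://www.example.com/category?page={}"

for page in range(1, 6):
    url = base.format(page)
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    tag = soup.find("link", rel="canonical")
    canonical = tag["href"] if tag else "missing"
    flag = "OK" if canonical == url else "CHECK"
    print(f"{flag}  {url} -> canonical: {canonical}")
```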
I highly recommend you watch the “The State of Pagination and Infinite Scroll on The Web” video from the BrightonSEO 2019 conference.
Validate Schema Markup
Implementing schema markup / structured data can enhance your site’s appearance in the SERPs. By defining what you would like to see for some elements in the structured data you can standardise the display of your brand in SERPs. Some benefits of implementing schema include displaying rich search results, rich cards (on mobile), knowledge graphs, breadcrumbs, carousels and more in SERPs. Common schema use cases include news articles or blog posts, product schema for your eCommerce product pages, breadcrumbs, recipes, reviews, events etc.
Example with Review & FAQPage schema from TripAdvisor
The popular schema markup formats are JSON-LD (Google’s preference) and Microdata. Microdata uses HTML attributes spread throughout a page’s markup, whereas JSON-LD is a single structured JSON object injected into the page in one piece. Where possible, use JSON-LD as it’s generally easier to implement and maintain. Click here to view an example of Product structured data using JSON-LD, RDFa & Microdata.
To validate if the schema markup is implemented correctly, use the Structured Data Testing Tool.
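As a quick pre-check before running the testing tool, you can extract the JSON-LD blocks from a page and confirm they at least parse. A minimal sketch, assuming `requests` and `beautifulsoup4` and a placeholder URL; note this checks syntax only, not Google’s eligibility rules;

```python
# Minimal sketch: pull the JSON-LD blocks out of a page, confirm they parse and print
# each block's @type. Assumes `requests` and `beautifulsoup4`; the URL is a placeholder.
import json
import requests
from bs4 import BeautifulSoup

html = requests.get("https://www.example.com/product", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

for script in soup.find_all("script", type="application/ld+json"):
    try:
        data = json.loads(script.string or "")
        items = data if isinstance(data, list) else [data]
        print([item.get("@type") for item in items])
    except json.JSONDecodeError as error:
        print("Invalid JSON-LD block:", error)
```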
JavaScript
JavaScript is a lightweight programming language, often used to script events and UX elements. In 2019, Google announced that Googlebot had switched to an evergreen Chromium rendering engine. This has made JavaScript usage by websites less of an issue, since Googlebot is now able to render JavaScript. However, that does not mean relying on Googlebot to render large amounts of JavaScript is efficient: pages can take longer to be indexed because rendering large amounts of content is an intensive process.
How do you identify if JavaScript is causing an issue on your website?
- If your links or important page content rely on JavaScript to load, that is an issue. A quick way to check is to disable JavaScript using a Chrome extension and reload the page to compare how it looks.
- You could also compare the page source with the DOM (the rendered version of the page) to see how much of the content relies on JavaScript to render.
- Run two crawls of your site: one with a JavaScript-rendering crawler and one with text-only rendering (the default in Screaming Frog), then compare the differences.
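A rough sketch of the source-versus-rendered comparison is below; it assumes `requests` and Playwright are installed (with Chromium downloaded via `playwright install chromium`) and the URL is a placeholder;

```python
# Rough sketch: compare the raw HTML source with the JavaScript-rendered DOM to see
# how much content depends on rendering. Assumes `requests` and `playwright` are
# installed (plus `playwright install chromium`); the URL is a placeholder.
import re
import requests
from playwright.sync_api import sync_playwright

url = "https://www.example.com/"

raw_html = requests.get(url, timeout=10).text

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(url, wait_until="networkidle")
    rendered_html = page.content()
    browser.close()

def count_links(html):
    return len(re.findall(r"<a\s[^>]*href=", html, flags=re.IGNORECASE))

print(f"Raw source: {count_links(raw_html)} links, {len(raw_html)} characters")
print(f"Rendered DOM: {count_links(rendered_html)} links, {len(rendered_html)} characters")
```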
How do we handle JavaScript-heavy sites to optimise for Googlebot?
One of the strategies that can be used to ensure JavaScript-heavy sites can be crawled by Googlebot is via Pre-Rendering. In this instance, pages are rendered and cached server-side. The cached version of the page is then served to search engines.
A few other things to watch out for
- Review the faceted navigation of your eCommerce site for common issues. If not handled correctly, faceted navigation can cause duplication, massively eat up your crawl budget and dilute your main pages’ link equity by spreading it across low-value pages.
- Broken links: fixing broken links aids navigation and improves the user experience, which otherwise hurts engagement with the website.
- Perform Google searches using the ‘site’ command for your domain – site:domain.com. Review the SERP listings and look for issues.
- Tracking issues
- Hacked pages.
- Cloaking.
- Blocked resources.
- Hidden on-page content or links.
- Pages wrongly canonicalised.
- Unexpected robots.txt changes
Monitor Google Search Console
Google Search Console is an invaluable free tool for site owners and SEOs. GSC includes 16 months of search traffic data plus key reports such as index coverage, server errors, sitemaps, speed (including the new Core Web Vitals report), links and mobile usability, and much more. These reports can help you monitor, troubleshoot and fix site issues. In November 2020, Google released a new and improved version of the Crawl Stats report within Search Console that you can use to find crawl issues on your site.
Personally, on a day-to-day basis, I use the search console for the following;
- Analyse website search query impressions, clicks and position on Google search.
- Monitor Sitemap issues.
- Review index coverage reports.
- Proactively fix site issues upon receiving alerts over email.
- Use the URL inspection tool to analyse the indexation and crawling issues of your pages.
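Much of this data can also be pulled programmatically via the Search Console API. A hedged sketch is below; it assumes `google-api-python-client` and `google-auth` are installed, a service-account JSON file that has been added as a user on the property, and placeholder property URL and dates;

```python
# Hedged sketch: pull top queries from the Search Console Search Analytics API.
# Assumes `google-api-python-client` and `google-auth` are installed, and that
# service-account.json has been granted access to the property. Values are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

credentials = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/webmasters.readonly"],
)
service = build("searchconsole", "v1", credentials=credentials)

response = service.searchanalytics().query(
    siteUrl="https://www.example.com/",
    body={
        "startDate": "2020-10-01",
        "endDate": "2020-10-31",
        "dimensions": ["query"],
        "rowLimit": 10,
    },
).execute()

for row in response.get("rows", []):
    print(row["keys"][0], row["clicks"], row["impressions"], round(row["position"], 1))
```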
Fix what Google is telling you. Google has put together Search Console training videos on its YouTube channel that teach you how to monitor your site traffic and make informed decisions to optimise your site’s appearance in Google SERPs and ultimately increase your organic traffic. Don’t forget to connect your site to Bing Webmaster Tools too, as they have been revamping a lot of their offerings recently.
My Favourite Free & Paid Technical SEO Tools
- Webmaster Tools: Google Search Console, Bing Webmaster Tools
- Log analysis + cloud-based site crawlers: Deepcrawl, Oncrawl, Botify.
- Desktop website crawlers: Screaming Frog SEO Spider, Sitebulb.
- Page Speed tools: WebPageTest, GTMetrix, Google Page Speed Insights (Also install the Lighthouse extension)
- Image Compression Tools: ImageOptim, FileOptimizer
- Site Audit Tools: SEMRush / Ahrefs.
- Browser Extensions: Ayima Redirect Path, SEO META in 1 CLICK.
- Site monitoring tools: Robotto, Little Warden, VisualPing
No time to deal with Technical SEO? I provide technical SEO audit services. As a Freelance Technical SEO consultant, my job is to crawl and gather lots of data, interpret the data and provide actionable recommendations. Contact me today to discuss how I could help.