Manick Bhan
ON Nov 15, 2022
8 minute read

What is a robots.txt file and Common Issues With Implementation

Although a more technical part of SEO optimization, robots.txt files are a great way to improve the crawling and indexing of your website.

This article will break down all of the details related to robots.txt and highlight common issues related to its implementation.

If you run an SEO audit in the SearchAtlas site auditor, you may see issues flagged that relate to your robots.txt file. You can use this article to troubleshoot and resolve those issues.

What are robots.txt Files?

The robots.txt file tells web crawlers which areas of your website they are allowed to access and which areas they are not allowed to access. It contains a list of user-agent strings (the name of the bot), the robots directive, and the paths (URLs) to which the robot is denied access.

[Image: an example robots.txt file with allowed and disallowed WordPress and LinkGraph URLs and a sitemap link at the end.]

When you create a website, you may want to restrict search engine crawlers' access to certain areas or pages of your site and prevent specific pages from being indexed. Your robots.txt file is where web crawlers learn where they do and do not have access.

robots.txt is a good way to keep crawlers out of areas of your site you would rather not surface in search, or to keep search engines away from content that is not rank-worthy or ready for public consumption.

Also, if you don’t want your website to be accessed by other common web crawlers like Applebot, AhrefsBot, or others, you can prevent them from crawling your pages via your robots.txt file.
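As a rough illustration of blocking one crawler while leaving others alone, the sketch below feeds a small robots.txt to Python's standard-library `urllib.robotparser`. The file contents, the `www.example.com` URLs, and the `/drafts/` path are assumptions made up for this example, and real crawlers may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block AhrefsBot everywhere, and keep all other
# crawlers out of a made-up /drafts/ directory only.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# AhrefsBot matches its own group, so every path is off limits to it.
ahrefs_allowed = parser.can_fetch("AhrefsBot", "https://www.example.com/blog/")

# Googlebot has no dedicated group here, so it falls back to the "*" group.
google_blog_allowed = parser.can_fetch("Googlebot", "https://www.example.com/blog/")
google_drafts_allowed = parser.can_fetch("Googlebot", "https://www.example.com/drafts/post")

print(ahrefs_allowed, google_blog_allowed, google_drafts_allowed)
```

Each `User-agent` line opens a group of rules, and a crawler follows the group that names it before falling back to the `*` group.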

Where is my robots.txt File Located?

The robots.txt file is a plain text file placed in the root directory of a website. If you don’t yet have a robots.txt file, you will need to create one and upload it to your site.

If the web crawler cannot find the robots.txt file at the root directory of your website, it will assume there is no file and proceed with crawling all of your web pages that are accessible via links.
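Because the file always sits at the root, its address can be derived from any page URL on the same host. The helper below is a minimal sketch of that rule; the function name `robots_txt_url` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.example.com/blog/post?id=3"))
```

Note that the scheme and host are preserved as given, so each protocol and (sub)domain variant yields its own robots.txt address.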

How you upload the file will depend on your website and server architecture. You may need to get in contact with your hosting provider to do so.

If you want our team at LinkGraph to configure or upload robots.txt for your website, order our Technical SEO Fundamentals package in the order builder.

Why Should I Care About robots.txt?

The robots.txt file is considered a fundamental part of technical SEO best practice.

Why? Because search engines discover and understand our websites entirely through their crawlers. The robots.txt file is the best way to communicate with those crawlers directly.

Some of the primary benefits of robots.txt are the following:

Improves crawling efficiency
Prevents less valuable pages from getting indexed (e.g., thank-you pages, confirmation pages)
Helps prevent duplicate content issues and any resulting penalties
Keeps content that is not of high value away from searchers

How Does robots.txt Work?

When a search engine robot encounters a robots.txt file, it will read the file and obey the instructions.

For example, if Googlebot comes across the following in a robots.txt:

User-agent: googlebot
Disallow: /confirmation-page/

It won’t be able to access that page to crawl and index it. It also won’t be able to access any other pages in that subdirectory, including:

```
/confirmation-page/meeting/
/confirmation-page/order/
/confirmation-page/demo/
```

If a URL is not specified in the robots.txt file, then the robot is free to crawl the page as it normally would.
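The prefix behavior described above can be checked with Python's standard-library `urllib.robotparser`. This is only a sketch: the `/pricing/` path and the Bingbot user agent are assumptions added for contrast, and the standard-library parser is a stand-in rather than Googlebot's actual implementation.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the example above.
rules = [
    "User-agent: googlebot",
    "Disallow: /confirmation-page/",
]

parser = RobotFileParser()
parser.parse(rules)

# Everything under the disallowed prefix is blocked for Googlebot.
order_allowed = parser.can_fetch("Googlebot", "/confirmation-page/order/")

# A path that no rule mentions stays crawlable.
pricing_allowed = parser.can_fetch("Googlebot", "/pricing/")

# The rule names googlebot only, so other crawlers are unaffected.
bing_order_allowed = parser.can_fetch("Bingbot", "/confirmation-page/order/")

print(order_allowed, pricing_allowed, bing_order_allowed)
```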

Best Practices for robots.txt

Here are the most important things to keep in mind when implementing robots.txt:

robots.txt is a text file, so it must be encoded in UTF-8 format
robots.txt is case sensitive, and the file must be named “robots.txt”
The robots.txt file must be placed at the root directory of your website
It’s best practice to only have one robots.txt available on your (sub)domain
You can only have one group of directives per user agent
Be as specific as possible to avoid accidentally blocking access to entire areas of your website, for example, by blocking an entire subdirectory when you only meant to block a specific page within it
Don’t use the noindex directive in your robots.txt; Google does not support it there
robots.txt is publicly available, so make sure your file doesn’t reveal to curious or malicious users the parts of your website that are confidential
robots.txt is not a substitute for properly configuring robots tags on each individual web page
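One of the practices above, a single group of directives per user agent, is easy to check mechanically. The function below is a simplified sketch, not how any particular auditing tool works: the name `duplicate_user_agent_groups` is hypothetical, and it simply counts how many `User-agent` lines name each agent.

```python
from collections import Counter


def duplicate_user_agent_groups(robots_txt: str) -> list[str]:
    """Return user agents that are named on more than one User-agent line."""
    groups_per_agent = Counter()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() == "user-agent":
            groups_per_agent[value.lower()] += 1
    return sorted(agent for agent, count in groups_per_agent.items() if count > 1)


# Hypothetical file in which the "*" group appears twice.
sample = """\
User-agent: *
Disallow: /a/

User-agent: Googlebot
Disallow: /b/

User-agent: *
Disallow: /c/
"""

print(duplicate_user_agent_groups(sample))
```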

When it comes to website crawlers, there are some common issues that arise when a site’s robots.txt file is not configured properly.

Here are some of the most common ones that occur and will be flagged by your SearchAtlas site audit report if they are present on your website.

1. robots.txt not present

This issue will be flagged if you do not have a robots.txt file or if it is not located in the correct place.

To resolve this issue, you will simply need to create a robots.txt and then add it to the root directory of your website.

2. robots.txt is present on a non-canonical domain variant

To follow robots.txt best practice, you should only have one robots.txt file for the (sub)domain where the file is hosted.

If you have a robots.txt located on a (sub)domain that is not the canonical variant, it will be flagged in the site auditor.

[Screenshot: site auditor notification about robots.txt on a non-canonical domain variant, with a link to the robots.txt file at thewholesomedish.com.]

Non-canonical domain variants are duplicate versions of your site that are reachable at a different protocol or hostname than the master version. If your canonical tags are properly formatted, only the master version will be considered the canonical domain, and that is where your robots.txt file should be located.

For example, let’s say your canonical variant is

```
https://www.website.com/
```

Your robots file should be located at:

```
https://www.website.com/robots.txt
```

In contrast, it should not be located at:

```
https://website.com/robots.txt
http://website.com/robots.txt
```

To resolve this issue, move your robots.txt to the canonical variant, or 301 redirect the non-canonical variants of robots.txt to the canonical version.
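To see which addresses need attention, it can help to list the protocol and "www" variants of your canonical origin. The sketch below does only that: the function name `non_canonical_robots_urls` is hypothetical, it considers just the http/https and www/non-www combinations, and it makes no network requests. Confirming that each variant actually returns a 301 would still require an HTTP client or your browser.

```python
from urllib.parse import urlsplit


def non_canonical_robots_urls(canonical_origin: str) -> list[str]:
    """List robots.txt URLs on the http/https and www/non-www variants
    of canonical_origin, excluding the canonical one itself."""
    parts = urlsplit(canonical_origin)
    host = parts.netloc
    bare = host[4:] if host.startswith("www.") else host
    canonical = f"{parts.scheme}://{host}/robots.txt"
    candidates = [
        f"{scheme}://{variant}/robots.txt"
        for scheme in ("https", "http")
        for variant in (f"www.{bare}", bare)
    ]
    return [url for url in candidates if url != canonical]


for url in non_canonical_robots_urls("https://www.website.com/"):
    print(url)
```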

3. Invalid directives or syntax included in robots.txt

Including invalid robots directives or syntax can cause crawlers to still access the pages you don’t want them to access.

If the site auditor identifies invalid directives in your robots.txt, it will show you a list of the specific directives that contain the errors.

[Screenshot: site auditor warning of invalid directives in the robots.txt file, listing two disallowed URLs: scan.linkgraph.io and legacy.linkgraph.io.]

Resolving this issue involves editing your robots.txt to include the proper directives and the proper formatting.
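A quick way to spot such problems before uploading is to scan the file line by line. The checker below is a minimal sketch under stated assumptions: the set `KNOWN_FIELDS` is a commonly used list of directives chosen for this example, not the rule set the site auditor applies, and the function name `find_invalid_lines` is hypothetical.

```python
# Assumed list of directive names; adjust it to the crawlers you care about.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay"}


def find_invalid_lines(robots_txt: str) -> list[tuple[int, str, str]]:
    """Return (line number, line text, problem) for each suspicious line."""
    problems = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        if ":" not in line:
            problems.append((number, raw.strip(), "missing ':' separator"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, raw.strip(), f"unknown directive '{field}'"))
    return problems


# Hypothetical file with a typo, an unsupported directive, and a missing colon.
sample = """\
User-agent: *
Disalow: /private/
Noindex: /old/
Disallow /tmp/
Allow: /
"""

for number, text, problem in find_invalid_lines(sample):
    print(f"line {number}: {text!r} ({problem})")
```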

4. robots.txt should reference an accessible sitemap

It is considered best practice to reference your XML sitemap at the bottom of your robots.txt file. This helps search engine bots easily locate your sitemap.

If your XML sitemap is not referenced in your robots file, it will be flagged with the following message in the Site Auditor.

[Screenshot: site auditor notification that the sitemap is missing from the robots.txt file for thewholesomedish.com, advising to reference an XML sitemap.]

To resolve the issue, add a reference to your sitemap at the bottom of your robots.txt file.

[Screenshot: a robots.txt file with user-agent rules and a highlighted section showing multiple sitemap URLs at the bottom.]
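To confirm that a file references a sitemap, you can read its `Sitemap` lines with Python's standard-library `urllib.robotparser` (the `site_maps()` method needs Python 3.8 or later). This is a sketch only: the helper name `sitemap_urls` and the example.com URLs are assumptions, and it does not check that the sitemap itself is reachable.

```python
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the sitemap URLs referenced in a robots.txt body, if any."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


# Hypothetical file with the sitemap reference at the bottom.
with_sitemap = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://www.example.com/sitemap.xml
"""

print(sitemap_urls(with_sitemap))
print(sitemap_urls("User-agent: *\nDisallow: /wp-admin/\n"))
```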

5. robots.txt should not include a crawl-delay directive

The crawl-delay directive instructs some search engines to slow down their crawling, which causes new content and content updates to be picked up later.

This is undesired, as you want search engines to pick up on changes to your website as quickly as possible.

For this reason, the SearchAtlas site auditor will flag a robots.txt file that includes a crawl-delay directive.

[Screenshot: site auditor warning "crawl-delay directive present," advising against using crawl-delay in your robots.txt file. Status shows "YES" and health is +10.]
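If you are not sure whether your file sets a crawl delay, Python's standard-library `urllib.robotparser` can report it for a given user agent. The snippet is a minimal sketch: the helper name `crawl_delay_for`, the sample file, and the 10-second value are made up for illustration.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser


def crawl_delay_for(robots_txt: str, user_agent: str = "*") -> Optional[int]:
    """Return the crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# Hypothetical file that asks every crawler to wait 10 seconds between requests.
with_delay = """\
User-agent: *
Crawl-delay: 10
Disallow: /tmp/
"""

print(crawl_delay_for(with_delay))
print(crawl_delay_for("User-agent: *\nDisallow: /tmp/\n"))
```

A return value other than `None` means the directive is present for that user agent and is worth removing.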

Conclusion

A properly configured robots.txt file can be very impactful for your SEO. However, the opposite is also true. Configure robots.txt incorrectly, and you can create huge problems for your SEO performance.

So if you’re unsure, it may be best to work with professionals to properly configure your robots.txt. Connect with one of our SEO professionals to learn more about our technical SEO services.

				
					console.log( &#039;Code is Poetry&#039; );

Manick Bhan
Founder CEO/CTO

Manick Bhan is the Founder CEO/CTO of Search Atlas and LinkGraph — a serial entrepreneur with 10+ years of experience in technical SEO, machine learning, and growth automation. After starting his career on Wall Street, Manick applied his data-driven mindset to digital marketing, building platforms like Search Atlas and OTTO that solve real SEO challenges at scale. His technologies have helped both startups and Fortune 500 brands grow their organic visibility, proving that bold innovation and deep technical expertise can reshape how teams achieve digital success.
