How to Keep Your Gated Content Out of Search Engines

By Erin Gleeson | Inbound Marketing Account Manager

A few weeks ago, we covered whether or not to gate your high-value content, including what types of content you should be gating with a form.

Gating a content resource involves setting up a landing page with a form that a contact must complete to access the resource. We’ve discussed the optimal number of form fields in the past (between 3 and 5 fields), along with the form fields engineers are most likely to complete, according to research.

Today, we’re going to talk about an oft-overlooked but critical step in publishing gated content: blocking robots from crawling and indexing the gated resource itself.


What are robots, and why should I block them?

Robots, crawlers, spiders: all of these terms refer to programs used by search engines, such as Google, to automatically discover and scan websites. These programs follow links throughout your site to discover and index your pages. When indexing a page, the search engine downloads the page’s content and stores it so the page can appear in search results.

Generally, you want robots to crawl and index your webpages so you can show up in search engine results. You’ve likely spent a lot of time developing content and optimizing it for search, especially if you’ve been following an inbound marketing model for a while. Why would you want to stop a robot from crawling and indexing content?

Well, if a piece of content is valuable enough for a visitor to give you their information to download it, you want to make sure there isn’t a backdoor into that content. Blocking robots on the PDF and thank-you page ensures the content doesn’t show up, un-gated, in search results as a “free” resource.


How do I block robots on gated content?

There are a couple of steps to follow to block new content from being crawled and indexed by a search engine’s robots. These steps apply across content management systems (CMS), although the technical setup will vary by system.

We’ll use TREW’s website hosted in HubSpot as an example.

1. Disable robots on the thank-you page

The optimal gated content flow is CTA -> landing page -> thank-you page. The thank-you page will deliver the resource and may include a CTA or next step for the contact.

[Image: landing page conversion process]

Since the thank-you page either delivers the gated content or links to it (in the case of a PDF), this page should not appear in search results.

To ensure this page is not indexed, add the following code to the page head:

<meta name="robots" content="noindex">

In HubSpot, this is done by editing the relevant page, toggling over to “Settings,” and clicking the “Advanced Options” dropdown. The code can then be added in the Head HTML box.

[Image: noindex setting in HubSpot]
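For reference, here’s roughly what the head of a thank-you page looks like once the tag is in place (the page title below is just a placeholder, not a real page):

<head>
  <title>Thanks for Downloading | Example Resource</title>
  <!-- Tells all search engine robots not to index this page -->
  <meta name="robots" content="noindex">
</head>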


2. Block the PDF itself in the robots.txt file

If the gated content piece is a webpage, adding the noindex code will be enough to ensure search engines do not include it in search results. If it is a PDF, though, there is no page head in which to place a noindex tag, so you need to add the PDF’s URL to your robots.txt file to prevent it from being crawled by search engine robots.

In HubSpot, click the settings wheel in the top right corner and navigate to Website > Pages. Select the “SEO & Crawlers” tab and scroll to the robots.txt box.

[Image: robots.txt settings in HubSpot]

The format for robots.txt is:

User-agent: *
Disallow: /example-1

Add additional URLs under the first, each on its own Disallow line. Here’s an example:

[Image: robots.txt file in HubSpot]
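In plain text, a robots.txt blocking a handful of gated PDFs might look something like this (the file paths here are hypothetical, not TREW’s actual URLs):

User-agent: *
Disallow: /hubfs/example-white-paper.pdf
Disallow: /hubfs/example-buyers-guide.pdf
Disallow: /hubfs/example-webinar-slides.pdf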

The robots.txt file can be used to block access to specific pages, folders or even entire sites by all or specific web crawlers. Learn more about robots.txt from Moz.
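As a quick illustration (the folder name below is hypothetical), you could keep a single crawler out of an entire folder, or keep every crawler out of the whole site:

# Keep Google's crawler out of a hypothetical gated-content folder
User-agent: Googlebot
Disallow: /gated-content/

# Keep all crawlers out of the entire site (rarely what you want)
User-agent: *
Disallow: /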


My PDF is already showing up in Google search. How do I remove it?

If your gated content is already indexed and appearing in search results, you’ll need to submit the URL to Google Search Console for removal (note: this only works for pages indexed by Google).

Log in to Google Search Console and open the URL removal tool. Click the “Temporarily Hide” button, enter the URL, and click Continue.

[Image: removing a URL in Google Search Console]

After submission, Google will remove the URL from search results for about 90 days. Follow up by blocking the URL in the robots.txt file or adding the noindex code to permanently keep the page from being crawled and indexed.

[Image: search results before and after removal]

Search results including a PDF vs. the same results immediately after submitting a removal request to Google.


Interested in learning more about SEO? Download TREW’s SEO guide.

Download the SEO Guide
