Duplication: A Webmaster’s Biggest Nightmare

April 28, 2017


Internal duplication is one of the biggest problems webmasters face today. Is duplicate content hurting your site, or are you still unaware that it could be the reason your website rankings are suffering?

What is classified as duplicate content?

Consider the following URLs:

  1. example.com
  2. www.example.com
  3. www.example.com/#
  4. https://www.example.com/index.php
  5. https://www.example.com/
  6. https://example.com
  7. http://example.com

When a user enters through any of these URLs, the same page comes up in the browser, but under a different URL. A search engine treats each distinct URL as a separate page, so the content on these pages falls under the category of duplicate content.
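To make the point concrete, here is a minimal sketch that collapses the seven variants above into one preferred URL. The choice of https plus the www host as the preferred form, and the list of index file names, are assumptions made for illustration; pick whichever form suits your site.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferred form: https scheme, "www." host, no index file, no fragment.
PREFERRED_SCHEME = "https"
PREFERRED_HOST_PREFIX = "www."
INDEX_FILES = ("index.php", "index.html")  # hypothetical default documents


def normalize(url: str) -> str:
    """Map a URL variant onto the single preferred form."""
    # Bare hosts such as "example.com" carry no scheme; add one so that
    # urlsplit puts the host into netloc instead of path.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if not host.startswith(PREFERRED_HOST_PREFIX):
        host = PREFERRED_HOST_PREFIX + host
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
    # The fragment ("#...") is dropped: it is never sent to the server.
    return urlunsplit((PREFERRED_SCHEME, host, path, parts.query, ""))


variants = [
    "example.com",
    "www.example.com",
    "www.example.com/#",
    "https://www.example.com/index.php",
    "https://www.example.com/",
    "https://example.com",
    "http://example.com",
]
unique = {normalize(u) for u in variants}
print(unique)  # {'https://www.example.com/'}
```

All seven strings reduce to a single URL, which is exactly the consolidation a search engine cannot be relied upon to perform by itself.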

Now why is this a problem?

There are 4 major reasons for that:

  1. Dilution of link popularity: In the above example, backlinks to the homepage can point to any one of the seven URLs. Since search engines treat them as different pages, your desired homepage URL collects fewer backlinks, which in turn affects your rankings.
  2. Inefficient crawling: The crawl budget for a website is more or less fixed. Wasting it, along with server resources, on duplicate pages may significantly delay search engines in discovering the great content on a site.
  3. Ranking confusion: Search engines may not know which version of the page should be ranked and might omit the desirable URL.
  4. Ugly URLs: A page with an ugly URL might get indexed instead of the clean one.

Why does your site have duplicate content?

The problem has most likely occurred due to one of these two reasons:

  1. www and non-www, or http and https, copies of the same site: If you have different versions of your website, i.e. http://example.com and https://example.com (with the “http” or “https” prefix), and both serve the same content, you have essentially created duplicates. The same applies to websites that have both www and non-www versions.
  2. Dynamic URLs get created for one of the following reasons:
    1. Search: When a user searches for a product, the URL of the page gets modified and a new dynamic URL gets created, e.g. http://www.amazon.in/Apple-iPhone-7-Black-32GB/dp/B01LZKSVRB/ref=sr_1_1?s=electronics&ie=UTF8&qid=1490770139&sr=1-1&keywords=iphone+7
    2. Product variants: Product pages with different colors, shapes, and sizes have multiple dynamic click options, each generating a new page with the same content, e.g. the http://www.amazon.in/Apple-iPhone-Jet-Black-128GB/dp/B01M0811EC/ page generates http://www.amazon.in/Apple-iPhone-Jet-Black-128GB/dp/B01LZ8YCVJ/ and http://www.amazon.in/Apple-iPhone-Jet-Black-128GB/dp/B01LXAS8M2/
    3. Website parameters: Some websites keep session IDs or page type information in the URL to keep a record of visitors and analyze data, e.g. example.com/?ssid324 or www.example.com/?page=tshirts&SSID=324
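One way to spot duplicates created by parameters is to strip the ones that do not change the page content and compare what is left. The sketch below does that with Python's standard library; the set of ignored parameter names is hypothetical and would have to be chosen per site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that track the visitor or the search,
# but do not change what the page shows.
IGNORED_PARAMS = {"ssid", "qid", "sr", "ie", "keywords"}


def strip_params(url: str) -> str:
    """Return the URL without the ignored query parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


print(strip_params("http://www.example.com/?page=tshirts&SSID=324"))
# http://www.example.com/?page=tshirts
```

Two URLs that become identical after stripping are candidates for consolidation. This sketch only looks at the query string; tracking segments embedded in the path would need separate handling.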

How do you fix the duplication problem on your website?

There are 5 common ways to ensure that your site does not contain duplicate content. Depending on your use case, you can implement one or more of the following methods:

  1. 301 redirect
  2. Noindex, Nofollow
  3. Blocking with robots.txt
  4. Canonical tag
  5. Alternate link tag

Now the question arises: what to use, and where? Let us go through their use cases one by one.

301 redirect

A 301 redirect tells a search engine that the page which existed here has moved permanently to a new location.

Let us assume you have 7 duplicate pages, all capable of ranking for a single search query. What if we combine these 7 pages? The relevancy and popularity of that single page would be much greater than that of any of the 7 individual pages. Thus, to remove the duplication, put a 301 redirect on 6 of the 7 pages, pointing to the one you want to keep.

Google also recommends the 301 redirect as a preferred way to consolidate duplicate pages. If you can use a 301, you should use it.
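In practice the redirect is usually configured in the web server or the CMS. Purely as an illustration of the logic, here is a minimal WSGI sketch in Python that answers with a 301 whenever a request arrives on anything other than the preferred scheme and host. The value of CANONICAL_ORIGIN is an assumption; the function name is made up for this example.

```python
CANONICAL_ORIGIN = "https://www.example.com"  # assumed preferred scheme + host


def redirect_app(environ, start_response):
    """WSGI app: 301 any request that is not on the canonical origin."""
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "") or "/"
    query = environ.get("QUERY_STRING", "")

    if f"{scheme}://{host}" != CANONICAL_ORIGIN:
        # Keep path and query so deep links land on the matching page.
        location = CANONICAL_ORIGIN + path + ("?" + query if query else "")
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"canonical page"]
```

A request for http://example.com/ would receive a 301 with the Location header set to https://www.example.com/, while a request already on the canonical origin is served normally.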

Noindex, Nofollow tag

A noindex tag tells a search engine that the page will continue to exist as is and can be crawled freely, but should not be shown in the search results. The page can then only be reached through a direct link to it. All the link juice and backlinks stay on that URL itself. The accompanying nofollow value tells the crawler not to follow the links on that page.

Noindex is used when duplicate pages such as “tag” or “category” pages get created and start to compete with the actual pages that are supposed to rank.
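As a rough sketch, a template helper could decide which robots meta tag to emit based on the type of page being rendered. The page type names and the function below are hypothetical; only the meta tag syntax itself is standard.

```python
# Hypothetical page types that should stay out of the search results.
NOINDEX_PAGE_TYPES = {"tag", "category"}


def robots_meta(page_type: str) -> str:
    """Return the robots meta tag to place in the page's <head>."""
    if page_type in NOINDEX_PAGE_TYPES:
        return '<meta name="robots" content="noindex, nofollow">'
    return '<meta name="robots" content="index, follow">'


print(robots_meta("tag"))
# <meta name="robots" content="noindex, nofollow">
```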

Canonical Tag

A canonical tag (rel=“canonical”) tells search engines which URL to consider for indexing and ranking purposes out of all the URLs containing the same or similar information. This prevents the remaining pages from competing with each other.

Most duplication problems can be solved even before they arise. If you have duplicate static pages, simply select the one page you wish to rank and mark all the pages (including that page itself) as canonical to it. In the case of dynamic pages, mark every page from which dynamic pages can be generated as canonical to itself (self-canonicalized). By doing this, you inform the search engine bots to consider this one page for all ranking and indexing purposes. The link juice from all the other pages is transferred to this page and there is no internal competition.

Let us consider the following three static pages to understand this better:

  1. http://www.amazon.in/Apple-iPhone-Black-32GB/dp/B01LZWIOS4
  2. http://www.amazon.in/Apple-iPhone-Rose-Gold-32GB/dp/B01LZWIOS4
  3. http://www.amazon.in/Apple-iPhone-Gold-32GB/dp/B01LZWIOS4

If there is no canonical tag on the above pages, a search engine might index all three of them. Further, the pages will compete with each other for queries like “iphone 32 gb”.

Now let us consider the following three dynamic URLs:

  1. http://www.amazon.in/Apple-iPhone-7-Black-32GB/dp/B01LZKSVRB
  2. http://www.amazon.in/Apple-iPhone-7-Black-32GB/dp/B01LZKSVRB/ref=sr_1_1?ie=UTF8&qid=1490770049&sr=8-1&keywords=phone+7
  3. http://www.amazon.in/Apple-iPhone-7-Black-32GB/dp/B01LZKSVRB/ref=sr_1_1?s=electronics&ie=UTF8&qid=1490770139&sr=1-1&keywords=iphone+7

If your page, http://www.amazon.in/Apple-iPhone-7-Black-32GB/dp/B01LZKSVRB, is not self-canonicalized, chances are that Google will index all the dynamic pages generated via filter or search. Further, the pages will compete with each other for queries like “iphone 7 black”.

Therefore, in each case one page should be identified as the parent page that should rank for the query “iphone 32 gb” or “iphone 7 black”, and the other two pages should carry a canonical tag referring to their respective parent page.

How to use a Canonical Tag?

The canonical tag is placed in the HTML head section of a webpage. This is the portion of the code where components relevant from an SEO point of view, such as the title tag and meta description, are added. The tag has the following syntax:

<link rel="canonical" href="https://example.com/" />

It is a single-step process: add the above code to every page which is a copy of “https://example.com/” and you are done.
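To check that a page actually declares the canonical URL you expect, you can parse its HTML and read the tag back. The following is a minimal sketch using Python's built-in HTML parser; the class and function names are made up for this example, and fetching the page is left out.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = '<html><head><link rel="canonical" href="https://example.com/" /></head></html>'
print(find_canonical(page))  # https://example.com/
```

Running this over each duplicate URL and comparing the result against the intended parent page is a quick way to catch pages that were missed.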

Alternate Link Tag

The alternate tag is very similar to the canonical tag, although it is used mainly for multilingual pages on a website. If your website has pages with similar content targeting users in different countries or languages, the alternate tag is the one to use.

<link rel="alternate" hreflang="en" href="https://example.com/" />

It is a single-step process: add the above code to the page which is an alternate in a different language for “https://example.com/”, with hreflang set to the language of the page the href points to, and you are done.
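When a page exists in several languages, the tags are easier to generate than to write by hand. Below is a minimal sketch that builds one alternate tag per language version; the French URL is a hypothetical example, and the function name is made up for illustration.

```python
from html import escape
from typing import Dict, List


def alternate_tags(versions: Dict[str, str]) -> List[str]:
    """Build one <link rel="alternate"> tag per language version.

    `versions` maps a language code to the URL of that version.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in versions.items()
    ]


tags = alternate_tags({
    "en": "https://example.com/",
    "fr": "https://example.com/fr/",  # hypothetical French version
})
print("\n".join(tags))
```

The same set of tags would normally be placed in the head section of every language version, so that each version points to all the others.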

Conclusion

Duplication is a very common problem faced by webmasters across the globe. However, correct usage of the techniques described above should resolve most duplicate content issues on your website.
