Website owners who publish original, unique content on every page of their website usually will not face duplicate content problems. However, duplicate content can still be generated for technical reasons: in particular, the Content Management System (CMS) can create many versions of a page under different URLs. Thin content, likewise, is not a major issue for most websites, but it can adversely affect a website's overall ranking in search engines.
Hence it is worth finding the affected pages and fixing them. This article discusses tips for proactively fixing thin and duplicate content problems on your website.
Using an SEO Audit Tool
The Site Auditor from Raven Tools and the Site Audit tool from Ahrefs are some of the most effective tools for finding duplicate or thin content. The Raven Site Auditor will scan the website for duplicate or thin content and then tell you which pages need to be updated. Similarly, the Ahrefs audit tool has a section on content quality, which will show whether the site has the same content on multiple pages. However, these tools mainly focus on content within the same website, while duplicate content also includes content copied to other websites. Copyscape's Batch Search feature allows you to upload multiple URLs and find out whether the content has been copied elsewhere. If a text snippet appears on another website, it is advisable to search for the snippet in Google: if your URL ranks first, you are considered the original author.
After you have discovered all the pages on your site with thin content, it is time to add more content to those pages. One approach is to check your competitors' sites and see how many words they have on their pages. Your pages should contain more content than the competing pages that rank on the first page of the search engines.
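As a rough way to compare content length, the visible word count of a page can be tallied with a short script. This is only a sketch using Python's standard library; the sample HTML string stands in for a fetched competitor page, and real pages would first need to be downloaded.

```python
from html.parser import HTMLParser

class TextCounter(HTMLParser):
    """Counts words in the visible text of an HTML document,
    ignoring <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.words += len(data.split())

def count_words(html: str) -> int:
    parser = TextCounter()
    parser.feed(html)
    return parser.words

# Replace this sample string with the fetched HTML of a competitor page.
print(count_words("<p>Three short words</p>"))  # → 3
```

Raw word count is only a proxy for content depth, but it gives a quick, repeatable benchmark across a set of competitor URLs.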
Noindex Duplicate Content Pages
Though most sites will have at least a few pages with duplicate content, it only becomes a problem when these pages are indexed. Adding the "noindex" tag to these pages can resolve the issue: the tag tells search engines like Google that the page should not be indexed. Google Search Console (GSC) lets you check whether the noindex tag has been configured properly using the Test Live URL feature. If the tag has been set up properly, you will get the message "Excluded by 'noindex' tag". Depending on the search engine, it will take a few days or weeks to re-crawl the pages you wish to exclude from its index. You should check the "Excluded" tab in the GSC coverage report to ensure that the pages with the noindex tag are removed from the Google index. It is also possible to block search engine spiders from crawling pages by disallowing each crawler in the robots.txt file.
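For reference, the noindex directive is a standard robots meta tag placed in a page's head section:

```html
<!-- Tells crawlers not to index this page; goes inside <head> -->
<meta name="robots" content="noindex">
```

Blocking crawling via robots.txt instead looks like the following sketch, where /print/ is a hypothetical folder of duplicate printer-friendly pages:

```
# robots.txt at the site root; User-agent: * applies to all crawlers
User-agent: *
Disallow: /print/
```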
Using Canonical URLs
Though unique content and the noindex tag can fix the duplicate content problem, a third way is to use canonical URLs. Canonical URLs are recommended for pages of a website that have similar content with only small differences. Typically, ecommerce websites have a product page plus a page for each variation of the product, differing in colour, size or other factors. The canonical tag can be used to indicate that the product page is the main version, and that all other pages are variants of it.
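The canonical tag itself is a single link element in the head of each variant page. The URLs below are hypothetical, for illustration only:

```html
<!-- On https://www.example.com/products/t-shirt-red, point search
     engines at the main product page as the canonical version -->
<link rel="canonical" href="https://www.example.com/products/t-shirt">
```

Each colour or size variant carries the same tag, so ranking signals are consolidated onto the one main product page.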
If you need help with setting up your "noindex" tag and canonical URLs, please contact us for a free consultation.