Duplicate Content and Your Website
Duplicate content can be a serious issue. Yet despite its dangers, I rarely encounter people who know exactly what it is. Last week Google dropped what I call the “Google hammer” on dozens of large websites dubbed “content farms.” (Note: I define the dropping of the Google hammer as Google targeting and penalizing a website or web page in search results. Pretty simple.) These content farms were producing huge amounts of duplicate content on the internet, and Google’s dropping of the hammer was akin to throwing down the gauntlet against sites that specialize in duplicate content.
Before you become too worried about whether your small business website could be affected by Google’s latest actions, take a deep breath: I can assure you your small business website is not on the radar of this latest algorithm change. However, this is still a good time to visit the topic of duplicate content and discuss why it is bad, lousy, detrimental, obnoxious, and any number of other ghastly adjectives.
What is duplicate content?
Duplicate content is the republishing or replicating of content from one web page onto another. This occurs most often in one of two ways.
- Content is pulled from a web page by a bot or program/application and then delivered for publishing onto another website. Content can be published in its entirety or in an excerpt format.
- A content creator uses web publishing software that creates duplicate content by design. This commonly occurs in content management systems (CMSs) like WordPress.
When content from your website is published elsewhere on the internet, search engines screen the content and decide how to index the material, how to rank it in search results, and whether it should be eliminated from search results altogether (essentially, treated as spam). In some rare instances the duplicate content can rank higher than the original content on your website or, worse, cause both your content and the offending content to be penalized or treated as spam. The good news is that in most cases Google’s algorithm sees right through all of this and drops the hammer on the duplicate content (if it drops the hammer at all).
If you’re concerned about whether your content is being duplicated online, just highlight, copy and paste sections of your website’s web copy into a Google search box. If your web copy is being duplicated, Google will likely catch it and display it in the search results (unless, of course, Google chooses to ignore the duplicate content, in which case, score!).
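If you have many snippets to check, the copy-and-paste step can be scripted. The following is only a rough sketch in Python: the function name is made up for illustration, and wrapping the snippet in quotation marks to request an exact-phrase match is my own assumption rather than a required step. It simply builds the search URL you would otherwise reach by typing into the search box.

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(snippet: str) -> str:
    """Build a Google search URL that looks for the snippet as an exact phrase.

    Hypothetical helper for illustration; the surrounding quotation marks
    (an exact-phrase search) are an assumption, not part of the original tip.
    """
    # Collapse line breaks and repeated spaces picked up while copying.
    phrase = " ".join(snippet.split())
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


# Paste the printed URL into a browser to see who else is publishing the text.
print(exact_phrase_search_url("Duplicate content can be a serious issue."))
```

Shorter snippets (a sentence or two) tend to work best, since a long passage is less likely to appear word for word on another site.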
WordPress and Duplicate Content
I develop websites for my clients on the WordPress CMS. What many do not know about WordPress is that it is a duplicate-content-creating machine! However, duplicate content that occurs within a website is handled differently from that which occurs outside of a website. When duplicate content occurs within a website, it forces the search engines to choose which copy of the content takes priority in indexing. The pages that are indexed are more likely to rank competitively in search results, and with the number of unique URLs WordPress kicks out, there is no way all of them are going to be indexed.
As an example, let’s consider the URL homes that are created as a result of this blog post. WordPress publishes this article to the following:
http://blastsocialmedia.com/wordpress/duplicate-content-website/
http://blastsocialmedia.com/blast-blog/
http://blastsocialmedia.com/category/wordpress
http://blastsocialmedia.com/tag/duplicate-content
http://blastsocialmedia.com/tag/google
http://blastsocialmedia.com/author/stephen-kelly
http://blastsocialmedia.com/2011/03
http://blastsocialmedia.com/wordpress/duplicate-content-website/#respond
http://blastsocialmedia.com/?s=keywordterm
The list goes on… but you get the point.
In most instances an author will want a search engine to index one intended page and include it in search results; in this example, that would be the actual article URL (the first URL listed above). The easiest way to prevent search engines from analyzing huge amounts of duplicate content is to tell them which URLs you would like them to consider for indexing. To accomplish this I modify the robots.txt file.
Robots.txt and WordPress
While there are a number of WordPress plugins that can be used to eliminate duplicate content, I find it easiest to just confront the issue through the use of a website’s robots.txt file. Many SEO experts hold a conviction that the use of the robots.txt file in SEO is poor practice, and in 99% of cases they are right. Many SEO issues can instead be solved using 301 redirects and nofollow attributes. Yet where I respectfully disagree with these folks is in the case of CMSs like WordPress that create copious amounts of duplicate content that are not easily controlled by the aforementioned methods. Modification of the robots.txt file is the quickest and simplest way to tell search bots what you would and would not like them to analyze.
To accomplish this most simply I use a little plugin called KB Robots.txt. (Note: one can also simply alter the robots.txt file directly through one’s hosting provider over FTP.) The plugin is very easy to use: just install, activate, and then click on its name in the left sidebar from within WordPress. In the text box provided I dump the following code:
User-agent: *
Disallow: /*.js
Disallow: /*trackback
Disallow: /*.css
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
Disallow: /tag/
Disallow: /author/
Disallow: /comments/
Disallow: /category/
Disallow: /search
Disallow: /?s=
Disallow: /wp-*
Disallow: /events/
Note that the last entry pertains to a special calendar plugin installed on a page called “events”. Upon inspection of the kinds of links this plugin was creating, I noticed that it was producing numerous looping links back and forth between its different components. Since the content of the calendar wasn’t important to the wider SEO goals of the website, I decided to simply block the calendar page from search altogether.
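Before relying on rules like these, it can help to check which URLs they actually catch. Below is a minimal sketch in Python, using only a subset of the rules above. It assumes Google-style pattern matching, where `*` stands for any run of characters and a trailing `$` anchors the pattern to the end of the URL. The helper names are hypothetical, the feed URL is made up for illustration, and the sketch ignores Allow lines and per-bot groups, so treat it as a rough sanity check rather than a faithful model of any particular search engine.

```python
import re
from urllib.parse import urlsplit

# A subset of the Disallow patterns from the robots.txt above.
DISALLOW_PATTERNS = ["/*trackback", "/*/feed/$", "/tag/", "/author/", "/wp-*"]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumes Google-style wildcards: '*' matches any run of characters and a
    trailing '$' anchors the pattern to the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url: str, patterns=DISALLOW_PATTERNS) -> bool:
    """Return True if any Disallow pattern matches the URL's path and query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, just as the rules match from the start of the path.
    return any(pattern_to_regex(p).match(target) for p in patterns)


site = "http://blastsocialmedia.com"
for path in [
    "/wordpress/duplicate-content-website/",       # the article itself
    "/tag/duplicate-content",
    "/author/stephen-kelly",
    "/wordpress/duplicate-content-website/feed/",  # hypothetical feed URL
]:
    print(path, "blocked" if is_blocked(site + path) else "crawlable")
```

Running a check like this against the list of URLs WordPress generates for a post is a quick way to confirm that the article URL stays crawlable while the archive copies are blocked.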
Conclusion
For most small business owners duplicate content is not and never will be a tremendous issue, yet it is still worthwhile to be aware of its existence and to structure how you generate content in line with best practices for avoiding this rarely discussed annoyance. After all, what you don’t know can’t hurt you, until it does.