OK, let’s start from this question:
“I have a blog post / article / resource / web page. What percent of it is duplicate content?”
Below is my answer, based on what I believe Google would do. So, how would Google see a specific article?
Some points on duplicate content:
- Starting from scratch – writing an article from zero, without any existing content:
- If you are writing the article out of your own head, don’t fuss about it. In a 500-word essay you will certainly use some words and expressions that others have already written. Ignore it.
- If you are writing as a result of research, and you paraphrase other articles you found online, ignore the dilemma. It’s your own research, and it’s natural to look for other resources online.
- Building on something already written (by copying parts of it):
- If you build an article by quoting lots of other articles, you should first consider copyright laws (national and international), more than Google. Generally, if you only take small fragments of text, and you always cite your sources, you should be fine (but read up on this as well).
- 100% duplicate content of a single article:
- If you steal other people’s content 100%, no link, no reference, no nothing, you have a copyright problem first, and a Google problem only second.
- If an article already appears all over the Internet and you want to reproduce it on your web site, that is OK. A single article is fine. You should cite the source and, in some cases, get permission, even if you wrote that specific text yourself.
- Various percentages of duplicate content from a single web site or from multiple web sites:
- If you copy large portions of online resources without asking for permission, it’s probably illegal under copyright law. In this case, that should matter more to you than Google.
- Let’s say you are a famous something (a star, a company, a brand). You don’t write things on your own (you have a web site with home / about / services / products / contact pages, but you don’t publish new articles). And lots of things appear on the Internet about you. Should you copy them to your web site?
- First, if it’s a single article, the problem is not all that bad. You should still ask for permission, even if the content was written by you or is largely about you. Still, people tend not to be upset over a single article that involves you. See the point above.
- If there are lots of articles, then it’s a bit tricky. A general position would be:
- If you don’t want to ask for permission, copy only the title, a short paragraph and a small photo. Then link to the original source.
- If you ask for permission and get it, you should still, in my opinion, link to the original source of each article, even if the articles are about you. Having lots and lots of content copied from someplace else is not a natural way of creating content.
- You should never worry about duplicate content when it comes to content you created yourself. Ignore this. (OK, you might have an excellent memory and reproduce exactly what you read without being aware of it, but I don’t think this ever happens in reality.)
- You should worry somewhat if you have a single article which is copied either 100% from another source or partially from several sources.
- You should worry quite a lot if a large percentage of the content on your web site duplicates content from other web sites.
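To make the opening question ("what percent of it is duplicate content?") a bit more concrete, here is a minimal sketch of one way to put a number on it: split both texts into overlapping runs of words (often called shingles) and count how many of your article's runs also occur in the other text. This is only an illustration. The function names and the run length of 5 words are my own assumptions, and nothing here claims to be what Google actually does.

```python
import re

def shingles(text, size=5):
    """Return the set of overlapping word runs ("shingles") of the given size."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def duplicate_percent(article, other, size=5):
    """Percent of the article's shingles that also occur in the other text."""
    own = shingles(article, size)
    if not own:
        # Article shorter than one shingle: nothing to compare.
        return 0.0
    shared = own & shingles(other, size)
    return 100.0 * len(shared) / len(own)
```

A few shared words or common expressions barely move this number, which matches the advice above: text written from scratch will score close to zero, while a copied article will score close to 100.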
How about content on your own web site?
- It depends on the scale. If you have lots of category pages, sorting options or filters which produce lots of duplicate content on your web site, then this is a problem. If you have a presentation web site, and only 10% of the content is duplicate, then this is fine.
- By duplicate content, I mean things like:
- A page is reproduced 100% on other sections of the web site.
- A category of products has 300 products, and they can be shown in different ways, but they’re the same products.
- A web site routinely copies a lot of text from one section to another, so the overall percentage of duplicate content on the web site is large.
- If you have a small web site (<50 pages), and you only have very small portions of duplicate content, you can safely ignore the issue.
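For the simplest case above (a page reproduced 100% in other sections of the web site), a rough check could look like the sketch below. It assumes you already have the text of each page in a dictionary keyed by URL; how you collect that text is up to you. The names `find_duplicate_pages` and `duplicate_share` are made up for this example, and the check only catches copies that are identical after lowercasing and collapsing whitespace.

```python
import hashlib
from collections import defaultdict

def normalize(text):
    """Lowercase the text and collapse all whitespace runs to single spaces."""
    return " ".join(text.lower().split())

def find_duplicate_pages(pages):
    """Group URLs whose normalized text is identical.

    pages: dict mapping URL -> page text (an assumed input format).
    Returns a list of URL groups, each with two or more URLs.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]

def duplicate_share(pages):
    """Percent of pages that belong to some group of identical pages."""
    if not pages:
        return 0.0
    duplicated = sum(len(group) for group in find_duplicate_pages(pages))
    return 100.0 * duplicated / len(pages)
```

Sorted or filtered category listings will usually not be byte-for-byte identical, so they would need a fuzzier comparison than this exact-match hash.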
How about template footprint?
- Yes, this is a problem. A real one. If I visit a web page with an article on your web site, I should see the title and a big part of the text. The template footprint (header, menus, logo, sidebar, footer) should not make up the vast majority of what I see, with the content reduced to a photo and a line of text. Don’t fuss about this too much, just remember to put the emphasis on the content, not on the template surrounding it.
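If you want a rough number for the template footprint, one option is to compare how much visible text sits inside the main content element with how much text the whole page has. The sketch below assumes the article body lives in an `<article>` element, which may not be true for your template (pass a different tag name if so). The class and function names are invented for the example, and the result is a crude character count, not a measure of what a visitor sees on screen.

```python
from html.parser import HTMLParser

class _TextCounter(HTMLParser):
    """Counts non-whitespace text characters, in total and inside one tag."""

    def __init__(self, content_tag="article"):
        super().__init__()
        self.content_tag = content_tag
        self.inside = 0    # nesting depth of the content tag
        self.skipping = 0  # nesting depth of <script>/<style>
        self.total = 0
        self.content = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skipping += 1
        elif tag == self.content_tag:
            self.inside += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skipping = max(0, self.skipping - 1)
        elif tag == self.content_tag:
            self.inside = max(0, self.inside - 1)

    def handle_data(self, data):
        if self.skipping:
            return  # ignore script and style bodies
        count = len("".join(data.split()))
        self.total += count
        if self.inside:
            self.content += count

def content_share(html, content_tag="article"):
    """Percent of the page's visible text that sits inside content_tag."""
    parser = _TextCounter(content_tag)
    parser.feed(html)
    parser.close()
    if parser.total == 0:
        return 0.0
    return 100.0 * parser.content / parser.total
```

A low percentage on an article page suggests the header, menus, sidebar and footer outweigh the article itself.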
How should you check for duplicate content?
- If I have only a few articles to check, I always go for a search using quotes on Google (example). I try to use a long query: not two or three words in quotes, but one or two full sentences.
- If you have lots of articles to check, you might want to look at CopyScape. I tend not to use this web site for a single reason – the result would not change anything for me:
- Either I write an article from scratch, and I don’t care if 10% of it was already written 5 years ago on a different web site.
- Or I copy parts of articles from other web sites, in which case I already know the content is duplicate.
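The quoted-search approach can be partly automated. The sketch below picks the longest sentences from an article, wraps each one in double quotes for an exact-phrase search, and builds a Google search URL that you can open in a browser. The helper names and the 8-word minimum are assumptions for this example, and the sentence splitting is deliberately naive.

```python
import re
from urllib.parse import quote_plus

def exact_phrase_queries(article, count=2, min_words=8):
    """Return up to `count` quoted queries built from the longest sentences."""
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    long_enough = [s for s in sentences if len(s.split()) >= min_words]
    longest = sorted(long_enough, key=len, reverse=True)[:count]
    queries = []
    for sentence in longest:
        # Drop inner quotes and the final punctuation, then quote the phrase.
        phrase = sentence.replace('"', "").strip().rstrip(".!?")
        queries.append('"' + phrase + '"')
    return queries

def google_search_url(query):
    """Build a Google search URL for the given query string."""
    return "https://www.google.com/search?q=" + quote_plus(query)

if __name__ == "__main__":
    sample = ("Short one. This sentence is clearly long enough "
              "to be used as a query. Tiny.")
    for query in exact_phrase_queries(sample):
        print(google_search_url(query))
```

Opening the printed URLs by hand is enough for a few articles. Sending such queries automatically in bulk is a different matter, so check Google's terms before scripting it.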
“But others live just fine by copying things…”
- Google will likely get better over time at detecting things like this.
When should you avoid over-stressing?
- When you know that the vast majority of articles on your web site are written from scratch, not copied (partially or totally).
- When you know of only very limited amounts of duplicate content from one page to another.
- When the content is given proper prominence on your web site, and the template footprint is small.
- Note: in my opinion, having 5% duplicate content is irrelevant.
Note: Also see the Yahoo! Group on which I present similar issues: IMRo. To join, email email@example.com and reply to the confirmation email.