Catching Online Content Scrapers
Content scrapers are all over the Internet. They steal your content and use it on their own blogs without your permission. Some scrapers merely copy the content from your blog, but many take it and present it as their own new work.
It is very disconcerting to see your content appear, word for word, on someone else’s website when you know that you had nothing to do with it (aside from actually writing the content) and never gave anyone permission to use it without proper (or any) attribution. On the other hand, if a person doesn’t change your article, gives you credit and links back to the original, that is okay.
Catching content scrapers in the act
You may not know where to begin when it comes to figuring out exactly who is stealing your content. Several websites and tools will help you find out.
Copyscape: Copyscape is a search engine in which you enter the full URL of the page where your content lives, and it tells you if and where duplicates exist. Its basic search is free. The premium service lets you check up to 10,000 pages.
WordPress trackbacks: Trackbacks let you see when someone includes your content on their blog. If they don’t change the article, give you credit and link to the original article, that is fine; it is not scraping. If the person puts their name on your article, it can be considered plagiarism.
Webmaster Tools: In Webmaster Tools, look under “Your site on the web” and click on “Links to your site.” Columns will appear listing the domains that link to your pages. A website that isn’t a social media site, a social bookmarking site or a loyal fan, yet links to a large number of your posts, is very possibly a content scraper. To verify this, click on any of the domains to see exactly which pages on your website it links to, then visit that website and check for yourself.
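If you export that list of links, the same check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (pairs of linking page and linked post), the `likely_scrapers` function, the `KNOWN_GOOD` allowlist and the threshold of five posts are all hypothetical and are not part of Webmaster Tools.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical allowlist: domains you already recognize as social media,
# bookmarking sites or loyal fans. Replace it with your own.
KNOWN_GOOD = {"twitter.com", "facebook.com", "linkedin.com"}


def likely_scrapers(links, min_posts=5, known_good=KNOWN_GOOD):
    """Return {domain: number of distinct posts it links to} for suspects.

    `links` is an iterable of (linking_page_url, your_post_url) pairs,
    for example rows from an exported links report (format assumed).
    """
    posts_by_domain = defaultdict(set)
    for source_url, target_url in links:
        domain = urlparse(source_url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        posts_by_domain[domain].add(target_url)

    # Keep only unfamiliar domains that link to many different posts.
    return {
        domain: len(posts)
        for domain, posts in posts_by_domain.items()
        if domain not in known_good and len(posts) >= min_posts
    }
```

A domain that shows up here is only a candidate. You still need to visit it to confirm whether it is copying your posts or simply linking to them.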
Using Google Alerts: If you don’t post a high volume of content and you don’t want to monitor every mention of your business, you can create a Google Alert that matches the titles of your posts verbatim. You do this by putting quotation marks around each title. You can set the alerts up so that they come to you automatically every day.
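If you have more than a handful of posts, a short script can prepare the quoted titles for you. The sketch below only builds the search strings; it does not talk to Google Alerts, so you would still paste each string into the alert form yourself. The function name `alert_queries` is made up for this example, and dropping quotation marks that appear inside a title is an assumption made to keep each query a single exact phrase.

```python
def alert_queries(titles):
    """Wrap each post title in double quotes for an exact-phrase search."""
    queries = []
    for title in titles:
        # Drop inner double quotes and collapse extra whitespace (assumption).
        cleaned = " ".join(title.replace('"', " ").split())
        if cleaned:
            queries.append(f'"{cleaned}"')
    return queries


if __name__ == "__main__":
    for query in alert_queries(["Catching Online Content Scrapers"]):
        print(query)
```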
Once you have established that your content is being scraped: You can still get credit for the posts that have been scraped. If you use WordPress, you can try the RSS footer plugin, which lets you put your own text at the top or bottom of the content in your RSS feed. An attribution line will appear with your title, you as the author and a list of social media channels where people can connect with you. This is an excellent way to counteract the theft and still get something for your business. It is a lot better than being a sitting duck while scrapers come along and take whatever they wish.
Putting a stop to content scrapers
If having your content stolen (or scraped) is abhorrent to you, there are a few effective things that you can do to combat it. The first is to contact the website that is stealing from you and ask them to cease and desist. You can use the contact form on their website, if they have one, or send an email if an address is available. If there is neither, you can go to Whois Lookup and find out who owns that particular domain. If it isn’t registered privately, you should at least be able to find an email address for the administrator. Another thing that you can do is to visit the DMCA website and click on their “takedown services,” which let you request the removal of content that was stolen from you.
Conclusion
Content scraping is highly unethical, but it happens all the time. Not everyone is adept at producing large quantities of content, and those who can’t write it themselves sometimes just take what they want from other people. As a genuine, hardworking content writer, you have a right to protect yourself and your business’s interests, and to fight back in whatever way you feel you must. Content scraping is very easy to do, but ease isn’t the point; doing the right thing is. There are many tools available to help you determine whether your content is being stolen. It behooves you to make full use of them.
Thanks for the great information. I recently deactivated my “subscribe” plugin, not because I don’t want folks to enjoy the content I write, but because of what you so accurately described in this article. I learned about this the “hard way” when I first started writing. The rule of thumb I use now is time consuming but, for me, effective: if the email address comes back as invalid when I do a Google search, or the inquiry reveals a spam blacklist, I just delete the user. On the other hand, when I discover great content by a writer that I feel my readers will find value in, I will place the link to that site in the content I am writing. “Ethics” is not just a word, it’s a way of life. Thanks for the great read.
Regards
Via LinkedIn Groups
Group: Business Analyst Professional
Discussion: Catching Online Content Scrapers
Yep, That’s the internet all right.
Even if you were to put in all of the right ownership information, you don’t really own online knowledge any longer. Once you put it out there, you’ve largely given it to the world.
The other half of the problem is that written materials are not safe either.
But, rest assured that musical artists are suffering along with us. If you’ve ever copied and/or shared music with others, you’re no better than the scrapers.
By Douglas J. Roach
+Rob Obey via Google+
Hi this post was useful, thanks for the share. I have been getting 3 or 4 trackbacks a day to a new article I published but upon checking I was credited with the work so I assume that’s ok.
+Heather Mitchell via Google+
Yes, and not just your blog; they will also steal your videos and other content. It’s very annoying.
+Kori Miller via Google+
How does content scraping differ from content curation?
+Kori Miller with content curation you give attribution to the original writer by including their name, bio and link to the original article. Content scraping is stealing.
+Andy Nathan via Google+
Content scraping will become less of an issue as Google continues to pummel them!
Via LinkedIn Groups
Group: The NJ Networking Forum
Discussion: Catching Online Content Scrapers
That was a great article! I have never heard of this expression before but can only assume that content is stolen on a regular basis, just as images are. Thank you for posting.
By Linda C. Modica
Via LinkedIn Groups
Group: Informed Ideas For Writers
Discussion: Catching Online Content Scrapers
VERY helpful and insightful article, thanks much! Many years ago I came across a content scraper who took an entire post from a parody site of mine and pasted it into her site without attribution. Since it was still the “Wild West Days” of internet 2.0 I just sent a snarky email pointing out the theft and also pointing out that academics get fired for that sort of thing (she was a professor at a major university). Can’t remember if they removed the content but it made me feel better!
It wasn’t that big of a deal to me because they were just trying to look clever and smart and there was no intent to impinge on my livelihood. What really chaps me these days as my book is about to be released are sites that offer free downloads of my book. I found one of those a few days ago and I immediately sent the link to my publisher and they are sending a cease and desist letter to the site.
We shall see what good it does.
By Michael Corcoran
Thanks for this! Very helpful! I haven’t had it happen much, but in the past when I had a non-self-hosted blog, I did. I appreciate the information on the takedown bit.
I try to make it harder for them to do that, by disallowing cut and paste. Then, they actually have to type the stuff they want to copy- being the lazy SOB’s (or lacking integrity) that they are, it’s usually enough to stop them.
+hubze via Google+
We have actually had this happen. I was doing a search one day and found our blog posts were being scraped and added to another blog via an RSS Feed, so I started adding (hubze.com) like that at the beginning of all our posts. It’s a shame, but it is the Internet. Still like the wild west sometimes. ~David