News Content Scraping and Attribution


Last week I received a “pingback” on this post. I was curious about what may have been discussed, so I navigated to the site and discovered they had simply scraped the entire post and were presenting it as their own. Content scraping is nothing new, of course, and happens all the time on large, popular sites, but this was the first time it had happened to me, to something I wrote. I was a little taken aback by how blatant it was.

I reached out to the offending site’s administrator to request that he/she conform to the terms of the Creative Commons Attribution 3.0 license by providing attribution, or remove the content. I had little hope that they would comply, much less respond, and had resigned myself to hoping that Google could sort out which content source would be authoritative when it came to search queries. I was pleasantly surprised, then, when roughly a day later I received an e-mail from the site’s administrator admitting to copying the content (“because it’s beautiful”) but, more importantly, agreeing to attribute the content to me. We agreed on the following notice, which was placed at the bottom of the (copied) post: “This article was originally published at and is reprinted here with the author’s permission.”

All of the content I post here at is provided under the Creative Commons Attribution 3.0 license. What this means is that you are free to copy, distribute and transmit this content, to adapt it to something you may be working on, even make commercial use of the content. All I ask is that you attribute the original content to me in some way.

