My content is being scraped by Thow Chiang Chua's website

More than a year ago I noticed that the site learn-mandarin.math-chinese-tutor.com was scraping content from East Asia Student, so I emailed the registered owner, Thow Chiang Chua, asking him to remove the copied content and stop scraping in future. He removed the content I specifically mentioned, but he has continued to copy other material since then.

I’ve now noticed that the site is also copying content from Lingomi, and possibly from other sites too. If you write about learning Mandarin, I’d suggest searching Google to check whether this guy is scraping your content. Another good measure is to install the WordPress SEO plugin, which lets you automatically insert links back to your original posts in the RSS feed. Those links can help Google identify the original source of scraped material.

I’m considering dropping the Creative Commons license for this site, which I’ve modified slightly to require “clear and unambiguous attribution”. I intended the license to let people quote and adapt my content easily, but it’s too easy for someone to use it as an excuse to copy content wholesale like this. I’ll have to look into other licenses (including other Creative Commons options) or just write a simple one myself.

If anyone has suggestions for dealing with this content scraper, please share them in the comments. I intend to find out who his web host is and contact them, and to report his Google AdSense account. In an odd twist, the last time I checked, his site was showing me adverts for scraping software, as you can see in the screenshot above.