I noticed more than a year ago that the site learn-mandarin.math-chinese-tutor.com was scraping content from East Asia Student, and emailed the registered owner Thow Chiang Chua asking him to remove the copied content and stop scraping in future. He removed the content I specifically mentioned but has continued to copy other material since then.

I’ve now noticed that the site is also copying content from Lingomi, and maybe other sites too. If you write about learning Mandarin I’d suggest you check Google to see if this guy is scraping your content. Another good measure to take is to install the WordPress SEO plugin, as it lets you automatically insert canonical links back to your content in the RSS feed, which can help Google identify the original source of scraped material.
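To give a rough idea of what such a feed footer does, here is a minimal Python sketch. It is not the plugin's actual code: the function name `add_source_links` and the footer wording are my own assumptions, and it only handles a plain RSS 2.0 feed. It appends a link back to the original post to each item's description, so a scraper that republishes the feed verbatim also republishes the link.

```python
import xml.etree.ElementTree as ET
from html import escape


def add_source_links(rss_xml: str, site_name: str) -> str:
    """Append a 'first appeared on' link to each item description.

    Hypothetical helper for illustration; assumes a plain RSS 2.0 feed.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link:
            continue  # no permalink to point back to
        title = item.findtext("title", default="This post")
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        # The footer is HTML; ElementTree escapes it on output, which is
        # how RSS descriptions normally carry embedded HTML.
        footer = (
            f'<p>{escape(title)} first appeared on '
            f'<a href="{escape(link, quote=True)}">{escape(site_name)}</a>.</p>'
        )
        desc.text = (desc.text or "") + footer
    return ET.tostring(root, encoding="unicode")
```

Calling `add_source_links(feed_text, "East Asia Student")` on the feed's XML returns the modified XML as a string. A scraper can of course strip the footer, so this helps identify the original source rather than prevent copying.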
I’m considering getting rid of the Creative Commons license for this site, which I’ve modified slightly to require “clear and unambiguous attribution”. I intended the license to let people quote me and modify my content easily, but it’s too easy to use it as an excuse to just copy content wholesale in this way. I’ll have to look into other licenses (including other Creative Commons options) or just write a simple one myself.
If anyone has any suggestions for ways to deal with this content scraper, please share them in the comments. I intend to find out who his web host is and contact them, and to report his Google AdSense account. In an odd twist, I got adverts for scraping software on his site last time I checked, as you can see in the screenshot above.



I’m sorry to hear that you are having issues with people not respecting your authorship and flat-out copying your data. In addition to filing DMCA requests with Google and his web host, try his DNS provider as well. Also check out my company, Distil. We can help prevent him from scraping your content in the future. If you’re interested, I would be happy to set you up with a free account to help.
Why is that a problem? There are lots of sites that copy my sites. It usually doesn’t cause a problem. The problem would be if Google thought that site was the original and your content the duplicate. Is that happening?
What bothers me is that he’s profiting by putting adverts alongside content he didn’t produce. He’s reducing the quality of the Web. Even if it’s not affecting me directly, I still think it’s worth chasing up on principle.
I noticed he copied my interview that was published on Lingomi. I hope a way can be found to stop him from scraping other people’s blog posts. Making money (he has ads there) from stolen articles is not okay.