East Asia Student

Random Stuff Related to East Asia

The Google collocation test for translation

<img class="alignright size-full" title="Searching for “为了更好地理解” on Google" src="/img/2011/03/google-collocation.png" alt="" />“Collocation, collocation, collocation” is a good mantra for translation. Using a dictionary, you can produce language that is technically correct but isn’t necessarily natural. To avoid this, it’s crucial to get the right collocation for the vocabulary you’re using.

Unfortunately, tracking down this information is often pretty difficult. However, Google (or any other search engine, really) is a brilliant tool for figuring out what collocates with what.

Using quote marks ("") to force an exact-phrase match and stars (*) as wildcards for unknown words, you can quickly test how often a particular combination of words is used, at least on the Web.

Translating ‘to better understand’ into Mandarin

Here’s an example. I’m trying to translate the phrase ‘to better understand’ into Mandarin Chinese. First I want to know if “更好地理解” is a good translation of ‘better understand’. Into Google it goes, and it gets [21.7m results](http://www.google.com/search?q=%22%E6%9B%B4%E5%A5%BD%E5%9C%B0%E7%90%86%E8%A7%A3%22 "Search for “更好地理解” on Google"); it seems to be a good way to phrase it.

I then combine it with “为了” to see if my translation of the whole phrase is any good: “为了更好地理解”. [2.9m results this time](http://www.google.com/search?q=%22%E4%B8%BA%E4%BA%86%E6%9B%B4%E5%A5%BD%E5%9C%B0%E7%90%86%E8%A7%A3%22 "Search for “为了更好地理解” on Google"). Obviously the number of results has decreased, as the phrase is now more specific. Still, 2.9m results strongly suggest that the phrase sounds natural in Mandarin.
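Both searches above are just the phrase wrapped in quote marks and percent-encoded into the URL. As a rough sketch of that step, the function below rebuilds those two links; the name `phrase_search_url` is made up for illustration, and the base URL is simply the one used in the links above.

```python
from urllib.parse import quote

# Same base URL as the links in this post.
SEARCH_BASE = "http://www.google.com/search?q="


def phrase_search_url(phrase: str) -> str:
    """Return a search URL that looks for `phrase` as an exact match."""
    # Straight double quotes around the phrase ask for an exact-phrase match.
    quoted = f'"{phrase}"'
    # Percent-encode the query as UTF-8 so the Chinese text survives in the URL.
    return SEARCH_BASE + quote(quoted, safe="")


if __name__ == "__main__":
    for phrase in ("更好地理解", "为了更好地理解"):
        print(phrase_search_url(phrase))
```

The result counts still have to be read off the results page by eye; this only saves typing the queries.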

If Google returned far fewer results (say, fewer than 1,000 for a phrase), it would be time to rethink the translation. Break the phrase down into smaller and smaller chunks to see whether just one part of it is wrong. Perhaps “更好地” isn’t very natural; Google will reveal whether or not this is the case.
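The breaking-down step can be sketched in a few lines. This is only an illustration under some assumptions: the phrase has already been split into words by hand, the result counts are typed in after checking each search manually, and the function names and the default cut-off of 1,000 (the rough figure mentioned above) are placeholders rather than anything fixed.

```python
from typing import Dict, List


def contiguous_chunks(words: List[str], sep: str = "") -> List[str]:
    """List every contiguous chunk of `words`, longest first.

    `sep` is empty for Chinese; use " " for languages written with spaces.
    """
    chunks = []
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            chunks.append(sep.join(words[start:start + size]))
    return chunks


def chunks_to_rethink(counts: Dict[str, int], threshold: int = 1000) -> List[str]:
    """Return the chunks whose noted result count falls below `threshold`."""
    return [chunk for chunk, count in counts.items() if count < threshold]


if __name__ == "__main__":
    # Segmented by hand; a different segmentation gives different chunks.
    words = ["为了", "更好地", "理解"]
    for chunk in contiguous_chunks(words):
        print(chunk)

    # Result counts noted by hand from the two searches in this post.
    noted = {"更好地理解": 21_700_000, "为了更好地理解": 2_900_000}
    print(chunks_to_rethink(noted))
```

Searching the chunks from longest to shortest means the first chunk that comes back with plenty of results marks the part of the translation that is probably fine.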

Caveats

Obviously, this method isn’t flawless; it’s just a quick trick. As mentioned above, it can only show how language is used on the Web. For most purposes this is fine, but there are situations where that information wouldn’t be so useful. Also, you may pick up a lot of slang that’s very natural to native speakers but might not look good in a formal article. Google can’t really be used to distinguish the two.

Also, a phrase may be natural but hard to search for. If it’s fairly complex, for example containing an embedded clause, it’s very tricky to turn into a search. The best bet here is to break it down as far as possible, identify the specific relationship that needs testing, and search for that in the most general form possible.


Contact me: mhg@eastasiastudent.net
