2013年8月16日

Convenient mark-up for CJK languages in WordPress (HTML + CSS)


As you might expect, I have a lot of content in CJK languages (Chinese, Japanese and Korean) here on this site. Currently 99% of it is modern and classical Chinese, but the balance will slowly shift over time. My situation is slightly unusual, though, in that I’m always mixing various languages together on one page.

On a basic level, this doesn’t matter at all. You can easily have Latin letters, 漢字, ひらがな, カタカナ and 한글 mixed up on a web page without any issues. You just need a Unicode encoding such as UTF-8, and the users need to have fonts that can display those characters. No particular mark-up is necessary to display CJK languages and English together in this way.

However, it’s often nice to use different fonts and styles for CJK languages and English. This is particularly true for me here – usually I want to make the CJK language a little bigger than the English, as this is a site in English aimed at people interested in CJK languages. It’s also nice to use appropriate fonts for each CJK language. This is quite easy to do and I thought I’d write it up here.

HTML for CJK languages

First up, some HTML you might want in your page. What I used to do for CJK languages was wrap them in a <span> tag with a class set to “hanzi” or whatever, then style that class of span in CSS with some different fonts and a relative font size.

Recently I’ve realised that you can improve on that a little bit by using the lang attribute as well. So for Chinese, I think it’s best to use a span that looks like this:

<span class="chinese" lang="zh">我喜欢汉字。</span>

It is a little bit of a pain to wrap all the Chinese on your page with span tags like that, but it’s worth it. Also, there’s a way to make it a lot easier to do in WordPress (see below).

Similarly, you might wrap Japanese and Korean as follows:

<span class="japanese" lang="ja">私は平仮名が好きです。</span>

<span class="korean" lang="ko">나는 그 한글 좋아한다.</span>

Leaving it at that won’t change anything as far as the user is concerned. Their browser will know that the text is in that language, and so will search engine spiders, but other than that there’s no real difference. That’s where the CSS styling comes in.
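
As a minimal sketch of how this looks in a mixed-language paragraph (the sentence is made up for illustration), each CJK run gets its own span while the surrounding English stays untagged. The script subtags zh-Hans and zh-Hant are an optional refinement that isn’t used elsewhere in this post; a selector written for zh still matches them.

```html
<p>
  The simplified form is
  <span class="chinese" lang="zh-Hans">汉字</span>,
  the traditional form is
  <span class="chinese" lang="zh-Hant">漢字</span>,
  and in Japanese the same word is
  <span class="japanese" lang="ja">漢字</span> (kanji).
</p>
```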

CSS for CJK languages

Once you’ve got your CJK text wrapped in those spans, it’s very easy to style the separate languages using CSS. You can specify that those languages should all be displayed slightly larger than the rest of the page, and also specify appropriate fonts for each language. Here’s a nice list of CJK fonts you could specify.

The first thing you might want to add to your CSS is this:

:lang(zh), :lang(ja), :lang(ko), .chinese, .japanese, .korean {
  font-size: 123%;
}

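You might be wondering what the point of adding both the classes and the lang attributes is. It’s just to cover any older browsers that don’t support the :lang() selector – they’ll hopefully still pick up the span class and apply the styling. Keeping these together in one group means the size is defined in one place. A span that matches both its class and its lang attribute is still only enlarged once, because a percentage font size is relative to the parent element. The size does compound when matching elements are nested, though: :lang() also matches everything inside a tagged span, so an <em> within Chinese text would come out at about 151% (123% of 123%).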
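
If nested elements ending up larger than intended is a concern (:lang() also matches the descendants of a tagged span), one possible workaround is an attribute selector, which only matches elements that carry the lang attribute themselves. This is just a sketch of that alternative, not what this site uses. The class fallback sits in its own rule here because a browser that cannot parse one selector in a comma-separated list discards the whole rule.

```css
/* Class fallback, kept separate so an unparseable selector
   elsewhere cannot take this rule down with it. */
.chinese, .japanese, .korean {
  font-size: 123%;
}

/* [lang|="zh"] matches lang="zh" and lang="zh-..." on the element itself,
   not on its descendants, so nested elements are not scaled again.
   "ko" is the language code for Korean. */
[lang|="zh"], [lang|="ja"], [lang|="ko"] {
  font-size: 123%;
}
```

A span that has both the class and the lang attribute matches both rules, but the two declarations are identical, so it is still just 123% of its parent.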

The CSS above makes your CJK text 23% bigger so it should stand out a little and be easier to read for people who are less familiar with it. I think it’s particularly nice to do this for 漢字, which can get mashed together in smaller font sizes. The next thing to do is select some good fonts for these languages. Here’s what I use:

:lang(zh), .chinese {
  font-family: KaiTi, STKaiti, "AR PL Ukai HK", UKai, sans-serif;
}

:lang(ja), .japanese {
  font-family: TakaoPGothic, TakaoGothic, "Droid Sans", "MS PGothic", sans-serif;
}

:lang(ko), .korean {
  font-family: NanumGothic, AppleMyungjo, Batang, NanumMyeongjo, sans-serif;
}

Those are just some common fonts for each language, some free and some bundled with the operating system, that the user will hopefully have installed. If not, note that there’s a fall-back “sans-serif” font family at the end. This is a good last resort because “sans-serif” is a generic family – it lets the system select the font it thinks is best, so it’s most likely to contain these characters. (A bare “sans” is not a CSS keyword; the browser would treat it as the name of a font called “sans”.)

I also have two styles in my CSS file for readings (e.g. pinyin) and glosses (the literal glosses I add to Classical Chinese translations).
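
Those two styles aren’t shown in this post, so the following is only a guess at what they might look like. The class names .reading and .gloss and all of the property values are hypothetical.

```css
/* Hypothetical class for readings such as pinyin: slightly smaller, italic. */
.reading {
  font-size: 90%;
  font-style: italic;
}

/* Hypothetical class for literal glosses: smaller and greyed out. */
.gloss {
  font-size: 85%;
  color: #666;
}
```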

Easily tag CJK languages in WordPress

Finally, there’s a nice little plugin for WordPress that makes it a lot easier and faster to tag the CJK language sections in your page. It’s called AddQuicktag, and it adds options to the WordPress editor interface to instantly tag the selected text with whatever you want.

As you can see, you just set up the tags you want in the options and then you can easily add them from the WordPress editor interface:

Adding tags for CJK languages with AddQuicktag in WordPress

(I’ve uploaded my AddQuicktag config file for my own future reference, feel free to use it if you like.)

Then all you’ve got to do is highlight CJK text and select the language you want it tagged as, and the CJK languages in your WordPress posts will look a bit nicer. Compare these (hopefully your system has some fonts for them):

汉字 ひらがな カタカナ 한글

汉字 ひらがな カタカナ 한글

Also these:

我喜欢汉字。 vs 我喜欢汉字。

私は平仮名が好きです。 vs 私は平仮名が好きです。

나는 그 한글 좋아한다. vs 나는 그 한글 좋아한다.