Useful terminal commands for language students
This is a list of Linux terminal commands I’ve come across that are often useful for studying languages.
In most cases they’ve been provided by helpful users at Ubuntu Forums.
Remove duplicate lines from a file
This command is pretty useful in general, but is particularly handy for quickly removing duplicate entries from vocabulary lists.
awk '!x[$0]++' file > output
In this command, file should be replaced with the input file, and output with the name of the new file which will have duplicate lines removed.
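To see what the command does, here is a small sketch run on a made-up vocabulary list. The file names vocab.txt and vocab_unique.txt are placeholders of my own, and everything happens in a throwaway temporary directory.

```shell
# Throwaway directory so no real files are touched
tmp=$(mktemp -d)

# Hypothetical vocabulary list with two repeated entries
printf '%s\n' '你好 hello' '谢谢 thanks' '你好 hello' '再见 goodbye' '谢谢 thanks' > "$tmp/vocab.txt"

# x[$0]++ counts how many times each whole line has been seen so far;
# the leading ! makes awk print a line only the first time it appears
awk '!x[$0]++' "$tmp/vocab.txt" > "$tmp/vocab_unique.txt"

cat "$tmp/vocab_unique.txt"
```

The result should contain three lines, in the order they first appeared. Unlike sort -u, this approach does not reorder the list.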
Remove all ASCII characters from a file
Removing non-ASCII characters is quite a common requirement, but this command does the opposite. It’s useful if you’ve got a file with a mix of Latin and CJK characters, and you want to leave only the CJK ones (e.g. delete everything except Chinese characters in a file).
sed -e 's/[0-9a-zA-Z]//g' -e 's/[[:punct:]]//g' -e '/^$/d' file > output
Again, file is the input file and output is the file that will contain only CJK characters. Note that this isn’t perfect for isolating CJK characters: it only strips ASCII letters, digits and punctuation, so spaces and any other non-ASCII characters (accented Latin letters, for example) will be left in the output file.
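Here is a rough sketch of the three sed expressions at work on a made-up sample. The file names mixed.txt and cjk.txt are my own placeholders. I’ve left spaces out of the sample on purpose, because none of the three expressions removes them.

```shell
# Throwaway directory so no real files are touched
tmp=$(mktemp -d)

# Hypothetical input mixing Latin text, digits, punctuation and Chinese
printf '%s\n' 'hello,你好!' '123abc' '谢谢(thanks)' > "$tmp/mixed.txt"

# 1st expression: delete ASCII letters and digits
# 2nd expression: delete punctuation
# 3rd expression: delete any lines that are now completely empty
sed -e 's/[0-9a-zA-Z]//g' -e 's/[[:punct:]]//g' -e '/^$/d' "$tmp/mixed.txt" > "$tmp/cjk.txt"

cat "$tmp/cjk.txt"
```

This should leave just 你好 and 谢谢 on separate lines. The 123abc line ends up empty after the first expression, so the last expression deletes it.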
Get all lines with non-ASCII characters from a file
This command will go through a file and output any lines containing non-ASCII characters (such as Chinese, Japanese or Korean). This could be useful if you want to keep ASCII characters that appear on the same line as CJK ones, rather than completely removing everything except CJK (as the command above does).
grep -P "[\x80-\xFF]" file > output
Again, replace file with the input file, and output with the desired output file name.
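The sketch below tries the same idea on a made-up notes file (notes.txt and cjk_lines.txt are placeholder names of mine). It uses the pattern [^\x00-\x7F], meaning "any character outside the ASCII range". That is my own choice rather than something from the forum posts: as far as I can tell, recent versions of GNU grep in a UTF-8 locale treat \x80-\xFF as the characters U+0080 to U+00FF rather than as raw bytes, which would miss CJK text, whereas the negated ASCII range should work either way. It assumes a grep built with -P (PCRE) support.

```shell
# Throwaway directory so no real files are touched
tmp=$(mktemp -d)

# Hypothetical study notes: two lines contain Chinese, two are pure ASCII
printf '%s\n' 'chapter one' '你好 = hello' 'page 12' 'thanks = 谢谢' > "$tmp/notes.txt"

# Keep every line that has at least one character outside the ASCII range;
# the ASCII text on those lines is kept as well
grep -P '[^\x00-\x7F]' "$tmp/notes.txt" > "$tmp/cjk_lines.txt"

cat "$tmp/cjk_lines.txt"
```

Only the two lines containing Chinese should end up in the output, complete with their English glosses.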
Remove all lines containing X from a file
This command is useful for stripping out lines from a file that contain a specific term. I use it most often for removing lines marked with the word ‘simplified’ in vocab lists; it works well when you have tagged lists like that.
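sed -e "/text/d" file > output
Replace text with the term you want to match, file with the input file, and output with the desired output file name. Note that text is treated as a regular expression, so a literal / in it needs to be escaped as \/.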
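As a quick illustration, this sketch strips the lines tagged "simplified" from a made-up vocab list. The file names tagged.txt and traditional_only.txt and the sample entries are all invented for the example.

```shell
# Throwaway directory so no real files are touched
tmp=$(mktemp -d)

# Hypothetical tagged vocabulary list
printf '%s\n' \
    '愛 love traditional' \
    '爱 love simplified' \
    '書 book traditional' \
    '书 book simplified' > "$tmp/tagged.txt"

# /simplified/ selects every line containing that word; d deletes those lines
sed -e "/simplified/d" "$tmp/tagged.txt" > "$tmp/traditional_only.txt"

cat "$tmp/traditional_only.txt"
```

Only the two lines tagged "traditional" should remain. The match is on any line that contains the word anywhere, so a definition that happens to include "simplified" would be removed too.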
Convert a .MO file to a readable .PO one (plain text)
A lot of software translations use MO and PO files to store a translated interface. The MO is the machine-readable one and is fairly useless if you want to get at the translations yourself. Use this command to convert an MO file into a plain text PO file:
msgunfmt file -o output
You’ll often want to get at the data in multiple .mo files at once, and this is easy to do by putting *.mo as the file in the command above. The command will still output the text into a single file.
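If you don’t have a .mo file to hand, the following sketch builds a tiny one and converts it back, which is a safe way to try the command out. It assumes the gettext tools (msgfmt as well as msgunfmt) are installed. The file names and the single translated string are made up for the example.

```shell
# Throwaway directory so no real files are touched
tmp=$(mktemp -d)

# Hypothetical one-entry PO file; the header declares the UTF-8 charset
cat > "$tmp/demo.po" <<'EOF'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Open File"
msgstr "打开文件"
EOF

if command -v msgfmt >/dev/null && command -v msgunfmt >/dev/null; then
    # Compile the PO file into a binary MO file...
    msgfmt -o "$tmp/demo.mo" "$tmp/demo.po"
    # ...then convert the MO file back into readable text
    msgunfmt "$tmp/demo.mo" -o "$tmp/recovered.po"
    cat "$tmp/recovered.po"
else
    echo "gettext tools (msgfmt, msgunfmt) are not installed" >&2
fi
```

The recovered file should contain the same msgid and msgstr pair as the original, which makes it easy to search with the other commands in this list.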
Find the location of translation (.mo) files
To get hold of .mo files, you can either download language packs (e.g. language-pack-zh) and go through them with an archive browser, or you can search for the .mo files already in use in your installation. You can use the following command to do this:
dpkg -L language-pack
Replace language-pack with the name of the language pack you’re interested in. The command lists every file that the package has installed, which includes the locations of all its .mo files.
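Because dpkg -L lists directories and other files too, it can help to filter the listing down to just the .mo files. The sketch below does this with a small helper function, only_mo, which is my own invention. The sample paths are made up to stand in for real dpkg output, so the example runs even on a system without dpkg.

```shell
# Keep only the paths ending in .mo from a file listing read on standard input
only_mo() {
    grep '\.mo$'
}

# Stand-in for real `dpkg -L` output (these paths are invented for illustration)
printf '%s\n' \
    '/usr/share/locale-langpack' \
    '/usr/share/locale-langpack/zh_CN/LC_MESSAGES' \
    '/usr/share/locale-langpack/zh_CN/LC_MESSAGES/coreutils.mo' \
    '/usr/share/doc/language-pack-zh/copyright' | only_mo

# On a real system the listing would come from dpkg instead, e.g.:
#   dpkg -L language-pack-zh | only_mo
```

With the sample listing, only the coreutils.mo path should be printed.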
If you notice a mistake, or have another useful command to add, please share it in the comments.