So along came Unicode to the rescue. Unicode provides a framework for all alphabets of the world to be represented on computers. UTF-8 is the most popular Unicode encoding because it preserves backwards compatibility with ASCII. Which is all fun to know, but what good is it when you're looking at piles of computer files that need to be converted from ISO-8859-1 (Latin-1, Western European) into whatever encoding you prefer? Naturally, there are a number of utilities just for this task.
GNU Recode supports over 150 character sets, and converts just about anything to anything. For example, there are still legacy Linux systems that use ISO-8859-1. Recode will convert their files to nice modern UTF-8, like this:
$ recode ISO-8859-1..UTF-8 recode-test.txt
Check out the GNU Recode Manual for instructions.
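Recode rewrites the file in place, so a mistake is hard to undo. Here is a minimal sketch of a more cautious approach, swapping in iconv (which ships with glibc) instead of Recode. The function name latin1_to_utf8 and the .bak and .tmp suffixes are made up for illustration. It leaves alone any file that already parses as valid UTF-8, which is only a heuristic, and keeps a backup of everything it converts:

```shell
#!/bin/sh
# Hypothetical helper: convert one ISO-8859-1 file to UTF-8 with iconv,
# keeping the original bytes in a .bak copy.
latin1_to_utf8() {
    src=$1

    # If the file already decodes cleanly as UTF-8 (plain ASCII included),
    # leave it alone so it is not encoded a second time.
    if iconv -f UTF-8 -t UTF-8 "$src" > /dev/null 2>&1; then
        return 0
    fi

    # Keep the original bytes around in case the guess was wrong.
    cp -p -- "$src" "$src.bak" || return 1

    # iconv writes to stdout, so convert into a temp file and swap it in
    # only when the conversion succeeded.
    if iconv -f ISO-8859-1 -t UTF-8 "$src.bak" > "$src.tmp"; then
        mv -- "$src.tmp" "$src"
    else
        rm -f -- "$src.tmp"
        return 1
    fi
}
```

Calling latin1_to_utf8 recode-test.txt would leave the converted text in recode-test.txt and the untouched original in recode-test.txt.bak. Running it a second time does nothing, because the file is valid UTF-8 by then.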
That's fast and easy enough, but there's one more job: converting the filenames. The convmv command is just the tool for this job. This example converts all the ISO-8859-1 filenames in the files/ directory to UTF-8:
$ convmv -f iso-8859-1 -t utf8 -r --notest files/
Run without the --notest option, convmv does a dry run and changes nothing, which is a wise thing to do first.
Resources
The subject of character encoding is huge and bewildering, especially for us dinosaurs from the typewriter era. By golly, when you hit a typewriter key it came out the same way every single time. Wikipedia has a number of excellent introductory articles:
This article was first published on LinuxPlanet.com.