Localization lab specs
From Inforail
Keywords
l10n, i18n, localization, internationalization, Unicode, ASCII, BOM, multibyte, TCHAR, widechar, translation, GUI
Objectives
Develop a mechanism that enables a program to display its interface in multiple languages, depending on how it is configured.
A GUI is not mandatory; a command-line program that prints some strings in one language or another, depending on its command-line arguments, is sufficient. Example: myProg.exe /FR displays French strings, while myProg.exe /EN displays English strings.
Warnings:
- "Solutions" such as the following are not solutions: if (param == "EN") {print "English";} else if (param == "FR") {print "French";}. This is just another form of hard-coding; avoid it like the plague.
- You'll use this mechanism in your future assignments.
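One way to avoid the hard-coded branch from the first warning is to treat the language code purely as a key for locating a data file. Below is a minimal sketch in Python, assuming a hypothetical layout in which every language has a JSON file named after its code (en.json, fr.json) in one directory. The file names, the keys and the helpers load_strings, parse_language and make_demo_catalogs are illustrative assumptions, not part of the assignment.

```python
import json
import sys
import tempfile
from pathlib import Path


def load_strings(lang_dir, code):
    """Load the string table for a language code such as 'EN' or 'FR'.

    The code is only used to build a file name, so adding a language
    means adding a file, not editing the program.
    """
    path = Path(lang_dir) / f"{code.lower()}.json"
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def parse_language(argv, default="EN"):
    """Return the language code from an argument like '/FR'."""
    for arg in argv[1:]:
        if arg.startswith("/") and len(arg) > 1:
            return arg[1:].upper()
    return default


def make_demo_catalogs(lang_dir):
    """Write two tiny hypothetical catalogs so the sketch is self-contained."""
    catalogs = {
        "en": {"greeting": "Hello", "farewell": "Goodbye"},
        "fr": {"greeting": "Bonjour", "farewell": "Au revoir"},
    }
    for code, strings in catalogs.items():
        target = Path(lang_dir) / f"{code}.json"
        with open(target, "w", encoding="utf-8") as handle:
            json.dump(strings, handle, ensure_ascii=False)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lang_dir:
        make_demo_catalogs(lang_dir)
        strings = load_strings(lang_dir, parse_language(sys.argv))
        print(strings["greeting"])
```

In a real program the catalogs would ship alongside the executable instead of being generated at start-up; the temporary directory is only there to keep the sketch runnable on its own.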
Requirements
- The program must use Unicode for all string-related procedures
- The strings for each language must be loaded from a file
- The program must display the paths to the special folders on the system on which it is run (for the currently logged-on user):
- Program Files
- My Documents
- On POSIX systems, look for the home directory; if you're running a fancy desktop environment, get the paths to the Photos and Music folders
- The program must display the current date and time using the system's regional settings formatting
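The last two requirements can be approximated in Python with the standard library alone. This is a sketch that assumes the home directory is an acceptable stand-in for the special folders; the Windows-specific folders such as Program Files need the shell API and are not covered here. The helper names use_system_locale, current_datetime_text and home_directory are hypothetical.

```python
import locale
import time
from pathlib import Path


def use_system_locale():
    """Ask for the user's regional settings for date and time formatting.

    Time formatting starts out in the 'C' locale; passing an empty
    string selects the locale configured in the environment. Returns
    False and leaves the locale unchanged if that locale is unavailable.
    """
    try:
        locale.setlocale(locale.LC_TIME, "")
        return True
    except locale.Error:
        return False


def current_datetime_text():
    """Format the current date and time the way the active locale prefers."""
    # '%c' is the locale's own combined date-and-time representation.
    return time.strftime("%c")


def home_directory():
    """Return the current user's home directory as a string."""
    return str(Path.home())


if __name__ == "__main__":
    use_system_locale()
    print(home_directory())
    print(current_datetime_text())
```

The output depends on the machine's configuration, which is the point: no format string or path is spelled out in the program.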
Typical problems with localization
- Hard-coding strings into the code instead of loading them from an external resource
- Using ASCII instead of Unicode - this makes the program unable to display special characters if the operating system is not configured accordingly
- Special folder names, e.g. "Documents and Settings" or "Program Files", differ between Windows versions and display languages. Instead of hard-coding the path, use a function that determines the path to the folder on the current system, such as SHGetSpecialFolderLocation (or its successors SHGetFolderPath and SHGetKnownFolderPath)
- Byte-oriented string parsing functions can behave incorrectly on multibyte text in which a character's first byte is zero (NUL), as in UTF-16: the zero byte is taken as the end of the string
- Hard-coding symbols or formats into your parsing functions, ex:
- ',' as a separator in CSV files is sometimes replaced with ';' in other cultures
- Date and time formats
- Calendars - some cultures use different ones
- Units - some cultures use the metric system, but not all of them
- Hard-coding coordinates of widgets. Not only may this make the program look ugly on screens with a different resolution, it may also look entirely unnatural to people from other cultures (see 'Arabic interfaces' in the References)
- Using exact sizes when rendering windows, widgets or text may result in strings in a different language not fitting into the given space
- If strings are not managed in "one place to rule them all", there is a great chance that inconsistency will arise, ex: http://habrahabr.ru/blogs/ui_design_and_usability/70762/
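- Hard-coding the order of elements in a string: word order differs between languages, so fragments concatenated in a fixed order can produce broken sentences once translated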
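Two of the problems listed above are easy to reproduce in a few lines of Python. The sketch below first imitates a byte-oriented length function to show how a zero byte in UTF-16 text cuts the string short, then uses named placeholders so that each translation can arrange its elements freely. The function c_style_length, the TEMPLATES table and its sample sentences are illustrative assumptions.

```python
def c_style_length(data):
    """Mimic a byte-oriented strlen: count bytes up to the first zero byte."""
    return len(data.split(b"\x00", 1)[0])


# Hypothetical templates: the placeholders are named, so each translation
# can put them in whatever order its grammar requires.
TEMPLATES = {
    "en": "{user} deleted {count} files",
    "de": "{count} Dateien wurden von {user} gelöscht",
}


def render(lang, **values):
    """Fill the template for the given language with named values."""
    return TEMPLATES[lang].format(**values)


if __name__ == "__main__":
    encoded = "Hi".encode("utf-16-be")  # every ASCII letter gets a leading zero byte
    print(len(encoded), c_style_length(encoded))
    print(render("en", user="Ana", count=3))
    print(render("de", user="Ana", count=3))
```

With named placeholders the calling code passes the same values for every language, and only the template decides where they appear.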
References
- Babel - a Python localization library
- Pootle - Python-powered online translation tool (crowdsourcing)
- Transifex - A tool similar to Pootle, free to use for open-source projects
- Virtaal - offline translation tool
- "Writing Unicode-aware Applications in Python"
- Python Unicode howto
- Globalization step by step, a comprehensive guide from Microsoft that covers the pitfalls of localizing software
- Examples of Arabic user interfaces, note how everything is mirrored
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets, by Joel Spolsky
- BOM, the Byte order mark
- "Pragmatic Unicode, or, How do I stop the pain?"