i18n (Internationalization)


Case Study

Preparing nuclearfree.earth to be "world ready"


joseph.honton@nuclearfree.earth

December 2015




https://nuclearfree.earth

Hello

Halló (Icelandic)   Hej (Swedish)   Sveiki (Latvian)   Dia duit (Irish)   Helo (Welsh)   Bonjour (French)   Hola (Spanish)   Guten Tag (German)   Dobrý den (Czech)   Cześć (Polish)   Szia (Hungarian)   Pozdravi (Slovene)   Përshëndetje (Albanian)   Γεια σας (Greek)   Sannu (Hausa)   Ndewo (Igbo)   Hujambo (Swahili)   Здраво (Macedonian)   Здравствуйте (Ukrainian)   Добры дзень (Belarusian)   您好 (Chinese)   여보 (Korean)   今日は (Japanese)   مرحبا (Arabic)   הָלוֹ (Hebrew)   अभिवादन (Nepali)   नमस्ते (Hindi)   હેલો (Gujarati)   హలో (Telugu)   ಹಲೋ (Kannada)   வணக்கம் (Tamil)   ਸਤਿ ਸ੍ਰੀ ਅਕਾਲ (Punjabi)   हॅलो (Marathi)   হ্যালো (Bengali)   สวัสดี (Thai)   ສະບາຍດີ (Lao)   ជំរាបសួរ (Khmer)   Alô (Vietnamese)   Salam (Indonesian)   Kia ora (Maori)


Legacy charsets

7-bit ASCII: English
A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z    (plus 10 numerals, 33 punctuation marks, 33 communication controls)

8-bit ANSI Extended: European
Š  Œ  Ž  š  œ  ž  Ÿ  À  Á  Â  Ã  Ä  Å  Æ  Ç  È  É  Ê  Ë  Ì  Í  Î  Ï  Ð  Ñ  Ò  Ó  Ô  Õ  Ö  Ø  Ù  Ú  Û  Ü  Ý  Þ  ß  à  á  â  ã  ä  å  æ  ç  è  é  ê  ë  ì  í  î  ï  ð  ñ  ò  ó  ô  õ  ö  ø  ù  ú  û  ü  ý  þ  ÿ

ISO-8859-5: Cyrillic
А  Б  В  Г  Д  Е  Ж  З  И  Й  К  Л  М  Н  О  П  Р  С  Т  У  Ф  Х  Ц  Ч  Ш  Щ  Ъ  Ы  Ь  Э  Ю  Я  а  б  в  г  д  е  ж  з  и  й  к  л  м  н  о  п  р  с  т  у  ф  х  ц  ч  ш  щ  ъ  ы  ь  э  ю  я

ISO-8859-7: Greek
Α  Β  Γ  Δ  Ε  Ζ  Η  Θ  Ι  Κ  Λ  Μ  Ν  Ξ  Ο  Π  Ρ  Σ  Τ  Υ  Φ  Χ  Ψ  Ω  α  β  γ  δ  ε  ζ  η  θ  ι  κ  λ  μ  ν  ξ  ο  π  ρ  ς  σ  τ  υ  φ  χ  ψ  ω

Shift-JIS: Japanese
あ  い  う  え  お  か  き  く  け  こ  さ  し  す  せ  そ  た  ち  つ  て  と  な  に  ぬ  ね  の  は  ひ  ふ  へ  ほ  ま  み  む  め  も  ゃ  や  ゅ  ゆ  ょ  よ  ら  り  る  れ  ろ  わ  を  ん  ア  イ  ウ  エ  オ  カ  キ  ク  ケ  コ  サ  シ  ス  セ  ソ  タ  チ  ツ  テ  ト  ナ  ニ  ヌ  ネ  ノ  ハ  ヒ  フ  ヘ  ホ  マ  ミ  ム  メ  モ  ヤ  ユ  ヨ  ラ  リ  ル  レ  ロ  ヮ  ワ  ヲ  ン  (2524 kana and 6355 kanji)

Big5: Chinese
一  乙  丁  七  乃  九  了  二  人  儿  入  八  几  刀  刁  力  匕  十  卜  又  三  下  丈  上  丫  丸  凡  久  么  也  乞  于  亡  兀  刃  勺  千  叉  口  土  士  夕  大  女  子  孑  孓  寸  小  尢  尸  山  川  工  己  已  巳  巾  干  廾  弋  弓  才  丑  丐  不  中  丰  丹  之  尹  予  云  井  互  五  亢  仁  什  仃  仆  仇  仍  今  介  仄  元  允  內  六  兮  公  冗  凶  分  切  刈  勻  勾  勿  化  匹  午  升  卅  卞  厄  友  及  反  壬  天  夫  太  夭  孔  少  尤  尺  屯  巴  幻  廿  弔  引  心  戈  戶  手  扎  支  文  斗  斤  方  日  曰  月  木  欠  止  歹  毋  比  毛  氏  水  火  爪  父  爻  片  牙  牛  犬  王  丙  (4808 common plus 6334 supplemental)


Unicode

ISO-10646 + properties and algorithms

Version 8.0 has 120,737 characters

UTF-8

  • 1 codepoint ➜ 1, 2, 3, or 4 bytes
  • range is 0x0000 to 0x10FFFF
  • The default for XML

UCS-2

  • 1 codepoint ➜ 2 bytes
  • range is 0x0000 to 0xFFFF
  • The historical default for JavaScript, whose strings are sequences of 16-bit code units

UTF-16

  • 1 codepoint ➜ 2 or 4 bytes
  • range is 0x0000 to 0x10FFFF

UTF-32

  • 1 codepoint ➜ 4 bytes
  • range is 0x0000 to 0x10FFFF
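
The byte counts listed for each encoding can be checked directly. The following is a minimal Python sketch using only the standard codecs; the sample characters and the function name `encoded_lengths` are illustrative choices, not part of the original slides.

```python
# Report how many bytes one character needs in each Unicode encoding form.
SAMPLES = ["A", "é", "日", "😀"]  # U+0041, U+00E9, U+65E5, U+1F600


def encoded_lengths(ch: str) -> dict:
    """Return the byte length of a single character in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The -le codecs omit the byte order mark, so only the character is counted.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

The last sample lies above 0xFFFF, so it cannot be represented in UCS-2 and needs a surrogate pair (4 bytes) in UTF-16.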

UTF-8 IETF RFC 3629

Front-end

HTML: <meta charset="UTF-8">
CSS: @charset "UTF-8";
JavaScript: <script src="file.js" charset="UTF-8"></script>

Network

Apache httpd.conf: AddDefaultCharset UTF-8
HTTP header: Content-Type: text/html; charset=UTF-8

Back-end

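MySQL my.cnf
# utf8mb4 covers all of UTF-8; MySQL's utf8 stores at most 3 bytes per character
[client]
default-character-set = utf8mb4
[mysqld]
character-set-server = utf8mb4
collation-server = utf8mb4_general_ci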
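
Every layer has to agree on UTF-8 because a single mismatch corrupts the text. The sketch below simulates one layer writing UTF-8 and another reading it with a legacy charset; the function name `misread` and the choice of Latin-1 as the wrong charset are assumptions made for illustration.

```python
def misread(text: str, wrong_charset: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode the bytes with a different charset."""
    return text.encode("utf-8").decode(wrong_charset)


# The two UTF-8 bytes of the accented letter arrive as two unrelated characters.
print(misread("Halló"))

# When both sides use UTF-8 the text survives the round trip unchanged.
print("Halló".encode("utf-8").decode("utf-8"))
```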

Language codes

ISO-639-1 two-letter codes

is Icelandic   no Norwegian   sv Swedish   fi Finnish   da Danish   et Estonian   lv Latvian   lt Lithuanian   ga Irish   gl Galician   cy Welsh   nl Dutch   de German   en English   eo Esperanto   ht Haitian Creole   fr French   ca Catalan   eu Basque   es Spanish   pt Portuguese   it Italian   mt Maltese   la Latin   pl Polish   cs Czech   sk Slovak   hu Hungarian   sl Slovene   sq Albanian   el Greek   tr Turkish   hr Croatian   bs Bosnian   sr Serbian   mk Macedonian   ro Romanian   bg Bulgarian   be Belarusian   uk Ukrainian   ru Russian   mn Mongolian   ka Georgian   hy Armenian   az Azerbaijani   yi Yiddish   he Hebrew   fa Farsi   ur Urdu   ar Arabic   so Somali   ha Hausa   ig Igbo   sw Swahili   af Afrikaans
ne Nepali   pa Punjabi   hi Hindi   gu Gujarati   mr Marathi   te Telugu   kn Kannada   ta Tamil   bn Bengali
zh Chinese   ko Korean   ja Japanese   th Thai   lo Lao   km Khmer   vi Vietnamese   ms Malay   id Indonesian   jv Javanese   mi Maori

ISO-639-2 three-letter codes

hmn Hmong   fil Filipino   ceb Cebuano


Script codes

ISO-15924 four-letter codes

Arab Arabic   Armn Armenian   Beng Bengali   Cyrl Cyrillic   Deva Devanagari   Geor Georgian   Grek Greek   Gujr Gujarati   Guru Gurmukhī   Hang Hangul   Hans Han simplified   Hant Han traditional   Hebr Hebrew   Jpan Japanese   Khmr Khmer   Knda Kannada
Laoo Lao   Latn Latin   Taml Tamil   Thai Thai


Language tags

IETF RFC 5646

Script subtags are required for languages written in more than one script

Bosnian: bs-Cyrl  bs-Latn

Serbian: sr-Cyrl  sr-Latn

Azerbaijani: az-Latn  az-Cyrl  az-Arab

Chinese: zh-Hans  zh-Hant

Script subtags should be omitted when the script is unambiguous

English: en, not en-Latn

Russian: ru, not ru-Cyrl

Greek: el, not el-Grek

Hebrew: he, not he-Hebr
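
The rule of keeping the script subtag only when it disambiguates can be sketched as a small helper. This is an illustration under stated assumptions: the `DEFAULT_SCRIPT` table holds just the four languages named on this slide, and the function name `language_tag` is hypothetical.

```python
# Hypothetical table: languages whose script is unambiguous, taken from the slide.
DEFAULT_SCRIPT = {"en": "Latn", "ru": "Cyrl", "el": "Grek", "he": "Hebr"}


def language_tag(language: str, script: str = "") -> str:
    """Build a language tag, dropping the script subtag when it is redundant."""
    language = language.lower()   # language subtags are conventionally lowercase
    script = script.title()       # script subtags are conventionally title case
    if not script or DEFAULT_SCRIPT.get(language) == script:
        return language
    return f"{language}-{script}"


print(language_tag("en", "Latn"))
print(language_tag("sr", "Cyrl"))
print(language_tag("zh", "hans"))
```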


"Private use" tags

IETF RFC 5646 allows language tags to be extended with private-use subtags, introduced by the singleton x

Here, writing mode is indicated using extended keywords:

Sero Garo Tate Yoko Zong Heng

ko-x-Sero  한국어 vertical
ko-x-Garo  한국어 horizontal
ja-x-Tate  日本語 vertical
ja-x-Yoko  日本語 horizontal
zh-Hans-x-Zong  简体中文 vertical
zh-Hans-x-Heng  简体中文 horizontal
zh-Hant-x-Zong  繁體中文 vertical
zh-Hant-x-Heng  繁體中文 horizontal
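
A tag carrying one of these private-use keywords can be taken apart with a few string operations. The sketch below is a simplification, not a full RFC 5646 parser: it assumes the tag uses a lowercase `-x-` separator and at most one script subtag, and the names `split_tag` and `writing_mode` are hypothetical.

```python
# Keyword-to-mode table taken from the six keywords listed on the slide.
WRITING_MODE = {
    "sero": "vertical", "garo": "horizontal",   # Korean
    "tate": "vertical", "yoko": "horizontal",   # Japanese
    "zong": "vertical", "heng": "horizontal",   # Chinese
}


def split_tag(tag: str) -> tuple:
    """Split a tag such as 'zh-Hans-x-Zong' into (language, script, private)."""
    public, _, private = tag.partition("-x-")
    parts = public.split("-")
    language = parts[0].lower()
    # A four-letter second subtag is treated as an ISO 15924 script code.
    script = parts[1].title() if len(parts) > 1 and len(parts[1]) == 4 else ""
    return language, script, private


def writing_mode(tag: str) -> str:
    """Return 'vertical' or 'horizontal'; tags without a keyword are horizontal."""
    _, _, private = split_tag(tag)
    return WRITING_MODE.get(private.lower(), "horizontal")


print(split_tag("zh-Hans-x-Zong"), writing_mode("zh-Hans-x-Zong"))
print(split_tag("ja-x-Yoko"), writing_mode("ja-x-Yoko"))
```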

Country codes

ISO-3166 two-letter codes

English

US United States   IN India   PK Pakistan   NG Nigeria   GB United Kingdom   CA Canada   AU Australia

Portuguese

PT Portugal   BR Brazil

RFC 5646 says "region" codes can be omitted when not needed, leaving just ...

en  pt  pt-BR


DNS case insensitivity IETF RFC 4343

af.nuclearfree.earth. IN A 104.131.149.101
; Afrikaans

az-latn.nuclearfree.earth. IN A 104.131.149.101
; Azərbaycanca (Azerbaijani latin)
az-cyrl.nuclearfree.earth. IN A 104.131.149.101
; Азәрбајҹан дили (Azerbaijani cyrillic)
az-arab.nuclearfree.earth. IN A 104.131.149.101
; آذریجه (Azerbaijani arabic)

ja-yoko.nuclearfree.earth. IN A 104.131.149.101
; 横書き日本語 (Japanese yokogaki) horizontal
ja-tate.nuclearfree.earth. IN A 104.131.149.101
; 縦書き日本語 (Japanese tategaki) vertical

pt.nuclearfree.earth. IN A 104.131.149.101
; Português (Portuguese)
pt-br.nuclearfree.earth. IN A 104.131.149.101
; Português (Brazilian Portuguese)
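
Because DNS names are case insensitive, a mixed-case language tag can be lowercased to form the subdomain. The sketch below follows the zone entries above; dropping the private-use singleton `x` is inferred from names such as `ja-tate`, and the function name `host_for` is hypothetical.

```python
DOMAIN = "nuclearfree.earth"


def host_for(tag: str) -> str:
    """Map a language tag to its subdomain, e.g. 'az-Latn' to 'az-latn.<domain>'."""
    # Lowercase every subtag and drop the private-use singleton 'x'.
    labels = [part for part in tag.lower().split("-") if part != "x"]
    return "-".join(labels) + "." + DOMAIN


for tag in ("af", "az-Latn", "ja-x-Tate", "pt-BR"):
    print(tag, "->", host_for(tag))
```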

HTTP content negotiation RFC 7231

1. Browser request

GET / HTTP/1.1
Host: nuclearfree.earth
Accept-Language: de,en;q=0.7,en-us;q=0.3

2. Server response

HTTP/1.1 303 See Other
Location: https://de.nuclearfree.earth/
Content-Language: de

3. Browser follows the redirect

GET / HTTP/1.1
Host: de.nuclearfree.earth
Accept-Language: de,en;q=0.7,en-us;q=0.3

4. Server response

HTTP/1.1 200 OK
Content-Language: de
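
The server-side choice in this exchange can be sketched as follows. This is a simplified reading of the Accept-Language header, not the site's actual implementation: it ignores wildcards, and the function names and the set of available languages are assumptions.

```python
def parse_accept_language(header: str) -> list:
    """Return (tag, q) pairs sorted by descending quality; q defaults to 1.0."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_language(header: str, available: set, default: str = "en") -> str:
    """Pick the best available language, falling back to the primary subtag."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in available:
            return tag
        primary = tag.split("-")[0]
        if primary in available:
            return primary
    return default


header = "de,en;q=0.7,en-us;q=0.3"
print(parse_accept_language(header))
print(choose_language(header, {"de", "en", "pt-br"}))
```

The chosen tag would then go into the Location header of the 303 response.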

Writing directions

Modern languages with right-to-left reading order:

Arabic, Azerbaijani (in Arabic script), Farsi, Urdu, Yiddish and Hebrew

<!DOCTYPE html>
<html lang='he' dir='RTL'>
...
</html>

All others can omit the dir attribute

<!DOCTYPE html>
<html lang='en'>
...
</html>
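
Choosing whether to emit the dir attribute can be automated from the language tag. The following sketch uses the languages listed on this slide; the tables and the names `direction` and `html_open_tag` are assumptions, and an explicit script subtag is given priority over the language.

```python
# Languages and scripts with right-to-left reading order, per the list above.
RTL_LANGUAGES = {"ar", "fa", "ur", "yi", "he"}
RTL_SCRIPTS = {"arab", "hebr"}


def direction(tag: str) -> str:
    """Return 'rtl' or 'ltr' for a language tag such as 'he' or 'az-Arab'."""
    public = tag.lower().split("-x-")[0]          # ignore private-use subtags
    parts = public.split("-")
    scripts = [p for p in parts[1:] if len(p) == 4 and p.isalpha()]
    if scripts:                                   # an explicit script decides
        return "rtl" if scripts[0] in RTL_SCRIPTS else "ltr"
    return "rtl" if parts[0] in RTL_LANGUAGES else "ltr"


def html_open_tag(tag: str) -> str:
    """Emit the opening html tag, adding dir only for right-to-left languages."""
    # HTML treats the dir value case-insensitively, so 'rtl' equals 'RTL'.
    if direction(tag) == "rtl":
        return f"<html lang='{tag}' dir='rtl'>"
    return f"<html lang='{tag}'>"


for tag in ("he", "az-Arab", "az-Latn", "en"):
    print(html_open_tag(tag))
```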

Unicode bidirectional algorithm


English text

And remember, this occurred 25 years after Chernobyl, with ample time to learn from previous mistakes.


Arabic with <p dir='RTL'> (correct)

وتذكر، وهذا حدث بعد 25 سنة من Chernobyl ، مع متسع من الوقت للتعلم من الأخطاء السابقة


Arabic with <p dir='LTR'> (wrong)

وتذكر، وهذا حدث بعد 25 سنة من Chernobyl ، مع متسع من الوقت للتعلم من الأخطاء السابقة
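
The bidirectional algorithm works from a class assigned to every character, which Python exposes through the standard `unicodedata` module. The sketch below shows those classes and a simplified "first strong character" test for the base direction of a paragraph; it ignores isolates and embeddings, and the function names are illustrative.

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Return (character, bidirectional class) pairs, skipping whitespace."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text if not ch.isspace()]


def base_direction(text: str) -> str:
    """Guess the paragraph direction from the first strongly directional character."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":               # strong left-to-right (e.g. Latin letters)
            return "ltr"
        if cls in ("R", "AL"):       # strong right-to-left (Hebrew, Arabic letters)
            return "rtl"
    return "ltr"                     # no strong character found


sample = "بعد 25 سنة من Chernobyl"
print(bidi_classes(sample))
print(base_direction(sample))
```

Digits are weak characters, which is why "25" keeps its left-to-right order inside the Arabic sentence, while the Latin word is a strong left-to-right run embedded in a right-to-left paragraph.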



Writing modes

Modern languages that can be written vertically, with lines ordered right to left:

Chinese, Japanese, Korean

<!DOCTYPE html>
<html lang='ja-x-Tate' class='cs_tate'>
<head>
<style>
.cs_tate body {
  -webkit-writing-mode: vertical-rl;
  -moz-writing-mode: vertical-rl;
  -ms-writing-mode: tb-rl;
  writing-mode: vertical-rl;
}
</style>
</head>
...
</html>

All others use { writing-mode: horizontal-tb; } which can be omitted


Rubies: Japanese furigana


English text

And remember, this occurred 25 years after Chernobyl, with ample time to learn from previous mistakes.


Japanese text

そして、覚えて、これは、以前の過ちから学ぶために十分な時間で、25年 Chernobyl（チェルノブイリ）後に発生した。
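
In HTML, the small annotation over "Chernobyl" is produced with the ruby, rt and rp elements. The helper below is a minimal sketch of generating that markup; the function name `ruby` is hypothetical, and the rp elements supply parentheses for browsers that do not render ruby text.

```python
from html import escape


def ruby(base: str, reading: str) -> str:
    """Wrap base text and its reading in HTML ruby markup with rp fallbacks."""
    return (
        f"<ruby>{escape(base)}"
        f"<rp>(</rp><rt>{escape(reading)}</rt><rp>)</rp>"
        f"</ruby>"
    )


print(ruby("Chernobyl", "チェルノブイリ"))
```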


Languages without spaces: line wrapping

Chinese, Japanese, Korean

These three can break between almost any two characters

The CSS defaults { word-break: normal; word-wrap: normal; } are sufficient and can be omitted


Lao

Written as one continuous script

Use CSS { word-break: break-all; }


Numbered lists

Browser support is spotty for list numbering styles other than 1, 2, 3 ...

list-style-type: hebrew;
list-style-type: armenian;
list-style-type: cjk-ideographic;
list-style-type: arabic-indic;
list-style-type: lao;
list-style-type: khmer;
list-style-type: georgian;
list-style-type: thai;
list-style-type: persian;
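
Where a browser does not support one of these styles, list numbers can be generated ahead of time. The sketch below covers only the purely decimal styles, whose digits occupy ten consecutive code points; hebrew, armenian, georgian and cjk-ideographic use different numbering systems and are not handled. The table and the function name `localized_number` are assumptions.

```python
# Code point of the digit zero in each decimal digit set.
DIGIT_ZERO = {
    "arabic-indic": 0x0660,
    "persian": 0x06F0,
    "thai": 0x0E50,
    "lao": 0x0ED0,
    "khmer": 0x17E0,
}


def localized_number(n: int, style: str) -> str:
    """Render a non-negative integer with the digits of the given list style."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    zero = DIGIT_ZERO[style]
    return "".join(chr(zero + int(digit)) for digit in str(n))


for style in DIGIT_ZERO:
    print(style, [localized_number(i, style) for i in (1, 2, 3, 10)])
```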

Dates

General rule: use the Gregorian calendar

Use digits for the day and year; spell out the month

Afrikaans: Januarie, Februarie, Maart, April, Mei, Junie, Julie, Augustus, September, Oktober, November, Desember
Cebuano: Enero, Pebrero, Marso, Abril, Mayo, Hunyo, Hulyo, Agosto, Septiyembre, Oktubre, November, December
Greek: Ιανουάριος, Φεβρουάριος, Μάρτιος, Απρίλιος, Μάιος, Ιούνιος, Ιούλιος, Αύγουστος, Σεπτέμβριος, Οκτώβριος, Νοέμβριος, Δεκέμβριος
Galician: Xaneiro, Febreiro, Marzo, Abril, Maio, Xuño, Xullo, Agosto, Setembro, Outubro, Novembro, Decembro
Gujarati: જાન્યુઆરી, ફેબ્રુઆરી, માર્ચ, એપ્રિલ, મે, જૂન, જુલાઇ, ઓગસ્ટ, સપ્ટેમ્બર, ઓકટોબર, નવેમ્બર, ડિસેમ્બર
Hindi: जनवरी, फ़रवरी, मार्च, अप्रैल, मई, जून, जुलाई, अगस्त, सितम्बर, अक्टूबर, नवम्बर, दिसम्बर
Italian: Gennaio, Febbraio, Marzo, Aprile, Maggio, Giugno, Luglio, Agosto, Settembre, Ottobre, Novembre, Dicembre
Japanese: 1月, 2月, 3月, 4月, 5月, 6月, 7月, 8月, 9月, 10月, 11月, 12月
Georgian: იანვარი, თებერვალი, მარტი, აპრილი, მაისი, ივნისი, ივლისი, აგვისტო, სექტემბერი, ოქტომბერი, ნოემბერი, დეკემბერი
Korean: 1월, 2월, 3월, 4월, 5월, 6월, 7월, 8월, 9월, 10월, 11월, 12월
Russian: январь, февраль, март, апрель, май, июнь, июль, август, сентябрь, октябрь, ноябрь, декабрь
Vietnamese: Tháng Một, Tháng Hai, Tháng Ba, Tháng Tư, Tháng Năm, Tháng Sáu, Tháng Bảy, Tháng Tám, Tháng Chín, Tháng Mười, Tháng Mười Một, Tháng Mười Hai
Chinese: 一月, 二月, 三月, 四月, 五月, 六月, 七月, 八月, 九月, 十月, 十一月, 十二月
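
The rule of digits for day and year plus a spelled-out month can be sketched with a lookup table. The month names below are copied from three of the rows above; the language keys, the per-language patterns (in particular the year-month-day order for Japanese) and the function name `format_date` are assumptions made for illustration.

```python
from datetime import date

MONTHS = {
    "af": ["Januarie", "Februarie", "Maart", "April", "Mei", "Junie", "Julie",
           "Augustus", "September", "Oktober", "November", "Desember"],
    "it": ["Gennaio", "Febbraio", "Marzo", "Aprile", "Maggio", "Giugno", "Luglio",
           "Agosto", "Settembre", "Ottobre", "Novembre", "Dicembre"],
    "ja": ["1月", "2月", "3月", "4月", "5月", "6月",
           "7月", "8月", "9月", "10月", "11月", "12月"],
}

# Hypothetical patterns: the order of day, month and year varies by language.
PATTERNS = {
    "af": "{day} {month} {year}",
    "it": "{day} {month} {year}",
    "ja": "{year}年{month}{day}日",
}


def format_date(d: date, lang: str) -> str:
    """Format a Gregorian date with numeric day and year and a spelled-out month."""
    month = MONTHS[lang][d.month - 1]
    return PATTERNS[lang].format(day=d.day, month=month, year=d.year)


for lang in MONTHS:
    print(lang, format_date(date(2015, 12, 1), lang))
```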

Modern alternatives: Luach (Hebrew), Hijri (Islamic), Saka (Indian), Yin (Chinese lunar)


Responsive design

Many of the world's newest Internet users have only cellphones


<!DOCTYPE html>
<html lang='ig'>
<head>
<meta charset='UTF-8' />
<meta name='viewport' content='width=device-width' />
...

Specifications


Can you help to translate the Nuclear Free Pledge into your native language?

github.com/joehonton/nuclearfree-i18n


joseph.honton@nuclearfree.earth





- 30 -
