Case Study
Preparing nuclearfree.earth to be "world ready"
joseph.honton@nuclearfree.earth
December 2015
7-bit ASCII: English
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z (plus 10 numerals, 33 punctuation marks, 33 communication controls)
8-bit ANSI Extended: European
Š Œ Ž š œ ž Ÿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý þ ÿ
ISO-8859-5: Cyrillic
А Б В Г Д Е Ж З И Й К Л М Н О П Р С Т У Ф Х Ц Ч Ш Щ Ъ Ы Ь Э Ю Я а б в г д е ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я
ISO-8859-7: Greek
Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω
Shift-JIS: Japanese
あ い う え お か き く け こ さ し す せ そ た ち つ て と な に ぬ ね の は ひ ふ へ ほ ま み む め も ゃ や ゅ ゆ ょ よ ら り る れ ろ わ を ん ア イ ウ エ オ カ キ ク ケ コ サ シ ス セ ソ タ チ ツ テ ト ナ ニ ヌ ネ ノ ハ ヒ フ ヘ ホ マ ミ ム メ モ ヤ ユ ヨ ラ リ ル レ ロ ヮ ワ ヲ ン (2524 kana and 6355 kanji)
Big5: Chinese
一 乙 丁 七 乃 九 了 二 人 儿 入 八 几 刀 刁 力 匕 十 卜 又 三 下 丈 上 丫 丸 凡 久 么 也 乞 于 亡 兀 刃 勺 千 叉 口 土 士 夕 大 女 子 孑 孓 寸 小 尢 尸 山 川 工 己 已 巳 巾 干 廾 弋 弓 才 丑 丐 不 中 丰 丹 之 尹 予 云 井 互 五 亢 仁 什 仃 仆 仇 仍 今 介 仄 元 允 內 六 兮 公 冗 凶 分 切 刈 勻 勾 勿 化 匹 午 升 卅 卞 厄 友 及 反 壬 天 夫 太 夭 孔 少 尤 尺 屯 巴 幻 廿 弔 引 心 戈 戶 手 扎 支 文 斗 斤 方 日 曰 月 木 欠 止 歹 毋 比 毛 氏 水 火 爪 父 爻 片 牙 牛 犬 王 丙 (4808 common plus 6334 supplemental)
Unicode: ISO-10646 + properties and algorithms
Unicode version 8.0 has 120,737 characters
UTF-8 | UCS-2 | UTF-16 | UTF-32
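To make the size differences between these encoding forms concrete, the following sketch encodes a few sample characters with Python's built-in codecs and counts the bytes. UCS-2 is left out because Python ships no codec for it; it matches UTF-16 for characters in the Basic Multilingual Plane and cannot represent anything beyond it. The sample characters and the helper name encoded_sizes are illustrative choices, not taken from the site.

```python
# Bytes needed per character in three Unicode encoding forms.
SAMPLES = ["A", "é", "あ", "😀"]  # ASCII, accented Latin, kana, beyond the BMP


def encoded_sizes(ch):
    """Return byte counts for one character in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        # The "-le" codecs skip the byte order mark that "utf-16"/"utf-32" prepend.
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in SAMPLES:
    # Print the code point rather than the glyph so any terminal can show it.
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-8 keeps ASCII at one byte per character, while UTF-32 spends four bytes on everything.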
Front-end
HTML <meta charset="UTF-8">
CSS @charset "UTF-8";
JavaScript <script src="file.js" charset="UTF-8">
Network
Apache httpd.conf AddDefaultCharset UTF-8
HTTP header Content-Type: text/html; charset=UTF-8
Back-end
MySQL my.cnf
default-character-set = utf8
character-set-server = utf8
collation-server = utf8_general_ci
ISO-639-1 two-letter codes
ISO-639-2 three-letter codes
ISO-15924 four-letter codes
IETF RFC 5646
Required for languages with more than one writing system
Bosnian: bs-Cyrl bs-Latn
Serbian: sr-Cyrl sr-Latn
Azerbaijani: az-Latn az-Cyrl az-Arab
Chinese: zh-Hans zh-Hant
Should be omitted when unambiguous
English: en-Latn
Russian: ru-Cyrl
Greek: el-Grek
Hebrew: he-Hebr
IETF RFC 5646 allows language tags to be extended
Here, writing mode is indicated using extended keywords:
ISO-3166 two-letter codes
English
Portuguese
RFC 5646 says "region" codes can be omitted when not needed, leaving just ...
en pt pt-BR
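The language, script, region, and private-use pieces of a tag can be separated mechanically. The sketch below handles only the common shape language[-Script][-REGION][-x-private]; it is not a full RFC 5646 parser, and the function name parse_tag is a hypothetical helper introduced for illustration.

```python
def parse_tag(tag):
    """Split a simple language tag into language, script, region and private-use parts."""
    subtags = tag.split("-")
    parts = {
        "language": subtags[0].lower(),
        "script": None,
        "region": None,
        "private": [],
    }
    for i, sub in enumerate(subtags[1:], start=1):
        if sub.lower() == "x":
            # Everything after the "x" singleton is private use, e.g. ja-x-Tate.
            parts["private"] = [s.lower() for s in subtags[i + 1:]]
            break
        if (len(sub) == 4 and sub.isalpha()
                and parts["script"] is None and parts["region"] is None):
            parts["script"] = sub.title()   # script subtags: four letters, title case
        elif (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit()):
            parts["region"] = sub.upper()   # region subtags: two letters or three digits
    return parts


for example in ("az-Latn", "zh-Hant", "pt-BR", "ja-x-Tate", "en"):
    print(example, parse_tag(example))
```

Normalizing case on the way in means that az-latn, AZ-LATN, and az-Latn all yield the same parts.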
af.nuclearfree.earth. IN A 104.131.149.101
; Afrikaans
az-latn.nuclearfree.earth. IN A 104.131.149.101
; Azərbaycanca (Azerbaijani Latin)
az-cyrl.nuclearfree.earth. IN A 104.131.149.101
; Азәрбајҹан дили (Azerbaijani Cyrillic)
az-arab.nuclearfree.earth. IN A 104.131.149.101
; آذریجه (Azerbaijani Arabic)
ja-yoko.nuclearfree.earth. IN A 104.131.149.101
; 横書き日本語 (Japanese yokogaki) horizontal
ja-tate.nuclearfree.earth. IN A 104.131.149.101
; 縦書き日本語 (Japanese tategaki) vertical
pt.nuclearfree.earth. IN A 104.131.149.101
; Português (Portuguese)
pt-br.nuclearfree.earth. IN A 104.131.149.101
; Português (Brazilian Portuguese)
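Because DNS names are case-insensitive, a language tag can be turned into a subdomain simply by lowercasing it. The sketch below shows that mapping in both directions; host_for and tag_from_host are hypothetical helper names, and only the base domain comes from the records above.

```python
BASE_DOMAIN = "nuclearfree.earth"


def host_for(tag):
    """Map a language tag such as 'az-Latn' to its subdomain."""
    return f"{tag.lower()}.{BASE_DOMAIN}"


def tag_from_host(host):
    """Recover the lowercase language label from a hostname, or None for the bare domain."""
    host = host.lower().rstrip(".")  # tolerate the trailing dot used in zone files
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix):
        return host[: -len(suffix)]
    return None


print(host_for("az-Latn"))
print(tag_from_host("PT-BR.nuclearfree.earth."))
```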
1. Browser request
GET / HTTP/1.1
Host: nuclearfree.earth
Accept-Language: de,en;q=0.7,en-us;q=0.3
2. Server response
HTTP/1.1 303 See Other
Content-Language: de
Location: https://de.nuclearfree.earth/
3. Browser follows the redirect
GET / HTTP/1.1
Host: de.nuclearfree.earth
Accept-Language: de,en;q=0.7,en-us;q=0.3
4. Server response
HTTP/1.1 200 OK
Content-Language: de
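The server-side choice in this exchange can be sketched as follows: parse the Accept-Language header into weighted preferences, pick the first one the site supports, and build the redirect target. This is a minimal illustration, not the site's actual code; the function names, the supported set, and the fallback to "en" are all assumptions.

```python
def parse_accept_language(header):
    """Return (tag, q) pairs sorted by descending q; a missing q counts as 1.0."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep the order the browser sent.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_language(header, supported, default="en"):
    """Pick the best supported tag, falling back from 'en-us' to 'en' when needed."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]
        if primary in supported:
            return primary
    return default


def redirect_location(header, supported):
    """Build the Location value for the 303 response."""
    return f"https://{choose_language(header, supported)}.nuclearfree.earth/"


SUPPORTED = {"de", "en", "pt", "pt-br"}  # hypothetical subset of the site's languages
print(redirect_location("de,en;q=0.7,en-us;q=0.3", SUPPORTED))
```

With the header from the example request, German wins because it carries the implicit weight of 1.0.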
Modern languages with right-to-left reading order:
Arabic, Azerbaijani (in Arabic script), Farsi, Urdu, Yiddish and Hebrew
<!DOCTYPE html>
<html lang='he' dir='RTL'>
...
</html>
All others can omit the dir attribute
<!DOCTYPE html>
<html lang='en'>
...
</html>
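The rule above, emit dir only for right-to-left languages, can be sketched as a small template helper. The language and script sets below cover only the languages named on this slide, and the function names are hypothetical; the attribute value is written in lowercase, which HTML treats the same as 'RTL'.

```python
RTL_LANGUAGES = {"ar", "fa", "ur", "yi", "he"}
RTL_SCRIPTS = {"arab", "hebr"}


def text_direction(tag):
    """Return 'rtl' or 'ltr' for a language tag such as 'he' or 'az-Arab'."""
    subtags = tag.lower().split("-")
    for sub in subtags[1:]:
        if sub == "x":
            break  # private-use subtags say nothing about the script
        if len(sub) == 4 and sub.isalpha():
            # An explicit script subtag overrides the language default.
            return "rtl" if sub in RTL_SCRIPTS else "ltr"
    return "rtl" if subtags[0] in RTL_LANGUAGES else "ltr"


def html_open_tag(tag):
    """Build the opening <html> tag, adding dir only when it is needed."""
    if text_direction(tag) == "rtl":
        return f"<html lang='{tag}' dir='rtl'>"
    return f"<html lang='{tag}'>"


for example in ("he", "en", "az-Arab", "az-Latn"):
    print(html_open_tag(example))
```

Checking the script subtag first is what lets Azerbaijani come out right-to-left only when it is written in Arabic script.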
English text
And remember, this occurred 25 years after Chernobyl, with ample time to learn from previous mistakes.
Arabic with <p dir='RTL'> (correct)
وتذكر، وهذا حدث بعد 25 سنة من Chernobyl ، مع متسع من الوقت للتعلم من الأخطاء السابقة
Arabic with <p dir='LTR'> (wrong)
وتذكر، وهذا حدث بعد 25 سنة من Chernobyl ، مع متسع من الوقت للتعلم من الأخطاء السابقة
Modern languages with vertical right-to-left reading order:
Chinese, Japanese, Korean
<!DOCTYPE html>
<html lang='ja-x-Tate' class='cs_tate'>
<style>
.cs_tate body {
-webkit-writing-mode: vertical-rl;
-moz-writing-mode: vertical-rl;
-ms-writing-mode: tb-rl;
writing-mode: vertical-rl; }
</style>
...
</html>
All others use { writing-mode: horizontal-tb; }
which can be omitted
English text
And remember, this occurred 25 years after Chernobyl, with ample time to learn from previous mistakes.
Japanese text
そして、覚えて、これは、以前の過ちから学ぶために十分な時間で、25年
Chinese, Japanese, Korean
These three can break between almost any two characters
Omit CSS {
word-break: normal; word-wrap: normal; }
Lao
Written without spaces between words
Use CSS { word-break: break-all; }
Spotty support for numbered lists other than 1, 2, 3 ...
list-style-type: hebrew;
list-style-type: armenian;
list-style-type: cjk-ideographic;
list-style-type: arabic-indic;
list-style-type: lao;
list-style-type: khmer;
list-style-type: georgian;
list-style-type: thai;
list-style-type: persian;
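Where a browser ignores one of these list styles, the counter text can be generated on the server instead. The sketch below covers only the styles that are plain decimal digit substitutions (arabic-indic, persian, thai, lao, khmer); hebrew, armenian, georgian, and cjk-ideographic use non-positional systems and are not handled. The table and function name are illustrative assumptions.

```python
# Code point of the digit zero in each script's decimal digit block.
DIGIT_ZERO = {
    "arabic-indic": 0x0660,
    "persian": 0x06F0,
    "thai": 0x0E50,
    "lao": 0x0ED0,
    "khmer": 0x17E0,
}


def localized_number(n, style):
    """Render a non-negative integer using the digits of the given list style."""
    zero = DIGIT_ZERO[style]
    return "".join(chr(zero + int(digit)) for digit in str(n))


for style in DIGIT_ZERO:
    # Show escaped code points so the output is readable on any terminal.
    print(style, localized_number(25, style).encode("unicode_escape").decode("ascii"))
```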
General rule: use the Gregorian calendar
Use digits for the day and year; spell out the month
'Afrikaans', 'Januarie', 'Februarie', 'Maart','April','Mei','Junie','Julie','Augustus','September','Oktober','November','Desember'),
'Cebuano', 'Enero', 'Pebrero', 'Marso','Abril','Mayo','Hunyo','Hulyo','Agosto','Septiyembre','Oktubre','November','December'),
'Greek', 'Ιανουάριος', 'Φεβρουάριος', 'Μάρτιος','Απρίλιος','Μάιος','Ιούνιος','Ιούλιος','Αύγουστος','Σεπτέμβριος','Οκτώβριος','Νοέμβριος','Δεκέμβριος'),
'Galician', 'Xaneiro', 'Febreiro', 'Marzo','Abril','Maio','Xuño','Xullo','Agosto','Setembro','Outubro','Novembro','Decembro'),
'Gujarati', 'જાન્યુઆરી', 'ફેબ્રુઆરી', 'માર્ચ','એપ્રિલ','મે','જૂન','જુલાઇ','ઓગસ્ટ','સપ્ટેમ્બર','ઓકટોબર','નવેમ્બર','ડિસેમ્બર'),
'Hindi', 'जनवरी', 'फ़रवरी', 'मार्च','अप्रैल','मई','जून','जुलाई','अगस्त','सितम्बर','अक्टूबर','नवम्बर','दिसम्बर'),
'Italian', 'Gennaio', 'Febbraio', 'Marzo','Aprile','Maggio','Giugno','Luglio','Agosto','Settembre','Ottobre','Novembre','Dicembre'),
'Japanese', '1月', '2月', '3月','4月','5月','6月','7月','8月','9月','10月','11月','12月'),
'Georgian', 'იანვარი', 'თებერვალი', 'მარტი','აპრილი','მაისი','ივნისი','ივლისი','აგვისტო','სექტემბერი','ოქტომბერი','ნოემბერი','დეკემბერი'),
'Korean', '1월', '2월', '3월','4월','5월','6월','7월','8월','9월','10월','11월','12월'),
'Russian', 'январь', 'февраль', 'март','апрель','май','июнь','июль','август','сентябрь','октябрь','ноябрь','декабрь'),
'Vietnamese','Tháng Một', 'Tháng Hai', 'Tháng Ba','Tháng Tư','Tháng Năm','Tháng Sáu','Tháng Bảy','Tháng Tám','Tháng Chín','Tháng Mười','Tháng Mười Một','Tháng Mười Hai'),
'Chinese', '一月', '二月', '三月','四月','五月','六月','七月','八月','九月','十月','十一月','十二月'),
Modern alternatives: Luach (Hebrew), Hijri (Islamic), Saka (Indian), Yin (Chinese)
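The Gregorian rule of digits for the day and year with a spelled-out month can be sketched with a lookup table like the one above. The three languages below are copied from that table; the day-month-year order and the function name format_date are assumptions, and languages such as Japanese would need a different order and suffixes.

```python
from datetime import date

# Month names taken from the table above (three languages only, for brevity).
MONTHS = {
    "Afrikaans": ["Januarie", "Februarie", "Maart", "April", "Mei", "Junie",
                  "Julie", "Augustus", "September", "Oktober", "November", "Desember"],
    "Galician": ["Xaneiro", "Febreiro", "Marzo", "Abril", "Maio", "Xuño",
                 "Xullo", "Agosto", "Setembro", "Outubro", "Novembro", "Decembro"],
    "Italian": ["Gennaio", "Febbraio", "Marzo", "Aprile", "Maggio", "Giugno",
                "Luglio", "Agosto", "Settembre", "Ottobre", "Novembre", "Dicembre"],
}


def format_date(d, language):
    """Format a date as 'day Month year' with the month spelled out."""
    month = MONTHS[language][d.month - 1]  # months are 1-based, lists are 0-based
    return f"{d.day} {month} {d.year}"


print(format_date(date(2011, 3, 11), "Italian"))
```

Spelling out the month avoids the ambiguity between 3/11 and 11/3 that all-numeric dates carry across regions.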
Many of the world's newest Internet users
only have cellphones
<!DOCTYPE html>
<html lang='ig'>
<head>
<meta charset='UTF-8' />
<meta name='viewport' content='width=device-width' />
...
IETF RFC 3629 Unicode transformation format
ISO 639 two-letter and three-letter language codes
ISO 15924 four-letter script codes
IETF RFC 3282 content language headers
ISO 3166 two-letter country codes
IETF RFC 4343 DNS case insensitivity
IETF RFC 7231 § 5.3.5 content negotiation
Can you help to translate the Nuclear Free Pledge into your native language?
github.com/joehonton/nuclearfree-i18n
joseph.honton@nuclearfree.earth
- 30 -