For anybody interested in helping with i18n standardization, here is a request from Mark Davis of the Unicode Consortium:
We can now generate a text file that shows the language -> territory / script mappings in CLDR in a more readable form than the XML; see
http://unicode.org/cldr/data/dropbox/language_info.txt
For cross-checking, the end of the file contains the languages / scripts that are not represented. This information is currently in draft state; we'd appreciate it if knowledgeable people looked over the missing items to see what information can be added. For example, here is the first part of the 'unrepresented' scripts list and languages list:
=======================================
Scripts Not Represented
=======================================
[Bali] Balinese
[Batk] Batak
[Blis] Blissymbols
[Brah] Brahmi
[Brai] Braille
[Bugi] Buginese
[Buhd] Buhid
[Cham] Cham
[Cirt] Cirth
[Copt] Coptic
[Cprt] Cypriot
...

=======================================
Languages Not Represented
=======================================
[aa] Afar
[ace] Achinese
[ach] Acoli
[ada] Adangme
[ady] Adyghe
[ae] Avestan
[afa] Afro-Asiatic (Other)
[afh] Afrihili
[ak] Akan
[akk] Akkadian
[ale] Aleut
[alg] Algonquian Languages
...
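For anyone who wants to cross-check these lists against their own data, here is a minimal Python sketch that pulls the `[code] Name` pairs out of such a block. It assumes the layout seen in the excerpt above (a bracketed code followed by a name) and guesses the kind of entry from the code length, four letters for scripts and two or three for languages. The names `ENTRY` and `parse_entries` are made up for illustration, and the real file may contain variations this does not handle.

```python
import re

# Matches entries such as "[Bali] Balinese" or "[afa] Afro-Asiatic (Other)".
# Assumption: codes are 2 to 4 ASCII letters, as in the excerpt above.
ENTRY = re.compile(r"\[([A-Za-z]{2,4})\]\s+([^\[\]=]+)")

def parse_entries(text):
    """Return (code, name) pairs found in a block of '[code] Name' entries."""
    pairs = []
    for code, name in ENTRY.findall(text):
        # Drop the trailing "..." used in the excerpt to mark truncation.
        name = name.replace("...", "").strip()
        if name:
            pairs.append((code, name))
    return pairs

sample = "[Bali] Balinese [Batk] Batak [Blis] Blissymbols [afa] Afro-Asiatic (Other) ..."
for code, name in parse_entries(sample):
    # Assumption: 4-letter codes are scripts, shorter ones are languages.
    kind = "script" if len(code) == 4 else "language"
    print(f"{code}\t{kind}\t{name}")
```

The same function works whether the entries are separated by spaces or by line breaks, since the name simply runs up to the next bracket or header rule.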
If some of these are in modern use for some languages / territories, please file a new reply on http://www.jtcsv.com/cgibin/locale-bugs?findid=471. Please include both the name and the code of the new relation and whether the language should be starred or not (see the text file). Example format:
Please add: [cop*] Coptic; written in [Copt] Coptic, used in [EG] Egypt and [AQ] Antarctica ...
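If you have several relations to report, a small helper can keep the replies consistent. The following Python sketch builds a line in the example format; the function name `format_request` and its parameters are hypothetical, and the wording is copied only from the single example above, so treat it as a convenience rather than a required syntax.

```python
def format_request(lang_code, lang_name, script, territories, starred=False):
    """Build a 'Please add:' line in the style of the example above.

    script is a (code, name) pair; territories is a list of (code, name) pairs.
    """
    star = "*" if starred else ""
    script_code, script_name = script
    # Join territories with "and", as the example does for two entries.
    places = " and ".join(f"[{code}] {name}" for code, name in territories)
    return (
        f"Please add: [{lang_code}{star}] {lang_name}; "
        f"written in [{script_code}] {script_name}, used in {places}"
    )

print(format_request("cop", "Coptic", ("Copt", "Coptic"), [("EG", "Egypt")], starred=True))
```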
The data now looks to be in very good shape. If you have a use for it, you should prefer the XML version.
You can also send feedback to me (roozbeh at farsiweb dot info) if you would like someone to review it before it goes to the Unicode CLDR committee.
