The following changes should be of interest to the Persian and Iranianist computing community (based on an original post to the Persian Computing list):
- Sixteen symbols have been encoded in the Arabic Presentation Forms-A block for use in pedagogical materials and documents discussing the features of the Arabic script.
Please note that these are not combining characters but stand-alone symbols. These should only be used to display the dots and diacritics in isolation, and not for making new letters. For example, one can *not* use a Seen and add U+FBB6 Arabic Symbol Three dots Above to get a Sheen. If you type that, you will get a Seen followed by three dots. According to the standard, "These are spacing symbols representing Arabic letter diacritics considered in isolation, as for example as in discussions about the Arabic script."
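The point about stand-alone symbols can be checked with a short Python sketch. It assumes a Python whose `unicodedata` module is built on Unicode 6.0 or later; the helper name `is_combining_mark` is made up for this illustration.

```python
import unicodedata

SEEN = "\u0633"              # ARABIC LETTER SEEN
SHEEN = "\u0634"             # ARABIC LETTER SHEEN
THREE_DOTS_ABOVE = "\uFBB6"  # ARABIC SYMBOL THREE DOTS ABOVE


def is_combining_mark(ch: str) -> bool:
    """Return True if the general category is a mark (Mn, Mc, or Me)."""
    return unicodedata.category(ch).startswith("M")


# What you get if you type a Seen followed by the new symbol.
typed = SEEN + THREE_DOTS_ABOVE

# Report the symbol's name and whether it is classified as a mark.
print(unicodedata.name(THREE_DOTS_ABOVE), is_combining_mark(THREE_DOTS_ABOVE))

# Check whether normalization turns the two-character sequence into a Sheen.
print(unicodedata.normalize("NFC", typed) == SHEEN)
```

If the symbol is reported as a non-mark and the normalized string is still two characters long, the sequence is a Seen followed by three separate dots, not a Sheen.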
Updated Unicode chart:
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-FB50.pdf
- The Qur'anic character U+06DE ARABIC START OF RUB EL HIZB has had its glyph and properties changed.
For some unknown historical reason, the character was mistakenly classified as a combining character instead of just a symbol, which made it unusable. The character is now a normal spacing symbol and is usable as originally intended.
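To see which classification a given environment uses, one can inspect the character's properties. This is only a sketch: the result depends on the Unicode version bundled with your Python, which the code prints alongside the properties.

```python
import unicodedata

RUB_EL_HIZB = "\u06DE"  # ARABIC START OF RUB EL HIZB

# General category and canonical combining class from the bundled data.
category = unicodedata.category(RUB_EL_HIZB)
print(unicodedata.unidata_version, category, unicodedata.combining(RUB_EL_HIZB))

# Categories starting with "M" are marks (combining characters).
is_mark = category.startswith("M")
print("combining mark" if is_mark else "spacing character")
```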
Background document for the change (which I authored):
http://unicode.org/review/pr-171-rub-el-hizb.pdf
- Two characters have been encoded in the Arabic script block for use in Kashmiri, one of the official languages of Jammu and Kashmir, the Indian-administered part of Kashmir. The language is written in both the Arabic and Devanagari scripts, divided along religious lines between Muslims and Hindus.
The two new characters are U+0620 Arabic Letter Kashmiri Yeh and U+065F Arabic Wavy Hamza Below. Also, U+0673 Arabic Letter Alef With Wavy Hamza Below has been deprecated (the first Arabic script character to ever get deprecated in Unicode), and the character sequence <U+0627, U+065F> should be used instead of it.
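Software that processes existing Kashmiri text may want to migrate the deprecated character to the recommended sequence. A minimal sketch follows; the function name `replace_deprecated_alef` is hypothetical. It uses a plain string replacement on the assumption that Unicode normalization does not perform this substitution, since to my knowledge U+0673 has no canonical decomposition.

```python
DEPRECATED = "\u0673"         # ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
REPLACEMENT = "\u0627\u065F"  # ARABIC LETTER ALEF + ARABIC WAVY HAMZA BELOW


def replace_deprecated_alef(text: str) -> str:
    """Replace every U+0673 with the sequence <U+0627, U+065F>."""
    return text.replace(DEPRECATED, REPLACEMENT)


sample = "x" + DEPRECATED + "y"
migrated = replace_deprecated_alef(sample)

# Print the code points before and after the migration.
print([f"U+{ord(c):04X}" for c in sample])
print([f"U+{ord(c):04X}" for c in migrated])
```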
Unicode proposal (I'm a coauthor):
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3673.pdf
Updated Unicode chart:
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-0600.pdf
- Mandaic has been encoded. Mandaic is the script used by the Mandaeans (mostly living in southern Iraq and southwestern Iran, especially Khouzestan) for liturgical purposes. This is the community that some people believe the Qur'an refers to as Sabians, the third member group of the People of the Book (next to Jews and Christians).
Michael Everson's proposal:
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3485.pdf
Unicode chart:
http://www.unicode.org/charts/PDF/U0840.pdf
- Brahmi has also been encoded, which is of use to Iranianists (some Iranian languages like Khotanese have been written in Brahmi).
The most detailed proposal (although not the final one that got encoded):
http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3491.pdf
Final Unicode chart:
http://www.unicode.org/charts/PDF/U11000.pdf
- Unicode Standard Annex #9, The Unicode Bidirectional Algorithm, has been updated to include more information and some clarifications. Note that the algorithm has not changed. The update just explains the original intentions in more detail. For the list of informational changes to the text, see the following link (Behdad Esfahbod and I have contributed to this and previous versions of the standard annex):
http://www.unicode.org/reports/tr9/tr9-23.html#Modifications
- A new data file has been added to the Unicode character database, listing some characters that are used with several scripts (and which scripts those are). For example, from the data file one can learn that the Arabic Tatweel and some of the Arabic harakat are also used with the Syriac script, the Arabic-Indic digits are also used with Thaana, and the Arabic comma, semicolon, and question mark are also used with both Syriac and Thaana.
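A rough sketch of how such a file could be read is given below. It assumes the usual Unicode character database layout of `code point or range ; script codes # comment`. The sample lines are illustrative only: they are modeled on the examples mentioned above and are not copied from the real file, whose entries may list different or additional scripts. The function names are hypothetical.

```python
from typing import List, Tuple

# Illustrative lines in the assumed "range ; scripts # comment" layout.
# These are NOT copied from the real data file.
SAMPLE = """\
# hypothetical excerpt
0640          ; Arab Syrc      # ARABIC TATWEEL
060C          ; Arab Syrc Thaa # ARABIC COMMA
0660..0669    ; Arab Thaa      # ARABIC-INDIC DIGIT ZERO..NINE
"""

Entry = Tuple[int, int, List[str]]


def parse_script_extensions(text: str) -> List[Entry]:
    """Parse lines into (first code point, last code point, script codes)."""
    entries: List[Entry] = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue  # skip blank and comment-only lines
        code_field, script_field = (part.strip() for part in data.split(";", 1))
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        entries.append((start, end, script_field.split()))
    return entries


def scripts_for(ch: str, entries: List[Entry]) -> List[str]:
    """Return the script codes listed for a character, or an empty list."""
    cp = ord(ch)
    for start, end, scripts in entries:
        if start <= cp <= end:
            return scripts
    return []


entries = parse_script_extensions(SAMPLE)
print(scripts_for("\u060C", entries))  # look up the Arabic comma
```

To work with real data, the same parser could be fed the contents of the downloaded file instead of `SAMPLE`.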
Data file:
http://www.unicode.org/Public/UNIDATA/ScriptExtensions.txt
- More than a thousand new symbols have been added, including lots of symbols that you can find on electronics, maps, menus, signs, etc. Most of these were added to support Emoji, symbols mostly used on Japanese mobile phones for text messages, emails, chat, and even cellphone novels:
http://en.wikipedia.org/wiki/Emoji
http://www.unicode.org/faq/emoji_dingbats.html
For you chart browsers out there, here are some of the blocks that contain the new symbols (color-coded yellow):
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-2300.pdf
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-2600.pdf
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-2700.pdf
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F0A0.pdf (playing cards)
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F100.pdf
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F300.pdf (lots of interesting new symbols, including symbols for beverage containers)
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F600.pdf (emoticons, also known as smileys)
http://www.unicode.org/charts/PDF/Unicode-6.0/U60-1F680.pdf (transport and map symbols)
Please note that Unicode encodes beverage containers, but not alcoholic beverages (I personally made sure of that, to reduce possible objections). For example, there is no BEER encoded, but only BEER MUG (which is also used for non-alcoholic beer, among other uses).
Religiously devout people who may object to some game characters or musical instruments getting encoded should note that Unicode implementations are not required to support any specific character, and are allowed to choose their own set of characters to support. The game symbols are encoded only for the sake of Unicode implementations (especially those in East Asia) that need them to support their users.
- And finally, the official details of additions and changes to the standard, for the hardcore:
http://www.unicode.org/versions/Unicode6.0.0/