- Sixteen symbols have been encoded in the Arabic Presentation
Forms-A block for use in pedagogical materials and documents
discussing the features of the Arabic script.
Please note that these are not combining characters but stand-alone
symbols. They should only be used to display the dots and diacritics
in isolation, not to build new letters. For example, one *cannot*
take a Seen and add U+FBB6 Arabic Symbol Three Dots Above to get
a Sheen. If you type that, you will get a Seen followed by three dots.
According to the standard, "These are spacing symbols representing
Arabic letter diacritics considered in isolation, as for example as in
discussions about the Arabic script."
Updated Unicode chart:
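
To make the stand-alone point above concrete, here is a minimal sketch
using Python's standard unicodedata module. It assumes a Python whose
bundled Unicode data is version 6.0 or later (see
unicodedata.unidata_version); the variable names are only illustrative.

```python
import unicodedata

SEEN = "\u0633"              # ARABIC LETTER SEEN
SHEEN = "\u0634"             # ARABIC LETTER SHEEN
THREE_DOTS_ABOVE = "\uFBB6"  # ARABIC SYMBOL THREE DOTS ABOVE

# What you get if you type a Seen followed by the new symbol.
typed = SEEN + THREE_DOTS_ABOVE

# A combining class of 0 and a non-mark category mean the symbol is spacing.
print(unicodedata.name(THREE_DOTS_ABOVE))
print(unicodedata.category(THREE_DOTS_ABOVE),
      unicodedata.combining(THREE_DOTS_ABOVE))

# Normalization leaves the two-character sequence alone; it never turns
# into a Sheen.
composed = unicodedata.normalize("NFC", typed)
print(composed == typed, composed == SHEEN)
```

The sequence stays two separate characters, which is why it renders as a
Seen followed by three dots.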
- The Qur'anic character U+06DE ARABIC START OF RUB EL HIZB has had
its glyph and properties changed.
For some unknown historical reason, the character was mistakenly
classified as a combining character instead of just a symbol, which
made it unusable. The character is now a normal spacing symbol and is
usable as originally intended.
Background document for the change (which I authored):
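
One rough way to see the reclassification described above is to ask
Python's unicodedata module for the character's properties. This is only
a sketch: the answer depends on the Unicode version bundled with your
Python, and it assumes that version is 6.0 or later.

```python
import unicodedata

RUB_EL_HIZB = "\u06DE"  # ARABIC START OF RUB EL HIZB

category = unicodedata.category(RUB_EL_HIZB)
# Mn, Mc and Me are the mark (combining) categories.
is_mark = category.startswith("M")

print(unicodedata.unidata_version)
print(unicodedata.name(RUB_EL_HIZB), category, is_mark)
```

With Unicode 6.0 or later data, is_mark should come out False, matching
the character's new status as a normal spacing symbol.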
- Two characters have been encoded in the Arabic script block for use
in Kashmiri, one of the official languages of Jammu and Kashmir, the
Indian-administered part of Kashmir. The language is written in both
the Arabic and Devanagari scripts, a split that runs along religious
lines between Muslims and Hindus.
The two new characters are U+0620 Arabic Letter Kashmiri Yeh and
U+065F Arabic Wavy Hamza Below. Also, U+0673 Arabic Letter Alef With
Wavy Hamza Below has been deprecated (the first Arabic script
character to ever get deprecated in Unicode), and the character
sequence <U+0627, U+065F> should be used instead of it.
Unicode proposal (I'm a coauthor):
Updated Unicode chart:
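
For existing text that still contains the deprecated character, the
recommended sequence can be substituted with a plain string replacement.
The helper below is a hypothetical sketch, not part of any library. As
far as I know the Unicode normalization forms do not perform this
substitution for you, which is why the replacement is explicit.

```python
import unicodedata

DEPRECATED_ALEF = "\u0673"  # ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
PREFERRED = "\u0627\u065F"  # ARABIC LETTER ALEF + ARABIC WAVY HAMZA BELOW


def replace_deprecated_alef(text: str) -> str:
    """Hypothetical helper: swap U+0673 for the sequence <U+0627, U+065F>."""
    return text.replace(DEPRECATED_ALEF, PREFERRED)


sample = "\u0673\u0628"  # deprecated Alef followed by a Beh
fixed = replace_deprecated_alef(sample)

# Unlike the symbols in Arabic Presentation Forms-A, U+065F is a real
# combining mark, so it attaches to the preceding Alef.
print(unicodedata.category("\u065F"))
print([f"U+{ord(ch):04X}" for ch in fixed])  # ['U+0627', 'U+065F', 'U+0628']
```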
- Mandaic has been encoded. Mandaic is the script used by the
Mandaeans (mostly living in southern Iraq and southwestern Iran,
especially Khouzestan) for liturgical purposes. This is the community
that some people believe the Qur'an refers to as Sabians, the third
member group of the People of the Book (next to Jews and Christians).
Michael Everson's proposal:
- Brahmi is also encoded, which is of use to Iranianists (some Iranian
languages like Khotanese have been written in Brahmi).
The most detailed proposal (although not the final one that got
Final Unicode chart:
- Unicode Standard Annex #9, The Unicode Bidirectional Algorithm, has
been updated to include more information and some clarifications. Note
that the algorithm has not changed. The update just explains the
original intentions in more detail. For the list of informational
changes to the text, see the following link (Behdad Esfahbod and I
have contributed to this and previous versions of the standard annex):
- A new data file has been added to the Unicode character database,
listing some characters that are used with several scripts (and which
scripts those are). For example, from the data file one can learn that
the Arabic Tatweel and some of the Arabic harakat are also used with
the Syriac script, the Arabic-Indic digits are also used with Thaana,
and the Arabic comma, semicolon, and question mark are also used with
both Syriac and Thaana:
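
A data file like this is easy to read programmatically. The sketch below
assumes the usual Unicode character database layout of
"code point or range ; values # comment", with four-letter script codes
as the values. The sample lines are illustrative, built from the
examples mentioned above, and are not copied from the real file; the
function name is hypothetical.

```python
from typing import Dict, FrozenSet

# Illustrative lines only, in the assumed "code point(s) ; scripts # comment"
# layout. They are not copied from the actual data file.
SAMPLE = """\
0640       ; Arab Syrc       # ARABIC TATWEEL
060C       ; Arab Syrc Thaa  # ARABIC COMMA
0660..0669 ; Arab Thaa       # ARABIC-INDIC DIGITS
"""


def parse_script_extensions(text: str) -> Dict[int, FrozenSet[str]]:
    """Map each listed code point to the set of scripts it is used with."""
    result: Dict[int, FrozenSet[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_field, script_field = (p.strip() for p in line.split(";", 1))
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point or range
        scripts = frozenset(script_field.split())
        for cp in range(start, end + 1):
            result[cp] = scripts
    return result


table = parse_script_extensions(SAMPLE)
print(sorted(table[0x0665]))  # ['Arab', 'Thaa']
```

Expanding ranges into one entry per code point keeps lookups trivial; for
the full file a list of ranges would be more compact.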
- More than a thousand new symbols have been added, including lots of
symbols that you can find on electronics, maps, menus, signs, etc.
Most of these were added to support Emoji, symbols mostly used on
Japanese mobile phones for text messages, emails, chat, and even
For the chart browsers out there, here are some of the blocks that
contain the new symbols (color-coded yellow):
interesting new symbols, including symbols for beverage containers)
(emoticons, also known as smileys)
(transport and map symbols)
Please note that Unicode encodes beverage containers, but not
alcoholic beverages (I personally made sure of that, to reduce
possible objections). For example, there is no BEER encoded, but only
BEER MUG (which is also used for non-alcoholic beer, among other
Religiously devout people who may object to some game characters or
musical instruments getting encoded should note that Unicode
implementations are not required to support any specific character,
and are allowed to choose their own set of characters to support. The
game symbols are encoded only for the sake of Unicode implementations
(especially those in East Asia) that need them to support their users.
- And finally, the official details of the additions and changes to the
standard, for the hardcore: