Announcing The Unicode Standard, Version 12.0

Version 12.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 554 characters, for a total of 137,928 characters. These additions include four new scripts, for a total of 150 scripts, as well as 61 new emoji characters.

The new scripts and characters in Version 12.0 add support for lesser-used languages and specialized writing requirements worldwide, including:
Elymaic, historically used to write Achaemenid Aramaic in the southwestern portion of modern-day Iran
Nandinagari, historically used to write Sanskrit and Kannada in southern India
Nyiakeng Puachue Hmong, used to write modern White Hmong and Green Hmong languages in Laos, Thailand, Vietnam, France, Australia, Canada, and the United States
Wancho, used to write the modern Wancho language in India, Myanmar, and Bhutan
Additional support for lesser-used languages and scholarly work was extended worldwide, including:
Miao script additions to write several Miao and Yi dialects in China
Hiragana and Katakana small letters, used to write archaic Japanese
Tamil historic fractions and symbols, used in South India
Lao letters used to write Pali
Latin letters used in Egyptological and Ugaritic transliteration
Hieroglyph format controls, enabling full formatting of quadrats for Egyptian Hieroglyphs
The Egyptian temple ceiling painting shown above (from the Wikipedia article on Medinet Habu) includes a line of hieroglyphic text. That exact text is rendered again below the painting, represented in Unicode plain text, illustrating the use of the new hieroglyphic format controls, as well as cartouche brackets and directional controls. The example was developed by Andrew Glass, based on Microsoft’s Segoe UI Historic font, with outlines designed by James P. Allen.

Popular symbol additions include:
61 emoji characters, including several new emoji for accessibility
Marca registrada sign
Heterodox and fairy chess symbols
For the full list of new emoji characters, see emoji additions for Unicode 12.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji. Version 12.0 also adds guidelines on gender and skin tone, both in UTS #51 and in the data files.
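Whether a given runtime already knows the Version 12.0 additions depends on the character database it ships with. The following is a minimal sketch in Python, assuming the standard `unicodedata` module; the helper names (`parse_version`, `runtime_supports`, `describe`) and the sample list are illustrative and not part of the announcement. The two sample code points are U+1F16C (the Marca registrada sign) and U+1F9AF (one of the accessibility emoji).

```python
import unicodedata


def parse_version(text):
    """Turn a version string such as '12.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def runtime_supports(minimum=(12, 0)):
    """Report whether this interpreter's character database is at least `minimum`."""
    return parse_version(unicodedata.unidata_version) >= minimum


# Illustrative sample of code points added in Version 12.0 (an assumption of
# this sketch, not an exhaustive list).
SAMPLE_CODE_POINTS = [0x1F16C, 0x1F9AF]


def describe(code_points):
    """Map each code point to its character name, or None if the runtime lacks it."""
    return {
        f"U+{cp:04X}": unicodedata.name(chr(cp), None)
        for cp in code_points
    }


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print("Supports 12.0:", runtime_supports())
    for label, name in describe(SAMPLE_CODE_POINTS).items():
        print(label, name or "(unassigned in this runtime)")
```

On an interpreter whose database predates Version 12.0, `describe` returns `None` for these code points instead of raising an error, so the same check can be used to decide whether to fall back to other text.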

Also in Version 12.0, the following Unicode Standard Annexes have notable modifications, often in coordination with changes to character properties. In particular, there are changes to:
UAX #14, Unicode Line Breaking Algorithm
UAX #29, Unicode Text Segmentation
UAX #31, Unicode Identifier and Pattern Syntax
UAX #38, Unicode Han Database (Unihan)
UAX #45, U-Source Ideographs
Three other important Unicode specifications have been updated for Version 12.0:
UTS #10, Unicode Collation Algorithm—sorting Unicode text
UTS #39, Unicode Security Mechanisms—reducing Unicode spoofing
UTS #46, Unicode IDNA Compatibility Processing—compatible processing of non-ASCII URLs
The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones, as well as the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.
