Thursday, August 11, 2011

UTF8 and AL32UTF8

In short, I think we should use AL32UTF8 so that our character set support keeps up with the latest Unicode standard.

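The main caveat is when Oracle 8i (or lower) RDBMS and client releases connect to a 9i (or higher) AL32UTF8 system, or when database links connect from a 9i (or higher) AL32UTF8 database to an 8i (or lower) database. These older releases can have problems with AL32UTF8 (see [ID 237593.1] in the references).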

In Oracle 8.1.7 and later, UTF8 stays at Unicode revision 3.0, while AL32UTF8 is updated to a newer Unicode version with each major release.

If you are unfamiliar with the term "code point", see its description in the UTF-8 wiki article.


As far as these two character sets go in Oracle, the main difference between AL32UTF8 and UTF8 is that AL32UTF8 stores characters beyond U+FFFF as four bytes, exactly as Unicode defines UTF-8. Oracle's "UTF8" instead stores each of these characters as a pair of UTF-16 surrogates, with each surrogate encoded in UTF-8, which takes six bytes per character (an encoding scheme also known as CESU-8). Besides this storage difference, AL32UTF8 also offers better support for supplementary characters.

(quoted from "Difference between UTF8 and AL32UTF8")
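To make the four-byte versus six-byte difference concrete, here is a minimal Python sketch that reproduces only the byte layouts and does not talk to an Oracle database at all. `to_cesu8` is a hypothetical helper written for this illustration: it splits each character above U+FFFF into its UTF-16 surrogate pair and UTF-8-encodes each surrogate separately, which is how the quote above describes UTF8 storage. Standard UTF-8 (as in AL32UTF8) comes from Python's built-in `str.encode("utf-8")`.

```python
def to_cesu8(text: str) -> bytes:
    """Hypothetical helper: encode text the way Oracle's UTF8 (CESU-8) lays it out.

    BMP characters (U+0000..U+FFFF) get ordinary UTF-8 bytes; each
    supplementary character is split into a UTF-16 surrogate pair and
    each surrogate is written as its own 3-byte UTF-8 sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)    # high (lead) surrogate
            low = 0xDC00 + (v & 0x3FF)   # low (trail) surrogate
            for unit in (high, low):
                # "surrogatepass" lets Python encode a lone surrogate code point
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


smiley = "\U0001F600"            # U+1F600, outside the BMP
utf8 = smiley.encode("utf-8")   # layout used by AL32UTF8
cesu8 = to_cesu8(smiley)        # layout used by Oracle's UTF8
print(len(utf8), utf8.hex())    # 4 f09f9880
print(len(cesu8), cesu8.hex())  # 6 eda0bdedb880
```

For characters up to U+FFFF both functions return identical bytes, which is why the two character sets only diverge for supplementary characters.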


Caution:
AL32UTF8 is the Oracle Database character set that is appropriate for XMLType data. It is equivalent to the IANA registered standard UTF-8 encoding, which supports all valid XML characters. 

Using UTF8 as the database character set for XML data could cause a fatal error or have a negative impact on security.
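One plausible reason for this caution, offered as an illustration rather than as Oracle's documented explanation: the six-byte surrogate sequences that UTF8 produces for supplementary characters are not valid UTF-8, so a strict UTF-8 decoder (such as the one an XML parser uses) rejects them. The Python sketch below checks this with the built-in strict decoder. `is_valid_utf8` is a hypothetical helper, and the hard-coded bytes are the UTF8/CESU-8 form of U+1F600 worked out from its surrogate pair D83D/DE00.

```python
# UTF8 (CESU-8) bytes for U+1F600: surrogate D83D -> ED A0 BD, DE00 -> ED B8 80
CESU8_SMILEY = b"\xed\xa0\xbd\xed\xb8\x80"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict, standard UTF-8."""
    try:
        data.decode("utf-8")  # strict mode rejects encoded surrogate code points
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8("\U0001F600".encode("utf-8")))  # True  (AL32UTF8 layout)
print(is_valid_utf8(CESU8_SMILEY))                  # False (UTF8 layout)
```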

References:

UTF-8 wiki

Problems connecting to AL32UTF8 databases from older versions (8i and lower) [ID 237593.1]
Unicode Character Sets In The Oracle Database [ID 260893.1]