Supported Encodings
The java.io.InputStreamReader, java.io.OutputStreamWriter, and java.lang.String classes, and classes in the java.nio.charset package, can convert between Unicode and a number of other character encodings. The supported encodings vary between different implementations of the Java Platform, Standard Edition 7 (Java SE 7). The class description for java.nio.charset.Charset lists the encodings that any implementation of the Java Platform, Standard Edition 7 is required to support.
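As a minimal sketch of how an application might confirm this at run time, the following checks the six charsets named in the Charset class description (US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16) and counts what else the running implementation provides. The class name RequiredCharsets and its helper methods are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequiredCharsets {
    // Charsets that the Charset class description requires of every implementation.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // True if every required charset can be looked up on this runtime.
    static boolean allRequiredSupported() {
        for (String name : REQUIRED) {
            if (!Charset.isSupported(name)) {
                return false;
            }
        }
        return true;
    }

    // Number of bytes the given text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("All required charsets present: " + allRequiredSupported());
        // The total depends on the implementation and on how it was installed.
        System.out.println("Charsets available: " + Charset.availableCharsets().size());
        // "caf\u00e9" is three ASCII letters plus one two-byte character.
        System.out.println("UTF-8 byte count: " + utf8Length("caf\u00e9"));
    }
}
```

The constants in java.nio.charset.StandardCharsets cover the same six charsets, so code that needs only those can avoid string lookups entirely.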
Oracle's Java SE Development Kit 7 (JDK 7) for all platforms (Solaris, Linux, and Microsoft Windows) and the Java SE Runtime Environment 7 (JRE 7) for Solaris and Linux support all encodings shown on this page. Oracle's JRE 7 for Microsoft Windows may be installed either as a complete international version or as a European languages version. By default, the JRE 7 installer installs the European languages version if it recognizes that the host operating system supports only European languages. If the installer recognizes that any other language is needed, or if the user requests support for non-European languages in a customized installation, the complete international version is installed. The European languages version supports only the encodings shown in the Basic Encoding Set table below. The international version (which includes the lib/charsets.jar file) supports all encodings shown on this page.
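Because an encoding from the extended set may be absent on a European languages installation, code that relies on one may want to test for it before use. The sketch below is one possible approach, not a prescribed pattern; the class CharsetFallback and the method charsetOrDefault are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallback {
    // Returns the named charset if this runtime provides it, otherwise the fallback.
    static Charset charsetOrDefault(String name, Charset fallback) {
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : fallback;
        } catch (IllegalArgumentException e) {
            // Thrown for a null or syntactically illegal charset name.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Shift_JIS belongs to the extended set, so it may be missing.
        Charset cs = charsetOrDefault("Shift_JIS", StandardCharsets.UTF_8);
        System.out.println("Using charset: " + cs.name());
    }
}
```

Whether a silent fallback is acceptable depends on the data: substituting a different encoding changes the bytes written, so some applications may prefer to fail with a clear message instead.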
The following tables show the encoding sets supported by Java SE 7. The canonical names used by the new java.nio APIs are in many cases not the same as those used in the java.io and java.lang APIs.
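The difference between the two naming schemes can be observed directly. In the sketch below, Charset.name() reports the java.nio canonical name, while InputStreamReader.getEncoding() reports the historical java.io name where one exists. The class CanonicalNames and its two helper methods are illustrative names, not part of any API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

public class CanonicalNames {
    // Canonical java.nio name for any name or alias the runtime recognizes.
    static String nioName(String alias) {
        return Charset.forName(alias).name();
    }

    // Name reported by the java.io API for a reader opened with the given encoding.
    static String ioName(String encoding) throws IOException {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]), encoding)) {
            // Must be read before the reader is closed; afterwards it returns null.
            return reader.getEncoding();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(nioName("UTF8"));       // UTF-8
        System.out.println(ioName("UTF-8"));       // UTF8
        System.out.println(nioName("ISO8859_1"));  // ISO-8859-1
        System.out.println(ioName("ISO-8859-1"));  // ISO8859_1
    }
}
```

Either form of the name is accepted on input by both API families; the difference lies in which name is reported back.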
Basic Encoding Set (contained in lib/rt.jar)
Canonical Name for java.nio API | Canonical Name for java.io and java.lang API | Description |
---|---|---|
IBM00858 | Cp858 | Variant of Cp850 with Euro character |
IBM437 | Cp437 | MS-DOS United States, Australia, New Zealand, South Africa |
IBM775 | Cp775 | PC Baltic |
IBM850 | Cp850 | MS-DOS Latin-1 |
IBM852 | Cp852 | MS-DOS Latin-2 |
IBM855 | Cp855 | IBM Cyrillic |
IBM857 | Cp857 | IBM Turkish |
IBM862 | Cp862 | PC Hebrew |
IBM866 | Cp866 | MS-DOS Russian |
ISO-8859-1 | ISO8859_1 | ISO-8859-1, Latin Alphabet No. 1 |
ISO-8859-2 | ISO8859_2 | Latin Alphabet No. 2 |
ISO-8859-4 | ISO8859_4 | Latin Alphabet No. 4 |
ISO-8859-5 | ISO8859_5 | Latin/Cyrillic Alphabet |
ISO-8859-7 | ISO8859_7 | Latin/Greek Alphabet (ISO-8859-7:2003) |
ISO-8859-9 | ISO8859_9 | Latin Alphabet No. 5 |
ISO-8859-13 | ISO8859_13 | Latin Alphabet No. 7 |
ISO-8859-15 | ISO8859_15 | Latin Alphabet No. 9 |
KOI8-R | KOI8_R | KOI8-R, Russian |
KOI8-U | KOI8_U | KOI8-U, Ukrainian |
US-ASCII | ASCII | American Standard Code for Information Interchange |
UTF-8 | UTF8 | Eight-bit Unicode (or UCS) Transformation Format |
UTF-16 | UTF-16 | Sixteen-bit Unicode (or UCS) Transformation Format, byte order identified by an optional byte-order mark |
UTF-16BE | UnicodeBigUnmarked | Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order |
UTF-16LE | UnicodeLittleUnmarked | Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order |
UTF-32 | UTF_32 | 32-bit Unicode (or UCS) Transformation Format, byte order identified by an optional byte-order mark |
UTF-32BE | UTF_32BE | 32-bit Unicode (or UCS) Transformation Format, big-endian byte order |
UTF-32LE | UTF_32LE | 32-bit Unicode (or UCS) Transformation Format, little-endian byte order |
x-UTF-32BE-BOM | UTF_32BE_BOM | 32-bit Unicode (or UCS) Transformation Format, big-endian byte order, with byte-order mark |
x-UTF-32LE-BOM | UTF_32LE_BOM | 32-bit Unicode (or UCS) Transformation Format, little-endian byte order, with byte-order mark |
windows-1250 | Cp1250 | Windows Eastern European |
windows-1251 | Cp1251 | Windows Cyrillic |
windows-1252 | Cp1252 | Windows Latin-1 |
windows-1253 | Cp1253 | Windows Greek |
windows-1254 | Cp1254 | Windows Turkish |
windows-1257 | Cp1257 | Windows Baltic |
Not available | UnicodeBig | Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order, with byte-order mark |
x-IBM737 | Cp737 | PC Greek |
x-IBM874 | Cp874 | IBM Thai |
x-UTF-16LE-BOM | UnicodeLittle | Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order, with byte-order mark |
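The byte-order mark distinctions in the UTF-16 rows are easiest to see in the encoded bytes. The following sketch encodes the single character "A" with three of the charsets listed above and prints the result in hexadecimal. The class Utf16Bom and the helper hex are illustrative names.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Bom {
    // Formats bytes as space-separated, two-digit uppercase hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-16 writes a big-endian byte-order mark when encoding.
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16)));    // FE FF 00 41
        // The BE and LE variants write no byte-order mark.
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16BE)));  // 00 41
        System.out.println(hex("A".getBytes(StandardCharsets.UTF_16LE)));  // 41 00

        // When decoding, UTF-16 uses a byte-order mark, if present, to choose the byte order.
        byte[] littleEndianWithBom = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(new String(littleEndianWithBom, StandardCharsets.UTF_16)); // A
    }
}
```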
Extended Encoding Set (contained in lib/charsets.jar)
Canonical Name for java.nio API | Canonical Name for java.io and java.lang API | Description |
---|---|---|
Big5 | Big5 | Big5, Traditional Chinese |
Big5-HKSCS | Big5_HKSCS | Big5 with Hong Kong extensions, Traditional Chinese (incorporating 2001 revision) |
EUC-JP | EUC_JP | JISX 0201, 0208 and 0212, EUC encoding Japanese |
EUC-KR | EUC_KR | KS C 5601, EUC encoding, Korean |
GB18030 | GB18030 | Simplified Chinese, PRC standard |
GB2312 | EUC_CN | GB2312, EUC encoding, Simplified Chinese |
GBK | GBK | GBK, Simplified Chinese |
IBM-Thai | Cp838 | IBM Thailand extended SBCS |
IBM01140 | Cp1140 | Variant of Cp037 with Euro character |
IBM01141 | Cp1141 | Variant of Cp273 with Euro character |
IBM01142 | Cp1142 | Variant of Cp277 with Euro character |
IBM01143 | Cp1143 | Variant of Cp278 with Euro character |
IBM01144 | Cp1144 | Variant of Cp280 with Euro character |
IBM01145 | Cp1145 | Variant of Cp284 with Euro character |
IBM01146 | Cp1146 | Variant of Cp285 with Euro character |
IBM01147 | Cp1147 | Variant of Cp297 with Euro character |
IBM01148 | Cp1148 | Variant of Cp500 with Euro character |
IBM01149 | Cp1149 | Variant of Cp871 with Euro character |
IBM037 | Cp037 | USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia |
IBM1026 | Cp1026 | IBM Latin-5, Turkey |
IBM1047 | Cp1047 | Latin-1 character set for EBCDIC hosts |
IBM273 | Cp273 | IBM Austria, Germany |
IBM277 | Cp277 | IBM Denmark, Norway |
IBM278 | Cp278 | IBM Finland, Sweden |
IBM280 | Cp280 | IBM Italy |
IBM284 | Cp284 | IBM Catalan/Spain, Spanish Latin America |
IBM285 | Cp285 | IBM United Kingdom, Ireland |
IBM297 | Cp297 | IBM France |
IBM420 | Cp420 | IBM Arabic |
IBM424 | Cp424 | IBM Hebrew |
IBM500 | Cp500 | EBCDIC 500V1 |
IBM860 | Cp860 | MS-DOS Portuguese |
IBM861 | Cp861 | MS-DOS Icelandic |
IBM863 | Cp863 | MS-DOS Canadian French |
IBM864 | Cp864 | PC Arabic |
IBM865 | Cp865 | MS-DOS Nordic |
IBM868 | Cp868 | MS-DOS Pakistan |
IBM869 | Cp869 | IBM Modern Greek |
IBM870 | Cp870 | IBM Multilingual Latin-2 |
IBM871 | Cp871 | IBM Iceland |
IBM918 | Cp918 | IBM Pakistan (Urdu) |
ISO-2022-CN | ISO2022CN | GB2312 and CNS11643 in ISO 2022 CN form, Simplified and Traditional Chinese (conversion to Unicode only) |
ISO-2022-JP | ISO2022JP | JIS X 0201, 0208, in ISO 2022 form, Japanese |
ISO-2022-KR | ISO2022KR | ISO 2022 KR, Korean |
ISO-8859-3 | ISO8859_3 | Latin Alphabet No. 3 |
ISO-8859-6 | ISO8859_6 | Latin/Arabic Alphabet |
ISO-8859-8 | ISO8859_8 | Latin/Hebrew Alphabet |
JIS_X0201 | JIS_X0201 | JIS X 0201 |
JIS_X0212-1990 | JIS_X0212-1990 | JIS X 0212 |
Shift_JIS | SJIS | Shift-JIS, Japanese |
TIS-620 | TIS620 | TIS620, Thai |
windows-1255 | Cp1255 | Windows Hebrew |
windows-1256 | Cp1256 | Windows Arabic |
windows-1258 | Cp1258 | Windows Vietnamese |
windows-31j | MS932 | Windows Japanese |
x-Big5-Solaris | Big5_Solaris | Big5 with seven additional Hanzi ideograph character mappings for the Solaris zh_TW.BIG5 locale |
x-euc-jp-linux | EUC_JP_LINUX | JISX 0201, 0208, EUC encoding Japanese |
x-EUC-TW | EUC_TW | CNS11643 (Plane 1-7,15), EUC encoding, Traditional Chinese |
x-eucJP-Open | EUC_JP_Solaris | JISX 0201, 0208, 0212, EUC encoding Japanese |
x-IBM1006 | Cp1006 | IBM AIX Pakistan (Urdu) |
x-IBM1025 | Cp1025 | IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR) |
x-IBM1046 | Cp1046 | IBM Arabic - Windows |
x-IBM1097 | Cp1097 | IBM Iran (Farsi)/Persian |
x-IBM1098 | Cp1098 | IBM Iran (Farsi)/Persian (PC) |
x-IBM1112 | Cp1112 | IBM Latvia, Lithuania |
x-IBM1122 | Cp1122 | IBM Estonia |
x-IBM1123 | Cp1123 | IBM Ukraine |
x-IBM1124 | Cp1124 | IBM AIX Ukraine |
x-IBM1381 | Cp1381 | IBM OS/2, DOS People's Republic of China (PRC) |
x-IBM1383 | Cp1383 | IBM AIX People's Republic of China (PRC) |
x-IBM33722 | Cp33722 | IBM-eucJP - Japanese (superset of 5050) |
x-IBM834 | Cp834 | IBM EBCDIC DBCS-only Korean |
x-IBM856 | Cp856 | IBM Hebrew |
x-IBM875 | Cp875 | IBM Greek |
x-IBM921 | Cp921 | IBM Latvia, Lithuania (AIX, DOS) |
x-IBM922 | Cp922 | IBM Estonia (AIX, DOS) |
x-IBM930 | Cp930 | Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026 |
x-IBM933 | Cp933 | Korean Mixed with 1880 UDC, superset of 5029 |
x-IBM935 | Cp935 | Simplified Chinese Host mixed with 1880 UDC, superset of 5031 |
x-IBM937 | Cp937 | Traditional Chinese Host mixed with 6204 UDC, superset of 5033 |
x-IBM939 | Cp939 | Japanese Latin Kanji mixed with 4370 UDC, superset of 5035 |
x-IBM942 | Cp942 | IBM OS/2 Japanese, superset of Cp932 |
x-IBM942C | Cp942C | Variant of Cp942 |
x-IBM943 | Cp943 | IBM OS/2 Japanese, superset of Cp932 and Shift-JIS |
x-IBM943C | Cp943C | Variant of Cp943 |
x-IBM948 | Cp948 | OS/2 Chinese (Taiwan) superset of 938 |
x-IBM949 | Cp949 | PC Korean |
x-IBM949C | Cp949C | Variant of Cp949 |
x-IBM950 | Cp950 | PC Chinese (Hong Kong, Taiwan) |
x-IBM964 | Cp964 | AIX Chinese (Taiwan) |
x-IBM970 | Cp970 | AIX Korean |
x-ISCII91 | ISCII91 | ISCII91 encoding of Indic scripts |
x-ISO2022-CN-CNS | ISO2022_CN_CNS | CNS11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only) |
x-ISO2022-CN-GB | ISO2022_CN_GB | GB2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only) |
x-iso-8859-11 | x-iso-8859-11 | Latin/Thai Alphabet |
x-JIS0208 | x-JIS0208 | JIS X 0208 |
x-JISAutoDetect | JISAutoDetect | Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only) |
x-Johab | x-Johab | Korean, Johab character set |
x-MacArabic | MacArabic | Macintosh Arabic |
x-MacCentralEurope | MacCentralEurope | Macintosh Latin-2 |
x-MacCroatian | MacCroatian | Macintosh Croatian |
x-MacCyrillic | MacCyrillic | Macintosh Cyrillic |
x-MacDingbat | MacDingbat | Macintosh Dingbat |
x-MacGreek | MacGreek | Macintosh Greek |
x-MacHebrew | MacHebrew | Macintosh Hebrew |
x-MacIceland | MacIceland | Macintosh Iceland |
x-MacRoman | MacRoman | Macintosh Roman |
x-MacRomania | MacRomania | Macintosh Romania |
x-MacSymbol | MacSymbol | Macintosh Symbol |
x-MacThai | MacThai | Macintosh Thai |
x-MacTurkish | MacTurkish | Macintosh Turkish |
x-MacUkraine | MacUkraine | Macintosh Ukraine |
x-MS950-HKSCS | MS950_HKSCS | Windows Traditional Chinese with Hong Kong extensions |
x-mswin-936 | MS936 | Windows Simplified Chinese |
x-PCK | PCK | Solaris version of Shift_JIS |
x-SJIS_0213 | x-SJIS_0213 | Shift_JISX0213 |
x-windows-50220 | Cp50220 | Windows Codepage 50220 (7-bit implementation) |
x-windows-50221 | Cp50221 | Windows Codepage 50221 (7-bit implementation) |
x-windows-874 | MS874 | Windows Thai |
x-windows-949 | MS949 | Windows Korean |
x-windows-950 | MS950 | Windows Traditional Chinese |
x-windows-iso2022jp | x-windows-iso2022jp | Variant ISO-2022-JP (MS932 based) |
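A few entries in the table above are marked "conversion to Unicode only", meaning they can decode bytes but cannot encode text. A program can detect this with Charset.canEncode() before attempting to write. The sketch below assumes only that the charset may or may not be installed; the class DecodeOnlyCheck and the method capability are hypothetical names.

```java
import java.nio.charset.Charset;

public class DecodeOnlyCheck {
    // Classifies a charset as "unsupported", "decode-only", or "encode+decode".
    static String capability(String name) {
        if (!Charset.isSupported(name)) {
            // For example, an extended-set charset on a runtime that lacks it.
            return "unsupported";
        }
        return Charset.forName(name).canEncode() ? "encode+decode" : "decode-only";
    }

    public static void main(String[] args) {
        // Results depend on which encoding sets the running implementation provides.
        String[] names = {"UTF-8", "Shift_JIS", "ISO-2022-CN", "x-JISAutoDetect"};
        for (String name : names) {
            System.out.println(name + ": " + capability(name));
        }
    }
}
```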