utf8 man page on PC-BSD


UTF8(5)			    BSD File Formats Manual		       UTF8(5)

NAME
     utf8 — UTF-8, a transformation format of ISO 10646

SYNOPSIS
     ENCODING "UTF-8"

DESCRIPTION
     The UTF-8 encoding represents UCS-4 characters as a sequence of octets,
     using between 1 and 6 octets for each character.  It is backwards
     compatible with ASCII, so 0x00-0x7f refer to the ASCII character set.
     The multibyte encoding of non-ASCII characters consists entirely of
     bytes whose high-order bit is set.  The actual encoding is represented
     by the following table:

     [0x00000000 - 0x0000007f] [00000000.0bbbbbbb] -> 0bbbbbbb
     [0x00000080 - 0x000007ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
     [0x00000800 - 0x0000ffff] [bbbbbbbb.bbbbbbbb] ->
             1110bbbb, 10bbbbbb, 10bbbbbb
     [0x00010000 - 0x001fffff] [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
             11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
     [0x00200000 - 0x03ffffff] [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
             111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
     [0x04000000 - 0x7fffffff] [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
             1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
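
     The table maps directly onto an encoder.  The following C function is
     a minimal sketch, not the system libc implementation, and the name
     utf8_encode() is hypothetical.  It follows the six rows of the table
     (the RFC 2279 range, up to 0x7fffffff): it picks the row from the
     value's magnitude, fills the 10bbbbbb continuation octets from the
     low-order bits upward, and places the remaining high bits in the lead
     octet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode value c (0 .. 0x7fffffff) into buf following
 * the six-row table.  Returns the number of octets written, or 0 if c is
 * outside the 31-bit range.  buf must have room for at least 6 octets. */
static size_t utf8_encode(uint32_t c, unsigned char *buf)
{
    size_t len;
    unsigned char lead;

    if (c < 0x80) {
        buf[0] = (unsigned char)c;      /* 0bbbbbbb: plain ASCII */
        return 1;
    } else if (c < 0x800) {
        len = 2; lead = 0xC0;           /* 110bbbbb */
    } else if (c < 0x10000) {
        len = 3; lead = 0xE0;           /* 1110bbbb */
    } else if (c < 0x200000) {
        len = 4; lead = 0xF0;           /* 11110bbb */
    } else if (c < 0x4000000) {
        len = 5; lead = 0xF8;           /* 111110bb */
    } else if (c < 0x80000000u) {
        len = 6; lead = 0xFC;           /* 1111110b */
    } else {
        return 0;                       /* not representable */
    }

    /* Continuation octets carry 6 bits each, least significant last. */
    for (size_t i = len - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (c & 0x3F));
        c >>= 6;
    }
    /* Whatever bits remain fit in the lead octet's free low bits. */
    buf[0] = (unsigned char)(lead | c);
    return len;
}
```

     For example, utf8_encode(0xE9, buf) stores 0xC3 0xA9 and returns 2,
     and utf8_encode(0x20AC, buf) stores 0xE2 0x82 0xAC and returns 3.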

     If more than one representation of a value exists (for example, 0x00;
     0xC0 0x80; 0xE0 0x80 0x80), the shortest representation is always
     used.  Longer ("overlong") representations are detected as errors,
     because they pose a potential security risk and destroy the one-to-one
     mapping between characters and octet sequences.
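
     The overlong check can be sketched as a decoder that rebuilds the value
     from the octets and then rejects it if it is smaller than the least
     value that actually needs that many octets.  The function name
     utf8_decode() and its return convention (octets consumed, or 0 on
     error) are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one character from s[0..n-1] into *out.
 * Returns the number of octets consumed, or 0 on error: empty or
 * truncated input, a stray continuation octet, an invalid lead octet
 * (0xFE, 0xFF), a malformed continuation octet, or an overlong form. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    /* Smallest value that legitimately requires len octets, by len. */
    static const uint32_t min_for_len[7] = {
        0, 0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000
    };
    size_t len;
    uint32_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                  /* 0bbbbbbb: single ASCII octet */
        *out = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)      { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else if ((s[0] & 0xFC) == 0xF8) { len = 5; c = s[0] & 0x03; }
    else if ((s[0] & 0xFE) == 0xFC) { len = 6; c = s[0] & 0x01; }
    else
        return 0;                       /* 10bbbbbb, 0xFE or 0xFF */

    if (n < len)
        return 0;                       /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a 10bbbbbb octet */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[len])
        return 0;                       /* overlong: shorter form exists */
    *out = c;
    return len;
}
```

     With this check, the sequences 0xC0 0x80 and 0xE0 0x80 0x80 from the
     example above are both rejected, while the single octet 0x00 decodes
     to the value 0.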

SEE ALSO
     euc(5)

     Rob Pike and Ken Thompson, "Hello World", Proceedings of the Winter 1993
     USENIX Technical Conference, USENIX Association, January 1993.

     F. Yergeau, UTF-8, a transformation format of ISO 10646, January 1998,
     RFC 2279.

     The Unicode Standard, Version 3.0, The Unicode Consortium, 2000, as
     amended by the Unicode Standard Annex #27: Unicode 3.1 and by the Unicode
     Standard Annex #28: Unicode 3.2.

STANDARDS
     The utf8 encoding is compatible with RFC 2279 and Unicode 3.2.

BSD				 April 7, 2004				   BSD