Unicode::MapUTF8 man page on UnixWare

Man page or keyword search:  
man Server   3616 pages
apropos Keyword Search (all sections)
Output format
UnixWare logo
[printable version]

Unicode::MapUTF8(3)   User Contributed Perl Documentation  Unicode::MapUTF8(3)

NAME
       Unicode::MapUTF8 - Conversions to and from arbitrary character sets and
       UTF8

SYNOPSIS
	use Unicode::MapUTF8 qw(to_utf8 from_utf8 utf8_supported_charset);

	# Convert a string in 'ISO-8859-1' to 'UTF8'
	my $output = to_utf8({ -string => 'An example', -charset => 'ISO-8859-1' });

	# Convert a string in 'UTF8' encoding to encoding 'ISO-8859-1'
	my $other  = from_utf8({ -string => 'Other text', -charset => 'ISO-8859-1' });

	# List available character set encodings
	my @character_sets = utf8_supported_charset;

	# Add a character set alias
	utf8_charset_alias({ 'ms-japanese' => 'sjis' });

	# Convert between two arbitrary (but largely compatible) charset encodings
	# (SJIS to EUC-JP)
	my $utf8_string	  = to_utf8({ -string =>$sjis_string, -charset => 'sjis'});
	my $euc_jp_string = from_utf8({ -string => $utf8_string, -charset => 'euc-jp' })

	# Verify that a specific character set is supported
	if (utf8_supported_charset('ISO-8859-1') {
	    # Yes
	}

DESCRIPTION
       Provides an adapter layer between core routines for converting to and
       from UTF8 and other encodings. In essence, a way to give multiple
       existing Unicode modules a single common interface so you don't have to
       know the underlaying implementations to do simple UTF8 to-from other
       character set encoding conversions. As such, it wraps the Uni‐
       code::String, Unicode::Map8, Unicode::Map and Jcode modules in a stan‐
       dardized and simple API.

       This also provides general character set conversion operation based on
       UTF8 - it is possible to convert between any two compatible and sup‐
       ported character sets via a simple two step chaining of conversions.

       As with most things Perlish - if you give it a few big chunks of text
       to chew on instead of lots of small ones it will handle many more char‐
       acters per second.

       By design, it can be easily extended to encompass any new charset
       encoding conversion modules that arrive on the scene.

       This module is intended to provide good Unicode support to versions of
       Perl prior to 5.8. If you are using Perl 5.8.0 or later, you probably
       want to be using the Encode module instead. This module does work with
       Perl 5.8, but Encode is the preferred method in that environment.

CHANGES
	1.11 2005.10.10	  Documentation changes. Addition of Build.PL support.
			  Added various build tests, LICENSE, Artistic_License.txt,
			  GPL_License.txt. Split documentation into seperate
			  .pod file. Added Japanese translation of POD.

	1.10 2005.05.22 - Fixed bug in conversion of ISO-2022-JP to UTF-8.
			  Problem and fix found by Masahiro HONMA
			  <masahiro.honma@tsutaya.co.jp>.

			  Similar bugs in conversions of shift_jis and euc-jp
			  to UTF-8 fixed as well.

	1.09 2001.08.22 - Fixed multiple typo occurances of 'uft'
			  where 'utf' was meant in code. Problem affected
			  utf16 and utf7 encodings. Problem found
			  by devon smith <devon@taller.PSCL.cwru.edu>

	1.08 2000.11.06 - Added 'utf8_charset_alias' function to
			  allow for runtime setting of character
			  set aliases. Added several alternate
			  names for 'sjis' (shiftjis, shift-jis,
			  shift_jis, s-jis, and s_jis).

			  Corrected 'croak' messages for
			  'from_utf8' functions to appropriate
			  function name.

			  Tightened up initialization encapsulation

			  Corrected fatal problem in jcode from
			  unicode internals. Problem and fix
			  found by Brian Wisti <wbrian2@uswest.net>.

	1.07 2000.11.01 - Added 'croak' to use Carp declaration to
			  fix error messages.  Problem and fix
			  found by Brian Wisti
			  <wbrian2@uswest.net>.

	1.06 2000.10.30 - Fix to handle change in stringification
			  of overloaded objects between Perl 5.005
			  and 5.6. Problem noticed by Brian Wisti
			  <wbrian2@uswest.net>.

	1.05 2000.10.23 - Error in conversions from UTF8 to
			  multibyte encodings corrected

	1.04 2000.10.23 - Additional diagnostic messages added
			  for internal error conditions

	1.03 2000.10.22 - Bug fix for load time autodetction of
			  Unicode::Map8 encodings

	1.02 2000.10.22 - Added load time autodetection of
			  Unicode::Map8 supported character set
			  encodings.

			  Fixed internal calling error for some
			  character sets with 'from_utf8'. Thanks
			  goes to Ilia Lobsanov
			  <ilia@lobsanov.com> for reporting this
			  problem.

	1.01 2000.10.02 - Fixed handling of empty strings and
			  added more identification for error
			  messages.

	1.00 2000.09.29 - Pre-release version

FUNCTIONS
       utf8_charset_alias({ $alias => $charset });
	   Used for runtime assignment of character set aliases.

	   Called with no parameters, returns a hash of defined aliases and
	   the character sets they map to.

	   Example:

	     my $aliases     = utf8_charset_alias;
	     my @alias_names = keys %$aliases;

	   If called with ONE parameter, returns the name of the 'real'
	   charset if the alias is defined. Returns undef if it is not found
	   in the aliases.

	   Example:

	       if (! utf8_charset_alias('VISCII')) {
		   # No alias for this
	       }

	   If called with a list of 'alias' => 'charset' pairs, defines those
	   aliases for use.

	   Example:

	       utf8_charset_alias({ 'japanese' => 'sjis', 'japan' => 'sjis' });

	   Note: It will croak if a passed pair does not map to a character
	   set defined in the predefined set of character encoding. It is NOT
	   allowed to alias something to another alias.

	   Multiple character set aliases can be set with a single call.

	   To clear an alias, pass a character set mapping of undef.

	   Example:

	       utf8_charset_alias({ 'japanese' => undef });

	   While an alias is set, the 'utf8_supported_charset' function will
	   return the alias as if it were a predefined charset.

	   Overriding a base defined character encoding with an alias will
	   generate a warning message to STDERR.

       utf8_supported_charset($charset_name);
	   Returns true if the named charset is supported (including user
	   defined aliases).

	   Returns false if it is not.

	   Example:

	       if (! utf8_supported_charset('VISCII')) {
		   # No support yet
	       }

	   If called in a list context with no parameters, it will return a
	   list of all supported character set names (including user defined
	   aliases).

	   Example:

	       my @charsets = utf8_supported_charset;

       to_utf8({ -string => $string, -charset => $source_charset });
	   Returns the string converted to UTF8 from the specified source
	   charset.

       from_utf8({ -string => $string, -charset => $target_charset});
	   Returns the string converted from UTF8 to the specified target
	   charset.

VERSION
       1.11 2005.10.10

TODO
       Regression tests for Jcode, 2-byte encodings and encoding aliases

SEE ALSO
       Unicode::String Unicode::Map8 Unicode::Map Jcode Encode

COPYRIGHT
       Copyright 2000-2005, Benjamin Franz. All rights reserved.

AUTHOR
       Benjamin Franz <snowhare@nihongo.org>

LICENSE
       This program is free software; you can redistribute it and/or modify it
       under the same terms and conditions as Perl itself.

       This means that you can, at your option, redistribute it and/or modify
       it under either the terms the GNU Public License (GPL) version 1 or
       later, or under the Perl Artistic License.

       See http://dev.perl.org/licenses/

DISCLAIMER
       THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
       WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
       MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.

       Use of this software in any way or in any form, source or binary, is
       not allowed in any country which prohibits disclaimers of any implied
       warranties of merchantability or fitness for a particular purpose or
       any disclaimers of a similar nature.

       IN NO EVENT SHALL I BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPE‐
       CIAL, INCIDENTAL,  OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF
       THIS SOFTWARE AND ITS DOCUMENTATION (INCLUDING, BUT NOT LIMITED TO,
       LOST PROFITS) EVEN IF I HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
       DAMAGE

perl v5.8.8			  2008-05-13		   Unicode::MapUTF8(3)
[top]
                             _         _         _ 
                            | |       | |       | |     
                            | |       | |       | |     
                         __ | | __ __ | | __ __ | | __  
                         \ \| |/ / \ \| |/ / \ \| |/ /  
                          \ \ / /   \ \ / /   \ \ / /   
                           \   /     \   /     \   /    
                            \_/       \_/       \_/ 
More information is available in HTML format for server UnixWare

List of man pages available for UnixWare

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net