iconv(3C)                                                         iconv(3C)

NAME
iconv(), iconv_open(), iconv_close() - codeset conversion routines

SYNOPSIS

DESCRIPTION
The iconv_open() routine uses the following configuration files, in
descending order of precedence:
·  the system configuration file
·  the user configuration file
The system configuration file is searched first. It cannot be modified
and contains the codeset names supported by the operating system. The
user configuration file, on the other hand, is user-modifiable and can
be used by the system administrator or by third-party applications to
add custom iconv converters.
The configuration files are divided into two sections. The first
section defines aliases for the canonical codeset names used in the
second section, and is terminated by a keyword line. The second section
contains the set of supported conversions (codeset names). The first
two columns correspond to the fromcode and tocode names. These names,
or their corresponding aliases, may be passed as parameters to
iconv_open(). The remaining three columns correspond to the name of the
translation table, the iconv method, and the corresponding function
name and method library (for multi-byte codesets).
A placeholder symbol is used in columns that are not applicable. Any
library suffixes in these configuration files, if present, are now
redundant: they are ignored, and the correct architecture-specific
method library extension is appended automatically.
iconv_open()
Returns a conversion descriptor that describes a conversion from the
codeset specified by the string pointed to by the fromcode argument to
the codeset specified by the string pointed to by the tocode argument.
A conversion descriptor remains valid in a process until that process
closes it.

The fromcode and tocode arguments must have a corresponding entry in at
least one of the iconv configuration files. iconv_open() searches for
the codeset names first in the system configuration file and then in
the user configuration file to check whether the requested conversion
is supported. If it is, iconv_open() determines which table and/or
method to use for the conversion.
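A minimal sketch of opening a descriptor and telling an unsupported conversion apart from other failures. The helper open_converter() is hypothetical, and the codeset names "UTF-8" and "ISO-8859-1" are assumptions (GNU libc spellings); on HP-UX, use names listed in the configuration files, such as the "roman8" and "iso88591" used in EXAMPLES. Note that tocode comes first.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>

/* Hypothetical helper: open a descriptor and report why iconv_open()
 * failed. Returns (iconv_t)-1 on failure, like iconv_open() itself. */
static iconv_t open_converter(const char *tocode, const char *fromcode)
{
    iconv_t cd = iconv_open(tocode, fromcode);   /* tocode first */

    if (cd == (iconv_t)-1) {
        if (errno == EINVAL)        /* no such conversion configured */
            fprintf(stderr, "conversion %s -> %s not supported\n",
                    fromcode, tocode);
        else                        /* e.g. ENOMEM */
            perror("iconv_open");
    }
    return cd;
}
```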
iconv()
Converts a sequence of characters in one codeset, contained in the
array specified by inbuf, into a sequence of corresponding characters
in another codeset, contained in the array specified by outbuf. The
codesets are those specified in the iconv_open() call that returned the
conversion descriptor cd.

The inbuf argument points to a variable that points to the first
character in the input buffer, and inbytesleft indicates the number of
bytes remaining in the buffer being converted. The outbuf argument
points to a variable that points to the first available byte in the
output buffer, and outbytesleft indicates the number of bytes still
available in that buffer.
If a sequence of input bytes does not form a valid character in the
specified codeset, conversion stops after the previous successfully
converted character. If the input buffer ends with an incomplete
character or shift sequence (see Special Usage under APPLICATION
USAGE), conversion stops after the previous successfully converted
character. If the output buffer is not large enough to hold the entire
converted output, conversion stops just prior to the character that
would cause the output buffer to overflow. The variable pointed to by
inbuf is updated to point to the byte following the last byte
successfully used in the conversion. The value pointed to by
inbytesleft is reduced to reflect the number of bytes still not
converted in the input buffer. The variable pointed to by outbuf is
updated to point to the byte following the last byte of converted
output data. The value pointed to by outbytesleft is reduced to reflect
the number of bytes still available in the output buffer.
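The pointer and counter updates described above can be observed with a small wrapper. This is a sketch: convert_once() is a hypothetical name, and the caller is assumed to check errno when the return value is (size_t)-1.

```c
#include <errno.h>   /* callers inspect errno when iconv() returns (size_t)-1 */
#include <iconv.h>
#include <stddef.h>

/* Hypothetical wrapper: run one iconv() call and report how many input
 * bytes were consumed and how many output bytes were produced. */
static size_t convert_once(iconv_t cd, char *in, size_t inlen,
                           char *out, size_t outlen,
                           size_t *in_used, size_t *out_used)
{
    char  *inptr   = in;       /* iconv() advances these two pointers... */
    char  *outptr  = out;
    size_t inleft  = inlen;    /* ...and decrements these two counters */
    size_t outleft = outlen;
    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);

    *in_used  = (size_t)(inptr - in);    /* equals inlen - inleft */
    *out_used = (size_t)(outptr - out);  /* equals outlen - outleft */
    return rc;
}
```

Converting the UTF-8 bytes of "café" (5 bytes) to ISO-8859-1 consumes all 5 input bytes and produces 4 output bytes; with a 2-byte output buffer, the same call stops after "ca" and fails with [E2BIG]. The codeset names are again assumed GNU libc spellings.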
If iconv() encounters a character in the input buffer that is legal
but for which no identical character exists in the target codeset,
iconv() maps the character to a predefined character, called the
"galley character", which is defined when the table is generated (see
genxlt(1)).
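Because iconv() returns the number of non-identical conversions, a caller can distinguish exact conversion from galley-character substitution, and both from outright rejection ([EILSEQ], which occurs when no galley character is defined; GNU libc also reports unrepresentable characters this way). The sketch below assumes that reading of the return value; classify() and the outcome names are hypothetical.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical classification of one conversion call. */
enum outcome { EXACT, GALLEY_SUBSTITUTED, REJECTED, OTHER_ERROR };

static enum outcome classify(iconv_t cd, char *in, size_t inlen,
                             char *out, size_t outlen)
{
    char  *inptr = in, *outptr = out;
    size_t inleft = inlen, outleft = outlen;
    size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);

    if (rc == (size_t)-1)          /* conversion stopped early */
        return errno == EILSEQ ? REJECTED : OTHER_ERROR;
    /* A positive count means some characters were not identical. */
    return rc == 0 ? EXACT : GALLEY_SUBSTITUTED;
}
```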
iconv_close()
Deallocates the conversion descriptor cd and all other associated
resources allocated by iconv_open().
APPLICATION USAGE
Portable applications must assume that conversion descriptors are not
valid after a call to any of the exec functions.
Special Usage
In state-dependent encodings, characters are interpreted according to
the "state" of the input. State shifts occur when a specific sequence
of bytes is seen in the input. Such a sequence changes the way
subsequent characters are interpreted (for example, the characters may
initially be single-byte characters; after a state shift, subsequent
characters may be interpreted as two-byte characters). For
state-dependent encodings, the conversion descriptor returned by
iconv_open() is in a codeset-dependent initial shift state, ready for
immediate use with iconv().
For state-dependent encodings, the conversion descriptor cd is placed
into its initial shift state by a call to iconv() for which inbuf is a
null pointer, or for which inbuf points to a null pointer. When iconv()
is called in this way, and outbuf is not a null pointer or a pointer to
a null pointer, and outbytesleft points to a positive value, iconv()
places into the output buffer the byte sequence that changes the output
to its initial shift state. If the output buffer is not large enough to
hold the entire reset sequence, iconv() fails and sets errno to
[E2BIG]. Subsequent calls with inbuf set to other than a null pointer
or a pointer to a null pointer cause the conversion to take place from
the current state of the conversion descriptor.
For state-dependent encodings, the conversion descriptor is updated to
reflect the shift state in effect at the end of the last successfully
converted byte sequence.
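The two reset forms described above can be wrapped as follows. This is a sketch: flush_shift_state() and reset_shift_state() are hypothetical names. ISO-2022-JP is assumed as an example of a stateful target; converting a JIS X 0208 character emits a shift sequence (ESC $ B), and the flush then emits the sequence back to the initial state (ESC ( B).

```c
#include <iconv.h>
#include <stddef.h>

/* Write the bytes that return the output to its initial shift state.
 * Returns the number of bytes written, or (size_t)-1 with errno set
 * (E2BIG if the output buffer cannot hold the whole reset sequence). */
static size_t flush_shift_state(iconv_t cd, char **outptr, size_t *outleft)
{
    size_t before = *outleft;

    /* inbuf == NULL, outbuf non-NULL: emit the reset sequence */
    if (iconv(cd, NULL, NULL, outptr, outleft) == (size_t)-1)
        return (size_t)-1;
    return before - *outleft;
}

/* Return cd to its initial shift state without producing any output. */
static void reset_shift_state(iconv_t cd)
{
    (void)iconv(cd, NULL, NULL, NULL, NULL);
}
```

A converter normally calls flush_shift_state() once after the last input buffer, so that a stateful output stream ends in its initial state.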
RETURN VALUE
iconv_open()
Upon successful completion, iconv_open() returns a conversion
descriptor for use on subsequent calls to iconv(). Otherwise,
iconv_open() returns (iconv_t)-1 and sets errno to indicate the error.
iconv()
iconv() updates the variables pointed to by the arguments to reflect
the extent of conversion, and returns the number of non-identical
conversions performed. If the entire string in the input buffer is
converted, the value pointed to by inbytesleft is zero. If an error
occurs, iconv() returns (size_t)-1 and sets errno to indicate the
error.
iconv_close()
Upon successful completion, iconv_close() returns a value of zero.
Otherwise, it returns -1 and sets errno to indicate the error.
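Using these return values, a caller can convert an entire in-memory string by growing the output buffer whenever iconv() stops with [E2BIG]. This sketch, with the hypothetical helper convert_all(), assumes a stateless target codeset; for state-dependent targets, the reset call described under Special Usage is also needed before the result is complete.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: convert in[0..inlen) into a malloc()ed buffer,
 * doubling it on E2BIG. Returns the buffer (caller frees) and stores its
 * length in *outlen, or returns NULL with errno set. */
static char *convert_all(iconv_t cd, char *in, size_t inlen, size_t *outlen)
{
    size_t cap = inlen + 16, used = 0;
    char  *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    for (;;) {
        char  *outptr  = buf + used;
        size_t outleft = cap - used;
        size_t rc = iconv(cd, &in, &inlen, &outptr, &outleft);

        used = (size_t)(outptr - buf);     /* bytes produced so far */
        if (rc != (size_t)-1)
            break;                         /* inlen is now zero */
        if (errno != E2BIG) {              /* EILSEQ, EINVAL, ...: give up */
            int saved = errno;
            free(buf);
            errno = saved;
            return NULL;
        }
        char *bigger = realloc(buf, cap * 2);   /* output full: grow */
        if (bigger == NULL) {
            free(buf);
            errno = ENOMEM;
            return NULL;
        }
        buf = bigger;
        cap *= 2;
    }
    *outlen = used;
    return buf;
}
```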
ERRORS
iconv_open() fails if any of the following conditions are encountered:
[ENOMEM]   Insufficient storage space is available.
[EINVAL]   The conversion specified by fromcode and tocode is not
           supported, or the table or method specified in the
           configuration file could not be read or loaded correctly.
           This error also occurs if the configuration file itself is
           faulty.
iconv() fails if any of the following conditions are encountered:
[EILSEQ]   Input conversion stopped due to an input character that does
           not belong to the input codeset, or because the conversion
           table does not contain an entry corresponding to this input
           character and no galley character was defined for that
           table.
[E2BIG]    Input conversion stopped due to lack of space in the output
           buffer.
[EINVAL]   Input conversion stopped due to an incomplete character or
           shift sequence at the end of the input buffer.
[EBADF]    The cd argument is not a valid open conversion descriptor.
iconv_close() fails if any of the following conditions are encountered:
[EBADF]    The conversion descriptor is invalid.
EXAMPLES
The following example shows how these interfaces may be used for
conversions.
#include <iconv.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define GOOD 0
#define BAD  (-1)

int convert(const char *tocode, const char *fromcode, int Input);

int
main(void)
{
    /* This sketch converts standard input; any open descriptor works. */
    return convert("roman8", "iso88591", STDIN_FILENO) == GOOD ? 0 : 1;
}

int
convert(const char *tocode, const char *fromcode, int Input)
{
    iconv_t cd;                  /* conversion descriptor */
    ssize_t bytesread;           /* num bytes read into input buffer */
    char    inbuf[BUFSIZ];       /* input buffer */
    char   *inchar;              /* ptr to input character */
    size_t  inbytesleft;         /* num bytes left in input buffer */
    char    outbuf[BUFSIZ];      /* output buffer */
    char   *outchar;             /* ptr to output character */
    size_t  outbytesleft;        /* num bytes left in output buffer */
    size_t  ret_val;             /* number of non-identical conversions */
    int     saved_errno;         /* errno from iconv(), saved before write() */
    int     status = GOOD;

    /* Initiate conversion -- get conversion descriptor */
    if ((cd = iconv_open(tocode, fromcode)) == (iconv_t)-1) {
        perror("iconv_open");
        return BAD;
    }

    inbytesleft = 0;             /* no bytes carried over yet */

    /* translate the characters */
    for ( ;; ) {
        /*
         * If any bytes are left over, they are at the
         * beginning of the buffer for the next read().
         */
        inchar = inbuf;              /* points to input buffer */
        outchar = outbuf;            /* points to output buffer */
        outbytesleft = BUFSIZ;       /* room left in output buffer */

        if ((bytesread = read(Input, inbuf + inbytesleft,
                              BUFSIZ - inbytesleft)) < 0) {
            perror("read");
            status = BAD;
            break;
        }
        if (bytesread == 0) {
            if (inbytesleft != 0) {
                /* EOF inside an incomplete or invalid character
                 * (see WARNINGS): treat it as an error. */
                fprintf(stderr, "prog: bad character at end of input\n");
                status = BAD;
            }
            break;                   /* end of conversions */
        }
        inbytesleft += (size_t)bytesread;

        ret_val = iconv(cd, &inchar, &inbytesleft,
                        &outchar, &outbytesleft);
        saved_errno = errno;         /* write() may change errno */

        if (write(STDOUT_FILENO, outbuf, BUFSIZ - outbytesleft) < 0) {
            perror("write");
            status = BAD;
            break;
        }

        /* iconv() returns the number of non-identical conversions
         * performed. If the entire string in the input buffer is
         * converted, the value pointed to by inbytesleft is zero.
         * If the conversion stopped for any reason, iconv() returns
         * (size_t)-1 and errno is set to indicate the condition.
         */
        if (ret_val == (size_t)-1) {
            if (saved_errno == EINVAL || saved_errno == E2BIG) {
                /* EINVAL: incomplete character or shift sequence at
                 * the end of the input buffer. E2BIG: lack of space
                 * in the output buffer. Either way, move the
                 * unconverted bytes to the start of the buffer (the
                 * regions may overlap, so use memmove()).
                 */
                memmove(inbuf, inchar, inbytesleft);
            } else {
                /* EILSEQ: an input byte does not belong to the
                 * input codeset. */
                errno = saved_errno;
                perror("iconv");
                status = BAD;
                break;
            }
        }
        /* Go back and read from the input file. */
    }

    /* Emit any sequence that returns the output to its initial state. */
    if (status == GOOD) {
        outchar = outbuf;
        outbytesleft = BUFSIZ;
        if (iconv(cd, NULL, NULL, &outchar, &outbytesleft) == (size_t)-1 ||
            write(STDOUT_FILENO, outbuf, BUFSIZ - outbytesleft) < 0) {
            perror("prog");
            status = BAD;
        }
    }

    /* end conversion & release the conversion descriptor */
    if (iconv_close(cd) == -1) {
        perror("iconv_close");
        status = BAD;
    }
    return status;
}
WARNINGS
If you use these routines and link your application archive (with
archive libc) on PA-RISC systems, note that they have a dependency on
libdld.sl that requires a change to the compile/link command:
Compile :
Or compile with and
The option is positionally dependent and should occur at the beginning
of the compile line. For optimum compatibility in future releases, you
should avoid using archive libc with other shared libraries except for
libdld.sl as needed above.
There is a corner-case situation for multi-byte characters that is not
correctly handled by iconv(). If the last character in the file being
converted is an invalid multi-byte character, iconv() returns [EINVAL]
instead of [EILSEQ]. The application can work around this by checking
whether EOF has been reached or whether this is the last buffer being
converted. In this case, [EINVAL] should be treated as [EILSEQ].
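The workaround can be folded into the error handling of a read-convert loop. The sketch below is illustrative only: after_iconv_error() and the next_step values are hypothetical names, not part of the iconv interface. It treats [EINVAL] at end of file as the fatal [EILSEQ] case, carries partial characters over otherwise, and treats [E2BIG] as a signal to drain the output buffer.

```c
#include <errno.h>

/* Hypothetical: what a read-convert loop should do after iconv()
 * returns (size_t)-1 with errno equal to err. */
enum next_step { CARRY_OVER, DRAIN_OUTPUT, FAIL };

static enum next_step after_iconv_error(int err, int at_eof)
{
    switch (err) {
    case EINVAL:   /* incomplete sequence at the end of the input buffer */
        /* At EOF no more bytes can complete it: treat as EILSEQ. */
        return at_eof ? FAIL : CARRY_OVER;
    case E2BIG:    /* output buffer full: write it out, then continue */
        return DRAIN_OUTPUT;
    default:       /* EILSEQ, EBADF, ... */
        return FAIL;
    }
}
```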
AUTHOR
was developed by HP.
FILES
System configuration file containing codeset names supported by the
operating system.
User customizable
configuration file containing additional codeset names.
Directory containing tables used for conversion.
Directory containing methods used for conversion.
SEE ALSO
genxlt(1), iconv(1), thread_safety(5).
STANDARDS CONFORMANCE