charmap(4)                                                  charmap(4)
NAME
charmap - symbolic translation file for localedef scripts
SYNOPSIS
charmap
DESCRIPTION
Invoking the localedef command with the -f option causes symbolic names
in the locale description file to be translated into the encodings
given in the charmap file (see localedef(1M)). It is recommended that a
locale description file be written entirely with symbolic names.
The charmap file has three sections: a declarations section, a
character definition section, and an optional width specification
section.
Declarations Section
Declarations, if present, precede the character definitions.
Each consists of the symbol (including the surrounding angle brackets),
followed by one or more blanks (tab or space characters), followed by
the value of the symbol.
Certain declarations are required for multibyte character codesets.
For single-byte codesets, all are optional.
Following is a list of possible declarations:
<code_set_name> value
Used to declare the name of the coded character set for which
the charmap file is defined. This keyword is required for
multibyte character codesets. For the HP15 encoding scheme, the
string HP15 must be part of the name. For the EUC encoding scheme,
the string EUC must be part of the name.
<cswidth> value
Used to declare the cswidth parameter of the coded character set
for which the charmap file is defined (see eucset(1)).
<mb_cur_max> value
Used to declare the maximum number of bytes in a multibyte
character. Defaults to 1 if not given. For multibyte character
codesets, this keyword must be specified.
<mb_cur_min> value
Used to declare the minimum number of bytes in a character for
the encoded character set. The value must be less than or equal
to <mb_cur_max>. If not given, the default is equal to <mb_cur_max>.
<escape_char> value
Used to declare the escape character, which is used to escape
characters that would otherwise have special meaning. If not
given, the default is backslash (\).
<comment_char> value
Used to declare the comment character, which begins comments;
comments should be placed in column one of the charmap file.
If not given, the default is the # character.
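As an illustration only, the following Python sketch reads the declarations that precede the CHARMAP line and applies the defaults described above. It makes these assumptions:

- The function name parse_declarations and the returned dictionary layout are hypothetical.
- There is one declaration per line.
- The <cswidth> value is kept as an uninterpreted string.

It does not implement every check that localedef performs.

```python
# Hypothetical helper: parse the declarations section of a charmap file.
# Assumption: one "<keyword> value" declaration per line, ending at CHARMAP.

DEFAULT_ESCAPE = "\\"   # backslash, the documented default
DEFAULT_COMMENT = "#"   # the documented default comment character


def parse_declarations(lines):
    """Return the declared values, with the documented defaults applied."""
    decls = {}
    comment = DEFAULT_COMMENT
    for line in lines:
        if line.strip() == "CHARMAP":
            break  # the character definition section starts here
        if not line.strip() or line.startswith(comment):
            continue  # empty lines and column-one comments are ignored
        parts = line.split(None, 1)
        if len(parts) != 2 or not (parts[0].startswith("<") and parts[0].endswith(">")):
            raise ValueError(f"malformed declaration: {line!r}")
        keyword, value = parts[0], parts[1].strip()
        decls[keyword] = value
        if keyword == "<comment_char>":
            comment = value  # later comment lines use the new character

    mb_cur_max = int(decls.get("<mb_cur_max>", "1"))
    mb_cur_min = int(decls.get("<mb_cur_min>", mb_cur_max))
    if mb_cur_min > mb_cur_max:
        raise ValueError("<mb_cur_min> must not exceed <mb_cur_max>")
    code_set_name = decls.get("<code_set_name>")
    if mb_cur_max > 1 and code_set_name is None:
        raise ValueError("<code_set_name> is required for multibyte codesets")
    return {
        "code_set_name": code_set_name,
        "cswidth": decls.get("<cswidth>"),  # kept as a raw string here
        "mb_cur_max": mb_cur_max,
        "mb_cur_min": mb_cur_min,
        "escape_char": decls.get("<escape_char>", DEFAULT_ESCAPE),
        "comment_char": comment,
    }
```

With no declarations at all, the sketch yields <mb_cur_max> and <mb_cur_min> of 1, a backslash escape character, and # as the comment character.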
Character Definition Section
The character-set mapping definitions immediately follow an identifier
line containing the string CHARMAP and precede a trailer line consisting
of the string END CHARMAP. (Empty lines and lines beginning with the
comment character are ignored.)
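A minimal Python sketch of isolating the definition lines follows. The function charmap_body is hypothetical. The sketch assumes that CHARMAP and END CHARMAP each appear once, on lines of their own.

```python
# Hypothetical helper: isolate the character definition section.
# Assumption: CHARMAP and END CHARMAP each appear once, on lines of their own.

def charmap_body(lines, comment_char="#"):
    """Return the definition lines between CHARMAP and END CHARMAP."""
    body = []
    inside = False
    for line in lines:
        words = line.split()
        if not inside:
            inside = words == ["CHARMAP"]  # identifier line opens the section
            continue
        if words == ["END", "CHARMAP"]:
            return body  # trailer line closes the section
        if not words or line.startswith(comment_char):
            continue  # empty lines and comment lines are ignored
        body.append(line.strip())
    raise ValueError("missing CHARMAP or END CHARMAP line")
```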
The character definitions are of two forms.
The first form defines a single character and its encoding:
symbolic_name encoding
A symbolic_name is one or more visible characters from the portable
character set as specified by XPG, enclosed in angle brackets.
Metacharacters such as angle brackets, escape characters, or comment
characters must be escaped if they are used in the name. Two or more
symbolic names can be given for the same encoding.
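The escaping rule for symbolic names can be sketched in Python as shown below. The function read_symbolic_name is hypothetical. Treating "visible" characters as the printable ASCII range ! through ~ is an assumption about the portable character set, not a statement of the XPG definition.

```python
# Hypothetical helper: read one escaped symbolic name from the start of a line.
# Assumption: "visible" portable characters are the printable ASCII range ! to ~.

def read_symbolic_name(text, escape="\\", comment="#"):
    """Split '<name> rest' into (name, rest), honoring escaped metacharacters."""
    if not text.startswith("<"):
        raise ValueError("a symbolic name must start with '<'")
    name = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == escape:
            if i + 1 == len(text):
                raise ValueError("escape character at end of line")
            name.append(text[i + 1])  # the escaped character is taken literally
            i += 2
            continue
        if ch == ">":
            if not name:
                raise ValueError("empty symbolic name")
            return "".join(name), text[i + 1:].lstrip()
        if ch in ("<", comment):
            raise ValueError(f"metacharacter {ch!r} must be escaped")
        if not "!" <= ch <= "~":
            raise ValueError(f"not a visible portable character: {ch!r}")
        name.append(ch)
        i += 1
    raise ValueError("unterminated symbolic name")
```

For example, a name written as <a\>b> is read as a>b, because the escaped closing angle bracket is taken literally.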
The encoding is a character constant in one of four forms:
decimal      An escape character followed by the letter d, followed
             by one to three decimal digits.
octal        An escape character followed by one to three octal
             digits.
hexadecimal  An escape character followed by the letter x, followed
             by two hexadecimal digits.
Unicode      An escape character followed by a followed by four or
             five hexadecimal digits. This encoding form can only be
             used when the option of the command is specified.
Multibyte characters are represented by the concatenation of character
constants. All constants used in the encoding of a multibyte character
must be of the same form.
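The following Python sketch decodes the decimal, octal, and hexadecimal forms into a byte string. It concatenates the constants of a multibyte character and rejects encodings that mix forms. The function parse_encoding is hypothetical, and the Unicode form is deliberately not handled.

```python
import re

# Hypothetical helper: decode a character constant into bytes.
# Only the decimal, octal, and hexadecimal forms are handled; the Unicode
# form is left out of this sketch.

def parse_encoding(text, escape="\\"):
    """Convert e.g. '\\x8e\\xa1' or '\\d142\\d161' into a bytes object."""
    e = re.escape(escape)
    forms = {
        "decimal": (rf"{e}d([0-9]{{1,3}})", 10),
        "octal": (rf"{e}([0-7]{{1,3}})", 8),
        "hexadecimal": (rf"{e}x([0-9A-Fa-f]{{2}})", 16),
    }
    for name, (pattern, base) in forms.items():
        # The whole encoding must be a run of constants of this one form.
        if re.fullmatch(f"(?:{pattern})+", text):
            values = [int(digits, base) for digits in re.findall(pattern, text)]
            if any(v > 255 for v in values):
                raise ValueError(f"{name} constant does not fit in a byte")
            return bytes(values)
    raise ValueError(f"unrecognized or mixed-form encoding: {text!r}")
```

Because each form is tried against the entire string, an encoding such as \x41\101 (hexadecimal followed by octal) matches none of them and is rejected.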
The second form defines a range of characters consisting of all
characters from the first symbolic name to the second, inclusive:
symbolic_name...symbolic_name encoding
Each symbolic name in a range must consist of one or more nonnumeric
characters followed by an integer formed of one or more decimal digits.
The integer part of the second symbolic name must be larger than that
of the first. The range is then interpreted as a list of symbolic names
consisting of the same character portion and successive integer values
from the first through the last. These names are assigned successive
encodings, starting with the one given.
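A Python sketch of the range expansion follows. The function expand_range is hypothetical, and names are passed without their angle brackets. The sketch makes two assumptions:

- "Successive encodings" means adding 1 to the byte sequence, read as a fixed-length unsigned big-endian integer, so that a byte of 255 carries into the byte before it.
- Each generated name keeps the digit width of the first name.

```python
import re

# Hypothetical helper: expand "<prefixNNNN>...<prefixMMMM> encoding".
# Assumptions: names arrive without angle brackets; encodings advance as a
# fixed-length big-endian unsigned integer; digit width follows the first name.

def expand_range(first, last, encoding):
    """Return a list of (name, bytes) pairs covering first..last inclusive."""
    m1 = re.fullmatch(r"(\D+)(\d+)", first)
    m2 = re.fullmatch(r"(\D+)(\d+)", last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError("range endpoints need the same nonnumeric prefix")
    prefix, digits = m1.group(1), m1.group(2)
    start, end = int(digits), int(m2.group(2))
    if end <= start:
        raise ValueError("second integer must be larger than the first")
    width = len(digits)
    length = len(encoding)
    value = int.from_bytes(encoding, "big")
    limit = 1 << (8 * length)
    result = []
    for offset, n in enumerate(range(start, end + 1)):
        code = value + offset
        if code >= limit:
            raise ValueError("range overflows the encoding length")
        result.append((f"{prefix}{n:0{width}d}", code.to_bytes(length, "big")))
    return result
```

Under these assumptions, expanding j0101 through j0104 from the two bytes 129 254 yields 129 254, 129 255, 130 0, and 130 1.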
For example, the character definition line
is equivalent to:
Width Specification
The following declarations can follow the character set mapping
definitions (after the END CHARMAP statement). Each consists of one of
the keywords shown in the following list, starting in column 1, followed
by the value(s) associated with the keyword, as defined below.
WIDTH
A positive integer value (either 1 or 2) defining the column width
for a printable character in the coded character set mapping
definitions. Coded character set character values are defined
using symbolic character names followed by column width values.
Defining a character with more than one WIDTH produces undefined
results. The END WIDTH keyword is used to terminate the WIDTH
definitions. Specifying the width of a nonprintable character in a
WIDTH declaration produces undefined results. Ellipses (...) can be
used between two symbolic character names to specify a range of
characters.
WIDTH_DEFAULT
A positive integer value defining the default column width for any
printable character not listed by one of the WIDTH keywords. If no
WIDTH_DEFAULT keyword is included in the charmap, the default
character width is 1.
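The width rules can be sketched in Python as follows. The functions build_width_table and column_width are hypothetical. The sketch makes these assumptions:

- Entries arrive without the WIDTH and END WIDTH lines.
- The charmap is a dictionary from bracket-free names to encodings.
- A range covers every encoding lying between its two endpoints when the encodings are compared as byte strings.

The sketch also chooses to reject a repeated WIDTH definition, where this page only says the result is undefined.

```python
# Hypothetical helpers: build and query a column-width table.
# Assumptions: entries exclude the WIDTH / END WIDTH lines, charmap names have
# no angle brackets, and ranges are resolved by comparing encodings as bytes.

def build_width_table(entries, charmap):
    """Return {encoding: width} for the characters named in WIDTH entries."""
    table = {}
    for entry in entries:
        names, value = entry.rsplit(None, 1)
        width = int(value)
        if width not in (1, 2):
            raise ValueError(f"WIDTH value must be 1 or 2: {entry!r}")
        if "..." in names:
            first, last = (n.strip().strip("<>") for n in names.split("...", 1))
            low, high = charmap[first], charmap[last]
            # A set, because several names may share one encoding.
            targets = {enc for enc in charmap.values() if low <= enc <= high}
        else:
            targets = {charmap[names.strip().strip("<>")]}
        for enc in targets:
            # Repeated WIDTH definitions are undefined; this sketch rejects them.
            if enc in table:
                raise ValueError(f"character given more than one WIDTH: {entry!r}")
            table[enc] = width
    return table


def column_width(table, encoding, width_default=1):
    """Look up a width, falling back to WIDTH_DEFAULT (1 when not declared)."""
    return table.get(encoding, width_default)
```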
EXAMPLES
For examples, see any of the files under the directory.
After the END CHARMAP statement, a width definition could be written as:
WIDTH
<A> 1
<B> 1
<C>...<Z> 1
...
<wc1>...<wcn> 2
END WIDTH
In this example, the numerical code point values represented by the
symbols <A> and <B> are assigned a width of 1. The code point values
<C> to <Z> inclusive (that is, <C>, <D>, <E>, and so on) are also
assigned a width of 1. Using <A>...<Z> would have required fewer
lines, but the alternative form is shown to demonstrate flexibility.
The WIDTH_DEFAULT keyword could have been added as appropriate.
SEE ALSO
eucset(1), localedef(1M), localedef(4).
STANDARDS CONFORMANCE