kcc man page on YellowDog

KCC(L)									KCC(L)

NAME
       kcc - Kanji code converter with automatic encoding detection

SYNOPSIS
       kcc [ -IOchnvxz ] [ -b bufsize ] [ file ] ...

DESCRIPTION
       kcc is a filter that reads each file sequentially, converts its kanji
       encoding, and writes the result to stdout.  If no file is specified, or
       if - is given as a file name, it reads from stdin.  You can specify the
       kanji encodings for input and output.  If you do not specify an input
       encoding, kcc detects it automatically.

       The available kanji encodings are JIS (7 bit and/or 8 bit), Shift JIS,
       EUC, and DEC.  The input may mix 7 bit JIS with any one of EUC, DEC, or
       Shift JIS.  Both SI/SO and ESC(I are recognized as designating
       halfwidth kana in JIS.

OPTIONS
       -O
       -IO    I is the input kanji encoding and O is the output kanji encoding.
              When no input encoding is specified, it is detected
              automatically.  If neither the input nor the output encoding is
              specified, the output encoding is 7 bit JIS.

              You can specify one of the following for the input encoding
              option, I.

                 e      EUC (may be mixed with 7 bit JIS)
                 d      DEC (may be mixed with 7 bit JIS)
                 s      Shift JIS (may be mixed with 7 bit JIS)
                 j, 7 or k
                        7 bit JIS
                 8      8 bit JIS

              You can specify one of the following for the output encoding
              option, O.

                 e      EUC
                 d      DEC
                 s      Shift JIS
                 jXY or 7XY
                        7 bit JIS (using SI/SO for JIS kana designation)
                 kXY    7 bit JIS (using ESC(I for JIS kana designation)
                 8XY    8 bit JIS

              With XY in the O option, you can specify which escape sequences
              are used in JIS encoding.  The default is BJ.  The supplemental
              kanji designation is fixed to ESC$(D.

                 X      Kanji is designated by:
                      B      ESC$B (JIS X0208-1983)
                      @      ESC$@ (JIS X0208-1978)
                      +      ESC&@ESC$B (JIS X0208-1990)
                 Y      Alphanumerics are designated by:
                      B      ESC(B (ASCII)
                      J      ESC(J (JIS Roman; JIS X0201)
                      H      ESC(H (Swedish; strongly deprecated)
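
              The XY table above can be read as a simple lookup.  The
              following is a minimal Python sketch, not part of kcc; the
              names KANJI, ALNUM and designations are hypothetical and
              only restate the escape sequences listed in the table.

```python
ESC = b"\x1b"

# X: escape sequence that designates kanji (taken from the table above).
KANJI = {
    "B": ESC + b"$B",
    "@": ESC + b"$@",
    "+": ESC + b"&@" + ESC + b"$B",
}

# Y: escape sequence that designates alphanumerics (taken from the table above).
ALNUM = {
    "B": ESC + b"(B",
    "J": ESC + b"(J",
    "H": ESC + b"(H",
}


def designations(xy: str = "BJ") -> tuple[bytes, bytes]:
    """Return (kanji escape, alphanumeric escape) for an XY pair.

    Hypothetical helper for illustration; "BJ" is the documented default.
    """
    if len(xy) != 2:
        raise ValueError("XY must be exactly two characters")
    x, y = xy
    return KANJI[x], ALNUM[y]
```

              For example, designations("+J") gives the pair of sequences
              that an output option such as -k+J would select.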

       -v     Output the result of input encoding detection to stderr.

       -x     Extended mode.  During automatic detection of the input
              encoding, also recognize user-defined characters and extended
              character regions (user-defined characters of EUC, undefined
              halfwidth kana, the C1 control character area, and the extended
              character region of Shift JIS).  DEC and EUC are distinguished
              from each other only in this mode.

       -z     Shrink mode.  Do not recognize halfwidth kana (except in 7 bit
              JIS) during input encoding detection.  With this option,
              automatic detection of the input encoding becomes much more
              accurate for files without halfwidth kana.

       -h     Normally, halfwidth kana become fullwidth katakana when
              converted to DEC.  With this option, they become hiragana.

       -n     User-defined characters, extended characters, and supplemental
              kanji characters are converted to a fullwidth white box, and
              the undefined region of halfwidth kana is converted to a
              halfwidth centered dot.

       -b bufsize
              Specify the buffer size.  The default is 8 kbytes.

       -c     Do not convert; only check the input encoding and print the
              result to stdout.  Unlike normal automatic detection, the whole
              contents of the file are checked.  However, when inconsistent
              encodings are found, kcc stops reading and prints "data".
              Options other than -x and -z are ignored.

EXAMPLES
       % kcc -e file
              The input encoding is detected automatically, and the output
              is in EUC encoding.

       % kcc -sj file1 file2
              Two files in Shift JIS are concatenated and converted to JIS.

       % command | kcc -k+J
              The output of command is converted to JIS (JIS X0208 kanji
              designated with +, JIS Roman alphanumerics, and halfwidth
              kana designated by ESC(I).

       % kcc -c file
              The encoding of the contents of file is detected (no conversion
              is done).

BUGS
       Automatic detection of the input encoding works well in normal cases;
       however, it has the following problems.

       7 bit JIS is recognized with certainty by its escape sequences.  EUC
       and DEC are treated as the same (referred to as the EUC series).  The
       halfwidth kana of 8 bit JIS are the same as the halfwidth kana of
       Shift JIS (referred to as the Shift JIS series).  However, the EUC
       series and the Shift JIS series, which are both 8 bit encodings, share
       wide regions of the code space.  The problem in automatic detection is
       therefore telling these two encodings apart.

       Detection of the EUC series versus the Shift JIS series is done line
       by line.  As soon as a line is found that cannot be Shift JIS series,
       or cannot be EUC series, the encoding is determined.  When an
       inconsistency is found, the input is treated as "data" and the
       contents of the output are not guaranteed.
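
       The line-by-line elimination described above can be sketched in
       Python as follows.  This is an illustration under stated assumptions,
       not kcc's actual code: the function names are hypothetical, the byte
       ranges are the standard EUC-JP and Shift JIS ranges without the
       extended regions handled by -x, and buffering is ignored.

```python
def _in(b: int, lo: int, hi: int) -> bool:
    return lo <= b <= hi


def valid_euc(data: bytes) -> bool:
    """True if data parses as EUC-JP (ASCII, SS2 kana, SS3 kanji, 2-byte kanji)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:
            i += 1
        elif b == 0x8E:  # SS2: one halfwidth kana byte follows
            if i + 1 >= n or not _in(data[i + 1], 0xA1, 0xDF):
                return False
            i += 2
        elif b == 0x8F:  # SS3: two supplemental kanji bytes follow
            if i + 2 >= n or not all(_in(t, 0xA1, 0xFE) for t in data[i + 1:i + 3]):
                return False
            i += 3
        elif _in(b, 0xA1, 0xFE):  # two-byte character
            if i + 1 >= n or not _in(data[i + 1], 0xA1, 0xFE):
                return False
            i += 2
        else:
            return False
    return True


def valid_sjis(data: bytes) -> bool:
    """True if data parses as Shift JIS (assumed lead bytes 0x81-0x9F, 0xE0-0xEF)."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80 or _in(b, 0xA1, 0xDF):  # ASCII or halfwidth kana
            i += 1
        elif _in(b, 0x81, 0x9F) or _in(b, 0xE0, 0xEF):
            if i + 1 >= n:
                return False
            t = data[i + 1]
            if not (_in(t, 0x40, 0x7E) or _in(t, 0x80, 0xFC)):
                return False
            i += 2
        else:
            return False
    return True


def detect(lines) -> str:
    """Eliminate candidates line by line until one (or none) remains."""
    candidates = {"euc", "sjis"}
    for line in lines:
        if not valid_euc(line):
            candidates.discard("euc")
        if not valid_sjis(line):
            candidates.discard("sjis")
        if len(candidates) == 1:
            return next(iter(candidates))
        if not candidates:
            return "data"  # inconsistent input
    return "undetermined"
```

       In this sketch a line that is valid in both encodings leaves the
       result undetermined, which is the situation the following paragraphs
       discuss.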

       While the choice between the EUC series and the Shift JIS series is
       still undecided after an 8 bit code has been found, conversion is
       deferred and the input data is held in the buffer.  If the buffer
       fills up, kcc assumes the EUC series and starts converting.
       Rationale: documents containing kanji usually include JIS non-kanji
       or JIS first-standard (level 1) kanji.  In Shift JIS these do not
       share a region with EUC, so Shift JIS can be detected with certainty.
       If the encoding cannot be determined, it is therefore very likely to
       be EUC.

       If the input is 8 bit JIS and its halfwidth kana always appear in
       sequences of even length, it will be wrongly detected as EUC kanji.
       Be careful.
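
       The ambiguity can be reproduced with Python's standard codecs, which
       are used here only for illustration and are unrelated to kcc.  The
       sample bytes are an assumption chosen for the demonstration: an even
       number of halfwidth kana bytes also forms valid EUC two-byte
       characters, while an odd number does not.

```python
# Four bytes from the halfwidth kana range (0xA1-0xDF).
even_run = bytes([0xB1, 0xB2, 0xB3, 0xB4])

# Read as Shift JIS (or 8 bit JIS kana): four halfwidth katakana.
as_sjis = even_run.decode("shift_jis")

# Read as EUC: the same bytes pair up into two kanji.
as_euc = even_run.decode("euc_jp")


def decodes_as_euc(data: bytes) -> bool:
    """Hypothetical helper: True if data is valid EUC-JP."""
    try:
        data.decode("euc_jp")
        return True
    except UnicodeDecodeError:
        return False


# Dropping one byte leaves an odd run, which is no longer valid EUC.
odd_run = even_run[:3]
```

       Here decodes_as_euc(even_run) is true and decodes_as_euc(odd_run) is
       false, so only even-length kana sequences are at risk.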

       If the input does not contain halfwidth kana, use -z; detection then
       becomes much more accurate, because the shared region is restricted
       to the area of JIS second-standard (level 2) kanji.
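
       The claim about the shared region can be checked with a short Python
       sketch.  It is an illustration, not kcc code: the helper names are
       hypothetical, and the row ranges (1-8 non-kanji, 16-47 first
       standard, 48-84 second standard) are the usual JIS X0208 layout,
       stated here as an assumption.

```python
def sjis_lead(row: int) -> int:
    """Shift JIS lead byte for a JIS X0208 row number (1-94)."""
    lead = (row + 1) // 2 + 0x80
    # Lead bytes skip the halfwidth kana range and resume at 0xE0.
    return lead if lead <= 0x9F else lead + 0x40


# First bytes that can start a two-byte EUC character.
EUC_BYTES = set(range(0xA1, 0xFF))

NON_KANJI = range(1, 9)
FIRST_STANDARD = range(16, 48)
SECOND_STANDARD = range(48, 85)


def shared_leads(rows) -> set[int]:
    """Shift JIS lead bytes for these rows that are also valid EUC bytes."""
    return {sjis_lead(r) for r in rows} & EUC_BYTES


no_overlap = shared_leads(list(NON_KANJI) + list(FIRST_STANDARD))
overlap = shared_leads(SECOND_STANDARD)
```

       With halfwidth kana excluded, no_overlap is empty and overlap
       contains only lead bytes of second-standard kanji, which matches the
       statement above.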

       The extended region of Shift JIS, the user-defined area of EUC, the
       C1 control characters of EUC, and the undefined region of halfwidth
       kana of EUC are outside the range of automatic detection, so detection
       fails if the input contains these characters.  Use the -x option to
       select extended mode, or specify the input encoding.

SEE ALSO
       cat(1)

NOTES
       Usually, user-defined characters, extended characters, and
       supplemental kanji characters are mapped to their respective
       counterparts.  However, characters outside the range of extended
       characters become FCFC in hexadecimal when converted to Shift JIS.
       The C1 control character region of EUC and DEC is preserved when
       converted to JIS, but is deleted when converted to Shift JIS.  The
       undefined area of halfwidth kana becomes a halfwidth centered dot
       when converted to Shift JIS.  Halfwidth kana become fullwidth kana
       when converted to DEC.

       When  output  is JIS encoding, control characters such as newline, TAB,
       DEL and white space (halfwidth) will be output in ASCII mode.

       When the input encoding is detected wrongly, or the input contains
       characters undefined in the expected character set, the output is
       undefined.

       This manual was translated by Fumitoshi UKAI <ukai@debian.or.jp> for
       the Debian system, but you can use it for any purpose.

Y. Tonooka		       November 19, 1992			KCC(L)