apertium(1)
NAME
apertium - This application is part of ( apertium )
This tool is part of the apertium machine translation architecture:
http://apertium.sf.net.
SYNOPSIS
apertium [-d datadir] [-f format] [-u] [-a] {language-pair} [infile
[outfile]]
DESCRIPTION
apertium is the application that most people will be using, as it
simplifies the use of apertium/lt-toolbox tools for machine translation
purposes.
This tool tries to ease the use of lt-toolbox (which contains all the
lexical processing modules and tools) and apertium (which contains the
rest of the engine) by providing a single front-end to the end-user.
The different modules behind the apertium machine translation
architecture are, in order:
· de-formatter: Separates the text to be translated from the
format information.
· morphological-analyser: Tokenizes the text into surface forms.
· part-of-speech tagger: Chooses one surface form among
homographs.
· lexical transfer module: Reads each source-language lexical
form and delivers a corresponding target-language lexical form.
· structural transfer module: Detects fixed-length patterns of
lexical forms (chunks or phrases) needing special processing due
to grammatical divergences between the two languages and
performs the corresponding transformations.
· morphological generator: Delivers a target-language surface
form for each target-language lexical form, by suitably
inflecting it.
· post-generator: Performs orthographical operations such as
contractions and apostrophations.
· re-formatter: Restores the format information encapsulated by
the de-formatter into the translated text and removes the
encapsulation sequences used to protect certain characters in the
source text.
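The ordering of the modules above can be sketched as a simple chain in
which each stage consumes the output of the previous one. This is only a
toy illustration, not the way apertium itself is implemented: the stages
here are placeholders that pass the text through unchanged, and the names
make_tracing_stage and run_pipeline are hypothetical.

```python
from functools import reduce
from typing import Callable, List

Stage = Callable[[str], str]

# Stage names copied from the module list above, in the order given there.
STAGE_NAMES = [
    "de-formatter",
    "morphological-analyser",
    "part-of-speech tagger",
    "lexical transfer module",
    "structural transfer module",
    "morphological generator",
    "post-generator",
    "re-formatter",
]


def make_tracing_stage(name: str, trace: List[str]) -> Stage:
    """Return a placeholder stage (hypothetical) that records its name
    and returns the text unchanged."""
    def stage(text: str) -> str:
        trace.append(name)
        return text
    return stage


def run_pipeline(text: str, stages: List[Stage]) -> str:
    """Feed the output of each stage into the next, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, text)


trace: List[str] = []
stages = [make_tracing_stage(name, trace) for name in STAGE_NAMES]
result = run_pipeline("hola", stages)
```

After the run, trace holds the stage names in the order they were
applied, which makes the de-formatter/re-formatter bracketing of the
linguistic stages easy to see.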
OPTIONS
-d datadir The directory holding the linguistic data. By default it
will use the expected installation path.
language-pair The language pair: LANG1-LANG2 (for instance es-ca or
ca-es).
-f format Specifies the format of the input and output files, which can
take one of these values:
· txt (default value) Input and output files are in text format.
· html Input and output files are in "html" format. This "html"
is the one accepted by the vast majority of web browsers.
· html-noent Input and output files are in "html" format, but
preserving native encoding characters rather than using HTML
text entities.
· rtf Input and output files are in "rtf" format. The accepted
"rtf" is the one generated by Microsoft WordPad (C) and
Microsoft Office (C) up to and including Office-97.
-u Disable marking of unknown words with the '*' character.
-a Enable marking of disambiguated words with the '=' character.
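As an illustration of how the options above combine with the synopsis,
the following sketch assembles an argument list for apertium from Python.
It only builds the list and does not run anything. The function name
build_command and its parameter names are hypothetical; the flags and
format names are the ones documented on this page.

```python
from typing import List, Optional

# Format names listed under -f on this page.
KNOWN_FORMATS = ("txt", "html", "html-noent", "rtf")


def build_command(
    language_pair: str,
    datadir: Optional[str] = None,
    fmt: Optional[str] = None,
    mark_unknown: bool = True,
    mark_disambiguated: bool = False,
    infile: Optional[str] = None,
    outfile: Optional[str] = None,
) -> List[str]:
    """Build an argv list following the SYNOPSIS order (hypothetical helper)."""
    if fmt is not None and fmt not in KNOWN_FORMATS:
        raise ValueError("unknown format: " + fmt)
    if outfile is not None and infile is None:
        # outfile is positional and can only follow infile.
        raise ValueError("outfile requires infile")

    argv = ["apertium"]
    if datadir is not None:
        argv += ["-d", datadir]
    if fmt is not None:
        argv += ["-f", fmt]
    if not mark_unknown:
        argv.append("-u")  # disable the '*' marks on unknown words
    if mark_disambiguated:
        argv.append("-a")  # enable the '=' marks on disambiguated words
    argv.append(language_pair)
    if infile is not None:
        argv.append(infile)
        if outfile is not None:
            argv.append(outfile)
    return argv
```

For example, build_command("es-ca", fmt="html", mark_unknown=False)
yields the list for the command line "apertium -f html -u es-ca".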
FILES
These are the two files that can be used with this command:
infile Input file (stdin by default).
outfile Output file (stdout by default).
The following options and arguments are also accepted:
-m memory.tmx Use a translation memory to recycle translations.
-o direction Translation direction to use with the translation memory;
by default 'direction' is used instead.
-l Lists the available translation directions and exits.
direction Typically LANG1-LANG2, but see modes.xml in the language data.
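Because infile defaults to stdin and outfile to stdout, apertium can be
driven as a filter from another program. The sketch below shows one way
this might be done from Python. It assumes that apertium is on the PATH
and that the requested language pair is installed; the function name
translate_text and its run parameter are hypothetical, the latter being
there only so that the process call can be replaced in tests.

```python
import subprocess
from typing import Callable


def translate_text(
    text: str,
    language_pair: str,
    run: Callable = subprocess.run,
) -> str:
    """Send text to apertium on stdin and return what it writes to stdout."""
    completed = run(
        ["apertium", language_pair],
        input=text,           # goes to the process's stdin (no infile given)
        capture_output=True,  # collect stdout (no outfile given)
        text=True,            # exchange str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

Defining the function does not start any process; a call such as
translate_text("hola", "es-ca") would.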
SEE ALSO
lt-proc(1), lt-comp(1), lt-expand(1), apertium-tagger(1).
BUGS
Lots of...lurking in the dark and waiting for you!
AUTHOR
(c) 2005,2006 Universitat d'Alacant / Universidad de Alicante. All
rights reserved.
2006-03-08 apertium(1)