IXBUILD(1)                                                    IXBUILD(1)
NAME
ixbuild - build inverted indexes on file system subtrees
SYNOPSIS
/usr/bin/ixbuild [ -aAcCdfgloprsuv ] [ -Dfile ] [ -Ffile ]
[ -Llanguage ] [ -M# ] [ -Nfile ] [ -P# ] [ -Sfile ] [ -Tfile ]
[ -ystring ] [ path ... ]
DESCRIPTION
ixbuild creates or updates indexes for the files or directories named
on the command line. For each directory named on the command line, or
for the current directory by default, ixbuild creates or updates the
associated index. Each index is located in a file named .index.store
at the root of its subtree.
An index is a special kind of file used by the Indexing Kit, called a
store file, which contains an IXStoreDirectory holding an IXFileFinder
(named “FileFinder”). The IXFileFinder performs the actual manipulation
of the indexes; applications that use the Indexing Kit can access it
through the IXStoreDirectory. The Indexing Kit documentation is
available online in Digital Librarian.
ixbuild uses several special files when it first creates an index.
Their contents are incorporated into the index itself, so they aren't
read when an index is updated. However, if the index is deleted and
rebuilt from scratch, these files are used again, so you may want to
keep them. Here are brief descriptions of the files, their uses, and
their formats:
.index.ftype contains information about the types of files that will be
included in the index. A file's type is used to determine how tokens
(words) should be extracted from it, or how to convert it to a form
that the Indexing Kit can index. Each line in this file should be of
the form:
typename pattern format offset filename
Each field must be separated from the next by exactly one tab. Any
field may be “-”, in which case the field won't be used. typename is
the name that should be used for the type; for example, “man” or “ps”.
pattern is a sequence of characters within a file that may be used to
identify it (for example, “%!PS”); if pattern begins with a `/', or if
format is regex (see below), it's interpreted as a regular expression.
format is the data type of pattern; it may be one of byte, short, long,
regex, or string; string is the default. offset is the offset into the
file at which pattern is expected to occur, measured in units of
format; that is, if format is long, offset is measured in 4-byte units.
filename is a file name that should be matched to the type; it may
contain wildcards (for example, “*.rtf”).
This might be the ftype entry for PostScript files, for example (with
the fields separated by single tabs):
ps %!PS string 0 -
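To make the field rules concrete, here is a minimal Python sketch of how
an .index.ftype line might be parsed and matched against a file. It is
not ixbuild's implementation. The names parse_ftype_line, matches,
FileType, and UNIT_SIZE are hypothetical, and several rules are
assumptions: 1-byte and 2-byte units for byte and short, byte-sized
units for string and regex, an offset of “-” meaning 0, every field that
is not “-” having to match, and the leading `/' of a regex pattern being
only a marker. Numeric (byte, short, long) pattern comparison is not
modelled.

```python
import fnmatch
import re
from dataclasses import dataclass
from typing import Optional

# ASSUMED unit sizes in bytes. The man page only states that a long offset
# is measured in 4-byte units; the other entries are guesses.
UNIT_SIZE = {"byte": 1, "short": 2, "long": 4, "string": 1, "regex": 1}


@dataclass
class FileType:
    typename: Optional[str]
    pattern: Optional[str]
    fmt: str
    offset: int  # measured in units of fmt
    filename: Optional[str]

    def byte_offset(self) -> int:
        return self.offset * UNIT_SIZE[self.fmt]

    def is_regex(self) -> bool:
        # A pattern is a regular expression if format is regex or it starts with "/".
        return self.fmt == "regex" or (self.pattern or "").startswith("/")


def parse_ftype_line(line: str) -> FileType:
    """Parse one tab-separated .index.ftype line (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    # "-" marks an unused field.
    typename, pattern, fmt, offset, filename = (None if f == "-" else f for f in fields)
    fmt = fmt or "string"  # string is the default format
    if fmt not in UNIT_SIZE:
        raise ValueError(f"unknown format: {fmt}")
    # ASSUMPTION: an unused offset means offset 0.
    return FileType(typename, pattern, fmt, int(offset or 0), filename)


def matches(ft: FileType, basename: str, data: bytes) -> bool:
    """ASSUMPTION: every field that is not "-" must match."""
    if ft.filename is not None and not fnmatch.fnmatchcase(basename, ft.filename):
        return False
    if ft.pattern is not None:
        start = ft.byte_offset()
        if ft.is_regex():
            # ASSUMPTION: a leading "/" only marks the pattern as a regex.
            expr = ft.pattern[1:] if ft.pattern.startswith("/") else ft.pattern
            if re.compile(expr.encode()).match(data, start) is None:
                return False
        elif ft.fmt == "string":
            wanted = ft.pattern.encode()
            if data[start:start + len(wanted)] != wanted:
                return False
        else:
            raise NotImplementedError("numeric pattern formats are not modelled")
    return True
```

With the PostScript entry above, parse_ftype_line("ps\t%!PS\tstring\t0\t-")
yields a FileType that matches any file whose first four bytes are %!PS,
whatever the file is called.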
.index.itype contains the names of types of files (as defined in
.index.ftype) that will not be included in the index. Each type name
should be on a separate line.
.index.iname contains the base names (without paths) of files that will
not be included in the index. The filename must be exact; shell
wildcards are not allowed. Each file name should be on a separate
line.
.index.swords contains stop words, which will not be included in the
index. Each word should be on a separate line, and should be in post-
processed form (that is, if you use case folding, all stop words should
be lowercase, and if you use stem reduction, all words should be stems
only).
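The following Python sketch illustrates why stop words must be listed in
post-processed form, and how ignored names and types differ. The names
load_list, is_ignored, and index_tokens are hypothetical, only case
folding is modelled (not plural folding or stem reduction), and the real
order of ixbuild's processing steps is an assumption.

```python
from typing import Iterable, List, Set


def load_list(text: str) -> Set[str]:
    """Read a one-entry-per-line file such as .index.itype, .index.iname or .index.swords."""
    return {line for line in text.splitlines() if line}


def is_ignored(basename: str, typename: str, inames: Set[str], itypes: Set[str]) -> bool:
    # .index.iname entries are exact base names, so no wildcard expansion is done.
    return basename in inames or typename in itypes


def index_tokens(tokens: Iterable[str], stop_words: Set[str], fold_case: bool = True) -> List[str]:
    kept = []
    for token in tokens:
        # Post-process first (only case folding is modelled here) ...
        word = token.lower() if fold_case else token
        # ... then compare against the stop words, which must therefore be
        # listed in the same post-processed form.
        if word not in stop_words:
            kept.append(word)
    return kept
```

In this sketch, listing “The” in .index.swords has no effect while case
folding is on, because every token is lowercased before the comparison;
the entry has to be “the”.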
.index.domain contains a weighting domain used for peculiarity
weighting (see the IXWeightingDomain and IXAttributeParser class
specifications in the Indexing Kit documentation). You can use the
ixparse(1) command to convert histogram or NEXTSTEP Release 2 WFTable
files to domain format.
OPTIONS
The following options control how an index is built or updated. Using
them with an existing index will alter its configuration (for example,
changing its weighting type); if you want the configuration of an index
to be retained when updating it, specify the -o option.
-- Lists these options.
-a Use absolute weighting. The weight of a token (word) is its
number of occurrences in the files of the directory.
-A Don't fold plural word forms. The default is to do plural
folding.
-c Clean indexes after updating, removing out-of-date
information.
-C Don't fold case to lower case. The default is to fold case.
-d Cross device boundaries (mounted disks, for example).
-Dfile Use the supplied weighting domain file (default
.index.domain). This is used for generating peculiarity
weights.
-f Use frequency weighting (number of occurrences / total
tokens).
-Ffile Use the supplied file type table file (default
.index.ftype).
-g Generate descriptions automatically from file contents.
-l Traverse symbolic links.
-Llanguage Parse files as though they contain text in the language
language. If no language is specified, the system default
language is used.
-M# Use the supplied minimum weight; words below this weight are
dropped from the index.
-Nfile Use the supplied ignored name list file (default
.index.iname).
-o Don't reset options when updating an existing index.
-p Use peculiarity weighting in conjunction with a weighting
domain (see -D).
-P# Use the supplied pass percentage; words below this
percentage are dropped from the index.
-r Reduce words to stems (for example, writer becomes write).
The default is not to do this.
-s Build indexes for a static collection (that is, for
directories whose files won't change).
-Sfile Use the supplied stop words file (default .index.swords).
-Tfile Use the supplied ignored type list file (default
.index.itype).
-u Disable automatic updating for the index.
-v Generate verbose output.
-ystring Use the supplied punctuation string to delimit words; for
example, -y".,; ".
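Several of these options can be illustrated with a short Python sketch.
It models only -y (word delimiters), -C (case folding), -a and -f
(weighting), and -M (minimum weight). The function names tokenize and
weigh are hypothetical, counts are taken over a single token list that
stands in for the files being indexed, and the Indexing Kit's real
arithmetic (including how -P, -p, and plural folding interact) may
differ.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str, delimiters: str = ".,; ", fold_case: bool = True) -> List[str]:
    """Split text on any character in delimiters (as with -y".,; ")."""
    if not delimiters:
        words = [text] if text else []
    else:
        # re.escape keeps characters such as "-" or "]" literal inside the class.
        words = [w for w in re.split("[" + re.escape(delimiters) + "]+", text) if w]
    # Case folding is on by default; -C turns it off.
    return [w.lower() for w in words] if fold_case else words


def weigh(tokens: List[str], mode: str = "absolute", min_weight: float = 0.0) -> Dict[str, float]:
    counts = Counter(tokens)
    if mode == "absolute":  # -a: number of occurrences
        weights = {w: float(c) for w, c in counts.items()}
    elif mode == "frequency":  # -f: occurrences / total tokens
        total = len(tokens)
        weights = {w: c / total for w, c in counts.items()}
    else:
        raise ValueError(f"unsupported mode: {mode}")
    # -M: drop words whose weight is below the minimum.
    return {w: x for w, x in weights.items() if x >= min_weight}
```

In this sketch, frequency weights are fractions of the total token
count, so a -M threshold that suits absolute weighting (whole counts)
would drop every word under frequency weighting.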
FILES
.index.store an index file created by ixbuild
.index.ftype file type table
.index.iname ignored file names
.index.itype ignored file types
.index.swords stop words (dropped from index)
.index.domain weighting domain
SEE ALSO
ixsearch(1), ixparse(1), Indexing Kit documentation in NEXTSTEP General
Reference
NeXT Computer, Inc. July 14, 1992 IXBUILD(1)