GLIMPSE(l)							    GLIMPSE(l)

NAME
       glimpse 4.1 - search quickly through entire file systems

OVERVIEW
       Glimpse	(which	stands	for  GLobal IMPlicit SEarch) is a very popular
       UNIX indexing and query system that allows  you	to  search  through  a
       large  set  of  files  very  quickly.  Glimpse supports most of agrep's
       options (agrep is our powerful version of grep)	including  approximate
       matching	 (e.g.,	 finding  misspelled words), Boolean queries, and even
       some limited forms of regular expressions.  It is used in the same way,
       except that you don't have to specify file names.  So, if you are look‐
       ing for a needle anywhere in your file system, all you have  to	do  is
       say glimpse needle and all lines containing needle will appear preceded
       by the file name.

       To use glimpse you first need to index your  files  with	 glimpseindex.
       For  example, glimpseindex -o ~	will index everything at or below your
       home directory.	See man glimpseindex for more details.

       Glimpse is also available for web sites,	 as  a	set  of	 tools	called
       WebGlimpse.   (The  old	glimpseHTTP  is no longer supported and is not
       recommended.)  See http://glimpse.cs.arizona.edu/webglimpse/  for  more
       information.

       Glimpse includes all of agrep and can be used instead of agrep by giving
       one or more file names at the end of the command.  This causes glimpse
       to ignore the index and run agrep as usual.  For example, glimpse -1
       pattern file is the same as agrep -1 pattern file.  Agrep is distributed
       as a self-contained package within glimpse, and can be used separately.
       We added a new option to agrep: -r recursively searches the directory
       and everything below it (see agrep options below); it is used only when
       glimpse reverts to agrep.

       Mail glimpse-request@cs.arizona.edu to be added to the glimpse mailing
       list.  Mail glimpse@cs.arizona.edu to report bugs, ask questions, or
       discuss tricks for using glimpse (this is a moderated mailing list
       with very little traffic, mostly announcements).  An HTML version of
       these manual pages can be found at
       http://glimpse.cs.arizona.edu/glimpsehelp.html.  Also see the glimpse
       home page at http://glimpse.cs.arizona.edu/.

SYNOPSIS
       glimpse - [almost all letters] pattern

INTRODUCTION
       We start with simple ways to use glimpse and describe all  the  options
       in  detail  later  on.	Once  an  index	 is built, using glimpseindex,
       searching for pattern is as easy as saying

       glimpse pattern

       The output of glimpse is similar to that of agrep (or any other	grep).
       The  pattern can be any agrep legal pattern including a regular expres‐
       sion or a Boolean query (e.g., searching for Tucson AND Arizona is done
       by glimpse 'Tucson;Arizona').

       The  speed  of  glimpse	depends	 mainly on the number and sizes of the
       files that contain a match and only to a second	degree	on  the	 total
       size of all indexed files.  If the pattern is reasonably uncommon, then
       all matches will be reported in a few seconds even if the indexed files
       total  500MB or more.  Some information on how glimpse works and a ref‐
       erence to a detailed article are given below.

       Most options of agrep (and of other greps) are supported, including
       approximate matching.  For example,

       glimpse -1 'Tuson;Arezona'

       will  output  all  lines containing both patterns allowing one spelling
       error in any of the patterns (either insertion, deletion, or  substitu‐
       tion), which in this case is definitely needed.
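
       As an illustration of how spelling errors are counted, the following
       Python sketch computes the classic edit distance between two words.
       It is a simplification and not agrep's actual algorithm: it compares
       whole words, whereas glimpse looks for approximate occurrences inside
       lines.  The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b (Levenshtein distance)."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def matches_with_errors(pattern: str, word: str, max_errors: int) -> bool:
    """True if word is within max_errors edits of pattern (like -# does)."""
    return edit_distance(pattern, word) <= max_errors
```

       With these definitions, 'Tuson' is one insertion away from 'Tucson'
       and 'Arezona' is one substitution away from 'Arizona', so both words
       are accepted when one error is allowed and rejected when none is.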

       glimpse -w -i 'parent'

       specifies  case	insensitive (-i) and match on complete words (-w).  So
       'Parent' and  'PARENT'  will  match,  'parent/child'  will  match,  but
       'parenthesis'  or  'parents' will not match.  (Starting at version 3.0,
       glimpse can be much faster when these two options are specified,	 espe‐
       cially for very large indexes.  You may want to set an alias especially
       for "glimpse -w -i".)

       The -F option provides a pattern that must match the  file  name.   For
       example,

       glimpse -F '\.c$' needle

       will  find  the	pattern	 needle	 in all files whose name ends with .c.
       (Glimpse will first check its index to determine which files  may  con‐
       tain  the pattern and then run agrep on the file names to further limit
       the search.)  The -F option should not be put at the end after the main
       pattern (e.g., "glimpse needle -F hay" is incorrect).
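
       The two-phase search described above (index lookup, then a file-name
       filter, then a scan of the remaining files) can be sketched in Python.
       This is only a toy model under stated assumptions: the index is an
       in-memory dictionary from lower-cased words to file names, Python's re
       module stands in for agrep, and all names (build_index, search, FILES)
       are hypothetical.

```python
import re


def build_index(files):
    """Map each lower-cased word to the set of file names containing it."""
    index = {}
    for name, text in files.items():
        for word in re.findall(r"\w+", text.lower()):
            index.setdefault(word, set()).add(name)
    return index


def search(index, files, file_pattern, word):
    """Return (file name, line) pairs for word, limited by file_pattern."""
    # Phase 1: the index says which files may contain the word.
    candidates = index.get(word.lower(), set())
    # -F: keep only candidates whose full name matches file_pattern.
    selected = sorted(n for n in candidates if re.search(file_pattern, n))
    # Phase 2: scan the selected files for the actual matching lines.
    hits = []
    for name in selected:
        for line in files[name].splitlines():
            if word in line:
                hits.append((name, line))
    return hits


# Toy "file system": file name -> contents.
FILES = {
    "/home/u/src/hay.c": "int needle = 1;\nint other = 2;",
    "/home/u/src/hay.h": "extern int needle;",
    "/home/u/doc/notes.txt": "no match here",
}
INDEX = build_index(FILES)
```

       Calling search(INDEX, FILES, r"\.c$", "needle") scans only hay.c,
       because hay.h is dropped by the file-name filter before any file is
       read.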

A Detailed Description of All the Options of Glimpse
       -#     # is an integer between 1 and 8 specifying the maximum number of
	      errors permitted in finding the approximate matches (the default
	      is  zero).  Generally, each insertion, deletion, or substitution
	      counts as one error.  It is possible to adjust the relative cost
	      of  insertions,  deletions  and  substitutions (see -I -D and -S
	      options).	 Since the index stores only  lower  case  characters,
	      errors  of substituting upper case with lower case may be missed
	      (see LIMITATIONS).  Allowing errors in the match	requires  more
	      time  and	 can  slow down the match by a factor of 2-4.  Be very
	      careful when specifying more than one error, as the number of
	      matches tends to grow very quickly.

       -a     prints  attribute	 names.	  This	option applies only to Harvest
	      SOIF  structured	data  (used  with  glimpseindex	  -s).	  (See
	      http://harvest.transarc.com  for more information about the Har‐
	      vest project.)

       -A     used for glimpse internals.

       -b     prints the byte offset (from the beginning of the file)  of  the
	      end of each match.  The first character in a file has offset 0.

       -B     Best  match mode.	 (Warning: -B sometimes misses matches.	 It is
	      safer to specify the number of errors explicitly.)  When	-B  is
	      specified	 and no exact matches are found, glimpse will continue
	      to search until the closest matches (i.e., the ones with minimum
	      number  of  errors) are found, at which point the following mes‐
	      sage will be shown: "the best match contains x errors, there are
	      y matches, output them? (y/n)" This message refers to the number
	      of matches found in the index.  There may be many	 more  matches
	      in the actual text (or there may be none if -F is used to filter
	      files).  When the -#, -c, or -l options are  specified,  the  -B
	      option  is  ignored.   In general, -B may be slower than -#, but
	      not by very much.	 Since the index stores only lower case	 char‐
	      acters, errors of substituting upper case with lower case may be
	      missed (see LIMITATIONS).

       -c     Display only the count of matching  records.   Only  files  with
	      count > 0 are displayed.

       -C     tells glimpse to send its queries to glimpseserver.

       -d 'delim'
	      Define  delim  to	 be  the  separator  between two records.  The
	      default value is '$', namely a record  is	 by  default  a	 line.
	      delim  can be a string of size at most 8 (with possible use of ^
	      and $), but not a regular expression.  Text between two delim's,
	      before  the  first delim, and after the last delim is considered
	      as one record.  For  example,  -d	 '$$'  defines	paragraphs  as
	      records  and  -d	'^From '  defines  mail	 messages  as records.
	      glimpse matches each record separately.  This  option  does  not
	      currently work with regular expressions.	The -d option is espe‐
	      cially useful for Boolean AND queries, because the patterns need
	      not  appear  in the same line but in the same record.  For exam‐
	      ple, glimpse -F mail -d '^From '	'glimpse;arizona;announcement'
	      will  output all mail messages (in their entirety) that have the
	      3 patterns anywhere in the message  (or  the  header),  assuming
	      that  files with 'mail' in their name contain mail messages.  If
	      you want the scope of the record to be the whole file,  use  the
	      -W  option.  Glimpse warning: Use this option with care.	If the
	      delimiter is set	to  match  mail	 messages,  for	 example,  and
	      glimpse finds the pattern in a regular file, it may not find the
	      delimiter and will therefore output the  whole  file.   (The  -t
	      option  - see below - can be used to put the delim at the end of
	      the record.)  Performance Note: Agrep (and glimpse)  resorts  to
	      more  complex  search when the -d option is used.	 The search is
	      slower and unfortunately no more than 32 characters can be  used
	      in the pattern.

       -Dk    Set the cost of a deletion to k (k is a positive integer).  This
	      option does not currently work with regular expressions.

       -e pattern
	      Same as a simple pattern argument, but useful when  the  pattern
	      begins with a `-'.

       -E     prints  the  lines  in  the  index (as they appear in the index)
	      which match the pattern.	Used mostly for debugging and  mainte‐
	      nance  of the index.  This is not an option that a user needs to
	      know about.

       -f file_name
	      this option has a different meaning for agrep than for  glimpse:
	      In  glimpse,  only the files whose names are listed in file_name
	      are  matched.   (The  file  names	  have	 to   appear   as   in
	      .glimpse_filenames.)   In agrep, the file_name contains the list
	      of the patterns that are searched.  (Starting  at	 version  3.6,
	      this option for glimpse is much faster for large files.)

       -F file_pattern
	      limits the search to those files whose name (including the whole
	      path) matches file_pattern.  This option can be used in a	 vari‐
	      ety of applications to provide limited search even for one large
	      index.  If file_pattern matches a directory, then all files with
	      this  directory  on their path will be considered.  To limit the
	      search to actual file names, use $ at the end  of	 the  pattern.
	      file_pattern can be a regular expression and even a Boolean pat‐
	      tern.  This option is implemented by running agrep  file_pattern
	      on  the  list of file names obtained from the index.  Therefore,
	      searching the index itself takes the same amount	of  time,  but
	      limiting	the second phase of the search to only a few files can
	      speed up the search significantly.  For example,

	      glimpse -F 'src#\.c$' needle

	      will search for needle in all .c files with src somewhere	 along
	      the  path.   The	-F  file_pattern must appear before the search
	      pattern (e.g., glimpse needle -F '\.c$' will not work).	It  is
	      possible	to  use	 some  of  agrep's  options when matching file
	      names.  In this case all options as  well	 as  the  file_pattern
	      should  be  in quotes.  (-B and -v do not work very well as part
	      of a file_pattern.)  For example,

	      glimpse -F '-1 \.html' pattern

	      will allow one spelling error when matching .html	 to  the  file
	      names (so ".htm" and ".shtml" will match as well).

	      glimpse -F '-v \.c$' counter

	      will search for 'counter' in all files except for .c files.

       -g     prints  the  file number (its position in the .glimpse_filenames
	      file) rather than its name.

       -G     Output the (whole) files that contain a match.

       -h     Do not display filenames.

       -H directory_name
	      searches for the index and the other .glimpse  files  in	direc‐
	      tory_name.   The	default is the home directory.	This option is
	      useful, for example, if several different indexes are maintained
	      for  different  archives	(e.g.,	one for mail messages, one for
	      source code, one for articles).

       -i     Case-insensitive search —	 e.g.,	"A"  and  "a"  are  considered
	      equivalent.   Glimpse's  index stores all patterns in lower case
	      (see LIMITATIONS below).	Performance  Note:  When  -i  is  used
	      together	with the -w option, the search may become much faster.
	      It is recommended to have -i and -w as  defaults,	 for  example,
	      through an alias.  We use the following alias in our .cshrc file:
	      alias glwi 'glimpse -w -i'

       -Ik    Set  the	cost  of  an insertion to k (k is a positive integer).
	      This option does not currently work with regular expressions.

       -j     If the index was constructed with the -t option,	then  -j  will
	      output the files' last modification dates in addition to everything
	      else.  There are no major performance penalties for this
	      option.

       -J host_name
	      used  in	conjunction  with glimpseserver (-C) to connect to one
	      particular server.

       -k     No symbol in the pattern is treated as a	meta  character.   For
	      example,	glimpse	 -k  'a(b|c)*d'	 will  find the occurrences of
	      a(b|c)*d whereas glimpse 'a(b|c)*d' will	find  substrings  that
	      match the regular expression 'a(b|c)*d'.	(The only exception is
	      ^ at the beginning of the pattern and $ at the end of  the  pat‐
	      tern,  which  are still interpreted in the usual way.  Use \^ or
	      \$ if you need them verbatim.)

       -K port_number
	      used in conjunction with glimpseserver (-C) to  connect  to  one
	      particular server at the specified TCP port number.

       -l     Output only the names of files that contain a match.  This option
	      differs from the -N option in  that  the	files  themselves  are
	      searched, but the matching lines are not shown.

       -L x | x:y | x:y:z
	      if  one  number  is  given, it is a limit on the total number of
	      matches.	Glimpse outputs only the first x matches.   If	-l  is
	      used  (i.e.,  only  file names are sought), then the limit is on
	      the number of files; otherwise, the limit is on  the  number  of
	      records.	 If  two  numbers  are given (x:y), then y is an added
	      limit on the total number of files.  If three numbers are	 given
	      (x:y:z),	then  z is an added limit on the number of matches per
	      file.  If any of the x, y, or z is set to 0, it means to	ignore
	      it  (in other words 0 = infinity in this case);  for example, -L
	      0:10 will output all matches to the first 10 files that  contain
	      a match.  This option is particularly useful for servers that
	      need to limit the amount of output provided to clients.

       -m     used for glimpse internals.

       -M     used for glimpse internals.

       -n     Each matching record (line) is prefixed  by  its	record	(line)
	      number   in   the	  file.	  Performance  Note:  To  compute  the
	      record/line number, agrep needs to search for all record	delim‐
	      iters (or line breaks), which can slow down the search.

       -N     searches only the index (so the search is faster).  If the index
	      was built with -o or -b, the result is the number of files that have a
	      potential match plus a prompt to ask if you want to see the file
	      names.  (If -y is used, then there is no prompt and the names of
	      the files will be shown.)	 This could be a way to get the match‐
	      ing file names without even having access	 to  the  files	 them‐
	      selves.	However,  because  only	 the  index  is searched, some
	      potential matches may not be real matches.  In other words, with
	      -N  you will not miss any file but you may get extra files.  For
	      example, since the index stores  everything  in  lower  case,  a
	      case-sensitive  query  may  match	 a  file that has only a case-
	      insensitive match.  Boolean queries may match a  file  that  has
	      all  the	keywords  but  not  in the same line (indexing with -b
	      allows glimpse to figure out whether the keywords are close, but
	      it  cannot figure out from the index whether they are exactly on
	      the same line or in the  same  record  without  looking  at  the
	      file).  If the index was not built with -o or -b, then this
	      option outputs the number of blocks matching the pattern.	  This
	      is  useful  as  an  indication of how long the search will take.
	      All files are partitioned into usually 200-250 blocks.  The file
	      .glimpse_statistics  contains  the  total	 number	 of blocks (or
	      glimpse -N a will give a pretty good estimate; only blocks  with
	      no occurrences of 'a' will be missed).

       -o     the opposite of -t: the delimiter is not output at the tail, but
	      at the beginning of the matched record.

       -O     the file names are not  printed  before  every  matched  record;
	      instead, each filename is printed just once, and all the matched
	      records within it are printed after it.

       -p     (from version 4.0B1 only) Supports reading compressed sets of
	      filenames.   The	-p  option  allows  you	 to utilize compressed
	      `neighborhoods' (sets of filenames) to limit your search,	 with‐
	      out uncompressing them.  Added mostly for WebGlimpse.  The usage
	      is:
	      "-p filename:X:Y:Z" where "filename" is the file with compressed
	      neighborhoods, X is an offset into that file (usually 0, must be
	      a multiple of sizeof(int)), Y is the length glimpse must	access
	      from  that  file	(if  0, then whole file; must be a multiple of
	      sizeof(int)), and Z must be 2 (it indicates that "filename"  has
	      the  sparse-set  representation of compressed neighborhoods: the
	      other values are for internal use only). Note that any colon ":"
	      in filename must be escaped using a backslash (\).

       -P     used for glimpse internals.

       -q     prints  the  offsets  of	the  beginning and end of each matched
	      record.  The difference between -q and -b is that -b prints  the
	      offsets  of  the actual matched string, while -q prints the off‐
	      sets of the whole record where the match occurred.   The	output
	      format  is  @x{y},  where x is the beginning offset and y is the
	      end offset.

       -Q     when used together with -N glimpse not only displays  the	 file‐
	      name where the match occurs, but the exact occurrences (offsets)
	      as seen in the index.  This option is relevant only if the index
	      was  built with -b;  otherwise, the offsets are not available in
	      the index.  This option is ignored when used without -N.

       -r     This option is an	 agrep	option	and  it	 will  be  ignored  in
	      glimpse,	unless	glimpse	 is  used  with a file name at the end
	      which makes it run as agrep.  If the file name  is  a  directory
	      name,  the  -r option will search (recursively) the whole direc‐
	      tory and everything below it.  (The glimpse index	 will  not  be
	      used.)

       -R k   defines  the  maximum  size (in bytes) of a record.  The maximum
	      value (which is the default) is 48K.  Defining the maximum to be
	      lower than the default may speed up some searches.

       -s     Work  silently,  that is, display nothing except error messages.
	      This is useful for checking the error status.

       -Sk    Set the cost of a substitution to k (k is a  positive  integer).
	      This option does not currently work with regular expressions.

       -t     Similar  to  the -d option, except that the delimiter is assumed
	      to appear at the end of the record.   Glimpse  will  output  the
	      record  starting	from  the  end of delim to (and including) the
	      next delim.  (See warning for the -d option.)

       -T directory
	      Use directory as	a  place  where	 temporary  files  are	built.
	      (Glimpse	produces  some small temporary files usually in /tmp.)
	      This option is  useful  mainly  in  the  context	of  structured
	      queries  for  the Harvest project, where the temporary files may
	      be non-trivial, and the /tmp directory may not have enough space
	      for them.

       -U     (starting at version 4.0B1) Interprets an index created with the
	      -X  or  the  -U  option  in  glimpseindex.   Useful  mostly  for
	      WebGlimpse  or  similar  web applications.  When glimpse outputs
	      matches, it will display the filename, the URL,  and  the	 title
	      automatically.

       -v     (This  option  is	 an  agrep  option  and	 it will be ignored in
	      glimpse, unless glimpse is used with a  file  name  at  the  end
	      which  makes it run as agrep.)  Output all records/lines that do
	      not contain a match.  (Glimpse does not support the NOT operator
	      yet.)

       -V     prints the current version of glimpse.

       -w     Search  for  the	pattern	 as  a word — i.e., surrounded by non-
	      alphanumeric characters.	For example, glimpse -w car will match
	      car,  but	 not  characters  and not car10.  The non-alphanumeric
	      must surround the match;	they  cannot  be  counted  as  errors.
	      This option does not work with regular expressions.  Performance
	      Note: When -w is used together with the -i  option,  the	search
	      may  become  much faster.	 The -w will not work with $, ^, and _
	      (see BUGS below).	 It is	recommended  to	 have  -i  and	-w  as
	      defaults,	 for  example, through an alias.  We use the following
	      alias in our .cshrc file:
	      alias glwi 'glimpse -w -i'

       -W     The default for Boolean AND  queries  is	that  they  cover  one
	      record  (the  default  for a record is one line) at a time.  For
	      example, glimpse 'good;bad' will	output	all  lines  containing
	      both 'good' and 'bad'.  The -W option changes the scope of Bool‐
	      eans to be the whole file.  Within a file	 glimpse  will	output
	      all  matches  to any of the patterns.  So, glimpse -W 'good;bad'
	      will output all lines containing 'good' or 'bad',	 but  only  in
	      files  that  contain both patterns.  The NOT operator '~' can be
	      used only with -W.  It is described later on.  The  OR  operator
	      is  essentially unaffected (unless it is in combination with the
	      other Boolean operations).  For structured queries, the scope is
	      always the whole attribute or file.

       -x     The  pattern  must match the whole line.	(This option is trans‐
	      lated to -w when the index is searched and it is used only  when
	      the actual text is searched.  It is of limited use in glimpse.)

       -X     (from version 4.0B1 only) Output the names of files that contain
	      a match even if these files have been deleted  since  the	 index
	      was built.  Without this option glimpse will simply ignore these
	      files.

       -y     Do not prompt.  Proceed with the match as if the answer  to  any
	      prompt  is y.  Servers (or any other scripts) using glimpse will
	      probably want to use this option.

       -Y k   If the index was constructed with the -t option, then -Y k will
	      output only matches to files that were created or modified
	      within the last k days.  There are no major performance penalties
	      for this option.

       -z     Allow customizable filtering, using the file .glimpse_filters to
	      perform the programs listed there	 for  each  match.   The  best
	      example is compress/decompress.  If .glimpse_filters includes the
	      line
	      *.Z   uncompress <
	      (separated by tabs) then before indexing any file	 that  matches
	      the  pattern "*.Z" (same syntax as the one for .glimpse_exclude)
	      the command listed is executed first  (assuming  input  is  from
	      stdin, which is why uncompress needs <) and its output (assuming
	      it goes to stdout) is indexed.  The file itself is  not  changed
	      (i.e.,  it  stays	 compressed).  Then if glimpse -z is used, the
	      same program is used on these files on the fly.  Any program can
	      be  used (we run 'exec').	 For example, one can filter out parts
	      of files that should not	be  indexed.   Glimpseindex  tries  to
	      apply  all  filters  in  .glimpse_filters	 in the order they are
	      given.  For example, if you want to uncompress a file  and  then
	      extract  some part of it, put the compression command (the exam‐
	      ple above) first	and  then  another  line  that	specifies  the
	      extraction.  Note that this can slow down the search because the
	      filters need to be run before files  are	searched.   (See  also
	      glimpseindex.)

       -Z     No op.  (It's useful for glimpse's internals. Trust us.)
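
       The x:y:z limits accepted by -L can be modelled with the following
       Python sketch.  It assumes that matches arrive as (file name, record)
       pairs in output order and does not model the -l case, where x counts
       files instead of records.  The names parse_limits and apply_limits
       are hypothetical.

```python
def parse_limits(spec):
    """Parse 'x', 'x:y' or 'x:y:z'; 0 or a missing field means no limit."""
    parts = [int(p) for p in spec.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected x, x:y or x:y:z")
    parts += [0] * (3 - len(parts))
    x, y, z = (p or None for p in parts)  # 0 -> None (unlimited)
    return x, y, z


def apply_limits(matches, spec):
    """Keep matches allowed by total (x), file (y) and per-file (z) limits."""
    max_total, max_files, max_per_file = parse_limits(spec)
    kept = []
    per_file = {}  # file name -> number of matches kept so far
    for name, record in matches:
        if max_total is not None and len(kept) >= max_total:
            break  # overall limit reached
        if name not in per_file:
            if max_files is not None and len(per_file) >= max_files:
                continue  # too many distinct files already
            per_file[name] = 0
        if max_per_file is not None and per_file[name] >= max_per_file:
            continue  # this file has reached its own limit
        per_file[name] += 1
        kept.append((name, record))
    return kept


# Example input: (file name, record number) pairs in output order.
MATCHES = [("a.txt", 1), ("a.txt", 2), ("b.txt", 1), ("c.txt", 1), ("c.txt", 2)]
```

       For instance, apply_limits(MATCHES, "0:2") keeps every match from the
       first two files, which mirrors the -L 0:10 example given for the
       option.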

       The characters `$', `^', `*', `[', `]', `|', `(', `)', `!', and
       `\' can cause unexpected results when included in the pattern, as these
       characters  are also meaningful to the shell.  To avoid these problems,
       enclose the entire pattern in single quotes, i.e., 'pattern'.   Do  not
       use double quotes (").
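
       When glimpse is driven from a script rather than typed at a prompt,
       the quoting can be generated instead of written by hand.  The Python
       sketch below only builds the command; it does not run glimpse.  It
       uses the standard shlex.quote function, which wraps a string in
       single quotes when it contains shell meta characters.  The helper
       name glimpse_command is hypothetical.

```python
import shlex


def glimpse_command(pattern, *options):
    """Return (argv, shell_line) for a glimpse call with a quoted pattern.

    Options are placed before the pattern, as the page requires for -F.
    """
    argv = ["glimpse", *options, pattern]
    # shlex.quote single-quotes any argument the shell could misinterpret.
    shell_line = " ".join(shlex.quote(arg) for arg in argv)
    return argv, shell_line


ARGV, SHELL_LINE = glimpse_command("a(b|c)*d", "-k")
```

       Passing the argv list directly to a process-spawning call that does
       not go through a shell avoids the quoting problem altogether; the
       shell_line form is for cases where a single command string is needed.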

PATTERNS
       glimpse supports a large variety of patterns, including simple strings,
       strings with classes of characters, sets of strings,  wild  cards,  and
       regular expressions (see LIMITATIONS).

       Strings
	      Strings  are  any	 sequence of characters, including the special
	      symbols `^' for beginning of line and `$' for end of line.   The
	      following special characters (`$', `^', `*', `[', `|', `(',
	      `)', `!', and `\') as well as the following meta characters
	      special to glimpse (and agrep): `;', `,', `#', `<', `>',
	      `-', and `.', should be preceded	by  `\'	 if  they  are	to  be
	      matched as regular characters.  For example, \^abc\\ corresponds
	      to the string ^abc\, whereas ^abc corresponds to the string  abc
	      at the beginning of a line.

       Classes of characters
	      a	 list  of  characters  inside [] (in order) corresponds to any
	      character from the list.	For example, [a-ho-z] is any character
	      between  a  and  h or between o and z.  The symbol `^' inside []
	      complements the list.  For example, [^i-n] denotes any character
	      except the characters 'i' to 'n'.  The symbol
	      `^' thus has two meanings, but this is  consistent  with	egrep.
	      The  symbol  `.'	(don't care) stands for any symbol (except for
	      the newline symbol).

       Boolean operations
	      Glimpse supports an `AND' operation denoted by the symbol `;', an
	      `OR' operation denoted by the symbol `,', a limited version of a
	      'NOT' operation (starting at version 4.0B1) denoted by the  sym‐
	      bol   `~',   or	any   combination.    For   example,   glimpse
	      'pizza;cheeseburger' will output all lines containing both  pat‐
	      terns.   glimpse	-F 'gnu;\.c$' 'define;DEFAULT' will output all
	      lines containing both 'define' and 'DEFAULT'  (anywhere  in  the
	      line,  not  necessarily  in  order) in files whose name contains
	      'gnu' and ends with .c.  glimpse	'{political,computer};science'
	      will  match  'political science' or 'science of computers'.  The
	      NOT operation works only together with the -W option, and it
	      generally applies only to the whole file rather than to individual
	      records.  Its output may sometimes seem counterintuitive.  Use
	      with care.  glimpse -W 'fame;~glory' will output all lines
	      containing 'fame' in all files that contain 'fame' but do not
	      contain 'glory'.  This is the most common use of NOT, and in this
	      case it works as expected.  glimpse -W '~{fame;glory}'  will  be
	      limited to files that do not contain both words, and will output
	      all lines containing one of them.

       Wild cards
	      The symbol '#' is used  to  denote  a  sequence  of  any	number
	      (including  0)  of  arbitrary characters (see LIMITATIONS).  The
	      symbol # is equivalent to .* in egrep.  In fact,	.*  will  work
	      too,  because  it is a valid regular expression (see below), but
	      unless this is part of an actual regular expression, # will work
	      faster.	(Currently  glimpse is experiencing some problems with
	      #.)

       Combination of exact and approximate matching
	      Any pattern inside angle brackets <> must match the text exactly
	      even  if	the  match is with errors.  For example, <mathemat>ics
	      matches mathematical with one error (replacing the last  s  with
	      an  a),  but mathe<matics> does not match mathematical no matter
	      how many errors are allowed.   (This  option  is	buggy  at  the
	      moment.)

       Regular expressions
	      Since  the  index is word based, a regular expression must match
	      words that appear in the index for glimpse to find it.   Glimpse
	      first strips all non-alphabetic characters from the regular
	      expression, and searches the index for all remaining words.  It
	      then  applies  the  regular expression matching algorithm to the
	      files found in the index.	 For example, glimpse 'abc.*xyz'  will
	      search  the  index  for  all  files  that contain both 'abc' and
	      'xyz', and then search directly for 'abc.*xyz' in	 those	files.
	      (If  you	use  glimpse  -w 'abc.*xyz', then 'abcxyz' will not be
	      found, because glimpse will think that abc and xyz need to
	      match whole words.)  The syntax of regular expressions in
	      glimpse is in general the same as that  for  agrep.   The	 union
	      operation	 `|',  Kleene  closure `*', and parentheses () are all
	      supported.  Currently '+' is not supported.  Regular expressions
	      are  currently limited to approximately 30 characters (generally
	      excluding meta characters).  Some options (-d, -w, -t,  -x,  -D,
	      -I,  -S)	do  not	 currently work with regular expressions.  The
	      maximal number of errors for regular expressions that use '*' or
	      '|' is 4. (See LIMITATIONS.)

       Structured queries
	      Glimpse supports some form of structured queries using Harvest's
	      SOIF format.  See STRUCTURED QUERIES below for details.
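
       The way a regular expression is first reduced to index words, as
       described under "Regular expressions" above, can be sketched in
       Python.  The sketch makes several assumptions: the index is a toy
       dictionary of whole lower-cased words, every extracted word is
       required (as in the 'abc.*xyz' example; patterns using `|' would need
       different handling), and Python's re module stands in for agrep's
       matcher.  All names are hypothetical.

```python
import re


def words_for_index(regex):
    """Strip non-alphabetic characters and return the remaining words."""
    return [w.lower() for w in re.findall(r"[A-Za-z]+", regex)]


def regex_search(files, regex):
    """Find lines matching regex, scanning only files the index allows."""
    # Toy word index: lower-cased word -> set of file names.
    index = {}
    for name, text in files.items():
        for word in re.findall(r"[A-Za-z]+", text.lower()):
            index.setdefault(word, set()).add(name)
    # Index phase: keep files that contain every word from the regex.
    candidates = set(files)
    for word in words_for_index(regex):
        candidates &= index.get(word, set())
    # Text phase: run the real regular expression on the candidates only.
    compiled = re.compile(regex)
    return [(name, line)
            for name in sorted(candidates)
            for line in files[name].splitlines()
            if compiled.search(line)]


DOCS = {
    "one.txt": "abc and then xyz",
    "two.txt": "xyz comes before abc",
    "three.txt": "abc only",
}
```

       Here three.txt is never scanned because the index shows it lacks
       'xyz', while two.txt passes the index phase but is rejected by the
       regular expression itself.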

EXAMPLES
       (Run "glimpse '^glimpse' this-file" to get a list of all examples, some
       of which were given earlier.)

       glimpse -F 'haystack.h$' needle
	      finds all needles in all haystack.h files.

       glimpse -2 -F html Anestesiology
	      outputs  all  occurrences	 of  Anestesiology  with two errors in
	      files with html somewhere in their full name.

       glimpse -l -F '\.c$' variablename
	      lists the names of all .c files that contain  variablename  (the
	      -l option lists file names rather than outputting the matched
	      lines).

       glimpse -F 'mail;1993' 'windsurfing;Arizona'
	      finds all lines containing windsurfing and Arizona in all	 files
	      having `mail' and '1993' somewhere in their full name.

       glimpse -F mail 't.j@#uk'
	      finds  all mail addresses (search only files with mail somewhere
	      in their name) from the uk, where the login name ends with  t.j,
	      where  the . stands for any one character.  (This is very useful
	      to find a login name of someone  whose  middle  name  you	 don't
	      know.)

       glimpse -F mbox -h -G  . > MBOX
	      concatenates  all	 files	whose name matches `mbox' into one big
	      one.
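
       To check what a wild-card pattern such as 't.j@#uk' would accept, the
       `#' symbol can be rewritten as `.*' and tried with an ordinary regular
       expression engine.  The Python sketch below does only that; it does
       not translate glimpse's other meta characters (`;', `,', `<', `>'),
       and the function name is hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Replace each unescaped '#' with '.*', leaving '\\#' untouched.

    Limitation: an escaped backslash directly before '#' is not handled.
    """
    return re.sub(r"(?<!\\)#", ".*", pattern)


MAIL_PATTERN = wildcard_to_regex("t.j@#uk")
```

       With this translation, an address such as scott_j@cs.ucl.ac.uk is
       accepted (the `.' stands for the underscore), while tj@cs.ucl.ac.uk is
       not, because one character is required between t and j.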

SEARCHING IN COMPRESSED FILES
       Glimpse includes an optional  new  compression  program,	 called	 cast,
       which allows glimpse (and agrep) to search the compressed files without
       having to decompress them.  The search is actually significantly faster
       when the files are compressed.  However, we have not tested cast as
       thoroughly as we would have liked, and a mishap in a compression
       algorithm can cause loss of data, so for now we recommend using cast
       very carefully.  We do not support or maintain cast.  (Unless you
       specifically use cast, the default is to ignore it.)

GLIMPSEINDEX FILES
       All files used by glimpse are located in the directory or directories
       where the index or indexes are stored, and have .glimpse_ as a prefix.
       The first two files (.glimpse_exclude and .glimpse_include) are
       optionally supplied by the user.  The other files are built and read
       by glimpse.

       .glimpse_exclude
	      contains a list of files that glimpseindex is explicitly told to
	      ignore.	In  general, the syntax of .glimpse_exclude/include is
	      the same as that of agrep (or any other grep).  The lines in the
	      .glimpse_exclude file are matched to the file names, and if they
	      match, the files are excluded.  Notice that agrep matches parts
	      of the string!  For example, agrep /ftp/pub will match
	      /home/ftp/pub and /ftp/pub/whatever.  So, if you want to exclude
	      /ftp/pub/core,  you just list it, as is, in the .glimpse_exclude
	      file.  If you  put  "/home/ftp/pub/cdrom"	 in  .glimpse_exclude,
	      every file name that matches that string will be excluded, mean‐
	      ing all files below it.  You can use ^ to indicate the beginning
	      of  a  file  name, and $ to indicate the end of one, and you can
	      use * and ? in the  usual	 way.	For  example  /ftp/*html  will
	      exclude	  /ftp/pub/foo.html,	but    will    also    exclude
	      /home/ftp/pub/html/whatever; if you want to exclude files that
	      start with /ftp and end with html, use ^/ftp*html$.  Notice that
	      putting a * at the beginning or at  the  end  is	redundant  (in
	      fact,  in	 this case glimpseindex will remove the * when it does
	      the  indexing).	No  other  meta	 characters  are  allowed   in
	      .glimpse_exclude	(e.g.,	don't use .* or # or |).  Lines with *
	      or ? must	 have  no  more	 than  30  characters.	 Notice	 that,
	      although	the index itself will not be indexed, the list of file
	      names (.glimpse_filenames) will be indexed unless it is  explic‐
	      itly listed in .glimpse_exclude.

       .glimpse_filters
	      See the description above for the -z option.

       .glimpse_include
	      contains a list of files that glimpseindex is explicitly told to
	      include in the index even though they  may  look	like  non-text
	      files.  Symbolic links are followed by glimpseindex only if they
	      are  specifically	 included  here.   If  a  file	is   in	  both
	      .glimpse_exclude and .glimpse_include it will be excluded.

       .glimpse_filenames
	      contains the list of all indexed file names, one per line.  This
	      is an ASCII file that can also be searched with agrep to look
	      up a file name, which gives a fast find command.  For example,
	      glimpse 'count#\.c$' ~/.glimpse_filenames
	      will  output  the	 names	of  all	 (indexed)  .c files that have
	      'count' in their name (including anywhere on the path  from  the
	      index).	Setting	 the following alias in the .login file may be
	      useful:
	      alias findfile 'glimpse -h :1 ~/.glimpse_filenames'

       .glimpse_index
	      contains the index.  The index consists of lines, each  starting
	      with  a  word followed by a list of block numbers (unless the -o
	      or -b options are used, in which case each word is  followed  by
	      an  offset  into the file .glimpse_partitions where all pointers
	      are kept).  The block/file numbers are stored in binary form, so
	      this is not an ASCII file.

       .glimpse_messages
	      contains the output of the -w option (see above).

       .glimpse_partitions
	      contains	the  partition	of  the indexed space into blocks and,
	      when the index is built with the -o or -b options, some part  of
	      the  index.  This file is used internally by glimpse and it is a
	      non-ASCII file.

       .glimpse_statistics
	      contains some statistics about the makeup of the index.	Useful
	      for some advanced applications and customization of glimpse.

       .glimpse_turbo
	      An  added data structure (used under glimpseindex -o or -b only)
	      that helps to speed up queries significantly for large  indexes.
	      Its size is 0.25MB.  Glimpse will work without it if needed.
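
       The substring matching rules described under .glimpse_exclude above
       can be sketched in Python.  This is only an approximation for
       illustration: the function names exclude_line_to_regex and is_excluded
       are hypothetical, Python's re module stands in for the real matcher,
       and only the meta characters ^, $, * and ? named above are handled.

```python
import re

def exclude_line_to_regex(line):
    """Translate one .glimpse_exclude line into a Python regex (approximation)."""
    parts = []
    for i, ch in enumerate(line):
        if ch == "^" and i == 0:
            parts.append("^")            # beginning of the file name
        elif ch == "$" and i == len(line) - 1:
            parts.append("$")            # end of the file name
        elif ch == "*":
            parts.append(".*")           # any run of characters
        elif ch == "?":
            parts.append(".")            # any single character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts))

def is_excluded(path, exclude_lines):
    """True if any exclude line matches some part of the path."""
    # search(), not fullmatch(): an unanchored line matches anywhere.
    return any(exclude_line_to_regex(line).search(path)
               for line in exclude_lines)
```

       With these definitions, the line /ftp/*html excludes both
       /ftp/pub/foo.html and /home/ftp/pub/html/whatever, while the anchored
       line ^/ftp*html$ excludes only the first of the two.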

STRUCTURED QUERIES
       Glimpse	can search for Boolean combinations of "attribute=value" terms
       by using the Harvest SOIF parser library (in glimpse/libtemplate).   To
       search  this  way,  the	index  must  be made by using the -s option of
       glimpseindex (this can be used in conjunction with  other  glimpseindex
       options). For glimpse and glimpseindex to recognize "structured" files,
       they must be in SOIF format. In this format, each value is prefixed  by
       an attribute-name with the size of the value (in bytes) present in "{}"
       after the name of the attribute.  For example, the following lines are
       part of an SOIF file:
       type{17}:       Directory-Listing
       md5{32}:	       3858c73d68616df0ed58a44d306b12ba
       Any string can serve as an attribute name.  The query glimpse
       "pattern;type=Directory-Listing" will search for "pattern" only in
       files whose type is "Directory-Listing".  The file itself is considered
       to be one "object" and its name/url appears as the first attribute with
       an "@" prefix; e.g., @FILE { http://xxx... }.  The scope of Boolean
       operations changes from records (lines) to whole files when structured
       queries	are  used in glimpse (since individual query terms can look at
       different attributes and they may not be "covered" by the record/line).
       Note  that  glimpse  can only search for patterns in the value parts of
       the SOIF file: there are some attributes (like the TTL, MD5, etc.) that
       are interpreted by Harvest's internal routines.  See
       http://harvest.cs.colorado.edu/harvest/user-manual/ for more detailed
       information on the SOIF format.
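
       To make the role of the byte count in "{}" concrete, here is a minimal
       Python sketch that reads attribute lines shaped like the two sample
       lines above.  It is not Harvest's parser: the function name
       parse_soif_attributes is hypothetical, and the sketch assumes that a
       colon and optional blanks separate the size from the value.

```python
import re

# attribute-name{size}: followed by optional spaces or tabs
_ATTR = re.compile(rb"([^\s{}]+)\{(\d+)\}:[ \t]*")

def parse_soif_attributes(data):
    """Return {attribute: value} for the attribute lines found in data (bytes)."""
    attrs = {}
    pos = 0
    while pos < len(data):
        m = _ATTR.match(data, pos)
        if m is None:
            # Not an attribute line (e.g. "@FILE { ..." or "}"): skip it.
            nl = data.find(b"\n", pos)
            if nl == -1:
                break
            pos = nl + 1
            continue
        size = int(m.group(2))
        start = m.end()
        # The size, not the end of the line, delimits the value, so a value
        # may contain newlines.
        value = data[start:start + size]
        attrs[m.group(1).decode()] = value.decode("utf-8", errors="replace")
        pos = start + size
    return attrs
```

       Applied to the two sample lines, this returns a dictionary mapping
       type to Directory-Listing and md5 to the 32-character digest.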

REFERENCES
       1.     U.  Manber  and S. Wu, "GLIMPSE: A Tool to Search Through Entire
	      File Systems," Usenix Winter  1994  Technical  Conference	 (best
	      paper  award),  San  Francisco (January 1994), pp. 23-32.	 Also,
	      Technical Report #TR 93-34, Dept. of Computer Science,
	      University of Arizona, October 1993 (a postscript file is
	      available by anonymous ftp at
	      ftp://ftp.cs.arizona.edu/reports/1993/TR93-34.ps).

       2.     S. Wu and U. Manber, "Fast Text Searching Allowing Errors," Com‐
	      munications of the ACM 35 (October 1992), pp. 83-91.

SEE ALSO
       agrep(1), ed(1),	 ex(1),	 glimpseindex(1),  glimpseserver(1),  grep(1),
       sh(1), csh(1).

LIMITATIONS
       The  index of glimpse is word based.  A pattern that contains more than
       one word cannot be found in the index.  The way glimpse overcomes  this
       weakness	 is  by splitting any multi-word pattern into its set of words
       and looking for all of them in the index.  For example, glimpse 'linear
       programming'  will first consult the index to find all files containing
       both linear and programming, and then apply agrep to find the  combined
       pattern.	 This is usually an effective solution, but it can be slow for
       cases where both words are very common, but their combination is not.
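
       The two-step strategy just described can be sketched in Python.  This
       is a toy model under stated assumptions, not glimpse's code: the index
       here maps words to whole files rather than to blocks, files are held
       in a dictionary of name to text, and the names build_index,
       candidate_files and phrase_search are hypothetical.

```python
import re

_WORD = re.compile(r"[A-Za-z0-9]+")

def build_index(files):
    """Map each lower-cased word to the set of file names containing it."""
    index = {}
    for name, text in files.items():
        for word in _WORD.findall(text.lower()):
            index.setdefault(word, set()).add(name)
    return index

def candidate_files(pattern, index):
    """Step 1: files that contain every word of the pattern, in any order."""
    words = _WORD.findall(pattern.lower())
    if not words:
        return set()
    return set.intersection(*[index.get(w, set()) for w in words])

def phrase_search(pattern, files, index):
    """Step 2: scan only the candidates for the combined pattern."""
    return sorted(name for name in candidate_files(pattern, index)
                  if pattern in files[name])
```

       The cost noted above shows up in step 2: when both words are common
       but rarely adjacent, candidate_files returns many files and
       phrase_search has to scan all of them to find few matches.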

       As was mentioned in the section	on  PATTERNS  above,  some  characters
       serve  as meta characters for glimpse and need to be preceded by '\' to
       search for them.	 The most  common  examples  are  the  characters  '.'
       (which  stands  for  a  wild  card), and '*' (the Kleene closure).  So,
       "glimpse ab.de" will match abcde, but "glimpse ab\.de"  will  not,  and
       "glimpse	 ab*de"	 will not match ab*de, but "glimpse ab\*de" will.  The
       meta character - is translated automatically to a hyphen unless it
       appears between [] (in which case it denotes a range of characters).
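
       The four cases above can be checked with Python's re module, which
       happens to treat '.' and '*' the same way.  This is only an analogy:
       re is not agrep's matching engine, other glimpse meta characters
       behave differently, and the helper name matches is hypothetical.

```python
import re

def matches(pattern, line):
    """True if the pattern matches somewhere in the line."""
    return re.search(pattern, line) is not None

cases = [
    (r"ab.de",  "abcde"),   # '.' is a wild card
    (r"ab\.de", "abcde"),   # escaped '.' is a literal period
    (r"ab*de",  "ab*de"),   # '*' is a closure, not a literal star
    (r"ab\*de", "ab*de"),   # escaped '*' is a literal star
]
for pattern, line in cases:
    verdict = "matches" if matches(pattern, line) else "does not match"
    print(pattern, verdict, line)
```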

       The  index  of glimpse stores all patterns in lower case.  When glimpse
       searches the index it first converts all patterns to lower case,	 finds
       the  appropriate	 files,	 and  then searches the actual files using the
       original patterns.  So, for example, glimpse ABCXYZ will first find all
       files  containing  abcxyz  in any combination of lower and upper cases,
       and then search these files directly, so only the right cases will be
       found.  One problem with this approach is discovering misspellings that
       are caused by wrong cases.  For example, glimpse -B abcXYZ  will	 first
       search  the  index for the best match to abcxyz (because the pattern is
       converted to lower case); it will find that there are matches  with  no
       errors,	and  will go to those files to search them directly, this time
       with the original upper cases.  If the closest match is, say, AbcXYZ,
       glimpse may miss it, because it doesn't expect an error.	 Another prob‐
       lem is speed.  If you search for "ATT", it will look at the  index  for
       "att".	Unless you use -w to match the whole word, glimpse may have to
       search all files containing, for example, "Seattle" which has "att"  in
       it.
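
       The speed problem in the "ATT" example can be illustrated with a small
       Python model.  It rests on assumptions made for illustration only: the
       index maps lower-cased words to whole files, an index word counts as a
       hit when it contains the lower-cased pattern, and the names
       build_index, index_candidates and search are hypothetical.

```python
import re

def build_index(files):
    """Map each lower-cased word to the set of file names containing it."""
    index = {}
    for name, text in files.items():
        for word in re.findall(r"[A-Za-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(name)
    return index

def index_candidates(pattern, index, whole_word=False):
    """Files the index cannot rule out for this pattern."""
    key = pattern.lower()            # the index only knows lower case
    hits = set()
    for word, names in index.items():
        if word == key or (not whole_word and key in word):
            hits |= names
    return hits

def search(pattern, files, index, whole_word=False):
    """Scan the candidates with the original, case-sensitive pattern."""
    rx = re.escape(pattern)
    if whole_word:
        rx = r"\b" + rx + r"\b"
    return sorted(name
                  for name in index_candidates(pattern, index, whole_word)
                  if re.search(rx, files[name]))
```

       In this model a file that mentions only Seattle is still a candidate
       for ATT and must be scanned, although it can never match; passing
       whole_word=True removes it from the candidate set.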

       There  is  no size limit for simple patterns and simple patterns within
       Boolean	expressions.   More  complicated  patterns,  such  as  regular
       expressions,  are  currently  limited  to  approximately 30 characters.
       Lines are limited to 1024 characters.  Records are limited to 48K,  and
       may  be	truncated  if  they are larger than that.  The limit of record
       length can be changed by modifying the parameter Max_record in agrep.h.

       Glimpseindex does not index words longer than 64 characters.

BUGS
       In some rare cases, regular expressions using * or # may not match cor‐
       rectly.

       A  query	 that  contains	 no alphanumeric characters is not recommended
       (unless glimpse is used as agrep and  the  file	names  are  provided).
       This is an understatement.

       The  notion  of "match to the whole word" (the -w option) can be tricky
       sometimes.  For example, glimpse	 -w  'word$'  will  not	 match	'word'
       appearing at the end of a line, because the extra '$' makes the pattern
       more than just one simple word.	The same thing can happen with	^  and
       with  _.	  To be on the safe side, use the -w option only when the pat‐
       terns are actual words.

       Please send bug reports or comments to glimpse@cs.arizona.edu.

DIAGNOSTICS
       Exit status is 0 if any matches are found, 1  if	 none,	2  for	syntax
       errors or inaccessible files.
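
       A calling script can branch on these three values.  The Python sketch
       below assumes that a glimpse executable is on the PATH and that an
       index already exists; the function names describe_exit_status and
       run_glimpse are hypothetical.

```python
import subprocess

def describe_exit_status(code):
    """Map the exit status documented above to a short label."""
    if code == 0:
        return "match"
    if code == 1:
        return "no match"
    if code == 2:
        return "error"      # syntax error or inaccessible files
    return "unknown"        # anything else is not documented here

def run_glimpse(args):
    """Run glimpse with a list of arguments; return (label, stdout)."""
    proc = subprocess.run(["glimpse", *args],
                          capture_output=True, text=True)
    return describe_exit_status(proc.returncode), proc.stdout
```

       For example, run_glimpse(["-l", "needle"]) returns the label together
       with whatever file names glimpse printed.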

AUTHORS
       Udi  Manber and Burra Gopal, Department of Computer Science, University
       of Arizona, and Sun Wu, the National  Chung-Cheng  University,  Taiwan.
       (Email:	glimpse@cs.arizona.edu)

			       November 10, 1997		    GLIMPSE(l)