KinoSearch::Analysis::Stopalizer(3)   User Contributed Perl Documentation   KinoSearch::Analysis::Stopalizer(3)
NAME
KinoSearch::Analysis::Stopalizer - Suppress a "stoplist" of common
words.
SYNOPSIS
my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
    language => 'fr',
);
my $polyanalyzer = KinoSearch::Analysis::PolyAnalyzer->new(
    analyzers => [ $case_folder, $tokenizer, $stopalizer, $stemmer ],
);
This class uses Lingua::StopWords for its default stoplists, so it
supports the same set of languages.
DESCRIPTION
A "stoplist" is a collection of "stopwords": words which are common
enough to be of little value when determining search results. For
example, so many documents in English contain "the", "if", and "maybe"
that it may improve both performance and relevance to block them.
Before filtering stopwords:
( "i", "am", "the", "walrus" )
After filtering stopwords:
( "walrus" )
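The filtering step above can be sketched in plain Perl. This is only an illustration of the idea, not Stopalizer's actual implementation: the three-word stoplist and the filter_stopwords helper are hypothetical names made up for the example.

```perl
use strict;
use warnings;

# Hypothetical miniature stoplist: stopwords are the keys, values are unused.
my %stoplist = map { $_ => 1 } qw( i am the );

# Hypothetical helper: keep only tokens that are not keys of the stoplist.
sub filter_stopwords {
    my ( $stoplist, @tokens ) = @_;
    return grep { !exists $stoplist->{$_} } @tokens;
}

my @before = ( "i", "am", "the", "walrus" );
my @after  = filter_stopwords( \%stoplist, @before );

# Print the surviving tokens in the same notation used above.
my @quoted = map { sprintf '"%s"', $_ } @after;
print "( ", join( ", ", @quoted ), " )\n";    # prints ( "walrus" )
```

Using a hash keeps each membership check cheap, which is presumably why the constructor's stoplist parameter takes a hash rather than a list.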
Stopalizer provides default stoplists for several languages, courtesy
of the Snowball project (<http://snowball.tartarus.org>), or you may
supply your own.
CONSTRUCTORS
new( [labeled params] )
my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
    language => 'de',
);
# or...
my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
    stoplist => \%stoplist,
);
· stoplist - A reference to a hash with stopwords as the keys.
· language - The ISO code for a supported language, such as 'fr' or
  'de'.
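A custom stoplist hash might be assembled as in the sketch below. The word list is hypothetical, and lowercasing the keys is an assumption based on the SYNOPSIS, where a case folder runs before the Stopalizer. The constructor call is the one documented above; it is guarded so the sketch still runs where KinoSearch is not installed.

```perl
use strict;
use warnings;

# Hypothetical domain-specific stopwords, in mixed case on purpose.
my @words = qw( The Of And Widget );

# Stopwords become the keys; the values are placeholders.
# Keys are lowercased on the assumption that tokens are case-folded
# before they reach the Stopalizer.
my %stoplist = map { ( lc($_) => 1 ) } @words;

# Pass a reference to the hash, as in the constructor example above.
# The eval guard only skips this step when the module is unavailable.
if ( eval { require KinoSearch::Analysis::Stopalizer; 1 } ) {
    my $stopalizer = KinoSearch::Analysis::Stopalizer->new(
        stoplist => \%stoplist,
    );
}
```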
INHERITANCE
KinoSearch::Analysis::Stopalizer isa KinoSearch::Analysis::Analyzer isa
KinoSearch::Object::Obj.
COPYRIGHT AND LICENSE
Copyright 2005-2010 Marvin Humphrey
This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
perl v5.14.1   2011-06-2   KinoSearch::Analysis::Stopalizer(3)