Bio::DB::Flat::BinarySearch man page on Fedora


Bio::DB::Flat::BinarySearch(3)  User Contributed Perl Documentation  Bio::DB::Flat::BinarySearch(3)

NAME
       Bio::DB::Flat::BinarySearch - Binary search indexing system for
       sequence files

SYNOPSIS
	 TODO: SYNOPSIS NEEDED!

DESCRIPTION
       This module can be used both to index sequence files and to retrieve
       sequences from files that have already been indexed.

       This object allows indexing of sequence files both by a primary key
       (say accession) and by multiple secondary keys (say ids).  This is
       different from Bio::Index::Abstract (see Bio::Index::Abstract), which
       uses DBM files as storage.  This module uses a binary search to
       retrieve sequences, which is more efficient for large datasets.
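
       To make the binary search idea concrete, here is a minimal sketch in
       core Perl.  It is not this module's on-disk format or code: the
       helpers pack_records and lookup, the tab-separated layout and the
       "file:offset" values are all assumptions made for illustration.  Keys
       are sorted and padded to a fixed record width, so record N always
       starts at byte N * width and can be found without scanning.

```perl
use strict;
use warnings;

# Hypothetical helper: pack "key<TAB>value" pairs, sorted by key, into
# fixed-width space-padded records (assumes every pair fits in $width).
sub pack_records {
    my ($width, %pairs) = @_;
    my $packed = '';
    $packed .= sprintf('%-*s', $width, "$_\t$pairs{$_}") for sort keys %pairs;
    return $packed;
}

# Hypothetical helper: binary search over the packed string.
# Returns the value stored for $key, or undef if the key is absent.
sub lookup {
    my ($index, $width, $key) = @_;
    my ($lo, $hi) = (0, length($index) / $width - 1);
    while ($lo <= $hi) {
        my $mid    = int(($lo + $hi) / 2);
        my $record = substr($index, $mid * $width, $width);
        $record =~ s/\s+$//;                      # strip the padding
        my ($k, $v) = split /\t/, $record, 2;
        my $cmp = $key cmp $k;
        return $v if $cmp == 0;
        if ($cmp < 0) { $hi = $mid - 1 } else { $lo = $mid + 1 }
    }
    return;
}

my $index = pack_records(32,
    P41932 => 'file1:0',
    P02768 => 'file1:812',
    Q9Y6K9 => 'file2:44',
);
print lookup($index, 32, 'P02768'), "\n";   # file1:812
```

       Each lookup touches about log2(N) records, which is why this approach
       suits large datasets.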

   Index creation
	   my $sequencefile;  # Some fasta sequence file

       Patterns have to be entered to define where the keys to be indexed are
       found and where each record starts.  E.g. for fasta

	   my $start_pattern   = '^>';
	   my $primary_pattern = '^>(\S+)';

       So the start of a record is a line starting with a > and the primary
       key is all characters up to the first whitespace after the >.
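
       The effect of these two patterns can be checked without building an
       index.  The sketch below is plain Perl and only illustrates how the
       regular expressions behave; the helper primary_keys and the sample
       FASTA text are hypothetical and are not part of this module.

```perl
use strict;
use warnings;

my $start_pattern   = '^>';
my $primary_pattern = '^>(\S+)';

# Hypothetical helper: return the primary key of every record found in a
# block of FASTA text, applying the two patterns line by line.
sub primary_keys {
    my ($text) = @_;
    my @keys;
    for my $line (split /\n/, $text) {
        next unless $line =~ /$start_pattern/;     # a new record starts here
        push @keys, $1 if $line =~ /$primary_pattern/;
    }
    return @keys;
}

my $fasta = ">HBA_HUMAN hemoglobin alpha\nMVLSPADKTN\n"
          . ">HBB_HUMAN hemoglobin beta\nMVHLTPEEKS\n";
print join(',', primary_keys($fasta)), "\n";   # HBA_HUMAN,HBB_HUMAN
```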

       A string also has to be entered to define what the primary key
       (primary_namespace) is called.

       The index can now be created using

           my $index = Bio::DB::Flat::BinarySearch->new(
               -directory         => "/home/max/",
               -dbname            => "mydb",
               -write_flag        => 1,    # allow building the index
               -start_pattern     => $start_pattern,
               -primary_pattern   => $primary_pattern,
               -primary_namespace => "ID",
               -format            => "fasta" );

	   my @files = ("file1","file2","file3");

	   $index->build_index(@files);

       The index is now ready to use.  For large sequence files the Perl way
       of indexing takes a *long* time and a *huge* amount of memory.  For
       indexing things like dbEST I recommend using the DB_File indexer, BDB.

       The formats currently supported by this module are fasta, Swissprot,
       and EMBL.

   Creating indices with secondary keys
       Sometimes just indexing files with one id per entry is not enough.  For
       instance you may want to retrieve sequences from swissprot using their
       accessions as well as their ids.

       To be able to do this when creating your index you need to pass in a
       hash of secondary_patterns which have their namespaces as the keys to
       the hash.

       For example, to index a record such as

       ID   1433_CAEEL     STANDARD;      PRT;   248 AA.
       AC   P41932;
       DT   01-NOV-1995 (Rel. 32, Created)
       DT   01-NOV-1995 (Rel. 32, Last sequence update)
       DT   15-DEC-1998 (Rel. 37, Last annotation update)
       DE   14-3-3-LIKE PROTEIN 1.
       GN   FTT-1 OR M117.2.
       OS   Caenorhabditis elegans.
       OC   Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea;
       OC   Rhabditidae; Peloderinae; Caenorhabditis.
       OX   NCBI_TaxID=6239;
       RN   [1]

       where we want to index the accession (P41932) as the primary key and
       the id (1433_CAEEL) as the secondary id.  The index is created as
       follows

           my %secondary_patterns;

           my $start_pattern   = '^ID   (\S+)';
           my $primary_pattern = '^AC   (\S+)\;';

           $secondary_patterns{"ID"} = '^ID   (\S+)';

           my $index = Bio::DB::Flat::BinarySearch->new(
               -directory          => $index_directory,
               -dbname             => "ppp",
               -write_flag         => 1,
               -verbose            => 1,
               -start_pattern      => $start_pattern,
               -primary_pattern    => $primary_pattern,
               -primary_namespace  => 'AC',
               -secondary_patterns => \%secondary_patterns);

	   $index->build_index($seqfile);

       Of course having secondary indices makes indexing slower and use more
       memory.
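
       As a quick check of the Swissprot patterns, the following sketch
       applies them to a shortened copy of the example record.  It is plain
       Perl and does not call this module; the helper keys_for_record and the
       hash key "primary" are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

my $primary_pattern    = '^AC   (\S+)\;';
my %secondary_patterns = (ID => '^ID   (\S+)');

# Hypothetical helper: collect the primary key and one key per secondary
# namespace from the text of a single record (first match wins).
sub keys_for_record {
    my ($record) = @_;
    my %found;
    for my $line (split /\n/, $record) {
        $found{primary} //= $1 if $line =~ /$primary_pattern/;
        for my $ns (sort keys %secondary_patterns) {
            $found{$ns} //= $1 if $line =~ /$secondary_patterns{$ns}/;
        }
    }
    return \%found;
}

my $record = <<'END';
ID   1433_CAEEL     STANDARD;      PRT;   248 AA.
AC   P41932;
DE   14-3-3-LIKE PROTEIN 1.
END

my $keys = keys_for_record($record);
print "$keys->{primary} $keys->{ID}\n";   # P41932 1433_CAEEL
```

       Note that the patterns expect exactly three spaces after the two
       letter line code, as in the record itself.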

   Index reading
       To fetch sequences using an existing index, first create the index
       object

	   my $index = Bio::DB::Flat::BinarySearch->new(
			 -directory => $index_directory);

       Now you can happily fetch sequences either by the primary key or by the
       secondary keys.

	   my $entry = $index->get_entry_by_id('HBA_HUMAN');

       This returns just a string containing the whole entry.  This is useful
       if you just want to print the sequence to screen or write it to a file.

       Other ways of getting sequences are

	   my $fh = $index->get_stream_by_id('HBA_HUMAN');

       This can then be passed to a Bio::SeqIO object for output or for
       converting into objects.

           my $seqio = Bio::SeqIO->new(-fh     => $fh,
                                       -format => 'fasta');

       The last way is to retrieve a sequence directly.  This is the slowest
       way of extracting, as the sequence objects need to be made.

	   my $seq = $index->get_Seq_by_id('HBA_HUMAN');

       To access the secondary indices the secondary namespace needs to be
       known

	   $index->secondary_namespaces("ID");

       Then the following call can be used

	   my $seq   = $index->get_Seq_by_secondary('ID','1433_CAEEL');

       These calls are not yet implemented

	   my $fh    = $index->get_stream_by_secondary('ID','1433_CAEEL');
	   my $entry = $index->get_entry_by_secondary('ID','1433_CAEEL');

FEEDBACK
   Mailing Lists
       User feedback is an integral part of the evolution of this and other
       Bioperl modules. Send your comments and suggestions preferably to one
       of the Bioperl mailing lists.  Your participation is much appreciated.

	 bioperl-l@bioperl.org			- General discussion
	 http://bioperl.org/wiki/Mailing_lists	- About the mailing lists

   Support
       Please direct usage questions or support issues to the mailing list:

       bioperl-l@bioperl.org

       rather than to the module maintainer directly. Many experienced and
       responsive experts will be able to look at the problem and quickly
       address it. Please include a thorough description of the problem with
       code and data examples if at all possible.

   Reporting Bugs
       Report bugs to the Bioperl bug tracking system to help us keep track
       of the bugs and their resolution.  Bug reports can be submitted via
       the web:

	 http://bugzilla.open-bio.org/

AUTHOR - Michele Clamp
       Email - michele@sanger.ac.uk

CONTRIBUTORS
       Jason Stajich, jason@bioperl.org

APPENDIX
       The rest of the documentation details each of the object methods.
       Internal methods are usually preceded with an "_" (underscore).

   new
	Title	: new
	Usage	: For reading
		    my $index = Bio::DB::Flat::BinarySearch->new(
			    -directory => '/Users/michele/indices/dbest',
			    -dbname    => 'mydb',
			    -format    => 'fasta');

		  For writing

                    my %secondary_patterns = ("ACC" => "^>\\S+ +(\\S+)");
                    my $index = Bio::DB::Flat::BinarySearch->new(
                            -directory          => '/Users/michele/indices',
                            -dbname             => 'mydb',
                            -write_flag         => 1,
                            -primary_pattern    => "^>(\\S+)",
                            -secondary_patterns => \%secondary_patterns,
                            -primary_namespace  => "ID");

		    my @files = ('file1','file2','file3');

		    $index->build_index(@files);

	Function: create a new Bio::DB::Flat::BinarySearch object
	Returns : new Bio::DB::Flat::BinarySearch
	Args	: -directory	      Root directory for index files
		  -dbname	      Name of subdirectory containing indices
				      for named database
		  -write_flag	      Allow building index
		  -primary_pattern    Regexp defining the primary id
		  -secondary_patterns A hash ref containing the secondary
				      patterns with the namespaces as keys
		  -primary_namespace  A string defining what the primary key
				      is

	Status	: Public

   get_Seq_by_id
	Title	: get_Seq_by_id
	Usage	: $obj->get_Seq_by_id($id)
	Function: Gets a sequence object for a unique ID
	Returns : Bio::SeqI object
	Args	: string

   get_entry_by_id
	Title	: get_entry_by_id
	Usage	: $obj->get_entry_by_id($newval)
	Function: Gets the text of the whole entry for a unique ID
	Returns : string containing the whole entry
	Args	: string

   get_stream_by_id
	Title	: get_stream_by_id
	Usage	: $obj->get_stream_by_id($id)
	Function: Gets a Sequence stream for an id
	Returns : filehandle (can be passed to Bio::SeqIO via -fh)
	Args	: Id to lookup by

   get_Seq_by_acc
	Title	: get_Seq_by_acc
	Usage	: $obj->get_Seq_by_acc($acc)
	Function: Gets a Bio::SeqI object by accession number
	Returns : Bio::SeqI object
	Args	: string representing accession number

   get_Seq_by_version
	Title	: get_Seq_by_version
	Usage	: $obj->get_Seq_by_version($version)
	Function: Gets a Bio::SeqI object by accession.version number
	Returns : Bio::SeqI object
	Args	: string representing accession.version number

   get_Seq_by_secondary
	Title	: get_Seq_by_secondary
	Usage	: $obj->get_Seq_by_secondary($namespace,$acc)
	Function: Gets a Bio::SeqI object looking up secondary accessions
	Returns : Bio::SeqI object
	Args	: namespace name to check secondary namespace and an id

   read_header
	Title	: read_header
	Usage	: $obj->read_header($fh)
	Function: Reads the header from the db file
	Returns : width of a record
	Args	: filehandle

   read_record
	Title	: read_record
	Usage	: $obj->read_record($fh,$pos,$len)
	Function: Reads a record from a filehandle
	Returns : String
	Args	: filehandle, offset, and length
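
       The general technique behind reading a record by offset and length
       can be sketched with core Perl's seek and read.  This is an
       illustration only, not this method's implementation: the helper
       read_at is hypothetical, and an in-memory filehandle stands in for a
       sequence file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the idea behind read_record: return $len bytes
# starting at byte offset $pos of an open filehandle.
sub read_at {
    my ($fh, $pos, $len) = @_;
    seek($fh, $pos, 0) or die "seek failed: $!";
    my $buf = '';
    my $got = read($fh, $buf, $len);
    die "read failed: $!" unless defined $got;
    return $buf;
}

# An in-memory filehandle stands in for a sequence file on disk.
my $data = ">seq1\nACGT\n>seq2\nGGCC\n";
open(my $fh, '<', \$data) or die "open failed: $!";
print read_at($fh, 11, 11);   # prints the second record, ">seq2" and "GGCC"
```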

   get_all_primary_ids
	Title	: get_all_primary_ids
	Usage	: @ids = $seqdb->get_all_primary_ids()
	Function: gives an array of all the primary_ids of the
		  sequence objects in the database.
	Returns : an array of strings
	Args	: none

   find_entry
	Title	: find_entry
	Usage	: $obj->find_entry($fh,$start,$end,$id,$recsize)
	Function: Extract an entry based on the start,end,id and record size
	Returns : string
	Args	: filehandle, start, end, id, recordsize

   build_index
	Title	: build_index
	Usage	: $obj->build_index(@files)
	Function: Build the index based on a set of files
	Returns : count of the number of entries
	Args	: List of filenames
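
       Conceptually, building an index means scanning each file, noting
       where every record starts, and counting the entries.  The sketch
       below shows that idea in core Perl under stated assumptions: the
       helper scan_records is hypothetical, it keeps offsets in a plain hash
       rather than writing index files, and it is not this method's actual
       code.

```perl
use strict;
use warnings;

# Hypothetical sketch: scan one filehandle, remember the byte offset at
# which each record starts (keyed by primary id), and return the number
# of entries found.
sub scan_records {
    my ($fh, $start_pattern, $primary_pattern, $offsets) = @_;
    my $count = 0;
    while (1) {
        my $pos  = tell($fh);          # offset of the line about to be read
        my $line = <$fh>;
        last unless defined $line;
        next unless $line =~ /$start_pattern/;
        if ($line =~ /$primary_pattern/) {
            $offsets->{$1} = $pos;
            $count++;
        }
    }
    return $count;
}

# An in-memory filehandle stands in for a fasta file on disk.
my $data = ">seq1\nACGT\n>seq2\nGGCC\n";
open(my $fh, '<', \$data) or die "open failed: $!";
my %offsets;
my $n = scan_records($fh, '^>', '^>(\S+)', \%offsets);
print "$n entries; seq2 starts at byte $offsets{seq2}\n";
# 2 entries; seq2 starts at byte 11
```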

   _index_file
	Title	: _index_file
	Usage	: $obj->_index_file($newval)
	Function:
	Example :
	Returns : value of _index_file
	Args	: newvalue (optional)

   write_primary_index
	Title	: write_primary_index
	Usage	: $obj->write_primary_index($newval)
	Function:
	Example :
	Returns : value of write_primary_index
	Args	: newvalue (optional)

   write_secondary_indices
	Title	: write_secondary_indices
	Usage	: $obj->write_secondary_indices($newval)
	Function:
	Example :
	Returns : value of write_secondary_indices
	Args	: newvalue (optional)

   new_secondary_filehandle
	Title	: new_secondary_filehandle
	Usage	: $obj->new_secondary_filehandle($newval)
	Function:
	Example :
	Returns : value of new_secondary_filehandle
	Args	: newvalue (optional)

   open_secondary_index
	Title	: open_secondary_index
	Usage	: $obj->open_secondary_index($newval)
	Function:
	Example :
	Returns : value of open_secondary_index
	Args	: newvalue (optional)

   _add_id_position
	Title	: _add_id_position
	Usage	: $obj->_add_id_position($newval)
	Function:
	Example :
	Returns : value of _add_id_position
	Args	: newvalue (optional)

   make_config_file
	Title	: make_config_file
	Usage	: $obj->make_config_file($newval)
	Function:
	Example :
	Returns : value of make_config_file
	Args	: newvalue (optional)

   read_config_file
	Title	: read_config_file
	Usage	: $obj->read_config_file($newval)
	Function:
	Example :
	Returns : value of read_config_file
	Args	: newvalue (optional)

   get_fileid_by_filename
	Title	: get_fileid_by_filename
	Usage	: $obj->get_fileid_by_filename($newval)
	Function:
	Example :
	Returns : value of get_fileid_by_filename
	Args	: newvalue (optional)

   get_filehandle_by_fileid
	Title	: get_filehandle_by_fileid
	Usage	: $obj->get_filehandle_by_fileid($newval)
	Function:
	Example :
	Returns : value of get_filehandle_by_fileid
	Args	: newvalue (optional)

   primary_index_file
	Title	: primary_index_file
	Usage	: $obj->primary_index_file($newval)
	Function:
	Example :
	Returns : value of primary_index_file
	Args	: newvalue (optional)

   primary_index_filehandle
	Title	: primary_index_filehandle
	Usage	: $obj->primary_index_filehandle($newval)
	Function:
	Example :
	Returns : value of primary_index_filehandle
	Args	: newvalue (optional)

   format
	Title	: format
	Usage	: $obj->format($newval)
	Function:
	Example :
	Returns : value of format
	Args	: newvalue (optional)

   write_flag
	Title	: write_flag
	Usage	: $obj->write_flag($newval)
	Function:
	Example :
	Returns : value of write_flag
	Args	: newvalue (optional)

   dbname
	Title	: dbname
	Usage	: $obj->dbname($newval)
	Function: get/set database name
	Example :
	Returns : value of dbname
	Args	: newvalue (optional)

   index_directory
	Title	: index_directory
	Usage	: $obj->index_directory($newval)
	Function:
	Example :
	Returns : value of index_directory
	Args	: newvalue (optional)

   record_size
	Title	: record_size
	Usage	: $obj->record_size($newval)
	Function:
	Example :
	Returns : value of record_size
	Args	: newvalue (optional)

   primary_namespace
	Title	: primary_namespace
	Usage	: $obj->primary_namespace($newval)
	Function:
	Example :
	Returns : value of primary_namespace
	Args	: newvalue (optional)

   index_type
	Title	: index_type
	Usage	: $obj->index_type($newval)
	Function:
	Example :
	Returns : value of index_type
	Args	: newvalue (optional)

   index_version
	Title	: index_version
	Usage	: $obj->index_version($newval)
	Function:
	Example :
	Returns : value of index_version
	Args	: newvalue (optional)

   primary_pattern
	Title	: primary_pattern
	Usage	: $obj->primary_pattern($newval)
	Function:
	Example :
	Returns : value of primary_pattern
	Args	: newvalue (optional)

   start_pattern
	Title	: start_pattern
	Usage	: $obj->start_pattern($newval)
	Function:
	Example :
	Returns : value of start_pattern
	Args	: newvalue (optional)

   secondary_patterns
	Title	: secondary_patterns
	Usage	: $obj->secondary_patterns($newval)
	Function:
	Example :
	Returns : value of secondary_patterns
	Args	: newvalue (optional)

   secondary_namespaces
	Title	: secondary_namespaces
	Usage	: $obj->secondary_namespaces($newval)
	Function:
	Example :
	Returns : value of secondary_namespaces
	Args	: newvalue (optional)

perl v5.14.1			  2011-07-22	Bio::DB::Flat::BinarySearch(3)