DPSLDLT_SolveM man page on IRIX

Man page or keyword search:  
man Server   31559 pages
apropos Keyword Search (all sections)
Output format
IRIX logo
[printable version]



DPSLDLT(3S)							   DPSLDLT(3S)

NAME
     DPSLDLT_Destroy, DPSLDLT_ExtractPerm, DPSLDLT_Factor, DPSLDLT_FactorOOC,
     DPSLDLT_OOCLimit, DPSLDLT_OOCPath, DPSLDLT_Ordering, DPSLDLT_Preprocess,
     DPSLDLT_PreprocessZ, DPSLDLT_Solve, DPSLDLT_SolveM, DPSLDLT_Storage -
     Parallel sparse symmetric solver for linear systems of real equations

SYNOPSIS
     Fortran synopsis:

	  SUBROUTINE DPSLDLT_DESTROY (token)
	  INTEGER token

	  SUBROUTINE DPSLDLT_EXTRACTPERM (token, perm)
	  INTEGER token, perm(*)

	  SUBROUTINE DPSLDLT_FACTOR (token, n, pointers, indices, values)
	  INTEGER token, n, pointers(*), indices(*)
	  DOUBLE PRECISION values(*)

	  SUBROUTINE DPSLDLT_FACTOROOC (token, n, pointers, indices, values)
	  INTEGER token, n, pointers(*), indices(*)
	  DOUBLE PRECISION values(*)

	  SUBROUTINE DPSLDLT_OOCLIMIT (token, ooclimit)
	  INTEGER token
	  DOUBLE PRECISION ooclimit

	  SUBROUTINE DPSLDLT_OOCPATH (token, oocpath)
	  INTEGER token
	  CHARACTER oocpath(*)

	  SUBROUTINE DPSLDLT_ORDERING (token, method)
	  INTEGER token, method

	  SUBROUTINE DPSLDLT_PREPROCESS (token, n, pointers, indices,
	  non_zeros, ops)
	  INTEGER token, n, pointers(*), indices(*)
	  INTEGER*8 non_zeros
	  DOUBLE PRECISION ops

	  SUBROUTINE DPSLDLT_PREPROCESSZ (token, n, pointers, indices, mask,
	  non_zeros, ops)
	  INTEGER token, n, pointers(*), indices(*), mask(*)
	  INTEGER*8 non_zeros
	  DOUBLE PRECISION ops

	  SUBROUTINE DPSLDLT_SOLVE (token, x, b)
	  INTEGER token
	  DOUBLE PRECISION x(*), b(*)

									Page 1

DPSLDLT(3S)							   DPSLDLT(3S)

	  SUBROUTINE DPSLDLT_SOLVEM (token, X, ldx, B, ldb, nrhs)
	  INTEGER token, ldx, ldb, nrhs
	  DOUBLE PRECISION X(*), B(*)

	  DOUBLE PRECISION FUNCTION DPSLDLT_STORAGE(token)
	  INTEGER token

     C/C++ synopsis:

	  #include <scsl_sparse.h>

	  void DPSLDLT_Destroy (int token);

	  void DPSLDLT_ExtractPerm (int token, int perm[]);

	  void DPSLDLT_Factor (int token, int n, int pointers[], int
	  indices[], double values[]);

	  void DPSLDLT_FactorOOC (int token, int n, int pointers[], int
	  indices[], double values[]);

	  void DPSLDLT_OOCLimit (int token, double ooclimit);

	  void DPSLDLT_OOCPath (int token, char oocpath[]);

	  void DPSLDLT_Ordering (int token, int method);

	  void DPSLDLT_Preprocess (int token, int n, int pointers[], int
	  indices[], long long *non_zeros, double *ops);

	  void DPSLDLT_PreprocessZ (int token, int n, int pointers[], int
	  indices[], int mask[], long long *non_zeros, double *ops);

	  void DPSLDLT_Solve (int token, double x[], double b[]);

	  void DPSLDLT_SolveM (int token, double X[], int ldx, double B[], int
	  ldb, int nrhs);

	  double DPSLDLT_Storage (int token);

IMPLEMENTATION
     These routines are part of the SCSL Scientific Library and can be loaded
     using either the -lscs or the -lscs_mp option.  The -lscs_mp option
     directs the linker to use the multi-processor version of the library.

     When linking to SCSL with -lscs or -lscs_mp, the default integer size is
     4 bytes (32 bits). Another version of SCSL is available in which integers
     are 8 bytes (64 bits). This version allows the user access to larger
     memory sizes and helps when porting legacy Cray codes.  It can be loaded
     by using the -lscs_i8 option or the -lscs_i8_mp option.  A program may
     use only one of the two versions; 4-byte integer and 8-byte integer
     library calls cannot be mixed.

									Page 2

DPSLDLT(3S)							   DPSLDLT(3S)

     The C and C++ prototypes shown above are appropriate for the 4-byte
     integer version of SCSL. When using the 8-byte integer version, the
     variables of type int become long long and the <scsl_sparse_i8.h> header
     file should be included.

DESCRIPTION
     DPSLDLT solves sparse symmetric linear systems of the form Ax = b where A
     is a real n-by-n symmetric input matrix, b is a real input vector of
     length n, and x is an unknown real vector of length n.

     DPSLDLT uses a direct method. A is factored into the following form:

	  A = L D LT

     where L is a lower triangular matrix with unit diagonal and D is a
     diagonal matrix.

     Note that NO PIVOTING FOR STABILITY is performed during factorization.

     The DPSLDLT library contains five main routines.

     *	 DPSLDLT_Ordering() allows the user to select one of five possible
	 reordering methods to be used in the matrix preprocessing phase.

     *	 DPSLDLT_Preprocess() performs preprocessing operations on the
	 structure of A (heuristic reordering to reduce fill in L, symbolic
	 factorization, etc.).

     *	 DPSLDLT_Factor() factors the matrix A into L and D, using the
	 previously computed preprocessing data.

     *	 DPSLDLT_Solve() solves for a vector x, given an input vector b.

     *	 DPSLDLT_Destroy() frees all storage associated with the matrix A
	 (including L, D, and various data structures computed during
	 preprocessing).

     The user can call DPSLDLT_Factor() several times after a single call to
     DPSLDLT_Preprocess() to factor multiple matrices with identical non-zero
     structures but different values.  Similarly, the user can call
     DPSLDLT_Solve() several times after a single call to DPSLDLT_Factor() to
     solve for multiple right-hand-sides.  Also, the user can call
     DPSLDLT_SolveM() to solve for multiple right-hand-sides all stored in a
     single array.

   Sparse Matrix Format
     Sparse matrix A must be input to DPSLDLT in Harwell-Boeing format (also
     known as Compressed Column Storage format).

     The matrix is held in three arrays: pointers, indices, and values.	 The
     indices array contains the row indices of the non-zeros in A. The values
     array holds the corresponding non-zero values. The pointers array

									Page 3

DPSLDLT(3S)							   DPSLDLT(3S)

     contains the index in indices for the first non-zero in each column of A.
     Thus, the row indices for the non-zeros in column i can be found in
     locations indices[pointers[i]] through indices[pointers[i+1]-1]. The
     corresponding values can be found in location values[pointers[i]] through
     values[pointers[i+1]-1].

     For a symmetric matrix A, the user must input either the lower or upper
     triangle of A, but not both.  Non-zeroes within a column of A can be
     stored in any order.

     In the following example, the symmetric matrix

     1.0
     0.0 3.0
     2.0 0.0 5.0
     0.0 4.0 0.0 6.0

     would be represented in FORTRAN as follows:

      INTEGER pointers(5), indices(6), i
      DOUBLE PRECISION values(6)
      DATA (pointers(i), i = 1, 5) / 1, 3, 5, 6, 7 /
      DATA (indices(i),	 i = 1, 6) / 1, 3, 2, 4, 3, 4 /
      DATA (values(i),	 i = 1, 6) / 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 /

     Zero-based indexing is used in C, so the pointers and indices arrays
     would instead contain the following:

     int pointers[]  = {0, 2, 4, 5, 6};
     int indices[]   = {0, 2, 1, 3, 2, 3};
     double values[] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};

   Ordering Methods
     The DPSLDLT_Ordering(token, method) routine allows the user to change the
     ordering method used to pre-order the matrix before factorization.	 This
     routine must be called before calling DPSLDLT_Preprocess(). Five options
     are currently available for the method parameter:

     *	 Method 0 performs no pre-ordering

     *	 Method 1 performs Approximate Minimum Fill ordering

     *	 Method 2 performs a single nested dissection ordering (default).
	 This method is often called "Extreme matrix ordering".

     *	 Method 3 performs multiple nested dissection orderings (in parallel)

									Page 4

DPSLDLT(3S)							   DPSLDLT(3S)

     *	 Method 4 performs multiple nested dissection (the same as in Method
	 3), but it uses a feedback file to "learn" from the previous solves
	 of the same matrix structure and it performs more orderings. The
	 multiple nested dissection technique of Methods 3 and 4 is also
	 referred to as "Extreme2 matrix ordering".

     Method 2 is significantly more expensive than Method 1, but it usually
     produces significantly better orderings.  Method 3 is especially
     effective on multi-processor systems.  It computes OMP_NUM_THREADS (where
     OMP_NUM_THREADS is an environment variable indicating the number of
     processors to be used for parallel computation) matrix orderings using
     different starting points for the algorithm and uses the ordering that
     will lead to the fewest floating-point operations to factorize the
     matrix.

     Method 4 is useful only when the same non-zero structure is used for
     multiple solves.  Method 4 keeps a record in a "feedback" file of a
     signature for non-zero structures for a maximum of 200 matrices and of
     the starting point that was saved from a previous solve for that
     structure.	 In the next Method 4 ordering for that non-zero structure,
     that best starting point and 2 * OMP_NUM_THREADS - 1 new ones generate
     orderings.	 The best ordering is used.  In this way, the quality of
     orderings stay the same or improve over time.

     Methods 3 and 4 typically take more time for the matrix preprocessing
     than the default.	However, on large systems or on repeated
     factorizations, significant overall speedups (1.1X to 2X) can be obtained
     compared to Method 2.

   Extracting the permutation vector
     Unless ordering Method 0 is used, DPSLDLT applies a symmetric permutation
     to matrix A before the factorization step; the resulting permuted matrix
     generally has significantly less fill-in than the original matrix.	 The
     user can obtain the permutation matrix associated with a given token by
     calling DPSLDLT_ExtractPerm(token, perm). The permutation is returned as
     an integer array of length n, with 1 <= perm(i) <= n (0 <= perm[i] < n
     for C code).

     A value of k for perm(i) implies that node k in the original ordering is
     node i in the new ordering.

   Matrices with zeros on the diagonal
     As noted above, no pivoting is done for stability during factorization;
     when zero or near-zero pivots are encountered, DPSLDLT usually fails. In
     these cases, it may be possible to use DPSLDLT_PreprocessZ() to obtain a
     slightly different, but stable, ordering.	The user provides an
     additional integer array, mask, as an argument to DPSLDLT_PreprocessZ().
     If mask(i)=0, then DPSLDLT will attempt to maximize the diagonal element
     |Aii|.

									Page 5

DPSLDLT(3S)							   DPSLDLT(3S)

   Memory usage
     The returned value of DPSLDLT_Storage() is an estimate of the amount of
     storage required (in millions of bytes) by the solver's data structures
     for a given matrix system.

   Out-of-core Factorization
     The storage associated with the factor can be managed in two ways.	 The
     DPSLDLT_Factor() routine allocates memory for the factor and manages it
     internally, releasing it only when DPSLDLT_Destroy() is called.  The
     alternative is to do out-of-core factorization by calling
     DPSLDLT_FactorOOC(). This routine uses a small amount of in-core memory,
     placing the remainder of the factor matrix on disk as it is computed.
     The user can call DPSLDLT_OOCPath() to indicate the directory in which
     the factor file should be written, and DPSLDLT_OOCLimit() to indicate how
     much memory to use to hold portions of the factor matrix in-core.	More
     in-core memory generally leads to less disk I/O and higher performance
     during the factorization.	The only required change is to move from in-
     core factorization to out-of-core factorization is the change from
     DPSLDLT_Factor() to DPSLDLT_FactorOOC().  The other routines
     (DPSLDLT_Solve(), DPSLDLT_Destroy(), etc.) handle out-of-core factors
     transparently.  Note that DPSLDLT_FactorOOC() and subsequent calls to
     DPSLDLT_Solve() are not parallelized (but calls to DPSLDLT_SolveM() are
     parallelized, as discussed below).

   Multiple Right-Hand-Sides
     DPSLDLT can solve for large numbers of right-hand-sides with one call to
     DPSLDLT_SolveM().	It solves these right hand sides in parallel, with
     each processor solving up to four at a time for in-core systems and up to
     PSLDLT_OOCBLK at a time for out-of-core systems, where PSLDLT_OOCBLK is
     an environment variable whose default value is 1.

   In-place Solves
     Both DPSLDLT_Solve() and DPSLDLT_SolveM() allow the solution vector(s) to
     overwrite the right-hand-side(s) when identical vectors or matrices are
     supplied to these routines.  For example,

      CALL DPSLDLT_SOLVE(token, b, b)

     takes the right-hand-side input from b and also returns the solution
     vector in b.  When this option is used with DPSLDLT_SolveM(), the leading
     dimensions for the solution and right-hand-side matrices must agree.  The
     amount of memory actually saved by performing an in-place solve depends
     on the number of right-hand-sides used.  For a single right-hand-side,
     there are no net savings versus an out-of-place solve because a temporary
     copy of the input vector is made internally.  For multiple right-hand-
     sides the memory overhead decreases as the ratio of right-hand-sides to
     processors used increases.

   Arguments
     These routines have the following arguments:

									Page 6

DPSLDLT(3S)							   DPSLDLT(3S)

     token     (input) DPSLDLT can handle multiple matrices simultaneously.
	       The token distinguishes between active matrices.	 The token
	       passed to DPSLDLT_Factor() must match the token used in some
	       previous call to DPSLDLT_Preprocess().  Similarly, the token
	       passed to DPSLDLT_Solve() must match the token used in some
	       previous call to DPSLDLT_Factor().  0 <= token <= 19.

     method    (input) An integer specifying the ordering method used during
	       preprocessing.  0 <= method <= 4.

     n	       (input) The number of rows and columns in the matrix A.	n >=
	       0.

     pointers, indices, values
	       (input) The pointers and indices arrays store the non-zero
	       structure of sparse input matrix A in Harwell-Boeing or
	       Compressed Sparse Column (CSC) format.

	       The pointers array stores n+1 integers, where pointers[i] gives
	       the index in indices of the first non-zero in column i of A.
	       The indices array stores the row indices of the non-zeros in A.
	       The values array stores the non-zero values in the matrix A.

     non_zeros (output) The number of non-zero values in L.

     ops       (output) The number of floating-point operations required to
	       factor A.

     mask      (input) An integer array of length n used in
	       DPSLDLT_PreprocessZ().  If mask(i) = 0, then node i of matrix A
	       is ordered after all of its neighbors in an attempt to avoid a
	       zero pivot.

     b	       (input) The right-hand-side vector in a DPSLDLT_Solve() call.

     x	       (output) The solution vector in a DPSLDLT_Solve() call.

     nrhs      (input) The number of right-hand side vectors present in a
	       DPSLDLT_SolveM() call.

     B	       (input) The right-hand-side matrix in a DPSLDLT_SolveM() call.
	       Must be stored in column-major order.

     ldb       (input) The leading dimension of matrix B. ldb >= n.

     X	       (output) The solution matrix in a DPSLDLT_SolveM() call. Must
	       be stored in column-major order.

     ldx       (input) The leading dimension of matrix X. ldx >= n.

									Page 7

DPSLDLT(3S)							   DPSLDLT(3S)

     oocpath   (input) A character array/string with a path to the directory
	       where the temporary out-of-core factor files should be stored.
	       If this path is on a striped (or raid-0) file system, the
	       performance of the out-of-core solves can be considerably
	       improved.  The default path is /usr/tmp.

     ooclimit  (input) A double precision number indicating the number of
	       Mbytes of random access memory that should be used for factor
	       storage during a call to DPSLDLT_FactorOOC(). Note that there
	       are many other arrays used besides those directly used to store
	       the factorization, so total RAM usage by the solve will exceed
	       this number.  The default is 64 MB.

     perm      (output) An integer array of length n containing the
	       permutation used to reorder matrix A.

ENVIRONMENT VARIABLES
     Two environment variables can affect the operation of ordering methods 3
     and 4.  SPARSE_NUM_ORDERS can be used to change the number of orderings
     performed from the default of OMP_NUM_THREADS for Method 3 and
     (2*OMP_NUM_THREADS) for Method 4.	SPARSE_FEEDBACK_FILE can be set to the
     path and file name where the feedback information will be kept;
     otherwise, the default feedback file is $HOME/.sparseFeedback.  This file
     will be less than 5K bytes.

     The environment variable OMP_NUM_THREADS determines the number of
     processors that are used for the numerical factorization and solve
     phases.  Out-of-core solves can be performed in groups of PSLDLT_OOCBLK
     right-hand-sides per processor.  Setting the environment variable
     PSLDLT_VERBOSE causes DPSLDLT to output information about the
     factorization.

NOTES
     These routines are optimized and parallelized for the SGI R8000 and
     R1x000 platforms.

SEE ALSO
     INTRO_SCSL(3S), INTRO_SOLVERS(3S), DPSLDU(3S), ZPSLDLT(3S), ZPSLDU(3S)

									Page 8

[top]

List of man pages available for IRIX

Copyright (c) for man pages and the logo by the respective OS vendor.

For those who want to learn more, the polarhome community provides shell access and support.

[legal] [privacy] [GNU] [policy] [cookies] [netiquette] [sponsors] [FAQ]
Tweet
Polarhome, production since 1999.
Member of Polarhome portal.
Based on Fawad Halim's script.
....................................................................
Vote for polarhome
Free Shell Accounts :: the biggest list on the net