The PSBLAS library, developed with the aim to facilitate the parallelization of computationally intensive scientific applications, is designed to address parallel implementation of iterative solvers for sparse linear systems through the distributed memory paradigm. It includes routines for multiplying sparse matrices by dense matrices, solving block diagonal systems with triangular diagonal entries, preprocessing sparse matrices, and contains additional routines for dense matrix operations. The current implementation of PSBLAS addresses a distributed memory execution model operating with message passing.

The PSBLAS library version 3 is implemented in the Fortran 2003 [17] programming language, with reuse and/or adaptation of existing Fortran 77 and Fortran 95 software, plus a handful of C routines.

The use of Fortran 2003 offers a number of advantages over Fortran 95, mostly in the handling of requirements for evolution and adaptation of the library to new computing architectures and integration of new algorithms. For a detailed discussion of our design see [11]; other works discussing advanced programming in Fortran 2003 include [1, 18]; sufficient support for Fortran 2003 is now available from many compilers, including the GNU Fortran compiler from the Free Software Foundation (as of version 4.8).

Previous approaches have been based on mixing Fortran 95, with its support for object-based design, with other languages; these have been advocated by a number of authors, e.g. [16]. Moreover, the Fortran 95 facilities for dynamic memory management and interface overloading greatly enhance the usability of the PSBLAS subroutines. In this way, the library can take care of runtime memory requirements that are quite difficult or even impossible to predict at implementation or compilation time.

The presentation of the PSBLAS library follows the general structure of the proposal for serial Sparse BLAS [8, 9], which in its turn is based on the proposal for BLAS on dense matrices [15, 5, 6].

The applicability of sparse iterative solvers to many different areas causes some terminology problems because the same concept may be denoted through different names depending on the application area. The PSBLAS features presented in this document will be discussed referring to a finite difference discretization of a Partial Differential Equation (PDE). However, the scope of the library is wider than that: for example, it can be applied to finite element discretizations of PDEs, and even to different classes of problems such as nonlinear optimization, for example in optimal control problems.

The design of a solver for sparse linear systems is driven by many conflicting objectives, such as limiting occupation of storage resources, exploiting regularities in the input data, exploiting hardware characteristics of the parallel platform. To achieve an optimal communication to computation ratio on distributed memory machines it is essential to keep the data locality as high as possible; this can be done through an appropriate data allocation strategy. The choice of the preconditioner is another very important factor that affects efficiency of the implemented application. Optimal data distribution requirements for a given preconditioner may conflict with distribution requirements of the rest of the solver. Finding the optimal trade-off may be very difficult because it is application dependent. Possible solutions to these problems and other important inputs to the development of the PSBLAS software package have come from an established experience in applying the PSBLAS solvers to computational fluid dynamics applications.