The BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra
Package) are basic building blocks for many codes. The BLAS perform
such basic operations as inner products, matrix-vector products, and
matrix-matrix products. The LAPACK routines use the BLAS routines to perform
dense matrix operations such as LU decomposition to solve linear equations,
QR decomposition to solve least squares problems, and singular value
and eigenvalue decompositions.
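
As a rough illustration of the three kinds of BLAS operation named above,
the following plain C loops compute an inner product, a matrix-vector
product, and a matrix-matrix product. This is only a sketch: the names
ref_ddot, ref_dgemv, and ref_dgemm are hypothetical stand-ins, not the
library routines, and the argument lists are simplified (the real DGEMV
and DGEMM also take scaling factors and transpose options). Matrices are
assumed to be stored column-major, as in Fortran.

```c
#include <stddef.h>

/* Level 1 (vector-vector): inner product of x and y. */
double ref_ddot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Level 2 (matrix-vector): y = A*x for an m-by-n matrix A.
   Column-major storage: element A(i,j) lives at a[i + j*lda]. */
void ref_dgemv(size_t m, size_t n, const double *a, size_t lda,
               const double *x, double *y)
{
    for (size_t i = 0; i < m; i++)
        y[i] = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < m; i++)
            y[i] += a[i + j * lda] * x[j];
}

/* Level 3 (matrix-matrix): C = A*B with A m-by-k, B k-by-n, C m-by-n,
   all column-major. */
void ref_dgemm(size_t m, size_t n, size_t k,
               const double *a, size_t lda,
               const double *b, size_t ldb,
               double *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++)
            c[i + j * ldc] = 0.0;
        for (size_t p = 0; p < k; p++)
            for (size_t i = 0; i < m; i++)
                c[i + j * ldc] += a[i + p * lda] * b[p + j * ldb];
    }
}
```

Simple loops like these do the same arithmetic as the library routines
but make no attempt to reuse data held in cache, which is where a tuned
BLAS gains most of its speed.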

The advantage of using the LAPACK and BLAS libraries is portable,
fast code. Fortran codes (C is also possible with a bit more fiddling)
calling LAPACK and BLAS can be ported easily to a variety of architectures.
The code is high quality, giving not only good performance but also
handling exceptional cases and avoiding numeric underflow and overflow.
For problems too large to fit in cache, LAPACK codes often run in one third
or less of the time required by the predecessor packages EISPACK and LINPACK.
(For small matrices, of size a few hundred or less, EISPACK and LINPACK
may sometimes be faster, having fewer levels of subroutine calls.)

For solving a 12K by 12K linear system on a single processor, a user found
that the Numerical Recipes solver had not finished when it timed out after
ten hours. The LAPACK solver dgesv required fifteen minutes with the PGI
compiler and PGI-supplied BLAS, and about ten minutes with the Intel
compiler and BLAS.
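
For readers unfamiliar with what dgesv computes: it solves A*x = b by LU
factorization with partial pivoting. The following is a minimal unblocked
sketch of that algorithm in plain C, assuming column-major storage and a
single right-hand side. The name lu_solve and its argument list are
hypothetical. This is not the LAPACK code, which works on blocks and
calls the BLAS for most of its arithmetic.

```c
#include <stddef.h>

/* Absolute value, written out so the sketch needs no math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Solve A*x = b in place by LU factorization with partial pivoting.
   a: n-by-n matrix, column-major (A(i,j) at a[i + j*lda]); overwritten
      by the L and U factors.
   b: right-hand side of length n; overwritten by the solution x.
   Returns 0 on success, or k+1 if the pivot found in column k
   (counting from 0) is exactly zero, meaning the matrix is singular. */
int lu_solve(size_t n, double *a, size_t lda, double *b)
{
    for (size_t k = 0; k < n; k++) {
        /* Partial pivoting: pick the largest entry on or below the diagonal. */
        size_t piv = k;
        for (size_t i = k + 1; i < n; i++)
            if (abs_d(a[i + k * lda]) > abs_d(a[piv + k * lda]))
                piv = i;
        if (a[piv + k * lda] == 0.0)
            return (int)(k + 1);

        /* Swap rows k and piv of A, and the matching entries of b. */
        if (piv != k) {
            for (size_t j = 0; j < n; j++) {
                double t = a[k + j * lda];
                a[k + j * lda] = a[piv + j * lda];
                a[piv + j * lda] = t;
            }
            double t = b[k];
            b[k] = b[piv];
            b[piv] = t;
        }

        /* Eliminate entries below the pivot, storing the multipliers (L). */
        for (size_t i = k + 1; i < n; i++) {
            double l = a[i + k * lda] / a[k + k * lda];
            a[i + k * lda] = l;
            for (size_t j = k + 1; j < n; j++)
                a[i + j * lda] -= l * a[k + j * lda];
            b[i] -= l * b[k];
        }
    }

    /* Back substitution with the upper triangular factor U. */
    for (size_t k = n; k-- > 0; ) {
        for (size_t j = k + 1; j < n; j++)
            b[k] -= a[k + j * lda] * b[j];
        b[k] /= a[k + k * lda];
    }
    return 0;
}
```

A loop nest like this performs the same number of floating-point
operations as the library routine, so the difference in run time comes
from how well the memory hierarchy is used rather than from the
algorithm itself.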

The efficiency of the implementation depends mainly on the quality of the
underlying BLAS library. Good BLAS implementations allow matrix-matrix
multiplication and such operations as LU and QR decomposition to run
at close to the peak floating-point rate of the CPU.
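
One way to judge how close a run comes to peak is to convert the wall
time into an achieved floating-point rate. The sketch below does this for
an LU solve, using the standard leading-order operation count of
(2/3) n^3. The function names are hypothetical, and the usage note
assumes that "12K" in the timing above means n = 12000.

```c
/* Approximate floating-point operation count for solving an n-by-n
   dense system by LU factorization: (2/3) n^3, ignoring lower-order
   terms. */
double lu_flops(double n)
{
    return 2.0 / 3.0 * n * n * n;
}

/* Achieved rate in Gflop/s, given the wall time in seconds. */
double lu_gflops(double n, double seconds)
{
    return lu_flops(n) / seconds / 1.0e9;
}
```

Under that assumption, lu_gflops(12000, 900) gives about 1.28 Gflop/s for
the fifteen minute run and lu_gflops(12000, 600) gives about 1.92 Gflop/s
for the ten minute run. Comparing such a figure with the rated peak of
the processor shows how much of the machine a given BLAS is using.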

This exposition is out of date in that it does not go into detail on how to
use the shared memory parallel mp libraries (which allow use of more than
one core). Also, the distributed memory ScaLAPACK libraries have been
incorporated into the newer MKL libraries.

A BLAS built by downloading the standard source code and compiling it
runs slower than a tuned library. Several tuned BLAS and
LAPACK libraries are available on the blade cluster. Instructions on linking
to the Intel, PGI, and ATLAS versions of the library are included above.