Some Recent Resumes, Papers, Theses, and Technical Reports
Vita
Iterative solution of sparse linear least squares using LU Factorization G.W. Howell and M. Baboulin, to appear in HPC Asia 2018.
LU Preconditioning for Overdetermined Sparse Least Squares Problems G.W. Howell and M. Baboulin, PPAM, 2015.
Data files for least squares A tar file containing R data frames and plots for "LU Preconditioning for Overdetermined Sparse Least Squares Problems" G.W. Howell and M. Baboulin
An Efficient Parallel Solution to the Wigner-Poisson Equation, A.S.Costalanski, C.T.Kelley, G.W.Howell, A.G.Salinger, HPC 2013, Orlando.
"Wide or Tall" and "Sparse Matrix Dense Matrix" Multiplications
Presented at HPC 2011, Boston.
Block Householder Computation of Sparse Matrix Singular Values, HPC 2010, Orlando.
Supported by NIH Molecular Libraries Roadmap for Medical Research,
Grant 1 P20 HG003900-01.
Lazy Householder Decomposition of Sparse Matrices. Supported by NIH Molecular Libraries Roadmap for Medical Research,
Grant 1 P20 HG003900-01.
Sparse BLAS-3 Reduction to Banded Upper Triangular. Presented March 11, 2008 to the SIAM Conference on Scientific and Parallel
Computation. Supported by NIH Molecular Libraries Roadmap for Medical
Research, Grant 1 P20 HG003900-01. This talk discusses stable block Householder
reduction to banded upper triangular form. At the price of doubling storage,
we get an algorithm built entirely from BLAS-3 (matrix-matrix) operations, so it runs
at a reasonable proportion of peak speed. As available RAM increases, why not use it?
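Block Householder methods reach BLAS-3 speed by aggregating several reflectors H_j = I - tau_j v_j v_j^T into one block transformation. The sketch below is a minimal NumPy version of the standard compact WY form, H_1 H_2 ... H_k = I - V T V^T. It uses the forward, column-wise construction of T that LAPACK's dlarft uses, built here from a dense Householder QR. It illustrates only the aggregation step, not the sparse banded reduction of the talk, and the function names are hypothetical.

```python
import numpy as np


def householder(x):
    """Return (v, tau) with v[0] = 1 and (I - tau v v^T) x = beta e_1 (dlarfg-style)."""
    v = np.array(x, dtype=float)
    sigma = np.linalg.norm(v)
    if sigma == 0.0:                      # nothing to annihilate
        v[:] = 0.0
        v[0] = 1.0
        return v, 0.0
    beta = -np.copysign(sigma, v[0])      # sign chosen to avoid cancellation
    tau = (beta - v[0]) / beta
    v[0] -= beta
    v /= v[0]
    return v, tau


def qr_compact_wy(A):
    """Householder QR of an m x n matrix (m >= n).

    Returns V, T, R with H_1 H_2 ... H_n = I - V T V^T and Q^T A = [R; 0].
    """
    R = np.array(A, dtype=float)
    m, n = R.shape
    V = np.zeros((m, n))
    taus = np.zeros(n)
    for j in range(n):
        v, tau = householder(R[j:, j])
        V[j:, j] = v
        taus[j] = tau
        # Level-2 (rank-one) update; a blocked code defers these and applies them in bulk.
        R[j:, j:] -= tau * np.outer(v, v @ R[j:, j:])
    # Forward column-wise T: new column is -tau_j * T_{j-1} V_{j-1}^T v_j, diagonal tau_j.
    T = np.zeros((n, n))
    for j in range(n):
        T[j, j] = taus[j]
        T[:j, j] = -taus[j] * (T[:j, :j] @ (V[:, :j].T @ V[:, j]))
    return V, T, np.triu(R[:n, :])


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))
V, T, R = qr_compact_wy(A)
QtA = A - V @ (T.T @ (V.T @ A))           # Q^T A using three matrix-matrix products
print(np.allclose(QtA[:4], R), np.allclose(QtA[4:], 0.0))
```

Once V and T are formed, applying Q^T to a block C costs three matrix-matrix products (V^T C, then T^T times that, then V times the result) instead of k separate rank-one updates. That is the source of the BLAS-3 efficiency.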
Cache Efficient Bidiagonalization Using
BLAS 2.5 Operators (ps)
(pdf)
Joint work with C.T. Fulton, K. Marmol,
J. Demmel, and S. Hammarling (thanks also to K. Stanley). Draft of a
paper which appeared in ACM Transactions on Mathematical Software,
volume 32, number 3, pp. 13-46, May 2008.
A preliminary version is LAPACK Working Note 174 (pdf).
Combining left and right matrix-vector multiplies into a single
BLAS 2.5 GEMVT call allows a significant speedup compared to LAPACK.
Supported by National Science Foundation Grant EIA-0103642, Next
Generation Software. The work was also funded in part by the
NIH Roadmap for Medical Research, Grant 1 P20 HG003900-01.
Information on the Molecular Libraries Roadmap Initiative can
be obtained from http://nihroadmap.nih.gov/molecularlibraries/
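The sketch below assumes the BLAST-style definition of GEMVT: x ← βA^T y + z and w ← αAx. Fusion is possible because entry x_j depends only on column j of A. Column j can therefore serve both products while it is still in cache, so A is read once instead of twice. This minimal NumPy sketch shows only that loop order. The Python loop illustrates the access pattern; the cache benefit appears in a compiled, column-major (Fortran-style) implementation. The function name is hypothetical.

```python
import numpy as np


def gemvt_fused(A, y, z, alpha=1.0, beta=1.0):
    """Compute x = beta*A^T y + z and w = alpha*A x with one sweep over A's columns."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    x = np.empty(n)
    w = np.zeros(m)
    for j in range(n):
        col = A[:, j]                     # column j is loaded once ...
        x[j] = beta * (col @ y) + z[j]    # ... used for the transposed product ...
        w += (alpha * x[j]) * col         # ... and again, while cached, for A x
    return x, w


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
y, z = rng.standard_normal(6), rng.standard_normal(4)
x, w = gemvt_fused(A, y, z, alpha=2.0, beta=0.5)
print(np.allclose(w, 2.0 * A @ (0.5 * A.T @ y + z)))
```

Two separate GEMV calls would stream A through memory twice. For matrices larger than cache, that second pass is what the fused call avoids.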
Algorithm 841: BHESS: Gaussian Reduction to a Similar
Banded Hessenberg Form (ps)
(pdf)
This is an earlier draft of an ACM Transactions on Mathematical
Software paper that appeared in March 2005, joint
with N. Diaa. Supported by National Science Foundation Grant EIA-0103642.
A Case Study in Using Local IO and GPFS to
Improve Simulation Scalability
Authors are V. Bannister, G.W. Howell, C. T. Kelley, E. Sills, and Q. Zhang.
Presented to the 2007 LCI Conference, May 2007, Tahoe, Calif.
BR Eigenvalue Iteration (ps)
(pdf)
Joint work with G. A. Geist and D. W. Watkins. Appears in
SIAM Journal on Matrix Analysis and Applications (SIMAX),
20(4), pp. 1083-1097, July 1999. BR iteration is a bulge-chasing
algorithm to determine eigenvalues of small-band Hessenberg matrices
(which result from BHESS or from QMR).
BHESS-BR poster
BHESS-BR (ps)
(pdf)
1998 SIAM Conference, Atlanta. At that time, BHESS-BR calculated
eigenvalues about five
times as fast as LAPACK. BHESS is available as TOMS Algorithm 841.
Some BR iteration codes are available below.
Error Analysis of Reduction to Banded Hessenberg Form (ps)
(pdf)
This gives a backward error analysis for the BHESS algorithm.
It also tests the BHESS algorithm with Rowan's INSTAB program,
which is designed to automatically produce example problems
for which algorithms are not backward stable. BHESS is similar
in backward error to ELMHES (the EISPACK reduction to Hessenberg form by elementary
similarity transformations). INSTAB finds no examples of
instability. It appears as a 1997 ORNL technical report, jointly
authored with Tom Rowan and Al Geist.
Sparse Householder bidiagonalization (ps)
Presented at
CERFACS Sparse Day, Toulouse, France, June 15, 2001.
Florida Tech CS seminar talk on high performance bidiagonalization (ps)
(pdf)
Talk on high-performance bidiagonalization, joint work
with Fulton, Marmol, and Demmel. An update of talks given at a
SIAM math conference in North Carolina and at Leuven.
Numerical experiments for ELMRES
These are
postscript files displaying results from Desi Stephens'
work incorporating an implicit-pivoting Fortran 77
version of ELMRES into Saad's package SPARSKIT.
Main results of Desmond Stephens' dissertation (ps)
(pdf)
This paper contains the main
results of Desmond Stephens' 1999 dissertation:
"Oblique Projection Method for Solving Systems of Sparse
Linear Equations," Gary Howell and Desmond Stephens.
Note, November 13, 2001: most of the results of the paper
are given in work by Hassane Sadok as the method CMRH:
"CMRH: A new method for solving nonsymmetric linear
systems based on the Hessenberg reduction algorithm,"
Numerical Algorithms 20 (1999) 303-321.
He has sent us a technical report on CMRH from February 1993.
Note based on the talk listed next.
(pdf)
ELMRES -- Elementary Residual Method (ps)
(pdf)
Oct. 19, 1998, joint work with D. Stephens.
This talk was presented at the Fourth IMACS International Symposium
on Iterative Methods in Scientific Computation, honoring David M. Young,
Jr. ELMRES is an oblique variant of GMRES. Compared to GMRES,
ELMRES is easier to parallelize and requires fewer operations. Its Krylov basis
is built with elementary (Gaussian) eliminations rather than with orthogonalizations
that need inner products.
Note: we found in 2000
that this work substantially duplicates the CMRH algorithm
published in 1999 by H. Sadok, which was the subject of a 1993
dissertation by Sadok's student Heyouni.
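ELMRES substantially coincides with CMRH, so a small sketch of the CMRH idea may help. The method builds a basis L of the Krylov space with the pivoted Hessenberg process. L is unit lower trapezoidal up to a row permutation, and the process gives A L_m = L_{m+1} H̄_m. The method then minimizes the quasi-residual ||β e_1 - H̄_m y||. Building the basis needs only reads of pivot entries and vector updates, with no inner products. This is a minimal NumPy sketch of that description as we understand it, assuming x_0 = 0, a nonzero b, and no restarts. It is not the implicit-pivoting Fortran 77 ELMRES code, and the function name is hypothetical.

```python
import numpy as np


def cmrh_solve(A, b, m):
    """m steps of a CMRH-style method with x0 = 0 (sketch; b must be nonzero).

    Builds L (unit lower trapezoidal up to the row permutation p) and an
    (m+1) x m upper Hessenberg H with A L[:, :m] = L[:, :m+1] H, then
    minimises || beta e_1 - H y || and returns x = L[:, :m] y.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = b.size
    m = min(m, n)
    p = np.arange(n)                      # p[j] = row holding the unit pivot of L[:, j]
    L = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    i0 = int(np.argmax(np.abs(b)))
    beta = b[i0]
    L[:, 0] = b / beta
    p[[0, i0]] = p[[i0, 0]]
    steps = m
    for k in range(m):
        u = A @ L[:, k]
        for j in range(k + 1):
            H[j, k] = u[p[j]]             # read a pivot entry: no inner product
            u -= H[j, k] * L[:, j]        # this zeroes u[p[j]]
        if k + 1 == n:                    # all n pivot rows used: u is zero in exact arithmetic
            break
        r = k + 1 + int(np.argmax(np.abs(u[p[k + 1:]])))
        if u[p[r]] == 0.0:                # lucky breakdown: Krylov space is invariant
            steps = k + 1
            break
        H[k + 1, k] = u[p[r]]
        L[:, k + 1] = u / H[k + 1, k]
        p[[k + 1, r]] = p[[r, k + 1]]
    rhs = np.zeros(steps + 1)
    rhs[0] = beta
    y = np.linalg.lstsq(H[:steps + 1, :steps], rhs, rcond=None)[0]
    return L[:, :steps] @ y


rng = np.random.default_rng(2)
A = 4.0 * np.eye(6) + 0.5 * rng.standard_normal((6, 6))
b = rng.standard_normal(6)
print(np.linalg.norm(A @ cmrh_solve(A, b, 6) - b))
```

Because the residual is only minimized in the non-orthogonal basis L, CMRH and ELMRES solve a quasi-minimal residual problem rather than GMRES's true minimal residual problem. That is the price paid for avoiding inner products.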
Recursions to compute Bernoulli numbers (ps)
Spring 2001, joint work with P. Godfrey.
Appeared in December 2001
in the Proceedings of the Marathwada Mathematical Society.
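The recursions from the paper are not reproduced here. For comparison, the sketch below implements the classical recurrence sum_{k=0}^{m} C(m+1, k) B_k = 0 for m ≥ 1, with B_0 = 1, which gives the convention B_1 = -1/2. It uses exact rational arithmetic, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """Return [B_0, ..., B_n] from sum_{k=0}^{m} C(m+1, k) B_k = 0 (m >= 1)."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        s = sum(comb(m + 1, k) * B[k] for k in range(m))
        B.append(-s / (m + 1))            # solve the m-th equation for B_m
    return B


print(bernoulli_numbers(6))
```

For example, bernoulli_numbers(6) yields 1, -1/2, 1/6, 0, -1/30, 0, 1/42. Each B_m costs O(m) operations on rationals whose size grows with m, so this simple form suits modest indices.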
Parallel GEMVT-Cache-Efficiency in Combined Left and
Right Matrix-Vector Multiplications
Howell, Fulton,
and Premkumar. Draft document with code, results, and analysis
from Dec. 2003, used as source material for Premkumar's thesis.
Supported by National Science Foundation Grant EIA-0103642
and by use of computer resources at ERDC MSRC.
Cache Efficient Bidiagonalization (ps)
(pdf)
Paper presented at the SIAM linear algebra conference
in October 2000 (Raleigh, NC).
Parallel and Serial Cache Efficient
Bidiagonalization (ps)
Paper from the SIAM linear algebra conference proceedings,
July 2003 (Williamsburg, VA). Joint with C. Fulton, S. Malhotra,
and J. Parker. It covers cache-efficient bidiagonalization and work on
optimizing the performance of serial and parallel GEMVT. This work
was supported by National Science Foundation Grant EIA-0103642.
ELMRES: An Oblique Projection
Method to Solve Sparse Non-Symmetric Linear Systems (ps)
(pdf)
Ph.D. dissertation of Desi Stephens.
August 1999. Florida Institute of Technology.
Some Issues in Efficient Implementation of a Vector Based Model for Document Retrieval (ps)
Proceedings article authored by a seminar
class at Florida Tech in spring 2001. It explores efficient sparse matrix-vector
multiplication, with an eye to data retrieval.
Sparse-sparse representations appear most efficient for
the vector space model of databases. This work was supported
by National Science Foundation Grant EIA-0103642.
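One reading of "sparse-sparse" is that both the term-document matrix and the query vector are stored sparsely. Scoring then costs time proportional to the postings of the query's terms, not to the vocabulary or collection size. The sketch below is a minimal Python version under that assumption. It uses raw dot products with no cosine normalization, and the data layout and function names are illustrative, not taken from the article.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Store a term-document matrix by rows: term -> [(doc_id, weight), ...]."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, weight in vec.items():
            if weight != 0.0:
                index[term].append((doc_id, weight))
    return dict(index)


def rank(index, query):
    """Sparse query times sparse matrix: touch only postings of the query's terms."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = [
    {"sparse": 2.0, "matrix": 1.0},
    {"matrix": 0.5, "eigenvalue": 1.0},
    {"retrieval": 3.0},
]
index = build_inverted_index(docs)
print(rank(index, {"sparse": 1.0, "matrix": 1.0}))  # [(0, 3.0), (1, 0.5)]
```

Document 2 shares no term with the query, so it is never touched, which is the point of keeping both operands sparse.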
Some Software
bhess.tar.gz
joint work with N. Diaa. To extract files, type
gunzip -9c bhess.tar.gz | tar xvf -
This will create a directory which will include Fortran
files for BHESS, makefiles, a README file, a Matlab script
file for BHESS, and also the file howell.ps, which is a preprint
of an ACM Transactions on Mathematical Software paper that appeared
in spring 2005.
Example MPI programs (tar)
Pacheco's Example MPI programs (tar)
hpc.tar and pachec.tar contain programs from a short course given by Gary Howell
at NC State in November 2003. pachec.tar includes files from Pete Pacheco's
book from Morgan Kaufmann. These files show some examples of
Fortran 77 MPI usage.
Matlab codes for Householder bidiagonalization
This version is relatively clean and is general enough to handle the
complex case. This work was supported by National Science Foundation
Grant EIA-0103642.
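The sketch below outlines the basic Golub-Kahan Householder bidiagonalization that these codes implement, for readers without the Matlab files at hand. A left reflector zeros each column below the diagonal, and a right reflector zeros each row beyond the superdiagonal. U and V are accumulated so that A = U B V^T. This is a hypothetical NumPy transcription of the standard unblocked real algorithm, not the downloadable code. It omits the complex case and the cache-efficient GEMVT restructuring.

```python
import numpy as np


def house(x):
    """Return (v, tau) with v[0] = 1 and (I - tau v v^T) x = beta e_1."""
    v = np.array(x, dtype=float)
    sigma = np.linalg.norm(v)
    if sigma == 0.0:
        v[:] = 0.0
        v[0] = 1.0
        return v, 0.0
    beta = -np.copysign(sigma, v[0])
    tau = (beta - v[0]) / beta
    v[0] -= beta
    v /= v[0]
    return v, tau


def bidiagonalize(A):
    """Householder bidiagonalization of a real m x n A (m >= n).

    Returns U (m x m), B (m x n, upper bidiagonal) and V (n x n) with A = U B V^T.
    """
    B = np.array(A, dtype=float)
    m, n = B.shape
    U, V = np.eye(m), np.eye(n)
    for k in range(n):
        # Left reflector: annihilate B[k+1:, k]; accumulate U <- U H.
        v, tau = house(B[k:, k])
        B[k:, k:] -= tau * np.outer(v, v @ B[k:, k:])
        U[:, k:] -= tau * np.outer(U[:, k:] @ v, v)
        if k < n - 2:
            # Right reflector: annihilate B[k, k+2:]; accumulate V <- V G.
            v, tau = house(B[k, k + 1:])
            B[k:, k + 1:] -= tau * np.outer(B[k:, k + 1:] @ v, v)
            V[:, k + 1:] -= tau * np.outer(V[:, k + 1:] @ v, v)
    return U, B, V


rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
U, B, V = bidiagonalize(A)
print(np.round(B, 3))
```

Each left and right reflector is applied with separate matrix-vector products here. Fusing those left and right products is where the GEMVT-based versions gain their speed.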
Matlab code to show how to block (block size = 2)
the Ralha-Barlow bidiagonalization.
Written by Gary Howell and Britney Owens during her ERDC internship
(at the request of Jesse Barlow). August 13, 2003.
Tar file that also has a required subroutine
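The Ralha-Barlow approach is one-sided. Only right Householder reflectors are applied to A, and the columns of U are generated one at a time from the updated columns of A V. The result is A V = U B with B upper bidiagonal. The sketch below is a minimal unblocked NumPy version of that idea as we understand it, assuming A is m x n with m ≥ n and full column rank. It does not show the block size 2 variant of the Matlab code, and the function names are hypothetical. In floating point the computed U can lose orthogonality for ill-conditioned A.

```python
import numpy as np


def house(x):
    """Return (v, tau, beta) with v[0] = 1 and (I - tau v v^T) x = beta e_1."""
    v = np.array(x, dtype=float)
    sigma = np.linalg.norm(v)
    if sigma == 0.0:
        v[:] = 0.0
        v[0] = 1.0
        return v, 0.0, 0.0
    beta = -np.copysign(sigma, v[0])
    tau = (beta - v[0]) / beta
    v[0] -= beta
    v /= v[0]
    return v, tau, beta


def one_sided_bidiag(A):
    """Unblocked one-sided bidiagonalization A V = U B (full column rank assumed)."""
    W = np.array(A, dtype=float)          # W holds A V as V is accumulated
    m, n = W.shape
    V = np.eye(n)
    U = np.zeros((m, n))
    psi = np.zeros(n)                     # diagonal of B
    phi = np.zeros(n - 1)                 # superdiagonal of B
    f = W[:, 0].copy()
    for k in range(n):
        psi[k] = np.linalg.norm(f)
        U[:, k] = f / psi[k]
        if k == n - 1:
            break
        # Right reflector on columns k+1: so only column k+1 keeps a u_k component.
        z = W[:, k + 1:].T @ U[:, k]
        v, tau, beta = house(z)
        W[:, k + 1:] -= tau * np.outer(W[:, k + 1:] @ v, v)
        V[:, k + 1:] -= tau * np.outer(V[:, k + 1:] @ v, v)
        phi[k] = beta                     # equals u_k . W[:, k+1] after the reflector
        f = W[:, k + 1] - phi[k] * U[:, k]
    B = np.diag(psi) + np.diag(phi, 1)
    return U, B, V


rng = np.random.default_rng(4)
A = rng.standard_normal((8, 5))
U, B, V = one_sided_bidiag(A)
print(np.allclose(A @ V, U @ B))
```

Only A and the n x n matrix V are updated by reflectors. U comes out as a by-product, which is what makes the method attractive when the left vectors are not needed explicitly.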
ELMRES embedded in SPARSKIT
The package sparski.tar.gz is a preliminary version of SPARSKIT with the
minimal revisions necessary to use ELMRES (an extra argument for pivoting
must be included; see the ELMRES paper above). Updated May 15,
2001 to remove extra object files that prevented successful builds.
Note that some makefiles will have to be changed to link the correct
BLAS library on your machine. The -O2 flag may also help in
getting faster code. Please send comments if this doesn't install well.
The run programs are in ITSOL, but to change them you need
to go up one directory (cd ..) and do a global make, then go back to ITSOL and do
a make. Otherwise the matrix you are using does not change.
Also note that the actual solvers are in ITSOL/iters.f,
which also has some documentation on the ipar and fpar
parameters that control preconditioning and convergence
criteria; these are set in riters.f (for the riters.ex executable).
This will create a directory called svdblast.
It may also include FELMRES (ELMRES with flexible preconditioning).
Tar file of Fortran 77 codes for BR iteration.
BR iteration finds eigenvalues of small-band Hessenberg matrices.
These include br*.f files, which perform the BR iterations; I am not
currently sure which ones are best. There are also some QMR routines in
this directory (QMR is a look-ahead Lanczos method, which can be used
to return some extremal eigenvalues). The QMR code is due to
Freund and Nachtigal.
Fortran 77 codes for Householder bidiagonalization
This package contains a bidiagonalization routine which performs
Householder bidiagonalization rather faster than the current
LAPACK dgebrd, and which is designed to be LAPACK compatible
(for inclusion in LAPACK).
This work was supported by National Science Foundation Grant EIA-0103642.