<HTML>
<HEAD>
<BASE HREF="http://www.mcs.anl.gov/petsc/benchmarks.html">
<TITLE>PETSc Benchmarks</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff" LINK="#0000ff" VLINK="#ff0000" ALINK="#ff0000" TEXT="#000000">

<H1 align=center>Sample PETSc Floating Point Performance</H1>
<P>
<H3>
<MENU>
<LI> <a href="petsc.html#singleprocessor">Single Processor Floating Point Performance</a>
<LI> <a href="petsc.html#multiprocessor">Parallel Performance for Euler Solver</a>
<LI> <a href="petsc.html#laplacian">Scalability for Laplacian</a>
</MENU>
</H3>
<P>
We provide these floating point performance numbers as a guide to the floating
point rates users should expect while using PETSc. We have done our best to
provide fair and accurate values but do not guarantee any of the numbers
presented here.
<P>
See the "Profiling" chapter of <a href="http://www.mcs.anl.gov/petsc/manual.html#Node100">
the PETSc users manual</a> for techniques to obtain accurate performance
numbers with PETSc.

<P><HR><P>

<A NAME="singleprocessor"> <H1 align=center>Single Processor Performance</H1></A>

In many PDE application codes one must solve systems of linear equations
arising from the discretization of multicomponent PDEs; the resulting sparse
matrices naturally have a block structure.
<P>
PETSc provides special sparse matrix storage formats and routines that take
advantage of this block structure to deliver much higher (two to three times
higher) floating point computation rates. Below we give the floating point
rates of the matrix-vector product for a 1503 by 1503 sparse matrix with a
block size of three, arising from a simple oil reservoir simulation.

<p>
<A HREF="http://ftp.mcs.anl.gov/pub/petsc/matmultbench.ps">Embed here</A>
<p>

The next table depicts performance for the entire linear solve using GMRES(30)
with ILU(0) preconditioning.
<P>
<A HREF="http://ftp.mcs.anl.gov/pub/petsc/solvebench.ps">Embed here</A>
<P>

These tests were run using the code src/sles/examples/tutorials/ex10.c
with the options
<p>
<tt>
mpiexec -n 1 ex10 -f0 arco1 -f1 arco1 -pc_type ilu -ksp_gmres_unmodifiedgramschmidt -optionsleft -mat_baij -matload_block_size 3 -log_view
</tt>

<P><HR><P>

<A NAME="multiprocessor"> <H1 align=center>Parallel Performance for Euler Solver</H1></A>

<A NAME="laplacian"> <H1 align=center>Scalability for Laplacian</H1></A>
A typical "model" problem in numerical analysis for PDEs is the Laplacian.
Discretization of the Laplacian in two dimensions with finite differences is
typically done using the "five point" stencil. This results in a very sparse
(at most five nonzeros per row), ill-conditioned matrix.

<P>
Because the matrix is so sparse and has no block structure, it is difficult to
obtain very good sequential or parallel floating point performance, especially
for small problems. Here we demonstrate the scalability of the parallel PETSc
matrix-vector product for the five point stencil on two grids. These tests
were run on three machines: an IBM SP2 with the Power2Super chip and two
memory cards at ANL, the Cray T3E at NERSC, and the Origin2000 at NCSA.

<P>
Since PETSc is intended for much more general problems than the Laplacian, we
do not consider the Laplacian a particularly important benchmark; we include
it because of interest from the community.

<P><HR><P>

<H2 align=center>100 by 100 Grid: Absolute Time and Speed-Up</H2>

100by100 grid
<P>
Notes: The problem here is simply too small to parallelize on a distributed
memory computer.
<P>

<H2 align=center>1000 by 1000 Grid: Absolute Time and Speed-Up</H2>

1000by1000 grid
<P>



</BODY>
</HTML>