<!-- xref: /petsc/src/benchmarks/results/benchmarks.html (revision 391e379223c8ad233f201b9d10bca97f975e49a9) -->
<HTML>
<HEAD>
<BASE HREF="https://www.mcs.anl.gov/petsc/benchmarks.html">
<TITLE>PETSc Benchmarks</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff" LINK="#0000ff" VLINK="#ff0000" ALINK="#ff0000" TEXT="#000000">

<H1 align=center>Sample PETSc Floating Point Performance</H1>
<P>
<H3>
<MENU>
<LI> <a href="petsc.html#singleprocessor">Single Processor Floating Point Performance</a>
<LI> <a href="petsc.html#multiprocessor">Parallel Performance for Euler Solver</a>
<LI> <a href="petsc.html#laplacian">Scalability for Laplacian</a>
</MENU>
</H3>
<P>
We provide these floating point performance numbers as a guide to the
floating point rates users should expect while using PETSc. We have done
our best to provide fair and accurate values, but we do not guarantee
any of the numbers presented here.
<P>
See the "Profiling" chapter of <a href="https://www.mcs.anl.gov/petsc/manual.html#Node100">
the PETSc users manual</a> for instructions on obtaining accurate performance
numbers with PETSc.

<P><HR><P>

<A NAME="singleprocessor"> <H1 align=center>Single Processor Performance</H1></A>

In many PDE application codes one must solve systems of linear equations
arising from the discretization of multicomponent PDEs; the sparse matrices
that arise naturally have a block structure.
<P>
PETSc has special sparse matrix storage formats and routines that take advantage of
this block structure to deliver much higher (two or three times as high) floating
point computation rates. Below we give the
floating point rates for the matrix-vector product for a 1503 by 1503 sparse matrix with a block
size of three arising from a simple oil reservoir simulation.
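To make the idea concrete, here is a minimal sketch (plain Python, not PETSc code) of a matrix-vector product in a block storage format with block size three. The matrix values are invented for illustration; the point is that each stored entry is a small dense block, so the inner loops run over contiguous data and reuse vector entries.

```python
# Sketch of a block-sparse (BAIJ-style) matrix-vector product with
# block size 3. Each block row stores (block_column_index, 3x3 block)
# pairs; the values below are made up for the example.

BS = 3  # block size

block_rows = [
    [(0, [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]),
     (1, [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]])],
    [(0, [[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]),
     (1, [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])],
]

def block_matvec(block_rows, x):
    """y = A x for a matrix stored as dense BS-by-BS blocks."""
    y = [0.0] * (len(block_rows) * BS)
    for i, row in enumerate(block_rows):
        for j, blk in row:
            for a in range(BS):          # row within the block
                s = 0.0
                for b in range(BS):      # column within the block
                    s += blk[a][b] * x[j * BS + b]
                y[i * BS + a] += s
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = block_matvec(block_rows, x)
```

The real PETSc kernels unroll the inner loops over the small dense blocks, which is where the factor of two to three in floating point rate comes from.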

<p>
<A HREF="http://ftp.mcs.anl.gov/pub/petsc/matmultbench.ps">Matrix-vector product benchmark results (PostScript)</A>
<p>

The next table depicts performance for the entire linear solve using GMRES(30) and
ILU(0) preconditioning.

<P>
<A HREF="http://ftp.mcs.anl.gov/pub/petsc/solvebench.ps">Linear solve benchmark results (PostScript)</A>
<P>

These tests were run using
the code src/sles/examples/tutorials/ex10.c with the options
<p>
<tt>
mpiexec -n 1 ex10 -f0 arco1 -f1 arco1 -pc_type ilu -ksp_gmres_unmodifiedgramschmidt -optionsleft -mat_baij -matload_block_size 3 -log_view
</tt>

<P><HR><P>

<A NAME="multiprocessor"> <H1 align=center>Parallel Performance for Euler Solver</H1></A>

<A NAME="laplacian"> <H1 align=center>Scalability for Laplacian</H1></A>
A typical "model" problem people work with in numerical analysis for PDEs is the
Laplacian. Discretization of the Laplacian in two dimensions with finite differences
is typically done using the "five point" stencil. This results in a very sparse
(at most five nonzeros per row), ill-conditioned matrix.
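The sparsity pattern is easy to see in a small sketch (plain Python, not PETSc code) that builds the nonzeros of one matrix row of the five-point stencil on an n by n grid; the grid size below is chosen only for illustration.

```python
# Nonzeros of one row of the 2-D five-point-stencil Laplacian on an
# n-by-n grid: 4 on the diagonal, -1 for each existing neighbour, so
# every row has at most five nonzeros.

def five_point_row(i, j, n):
    """Return the row for grid point (i, j) as {column_index: value},
    using the natural row-major ordering of grid points."""
    k = i * n + j
    row = {k: 4.0}
    if i > 0:     row[k - n] = -1.0   # north neighbour
    if i < n - 1: row[k + n] = -1.0   # south neighbour
    if j > 0:     row[k - 1] = -1.0   # west neighbour
    if j < n - 1: row[k + 1] = -1.0   # east neighbour
    return row

n = 100
rows = [five_point_row(i, j, n) for i in range(n) for j in range(n)]
max_nnz = max(len(r) for r in rows)   # 5, at an interior point
```

Corner rows have only three nonzeros and edge rows four, which is why the matrix carries so little data per row relative to the vector traffic it generates.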

<P>
Because the matrix is so sparse and has no block structure, it is difficult to get
very good sequential or parallel floating point performance, especially for small
problems. Here we demonstrate scalability of the parallel PETSc matrix-vector product
for the five-point stencil on two grids. These tests were run on three machines:
an IBM SP2 with the Power2Super chip and two memory cards at ANL, the Cray T3E at NERSC, and
the Origin2000 at NCSA.

<P>
Since PETSc is intended for much more general problems than the Laplacian, we do not consider
the Laplacian to be a particularly important benchmark; we include it due to interest
from the community.

<P><HR><P>

<H2 align=center>100 by 100 Grid: Absolute Time and Speed-Up</H2>

100 by 100 grid
<P>
Notes: The problem here is simply too small to parallelize on a distributed memory
computer.
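For reference, the speed-up and efficiency figures in tables like these are computed as follows; the timings in this sketch are hypothetical placeholders, not measured PETSc numbers.

```python
# Speed-up and parallel efficiency from wall-clock timings.
# The timing values below are invented for illustration only.

def speedup(t1, tp):
    """Classical speed-up: one-process time over p-process time."""
    return t1 / tp

def efficiency(t1, tp, p):
    """Parallel efficiency: speed-up divided by process count."""
    return speedup(t1, tp) / p

t1 = 8.0                              # hypothetical 1-process time (s)
timings = {2: 4.4, 4: 2.5, 8: 1.6}    # hypothetical p-process times (s)

for p, tp in sorted(timings.items()):
    print(p, round(speedup(t1, tp), 2), round(efficiency(t1, tp, p), 2))
```

An efficiency well below one at small process counts, as in the 100 by 100 case, signals that communication and latency costs dominate the tiny per-process workload.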
<P>

<H2 align=center>1000 by 1000 Grid: Absolute Time and Speed-Up</H2>

1000 by 1000 grid
<P>



</BODY>
</HTML>