| b29a8671 | 19-Dec-2023 |
Junchao Zhang <jczhang@anl.gov> |
Vec: add GEMV optimizations for VecMDot and friends for VecStandard
Remove KSPPIPEFGMRES from example with skip convergence test since very sensitive to happy ending
Appears to have a sweet spot of much better performance for smallish vectors, then matches the unrolled code for large vectors
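To illustrate why several dot products against the same vector can be treated as one matrix-vector product, here is a minimal sketch in plain C. It is not the PETSc implementation: the function names `mdot_loop` and `mdot_gemv`, the `CHUNK` size, and the column-major layout with leading dimension `lda` are all assumptions made for the example, and a real version would presumably hand the second form to a BLAS gemv routine rather than hand-code it.

```c
#include <assert.h>
#include <stddef.h>

/* Reference form: m independent dot products, out[j] = sum_i Y[j][i] * x[i].
   The m vectors are assumed to be stored back to back, each of length n. */
static void mdot_loop(size_t n, size_t m, const double *x, const double *Y, double *out)
{
  for (size_t j = 0; j < m; j++) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += Y[j * n + i] * x[i];
    out[j] = sum;
  }
}

/* GEMV form: the same numbers viewed as out = Y^T x, where Y is an n-by-m
   column-major matrix with leading dimension lda (hypothetical layout).
   x is processed in chunks so each chunk is reused across all m columns. */
static void mdot_gemv(size_t n, size_t m, size_t lda, const double *x, const double *Y, double *out)
{
  enum { CHUNK = 256 }; /* illustrative block size, not tuned */
  assert(lda >= n);
  for (size_t j = 0; j < m; j++) out[j] = 0.0;
  for (size_t i0 = 0; i0 < n; i0 += CHUNK) {
    size_t i1 = (i0 + CHUNK < n) ? i0 + CHUNK : n;
    for (size_t j = 0; j < m; j++) {
      double sum = 0.0;
      for (size_t i = i0; i < i1; i++) sum += Y[j * lda + i] * x[i];
      out[j] += sum;
    }
  }
}
```

Both functions return the same values up to floating-point summation order; only the memory access pattern differs, which is where any speed difference would come from.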
Sample results on Barry's Apple M2 Laptop (using Apple's BLAS)
./ex19 -da_refine 5 -pc_type none -log_view -ksp_gmres_preallocate -ksp_view
Vector length 37,636
VecMDot 1920 1.0 1.9707e-01 1.0 2.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00 25 29 0 0 0 25 29 0 0 0 11291
-vec_mdot_use_gemv
VecMDot 1920 1.0 7.5098e-02 1.0 2.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00 12 29 0 0 0 12 29 0 0 0 29693
VecMDot 1920 1.0 8.1523e-02 1.0 2.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00 12 29 0 0 0 12 29 0 0 0 27353
VecMDot 1920 1.0 7.0889e-02 1.0 2.23e+09 1.0 0.0e+00 0.0e+00 0.0e+00 11 29 0 0 0 11 29 0 0 0 31456
-da_refine 6
Vector length 148,996
VecMDot 4340 1.0 1.7666e+00 1.0 2.00e+10 1.0 0.0e+00 0.0e+00 0.0e+00 20 29 0 0 0 20 29 0 0 0 11319
-vec_mdot_use_gemv
VecMDot 4422 1.0 1.3725e+00 1.0 2.04e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 29 0 0 0 15 29 0 0 0 14884
VecMDot 4422 1.0 1.4354e+00 1.0 2.04e+10 1.0 0.0e+00 0.0e+00 0.0e+00 16 29 0 0 0 16 29 0 0 0 14231
./ex19 -da_refine 7 -pc_type none -log_view -ksp_gmres_preallocate -ksp_view -vec_mdot_use_gemv -ksp_max_it 100 -snes_max_it 1
Vector length 592,900
VecMDot 100 1.0 1.5915e-01 1.0 1.72e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 27 0 0 0 14 27 0 0 0 10804
-vec_mdot_use_gemv
VecMDot 100 1.0 1.6854e-01 1.0 1.72e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 27 0 0 0 14 27 0 0 0 10230
VecMDot 100 1.0 1.5698e-01 1.0 1.72e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 27 0 0 0 14 27 0 0 0 10983
-da_refine 8
Vector length 2,365,444
VecMDot 100 1.0 6.2499e-01 1.0 6.86e+09 1.0 0.0e+00 0.0e+00 0.0e+00 13 27 0 0 0 13 27 0 0 0 10976
-vec_mdot_use_gemv
VecMDot 100 1.0 6.8197e-01 1.0 6.88e+09 1.0 0.0e+00 0.0e+00 0.0e+00 14 27 0 0 0 14 27 0 0 0 10087
|
| ca4445c7 | 20-Jul-2023 |
Ilya Fursov <ilya.foursov.7bd@gmail.com> |
TSEvent: refactor and fix bugs, add TSSetPostEventStep()
Refactor the core algorithm for resolution of events: TSEventHandler() and the helper functions, fixing the existing bugs.
Change event indicator functions from PetscScalar to PetscReal. Change the API of TSSetEventHandler(): in the user `indicator()` callback, the 'fvalue' argument type changed from PetscScalar[] to PetscReal[].
Add TSSetPostEventStep(), deprecate TSSetPostEventIntervalStep(). Deprecate option -ts_event_post_eventinterval_step.
Fix bugs in interaction of TSEvent with tspan. Add six new test examples.
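As a rough illustration of what resolving an event with a real-valued indicator involves, the sketch below brackets a zero crossing between two times and shrinks the bracket by bisection until it is smaller than a tolerance. This is not the TSEventHandler() algorithm. The names `indicator_fn`, `crossing_matches`, `locate_event`, and `sample_indicator` are hypothetical, and the direction convention (+1 for negative-to-positive, -1 for positive-to-negative, 0 for either) is an assumption made for the example.

```c
#include <assert.h>

/* Hypothetical real-valued event indicator: an event occurs where it crosses zero. */
typedef double (*indicator_fn)(double t);

/* Example indicator with a rising zero crossing at t = sqrt(2). */
static double sample_indicator(double t)
{
  return t * t - 2.0;
}

/* Does the sign change from f0 to f1 match the requested direction?
   dir = +1: negative to positive, dir = -1: positive to negative, dir = 0: either. */
static int crossing_matches(double f0, double f1, int dir)
{
  if (f0 < 0.0 && f1 >= 0.0) return dir >= 0;
  if (f0 > 0.0 && f1 <= 0.0) return dir <= 0;
  return 0;
}

/* Bisect [t0, t1] until the bracket is no wider than tol.
   Returns 1 and stores the right end of the final bracket in *t_event
   if a matching crossing was bracketed, otherwise returns 0. */
static int locate_event(indicator_fn f, double t0, double t1, int dir, double tol, double *t_event)
{
  double f0 = f(t0);
  assert(tol > 0.0 && t1 > t0);
  if (!crossing_matches(f0, f(t1), dir)) return 0;
  while (t1 - t0 > tol) {
    double tm = 0.5 * (t0 + t1);
    double fm = f(tm);
    if (crossing_matches(f0, fm, dir)) {
      t1 = tm; /* crossing lies in the left half */
    } else {
      t0 = tm; /* crossing lies in the right half */
      f0 = fm;
    }
  }
  *t_event = t1;
  return 1;
}
```

Because the indicator is real-valued, sign tests like `f0 < 0.0` are well defined, which would not be the case for a complex-valued scalar type.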
Below are examples of bugs fixed by this patch. The source files (ex3.c, ex3span.c, ex4.c, ex5.c) can be found in src/ts/event/tests. To run them with older PETSc versions, comment out "#define NEW_VERSION". Behaviour for three library versions is reported below:
* ORIG: current PETSc version, before the proposed patch.
* 6688: Merge Request 6688, developed independently of the proposed patch, which fixed some bugs with zero-crossing directions.
* NEW : the proposed patch.
./ex3 -ts_monitor -ts_event_monitor -ts_view -ts_type beuler \
  -ts_adapt_type basic -flg -V 1e9 -ts_adapt_dt_min 1e-6 -change5 1 -dir 1
* ORIG: fails to resolve 5 out of 6 events, and resolves 23 incorrect events.
* 6688: fails to exit the TSEvent iteration via the step size (bracket size) criterion.
* NEW : ok, resolves all 6 events.

./ex4 -ts_adapt_type basic -ts_type rk -ts_dt 0.25 -ts_event_tol 1e-8 \
  -dir 0 -ts_adapt_dt_min 1e-10 -ts_view -ts_monitor -ts_event_monitor
* ORIG: only reaches t = 0.0300236 after 10000 TS steps.
* 6688: only reaches t = 0.0300236 after 10000 TS steps.
* NEW : ok, reaches the final time t = 4.0 after 96 TS steps, resolving all 16 events.

./ex5 -ts_monitor -ts_event_monitor -ts_type rk -ts_adapt_type basic \
  -ts_view -ts_dt 0.25 -flg -dir 0
* ORIG: only reaches t = 4.0 after 10000 TS steps, erroneously reports event at t = 4.0 around 5000 times.
* 6688: only reaches t = 4.0 after 10000 TS steps.
* NEW : ok, reaches the final time t = 10.0 after 99 TS steps, resolving all 34 events.

./ex5 -ts_monitor -ts_event_monitor -ts_type rk -ts_adapt_type basic \
  -ts_view -ts_dt 0.25 -flg -dir 1
* ORIG: fails, starts taking negative time steps, no events are correctly resolved.
* 6688: ok, but slower: 90 TS steps to resolve all 17 events.
* NEW : ok, and faster: 48 TS steps to resolve all 17 events.

./ex5 -ts_monitor -ts_event_monitor -ts_type rk -ts_adapt_type basic \
  -ts_view -ts_dt 0.25 -flg -dir -1
* ORIG: fails, starts taking 'nan' time steps.
* 6688: mostly fails, only reaches t = 4.99993 after 10000 TS steps.
* NEW : ok, reaches the final time t = 10.0 after 74 TS steps, resolving all 17 events.

The same run in parallel:
mpirun -n 2 ./ex5 -ts_monitor -ts_event_monitor -ts_type rk \
  -ts_adapt_type basic -ts_view -ts_dt 0.25 -flg -dir -1
* ORIG: fails, starts taking negative time steps, exits with runtime error.
* 6688: mostly fails, only reaches t = 3; in addition, the parallel run is not consistent with the serial run (see above).
* NEW : ok, reaches the final time t = 10.0 after 74 TS steps, resolving all 17 events.

./ex3span -ts_monitor -ts_event_monitor
* ORIG: (confused by events) misses tspan points: 4.02, 4.21, 4.98, 5.01, 5.21, 5.98, 6, 6.01, 6.02, 6.21, 6.99, 7.21, 8.01, 8.21, 9.01.
* 6688: (confused by events) misses tspan points: 4.02, 4.21, 4.98, 5.01, 5.21, 5.98, 6, 6.01, 6.02, 6.21, 6.99, 7.21, 8.01, 8.21, 9.01.
* NEW : ok.

./ex3span -ts_monitor -ts_event_monitor -ts_event_post_event_step 0.5
* ORIG: misses the majority of tspan points (except 0.01 and 0.21), and also resolves the last event location at wrong time t = 9.21.
* 6688: misses the majority of tspan points (except 0.01 and 0.21), and also resolves the last event location at wrong time t = 9.21.
* NEW : ok.
|