Lines Matching refs:socket

42 :alt: Memory bandwidth obtained on Intel hardware (dual socket except KNL) over the
48 Memory bandwidth obtained on Intel hardware (dual socket except KNL)
57 dual-socket system equipped with Haswell 12-core Xeon CPUs achieves more
59 processes per socket (8 total), cf. {numref}`fig_stream_intel`.
69 on a dual-socket system equipped with two six-core CPUs with
106 CPUs in nodes with more than one CPU socket are internally connected via
110 memory is accessed in a non-uniform way: A process running on one socket
112 requests for memory attached to a different CPU socket need to go
120 :alt: Schematic of a two-socket NUMA system. Processes should be spread across both
125 Schematic of a two-socket NUMA system. Processes should be spread
140 recommended placement of an 8-way parallel run on a four-socket machine
141 is to assign two processes to each CPU socket. To do so, one needs to
144 `lstopo` (part of the hwloc package) for the following two-socket
195 same socket and have a common L3 cache.
198 processes on the first socket and three processes on the second socket.
203 `--bind-to core --map-by socket` to `mpiexec`:
206 $ mpiexec -n 6 --bind-to core --map-by socket ./stream
217 the first socket (with IDs 0 and 12), process 1 is bound to the first
218 core on the second socket (IDs 6 and 18), and similarly for the
221 all MPI processes are located on the same socket, memory bandwidth drops
235 All processes are now mapped to cores on the same socket. As a result,
240 the results obtained by passing `--bind-to core --map-by socket`:
272 $ make streams MPI_BINDING="--bind-to core --map-by socket"
303 executed on the same socket. Only with 14 or more processes does the
325 are bound to the resources within each socket. Performance on the two