performance.md - OpenGrok cross reference for /petsc/doc/manual/performance.md

Lines Matching refs:memory
6 with PETSc, particularly on distributed-memory machines with multiple
15 each byte loaded or stored from global memory. Therefore, the
22 (*memory bandwidth limited*) rather than by the rate of floating point
25 This section discusses ways to maximize the memory bandwidth achieved by
43 :  number of processes used. One can get close to peak memory bandwidth with only a
49 over the number of processes used. One can get close to peak memory
56 cores is required to saturate the memory channels. For example, a
58 than 80 percent of achievable peak memory bandwidth with only four
64 PETSc provides a simple way to measure memory bandwidth for different
67 one can obtain on the given machine (not necessarily a shared memory
89 On this machine, one should expect a speed-up of typical memory
108 exchange as well as cache coherency. Because main memory on modern
109 systems is connected via the integrated memory controllers on each CPU,
110 memory is accessed in a non-uniform way: A process running on one socket
111 has direct access to the memory channels of the respective CPU, whereas
112 requests for memory attached to a different CPU socket need to go
113 through the high-speed fabric. Consequently, best aggregate memory
114 bandwidth on the node is obtained when the memory controllers on each
115 CPU are fully saturated. However, full saturation of memory channels is
116 only possible if the data is distributed across the different memory
129 Data in memory on modern machines is allocated by the operating system
130 based on a first-touch policy. That is, memory is not allocated at the
132 memory segment is actually touched (read or write). Upon first-touch,
133 memory is allocated on the memory channel associated with the respective
134 CPU the process is running on. Only if all memory on the respective CPU
135 is already in use (either allocated or as IO cache), memory available
138 Maximum memory bandwidth can be achieved by ensuring that processes are
221 all MPI processes are located on the same socket, memory bandwidth drops
236 only the first memory channel is fully saturated at 25.5 GB/sec.
320 For a typical, memory bandwidth-limited PETSc application, the primary
322 evenly distributed among sockets, and hence using all available memory
342 vs. farther apart to maximize available resources (memory channels,
396 memory leak checks, and memory corruption checks. Note that PETSc has no
510   their codes. Hundreds or thousands of memory allocations may be
553 PETSc provides tools to aid in understanding PETSc memory usage and detecting problems with
554 memory allocation, including leaks and use of uninitialized space. Internally, PETSc uses
555 the routines `PetscMalloc()` and `PetscFree()` for memory allocation; instead of directly calling `…
556 This allows PETSc to track its memory usage and perform error checking. Users are urged to use thes…
559 - The option `-malloc_debug` turns on PETSc's extensive runtime error checking of memory for corrup…
564 …in your shell startup file to automatically enable runtime memory checking for developing code but not
567 …`-check_pointer_intensity 0` for long debug runs that do not need extensive memory corruption …
570   `-malloc_dump` will print a list of memory locations that have not been freed at the
571   conclusion of a program. If all memory has been freed, no message
577   is `-malloc_view`, which reports memory usage in all routines at the conclusion of the program.
585   `PetscMallocGetMaximumUsage()` for memory allocated by PETSc, or
587   for the total memory used by the program. Note that
591 - The option `-memory_view` provides a high-level view of all memory usage,
592   not just the memory used by `PetscMalloc()`, at the conclusion of the program.
595   memory was allocated and freed during each logged event. This is useful
596   to understand what phases of a computation require the most memory.
598 One can also use [Valgrind](http://valgrind.org) to track memory usage and find bugs; see {any}`FAQ…
687 - **Problem too large for physical memory size**: When timing a
689   between the total memory a process is using and the physical size of
690   the machine’s memory. One way to estimate the amount of memory used
693   memory usage, including any Fortran arrays in an application code.
705   with lower memory bandwidths (slow memory access) attempt to
715   other users on the machine, thrashing (using more virtual memory than
716   available physical memory), or paging in of the initial executable.
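The bandwidth-related matches above make two points. First, typical PETSc kernels are limited by memory bandwidth rather than by floating-point rate. Second, measuring bandwidth for different numbers of processes tells you what speed-up to expect. Below is a minimal sketch of that arithmetic. It assumes run time is simply proportional to bytes moved divided by sustained bandwidth.

The two C helpers do the following:
- The first computes the speed-up bound as a bandwidth ratio.
- The second gives an upper bound on the attainable flop rate using the roofline model.

Both function names are hypothetical, and the roofline formulation is an illustration chosen here, not something PETSc provides.

```c
/* Hypothetical helper (not a PETSc routine). Assuming run time is proportional
 * to bytes moved / sustained bandwidth, the best-case speed-up of a
 * bandwidth-limited kernel on n processes over 1 process is the ratio of the
 * bandwidths measured for those process counts (e.g. with a STREAM-type test). */
double bandwidth_speedup_bound(double bw_one_process, double bw_n_processes)
{
  if (bw_one_process <= 0.0) return 0.0; /* no meaningful baseline */
  return bw_n_processes / bw_one_process;
}

/* Hypothetical roofline-model estimate (an illustrative technique, not PETSc API):
 * attainable rate = min(peak flop rate, bandwidth * arithmetic intensity).
 * Units: GB/s * flops/byte = GFlop/s. */
double attainable_gflops(double peak_gflops, double bw_gbytes_per_s, double flops_per_byte)
{
  double memory_bound = bw_gbytes_per_s * flops_per_byte;
  return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

Take the 25.5 GB/sec single-channel figure quoted above, together with an assumed 0.25 flops per byte and an assumed 500 GFlop/s peak. `attainable_gflops(500.0, 25.5, 0.25)` then evaluates to 6.375 GFlop/s. Raising the peak does not change that number, which is the sense in which such kernels are bandwidth limited.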
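The memory-tool matches above note that PETSc routes allocations through `PetscMalloc()` and `PetscFree()` so that it can track its own usage. This is what makes reports such as `-malloc_dump` (unfreed memory) and `PetscMallocGetMaximumUsage()` (peak usage) possible.

The toy tracker below is a minimal sketch of why wrapping allocation enables that bookkeeping:
- It is hypothetical and is not PETSc's implementation, which does far more (for example, corruption checks under `-malloc_debug`).
- All names are invented for illustration.
- It is not thread-safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy tracker (NOT PETSc's implementation). Each block carries a
 * header recording its requested size, so the tracker can update its counters
 * when the block is freed. */
typedef union {
  size_t      size;  /* bytes requested by the caller */
  max_align_t align; /* keeps the returned user pointer suitably aligned */
} track_header;

static size_t track_current = 0; /* bytes currently allocated through the tracker */
static size_t track_maximum = 0; /* high-water mark of track_current */

void *track_malloc(size_t n)
{
  if (n > SIZE_MAX - sizeof(track_header)) return NULL; /* size overflow */
  track_header *h = malloc(sizeof(track_header) + n);
  if (!h) return NULL;
  h->size = n;
  track_current += n;
  if (track_current > track_maximum) track_maximum = track_current;
  return h + 1; /* user memory begins right after the header */
}

void track_free(void *p)
{
  if (!p) return;
  track_header *h = (track_header *)p - 1; /* step back to the header */
  track_current -= h->size;
  free(h);
}

size_t track_current_usage(void) { return track_current; }
size_t track_maximum_usage(void) { return track_maximum; }
```

In this sketch, a nonzero `track_current_usage()` at program exit loosely plays the role of a `-malloc_dump` leak report. `track_maximum_usage()` is the analogue of a peak-usage query.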