Difference between revisions of "Debugging"
Revision as of 19:15, 5 November 2019
Introduction
Debugging can be a very difficult skill to learn. It requires patience, persistence and determination, but it can certainly be aided by skills and tips gained from hard-earned experience.
The goal of this wiki is to organize our group's experience with debugging in general and specific tips for debugging our software.
General
There are broadly three types of bugs that we encounter: 1) wrong results, 2) segmentation fault, and 3) trapped unexpected behavior. These are ordered in decreasing difficulty as will be clear shortly.
Wrong results: A lack of convergence or slow divergence of a problem that is expected or known to work with the given inputs is not uncommon during code development, but it is far and away the hardest bug to find. The most effective strategy here is usually to compare against a working code that does not exhibit this behavior and be sure that all of the "deviations" from the known working code are expected.
Segmentation fault: Here, there is typically a corruption of memory such as an array going out of bounds. Many of the debuggers mentioned below will stop at the location of the segmentation fault, but be aware that this is not necessarily the location of the bug. In many cases, memory corruption occurred in a non-fatal way at some earlier stage of the code, and the place where the code reports the segmentation fault is only where the "dead body" is found. You must look at the corrupted array and work backward to see where it was modified, which may lead to other arrays that were incorrectly modified, etc. This is especially true with indirect addressing arrays and memory pointers. You should expect that compiling in debug mode will "move" the problem to a different location and, sadly in some situations, completely hide the problem (e.g., the code runs through successfully). This is because when a code is compiled in debug mode, extra "padding" is put into the code to track the variables and you may be corrupting only this padding and not vital code. Valgrind (someone provide a wiki tutorial on its use please) can be very effective in finding this type of bug. This type of fault can also occur when you run out of memory on the machine you are running on. Please use top while your job is running to check for this type of failure.
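Since indirect addressing is singled out above as a common source of silent memory corruption, one inexpensive defense is to validate an index array before it is used. The following is a minimal C++ sketch of that idea, not code taken from our software; the names firstBadIndex and gather are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the position of the first entry of `indices` that would address
// outside an array of length `targetSize`, or -1 if every entry is valid.
long firstBadIndex(const std::vector<int>& indices, std::size_t targetSize) {
  for (std::size_t i = 0; i < indices.size(); ++i) {
    const int idx = indices[i];
    if (idx < 0 || static_cast<std::size_t>(idx) >= targetSize) {
      return static_cast<long>(i);
    }
  }
  return -1;
}

// Indirect gather: out[i] = values[indices[i]]. In a debug build (NDEBUG not
// defined) the assert stops at the bad index array instead of letting the
// out-of-bounds access corrupt or read unrelated memory.
std::vector<double> gather(const std::vector<double>& values,
                           const std::vector<int>& indices) {
  assert(firstBadIndex(indices, values.size()) == -1);
  std::vector<double> out;
  out.reserve(indices.size());
  for (int idx : indices) {
    out.push_back(values[static_cast<std::size_t>(idx)]);
  }
  return out;
}
```

Calling firstBadIndex right after an index array is built, rather than where it is consumed, tends to place the failure closer to the actual bug than to the "dead body".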
Trapped unexpected behavior: Here checks were placed in the code to stop or report some information when execution led to an unexpected result (e.g., a negative volume computation or a residual that is not a number). In general we like to put these kinds of checks in our code to make debugging easier, but there is always a tradeoff between the time it takes to instrument the code, the effect the checks have on performance, and the benefit to debugging. One compromise that is often employed is to guard these checks with a compile-time flag so that they are compiled in ONLY when running in debug mode. MAJOR TIP: When a code crashes ALWAYS recompile it in debug mode and rerun it as you may then get more information about where the bug is (e.g., go from Seg Fault to Trapped unexpected behavior).
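A check that is compiled in only for debug builds might look like the C++ sketch below. The macro name TRAP_IF, the flag DEBUG_CHECKS, and the helper functions are assumptions made for illustration; they are not taken from our build system or source.

```cpp
#include <cmath>
#include <cstdio>
#include <cstdlib>

// Hypothetical trap macro: active only when compiled with -DDEBUG_CHECKS
// (an assumed flag name), otherwise it expands to nothing and costs nothing.
#ifdef DEBUG_CHECKS
#define TRAP_IF(cond, msg)                                                 \
  do {                                                                     \
    if (cond) {                                                            \
      std::fprintf(stderr, "TRAP %s:%d: %s\n", __FILE__, __LINE__, (msg)); \
      std::abort();                                                        \
    }                                                                      \
  } while (0)
#else
#define TRAP_IF(cond, msg) \
  do {                     \
  } while (0)
#endif

// True when a computed element volume is unusable (negative or NaN).
inline bool isBadVolume(double volume) {
  return std::isnan(volume) || volume < 0.0;
}

// Illustrative use: pass a freshly computed volume through the trap.
double checkedVolume(double volume) {
  TRAP_IF(isBadVolume(volume), "negative or NaN element volume");
  return volume;
}
```

Because the macro reports the file and line, a trapped run points directly at the check that fired, which is usually much closer to the bug than a later segmentation fault.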
Choice of Debugger
The first choice is what tool to debug with. Your weapons range from graphical debuggers like totalview and idb, to command line debuggers like gdb, to print statements. Different bugs/codes are handled best by different tools. Launching and effectively getting help and/or running graphical debuggers are covered in separate wikis linked to the names in the list. Here we only give a general overview of some experience (grow it!!) with each.
Graphical debuggers are best for marching through the code. They can really help a new user learn the flow of the code and see various arrays being changed. Any code that you want to have mastery of is worth spending time marching through the code in this way. Graphical debuggers are also effective at determining the location of bugs.
Command line debuggers link to gdb....some comments on when they are effective
Print statements: When you don't have effective access to the two above, inserting print statements can be an effective way of finding a bug. It is more effective the more you know about your bug (e.g., when you know pretty close to where in the code the problem is and what variable is going bad). Included in this category is writing out entire fields in a format that our post-processors can read so that you can look at the whole field in something like paraview, which may give you a clue as to where a problem of the category "Wrong results" might be occurring.
Debugging Strategies
Recursive bisection: When you know that the code fails (or is producing wrong results) but don't have much of a clue where or why, this can be a good strategy. What you need is some measure of correctness of some variable you can watch. Your strategy here is to first define the location in the code where it is believed to be correct and a second location where it is believed to be wrong. Then you bisect the code, check the middle location, determine if it is correct or wrong, and replace either the first or second location based on your answer (replace correct or wrong). Keep applying this recursively and eventually you have a small section of code to find your bug in (e.g., one line). This works extremely well for some bugs but it does have a few difficult assumptions that don't always hold true: 1) you know what is correct and wrong, 2) you know how to bisect your code (this can be hard before you actually know the code you are working with pretty well), 3) you are able to track a single variable through a function stack (not always possible) and/or know the code well enough to switch targets (and adjust measure of correctness).
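The bisection logic itself is small. The C++ sketch below assumes that locations in the code (or time steps, or commits) can be numbered with integers and that a user-supplied test says whether the watched variable is still correct at a given location; both the function name firstBadLocation and that numbering are hypothetical.

```cpp
#include <cassert>
#include <functional>

// Narrows the interval (good, bad] down to a single location.
// Assumptions: isCorrect(good) is true, isCorrect(bad) is false, and once the
// watched variable goes wrong it stays wrong at all later locations.
int firstBadLocation(int good, int bad,
                     const std::function<bool(int)>& isCorrect) {
  assert(good < bad);
  while (bad - good > 1) {
    const int mid = good + (bad - good) / 2;
    if (isCorrect(mid)) {
      good = mid;  // still correct here: the bug is later
    } else {
      bad = mid;   // already wrong here: the bug is at or before mid
    }
  }
  return bad;  // first location at which the check fails
}
```

With N candidate locations this needs about log2(N) checks, which is why the strategy pays off even when each check requires rerunning the code.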
March (and compare): Marching in a debugger is tedious and many think it is boring, but if you get your mind in the right place you can learn a lot about the code while you are marching (tell yourself you are not wasting time finding a bug but using the debugger to better learn the code). As above, this strategy is more effective the better you know how to differentiate a correct result from an incorrect result on each "line" of the code, something that experience improves. This is where "compare" is a powerful tool for beginners (and everyone else). Your ideal scenario is that you have a version of the code saved that was working when you started your development. Before you start, make that working source code directory not writable ( chmod u-w * ). You then make two directories with your inputs and launch two versions of the debugger in each of these directories, one with the correct executable and one with the broken executable. You then march through the code looking for differences. Of course if you meet the requirements of Recursive bisection you can apply it to both codes and combine these strategies, but when you don't, marching line by line and comparing results can often be effective (and you will in the process learn the code well AND likely develop the understanding of correctness required to use recursive bisection).
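When the working and broken codes can both dump the same array at the same point, the comparison can be automated rather than done by eye. The following C++ sketch assumes the two arrays have already been read into memory; the function name firstDifference and the choice of an absolute tolerance are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the first index at which the two arrays differ by more than `tol`,
// the common length if only the sizes differ, or -1 if they agree.
// A NaN in either array counts as a difference.
long firstDifference(const std::vector<double>& good,
                     const std::vector<double>& broken, double tol) {
  assert(tol >= 0.0);
  const std::size_t n = std::min(good.size(), broken.size());
  for (std::size_t i = 0; i < n; ++i) {
    // Written as !(... <= tol) so that a NaN difference is reported.
    if (!(std::fabs(good[i] - broken[i]) <= tol)) {
      return static_cast<long>(i);
    }
  }
  if (good.size() != broken.size()) {
    return static_cast<long>(n);
  }
  return -1;
}
```

The first differing index identifies which entry to watch in both debuggers while marching.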
Specific tips for SCOREC/Simmetrix Meshing and MeshAdapt software and associated databases:
Because the mesh database is very rich and complex, it can often be non-transparent to debugging. To address this issue, functions have been created to help users debug. If your debugging has identified a location and you have successfully trapped the offending vertex, face, or region (typically you are iterating over one of these when some unexpected result occurs and is trapped), then you can call a function to get lots of information about that entity with
V_info(ivert) // gives information about the vertex ivert
F_info(iface) // gives information about the face iface
R_info(iregion) // gives information about the region iregion
These functions can either be compiled into the code and printed from traps that catch the unexpected behavior (and thereby help you understand the geometric location of the problem) or, with many debuggers, they can be called from the current command line to give you more information while stepping through the code.
Once you find the offending geometric location, it is often possible to load the mesh (in cases where the mesher completed but some other program downstream of it fails) in paraview, draw spheres at the location of the vertices, zoom to the vertices, cut the mesh with the "Extract Cells By Region" filter centered on the bad location, and then see what is wrong (perhaps using a few different normal planes to get a clear view). Often refining the mesh near this surface will improve element quality and fix the problem.
ADD TO ME
Specific tips for PHASTA: ADD TO ME
Debugging with command line arguments
addr2line command: Some bugs appear only in memory-intensive problems, which makes debugging in common GUI debuggers like totalview impossible. In this case, limits such as the amount of available memory and the maximum values of variable types (e.g., a 4-byte signed integer has a maximum value of 2,147,483,647) should be high on the list of suspects. When the code crashes, the error output points to the place where it crashed by giving the path to the executable followed by a hexadecimal address, like so:
/projects/jopa6460/SCOREC-core/build-14-190604dev_omp110/test/chef[0x6fabad]
When the command
addr2line -e /projects/jopa6460/SCOREC-core/build-14-190604dev_omp110/test/chef 0x6fabad
is run, the output is the path to the source file followed by the line number where the code crashed (this requires an executable compiled with debug information), e.g.:
/projects/jopa6460/SCOREC-core/core/phasta/phCook.cc:44
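Once addr2line has pointed to a suspicious line, the 4-byte integer limit mentioned above can be tested explicitly by doing the arithmetic in 64 bits and checking the result before narrowing. This is a minimal C++ sketch with hypothetical function names, not code from chef or PHASTA.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True when n can be stored in a 4-byte signed integer without overflow.
bool fitsInInt32(std::int64_t n) {
  return n >= std::numeric_limits<std::int32_t>::min() &&
         n <= std::numeric_limits<std::int32_t>::max();
}

// True when a * b exceeds the 4-byte range. The product is formed in 64 bits,
// so this assumes a * b itself fits in a 64-bit integer.
bool productOverflowsInt32(std::int64_t a, std::int64_t b) {
  return !fitsInInt32(a * b);
}

// Narrowing conversion that traps in a debug build instead of wrapping.
std::int32_t toInt32(std::int64_t n) {
  assert(fitsInInt32(n));
  return static_cast<std::int32_t>(n);
}
```

For example, an array sized as (number of entities) times (values per entity) can exceed 2,147,483,647 on a large mesh even though each factor is small.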
Debugging in parallel with GDB:
A useful command to debug a parallel job running with a low number of processes on the viz nodes at Colorado is the following:
mpirun -np <np> gnome-terminal --disable-factory -e 'gdb <exec name>'
This will open a terminal running gdb for every process of your application. You can then execute in every terminal the usual gdb commands.
In particular, for PHASTA, you can also type in your terminal
set catchDebugger=1
mpirun -np <np> <phasta exec name>
In this case there is no need to specify any additional arguments to the mpirun command; it will launch the debug windows automatically for you. Every instance of PHASTA gets trapped in an infinite loop in common/phasta.cc (see routine catchDebugger()). To release PHASTA, pause the execution and then enter
gdb> set debuggerPresent=1
for every process/terminal.
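The general pattern behind this kind of trap is a loop that spins on a flag until a debugger changes it. The C++ sketch below illustrates the pattern only; it is not PHASTA's actual catchDebugger() routine, and the function name waitForDebugger and its enabled argument are hypothetical. Only the variable name debuggerPresent is taken from the gdb command above.

```cpp
#include <chrono>
#include <thread>

// volatile forces the flag to be re-read from memory on every pass, so a
// value written from gdb with "set debuggerPresent=1" is actually seen.
volatile int debuggerPresent = 0;

// Hypothetical wait loop: when `enabled` is true, blocks until the flag is
// set from an attached debugger. Returns how many times it had to sleep.
int waitForDebugger(bool enabled) {
  int polls = 0;
  if (!enabled) {
    return polls;
  }
  while (!debuggerPresent) {
    // Sleep between checks so the waiting process does not burn a core.
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    ++polls;
  }
  return polls;
}
```

After the flag is set, continuing execution in gdb lets the process fall out of the loop, and breakpoints set while it was waiting are then hit normally.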
Debugging Seg Faults
Reading core files generated after a seg fault can be done with GDB using the command
gdb <phasta exec name> <path to core file>
From here, there are a number of options, including "disas", which shows the disassembly of the function where the seg fault occurred. Alternatively, it can be rather convenient to view the disassembly in vim or a vim-like editor:
objdump -D <phasta exec name> | vim -
Other Useful Wiki Pages
Mtrace - Memory Leaks
Address Sanitizer - Bad memory accesses