Debugging
Revision as of 14:08, 24 August 2011
Introduction
Debugging can be a difficult skill to learn. It requires patience, persistence, and determination, but it is greatly aided by skills and tips gained from hard-earned experience.
The goal of this page is to organize our group's experience with debugging in general, along with specific tips for debugging our software.
General
There are broadly three types of bugs that we encounter: 1) wrong results, 2) segmentation faults, and 3) trapped unexpected behavior. These are ordered by decreasing difficulty, as will become clear shortly.
Wrong results: A lack of convergence, or slow divergence, of a problem that is expected or known to work with the given inputs is not uncommon during code development, but it is by far the hardest type of bug to find. The most effective strategy is usually to compare against a working version of the code that does not exhibit this behavior and make sure that every "deviation" from the known working code is expected.
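A minimal sketch of the comparison step, assuming you can dump the same field (for example a residual or solution array) from both the working and the broken code at the same point in the run. The helper first_deviation is hypothetical, not part of our codes, and the tolerances are placeholders you would choose for your problem.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the first entry where `test`
   differs from `ref` by more than abs_tol + rel_tol * max(|ref|, |test|),
   or -1 if every entry agrees. A NaN in either array counts as a deviation. */
long first_deviation(const double *ref, const double *test, size_t n,
                     double abs_tol, double rel_tol)
{
    for (size_t i = 0; i < n; ++i) {
        double a = fabs(ref[i]), b = fabs(test[i]);
        double scale = a > b ? a : b;
        double diff = fabs(ref[i] - test[i]);
        /* NaN compares false with everything, so test for it explicitly. */
        if (isnan(diff) || diff > abs_tol + rel_tol * scale)
            return (long)i;
    }
    return -1;
}
```

The index it returns tells you which entry (node, element, or equation) first departs from the working code, which is a natural place to start looking.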
Segmentation fault: This typically indicates memory corruption, such as an array going out of bounds. Many of the debuggers mentioned below will stop at the location of the segmentation fault, but be aware that this is not necessarily the location of the bug. In many cases the memory was corrupted in a non-fatal way at some earlier stage of the code, and the place where the code reports the segmentation fault is only where the "dead body" is found. You must then examine the corrupted array and work backward to see where it was modified, which may lead to other arrays that were incorrectly modified, and so on. This is especially true with indirect addressing arrays and memory pointers. Expect that compiling in debug mode may "move" the problem to a different location and, sadly, in some situations hide it completely (e.g., the code runs through successfully). This is because a debug build adds extra "padding" to track the variables, so the bad write may corrupt only this padding and not vital data. Valgrind (see the group wiki page http://fluid.colorado.edu/wiki/index.php/Valgrind; a tutorial on its use is still wanted) can be very effective in finding this type of bug. This type of fault can also occur when you run out of memory on the machine you are running on, so use top while your job is running to check for this type of failure.
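Because indirect addressing is a common source of silent out-of-bounds writes and reads, one hedged option is to route such accesses through a checked helper while debugging. The function gather_checked below is hypothetical; it relies on the standard assert macro, which is compiled out when NDEBUG is defined (as optimized builds commonly do), so the check costs nothing in production.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dst[i] = src[map[i]] for i < nmap, where map is an
   indirection array into src (length nsrc). In a build without NDEBUG,
   an out-of-range index stops the program here, at the bad access,
   instead of corrupting memory and crashing somewhere later. */
void gather_checked(const double *src, size_t nsrc,
                    const size_t *map, size_t nmap, double *dst)
{
    for (size_t i = 0; i < nmap; ++i) {
        assert(map[i] < nsrc && "indirection index out of bounds");
        dst[i] = src[map[i]];
    }
}
```

This only protects accesses that go through the helper. For corruption elsewhere, running the unmodified executable under Valgrind's default Memcheck tool, e.g. `valgrind --track-origins=yes ./your_executable`, reports invalid reads and writes where they happen rather than where the crash occurs.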
Trapped unexpected behavior: Here, checks placed in the code stop execution or report information when execution leads to an unexpected result (e.g., a negative volume or a residual that is not a number). In general we like to put these kinds of checks in our code to make debugging easier, but there is always a tradeoff among the time it takes to instrument the code, the effect the checks have on performance, and the benefit to debugging. One common compromise is to place these checks behind a compiler flag so that they are compiled in ONLY when building in debug mode. MAJOR TIP: When a code crashes, ALWAYS recompile it in debug mode and rerun it, as you may then get more information about where the bug is (e.g., go from a segmentation fault to trapped unexpected behavior).
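In C, putting a check "behind a compiler flag" usually means a preprocessor conditional. The sketch below is illustrative only: the flag name DEBUG_TRAPS, the helper check_element, and the macro TRAP_ELEMENT are hypothetical, and the two checks simply mirror the examples above (a negative volume and a residual that is NaN).

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

enum { CHECK_OK = 0, CHECK_NEG_VOLUME = 1, CHECK_NAN_RESIDUAL = 2 };

/* Classify one element's state; cheap enough to call anywhere. */
int check_element(double volume, double residual)
{
    if (volume < 0.0)    return CHECK_NEG_VOLUME;
    if (isnan(residual)) return CHECK_NAN_RESIDUAL;
    return CHECK_OK;
}

/* Hypothetical flag: compile with -DDEBUG_TRAPS to turn the trap on.
   Without it, TRAP_ELEMENT expands to nothing and costs no run time. */
#ifdef DEBUG_TRAPS
#define TRAP_ELEMENT(ie, vol, res)                                   \
    do {                                                             \
        int code_ = check_element((vol), (res));                     \
        if (code_ != CHECK_OK) {                                     \
            fprintf(stderr, "trap: element %d failed check %d\n",    \
                    (int)(ie), code_);                               \
            abort(); /* stop here so a debugger shows the stack */   \
        }                                                            \
    } while (0)
#else
#define TRAP_ELEMENT(ie, vol, res) ((void)0)
#endif
```

Inside an element loop you would call `TRAP_ELEMENT(ie, vol, res);` after computing each element's volume and residual. Calling abort() rather than exit() raises SIGABRT, so a debugger stops with the offending element still on the stack.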
Choice of Debugger
The first choice is which tool to debug with. Your weapons range from graphical debuggers like TotalView and idb, to command line debuggers like gdb, to print statements. Different bugs and codes are handled best by different tools. Launching, getting help with, and effectively running each debugger are covered in separate wiki pages linked from the tool names. Here we give only a general overview of our experience with each (please grow it!).
Graphical debuggers are best for marching (stepping line by line) through the code. They can really help a new user learn the flow of the code and watch arrays change. Any code that you want to master is worth the time spent marching through it this way. Graphical debuggers are also effective at pinpointing the location of bugs.
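Command line debuggers (see gdb): comments on when they are most effective are still wanted. One clear strength is that they run in any terminal session, including on remote machines without a display, and give a quick backtrace at the point of a crash (the bt command in gdb).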
Print statements: When you don't have effective access to either of the above, inserting print statements can be an effective way of finding a bug. This works better the more you know about your bug (e.g., when you know roughly where in the code the problem is and which variable is going bad). This category also includes writing out entire fields in a format our post-processors can read, so that you can view the whole field in a tool such as ParaView; this may give you a clue as to where a problem in the "Wrong results" category is occurring.
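As a hedged sketch of "writing out an entire field", the helper below dumps point coordinates and one scalar per point in the legacy ASCII VTK format, which ParaView opens directly. This format is a stand-in: our own post-processors may expect something else, and write_points_vtk is a hypothetical name, not an existing routine in our codes.

```c
#include <stddef.h>
#include <stdio.h>

/* Write n points (xyz = x0,y0,z0, x1,y1,z1, ...) with one scalar per point
   to fp as legacy ASCII VTK polydata. field_name must not contain spaces.
   Returns 0 on success, -1 if any write failed. */
int write_points_vtk(FILE *fp, const double *xyz, const double *val,
                     size_t n, const char *field_name)
{
    int bad = 0;
    bad |= fprintf(fp, "# vtk DataFile Version 3.0\n") < 0;
    bad |= fprintf(fp, "debug dump of %s\n", field_name) < 0;
    bad |= fprintf(fp, "ASCII\nDATASET POLYDATA\n") < 0;
    bad |= fprintf(fp, "POINTS %zu double\n", n) < 0;
    for (size_t i = 0; i < n; ++i)
        bad |= fprintf(fp, "%.17g %.17g %.17g\n",
                       xyz[3 * i], xyz[3 * i + 1], xyz[3 * i + 2]) < 0;
    /* One vertex cell per point so ParaView has something to draw. */
    bad |= fprintf(fp, "VERTICES %zu %zu\n", n, 2 * n) < 0;
    for (size_t i = 0; i < n; ++i)
        bad |= fprintf(fp, "1 %zu\n", i) < 0;
    bad |= fprintf(fp, "POINT_DATA %zu\nSCALARS %s double 1\n"
                       "LOOKUP_TABLE default\n", n, field_name) < 0;
    for (size_t i = 0; i < n; ++i)
        bad |= fprintf(fp, "%.17g\n", val[i]) < 0;
    return bad ? -1 : 0;
}
```

Opening the resulting .vtk file in ParaView and coloring by the field makes spikes, NaNs (which typically show up as gaps or odd colors), or a bad region stand out in a way a column of printed numbers rarely does.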
Debugging Strategies
Recursive bisection: When you know that the code fails (or produces wrong results) but have little idea where or why, this can be a good strategy. What you need is some measure of correctness for a variable you can watch. First identify a location in the code where the variable is believed to be correct and a second, later location where it is believed to be wrong. Then check a location midway between them, decide whether the variable is correct or wrong there, and replace the first or second location accordingly (the correct one if it is correct, the wrong one if it is wrong). Applying this recursively eventually narrows the bug down to a small section of code (e.g., one line). This works extremely well for some bugs, but it rests on a few assumptions that do not always hold: 1) you know what is correct and what is wrong, 2) you know how to bisect your code (this can be hard before you know the code you are working with fairly well), and 3) you are able to track a single variable through the function stack (not always possible) and/or know the code well enough to switch targets (and adjust your measure of correctness accordingly).
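The bookkeeping of recursive bisection fits in a short loop. In this sketch, "checkpoints" are numbered locations (lines, routines, or time steps) ordered along execution, and correct_at stands in for whatever inspection you do there (a print or a breakpoint); both names, and the toy predicate, are hypothetical. It also assumes what the strategy implicitly assumes: once the watched variable goes wrong, it stays wrong.

```c
/* Returns nonzero if the watched variable is still correct at checkpoint k.
   ctx carries whatever state the check needs. */
typedef int (*correct_at_fn)(int k, void *ctx);

/* Given checkpoint lo known correct and hi known wrong (lo < hi), return the
   first wrong checkpoint. Needs only about log2(hi - lo) checks. */
int bisect_first_wrong(int lo, int hi, correct_at_fn correct_at, void *ctx)
{
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (correct_at(mid, ctx))
            lo = mid; /* still correct: the bug lies after mid */
        else
            hi = mid; /* already wrong: the bug lies at or before mid */
    }
    return hi;
}

/* Toy predicate for illustration: correct before checkpoint *(int *)ctx. */
static int toy_correct_at(int k, void *ctx)
{
    return k < *(const int *)ctx;
}
```

With 1000 candidate locations this needs about ten checks rather than a thousand, which is why the method is worth the effort of defining a measure of correctness.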
March (and compare): Marching in a debugger is tedious, and many find it boring, but if you get your mind in the right place you can learn a lot about the code while you march (tell yourself you are not wasting time finding a bug but using the debugger to learn the code better). As above, this strategy is more effective the better you can tell a correct result from an incorrect one on each "line" of the code, something that improves with experience. This is where "compare" is a powerful tool for beginners (and everyone else). Ideally, you have a saved version of the code that was working when you started your development. Before you start, make the files in that working source directory read-only (chmod u-w *) so you cannot change them by accident. Then make two run directories containing your inputs and launch one debugger session in each: one with the correct executable and one with the broken executable. March through both codes in step, looking for differences. Of course, if you meet the requirements of recursive bisection you can apply it to both codes and combine the strategies; when you don't, marching line by line and comparing results is often effective (and in the process you will learn the code well AND likely develop the sense of correctness required to use recursive bisection).
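When stepping two debuggers side by side gets slow, a scripted variant of "compare" is to add the same trace calls to both the working and the broken source, run both, and diff the logs; the first differing line tells you where to start marching. This is an assumption-laden sketch, not an existing facility in our codes: trace_value and TRACE are hypothetical names.

```c
#include <stdio.h>

/* Write one traced value as "function: name = value". Function names, unlike
   line numbers, usually survive small edits, so logs from the working and the
   broken build tend to line up under a plain text diff. %.17g prints enough
   significant digits to round-trip a double exactly, so even last-bit
   differences show up. */
void trace_value(FILE *fp, const char *func, const char *name, double value)
{
    fprintf(fp, "%s: %s = %.17g\n", func, name, value);
}

/* Hypothetical macro: TRACE(fp, expr) logs the expression text and value. */
#define TRACE(fp, x) trace_value((fp), __func__, #x, (double)(x))
```

For example, with `TRACE(log, rnorm);` at the same spots in both versions, running each build into its own log file and comparing them with `diff good.log bad.log` shows where the two runs first part ways.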
Specific tips for SCOREC/Simmetrix Meshing and MeshAdapt software and associated databases: 
Because the mesh database is very rich and complex, it can often be opaque when debugging. To address this issue, functions have been created to help users debug. If your debugging has identified a location and you have successfully trapped the offending vertex, face, or region (typically you are iterating over one of these when some unexpected result occurs and is trapped), you can call a function to get extensive information about that entity:
V_info(ivert) // gives information about the vertex ivert
F_info(iface) // gives information about the face iface
R_info(iregion) // gives information about the region iregion
These functions can either be compiled into the code and called from the traps that catch the unexpected behavior (thereby helping you understand the geometric location of the problem) or, with many debuggers, called directly from the debugger's command line to give you more information while stepping through the code.
Once you find the offending geometric location, it is often possible to load the mesh in ParaView (in cases where the mesher completed but some other program downstream of it fails), draw spheres at the locations of the vertices, zoom to them, cut the mesh with an Extract Cells By Region filter centered on the bad location, and then see what is wrong (perhaps using a few different normal planes to get a clear view). Often, refining the mesh near this location improves element quality and fixes the problem.
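To get coordinates to zoom to (or to center the Extract Cells By Region cut on), it can help to scan for inverted tetrahedra directly. The sketch below assumes a hypothetical plain-array mesh layout (flat coordinates, four vertex ids per tet); it does not use the SCOREC/Simmetrix mesh database API. The signed volume is one sixth of the scalar triple product, and it is negative for an inverted element.

```c
#include <stddef.h>

/* Signed volume of tet (a, b, c, d): (b - a) . ((c - a) x (d - a)) / 6.
   Positive for a right-handed vertex ordering, negative if inverted. */
double tet_signed_volume(const double a[3], const double b[3],
                         const double c[3], const double d[3])
{
    double u[3], v[3], w[3];
    for (int i = 0; i < 3; ++i) {
        u[i] = b[i] - a[i];
        v[i] = c[i] - a[i];
        w[i] = d[i] - a[i];
    }
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
          - u[1] * (v[0] * w[2] - v[2] * w[0])
          + u[2] * (v[0] * w[1] - v[1] * w[0])) / 6.0;
}

/* Hypothetical layout: xyz holds x,y,z per vertex; conn holds 4 vertex ids
   per tet. Returns the first tet with volume <= min_vol and stores its
   centroid in centroid[3], or returns -1 if none is found. */
long find_bad_tet(const double *xyz, const long *conn, size_t ntet,
                  double min_vol, double centroid[3])
{
    for (size_t e = 0; e < ntet; ++e) {
        const double *p[4];
        for (int k = 0; k < 4; ++k)
            p[k] = &xyz[3 * conn[4 * e + k]];
        if (tet_signed_volume(p[0], p[1], p[2], p[3]) <= min_vol) {
            for (int i = 0; i < 3; ++i)
                centroid[i] = 0.25 * (p[0][i] + p[1][i] + p[2][i] + p[3][i]);
            return (long)e;
        }
    }
    return -1;
}
```

Using a small positive min_vol instead of 0.0 also flags nearly flat elements, which are often the ones that go negative after a small perturbation of the mesh.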
ADD TO ME
Specific tips for PHASTA: ADD TO ME
