The On Ramp/Level 1/Partition

From PHASTA Wiki

Chef

Chef is an open-source SCOREC tool that is this group's primary tool for partitioning a problem domain into many subdomains, so that an array of compute nodes can each work on its own piece of the problem in parallel. This step needs only two inputs: 1) the SCOREC mesh constructed from the output of the conversion and meshing steps, and 2) the number of parts (subdomains) to divide the problem into. A generic workflow for this step is described below and is a good place to start when completing the workflow for the first time. As your studies continue, many aspects of the layout and modifiers will differ from this On Ramp documentation.

Creating the serial case via Viz nodes

Create a 1-1-Chef subdirectory inside the PrepAndRun folder. Enter the new folder and copy over the files adapt.inp and runChef.sh from the /projects/tutorials/OnRamp folder. To learn more about the adapt.inp file, enter the command:

more adapt.inp

Pay particular attention to the splitFactor, attributeFileName, modelFileName, meshFileName bz2, and outMeshFileName bz2 specifications. These are often the first knobs a new user learns to control when first creating the "checkpoint" files needed by the PHASTA executable. splitFactor tells the Chef executable how many partitions to split each incoming part into. Immediately after the conversion step we want this set to 1 because we desire a single "checkpoint" file. attributeFileName tells Chef where to find the file containing the attributes of each geometry entity (region, face, line, and vertex info). Likewise, modelFileName tells Chef where to find the file containing the geometry entity information. The meshFileName bz2 and outMeshFileName bz2 specifications determine the directory where Chef will look for the SCOREC mesh and where it will write the partitioned SCOREC mesh. Notice that this pre-written adapt.inp file points the meshFileName bz2 attribute to the mdsMesh_bz2 folder located in the simMeshToMdsMesh directory. Run the Chef bash script by entering the command ./runChef.sh 1. This should output a 1-procs_case directory, an mdsMesh_bz2 directory, and a chef.log file.
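As a rough sketch of the entries described above for the serial pass (the exact key spelling and the geometry/attribute file names will depend on your copy of adapt.inp, so treat the values below as placeholders, not a verbatim file):

```
splitFactor: 1
attributeFileName: <your attribute file>
modelFileName: <your model file>
meshFileName bz2: ../simMeshToMdsMesh/mdsMesh_bz2/
outMeshFileName bz2: mdsMesh_bz2/
```

Only the splitFactor value and the mesh directory paths shown here come from this tutorial; confirm the rest against the pre-written adapt.inp you copied.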

Partitioning to 8 processes via Viz nodes

As with the creation of the serial case above, make another subdirectory named 8-1-Chef. Copy the adapt.inp and runChef.sh bash script used in the 1-1-Chef directory into this 8-1-Chef subdirectory. We will alter the meshFileName bz2 and splitFactor specifications within the newly copied adapt.inp. Set meshFileName bz2 to "../1-1-Chef/mdsMesh_bz2". Set splitFactor to 8. Run Chef via the bash script ./runChef.sh 8. The executable should read the mesh from the 1-1-Chef directory as well as the geom files as before.
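Concretely, only two entries change relative to the serial pass (key spelling here mirrors the descriptions above and may differ slightly in your file):

```
meshFileName bz2: ../1-1-Chef/mdsMesh_bz2
splitFactor: 8
```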

Now that you have successfully partitioned the mesh, it's time to use PHASTA in the Level 1 Solve step.

Further Partitioning to N processes

By now you may be able to guess how to further partition your case into more and more parts. If the desired end part count were 32 processes (call that N), the next step would mimic the one above, with the subdirectory named 32-8-Chef and the splitFactor set to 4. There are a few things to consider when partitioning an actual case you care about. It is tempting to make the successive split factors large so as to have fewer repeated partitionings. However, the splitFactor should generally not exceed 8, especially on an unstructured grid. If it does, Chef will have a harder time splitting the elements evenly amongst the parts, causing an imbalance in the workloads from process to process and therefore reducing the effectiveness of the PHASTA run. Secondly, when the mesh is large, the partitioning performed on the viz node should not exceed 32 parts; successive partitionings past this amount should be done on larger machines. Sometimes, if a mesh is large enough, the preliminary serial partition should be performed on a fatter node than what is available on the viz nodes (e.g., CU's Summit resource).
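The chain of Chef passes described above can be sketched as a small helper that picks split factors no larger than 8 and names each pass in the N-M-Chef convention used in this tutorial. This helper is purely illustrative (it is not part of the Chef tooling), and it assumes the target part count is a multiple of the starting count:

```python
def split_chain(start_parts, target_parts, max_factor=8):
    """Greedily pick split factors (each <= max_factor, largest first)
    that take start_parts up to target_parts."""
    ratio = target_parts // start_parts
    assert start_parts * ratio == target_parts, "target must be a multiple of start"
    factors = []
    while ratio > 1:
        # Largest divisor of the remaining ratio that does not exceed max_factor;
        # keeping factors small helps Chef balance elements across parts.
        f = next(f for f in range(min(max_factor, ratio), 1, -1) if ratio % f == 0)
        factors.append(f)
        ratio //= f
    return factors

def chef_steps(start_parts, target_parts):
    """Yield (subdirectory name, splitFactor) pairs, e.g. ('32-8-Chef', 4)."""
    parts = start_parts
    for f in split_chain(start_parts, target_parts):
        yield (f"{parts * f}-{parts}-Chef", f)
        parts *= f

print(list(chef_steps(1, 32)))
# → [('8-1-Chef', 8), ('32-8-Chef', 4)]
```

For the 32-part example in this section, the helper reproduces the two passes described above: an 8-way split of the serial case, then a 4-way split of each of the 8 parts.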

A few helpful video tutorials

PHASTA_workflow_RB.mkv

Video Notes


RajDemoPrepSolvePost.mov

Video Notes


PrepSolvePostBLandC.mp4

Video Notes