Difference between revisions of "Partition"

Revision as of 19:34, 14 September 2020

Chef

Chef is an open-source SCOREC tool and this group's primary tool for partitioning a problem domain into many subdomains. Partitioning lets an array of compute nodes each work on part of the problem in parallel; that is, the problem domain is divided into subdomains that are handed to different computational workers. The only inputs needed for this step are 1) the SCOREC mesh constructed from the output of the conversion and meshing steps, and 2) the number of parts (subdomains) into which to divide the problem. A generic workflow for this step is described below and is a good place to start when completing the workflow for the first time. As the reader's studies continue, many aspects of the layout and modifiers will diverge from this on-ramp documentation.

Creating the serial case via Viz nodes

Create a folder named Chef. Inside it, create a soft link to the <case>.smd file and name the link geom.smd. Also note the path to the directory containing the 0.smb file produced by the previous steps. Create a 1-1-Chef subdirectory containing an adapt.inp file and a runChef.sh bash script. Edit the runChef.sh script to look something like this:

#!/bin/bash
# $1: number of MPI processes to launch; all output is also saved to chef.log
/usr/local/openmpi/1.10.6-gnu49-thread/bin/mpirun -np "$1" /projects/tools/SCOREC-core/build-viz003/test/chef 2>&1 | tee chef.log

or

#!/bin/bash
# $1: number of MPI processes to launch; all output is also saved to chef.log
/usr/local/openmpi/1.10.6-gnu49-thread/bin/mpirun -np "$1" /projects/tools/SCOREC-core/build-14-190416dev_omp/test/chef 2>&1 | tee chef.log
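The directory layout described before the scripts above can be sketched in bash as follows. This is only a minimal sketch: setup_chef_dir is a hypothetical helper, not part of Chef, and both of its arguments are placeholders for your own paths.

```shell
#!/bin/bash
# Hypothetical helper (not part of Chef): create the Chef working layout.
setup_chef_dir() {
  local case_smd="$1"   # path to your <case>.smd file
  local chef_dir="$2"   # folder to create, e.g. ./Chef
  local smd_abs

  # Resolve the model file to an absolute path so the link works from anywhere.
  smd_abs="$(cd "$(dirname "$case_smd")" && pwd)/$(basename "$case_smd")"

  mkdir -p "$chef_dir/1-1-Chef"
  # adapt.inp inside 1-1-Chef refers to the model as ../geom.smd
  ln -sf "$smd_abs" "$chef_dir/geom.smd"
}
```

A call might look like "setup_chef_dir /path/to/mycase.smd ./Chef". The adapt.inp and runChef.sh files still have to be placed in 1-1-Chef by hand.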

The adapt.inp file should contain specifications that look as follows:

timeStepNumber 0 
ensa_dof 6
attributeFileName ../geom.smd
modelFileName ../geom.smd
meshFileName bz2:<path to .smb dir>
outMeshFileName bz2:mdsMesh_bz2/
adaptFlag 0
phastaIO 1
AdaptStrategy 7 
RecursiveUR 0
splitFactor 1
elementsPerMigration 10000
SolutionMigration 0
DisplacementMigration 0
Tetrahedronize 0
partitionMethod graph
dwalMigration 0
buildMapping 1

Pay particular attention to the splitFactor, attributeFileName, modelFileName, meshFileName bz2, and outMeshFileName bz2 specifications. These are often the first knobs that a new user learns to control when first creating the "checkpoint" files needed by the PHASTA executable. splitFactor tells the Chef executable how many parts each incoming part is to be split into. Immediately after the conversion step we want this set to 1 because we want a single "checkpoint" file. attributeFileName tells Chef where to find the file containing the attributes of each geometric entity (region, face, line, and vertex info). Likewise, modelFileName tells Chef where to find the file containing the geometric entity information. For our purposes we point Chef to the soft link that was created for the <case>.smd file. The meshFileName bz2 and outMeshFileName bz2 specifications determine the directory from which Chef reads the SCOREC mesh and the directory into which it writes the partitioned SCOREC mesh, respectively.

Run the bash script with the command "./runChef.sh 1". When it executes, Chef should load the input and write 1) the 1-procs_case directory containing numstart.dat, restart.dat.0.1, and geombc.1.dat, and 2) the mdsMesh_bz2 directory containing 0.smb.

Partitioning to N processes via Viz nodes

As with the creation of the serial case above, make another subdirectory named 8-1-Chef. Copy the adapt.inp file and the runChef.sh bash script from the 1-1-Chef directory into this 8-1-Chef subdirectory. Then alter the meshFileName bz2 and splitFactor specifications in the newly copied adapt.inp: set meshFileName bz2 to "../1-1-Chef/mdsMesh_bz2" and set splitFactor to 8. Run Chef via the bash script with "./runChef.sh 8". The executable should read the mesh from the 1-1-Chef directory and the geom files as before.
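The copy-and-edit step above could be scripted roughly as shown here. This is a sketch under assumptions: derive_chef_case is a hypothetical helper, and it assumes the bz2: prefix from the adapt.inp listing is kept in front of the new mesh path.

```shell
#!/bin/bash
# Hypothetical helper (not part of Chef): derive the next partitioning
# directory from the previous one. Run it from inside the Chef folder.
derive_chef_case() {
  local prev_dir="$1"      # e.g. 1-1-Chef
  local new_dir="$2"       # e.g. 8-1-Chef
  local split_factor="$3"  # e.g. 8

  mkdir -p "$new_dir"
  cp "$prev_dir/runChef.sh" "$new_dir/"
  # Rewrite only the two settings that change; all other lines are copied as-is.
  # The ^ anchor keeps outMeshFileName from being matched by the first rule.
  # Assumption: the bz2: prefix from the original listing is retained.
  sed -e "s|^meshFileName .*|meshFileName bz2:../$(basename "$prev_dir")/mdsMesh_bz2|" \
      -e "s|^splitFactor .*|splitFactor $split_factor|" \
      "$prev_dir/adapt.inp" > "$new_dir/adapt.inp"
}
```

For the step described above this would be called as "derive_chef_case 1-1-Chef 8-1-Chef 8", followed by running "./runChef.sh 8" inside 8-1-Chef.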

Further Partitioning to N processes

By now you may be able to guess how to partition your case further. If the desired final part count is 32 processes (call that N), the next step would mimic the previous one, with the subdirectory named 32-8-Chef, the mesh read from the 8-1-Chef output, and splitFactor set to 4. There are a few things to consider when partitioning an actual case you care about. First, it is tempting to make the successive split factors large so as to need fewer repeated partitioning steps. As a general rule of thumb, however, splitFactor should rarely exceed 8, especially on an unstructured grid. If it does, Chef will have a harder time splitting the elements evenly among the parts, causing an imbalance in workload from process to process and therefore reducing the efficiency of the PHASTA run. Second, when the mesh is large, the partitioning performed on the viz nodes should not exceed 32 parts; successive partitionings past this count should be done on larger machines. Sometimes, if a mesh is large enough, even the preliminary serial partition should be performed on a fatter node than what is available on the viz nodes (e.g., CU's Summit resource).
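To illustrate the naming pattern and the rule of thumb above, here is a small sketch that lists a possible chain of partitioning steps for a target part count. plan_chef_stages is a hypothetical helper, and picking the largest divisor not exceeding the cap at each step is just one simple choice, not something Chef prescribes.

```shell
#!/bin/bash
# Hypothetical helper (not part of Chef): print one directory name and
# splitFactor per partitioning step, keeping each splitFactor <= max_factor.
plan_chef_stages() {
  local target="$1"           # final part count N
  local max_factor="${2:-8}"  # cap from the rule of thumb (default 8)
  local parts=1 remaining f

  echo "1-1-Chef splitFactor 1"
  while [ "$parts" -lt "$target" ]; do
    remaining=$(( target / parts ))
    f=$max_factor
    # Largest factor no bigger than max_factor that divides what remains.
    while [ "$f" -gt 1 ] && [ $(( remaining % f )) -ne 0 ]; do
      f=$(( f - 1 ))
    done
    if [ "$f" -le 1 ]; then
      echo "cannot reach $target parts with split factors <= $max_factor" >&2
      return 1
    fi
    echo "$(( parts * f ))-${parts}-Chef splitFactor $f"
    parts=$(( parts * f ))
  done
}
```

Calling "plan_chef_stages 32" prints the three steps used in this page: 1-1-Chef with splitFactor 1, 8-1-Chef with splitFactor 8, and 32-8-Chef with splitFactor 4.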

A few helpful video tutorials

Video notes of PHASTA_workflow_RB.mkv

