      subroutine commuInt (global, ilwork, n, code)
c---------------------------------------------------------------------
c
c This subroutine is responsible for interprocessor communication of
c the residual and solution vectors.
c
c input:
c     global(nshg,n): global vector to be communicated.  Note that
c                     this vector is local to the processor (i.e. it
c                     is not distributed across processors).
c     ilwork(nlwork): the local interprocessor work array.  This
c                     array is local to the processor (i.e. each
c                     processor has a unique ilwork array).
c     n:              second dimension of the array to be communicated
c     code:           = 'in'  for communicating the residual
c                     = 'out' for communicating the solution
c
c---------------------------------------------------------------------
c
c The array ilwork describes the details of the communications.
c Each communication step (call of this routine) consists of a
c sequence of "tasks", where a task is defined as a communication
c between two processors in which data is exchanged.  This implies
c that for a given processor there will be as many tasks as there
c are processors with which it must communicate.  Details of the
c ilwork array appear below.
c
c---------------------------------------------------------------------
c
      include "common.h"
      include "mpif.h"
      include "auxmpi.h"

      integer status(MPI_STATUS_SIZE)
      integer stat(MPI_STATUS_SIZE, 2*maxtask), req(2*maxtask)

      integer*8 global(nshg,n),
     &          rtemp(maxfront*n,maxtask)
      integer ilwork(nlwork)

      character*3 code

      if (code .ne. 'in ' .and. code .ne. 'out')
     &  call error ('commuInt','code ',0)

      if (n .eq. 1) then        ! like a scalar
         kdof = 16
      else
         call error ('commuInt','n    ',n)
      endif
c     return
c.... Note that when adding another kdof to the above set, we must
c.... also make changes in ctypes.f and auxmpi.h

c---------------------------------------------------------------------
c ilwork(1): number of tasks
c
c The following information is contained in ilwork for each task:
c     itag:   tag of the communication
c     iacc:   == 0 if the task is a send
c             == 1 if the task is a receive
c     iother: rank of the processor with which this communication
c             occurs
c     numseg: number of data "segments" to be sent or received.  A
c             segment is defined as a contiguous section of the
c             global vector to be communicated (i.e. a group of
c             nodes, or rather "shape function coefficients", which
c             occur sequentially in the array global(nshg,n)).
c     isgbeg: location of the first segment in the array owned by
c             the current processor.
c
c The two types of communication are 'in', where the residual is
c being communicated, and 'out', where the solution is being
c communicated.  Note that when the type is 'out', senders receive
c and receivers send.
c
c The following comments pertain to a communication of type 'in':
c
c If the task is a send, then all of the numseg segments are sent
c with a single call to MPI_ISEND.  Where these segments live in the
c array is encoded in the array sevsegtype, a common array
c constructed in the subroutine "ctypes.f".  In other words,
c sevsegtype is a derived data type that describes the blocks to be
c sent in terms of their beginning indices and the length of each
c segment.  Using this, we can make a single send to take care of
c all the segments for this task.
c
c If the task is a receive, then once the vector is received, the
c received segments must be added to the correct locations in the
c current array.  These locations are described in ilwork as the
c beginning position, then the length of the segment.
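c
c As a concrete sketch of the layout (inferred from the index
c arithmetic used below, with hypothetical values), a processor
c with two tasks would see:
c
c     ilwork(1) = 2                    number of tasks
c     task 1 (header at itkbeg = 1):
c       ilwork(2) = itag               communication tag
c       ilwork(3) = iacc               0 = send, 1 = receive
c       ilwork(4) = iother             rank of the other processor
c       ilwork(5) = numseg             number of segments
c       ilwork(6), ilwork(7)           (isgbeg, lenseg) of segment 1
c       ...                            one such pair per segment
c     task 2 (header at itkbeg = 1 + 4 + 2*numseg of task 1):
c       ...
c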
c
c---------------------------------------------------------------------

      numtask = ilwork(1)

      itkbeg = 1
      m      = 0
      idl    = 0

      do itask = 1, numtask
         m = m + 1
         itag   = ilwork (itkbeg + 1)
         iacc   = ilwork (itkbeg + 2)
         iother = ilwork (itkbeg + 3)
         numseg = ilwork (itkbeg + 4)
         isgbeg = ilwork (itkbeg + 5)
c
c.... if iacc == 0, then this task is a send (slave).
c
         if (iacc .eq. 0) then
c
c.... residual communication
c
            if (code .eq. 'in ') then
               call MPI_ISEND(global(isgbeg, 1), 1,
     &              sevsegtype(itask,kdof),
     &              iother, itag, MPI_COMM_WORLD, req(m), ierr)
            endif
c
c.... solution communication
c
            if (code .eq. 'out') then
               call MPI_IRECV(global(isgbeg, 1), 1,
     &              sevsegtype(itask,kdof),
     &              iother, itag, MPI_COMM_WORLD, req(m), ierr)
c              call MPI_RECV(global(isgbeg,1), 1,
c    &              sevsegtype(itask,kdof),
c    &              iother, itag, MPI_COMM_WORLD, status, ierr)
            endif
c
c.... if iacc == 1, then this task is a receive (master).
c
         else
            if (code .eq. 'in ') then
c
c.... determine the total number of nodes involved in this
c     communication (lfront), including all segments
c
               lfront = 0
               do is = 1,numseg
                  lenseg = ilwork (itkbeg + 4 + 2*is)
                  lfront = lfront + lenseg
               enddo
c
c.... receive all segments for this task in a single step.  The
c     datatype MPI_INTEGER8 matches the integer*8 declaration of
c     rtemp (MPI_INTEGER would describe default-width integers)
c
               idl = idl + 1    ! "do later": count of deferred
                                ! receives to assemble after waitall
               call MPI_IRECV(rtemp(1,idl), lfront*n, MPI_INTEGER8,
     &              iother, itag, MPI_COMM_WORLD, req(m), ierr)
            endif
            if (code .eq. 'out') then
               call MPI_ISEND(global(isgbeg, 1), 1,
     &              sevsegtype(itask,kdof),
     &              iother, itag, MPI_COMM_WORLD, req(m), ierr)
            endif
         endif

         itkbeg = itkbeg + 4 + 2*numseg

      enddo                     ! end tasks loop

      call MPI_WAITALL(m, req, stat, ierr)
c
c.... Delayed assembly of the data communicated above.  Because of
c     the switch to non-blocking receives, the received segments
c     cannot be assembled into global until after the waitall.
c     This is only necessary for a commu of type 'in'.
c
      if (code .eq. 'in ') then
         itkbeg = 1
         jdl    = 0
         do j = 1, numtask      ! assemble the received segments
                                ! into the global vector
            iacc   = ilwork (itkbeg + 2)
            numseg = ilwork (itkbeg + 4)
            isgbeg = ilwork (itkbeg + 5)
            if (iacc .eq. 1) then
               jdl = jdl + 1    ! keep track of the order of rtemp's
c
c.... add the received data to the global array on the current
c     processor.  Note that this involves splitting up the chunk of
c     received data into its correct segment locations for the
c     current processor.
c
               itemp = 1
               do idof = 1,n
                  do is = 1,numseg
                     isgbeg = ilwork (itkbeg + 3 + 2*is)
                     lenseg = ilwork (itkbeg + 4 + 2*is)
                     isgend = isgbeg + lenseg - 1
                     global(isgbeg:isgend,idof) =
     &                    global(isgbeg:isgend,idof)
     &                  + rtemp (itemp:itemp+lenseg-1,jdl)
                     itemp = itemp + lenseg
                  enddo
               enddo
            endif               ! end of receive (iacc == 1)
            itkbeg = itkbeg + 4 + 2*numseg
         enddo
      endif                     ! commu of type 'in'

      return
      end