1af0996ceSBarry Smith /* 2af0996ceSBarry Smith PetscKernelAXPY - X = X + alpha * Y 3af0996ceSBarry Smith 4af0996ceSBarry Smith Input Parameters: 5af0996ceSBarry Smith + X, Y - arrays 6af0996ceSBarry Smith . alpha - scalar 7af0996ceSBarry Smith - n - length of arrays 8af0996ceSBarry Smith 9af0996ceSBarry Smith Also PetscKernelAXPY2(), PetscKernelAXPY3(), PetscKernelAXPY4() 10af0996ceSBarry Smith 11af0996ceSBarry Smith */ 12af0996ceSBarry Smith 13a4963045SJacob Faibussowitsch #pragma once 14af0996ceSBarry Smith 15af0996ceSBarry Smith #if defined(PETSC_USE_FORTRAN_KERNEL_MAXPY) 16af0996ceSBarry Smith #if defined(PETSC_HAVE_FORTRAN_CAPS) 17af0996ceSBarry Smith #define fortranmaxpy4_ FORTRANMAXPY4 18af0996ceSBarry Smith #define fortranmaxpy3_ FORTRANMAXPY3 19af0996ceSBarry Smith #define fortranmaxpy2_ FORTRANMAXPY2 20af0996ceSBarry Smith #elif !defined(PETSC_HAVE_FORTRAN_UNDERSCORE) 21af0996ceSBarry Smith #define fortranmaxpy4_ fortranmaxpy4 22af0996ceSBarry Smith #define fortranmaxpy3_ fortranmaxpy3 23af0996ceSBarry Smith #define fortranmaxpy2_ fortranmaxpy2 24af0996ceSBarry Smith #endif 25ce3b7df2SJacob Faibussowitsch PETSC_EXTERN void fortranmaxpy4_(void *, const void *, const void *, const void *, const void *, const void *, const void *, const void *, const void *, const PetscInt *); 26ce3b7df2SJacob Faibussowitsch PETSC_EXTERN void fortranmaxpy3_(void *, const void *, const void *, const void *, const void *, const void *, const void *, const PetscInt *); 27ce3b7df2SJacob Faibussowitsch PETSC_EXTERN void fortranmaxpy2_(void *, const void *, const void *, const void *, const void *, const PetscInt *); 28af0996ceSBarry Smith #endif 29af0996ceSBarry Smith #include <petscblaslapack.h> 30af0996ceSBarry Smith 31af0996ceSBarry Smith #if defined(PETSC_USE_FORTRAN_KERNEL_MAXPY) 329371c9d4SSatish Balay #define PetscKernelAXPY(U, a1, p1, n) \ 339371c9d4SSatish Balay { \ 349371c9d4SSatish Balay PetscBLASInt one = 1; \ 359371c9d4SSatish Balay PetscBLASInt nn = (PetscBLASInt)n; \ 369371c9d4SSatish Balay PetscCallBLAS("BLASaxpy", BLASaxpy_(&nn, &a1, p1, &one, U, &one)); \ 379371c9d4SSatish Balay } 389371c9d4SSatish Balay #define PetscKernelAXPY2(U, a1, a2, p1, p2, n) \ 39d71ae5a4SJacob Faibussowitsch { \ 40d71ae5a4SJacob Faibussowitsch fortranmaxpy2_(U, &a1, &a2, p1, p2, &n); \ 41d71ae5a4SJacob Faibussowitsch } 429371c9d4SSatish Balay #define PetscKernelAXPY3(U, a1, a2, a3, p1, p2, p3, n) \ 43d71ae5a4SJacob Faibussowitsch { \ 44d71ae5a4SJacob Faibussowitsch fortranmaxpy3_(U, &a1, &a2, &a3, p1, p2, p3, &n); \ 45d71ae5a4SJacob Faibussowitsch } 469371c9d4SSatish Balay #define PetscKernelAXPY4(U, a1, a2, a3, a4, p1, p2, p3, p4, n) \ 47d71ae5a4SJacob Faibussowitsch { \ 48d71ae5a4SJacob Faibussowitsch fortranmaxpy4_(U, &a1, &a2, &a3, &a4, p1, p2, p3, p4, &n); \ 49d71ae5a4SJacob Faibussowitsch } 50af0996ceSBarry Smith 51af0996ceSBarry Smith #elif defined(PETSC_USE_UNROLL_KERNELS) 52af0996ceSBarry Smith 539371c9d4SSatish Balay #define PetscKernelAXPY(U, Alpha, P, n) \ 549371c9d4SSatish Balay { \ 55af0996ceSBarry Smith switch (n & 0x3) { \ 56d71ae5a4SJacob Faibussowitsch case 3: \ 57d71ae5a4SJacob Faibussowitsch *U++ += Alpha * *P++; \ 58d71ae5a4SJacob Faibussowitsch case 2: \ 59d71ae5a4SJacob Faibussowitsch *U++ += Alpha * *P++; \ 60d71ae5a4SJacob Faibussowitsch case 1: \ 61d71ae5a4SJacob Faibussowitsch *U++ += Alpha * *P++; \ 62d71ae5a4SJacob Faibussowitsch n -= 4; \ 63d71ae5a4SJacob Faibussowitsch case 0: \ 64d71ae5a4SJacob Faibussowitsch break; \ 659371c9d4SSatish Balay } \ 669371c9d4SSatish Balay while (n > 0) { \ 679371c9d4SSatish Balay U[0] += Alpha * P[0]; \ 689371c9d4SSatish Balay U[1] += Alpha * P[1]; \ 699371c9d4SSatish Balay U[2] += Alpha * P[2]; \ 709371c9d4SSatish Balay U[3] += Alpha * P[3]; \ 719371c9d4SSatish Balay U += 4; \ 729371c9d4SSatish Balay P += 4; \ 739371c9d4SSatish Balay n -= 4; \ 749371c9d4SSatish Balay } \ 759371c9d4SSatish Balay } 769371c9d4SSatish Balay #define PetscKernelAXPY2(U, a1, a2, p1, p2, n) \ 779371c9d4SSatish Balay { \ 78af0996ceSBarry Smith switch (n & 0x3) { \ 79d71ae5a4SJacob Faibussowitsch case 3: \ 80d71ae5a4SJacob Faibussowitsch *U++ += a1 * *p1++ + a2 * *p2++; \ 81d71ae5a4SJacob Faibussowitsch case 2: \ 82d71ae5a4SJacob Faibussowitsch *U++ += a1 * *p1++ + a2 * *p2++; \ 83d71ae5a4SJacob Faibussowitsch case 1: \ 84d71ae5a4SJacob Faibussowitsch *U++ += a1 * *p1++ + a2 * *p2++; \ 85d71ae5a4SJacob Faibussowitsch n -= 4; \ 86d71ae5a4SJacob Faibussowitsch case 0: \ 87d71ae5a4SJacob Faibussowitsch break; \ 889371c9d4SSatish Balay } \ 899371c9d4SSatish Balay while (n > 0) { \ 909371c9d4SSatish Balay U[0] += a1 * p1[0] + a2 * p2[0]; \ 919371c9d4SSatish Balay U[1] += a1 * p1[1] + a2 * p2[1]; \ 929371c9d4SSatish Balay U[2] += a1 * p1[2] + a2 * p2[2]; \ 939371c9d4SSatish Balay U[3] += a1 * p1[3] + a2 * p2[3]; \ 949371c9d4SSatish Balay U += 4; \ 959371c9d4SSatish Balay p1 += 4; \ 969371c9d4SSatish Balay p2 += 4; \ 979371c9d4SSatish Balay n -= 4; \ 989371c9d4SSatish Balay } \ 999371c9d4SSatish Balay } 1009371c9d4SSatish Balay #define PetscKernelAXPY3(U, a1, a2, a3, p1, p2, p3, n) \ 1019371c9d4SSatish Balay { \ 102af0996ceSBarry Smith switch (n & 0x3) { \ 103d71ae5a4SJacob Faibussowitsch case 3: \ 104d71ae5a4SJacob Faibussowitsch *U++ += a1 * *p1++ + a2 * *p2++ + a3 * *p3++; \ 105d71ae5a4SJacob Faibussowitsch case 2: \ 106d71ae5a4SJacob Faibussowitsch *U++ += a1 * *p1++ + a2 * *p2++ + a3 * *p3++; \ 107d71ae5a4SJacob Faibussowitsch case 1: \ 108d71ae5a4SJacob Faibussowitsch *U++ += a1 * *p1++ + a2 * *p2++ + a3 * *p3++; \ 109d71ae5a4SJacob Faibussowitsch n -= 4; \ 110d71ae5a4SJacob Faibussowitsch case 0: \ 111d71ae5a4SJacob Faibussowitsch break; \ 1129371c9d4SSatish Balay } \ 1139371c9d4SSatish Balay while (n > 0) { \ 1149371c9d4SSatish Balay U[0] += a1 * p1[0] + a2 * p2[0] + a3 * p3[0]; \ 1159371c9d4SSatish Balay U[1] += a1 * p1[1] + a2 * p2[1] + a3 * p3[1]; \ 1169371c9d4SSatish Balay U[2] += a1 * p1[2] + a2 * p2[2] + a3 * p3[2]; \ 1179371c9d4SSatish Balay U[3] += a1 * p1[3] + a2 * p2[3] + a3 * p3[3]; \ 1189371c9d4SSatish Balay U += 4; \ 1199371c9d4SSatish Balay p1 += 4; \ 1209371c9d4SSatish Balay p2 += 4; \ 1219371c9d4SSatish Balay p3 += 4; \ 1229371c9d4SSatish Balay n -= 4; \ 1239371c9d4SSatish Balay } \ 1249371c9d4SSatish Balay } 1259371c9d4SSatish Balay #define PetscKernelAXPY4(U, a1, a2, a3, a4, p1, p2, p3, p4, n) \ 1269371c9d4SSatish Balay { \ 127af0996ceSBarry Smith switch (n & 0x3) { \ 128d71ae5a4SJacob Faibussowitsch case 3: \ 129d71ae5a4SJacob Faibussowitsch *U++ += a1 * *p1++ + a2 * *p2++ + a3 * *p3++ + a4 * *p4++; \ 130d71ae5a4SJacob Faibussowitsch case 2: \ 131d71ae5a4SJacob Faibussowitsch *U++ += a1 * *p1++ + a2 * *p2++ + a3 * *p3++ + a4 * *p4++; \ 132d71ae5a4SJacob Faibussowitsch case 1: \ 133d71ae5a4SJacob Faibussowitsch *U++ += a1 * *p1++ + a2 * *p2++ + a3 * *p3++ + a4 * *p4++; \ 134d71ae5a4SJacob Faibussowitsch n -= 4; \ 135d71ae5a4SJacob Faibussowitsch case 0: \ 136d71ae5a4SJacob Faibussowitsch break; \ 1379371c9d4SSatish Balay } \ 1389371c9d4SSatish Balay while (n > 0) { \ 1399371c9d4SSatish Balay U[0] += a1 * p1[0] + a2 * p2[0] + a3 * p3[0] + a4 * p4[0]; \ 1409371c9d4SSatish Balay U[1] += a1 * p1[1] + a2 * p2[1] + a3 * p3[1] + a4 * p4[1]; \ 1419371c9d4SSatish Balay U[2] += a1 * p1[2] + a2 * p2[2] + a3 * p3[2] + a4 * p4[2]; \ 1429371c9d4SSatish Balay U[3] += a1 * p1[3] + a2 * p2[3] + a3 * p3[3] + a4 * p4[3]; \ 1439371c9d4SSatish Balay U += 4; \ 1449371c9d4SSatish Balay p1 += 4; \ 1459371c9d4SSatish Balay p2 += 4; \ 1469371c9d4SSatish Balay p3 += 4; \ 1479371c9d4SSatish Balay p4 += 4; \ 1489371c9d4SSatish Balay n -= 4; \ 1499371c9d4SSatish Balay } \ 1509371c9d4SSatish Balay } 151af0996ceSBarry Smith 152af0996ceSBarry Smith #elif defined(PETSC_USE_WHILE_KERNELS) 153af0996ceSBarry Smith 1549371c9d4SSatish Balay #define PetscKernelAXPY(U, a1, p1, n) \ 1559371c9d4SSatish Balay { \ 1569371c9d4SSatish Balay while (n--) *U++ += a1 * *p1++; \ 1579371c9d4SSatish Balay } 1589371c9d4SSatish Balay #define PetscKernelAXPY2(U, a1, a2, p1, p2, n) \ 1599371c9d4SSatish Balay { \ 1609371c9d4SSatish Balay while (n--) *U++ += a1 * *p1++ + a2 * *p2++; \ 1619371c9d4SSatish Balay } 1629371c9d4SSatish Balay #define PetscKernelAXPY3(U, a1, a2, a3, p1, p2, p3, n) \ 1639371c9d4SSatish Balay { \ 1649371c9d4SSatish Balay while (n--) *U++ += a1 * *p1++ + a2 * *p2++ + a3 * *p3++; \ 1659371c9d4SSatish Balay } 1669371c9d4SSatish Balay #define PetscKernelAXPY4(U, a1, a2, a3, a4, p1, p2, p3, p4, n) \ 1679371c9d4SSatish Balay { \ 1689371c9d4SSatish Balay while (n--) *U++ += a1 * *p1++ + a2 * *p2++ + a3 * *p3++ + a4 * *p4++; \ 1699371c9d4SSatish Balay } 170af0996ceSBarry Smith 171af0996ceSBarry Smith #elif defined(PETSC_USE_BLAS_KERNELS) 172af0996ceSBarry Smith 1739371c9d4SSatish Balay #define PetscKernelAXPY(U, a1, p1, n) \ 1749371c9d4SSatish Balay { \ 1759371c9d4SSatish Balay PetscBLASInt one = 1; \ 1769371c9d4SSatish Balay PetscBLASInt nn = (PetscBLASInt)n; \ 1779371c9d4SSatish Balay PetscCallBLAS("BLASaxpy", BLASaxpy_(&nn, &a1, p1, &one, U, &one)); \ 1789371c9d4SSatish Balay } 1799371c9d4SSatish Balay #define PetscKernelAXPY2(U, a1, a2, p1, p2, n) \ 1809371c9d4SSatish Balay { \ 1819371c9d4SSatish Balay PetscKernelAXPY(U, a1, p1, n); \ 1829371c9d4SSatish Balay PetscKernelAXPY(U, a2, p2, n); \ 1839371c9d4SSatish Balay } 1849371c9d4SSatish Balay #define PetscKernelAXPY3(U, a1, a2, a3, p1, p2, p3, n) \ 1859371c9d4SSatish Balay { \ 1869371c9d4SSatish Balay PetscKernelAXPY2(U, a1, a2, p1, p2, n); \ 1879371c9d4SSatish Balay PetscKernelAXPY(U, a3, p3, n); \ 1889371c9d4SSatish Balay } 1899371c9d4SSatish Balay #define PetscKernelAXPY4(U, a1, a2, a3, a4, p1, p2, p3, p4, n) \ 1909371c9d4SSatish Balay { \ 1919371c9d4SSatish Balay PetscKernelAXPY2(U, a1, a2, p1, p2, n); \ 1929371c9d4SSatish Balay PetscKernelAXPY2(U, a3, a4, p3, p4, n); \ 1939371c9d4SSatish Balay } 194af0996ceSBarry Smith 195af0996ceSBarry Smith #else 196af0996ceSBarry Smith 1979371c9d4SSatish Balay #define PetscKernelAXPY(U, a1, p1, n) \ 198ce3b7df2SJacob Faibussowitsch do { \ 199ce3b7df2SJacob Faibussowitsch const PetscInt _n = n; \ 200ce3b7df2SJacob Faibussowitsch const PetscScalar _a1 = a1; \ 201ce3b7df2SJacob Faibussowitsch const PetscScalar *PETSC_RESTRICT _p1 = p1; \ 202ce3b7df2SJacob Faibussowitsch PetscScalar *PETSC_RESTRICT _U = U; \ 2038d031ccaSJunchao Zhang PetscPragmaUseOMPKernels(parallel for) \ 204*a88f9afeSZach Atkins for (PetscInt __i = 0; __i < _n; __i++) _U[__i] += _a1 * _p1[__i]; \ 205ce3b7df2SJacob Faibussowitsch } while (0) 2069371c9d4SSatish Balay #define PetscKernelAXPY2(U, a1, a2, p1, p2, n) \ 207ce3b7df2SJacob Faibussowitsch do { \ 208ce3b7df2SJacob Faibussowitsch const PetscInt _n = n; \ 209ce3b7df2SJacob Faibussowitsch const PetscScalar _a1 = a1; \ 210ce3b7df2SJacob Faibussowitsch const PetscScalar _a2 = a2; \ 211ce3b7df2SJacob Faibussowitsch const PetscScalar *PETSC_RESTRICT _p1 = p1; \ 212ce3b7df2SJacob Faibussowitsch const PetscScalar *PETSC_RESTRICT _p2 = p2; \ 213ce3b7df2SJacob Faibussowitsch PetscScalar *PETSC_RESTRICT _U = U; \ 2148d031ccaSJunchao Zhang PetscPragmaUseOMPKernels(parallel for) \ 215ce3b7df2SJacob Faibussowitsch for (PetscInt __i = 0; __i < _n; __i++) _U[__i] += _a1 * _p1[__i] + _a2 * _p2[__i]; \ 216ce3b7df2SJacob Faibussowitsch } while (0) 2179371c9d4SSatish Balay #define PetscKernelAXPY3(U, a1, a2, a3, p1, p2, p3, n) \ 218ce3b7df2SJacob Faibussowitsch do { \ 219ce3b7df2SJacob Faibussowitsch const PetscInt _n = n; \ 220ce3b7df2SJacob Faibussowitsch const PetscScalar _a1 = a1; \ 221ce3b7df2SJacob Faibussowitsch const PetscScalar _a2 = a2; \ 222ce3b7df2SJacob Faibussowitsch const PetscScalar _a3 = a3; \ 223ce3b7df2SJacob Faibussowitsch const PetscScalar *PETSC_RESTRICT _p1 = p1; \ 224ce3b7df2SJacob Faibussowitsch const PetscScalar *PETSC_RESTRICT _p2 = p2; \ 225ce3b7df2SJacob Faibussowitsch const PetscScalar *PETSC_RESTRICT _p3 = p3; \ 226ce3b7df2SJacob Faibussowitsch PetscScalar *PETSC_RESTRICT _U = U; \ 2278d031ccaSJunchao Zhang PetscPragmaUseOMPKernels(parallel for) \ 228ce3b7df2SJacob Faibussowitsch for (PetscInt __i = 0; __i < _n; __i++) _U[__i] += _a1 * _p1[__i] + _a2 * _p2[__i] + _a3 * _p3[__i]; \ 229ce3b7df2SJacob Faibussowitsch } while (0) 2309371c9d4SSatish Balay #define PetscKernelAXPY4(U, a1, a2, a3, a4, p1, p2, p3, p4, n) \ 231ce3b7df2SJacob Faibussowitsch do { \ 232ce3b7df2SJacob Faibussowitsch const PetscInt _n = n; \ 233ce3b7df2SJacob Faibussowitsch const PetscScalar _a1 = a1; \ 234ce3b7df2SJacob Faibussowitsch const PetscScalar _a2 = a2; \ 235ce3b7df2SJacob Faibussowitsch const PetscScalar _a3 = a3; \ 236ce3b7df2SJacob Faibussowitsch const PetscScalar _a4 = a4; \ 237ce3b7df2SJacob Faibussowitsch const PetscScalar *PETSC_RESTRICT _p1 = p1; \ 238ce3b7df2SJacob Faibussowitsch const PetscScalar *PETSC_RESTRICT _p2 = p2; \ 239ce3b7df2SJacob Faibussowitsch const PetscScalar *PETSC_RESTRICT _p3 = p3; \ 240ce3b7df2SJacob Faibussowitsch const PetscScalar *PETSC_RESTRICT _p4 = p4; \ 241ce3b7df2SJacob Faibussowitsch PetscScalar *PETSC_RESTRICT _U = U; \ 2428d031ccaSJunchao Zhang PetscPragmaUseOMPKernels(parallel for) \ 243ce3b7df2SJacob Faibussowitsch for (PetscInt __i = 0; __i < _n; __i++) _U[__i] += _a1 * _p1[__i] + _a2 * _p2[__i] + _a3 * _p3[__i] + _a4 * _p4[__i]; \ 244ce3b7df2SJacob Faibussowitsch } while (0) 245af0996ceSBarry Smith 246af0996ceSBarry Smith #endif 247