Do Intel array notation and elemental functions vectorize well with the Xeon Phi ISA?
I am trying to find good material that explains the different ways of writing C/C++ source code that the Intel compiler can vectorize using array notation and elemental functions. Most material online uses trivial examples such as saxpy and reductions. There is little explanation of how to vectorize code that has conditional branching or contains a loop with a loop-carried dependence.
For example, here is sequential code that I want to run on different arrays. The matrix is stored in row-major format, and the column sums of the matrix are computed by the compute_seq() function:
    #include <stdlib.h>

    #define n 256
    #define stride 256

    __attribute__((vector)) static inline void compute_seq(float *sum, float *a)
    {
        int i;
        *sum = 0.0f;
        for (i = 0; i < n; i++)
            *sum += a[i * stride];
    }

    int main()
    {
        // initialize
        float *a = malloc(n * n * sizeof(float));
        float sums[n];
        // the following line is not going to be valid, but something like this:
        compute_seq(sums[:], *(a[0:n:1]));
    }
Any comments are appreciated.
Here is a corrected version of the example.
    #include <stdlib.h>

    #define n 256
    #define stride 256

    __attribute__((vector(linear(sum), linear(a)))) static inline void compute_seq(float *sum, float *a)
    {
        int i;
        *sum = 0.0f;
        for (i = 0; i < n; i++)
            *sum += a[i * stride];
    }

    int main()
    {
        // initialize
        float *a = malloc(n * n * sizeof(float));
        float sums[n];
        compute_seq(&sums[:], &a[0:n]);
    }

The important change is at the call site. The expression &sums[:] creates an array section consisting of &sums[0], &sums[1], &sums[2], ... &sums[n-1]. The expression &a[0:n] creates an array section consisting of &a[0], &a[1], &a[2], ... &a[n-1], the address of the top of each column; compute_seq then steps down its column in increments of stride.
I added two linear clauses to the vector attribute to tell the compiler to generate a clone optimized for the case where the arguments are arithmetic sequences, as they are in this example. In this example, the clauses (and the vector attribute itself) are redundant, since the compiler can see both the callee and the call site in the same translation unit and figure out the particulars by itself. If compute_seq were defined in another translation unit, the attribute might help.
Array notation is a work in progress. The icc 14.0 beta compiled the example for Intel(R) Xeon Phi(TM) without complaint. icc 13.0 update 3 reported that it could not vectorize the function ("dereference too complex"). Perversely, leaving the vector attribute off silenced the report, because the compiler can vectorize the code after inlining.
I use the compiler option "-opt-assume-safe-padding" when compiling for Intel(R) Xeon Phi(TM), since it may improve vector code quality. It lets the compiler assume that the page beyond any accessed address is safe to touch, enabling instruction sequences that would otherwise be disallowed.