c - Do Intel array notation and elementary functions vectorize well with the Xeon Phi ISA?


I am trying to find proper material that clearly explains the different ways to write C/C++ source code that can be vectorized by the Intel compiler using array notation and elementary functions. All the materials online use trivial examples: saxpy, reduction, etc. There is a lack of explanation on how to vectorize code that has conditional branching or contains a loop with a loop-carried dependence.

For example: say there is sequential code that I want to run on different arrays. A matrix is stored in row-major format. The columns of the matrix are computed by the compute_seq() function:

    #include <stdlib.h>

    #define n      256
    #define stride 256

    __attribute__((vector))
    inline void compute_seq(float *sum, float* a) {
      int i;
      *sum = 0.0f;
      for(i=0; i<n; i++)
        *sum += a[i*stride];
    }

    int main() {
      // initialize
      float *a = malloc(n*n*sizeof(float));
      float sums[n];
      // the following line is not going to be valid, but I want something like this:
      compute_seq(sums[:],*(a[0:n:1]));
    }

Any comments are appreciated.

Here is a corrected version of the example.

    __attribute__((vector(linear(sum),linear(a))))
    inline void compute_seq(float *sum, float* a) {
      int i;
      *sum = 0.0f;
      for(i=0; i<n; i++)
        *sum += a[i*stride];
    }

    int main() {
      // initialize
      float *a = malloc(n*n*sizeof(float));
      float sums[n];
      compute_seq(&sums[:],&a[0:n:n]);
    }

The important change is at the call site. The expression &sums[:] creates an array section consisting of &sums[0], &sums[1], &sums[2], ... &sums[n-1]. The expression &a[0:n:n] creates an array section consisting of &a[0*n], &a[1*n], &a[2*n], ... &a[(n-1)*n].
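To make the section semantics concrete, here is a minimal sketch in plain C of the scalar loop that such an elemental call stands for. The helper call_over_section and the extra n, stride, and section_stride parameters are hypothetical, introduced only for illustration; they are not part of the array notation itself, and a vectorizing compiler would run several values of k at once rather than one after another.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar version of compute_seq, with the macros n and stride turned into
   parameters (an assumption made so the sketch works for any size). */
static void compute_seq(float *sum, const float *a, int n, int stride)
{
    *sum = 0.0f;
    for (int i = 0; i < n; i++)
        *sum += a[i * stride];
}

/* Hypothetical helper: the scalar loop behind a call of the form
   compute_seq(&sums[0:count], &a[0:count:section_stride]).
   Element k of the first section is &sums[k]; element k of the second
   section is &a[k * section_stride]. */
static void call_over_section(float *sums, const float *a, int count,
                              int section_stride, int n, int stride)
{
    assert(count >= 0 && section_stride > 0);
    for (int k = 0; k < count; k++)
        compute_seq(&sums[k], &a[(size_t)k * section_stride], n, stride);
}
```

For a 4x4 row-major matrix, call_over_section(sums, a, 4, 1, 4, 4) starts each call at the top of a column and steps down it, giving column sums, while call_over_section(sums, a, 4, 4, 4, 1) starts each call at the beginning of a row and gives row sums. The section stride and the stride inside the function have to be chosen together so that every access stays inside the allocated buffer.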

I added two linear clauses to the vector attribute to tell the compiler to generate a clone optimized for the case where the arguments are arithmetic sequences, as they are in this example. For this example, they (and the vector attribute) are redundant, since the compiler can see both the callee and the call site in the same translation unit and figure out the particulars for itself. But if compute_seq were defined in another translation unit, the attribute might help.
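For readers without a Cilk Plus compiler, roughly the same hint can be written with OpenMP 4.0 SIMD directives. This is a swapped-in technique, not the vector attribute from the answer, and the names column_sum and all_column_sums are hypothetical. The sketch assumes unit-stride arguments (consecutive lanes receive &sums[k] and &a[k]), which matches linear(...:1) and computes column sums of a row-major n-by-n matrix. The pragmas are guarded so that the code still builds as ordinary scalar C when OpenMP is not enabled.

```c
#include <assert.h>

/* Request a SIMD clone in which sum and a advance by one element per lane
   and n and stride are the same in every lane. */
#ifdef _OPENMP
#pragma omp declare simd linear(sum:1) linear(a:1) uniform(n, stride)
#endif
void column_sum(float *sum, const float *a, int n, int stride)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += a[i * stride];
    *sum = s;
}

/* Caller: one call per column. With OpenMP enabled, the simd directive
   lets the compiler use the clone declared above. */
void all_column_sums(float *sums, const float *a, int n)
{
    assert(n > 0);
#ifdef _OPENMP
#pragma omp simd
#endif
    for (int k = 0; k < n; k++)
        column_sum(&sums[k], &a[k], n, n);
}
```

As with the vector attribute, the declaration matters most when column_sum lives in a different translation unit from its caller; in that case the pragma would go on the prototype in a shared header so that both sides agree on the clone.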

Array notation is a work in progress. icc 14.0 beta compiled my example for Intel(R) Xeon Phi(TM) without complaint. icc 13.0 update 3 reported that it couldn't vectorize the function ("dereference too complex"). Perversely, leaving the vector attribute off shut up the report, probably because the compiler can then vectorize the code after inlining.

I use the compiler option "-opt-assume-safe-padding" when compiling for Intel(R) Xeon Phi(TM). It may improve vector code quality. It lets the compiler assume that the page beyond any accessed address is safe to touch, thus enabling certain instruction sequences that would otherwise be disallowed.

