python - Iterate over all pairwise combinations of numpy array columns


I have a numpy array of shape

    arr.shape = (200, 600, 20)

I want to compute scipy.stats.kendalltau on every pairwise combination of the last two dimensions. For example:

    kendalltau(arr[:, 0, 0], arr[:, 1, 0])
    kendalltau(arr[:, 0, 0], arr[:, 1, 1])
    kendalltau(arr[:, 0, 0], arr[:, 1, 2])
    ...
    kendalltau(arr[:, 0, 0], arr[:, 2, 0])
    kendalltau(arr[:, 0, 0], arr[:, 2, 1])
    kendalltau(arr[:, 0, 0], arr[:, 2, 2])
    ...
    ...
    kendalltau(arr[:, 598, 19], arr[:, 599, 19])

such that I cover all combinations of arr[:, i, xi] with arr[:, j, xj], where i < j, xi in [0, 20) and xj in [0, 20). This is (600 choose 2) * 400 individual calculations, but since each takes about 0.002 s on my machine, it shouldn't take much longer than a day with the multiprocessing module.
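As a rough sanity check of that estimate, the count can be worked out directly. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and the 0.002 s per call is the timing quoted above, not a measurement.

```python
from math import comb

n_series = 600             # size of the middle axis
n_variants = 20            # size of the last axis
seconds_per_call = 0.002   # per-call timing quoted in the question (assumed)

n_pairs = comb(n_series, 2)             # unordered pairs with i < j
n_calls = n_pairs * n_variants ** 2     # every (xi, xj) combination per pair
hours_single_core = n_calls * seconds_per_call / 3600

print(n_pairs, n_calls, round(hours_single_core, 1))
```

That is roughly 72 million calls, or about 40 hours on a single core, so spreading the work over a few processes would plausibly bring it under a day.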

What's the best way to go about iterating over these columns (with i < j)? I figure I should avoid something like

    for i in range(600):
        for j in range(i + 1, 600):
            for xi in range(20):
                for xj in range(20):
                    kendalltau(arr[:, i, xi], arr[:, j, xj])

What is the numpythonic way of doing this?

Edit: I changed the title since Kendall tau isn't really important to the question. I realize I could also do something like

    import itertools as it

    for i, j in it.combinations(range(600), 2):
        for xi, xj in it.product(range(20), range(20)):
            kendalltau(arr[:, i, xi], arr[:, j, xj])

but there's got to be a better, more vectorized way with numpy.

The general way of vectorizing something like this is to use broadcasting to create the cartesian product of the set with itself. In your case you have an array arr of shape (200, 600, 20), so you would take two views of it:

    import numpy as np

    arr_x = arr[:, :, np.newaxis, np.newaxis, :]  # shape (200, 600, 1, 1, 20)
    arr_y = arr[np.newaxis, np.newaxis, :, :, :]  # shape (1, 1, 200, 600, 20)

The above two lines have been expanded for clarity, but I would normally write the equivalent:

    arr_x = arr[:, :, None, None]
    arr_y = arr
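To see that the short and the expanded forms line up, here is a minimal sketch on a small stand-in array (the shape (4, 5, 3) is an arbitrary choice so that nothing large gets allocated). np.broadcast only computes the combined shape; it does not materialize the product.

```python
import numpy as np

arr = np.arange(4 * 5 * 3, dtype=float).reshape(4, 5, 3)  # small stand-in

# Short forms: trailing axes are kept, leading axes are added by broadcasting.
arr_x = arr[:, :, None, None]   # shape (4, 5, 1, 1, 3)
arr_y = arr                     # shape (4, 5, 3), broadcast as (1, 1, 4, 5, 3)

# Expanded forms, spelled out with np.newaxis.
explicit_x = arr[:, :, np.newaxis, np.newaxis, :]
explicit_y = arr[np.newaxis, np.newaxis, :, :, :]

# Shape of the full cartesian product once the two views are combined.
pair_shape = np.broadcast(arr_x, arr_y).shape
print(arr_x.shape, explicit_y.shape, pair_shape)
```

Both views share memory with arr, so creating them is cheap. The cost only appears when an operation actually combines them into the full (4, 5, 4, 5, 3) result.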

If you have a vectorized function, f, that broadcasts over all but the last dimension, you could then do:

    out = f(arr[:, :, None, None], arr)

and then out would be an array of shape (200, 600, 200, 600), with out[i, j, k, l] holding the value of f(arr[i, j], arr[k, l]). For instance, if you wanted to compute all the pairwise inner products, you could do:

    # inner1d lives in a NumPy-internal test module and may be
    # unavailable in newer releases
    from numpy.core.umath_tests import inner1d

    out = inner1d(arr[:, :, None, None], arr)
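If inner1d is not importable in your NumPy version, the same pairwise inner products can be sketched with np.einsum, which is part of the public API. This is an alternative to the call above rather than the same function, and the small (4, 5, 3) array is just a stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(size=(4, 5, 3))  # small stand-in

# Sum over the shared last axis m for every (i, j) and (k, l).
out = np.einsum('ijm,klm->ijkl', arr, arr)

# Same result via plain broadcasting, at the cost of a
# (4, 5, 4, 5, 3) temporary array.
out_broadcast = (arr[:, :, None, None] * arr).sum(axis=-1)

print(out.shape)
```

Here out[i, j, k, l] is the inner product of arr[i, j] and arr[k, l]. If the vectors to compare lie along axis 0 instead, as in the question, one option is to move that axis to the end first with np.moveaxis(arr, 0, -1).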

Unfortunately scipy.stats.kendalltau is not vectorized like this. According to the docs

"If arrays are not 1-D, they will be flattened to 1-D."

So you cannot go about it like this, and you are going to wind up doing Python nested loops, be it explicitly writing them out, using itertools or disguising it under np.vectorize. That's going to be slow, because of the iteration on Python variables, and because you have a Python function call per iteration step, which are both expensive actions.
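For completeness, the explicit loop could look something like the sketch below. The helper name pairwise_stat and the result layout (a dict keyed by (i, j)) are illustrative choices, not anything prescribed by NumPy or SciPy. A Pearson correlation stands in as the statistic so that the block needs only NumPy.

```python
import itertools as it
import numpy as np

def pairwise_stat(arr, stat):
    """Apply stat(a, b) to arr[:, i, xi] and arr[:, j, xj] for all i < j.

    Returns a dict mapping (i, j) to an (n_x, n_x) array of results.
    """
    _, n_i, n_x = arr.shape
    results = {}
    for i, j in it.combinations(range(n_i), 2):
        block = np.empty((n_x, n_x))
        for xi, xj in it.product(range(n_x), repeat=2):
            block[xi, xj] = stat(arr[:, i, xi], arr[:, j, xj])
        results[i, j] = block
    return results

def pearson(a, b):
    # Stand-in statistic; with SciPy installed one could pass
    # lambda a, b: scipy.stats.kendalltau(a, b)[0] instead.
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
small = rng.normal(size=(50, 4, 3))  # toy stand-in for the (200, 600, 20) array
res = pairwise_stat(small, pearson)
print(len(res), res[0, 1].shape)
```

Since each (i, j) block is independent of the others, the outer loop is the natural place to split the work across processes, for example with multiprocessing.Pool.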

Do note that, even when you can go the vectorized way, there is an obvious drawback: if your function is commutative, i.e. if f(a, b) == f(b, a), then you are doing twice the computations needed. Depending on how expensive your actual computation is, this is very often offset by the increase in speed from not having any Python loops or function calls.
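When the duplicated work does matter, one possible way to visit each unordered pair only once is to index with np.triu_indices. The sketch below uses the inner product as the commutative function and flattens the leading axes into rows; both are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.normal(size=(4, 5, 3))        # small stand-in
rows = arr.reshape(-1, arr.shape[-1])   # (20, 3): one row per leading index pair

# Indices of the strict upper triangle: each unordered pair exactly once.
p, q = np.triu_indices(rows.shape[0], k=1)

# Row-wise inner products for the selected pairs only.
upper = np.einsum('nm,nm->n', rows[p], rows[q])

# All ordered pairs, computed only to compare against.
full = rows @ rows.T
print(upper.shape, full.shape)
```

The trade-off is that rows[p] and rows[q] are copies made by fancy indexing, so this saves arithmetic but not necessarily memory.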

