python - Iterate over all pairwise combinations of numpy array columns
I have a numpy array of shape arr.shape = (200, 600, 20). I want to compute scipy.stats.kendalltau on every pairwise combination of the last two dimensions. For example:
    kendalltau(arr[:, 0, 0], arr[:, 1, 0])
    kendalltau(arr[:, 0, 0], arr[:, 1, 1])
    kendalltau(arr[:, 0, 0], arr[:, 1, 2])
    ...
    kendalltau(arr[:, 0, 0], arr[:, 2, 0])
    kendalltau(arr[:, 0, 0], arr[:, 2, 1])
    kendalltau(arr[:, 0, 0], arr[:, 2, 2])
    ...
    ...
    kendalltau(arr[:, 598, 19], arr[:, 599, 19])
such that I cover all combinations of arr[:, i, xi] with arr[:, j, xj] for i < j, xi in [0, 20), and xj in [0, 20). This is (600 choose 2) * 400 individual calculations, but since each takes about 0.002 s on my machine, it shouldn't take much longer than a day with the multiprocessing module.
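As a quick sanity check of that estimate, the arithmetic can be written out. The 0.002 s per call is the figure quoted above; the worker count of 4 is purely an assumed example value, and the parallel figure ignores any multiprocessing overhead.

```python
import math

n_groups, n_x = 600, 20          # sizes of the last two dimensions
seconds_per_call = 0.002         # per-call timing quoted in the question
n_workers = 4                    # assumption: hypothetical number of processes

# Pairs with i < j, times every (xi, xj) combination.
n_calls = math.comb(n_groups, 2) * n_x * n_x
serial_hours = n_calls * seconds_per_call / 3600
parallel_hours = serial_hours / n_workers      # idealized, ignores overhead

print(n_calls)                   # 71880000
print(round(serial_hours, 1))    # 39.9
print(round(parallel_hours, 1))  # 10.0
```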
What's the best way to go about iterating over these columns (with i < j)? I figure I should avoid something like

    for i in range(600):
        for j in range(i + 1, 600):
            for xi in range(20):
                for xj in range(20):
                    kendalltau(arr[:, i, xi], arr[:, j, xj])

What is the most numpythonic way of doing this?
Edit: I changed the title since Kendall tau isn't important to the question. I realize I could also do something like

    import itertools as it

    for i, j in it.combinations(range(600), 2):
        for xi, xj in it.product(range(20), range(20)):
            kendalltau(arr[:, i, xi], arr[:, j, xj])

but there's got to be a better, more vectorized way with numpy.
The general way of vectorizing something like this is to use broadcasting to create the Cartesian product of the set with itself. In your case you have an array arr of shape (200, 600, 20), so you would take two views of it:

    arr_x = arr[:, :, np.newaxis, np.newaxis, :]  # shape (200, 600, 1, 1, 20)
    arr_y = arr[np.newaxis, np.newaxis, :, :, :]  # shape (1, 1, 200, 600, 20)
The above two lines have been expanded for clarity, but I would normally write the equivalent:

    arr_x = arr[:, :, None, None]
    arr_y = arr
If you have a vectorized function, f, that does broadcasting on all but the last dimension, you could then do:

    out = f(arr[:, :, None, None], arr)

and out would be an array of shape (200, 600, 200, 600), with out[i, j, k, l] holding the value of f(arr[i, j], arr[k, l]). For instance, if you wanted to compute all the pairwise inner products, you could do:

    from numpy.core.umath_tests import inner1d
    out = inner1d(arr[:, :, None, None], arr)
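As a minimal sketch of the same broadcasting pattern, the example below uses a much smaller array (the shape (4, 5, 3) is an arbitrary stand-in, since the full (200, 600, 200, 600) result would hold about 1.44e10 values). It uses np.einsum in place of inner1d, which lives in an internal NumPy module and may not be importable in every NumPy version.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((4, 5, 3))   # small stand-in for the (200, 600, 20) array

# Broadcasting: (4, 5, 1, 1, 3) * (4, 5, 3) -> (4, 5, 4, 5, 3), then sum the last axis.
out_broadcast = (arr[:, :, None, None] * arr).sum(axis=-1)

# Same pairwise inner products without building the 5-D intermediate.
out_einsum = np.einsum('ijn,kln->ijkl', arr, arr)

print(out_einsum.shape)   # (4, 5, 4, 5)
```

Here out_einsum[i, j, k, l] should equal np.dot(arr[i, j], arr[k, l]). Note that in the question the vectors lie along axis 0 rather than the last axis, so something like np.moveaxis(arr, 0, -1) would be needed first to apply this pattern there.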
Unfortunately, scipy.stats.kendalltau is not vectorized like this. According to the docs, "If arrays are not 1-D, they will be flattened to 1-D." So you cannot go about it like this, and you are going to wind up doing Python nested loops, be it explicitly writing them out, using itertools, or disguising it under np.vectorize. That's going to be slow, because of the iteration on Python variables, and because you have a Python function call per iteration step, both of which are expensive actions.
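A minimal sketch of that explicit loop, written against the layout from the question (vectors along axis 0, pairs taken over the last two axes with i < j). The helper name pairwise_apply and the stand-in statistic pearson are hypothetical; with SciPy installed, one could presumably pass lambda a, b: scipy.stats.kendalltau(a, b)[0] instead.

```python
import itertools as it
import numpy as np

def pairwise_apply(arr, func):
    """Apply func(a, b) -> float to every pair arr[:, i, xi], arr[:, j, xj] with i < j.

    Returns an array of shape (n_pairs, n_x, n_x) and the list of (i, j) pairs.
    """
    _, n_groups, n_x = arr.shape
    pairs = list(it.combinations(range(n_groups), 2))
    out = np.empty((len(pairs), n_x, n_x))
    for p, (i, j) in enumerate(pairs):
        for xi, xj in it.product(range(n_x), repeat=2):
            out[p, xi, xj] = func(arr[:, i, xi], arr[:, j, xj])
    return out, pairs

def pearson(a, b):
    # Stand-in scalar statistic so the sketch only needs NumPy.
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(0)
small = rng.standard_normal((50, 4, 3))   # small stand-in for (200, 600, 20)
result, pairs = pairwise_apply(small, pearson)
print(result.shape)   # (6, 3, 3)
```

Every call to func here is a separate Python function call, which is exactly the per-iteration cost described above.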
Do note that, when you can go the vectorized way, there is an obvious drawback: if your function is commutative, i.e. if f(a, b) == f(b, a), then you are doing twice the computations needed. Depending on how expensive your actual computation is, this may very well offset the increase in speed from not having any Python loops or function calls.
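One way to avoid that duplicated work, sketched here under the assumption that only pairs with i < j are needed (as in the question), is to select those pairs with np.triu_indices before broadcasting. The inner product again stands in for a generic vectorized f, and the array shape is an arbitrary small example.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.standard_normal((50, 4, 3))        # (n_obs, n_groups, n_x), small stand-in

cols = np.moveaxis(arr, 0, -1)               # (n_groups, n_x, n_obs): vectors on the last axis
i, j = np.triu_indices(cols.shape[0], k=1)   # all index pairs with i < j

a = cols[i]                                  # (n_pairs, n_x, n_obs)
b = cols[j]                                  # (n_pairs, n_x, n_obs)

# out[p, xi, xj] = inner product of arr[:, i[p], xi] and arr[:, j[p], xj]
out = np.einsum('pxn,pyn->pxy', a, b)
print(out.shape)   # (6, 3, 3)
```

The fancy indexing copies the selected columns, so this trades some memory for skipping the i >= j half of the computation.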