Short description
I want to walk along a 2D NumPy array, starting from several rows and moving from each one in a specified direction (either 1 or -1), until the value in another column changes (see below).
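To make the intended behaviour concrete, here is a minimal sketch on a toy array. It assumes each start row is walked independently and that the walk stops at the array edges; `walk_one` and `toy` are hypothetical names used only for illustration, and the column layout (id, metadata, direction) mirrors the dataset generated in the next section.

```python
import numpy as np

def walk_one(mat, start):
    """Return the last row index reached when walking from `start`
    in the direction stored in column 2 while column 1 stays the same."""
    value = mat[start, 1]   # metadata value that must not change
    step = mat[start, 2]    # direction: 1 or -1
    pos = start
    # Check the bounds explicitly so negative indices never wrap around.
    while 0 <= pos + step < len(mat) and mat[pos + step, 1] == value:
        pos += step
    return int(pos)

# Toy data: columns are id, metadata, direction.
toy = np.array([
    [0, 5,  1],
    [0, 5, -1],
    [0, 5,  1],
    [0, 7, -1],
    [0, 7,  1],
    [1, 7, -1],
])

ends = [walk_one(toy, s) for s in (0, 4, 1)]
print(ends)  # [2, 5, 0]
```

Row 0 walks forward to row 2 (the last row with metadata 5), row 4 walks forward to the end of the array, and row 1 walks backward to row 0.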
Current code
First let's generate a dataset:
import numpy as np

# Generate a big random dataset with three columns:
# column 0 is an id, column 1 is the metadata value, column 2 is the direction (1 or -1)
np.random.seed(123)
c1 = np.random.randint(0, 100, size=1000000)
c2 = np.random.randint(0, 20, size=1000000)
c3 = np.random.choice([1, -1], 1000000)
m = np.vstack((c1, c2, c3)).T
m = m[m[:, 0].argsort()]  # sort the rows by id
Then I wrote the following code that starts at specific rows in the matrix (start_points), then keeps extending in the specified direction (direction_array) until the metadata changes:
def walk(mat, start_array):
    start_mat = mat[start_array]
    metadata = start_mat[:, 1]
    direction_array = start_mat[:, 2]
    walk_array = start_array
    while True:
        # advance every start point one step in its own direction
        walk_array = np.add(walk_array, direction_array)
        try:
            walk_mat = mat[walk_array]
            walk_metadata = walk_mat[:, 1]
            # stop as soon as the sorted metadata values differ from the starting ones
            if sorted(metadata) != sorted(walk_metadata):
                raise IndexError
        except IndexError:
            # step back once to the rows reached before the change
            return start_mat, mat[walk_array + (direction_array * -1)]
import time

s = time.time()
for i in range(100000):
    start_points = np.random.randint(0, 1000000, size=3)
    res = walk(m, start_points)
print(time.time() - s)  # elapsed time in seconds
Question
While the above code works fine, I think there must be an easier and more elegant way to walk along a 2D NumPy array from different start points until the value of another column changes. For example, the current approach slices the input array at every step of the while loop, which seems quite inefficient (especially when I have to run walk millions of times).
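One possible direction, sketched here under assumptions rather than offered as a definitive answer: if each start row may be walked independently (unlike the code above, which stops all walkers together), the end of every walk is simply the boundary of the run of equal metadata values that contains the start row. Those boundaries can be computed once for the whole array and then looked up with plain indexing. The names `run_bounds` and `walk_vectorized` are hypothetical, and the sketch assumes a non-empty array.

```python
import numpy as np

def run_bounds(values):
    """For every index, return the first and last index of the run of
    equal consecutive values that contains it (assumes len(values) > 0)."""
    values = np.asarray(values)
    n = len(values)
    change = np.empty(n, dtype=bool)
    change[0] = True                         # the first element starts a run
    change[1:] = values[1:] != values[:-1]   # True where a new run begins
    starts = np.flatnonzero(change)          # first index of each run
    ends = np.append(starts[1:], n) - 1      # last index of each run
    run_id = np.cumsum(change) - 1           # run number of every row
    return starts[run_id], ends[run_id]

def walk_vectorized(mat, start_array, first, last):
    """Look up where each walk ends, without stepping row by row."""
    start_array = np.asarray(start_array)
    direction = mat[start_array, 2]
    # Direction 1 ends at the last row of the run, -1 at its first row.
    end_idx = np.where(direction == 1, last[start_array], first[start_array])
    return mat[start_array], mat[end_idx]

# Toy data: columns are id, metadata, direction.
toy = np.array([
    [0, 5,  1],
    [0, 5, -1],
    [0, 5,  1],
    [0, 7, -1],
    [0, 7,  1],
    [1, 7, -1],
])

first, last = run_bounds(toy[:, 1])          # computed once per matrix
start_rows, end_rows = walk_vectorized(toy, [0, 4, 1], first, last)
```

With this layout the per-call cost no longer depends on how far each walk extends, because `first` and `last` are reused across all calls. Whether the independent-walker semantics match what is needed here is an open assumption.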
from Walk along 2D numpy array as long as values remain the same