Sunday, 29 November 2020

Walk along 2D numpy array as long as values remain the same

Short description
I want to walk along a 2D numpy array, starting from different points and moving in a specified direction (either 1 or -1), until the value in another column changes (see below).

Current code

First let's generate a dataset:

import numpy as np

# Generate big random dataset
# Columns: id, a number (the metadata), and a direction (1 or -1)
np.random.seed(123)
c1 = np.random.randint(0, 100, size=1000000)
c2 = np.random.randint(0, 20, size=1000000)
c3 = np.random.choice([1, -1], 1000000)
m = np.vstack((c1, c2, c3)).T
# Sort the rows by id
m = m[m[:, 0].argsort()]

Then I wrote the following code. It starts at specific rows in the matrix (start_points) and keeps stepping in the direction given by the third column (direction_array) until the metadata (second column) changes:

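def walk(mat, start_array):
    start_mat       = mat[start_array]
    metadata        = start_mat[:, 1]
    direction_array = start_mat[:, 2]
    walk_array      = start_array

    while True:
        # Advance every start point one step in its own direction
        walk_array = np.add(walk_array, direction_array)
        try:
            # Indexing past the last row raises IndexError
            # (note that small negative indices wrap around instead)
            walk_mat = mat[walk_array]
            walk_metadata = walk_mat[:, 1]
            # Stop as soon as the metadata no longer matches the starting values
            if sorted(metadata) != sorted(walk_metadata):
                raise IndexError
        except IndexError:
            # Step back once to the last rows that still matched
            return start_mat, mat[walk_array - direction_array]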
            
import time

s = time.time()
for i in range(100000):
    start_points = np.random.randint(0, 1000000, size=3)
    res = walk(m, start_points)
print(time.time() - s)  # elapsed seconds

Question
While the above code works fine, I think there must be an easier or more elegant way to walk along a 2D numpy array from different start points until the value of another column changes. For example, the current approach requires me to index the input array at every step of the while loop, which seems quite inefficient (especially when I have to run walk millions of times).
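One possible direction, offered only as a rough sketch and not as a drop-in replacement: precompute, once, the first and last row of every run of equal values in the metadata column, so that each walk becomes a lookup instead of a loop. The helper names run_bounds and walk_independent are hypothetical. The sketch also assumes that each start point walks independently of the others, which differs from walk above, where all start points advance in lockstep and stop together.

```python
import numpy as np

def run_bounds(values):
    """For every index, return the first and last index of its run of equal values.

    Assumes `values` is a non-empty 1D array.
    """
    values = np.asarray(values)
    n = len(values)
    # True wherever a new run of equal values begins
    change = np.empty(n, dtype=bool)
    change[0] = True
    change[1:] = values[1:] != values[:-1]
    starts = np.flatnonzero(change)          # first index of each run
    ends = np.append(starts[1:] - 1, n - 1)  # last index of each run
    run_id = np.cumsum(change) - 1           # run number of every index
    return starts[run_id], ends[run_id]

def walk_independent(mat, first, last, start_array):
    """Look up where each start point ends (hypothetical helper).

    Assumption: every start point walks on its own until the value
    in column 1 changes, in the direction stored in column 2.
    """
    direction = mat[start_array, 2]
    # Direction 1 walks to the end of the run, -1 to its beginning
    end_idx = np.where(direction == 1, last[start_array], first[start_array])
    return mat[start_array], mat[end_idx]

# Small demonstration with the same column layout as m
demo = np.array([[0, 5,  1],
                 [0, 5, -1],
                 [0, 5,  1],
                 [1, 2, -1],
                 [1, 2,  1],
                 [2, 7, -1]])
first, last = run_bounds(demo[:, 1])  # computed once, reused for every call
start_rows, end_rows = walk_independent(demo, first, last, np.array([1, 3, 0]))
```

With the bounds computed once for m, each later call only performs a few fancy-indexing lookups, whatever the length of the runs.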


