Wednesday, 20 January 2021

How does the numpy function `array_split` work mathematically?

I need to write a Python function that, when passed an array and an integer N, returns the contents of the array divided into N sub-arrays of equal size.

If the length of the array cannot be divided equally by N, the final sub-arrays must be of suitable length to accommodate the remaining elements.

Example: split_array(array=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n=4)

Should output: [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
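For reference, here is a minimal pure-Python sketch of the behaviour described above. It is an assumption about what such a function could look like, not numpy's implementation; the name `split_array` and its parameters are taken from the example call, and the rule that the earlier sub-arrays take the extra elements is inferred from the expected output.

```python
def split_array(array, n):
    """Split `array` into `n` consecutive chunks whose lengths differ by at most one."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    quotient, remainder = divmod(len(array), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `remainder` chunks take one extra element each.
        size = quotient + 1 if i < remainder else quotient
        chunks.append(array[start:start + size])
        start += size
    return chunks


print(split_array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4))
# [[1, 2, 3], [4, 5, 6], [7, 8], [9, 10]]
```

If n is larger than the array length, this sketch returns empty lists for the trailing chunks.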

My research indicated that the numpy.array_split function does exactly that. I looked at the source code on GitHub and found that it first builds an array containing the sizes of all the sub-arrays, which it then iterates over to split the original array.

Abridged sample from numpy.array_split

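import numpy as _nx  # stand-in for numpy's internal `_nx` alias so the excerpt runs on its own

def array_split(ary, indices_or_sections, axis=0):
    # Length along the split axis (this line was cut from the abridged excerpt).
    Ntotal = _nx.shape(ary)[axis]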
    # indices_or_sections is a scalar, not an array.
    Nsections = int(indices_or_sections)
    if Nsections <= 0:
        raise ValueError('number sections must be larger than 0.')
    Neach_section, extras = divmod(Ntotal, Nsections)
    section_sizes = ([0] +
                     extras * [Neach_section+1] +
                     (Nsections-extras) * [Neach_section])
    div_points = _nx.array(section_sizes, dtype=_nx.intp).cumsum()

    sub_arys = []
    sary = _nx.swapaxes(ary, axis, 0)
    for i in range(Nsections):
        st = div_points[i]
        end = div_points[i + 1]
        sub_arys.append(_nx.swapaxes(sary[st:end], axis, 0))

    return sub_arys
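To make the excerpt concrete, the sketch below traces the same steps for the example (10 elements, 4 sections). As an assumption for illustration, `itertools.accumulate` stands in for numpy's `cumsum` so that the snippet needs only the standard library; the name `bounds` is hypothetical.

```python
from itertools import accumulate

Ntotal, Nsections = 10, 4
Neach_section, extras = divmod(Ntotal, Nsections)

# Leading 0 so that the running total starts at index 0.
section_sizes = ([0] +
                 extras * [Neach_section + 1] +
                 (Nsections - extras) * [Neach_section])
print(section_sizes)  # [0, 3, 3, 2, 2]

# Running total of the sizes gives the split points (stand-in for cumsum).
div_points = list(accumulate(section_sizes))
print(div_points)  # [0, 3, 6, 8, 10]

# Consecutive pairs of split points are the (start, end) slice bounds.
bounds = list(zip(div_points[:-1], div_points[1:]))
print(bounds)  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```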

The only thing I'm struggling to understand is how the variable section_sizes is created mathematically. For the example split_array(array=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n=4), it builds the list [0, 3, 3, 2, 2], i.e. sizes of [3, 3, 2, 2] after the leading 0, which is exactly what I need, but I don't understand why it works.

I understand that divmod(Ntotal, Nsections) gives you the quotient (Neach_section) and the remainder (extras) of the division.

But why does extras * [Neach_section+1], i.e. remainder copies of quotient + 1, always give you exactly the right number of larger sub-array sizes (in this example, [3, 3])?

And why does (Nsections-extras) * [Neach_section], i.e. Nsections minus remainder copies of the quotient, give you exactly the right number of smaller sub-array sizes (in this example, [2, 2])?
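One way to look at it: divmod guarantees Ntotal = Neach_section * Nsections + extras, with 0 <= extras < Nsections. Giving one leftover element to each of the first `extras` sections regroups the same total as extras * (Neach_section + 1) + (Nsections - extras) * Neach_section, so the list has Nsections entries that sum to Ntotal. The sketch below checks this by brute force over a range of inputs; the helper name `section_sizes` is hypothetical and omits the leading 0 used in the excerpt.

```python
def section_sizes(total, n):
    # Same construction as in the excerpt, without the leading 0.
    quotient, remainder = divmod(total, n)
    return remainder * [quotient + 1] + (n - remainder) * [quotient]


for total in range(0, 50):
    for n in range(1, 20):
        sizes = section_sizes(total, n)
        assert len(sizes) == n                 # one size per section
        assert sum(sizes) == total             # every element is placed exactly once
        assert max(sizes) - min(sizes) <= 1    # sizes differ by at most one

print(section_sizes(10, 4))  # [3, 3, 2, 2]
```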

Could someone even just tell me what this kind of operation is called, or what branch of mathematics it belongs to? It's not something I've come across before.



