Tuesday, 28 May 2019

Python difference between mutating and re-assigning a list ( _list = and _list[:] = )

So I frequently write code following a pattern like this:

_list = list(range(10)) # Or whatever
_list = [some_function(x) for x in _list]
_list = [some_other_function(x) for x in _list]

etc.

I recently saw, in a comment on a different question, an explanation that this approach creates a new list each time, and that it is better to mutate the existing list, like so:

_list[:] = [some_function(x) for x in _list]
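
A quick id() check makes the distinction concrete: slice assignment keeps the same list object, while plain assignment binds the name to a brand-new one:

_list = list(range(10))
print(id(_list))

_list[:] = [x * 10 for x in _list]
print(id(_list))  # same id: the existing list was mutated in place

_list = [x * 10 for x in _list]
print(id(_list))  # different id: the name now refers to a new list object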

It's the first time I've seen this explicit recommendation and I'm wondering what the implications are:

1) Does the mutation save memory? Presumably the reference count of the "old" list drops to zero after a re-assignment, at which point the "old" list is discarded, but is there a delay before that happens during which I'm potentially using more memory than I need when I re-assign instead of mutating? (The first sketch after these questions pokes at this.)

2) Is there a computational cost to using mutation? I suspect changing something in place is more expensive than creating a new list and just dropping the old one. (The second sketch below tries to measure this.)
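
On question 1, one way to watch the old list die in CPython is a weak reference. Plain lists don't support weakrefs, so the trivial subclass below is something I'm introducing purely for this check:

import weakref

class TrackedList(list):
    pass  # subclass only so a weak reference can be attached; plain lists can't have one

_list = TrackedList(range(10))
old_ref = weakref.ref(_list)

# While the comprehension runs, old and new lists briefly coexist,
# because the iterator keeps the old list alive.
_list = [x * 10 for x in _list]

# In CPython the rebinding dropped the old list's refcount to zero,
# so it was freed immediately and the weak reference is now dead:
print(old_ref())  # None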
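
On question 2, a rough timing comparison can be done with timeit. The identity transform keeps the list contents stable across repetitions; the absolute numbers will of course vary by machine and Python version:

import timeit

setup = "lst = list(range(100000))"

print("re-assign:", timeit.timeit("lst = [x for x in lst]", setup=setup, number=100))
print("mutate:   ", timeit.timeit("lst[:] = [x for x in lst]", setup=setup, number=100))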

In terms of safety, I wrote a script to test this:

def some_function(number: int):
    return number*10

def main():
    _list1 = list(range(10))
    _list2 = list(range(10))

    a = _list1  # keep extra references to the original list objects
    b = _list2

    _list1 = [some_function(x) for x in _list1]     # re-assignment: _list1 now names a new list
    _list2[:] = [some_function(x) for x in _list2]  # mutation: contents replaced in place

    print(f"list a: {a}")
    print(f"list b: {b}")


if __name__ == "__main__":
    main()

Which outputs:

list a: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list b: [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

So mutation does seem to have the drawback of being more likely to cause side effects, although these might sometimes be desirable (see the sketch below). Are there any PEPs that discuss this safety aspect, or other best-practice guides?
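
For illustration, the classic case where the side effect is the whole point is a function meant to change the caller's list; the function names here are hypothetical:

def scale_in_place(values: list, factor: int):
    values[:] = [v * factor for v in values]  # the caller's list object is changed

def scaled_copy(values: list, factor: int):
    return [v * factor for v in values]  # the caller's list is left untouched

data = [1, 2, 3]
scale_in_place(data, 10)
print(data)                   # [10, 20, 30]
print(scaled_copy(data, 10))  # [100, 200, 300]
print(data)                   # still [10, 20, 30]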

Thank you.

EDIT: Conflicting answers, so more tests on memory. I have received two conflicting answers so far. In the comments, jasonharper has written that the right-hand side of an assignment does not know about the left-hand side, and therefore memory usage cannot possibly be affected by what appears on the left. However, in the answers, Masoud has written that "when [reassignment] is used, two new and old _lists with two different identities and values are created. Afterward, old _list is garbage collected. But when a container is mutated, every single value is retrieved, changed in CPU and updated one-by-one. So the list is not duplicated." This seems to indicate that there is a big memory cost to re-assignment.
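
jasonharper's point is easy to check with dis: in both statements CPython builds the complete new list on the right before anything is stored, and only the final store step differs:

import dis

# Re-assignment: the list is fully built, then a single STORE_NAME.
dis.dis("_list = [x * 10 for x in _list]")

# Mutation: the list is fully built, then BUILD_SLICE + STORE_SUBSCR copies it in.
dis.dis("_list[:] = [x * 10 for x in _list]")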

I decided to try using memory-profiler to dig deeper. Here is the test script:

from memory_profiler import profile


def normalise_number(number: int):
    return number%1000


def change_to_string(number: int):
    return "Number as a string: " + str(number) + "something" * number


def average_word_length(string: str):
    return len(string)/len(string.split())


@profile(precision=8)
def mutate_list(_list):
    _list[:] = [normalise_number(x) for x in _list]
    _list[:] = [change_to_string(x) for x in _list]
    _list[:] = [average_word_length(x) for x in _list]


@profile(precision=8)
def replace_list(_list):
    _list = [normalise_number(x) for x in _list]
    _list = [change_to_string(x) for x in _list]
    _list = [average_word_length(x) for x in _list]
    return _list


def main():
    _list1 = list(range(1000))
    mutate_list(_list1)

    _list2 = list(range(1000))
    _list2 = replace_list(_list2)

if __name__ == "__main__":
    main()

Please note that I am aware that, e.g., the average-word-length function isn't particularly well written; it's just there for testing's sake.

Here are the results:

Line #    Mem usage    Increment   Line Contents
================================================
    16  32.17968750 MiB  32.17968750 MiB   @profile(precision=8)
    17                             def mutate_list(_list):
    18  32.17968750 MiB   0.00000000 MiB       _list[:] = [normalise_number(x) for x in _list]
    19  39.01953125 MiB   0.25781250 MiB       _list[:] = [change_to_string(x) for x in _list]
    20  39.01953125 MiB   0.00000000 MiB       _list[:] = [average_word_length(x) for x in _list]


Filename: temp2.py

Line #    Mem usage    Increment   Line Contents
================================================
    23  32.42187500 MiB  32.42187500 MiB   @profile(precision=8)
    24                             def replace_list(_list):
    25  32.42187500 MiB   0.00000000 MiB       _list = [normalise_number(x) for x in _list]
    26  39.11328125 MiB   0.25781250 MiB       _list = [change_to_string(x) for x in _list]
    27  39.11328125 MiB   0.00000000 MiB       _list = [average_word_length(x) for x in _list]
    28  32.46484375 MiB   0.00000000 MiB       return _list

What I found is that even if I increase the list size to 100000, re-assignment consistently uses more memory, but only by around 1%. This makes me think that the additional memory cost is probably just an extra pointer somewhere, not the cost of an entire list.
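
One way to probe the extra-pointer hypothesis is sys.getsizeof, which reports the size of the list object itself (header plus pointer array, not the elements); CPython's over-allocation makes the exact numbers version-dependent:

import sys

lst = list(range(100000))
print(sys.getsizeof(lst))  # exact-sized pointer array from list(range(...))

lst[:] = [x for x in lst]
print(sys.getsizeof(lst))  # slice assignment may grow or shrink the same array

lst = [x for x in lst]
print(sys.getsizeof(lst))  # comprehension-built lists typically carry over-allocation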

To further test the hypothesis, I performed time-based profiling at intervals of 0.00001 seconds and graphed the results (a sketch of the sampling setup follows). I wanted to see whether there was perhaps a momentary spike in memory usage that disappeared instantly due to garbage collection (reference counting). But alas, I have not found such a spike.
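
Sampling at that interval can be done along these lines with memory_profiler's memory_usage helper, assuming the replace_list function and list sizes from the script above:

from memory_profiler import memory_usage

# Sample the process's memory every 0.00001 s while replace_list runs.
samples = memory_usage((replace_list, (list(range(100000)),)), interval=0.00001)
print(f"peak above baseline: {max(samples) - min(samples):.3f} MiB")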

Can anyone explain these results? What exactly is happening under the hood here that causes this slight but consistent increase in memory usage?



from Python difference between mutating and re-assigning a list ( _list = and _list[:] = )
