Saturday, 2 November 2019

Memory usage reported by psutil.Process.memory_info differs from Pandas' DataFrame.memory_usage

I'm profiling a program that makes use of Pandas to process some CSVs. I'm using psutil's Process.memory_info to report the Virtual Memory Size (vms) and the Resident Set Size (rss) values. I'm also using Pandas' DataFrame.memory_usage (df.memory_usage().sum()) to report the size of my dataframes in memory.

The reported vms and df.memory_usage values conflict: Pandas reports more memory for the dataframe alone than Process.memory_info reports for the whole (single-threaded) process.

For example:

  • rss: 334671872 B
  • vms: 663515136 B
  • df.memory_usage().sum(): 670244208 B
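
To put the gap in these figures in perspective, here is a small arithmetic sketch. The values are copied from the list above; the variable names are only illustrative.

```python
# Values copied from the example figures, in bytes.
rss = 334_671_872
vms = 663_515_136
df_bytes = 670_244_208

MIB = 1024 ** 2  # bytes per mebibyte

# How far the dataframe figure exceeds the process-wide virtual size.
gap = df_bytes - vms
print(f"df exceeds vms by {gap} B ({gap / MIB:.2f} MiB)")

# How the dataframe figure compares with the resident set size.
print(f"df is {df_bytes / rss:.2f}x rss")
```

So the dataframe figure is a few MiB above vms and roughly twice rss.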

The Process.memory_info call is made immediately after the df.memory_usage call. I expected df.memory_usage().sum() < vms to hold at all times, but it doesn't. Am I misinterpreting the meaning of the vms value?
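
For reference, a minimal sketch of the kind of measurement described above. This is not the original program: the helper name report_memory and the small stand-in dataframe are hypothetical, and the real data comes from CSVs.

```python
import os

import pandas as pd
import psutil


def report_memory(df: pd.DataFrame) -> dict:
    """Return the dataframe size and process-level memory figures, in bytes."""
    # Size of the dataframe as Pandas accounts for it (index included by default).
    df_bytes = int(df.memory_usage().sum())
    # Process-wide figures, sampled immediately afterwards.
    mem = psutil.Process(os.getpid()).memory_info()
    return {"rss": mem.rss, "vms": mem.vms, "df": df_bytes}


if __name__ == "__main__":
    # Hypothetical stand-in for the data loaded from CSV.
    frame = pd.DataFrame({"a": list(range(1000)), "b": [0.5] * 1000})
    for name, value in report_memory(frame).items():
        print(f"{name}: {value} B")
```

With a small dataframe like this one, the df figure will be far below rss and vms; the discrepancy only shows up with the real, much larger data.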


