I'm profiling a program that uses Pandas to process some CSVs. I'm using psutil's Process.memory_info to report the Virtual Memory Size (vms) and Resident Set Size (rss) values, and Pandas' DataFrame.memory_usage (df.memory_usage().sum()) to report the size of my dataframes in memory.
The reported vms and df.memory_usage values disagree: Pandas reports more memory for the dataframe alone than Process.memory_info reports for the whole (single-threaded) process.
For example:
- rss: 334671872 B
- vms: 663515136 B
- df.memory_usage().sum(): 670244208 B
The Process.memory_info call is made immediately after the memory_usage call. I expected df.memory_usage().sum() < vms to hold at all times, but it doesn't. I assume I'm misinterpreting the meaning of the vms value?
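For reference, here is a minimal sketch of how the two measurements can be taken side by side. It is not my actual program: memory_snapshot is a hypothetical helper name and the DataFrame is synthetic rather than loaded from my CSVs. It also reports memory_usage(deep=True), which I did not use in the numbers above, in case the object columns matter.

```python
import os

import pandas as pd
import psutil


def memory_snapshot(df: pd.DataFrame) -> dict:
    """Return pandas' and the OS's view of memory, all in bytes."""
    # Pandas figures first, then the process figures, in the same order
    # as described in the question.
    shallow = int(df.memory_usage().sum())
    # deep=True also inspects the Python objects held in object columns.
    deep = int(df.memory_usage(deep=True).sum())
    info = psutil.Process(os.getpid()).memory_info()
    return {
        "df_shallow": shallow,
        "df_deep": deep,
        "rss": info.rss,
        "vms": info.vms,
    }


if __name__ == "__main__":
    # Synthetic stand-in for the real CSV data (an assumption for illustration).
    df = pd.DataFrame({"a": range(100_000), "b": ["x"] * 100_000})
    for name, value in memory_snapshot(df).items():
        print(f"{name}: {value} B")
```

The exact values depend on the platform and the pandas version, so no expected output is shown.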
from Using psutil.Process.memory_info memory usage differs from Pandas.memory_usage