Thursday, 16 May 2019

Need to expand an inventory journal (log) pandas dataframe to include all dates per product id

I have an inventory journal that contains products and their relative inventory qty (resulting_qty) as well as the loss/gain every time inventory is added or subtracted (delta_qty).

The issue is that inventory records do not get updated daily, rather they are only updated when a change in inventory occurs. For this reason, it is difficult to extract the total inventory qty for all items on a given day, because some items are not recorded on certain days, despite the fact that they do have available inventory given their last entry resulting_qty was greater than 0. Logically, this would mean that an item went without a change in qty for a certain amount of days equal to the number of days between the max date and the last recorded date.

my data looks something like this, except in reality there are thousands of product ids

| date       | timestamp           | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2017-03-06 | 2017-03-06 12:24:22 | A   | 0         | 0.0           |
| 2017-03-31 | 2017-03-31 02:43:11 | A   | 3         | 3.0           |
| 2017-04-08 | 2017-04-08 22:04:35 | A   | -1        | 2.0           |
| 2017-04-12 | 2017-04-12 18:26:39 | A   | -1        | 1.0           |
| 2017-04-19 | 2017-04-19 09:15:38 | A   | -1        | 0.0           |
| 2019-01-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-05 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-22 | 2019-04-22 11:06:33 | B   | -1        | 1.0           |
| 2019-04-23 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-09 | 2019-05-09 16:25:41 | C   | 2         | 2.0           |

Essentially, I need to make the data look something more like this so that I can simply pull a date and get the sum of total inventory for a given day when grouping by date (e.g. df.groupby(date).resulting_qty.sum()):

Note I removed the PID= A rows due to character limitations, but I hope you get the idea:

| date       | timestamp           | pid | delta_qty | resulting_qty |
|------------|---------------------|-----|-----------|---------------|
| 2019-01-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-29 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-30 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-01-31 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-05 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-06 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-07 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-08 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-09 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-10 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-11 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-12 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-13 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-14 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-15 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-02-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-05 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-06 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-07 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-08 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-09 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-10 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-11 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-12 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-13 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-14 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-15 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-16 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-17 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-18 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-19 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-20 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-21 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-22 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-23 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-24 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-25 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-26 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-27 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-28 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-29 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-30 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-03-31 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-01 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-02 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-03 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-04 | 2019-01-16 23:37:17 | B   | 0         | 0.0           |
| 2019-04-05 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-06 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-07 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-08 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-09 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-10 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-11 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-12 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-13 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-14 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-15 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-16 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-17 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-18 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-19 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-20 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-21 | 2019-04-05 16:40:32 | B   | 2         | 2.0           |
| 2019-04-22 | 2019-04-22 11:06:33 | B   | -1        | 1.0           |
| 2019-04-23 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-24 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-25 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-26 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-27 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-28 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-29 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-04-30 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-01 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-02 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-03 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-04 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-05 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-06 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-07 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-08 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-09 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-05-10 | 2019-04-23 13:23:17 | B   | -1        | 0.0           |
| 2019-01-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-29 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-30 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-01-31 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-18 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-02-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-18 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-19 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-20 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-21 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-22 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-23 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-24 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-25 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-26 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-27 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-28 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-29 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-30 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-03-31 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-01 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-02 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-03 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-04 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-05 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-06 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-07 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-08 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-09 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-10 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-11 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-12 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-13 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-14 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-15 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-16 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |
| 2019-04-17 | 2019-01-19 09:40:38 | C   | 0         | 0.0           |

So far what I've done was created a series of loops that generates a date range between the min date of the product lifecycle and the max date of all products. I then append the last recorded row values as a new row with a new date if there is no information for said new date. I append these to lists, and then generate a new dataframe with the updated lists. The code is terribly slow and takes 2+ hours to complete on the total dataset:

date_list = []
pid_list= []
time_stamp_list = []
delta_qty_list = []
resulting_qty_list = []


timer = len(test.product_id.unique().tolist())
counter = 0
for product in test.product_id.unique().tolist():
    counter+=1
    print((counter/timer)*100)
    temp_df = test.query(f'product_id=={product}', engine='python')
    for idx,date in enumerate(pd.date_range(temp_df.index.min(),test.index.max()).tolist()):
        min_date= temp_df.index.min()
        if date.date() == min_date:
            date2=min_date
            pid = temp_df.loc[date2]['product_id']
            timestamp = temp_df.loc[date2]['timestamp']
            delta_qty = temp_df.loc[date2]['delta_qty']
            resulting_qty = temp_df.loc[date2]['resulting_qty']
            date_list.append(date2)
            pid_list.append(pid)
            delta_qty_list.append(delta_qty)
            time_stamp_list.append(timestamp)
            resulting_qty_list.append(resulting_qty)
        else:

            if date.date() in temp_df.index:
                date2= date.date()
                pid = temp_df.loc[date2]['product_id']
                timestamp = temp_df.loc[date2]['timestamp']
                delta_qty = temp_df.loc[date2]['delta_qty']
                resulting_qty = temp_df.loc[date2]['resulting_qty']
                date_list.append(date2)
                pid_list.append(pid)
                delta_qty_list.append(delta_qty)
                time_stamp_list.append(timestamp)
                resulting_qty_list.append(resulting_qty)
            elif date.date() > date2:
                date_list.append(date.date())
                pid_list.append(pid)
                time_stamp_list.append(timestamp)
                delta_qty_list.append(delta_qty)
                resulting_qty_list.append(resulting_qty)
            else:
                pass

Can someone please help me to understand what is the right way I should approach this as I'm 100% sure this is not the best approach.

Thank you



from Need to expand an inventory journal (log) pandas dataframe to include all dates per product id

No comments:

Post a Comment