Suppose that I have a Pandas dataframe as follows:
+------------+--------+
| Date | Price |
+------------+--------+
| 2021-07-30 | 438.51 |
| 2021-08-02 | 437.59 |
| 2021-08-03 | 441.15 |
| 2021-08-04 | 438.98 |
+------------+--------+
The above data frame can be genreated using the code below:
data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
'Price': [438.51, 437.59, 441.15, 438.98]
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
normalisation_days = 365.25
compounding_days = 365.25
For the given timeseries I want to compute the rate_of_return which is time dependent and problem here is to identify the time periods between which the best or worst values of rate_of_return is achieved.
One could simply calculate rate_of_return on all possible combinations and then create a data frame containing period_start period_end and rate_of_return and sort in descending (best) or ascending (worst) order and then exclude any periods where there is overlap.
rate_of_return = ((period_end_price/period_start_price)^(compounding_days/(days_in_between))-1 * (normalisation_days/compounding_days)
On above data frame I've computed rate_of_return using code below
df['rate_of_return_l1'] = ((((df.Price /
df.Price[0]) **
(compounding_days /
(df.Date - df.Date[0]).dt.days) - 1) *
(normalisation_days /
compounding_days)))
df['rate_of_return_l1'].iloc[0] = np.nan
df['rate_of_return_l2'] = ((((df.Price /
df.Price[1]) **
(compounding_days /
(df.Date - df.Date[1]).dt.days) - 1) *
(normalisation_days /
compounding_days)))
df['rate_of_return_l2'].iloc[:2] = np.nan
df['rate_of_return_l3'] = ((((df.Price /
df.Price[2]) **
(compounding_days /
(df.Date - df.Date[2]).dt.days) - 1) *
(normalisation_days /
compounding_days)))
df['rate_of_return_l3'].iloc[:3] = np.nan
Based on the results the best/worst cases period are as below
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02 | 2021-08-03 | 18.28751739 |
| 2021-08-02 | 2021-08-04 | 0.784586925 |
| 2021-07-30 | 2021-08-03 | 0.729942907 |
| 2021-07-30 | 2021-08-04 | 0.081397181 |
| 2021-07-30 | 2021-08-02 | -0.225626914 |
| 2021-08-03 | 2021-08-04 | -0.834880227 |
+--------------+------------+----------------+
Expected Output
If I want to see the best of rate_of_return the resulting dataframe would be
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02 | 2021-08-03 | 18.28751739 |
+--------------+------------+----------------+
If I want to see the worst of rate_of_return the resulting dataframe would be
+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-03 | 2021-08-04 | -0.834880227 |
| 2021-07-30 | 2021-08-02 | -0.225626914 |
+--------------+------------+----------------+
- What we be the optimal approach to test out all scenarios to compute
rate_of_return? - How can I achieve the expected output such that periods don't overlap? (seen in expected output)
- Best/Worst data frame is not sign dependent best data frame can contain negative
rate_of_returnsgiven that there is no time period overlap. - What would the approach be if formula changes to
(period_end_price/period_start_price) - 1(not time dependent)?
from Evaluate Time Dependent Rate of Return to create Pandas DataFrame
No comments:
Post a Comment