I'm working with OHLC trading data, and I have several datasets with very different price ranges. For example, in one dataset the price ranges from 100 to 150, in another from 2 to 3, in another from 0.5 to 0.8, and so on, so the magnitudes differ a lot.
On each dataset, I loop through the data and, for each point, compute the slope over the last five prices using np.polyfit().
Here is my code:
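import numpy as np

# Assumes df is a pandas DataFrame with 'Date' and 'Close' columns.
x = df['Date'].to_numpy()
y = df['Close'].to_numpy()
fits = []
for idx, j in enumerate(y):
    # Points strictly before the current one
    arr_y = y[:idx]
    arr_x = x[:idx]
    # Keep at most the last five of them
    p_y = arr_y[-5:]
    p_x = arr_x[-5:]
    if len(p_y) >= 4 and len(p_x) >= 4:
        # Degree-1 fit: fit[0] is the slope, fit[1] the intercept
        fit = np.polyfit(p_x, p_y, 1)
        ang_coeff = fit[0]
        intercept = fit[1]
        fits.append(ang_coeff)
    else:
        # Not enough history yet
        fits.append(np.nan)
df['SLOPE'] = fits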
Here is what the code does: it loops through the prices and, for each price, calculates the slope of a line fitted to the last five prices before it.
This code works well, but since I'm working with several datasets whose prices differ a lot, it becomes hard to perform any kind of analysis: a slope that counts as very high on one dataset would be very low on another. My question is: how can I standardize or normalize (I know they are two different things) this data? How can I process my slope values so that a "high" slope value on one dataset is also high on another?
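To make the scale problem concrete, here is a small illustration with made-up numbers (not the real data): the same relative price pattern is fitted at two price levels, and the fitted slope scales with the price level. The names `base`, `slope_small` and `slope_large` are hypothetical and only used for this example.

```python
import numpy as np

# Five equally spaced time steps (assumption: bar index used as x)
x = np.arange(5, dtype=float)

# Made-up relative price pattern, identical for both "datasets"
base = np.array([1.00, 1.02, 1.01, 1.04, 1.05])

# Same pattern at two very different price levels
slope_small = np.polyfit(x, 0.5 * base, 1)[0]      # prices around 0.5
slope_large = np.polyfit(x, 34000.0 * base, 1)[0]  # prices around 34000

# The slopes differ by exactly the ratio of the price levels
ratio = slope_large / slope_small
print(slope_small, slope_large, ratio)
```

Both series move by the same percentages, yet the raw slopes differ by the factor 34000 / 0.5, which is why raw slope values cannot be compared across datasets.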
Here is a sample of my outputs:
Date Close Slope
2021-01-17 00:00:00 34031.098338 29.572362
2021-01-17 04:00:00 34034.475090 20.097445
2021-01-17 08:00:00 34034.982351 8.655060
2021-01-17 12:00:00 34044.665386 3.914707
2021-01-17 16:00:00 34049.372571 4.538112
2021-01-17 20:00:00 34059.458965 4.673876
2021-01-18 00:00:00 34063.656831 6.435797
2021-01-18 04:00:00 34070.819559 7.214254
2021-01-18 08:00:00 34086.331298 6.659261
2021-01-18 12:00:00 34099.272005 8.527805
2021-01-18 16:00:00 34099.560423 10.230055
2021-01-18 20:00:00 34106.109568 10.025963
2021-01-19 00:00:00 34110.932662 8.380914
2021-01-19 04:00:00 34122.312205 5.604029
2021-01-19 08:00:00 34134.855812 5.745264
2021-01-19 12:00:00 34162.275141 8.679342
2021-01-19 16:00:00 34190.550778 13.625430
2021-01-19 20:00:00 34211.505419 19.919917
2021-01-20 00:00:00 34222.969489 23.408140
2021-01-20 04:00:00 34237.699255 22.545763
2021-01-20 08:00:00 34240.094551 18.326694
2021-01-20 12:00:00 34239.827609 12.528138
2021-01-20 16:00:00 34239.900596 7.376944
2021-01-20 20:00:00 34246.295214 3.599057
2021-01-21 00:00:00 34248.790292 1.699797
2021-01-21 04:00:00 34251.656251 2.385909
2021-01-21 08:00:00 34211.135875 3.254698
2021-01-21 12:00:00 34150.903010 -5.216841
2021-01-21 16:00:00 34127.857586 -22.843883
2021-01-21 20:00:00 34072.463679 -34.261865
2021-01-22 00:00:00 34018.425804 -44.166343
2021-01-22 04:00:00 33974.399053 -46.385947
2021-01-22 08:00:00 33946.475779 -46.243970
2021-01-22 12:00:00 33929.852159 -46.082824
2021-01-22 16:00:00 33927.598892 -35.717306
2021-01-22 20:00:00 33918.627401 -22.620072
2021-01-23 00:00:00 33905.044709 -13.042019
2021-01-23 04:00:00 33894.973038 -9.408690
2021-01-23 08:00:00 33861.417022 -9.231243
And a different dataset:
Date Close Slope
2021-02-18 04:00:00 0.492204 4.013722e-04
2021-02-18 08:00:00 0.492488 4.721365e-04
2021-02-18 12:00:00 0.493027 4.831912e-04
2021-02-18 16:00:00 0.493569 4.591663e-04
2021-02-18 20:00:00 0.494286 4.463141e-04
2021-02-19 00:00:00 0.494799 5.245110e-04
2021-02-19 04:00:00 0.495515 5.880476e-04
2021-02-19 08:00:00 0.496172 6.204948e-04
2021-02-19 12:00:00 0.496634 6.435782e-04
2021-02-19 16:00:00 0.497133 6.069365e-04
2021-02-19 20:00:00 0.497526 5.787601e-04
2021-02-20 00:00:00 0.497712 4.983345e-04
2021-02-20 04:00:00 0.497762 3.972312e-04
2021-02-20 08:00:00 0.497956 2.835458e-04
2021-02-20 12:00:00 0.498307 1.880521e-04
2021-02-20 16:00:00 0.498692 1.804976e-04
2021-02-20 20:00:00 0.498813 2.505608e-04
2021-02-21 00:00:00 0.499153 2.839021e-04
2021-02-21 04:00:00 0.499364 2.901245e-04
2021-02-21 08:00:00 0.499471 2.574213e-04
2021-02-21 12:00:00 0.499556 2.107408e-04
2021-02-21 16:00:00 0.499902 1.803125e-04
2021-02-21 20:00:00 0.500177 1.690260e-04
2021-02-22 00:00:00 0.500221 2.059057e-04
2021-02-22 04:00:00 0.501403 2.121462e-04
2021-02-22 08:00:00 0.502194 4.012434e-04
2021-02-22 12:00:00 0.502318 5.809102e-04
2021-02-22 16:00:00 0.502852 6.255775e-04
2021-02-22 20:00:00 0.503182 6.177676e-04
2021-02-23 00:00:00 0.503209 4.214821e-04
2021-02-23 04:00:00 0.503271 2.893487e-04
2021-02-23 08:00:00 0.502459 2.262497e-04
2021-02-23 12:00:00 0.502190 -6.951268e-05
2021-02-23 16:00:00 0.501697 -2.733434e-04
2021-02-23 20:00:00 0.501526 -4.105911e-04
2021-02-24 00:00:00 0.501506 -4.251799e-04
2021-02-24 04:00:00 0.501420 -2.571382e-04
2021-02-24 08:00:00 0.501332 -1.730550e-04
2021-02-24 12:00:00 0.501099 -8.359633e-05
2021-02-24 16:00:00 0.500684 -1.027447e-04
2021-02-24 20:00:00 0.500341 -1.962963e-04
2021-02-25 00:00:00 0.500027 -2.806065e-04
2021-02-25 04:00:00 0.499747 -3.368647e-04
2021-02-25 08:00:00 0.499428 -3.361539e-04
2021-02-25 12:00:00 0.499212 -3.105732e-04
2021-02-25 16:00:00 0.498883 -2.857117e-04
So these two datasets have very different Close values, which means their slope values are on completely different scales: a very "high" slope value in the second dataset is tiny compared to the slope values in the first dataset. Is there any way I can solve this? Do I have to apply some sort of normalization or standardization? Or do I need to use a different kind of calculation or metric? Thanks in advance!
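One candidate that might fit this requirement, shown only as a minimal sketch and not as a confirmed solution, is to divide each window's slope by the mean price in that window, so the result is a fractional change per bar rather than an absolute price change. The function name `rolling_relative_slope` is hypothetical. The sketch also assumes that the bar index is used as x instead of the Date column, and that the window includes the current bar, which differs slightly from the loop above.

```python
import numpy as np
import pandas as pd


def rolling_relative_slope(close, window=5):
    """Slope of a degree-1 fit over the trailing window, divided by the window mean."""
    # Assumption: equally spaced bars, so the bar index serves as x
    x = np.arange(window, dtype=float)

    def rel_slope(values):
        # values is a NumPy array because raw=True is passed below
        slope = np.polyfit(x, values, 1)[0]
        return slope / values.mean()

    # The first window - 1 entries are NaN (not enough history)
    return close.rolling(window).apply(rel_slope, raw=True)


# Made-up pattern, rescaled to two very different price levels
pattern = pd.Series([1.00, 1.02, 1.01, 1.04, 1.05, 1.03, 1.06])
small = rolling_relative_slope(0.5 * pattern)
large = rolling_relative_slope(34000.0 * pattern)

# The relative slopes agree even though the price levels do not
print(np.allclose(small.dropna(), large.dropna()))
```

Fitting the line to the logarithm of the prices (`np.log(close)`) would be another variant with a similar effect, since rescaling the prices only shifts the log series by a constant. Whether either choice suits a given analysis is a judgment call.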
from How can i standardize time series data?