Friday, 7 October 2022

Append data within while loop to dataframe - python

I'm trying to append new, continually updated data to an existing DataFrame. The new data comes from a timer that re-imports the same dataset from Yahoo every minute.

After each update I want to subset specific rows of the updated DataFrame using a condition and return them for future use. The data is downloaded inside a while loop, but I get an error when I try to subset the DataFrame.

I've made two attempts, shown in Edit 1 and Edit 2.

import pandas as pd
import yfinance as yf
import datetime
import pytz
from threading import Thread
from time import sleep

# end date
my_date = datetime.datetime.now(pytz.timezone('Etc/GMT-5'))

# start date
prev_24hrs = my_date - datetime.timedelta(hours = 25, minutes = 0)

# import data
data = yf.download(tickers = 'EURUSD=X',
                   start = prev_24hrs, 
                   end = my_date, 
                   interval = '1m'
                   ).reset_index()
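
Note that the download above runs only once, when the script starts, so a bare `data` statement inside a loop would not fetch anything new. A minimal sketch of one way to repeat a fetch on a timer is below. It assumes the download is wrapped in a callable; `run_periodically`, `fake_fetch` and `collect` are hypothetical names, and `fake_fetch` stands in for a function that would call `yf.download(...)` with freshly computed start and end dates.

```python
import threading

def run_periodically(fetch, on_data, interval_s, stop_event):
    """Call fetch() every interval_s seconds until stop_event is set."""
    while not stop_event.is_set():
        on_data(fetch())
        # Event.wait() returns early once stop_event is set, unlike sleep().
        stop_event.wait(interval_s)

calls = []
stop = threading.Event()

def fake_fetch():
    # Stand-in for a real download; returns a counter so the demo runs offline.
    return len(calls)

def collect(value):
    calls.append(value)
    if len(calls) >= 3:
        stop.set()  # stop the demo after three fetches

worker = threading.Thread(
    target=run_periodically, args=(fake_fetch, collect, 0.01, stop)
)
worker.start()
worker.join(timeout=5)
```

In real use the interval would be 60 seconds and `on_data` would merge each new frame into the stored results.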

Edit 1:

# updated data
upd_data = []

def scheduled_update():

    while datetime.datetime.now().minute % 1 != 0:
        sleep(1)
    data

    while True:
        sleep(60)
        data
    
        upd_data.append(data)

        upd_data = upd_data[upd_data['High'] > 0.97000]
        print(upd_data) 

    return upd_data
    

thread = Thread(target = scheduled_update)

thread.start()

Output:

Exception in thread Thread-12 (scheduled_update):
Traceback (most recent call last):
  File "/opt/anaconda3/envs/gpd/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/opt/anaconda3/envs/gpd/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/xxx/xxx/xxx/xxx/untitled5.py", line 43, in scheduled_update
    upd_data.append(data)
UnboundLocalError: local variable 'upd_data' referenced before assignment
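
This error seems to come from Python's scoping rules rather than from pandas. Because `upd_data` is assigned inside `scheduled_update`, Python treats the name as local for the whole function, so the earlier `upd_data.append(data)` reads a local variable that has no value yet. A minimal sketch that reproduces this without yfinance or threads (`items`, `broken` and `fixed` are hypothetical names):

```python
items = []

def broken():
    # The assignment two lines down makes `items` local to this function,
    # so this append fails before the assignment ever runs.
    items.append(1)
    items = [x for x in items if x > 0]
    return items

def fixed():
    # Giving the filtered result a different name leaves `items`
    # bound to the module-level list.
    items.append(1)
    kept = [x for x in items if x > 0]
    return kept

error_name = None
try:
    broken()
except UnboundLocalError as exc:
    error_name = type(exc).__name__

result = fixed()
```

Declaring `global upd_data` inside the function would be another way around it, although a separate local name is usually easier to follow.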

Edit 2:

# updated data
upd_data = []

def scheduled_update():

    while datetime.datetime.now().minute % 1 != 0:
        sleep(1)
    data

    while True:
        sleep(60)
        data

        upd_data.append(data)

        df_out = upd_data[upd_data['High'] > 0.97000]
        print(df_out) 

    return df_out


thread = Thread(target = scheduled_update)

thread.start()

Output:

Exception in thread Thread-16 (scheduled_update):
Traceback (most recent call last):
  File "/opt/anaconda3/envs/gpd/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/opt/anaconda3/envs/gpd/lib/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/xxx/xxx/xxx/xxx/untitled5.py", line 44, in scheduled_update
    df_out = upd_data[upd_data['High'] > 0.97000]
TypeError: list indices must be integers or slices, not str
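
Here `upd_data` is a plain Python list that holds DataFrames, so `upd_data['High']` is list indexing, not column selection. One possible approach, sketched below, is to combine the list into a single DataFrame with `pd.concat` before applying the boolean mask. The two small frames are made-up stand-ins for successive downloads.

```python
import pandas as pd

# Hypothetical frames standing in for two successive downloads.
first = pd.DataFrame({
    "Datetime": pd.to_datetime(["2022-10-05 18:28", "2022-10-05 18:29"]),
    "High": [0.9880, 0.9650],
})
second = pd.DataFrame({
    "Datetime": pd.to_datetime(["2022-10-05 18:30", "2022-10-05 18:31"]),
    "High": [0.9891, 0.9699],
})

frames = [first, second]  # a plain list, like upd_data

message = None
try:
    frames["High"]  # lists cannot be indexed by a column name
except TypeError as exc:
    message = str(exc)

# Combine the list into one DataFrame, then filter on the column.
combined = pd.concat(frames, ignore_index=True)
subset = combined[combined["High"] > 0.97]
```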

Intended output:

Using data obtained on 6 October 2022, the two outputs below are 10 minutes apart:

1st execution:

                     Datetime      Open      High       Low     Close  Adj Close  Volume
0   2022-10-05 18:28:00+01:00  0.988045  0.988045  0.988045  0.988045   0.988045       0
1   2022-10-05 18:29:00+01:00  0.988142  0.988142  0.988142  0.988142   0.988142       0
2   2022-10-05 18:30:00+01:00  0.988142  0.988142  0.988142  0.988142   0.988142       0
3   2022-10-05 18:31:00+01:00  0.987947  0.987947  0.987947  0.987947   0.987947       0
4   2022-10-05 18:32:00+01:00  0.988240  0.988240  0.988240  0.988240   0.988240       0
..                        ...       ...       ...       ...       ...        ...     ...
280 2022-10-05 23:23:00+01:00  0.989022  0.989022  0.989022  0.989022   0.989022       0
281 2022-10-05 23:24:00+01:00  0.989120  0.989120  0.989120  0.989120   0.989120       0
282 2022-10-05 23:25:00+01:00  0.989022  0.989022  0.989022  0.989022   0.989022       0
283 2022-10-05 23:26:00+01:00  0.989120  0.989120  0.989120  0.989120   0.989120       0
284 2022-10-05 23:27:00+01:00  0.989022  0.989022  0.989022  0.989022   0.989022       0

If the code continues to run every minute, new data should be appended whenever it meets the subset condition, e.g. 10 minutes later:

                     Datetime      Open      High       Low     Close  Adj Close  Volume
0   2022-10-05 18:38:00+01:00  0.987947  0.987947  0.987947  0.987947   0.987947       0
1   2022-10-05 18:39:00+01:00  0.987849  0.987849  0.987849  0.987849   0.987849       0
2   2022-10-05 18:40:00+01:00  0.988045  0.988045  0.988045  0.988045   0.988045       0
3   2022-10-05 18:41:00+01:00  0.987947  0.987947  0.987947  0.987947   0.987947       0
4   2022-10-05 18:42:00+01:00  0.987849  0.987849  0.987849  0.987849   0.987849       0
..                        ...       ...       ...       ...       ...        ...     ...
278 2022-10-05 23:32:00+01:00  0.989022  0.989022  0.989022  0.989022   0.989022       0
279 2022-10-05 23:33:00+01:00  0.989120  0.989120  0.989120  0.989120   0.989120       0
280 2022-10-05 23:34:00+01:00  0.989218  0.989218  0.989218  0.989218   0.989218       0
281 2022-10-05 23:35:00+01:00  0.989218  0.989218  0.989218  0.989218   0.989218       0
282 2022-10-05 23:36:00+01:00  0.989511  0.989511  0.989511  0.989511   0.989511       0
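
Because each download overlaps heavily with the previous one, appending every batch as-is would duplicate most rows. The sketch below shows one way the intended behaviour might be handled: keep only the new rows that pass the condition, append them, and drop rows whose `Datetime` is already stored. `append_matching` and the sample batches are hypothetical, and the 0.97 threshold is taken from the attempts above.

```python
import pandas as pd

def append_matching(history, new_rows, threshold=0.97):
    """Append rows of new_rows with High > threshold, one row per Datetime."""
    matching = new_rows[new_rows["High"] > threshold]
    if history is None:
        combined = matching
    else:
        combined = pd.concat([history, matching], ignore_index=True)
    # Overlapping downloads repeat timestamps; keep the most recent copy.
    combined = combined.drop_duplicates(subset="Datetime", keep="last")
    return combined.sort_values("Datetime").reset_index(drop=True)

# Two made-up overlapping batches, one minute apart.
batch1 = pd.DataFrame({
    "Datetime": pd.to_datetime(
        ["2022-10-05 18:28", "2022-10-05 18:29", "2022-10-05 18:30"]),
    "High": [0.988045, 0.965, 0.988142],
})
batch2 = pd.DataFrame({
    "Datetime": pd.to_datetime(
        ["2022-10-05 18:29", "2022-10-05 18:30", "2022-10-05 18:31"]),
    "High": [0.965, 0.988142, 0.987947],
})

history = None
for batch in (batch1, batch2):
    history = append_matching(history, batch)
```

After both batches, `history` holds one row per timestamp that passed the condition.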


