Tuesday, 19 September 2023

Can the k-index in either of the two individual locations be used to predict the estimated kp-index?

The k-index measures the condition of the magnetosphere. Each value covers a three-hour interval, so each day has 8 measurements.
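Because each of the 8 daily values covers a fixed three-hour window, it can help to label the values by the window they belong to. A minimal sketch, assuming the windows start at 00 UT as for the standard K-index (the helper name `interval_label` is hypothetical):

```python
def interval_label(i: int) -> str:
    """Return the UT time window covered by the i-th (0-based) three-hour value of a day."""
    if not 0 <= i < 8:
        raise ValueError("a day has exactly 8 three-hour intervals")
    start = 3 * i  # windows assumed to start at 00 UT
    return f"{start:02d}-{start + 3:02d} UT"


labels = [interval_label(i) for i in range(8)]
print(labels)
```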

  • The planetary k-index (kp-index) is an average of measurements taken at 15 locations around the world.
  • The estimated kp-index is an average taken over only 8 locations.

The data is available here:

https://www.swpc.noaa.gov/products/planetary-k-index

Their data for the last 30 days is available as a text file:

https://services.swpc.noaa.gov/text/daily-geomagnetic-indices.txt

You can see that they give the k-index measured in two different locations (Fredericksburg and College), as well as the estimated kp-index. (You can ignore the columns marked "A".)
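To turn a line of that file into the three 8-vectors, one option is to split it on whitespace. The sketch below assumes each data row holds the date (year, month, day) followed, for Fredericksburg, College and the estimated planetary index in that order, by one A value and eight K values, which is what the column headers suggest; the helper name `split_row` is hypothetical, and the A columns are skipped:

```python
from typing import Dict, List


def split_row(line: str) -> Dict[str, List[float]]:
    """Split one data row into three 8-value K vectors, dropping the date and A values.

    Assumed layout (30 whitespace-separated fields):
    year month day | A + 8 K (Fredericksburg) | A + 8 K (College) | A + 8 K (estimated planetary)
    """
    fields = line.split()
    if len(fields) != 30:
        raise ValueError(f"expected 30 fields, got {len(fields)}")
    values = [float(f) for f in fields[3:]]  # drop year, month, day
    # Each block is 9 values long: position 0 is the A value, positions 1..8 are the K values
    return {
        "Fredericksburg": values[1:9],
        "College": values[10:18],
        "EstimatedKp": values[19:27],
    }
```

Rows with an unexpected number of fields raise an error instead of being silently misread, which is worth checking against the real file before trusting the assumed layout.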

My Question:

An interesting question is whether the k-index in either of the two individual locations can be used to predict the estimated kp-index.

My Thoughts:

My thought is that this can be done by training a learning model (for example SVR) on part of the data, where a point is the 8-vector of k-index values from either Fredericksburg or College for a given day and its label is the 8-vector of estimated kp-index values for the same day, and then testing its error on the rest of the data. I am programming in Python, so I might use the sklearn.multioutput.MultiOutputRegressor class.
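With only 30 days in the file, a single train/test split leaves very few test days, so the measured error can vary a lot with the split. One option, a swapped-in technique rather than part of the question, is k-fold cross-validation, which averages the error over several splits. A sketch, assuming `X` and `y` are arrays of shape `(n_days, 8)` built as described above (the helper name `cv_mae` is hypothetical):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR


def cv_mae(X: np.ndarray, y: np.ndarray, n_splits: int = 5) -> float:
    """Mean absolute error of one-SVR-per-output regression, averaged over k folds.

    X: (n_days, 8) k-index vectors from one location
    y: (n_days, 8) estimated kp-index vectors for the same days
    """
    model = MultiOutputRegressor(SVR(kernel="linear"))
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    # scikit-learn maximises scores, so MAE is returned as a negative number
    scores = cross_val_score(model, X, y, cv=folds, scoring="neg_mean_absolute_error")
    return float(-scores.mean())
```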

My Try:

We will train a Support Vector Regressor (SVR) to predict the estimated kp-index from the k-index measured at one location (Fredericksburg or College), using the 30 days of data in the file. Because both the input and the label are 8-vectors, we wrap the SVR in sklearn.multioutput.MultiOutputRegressor, which fits one regressor per output.

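import io
import urllib.request

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Download the file; it already contains only the last 30 days
data_url = "https://services.swpc.noaa.gov/text/daily-geomagnetic-indices.txt"
with urllib.request.urlopen(data_url) as response:
    text = response.read().decode("utf-8")

# Keep only the data rows (header lines start with ":" or "#")
rows = [line for line in text.splitlines()
        if line.strip() and not line.startswith((":", "#"))]

# Assumed row layout: year, month, day, then for each of Fredericksburg,
# College and the estimated planetary index one A value and eight K values
k_cols = {loc: [f"{loc}_K{i}" for i in range(8)]
          for loc in ("Fredericksburg", "College", "EstimatedKP")}
column_names = ["Year", "Month", "Day"]
for loc, cols in k_cols.items():
    column_names += [f"{loc}_A"] + cols

# Load the data rows into a Pandas DataFrame
data = pd.read_csv(io.StringIO("\n".join(rows)), sep=r"\s+", header=None, names=column_names)

# Choose the location to test ("Fredericksburg" or "College")
location = "Fredericksburg"

# Drop days with missing values (assumed to be marked as -1 in the file)
used_cols = k_cols[location] + k_cols["EstimatedKP"]
data = data[(data[used_cols] >= 0).all(axis=1)]

# One sample per day: input = the 8 k-index values at the chosen location,
# label = the 8 estimated kp-index values for the same day
X = data[k_cols[location]].to_numpy(dtype=float)
y = data[k_cols["EstimatedKP"]].to_numpy(dtype=float)

# Split the days into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# MultiOutputRegressor fits one SVR per output, i.e. one per 3-hour interval
multi_output_regressor = MultiOutputRegressor(SVR(kernel="linear"))

# Fit the model on the training data
multi_output_regressor.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = multi_output_regressor.predict(X_test)

# Evaluate the model's performance (averaged over all 8 outputs)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"Mean Absolute Error: {mae:.2f}")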
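An MSE or MAE value on its own is hard to interpret. One way to judge it, an addition rather than part of the original attempt, is to compare the model with a trivial baseline that ignores the input and always predicts the mean kp-vector of the training days; scikit-learn's `DummyRegressor` with `strategy="mean"` does exactly that. A sketch that takes the four arrays returned by `train_test_split` (the helper name `compare_with_baseline` is hypothetical):

```python
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR


def compare_with_baseline(X_train, X_test, y_train, y_test):
    """Return (model MAE, baseline MAE) on the test set.

    The baseline ignores X and predicts the per-interval mean of y_train.
    """
    model = MultiOutputRegressor(SVR(kernel="linear")).fit(X_train, y_train)
    baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
    model_mae = mean_absolute_error(y_test, model.predict(X_test))
    baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
    return model_mae, baseline_mae
```

If the model's error is not clearly below the baseline's, the k-index from that single location is adding little information about the estimated kp-index.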

Edit:

A single training point is the 8-value vector of the k-index in Fredericksburg on a certain day, and its label is the 8-value vector of the estimated kp-index on that same day. After the algorithm has been trained on a large number of such points, a new query is the k-index in Fredericksburg on a different day (with no label), and the algorithm must produce an 8-value vector that is its guess for the estimated kp-index on that day. For example, if the query vector was

2 2 1 2 1 2 1 2

(the k-index vector for Fredericksburg yesterday), then the answer

2.00 0.67 0.67 0.67 0.67 1.33 1.00 1.33

would have zero error, as this is exactly the estimated kp-index for that day. Of course, we can't really expect the algorithm to produce a solution with zero error.
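To make "error" concrete for the example above, the sketch below measures it as the mean absolute error over the 8 intervals (one common choice, assumed here). The exact answer scores zero; for comparison it also scores the naive guess of reusing the Fredericksburg values unchanged, an illustrative baseline that is not part of the question:

```python
import numpy as np

# Example vectors from the post (Fredericksburg query and the true estimated kp-index)
fredericksburg_k = np.array([2, 2, 1, 2, 1, 2, 1, 2], dtype=float)
estimated_kp = np.array([2.00, 0.67, 0.67, 0.67, 0.67, 1.33, 1.00, 1.33])


def mae(guess: np.ndarray, truth: np.ndarray) -> float:
    """Mean absolute error over the 8 three-hour intervals."""
    return float(np.mean(np.abs(guess - truth)))


exact_error = mae(estimated_kp, estimated_kp)      # the perfect answer
naive_error = mae(fredericksburg_k, estimated_kp)  # copy the Fredericksburg values as the guess
print(f"exact: {exact_error:.2f}, naive: {naive_error:.2f}")
```

The naive guess is off by about 0.58 per interval on average, so a trained model is only interesting if it does noticeably better than that on unseen days.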


