The k-index measures the condition of the magnetosphere. It is usually averaged over three hours, so each day has 8 measurements.
- The planetary k-index (kp-index) is an average of the measurements taken at 15 locations around the world,
- while the estimated kp-index is an average taken over only 8 locations. The data is available here:
https://www.swpc.noaa.gov/products/planetary-k-index
If you take a look at their data for the last 30 days:
https://services.swpc.noaa.gov/text/daily-geomagnetic-indices.txt
You can see that they give the k-index measured at two different locations (Fredericksburg and College), as well as the estimated kp-index. (You can ignore the columns marked "A".)
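To get a feel for the layout before parsing anything, here is a minimal sketch (assuming nothing beyond the URL above) that downloads the file and prints the last few data lines:
import urllib.request

data_url = "https://services.swpc.noaa.gov/text/daily-geomagnetic-indices.txt"

# Download the raw text file
with urllib.request.urlopen(data_url) as response:
    text = response.read().decode("utf-8")

# Header/comment lines start with ':' or '#'; data lines start with the year
data_lines = [line for line in text.splitlines() if line[:1].isdigit()]

# Print the last three days to see the column layout:
# date, then an A column and 8 K values each for Fredericksburg, College,
# and the estimated planetary index
for line in data_lines[-3:]:
    print(line)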
My Question:
An interesting question is whether the k-index at either of the two individual locations can be used to predict the estimated kp-index.
My Thoughts:
My thought is that this can be done by training a learning model (for example SVR) on some of the data (where a point is an 8-vector from either Fredericksburg or College representing some day, and its label is the 8-vector that is the estimated kp-index for the same day), and then testing its error on the rest of the data. I am programming in Python, so I might want to use the sklearn.multioutput.MultiOutputRegressor class.
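As a minimal sketch of that API (on made-up random arrays rather than the NOAA data), MultiOutputRegressor simply fits one SVR per output component, so an 8-vector label gives 8 independent SVRs:
import numpy as np
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X_demo = rng.random((40, 8))   # 40 "days", each an 8-vector of k-index values
y_demo = rng.random((40, 8))   # matching 8-vector labels (estimated kp-index)

model = MultiOutputRegressor(SVR(kernel="linear"))
model.fit(X_demo, y_demo)
print(model.predict(X_demo[:1]).shape)   # (1, 8): one 8-vector prediction per query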
My Try:
We will train a Support Vector Regressor (SVR) model to predict the estimated kp-index from the Fredericksburg k-index measurements for the past 30 days (the College k-index could be used in exactly the same way). We will use sklearn.multioutput.MultiOutputRegressor to handle the multi-output regression problem.
import urllib.request

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.multioutput import MultiOutputRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Load the data from the provided URL
data_url = "https://services.swpc.noaa.gov/text/daily-geomagnetic-indices.txt"
with urllib.request.urlopen(data_url) as response:
    text = response.read().decode("utf-8")

# The file mixes comment lines (starting with ':' or '#') with data lines
# (starting with the date); it covers roughly the last 30 days, one line per day
lines = [line for line in text.splitlines() if line[:1].isdigit()]

# Assumed layout (from the column description above): each data line ends with
# three groups of nine numbers -- an "A" value followed by 8 K values -- for
# Fredericksburg, College, and the estimated planetary index; the "A" columns are ignored
fred_k, est_kp = [], []
for line in lines:
    parts = line.split()
    fred_k.append([float(v) for v in parts[-26:-18]])  # 8 Fredericksburg k-index values
    est_kp.append([float(v) for v in parts[-8:]])      # 8 estimated kp-index values
    # (the College K values, parts[-17:-9], could be used in the same way)

# Input features: the 8-vector of Fredericksburg k-index values for each day
X = np.array(fred_k)
# Target labels: the 8-vector of estimated kp-index values for the same day
y = np.array(est_kp)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an SVR model and wrap it in MultiOutputRegressor
# (one SVR is fitted independently for each of the 8 output components)
svr = SVR(kernel="linear")
multi_output_regressor = MultiOutputRegressor(svr)

# Fit the model on the training data
multi_output_regressor.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = multi_output_regressor.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"Mean Absolute Error: {mae:.2f}")
Edit:
A single training point is the 8-value vector of the k-index in Fredericksburg on a certain day, and its label is the 8-value vector of the estimated kp-index on that same day. Having trained the algorithm on a large number of sample points, a new query is the k-index in Fredericksburg on a different day (with no label), and the algorithm must produce an 8-value vector which is its guess for the estimated kp-index on that day. For example, if the query vector was
2 2 1 2 1 2 1 2
(the k-index vector for Fredericksburg yesterday), then the answer
2.00 0.67 0.67 0.67 0.67 1.33 1.00 1.33
would have zero error, as this is exactly the estimated kp-index for that day. Of course, we can't really expect the algorithm to produce a solution with zero error.
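Continuing the sketch above (and assuming the multi_output_regressor fitted there), this example query could be run as follows, and the prediction compared against the known estimated kp-index for that day:
# The example query: yesterday's Fredericksburg k-index vector
query = np.array([[2, 2, 1, 2, 1, 2, 1, 2]])

# The known estimated kp-index for that same day, used here only to measure the error
target = np.array([[2.00, 0.67, 0.67, 0.67, 0.67, 1.33, 1.00, 1.33]])

prediction = multi_output_regressor.predict(query)
print("Predicted kp-index:", np.round(prediction, 2))
print("Mean Absolute Error:", mean_absolute_error(target, prediction))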