I am trying to train a model in script mode on AWS SageMaker. I would like to track this training job with SageMaker Experiments, together with some metrics calculated inside the training job. When I start the training job, a new experiment run is created successfully and it tracks all the provided hyperparameters (e.g. nestimators).
As mentioned, I additionally want to track other metrics (e.g. accuracy) from within the custom script. There I call load_run() before fitting the model and then log a metric with, for example, run.log_metric(). When I do that, however, SageMaker creates a new, separate experiment entry in the UI, which means my hyperparameters and metrics end up split across two individual experiment runs.
I would like to see the metrics and hyperparameters combined in a single experiment run. What am I doing wrong?
Here is the abbreviated code I am using to kick off the training process:
from sagemaker.experiments.run import Run
from sagemaker.sklearn.estimator import SKLearn

exp_name = "sklearn-script-mode-experiment"

# Create an experiment run and launch the training job inside its context
with Run(
    experiment_name=exp_name,
    sagemaker_session=sess,
) as run:
    sklearn_estimator = SKLearn(entry_point='train.py',
                                instance_type='ml.m5.large',
                                framework_version='1.0-1',
                                role="arn:aws:iam:::role/service-role/AmazonSageMaker-ExecutionRole-",
                                hyperparameters={'nestimators': 100},
                                environment={"REGION": REGION})
    sklearn_estimator.fit({'train': f's3://{BUCKET}/{S3_INPUT_PATH}'})
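For reference, sess and REGION come from a standard session setup along these lines (a sketch; my actual BUCKET and S3_INPUT_PATH definitions are omitted):

import boto3
import sagemaker

REGION = boto3.session.Session().region_name
sess = sagemaker.session.Session(boto3.session.Session(region_name=REGION))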
Here is the abbreviated train.py:
# argument parsing, data loading, etc. happen here (abbreviated)

from sklearn.ensemble import RandomForestClassifier
from sagemaker.experiments.run import load_run

model = RandomForestClassifier(n_estimators=args.nestimators,
                               max_depth=5,
                               random_state=1)

# Attach to the run created by the estimator and log a metric on it
with load_run(sagemaker_session=sagemaker_session) as run:
    model.fit(X, y)
    run.log_metric(name="Final Test Loss", value=0.9)
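Here sagemaker_session is constructed from the REGION environment variable that the estimator passes in (this part is abbreviated above); a minimal sketch of that setup:

import os

import boto3
from sagemaker.session import Session

# REGION is injected via environment={"REGION": REGION} in the estimator
boto_session = boto3.session.Session(region_name=os.environ["REGION"])
sagemaker_session = Session(boto_session=boto_session)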