Monday 25 September 2023

SageMaker experiment tracking duplication

I am training a model in script mode on AWS SageMaker, and I would like to track the training job with SageMaker Experiments, together with some metrics calculated inside the job. When I start the training job, a new experiment run is created successfully, and it tracks all the provided hyperparameters (e.g. nestimators).

In addition, I want to track other metrics (e.g. accuracy) from within the custom script. There I call load_run() before fitting the model and then log a metric with run.log_metric(). When I do that, however, SageMaker creates a new, separate experiment entry in the UI, so my hyperparameters and metrics end up stored in two individual experiment runs:

[Screenshot: two separate runs created by SageMaker]

I would like to see the metrics and hyperparameters combined in a single experiment run. What am I doing wrong?

Here is the abbreviated code I am using to kick off the training process:

 
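from sagemaker.experiments.run import Run
from sagemaker.sklearn.estimator import SKLearn

# sess, REGION, BUCKET and S3_INPUT_PATH are defined earlier (omitted here)
exp_name = "sklearn-script-mode-experiment"

# The training job is started from inside the Run context
with Run(
    experiment_name=exp_name,
    sagemaker_session=sess,
) as run:

    sklearn_estimator = SKLearn('train.py',
                                instance_type='ml.m5.large',
                                framework_version='1.0-1',
                                role="arn:aws:iam:::role/service-role/AmazonSageMaker-ExecutionRole-",
                                hyperparameters={'nestimators': 100},
                                environment={"REGION": REGION})

    sklearn_estimator.fit({'train': f's3://{BUCKET}/{S3_INPUT_PATH}'})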

Here is the abbreviated train.py:

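    from sagemaker.experiments.run import load_run
    from sklearn.ensemble import RandomForestClassifier

    # parsing arguments here ... etc ...
    # (args, X, y and sagemaker_session are set up in the omitted part)

    model = RandomForestClassifier(n_estimators=args.nestimators,
                                   max_depth=5,
                                   random_state=1)

    # Load the run inside the training job and log a metric to it
    with load_run(sagemaker_session=sagemaker_session) as run:

        model.fit(X, y)

        run.log_metric(name="Final Test Loss", value=0.9)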
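
For context on how `args.nestimators` gets into train.py: in script mode the estimator's hyperparameters are passed to the entry point as command-line arguments, so the omitted parsing step is usually a small argparse block. The following is only a rough sketch of what that step might look like, not the actual code from my script; the `parse_args` helper, the `--train` argument and the `./data` fallback path are assumptions for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters passed to a script-mode entry point (illustrative)."""
    parser = argparse.ArgumentParser()
    # Matches the 'nestimators' key in the estimator's hyperparameters dict.
    parser.add_argument("--nestimators", type=int, default=100)
    # SM_CHANNEL_TRAIN points at the 'train' input channel inside the job;
    # the './data' fallback is an assumption for running the script locally.
    parser.add_argument(
        "--train",
        type=str,
        default=os.environ.get("SM_CHANNEL_TRAIN", "./data"),
    )
    # parse_known_args tolerates extra arguments the script does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args
```

Calling `parse_args(["--nestimators", "100"])` locally mimics what the job would receive for the hyperparameters shown above.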

