Wednesday, 4 October 2023

Invalid data binding expression when running AzureML pipeline

I'm running an AzureML pipeline from the command line; its sole job (for now) is a sweep.

When I run

run_id=$(az ml job create -f path_to_pipeline/pipeline.yaml --query name -o tsv -g grp_name -w ws-name)

I get the following error:

ERROR: Met error <class 'Exception'>:{
  "result": "Failed",
  "errors": [
    {
      "message": "Invalid data binding expression: inputs.data, outputs.model_output, search_space.batch_size, search_space.learning_rate",
      "path": "command",
      "value": "python train.py --data_path ${{inputs.data}} --output_path ${{outputs.model_output}} --batch_size ${{search_space.batch_size}} --learning_rate ${{search_space.learning_rate}}"
    }
  ]
}

The pipeline yaml looks like this:

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: pipeline_with_hyperparameter_sweep
description: Tune hyperparameters
settings:
  default_compute: azureml:compute-name  # sub with your compute name
jobs:
  sweep_step:
    type: sweep
    inputs:
      data:
        type: uri_file
        path: azureml:code_train_data:1  #data store I created
    outputs:
      model_output:
    sampling_algorithm: random
    search_space:
      batch_size:
        type: choice
        values: [1, 5, 10, 15]
      learning_rate:
        type: loguniform
        min_value: -6.90775527898 # ln(0.001)
        max_value: -2.30258509299 # ln(0.1)
    trial:
      code: ../src
      command: >-
        python train.py
        --data_path ${{inputs.data}}
        --output_path ${{outputs.model_output}}
        --batch_size ${{search_space.batch_size}}
        --learning_rate ${{search_space.learning_rate}}
      environment: azureml:env_finetune_component:1
    objective:
      goal: maximize
      primary_metric: bleu_score
    limits:
      max_total_trials: 5
      max_concurrent_trials: 3
      timeout: 3600
      trial_timeout: 720

For the train.py file, note that the main function normally contains a lot of actual code, but I replaced it with pass to check whether it makes a difference, and the error is the same. So the problem is upstream, with the bindings, not with what's inside train.py.

import argparse

def main(args):
    pass

def parse_args():
    parser = argparse.ArgumentParser()
    # Arguments supplied by the sweep trial command
    parser.add_argument("--data_path")
    parser.add_argument("--output_path")
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--learning_rate", type=float)
    args = parser.parse_args()

    return args


if __name__ == "__main__":

    args = parse_args()

    main(args)
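
One way to confirm that the script side is sound is to exercise the argument parser locally, without submitting a job. The sketch below is only an illustration and not part of the original pipeline: parse_args is adapted to accept an explicit argument list, and the path and hyperparameter values are made-up placeholders standing in for whatever AzureML would substitute.

```python
import argparse


def parse_args(argv=None):
    """Parse the four arguments that the trial command passes to train.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path")
    parser.add_argument("--output_path")
    parser.add_argument("--batch_size", type=int)
    parser.add_argument("--learning_rate", type=float)
    # argv=None makes argparse fall back to sys.argv[1:], as in a normal run
    return parser.parse_args(argv)


# Hypothetical placeholder values, not taken from the real pipeline
sample_argv = [
    "--data_path", "data/train.csv",
    "--output_path", "outputs/model",
    "--batch_size", "5",
    "--learning_rate", "0.01",
]

args = parse_args(sample_argv)
print(args)
```

If this parses cleanly, the failure is raised before the script ever runs, which is consistent with the "path": "command" field in the error.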

If helpful, here's the output of az version:

{
  "azure-cli": "2.53.0",
  "azure-cli-core": "2.53.0",
  "azure-cli-telemetry": "1.1.0",
  "extensions": {
    "ml": "2.20.0"
  }
}

