Resolving Workflow Templates

Bee supports variables in its workflow description file. Variables are written using the following syntax: ${variable_type.variable_name}.
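The syntax above can be recognised with a single regular expression. The following Python sketch is only an illustration and is not taken from the Bee code base; the names PLACEHOLDER and find_variables are hypothetical, and the set of characters allowed in a type or name is an assumption.

```python
import re

# Matches ${variable_type.variable_name}.
# Assumption: types and names consist of letters, digits and underscores.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_]\w*)\.([A-Za-z_]\w*)\}")


def find_variables(template: str) -> list[tuple[str, str]]:
    """Return (variable_type, variable_name) pairs in order of appearance."""
    return [(m.group(1), m.group(2)) for m in PLACEHOLDER.finditer(template)]


snippet = (
    '<arg type="string" value="${bee_indexer.start_index}" />'
    "<path>${config.extras_dir}/x.command</path>"
)
print(find_variables(snippet))
# prints [('bee_indexer', 'start_index'), ('config', 'extras_dir')]
```

Splitting each match into type and name makes it possible to treat the variable types differently, as described in the sections below.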

Summary of Supported Variables

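Variable                        Resolved in template
${config.extras_dir}            yes
${task.log_stdout}              no (one value per task)
${task.log_stderr}              no (one value per task)
${bee_indexer.start_index}      no (one value per task)
${bee_indexer.end_index}        no (one value per task)
${module.version}               yes
${api.xxxx}                     yes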

Variable types

config

The currently supported variable of this type is:

  • ${config.extras_dir}

The value corresponds to the property extras.dir defined in the system.config file.
To provide traceability, the workflow template is resolved by substituting this variable with the value that was actually used.
Example:
The template snippet:

<path>${config.extras_dir}/ShadingCorrectionAverageImage/ComputeShadingCorrectionAvgImg.command</path>

would be resolved into:

<path>/import/bc2/home/resit/mx_nas/stage/bee/extras/ShadingCorrectionAverageImage/ComputeShadingCorrectionAvgImg.command</path>
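The substitution shown above can be sketched in Python as follows. This is a minimal illustration, not Bee's implementation: it assumes that system.config is a plain key=value properties file, and the helper names load_properties and resolve_config are hypothetical.

```python
def load_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def resolve_config(template: str, props: dict[str, str]) -> str:
    """Replace ${config.extras_dir} with the value of the extras.dir property."""
    return template.replace("${config.extras_dir}", props["extras.dir"])


# Hypothetical system.config contents, using the path from the example above.
config_text = (
    "# system.config\n"
    "extras.dir = /import/bc2/home/resit/mx_nas/stage/bee/extras\n"
)
template = (
    "<path>${config.extras_dir}/ShadingCorrectionAverageImage/"
    "ComputeShadingCorrectionAvgImg.command</path>"
)
resolved = resolve_config(template, load_properties(config_text))
print(resolved)
```

Storing the resolved text rather than the template records which extras directory a given run actually used.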

task

The currently supported variables of this type are:

  • ${task.log_stdout}
  • ${task.log_stderr}

These variables correspond to the files to which the standard output and standard error streams of the cluster job (or task) are redirected.
These files have the extensions .stdout and .stderr.
The variables can be used in task validations to specify the files to validate.
They are not resolved in the workflow template because, for modules running in parallel, one set of such files is produced per task.
If these files need to be examined or kept, they can be sent to storage by locating them in the module's work directory by their file extension (.stdout, .stderr).
Example:

<output>
  <datasets>
    <dataset type="CLUSTER_JOB_LOGS" store="true">
      <files in_dir="" regex=".*\.stderr" />
      <files in_dir="" regex=".*\.stdout" />
    </dataset>
  </datasets>
  <validations level="task">
    <validation mode="count" sub_dir="" regex="${task.log_stdout}" comparator="equal" target_value="1" fail_status="validation_error" fail_message="Missing Stdout Log File" />
    <validation mode="content" sub_dir="" regex="${task.log_stdout}" content_regex="finished successfully" comparator="equal" target_value="1" fail_status="validation_error" fail_message="Missing Success Message in Stdout Log File" />
  </validations>
</output>
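To illustrate why the task variables are resolved per task rather than in the template, the following Python sketch imitates a "count" validation over a work directory. It is an assumption-laden illustration: the log file names (task_1.stdout and so on), the helper count_matching, and the choice to match the regex against the whole file name are all hypothetical and not taken from Bee.

```python
import re
import tempfile
from pathlib import Path


def count_matching(work_dir: Path, regex: str) -> int:
    """Count the files in work_dir whose names fully match regex."""
    pattern = re.compile(regex)
    return sum(
        1 for p in work_dir.iterdir() if p.is_file() and pattern.fullmatch(p.name)
    )


with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp)
    # A module with two parallel tasks produces one pair of log files per task.
    for name in ("task_1.stdout", "task_1.stderr",
                 "task_2.stdout", "task_2.stderr", "result.csv"):
        (work_dir / name).write_text("finished successfully\n")

    # The dataset regexes from the example collect the logs of all tasks.
    stdout_logs = count_matching(work_dir, r".*\.stdout")
    stderr_logs = count_matching(work_dir, r".*\.stderr")

    # Within one task, ${task.log_stdout} stands for exactly one file.
    task_log = count_matching(work_dir, re.escape("task_1.stdout"))

print(stdout_logs, stderr_logs, task_log)  # prints 2 2 1
```

With two tasks the extension-based regex finds two files, while the per-task name finds one, which is the value the count validation in the example expects.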

bee_indexer

The currently supported variables of this type are:

  • ${bee_indexer.start_index}
  • ${bee_indexer.end_index}

In modules running in parallel, these variables correspond to the start and end indexes of the objects to analyze in one job (task).
They can be passed as arguments to the executable in each of the parallel calls.
Their values are calculated for each task from the values specified as indexbuilder_dataset, indexbuilder_regex, indexes_per_job and indexes_start in the module parameters.
These variables are not resolved in the workflow template because, for modules running in parallel, one set of such values is produced per task.
Example:

<module name="CPv1CPCluster" version="1.*.*" class="ch.systemsx.bee.workflowmanager.module.ClusterModule" >
  <params>
    <param name="indexbuilder_dataset" value="hcs_plate" />
    <param name="indexbuilder_regex" value=".*_cDAPI.*\.(TIF|JP2)$" />
    <param name="indexes_per_job" value="400" />
    <param name="indexes_start" value="2" />
  </params>
  <executable>
    <path>${config.extras_dir}/CellProfiler1/CellProfiler1_rev004094_R2012b_12001/CPCluster.command</path>
    <args>
      <arg type="path" value="dataset:CpClusterProfiling" />
      <arg type="path" value="dataset:Cpv1BatchFile" selector="Batch_data.mat" />
      <arg type="string" value="${bee_indexer.start_index}" />
      <arg type="string" value="${bee_indexer.end_index}" />
      <arg type="path" value="dataset:CpClusterResults" />
      <arg type="string" value="Batch_" />
      <arg type="string" value="yes" />
      <arg type="string" value="date" />
    </args>
  </executable>
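One plausible way of splitting the indexes into per-task ranges is sketched below in Python. This is an assumption, not Bee's documented algorithm: it supposes that the ranges are inclusive, that indexes_start is the first index handed out, and that the last index (here the hypothetical parameter last_index) has already been derived from the files matched by indexbuilder_regex in indexbuilder_dataset.

```python
def index_ranges(
    indexes_start: int, last_index: int, indexes_per_job: int
) -> list[tuple[int, int]]:
    """Return inclusive (start_index, end_index) pairs, one pair per task."""
    if indexes_per_job < 1:
        raise ValueError("indexes_per_job must be at least 1")
    ranges = []
    start = indexes_start
    while start <= last_index:
        # The final task may receive fewer than indexes_per_job indexes.
        end = min(start + indexes_per_job - 1, last_index)
        ranges.append((start, end))
        start = end + 1
    return ranges


# Parameters from the example above, with a hypothetical last index of 1001.
ranges = index_ranges(indexes_start=2, last_index=1001, indexes_per_job=400)
print(ranges)  # prints [(2, 401), (402, 801), (802, 1001)]
```

Under these assumptions each pair would supply the values of ${bee_indexer.start_index} and ${bee_indexer.end_index} for one task.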

module

The currently supported variable of this type is:

  • ${module.version}

This variable holds the version of the module that is actually used, computed from the regex given in the module's version attribute (e.g. version="1.*.*").
The module version attribute is a regex which should match the highest 3-digit version of a module. See Jira issue BEE-114 for a detailed description of such a match.
To provide traceability, the workflow template is resolved by substituting this variable with the value that was actually used.
Example:
The template snippet:

<module name="CPv1CreateBatchFile" version="1.*.*" class="ch.systemsx.bee.workflowmanager.module.ClusterModule">
  <executable>
    <path>/modules/CellProfiler1/CellProfiler1_ver${module.version}/CPCluster.command</path>

would be resolved into:

<module name="CPv1CreateBatchFile" version="1.*.*" class="ch.systemsx.bee.workflowmanager.module.ClusterModule">
  <executable>
    <path>/modules/CellProfiler1/CellProfiler1_ver001.000.000/CPCluster.command</path>

The resolved workflow should be stored together with the stored results (as it is in the current iBrain2).
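The selection of the highest matching version can be sketched in Python as follows. The exact matching rules are described in BEE-114 and are not reproduced here; this sketch assumes that each of the three dot-separated components is either a number or the wildcard *, and the list of available versions and the function names are hypothetical.

```python
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '001.002.010' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def matches(pattern: str, version: tuple[int, ...]) -> bool:
    """Check a pattern such as '1.*.*' against a parsed version."""
    parts = pattern.split(".")
    return len(parts) == len(version) and all(
        p == "*" or int(p) == v for p, v in zip(parts, version)
    )


def highest_match(pattern: str, available: list[str]) -> Optional[str]:
    """Return the highest available version matching pattern, or None."""
    candidates = [v for v in available if matches(pattern, parse_version(v))]
    return max(candidates, key=parse_version) if candidates else None


# Hypothetical set of installed module versions.
available = ["001.000.000", "001.002.010", "001.010.000", "002.000.000"]
print(highest_match("1.*.*", available))  # prints 001.010.000
```

Comparing parsed tuples rather than strings keeps the ordering numeric, so the result does not depend on zero padding.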

api

The api-type variables are defined by the user. They have the following structure:

  • ${api.some_variable_name}

These variables can be used when submitting a workflow through the REST interface. They are substituted in the workflow.xml with the values provided in the REST call.
Example:
Issuing the following REST call:

curl -X POST --data-urlencode "workflow.id=20130912110409253-42113" --data-urlencode "api.hcs_bee_cppipeline=20130912105648991-42110" --data-urlencode "api.hcs_plate=20130215175757007-40162" http://localhost:12345/apiv1/processes

would substitute the submitted api-type variables (api.hcs_bee_cppipeline and api.hcs_plate) with the provided values (20130912105648991-42110 and 20130215175757007-40162) in the workflow XML description in the following way:
1. Submitted workflow XML template:

<input>
  <datasets>
    <dataset name="hcs_plate" id="${api.hcs_plate}" type="HCS_IMAGE_RAW" stage="true" />
    <dataset name="hcs_bee_cppipeline" id="${api.hcs_bee_cppipeline}" type="HCS_BEE_CPPIPELINE" stage="true" />
  </datasets>
</input>

2. Resolved workflow XML:

<input>
  <datasets>
    <dataset name="hcs_plate" id="20130215175757007-40162" type="HCS_IMAGE_RAW" stage="true" />
    <dataset name="hcs_bee_cppipeline" id="20130912105648991-42110" type="HCS_BEE_CPPIPELINE" stage="true" />
  </datasets>
</input>
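The substitution of api-type variables from the submitted form data can be sketched in Python as follows. This is only an illustration of the behaviour described above, not the server-side code: the function resolve_api is hypothetical, and raising an error for a variable without a submitted value is an assumption.

```python
import re
from urllib.parse import parse_qsl, urlencode

# Matches ${api.some_variable_name}.
API_VAR = re.compile(r"\$\{api\.(\w+)\}")


def resolve_api(template: str, form_body: str) -> str:
    """Replace each ${api.name} with the value submitted as api.name."""
    params = dict(parse_qsl(form_body))

    def substitute(match: re.Match) -> str:
        key = "api." + match.group(1)
        if key not in params:
            # Assumption: a variable without a submitted value is an error.
            raise KeyError(f"no value submitted for {key}")
        return params[key]

    return API_VAR.sub(substitute, template)


# Form body equivalent to the --data-urlencode arguments of the curl call.
body = urlencode({
    "workflow.id": "20130912110409253-42113",
    "api.hcs_bee_cppipeline": "20130912105648991-42110",
    "api.hcs_plate": "20130215175757007-40162",
})
template = (
    '<dataset name="hcs_plate" id="${api.hcs_plate}" '
    'type="HCS_IMAGE_RAW" stage="true" />'
)
resolved = resolve_api(template, body)
print(resolved)
```

Parameters that do not start with api., such as workflow.id, are simply ignored by the substitution.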
beewm/devel/resolving_workflow_templates.txt · Last modified: 2016/05/17 16:17 by 127.0.0.1