Differences

This shows you the differences between two versions of the page.

--- devel:workflow_specification_syntax [2013/09/11 15:44] – [Summary of Supported Variables] epujadas
+++ beewm:devel:workflow_specification_syntax [2016/05/20 12:11] (current) – admin
@@ Line 2: / Line 2: @@
 <code xml>
-<workflow name="String (required)" author="String (required)" cleanup="TRUE|FALSE (optional, defaults to TRUE)">
+<workflow name="String (optional)" author="String (required)" cleanup="TRUE|FALSE (optional, defaults to TRUE)">
     <hosts>
         <run_on>CLUSTER_HOST|LOCAL_HOST (optional, defaults to CLUSTER_HOST)</run_on>
@@ Line 8: / Line 8: @@
     <input>
         <datasets>
-            <dataset name="String (required)" id="String (storage ID, required)" type="String (storage dataset type, required)" stage="TRUE|FALSE (optional, defaults to TRUE)" />
+            <dataset name="String (required)" id="String (storage ID, required)" type="String (storage dataset type, optional)" stage="TRUE|FALSE (optional, defaults to TRUE)" />
             ....
         </datasets>
     </input>
     <modules>
-        <module name="String (required)" version="Regex pattern (required)" class="String (optional, defaults to ch.systemsx.bee.workflowmanager.module.MockModule)" required_runtime_minutes="Integer (optional)" required_memory_mb="Integer (optional)" >
+        <module name="String (required)" version="Regex pattern (required)" class="String (optional, defaults to ch.systemsx.bee.workflowmanager.module.MockModule)" required_runtime_minutes="Integer (optional)" required_memory_mb="Integer (optional)" cpus_per_job="Integer (optional)">
             <params (optional)>
                 <param name="indexbuilder_dataset|indexbuilder_regex|indexes_per_job|indexes_start (required)" value="String (required)" />
@@ Line 27: / Line 27: @@
             <output>
                 <datasets (optional)>
-                    <dataset name="String (required for datasets not to store, no default)" type="String (required for datasets to store, no default)" store="TRUE|FALSE (optional, defaults to FALSE)" relevant="TRUE|FALSE (optional, defaults to TRUE)">
+                    <dataset name="String (required for datasets not to store, no default)" type="String (required for datasets to store, no default)" store="TRUE|FALSE (optional, defaults to FALSE)" relevant="TRUE|FALSE (optional, defaults to TRUE)" dropbox="String: the name of the dropbox that will be used for storing the dataset. Optional, if it's not specified the type will be used as dropbox name.">
                         <files (considered and required only for datasets to store) in_dir="String (optional, defaults to the root of the module's work directory)" regex="Regex pattern (required)" />
                     </dataset>
@@ Line 50: / Line 50: @@
         <datasets>
             <dataset name="RawImages" id="0bCDME-BE01" type="HCS_IMAGE_RAW" stage="true" />
-            <dataset name="ComputeShadingCorrectionAverageImageSettings" id="12345" type="HCS_ARGUMENTS" stage="true" />
+            <dataset name="ComputeShadingCorrectionAverageImageSettings" id="123459876" stage="true" />
             <dataset name="MergeShadingCorrectionAverageImageSettings" id="1234567890" type="HCS_ARGUMENTS" stage="true" />
         </datasets>
@@ Line 141: / Line 141: @@
 ==== <workflow name="..." ... > ====
 Name of the workflow.
+Workflow
 ==== <workflow author="..." ... > ====
 Author of the workflow.
@@ Line 151: / Line 152: @@
 When a process is started by submitting a particular workflow together with one or several input datasets, it will be checked that:
   * the input dataset(s) identified by its storage ID exist in storage
-  * the input dataset(s) in storage have the same type as specified in the ''<input><datasets><dataset ... />'' element
+  * the input dataset(s) in storage have the same type as specified in the ''<input><datasets><dataset ... />'' element, if given
@@ Line 161: / Line 162: @@
 ==== <dataset type="..." ... > ====
-Required. Corresponds to the dataset type provided by the storage. It is used for validation purposes.
+Optional. Corresponds to the dataset type provided by the storage. It is used for validation purposes.
 ==== <dataset stage="..." ... > ====
@@ Line 169: / Line 170: @@
 :!: This feature is not implemented. The application behaves as if this value were set to true. Whether it is needed is still to be discussed, since the directory of a dataset could be specified as metadata or as a variable.
+==== <files in_dir="..." regex="..." /> ====
+This element selects the files and/or directories to stage. All ''"files"'' elements defined in an output dataset are evaluated. A file or directory is selected if it:
+  * is located in the ''"in_dir"'' subdirectory of the module's work directory, or directly in the module's work directory if the ''"in_dir"'' attribute is empty.
+  * matches the regular expression given in the ''"regex"'' attribute.
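The two selection rules above can be sketched in a few lines of Python. This is only an illustration, not Bee's implementation: the function name ''select_files'' is hypothetical, and it assumes that the regex must match the whole entry name and that subdirectories are not searched recursively. The page states neither of these.

```python
import re
from pathlib import Path


def select_files(work_dir, in_dir, regex):
    """Return the sorted names of entries selected by one <files> element.

    Assumptions (not confirmed by the page): the regex must match the
    entire entry name, and only direct children of the directory are
    considered.
    """
    # An empty in_dir means the root of the module's work directory.
    base = Path(work_dir) / in_dir if in_dir else Path(work_dir)
    pattern = re.compile(regex)
    # iterdir() yields both files and directories, as the text allows.
    return sorted(p.name for p in base.iterdir() if pattern.fullmatch(p.name))
```

Under these assumptions, ''select_files(work_dir, "", r".*\.stderr")'' would pick up the ''.stderr'' files that sit directly in the work directory, which is how an element like ''<files in_dir="" regex=".*\.stderr" />'' reads.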
 ===== <module name="..."  ... > =====
@@ Line 246: / Line 253: @@
-Bee supports the use of variables in its workflow description file.
+Bee supports the use of variables in its workflow description file.\\
-Variables are specified with the following syntax:  **''${variable_type.variable_name}''**.
+Variables are specified with the following syntax:  **''${variable_type.variable_name}''**.\\
+Please check **[[:beewm:devel:resolving_workflow_templates|Resolving Workflow Templates]]** for detailed information.
-===== Summary of Supported Variables =====
-^ Variable                   ^ Resolved in template ^
-| ${config.extras_dir}       |         ✔            |
-| ${task.log_stdout}         |         ✘            |
-| ${task.log_stderr}         |         ✘            |
-| ${bee_indexer.start_index} |         ✘            |
-| ${bee_indexer.end_index}   |         ✘            |
-| ${module.version}          |         ✔            |
-| ${api.xxxx}                |         ✔            |
-===== Variable types =====
-==== config ====
-The currently supported variable of this type is:
-  *  **''${config.extras_dir}''**
-The value corresponds to the property ''extras.dir'' defined in the ''system.config'' file.\\
-When the workflow template is resolved, this variable is replaced by the value actually used, which provides traceability.\\
-__Example__:\\
-The template snippet:
-<code xml>
-<path>${config.extras_dir}/ShadingCorrectionAverageImage/ComputeShadingCorrectionAvgImg.command</path>
-</code>
-would be resolved into:
-<code xml>
-<path>/import/bc2/home/resit/mx_nas/stage/bee/extras/ShadingCorrectionAverageImage/ComputeShadingCorrectionAvgImg.command</path>
-</code>
-==== task ====
-The currently supported variables of this type are:
-  *  **''${task.log_stdout}''**
-  *  **''${task.log_stderr}''**
-These variables correspond to the files to which the standard output and standard error streams of the cluster job (or task) are directed.\\
-The extensions of these files are defined as **''.stdout''** and **''.stderr''**.\\
-These variables can be used in the task validations to specify the files to validate.\\
-These variables will not be resolved in the workflow template since in the case of parallel running modules, one set of such files is produced per task.\\
-To examine or keep these files, send them to storage by selecting them in the module's work directory by their file extension (''.stdout'', ''.stderr'').\\
-__Example__:\\
-<code xml>
-<output>
-  <datasets>
-    <dataset type="CLUSTER_JOB_LOGS" store="true">
-      <files in_dir="" regex=".*\.stderr" />
-      <files in_dir="" regex=".*\.stdout" />
-    </dataset>
-  </datasets>
-  <validations level="task">
-    <validation mode="count" sub_dir="" regex="${task.log_stdout}" comparator="equal" target_value="1" fail_status="validation_error" fail_message="Missing Stdout Log File" />
-    <validation mode="content" sub_dir="" regex="${task.log_stdout}" content_regex="finished successfully" comparator="equal" target_value="1" fail_status="validation_error" fail_message="Missing Success Message in Stdout Log File" />
-</code>
-==== bee_indexer ====
-The currently supported variables of this type are:
-  *  **''${bee_indexer.start_index}''**
-  *  **''${bee_indexer.end_index}''**
-These variables correspond, in parallel running modules, to the start and end indexes of the objects to analyze in one job (task).\\
-They can be used as arguments to be added to the executable in each of the parallel calls.\\
-Their values are calculated in each task from the values specified as ''indexbuilder_dataset'', ''indexbuilder_regex'', ''indexes_per_job'' and ''indexes_start'' in the module parameters.\\
-These variables will not be resolved in the workflow template because they are used for parallel running modules, and therefore, one set of such values is produced per task.\\
-__Example__:\\
-<code xml>
-<module name="CPv1CPCluster" version="1.*.*" class="ch.systemsx.bee.workflowmanager.module.ClusterModule" >
-  <params>
-    <param name="indexbuilder_dataset" value="hcs_plate" />
-    <param name="indexbuilder_regex" value=".*_cDAPI.*\.(TIF|JP2)$" />
-    <param name="indexes_per_job" value="400" />
-    <param name="indexes_start" value="2" />
-  </params>
-  <executable>
-    <path>${config.extras_dir}/CellProfiler1/CellProfiler1_rev004094_R2012b_12001/CPCluster.command</path>
-    <args>
-      <arg type="path" value="dataset:CpClusterProfiling" />
-      <arg type="path" value="dataset:Cpv1BatchFile" selector="Batch_data.mat" />
-      <arg type="string" value="${bee_indexer.start_index}" />
-      <arg type="string" value="${bee_indexer.end_index}" />
-      <arg type="path" value="dataset:CpClusterResults" />
-      <arg type="string" value="Batch_" />
-      <arg type="string" value="yes" />
-      <arg type="string" value="date" />
-    </args>
-  </executable>
-</code>
-==== module ====
-The currently supported variable of this type is:
-  *  **''${module.version}''**
-This variable stands for the version of the module, which is computed from the regex given in the module's ''version'' attribute (e.g.: ''version="1.*.*"'').\\
-The module ''version'' attribute is a regex that should match the highest 3-digit version of a module. See Jira issue [[https://jira.biozentrum.unibas.ch/browse/BEE-114 | BEE-114]] for a detailed description of such a match.\\
-When the workflow template is resolved, this variable is replaced by the value actually used, which provides traceability.\\
-__Example__:\\
-The template snippet:
-<code xml>
-<module name="CPv1CreateBatchFile" version="1.*.*" class="ch.systemsx.bee.workflowmanager.module.ClusterModule">
-  <executable>
-    <path>/modules/CellProfiler1/CellProfiler1_ver${module.version}/CPCluster.command</path>
-</code>
-would be resolved into:
-<code xml>
-<module name="CPv1CreateBatchFile" version="1.*.*" class="ch.systemsx.bee.workflowmanager.module.ClusterModule">
-  <executable>
-    <path>/modules/CellProfiler1/CellProfiler1_ver001.000.000/CPCluster.command</path>
-</code>
-The resolved workflow should be stored together with the stored results (as it is in the current iBrain2).\\
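
The ''${variable_type.variable_name}'' substitution described in this diff can be sketched as follows. This is a minimal illustration and not Bee's resolver: the function ''resolve_template'', the ''values'' mapping and the path ''/opt/bee/extras'' are all hypothetical. Variables missing from the mapping are left untouched, mirroring the entries marked as not resolved in the template (for example ''${task.log_stdout}'').

```python
import re

# Matches ${variable_type.variable_name}; both parts are assumed to be
# made of word characters only, as in the variables listed on the page.
VARIABLE = re.compile(r"\$\{(\w+)\.(\w+)\}")


def resolve_template(text, values):
    """Replace known variables in text, leaving unknown ones as they are.

    values maps (variable_type, variable_name) tuples to replacement
    strings. Both the function and the mapping shape are illustrative.
    """
    def substitute(match):
        key = (match.group(1), match.group(2))
        # Per-task variables are not in the mapping and stay unresolved.
        return values.get(key, match.group(0))

    return VARIABLE.sub(substitute, text)


# Hypothetical value for the extras.dir property.
values = {("config", "extras_dir"): "/opt/bee/extras"}
snippet = "<path>${config.extras_dir}/CPCluster.command</path>"
print(resolve_template(snippet, values))
```

Leaving unknown variables in place rather than raising an error is a design choice made for this sketch, so that per-task variables survive template resolution. The linked Resolving Workflow Templates page is the authoritative description.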
