
Dependency Resolution


To avoid unnecessary processing and save resources and time, we would like to compute only those steps that have not already produced a compatible result (see Dataset Equivalence Classes).

Possible Solutions

Top Down Approach

Because the workflow definition already specifies the expected results, this approach is easy to implement. We start from the expected results and work backwards through the modules that produce them:

checkList = { expected results }
modulesToExecute = { }
while checkList is not empty do
    dataSet = checkList.popFront()      // remove and return the first element
    if dataSet is an input dataset then continue
    if modulesToExecute contains the module which produces dataSet then continue
    if there is a compatible dataset in storage then continue
    module = the module from the workflow which produces dataSet
    modulesToExecute.add( module )
    checkList.pushBack( module.getInputDatasets() )

Note: This algorithm selects only the modules that are needed to compute the expected results. If the workflow description contains branches or modules that produce datasets which are neither listed in the results section nor needed as input by another selected module, those modules will never be executed.
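The loop above can be sketched in Java as follows. This is a minimal illustration under a simplified model, not the WorkflowStarter implementation: datasets are plain `String` names, `Module` is a hypothetical record holding a name plus its input and output dataset names, and the "compatible dataset in storage" check (which the real system performs through dataset equivalence) is reduced to a lookup in a `Set<String>`. The `main` method shows the behaviour described in the note: module `D` is never reached because its output is not an expected result.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of top-down dependency resolution (hypothetical data model).
public class TopDownResolver {

    // Hypothetical module model: name plus the dataset names it reads and writes.
    record Module(String name, List<String> inputs, List<String> outputs) {}

    static List<Module> resolve(Collection<String> expectedResults, Set<String> inputDatasets,
                                List<Module> workflow, Set<String> compatibleInStorage) {
        // Index: dataset name -> module that produces it.
        Map<String, Module> producerOf = new HashMap<>();
        for (Module m : workflow) {
            for (String out : m.outputs()) {
                producerOf.put(out, m);
            }
        }

        Deque<String> checkList = new ArrayDeque<>(expectedResults);
        Set<Module> modulesToExecute = new LinkedHashSet<>(); // keeps discovery order

        while (!checkList.isEmpty()) {
            String dataSet = checkList.pollFirst();
            if (inputDatasets.contains(dataSet)) {
                continue; // supplied by the user, nothing to compute
            }
            Module module = producerOf.get(dataSet);
            if (module == null) {
                throw new IllegalStateException("No module produces dataset " + dataSet);
            }
            if (modulesToExecute.contains(module)) {
                continue; // producer already scheduled
            }
            if (compatibleInStorage.contains(dataSet)) {
                continue; // reuse the stored result instead of recomputing it
            }
            modulesToExecute.add(module);
            checkList.addAll(module.inputs()); // the module's inputs must be checked too
        }
        return new ArrayList<>(modulesToExecute);
    }

    public static void main(String[] args) {
        List<Module> workflow = List.of(
                new Module("A", List.of("raw"), List.of("clean")),
                new Module("B", List.of("clean"), List.of("stats")),
                new Module("C", List.of("clean"), List.of("plot")),
                new Module("D", List.of("raw"), List.of("unused")));
        // "clean" already has a compatible dataset in storage, so A is skipped;
        // D is never reached because "unused" is not an expected result.
        for (Module m : resolve(List.of("stats", "plot"), Set.of("raw"), workflow, Set.of("clean"))) {
            System.out.println(m.name()); // prints B, then C
        }
    }
}
```

The result is the set of modules to run, in discovery order; it is not an execution order. The scheduler still has to run each module after the producers of its inputs.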

Bottom Up Approach

In this approach we start from the modules that depend only on the input datasets, check whether their results need to be recomputed, and then continue with the modules that depend on the datasets made available so far.

dataSetsReady = { input datasets }
modules = { all the modules from the workflow }
modulesToExecute = { }
while modules is not empty do
    module = a module from modules which only depends on datasets from dataSetsReady
    modules.remove( module )
    dataSetsReady.add( module.getOutputs() )
    if there is no compatible dataset in storage for the outputs of module then
        modulesToExecute.add( module )
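A minimal Java sketch of the bottom-up loop, using the same simplified assumptions: datasets are `String` names, `Module` is a hypothetical record (not the workflow manager's own class), and storage compatibility is a plain `Set<String>` lookup. If no remaining module has all of its inputs ready, the sketch throws, since that indicates a cycle or an input that no module produces.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of bottom-up dependency resolution (hypothetical data model).
public class BottomUpResolver {

    // Hypothetical module model: name plus the dataset names it reads and writes.
    record Module(String name, List<String> inputs, List<String> outputs) {}

    static List<Module> resolve(Set<String> inputDatasets, List<Module> workflow,
                                Set<String> compatibleInStorage) {
        Set<String> dataSetsReady = new HashSet<>(inputDatasets);
        List<Module> modules = new ArrayList<>(workflow);
        List<Module> modulesToExecute = new ArrayList<>();

        while (!modules.isEmpty()) {
            // Find a module whose inputs are all available already.
            Module module = null;
            for (Module m : modules) {
                if (dataSetsReady.containsAll(m.inputs())) {
                    module = m;
                    break;
                }
            }
            if (module == null) {
                // No progress possible: a cycle or an input nobody produces.
                throw new IllegalStateException("Unresolvable modules: " + modules);
            }
            modules.remove(module);
            // Outputs become available either from storage or by recomputation.
            dataSetsReady.addAll(module.outputs());
            // Recompute only if some output has no compatible dataset in storage.
            if (!compatibleInStorage.containsAll(module.outputs())) {
                modulesToExecute.add(module);
            }
        }
        return modulesToExecute;
    }

    public static void main(String[] args) {
        List<Module> workflow = List.of(
                new Module("A", List.of("raw"), List.of("clean")),
                new Module("B", List.of("clean"), List.of("stats")),
                new Module("C", List.of("clean"), List.of("plot")),
                new Module("D", List.of("raw"), List.of("unused")));
        for (Module m : resolve(Set.of("raw"), workflow, Set.of("clean"))) {
            System.out.println(m.name()); // prints B, C, then D
        }
    }
}
```

Unlike the top-down version, this visits every module, so `D` is scheduled even though nothing needs its output. Because the modules are picked in dependency order, the returned list is also a valid execution order. The repeated scan makes the selection quadratic in the number of modules; a Kahn-style topological sort with per-module counters of missing inputs would avoid that if workflows grow large.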


Top-down approach implemented.
See the following two methods in the WorkflowStarter class:

  • setOutputEquivalences(Set<Module> modules, WorkflowConfig workflowConfig, DatasetEquivalenceChecker datasetEquivalenceChecker)
  • setParentsComplete(List<Module> moduleGraph)
beewm/devel/dependency_resolution.txt · Last modified: 2016/05/17 16:17 by