Allow users to comfortably select a set of datasets as input for a workflow from the set of all available datasets. Typical examples could include:
This feature request might not be of high priority if the following feature, “Deduce which results need to be computed automatically”, exists.
If a module does not need to execute, the workflow manager must check whether the module's outputs are required by subsequent processing steps. All results required by subsequent steps must be made available on the cluster, in a way that hides the fact that the module did not execute; i.e., the reuse must be transparent to the subsequent steps.
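The skip-or-execute decision described above could be sketched as follows. This is an illustration only, not iBRAIN2 code; the data structures (`module` dict, `downstream_inputs` set, `cache` mapping of output names to staged paths) are assumptions made for the example.

```python
def plan_module(module, downstream_inputs, cache):
    """Return ('skip', staged) if cached outputs cover all downstream needs,
    otherwise ('execute', missing) listing what must be (re)computed."""
    required = [o for o in module["outputs"] if o in downstream_inputs]
    missing = [o for o in required if o not in cache]
    if missing:
        return ("execute", missing)
    # Stage the cached results under the same names a fresh run would
    # produce, so the reuse is transparent to subsequent modules.
    staged = {o: cache[o] for o in required}
    return ("skip", staged)
```

With a cache hit for every required output the module is skipped and the cached paths are staged; with any miss, the module must run.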
This would probably not be too hard to implement in the current module structure of iBRAIN2. The logic could be included in the first state handler of the module, using the openBIS Java API, which provides methods to query the datasets belonging to projects/experiments.
The bigger problem here would be defining the proper equivalence relation between datasets.
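One candidate equivalence relation is provenance-based: two datasets are considered equivalent iff they were produced by the same module version, with the same parameters, from equivalent inputs. The sketch below encodes this as a fingerprint; it is one possible choice, not the relation iBRAIN2 actually uses.

```python
import hashlib
import json

def dataset_fingerprint(module_name, module_version, params, input_fingerprints):
    """Provenance fingerprint: equal fingerprints mean the datasets are
    interchangeable under this (assumed) equivalence relation."""
    payload = json.dumps(
        {"module": module_name, "version": module_version,
         "params": params, "inputs": sorted(input_fingerprints)},
        sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Content-based relations (hashing the actual data) are stricter but far more expensive; the provenance key can be computed before the module even runs.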
See: Tight Acquisition Integration
Make use of the acquisition time. Some steps should be performed per image and can be done right after the image becomes available, namely checking the images and preprocessing them. The process performing these steps would also be able to monitor the machinery of the microscope and send warnings to the right person if something goes wrong (e.g. the microscope stops acquiring).
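The "microscope stopped acquiring" warning above boils down to watching the arrival time of the latest image. A minimal sketch, assuming a configurable maximum gap between images (the 600-second default is an invented example):

```python
import time

def acquisition_stalled(last_image_time, now=None, max_gap_s=600):
    """Return True if no new image arrived within `max_gap_s` seconds,
    which may indicate that the microscope stopped acquiring."""
    now = time.time() if now is None else now
    return (now - last_image_time) > max_gap_s
```

A periodic task would call this with the timestamp of the most recent image and trigger the notification path when it returns True.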
In the current iBRAIN2 concept, datasets must first be registered with iBRAIN2 in order to be used for any processing. This makes it hard to work with datasets that come from sources unknown to iBRAIN2. A possible use case is the registration of metadata alongside images in openBIS: we could register in openBIS a compound dataset consisting of (images, TIFF metadata, small thumbnails, big thumbnails, quality assessment, and shading correction model). However, these datasets would be unknown to iBRAIN2.
Ideally, iBRAIN2 could learn about these datasets “cheaply”, for example by automatic openBIS inspection.
If we remove the dataset-specific information from the database and use the GUI to query the datasets, we no longer need dataset registration. For automatic processing, we could also use dataset queries to check whether there is new data in a previously configured project.
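The polling idea above could look like this. The `query_datasets` callable stands in for an openBIS query; the real API's signature and return type differ, so treat this purely as a sketch of the bookkeeping.

```python
def find_new_datasets(query_datasets, project, seen_codes):
    """Poll the storage backend for the datasets of `project` and return
    only the ones not seen in a previous poll. `seen_codes` is the mutable
    set of already-known dataset codes, updated in place."""
    current = query_datasets(project)
    new = [code for code in current if code not in seen_codes]
    seen_codes.update(new)
    return new
```

Each automatic-processing cycle would call this per watched project and trigger workflows only for the returned new datasets.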
Accumulate a list of processing runs that ended in an error condition. For each job that ended in an error condition, allow the user to:
Typical resource bottlenecks, and rules for handling them cleverly, are:
See: Prioritization
See: Prioritization
See: Reusable Datasets in Cluster
Maintain information about the pool of datasets on the cluster.
See: Reusable Datasets in Cluster
Delete data from the pool of datasets on the cluster as soon as a dataset is no longer required by any subsequent module.
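Deleting a dataset exactly when its last consumer finishes is classic reference counting. A minimal sketch, assuming the workflow manager knows up front how many pending modules will consume each staged dataset (`delete` is a caller-supplied callback that performs the actual removal):

```python
class DatasetPool:
    """Reference-counted pool of datasets staged on the cluster (sketch)."""

    def __init__(self, delete):
        self.delete = delete   # callback that removes the staged data
        self.refs = {}         # dataset code -> number of pending consumers

    def acquire(self, code, consumers):
        """Register `consumers` pending modules that will read `code`."""
        self.refs[code] = self.refs.get(code, 0) + consumers

    def release(self, code):
        """One consumer finished; delete the data when none remain."""
        self.refs[code] -= 1
        if self.refs[code] == 0:
            del self.refs[code]
            self.delete(code)
```

The count must be established before the first consumer can finish, otherwise the pool could delete data that a not-yet-registered module still needs.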
See: Cluster Job Management and Visualization
Make sure that when the workflow manager starts, it either kills all running cluster jobs or is able to recover them.
See: Cluster Job Management and Visualization
Allow stopping of jobs, including a forced stop in which the cluster processes are killed.
See: Cluster Queue Optimization
Avoid overloading the cluster queuing system by starting unnecessarily large numbers of cluster jobs. Maintain the list of running and queued jobs dynamically, based on the current cluster load.
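The throttling rule above can be reduced to: submit only as many jobs as there are free slots under a queue cap, and keep the rest in the workflow manager's own pending list. A sketch, with the cap of 100 chosen purely for illustration:

```python
def jobs_to_submit(pending, running, queued, max_in_queue=100):
    """Split `pending` into (submit_now, hold_back) so that the number of
    jobs handed to the queuing system never exceeds `max_in_queue`."""
    free = max(0, max_in_queue - running - queued)
    return pending[:free], pending[free:]
```

Re-evaluating this on every scheduling tick, with `running` and `queued` refreshed from the cluster, gives the dynamic behaviour the note asks for.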
Manually triggered maintenance tasks for the cluster:
See: Watchdog and User Notification
A watchdog process could monitor the current status and responsiveness of the daemon. In case of a crash or error, an email notification should be sent.
A corresponding watchdog based on a shell script already exists.
Email notification should be robust to exceptions and crashes, and should be built on the logging classes, so that newly added code benefits from email notification without extra programming overhead.
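If the implementation language allows it, the logging-based approach can be as simple as attaching a mail handler to the root logger, as in this Python sketch using the standard library's `SMTPHandler` (host and addresses are placeholders). Any code that calls `logging.error()` or `logging.exception()` then triggers an email with no extra work at the call site:

```python
import logging
import logging.handlers

def add_error_mail_handler(mailhost, fromaddr, toaddrs,
                           subject="workflow manager error"):
    """Attach an SMTPHandler to the root logger; records at ERROR level
    and above are emailed, lower levels are ignored by this handler."""
    handler = logging.handlers.SMTPHandler(mailhost, fromaddr, toaddrs, subject)
    handler.setLevel(logging.ERROR)
    logging.getLogger().addHandler(handler)
    return handler
```

The same pattern exists for Java logging frameworks (e.g. an SMTP appender), which is the more likely fit for iBRAIN2 itself.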
Allow execution of sanity-checking code on datasets to validate the data status.
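A simple way to make such checks pluggable is a runner that takes a list of check functions, each returning an error message or `None`; a check that itself crashes is reported as a failure rather than aborting the run. This is an illustrative sketch, not an existing iBRAIN2 interface:

```python
def run_sanity_checks(dataset, checks):
    """Run each check on `dataset` and collect failure messages."""
    failures = []
    for check in checks:
        try:
            msg = check(dataset)
        except Exception as exc:  # a crashing check is itself a failure
            msg = f"{check.__name__} raised {exc!r}"
        if msg:
            failures.append(msg)
    return failures
```

An empty result means the dataset passed; a non-empty result can be attached to the dataset's status and fed into the error-list feature above.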
See: Combining Processing Information with Storage Information
Many properties of datasets can and should be stored as annotations with the datasets in the storage. For these properties, the authority should be the storage, not the workflow manager. The workflow manager may cache some or all of these properties for performance; however, it should allow gathering or updating them from the storage when required.
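The cache-with-storage-authority idea can be sketched as a read-through cache with an explicit refresh path; `fetch` stands in for whatever call retrieves the authoritative value from the storage backend:

```python
class PropertyCache:
    """Cache of dataset properties whose authority is the storage backend.
    Cached copies exist only for speed and can be refreshed on demand."""

    def __init__(self, fetch):
        self.fetch = fetch   # pulls the authoritative value from storage
        self.cache = {}

    def get(self, code, refresh=False):
        if refresh or code not in self.cache:
            self.cache[code] = self.fetch(code)  # storage always wins
        return self.cache[code]
```

`get(code)` serves fast repeated lookups, while `get(code, refresh=True)` implements the "update from the storage when required" requirement.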
See: Split into User Interface and Backend Daemon
This follows from tight integration with the storage. Data maintenance is easy on the storage backend (openBIS or the file system), so it is not required in the workflow manager, which uses the storage backend for fetching data information.
See: Workflow Specification Syntax
Modules are partially configured in:
See: CHROOT-like Cluster Environment