Allow the user to comfortably select a set of datasets as input for a workflow from the set of all available datasets. Typical examples could include:
This feature request might not be of high priority if the following feature, “Deduce which results need to be computed automatically”, exists.
This is specifically a request for the user interface. When starting a specific workflow on a number of plates, the UI must additionally allow picking certain “fixed” datasets from storage that will also be made available on the cluster. Examples could be a CP pipeline, a shading model, an object classification model, etc. Such input files have the status of workflow settings, i.e. a shading model is a setting in the same way as a parameter of a module.
Starting a workflow with additional datasets that act more like parameters would probably already be possible now. Since workflow chaining works on group datasets, the staging of these datasets would work.
Building a GUI would not be a problem either: we have the means to query openBIS and display the query results (see iPortal). The open question is how to organize this kind of data and how to display it in the GUI.
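Purely as an illustration, a minimal Java sketch of how the start-workflow dialog could be populated with candidate “fixed” datasets. OpenBisClient, DataSetInfo, and listDataSets are hypothetical placeholder names, not actual openBIS API calls; a real implementation would reuse whatever query facility iPortal already has.

    import java.util.List;

    /** Hypothetical facade over the openBIS query API (placeholder, not a real openBIS interface). */
    interface OpenBisClient {
        /** Lists all datasets of the given type (e.g. "SHADING_MODEL") below a project or experiment. */
        List<DataSetInfo> listDataSets(String projectOrExperimentCode, String dataSetType);
    }

    /** Minimal description of one selectable dataset, as shown in the picker. */
    class DataSetInfo {
        final String code;  // openBIS dataset code
        final String type;  // e.g. "CP_PIPELINE", "SHADING_MODEL", "CLASSIFICATION_MODEL"

        DataSetInfo(String code, String type) {
            this.code = code;
            this.type = type;
        }
    }

    /** Populates the "fixed input datasets" section of the start-workflow dialog. */
    class FixedDataSetPicker {
        private final OpenBisClient openBis;

        FixedDataSetPicker(OpenBisClient openBis) {
            this.openBis = openBis;
        }

        /** Candidates of one kind (pipeline, shading model, ...) offered for selection. */
        List<DataSetInfo> candidates(String projectCode, String dataSetType) {
            return openBis.listDataSets(projectCode, dataSetType);
        }
    }

The codes of the selected datasets would then be stored among the workflow settings, next to the ordinary module parameters.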
If a module does not need to execute, the workflow manager must check whether the outputs of that module are required by subsequent processing steps. All results that are required by subsequent processing steps must be made available on the cluster. The results should be made available in a way that hides the fact that the module did not execute, i.e. subsequent steps should not need to care whether the module actually ran or was skipped.
This would probably not be too hard to implement in the current module structure of iBRAIN2. The logic can be included in the first state handler of the module, using the openBIS Java API, which provides methods to query the datasets belonging to projects/experiments.
The bigger problem here would be defining the proper equivalence relation between datasets.
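A rough sketch, in Java, of how such a check could sit in a module's first state handler. DataSetQuery, DataSetDescription, and the equivalence test are assumptions and stubs, since neither the exact openBIS query calls nor the equivalence relation are fixed yet.

    import java.util.List;
    import java.util.Map;

    /** Sketch: decide in a module's first state handler whether the module actually has to run. */
    class ModuleSkipCheck {

        /** Hypothetical query; in reality a call to the openBIS Java API (datasets of an experiment). */
        interface DataSetQuery {
            List<DataSetDescription> dataSetsOfType(String experimentCode, String dataSetType);
        }

        /** Minimal local description of a dataset, enough for the equivalence stub below. */
        static class DataSetDescription {
            String code;
            String type;
            Map<String, String> properties;
        }

        private final DataSetQuery query;

        ModuleSkipCheck(DataSetQuery query) {
            this.query = query;
        }

        /** True if an equivalent output already exists; the module is then skipped and the dataset staged. */
        boolean outputAlreadyAvailable(String experimentCode, DataSetDescription expected) {
            for (DataSetDescription candidate : query.dataSetsOfType(experimentCode, expected.type)) {
                if (isEquivalent(candidate, expected)) {
                    return true;
                }
            }
            return false;
        }

        /** Stub for the still-to-be-defined equivalence relation between datasets. */
        private boolean isEquivalent(DataSetDescription a, DataSetDescription b) {
            return a.type.equals(b.type) && a.properties.equals(b.properties);  // assumption only
        }
    }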
Make use of the acquisition time. Some steps should be performed per image and can be done right after the image becomes available: checking the images and preprocessing them. The process that performs these steps would also be able to monitor the machinery of the microscope and send warnings to the right person if something happens (e.g. the microscope stops acquiring).
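A very small sketch of the per-image watcher this implies, assuming only that new images appear as files in a known acquisition directory; all names and the 30-minute threshold are illustrative.

    import java.io.File;

    /** Sketch: trigger per-image steps as images arrive and warn when acquisition stalls. */
    class AcquisitionWatcher {
        private static final long STALL_LIMIT_MS = 30 * 60 * 1000;  // assumption: warn after 30 min without a new image

        private long lastImageSeen = System.currentTimeMillis();

        /** Called periodically (e.g. every minute) by a scheduler. */
        void poll(File acquisitionDir) {
            File[] images = acquisitionDir.listFiles((dir, name) -> name.endsWith(".tif"));
            if (images != null) {
                for (File image : images) {
                    if (image.lastModified() > lastImageSeen) {
                        lastImageSeen = image.lastModified();
                        checkAndPreprocess(image);  // per-image steps, right after availability
                    }
                }
            }
            if (System.currentTimeMillis() - lastImageSeen > STALL_LIMIT_MS) {
                // a real implementation would avoid repeating the same warning every poll
                warnResponsiblePerson("No new image for 30 minutes - has the microscope stopped acquiring?");
            }
        }

        private void checkAndPreprocess(File image) { /* image check + preprocessing */ }

        private void warnResponsiblePerson(String message) { /* e.g. email notification */ }
    }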
In the current iBRAIN2 concept, datasets must first be registered with iBRAIN2 in order to be used for any processing. This makes it hard to work with datasets that come from sources unknown to iBRAIN2. A possible use case is the registration of metadata alongside images in openBIS: we could register in openBIS a compound dataset consisting of (images, TIFF metadata, small thumbnails, big thumbnails, quality assessment, and shading correction model). However, these datasets would be unknown to iBRAIN2.
Ideally, iBRAIN2 could learn about these datasets “cheaply”, for example by automatic openBIS inspection.
If we remove the dataset-specific information from the database and use the GUI to query the datasets, we no longer need the dataset registration. For automatic processing we could likewise use dataset queries to check whether new data has appeared in a previously configured project.
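A sketch of such a query-based check for new data; DataSetQuery and the registration-date filter are assumptions about how “new” would be detected, not actual API calls.

    import java.util.Date;
    import java.util.List;

    /** Sketch: detect new data in a watched project purely via dataset queries (no registration step). */
    class NewDataPoller {

        /** Hypothetical query method; a real implementation would call the openBIS API. */
        interface DataSetQuery {
            List<String> dataSetCodesRegisteredAfter(String projectCode, Date since);
        }

        private final DataSetQuery query;
        private Date lastCheck = new Date(0);

        NewDataPoller(DataSetQuery query) {
            this.query = query;
        }

        /** Called periodically; returns the codes of datasets that appeared since the last check. */
        List<String> findNewDataSets(String projectCode) {
            Date now = new Date();
            List<String> fresh = query.dataSetCodesRegisteredAfter(projectCode, lastCheck);
            lastCheck = now;
            return fresh;
        }
    }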
Accumulate a list of processings that ended in an error condition. For each such job, allow to:
Typical resource bottlenecks, and rules for handling them cleverly, are:
Maintain information about the pool of datasets on the cluster.
Delete a dataset from the pool on the cluster as soon as it is no longer required by any subsequent module.
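One possible bookkeeping scheme, sketched as simple reference counting; how deletion is actually carried out on the cluster file system is left open.

    import java.util.HashMap;
    import java.util.Map;

    /** Sketch: reference-counted pool of datasets staged on the cluster. */
    class ClusterDataSetPool {
        private final Map<String, Integer> consumersLeft = new HashMap<>();

        /** Register a staged dataset together with the number of modules that still need it. */
        void add(String dataSetCode, int pendingConsumers) {
            consumersLeft.put(dataSetCode, pendingConsumers);
        }

        /** Called when a module has finished reading a dataset; deletes it once nobody needs it. */
        void release(String dataSetCode) {
            Integer left = consumersLeft.get(dataSetCode);
            if (left == null) {
                return;
            }
            if (left <= 1) {
                consumersLeft.remove(dataSetCode);
                deleteFromCluster(dataSetCode);
            } else {
                consumersLeft.put(dataSetCode, left - 1);
            }
        }

        private void deleteFromCluster(String dataSetCode) { /* remove the staged copy */ }
    }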
Make sure that when the workflow manager starts, it either kills all running cluster jobs or recovers them (which of the two is still an open question).
Allow stopping of jobs, including forced stop where cluster processes are killed.
Avoid overloading the cluster queuing system by starting unnecessarily large numbers of cluster jobs. Maintain the list of running and queued jobs dynamically, based on the current cluster load.
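A sketch of a simple submission throttle; the job cap and the ClusterQueue interface (standing in for however the queue load is obtained, e.g. qstat-like information) are assumptions.

    import java.util.ArrayDeque;
    import java.util.Queue;

    /** Sketch: keep the number of running + queued cluster jobs below a configurable cap. */
    class JobThrottle {

        /** Hypothetical view of the cluster queuing system. */
        interface ClusterQueue {
            int runningAndQueuedJobs();
            void submit(String jobScript);
        }

        private final ClusterQueue cluster;
        private final int maxJobs;                       // assumed configurable limit
        private final Queue<String> pending = new ArrayDeque<>();

        JobThrottle(ClusterQueue cluster, int maxJobs) {
            this.cluster = cluster;
            this.maxJobs = maxJobs;
        }

        /** Jobs are buffered here instead of being submitted immediately. */
        void enqueue(String jobScript) {
            pending.add(jobScript);
        }

        /** Called periodically: submits only as many jobs as the current load allows. */
        void drain() {
            int freeSlots = maxJobs - cluster.runningAndQueuedJobs();
            while (freeSlots > 0 && !pending.isEmpty()) {
                cluster.submit(pending.poll());
                freeSlots--;
            }
        }
    }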
Manually triggered maintenance tasks for the cluster:
A watchdog process could monitor the current status and responsiveness of the daemon. In case of crash or error, an email notification should be sent.
A corresponding watchdog based on a shell script already exists.
Email notification should be robust to exceptions and crashes, and should be implemented on top of the logging classes. Newly added code should benefit from email notification without any extra programming overhead.
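One way to get this “for free” for newly added code is to hook the notification into the logging framework itself, so that any severe log record triggers an email. A minimal sketch with java.util.logging; the sendMail part is a placeholder that could delegate to javax.mail or to the existing watchdog script.

    import java.util.logging.Handler;
    import java.util.logging.Level;
    import java.util.logging.LogRecord;
    import java.util.logging.Logger;

    /** Sketch: every SEVERE log record anywhere in the code triggers an email notification. */
    class EmailLogHandler extends Handler {

        @Override
        public void publish(LogRecord record) {
            if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
                sendMail("iBRAIN2 error: " + record.getLoggerName(), record.getMessage());
            }
        }

        @Override public void flush() { }
        @Override public void close() { }

        private void sendMail(String subject, String body) {
            // placeholder: delegate to javax.mail or the existing watchdog script
        }
    }

    class NotificationSetup {
        /** Install once at startup; afterwards plain logger.severe(...) calls are enough. */
        static void install() {
            Logger.getLogger("").addHandler(new EmailLogHandler());
        }
    }

With such a handler installed once at startup, module authors only need an ordinary logger.severe(...) call; no extra notification code is required.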
Allow execution of sanity checking code on datasets to validate the data status.
Many properties of datasets can and should be annotated on the datasets in the storage. For these properties, the authority should be the storage and not the workflow manager. The workflow manager may cache some or all of these properties for performance, but it should allow gathering or updating the properties from the storage when required.
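A sketch of the intended caching behaviour; the Storage interface is a stand-in for openBIS or the file system, and the cache can always be refreshed so the storage remains the authority.

    import java.util.HashMap;
    import java.util.Map;

    /** Sketch: dataset properties cached for speed, with the storage remaining the authority. */
    class DataSetPropertyCache {

        /** Hypothetical property source backed by openBIS or the file system. */
        interface Storage {
            Map<String, String> loadProperties(String dataSetCode);
        }

        private final Storage storage;
        private final Map<String, Map<String, String>> cache = new HashMap<>();

        DataSetPropertyCache(Storage storage) {
            this.storage = storage;
        }

        /** Returns cached properties, fetching them from the storage on first access. */
        Map<String, String> properties(String dataSetCode) {
            return cache.computeIfAbsent(dataSetCode, storage::loadProperties);
        }

        /** Drops the cached copy so the next access re-reads the authoritative storage. */
        void refresh(String dataSetCode) {
            cache.remove(dataSetCode);
        }
    }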
This follows from the tight integration with the storage. Data maintenance is easy on the storage backend (openBIS or file system). Therefore, data maintenance is not required in the workflow manager, because it uses the storage backend for fetching data information.
Modules are partially configured in: