beewm:devel:cluster_queue_optimization

When submitting jobs to the cluster, there are constraints on the number of jobs to submit. These constraints depend on the actual cluster hardware and on the configuration.

**List of possible constraints:**

- Jobs time out (QUEUE_ERROR) if they are not started within a given number of hours
- Maximum number of jobs for the queue
- Maximum number of useful jobs for the queue

Assuming the cluster is a shared resource with multiple users, and:

- it can process no more than M jobs in parallel, and
- it currently has Q jobs queued, and
- it currently processes N ≤ M jobs

What is the optimal number of jobs to submit?

We can find the following rules:

If the number of queued jobs Q is smaller than the number of jobs M the
cluster can process in parallel, then it is useful to **submit at least (M-Q) jobs**.
This is easy to see: any lower number of jobs would not use all available
cluster resources. However, we cannot deduce anything about a higher number (yet).
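
As a minimal sketch, the lower bound from this rule can be written as a small helper. The function name `min_jobs_to_submit` and its parameter names are hypothetical and chosen only for illustration; how M and Q are obtained from the actual queuing system is not covered here.

```python
def min_jobs_to_submit(max_parallel: int, queued: int) -> int:
    """Lower bound on the number of jobs to submit, following the (M - Q) rule.

    max_parallel is M (jobs the cluster can process in parallel),
    queued is Q (jobs currently waiting in the queue).
    """
    if max_parallel < 0 or queued < 0:
        raise ValueError("max_parallel and queued must be non-negative")
    # If the queue already holds M or more jobs, this rule gives no lower bound,
    # so the result is clamped at zero.
    return max(0, max_parallel - queued)
```

For example, with M = 16 and Q = 5 the helper returns 11, and it returns 0 once Q reaches or exceeds M. The rule says nothing about an upper limit, so the result is only a minimum.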

In the following we therefore assume that the cluster is always running at its full capacity of M jobs; otherwise, the rule above already gives a decision.

If the cluster queuing system uses a “first come, first served” policy,
it is better to submit jobs as early as possible, because the queue position
depends on the submission time. If this policy is active, it is useful to
**submit as many jobs as possible** at the earliest possible point in time.
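
The effect of submission time under this policy can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the description above: each job occupies exactly one slot, run times are known in advance, and jobs are started strictly in order of submission. The function name `fcfs_start_times` and the job tuple layout are hypothetical.

```python
import heapq


def fcfs_start_times(jobs, slots):
    """Simulate a first come, first served queue.

    jobs  -- list of (name, submit_time, run_time) tuples (assumed layout)
    slots -- number of jobs that can run in parallel (M)
    Returns a dict mapping each job name to its start time.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    # Min-heap of the times at which each slot becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)
    start_times = {}
    # Jobs are served in order of submission time; sorted() is stable,
    # so ties keep their input order.
    for name, submit_time, run_time in sorted(jobs, key=lambda job: job[1]):
        slot_free = heapq.heappop(free_at)
        # A job starts once a slot is free, but never before it was submitted.
        start = max(slot_free, submit_time)
        start_times[name] = start
        heapq.heappush(free_at, start + run_time)
    return start_times
```

With a single slot and two jobs of equal length, the job submitted first starts immediately and the other waits for it to finish. Swapping the submission order swaps the waiting job, which is the reason early submission pays off under this policy.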

beewm/devel/cluster_queue_optimization.txt · Last modified: 2013/07/24 17:49 by epujadas
