Cluster Queue Optimization

The number of jobs that can usefully be submitted to the cluster is subject to constraints. These constraints depend on the cluster hardware and on its configuration.

List of possible constraints:

  • Jobs time out (QUEUE_ERROR) if they are not started within a given number of hours
  • Maximum number of jobs for the queue
  • Maximum number of useful jobs for the queue

Optimizing the Number of Submitted Jobs for the Cluster

Assuming the cluster is a shared resource with multiple users, and:

  • it can process no more than M jobs in parallel, and
  • it currently has Q jobs queued, and
  • it currently processes N ≤ M jobs

What is the optimal number of jobs to submit?

The following rules can be derived:

Submit at least as many jobs as free nodes available

If the number of queued jobs Q is smaller than the number of available nodes M, it is useful to submit at least (M-Q) jobs. This is easy to see: submitting fewer jobs would leave some cluster resources unused. However, nothing can be deduced about submitting more jobs than that (yet).
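
As a minimal sketch, this rule can be written as a small Python function. The function and parameter names are hypothetical and not part of any cluster tooling. The sketch also assumes that Q counts every job the cluster already holds, which is how the rule compares Q against M.

```python
def min_jobs_to_submit(capacity_m: int, queued_q: int) -> int:
    """Lower bound on the number of jobs to submit so that no node stays idle.

    Assumption: queued_q counts every job the cluster already holds,
    matching the comparison of Q against M in the rule above.
    """
    if capacity_m < 0 or queued_q < 0:
        raise ValueError("M and Q must be non-negative")
    # If the queue already holds M or more jobs, this rule gives no lower bound.
    return max(0, capacity_m - queued_q)
```

For example, with M = 100 and Q = 30 the function returns 70. With Q = 150 it returns 0, meaning this rule alone does not say how many jobs to submit.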

In the following, we therefore assume that the cluster is already running at full capacity, i.e. M jobs; otherwise this rule can be used to make the decision.

Submit enough jobs to keep a position in the queue

If the cluster queuing system uses a “first come, first served” policy, it is better to submit jobs as early as possible, because the queue position depends on the submission time. If this policy is active, it is useful to submit as many jobs as possible at the earliest possible time.
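
The effect of the submission time on the queue position can be illustrated with a short Python sketch. It assumes a strict “first come, first served” queue in which a job waits behind every job submitted at the same time or earlier. The function name and the time values are hypothetical.

```python
def jobs_ahead(my_submit_time: float, other_submit_times: list[float]) -> int:
    """Count the jobs that a strict FCFS queue would place ahead of ours.

    Assumption: ties are resolved in favour of the other users' jobs.
    """
    return sum(1 for t in other_submit_times if t <= my_submit_time)


# Hypothetical submission times (in hours) of other users' jobs.
others = [1.0, 3.0, 5.0, 8.0]
early = jobs_ahead(2.0, others)  # submitted early: 1 job ahead
late = jobs_ahead(6.0, others)   # submitted late: 3 jobs ahead
```

Under these assumptions, the job submitted at hour 2 waits behind one job, while the same job submitted at hour 6 waits behind three.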

beewm/devel/cluster_queue_optimization.txt · Last modified: 2013/07/24 17:49 by epujadas