====== Cluster Queue Optimization ======

When submitting jobs to the cluster, there are limits on how many jobs can usefully be submitted. These limits depend on the cluster hardware and on its configuration.

**List of possible constraints:**

  * Jobs time out (QUEUE_ERROR) if they are not started within T hours
  * Maximum number of jobs for the queue
  * Maximum number of useful jobs for the queue

===== Optimizing the Number of Submitted Jobs for the Cluster =====

Assume the cluster is a shared resource with multiple users, and:

  * it can process no more than M jobs in parallel,
  * it currently has Q jobs queued (waiting to start), and
  * it currently processes N <= M jobs.

What is the optimal number of jobs to submit? The following rules apply.

==== Submit at least as many jobs as there are free nodes ====

The cluster currently has M - N free slots. If the number of queued jobs Q is smaller than this, it is useful to **submit at least (M - N - Q) jobs**. This is easy to see: the queued jobs will take Q of the free slots, so any smaller submission would leave some cluster resources idle. However, this rule says nothing (yet) about submitting more jobs than that.

In the following, we therefore assume that the cluster always has enough running and queued jobs to occupy all M slots; otherwise, this rule already determines the decision.

==== Submit enough jobs to keep a position in the queue ====

If the cluster queuing system uses a "first come, first served" policy, it is better to submit jobs as early as possible, because the queue position depends on the submission time. In that case, it is useful to **submit as many jobs as possible** at the earliest possible time.
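The two rules above can be combined into a small decision helper. The following is a minimal sketch under stated assumptions, not a tool provided by the cluster: the function names, the ''pending'' count (jobs the user still has to run) and the ''queue_limit'' cap (standing in for the "maximum number of jobs for the queue" constraint) are hypothetical. The T-hour timeout is not modeled. For policies other than FCFS, the sketch conservatively returns only the lower bound from the first rule, because the rules draw no conclusion about larger submissions in that case.

```python
def min_jobs_to_fill(m: int, n: int, q: int) -> int:
    """Rule 1: smallest submission that leaves no slot idle.

    m: maximum number of jobs the cluster runs in parallel (M)
    n: jobs currently running (N <= M)
    q: jobs currently waiting in the queue (Q)
    """
    if not 0 <= n <= m or q < 0:
        raise ValueError("expected 0 <= N <= M and Q >= 0")
    free_slots = m - n
    # Queued jobs start first and take free slots; only the remainder stays idle.
    return max(0, free_slots - q)


def jobs_to_submit(m: int, n: int, q: int, *,
                   pending: int, fcfs: bool, queue_limit: int) -> int:
    """Hypothetical helper combining both rules.

    pending: jobs the user still has to run (assumption)
    fcfs: True if the scheduler is "first come, first served"
    queue_limit: hypothetical cap on jobs that may be submitted at once
    """
    lower = min_jobs_to_fill(m, n, q)
    if fcfs:
        # Rule 2: earlier submission secures an earlier queue position,
        # so submit everything that is allowed right now.
        target = pending
    else:
        # No rule covers larger submissions here; use the lower bound only.
        target = lower
    # Never submit more than we have, or more than the queue accepts.
    return min(pending, queue_limit, target)


if __name__ == "__main__":
    # 100 slots, 90 running, 4 queued: 10 free slots, 6 would stay idle.
    print(min_jobs_to_fill(100, 90, 4))
    print(jobs_to_submit(100, 90, 4, pending=50, fcfs=False, queue_limit=20))
    print(jobs_to_submit(100, 90, 4, pending=50, fcfs=True, queue_limit=20))
```

With 100 slots, 90 running jobs and 4 queued jobs, the first rule asks for at least 6 new jobs; under FCFS the sketch instead submits as many as the assumed queue limit of 20 allows.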