Quality of Service (QOS)
A Quality of Service (QOS) can be specified for each job submitted to SLURM. The QOS associated with a job affects the job in three ways: job scheduling priority, job preemption, and job limits. Each is described in its own section below.
The QOS's are defined in the SLURM database using the sacctmgr utility.
Jobs request a QOS using the "--qos=" option to the sbatch, salloc, and srun commands.
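As an illustration of the "--qos=" option, here is a minimal Python sketch that builds a submission command line. The QOS name "high" and the script name "job.sh" are hypothetical, and the helper with_qos is not part of SLURM; it only assembles the arguments.

```python
import shlex

# The three commands the text names as accepting the --qos= option.
QOS_COMMANDS = ("sbatch", "salloc", "srun")


def with_qos(command, qos, *args):
    """Return an argument list that runs `command` under the given QOS."""
    if command not in QOS_COMMANDS:
        raise ValueError(f"{command!r} is not one of {QOS_COMMANDS}")
    return [command, f"--qos={qos}", *args]


# "high" and "job.sh" are hypothetical names used only for illustration.
argv = with_qos("sbatch", "high", "job.sh")
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run on a host where the SLURM client commands are installed.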
Job Scheduling Priority
Job scheduling priority is made up of a number of factors as described in the priority/multifactor plugin. One of the factors is the QOS priority. Each QOS is defined in the SLURM database and includes an associated priority. Jobs that request and are permitted a QOS will incorporate the priority associated with that QOS in the job's multi-factor priority calculation.
To enable the QOS priority component of the multi-factor priority calculation, the "PriorityWeightQOS" configuration parameter must be defined in the slurm.conf file and assigned an integer value greater than zero.
A job's QOS only affects its scheduling priority when the multi-factor plugin is loaded.
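The effect of "PriorityWeightQOS" can be sketched with a little arithmetic. The example below assumes that the QOS factor is the QOS priority normalized against the highest priority of any defined QOS, and that the weighted term is added to the other factors; the function name and all numbers are hypothetical, and the priority/multifactor plugin documentation remains the authority on the exact formula.

```python
def qos_priority_term(weight_qos, qos_priority, max_qos_priority):
    """Weighted QOS contribution to a job's multi-factor priority.

    Assumption: the QOS factor is the QOS priority divided by the
    highest priority of any QOS, giving a value between 0.0 and 1.0.
    """
    if weight_qos <= 0 or max_qos_priority <= 0:
        # PriorityWeightQOS unset or zero: the QOS contributes nothing.
        return 0.0
    factor = qos_priority / max_qos_priority
    return weight_qos * factor


# Hypothetical values: PriorityWeightQOS=1000, QOS priorities 10 and 40.
low = qos_priority_term(1000, 10, 40)
high = qos_priority_term(1000, 40, 40)
disabled = qos_priority_term(0, 40, 40)
print(low, high, disabled)
```

With a weight of zero the term vanishes, which is why the text requires a value greater than zero.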
Job Preemption
SLURM offers two ways for a queued job to preempt a running job, freeing up the running job's resources and allocating them to the queued job. See the Preemption description for details.
The preemption method is determined by the "PreemptType" configuration parameter defined in slurm.conf. When the "PreemptType" is set to "preempt/qos", a queued job's QOS will be used to determine whether it can preempt a running job.
The QOS can be assigned (using sacctmgr) a list of other QOS's that it can preempt. When there is a queued job with a QOS that is allowed to preempt a running job of another QOS, the SLURM scheduler will preempt the running job.
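The preemption rule described above amounts to a lookup in a per-QOS list. The following sketch models it in Python; the QOS names and the PREEMPT table are hypothetical, and the real decision is made by the SLURM scheduler, not by code like this.

```python
# Hypothetical QOS names; each QOS maps to the set of QOS's it may preempt,
# mirroring the list that would be assigned with sacctmgr.
PREEMPT = {
    "high": {"normal", "standby"},
    "normal": {"standby"},
    "standby": set(),
}


def can_preempt(queued_qos, running_qos, preempt_type="preempt/qos"):
    """Return True if a queued job's QOS allows preempting a running job."""
    if preempt_type != "preempt/qos":
        # QOS is only consulted when PreemptType is preempt/qos.
        return False
    return running_qos in PREEMPT.get(queued_qos, set())


print(can_preempt("high", "standby"))
print(can_preempt("standby", "high"))
```

Note that the relation is directional: listing "standby" under "high" does not let "standby" preempt "high".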
Job Limits
Each QOS is assigned a set of limits which will be applied to the job. The limits mirror the limits imposed by the user/account/cluster/partition association defined in the SLURM database and described in the Resource Limits section. When limits for a QOS have been defined, they will take precedence over the association's limits.
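The precedence rule can be sketched as a simple fallback: use the QOS limit when one is defined, otherwise use the association's limit. The dictionaries and values below are hypothetical, and None stands for "no limit defined".

```python
def effective_limit(name, qos_limits, assoc_limits):
    """Return the limit that applies to a job, or None if neither is set."""
    qos_value = qos_limits.get(name)
    if qos_value is not None:
        # A limit defined on the QOS takes precedence.
        return qos_value
    return assoc_limits.get(name)


# Hypothetical limits for illustration only.
qos_limits = {"GrpJobs": 50}
assoc_limits = {"GrpJobs": 20, "GrpNodes": 8}

print(effective_limit("GrpJobs", qos_limits, assoc_limits))
print(effective_limit("GrpNodes", qos_limits, assoc_limits))
```

In this sketch the QOS value wins even when it is larger than the association's value, which matches the statement that QOS limits take precedence.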
The following limits are imposed on jobs running under a QOS:
- GrpCpus Maximum number of CPU's all jobs with this QOS can be allocated.
- GrpCPUMins A hard limit on the CPU minutes that can be used by jobs running with this QOS. If this limit is reached, all jobs running with this QOS will be killed, and no new jobs will be allowed to run.
- GrpCPURunMins Maximum number of CPU minutes that all jobs running with this QOS can have outstanding at the same time. This takes into consideration the time limits of running jobs. If the limit is reached, no new jobs are started until other jobs finish and free up time.
- GrpJobs Maximum number of jobs that can run with this QOS.
- GrpMemory Maximum amount of memory (MB) all jobs with this QOS can be allocated.
- GrpNodes Maximum number of nodes that can be allocated to all jobs with this QOS.
- GrpSubmitJobs Maximum number of jobs with this QOS that can be in the system (no matter what state).
- GrpWall Wall clock limit for all jobs running with this QOS.
- MaxCpusPerJob Maximum number of CPU's any job with this QOS can be allocated.
- MaxCPUMinsPerJob Maximum number of CPU*minutes any job with this QOS can run.
- MaxNodesPerJob Maximum number of nodes that can be allocated to any job with this QOS.
- MaxWallDurationPerJob Wall clock limit for any job running with this QOS.
- MaxCpusPerUser Maximum number of CPU's any user with this QOS can be allocated.
- MaxJobsPerUser Maximum number of jobs a user can run with this QOS.
- MaxNodesPerUser Maximum number of nodes that can be allocated to any user with this QOS.
- MaxSubmitJobsPerUser Maximum number of jobs with this QOS that any user can have in the system.
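GrpCPURunMins is the least intuitive of these limits, so a worked sketch may help. It assumes that each running job counts as its CPU count multiplied by the minutes remaining in its time limit, and that a new job may start only if the total stays at or below the limit; both the accounting detail and the numbers are assumptions for illustration.

```python
def cpu_run_minutes(jobs):
    """Sum CPU count times remaining time-limit minutes over (cpus, minutes) pairs."""
    return sum(cpus * remaining for cpus, remaining in jobs)


def can_start(running, candidate, grp_cpu_run_mins):
    """Return True if starting `candidate` keeps the QOS within its limit."""
    total = cpu_run_minutes(running) + cpu_run_minutes([candidate])
    return total <= grp_cpu_run_mins


# Hypothetical jobs as (cpus, remaining minutes of the time limit).
running = [(16, 60), (8, 30)]   # 960 + 240 = 1200 CPU minutes outstanding
candidate = (4, 120)            # would add 480 CPU minutes

print(can_start(running, candidate, 1500))
# Later, the first job has only 20 minutes left: 320 + 240 + 480 = 1040.
print(can_start([(16, 20), (8, 30)], candidate, 1500))
```

The second call succeeds without any job finishing, because the outstanding total shrinks as running jobs approach their time limits.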
Other QOS Options
- Flags Used by the slurmctld to override or enforce certain characteristics. Valid options are:
  - DenyOnLimit If set, jobs using this QOS will be rejected at submission time if they do not conform to the QOS 'Max' limits. By default, jobs that go over these limits will pend until they conform.
  - EnforceUsageThreshold If set, and the QOS also has a UsageThreshold, any jobs submitted with this QOS that fall below the UsageThreshold will be held until their Fairshare Usage goes above the threshold.
  - NoReserve If this flag is set and backfill scheduling is used, jobs using this QOS will not reserve resources in the backfill schedule's map of resources allocated through time. This flag is intended for use with a QOS that may be preempted by jobs associated with all other QOS (e.g. a "standby" QOS). If this flag is used with a QOS which cannot be preempted by all other QOS, it could result in starvation of larger jobs.
  - PartitionMaxNodes If set, jobs using this QOS will be able to override the requested partition's MaxNodes limit.
  - PartitionMinNodes If set, jobs using this QOS will be able to override the requested partition's MinNodes limit.
  - PartitionTimeLimit If set, jobs using this QOS will be able to override the requested partition's TimeLimit.
  - RequiresReservation If set, jobs using this QOS must designate a reservation when they are submitted. This option can be useful for restricting a QOS that has greater preemptive capability or additional resources so that it can be used only within a reservation.
- GraceTime Preemption grace time to be extended to a job which has been selected for preemption.
- UsageFactor Factor applied to the usage recorded for jobs running with this QOS. For example, a value of 0.5 charges a job only half its actual usage in accounting, and a value of 2 charges it twice as much.
- UsageThreshold A float representing the lowest fairshare of an association allowable to run a job. If an association falls below this threshold and has pending jobs or submits new jobs those jobs will be held until the usage goes back above the threshold. Use sshare to see current shares on the system.
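Two of the options above lend themselves to a short sketch: UsageFactor as a multiplier on recorded usage, and DenyOnLimit as the switch between rejecting and pending a job that exceeds a 'Max' limit. The function names, the choice of MaxNodesPerJob as the example limit, and the numbers are all hypothetical.

```python
def charged_usage(raw_cpu_minutes, usage_factor=1.0):
    """Usage recorded in accounting after applying the QOS UsageFactor."""
    return raw_cpu_minutes * usage_factor


def submission_outcome(requested_nodes, max_nodes_per_job, deny_on_limit):
    """Model how a job that exceeds MaxNodesPerJob is treated at submission."""
    if max_nodes_per_job is None or requested_nodes <= max_nodes_per_job:
        return "accepted"
    # Over the limit: DenyOnLimit rejects, otherwise the job pends.
    return "rejected" if deny_on_limit else "pending"


print(charged_usage(600, 0.5))
print(charged_usage(600, 2))
print(submission_outcome(12, 8, deny_on_limit=True))
print(submission_outcome(12, 8, deny_on_limit=False))
```

A job that pends without DenyOnLimit stays queued until it conforms to the limit, whereas a rejected job must be resubmitted.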
Configuration
To summarize the above, the QOS's and their associated limits are defined in the SLURM database using the sacctmgr utility. The QOS will only influence job scheduling priority when the multi-factor priority plugin is loaded and a non-zero "PriorityWeightQOS" has been defined in the slurm.conf file. The QOS will only determine job preemption when the "PreemptType" is defined as "preempt/qos" in the slurm.conf file. Limits defined for a QOS (and described above) will override the limits of the user/account/cluster/partition association.
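The conditions in this summary can be checked mechanically. The sketch below reads slurm.conf-style text and reports which QOS effects would be active. It assumes the multi-factor plugin is selected with "PriorityType=priority/multifactor" (a parameter the text does not name), uses a deliberately simplified parser that handles only one Key=Value pair per line, and the sample values are hypothetical.

```python
def parse_conf(text):
    """Parse simple Key=Value lines, ignoring '#' comments (simplified)."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Keys are compared case-insensitively in this sketch.
        settings[key.strip().lower()] = value.strip()
    return settings


def qos_effects(text):
    """Report whether QOS influences priority and preemption."""
    conf = parse_conf(text)
    multifactor = conf.get("prioritytype", "").lower() == "priority/multifactor"
    weight = conf.get("priorityweightqos", "0")
    weight_ok = weight.isdigit() and int(weight) > 0
    return {
        "priority": multifactor and weight_ok,
        "preemption": conf.get("preempttype", "").lower() == "preempt/qos",
    }


# Hypothetical configuration fragment.
SAMPLE = """\
PriorityType=priority/multifactor
PriorityWeightQOS=1000   # hypothetical weight
PreemptType=preempt/qos
"""

print(qos_effects(SAMPLE))
```

QOS limits are not covered by this check because, per the summary, they are defined in the SLURM database with sacctmgr and not in slurm.conf.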
Last modified 9 October 2009