gang_sched(7)
NAME
gang_sched - Gang Scheduler
DESCRIPTION
The gang scheduler permits a set of MPI (Message Passing Interface)
processes, or multiple threads from a single process, to be scheduled
concurrently as a group.
Gang scheduling is enabled and disabled by setting the environment
variable MP_GANG to ON or OFF.
The gang scheduling feature can significantly improve parallel
application performance in loaded timeshare environments that are
oversubscribed. Oversubscription occurs when the total number of
runnable parallel threads, runnable MPI processes, and other runnable
processes exceeds the number of processors in the system.
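The oversubscription condition can be written as a one-line check. The
following is a minimal sketch; the function name and its parameters are
hypothetical and are not part of any HP-UX interface.

```python
def is_oversubscribed(parallel_threads, mpi_processes, other_processes, processors):
    """Return True when the runnable work exceeds the processor count."""
    runnable = parallel_threads + mpi_processes + other_processes
    return runnable > processors


# 8 parallel threads, 4 MPI processes and 2 other processes on 8 processors.
print(is_oversubscribed(8, 4, 2, 8))  # True
# Exactly as many runnable entities as processors is not oversubscribed.
print(is_oversubscribed(4, 0, 4, 8))  # False
```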
Gang scheduling also permits low-latency interactions among threads in
shared-memory parallel applications.
Only applications using the HP-UX V11.0 MPI or pthread libraries can be
gang scheduled. Because HP compiler parallelism is primarily built on
the pthread library, programs compiled with HP compilers can benefit
from gang scheduling.
INTERFACE
The HP-UX gang scheduler is enabled and disabled using an environment
variable. The variable is defined as:
MP_GANG [ON | OFF]
Setting MP_GANG to ON enables gang scheduling and setting it to OFF
disables it. If MP_GANG is not set, or if it is set to an undefined
value, no action is taken.
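The three outcomes described above (enable, disable, no action) can be
modelled as a small decision function. This is an illustrative sketch
only, not the code used by the pthread or MPI libraries. It takes the
variable name to be MP_GANG with the values ON and OFF; the function
name and the exact, case-sensitive comparison are assumptions.

```python
def gang_request(environ):
    """Map the variable's value to True (enable), False (disable) or None."""
    value = environ.get("MP_GANG")
    if value == "ON":       # assumption: exact, case-sensitive match
        return True
    if value == "OFF":
        return False
    return None             # unset or undefined value: no action is taken


print(gang_request({"MP_GANG": "ON"}))     # True
print(gang_request({"MP_GANG": "OFF"}))    # False
print(gang_request({"MP_GANG": "maybe"}))  # None
print(gang_request({}))                    # None
```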
Gang scheduling is a process attribute that is inherited by child
processes created by fork() (see fork(2)). The state of gang scheduling
for a process can change only following a call to exec() (see exec(2)).
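Because the state can change only after an exec, the usual way to
control it is to set the variable in the environment of the program
being launched. The sketch below shows this with a stand-in child
program that merely prints the value it sees. The helper name
run_with_gang is hypothetical, and the variable name MP_GANG with the
values ON and OFF is taken as given. On systems other than HP-UX the
variable has no effect.

```python
import os
import subprocess
import sys


def run_with_gang(argv, enabled=True):
    """Launch argv with MP_GANG set in the child's environment only."""
    env = dict(os.environ)                       # copy; the parent is unchanged
    env["MP_GANG"] = "ON" if enabled else "OFF"
    return subprocess.run(argv, env=env, capture_output=True, text=True)


# Stand-in for an MPI or pthread program: print the value the child sees.
child = [sys.executable, "-c", "import os; print(os.environ.get('MP_GANG'))"]
print(run_with_gang(child).stdout.strip())                 # ON
print(run_with_gang(child, enabled=False).stdout.strip())  # OFF
```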
BEHAVIOR
After the MP_GANG environment variable is set to ON, any MPI or pthread
application that executes and finds this variable enables gang
scheduling for its process.
Only the pthread and MPI libraries query the variable--the operating
system does not.
Gang scheduling is an inherited process attribute. When a process with
gang scheduling enabled creates a child process, the following occurs:
· The child process inherits the gang scheduling attribute.
· A new gang is formed for the child process. The child does
not become part of its parent's gang.
The gang scheduler is engaged only when a gang consists of multiple
threads. For a pthread application, this is when a second thread is
created. For an MPI application, it is when a second process is added.
As a process creates threads, the new threads are added to the
process's gang if gang scheduling is enabled for the process. However,
once the size of a gang equals the number of processors in the system,
the following occurs:
· New threads or processes are not added to the gang.
· The gang remains intact and continues to be gang scheduled.
· The spill-over threads are scheduled with the regular time‐
share policies.
· If threads in the gang exit (thus making room available), the
spill-over threads are not added into the gang. However,
newly created threads are added into the gang when room is
available.
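The membership rules above can be summarized in a toy model. It is a
sketch of the documented behavior, not of the kernel implementation;
the class Gang and its method names are hypothetical.

```python
class Gang:
    """Toy model of gang membership, capped at the processor count."""

    def __init__(self, processors):
        self.processors = processors
        self.members = []      # threads scheduled as a gang
        self.spill_over = []   # threads left to regular timeshare scheduling

    @property
    def engaged(self):
        # The gang scheduler is engaged only for gangs of multiple threads.
        return len(self.members) >= 2

    def create_thread(self, name):
        # A new thread joins the gang only while there is room.
        if len(self.members) < self.processors:
            self.members.append(name)
        else:
            self.spill_over.append(name)

    def exit_thread(self, name):
        # Existing spill-over threads are never promoted into the gang.
        if name in self.members:
            self.members.remove(name)
        elif name in self.spill_over:
            self.spill_over.remove(name)


gang = Gang(processors=2)
gang.create_thread("t0")
print(gang.engaged)        # False: a single thread is not gang scheduled
gang.create_thread("t1")
print(gang.engaged)        # True
gang.create_thread("t2")   # gang is full, so t2 spills over
gang.exit_thread("t0")     # room opens up, but t2 stays outside the gang
gang.create_thread("t3")   # a newly created thread takes the free slot
print(gang.members)        # ['t1', 't3']
print(gang.spill_over)     # ['t2']
```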
MPI processes are allocated statically at the beginning of execution.
When MP_GANG is set to ON, all processes in an MPI application are made
part of the same gang.
Thread and process priorities for gangs are managed identically to
timeshare policy. The timeshare priority scheduler determines when to
schedule a gang and adheres to the timeshare policies.
Although it is likely that scheduling a gang will preempt one or more
higher priority timeshare threads, over the long run the gang scheduler
policy is generally fair. All threads in a gang will have been highest
priority by the time a gang is scheduled. Because all threads in a gang
must execute concurrently, some threads do not execute when they are
highest priority (the threads must wait until all other threads have
also been selected, allowing other processes to run first).
Gangs are scheduled for a single time-slice. The time-slice is the same
for all threads in the system, whether gang-scheduled or not.
When a single gang executes on a system, the gang's threads are
assigned to processors in the system and are not migrated to different
processors.
In an oversubscribed system with multiple gangs, all gangs are periodi‐
cally moved in order to give an equalized percentage of CPU time to
each of the different threads. This rebalancing occurs every few sec‐
onds.
EXTERNAL INFLUENCES
Environment Variables
The following environment variables affect gang scheduling of pro‐
cesses:
· MP_GANG enables (when set to ON) and disables (when set to
OFF) gang scheduling of processes. For details see the
INTERFACE section of this man page.
· MP_NUMBER_OF_THREADS specifies the number of processors
available to execute programs compiled for parallel
execution. If not set, the default is the number of
processors in the system.
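The defaulting rule for the processor count can be sketched as follows.
The variable name is taken to be MP_NUMBER_OF_THREADS, and the function
name is hypothetical. How a malformed value is handled is not stated in
this page; in this sketch int() simply raises ValueError.

```python
def parallel_processor_count(environ, system_processors):
    """Return the processor count available to parallel programs."""
    raw = environ.get("MP_NUMBER_OF_THREADS")
    if raw is None:
        return system_processors   # default: every processor in the system
    return int(raw)                # assumption: the value is a decimal integer


print(parallel_processor_count({}, 16))                             # 16
print(parallel_processor_count({"MP_NUMBER_OF_THREADS": "4"}, 16))  # 4
```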
PERFORMANCE
Gang scheduling ensures that all runnable threads and processes in a
gang are scheduled simultaneously. This improves the synchronization
latency in parallel applications. For instance, threads waiting at a
barrier do not have to wait for currently unscheduled threads.
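The effect on barrier latency can be illustrated with simple
arithmetic: a barrier releases only when its last thread arrives, so a
single unscheduled thread delays every other thread. The figures below
(10 ms of work per thread and a 100 ms wait to be scheduled) are
hypothetical values chosen for illustration, not measurements.

```python
def barrier_release_time(arrival_times_ms):
    """A barrier opens when the slowest thread reaches it."""
    return max(arrival_times_ms)


work_ms = 10             # hypothetical compute time before the barrier
scheduling_wait_ms = 100  # hypothetical wait of one unscheduled thread

# Gang scheduled: all four threads run concurrently and arrive together.
print(barrier_release_time([work_ms] * 4))                             # 10

# Not gang scheduled: one thread first waits for a processor.
print(barrier_release_time([work_ms] * 3 + [work_ms + scheduling_wait_ms]))  # 110
```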
However, applications with lengthy parallel regions and infrequent syn‐
chronization may perform best when not gang scheduled. For those appli‐
cations, some threads can be scheduled even if all threads are not
scheduled at once.
A gang-scheduled application's performance can be affected by the num‐
ber of gang-scheduled applications on a system, and by the number of
threads in each. The gang scheduler assigns parallel applications to
CPUs using a "best fit" algorithm that attempts to minimize CPU overlap
among applications.
On systems with complex workloads including gangs of varying sizes, or
odd combinations of sizes, the workload may not optimally match the
number of CPUs available. In this situation an application may perform
better when not gang scheduled, thus enabling some threads to be sched‐
uled rather than waiting for all threads to be scheduled as a gang.
Scheduling Overhead
Gang scheduling incurs overhead when the scheduler collects a set of
threads, assigns a set of processors to the threads, and rendezvous the
set of threads and processors to achieve concurrent execution.
On an idle system, the gang scheduling overhead can be seen in the exe‐
cution time of a single parallel application.
Kernel Blocking of Threads
If a thread from a gang blocks in the kernel, the thread's processor is
available to run other non-gang-scheduled threads. When the blocked
thread resumes and its gang is currently running, the thread can join
the other ganged threads without having to rendezvous again.
In a multi-gang environment, thread blocking can result in lower
throughput. This occurs if an application's threads block often in the
kernel for long periods of time.
Preempting by Realtime Threads
Gang-scheduled threads can be preempted from execution by realtime
threads. This affects only the gang-scheduled thread running on the
processor being preempted by a realtime thread. The remaining threads
of the gang continue to run through the end of their time-slice.
RESTRICTIONS
For this implementation of gang scheduling, the following restrictions
exist. Some of these may be removed in future releases.
· Gang scheduling of processes being debugged is not supported.
When a debugger attaches to a process, gang scheduling for
the process is disabled. This avoids gang scheduling pro‐
cesses with one or more threads stopped by a debugger.
· Gang scheduling is completely shut down when Process Resource
Manager (PRM) is enabled.
· If a gang-scheduled process is selected to be swapped out,
the process will not be gang-scheduled when it is swapped
back in.
· Realtime processes are not gang-scheduled.
· Gang scheduling is only supported for processes with time‐
share scheduling policies.
· When a gang-scheduled process contains the maximum number of
threads (or the maximum number of processes, for MPI applica‐
tions), threads or processes created after this point are
not scheduled as part of the gang. For details see the BEHAV‐
IOR section of this man page.
· Multiprocess applications that do not use MPI are not sup‐
ported by the gang scheduler.
· Gang scheduling is not supported for PTHREAD_SCOPE_PROCESS
threads. From release 11i Version 1.6 of HP-UX, the default
scheduling contention scope for threads is
PTHREAD_SCOPE_PROCESS. If any PTHREAD_SCOPE_SYSTEM threads
are created by an application, the initial thread will be
treated as a PTHREAD_SCOPE_SYSTEM thread.
FILES
The following are libraries used in providing gang scheduling:
The pthread library.
The directory containing MPI libraries and MPI software. HP MPI is an
optional product.
SEE ALSO
fork(2), exec(2).
gang_sched(7)