tsungjen.io

Intro

One of the biggest feature for Go is the Go Scheduler. The scheduler allows goroutines sheduling works in the user space and has very good performance for concurrency. In this article, I want to talk about what is go scheduler and how does it work. By understanding go scheduler will allow you to make good decision and use the goroutines correctly.

Why do we need scheduler in Go?

A short answer: Becaues Go uses goroutines. Goroutines are user space threads that run uppon the OS thread. The operating system is not aware of goroutines since it run in the user space. Thus, Go needs a scheduler in runtime to handle and schedule the state of goroutines. In other words, Go scheduler multiplexes goroutines onto kernel threads.

Reference for the reasons why Go use goroutines. Goroutine v.s. Thread

When do it schedules?

There are 4 classical scenes that Go scheduler needs to make scheduling decision.

Creation of goroutines: either schedule to run or put it into the wait queue
System calls: switch the goroutine and allow other goroutine to use the CPU
I/O Blocking: switch the goroutine and allow other goroutine to use the CPU
I/O Complete: put the blocked goroutine back to the wait queue

From the example, you can see the goal of the schedule is to optimize the usage of the CPU and try to have the CPU always running if there is any goroutine is waiting for the computing resource.

For (2), (3), a kernel thread may be blocked.

Is Go scheduler preemptive?

Based on Go 1.14, Go scheduler is preemptive. The Go 1.14 introduces a new technique of asynchronous preemption which is triggered based on a time condition. There is a thread sysmon, dedicated to watching long-running goroutines and emit a signal if a long-running goroutine is found. Once a signal is received, Go will push an instruction to the program counter, to make it looks like the running program called a function in the runtime. This function parks the goroutine and hands it to the scheduler that will run another one.

Noted that a goroutine cannot be blocked in anywhere. The current instruction should be a safe point.

The “MPG” Model

There are 3 main components in Go scheduler.

M(Machine): Kernel Thread. This is the thread managed by the OS
P(Processor): A logical processor to manage the Local Run Queue(LRQ) and put the goroutine to run in the “M”
G(Goroutine): The goroutines

For a N cores CPU machines, the maximal parallels we can have is equal to N which means that we will have at most N MPG pairs in the system. A visualization of a 2 cores machine is as belows.

There are 2 types of wait queue in the model. Local Run Queue and Global Run Queue.

Local run queue has higher priority to be execute. Access by the process only. Unless there is a stealing happaned
Global run queue has lower priority. Once the local run queue is empty. Processor will consider to fetch goroutine from the Global Run Queue

The idea of local run queue is to prevent the resoucre competition from each processor. If we only have one global queue, everytime a processor access a queue needs to acquire the lock(prevent race condition) which will cause a bad performance.

Context Switch

The context switch for goroutines is in the thread and when a switch occurs, only 3 registers need to be saved/restored - Program Counter, Stack Pointer, and DX. From the OS’s perspective Go program behaves as an event-driven program. A waiting goroutine will be placed in the wait queue for later execution. Therefore even thousands of goroutine will not effect the context switch since processor fetch from a queue with O(1) complexity

When a thread is idle(local queue is empty)

To optimize the CPU usage, Go scheduler will prevent the M(Thread) to be idle. In others words, always have goroutines in each local run queue if possible. If the local run queue is empty, the strategy is as belows.

runtime.schedule() {
    // only 1/61 of the time, check the global runnable queue for a G.
    // if not found, check the local queue.
    // if not found,
    //     try to steal from other Ps.
    //     if not, check the global runnable queue.
}

1/61 of the time, fetch a goroutine from global run queue.
Try to steal from other MPG’s Local run queue

60/61 of the times, Go scheduler will do a work stealing. It will randomly pick a process a steal a half of the goroutines from its local runqueue. If it fail to steal, then it will try to fetch from the global queue instead. Again, we always try to avoid access global queue which requires the lock acquisition.

When a thread is blocked by OS

If the kernel thread is blocked (ex: syscall), the M(machine) attached to it will also be blocked. There may be a list of goroutines in the local run queue waiting for the execution. If we don’t handle it, the goroutines in the local run queue will wait until the thread is released. A straight forward solution is to put those goroutines into the global run queues. However, accessing global run queue requires a lock acquisition which causes a bad performance. In fact, Go scheduler instead of put it into the global run queue, it will try to find an idle thread to take over the “P” and its LRQ. After the OS Thread is released, Go scheduler will try to see there is any “P” with LRQ is looking for the idle thread. If it can’t find it, It will put the “G” to the global run queue then go to sleep.

Spinning Thread

You may feel weird why is there any idle thread. Isn’t Go scheduler suppose to prevent threads from being idle? Before we answer the question, we must know an OS Thread is much more heavier than a goroutine. To frequently create or destroy a OS Thread might hurt the performance. In the case of thread block (the P and LRQ is missing the “M”(Thread)), We can either wait until the block is released or find a new thread to take over the P and LRQ. Instead of creating the thread at the moment, Go introduces spinning thread to have thread standby to serve. It may hurt extra CPU power, but it will minimize the cost if a preemption or a block occur. A thread is spinning if:

An “M” with a “P” assignment is looking for a runnable goroutine.
An “M” without a “P” assignment is looking for available “Ps”.
Scheduler also unparks an additional thread and spins it when it is readying a goroutine if there is an idle “P” and there are no other spinning threads. There are at most GOMAXPROCS spinning “Ms” at any time. When a spinning thread finds work, it takes itself out of spinning state.

Idle threads with a “P” assignment don’t block if there are idle Ms without a “P” assignment. When new goroutines are created or an “M” is being blocked, scheduler ensures that there is at least one spinning “M”. This ensures that there are no runnable goroutines that can be otherwise running; and avoids excessive M blocking/unblocking.

Understanding Go Scheduler