One of the biggest feature for Go is the Go Scheduler
. The scheduler allows goroutines sheduling works in the user space and has very good performance for concurrency. In this article, I want to talk about what is go scheduler and how does it work. By understanding go scheduler will allow you to make good decision and use the goroutines correctly.
A short answer: Becaues Go uses goroutines. Goroutines are user space threads that run uppon the OS thread. The operating system is not aware of goroutines since it run in the user space. Thus, Go needs a scheduler in runtime to handle and schedule the state of goroutines. In other words, Go scheduler multiplexes goroutines onto kernel threads.
Reference for the reasons why Go use goroutines. Goroutine v.s. Thread
There are 4 classical scenes that Go scheduler needs to make scheduling decision.
From the example, you can see the goal of the schedule is to optimize the usage of the CPU and try to have the CPU always running if there is any goroutine is waiting for the computing resource.
For (2), (3), a kernel thread may be blocked.
Based on Go 1.14, Go scheduler is preemptive. The Go 1.14 introduces a new technique of asynchronous preemption which is triggered based on a time condition. There is a thread sysmon
, dedicated to watching long-running goroutines and emit a signal if a long-running goroutine is found. Once a signal is received, Go will push an instruction to the program counter, to make it looks like the running program called a function in the runtime. This function parks the goroutine and hands it to the scheduler that will run another one.
Noted that a goroutine cannot be blocked in anywhere. The current instruction should be a safe point.
There are 3 main components in Go scheduler.
For a N cores CPU machines, the maximal parallels we can have is equal to N which means that we will have at most N MPG pairs in the system. A visualization of a 2 cores machine is as belows.
There are 2 types of wait queue in the model. Local Run Queue and Global Run Queue.
The idea of local run queue is to prevent the resoucre competition from each processor. If we only have one global queue, everytime a processor access a queue needs to acquire the lock(prevent race condition) which will cause a bad performance.
The context switch for goroutines is in the thread and when a switch occurs, only 3 registers need to be saved/restored - Program Counter, Stack Pointer, and DX. From the OS’s perspective Go program behaves as an event-driven program. A waiting goroutine will be placed in the wait queue for later execution. Therefore even thousands of goroutine will not effect the context switch since processor fetch from a queue with O(1) complexity
To optimize the CPU usage, Go scheduler will prevent the M(Thread) to be idle. In others words, always have goroutines in each local run queue if possible. If the local run queue is empty, the strategy is as belows.
runtime.schedule() {
// only 1/61 of the time, check the global runnable queue for a G.
// if not found, check the local queue.
// if not found,
// try to steal from other Ps.
// if not, check the global runnable queue.
}
60/61 of the times, Go scheduler will do a work stealing. It will randomly pick a process a steal a half of the goroutines from its local runqueue. If it fail to steal, then it will try to fetch from the global queue instead. Again, we always try to avoid access global queue which requires the lock acquisition.
If the kernel thread is blocked (ex: syscall), the M(machine) attached to it will also be blocked. There may be a list of goroutines in the local run queue waiting for the execution. If we don’t handle it, the goroutines in the local run queue will wait until the thread is released. A straight forward solution is to put those goroutines into the global run queues. However, accessing global run queue requires a lock acquisition which causes a bad performance. In fact, Go scheduler instead of put it into the global run queue, it will try to find an idle thread to take over the “P” and its LRQ. After the OS Thread is released, Go scheduler will try to see there is any “P” with LRQ is looking for the idle thread. If it can’t find it, It will put the “G” to the global run queue then go to sleep.
You may feel weird why is there any idle thread. Isn’t Go scheduler suppose to prevent threads from being idle? Before we answer the question, we must know an OS Thread is much more heavier than a goroutine. To frequently create or destroy a OS Thread might hurt the performance. In the case of thread block (the P and LRQ is missing the “M”(Thread)), We can either wait until the block is released or find a new thread to take over the P and LRQ. Instead of creating the thread at the moment, Go introduces spinning thread
to have thread standby to serve. It may hurt extra CPU power, but it will minimize the cost if a preemption or a block occur. A thread is spinning if:
Idle threads with a “P” assignment don’t block if there are idle Ms without a “P” assignment. When new goroutines are created or an “M” is being blocked, scheduler ensures that there is at least one spinning “M”. This ensures that there are no runnable goroutines that can be otherwise running; and avoids excessive M blocking/unblocking.