In the previous articles, we talked about processes and how they are structured, and we started talking about concurrency and parallelism (link). Today we will talk about threads in Ruby and its ecosystem: what they are and why they are so important to understand.
Let’s dive into it!
This article is the 4th article of a broader series about “low-level” computing concepts applied to Ruby.
- What is a Ruby implementation?
- Process management in Ruby
- Concurrency and parallelism in Ruby
- Thread management in Ruby
- Memory management in Ruby
Always keep in mind that I am intentionally summarising things to give you a quick overview; there is more to each concept.
What is a thread?
A thread can be defined as the smallest unit of execution that can be managed by the operating system; it is also called an execution context. It is part of a process, which can have anywhere from one thread (single-threaded) to many threads (multithreaded).
A process’s threads share the process resources, such as the text (the program’s code), the data (global variables and data available when the process is first initialised), and the heap, but each thread has its own stack. In a multithreaded process, threads may be executed concurrently.
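To make this concrete, here is a small, hypothetical snippet (not taken from any library) showing that threads in the same process see the same heap object while keeping their own local variables:

```ruby
# Minimal sketch: a heap object is visible to every thread of the process,
# while local variables live on each thread's own stack.
shared = [] # allocated on the process heap, reachable from all threads

t1 = Thread.new { local = "from thread 1"; shared << local }
t2 = Thread.new { local = "from thread 2"; shared << local }
[t1, t2].each(&:join)

p shared # => ["from thread 1", "from thread 2"] (order may vary)
```

Pushing to the same Array from two threads is fine for this toy example, but in real code you would protect shared state with a Mutex or use a thread-safe structure such as Queue.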
The purpose of a thread is the same as that of a process: to execute code as fast as possible. To achieve this goal it will use concurrency and/or parallelism (see my previous article (link)).
A process has_many :threads
Alright, now that we have defined what a thread is, let’s put it in the context of a process. Continuing from our previous article about process management: when you run a program or application, it creates a process that has its own text, data and heap. This process spawns its main thread, which has its own dedicated stack and can access the process text, data and heap.
This main thread may then create new threads that share the process resources mentioned above, but each with their own stack.
This is summarised for multithreading in the diagram below:
But now that you see this diagram, you might wonder: couldn’t we launch several Ruby processes with 1 thread each instead of launching 1 Ruby process with several threads?
And you would be right: it is possible, and it is the best option in certain scenarios. This would then be represented as:
Or you could even have many Ruby processes with 1 to many threads each…
With that said, now that you know what a thread is and how it relates to a process, several questions arise:
- What are the pros and cons of multithreading and multiprocessing?
- How are threads and processes related to concurrency and parallelism?
- How are these concepts implemented in Ruby?
- And finally, why is it so important to know?
Multithreading vs multiprocessing
As you can see by comparing the 2 diagrams shared before, multiprocessing comes with duplication of the static states of a process (text and data) plus the duplication of one dynamic state of the process (the heap).
Due to its dynamic nature, duplicating the heap is not the biggest memory usage increase factor here, so we will not account for it for the sake of simplicity.
Let’s say that you have an application using 200MiB of memory for the static states and that needs 150MiB of stack memory per thread (the arithmetic is sketched right after this list):
- using 2 processes with 1 single thread => requires 700MiB of memory
- using 1 single process with 2 threads => requires 500MiB of memory
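If you want to double-check the numbers, the back-of-the-envelope arithmetic boils down to this (the figures are purely illustrative):

```ruby
# Back-of-the-envelope arithmetic from the example above.
static_mib = 200 # text + data, duplicated once per process
stack_mib  = 150 # one stack per thread

two_processes_one_thread = 2 * (static_mib + stack_mib) # => 700 MiB
one_process_two_threads  = static_mib + 2 * stack_mib   # => 500 MiB
```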
In the real world, your OS uses a technique called Copy-on-Write (CoW) that reduces the memory used for resources shared among different processes, so 2 processes with 1 single thread each would require less than 700MiB of memory, but still more than 500MiB.
So using more processes is more memory-costly than using more threads, and that is what we are going to focus on in the rest of the article.
Does the “consumes more computing resources” statement ring a bell? Remember our last article about concurrency and parallelism? Well, on to the next section.
Concurrency and parallelism in Ruby
I explained these 2 concepts thoroughly in my previous article (link), so I won’t delve into them here. In Ruby, both concurrency and parallelism can be achieved, but by different means:
So… Now we are talking! To put it simply, in Ruby, parallelism = different processes, concurrency = different threads.
Keep in mind that this statement only applies to the Ruby MRI implementation (the ‘default’ one people usually mean when they say Ruby; see the first article of this series for further explanations), and it does not hold in the same way for other programming languages.
As an example, in Rust, both parallelism and concurrency can be achieved with threads thanks to its ownership system and the lack of a Global Interpreter Lock (GIL). There, I dropped the word. GIL… (in Ruby it’s called the Global VM Lock (GVL) because it exists at the VM level rather than the interpreter level).
Okay, this article is getting long enough, so let’s keep the full GVL explanation for another article. To put it shortly, Ruby is not thread-safe, meaning that thread A could mess things up for thread B if, for example, they were both to manipulate the same object at the same time. Therefore, Ruby needs something called the Global VM Lock to ensure that everything runs smoothly (more in another article, I promise).
But there is a catch: the GVL restricts the execution of Ruby code to only 1 thread at a time, which is why parallelism is not achievable with threads in Ruby.
Imagine you have 1 process with 2 threads (A and B); this would result in the execution below:
Thread B needs to wait for thread A to finish its CPU execution (= Ruby code execution) before it can execute its own remaining Ruby code. The execution latency therefore increases because of the GVL, and that’s why we cannot achieve parallelism with threads.
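You can observe this yourself with a rough benchmark. The sketch below is illustrative only (it assumes a fork-capable platform, and fib(30), the thread count and the process count are arbitrary choices): on MRI, the threaded run of CPU-bound work takes about as long as the sequential one because of the GVL, while the forked run is roughly twice as fast.

```ruby
require "benchmark"

# Arbitrary CPU-bound work.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

Benchmark.bm(12) do |x|
  x.report("sequential:") do
    2.times { fib(30) }
  end

  x.report("2 threads:") do
    threads = 2.times.map { Thread.new { fib(30) } }
    threads.each(&:join) # GVL: only one thread runs Ruby code at a time
  end

  x.report("2 processes:") do
    pids = 2.times.map { fork { fib(30) } }
    pids.each { |pid| Process.waitpid(pid) } # true parallelism across processes
  end
end
```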
Implementation of threads and processes (forks) in Ruby
Hopefully you are still reading and the previous GVL explanation did not kill you. Let’s now talk about the Ruby implementation of threads and showcase it in two popular gems.
Threads are implemented in Ruby using the Thread class, whereas processes are created using the Kernel#fork method.
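In their most basic form (a quick sketch, assuming a Unix-like OS where fork is available), the two primitives look like this:

```ruby
# Spawning a thread: it shares the current process's text, data and heap.
t = Thread.new { puts "hello from a thread of process #{Process.pid}" }
t.join

# Forking a process: the child gets its own (copy-on-write) address space.
pid = fork { puts "hello from child process #{Process.pid}" }
Process.waitpid(pid)
```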
Let’s see how it can be implemented!
Sidekiq
Sidekiq is a multi-threaded background job processor, therefore leveraging concurrency.
Below is the Sidekiq::Manager class, the central coordination point in Sidekiq:
```ruby
class Manager
  include Sidekiq::Component

  attr_reader :workers
  attr_reader :capsule

  def initialize(capsule)
    @config = @capsule = capsule
    @count = capsule.concurrency
    raise ArgumentError, "Concurrency of #{@count} is not supported" if @count < 1

    @done = false
    @workers = Set.new
    @plock = Mutex.new
    @count.times do
      @workers << Processor.new(@config, &method(:processor_result))
    end
  end
end
```
When a new Manager instance is initialised (right after launching Sidekiq, actually), it creates as many Sidekiq::Processor instances as the concurrency set in the Sidekiq config (do not get it wrong, a processor in Sidekiq is… a thread, not a process 😅).
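To give you an intuition of the pattern (this is a hypothetical, heavily simplified sketch, not Sidekiq’s actual code), a thread-based worker pool boils down to a manager that spawns N threads, each pulling jobs from a shared, thread-safe queue:

```ruby
# Hypothetical mini worker pool in the spirit of Manager/Processor (not Sidekiq code).
class TinyManager
  def initialize(concurrency)
    @queue = Queue.new # thread-safe FIFO from Ruby's core library
    @workers = concurrency.times.map do
      Thread.new do
        while (job = @queue.pop) # blocks until a job (or a nil stop signal) arrives
          job.call
        end
      end
    end
  end

  def enqueue(&job)
    @queue << job
  end

  def shutdown
    @workers.size.times { @queue << nil } # one stop signal per worker thread
    @workers.each(&:join)
  end
end

pool = TinyManager.new(3)
5.times { |i| pool.enqueue { puts "job #{i} handled by thread #{Thread.current.object_id}" } }
pool.shutdown
```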
Resque
Resque is a multi-processed background job processor, therefore leveraging parallelism.
Below is the Worker#perform_with_fork method that is used by the gem to process jobs:
```ruby
module Resque
  class Worker
    # …
    def perform_with_fork(job, &block)
      run_hook :before_fork, job

      begin
        @child = fork do
          unregister_signal_handlers if term_child
          perform(job, &block)
          exit! unless run_at_exit_hooks
        end
      rescue NotImplementedError
        @fork_per_job = false
        perform(job, &block)
        return
      end

      srand # Reseeding
      procline "Forked #{@child} at #{Time.now.to_i}"

      begin
        Process.waitpid(@child)
      rescue SystemCallError
        nil
      end
    end
    # …
  end
end
```
(https://github.com/resque/resque/blob/2f9d080ce86eb2e3f1f3d47599a21c576124c6f3/lib/resque/worker.rb)
When a worker picks up a new job, Resque creates a new fork (i.e. a new process) to perform it.
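Boiled down (again, a hypothetical sketch and not Resque’s code), the fork-per-job pattern looks like this: each job runs in a dedicated child process, so a crash or memory bloat inside the job cannot harm the parent worker.

```ruby
# Hypothetical fork-per-job loop (not Resque's code).
jobs = [
  -> { puts "resizing an image in process #{Process.pid}" },
  -> { puts "sending an email in process #{Process.pid}" }
]

jobs.each do |job|
  pid = fork do
    job.call
    exit!(0) # skip at_exit hooks, like the exit! in the Resque snippet above
  end
  Process.waitpid(pid) # wait for the child before picking up the next job
end
```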
That’s it! If you have survived till the end of this article, you should now be able to make better decisions about thread and process configuration based on your application needs.
The next article in this series will delve into memory management in Ruby, exploring topics such as RAM, the Ruby heap, and the Ruby garbage collector. Get ready for it!
Resources used:
- Shopify blog article - To Thread or Not to Thread: An In-Depth Look at Ruby’s Execution Models - https://shopify.engineering/ruby-execution-models
- Forking and threading in Ruby - https://thecodest.co/blog/forking-and-threading-in-ruby/
- Copy-on-Write - https://www.linkedin.com/pulse/what-copy-on-write-advantages-disadvantages-billy-chan/
- The Practical Effects of the GVL on Scaling in Ruby - https://www.speedshop.co/2020/05/11/the-ruby-gvl-and-scaling.html
- Ruby Kernel#fork official documentation - https://ruby-doc.org/3.2.2/Kernel.html#method-i-fork
- Ruby Thread official documentation - https://ruby-doc.org/3.2.2/Thread.html
- Puma gem repository - https://github.com/puma/puma
- Sidekiq gem repository - https://github.com/sidekiq/sidekiq
- Sidekiq::Manager class - https://github.com/sidekiq/sidekiq/blob/7302de4dd5358302f85531e0e3ae27d2d5ddb493/lib/sidekiq/manager.rb
- Resque gem repository - https://github.com/resque/resque
- Resque::Worker class - https://github.com/resque/resque/blob/2f9d080ce86eb2e3f1f3d47599a21c576124c6f3/lib/resque/worker.rb