On UNIX-like operating systems (e.g. Linux) there's the concept of process forking via the
fork() system call. When a parent process calls fork(), the kernel creates a child process that is a copy of its parent, and both processes return from fork(). The child has an exact copy of the parent's memory in a separate address space. Since copying everything upfront would clearly be inefficient, UNIX implements copy-on-write (CoW) semantics: the actual copying of a memory page is deferred until one of the processes writes to it. Simple and elegant.
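The semantics are easy to see from Ruby itself; a minimal sketch (the message string and output are just for illustration):

```ruby
# Minimal illustration of fork: both processes continue from the same
# point, and the child gets its own (CoW) copy of the parent's data.
message = "hello from the parent"

pid = Process.fork
if pid.nil?
  # In the child: fork returned nil, and `message` is our own copy.
  message << " (modified in the child)"
  puts "child #{Process.pid}: #{message}"
  exit!(0)
else
  # In the parent: fork returned the child's process id.
  Process.wait(pid)
  puts "parent #{Process.pid}: #{message}" # unchanged by the child's write
end
```

Note that the child's modification is invisible to the parent: the write triggered a private copy of that page, which is exactly the CoW behavior the rest of this post relies on.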
The old days of Ruby 1.8.7 and REE
Historically, Ruby has been really bad at forking, and I'm not referring to the language itself but to its VM implementations, especially MRI and YARV. The issue is the way garbage collection works in those versions: without going into details, when the GC runs to clear unused references from memory, it writes a mark bit into every live object, making them "dirty", i.e. CoW fails quickly as the whole heap becomes dirty. Why is this important? The best Rack-compliant servers use pre-forking to max out all CPUs (e.g. Unicorn, Phusion Passenger).
When you fire up a Unicorn server it creates a configured number of workers via the
fork() system call. On 64-bit Linux, an almost vanilla Rails 3.x application takes up ~70 MB of memory and a rather complex one ~200 MB; now multiply that by the number of available cores on your system, say eight, and we have a memory usage spanning from 560 MB to a whopping 1.6 GB. Using Ruby 1.8.7 yields a nice bonus: there's a CoW-friendly version with a patched garbage collector, written by the guys at Phusion in the form of Ruby Enterprise Edition (REE).
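The arithmetic is worth spelling out; a throwaway sketch using the figures above (70 MB and 200 MB per worker are the assumed numbers, not measurements):

```ruby
# Back-of-the-envelope memory usage without CoW sharing:
# every worker keeps its own full copy of the app's heap.
per_worker_mb = { vanilla: 70, complex: 200 } # assumed figures from the text
workers = 8

per_worker_mb.each do |kind, mb|
  total = mb * workers
  puts "#{kind}: #{workers} workers x #{mb} MB = #{total} MB"
end
```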
This is not a solution, though, as 1.8.7 reaches EOL in June 2013 and REE is next; nevertheless, REE was/is a great piece of software.
The recent release of Ruby 2.0
Narihiro Nakamura implemented the new GC algorithm, bitmap marking: in a nutshell, the Ruby VM now stores its mark bits in a separate bitmap, so marking no longer modifies the objects themselves, i.e. CoW works as expected. For more details you can check out a video about Ruby's GC and a recent, really nice high-level explanation by Pat Shaughnessy.
When you combine Ruby 2.0 with Unicorn you can get some pretty impressive results. Using this simple script, memstats.rb, I checked the memory usage with eight workers:
```
Memory Summary:
  private_clean          0 kB
  private_dirty      1,584 kB
  pss               12,884 kB
  rss               80,152 kB
  shared_clean       1,984 kB
  shared_dirty      76,584 kB
  size             275,704 kB
  swap                   0 kB
```
Based on this gist: "rss represents the physical memory that is actually used, and it's comprised of private_clean + private_dirty + shared_clean + shared_dirty", so practically ~76 MB is shared and only ~1.5 MB is private. This is a huge step up from YARV 1.9.3, where there's no CoW-friendly GC, i.e. no memory sharing whatsoever. These results are impressive; of course my app is rather trivial (it just runs this blog), so in real ones the ratio will not be as high, but still big enough to save lots of RAM.
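For the curious, what a script like memstats.rb does boils down to summing fields from /proc/&lt;pid&gt;/smaps; a rough sketch, assuming a Linux /proc filesystem (memory_summary is a hypothetical helper name, the field names follow the smaps format):

```ruby
# Rough sketch of summarizing a process's memory on Linux by summing
# the per-mapping fields in /proc/<pid>/smaps. Field names ("Rss",
# "Shared_Dirty", ...) come straight from the smaps format.
def memory_summary(pid = Process.pid)
  totals = Hash.new(0)
  File.foreach("/proc/#{pid}/smaps") do |line|
    # Lines of interest look like: "Private_Dirty:      1584 kB"
    totals[$1] += $2.to_i if line =~ /\A(\w+):\s+(\d+) kB/
  end
  totals
end

t = memory_summary
shared      = t["Shared_Clean"] + t["Shared_Dirty"]
private_mem = t["Private_Clean"] + t["Private_Dirty"]
puts "shared: #{shared} kB, private: #{private_mem} kB, rss: #{t["Rss"]} kB"
```

Run against a Unicorn worker's pid, the shared vs. private split is what tells you how well CoW is holding up.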
What about Heroku?
One Heroku dyno is limited to 512 MB of RAM, after which it starts swapping until it either crashes or Heroku sends a SIGKILL and restarts your app (more info). After the recent routing miscommunication, we all know that the best server for a Rack-compliant app on Heroku is Unicorn (actually, that should be common knowledge).
```ruby
# config/unicorn.rb
worker_processes 8
```
What happens when you upgrade to Ruby 2.0? You can crank up the workers, as much as doubling them, without hitting the hard memory quota.
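A fuller config might look like the sketch below; the worker count and timeout are illustrative, while preload_app true is the setting that loads the app in the master so the forked workers can share its heap via CoW (the ActiveRecord hooks are the usual boilerplate for preloaded apps):

```ruby
# config/unicorn.rb - illustrative values; tune for your dyno and app.
worker_processes 16   # e.g. double the workers under Ruby 2.0
preload_app true      # load the app in the master so forked workers
                      # share its heap via copy-on-write
timeout 30

before_fork do |server, worker|
  # Disconnect in the master; sockets must not be shared across forks.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.connection.disconnect!
end

after_fork do |server, worker|
  # Each worker opens its own database connection.
  defined?(ActiveRecord::Base) && ActiveRecord::Base.establish_connection
end
```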
Clearly this doesn't benefit only Heroku users: upgrading to Ruby 2.0 is painless, as in most cases it's a drop-in replacement, with the added bonus of lower memory usage. The big issue is with apps still running 1.8.7 or REE, as upgrading from 1.8.7 to 1.9.3 or 2.0 brings some compatibility problems.