There is no magic in Elixir!

2019-04-30 9 min read programming Sven Gehring

If you’re anything like me, you probably started to learn Elixir and wanted to skip to the shiny stuff right away. Sure, there are some basics to sift through, like the data types and specific syntax elements, but after that we can finally build a distributed, scalable, performant masterpiece of an application! - Riiight? Granted, even if you’re a bit more of a sane person, once you get to work with Supervisor, GenServer, Agent and similar modules, you can’t help but feel that things have been simplified a lot for you. That’s great, but sometimes this comes at the cost of a framework doing complex magic that you have no hope of deciphering if something ever goes wrong.

I’ve read a few books on Elixir by now and yes, most of them will tell you that this is not the case in Elixir/OTP and that things are actually really simple internally but… that’s exactly what someone with lots of complex magic in their modules would say, eh! In short, you’ve probably read about this topic but you probably also didn’t believe it. If you have not yet done your own research, this article is here to slap some code in your face to prove to you how thin a layer a lot of the shiny stuff in Elixir is, and to provide some insight into why it works so nicely.

⚠️ Don’t be like I was back then and treat data structures as just one more thing to get through. Understanding their pros and cons is absolutely vital, and defining them properly can make the difference between an extremely clean and a horribly messy codebase.

What you need to know about recursion

If we want to dumb things down, a (tail) recursive function is a function that calls itself as its last action. That’s it. In Elixir, a function like the following is not an issue: tail calls are optimized, so the call stack won’t overflow. It’s still not quite ideal, though. If we spawn a process with that function, it will loop as fast as it can and probably pin a CPU core at 100%.

defmodule Recursive do
  def function do
    function()
  end
end
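
For contrast, here is a minimal sketch of a finite tail-recursive function, in the spirit of a sequence generator. The module name `Sequence` and its `build/2` function are made up for illustration. The point is only that the recursive call is the last thing each clause does, so the stack does not grow, no matter how long the sequence gets.

```elixir
defmodule Sequence do
  # Hypothetical example module, not part of Elixir or OTP.
  # Public entry point: builds the list [from, from + 1, ..., to].
  def build(from, to), do: build(from, to, [])

  # Base case: the range is exhausted, so return what was collected.
  defp build(from, to, acc) when from > to, do: Enum.reverse(acc)

  # Recursive case: the call to build/3 is the last action, so it is a tail call.
  defp build(from, to, acc), do: build(from + 1, to, [from | acc])
end

IO.inspect(Sequence.build(1, 5))
# [1, 2, 3, 4, 5]
```

Unlike the endless loop above, this one stops on its own once `from` passes `to`.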

For functions with finite recursion - like a text parser or a sequence generator - it’s OK to run as fast as they can. A lot of the time, however, we want these things to run forever. To understand how we can do this efficiently, we need to understand a simple concept of the BEAM: processes can have different states internally, which decide how the scheduler will treat them. To simplify, a process can be in the running state if it’s doing something, or in the waiting state if it’s waiting to receive a new message in its mailbox. There is more to it than that, but we don’t have to care about that at this point. Since the scheduler ignores waiting processes, it is very cheap to have a process waiting for a message without doing anything.

Let’s adjust our example. The recursive function will still happily call itself but only after receiving a message. If we spawn a process with this function, it will instantly go into the waiting state and its resource usage (aside from its reserved memory) is pretty much zero.

defmodule Recursive do
  def function do
    receive do
      _ -> function()
    end
  end
end
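
If you don’t want to take the waiting state on faith, you can ask the BEAM directly. The following is a small sketch that assumes a module called `StatusDemo` (a made-up name, with the same loop as above so the snippet runs on its own) and uses `Process.info/2` to look at the process status. The short sleeps are only there to give the spawned process time to reach `receive`.

```elixir
defmodule StatusDemo do
  # Hypothetical module; same shape as Recursive.function/0 above:
  # block until a message arrives, then loop.
  def wait_loop do
    receive do
      _ -> wait_loop()
    end
  end
end

pid = spawn(&StatusDemo.wait_loop/0)

# Give the new process a moment to reach `receive`.
Process.sleep(50)
IO.inspect(Process.info(pid, :status))
# typically {:status, :waiting}

# Any message wakes it up for one iteration, then it waits again.
send(pid, :anything)
Process.sleep(50)
IO.inspect(Process.alive?(pid))
# true
```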

Look mum, no supervisors!

Elixir’s - and Erlang’s - supervisors provide lots of options and conventions, and that’s also where most of their complexity resides. At its very core, a supervisor does nothing but spawn a process, watch over it and restart it when it dies. So if we want to build our own supervisor, we need to do precisely that.

defmodule Pragmatic do
  def supervisor(worker) do
    spawn_link(fn ->
      Process.flag(:trap_exit, true)
      worker_pid = spawn_link(worker)
      receive do
        {:EXIT, ^worker_pid, _} ->
          supervisor(worker)
      end
    end)
  end
end

Line
`3`	We start a new process linked to our current process. With this, we encapsulate the supervisor in its own process, so our calling process is not affected by its flags and does not receive its messages.
`4`	We set the process to trap exits of the processes linked to it. For more details, see Erlang’s process_flag/2.
`5`	We start a linked process with the passed worker function and store its process id.
`6`	Now we wait. Our supervisor enters waiting state.
`7`	When an `:EXIT` message from our worker process id is received…
`8`	… we call ourselves again and the whole process starts over.
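
To see what trapping exits actually does, here is a stripped-down sketch with no supervisor around it. The exit reason `:boom` and the `:no_exit_message` fallback are arbitrary values picked for illustration. With the flag set, the exit of a linked process shows up as a plain `{:EXIT, pid, reason}` message instead of taking the linked process down with it.

```elixir
# Without this flag, the abnormal exit below would take the calling process down as well.
Process.flag(:trap_exit, true)

# The linked process dies right away with a made-up reason.
pid = spawn_link(fn -> exit(:boom) end)

reason =
  receive do
    # The exit signal arrives as an ordinary message.
    {:EXIT, ^pid, reason} -> reason
  after
    1_000 -> :no_exit_message
  end

IO.inspect(reason)
# :boom
```

This message is the only thing our supervisor waits for.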

That’s it. Supervisor done. Good job! - Ok, I admit, this is lacking most features that the actual Supervisor module in Elixir provides but if we take a look at the code, we can see that it will fulfil its most important task. - Let’s run it in iex and see if it works!

worker_function = fn ->
  :timer.sleep(1_000)
  IO.puts("completed at #{:erlang.system_time()}")
end

Pragmatic.supervisor(worker_function)

#> completed at 1556621845035672000
#> completed at 1556621846038645599
#> completed at 1556621847039733259
#> ...

This behaves pretty much exactly like using the actual Elixir Supervisor module with permanent restarting and a very simple worker process.
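
For comparison, this is roughly what the same thing could look like with the real Supervisor. Treat it as a sketch under a few assumptions: the worker is wrapped in a `Task`, its sleep is shortened to 100 ms, and it reports back to the calling process with a made-up `{:completed, pid}` message so we can count restarts. Note the raised `max_restarts`: the real Supervisor gives up if a child restarts more than 3 times within 5 seconds by default, which is one of the many features our version simply does not have.

```elixir
parent = self()

worker_function = fn ->
  Process.sleep(100)
  # Report back so the caller can observe each run (illustration only).
  send(parent, {:completed, self()})
end

children = [
  # Tasks are :temporary by default; :permanent means "always restart".
  Supervisor.child_spec({Task, worker_function}, restart: :permanent)
]

{:ok, sup} =
  Supervisor.start_link(children, strategy: :one_for_one, max_restarts: 100, max_seconds: 1)

# Collect three completions; each one comes from a freshly started process.
pids =
  for _ <- 1..3 do
    receive do
      {:completed, pid} -> pid
    after
      2_000 -> :timeout
    end
  end

IO.inspect(length(Enum.uniq(pids)))
# 3

Supervisor.stop(sup)
```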

A not so secret agent

Ok, let’s do one more. I originally wanted to do GenServer but figured Agent would be a better fit, since it is a bit less generic, so we can save another 10 lines when implementing a very pragmatic version of it. Again, let’s think. At its very core, an agent is a process that holds a state that we can get and set. That’s it. You know what, let’s cut the boring stuff and just implement it.

defmodule Pragmatic do
  def worker(state \\ nil) do
    receive do
      {:set, value} ->
        worker(value)

      {:get, from} ->
        send(from, state)
        worker(state)
    end
  end
end

Line
`3`	We go into waiting state right away. There’s no work to be done, we’re just a vessel for reserving some memory.
`4`	If we receive a set tuple with a new value…
`5`	…we call ourselves with the provided new value.
`7`	If we receive a get tuple with the requesting process’ id…
`8`	… we send that process our current state…
`9`	… and call ourselves with that same state again

Let’s put this all together. I have added Process.register/2 to the supervisor, so our worker process gets named. This way, we can more easily find it after it has been started. As commented in the code, this implementation is not ideal, since it does not wait for the child to actually be started, which the actual Supervisor obviously does. But it is sufficient for our demonstration.

At line 26, we create a function that contains our worker. This does not yet spawn the worker; it only creates a function that can then be passed to spawn_link/1. Since we built a nice supervisor, we might as well use it, so on line 27, we have the supervisor start our agent. Again, we could just as well do this on our own with spawn_link/1 at any time.

defmodule Pragmatic do
  def supervisor(worker) do
    spawn_link(fn ->
      Process.flag(:trap_exit, true)
      worker_pid = spawn_link(worker)
      Process.register(worker_pid, :worker)  # <- I added this so we can find our worker
      receive do
        {:EXIT, ^worker_pid, _} ->
          supervisor(worker)
      end
    end)
  end

  def worker(state \\ %{}) do
    receive do
      {:set, value} ->
        worker(value)

      {:get, from} ->
        send(from, state)
        worker(state)
    end
  end
end

pseudo_agent = fn -> Pragmatic.worker() end
pseudo_supervisor = Pragmatic.supervisor(pseudo_agent)


# Our very simple implementation does not wait until the worker process
# is actually started, so if we use this outside iex, this is a race condition!
worker_pid = Process.whereis(:worker)

send(worker_pid, {:set, 1})
send(worker_pid, {:get, self()})
receive do
  any -> IO.inspect any
end
#> 1

send(worker_pid, {:set, 3})
send(worker_pid, {:get, self()})
receive do
  any -> IO.inspect any
end
#> 3

Ok, we need 5 lines for getting and setting a state each time but it works. And that’s where the actual Agent module comes into play. It wraps those lines into nice, easy functions for you to use without having to care about the message format between processes or the timeout handling.
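
To make that concrete, here is a sketch of what such wrapper functions could look like for our worker, followed by the same round trip with the built-in Agent. The module `PragmaticAgent` and its `start/1`, `set/2` and `get/2` functions are made up for this example. The worker loop is repeated so the snippet runs on its own, and the 5 second default timeout is an arbitrary choice.

```elixir
defmodule PragmaticAgent do
  # Hypothetical wrapper module around the article's worker loop.
  def start(initial \\ %{}), do: spawn(fn -> loop(initial) end)

  def set(pid, value) do
    send(pid, {:set, value})
    :ok
  end

  def get(pid, timeout \\ 5_000) do
    send(pid, {:get, self()})

    # Caveat: this matches *any* message in the caller's mailbox. A more
    # careful version would tag the reply (e.g. with make_ref/0).
    receive do
      state -> {:ok, state}
    after
      timeout -> {:error, :timeout}
    end
  end

  defp loop(state) do
    receive do
      {:set, value} ->
        loop(value)

      {:get, from} ->
        send(from, state)
        loop(state)
    end
  end
end

pid = PragmaticAgent.start()
PragmaticAgent.set(pid, 1)
IO.inspect(PragmaticAgent.get(pid))
# {:ok, 1}

# The same round trip with the built-in Agent module
{:ok, agent} = Agent.start_link(fn -> %{} end)
Agent.update(agent, fn _old -> 1 end)
IO.inspect(Agent.get(agent, fn state -> state end))
# 1
```

The 5 lines per call are gone from the calling code. They have just moved into `get/2` and `set/2`.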

So what does this tell us?

I am not here to tell you that any of the generic abstractions in Elixir/OTP are bad or that you can do them better on your own. You and I probably can’t, because the original modules are battle-tested and have been refined over time. The true takeaway from all of this is that Elixir/OTP gives you tons of awesome tools for building distributed, scalable and maintainable applications, but you don’t always have to use them. Maybe a simple key-value store can be done faster without a GenServer?

The big benefit of using these modules is having a defined interface and wrapper functions that are the same across every codebase and that abstract away a lot of complexity we don’t want to deal with when we write applications on top of them. But, as the title probably told you, there is no magic in them. And yes, you probably knew that, but seeing this tiny bit of code in action is an impressive demonstration of the BEAM’s capabilities without using complex abstractions.

At the end of the day, re-implementing some of these abstractions can be an awesome learning experience and can really help you judge where you should rely on them - and where you shouldn’t.

A personal note: I obviously knew the conclusion of this article when I started writing it but creating a pragmatic implementation of what powers a huge part of most of our applications and getting it done in under 25 lines was amazing for me. Sure, it is nowhere near ready to be used but at its very core, this tiny fragment of code encapsulates the ideas that build the backbone for a lot of our applications.