I second-guess myself a lot about what I know, or what I think I know. It’s even worse when it comes to the names of people I don’t normally interact with: “I’m almost positive his name is Dave, but I don’t want to get it wrong, so I’ll just wait until someone else says his name.” It’s a solution, but I’d rather be certain about their names.
In my previous article, Crawling the Web with Elixir’s Broadway and Wallaby, I suggested that it might be “beneficial to use either a Protocol or Behaviour to reduce code duplication,” but when considering which to use for what I’m working on, it left me second-guessing myself yet again. What’s the difference? When would you use one over the other? How do they actually help?
Because I don’t think I’m alone in my confusion, I’m going to try to answer these questions by first explaining what I thought I understood about both, talking about what I got wrong, showing the differences, and then giving a real-world—if contrived and incomplete—example for when you might use both.
I thought I had a pretty good handle on protocols; I even wrote an article about them. I knew they provided “a mechanism to achieve polymorphism in Elixir.” That is,
by implementing functions specific to a protocol, we make sure our structs and built-in data types can take advantage of everything the implemented library has to offer.
I also knew they provided a way to extend a module’s functionality without having access to the module’s source code.
I’m embarrassed to admit it, but I thought behaviours were only used to define what functions a module had to have. That’s not entirely wrong, but it misses the “why?”. I came to this conclusion after reading José’s article, Mocks and Explicit Contracts. In that article he shows how to create mocks by switching out one module for another in a test environment, but it only worked if both modules, the original and the mock, conformed to the same contract, i.e. Behaviour. Because the Behaviour example from the article did nothing other than define callbacks, I concluded that that’s all they were used for.
This conclusion was further solidified after hearing an Elixir podcast host state that he thought Behaviours were best used in libraries rather than in your project, because you have control over the modules in your own project. That’s probably not what he meant, but it’s what I heard and it conformed to what I was already thinking: Behaviours are kind of useless.
There were three things I got wrong about Protocols and Behaviours.
The first misunderstanding I had was thinking Protocols were Behaviours with added functionality. I came to that conclusion from the following quote:
Protocol is a behaviour with the dispatching logic so you don’t need to hand roll it nor impose a particular implementation in the user module.
– José Valim - Google Group discussion
It’s true that “[a] protocol is indeed a behaviour + dispatching logic,” but what I misunderstood was thinking a Protocol was interchangeable with a Behaviour. They are not.
Maybe it’s because I knew about behaviors in C# or because I had heard
“contract” used in relation to Behaviours, but for whatever reason, I had it in
my mind that Behaviours only defined a module’s functionality, but didn’t
provide any itself. I can’t defend this misunderstanding. If I had taken five
minutes to think about how GenServer or Plug worked, I would have quickly
abandoned that idea: both Behaviours obviously provide lots of functionality.
The idea that Behaviours are better suited to libraries than to your own projects came from a couple of places. I first heard the argument on a podcast (which shall remain nameless), but I don’t think it’s what the speaker meant. And because I already had the idea that Protocols were just specialized Behaviours, and Behaviours didn’t provide functionality, it was an easy, if erroneous, conclusion to reach.
Protocols and Behaviours differ primarily by what they execute against. Protocols work with data types, while Behaviours execute against modules. Every other difference is based off these fundamental concepts.
…a behaviour is internal to a module–the module implements the behaviour. Protocols are different–you can place a protocol’s implementation completely outside the module. This means you can extend modules’ functionality without having to add code to them…
– Dave Thomas, Programming Elixir
The first thing to notice when working with Protocols—that is, if you are
creating one—is that they don’t provide functionality, they define it.
Consider the Enumerable Protocol:
defprotocol Enumerable do
  # documentation, type definitions, and specs have been removed
  def reduce(enumerable, acc, fun)
  def count(enumerable)
  def member?(enumerable, element)
  def slice(enumerable)
end
It’s not until you implement the Protocol, that you gain functionality.
Example:
defmodule User do
  defstruct [:id, :name, :age]

  # Inside the module, defimpl defaults to `for: User`
  defimpl Inspect do
    def inspect(user, _opts) do
      "#{user.name} (#{user.age})"
    end
  end
end
# IEx
iex(1)> IO.inspect(%User{id: "10-289", name: "John Galt", age: 38})
John Galt (38)
Unlike Behaviours which work at the Module level, Protocols work on data types.
In the previous example, where we implemented the Inspect protocol on User,
a %User{} struct is passed to IO.inspect/2. The inspect/2 function is then
able to do something with it because User defines how Inspect should handle
it.
Protocol is type/data based polymorphism. When I call Enum.each(foo, …), the concrete enumeration is determined from the type of foo.
– Sasa Juric StackOverflow answer
At runtime, Protocols allow us to execute the appropriate logic against the specific datatype. This is dynamic dispatching.
In OOP, polymorphism allows objects of different classes to be treated as instances of a common superclass. You get a similar behavior in Elixir when you implement a Protocol in your datatypes.
Polymorphism is a runtime decision about which code to execute, based on the nature of the input data. In Elixir, the basic (but not the only) way of doing this is by using the language feature called protocols.
– Sasa Juric
As we saw when we implemented the Inspect Protocol in the %User{} struct
above, any datatype that implements the Inspect protocol can be passed to
functions like Kernel.inspect/2 and IO.inspect/2 and Elixir figures out how
to handle each at runtime based on the datatype implementation.
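To make the runtime dispatch concrete, here is a minimal sketch. The Shape protocol and the Circle and Rectangle structs are names I made up for illustration; they are not part of Elixir or any library. The same Shape.area/1 call runs different code depending on which struct it receives.

```elixir
# Hypothetical protocol: it only declares what an implementation must provide.
defprotocol Shape do
  @doc "Returns the area of the shape"
  def area(shape)
end

# Two hypothetical data types with nothing in common except the protocol.
defmodule Circle do
  defstruct [:radius]
end

defmodule Rectangle do
  defstruct [:width, :height]
end

defimpl Shape, for: Circle do
  def area(%Circle{radius: r}), do: :math.pi() * r * r
end

defimpl Shape, for: Rectangle do
  def area(%Rectangle{width: w, height: h}), do: w * h
end

# One call site; the implementation is picked at runtime from the struct type.
shapes = [%Circle{radius: 1}, %Rectangle{width: 2, height: 3}]
areas = Enum.map(shapes, &Shape.area/1)
# areas is [3.141592653589793, 6]
```

Passing a type that has no implementation, such as Shape.area(42), raises Protocol.UndefinedError rather than silently doing nothing.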
If you’ve come to Elixir from another language like Ruby or JavaScript, you might be familiar with the term, “monkey patching.” Monkey patching allows us to “open up” a class or object and add or overwrite functionality. While you can’t overwrite functions in Elixir, you can extend functionality through the use of Protocols.
protocols allow us to extend the original behavior for as many data types as we need. That’s because dispatching on a protocol is available to any data type that has implemented the protocol and a protocol can be implemented by anyone, at any time.
– Elixir Lang
As an example, if you had the following Emptiness protocol in your codebase…
defprotocol Emptiness do
@doc "Returns a boolean value based on the 'emptiness' of the term"
@spec empty?(term) :: boolean()
def empty?(t)
end
…you could extend any Elixir type with Emptiness regardless of whether or
not it was a 1st-party type or if it was included as a library.
defimpl Emptiness, for: Plug.Conn do
  def empty?(%Plug.Conn{resp_body: nil}), do: true
  # resp_body is iodata (a binary or a nested list), so length/1 would fail on a binary
  def empty?(%Plug.Conn{resp_body: body}), do: IO.iodata_length(body) == 0
end
In this example, even though we didn’t create Plug.Conn, we can still add new
functionality with the Emptiness Protocol.
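The same idea works for Elixir’s built-in types, which we also don’t own. The sketch below repeats the Emptiness protocol from above so that it runs on its own, without the Plug dependency; the three implementations are my own illustration of what “empty” might mean for each type, not anything Elixir ships with.

```elixir
defprotocol Emptiness do
  @doc "Returns a boolean value based on the 'emptiness' of the term"
  @spec empty?(term) :: boolean()
  def empty?(t)
end

# Built-in types are extended exactly like third-party structs.
defimpl Emptiness, for: List do
  def empty?(list), do: list == []
end

defimpl Emptiness, for: Map do
  def empty?(map), do: map_size(map) == 0
end

# Strings are binaries, which dispatch to the BitString implementation.
defimpl Emptiness, for: BitString do
  def empty?(string), do: string == ""
end

Emptiness.empty?([])        # true
Emptiness.empty?(%{a: 1})   # false
Emptiness.empty?("")        # true
```

None of the List, Map, or String modules were touched; the implementations live entirely in our own code.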
Protocols don’t force you to implement every function definition (the compiler
only warns about the ones you leave out), but if you don’t, you may not get
every available feature. For example, if you only implement the
Enumerable.reduce/3 function for your data type, you’ll only be able to pass
that type to some of Enum’s functions. With Behaviours, on the other hand, the
generic code depends on every required callback being there: the compiler warns
about any that are missing, and the application fails at runtime as soon as one
of them is called.
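As a sketch of what leaning on reduce/3 alone can look like, here is a hypothetical Countdown struct (my invention, not from any library). Only reduce/3 does real work. The other three functions return {:error, __MODULE__}, which the Enumerable documentation describes as the way to tell Enum to fall back to a slower default built on reduce/3. Delegating to Enumerable.List.reduce/3 is a shortcut I’m assuming is acceptable here; a real implementation might walk the data itself.

```elixir
defmodule Countdown do
  # Hypothetical type: counts down from `from` to 1.
  defstruct [:from]
end

defimpl Enumerable, for: Countdown do
  # The only function with real logic: build the numbers and let the
  # list implementation handle the :cont / :halt / :suspend protocol.
  def reduce(%Countdown{from: from}, acc, fun) do
    Enumerable.List.reduce(Enum.to_list(from..1//-1), acc, fun)
  end

  # "I have no fast path": Enum falls back to an algorithm built on reduce/3.
  def count(_countdown), do: {:error, __MODULE__}
  def member?(_countdown, _element), do: {:error, __MODULE__}
  def slice(_countdown), do: {:error, __MODULE__}
end

Enum.to_list(%Countdown{from: 3})     # [3, 2, 1]
Enum.count(%Countdown{from: 3})       # 3
Enum.member?(%Countdown{from: 3}, 2)  # true
```

If count/1, member?/2, and slice/1 were left out entirely, functions built purely on reduce/3, such as Enum.map/2, would still work, while calls that reach the missing functions would fail.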
A module that declares that it implements a particular behaviour must implement all of the associated functions. If it doesn’t, Elixir will generate a compilation warning.
– Dave Thomas, Programming Elixir
And again…
By declaring that our module implements that behaviour, we let the compiler validate that we have actually supplied the necessary interface. This reduces the chance of an unexpected runtime error.
– Dave Thomas, Programming Elixir
As stated under What are the Differences, “Protocols and Behaviours differ primarily by what they execute against. Protocols work with data types, while Behaviours execute against modules.” In order for this to work, modules must conform to the Behaviour by implementing the required functions.
Behaviour is a typeless plug-in mechanism. When I call GenServer.start(MyModule), I explicitly pass MyModule as a plug-in, and the generic code from GenServer will call into this module when needed.
– Sasa Juric StackOverflow answer
Behaviours provide a way of abstracting away functionality that is common across
every module that would implement them. For example, when you create a module
using the GenServer Behaviour and implement init/1 and handle_cast/2, you
don’t think about the loop it runs in, or handling state. The GenServer
Behaviour provides that functionality. All you need to worry about is
implementing the required functions.
A behaviour is a way to say: give me a module as argument and I will invoke the following callbacks on it, which these argument and so on. A more complex example for behaviours besides a GenServer are the Ecto adapters.
– José Valim Google Group discussion
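Here is a small sketch of that “give me a module as argument” idea. Parser and CSVParser are hypothetical names of my own. The Behaviour module declares the callbacks and also holds the generic code, which receives the implementing module as a plain argument and calls back into it.

```elixir
defmodule Parser do
  # The contract every plug-in module has to fulfil.
  @callback extensions() :: [String.t()]
  @callback parse(String.t()) :: {:ok, term} | {:error, atom}

  # Generic functionality supplied by the behaviour module itself.
  # `module` is just an atom; no data type is involved in the dispatch.
  def parse_file(module, file_name, contents) do
    if Path.extname(file_name) in module.extensions() do
      module.parse(contents)
    else
      {:error, :unsupported_extension}
    end
  end
end

defmodule CSVParser do
  @behaviour Parser

  @impl Parser
  def extensions, do: [".csv"]

  @impl Parser
  def parse(contents) do
    rows =
      contents
      |> String.split("\n", trim: true)
      |> Enum.map(&String.split(&1, ","))

    {:ok, rows}
  end
end

Parser.parse_file(CSVParser, "data.csv", "a,b\n1,2\n")
# {:ok, [["a", "b"], ["1", "2"]]}
Parser.parse_file(CSVParser, "data.json", "{}")
# {:error, :unsupported_extension}
```

CSVParser never checks file extensions itself; that logic lives once in Parser and is shared by every module that implements the callbacks.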
Usually the provided functionality is injected through metaprogramming when you
use the specific Behaviour (as in use GenServer), but that’s not required.
There’s no reason you couldn’t populate the Behaviour module with “normal”
functions for added utility.
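Both styles can live in the same Behaviour module. In this sketch (Greeter, English, and Spanish are made-up names), a __using__ macro injects a default greeting/0 that implementers may override, while greet/1 is an ordinary function that takes the implementing module as an argument.

```elixir
defmodule Greeter do
  @callback name() :: String.t()
  @callback greeting() :: String.t()

  # Metaprogramming route: `use Greeter` declares the behaviour and
  # injects an overridable default for greeting/0.
  defmacro __using__(_opts) do
    quote do
      @behaviour Greeter

      @impl Greeter
      def greeting, do: "Hello"

      defoverridable greeting: 0
    end
  end

  # "Normal" function route: plain code that calls into the given module.
  def greet(module), do: "#{module.greeting()}, #{module.name()}!"
end

defmodule English do
  use Greeter

  # Only the callback without a default has to be written.
  @impl Greeter
  def name, do: "world"
end

defmodule Spanish do
  use Greeter

  @impl Greeter
  def name, do: "mundo"

  # Overrides the injected default.
  @impl Greeter
  def greeting, do: "Hola"
end

Greeter.greet(English)  # "Hello, world!"
Greeter.greet(Spanish)  # "Hola, mundo!"
```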
We get things wrong all the time. We miss meetings, include or exclude the wrong features, work on the wrong tasks, and the list goes on. Sometimes these mistakes are due to a misunderstanding, sometimes it’s poor communication, and sometimes it’s failing to check your premises. With regard to Protocols and Behaviours, for me it was a little bit of everything.
What I discovered, however, is that Behaviours and Protocols, while
superficially similar, serve very different roles in the Elixir ecosystem.
Protocols provide extensibility and polymorphism to your types, while Behaviours
provide functionality to and demand conformity from your modules. Behaviours are
about plugging in modules (GenServer, Plug, Ecto.Repo, etc.). Protocols
are about plugging in data types (Enumerable, Jason.Encoder, String.Chars,
etc.). Once you see it that way, you’ll never second-guess yourself again.
What follows is a completely contrived and incomplete example of how one might use Behaviours and Protocols together. The idea is that you might want to update a local User’s identity with information from a social network, and also push locally changed information back to the social network.
Here are an example Behaviour and an example Protocol:
# SocialProfile Behaviour
defmodule SocialProfile do
  @callback get_profile(Identity.t()) :: {:ok, term} | {:error, atom}
  @callback update_profile(Profile.t()) :: {:ok, term} | {:error, atom}
  @callback to_profile(Identity.t()) :: term | {:error, atom}
end

# Identifier Protocol
defprotocol Identifier do
  def to_identity(profile)
end
Modules implementing the SocialProfile Behaviour are required to implement
the functions get_profile/1, update_profile/1, and to_profile/1, and to
follow the typespecs provided.
The Identifier Protocol requires that any data type implementing it define the
to_identity/1 function. You would use this to transform social media profiles
into local Identities.
Here are two Profiles you might create:
# x_profile.ex
defmodule XProfile do
  @behaviour SocialProfile

  # Protocols dispatch on data types, so the profile needs its own struct
  defstruct [:first_name, :last_name, :email, :bio]

  @impl SocialProfile
  def get_profile(identity) do
    # logic to retrieve profile based on provided identity
  end

  @impl SocialProfile
  def update_profile(profile) do
    # logic to update profile
  end

  @impl SocialProfile
  def to_profile(identity) do
    # assumes the name is stored as "First Last"
    [first_name, last_name] = String.split(identity.name, " ", parts: 2)

    %__MODULE__{
      first_name: first_name,
      last_name: last_name,
      email: identity.email,
      bio: identity.bio
    }
  end
end

# Protocol implemented outside of module
defimpl Identifier, for: XProfile do
  def to_identity(profile) do
    %Identity{
      name: "#{profile.first_name} #{profile.last_name}",
      email: profile.email,
      bio: profile.bio
    }
  end
end
# github_profile.ex
defmodule GitHubProfile do
  @behaviour SocialProfile

  # Protocols dispatch on data types, so the profile needs its own struct
  defstruct [:name, :email, :description]

  @impl SocialProfile
  def get_profile(identity) do
    # logic to retrieve profile based on provided identity
  end

  @impl SocialProfile
  def update_profile(profile) do
    # logic to update profile
  end

  @impl SocialProfile
  def to_profile(identity) do
    %__MODULE__{
      name: identity.name,
      email: identity.email,
      description: identity.bio
    }
  end

  # Protocol implemented inside module (for: defaults to the enclosing module)
  defimpl Identifier do
    def to_identity(profile) do
      %Identity{
        name: profile.name,
        email: profile.email,
        bio: profile.description
      }
    end
  end
end
The main difference between the two is the use of first_name and last_name
on the XProfile versus just having a name in GitHub. Because of that, we
need a little extra logic in to_identity/1 and to_profile/1 in XProfile.
I didn’t add example logic to either get_profile/1 or update_profile/1
because it seemed like unnecessary effort. The purpose of both is clear.
Lastly, we have an example of how the above might be used:
# Example Behaviour usage
identity = Users.get_identity_by_email("john@galtsgulch.co")

[GitHubProfile, XProfile]
|> Enum.each(fn social_profile ->
  identity
  |> social_profile.to_profile()
  |> social_profile.update_profile()
end)
In this usage example, we are retrieving the User’s identity based on their email. With that in hand, we are then able to update all of their social profiles after first transforming the identity into the required profile.
In the next example, we retrieve the User’s profile from GitHub, and then transform it into an Identity in order to update our local copy.
# Example Protocol usage
identity = Users.get_identity_by_email("john@galtsgulch.co")
# get_profile/1 returns {:ok, profile} on success, per the SocialProfile callback
{:ok, profile} = GitHubProfile.get_profile(identity)
identity = Identifier.to_identity(profile)
Users.update_identity(identity)