FunWithFlags: a feature toggle library plus its web GUI as a Plug

Hello there,

I would like to share a feature toggle library (AKA feature flags) I’ve been working on.

The main package is FunWithFlags, which provides global toggles (simple on/off), actor toggles (enabled or disabled for specific entities, e.g. a user), and group toggles (enabled or disabled for groups of actors).

The library uses Redis for persistence and ETS for caching.

I’ve also released FunWithFlags.UI, its optional web GUI available as a Plug for Phoenix or other Plug applications.

If you come from Ruby on Rails, it’s very similar to the flipper Ruby gem.

Example usage:

harry = %User{id: 1, name: "Harry Potter", groups: [:wizards, :gryffindor]}
hagrid = %User{id: 2, name: "Rubeus Hagrid", groups: [:wizards, :gamekeeper]}
dudley = %User{id: 3, name: "Dudley Dursley", groups: [:muggles]}
FunWithFlags.disable(:wands)
FunWithFlags.enable(:wands, for_group: :wizards)
FunWithFlags.disable(:wands, for_actor: hagrid)

FunWithFlags.enabled?(:wands)
# => false

FunWithFlags.enabled?(:wands, for: harry)
# => true

FunWithFlags.enabled?(:wands, for: hagrid)
# => false

FunWithFlags.enabled?(:wands, for: dudley)
# => false
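
(For the for: and for_group: checks above to work, the %User{} struct needs to implement the library’s FunWithFlags.Actor and FunWithFlags.Group protocols. A minimal sketch, assuming the struct fields from the example:)

defmodule User do
  defstruct [:id, :name, :groups]
end

defimpl FunWithFlags.Actor, for: User do
  # Each actor is identified by a unique, stable string.
  def id(%{id: id}), do: "user:#{id}"
end

defimpl FunWithFlags.Group, for: User do
  # A user belongs to a group if it appears in their :groups list.
  def in?(%{groups: groups}, group), do: group in groups
end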

A quick demo of the web GUI, here served from a Phoenix app:

8 Likes

Fan of Big Bang Theory I take it?

1 Like

Used to be funnier though.

1 Like

Neat … nice to have libs covering common needs like this!

Some comments (can turn them into github issues if you prefer, of course):

The use of true/false for function parameters … this is a pet peeve of mine, as it decreases code readability and maintainability. In Gate.new the last parameter indicates disabled/enabled. So there is code like this:

 Gate.new(:group, group_name, true)

and

 Gate.new(:actor, actor, false)

What does that last parameter mean? One has to look at gate.ex to figure that out, which is a shame. Would be nicer to use more explicit atoms … like :enabled and :disabled, with a default value in the function for that parameter set to :enabled so only when it is to be disabled does it show up in the code.
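
Purely as an illustration of the idea (this is a hypothetical signature, not the library’s current API):

 Gate.new(:group, group_name)          # defaults to :enabled
 Gate.new(:actor, actor, :disabled)    # the intent is readable at the call site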

Ok, that’s a minor niggle :slight_smile:

More critically, I was left wondering about two things: storage and possibly ambiguous Gates.

The storage system is using Redis, and the reason appears to be to keep multiple nodes in sync with the flag data set using Redis pub/sub. This is something available out of the box, really, with the BEAM: create a cluster of nodes, send messages between them, end of story. There are even libraries like swarm for rather advanced handling of processes, and Phoenix PubSub for clusterable messaging … is an external (… slower) dependency really justifiable? Typical applications where I would use this library already have a database, so there is already a persistence end-point available (via Ecto, even…), so it really comes down to needing a way to coordinate changes across nodes, yes?

Then there is the potential question of more complex gates. By way of example: if I have a User struct which contains their geographical region, departmental group, and a beta testing group designation, all of which are strings, how would I map that using a gate?

From what I can see in the code, the name of a group in a Gate must be an atom. So if I want to select on, say, “users from South America”, I would need to pass in a :south_america group name, or similar, and then know in the Protocol implementation for my User struct that :south_america needs to be mapped against their geo region, and not their departmental group. So it needs to know ahead of time all of the possible permutations and how they map to the data. Having a department also called South America would really complicate things… I suppose one could jump through hoops like :dpt_south_america, then do an Atom.to_string and parse out the prefix, but that feels like a very big hack. Similarly for the possibility of a range of beta tester categories … it would be most convenient to have an arbitrary set of values (atoms, strings, whatever) for those categories (so a user may be a beta tester for accounts and images, but not postings …)

What would be very nice is the possibility of an {atom, any} tuple for the group, which would allow things like {:region, "South America"}, {:department, :south_america}, {:beta, :accounts} in such situations. One could still use the simpler atom-only naming when the complexity is not needed.
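
Purely hypothetically (group names are atoms only today, and the struct fields below are invented for the sake of the example), a Group implementation could then pattern match on the tagged tuple:

defimpl FunWithFlags.Group, for: User do
  # Hypothetical: tuple group names are not currently supported.
  def in?(%{region: region}, {:region, region}), do: true
  def in?(%{department: dpt}, {:department, dpt}), do: true
  def in?(%{beta_groups: betas}, {:beta, category}), do: category in betas
  def in?(_user, _group), do: false
end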

I also wondered about how one might do random sampling of users, say on a user sign-up form, to do A/B testing of features. I suppose one way to get there would be to have a function that sets up the state in the session, then the defined flag would (via a FunWithFlags.Group protocol) pick out the state as setup in the current session. If that became a more common usage pattern, would you consider adding some convenience API for this in FunWithFlags, or would that be out of scope in your view?

Sorry for the long critique … this is just a rather interesting library imho and worth a moment of contemplation :slight_smile: Thanks for putting this out there; many will find this functionality very useful, no doubt!

2 Likes

Hi @aseigo and thanks for the long and detailed feedback. Also, for your interest in the library :slight_smile:

On your points:

The use of true/false for function parameters (…)

I see what you mean, and if it was a public API I would agree. The Gate struct is a private detail though – it’s not even documented – and users of the library would never see it or use it directly.
Even the web GUI library will only “use” it when rendering the template, but will not create it directly. (of course the tests will create gate structs)

The storage system is using Redis, and the reason appears to be to keep multiple nodes in sync with the flag data set using Redis pub/sub. This is something available out of the box, really, with the BEAM: create a cluster of nodes, send messages between them, end of story. There are even libraries like swarm for rather advanced handling of processes, and Phoenix PubSub for clusterable messaging.

This was a deliberate choice, and I’ve explained the reason in the readme.
To expand on that, I’ve noticed that a lot of web developers and teams moving into Elixir and Phoenix are very cautious about using it in production because the ecosystem is still small, and often the established BEAM approaches don’t work well on more “modern” deploy solutions. Think about PaaS and containers. They have become a staple of modern web development, and directly managing servers and releases is often considered an antiquated practice. Even if you don’t agree, please bear in mind that I’m simply describing what’s common in certain circles.

Developers coming from Ruby on Rails, Python and Django, Node.js, and even Scala and Go are used to packaging their code or artifacts in a Docker image, Heroku slug or ${INSERT_SIMILAR_TECH_HERE}, and then scaling it in the cloud. Telling them that the solution to a problem is to configure a cluster of BEAM VMs will not help, and it’s often not possible: you can’t set up a cluster on a popular PaaS like Heroku, and doing so in Docker is more complex than on traditional servers.

What I aimed for was something that would just work :tm: on any setup.

When someone says “I need a feature toggle system for my 40 Phoenix servers running on Heroku”, I think that answering “Cool, use this library, but you need to add Redis” works better than saying “There is this library, but it won’t work on Heroku”.

is an external (… slower) dependency really justifiable? Typical applications where I would use this library already have a database, so there is already a persistence end-point available (via Ecto, even…), so it really comes down to needing a way to coordinate changes across nodes, yes?

I think you’re assuming that inter-node communication in a distributed cluster is always going to be faster than a Redis call. That might not always be true, but since I agree that a Redis roundtrip for each call is not ideal, I added the ETS cache :slight_smile:.

With all of this out of the way, the library does provide an internal API for the storage module and the cache-busting pub-sub. I will probably add an adapter for the high-level Phoenix.PubSub module, so that I can ignore the details of the transport (PG2 or Redis); the main reason I haven’t started with that is that I wanted to keep the dependencies to a minimum for the initial version. I also think that someone is working on an Ecto adapter.

Then there is the potential question of more complex gates.

Some other gates are on the roadmap.

If I understand your use cases correctly, I think you can accomplish everything with the abstractions of the library. Just treat them as building blocks instead of complete “fits all” solutions. I’ll send some examples.

I also wondered about how one might do random sampling of users, say on a user sign-up form, to do A/B testing of features.

As I mentioned, a “% of actors” gate is planned. I would be cautious about confusing feature-toggles with A/B testing though. You can use feature toggles for A/B testing if you are confident that the same users will always see the same variant (even when they’ve logged out), but then you need some other way to track user events and perform statistical calculations on the results.

I suppose one way to get there would be to have a function that sets up the state in the session, then the defined flag would (via a FunWithFlags.Group protocol) pick out the state as setup in the current session. If that became a more common usage pattern, would you consider adding some convenience API for this in FunWithFlags, or would that be out of scope in your view?

You’re describing an A/B testing tool, and I’m not sure about adding support for this use case to the library. If %-based gates ship, however, they can be used as building blocks for an A/B testing framework I think.

3 Likes

IME it’s just as valid for internal code; someone has to work on it, right? :slight_smile: But yeah, not a BIG thing indeed…

Makes sense; thanks for the response on that point.

I see that the adapter is configurable … so I suppose one could write a FunWithFlags.Store.Persistent.Native (or whatever name) module and use that via config.
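
Something along these lines, I imagine (the config key and adapter module below are guesses on my part, not the documented API):

# Hypothetical configuration; the key name and module are made up.
config :fun_with_flags, :persistence,
  adapter: MyApp.FunWithFlags.NativeStore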

Indeed, I was assuming that, mostly because of the serialization/deserialization in Gate, Flag, and Store.Serializer. It’s always hard / impossible to say without measuring, but I’d be surprised if the serialization step were not noticeable at scale / load.

For the single-node situation (which is likely to be rather common …) it will absolutely be faster as there will be no system IPC / RPC at all.

The statistical ones look nice!

I don’t think so … at least not without grand contortions … the group name in a Gate is required to be an atom, and that’s all. So there must be a one-to-one mapping between a group name and a specific bit of data (probably within a struct/map). This is likely problematic in ambiguous cases or where a mapping needs to occur.

e.g. if my %User struct contains a country code, there would need to be an atom for each country and a filter that identifies those country code atoms as being such, to map them to the data in %User. I think it would be cleaner to have a {:countrycode, "ch"} tuple, where the filter can then select on the atom component and use the rest of the tuple in determining a match. The current limitation to atoms appears to imply that there is a very close mapping between the group name and the data in the target datatype.

Or maybe (quite plausibly, in fact! :slight_smile:) I am missing a clever way to accomplish the above?

Yes, they would be a building block, and those %-based gates were exactly what I was hoping to see. (I have a bad habit of skipping over roadmaps in READMEs and just looking at the code … apologies …) They are certainly not the whole solution (can’t be, for the reasons you noted), but if there is already something controlling application paths based on flags, it’d be nice to have one mechanism for the flag management to build on. So … yeah! :slight_smile:

Docker perfectly supports mapping a set of named containers together, then they can share a set of ports to communicate and all sorts of things. If Heroku is broken enough not to support such setups (I’ve not used it) then move to another (because all that I have tried support distributed Elixir fine) or communicate over an SSH connection or teleport a connection or a variety of other ways. There is absolutely no reason to use Redis in an EVM system at all (maybe to integrate with others, but that is not what this is for).

Redis is definitely not something that will just work everywhere. Erlang can be deployed to micro containers (like to run on an RPi or so) that Redis will not work on as one of many examples. Plus that is extra stuff to set up, which makes setup more difficult (much more so with Redis).

Or use any of the other distribution protocols, can map the EVM over ssh or a proxy or teleport or a number of things, or write your own to go over tcp or udp or sctp or whatever, all of which then become just drop it in.

Especially this: if an application is big enough for something like this, Redis is even less useful, as it is entirely useless when the system already has PostgreSQL, which has all of Redis’s capabilities.

Redis is not a lightweight thing to install.

Also, if an app is truly multi-node then it likely needs to sync a LOT more data than just flags, so some kind of communication already has to be in place, which is still likely not going to be Redis (either they will use the EVM’s normal way, or ssh, or a proxy, or whatever, or they will have something like PostgreSQL).

It likely would be, though, assuming identical network conditions; the EVM is designed for message passing and reliability.

Which begs the question of “Why Redis?” ^.^

I still say Redis should be an optional add-on, but it should certainly not be the default method.

/me is a bit miffed at Redis due to setup issues with it in the past…

/me is miffed at Redis due to having to operate it. That thing should have never been used. It loses data, it loses connections, it cannot handle more than a dozen connections, and let’s not talk about the data persistence or the distributed stuff. But it is what we have…

It depends on the use case, as always, but it sounds like all of them should be actors. I have the feeling that you are approaching the problem with the wrong assumption: that only users can be actors. As you’ve realized, this would make things a bit complicated in some scenarios. I’ll provide examples at the end of the post.

Yes. By design. Group names are meant to be human friendly labels that you define ahead of time. By contrast, actors are dynamic entities whose identity you resolve at runtime. In your case, yes, you can either declare groups for all of your regions and departments, or you can make them actors with namespaced IDs. It really depends on what makes more sense for your access patterns.

Oh, yes, that would be awkward. That’s what makes me think that modelling them as actors makes sense.

If I get what you mean, these categories could still be groups. You are free to represent them as you want. A user can have a literal list of group names (like roles, or tags), or you can infer the group dynamically.

I really think that you’re describing actors here.

Some examples

Here are some examples of how I would model it. I’m going to use users, departments and countries. To keep things simple I’m going to use plain structs instead of using Ecto, and I’m going to use some invented functions and just say what they would do.

Let’s say we have these structs:

defmodule User do
  defstruct [:id, :name]
end

defmodule Country do
  defstruct [:name, :iso]
end

defmodule Department do
  defstruct [:name]
end

Which are all actors:

defimpl FunWithFlags.Actor, for: User do
  def id(%{id: id}) do
    "user:#{id}"
  end
end

defimpl FunWithFlags.Actor, for: Country do
  def id(%{iso: iso}) do
    "country:#{iso}"
  end
end

defimpl FunWithFlags.Actor, for: Department do
  def id(%{name: name}) do
    "department:#{name}"
  end
end

With these simple building blocks, I can build a matrix of actor gates. Just remember that disabled gates take precedence over the enabled ones.

So, for example, if I want to enable something for engineers in Japan, I can:

japan = %Country{name: "Japan", iso: "jp"}
engineering = %Department{name: "Engineering"}

FunWithFlags.enable(:beta_features, for_actor: japan)
FunWithFlags.enable(:beta_features, for_actor: engineering)

And then I can check it for individual users with:

def beta_features_enabled_for?(user = %User{}) do
  country = Country.for(user)
  department = Department.for(user)

  FunWithFlags.enabled?(:beta_features, for: country) &&
    FunWithFlags.enabled?(:beta_features, for: department)
end

And later you can even add some groups to the Country struct, to add more flexibility. For example, let’s say that you want to enable the flag for all asian countries:

defimpl FunWithFlags.Group, for: Country do
  @asian_countries ~w{jp hk} # ...
  @european_countries ~w{it fr} # ...
  #...

  def in?(%{iso: iso}, :asian_countries) do
    iso in @asian_countries
  end

  def in?(%{iso: iso}, :european_countries) do
    iso in @european_countries
  end

  # ...
end

FunWithFlags.enable(:beta_features, for_group: :asian_countries)

For the “beta testers group designations”, you can then use a more traditional group approach. For example, you could either find a way to set the designations on the %User{} struct, or make them fetchable. Some simple examples:

defimpl FunWithFlags.Group, for: User do
  def in?(%{groups: groups}, group) when is_list(groups) do
    group in groups
  end
end

# or...
defimpl FunWithFlags.Group, for: User do
  def in?(user, group) do
    designations = BetaTesterGroupDesignation.get_list_for(user)
    group in designations
  end
end

# or even... if you feel fancy (think carefully before doing this because it's not optimal)
defimpl FunWithFlags.Group, for: User do
  def in?(user, group) do
    if Enum.member?(BetaTesterGroupDesignation.all(), group) do
      designations = BetaTesterGroupDesignation.get_list_for(user)
      group in designations
    else
      # other group logic or false
      false
    end
  end
end

With that done, you can add User to the checks I showed above, or you can use different logic to give explicit settings on user actors a higher priority. As you can see, this library just gives you the building blocks.
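
For instance, one simple option (just a sketch, not the only way to combine them) is to fold the user actor into the earlier helper, so that an explicit per-user enable grants access on its own:

def beta_features_enabled_for?(user = %User{}) do
  country = Country.for(user)
  department = Department.for(user)

  # An explicit per-user enable is enough on its own; otherwise
  # fall back to the country + department combination.
  FunWithFlags.enabled?(:beta_features, for: user) ||
    (FunWithFlags.enabled?(:beta_features, for: country) &&
       FunWithFlags.enabled?(:beta_features, for: department))
end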

By the way, you can also model everything with groups, just like you initially imagined.
For example:

defimpl FunWithFlags.Group, for: User do
  def in?(user, group) do
    country = Country.for(user)
    department = Department.for(user)
    String.downcase("#{country.name}_#{department.name}") == to_string(group)
  end
end

FunWithFlags.enable(:beta_features, for_group: :japan_engineering)
FunWithFlags.enabled?(:beta_features, for: user)

I hope this clarifies your doubts, and I hope you’ll find it useful! :slight_smile:

3 Likes

Docker perfectly supports mapping a set of named containers together, then they can share a set of ports to communicate and all sorts of things. If Heroku is broken enough not to support such setups (I’ve not used it) then move to another (because all that I have tried support distributed Elixir fine) or communicate over an SSH connection or teleport a connection or a variety of other ways. There is absolutely no reason to use Redis in an EVM system at all (maybe to integrate with others, but that is not what this is for).

(…)

Or use any of the other distribution protocols, can map the EVM over ssh or a proxy or teleport or a number of things, or write your own to go over tcp or udp or sctp or whatever, all of which then become just drop it in.

I feel like I’ve touched a nerve here. Apologies if I’ve said something technically naive, but let me explain my point of view in more detail.

I understand where you’re coming from, but I don’t think that your approach is correct.

If a team or organization is running on Heroku (there are a lot of them), and they’re looking at Elixir and Phoenix, and they’re trying to find their bearings in an unfamiliar ecosystem, then telling them what you wrote above won’t work. If the objective is to grow the Elixir userbase, then it will actually cause damage.

I can promise you that there is no way a sensible team is going to change their hosting and platform just because they want to run a different language. New teams on greenfield projects, maybe. If they can be bothered.

Again, my declared goal was to provide functionality in a form that is familiar to people moving into Elixir and Phoenix from other languages and frameworks. It’s about lowering the entry barrier, and I believe it’s something that will help the language and ecosystem gain traction.

We all know that experienced people who have used a dozen languages over more than 10 years can deploy Phoenix on Docker and set up a cluster of BEAM VMs and have Phoenix play ping-pong with PG2, and probably clean the dishes while hungover and walk the dog in the morning. Cool. What I think I’m after, however, are people who have used Rails or Django for years, are used to much simpler setups (where Redis is a staple for a number of reasons, and service or node autodiscovery is a scary beast), and are anxiously looking around for a more performant alternative. A lot of them have been moving to Node.js, but a lot have also been moving to Elixir and Phoenix.

Redis is definitely not something that will just work everywhere. Erlang can be deployed to micro containers (like to run on an RPi or so) that Redis will not work on as one of many examples. Plus that is extra stuff to set up, which makes setup more difficult (much more so with Redis).
(…)
Especially this: if an application is big enough for something like this, Redis is even less useful, as it is entirely useless when the system already has PostgreSQL, which has all of Redis’s capabilities.
(…)
Redis is not a lightweight thing to install.
(…)
Which begs the question of “Why Redis?” ^.^
(…)
/me is a bit miffed at Redis due to setup issues with it in the past…

This makes me think that you might not be very familiar with Redis, the same way I am clearly less familiar with Elixir, OTP and BEAM than you are.

Redis can handle thousands of clients and millions of calls per minute, and its latency is generally very low.

if an application is big enough for something like this, Redis is even less useful, as it is entirely useless when the system already has PostgreSQL, which has all of Redis’s capabilities.

I’m sorry, but this is wrong and misinformed. A relational DB and Redis serve completely different requirements, and a complex and large-scale application can make very good use of both at the same time, for different things. Redis is also a good cache, and it can help shed some load from the DB.


I have the feeling that we are looking at this from completely different angles.

The design goals of the library are to help people working in an environment of PaaS and SaaS, where developers are used to accepting the narrow constraints of some platforms and to quickly spinning up a managed Redis instance if needed.
On the other hand, I think that you’re mainly focused on the increased overhead of running Redis in a data center that you manage yourself, which leads to the importance of doing as much as possible in the BEAM.

The things you suggested seem interesting though. Would you care to provide some examples?

5 Likes

Erlang clustering doesn’t solve the persistence problem though. I don’t think you can avoid tying the library to something, whether it’s PostgreSQL or Redis. At best, it could be configurable.

(There’s Mnesia, but it’s very difficult to use on fully dynamic clusters, and is a non-starter if your containers don’t have access to a persistent filesystem)

1 Like

Aha! That makes so much sense now, thanks…

I got a bit fixated on the flags/gates and somehow blind-spotted Actors. Nice. :slight_smile:

1 Like

Hello,

I’ve released new versions of this library and its GUI extension:

I’ve added a new PubSub adapter to support cache-busting notifications over distributed Erlang, to be used as an alternative to the Redis PubSub adapter.
The new adapter uses Phoenix.PubSub and its high level API, so it doesn’t really care about what’s used under the hood as transport. It works out of the box with pg2 when running in a cluster of nodes.
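
The configuration looks roughly like this (check the README for the exact option names):

# Option and module names roughly as in the README; double-check them there.
config :fun_with_flags, :cache_bust_notifications,
  enabled: true,
  adapter: FunWithFlags.Notifications.PhoenixPubSub,
  client: MyApp.PubSub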

I’ve also refactored the internals to make them more modular. Specifically, it would now be really easy to add other persistence adapters, as all they need to do is implement the same API as the Redis module. Redis is still the default store, but I’m thinking of adding an Ecto adapter (CC @aseigo).

8 Likes

Oh, cool! In that case, I will take a look again this week … huzzah! :slight_smile:

1 Like

There is a branch there already.

Hello,

I’ve added an Ecto persistence adapter and released version 0.9.0 of this package.

Now that both Ecto persistence and Phoenix PubSub (PG2) have first-class support, this library can be used in Phoenix applications without the extra Redix dependency (although Redis and Redix are awesome).
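
Switching the persistence adapter is a small config change, roughly:

# See the README for the exact option names.
config :fun_with_flags, :persistence,
  adapter: FunWithFlags.Store.Persistent.Ecto,
  repo: MyApp.Repo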

2 Likes

Hi there,

I’ve released version 1.0.0 of this package and updated the web-UI plugin to support the new features.

This is a significant release because it adds two new kinds of toggles that I’m particularly happy about:

  • percentage of time gates
  • percentage of actors gates

They basically do what it says on the tin, and more details can be found on github.

The percentage of actors gate is a great starting block to build an A/B testing harness, because it allows you to enable a feature for a fixed subset of users with deterministic, consistent and repeatable results. In a web application, if used together with an analytics system to track conversion success rates, it can be used to run experiments and observe the effects on user behaviour.
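
A quick taste of the API, roughly (the details are in the docs on GitHub):

FunWithFlags.enable(:new_dashboard, for_percentage_of: {:time, 0.05})
FunWithFlags.enable(:new_dashboard, for_percentage_of: {:actors, 0.2})

# `user` is any term implementing the Actor protocol
FunWithFlags.enabled?(:new_dashboard, for: user)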

10 Likes