Introducing Backtrex: solve discrete problems by brute force

jmitchell · January 13, 2017, 4:18am

I mentioned the project in “Logging: a silent performance killer”, but I’d like to formally present it. Backtrex is a behaviour that makes it easy to brute force any discrete problem.

The project currently includes a sample Sudoku solver, though it will probably get pulled out into a separate project later.

These callbacks are all Backtrex needs to get to work.

defmodule SudokuSolver do
  use Backtrex

  def unknowns(puzzle) do
    SudokuPuzzle.empty_cells(puzzle)
  end

  def values(_puzzle, _cell), do: 1..9

  def assign(puzzle, cell, value) do
    SudokuPuzzle.put_cell(puzzle, cell, value)
  end

  def valid?(puzzle) do
    SudokuPuzzle.valid?(puzzle)
  end
end

Since this is my first published package I’d sincerely appreciate feedback on its design. Cheers.

jmitchell · January 14, 2017, 11:59pm

I plan to make Backtrex scale across multiple processes and even nodes. This issue talks more about that.

It mentions the need to research existing OTP and Elixir tools that would simplify the implementation. I’ve implemented a couple GenServers and Supervisors from tutorials, but haven’t done much with them or Agents, Tasks, and GenStage yet. Some combination of them or other tools would probably help a lot here.

Any advice on where to focus my search?

OvermindDL1 · January 15, 2017, 8:32pm

I’m not really sure what backtrex does yet (I’ve not looked at the docs yet, I’m on my phone ^.^), but what kind of scaling do you need? Is it serialized? Trivially distributable? Do you need one central information store? Is there no shared information?

jmitchell · January 15, 2017, 9:18pm

It implements a generic backtracking algorithm to solve problems amenable to brute force. After implementing a few simple callbacks specific to your problem domain, you get a Backtrex.solve/1 function in return that will find a solution (eventually a stream of all solutions).

Backtracking is a step up from naive brute force. For example, the most naive way to solve a Sudoku puzzle is taking the N blank cells and enumerating all possible 9^N ways to fill them with number 1-9. A significant ratio of those solutions aren’t worth exploring, though, because as soon as a fraction of those cells are in an invalid state you know no possible values for the remaining blank cells could possibly yield a solution. Backtracking actively prunes the search space by checking whether a provisional set of assignments are valid.

Concurrency is a natural fit for a problem like this. Rather than having one process do a depth first search of the space on its own, you could, for instance, assign half the possible values for the first blank cell to one process and the other half to a second process.

As in needing everything to happen in a sequential order? No, not for this API, but I intend to offer a non-concurrent API for cases where users want solutions to arrive in a deterministic order.

Yes, every problem can be broken in many subproblems, each of which can be delegated to another process/node. And in turn, those subproblems may be broken in sub-subproblems, which lends well to a tree of workers and supervisors.

It would be nice to support work stealing when there are unoccupied nodes and worker processes, but that requires more coordination traffic in this work tree.

Solutions discovered by individual workers must be reported back up the tree to the root so they can be appended to a stream which is fed on demand to the Backtrex client.

Not in the sense of a database, no. That would be the Backtrex user’s job. If you mean processes responsible for coordinating other processes, yes, especially if the problem is repeatedly divided into the large tree structure described above. And again, more will be required of these coordinating processes if work stealing is supported (which I’d like).

At this stage, nope, not any more than what’s introduced by the coordination problem. However, a potential optimization may benefit from a way to broadcast information from one worker to all workers, probably by reporting up the tree to the root and then back down.

michalmuskala · January 16, 2017, 2:06am

This does look interesting. How does it compare to prolog? From what I saw, this is in some ways similar to working with clpfd (“Constraint Logic Programming over Finite Domains”).

There’s an erlang implementation of prolog, https://github.com/rvirding/erlog, but it does not come with a clpfd library. I think this would also be an interesting area to explore.

jmitchell · January 16, 2017, 2:52am

AFAIK Backtrex could be used as the basis for a Prolog engine or other unification systems. I’ve made toy projects in Prolog to solve constraint satisfaction problems like Sudoku, but I can’t claim to understand all its semantics. For example, it has impure features to prematurely stop searching part of the tree (known as a “cut”), but I don’t have any experience using it. Backtrex’s API would allow somebody to prematurely prune the search; they could simply stop returning as many unknowns and values. It’s not a recommended use case, but could be worth exploring, and I don’t know if this maps well to the “cut” semantics.

A while back I studied *kanren languages and implemented microKanren in Idris (may not compile anymore since Idris has changed a bit). Kanrens are small logic programming languages, most focusing on purity and determinism. The original authors prefer the term relational programming to emphasize that unification isn’t biased in a particular direction. For instance, “the head of a list” is a relation that could produce the single element at the front of a list or an infinite stream of lists where a given element is at the front (or even simply the set of all non-empty lists along with the first element of each). They’re fascinating languages. Anyone interested in learning more should work through The Reasoned Schemer and read the first few chapters of William E. Byrd’s PhD dissertation where the Core miniKanren language introduced and implemented in Scheme.

I’m not familiar with it. I’ll definitely check it out. Could inspire some fun demo applications and test the limits of the current API.

I recently heard about this, but haven’t had a chance to try it out yet. Looking at the README it’s not clear whether there’s been any focus on parallelism and distributed computing.

Part of why I’m excited about Backtrex is the massive scaling potential provided almost out of the box by the BEAM, and the opportunity to learn how to do that well.

benwilson512 · January 16, 2017, 3:46am

This is a very interesting library!

As some feedback on the code style, I noticed that doing use Backtrex holds an incredible amount (nearly all?) of its logic inside macros. It’s fine if you want to inject functions into the module, but I would recommend trying to extract the logic inside those functions into regular functions that are then called from the injected one.

Ecto for example will inject functions, but those quickly hand off to regular ones: https://github.com/elixir-ecto/ecto/blob/v2.1.2/lib/ecto/repo.ex#L140

Right now if there’s some bug inside the logic you have say here ( to pick a random line): https://github.com/jmitchell/backtrex/blob/master/lib/backtrex.ex#L136 instead of getting a nice stacktrace to that point it’s just gonna have whatever line your user calls use Backtrex from.

As a minor note about the debugging: I would consider supporting like a config :backtrex, debug: true type flag in order to turn that stuff on. This is what I do for ExAws if you want to see super verbose logs to debug the ultimate HTTP requests that are sent. However I still recognize that in development many people run debug logs and that they may not want the ones from my app.

That does touch on the broader question of whether there should (can?) be per-application log levels, but it’s a good way to do it for now.

jmitchell · January 16, 2017, 5:28am

Thank you

Excellent suggestion–it helps fill a mental gap I was struggling with. This is the first behaviour I’ve written, and honestly it took a while to figure out how to implement one with generic functions that call the callbacks. None of the explanations I found about implementing behaviours discuss writing generic functions, including the Typespecs and behaviours guide, the Behaviours documentation, and one of the introductory books I bought. Most explain how callbacks can form a polymorphic interface, but they don’t discuss best practices for coding against that interface. This concept comes across better with protocols, but my understanding is protocols requiring many functions are discouraged. It’s unfortunate because polymorphic interfaces are the bread and butter of building clean abstractions.

Eventually I consulted some existing code that I knew had to be doing the same thing just to get a high-level idea of how it’s wired together. Although it was enough to get something working. I wasn’t super confident I did it all the best way. Thanks for pointing this out!

Indeed! Part of the reason I use logging so much is to overcome this problem. Your suggestion helps me realize I could get by with a lot less logging, and also ensure Backtrex users get more useful stacktraces.

I’m planning on submitting a proposal to the elixir-lang-core list about supporting package- and/or module-level log controls, esp for :compile_time_purge_level. It would be as a follow up to this one where I suggested making all logging macros lazy to improve performance. Finer-grain logging control seems like a better option overall and would offer application developers/maintainers more safety. From another angle, I better understand now how informed package developers can make unobtrusive APIs.

UPDATE: This post and another thread I started were somewhat critical of Elixir resources and limitations. To counterbalance that I want to mention I’ve been pleasantly surprised by Elixir’s simplicity and expressiveness, as well as how quickly I can find answers to most of my questions. This community has been incredibly helpful too.

I’m going through some growing pains getting used to the way things are done. Sometimes my confusion spills over into minor frustrations, and my hope is periodically voicing them will lead to improvements (it’s encouraging that this has already happened). On the other hand, sometimes it’s simply a spell of PEBKAC, and I don’t know enough yet to tell the difference. I’ll continue trying to avoid that, but if it ever seems like I’ve fallen into a trap I’d be grateful for anyone pointing it out.

If there’s interest in discussing this general theme more, esp as it pertains to the community, let’s break it out into another thread and keep this one primarily about Backtrex.

jmitchell · January 16, 2017, 6:55pm

I’m giving this a try now. One complication is many of these functions need access to the callback implemented in the behaviour module.

An approach I’ve worked out is passing __MODULE__ from the __using__ macro to the locally defined functions, and then using apply/3 to call the callbacks like this:

defmacro __using__(_) do
  quote location: :keep do
    @behaviour Backtrex

    def solve(problem) do
      Backtrex.solve(problem, __MODULE__)
    end
  end
end

def solve(problem, module) do
  unknowns = apply(module, :unknowns, [problem])
  # ...
end

Is that the idiomatic way to do this?

UPDATE: I went ahead and made the full change. It seems to work well, but I’m open to more feedback on it.

OvermindDL1 · January 17, 2017, 3:11pm

That is a way that I’d do it. It does incur a slight performance penalty but it is fairly minor unless called a lot, like in a tight loop, but if it is called only once per solve then it is fine and do ignore that cost.

jmitchell · January 17, 2017, 6:33pm

Yup, no sweat compared to all the potential bottlenecks in the backtracking algorithm and callback implementations. I’m working on generating flexible profiler reports now so I can identify bottlenecks and compare multiple implementations.