How best to model data for a CrossFit app?

jdumont · February 2, 2019, 6:08pm

Evening all,

I’ve got another database modelling problem to put to you. I’ve got various solutions that use lots of join tables, or JSON embeds, etc; but what I want to know is how you’d solve the issue.

I’ve been playing around with recording CrossFit workouts, and the “constantly varied” part of their motto makes it a surprisingly tricky thing to model. If you haven’t got an understanding of CrossFit, this explanation might seem arbitrary and complex, but stick with me. Basically the schema needs to have a lot of flexibility in combining different workout structures together, but within those structures there are rigid rules.

A Workout can be many different things. There are various types of workouts such as AMRAPs (As Many Rounds As Possible) where you have a list of movements and repeat them until the clock runs out, or For Time (and Rounds For Time) workouts where there’s a set amount of work to be done and you complete it as fast as possible. Then there are things like EMOMs (Every Minute On the Minute) where you perform a movement or group of movements every 60 seconds. Traditional strength workouts (sets x reps) also come into play.

Each workout type has its own logic, so they either need to be modelled separately or, if combined into one single schema, that schema needs so much flexibility that there’s effectively no schema.

Workouts can be one or more of these simultaneously. For example, you might do a “Buy In” of 100 push ups and then move onto an AMRAP for the remaining time in this section (let’s say 5 minutes). Then after a 2 minute rest you start a For Time workout where you complete it as fast as possible. In this case your score would be a number of reps (or rounds & reps) for part A (the AMRAP) and a time for part B (the For Time workout).

Sticking with just relational methods I’ve found I very quickly end up with lots of join tables and polymorphism, and it becomes a mess. A working mess, but a mess nonetheless that I suspect would break down very quickly if I needed to add a new type of workout.

On the other end of the spectrum I’ve also tried just dumping an array of structs (one for each workout type) into a JSON column in the Workout table. Querying in this case gets handled by moving the queryable elements (movements, equipment, etc) into regular relationships that sit alongside the JSON blob. In effect the JSON is what’s required by humans and the relationships by the DB.

How would you solve an issue like this?

sanswork · February 2, 2019, 7:34pm

I started developing an app for tactical barbell. I just used flags for things like AMRAP.

Activities
name:string
description:string

Workouts
name:string
has many activities through workout_activities

WorkoutActivities
activity_id
workout_id
position:integer
reps:integer
distance:integer
time:integer
time_direction:string (could do bool but I like to be able to know from a glance at the db)
amrap:bool

Obviously there are a few types/modifiers I am not thinking of right now since I don’t have the code with me. But that is 2 tables and a join that should allow you to model any workout possible. Rests are just another workout activity for time. If you want to do tracking you’ll also need a workout results table too.
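
As a rough sketch of the two-tables-and-a-join idea, here it is in plain Elixir structs rather than Ecto schemas. The module names `WorkoutActivity` and `FlatWorkout`, the helper functions, and the sample data are all made up for illustration. The fields follow the outline above, with the flag spelled `amrap`.

```elixir
defmodule WorkoutActivity do
  # One row of the workout_activities join, with the activity name inlined
  # for brevity (a real schema would reference activity_id instead).
  defstruct [:activity, :position, :reps, :distance, :time, :time_direction, amrap: false]
end

defmodule FlatWorkout do
  defstruct name: nil, activities: []

  # Activities in the order they should be performed.
  def ordered(%__MODULE__{activities: activities}) do
    Enum.sort_by(activities, & &1.position)
  end

  # Total programmed time in seconds; untimed activities count as 0.
  def total_time(%__MODULE__{activities: activities}) do
    activities
    |> Enum.map(&(&1.time || 0))
    |> Enum.sum()
  end
end

# A buy-in, a rest (just another timed activity) and a 5 minute AMRAP.
workout = %FlatWorkout{
  name: "Example",
  activities: [
    %WorkoutActivity{activity: "Rest", position: 2, time: 120, time_direction: "down"},
    %WorkoutActivity{activity: "Push Up", position: 1, reps: 100},
    %WorkoutActivity{activity: "Burpee", position: 3, time: 300, time_direction: "down", amrap: true}
  ]
}

workout |> FlatWorkout.ordered() |> Enum.map(& &1.activity) |> IO.inspect()
IO.inspect(FlatWorkout.total_time(workout))
```

The `position` column is what turns a set of join rows into an ordered workout. Everything else is a modifier on a single activity.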

jdumont · February 2, 2019, 7:57pm

I’ve got what I’ve been calling a Composite.

A Composite has only relations and has a movement, a primary metric, and potentially a secondary and tertiary metric. A Metric has a quantity field and a Unit.

(I’ll add an example when I get back to the computer).

The idea is that these Composites represent a complete unit of work. For example, 10 x Deadlifts at 90kg (primary and secondary metric used). This gives me the flexibility to model pretty much any movement, and share them all between workouts and workout results.

A workout would just be a series of Composites, but it broke down when they needed to be nested inside another struct called Minutes for EMOMs.

It’s the different workout structures (or subtypes) that I’m struggling to reconcile because I need to handle them all together (listing them, relating them to owners, relating results to athletes, etc) but need them all to be subtly different.

sanswork · February 2, 2019, 9:00pm

Have you considered not nesting them in another structure, but instead adding a self-referential parent_id and has_many :children? Then, when it’s an EMOM, you use the composite’s children instead of the composite itself.

jdumont · February 2, 2019, 9:07pm

You’ll have to forgive me but I don’t follow. Do you have a link to anything that would explain this approach in more detail?

sanswork · February 2, 2019, 9:11pm

I’m going to assume the structure of your composite is similar to the below based on your description but most of it doesn’t really matter.

Composite
parent_id references Composite table
activity
type
primary
secondary
has_many :children

When you load your workout’s composites, preload children. If type == EMOM, use the primary attribute for the number of cycles and the children for the activities in a cycle. If it’s not EMOM, use the primary and secondary attributes as normal. You get the same result as your additional struct plan but you reuse the same struct.
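
A minimal sketch of that rule in plain Elixir, assuming the children are already loaded into a `children` list. The `describe/1` function, the lowercase `"emom"` type value and the sample data are hypothetical. In a real app the struct would be an Ecto schema with the `parent_id` column.

```elixir
defmodule Composite do
  # Fields follow the outline above; `children` stands in for the
  # preloaded has_many association.
  defstruct activity: nil, type: nil, primary: nil, secondary: nil, children: []

  # EMOM: `primary` is the number of cycles, the children are what
  # happens inside each cycle. Recursion handles deeper nesting.
  def describe(%__MODULE__{type: "emom", primary: cycles, children: children}) do
    for minute <- 1..cycles//1, child <- children, step <- describe(child) do
      "Minute #{minute}: #{step}"
    end
  end

  # Anything else: primary and secondary are used as normal.
  def describe(%__MODULE__{activity: activity, primary: reps, secondary: nil}) do
    ["#{reps} x #{activity}"]
  end

  def describe(%__MODULE__{activity: activity, primary: reps, secondary: load}) do
    ["#{reps} x #{activity} @ #{load}"]
  end
end

emom = %Composite{
  type: "emom",
  primary: 2,
  children: [
    %Composite{activity: "Deadlift", primary: 10, secondary: "90kg"},
    %Composite{activity: "Push Up", primary: 15}
  ]
}

Enum.each(Composite.describe(emom), &IO.puts/1)
```

The same struct covers both the leaf (a single movement) and the container (the EMOM), which is the point of the self-referential approach.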

jdumont · February 2, 2019, 9:20pm

I think I didn’t explain where my Composite sits in the overall structure, but you’re absolutely right, this approach could work perfectly. The same struct could represent all types of workout recursively.

I had no idea you could do something like this! Very powerful!

Thank you for your help. I’ll be sure to add a gist of my end solution to help anyone else that ends up here.

dimitarvp · February 3, 2019, 12:30pm

I’d just have a single complex list+map column (User.workouts) and be done with it. Saves you from slow JOIN query hell in one fell swoop. Crafting search queries might be much harder though; they are obviously different when you search on a jsonb column. I would immediately agree if you said that’s a deal breaker.

But if you absolutely want the RDBMS aspect then yes, a recursive table looks like the least confusing and most economical approach.

jdumont · February 4, 2019, 4:02pm

Workouts are going to be a mix of a library of workouts that a user can pick from or add to. They can then post scores/times for workouts.

As much as I’d love to go with the simplest possible option, similar to the one you suggest, I think this is a problem that requires a bit of necessary complexity to meet the specs with a user friendly solution.

Just to update on the recursive table approach: Works brilliantly for the backend/database side of things. Certainly the most straightforward and easy to understand solution yet. It does make the forms on the frontend exceedingly fiddly though with lots of recursion and difficulty targeting the nested workouts for updates like new children, siblings and adding those composites I mentioned to a specific workout in the tree.

Again, I’ve got a solution working but it’s the result of lots of hacking and is messy as hell. I’m using a combination of Phoenix forms (with Changesets), a GenServer for holding “intermediate” workouts (ones that haven’t been saved to the DB yet) and Drab for live updating the page. I’ll try and tidy it up, but I may end up resigning myself to a JS frontend that just sends through the params.

dimitarvp · February 4, 2019, 4:51pm

If things are that complex then you would be much better off limiting the complexity to only one language (and only on the frontend or the backend). That’s what I would do.

jdumont · February 4, 2019, 7:12pm

The “backend” is pretty straightforward, although what you consider the backend is pretty open to debate as I’m using Drab, supported by this GenServer acting as a pseudo-Repo. Let’s say that the data structure (Elixir structs and DB) is pretty straightforward.

Building any one of the workouts is equally straightforward. It’s when trying to build and edit a whole nest/tree of associated workouts that it gets hairy.

This circles back to the post I made about augmenting server-side forms. In a bid to avoid client-side rendering, I’ve opted for Drab to add some of the data (the more complex structures) and regular Phoenix forms to accept most of the regular inputs (text, numbers, etc).

It’s complex, and frankly a bit of a mess, but it’s well isolated (one controller, one commander, a few templates and a GenServer module that’s used for nothing else). Worst comes to the worst it will be easy to rip out wholesale and replace, even if changes require a bit of thinking.

ETA: Phoenix LiveView might allow for a cleaner implementation, so perhaps I’ll finish it, and then revisit it when that’s released

jdumont · February 6, 2019, 5:06pm

Updated question:

I’ve identified that the main issue with my current solution is working out which Workout in the nest of Workouts I want to target when amending a field outside the scope of the Phoenix form, such as adding a movement or a sub-Workout (child).

For example, say I have this structure (many fields omitted for brevity):

%Workout{
  children: [
    %Workout{
      children: [],
      composites: [], # <--- target for adding a composite
      type: "buy_in"
    },
    %Workout{
      children: [], # <--- target for adding another child workout
      composites: [],
      type: "emom"
    }
  ],
  composites: [],
  name: "Example",
  type: "hybrid"
}

When converted to an Ecto Changeset and given to Phoenix form_for, the name of the :type field that has the value “emom” would be workout[children][1][type]. That’s how nested structures are identified.

Suppose I wanted to add a composite to the first child workout (workout[children][0][composites]), or another child to the second child workout (the new workout would be workout[children][1][children][0], as indicated above). How would I target those nested workouts in the Changeset for the put_assoc?

I would obviously need to pass this target through the Drab handler from a click action (that has zero args in the case of adding another child, or an id in the case of adding a composite), and then target the nested (nested) association with a put_assoc.
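
One way to picture the targeting step, leaving Changesets aside entirely: a form field name can be turned into a path for Elixir’s `get_in`/`update_in`. This is only a sketch on plain maps. The `FormPath` module and its field whitelist are hypothetical, and it does not show how to do the equivalent inside `put_assoc`.

```elixir
defmodule FormPath do
  # Whitelist of known fields, so user input never creates new atoms.
  @fields %{
    "children" => :children,
    "composites" => :composites,
    "type" => :type,
    "name" => :name
  }

  # "workout[children][1][type]" -> a path usable with get_in/update_in.
  # The leading "workout" segment is the form name and is dropped.
  def parse(name) do
    [_form | segments] = String.split(name, ["[", "]"], trim: true)

    Enum.map(segments, fn segment ->
      case Integer.parse(segment) do
        # Numeric segments address a position in a list.
        {index, ""} -> Access.at(index)
        # Access.key/1 works on both maps and structs.
        _ -> Access.key(Map.fetch!(@fields, segment))
      end
    end)
  end
end

workout = %{
  name: "Example",
  type: "hybrid",
  composites: [],
  children: [
    %{type: "buy_in", children: [], composites: []},
    %{type: "emom", children: [], composites: []}
  ]
}

IO.inspect(get_in(workout, FormPath.parse("workout[children][1][type]")))

# Append a composite to the first child workout.
updated =
  update_in(
    workout,
    FormPath.parse("workout[children][0][composites]"),
    &(&1 ++ [%{movement: "Push Up", reps: 100}])
  )

IO.inspect(updated)
```

The click handler would then only need to send the field name of the element it belongs to.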

This is something that is obviously possible, as that’s how Phoenix forms work; but after a few hours poring through the codebase and pulling apart how phoenix_html and phoenix_ecto work I can’t quite wrap my head round it.

Any help on this would be greatly appreciated!

For reference:

At present I’m using a recursive template that loops on the inputs_for @form, :children field, re-rendering itself. I’ve taken to creating each Workout with a UUID, temporarily storing them in a GenServer. When I update a nested workout, I query the GenServer on the UUID which I’ve made accessible from the HTML (for a Drab click event) and then recursively update the parent of that workout with the new child until I get to the top.
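
For illustration, the “find by UUID, then rebuild each parent up to the top” step can be written as a single recursive pass over the tree. This is a simplified sketch on plain maps, not the actual GenServer code. The `WorkoutTree` module is hypothetical and short strings stand in for UUIDs.

```elixir
defmodule WorkoutTree do
  # The node whose :id matches gets `fun` applied to it...
  def update_by_id(%{id: id} = workout, id, fun), do: fun.(workout)

  # ...every other node is rebuilt with its (possibly updated) children,
  # which is what carries the change back up to the root.
  def update_by_id(%{children: children} = workout, id, fun) do
    %{workout | children: Enum.map(children, &update_by_id(&1, id, fun))}
  end
end

tree = %{
  id: "root",
  children: [
    %{id: "a", children: []},
    %{id: "b", children: []}
  ]
}

# Add a new child workout under "b".
updated =
  WorkoutTree.update_by_id(tree, "b", fn workout ->
    %{workout | children: workout.children ++ [%{id: "c", children: []}]}
  end)

IO.inspect(updated)
```

This walks the whole tree on every update, which should be fine for trees the size of a single workout.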

It works, but as you can imagine, it’s opened up a lot of bugs as I’m splitting the state of the form between GenServer structs that I have to manually update and the Changeset that works automatically. There’s also a lot of cleanup required afterwards, and combined with other logic (such as whether certain fields are shown, or more children available based on the type given)…it’s quickly becoming unmanageable.

That’s why I’m pursuing a solution that keeps all the state of the form inside the Changeset. It would be cleaner, easier to understand and probably more idiomatic.

Sidenote: whilst just reading the phoenix_* codebases hasn’t helped me to a solution this time, I’ll certainly be taking a look at it in greater detail. There’s a lot to learn just by looking at how these awesome libraries are actually put together!

nuclearnic · February 6, 2019, 10:23pm

Someone at my box once asked me to build an app for programming workouts that tracked all of the domains that crossfit identifies… Realized pretty quickly how hard the modeling would be so I bailed

Interesting thread though!

jdumont · February 6, 2019, 11:11pm

Yeah… every time I ~~thought~~ think I’ve got my head round it and know how I’ll tackle a problem, CrossFit throws another “Oh, but have you allowed for X?” at me. It’s a weird mix of lots of very rigid structure (great for computers) that’s applied in infinitely flexible and creative ways (PITA for computers)!

When I started out I saw a gap in the market, because none of the established apps actually do much with the data you give them. The worst just treat it as plain text. The best try to do more, but still require you to identify when you’ve done a new 1RM etc.

Now, I’ll just be happy to get something that works! It’d be easy to build an app for this use case that is awful to use (lots of complicated forms) but I’m striving for something much easier than that.

blatyo · February 7, 2019, 4:19am

I do crossfit as well and have also thought about this. I’d probably just use a text field for name, a text field for the workout, and a list of tags. It’s probably possible to model everything, but there are so many types of workout, it seems like diminishing returns. It’d be a lot harder for someone to create a workout with everything modeled, but it would display slightly better.

jdumont · February 7, 2019, 8:37am

The big incentive I saw in modelling everything fully was to be able to do something useful with the data, like provide insight to progress over time, relative strengths and weaknesses, etc. You’re giving the app a huge amount of data that it can use, but unless you’re modelling it fully you might as well be using pen and paper - which would be vastly easier too!

The UI is a big concern, and something I’m spending a lot of time making intuitive. I actually started with a set of routes per workout type, building the workout over a number of pages. It was great for being simple, each page had a clear purpose for the user, but it could only cater to the very straightforward WODs and was a huge amount of very repetitive code for something very unimpressive.

Remodelling the data structure as above has allowed me to make the whole thing much more flexible, and certainly an improvement on previous iterations, even if not as simple to use yet. That’s what comes next!

jdumont · March 6, 2019, 2:38pm

Back again! I’ve actually got a version of the app running now - hwpo.app, and although I’ve got the workout creation form working as I wanted above, there’s a lot of room for improvement.

At the moment, whenever one of the nested elements is updated I’m grabbing the entire tree of workouts as a plain old map, and then using the access mentioned above — workout[children][0][composites] — in combination with put_in. I then convert the whole map to a changeset, triggering any of the nested changesets along the way.
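
A sketch of the map-plus-`put_in` step on its own, assuming the tree is held in the string-keyed shape that Plug decodes nested form fields into (list indexes become the keys "0", "1", and so on). The sample data is invented, and the final conversion to a changeset is left out because it depends on the app’s schemas.

```elixir
# Nested form params as a plain map: indexes are string keys, not list positions.
params = %{
  "name" => "Example",
  "children" => %{
    "0" => %{"type" => "buy_in", "composites" => %{}},
    "1" => %{"type" => "emom", "composites" => %{}}
  }
}

# workout[children][0][composites] without the form name, plus the next free index.
path = ["children", "0", "composites", "0"]

# put_in/3 rebuilds every map along the path and returns the new root.
updated = put_in(params, path, %{"movement" => "Push Up", "reps" => "100"})

IO.inspect(get_in(updated, path))
```

Because every intermediate key already exists, `put_in/3` can be used directly. A path through a missing key would raise.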

This is fine, and works, but it requires lots of hacking around edge cases. Sometimes the data I want to access is inside a Changeset (needing the use of the data key) and sometimes it’s in a straight-up struct.

I could simplify the entire process hugely if I could target the nested changesets (i.e. use a put_change on something other than the root level changeset). That would reduce the flip-flopping between maps and changesets in the code and keep things much more straightforward.

I know that when you’re dealing with associations using put_assoc you always manage the entire association at once. To me, that sounds like I’ll need to recursively work my way back up the tree of workouts/movements even if I could update a deeply nested field.

Am I trying to do something that’s not possible? Is my current solution of modifying maps and converting them to changesets likely to remain the best approach?