Why match request parameters using strings?

laiboonh · May 2, 2017, 1:18pm

Although in Programming Phoenix the author tries to explain why not to match on %{:id => id}, I still do not understand it. Can someone explain this to me? I thought it would be better to match on an atom, since the key in the connection struct is :id.

# The second argument is the params map, matched here on the string key "id".
def show(conn, %{"id" => id}) do
  user = Repo.get(Rumbl.User, id)
  render(conn, "show.html", user: user)
end

aseigo · May 2, 2017, 1:46pm

The params map (which is the second argument there) maps strings to values, all of this coming from the user (via the HTTP request). The conn struct contains information related to the connection and uses atoms for its keys since it is defined and populated in the server (as opposed to reflecting data from the user).
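To make the difference concrete, here is a minimal sketch in plain Elixir (no Phoenix involved). The module and function names are made up for illustration, and the map literal stands in for what a parsed request might look like. An atom-key pattern simply does not match a string-keyed map:

```elixir
defmodule ParamsMatchSketch do
  # Hypothetical module: matches on the string key, as in the controller above.
  def by_string(%{"id" => id}), do: {:ok, id}
  def by_string(_params), do: :no_match

  # Same idea, but matching on the atom key :id.
  def by_atom(%{id: id}), do: {:ok, id}
  def by_atom(_params), do: :no_match
end

# A string-keyed map, standing in for parsed request parameters.
params = %{"id" => "42"}

IO.inspect(ParamsMatchSketch.by_string(params))
IO.inspect(ParamsMatchSketch.by_atom(params))
```

The first call binds id to "42"; the second falls through to the catch-all clause, because "id" and :id are different keys.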

Of course, params could convert the strings to atoms, but atoms are constants that are not garbage collected (they are intended to be used as identifiers, not as a general-purpose data type). It would therefore be possible for a user to DoS a Phoenix application just by sending lots and lots of unique keys via HTTP requests, since every new key would add a permanent entry to the VM's atom table.
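The growth of the atom table can be observed directly. The following is only a rough sketch: the module name is hypothetical, the generated keys imitate attacker-chosen parameter names, and it assumes an OTP release where :erlang.system_info/1 supports :atom_count and :atom_limit (OTP 20 or later):

```elixir
defmodule AtomGrowthSketch do
  # Hypothetical helper: converts every key to an atom and reports how much
  # the atom table grew. Other processes may create atoms at the same time,
  # so treat the result as a lower bound rather than an exact count.
  def atoms_created_by(keys) do
    before = :erlang.system_info(:atom_count)
    Enum.each(keys, &String.to_atom/1)
    :erlang.system_info(:atom_count) - before
  end
end

# 1000 keys that have (almost certainly) never been seen as atoms before.
keys =
  for i <- 1..1000 do
    "sketch_key_#{System.unique_integer([:positive])}_#{i}"
  end

IO.inspect(AtomGrowthSketch.atoms_created_by(keys), label: "atoms added")
IO.inspect(:erlang.system_info(:atom_limit), label: "atom limit")
```

Each unique key costs one slot that is never reclaimed, so the count only goes up until the limit is hit.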

(It’s also almost certainly more performant to use strings due to how substring sharing is done with bitstrings in the BEAM …)

tl;dr -> The params map is keyed by strings because it is data that comes from the user, whereas the conn struct is an internal data type with a static shape and so happily uses atoms.

HTH.

net · May 2, 2017, 2:04pm

My naive benchmark says otherwise. It probably still is faster though, as the conversion to atoms is skipped when using strings.

gregvaughn · May 2, 2017, 2:23pm

Atoms, once generated in the BEAM, are never garbage collected. If Phoenix blindly converted user-supplied data into atoms, it would effectively be a Denial of Service hole in every Phoenix app.
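If atom keys are really wanted further down in an application, the conversion does not have to be blind. One possible approach, sketched here with a hypothetical module and a made-up list of allowed keys, is to translate only keys known at compile time and drop everything else, so no new atoms are created at runtime:

```elixir
defmodule SafeKeysSketch do
  # Hypothetical allowlist: a compile-time lookup from string key to atom.
  # The atoms already exist in the compiled module, so nothing new is created
  # at runtime, no matter what the user sends.
  @allowed Map.new([:id, :name, :email], fn atom -> {Atom.to_string(atom), atom} end)

  # Keeps only allowed keys and converts them to atoms; unknown keys are dropped.
  def atomize(params) do
    for {key, value} <- params, Map.has_key?(@allowed, key), into: %{} do
      {Map.fetch!(@allowed, key), value}
    end
  end
end

IO.inspect(SafeKeysSketch.atomize(%{"id" => "1", "unexpected_key" => "x"}))
```

This is a design choice for application code, not something Phoenix does to the params map itself.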

aseigo · May 2, 2017, 4:38pm

Assuming that the parsing into params doesn’t do deep copies of the string data, that’s exactly why I expected it would be cheaper.

However, I did an (also naive) benchmark here of creating atoms out of strings, then doing an atom comparison vs comparing strings … and it turns out that the overhead of creating atoms from strings is still lower than the cost of string comparisons when the strings being converted to atoms are short (1 to 6 characters). As the length of the string grows, it (understandably) gets progressively worse, and even at pretty small sizes (15-20 characters) it crosses over and the string comparisons are considerably cheaper.

Note that to get anything reasonably measurable on my laptop, I had to do runs in batches of 1 000 000 iterations! That said, if there are deep copies of strings going on, then the string comparison path probably never catches up.
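A naive benchmark along these lines could look roughly like the sketch below. This is not the code that was actually run for the numbers above; the module name, the use of :timer.tc/1, and the use of :binary.copy/1 to get a second, separate binary are all assumptions made for illustration. Results will vary by machine and OTP version, so no timings are shown:

```elixir
defmodule CompareBenchSketch do
  # Hypothetical micro-benchmark: times n string comparisons against
  # n "convert both strings to atoms, then compare" rounds.
  # Returns {string_microseconds, atom_microseconds}.
  def run(key, n) do
    # A separate copy, so the two binaries do not share the same memory.
    other = :binary.copy(key)

    {string_us, ^n} = :timer.tc(fn -> string_path(key, other, n) end)
    {atom_us, ^n} = :timer.tc(fn -> atom_path(key, other, n) end)
    {string_us, atom_us}
  end

  # Counts how many of the n string comparisons were equal (all of them).
  defp string_path(a, b, n) do
    Enum.reduce(1..n, 0, fn _, acc -> if(a == b, do: acc + 1, else: acc) end)
  end

  # Same loop, but pays for the atom conversion on every iteration.
  defp atom_path(a, b, n) do
    Enum.reduce(1..n, 0, fn _, acc ->
      if(String.to_atom(a) == String.to_atom(b), do: acc + 1, else: acc)
    end)
  end
end

for key <- ["id", "a_considerably_longer_key"] do
  IO.inspect({key, CompareBenchSketch.run(key, 1_000_000)})
end
```

Only two distinct atoms are ever created here, so the sketch itself does not stress the atom table.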

Still, this isn’t the primary reason for not using atoms in this case, but it is perhaps an interesting aside. Or … maybe not. Who knows.

laiboonh · May 3, 2017, 4:22am

Please correct my understanding if I’m wrong.

From what I understand, the conn struct gets modified (in the immutable sense) as it goes through a preprocessing pipeline, which includes extracting the request parameters and “saving” them in the conn struct. Does this mean that the request parameters are “saved” in the conn struct using string keys instead of atom keys? Hence we are able to pattern match on string keys.

aseigo · May 3, 2017, 8:57am

Exactly; the request parameters are parsed out into the params argument and stored using string keys.
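As a toy illustration of that flow (this is not Plug's or Phoenix's actual implementation), the sketch below uses a hypothetical FakeConn struct in place of the real conn and the standard library's URI.decode_query/1 in place of the real parameter parsers. The struct fields are atoms because the server defines them; the parsed params are keyed by strings because the user supplies them:

```elixir
defmodule FakeConn do
  # Hypothetical stand-in for the real conn struct: fixed shape, atom keys.
  defstruct query_string: "", params: %{}
end

defmodule ParamsPipelineSketch do
  # "Pipeline" step: returns a new struct with the parsed params stored in it.
  # URI.decode_query/1 produces a map with string keys and string values.
  def fetch_params(%FakeConn{query_string: query} = conn) do
    %{conn | params: URI.decode_query(query)}
  end

  # "Controller" step: pattern matches on the string key.
  def show(%FakeConn{params: %{"id" => id}}), do: {:show, id}
  def show(%FakeConn{}), do: :missing_id
end

conn = ParamsPipelineSketch.fetch_params(%FakeConn{query_string: "id=42&sort=asc"})
IO.inspect(conn.params)
IO.inspect(ParamsPipelineSketch.show(conn))
```

Note that the original struct is never mutated; fetch_params returns an updated copy, which is the "immutable sense" of modification mentioned in the question.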

This prevents a possible security / DoS issue, which is why pattern matching on those params has to be done with strings.