Model data validation/sanitization

merlin · December 8, 2017, 10:28am

Hi all,

I’ve had a quick look at several authentication libraries and sample applications
on github, part of my learning Elixir process.

I’m a bit puzzled by the quasi absence of proper model data validation/sanitization.

Most user model changeset functions look like that:
model
|> cast(params, [:name, :email])
|> validate_required([:name, :email])
|> validate_format(:email, ~r/@/)
|> unique_constraint(:email)

Using that code you might end up with bad stuff in your DB.
A quick example if you inject something like:
email = <script>alert('Hello@world')</script>
it will be inserted into user table …

I know most of these injection techniques might be blocked in layers above.
Yet I should not be allowed to inject XSS code so easily in DB.

What techniques do you use to enforce proper data validation/sanitization ?
Please remember the email field above is just an example, there are many other XSS/SQLi vectors.

Is there some kind of WAF-plug for Cowboy ?

LostKobrakai · December 8, 2017, 10:43am

Phoenix does by default prevent XSS issues by escaping any content, which is not explicitly marked as safe. If you’re using raw/1 you can go around that limitation, but you’re explicitly opening that attack vector. For validating an email you can certainly be more elaborate than just checking for an @ in the string, but as most regexes for emails are either damn complex or with lots of edge-cases I feel it’s better to let people chose their own.

michalmuskala · December 8, 2017, 11:06am

So two attacks were mentioned: XSS and SQL injection - let’s discuss how Phoenix (and the default stack it uses) mitigate them.

The primary mitigation for XSS is escaping of everything on output. Whenever you do <%= foo %> in your template, the foo value will be actually wrapped in a call to html-escape it before printing. This means, it doesn’t really matter what is in the database - this security concern is handled on a different layer - one that is responsible for rendering HTML.

The primary mitigation for SQL injection is the use of enhanced/parametrized queries - user-provided values are never part of the SQL that is sent to the database, they are sent separately. The ecto query syntax allows as literals only values that are known at compile-time, this means that, as far as we know, it’s impossible to introduce SQL injection vevtors when using Ecto query syntax.
As far as I know, the only way of introducing SQL injection vectors in ecto applications are either by the use of “raw queries” (e.g. with Ecto.Adapters.SQL.query/3) or with some second-order issues when using eval-style facilities in the database itself.

StefanHoutzager · December 8, 2017, 4:11pm

Some ideas for validations / rules (I think there are some more links):

merlin · December 9, 2017, 9:08am

My points:

People should not use basic regex such as /@/ to validate any data.
I have found this pattern almost everywhere
Data might come from other sources, presentation layer might be React/Vue/your favorite CS framework with Phoenix as JSON API server etc.
I’ve managed to inject some UTF8 chars directly into DB (unicode smuggling), no escaping here. Tested with Postgresql, I’ll check other DBs when I have time to see if I can successfully perform some real sqli.

michalmuskala · December 9, 2017, 9:57am

The full regex to validate email is couple pages long - this is not really usable. What most places do instead is validate the email by simply sending a message to that address with a confirmation link. If the message is never received, the address was obviously bad. The regex doesn’t really matter - I can provide "test@example.com" which will pass all regex validations, yet it’s not a valid email address (since the example.com domain is reserved). The regex is more intended as a sanity check, to tell the user of your website if they made some mistake, rather than a security check.
All of those frameworks have HTML escaping. If you do something like node.innerHTML = in the front-end without any escaping, you’re asking for trouble anyway and no level of escaping on the server side will protect you.
It doesn’t really matter what data is inserted into the database, that data is never part of a query! All SQL injection attacks rely on the possibility that you can make some arbitrary string part of SQL that will be executed - that does not happen with ecto.

For example, when you do:

Repo.insert!(%Post{body: ";--\n DROP TABLE users"})

The executed SQL will look something like this:

INSERT INTO posts(body) VALUES ($1);

The string ";--\n DROP TABLE users" will be sent completely separately to the database, it’s never part of the query, so it can never inject anything into the query.
Similar thing happens in queries:

nasty_string = ";--\n SELECT * FROM users"
Repo.all(from p in Post, where: p.title == ^nasty_string)

The produced SQL will be something like:

SELECT * FROM posts WHERE posts.title = $1

And the ";--\n SELECT * FROM users" string will be sent completely separately, again never becoming part of the query.

merlin · December 9, 2017, 4:42pm

This is something else. Here I want to ensure proper data sanitization before anything gets inserted into DB. You may try to send a message to “<script…” but then I guess it’s too late …
I would never trust user data, esp. coming the front-end
I’ll have a look at Ecto source code. I’m not sure if all underlying DB support parametrized queries such as:
INSERT INTO posts(body) VALUES ($1);
Even then I believe it still possible to inject bad statements.
I’m not a security expert but I believe my pentesters

michalmuskala · December 9, 2017, 4:58pm

All database adapters that ship with ecto (postgres and mysql) support parametrized queries. Ecto does not use non-parametrized queries.

The case for email “sanitized” for database is again exactly the same - since we use parametrized queries, it’s not a concern (unless you use EXECUTE - SQL’s eval - in some database triggers on user input, this is a very pathologic situation, though)

merlin · December 9, 2017, 5:11pm

I may be old school but I can’t trust user data.

Ecto is really great but one cannot be too cautious. Too many imaginative hackers out there

benwilson512 · December 9, 2017, 5:23pm

Can you please point to where ecto is trusting user data?

merlin · December 9, 2017, 5:24pm

Please read the full thread

benwilson512 · December 9, 2017, 5:27pm

I have.

Micmus spent quite a bit of time pointing out how Ecto is using the industry standard best practice to make SQL injection not simply unlikely, but actually impossible. You replied with a remark that still indicated skepticism about this security measure, but did not actually articulate a reason or provide a rationale to doubt micmus’s point.

I was hoping you might offer one so that the discussion can move forward in an educational and productive way, instead of merely casting aspersions.

EDIT: If by full thread you mean the stack overflow one:

With bound parameters, the query tree is parsed, then, in PostgreSQL at least, the data is looked at in order to plan. The plan is executed. With prepared statements, the plan is saved so you can re-execute the same plan with different data over and over (this may or may not be what you want). But the point is that with bound parameters, a parameter cannot inject anything into the parse tree. So this class of SQL injection issue is properly taken care of.

The bits of concern that remain in that post do not compass functionality that Ecto actually does.

merlin · December 9, 2017, 6:12pm

“Instead of merely casting aspersions” … That’s a bit harsh no ?
Could we try to keep calm and respect one another ?

Read my first message, really. I’ve never said anything against Ecto (I really like it btw) I was merely stating the fact that it’s a very bad idea to trust user input with some sanitizing regexp such as /@/. That’s all.

And I’m not doubting anyone’s point. Read the SO comment too, it’s a good one (rationale included tm ).

EDIT: (added SO comment)

What this means is that if I run a query with a bound parameter like foo’; drop user postgres; – then it cannot directly implicate the top-level query tree and cause it to add another command to drop the postgres user. However, if this query calls another function directly or not, it is possible that somewhere down the line, a function will be vulnerable and the postgres user will be dropped. The bound parameters offered no protection to secondary queries. Those secondary queries need to make sure they use bound parameters too to the extent possible, and where not, will need to use appropriate quoting routines.

The rabbit hole goes deep

Nothing against Ecto here.
If you end up with sth like:
DROP TABLE users
somewhere inside your DB then it might be too late. There are many classes of sql injection, parametrized queries and/or prepared statements will not protect against all of them.

Again, nothing against Ecto here (or Postgresql which is my fav. db engine). Just saying that you should not trust nor validate user data with weak sanitization techniques (such as a /@/ regexp).

I think it would be better to stop this discussion before it degenerates for no reason.
If there’s a modo/admin onboard: could you please lock or delete this thread ?
Thank you

benwilson512 · December 9, 2017, 6:25pm

This was indeed too harsh, I apologize.

I don’t intend to derail the conversation, so I will simply bow out, there’s no need to lock the thread.

merlin · December 9, 2017, 6:35pm

Thank you Ben. I do apologize too, I understand now this thread might look like a troll, which was not my intention.

Have a nice week-end everyone, I’ll get back to my Elixir questions later

LostKobrakai · December 9, 2017, 11:37pm

These are kinda two different points. If you want to prevent potentially dangerous strings to be inserted into the database you need to run sanitization on each string input you get not just emails. As for sanitizing a string being a valid email having a regex check for the @ and simply confirming the email later is actually not a bad way to go. There are lots of potential email addresses, which would be at the same time be valid emails and include things like XSS or SQL injection attempts.

merlin · December 10, 2017, 8:47am

These are not two different points. We agree that you need to run sanitization on each and every string, that’s my point. As said above, email is just an example.

Actually this is very bad advice, let me show you with an example.

Create a new phoenix project and setup your favorite auth library. Include user confirmation by email so that you get to see why the /@/ is bad.

In your browser, submit the user registration form with the following email :
spam1@gmail.com ,spam2@gmail.com ,spam3@gmail.com ,spam4@gmail.com ,spam5@gmail.com ,spam6@gmail.com ,spam7@gmail.com ,spam8@gmail.com ,spam9@gmail.com ,spam10@gmail.com ,spam11@gmail.com ,spam12@gmail.com ,spam13@gmail.com ,spam14@gmail.com

Guess what ? You’ve just built yourself a nice little spambot
Or destroyed your sender IP reputation …

Of course this is just a basic example. You could do much worse. It depends on your mailer.
Google for email header injection if you want to learn more.

LostKobrakai · December 10, 2017, 10:37am

They’re still different things. You’re talking about securing you application against various (potential) attack vectors. The regex you discredited in your entry post is not there to do that. It’s trying to do the most basic validation for an email, because more advanced attempts to validate email addresses are either super complex or have lot’s of false positives in terms of what a valid email address can be.

For example "@"@example.com is a syntactically valid email. Same with "<script>alert(1)</script>"@example.com. So true email validation must by definition be a different step to sanitizing user input against attacking attempts.

If you can get away with more strict rules about email formatting, which does already exclude the subset of emails including e.g. XSS attempts you can do that. But I’d not vote for using those in a tutorial or documentation example, which most are only meant to showcase how things work.

merlin · December 10, 2017, 1:10pm

Re-read my initial post please. When I want to disable potential attack vectors I use a properly-configured WAF too but that’s not the point here.

Tell me: would you rather have a few false positive or be hacked and get your servers transformed into, say, spam relays ?
I believe many auth. frameworks/libs in Java/Go/PHP already got it correctly a few years ago. Nothing really hard there, even for PCI DSS env.

What is really important here is that such a regexp is found in some Elixir authentication libraries/applications.
Not tutorials, not basic examples but authentication libraries… So people should be aware of that.

Anyway, this will be my last post in this thread. I’m not here to try to convince that /@/ is a very bad email validator scheme, I guess most of you have already figured that out.

Have a nice we