Filtering of user input

I’ve been looking into the input sanitisation routines.

I saw this code in an Elixir chat program, but I think it’s incomplete:

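/**
 * Escapes HTML-significant characters in user-supplied text.
 * @param {string} str - the raw text
 * @return {string} the sanitised text
 */
function sanitise(str) {
  // Each special character maps to its HTML entity.
  const map = {
      '&': '&amp;',
      '<': '&lt;',
      '>': '&gt;',
      '"': '&quot;',
      "'": '&#x27;',
      "/": '&#x2F;',
  };
  // Global match on any of the characters above (no letters, so no 'i' flag needed).
  const reg = /[&<>"'/]/g;
  return str.replace(reg, (match)=>(map[match]));
}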

My concern with this is that not all punctuation is encoded, and ASCII control characters are not escaped.
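If you do want to go further than the snippet above, one option is to drop ASCII control characters (other than tab, newline and carriage return) before escaping, and to encode the backtick as well. This is a sketch, not something the chat code does; `sanitiseStrict` is a hypothetical name. Removing control characters is usually simpler than encoding them, since numeric references such as `&#1;` are parse errors in HTML anyway.

```javascript
// Hypothetical stricter variant of sanitise(): strips ASCII control
// characters (except \t, \n, \r) and escapes the backtick as well.
const ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
  '/': '&#x2F;',
  '`': '&#x60;',
};

function sanitiseStrict(str) {
  return String(str)
    // Remove C0 controls and DEL, keeping tab (\x09), LF (\x0A) and CR (\x0D).
    .replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, '')
    // Escape the HTML-significant characters plus the backtick.
    .replace(/[&<>"'\/`]/g, (ch) => ESCAPES[ch]);
}

console.log(sanitiseStrict('<b>\x00hi`</b>')); // &lt;b&gt;hi&#x60;&lt;&#x2F;b&gt;
```

Whether `/` and the backtick really need escaping depends on where the text ends up; for plain HTML text content, `&`, `<` and `>` are the essential ones.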

But I should go back a few steps and really open a discussion of why we have to do this…

  1. HTML forms were badly designed. Their output is in a format that can carry an escape character that can cause a response to split (i.e. HTTP response splitting), which is an attack vector used to bypass web application firewalls.

  2. No one has addressed this.

  3. We should come up with our own form-submit methods because of the lack of a proper HTML form module.

Phoenix.HTML handles data sanitisation out of the box, so yeah, somebody already thought about that and solved that problem.

Then why did the chat program’s author add the sanitise routine to the Elixir chat program?

The problem I see is that it’s done in Java, so chances are the escaping has already been defeated in some attacks.

But the main issue is that HTML forms use XML to communicate. We should have one that does not use XML as its means of communication, and be able to ignore all external XML queries.

Because they weren’t using Phoenix.HTML to present data, but JS. Still, this can be done better with innerText and by creating DOM elements manually, instead of building strings containing HTML and assigning them to innerHTML.
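A minimal sketch of that DOM approach: the message is assigned to `textContent`, so the browser stores it as a text node and never parses it as markup. `appendMessage` and its `doc`/`container` parameters are hypothetical; `textContent` is used rather than `innerText` because it does not depend on layout, but both are safe for this purpose.

```javascript
// Hypothetical helper: render untrusted text without using innerHTML.
// `doc` is the document, `container` is the element to append to.
function appendMessage(doc, container, text) {
  const p = doc.createElement('p');
  // textContent creates a text node, so "<script>" stays literal text.
  p.textContent = text;
  container.append(p);
  return p;
}
```

In a browser this would be called as `appendMessage(document, document.getElementById("main"), event.data)`, with no escaping step needed at all.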


I found this little piece in the source code. It’s an incomplete list: I don’t see the backtick (`), space or ASCII control characters getting sanitized, much less everyone’s favorite modifiers \ and /:

  def to_iodata(?>), do: "&gt;"
  def to_iodata(?&), do: "&amp;"
  def to_iodata(?"), do: "&quot;"
  def to_iodata(?'), do: "&#39;"

You do not need to escape every character to make a string safe for HTML. \, / or : will not cause problems in most cases. And escaping them on input would be more problematic than escaping them on display, as in many cases these characters are completely safe.


I’m just wondering why use ASCII of any kind in the first place. Something like HTML forms => binary transport; store and output => binary. That way user input is standardized regardless of the input media (text, binary stream input, etc.). Storage efficiency would be improved, I imagine…

ASCII is a form of binary code for letters. I do not get what you mean by output => binary.

Binary, as in 1’s and 0’s, using something like this:

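const ABC = {
  // Convert groups of 8 binary digits ("01001000 01101001") back to text.
  toAscii: function(bin) {
    return bin.replace(/\s*[01]{8}\s*/g, function(octet) {
      return String.fromCharCode(parseInt(octet, 2))
    })
  },
  // Convert each character to its 8-bit binary code. Characters above 255
  // produce more than 8 digits and will not round-trip through toAscii.
  toBinary: function(str, spaceSeparatedOctets) {
    return str.replace(/[\s\S]/g, function(ch) {
      const bits = ABC.zeroPad(ch.charCodeAt(0).toString(2))
      // Append a space after each octet unless spaceSeparatedOctets is false.
      return spaceSeparatedOctets === false ? bits : bits + " "
    })
  },
  // Left-pad a binary string with zeros to 8 digits.
  zeroPad: function(num) {
    return "00000000".slice(String(num).length) + num
  }
};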

Then use a binary bridge packet format instead of an XML query string.

I want a different method than XML arrays, because I want to build an application server that ignores all XML queries. So I want something other than XML communication.

So no POST, GET, PUT… methods.

Can you explain what your goal is? It seems like you’re somewhat upset (also based on your posts in other topics) with how HTTP works, especially when using XML to send data.

HTTP is in no way bound to XML. Its body can be encoded in whatever format you want. Try JSON or protobuf if you don’t like XML. Query params are URL-encoded, so there’s no XML there either, unless you explicitly send XML data.
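To illustrate: the same form fields can be sent as a JSON body, with no XML anywhere. A sketch building options for the standard `fetch` API; `buildJsonRequest` and the `/messages` endpoint are hypothetical. On the Phoenix side, the generated endpoint typically configures `Plug.Parsers` with a JSON parser, so such a body arrives as ordinary params.

```javascript
// Hypothetical helper: build fetch() options that send fields as JSON.
function buildJsonRequest(fields) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(fields),
  };
}

const init = buildJsonRequest({ say: 'Hi', to: 'Mom' });
// In a browser or Node 18+: fetch('/messages', init)
console.log(init.body); // {"say":"Hi","to":"Mom"}
```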

If you don’t like HTTP as well you can try other protocols, depending on your use-case.


Are you aware that this is still ASCII? And that it will make your data at least 8x larger? That would be an enormous waste of bandwidth just to send ASCII-encoded 0s and 1s. That is absurd.

And as it was said before, if you do not like the HTTP then you can always use any other protocol or even create your own if you want. With Erlang/Elixir pattern matching it would be pretty easy.

I still do not get what you want. Could you please show us an example of what you want to get and send? Because with each of your posts it is less and less clear what you expect.


Binary is faster if the transport sends binary data. It’s a waste of space if you send ASCII-encoded binary data, because each 0 and 1 will still take up 8 bits of space instead of one.
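A quick way to see that overhead: encode a string as text "0"/"1" characters and count bytes. Each original byte becomes eight characters, and each of those characters is itself a full byte on the wire. `asciiBinary` is a hypothetical helper that behaves like `ABC.toBinary` (without spaces) for ASCII input.

```javascript
// Turn each byte of the UTF-8 encoding into eight '0'/'1' characters.
function asciiBinary(str) {
  const bytes = new TextEncoder().encode(str);
  return Array.from(bytes, (b) => b.toString(2).padStart(8, '0')).join('');
}

const text = 'say=Hi&to=Mom';
const rawSize = new TextEncoder().encode(text).length;                 // 13 bytes
const binarySize = new TextEncoder().encode(asciiBinary(text)).length; // 104 bytes
console.log(rawSize, binarySize, binarySize / rawSize); // 13 104 8
```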

Also, while you’re entitled to find XML a problematic data format, it’s still not clear what you’re asking about. Nobody here can make XML go away, but that’s what it sounds like you’re asking for. This is especially strange as you haven’t mentioned a single place where XML is a problem with Elixir. Most people using Phoenix seem to use JSON or protobuf, and Phoenix itself doesn’t really enforce any of those.


It’s the transport method of using XML for data I/O, and the use of the URL line.

Reading about URL encoded attacks from this article:
https://www.cgisecurity.com/lib/URLEmbeddedAttacks.html

"
A popular method of manipulating a web application for malicious ends is to extend the functionality of the URL in an HTTP or HTTPS request beyond that originally envisaged by the developer. Using a mix of escaped-encoding and Unicode character representation, it is often possible for an attacker to craft requests that may be interpreted by either the server or client environments as a valid application request. Even though certain characters do not need to be escape-encoded, any 8-bit code (i.e., decimal 0-255 or hexadecimal 00-FF) may be encoded. ASCII control characters such as the NULL character (decimal code 0) can be escape-encoded, as can all HTML entities and any restricted characters used by the operating system or database. In some cases, the encoding of URL information may be designed to purposefully disguise the nature of the attack. "

I like this one:

Original database query in the example file 'login.asp':
SQLQuery = "SELECT preferences FROM logintable WHERE userid='" & Request.QueryString("userid") & "' AND password='" & Request.QueryString("password") & "';"

URL-encoded attack: http://target/login.asp?userid=bob%27%3b%20update%20logintable%20set%20passwd%3d%270wn3d%27%3b--%00

Executed database query: SELECT preferences FROM logintable WHERE userid='bob'; update logintable set password='0wn3d';
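Decoding the parameter shows what the application actually receives, and naive string concatenation (a JavaScript rendition of the ASP code above) puts it straight into the SQL text. `buildLoginQueryUnsafe` is hypothetical and shown only as the anti-pattern.

```javascript
// The userid parameter from the attack URL, still percent-encoded.
const encoded =
  'bob%27%3b%20update%20logintable%20set%20passwd%3d%270wn3d%27%3b--%00';

// After URL decoding: %27 = ', %3b = ;, %20 = space, %3d = =, %00 = NUL.
const decodedUserid = decodeURIComponent(encoded);

// ANTI-PATTERN: interpolating user input into SQL text.
function buildLoginQueryUnsafe(userid, password) {
  return "SELECT preferences FROM logintable WHERE userid='" + userid +
    "' AND password='" + password + "';";
}

// The quote in the input closes the string literal; ';' starts a new statement.
console.log(buildLoginQueryUnsafe(decodedUserid, ''));
```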

 

This is a relatively standard SQL injection attack, even though it’s executed via a URL. The issue here is first and foremost that user input (URL params are just that) is interpolated into the SQL query. This is a well-known antipattern and in no way specific to XML.

You always need to sanitise user input, and you never want to interpolate parameters into an SQL query. Parameters must be sent separately to be secure.

In Ecto, SQL injection is largely prevented, as the query API actively prevents interpolated parameters. Parameters are always sent to the database separately from the query, and the database treats them as values, never as SQL, so injection is not possible at that level. Even MyApp.Repo.query/3 takes parameters as a separate argument, although it cannot stop you from interpolating params into the query string it receives. So many parts of Ecto aren’t even vulnerable to such an attack, and where it cannot be enforced, it at least recommends secure practices.
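For contrast, a sketch of the parameterised form in JavaScript: the SQL text contains only placeholders and the values travel separately. The `{ text, values }` shape with `$1`, `$2` placeholders is what node-postgres' `client.query(text, values)` accepts; the helper itself is hypothetical.

```javascript
// Hypothetical helper: build a parameterised query instead of concatenating.
// (The plain-text password column only mirrors the example above.)
function buildLoginQuery(userid, password) {
  return {
    text: 'SELECT preferences FROM logintable WHERE userid = $1 AND password = $2',
    values: [userid, password],
  };
}

const q = buildLoginQuery("bob'; update logintable set passwd='0wn3d';--", 'x');
// With node-postgres this would run as: await client.query(q.text, q.values)
console.log(q.text); // the attacker's string never appears in the SQL text
```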

Just to make the point super clear: the attack you showed is in no way related to HTTP, URLs or XML. As soon as any application accepts user input (and without that I’d say it’s not an application, just a static page), it’s up to the code handling the input to validate/sanitise it. If you don’t do that properly, you might be attackable.

Now while I’ve tried to explain a lot, I’m still not sure how this relates to your initial post and elixir/phoenix. Maybe you can elaborate on what user input you expect and what you want to do with it.


html forms was badly written. Its output is in a format that can accept an escape character

Ultimately it doesn’t matter, as there is always the possibility that the client (browser) has been compromised, i.e. the server can only rely on its own sanitisation and validation.
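Relying on the server means validating there, whatever the browser claims to have checked. A minimal allowlist sketch; the `USERID_PATTERN` rule (letters, digits and underscore, 1 to 32 characters) is an assumption for illustration, not a requirement from this thread. In Phoenix the same idea is usually expressed with `Ecto.Changeset.validate_format/3` and `validate_length/3`.

```javascript
// Hypothetical rule: user IDs may only contain letters, digits and underscores.
const USERID_PATTERN = /^[A-Za-z0-9_]{1,32}$/;

// Accept the input only if it matches the allowlist; reject everything else.
function validateUserId(input) {
  if (typeof input !== 'string' || !USERID_PATTERN.test(input)) {
    return { ok: false, error: 'invalid user id' };
  }
  return { ok: true, value: input };
}

console.log(validateUserId('bob'));          // { ok: true, value: 'bob' }
console.log(validateUserId("bob'; update")); // { ok: false, error: 'invalid user id' }
```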

The problem I see its done in Java

JavaScript is to Java as hamburger is to ham; both are delicious, but they don’t have much in common except a name (Secrets of the JavaScript Ninja 1e, p.32)

But the main issue is that the html form used the xml language to communicate

URL encoding isn’t XML:

GET /?say=Hi&to=Mom HTTP/2.0
Host: example.com

application/x-www-form-urlencoded isn’t XML:

POST / HTTP/2.0
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 13

say=Hi&to=Mom

multipart/form-data isn’t XML
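Parsing a URL-encoded body like the one above needs nothing XML-related either. A sketch with the standard `URLSearchParams` class (available in browsers and Node); `parseFormBody` is a hypothetical name. Percent-encoding is what keeps a value containing `&` or `=` from being read as a separate pair. In Phoenix, `Plug.Parsers` does this decoding before a controller sees the params.

```javascript
// Decode an application/x-www-form-urlencoded body into a plain object.
function parseFormBody(body) {
  return Object.fromEntries(new URLSearchParams(body));
}

console.log(parseFormBody('say=Hi&to=Mom')); // { say: 'Hi', to: 'Mom' }

// A value containing '&' and '=' survives because it was percent-encoded.
console.log(parseFormBody('say=Hi%26bye%3D1&to=Mom').say); // Hi&bye=1
```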

Binary is faster.

Hence: Why is HTTP/2 binary?

Its the transport method of using XML as data i/o

JSON for the most part has replaced XML as the data interchange format.


xml origins (xhtml)

Technically HTML and XML evolved separately from SGML. XHTML was released in 2000 by which point HTML was already at HTML 4.

A form data set is a sequence of control-name/current-value pairs constructed from successful controls

The only similarity I can see is that XML attributes use = between an attribute name and value. But that’s an established norm in numerous contexts.

I would like to find a better solution than a GET,POST, PUT, HEAD , XHttp XHR requests.

Then you are no longer talking about the Web. Phoenix uses Cowboy which is an HTTP server - which means that it is designed to operate within the constraints as laid out by the HTTP protocol:

Furthermore in 2000 Roy Fielding outlined in his thesis an architectural style which he called Representational State Transfer (REST) which in essence describes how the web works and why the web works.

There are four concepts

  1. Resources
  2. Their names (URIs)
  3. Their representations
  4. The links between them

and four properties

  1. Addressability
  2. Statelessness
  3. Connectedness
  4. A uniform interface

The HTTP request methods GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE and PATCH are part of that uniform interface.

You are free to do your own thing - but whatever it is, it isn’t web related (and Phoenix most definitely is).


I think sockets would be better.

This piece looks like it could be modified for form usage, not just chat:

(() => {
  class myWebsocketHandler {
    setupSocket() {
      this.socket = new WebSocket("ws://localhost:4000/ws/chat")

      // Render incoming messages as text, never as HTML, so a message
      // containing markup cannot inject script into the page.
      this.socket.addEventListener("message", (event) => {
        const pTag = document.createElement("p")
        pTag.textContent = event.data

        document.getElementById("main").append(pTag)
      })

      // Reconnect immediately when the connection closes.
      this.socket.addEventListener("close", () => {
        this.setupSocket()
      })
    }

    submit(event) {
      event.preventDefault()
      const input = document.getElementById("message")
      const message = input.value
      input.value = ""

      // Send the message as JSON: {"data": {"message": "..."}}
      this.socket.send(
        JSON.stringify({
          data: {message: message},
        })
      )
    }
  }

  const websocketClass = new myWebsocketHandler()
  websocketClass.setupSocket()

  document.getElementById("button")
    .addEventListener("click", (event) => websocketClass.submit(event))
})()
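To adapt that code for forms rather than a single chat input, one option is to collect every field and send them as one JSON message. A sketch, assuming the server expects the same `{data: {...}}` envelope as the chat code; `formMessage` is hypothetical. It accepts any iterable of `[name, value]` pairs, so in a browser you can pass `new FormData(formElement)` directly (text fields only; file inputs would not serialize meaningfully to JSON).

```javascript
// Hypothetical helper: turn form fields into the JSON envelope used by submit().
// Accepts any iterable of [name, value] pairs, e.g. new FormData(form) in a browser.
function formMessage(entries) {
  // Object.fromEntries keeps only the last value for repeated field names.
  return JSON.stringify({ data: Object.fromEntries(entries) });
}

// Browser usage inside the class: this.socket.send(formMessage(new FormData(form)))
console.log(formMessage([['say', 'Hi'], ['to', 'Mom']]));
// {"data":{"say":"Hi","to":"Mom"}}
```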

The query string format (http://www.example.com?term1=var1&term2=var2) has fundamental flaws, and web servers have had them since the ’90s.

Of course, there are half a dozen other things. The one everyone has been focused on is script embedding (PHP, ASP/CGI and JavaScript) in pictures, which doesn’t get sanitized by normal methods and executes once stored on a social network site.
All picture formats have been affected. JPG has a patch, but it is not widely implemented yet…
There are other flaws that I can’t exactly remember how to reproduce, but I hear they are still around…

I think this post’s title should be changed to “I want to rant on HTTP and XML and current network protocols and serialization formats in general”.

…So, what are you trying to do with Elixir, again?