Where to handle switch on user-agent with Phoenix

I’m new to Phoenix and web development, so if I missed something in the docs, please point me there and I’ll study.

I’m helping someone with a Phoenix project where they’d like to differentiate between a web browser loading a page and curl/wget loading it. The idea is that when loaded by a browser, the user gets HTML/etc. with possibly some options for interacting more with the site. When loaded by curl or wget, the user would get text with ANSI escape codes so that it would look nice when dumped to a terminal window.

It seems like we want to check the user-agent string. Is it a best practice to put that check in the view, controller, or router? At first this seemed like a view concern, but it seems really convenient to put it in the router. Like maybe in an :accepts-like plug, but checking the user-agent. Plus I could cut out pipeline steps for web browsers that aren’t needed for curl/wget commandline clients.

Thanks for any pointers.

1 Like

What you’re describing suggests that the Accept header would be a better strategy. Browser detection is very brittle.

A browser will accept text/html. For command-line usage, specify text/x-my-special-format or even just text/plain.

5 Likes

Have a look at Content Negotiation & Phoenix.

So to get text with curl you have to specify an accept header, e.g.:

curl -H "Accept: text/plain" http://localhost:4000/bikes

A browser will typically send an accept header like:

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3

i.e. expressing a preference for text/html

Phoenix.Controller.accepts/2

2 Likes

For Phoenix you would:

  1. Allow the alternative format (assuming you send Accept: text/plain):
pipeline :browser do
  plug :accepts, ["html", "text"]
  ...
end
  2. Render in a format-insensitive way in your controller:
render(conn, :index, assigns)
  3. Write the view functions for HTML and text to render in the correct format (a filled-in sketch follows below):
def render("index.html", assigns) do
  ...
end

def render("index.text", assigns) do
  ...
end
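
To make step 3 concrete, here is a rough sketch of what those two clauses could contain. The module name and the @bikes assign are invented for illustration, and in a real project the HTML clause would usually come from an index.html.eex template rather than being written by hand:

defmodule MyAppWeb.BikeView do
  use MyAppWeb, :view

  # Browsers that negotiated text/html get markup (normally supplied by a template).
  def render("index.html", assigns) do
    ~E"""
    <ul>
      <%= for bike <- @bikes do %><li><%= bike.name %></li><% end %>
    </ul>
    """
  end

  # curl/wget that negotiated text/plain get a plain-text listing instead.
  def render("index.text", assigns) do
    Enum.map_join(assigns.bikes, "\n", & &1.name)
  end
end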
3 Likes

As curl is mostly used for debugging or scripting rather than to consume the site in a human readable format, I’d just treat it as a regular browser.

If I were scripting and the XPath I had carefully checked in the browser suddenly stopped working with curl, because all of a sudden curl gets preformatted plain text, that would break all my expectations about how curl is used.

Also, if I think my browser is going mad on a website, I tend to curl that site and check the HTML source for obvious problems. I don’t trust the browser’s source view anymore, since it often shows not the HTML it downloaded but what’s left after the browser has tried to repair the HTML and JavaScript has transformed the DOM.

Last but not least, if I really want to browse on a terminal, I’d use lynx, links, or brow.sh rather than curl | less

But I really like the idea of providing alternatives to HTML through content negotiation. Perhaps a site that can be fully consumed via Markdown and a Markdown browser? That could save a lot of bandwidth on small data plans…
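
For anyone who wants to experiment with that: as far as I know you can teach the MIME library (and therefore plug :accepts) a markdown format by registering the type in config and adding "md" to the accepts list. The mapping below is an assumption about how you might name it, not something Phoenix ships with:

# config/config.exs
# Recompile the dependency afterwards with: mix deps.clean mime --build
config :mime, :types, %{
  "text/markdown" => ["md"]
}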

4 Likes

If it’s meant to be an actual feature and not some rarely used debugging tool, I’d opt for a fitting content type as mentioned above. That also makes it easy for curl to still get the HTML when it wants to.

This is true for the DOM viewer in browsers’ dev tools: it’s not meant to show source code, but the DOM. I’ve not yet had problems with the context menu > View Source (or whatever it’s named in each browser). That should still show exactly what was sent by the server.

1 Like

It is fascinating how someone asks a question and people just dismiss the idea entirely.
Maybe that person wants to build something like curl wttr.in? That does basically the same thing: if opened via curl, you get a response formatted for pretty console output, and if opened via a browser, you get the same content as HTML plus some social sharing widgets.

From a usability perspective, this is a really great solution. Imagine having to explicitly set an Accept header every time. That would be annoying AF.

Just my 2 cents.

BTW, source for the example above can be found here: https://github.com/chubin/wttr.in

2 Likes

Imagine I use wget in an HTML-to-PDF service and all I get back is the terminal text and not the HTML. That can be just as annoying AF. The proper way is content-type negotiation if the URL is to stay the same. The rest is weighing convenience in one place against inconvenience in others. What @NobbZ posted above is his version of “this is inconvenient”.

5 Likes

We don’t know the OP’s use case, so you cannot assume anything. From what I read, it sounds like the use case is something like wttr.in.

This very much sounds like it is intended to be viewed in the console and not to generate some PDFs. If someone wants to generate a PDF from the HTML, just send a proper user-agent and get the HTML.

In the “old days” it was common to use the user-agent string to try to work out what capabilities a browser had. Eventually it became clear that user-agent strings are rarely conforming, change frequently, and that using them this way hides the intent. As a result, today it would be unusual for a JavaScript library to use the user agent for anything; testing the capabilities of the runtime has become the norm.

Using it as a way of determining the representation of an HTTP response seems just as strange to me - but I acknowledge that probably makes me an outlier these days.

The primary intent of the Accept header is to allow a consumer to specify what format is acceptable. And Phoenix has a really nice and clean way to respect that request. I don’t want, as a consumer, for you to decide what representation I should have. You tell me what you can deliver, I’ll tell you what I want.

Any other approach is rife with ambiguity. If you decide what to send me based upon my user agent, and I specify an Accept header - which representation are you going to decide to send me?

2 Likes

In this case, the Accept header would have higher priority than the user agent.

Let’s go back to the curl example. Do you really want to type curl -H "Accept: text/plain" http://wttr.in all the time? In my case, I had to first google how to set a header via curl because I use it very rarely.

Sometimes the most usable solution is not the “correct solution” those professors want to teach you in university.

1 Like

That said, it’s a fair point that I didn’t answer the question asked!

So yes, I would believe that if it needs to be done for some reason, a plug in the pipeline would be the right approach :slight_smile:
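
If it does need to be done, a rough, untested sketch of such a plug could look like the module below; placed after plug :accepts in the pipeline it would override the negotiated format for clients that identify themselves as curl or wget (the module name and the user-agent prefixes here are assumptions):

defmodule MyAppWeb.Plugs.TerminalClient do
  import Plug.Conn

  def init(opts), do: opts

  # Force the "text" format for command-line clients that identify themselves
  # as curl or wget; everyone else keeps whatever :accepts negotiated.
  def call(conn, _opts) do
    ua = conn |> get_req_header("user-agent") |> List.first() || ""

    if String.starts_with?(ua, ["curl/", "Wget/"]) do
      Phoenix.Controller.put_format(conn, "text")
    else
      conn
    end
  end
end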

(I have actually spent a lot of lost time parsing browscap files and friends and it wasn’t fun!)

What’s a proper user-agent though? curl is just as valid a user-agent as anything certain browsers send. As wttr.in is implemented right now, even setting Accept: text/html won’t make it send back HTML content. The only way to get HTML back is to not use one of the hardcoded “terminal” user agents, which are simply assumed to want text rather than HTML.
In my HTML-to-PDF example I wasn’t really talking about something I’m in control of. I was talking about some random service on the web, where I simply supply a URL and get back a PDF.

That’s true, but it’s a tradeoff to be decided by whoever is in charge of the implementation.

But the discussion of drawbacks has its usefulness as well. You already mentioned letting the Accept header trump the user agent. We also know that browsers send that header with text/html requested. So maybe the best way is to send text for Accept: */*, which is the default for curl/wget, and to send HTML only when it is requested via the Accept header, to cater for browsers. If I want HTML in the terminal I can add the header, which also solves the imagined HTML-to-PDF service example; such a service really should set the header to request HTML anyway.

To me this sounds like a solution that doesn’t require terminal users to set an Accept header to get text back, while still using only content negotiation for any tool that wants the HTML content.

3 Likes

AFAIK ordering on the accepts plug specifies the server preference, e.g.

pipeline :browser do
  plug :accepts, ["text", "html"]
  ...
end

will send text/plain when the client sends no Accept header (or a wildcard like */*, which is what curl and wget send by default), and text/html only if it is the preferred option in the header.

The blog post needed to specify:

curl -H "Accept: text/plain" http://localhost:4000/bikes

because it chose plug :accepts, ["html", "text"]

With plug :accepts, ["text", "html"]

curl http://localhost:4000/bikes

should be enough to get the text representation.
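
And a command-line client that does want the HTML can still ask for it explicitly:

curl -H "Accept: text/html" http://localhost:4000/bikes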

2 Likes

OP just woke up. I’m still processing the responses, but the intended application is for something like wttr.in. I.e., this is what you do most of the time:

$ curl wttr.in/moon
                  ------------.
               -'  o     . .   `--.
            '   .    O   .       . `-.
          @   @@@@@@@   .  @@@@@      `-.
         @  @@@@@@@@@@@   @@@@@@@   .    \
          o @@@@@@@@@@@   @@@@@@@       . \.
        o   @@@@@@@@@@@.   @@@@@@@   O      \
      @   .   @@@@@@@o    @@@@@@@@@@     @@@ \
      @@               . @@@@@@@@@@@@@ o @@@@|
     @@  O  `.-./  .      @@@@@@@@@@@@    @@  \	 First Quarter +
     @@    --`-'       o     @@@@@@@@ @@@@    |	 4  6:22:22
     @@        `    o      .  @@   . @@@@@@@  |	 Full Moon -
         @@  @         .-.     @@@   @@@@@@@  |	 2 20:08:21
      @        @@@     `-'   . @@@@   @@@@  o /
         @@   @@@@@ .           @@   .       |
        @@@@  @\@@    /  .  O    .     o   . /
         @@     \ \  /         .    .       /
           .    .\.-.___   .      .   .-. /'
                  `-'                `-' /
             o   / |     o    O   .   .-'
            .   /     .       .    .-'
               -.       .      .--'
                  ------------'


Follow @igor_chubin for wttr.in updates

Loading the URL in a web browser should still work and give something useful, but the focus is on making the curl part friendly to the person typing it.

1 Like

I updated the accepts plug as peerreynders suggested, with text listed first. curl gets text by default and browsers get HTML. No switching on “user-agent” needed.
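
For anyone finding this later, the text side can be as simple as emitting ANSI escape sequences from the text render function. The sketch below is illustrative only (the assign and field names are made up); note that IO.ANSI.format/2 needs its second argument set to true so it emits the escape codes even though the server’s stdout is not a terminal:

def render("index.text", assigns) do
  # Invented assigns: colourise each entry's title and append its detail text.
  Enum.map_join(assigns.entries, "\n", fn entry ->
    IO.ANSI.format([:cyan, entry.title, :reset, "  ", entry.detail], true)
  end)
end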

Thanks!

8 Likes