Is there a Best Practice for forming JSON (etc) URLs?

If I’m exporting (or accepting) data structures for a given HTML page, is there a Best Practice or convention for forming JSON (etc) URLs? For example, something like ...?fmt=<fmt> or perhaps ...?fmt=<fmt>&part=<part>?

-r
Inquiring gnomes need to mine.

I would say: DON’T

URL lengths are limited by browsers, so it is not a good idea to send large structures as part of the URL. If you can, use POST requests and send the data in the body of the request (a body on a GET request has no defined semantics according to the spec).

And if you really know what you are doing and you still want to continue, then I would say that you should use URL-safe Base64 encoding.
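
For anyone who does go down that road, here is a minimal sketch of the URL-safe Base64 step using Elixir's standard Base module. The payload, the example.com endpoint and the data parameter name are all made up for illustration.

```elixir
# Hypothetical payload; anything that can be serialised to a binary works.
payload = ~s({"fmt":"json","part":"summary"})

# URL-safe alphabet ("-" and "_" instead of "+" and "/"), without "=" padding.
encoded = Base.url_encode64(payload, padding: false)

# Hypothetical endpoint and parameter name.
url = "https://example.com/page?data=" <> encoded

# The receiving side reverses the step; the match fails loudly on bad input.
{:ok, ^payload} = Base.url_decode64(encoded, padding: false)
```

The length limits mentioned above still apply, since Base64 makes the data roughly a third larger.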

I’m confused - like ...?fmt=json ?

If yes, then “best practice” states don’t do it. The “format” isn’t something that should be part of the URL (other than supporting looking at data by typing the URL in the browser’s address bar) but should be supported via content negotiation.

A page would access JSON content via, for example, the fetch API. When assembling the request, an Accept header set to the IANA media type application/json should be added.

The server should then fulfill that request to the best of its capability by matching the requested media type.

In the response application/json would appear in the Content-Type header.
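
The server-side matching step could be sketched roughly like this. It is a simplified illustration, not what Plug or Phoenix actually do: the AcceptSketch module and its functions are hypothetical, and it only looks at q values and wildcards, ignoring the finer specificity rules of the HTTP spec.

```elixir
defmodule AcceptSketch do
  # Parses "application/json;q=0.9, text/html" into {media_range, q} tuples,
  # sorted by descending quality value.
  def parse(header) do
    header
    |> String.split(",", trim: true)
    |> Enum.map(&parse_range/1)
    |> Enum.sort_by(fn {_range, q} -> q end, :desc)
  end

  # Returns the first supported media type the client accepts, or nil.
  def negotiate(header, supported) do
    header
    |> parse()
    |> Enum.find_value(fn {range, q} ->
      if q > 0, do: Enum.find(supported, &matches?(range, &1))
    end)
  end

  defp parse_range(range) do
    [type | params] = range |> String.split(";") |> Enum.map(&String.trim/1)
    {String.downcase(type), quality(params)}
  end

  # A missing q parameter means q=1.0.
  defp quality(params) do
    Enum.find_value(params, 1.0, fn param ->
      case String.split(param, "=", parts: 2) do
        ["q", value] ->
          case Float.parse(value) do
            {q, _rest} -> q
            :error -> nil
          end

        _other ->
          nil
      end
    end)
  end

  defp matches?("*/*", _type), do: true

  defp matches?(range, type) do
    case String.split(range, "/", parts: 2) do
      [main, "*"] -> String.starts_with?(type, main <> "/")
      _other -> range == type
    end
  end
end
```

Calling AcceptSketch.negotiate("application/json", ["text/html", "application/json"]) picks "application/json", which is then what would go into the Content-Type header of the response.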

The only problem I have with this approach is that it pretty much requires writing
some client code in order to get at the JSON data. Although the curl(1) command
has a -H flag which can be used to set headers, I don’t know of anything analogous
in the world of web browsers.

https://hexdocs.pm/phoenix/routing.html suggests that the JSON version of the URL
/foo could be something like /api/foo. However, this hard-wires JSON in as the
API’s data format.

-r

Is it really that hard? It seems pretty easy for me.

For browsers the initial point of contact tends to be text/html or text/plain.

If you look at the developer console’s network traffic you’ll notice that for a page load the browser only sends a generic Accept header (text/html first, ending in a */* wildcard) - effectively leaving it up to the server to serve the default media type, as there often is only one.

However if there are multiple representations of the same resource, it’s discouraged to use distinct URLs for each representation. The resource should be identified by the URL (a URI) but the format of the representation should be handled via Accept/Content-Type (if there is more than one possible format).

application/json, application/toml and application/x-yaml are intended for consumption by programmatic clients - not straight inclusion in a static web page.

“… suggests that the JSON version of the URL /foo could be something like /api/foo.”

defmodule HelloWeb.Router do
  use HelloWeb, :router

  pipeline :browser do
    plug :accepts, ["html"]
    plug :fetch_session
    plug :fetch_flash
    plug :protect_from_forgery
    plug :put_secure_browser_headers
  end

  pipeline :api do
    plug :accepts, ["json"]
  end

  scope "/", HelloWeb do
    pipe_through :browser

    get "/", PageController, :index
  end

  scope "/api", HelloWeb do
    pipe_through :api
  end
end

That split has largely to do with the separate Plug pipelines as API requests typically don’t require a lot of the baggage that exists for the browser (and APIs may need their own type of baggage).

The accepts/2 plug (Phoenix.Controller.accepts/2) can take multiple formats, so the data format isn’t hardwired.

The controller can access the requested format through get_format/1. In fact render/3 uses it to choose the template with the correct format.

To be honest, I don’t think the header method is really the best. For example, at work I have a webpage table report that can be downloaded as html, json, csv, formatted-pretty-excel or pdf, and the list can be expanded in the future. This is done by just adding something like .csv or .xlsx or whatever to the end of the URL. When sending URLs to other systems, syncing to Excel and so forth, I can only send a URL, so a header-only method seems quite impossible to support.

So yes, I do it not with something like ?fmt=json, but rather by just appending .json to the end of the URL (before the query args, if any). There is a plug that strips that off and sets the content format based on it. Without the suffix, the format falls back to the header as usual.
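
To make the idea concrete, here is a rough sketch of the suffix-stripping step on its own, written as a plain function over path segments rather than as a real plug. The module name, the function and the list of known formats are assumptions for illustration, not the actual plug in use.

```elixir
defmodule TrailingFormatSketch do
  # Formats this hypothetical application is willing to serve.
  @known_formats ~w(html json csv xlsx pdf)

  # Takes the path segments of a request (e.g. ["reports", "42.csv"]) and
  # returns {segments_without_suffix, format_or_nil}.
  def split(segments) when is_list(segments) do
    case List.pop_at(segments, -1) do
      {nil, []} ->
        {[], nil}

      {last, rest} ->
        ext = last |> Path.extname() |> String.trim_leading(".")

        if ext in @known_formats do
          {rest ++ [Path.rootname(last)], ext}
        else
          # No recognised suffix: leave the path alone and let the
          # Accept header decide the format.
          {segments, nil}
        end
    end
  end
end
```

A real plug would do the same thing on conn.path_info and then record the format on the connection. A path without a suffix comes back with nil, which is the "defaults to the header" case.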

I understand that this is a common practice for Rails servers. This post
talks about it in some detail:

https://chodounsky.net/2015/01/26/respond-to-different-formats-in-rails-controller/

How do you handle situations where there is no file name at the end of the URL?
For example:

http://foo.com/bar/

-r

He said already :wink:

Content Negotiation: why it is useful, and how to make it Work:

The first thing we need to understand is that a URI is not a file name.

Using URI patterns I’ve seen

resource/json
resource/xml
resource/yaml

as the preferred alternative - simply to break away from the mental model of a file.

But the thing is, operating in this way creates another conceptual problem. There are three distinct URIs up there - how do we know that they refer to the same resource? One would guess so because they share the same root in the URI, but to know for sure you would have to retrieve them and compare them on the semantic level.

This gave rise to the concept of a canonical URI.

resource
resource/json
resource/xml
resource/yaml
Then either:
  • a request to resource/json results in a 303 status redirecting to resource, or
  • a request to resource/json serves the JSON representation but also includes a Content-Location header referencing resource, to clearly identify that resource/json is semantically identical to resource.
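
As a rough sketch, the two behaviours for a format-specific URI amount to the following responses, expressed here as plain {status, headers, body} tuples. The module, the function names and the media type map are hypothetical, and the sketch simply mirrors the two bullets above without settling which direction Content-Location is meant to point.

```elixir
defmodule DuplicateUriSketch do
  # Hypothetical mapping from a format segment to its media type.
  @media_types %{
    "json" => "application/json",
    "xml" => "application/xml",
    "yaml" => "application/x-yaml",
    "xlsx" => "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  }

  # Option 1: redirect the format-specific URI to the canonical one.
  def redirect(canonical) do
    {303, [{"location", canonical}], ""}
  end

  # Option 2: serve the representation and name the canonical URI.
  def serve(canonical, format, body) do
    headers = [
      {"content-type", Map.fetch!(@media_types, format)},
      {"content-location", canonical}
    ]

    {200, headers, body}
  end
end
```

With option 1 the client still has to send a suitable Accept header on the follow-up request, so only option 2 helps a client that can supply nothing but a URL.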

As far as an API goes I’d prefer to just expose each resource at a single (canonical) URI and be done with it (and handle the representation via content negotiation).

Now the concept of a canonical URI may seem academic on a resource that isn’t going to be indexed on Google but everybody simply doing things their own way isn’t exactly helpful either.

Content negotiation is the W3C preferred standard - that doesn’t stop deviating implementations appearing all the time simply because somebody can’t be bothered to follow (or know) the preferred way.



Phoenix and the Trailing Format Plug (2015)

… seems to only support .json etc.

I’d love to know the ‘standards’ way of specifying a format on the URL and nothing but the URL for downloading a specific format of a file, as URLs are the only thing I’m able to supply. ^.^;

I already outlined that above.

  • the resource is identified via the canonical URI
  • duplicate URIs either redirect or identify themselves as duplicates of the original resource (the latter being the one suited to your use case; the former really isn’t that helpful).

Now how that behaviour for the duplicate URIs is implemented is a technical matter.

I’m curious how it would identify itself as a duplicate. Is it via a header? The only thing sent is, for example, an Excel file.

A request going to resource/xlsx would get a response with a Content-Type of
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
and a Content-Location header of resource.

Of course that only makes sense if resource is capable of processing an Accept header for
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
to respond with that media type.

Well, it’s Excel; I don’t think it checks at all. It just downloads and uses the file, and I’d imagine it wouldn’t use Content-Location either.

I read your link the other way round: Content-Location points to the path containing the format, but the format-specific URL has no way to link back.

I wouldn’t say this is used enough to be considered a best practice at all – and I don’t know if it would be the best fit for your case – but a “standard” way of describing URLs with client parameters is provided by the URI Template specification (RFC 6570).

Some examples:

http://example.com/~{username}/
http://example.com/dictionary/{term:1}/{term}
http://example.com/search{?q,lang}
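
To show how such templates expand, here is a minimal sketch that handles only the three forms used in the examples above: {var}, {var:N} and {?a,b}. It is nowhere near a full RFC 6570 implementation, and the module and function names are made up.

```elixir
defmodule UriTemplateSketch do
  # Expands a small subset of URI Templates: "{var}", "{var:N}" and "{?a,b}".
  # `vars` is a map with string keys and string values.
  def expand(template, vars) do
    Regex.replace(~r/\{([^}]+)\}/, template, fn _match, expr ->
      expand_expression(expr, vars)
    end)
  end

  # Form-style query expansion; undefined variables are skipped.
  defp expand_expression("?" <> names, vars) do
    pairs =
      names
      |> String.split(",")
      |> Enum.filter(&Map.has_key?(vars, &1))
      |> Enum.map(fn name -> name <> "=" <> encode(Map.fetch!(vars, name)) end)

    if pairs == [], do: "", else: "?" <> Enum.join(pairs, "&")
  end

  # Simple expansion, optionally with a ":N" prefix length.
  defp expand_expression(spec, vars) do
    case String.split(spec, ":", parts: 2) do
      [name, len] ->
        vars
        |> Map.get(name, "")
        |> String.slice(0, String.to_integer(len))
        |> encode()

      [name] ->
        vars |> Map.get(name, "") |> encode()
    end
  end

  # Percent-encode everything outside the unreserved set.
  defp encode(value), do: URI.encode(value, &URI.char_unreserved?/1)
end
```

Applied to the original question, a hypothetical template such as http://example.com/page{?fmt,part} with fmt set to json and part set to summary would expand to http://example.com/page?fmt=json&part=summary.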

That’s beside the point - i.e. that is Excel’s lookout. If it simply accepts what was dumped in its lap and only barfs when it can’t process the file … :man_shrugging:.

And while Excel may not care about the Content-Location, something else might, by first sending a HEAD request and then not bothering with a full GET (because it already has resource).

Yes, you’re right. And the spec seems to read that way, i.e. pointing from the generic to the specific - which would be absolutely useless for establishing the canonical URI.

Which would explain why search engines resort to

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

but that only helps with HTML - not with general HTTP responses.

I was led by RESTful Web Services, p. 84.

The revised RESTful Web APIs contradicts that (p. 324).

So that would only leave the redirects, which are useless for avoiding the use of Accept headers.

Seems the canonical IRI has been pushed into Web Annotation Data Model (again not very useful for pruning unnecessary GET requests by using HEAD requests first).
