Possible hidden caching default in Req, receiving status 304

I thought about opening an issue for this question in the Req repo, but I think it’s more an issue with my HTTP knowledge. I’m hoping someone here can point me in the right direction.

I have an application that builds a database from an external data source. The data is available in .zip files, which I download with GET requests to known URLs. I am performing the requests like this:

Req.get!(url, raw: true, output: file_path)

Occasionally I want to rebuild the database in development. Lately I’ve been getting exceptions during some attempts to rebuild when the files cannot be unzipped, due to invalid file contents {:error, :einval}. I traced the error to the request response, which is returning a status 304 "not modified" and a body "". The empty body contents are being written to the output file path, causing the unzip error downstream.

%Req.Response{
  body: "",
  headers: [
    {"content-type", "application/zip"},
    {"last-modified", "Wed, 22 Sep 2021 19:48:44 GMT"},
    {"cache-control", "private, max-age=169126"},
    {"date", "Sun, 03 Jul 2022 15:21:43 GMT"},
    {"connection", "keep-alive"}
  ],
  private: %{},
  status: 304
}
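
One way to keep an empty 304 body from reaching the unzip step is to validate the response before writing it to disk. The following is only a minimal sketch: the module name ZipGuard and its functions are hypothetical (not part of Req), and it assumes you fetch the body yourself instead of relying on the output option. It checks nothing beyond the status code and the signature that a ZIP archive with at least one entry starts with.

```elixir
defmodule ZipGuard do
  # Hypothetical helper for illustration; not part of Req or HTTPoison.

  # A ZIP archive with at least one entry starts with the local file
  # header signature, i.e. the bytes 80, 75, 3, 4 ("PK" followed by 3, 4).
  def zip_body?(<<80, 75, 3, 4, _rest::binary>>), do: true
  def zip_body?(_other), do: false

  # Accept only a 200 response whose body looks like a ZIP archive.
  def check(200, body) when is_binary(body) do
    if zip_body?(body), do: {:ok, body}, else: {:error, :not_a_zip}
  end

  # Any other status (such as 304) is reported instead of being written out.
  def check(status, _body), do: {:error, {:unexpected_status, status}}
end

# Example with the status and body from the 304 response above.
IO.inspect(ZipGuard.check(304, ""))
```

With a guard like this, the caller would write the file only on {:ok, body} and could retry or raise otherwise.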

Is there any way to get around the 304, or to force my way to a 200 with the desired response body? Or is there just no workaround as long as I’m requesting the same contents from the same IP within the server’s cool-down period?

Thanks in advance for any suggestions.

Do you have any other global / default config set for Req? A server should only reply with an HTTP 304 when the request carries a conditional header such as If-Modified-Since or If-None-Match, and the only way that happens in default Req is with cache: true set, which should also handle the 304.
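
To check whether a request is conditional, you can look for those validator headers in whatever the client is about to send. A minimal sketch, assuming the headers are a list of {name, value} tuples like the ones printed in this thread; the module name Conditional is hypothetical and not part of Req:

```elixir
defmodule Conditional do
  # Hypothetical helper for illustration; not part of Req.
  @conditional ["if-modified-since", "if-none-match"]

  # Returns the conditional headers found in a list of {name, value}
  # tuples, comparing header names case-insensitively.
  def headers(request_headers) do
    Enum.filter(request_headers, fn {name, _value} ->
      String.downcase(name) in @conditional
    end)
  end
end

# No conditional headers: a 304 would be unexpected for this request.
IO.inspect(Conditional.headers([{"user-agent", "req"}]))

# A conditional request: a 304 is a legitimate answer.
IO.inspect(Conditional.headers([{"If-Modified-Since", "Wed, 22 Sep 2021 19:48:44 GMT"}]))
```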

Thanks, I had a similar thought after doing more digging, but I’m not setting any config for Req anywhere, or passing any extra opts to that request.

An update: I tried hitting the same URL with both Req and HTTPoison. Req seems to be caching somehow under the hood, while HTTPoison returns the desired response.

This was done in a fresh iex session with Mix.install([{:httpoison, "~> 1.8"}, {:req, "~> 0.3"}]), so there are no global config overrides.

iex> url = "https://www2.census.gov/geo/tiger/TIGER2021/CD/tl_2021_us_cd116.zip"
iex> Req.get!(url, headers: [], cache: false, raw: true)
%Req.Response{
  body: "",
  headers: [
    {"content-type", "application/zip"},
    {"last-modified", "Wed, 22 Sep 2021 19:48:44 GMT"},
    {"cache-control", "private, max-age=93477"},
    {"date", "Mon, 04 Jul 2022 12:22:32 GMT"},
    {"connection", "keep-alive"}
  ],
  private: %{},
  status: 304
}

iex> HTTPoison.get!(url)
%HTTPoison.Response{
  body: <<80, 75, 3, 4, 10, 0, 0, 0, 0, 0, 51, 166, 51, 83, 80, 60, 129, 14, 5,
    0, 0, 0, 5, 0, 0, 0, 20, 0, 28, 0, 116, 108, 95, 50, 48, 50, 49, 95, 117,
    115, 95, 99, 100, 49, 49, 54, 46, 99, 112, ...>>,
  headers: [
    {"X-Frame-Options", "SAMEORIGIN"},
    {"X-Content-Type-Options", "nosniff"},
    {"Cache-Control", "private"},
    {"Last-Modified", "Wed, 22 Sep 2021 19:48:44 GMT"},
    {"X-XSS-Protection", "1; mode=block"},
    {"Content-Security-Policy", "frame-ancestors 'self';"},
    {"Content-Type", "application/zip"},
    {"Strict-Transport-Security", "max-age=31536000"},
    {"Date", "Mon, 04 Jul 2022 12:17:37 GMT"},
    {"Transfer-Encoding", "chunked"},
    {"Connection", "keep-alive"},
    {"Connection", "Transfer-Encoding"}
  ],
  request: %HTTPoison.Request{
    body: "",
    headers: [],
    method: :get,
    options: [],
    params: %{},
    url: "https://www2.census.gov/geo/tiger/TIGER2021/CD/tl_2021_us_cd116.zip"
  },
  request_url: "https://www2.census.gov/geo/tiger/TIGER2021/CD/tl_2021_us_cd116.zip",
  status_code: 200
}

I’ll probably bring this to the Req issue tracker from here.

Yeah, not sure what’s going on. Just to get it out of the way, you’re sure you’re not using Req.default_options(cache: true) anywhere, e.g. in your .iex.exs and such? Could you run the following?

Req.new()
|> IO.inspect()
|> Req.get!(url: "https://www2.census.gov/geo/tiger/TIGER2021/CD/tl_2021_us_cd116.zip")
|> IO.inspect()

and paste the results?

Thanks for looking into this. I haven’t set any config options in .iex.exs or in the application config. I grep’d for Req in the app repo and there are only two references, both Req.get! calls.


Req.new()
|> IO.inspect()
%Req.Request{
  method: :get,
  url: URI.parse(""),
  headers: [],
  body: "",
  options: %{},
  registered_options: #MapSet<[:auth, :base_url, :cache, :cache_dir,
   :compress_body, :compressed, :connect_options, :decode_body, :finch,
   :follow_redirects, :form, :http_errors, :json, :location_trusted,
   :max_redirects, :max_retries, :output, :params, :plug, :pool_timeout, :range,
   :raw, :receive_timeout, :retry, :retry_delay, :unix_socket, :user_agent]>,
  halted: false,
  adapter: &Req.Steps.run_finch/1,
  request_steps: [
    put_user_agent: &Req.Steps.put_user_agent/1,
    compressed: &Req.Steps.compressed/1,
    encode_body: &Req.Steps.encode_body/1,
    put_base_url: &Req.Steps.put_base_url/1,
    auth: &Req.Steps.auth/1,
    put_params: &Req.Steps.put_params/1,
    put_range: &Req.Steps.put_range/1,
    cache: &Req.Steps.cache/1,
    put_plug: &Req.Steps.put_plug/1,
    compress_body: &Req.Steps.compress_body/1
  ],
  response_steps: [
    retry: &Req.Steps.retry/1,
    follow_redirects: &Req.Steps.follow_redirects/1,
    decompress_body: &Req.Steps.decompress_body/1,
    decode_body: &Req.Steps.decode_body/1,
    handle_http_errors: &Req.Steps.handle_http_errors/1,
    output: &Req.Steps.output/1
  ],
  error_steps: [retry: &Req.Steps.retry/1],
  private: %{}
}

|> Req.get!(url: "https://www2.census.gov/geo/tiger/TIGER2021/CD/tl_2021_us_cd116.zip")
|> IO.inspect()
%Req.Response{
  body: "",
  headers: [
    {"content-type", "application/zip"},
    {"last-modified", "Wed, 22 Sep 2021 19:48:44 GMT"},
    {"cache-control", "private, max-age=90474"},
    {"date", "Mon, 04 Jul 2022 13:12:35 GMT"},
    {"connection", "keep-alive"}
  ],
  private: %{},
  status: 304
}

Everything looks correct. Sorry, not sure what’s going on. I tried reproducing it but it works well here. If you can come up with steps to reproduce, I’m happy to try them out.

OK, this should reproduce it. I had to find another file on that server that was still returning 200 for me. This is one of the smaller payloads too, about 669 KB.

iex> url = "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"
iex> Req.get!(url, raw: true)
  body: <<80, 75, 3, 4, 10, 0, 0, 0, 0, 0, 178, 181, 47, 83, 80, 60, 129, 14, 5,
    0, 0, 0, 5, 0, 0, 0, 19, 0, 28, 0, 116, 108, 95, 50, 48, 50, 49, 95, 53, 48,
    95, 115, 108, 100, 108, 46, 99, 112, 103, ...>>,
  headers: [
    {"x-frame-options", "SAMEORIGIN"},
    {"x-content-type-options", "nosniff"},
    {"last-modified", "Wed, 22 Sep 2021 19:58:21 GMT"},
    {"accept-ranges", "bytes"},
    {"content-length", "684760"},
    {"vary", "Accept-Encoding"},
    {"x-xss-protection", "1; mode=block"},
    {"content-security-policy", "frame-ancestors 'self';"},
    {"content-type", "application/zip"},
    {"strict-transport-security", "max-age=31536000"},
    {"cache-control", "private, max-age=172761"},
    {"date", "Mon, 04 Jul 2022 14:13:45 GMT"},
    {"connection", "keep-alive"},
     "TS01d1a586=01283c52a476381da9e86d878350dcabc68f6ca87923d686b7be476d407bb8afb678fe959fb5577f02211bd35ce00f4187f734ef9c; Path=/; Domain=.www2.census.gov"},
     "TS01d1a586=01283c52a496d659b2f32b5f9a08a459ae22df2a5b8309f9b9c5c439f8c7cf2649bd5eeb2c0523b73215578f15965efeaef5da6b01; Path=/; Domain=.www2.census.gov"},
     "TS01d1a586=01283c52a4dd1a41674ce8f8d9ed1cec027fe5712ad7460a60a0a01725a3de02d85e1b5c5a6d69fd94f46e78fd1e8b6d70bd807d0d; Path=/; Domain=.www2.census.gov"},
     "TS01d1a586=01283c52a4163f7ab13932973e1423b85fc08b84e66eaecb3c2867742726ef10b3e43bdf7d3c01899cef1c35c1fd32d0d2bd805fdf; Path=/; Domain=.www2.census.gov"},
     "TS01d1a586=01283c52a4cca094407ccd48149f13e2f002146b6d7757be8165acb9fa09c565748091e9f4e7f9e508d30fb999c24467a1d45b404f; Path=/; Domain=.www2.census.gov"},
     "TS01d1a586=01283c52a40b87d78adacbc04dcc5b74e13b4a3c6215f12d91334e4a87e63c5eb91be5219a8885a13d90e8469918de621ab68b8794; Path=/; Domain=.www2.census.gov"},
     "TS01d1a586=01283c52a4e77b55c32a8473da38addffafd2d9ebc73db6794227db5c8f12082a3b8075be9fe0badf7332575d54be33b9418518d64; Path=/; Domain=.www2.census.gov"},
     "TS01d1a586=01283c52a40d65ef2a6cae4d5768f4b9b8da5b19242e22d038f65a10d7fc3c8edb34e3118e0d0736ac325d52c293e57e34789c41b3; Path=/; Domain=.www2.census.gov"}
  private: %{},
  status: 200

# 50 is random, I have no idea what the limit might be
iex> Enum.each(1..50, fn _ ->
  Req.get!(url, raw: true)
end)

iex> Req.get!(url, raw: true)
%Req.Response{
  body: "",
  headers: [
    {"content-type", "application/zip"},
    {"last-modified", "Wed, 22 Sep 2021 19:58:21 GMT"},
    {"cache-control", "private, max-age=172333"},
    {"date", "Mon, 04 Jul 2022 14:20:53 GMT"},
    {"connection", "keep-alive"}
  ],
  private: %{},
  status: 304
}

iex(14)> HTTPoison.get!(url)
%HTTPoison.Response{
  body: <<80, 75, 3, 4, 10, 0, 0, 0, 0, 0, 178, 181, 47, 83, 80, 60, 129, 14, 5,
    0, 0, 0, 5, 0, 0, 0, 19, 0, 28, 0, 116, 108, 95, 50, 48, 50, 49, 95, 53, 48,
    95, 115, 108, 100, 108, 46, 99, 112, 103, ...>>,
  headers: [
    {"X-Frame-Options", "SAMEORIGIN"},
    {"X-Content-Type-Options", "nosniff"},
    {"Cache-Control", "private"},
    {"Last-Modified", "Wed, 22 Sep 2021 19:58:21 GMT"},
    {"X-XSS-Protection", "1; mode=block"},
    {"Content-Security-Policy", "frame-ancestors 'self';"},
    {"Content-Type", "application/zip"},
    {"Strict-Transport-Security", "max-age=31536000"},
    {"Date", "Mon, 04 Jul 2022 14:21:31 GMT"},
    {"Transfer-Encoding", "chunked"},
    {"Connection", "keep-alive"},
    {"Connection", "Transfer-Encoding"}
  ],
  request: %HTTPoison.Request{
    body: "",
    headers: [],
    method: :get,
    options: [],
    params: %{},
    url: "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"
  },
  request_url: "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip",
  status_code: 200
}

I am noticing that in the response headers when using Req (for both statuses 200 and 304) there is this for "cache-control":

{"cache-control", "private, max-age=172761"}

While with HTTPoison:

{"Cache-Control", "private"}

The max-age value counts down in seconds on each request.
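
The countdown can be checked against the "date" header: if date plus max-age always lands on the same instant, the value is counting down toward a fixed expiry. The sketch below does that arithmetic for the 200 and 304 responses shown earlier for this URL. The module name CacheExpiry is hypothetical, and the timestamps are written as ~U sigils by hand because parsing the HTTP date format is left out to keep the example short.

```elixir
defmodule CacheExpiry do
  # Hypothetical helper for illustration; not part of Req.

  # Extracts the max-age value (in seconds) from a cache-control header
  # value, or returns nil when the directive is absent.
  def max_age(cache_control) do
    case Regex.run(~r/max-age=(\d+)/, cache_control) do
      [_match, seconds] -> String.to_integer(seconds)
      nil -> nil
    end
  end

  # The instant at which the response stops being fresh: date + max-age.
  def expires_at(%DateTime{} = date, cache_control) do
    case max_age(cache_control) do
      nil -> nil
      seconds -> DateTime.add(date, seconds, :second)
    end
  end
end

# "date" and "cache-control" values copied from the 200 and 304 responses.
first = CacheExpiry.expires_at(~U[2022-07-04 14:13:45Z], "private, max-age=172761")
second = CacheExpiry.expires_at(~U[2022-07-04 14:20:53Z], "private, max-age=172333")

IO.inspect({first, second, DateTime.compare(first, second)})
```

Both pairs work out to the same instant (428 seconds pass between the two requests, and max-age drops by 428). That is consistent with a cached entry with a fixed expiry somewhere between the client and the origin, though nothing in this thread establishes where.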

I realized I should try 50 requests with HTTPoison too:

iex(15)> Enum.each(1..50, fn _ ->                             
...(15)> HTTPoison.get!(url)     
...(15)> Process.sleep(2000)                                
...(15)> end)

iex(16)> HTTPoison.get!(url)     
%HTTPoison.Response{
  body: <<80, 75, 3, 4, 10, 0, 0, 0, 0, 0, 178, 181, 47, 83, 80, 60, 129, 14, 5,
    0, 0, 0, 5, 0, 0, 0, 19, 0, 28, 0, 116, 108, 95, 50, 48, 50, 49, 95, 53, 48,
    95, 115, 108, 100, 108, 46, 99, 112, 103, ...>>,
  headers: [
    {"X-Frame-Options", "SAMEORIGIN"},
    {"X-Content-Type-Options", "nosniff"},
    {"Cache-Control", "private"},
    {"Last-Modified", "Wed, 22 Sep 2021 19:58:21 GMT"},
    {"X-XSS-Protection", "1; mode=block"},
    {"Content-Security-Policy", "frame-ancestors 'self';"},
    {"Content-Type", "application/zip"},
    {"Strict-Transport-Security", "max-age=31536000"},
    {"Date", "Mon, 04 Jul 2022 14:34:50 GMT"},
    {"Transfer-Encoding", "chunked"},
    {"Connection", "keep-alive"},
    {"Connection", "Transfer-Encoding"}
  ],
  request: %HTTPoison.Request{
    body: "",
    headers: [],
    method: :get,
    options: [],
    params: %{},
    url: "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"
  },
  request_url: "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip",
  status_code: 200
}

OK, I was able to reproduce it with this script:


url = "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"

Enum.each(1..50, fn _ ->
  IO.inspect(Req.get!(url).status)
end)

Interestingly, for me it always printed either 200, 200, … or 304, 304, …, never a mix of the two (though I didn’t run it for long).

%Req.Response{headers: [..., {"vary", "Accept-Encoding"}, ...]}

Req by default sets accept-encoding with gzip and others. Maybe that is confusing the server? You can turn it off like this: compressed: false.
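
For background on why accept-encoding could matter: when a response carries vary: Accept-Encoding, an HTTP cache is expected to treat requests with different accept-encoding values as different cache entries. The sketch below illustrates only that general mechanism. The VaryKey module is hypothetical, "gzip" stands in for whatever value the client actually sends, and nothing here describes how the census.gov servers are really implemented.

```elixir
defmodule VaryKey do
  # Hypothetical illustration of a cache key that honours Vary; it does
  # not describe how the census.gov servers are actually implemented.

  # Builds a key from the URL plus the request's value for every header
  # named in the response's vary header.
  def cache_key(url, vary, request_headers) do
    varied =
      vary
      |> String.split(",", trim: true)
      |> Enum.map(fn name -> name |> String.trim() |> String.downcase() end)
      |> Enum.map(fn name -> {name, lookup(request_headers, name)} end)

    {url, varied}
  end

  # Case-insensitive header lookup; returns nil when the header is absent.
  defp lookup(headers, name) do
    Enum.find_value(headers, fn {key, value} ->
      if String.downcase(key) == name, do: value
    end)
  end
end

url = "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"

# "gzip" stands in for whatever accept-encoding value the client sends.
with_encoding = VaryKey.cache_key(url, "Accept-Encoding", [{"accept-encoding", "gzip"}])
without_encoding = VaryKey.cache_key(url, "Accept-Encoding", [])

# The two requests map to different cache entries.
IO.inspect(with_encoding == without_encoding)
```

Under that model, a client that sends accept-encoding and one that sends none can be served from different cached entries for the same URL, which would fit the different behaviour seen between Req and HTTPoison here.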

I have tried this a few times and always kept getting 200s:


url = "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"

Enum.each(1..50, fn _ ->
  IO.inspect(Req.get!(url, compressed: false).status)
end)

Incredible, this is working for me now:

Req.get!(url, raw: true, compressed: false, output: filepath)

It returns 200 with the expected binary response body that writes a valid .zip file, and the "cache-control" response header is now {"cache-control", "private"}.

Thanks so much!