Possible hidden caching default in Req, receiving status 304

I thought about opening an issue for this question in the Req repo, but I think it’s more an issue with my HTTP knowledge. I’m hoping someone here can point me in the right direction.

I have an application that builds a database from an external data source. The data is available in .zip files, which are downloaded by performing GET requests to known URLs. I am performing the requests like this:

Req.get!(url, raw: true, output: file_path)

Occasionally I want to rebuild the database in development. Lately I’ve been getting exceptions during some attempts to rebuild when the files cannot be unzipped, due to invalid file contents {:error, :einval}. I traced the error to the request response, which is returning a status 304 "not modified" and a body "". The empty body contents are being written to the output file path, causing the unzip error downstream.

%Req.Response{
  body: "",
  headers: [
    {"content-type", "application/zip"},
    {"last-modified", "Wed, 22 Sep 2021 19:48:44 GMT"},
    {"cache-control", "private, max-age=169126"},
    {"date", "Sun, 03 Jul 2022 15:21:43 GMT"},
    {"connection", "keep-alive"}
  ],
  private: %{},
  status: 304
}
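Since the empty 304 body is what ends up in the output file, one way to fail early is to check the status and the first bytes of the body before anything is unzipped. This is a minimal sketch, assuming a hypothetical DownloadGuard module (not part of Req) that works on a plain status and body; the four signature bytes are the ones that open the successful .zip bodies elsewhere in this thread.

```elixir
defmodule DownloadGuard do
  # Hypothetical helper, not part of Req: decides whether a status/body
  # pair looks like a usable .zip download.

  # 80, 75, 3, 4 is "PK\x03\x04", the signature that starts a ZIP file entry.
  def check(200, <<80, 75, 3, 4, _rest::binary>> = body), do: {:ok, body}

  # A 200 whose body does not start with the ZIP signature.
  def check(200, _body), do: {:error, :not_a_zip}

  # Any other status, including the 304 with an empty body shown above.
  def check(status, _body), do: {:error, {:unexpected_status, status}}
end

IO.inspect(DownloadGuard.check(304, ""))
IO.inspect(DownloadGuard.check(200, <<80, 75, 3, 4, 10, 0>>))
```

With output: set, Req writes the body to disk itself, so a guard like this would apply to a request made without output:, with the file written only when the result is {:ok, body}.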

Is there any way to get around the 304, or to force my way to a 200 with the desired response body? Or is there just no workaround as long as I’m requesting the same contents from the same IP within the server’s cool-down period?

Thanks in advance for any suggestions.

Do you have any other global / default config set for Req? A server should only reply with an HTTP 304 if the request carries a conditional header such as If-Modified-Since, and the only way that happens in default Req is with cache: true set, which should also handle the 304.

Thanks, I had a similar thought after doing more digging, but I’m not setting any config for Req anywhere, or passing any extra opts to that request.

An update: I tried hitting the same URL with Req and HTTPoison. Req seems to be caching somehow under the hood, while HTTPoison returns the desired response.

This was done in a fresh iex session with Mix.install([{:httpoison, "~> 1.8"}, {:req, "~> 0.3"}]), so there are no global config overrides.

iex> url = "https://www2.census.gov/geo/tiger/TIGER2021/CD/tl_2021_us_cd116.zip"
iex> Req.get!(url, headers: [], cache: false, raw: true)
%Req.Response{
  body: "",
  headers: [
    {"content-type", "application/zip"},
    {"last-modified", "Wed, 22 Sep 2021 19:48:44 GMT"},
    {"cache-control", "private, max-age=93477"},
    {"date", "Mon, 04 Jul 2022 12:22:32 GMT"},
    {"connection", "keep-alive"}
  ],
  private: %{},
  status: 304
}

iex> HTTPoison.get!(url)
%HTTPoison.Response{
  body: <<80, 75, 3, 4, 10, 0, 0, 0, 0, 0, 51, 166, 51, 83, 80, 60, 129, 14, 5,
    0, 0, 0, 5, 0, 0, 0, 20, 0, 28, 0, 116, 108, 95, 50, 48, 50, 49, 95, 117,
    115, 95, 99, 100, 49, 49, 54, 46, 99, 112, ...>>,
  headers: [
    {"X-Frame-Options", "SAMEORIGIN"},
    {"X-Content-Type-Options", "nosniff"},
    {"Cache-Control", "private"},
    {"Last-Modified", "Wed, 22 Sep 2021 19:48:44 GMT"},
    {"X-XSS-Protection", "1; mode=block"},
    {"Content-Security-Policy", "frame-ancestors 'self';"},
    {"Content-Type", "application/zip"},
    {"Strict-Transport-Security", "max-age=31536000"},
    {"Date", "Mon, 04 Jul 2022 12:17:37 GMT"},
    {"Transfer-Encoding", "chunked"},
    {"Connection", "keep-alive"},
    {"Connection", "Transfer-Encoding"}
  ],
  request: %HTTPoison.Request{
    body: "",
    headers: [],
    method: :get,
    options: [],
    params: %{},
    url: "https://www2.census.gov/geo/tiger/TIGER2021/CD/tl_2021_us_cd116.zip"
  },
  request_url: "https://www2.census.gov/geo/tiger/TIGER2021/CD/tl_2021_us_cd116.zip",
  status_code: 200
}

I’ll probably bring this to the Req issue tracker from here.

Yeah, not sure what’s going on. Just to get it out of the way, you’re sure you’re not using Req.default_options(cache: true) anywhere, e.g. in your .iex.exs and such? Could you run the following?

Req.new()
|> IO.inspect()
|> Req.get!(url: "https://www2.census.gov/geo/tiger/TIGER2021/CD/tl_2021_us_cd116.zip")
|> IO.inspect()

and paste the results?

Thanks for looking into this. I haven’t set any config options in .iex.exs or in the application config. I grep’d for Req in the app repo and there are only two references, both Req.get! calls.

Output:

Req.new()
|> IO.inspect()
%Req.Request{
  method: :get,
  url: URI.parse(""),
  headers: [],
  body: "",
  options: %{},
  registered_options: #MapSet<[:auth, :base_url, :cache, :cache_dir,
   :compress_body, :compressed, :connect_options, :decode_body, :finch,
   :follow_redirects, :form, :http_errors, :json, :location_trusted,
   :max_redirects, :max_retries, :output, :params, :plug, :pool_timeout, :range,
   :raw, :receive_timeout, :retry, :retry_delay, :unix_socket, :user_agent]>,
  halted: false,
  adapter: &Req.Steps.run_finch/1,
  request_steps: [
    put_user_agent: &Req.Steps.put_user_agent/1,
    compressed: &Req.Steps.compressed/1,
    encode_body: &Req.Steps.encode_body/1,
    put_base_url: &Req.Steps.put_base_url/1,
    auth: &Req.Steps.auth/1,
    put_params: &Req.Steps.put_params/1,
    put_range: &Req.Steps.put_range/1,
    cache: &Req.Steps.cache/1,
    put_plug: &Req.Steps.put_plug/1,
    compress_body: &Req.Steps.compress_body/1
  ],
  response_steps: [
    retry: &Req.Steps.retry/1,
    follow_redirects: &Req.Steps.follow_redirects/1,
    decompress_body: &Req.Steps.decompress_body/1,
    decode_body: &Req.Steps.decode_body/1,
    handle_http_errors: &Req.Steps.handle_http_errors/1,
    output: &Req.Steps.output/1
  ],
  error_steps: [retry: &Req.Steps.retry/1],
  private: %{}
}

|> Req.get!(url: "https://www2.census.gov/geo/tiger/TIGER2021/CD/tl_2021_us_cd116.zip")
|> IO.inspect()
%Req.Response{
  body: "",
  headers: [
    {"content-type", "application/zip"},
    {"last-modified", "Wed, 22 Sep 2021 19:48:44 GMT"},
    {"cache-control", "private, max-age=90474"},
    {"date", "Mon, 04 Jul 2022 13:12:35 GMT"},
    {"connection", "keep-alive"}
  ],
  private: %{},
  status: 304
}

Everything looks correct. Sorry, not sure what’s going on. I tried reproducing it but it works well here. If you can come up with steps to reproduce, I’m happy to try them out.

OK, this should reproduce it. I had to find another file on that server that was still returning 200 for me. This is one of the smaller payloads too, about 669 KB.

iex> url = "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"
iex> Req.get!(url, raw: true)
%Req.Response{
  body: <<80, 75, 3, 4, 10, 0, 0, 0, 0, 0, 178, 181, 47, 83, 80, 60, 129, 14, 5,
    0, 0, 0, 5, 0, 0, 0, 19, 0, 28, 0, 116, 108, 95, 50, 48, 50, 49, 95, 53, 48,
    95, 115, 108, 100, 108, 46, 99, 112, 103, ...>>,
  headers: [
    {"x-frame-options", "SAMEORIGIN"},
    {"x-content-type-options", "nosniff"},
    {"last-modified", "Wed, 22 Sep 2021 19:58:21 GMT"},
    {"accept-ranges", "bytes"},
    {"content-length", "684760"},
    {"vary", "Accept-Encoding"},
    {"x-xss-protection", "1; mode=block"},
    {"content-security-policy", "frame-ancestors 'self';"},
    {"content-type", "application/zip"},
    {"strict-transport-security", "max-age=31536000"},
    {"cache-control", "private, max-age=172761"},
    {"date", "Mon, 04 Jul 2022 14:13:45 GMT"},
    {"connection", "keep-alive"},
    {"set-cookie",
     "TS01d1a586=01283c52a476381da9e86d878350dcabc68f6ca87923d686b7be476d407bb8afb678fe959fb5577f02211bd35ce00f4187f734ef9c; Path=/; Domain=.www2.census.gov"},
    {"set-cookie",
     "TS01d1a586=01283c52a496d659b2f32b5f9a08a459ae22df2a5b8309f9b9c5c439f8c7cf2649bd5eeb2c0523b73215578f15965efeaef5da6b01; Path=/; Domain=.www2.census.gov"},
    {"set-cookie",
     "TS01d1a586=01283c52a4dd1a41674ce8f8d9ed1cec027fe5712ad7460a60a0a01725a3de02d85e1b5c5a6d69fd94f46e78fd1e8b6d70bd807d0d; Path=/; Domain=.www2.census.gov"},
    {"set-cookie",
     "TS01d1a586=01283c52a4163f7ab13932973e1423b85fc08b84e66eaecb3c2867742726ef10b3e43bdf7d3c01899cef1c35c1fd32d0d2bd805fdf; Path=/; Domain=.www2.census.gov"},
    {"set-cookie",
     "TS01d1a586=01283c52a4cca094407ccd48149f13e2f002146b6d7757be8165acb9fa09c565748091e9f4e7f9e508d30fb999c24467a1d45b404f; Path=/; Domain=.www2.census.gov"},
    {"set-cookie",
     "TS01d1a586=01283c52a40b87d78adacbc04dcc5b74e13b4a3c6215f12d91334e4a87e63c5eb91be5219a8885a13d90e8469918de621ab68b8794; Path=/; Domain=.www2.census.gov"},
    {"set-cookie",
     "TS01d1a586=01283c52a4e77b55c32a8473da38addffafd2d9ebc73db6794227db5c8f12082a3b8075be9fe0badf7332575d54be33b9418518d64; Path=/; Domain=.www2.census.gov"},
    {"set-cookie",
     "TS01d1a586=01283c52a40d65ef2a6cae4d5768f4b9b8da5b19242e22d038f65a10d7fc3c8edb34e3118e0d0736ac325d52c293e57e34789c41b3; Path=/; Domain=.www2.census.gov"}
  ],
  private: %{},
  status: 200
}

# 50 is random, I have no idea what the limit might be
iex> Enum.each(1..50, fn _ ->
  Req.get!(url, raw: true)
  Process.sleep(2000)
end)
:ok

iex> Req.get!(url, raw: true)
%Req.Response{
  body: "",
  headers: [
    {"content-type", "application/zip"},
    {"last-modified", "Wed, 22 Sep 2021 19:58:21 GMT"},
    {"cache-control", "private, max-age=172333"},
    {"date", "Mon, 04 Jul 2022 14:20:53 GMT"},
    {"connection", "keep-alive"}
  ],
  private: %{},
  status: 304
}

iex> HTTPoison.get!(url)
%HTTPoison.Response{
  body: <<80, 75, 3, 4, 10, 0, 0, 0, 0, 0, 178, 181, 47, 83, 80, 60, 129, 14, 5,
    0, 0, 0, 5, 0, 0, 0, 19, 0, 28, 0, 116, 108, 95, 50, 48, 50, 49, 95, 53, 48,
    95, 115, 108, 100, 108, 46, 99, 112, 103, ...>>,
  headers: [
    {"X-Frame-Options", "SAMEORIGIN"},
    {"X-Content-Type-Options", "nosniff"},
    {"Cache-Control", "private"},
    {"Last-Modified", "Wed, 22 Sep 2021 19:58:21 GMT"},
    {"X-XSS-Protection", "1; mode=block"},
    {"Content-Security-Policy", "frame-ancestors 'self';"},
    {"Content-Type", "application/zip"},
    {"Strict-Transport-Security", "max-age=31536000"},
    {"Date", "Mon, 04 Jul 2022 14:21:31 GMT"},
    {"Transfer-Encoding", "chunked"},
    {"Connection", "keep-alive"},
    {"Connection", "Transfer-Encoding"}
  ],
  request: %HTTPoison.Request{
    body: "",
    headers: [],
    method: :get,
    options: [],
    params: %{},
    url: "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"
  },
  request_url: "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip",
  status_code: 200
}

I am noticing that in the response headers when using Req (for both statuses 200 and 304) there is this for "cache-control":

{"cache-control", "private, max-age=172761"}

While with HTTPoison:

{"Cache-Control", "private"}

max-age is counting down in seconds on each request.
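The countdown can be checked against the two Req responses above, using only values copied from this thread: max-age went from 172761 to 172333 while the date header advanced from 14:13:45 to 14:20:53. A small sketch, where max_age is a hypothetical helper written for this comparison:

```elixir
# Hypothetical helper: extracts the max-age value (in seconds) from a
# cache-control header value, or returns nil when the directive is absent.
max_age = fn value ->
  case Regex.run(~r/max-age=(\d+)/, value) do
    [_, seconds] -> String.to_integer(seconds)
    nil -> nil
  end
end

# Header values copied from the 200 and 304 Req responses in this thread.
first = max_age.("private, max-age=172761")
second = max_age.("private, max-age=172333")

# How far max-age dropped between the two responses.
drop = first - second

# Seconds between the two "date" headers (both on Mon, 04 Jul 2022).
elapsed = Time.diff(~T[14:20:53], ~T[14:13:45])

IO.inspect({drop, elapsed})
# prints {428, 428}
```

Both numbers come out to 428 seconds, so max-age is decreasing in step with wall-clock time. That is consistent with, though not proof of, the Req responses being served by a cache sitting somewhere between the client and the origin server.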

I realized I should try 50 requests with HTTPoison too:

iex> Enum.each(1..50, fn _ ->
  HTTPoison.get!(url)
  Process.sleep(2000)
end)

iex> HTTPoison.get!(url)
%HTTPoison.Response{
  body: <<80, 75, 3, 4, 10, 0, 0, 0, 0, 0, 178, 181, 47, 83, 80, 60, 129, 14, 5,
    0, 0, 0, 5, 0, 0, 0, 19, 0, 28, 0, 116, 108, 95, 50, 48, 50, 49, 95, 53, 48,
    95, 115, 108, 100, 108, 46, 99, 112, 103, ...>>,
  headers: [
    {"X-Frame-Options", "SAMEORIGIN"},
    {"X-Content-Type-Options", "nosniff"},
    {"Cache-Control", "private"},
    {"Last-Modified", "Wed, 22 Sep 2021 19:58:21 GMT"},
    {"X-XSS-Protection", "1; mode=block"},
    {"Content-Security-Policy", "frame-ancestors 'self';"},
    {"Content-Type", "application/zip"},
    {"Strict-Transport-Security", "max-age=31536000"},
    {"Date", "Mon, 04 Jul 2022 14:34:50 GMT"},
    {"Transfer-Encoding", "chunked"},
    {"Connection", "keep-alive"},
    {"Connection", "Transfer-Encoding"}
  ],
  request: %HTTPoison.Request{
    body: "",
    headers: [],
    method: :get,
    options: [],
    params: %{},
    url: "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"
  },
  request_url: "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip",
  status_code: 200
}

OK, I was able to reproduce it with this script:

Mix.install([:req])

url = "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"

Enum.each(1..50, fn _ ->
  IO.inspect(Req.get!(url).status)
  Process.sleep(2000)
end)

Interestingly, for me it always printed either 200, 200, … or 304, 304, …, never a mix of the two (though I didn’t run it for long).

%Req.Response{headers: [..., {"vary", "Accept-Encoding"}, ...]}

Req by default sets accept-encoding with gzip and others. Maybe that is confusing the server? You can turn it off like this: compressed: false.

I have tried this a few times and always kept getting 200s:

Mix.install([:req])

url = "https://www2.census.gov/geo/tiger/TIGER2021/SLDL/tl_2021_50_sldl.zip"

Enum.each(1..50, fn _ ->
  IO.inspect(Req.get!(url, compressed: false).status)
  Process.sleep(2000)
end)

Incredible, this is working for me now:

Req.get!(url, raw: true, compressed: false, output: filepath)

It returns 200 with the expected binary response body that writes a valid .zip file, and the "cache-control" response header is now {"cache-control", "private"}.

Thanks so much!