I am trying to parse quite a few RSS feeds using feeder_ex and HTTPoison. Everything runs smoothly for a bit, and then I start getting a :receive_timeout error. I have tried increasing timeout and recv_timeout to 10,000, but to no avail. Any suggestions/hints, please?
Can you reproduce the error in a small example app or tell us more about how you use the libraries and some example code as well?
Thanks @NobbZ…here’s a snippet of the code
response = HTTPoison.get(url, [], timeout: 10_000, recv_timeout: 10_000)

case response do
  {:ok, %HTTPoison.Response{body: body}} ->
    {:ok, feed, _} = FeederEx.parse(body)

  {:error, reason} ->
    IO.puts " #{inspect blogtoupdate} .... #{inspect reason}"
end
Is only ever one of these calls active, or can many happen at once? Are you really sure that there can never be more than a single active call to HTTPoison.request/5 at any given point in time?
:hackney, which is the workhorse here, does use a connection pool, and if it is exhausted, it will wait up to :recv_timeout milliseconds for a free connection. Therefore I do assume that the pool is simply massively exhausted in your case. Perhaps you can find something in HTTPoison issue #73 that helps you improve the pool management.
But to be honest… {:error, :recv_timeout} does not seem to be a valid return value of HTTPoison.request/5…
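If pool exhaustion is the culprit, one mitigation is a dedicated, larger pool. A minimal sketch, assuming the pool options documented in hackney's and HTTPoison's READMEs (the pool name :feed_pool and all numbers here are made-up values, not recommendations):

```elixir
# Start a named hackney pool, e.g. from your application's supervision tree.
# :feed_pool and the sizes below are illustrative only.
children = [
  :hackney_pool.child_spec(:feed_pool, timeout: 15_000, max_connections: 100)
]

# Then route requests through that pool:
HTTPoison.get(url, [], hackney: [pool: :feed_pool], recv_timeout: 10_000)
```

A dedicated pool at least makes the limit explicit, so you can tune max_connections against the number of distinct hosts you poll.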
@NobbZ…sorry…no…I am calling the list of URLs serially. :hackney starts with a pool of 50, and it's an HTTPoison.get call, not HTTPoison.request.

HTTPoison.get/3 is only a wrapper around HTTPoison.request/5.
So let's try to figure this out further…
- How often do you poll for updates?
- How many servers do you poll in sequence?
- When setting hackney's pool size to 1, do you get the error faster than with the default? If you set it to 100, does the error become rarer or appear later?
Sorry, I didn’t know that.
Polling about 200 resources (URLs). This takes less than a minute altogether. I have left the pool size as it is, i.e. 50.
200 resources, and the first 50 finish in less than 10 seconds, I assume. So everything in the pool is waiting to be used again for servers 1, 2, 3, …, 50, while you request a connection for server 200.
Therefore, just try to increase your limit. Maybe 250 as a start? Then decrease it step by step and try to find a value that suits you.
Also, you can reduce the :recv_timeout; the longer you make it, the more you will run into this problem…
@NobbZ But I am parsing the URLs serially…here’s my understanding of it.
def site_parser(listofurls) do
  [head | tail] = listofurls
  {pid, _reference} = spawn_monitor(Somemodule, :some_function, [head])
  monitor_ref = Process.monitor(pid)

  receive do
    {:DOWN, _, _, _, reason} ->
      IO.puts "PROCESS DIED COS OF #{inspect reason}"
      Process.demonitor(monitor_ref, [:flush])
      site_parser(tail)
  end
end
The first URL gets parsed, and the process that parsed it - the one we got from the :hackney pool - goes back into the pool. So, when we do site_parser(tail), the :hackney pool should still have 50 processes in it.
Is this correct, or have I got it totally wrong?
As far as I understand the hackney pool, each connection is bound to a certain server until it times out; you can’t use that connection for another server.
So have you tried to drastically increase the limit and does it solve the issue? The third time I’m asking…
I’ll do that and let you know…thanks
Just a note: if you are using spawn_monitor, there’s no need to call Process.monitor again. You already get the monitor_ref as _reference in your call.
Also, you should probably match on your ref in the receive… you might get confused by multiple monitored processes later on:
receive do
  {:DOWN, ^reference, :process, ^pid, reason} ->
    IO.puts "maaah… it crashed: #{inspect reason}"
    Process.demonitor(reference, [:flush])
    site_parser(tail)
end
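Putting those two corrections together as a runnable sketch, with a stand-in worker instead of the real fetch-and-parse code (SerialDemo and its dummy some_function are made up for illustration; an empty-list clause is added so the recursion terminates):

```elixir
defmodule SerialDemo do
  # Dummy worker standing in for the real fetch-and-parse function.
  def some_function(url), do: IO.puts("parsed #{url}")

  # Base case: no URLs left.
  def site_parser([]), do: :done

  def site_parser([head | tail]) do
    # spawn_monitor/3 already returns the monitor reference,
    # so no extra Process.monitor/1 call is needed.
    {pid, reference} = spawn_monitor(__MODULE__, :some_function, [head])

    receive do
      # Pin pid and reference so we only react to *this* worker going down.
      {:DOWN, ^reference, :process, ^pid, reason} ->
        IO.puts("worker exited: #{inspect(reason)}")
        site_parser(tail)
    end
  end
end

SerialDemo.site_parser(["http://example.com/feed1", "http://example.com/feed2"])
```

Since the :DOWN message has already arrived when we recurse, the Process.demonitor call is no longer needed here.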
Thanks. I have increased the URL list to 450 and :max_connections to 500, but every now and then I am still getting a :timeout error. Do you think the crawler might be getting blocked because of the frequency of crawls?
As far as I understand, the reason :timeout is different from :recv_timeout, but still, neither is a valid return value, since the documentation says the return type in case of an error is {:error, HTTPoison.Error.t}, and HTTPoison.Error.t is an exception, so it is a struct, so it is a map. Neither :timeout nor any other atom is.
I really think a lot of work needs to go into overhauling HTTPoison’s documentation… Perhaps filing an issue might help?
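That last point can be checked without HTTPoison at all, using a hypothetical stand-in exception (FakeError below is made up; HTTPoison.Error is the real struct):

```elixir
defmodule FakeError do
  # Stand-in for HTTPoison.Error: an exception struct with a :reason field.
  defexception reason: nil

  def message(%__MODULE__{reason: reason}), do: "request failed: #{inspect(reason)}"
end

err = %FakeError{reason: :timeout}

IO.inspect(is_map(err))   # => true  (exception structs are maps)
IO.inspect(is_atom(err))  # => false (so a bare :timeout can't be this type)

# Matching on the struct is how the reason atom is actually reached:
%FakeError{reason: reason} = err
IO.inspect(reason)        # => :timeout
```

So a :timeout atom you see in logs is most likely the :reason field inside the error struct, not the whole error term.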