Hello folks,
I’m thinking of writing a service to scrape data from a website, but the first requirement is to use a proxy so that the IP address changes on each request.
I would love to hear ideas on how to do this using Elixir/Plug/Phoenix.
Plug and Phoenix are wrappers around HTTP servers, not HTTP clients, and for scraping you probably need an HTTP client. You could use something like HTTPoison, which seems to have some support for proxies:
HTTPoison.post!("localhost", "body", [], proxy: {:socks5, 'localhost', 1080}, socks5_user: "user", socks5_pass: "secret")
That sounds concerning. Don’t scrape unless you have permission from the site owner. It’s best to ask for a proper API instead.
Thanks for the information.
I have permission to scrape, but some sites don’t have a proper API, so I need better ideas for handling IPs.
Since you have the permission, why would you need to change IPs?
I’ll ignore the permission issues and point you to https://luminati.io. They can give you basically everything you need except Google searches.
Bureaucracy: the site has a third-party DNS/firewall that allows only a small number of requests.
But it’s not illegal to use more than one IP to get data.
So have you considered using the Tor network then? Most public proxies still have a single IP, so you’d need to round-robin them (or whatever the proper word is here) in your program. But then you might also hit the proxy’s own request limits, some proxies might be paid, and there may be other important data points your program needs to consider when choosing which proxy from the pool to use next.
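To make the round-robin idea a bit more concrete, here is a minimal sketch of a rotating proxy pool, assuming the proxies are stored in the shapes HTTPoison's `:proxy` option accepts (`{host, port}` or `{:socks5, host, port}`). The `ProxyPool` module, its function names, and the addresses in the comments are hypothetical, and the sketch ignores the per-proxy limits and costs mentioned above.

```elixir
defmodule ProxyPool do
  @moduledoc "Hypothetical round-robin proxy pool (illustrative sketch only)."
  use Agent

  # Starts the pool with a non-empty list of proxies, e.g.
  # [{"10.0.0.1", 8080}, {:socks5, ~c"localhost", 1080}] (placeholder addresses).
  def start_link([_ | _] = proxies) do
    Agent.start_link(fn -> proxies end, name: __MODULE__)
  end

  # Returns the proxy at the head of the list and moves it to the back,
  # so consecutive calls cycle through the whole pool.
  def next do
    Agent.get_and_update(__MODULE__, fn [proxy | rest] ->
      {proxy, rest ++ [proxy]}
    end)
  end

  # Builds an options list carrying the next proxy, intended to be passed as
  # the options argument of an HTTPoison request, for example:
  #   HTTPoison.get!("https://example.com", [], ProxyPool.request_opts())
  def request_opts do
    [proxy: next()]
  end
end
```

Keeping the pool in an Agent means every process doing requests shares one rotation. Choosing by remaining quota or by failure count would replace the plain head-to-tail rotation in `next/0`.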
Good guy Shrike. Indeed, Luminati is cool. However, I found a cheaper alternative with kinda even higher success rates: Smartproxy. Check it out in the future.