Programmatically access a webpage and download a file

benperiton · September 14, 2019, 1:18pm

Hi,

I use a service that doesn't provide an API or FTP access for the data I need, so at the moment getting it involves logging in, filtering on some dates, and then downloading the file.

I’ve seen Chroxy, which looks like it might work using ChromeRemoteInterface. Has anyone used it for something like this? Alternatively, I could write it in JS with Puppeteer, but how would I then call that from Elixir and process the downloaded file? (It would also mean installing NodeJS on the server, which isn't a problem, but is one more thing to keep updated.)

Has anyone had to do this already?

Thanks!
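For the Puppeteer route asked about above, one way to call an external script from Elixir and pick up what it downloaded is `System.cmd/3`. The following is a minimal sketch, assuming a hypothetical `download.js` Puppeteer script that takes the download directory as its last argument and exits non-zero on failure; the module and function names are also made up for illustration.

```elixir
defmodule PuppeteerRunner do
  @moduledoc """
  Sketch: run an external download script (e.g. a hypothetical Node/Puppeteer
  `download.js`) and return the files it created in `download_dir`.
  Assumes the script receives the target directory as its last argument.
  """

  @doc "Runs `cmd` with `args ++ [download_dir]` and lists newly created files."
  def run(cmd, args, download_dir) do
    File.mkdir_p!(download_dir)
    # Snapshot the directory so only files created by this run are returned.
    before = MapSet.new(File.ls!(download_dir))

    case System.cmd(cmd, args ++ [download_dir], stderr_to_stdout: true) do
      {_output, 0} ->
        new_files =
          download_dir
          |> File.ls!()
          |> Enum.reject(&MapSet.member?(before, &1))
          |> Enum.map(&Path.join(download_dir, &1))

        {:ok, new_files}

      # Non-zero exit: return the status and combined stdout/stderr output.
      {output, status} ->
        {:error, {status, output}}
    end
  end
end
```

A call would then look like `PuppeteerRunner.run("node", ["download.js"], "/tmp/reports")`, after which the returned paths can be read and processed with `File.read!/1` or a CSV parser. The directory snapshot is a simple way to find the new file without the script having to print its path; having the script print the path to stdout would work just as well.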

josemrb · September 14, 2019, 3:30pm

I’ve used Hound to drive a Firefox session with Selenium.

You could navigate to the download page, grab the cookies and the URL from the generated link, and then use `:httpc` to download the file.
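The cookies-plus-`:httpc` approach can be sketched roughly as below. This is a hedged illustration, not Hound's own API: it assumes the cookies have already been fetched from the browser session as maps with `"name"` and `"value"` string keys (roughly the WebDriver cookie shape; check what your driver actually returns), and the `CookieDownload` module is hypothetical. The `stream:` option tells `:httpc` to write a successful response body straight to a file.

```elixir
defmodule CookieDownload do
  @moduledoc """
  Sketch: download a file with Erlang's built-in :httpc, reusing cookies
  captured from a browser session. Cookie maps are assumed to have
  "name" and "value" string keys.
  """

  @doc "Builds a Cookie header value like \"a=1; b=2\" from cookie maps."
  def cookie_header(cookies) do
    Enum.map_join(cookies, "; ", fn %{"name" => name, "value" => value} ->
      "#{name}=#{value}"
    end)
  end

  @doc "Downloads `url` to `dest_path`, sending the given cookies."
  def download(url, cookies, dest_path) do
    # :httpc is part of the :inets application; :ssl is needed for https URLs.
    {:ok, _} = Application.ensure_all_started(:inets)
    {:ok, _} = Application.ensure_all_started(:ssl)

    headers = [{~c"cookie", String.to_charlist(cookie_header(cookies))}]
    request = {String.to_charlist(url), headers}
    # Depending on your OTP version, HTTPS may need :ssl options here
    # (e.g. certificate verification settings); left empty in this sketch.
    http_options = []

    case :httpc.request(:get, request, http_options, stream: String.to_charlist(dest_path)) do
      # With stream-to-file, a successful (200/206) sync request returns this.
      {:ok, :saved_to_file} -> {:ok, dest_path}
      # Any other status comes back as a normal response instead.
      {:ok, {{_version, status, _reason}, _headers, _body}} -> {:error, {:http_status, status}}
      {:error, reason} -> {:error, reason}
    end
  end
end
```

Usage would be along the lines of `CookieDownload.download(link_url, cookies, "/tmp/report.csv")`, where `link_url` and `cookies` come from the browser session. Only the cookie handling is shared with the browser; the actual transfer happens over plain HTTP, which keeps the browser out of the download step.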

```elixir
# Sample config for Hound
config :hound,
  browser: "firefox",
  driver: "selenium"
```

There is also a ready-to-use Docker image with a compatible version of Selenium Standalone and Firefox installed:

```shell
$ docker run -d -p 4444:4444 --shm-size=2g selenium/standalone-firefox:3.4.0-francium
```

benperiton · September 14, 2019, 5:11pm

Thanks for the suggestion, but I fear it might be a bit too “heavy” for what I need.
However, for E2E testing it sounds like it would do the trick!