htazewell
I'm making a web scraper to download zip files where the filename does not appear in the URL
I’ve been stuck trying to figure this out for a few weeks now. I am using a headless browser (PhantomJS) to download a zip file. I have tried using HTTPoison; however, the URL does not contain the filename. In fact, every attempt to download this zip file using HTTPoison returns a 302 redirect. Since the filename is not part of the URL, I was wondering whether there is a way to download files in a headless browser using either the Content-Disposition header or some other AJAX property. Any ideas or insight would be helpful! Thank you.
Most Liked
OvermindDL1
You need to follow redirects then. You can do that by passing the options `follow_redirect: true, max_redirect: 20` (with whatever values you want), or you can follow them manually.
I’m not sure what the point of using a headless browser would be; you should just use HTTPoison.
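A minimal sketch of that suggestion, assuming the server sends the filename in a `Content-Disposition` header on the final response. The module name `ZipDownload`, the helper `filename_from_headers/2`, and the `"download.zip"` fallback are hypothetical names chosen for illustration; only `HTTPoison.get/3` with the `follow_redirect` and `max_redirect` options comes from the library.

```elixir
defmodule ZipDownload do
  # Hypothetical helper: find the filename in the response headers.
  # Only the plain `filename="..."` form is handled here, not the
  # RFC 5987 `filename*=` form; falls back to `default` otherwise.
  def filename_from_headers(headers, default \\ "download.zip") do
    found =
      Enum.find_value(headers, fn {name, value} ->
        if String.downcase(name) == "content-disposition" do
          case Regex.run(~r/filename="?([^";]+)"?/i, value) do
            # Path.basename/1 drops any directory part the server sent.
            [_, filename] -> Path.basename(String.trim(filename))
            nil -> nil
          end
        end
      end)

    found || default
  end

  # Follow redirects, then write the body under the server-supplied name.
  def fetch(url, dir \\ ".") do
    case HTTPoison.get(url, [], follow_redirect: true, max_redirect: 20) do
      {:ok, %{status_code: 200, headers: headers, body: body}} ->
        path = Path.join(dir, filename_from_headers(headers))
        File.write!(path, body)
        {:ok, path}

      {:ok, %{status_code: status}} ->
        {:error, {:unexpected_status, status}}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Calling `ZipDownload.fetch(url)` with the real download URL should then return `{:ok, path}` if the redirect chain ends in a 200. If it still ends in a 3xx or an error page, the site probably expects cookies or other headers along the way.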
sribe
My only suggestion is that as long as you’re using a headless browser you’ll never know what is actually happening. You need to download the document in a regular browser, then look at the network activity in your browser’s dev tools to see what actually happened in terms of redirects etc., then use an HTTP client lib (HTTPoison is the usual choice) to follow the same path.
It’s perfectly possible that the site you’re hitting uses a combination of redirects AND cookies. Advertising & tracking sometimes complicate the heck out of things. I’ve seen a single page load result in a series of 8 redirects, each of which added some damned tracking/advertising info, until the request was finally answered.
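If the site does mix redirects with cookies, one way to replay the path seen in the dev tools is to follow each hop by hand and carry the cookies forward. The sketch below is only an illustration under simplifying assumptions: `RedirectTrace` and its functions are hypothetical names, and the cookie handling is deliberately naive (it ignores domain, path, and expiry, and never replaces an earlier cookie of the same name).

```elixir
defmodule RedirectTrace do
  # Collect the "name=value" part of every Set-Cookie header.
  def cookies_from(headers) do
    for {name, value} <- headers, String.downcase(name) == "set-cookie" do
      value |> String.split(";", parts: 2) |> hd() |> String.trim()
    end
  end

  # Return the Location header value, or nil if there is none.
  def location_from(headers) do
    Enum.find_value(headers, fn {name, value} ->
      if String.downcase(name) == "location", do: value
    end)
  end

  # Follow redirects one hop at a time, printing each hop and
  # sending back every cookie collected so far.
  def follow(url, cookies \\ [], hops_left \\ 10)

  def follow(_url, _cookies, 0), do: {:error, :too_many_redirects}

  def follow(url, cookies, hops_left) do
    headers = if cookies == [], do: [], else: [{"Cookie", Enum.join(cookies, "; ")}]

    case HTTPoison.get(url, headers) do
      {:ok, %{status_code: status, headers: resp_headers}}
      when status in [301, 302, 303, 307, 308] ->
        case location_from(resp_headers) do
          nil ->
            {:error, :redirect_without_location}

          location ->
            # Resolve relative Location values against the current URL.
            next = url |> URI.merge(location) |> URI.to_string()
            IO.puts("#{status} #{url} -> #{next}")
            follow(next, cookies ++ cookies_from(resp_headers), hops_left - 1)
        end

      {:ok, response} ->
        {:ok, response}

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Comparing the printed hops with the browser's network tab should show where the two paths diverge, for example a hop that expects a header or cookie the sketch does not send.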
OvermindDL1
Uh… that’s a server bug, like big-time server bug… o.O
Well, HTTPoison is the thing for that, plus processing the HTML through Meeseeks or something if you need to do that.







