htazewell

I'm making a web scraper to download zip files where the filename does not appear in the URL

I’ve been stuck on this for a few weeks now. I am using a headless browser (PhantomJS) to download a zip file. I tried HTTPoison, but the URL does not contain the filename, and every attempt to download the zip file with HTTPoison returns a 302 redirect. Is there a way to download files in a headless browser using the Content-Disposition header or some other property of the AJAX response, since the filename is not part of the URL? Any ideas or insight would be helpful! Thank you.

Most Liked

OvermindDL1

You need to follow redirects, then. You can do that by passing the options follow_redirect: true, max_redirect: 20 (or whatever values you want), or you can follow them manually. 🙂

I’m not sure there’s any point in using a headless browser here; you should just use HTTPoison. 🙂
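
A minimal sketch of that approach, assuming HTTPoison is already a dependency and that the server sends the filename in a Content-Disposition header on the final response. The module name ZipFetch, the function names, and the "download.zip" fallback are made up for illustration; the regex only handles the plain filename= form, not the encoded filename*= variant.

```elixir
defmodule ZipFetch do
  # Hypothetical helper module; none of these names come from the thread.

  @doc "Downloads `url`, following redirects, and returns {:ok, filename, body}."
  def download(url) do
    # follow_redirect and max_redirect are HTTPoison request options.
    case HTTPoison.get(url, [], follow_redirect: true, max_redirect: 20) do
      {:ok, %{status_code: 200, headers: headers, body: body}} ->
        # Assumption: fall back to "download.zip" when no filename is sent.
        {:ok, filename_from_headers(headers, "download.zip"), body}

      {:ok, %{status_code: status}} ->
        {:error, {:unexpected_status, status}}

      {:error, reason} ->
        {:error, reason}
    end
  end

  @doc "Extracts the filename from a Content-Disposition header, or returns `default`."
  def filename_from_headers(headers, default) do
    # Header names are case-insensitive, so compare them lowercased.
    value =
      Enum.find_value(headers, fn {name, value} ->
        if String.downcase(name) == "content-disposition", do: value
      end)

    case value && Regex.run(~r/filename="?([^";]+)"?/i, value) do
      # Path.basename drops any directory part the server may have included.
      [_, name] -> Path.basename(String.trim(name))
      _ -> default
    end
  end
end
```

With that in place, something like `{:ok, name, body} = ZipFetch.download(url)` followed by `File.write!(name, body)` would save the file under the server-supplied name. The whole body is held in memory, which is fine for small archives but not for very large ones.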

sribe

My only suggestion is that as long as you’re using a headless browser you’ll never know what is actually happening. You need to download the document in a regular browser, then look at the network activity in your browser’s dev tools to see what actually happened in terms of redirects etc, then use an HTTP client lib (HTTPoison is the usual choice) to follow the same path.

It’s perfectly possible that the site you’re hitting uses a combination of redirects AND cookies. Advertising & tracking sometimes complicate the heck out of things. I’ve seen a single page load result in a series of 8 redirects, each of which added some damned tracking/advertising info, until the request was finally answered.
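
If the chain does depend on cookies, following it by hand could look roughly like the sketch below. It assumes HTTPoison is available; the module ManualRedirects, its function names, and the 10-hop limit are invented for illustration. It is deliberately naive: every cookie is sent to every host in the chain, and cookie attributes such as Domain, Path, and Expires are ignored.

```elixir
defmodule ManualRedirects do
  # Hypothetical sketch; none of these names come from the thread.
  @redirect_statuses [301, 302, 303, 307, 308]

  def get(url, cookies \\ %{}, hops_left \\ 10)

  def get(_url, _cookies, 0), do: {:error, :too_many_redirects}

  def get(url, cookies, hops_left) do
    headers =
      if map_size(cookies) == 0, do: [], else: [{"Cookie", cookie_header(cookies)}]

    case HTTPoison.get(url, headers) do
      {:ok, %{status_code: status, headers: resp_headers}} when status in @redirect_statuses ->
        # Carry over any cookies this hop set before requesting the next one.
        cookies = merge_cookies(cookies, resp_headers)

        case header(resp_headers, "location") do
          nil ->
            {:error, :missing_location}

          location ->
            # Location may be relative, so resolve it against the current URL.
            next = url |> URI.merge(location) |> URI.to_string()
            get(next, cookies, hops_left - 1)
        end

      other ->
        # Final response or error tuple, returned unchanged.
        other
    end
  end

  @doc "Adds the name=value pair of every Set-Cookie header to `cookies`."
  def merge_cookies(cookies, headers) do
    for {name, value} <- headers, String.downcase(name) == "set-cookie", reduce: cookies do
      acc ->
        # Keep only the leading name=value pair; drop attributes like Path.
        [pair | _attrs] = String.split(value, ";")

        case String.split(pair, "=", parts: 2) do
          [k, v] -> Map.put(acc, String.trim(k), String.trim(v))
          _ -> acc
        end
    end
  end

  @doc "Builds the value of a Cookie request header."
  def cookie_header(cookies) do
    Enum.map_join(cookies, "; ", fn {k, v} -> "#{k}=#{v}" end)
  end

  defp header(headers, wanted) do
    Enum.find_value(headers, fn {name, value} ->
      if String.downcase(name) == wanted, do: value
    end)
  end
end
```

Comparing each hop against what the browser’s dev tools show for the same download is the quickest way to spot a cookie or header this sketch is not sending.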

OvermindDL1

Uh… that’s a server bug, like big-time server bug… o.O

Well, HTTPoison is the thing for that, plus processing the HTML through Meeseeks or something similar if you need to do that. 🙂
