JoeZMar

Tips on building a web scraper

I am building a web scraper that started out as a tool for just one person. As word got around, I found out that there is actually some demand for this service. I started building it with the intention of learning how to integrate OTP into a Phoenix app, and so far so good.

The scraper requires a user to log in to the targeted SPA, and it then monitors items on an auction board. When an item drops that fits the user's parameters, it alerts the user. Right now I am using Hound with headless PhantomJS so that the JS is rendered, but that was a choice I made when I was planning to use this for just one person.

What are some common pitfalls I should watch out for when making requests on behalf of several different users? I am making sure that each user sends the same User-Agent on every request, but how would I go about setting a proxy for each connection? I ask because I currently have one user that logs in and monitors the trading board, pulling in each new “item” that gets put up, but I would eventually like the bot to handle the “purchase” as well. For that, my application would need to log in as the other user and make the purchase. For that part, I currently have the user log in, and my application stores their session cookie rather than their actual credentials for the site.
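One way the per-user pieces described above (a fixed User-Agent, a proxy, and a stored cookie) could be kept apart is one process per user. The following is only a minimal sketch using plain OTP, not the setup from this project: the module name `Scraper.UserSession`, the registry and supervisor names, the proxy hosts, and the cookie values are all hypothetical, and wiring the proxy into Hound or PhantomJS is left out because it depends on the driver.

```elixir
defmodule Scraper.UserSession do
  @moduledoc """
  Hypothetical per-user session process. It only holds the settings that
  must stay constant for one user: User-Agent, proxy, and session cookie.
  """
  use GenServer

  defstruct [:user_id, :user_agent, :proxy, :cookie]

  # Starts the process and registers it under the user's id.
  def start_link(opts) do
    user_id = Keyword.fetch!(opts, :user_id)
    GenServer.start_link(__MODULE__, opts, name: via(user_id))
  end

  # Headers to attach to every request made for this user.
  def request_headers(user_id), do: GenServer.call(via(user_id), :request_headers)

  # The {host, port} proxy assigned to this user.
  def proxy(user_id), do: GenServer.call(via(user_id), :proxy)

  # Replaces the stored cookie, e.g. after the user logs in again.
  def put_cookie(user_id, cookie), do: GenServer.call(via(user_id), {:put_cookie, cookie})

  defp via(user_id), do: {:via, Registry, {Scraper.SessionRegistry, user_id}}

  @impl true
  def init(opts), do: {:ok, struct!(__MODULE__, opts)}

  @impl true
  def handle_call(:request_headers, _from, state) do
    headers = [{"user-agent", state.user_agent}, {"cookie", state.cookie}]
    {:reply, headers, state}
  end

  def handle_call(:proxy, _from, state), do: {:reply, state.proxy, state}

  def handle_call({:put_cookie, cookie}, _from, state) do
    {:reply, :ok, %{state | cookie: cookie}}
  end
end

# Usage: in a real app the registry and supervisor would live in the
# application's supervision tree. All values below are made up.
{:ok, _} = Registry.start_link(keys: :unique, name: Scraper.SessionRegistry)

{:ok, _} =
  DynamicSupervisor.start_link(strategy: :one_for_one, name: Scraper.SessionSupervisor)

{:ok, _pid} =
  DynamicSupervisor.start_child(
    Scraper.SessionSupervisor,
    {Scraper.UserSession,
     user_id: "alice",
     user_agent: "ExampleAgent/1.0",
     proxy: {"proxy-a.example", 8080},
     cookie: "session=abc"}
  )

IO.inspect(Scraper.UserSession.request_headers("alice"))
IO.inspect(Scraper.UserSession.proxy("alice"))
```

Because each session is registered under a unique key, starting a second session for the same user id fails instead of silently creating a second identity, and a crash in one user's process does not touch the settings held for another user.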

What is the legality of using a bot to log in to another site? Is there a proper way to scrape a site that requires users to log in? If anyone has experience scraping sites at a large scale, I would love to talk with them outside of the forums and possibly get some more advice.

arnodirlam

As a first pointer, there was a really nice talk at ElixirConf 2019 by Adam Mokan on exactly that topic:

JoeZMar

Coincidentally, I sent him a PM yesterday because of some of his responses on the forum about scraping. I didn’t realize he had given a talk. Thanks!
