Site Implementation using... GenServers?

Hey everyone. I am new to Phoenix and am building my first application. I was wondering if I could get another opinion on how to implement a web scraper I am creating.

The bot/scraper itself logs a user into a third-party site (which is very JS heavy), enters a couple things about the user (name, phone…), then logs out.
The other challenge I face is that the user interface has to go through the messaging platform Telegram.

So right now, I have created a bot within Telegram to listen for commands from users. When a user sends a command to the bot (e.g. /enter_raffle), a webhook sends a POST request to the controller with the message text and Telegram user info as params.

The other thing I have to consider is that this is for a raffle of about 1-2k people sitting together at a convention, who will all enter the raffle at the same time (so maybe about 1k bots/scrapers running concurrently).

I have been trying to think of how to implement this and would like to know if I am on the right track.

This is what I have sort of come up with:

Assuming the user is already registered:
When a user sends the /login command to the TG bot, a new GenServer will spin up and authenticate the user by asking for their password, then store a Guardian token as state. (I am not sure how to implement secure auth, because I am just using the telegram_id that comes in the params from the TG bot.)
One user can have multiple entries in the raffle if they have more accounts set up, so when a raffle starts, a user can send /join “account_name” and it will spin up another GenServer that runs the bot for that account.

The bot itself takes about 9 minutes to complete because it is entering about 800 sub-raffles for the one user.

I am not sure whether running 1k GenServers, all using Hound connected to a Selenium server to scrape a website, will be a bottleneck. Am I anywhere close to the right way of thinking about something like this?
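
One way to keep the number of simultaneous browser sessions bounded is to start each scraper under a `DynamicSupervisor` with `max_children` set. The sketch below is only an illustration under assumptions: `ScraperPool`, the cap of 2, and the `run/1` placeholder (standing in for the real Hound/Selenium work) are all hypothetical names, not something from the actual app.

```elixir
defmodule ScraperPool do
  # Hypothetical cap; in practice it would come from what the Selenium host can sustain.
  @max_sessions 2

  def start_link do
    DynamicSupervisor.start_link(
      strategy: :one_for_one,
      max_children: @max_sessions,
      name: __MODULE__
    )
  end

  # Starts one scraper process for an account, or returns
  # {:error, :max_children} when the cap is already reached.
  def start_scraper(account_name) do
    spec = %{
      id: {:scraper, account_name},
      start: {Task, :start_link, [fn -> run(account_name) end]},
      # A finished or crashed scrape is not restarted automatically.
      restart: :temporary
    }

    DynamicSupervisor.start_child(__MODULE__, spec)
  end

  # Placeholder for the real browser session (assumption: it runs for a long time).
  defp run(_account_name), do: Process.sleep(:infinity)
end

{:ok, _sup} = ScraperPool.start_link()
```

When `start_scraper/1` returns `{:error, :max_children}`, the caller could queue the request or tell the user to retry, rather than opening yet another browser session.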

Hi.

Is there maybe any way to avoid Selenium? The JS on the frontend must be talking to the backend through some sort of API. Can you use that?

There is no API available and for some of the raffle entries there are timed interactions. I have to wait 10-15 seconds between some raffle entries.

I didn’t mean a publicly available API, I meant that you might want to try and reverse engineer it.

Well, there are several raffles that require you to “like” their Instagram photos before you can enter. I am not sure how I would reverse engineer a way to like Instagram media through an API. Plus, they recently revoked the privilege to like media through their public API, so I know that’s not available to use.

https://www.instagram.com/developer/endpoints/likes/#post_likes

But I digress. Anyway, everything else in your approach seems fine. I did something similar for a Telegram bot I made for my uni (it posted updates like new hw/lecture slides etc. and had some rudimentary search) and it worked fine. But I reverse engineered the website I was scraping and used plain HTTP requests, which ended up being very lightweight.

That endpoint only works for access tokens from an account with privileges above Basic. They no longer approve apps for anything beyond Basic, so it doesn’t work anymore.

Thanks for the input! Did you have any type of user auth between Telegram and your app?

I get confused by auth terms. Do you mean authentication or authorization?

If it’s the former, I stored each user with their corresponding Telegram ID in the database and looked it up on the requests that required it.

If it’s the latter, then not really, since the only “resource” that could be accessed was a config page to choose subscriptions. For that I generated a short-lived token like IrfkFkHf and sent the user a URL containing it: https://botbot.bot/config/IrfkFkHf. The token was stored together with the user’s Telegram ID in a GenServer (but if you want to avoid bottlenecks, you might want to pick ETS). When the config page was accessed, the controller would look up the Telegram ID for the token and go from there.
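
A rough sketch of what that short-lived token lookup could look like with ETS instead of a GenServer. This is not the original bot's code: the module name `ConfigTokens`, the table name, and the 5 minute lifetime are assumptions made for illustration.

```elixir
defmodule ConfigTokens do
  @table :config_tokens
  # Assumed lifetime of a token, in seconds.
  @ttl_seconds 300

  # Creates the table once. The calling process owns it, so in a real app
  # this would be called from a long-lived process such as a supervisor child.
  def init do
    if :ets.whereis(@table) == :undefined do
      :ets.new(@table, [:named_table, :public, :set, read_concurrency: true])
    end

    :ok
  end

  # Generates an 8-character URL-safe token and remembers who it belongs to.
  def issue(telegram_id, now \\ System.system_time(:second)) do
    token = :crypto.strong_rand_bytes(6) |> Base.url_encode64(padding: false)
    :ets.insert(@table, {token, telegram_id, now + @ttl_seconds})
    token
  end

  # Returns {:ok, telegram_id} for a known, unexpired token, otherwise :error.
  def fetch(token, now \\ System.system_time(:second)) do
    case :ets.lookup(@table, token) do
      [{^token, telegram_id, expires_at}] when expires_at > now -> {:ok, telegram_id}
      _ -> :error
    end
  end
end

ConfigTokens.init()
```

The controller behind `/config/:token` would then call `ConfigTokens.fetch(token)` and only render the page on `{:ok, telegram_id}`. Expired rows are never deleted in this sketch; a periodic sweep would be needed for that.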

Sorry, yes I meant authorization. That is how I planned on authorizing users, but I was concerned with security. Is there a way for someone to spoof a request with a different telegram_id and essentially be able to have access as the other user?

This is probably how I will end up building it, but that is something that has been on my mind.

Well, they might see the link on their displays, but you can send a button (inline keyboard) instead … And Telegram uses HTTPS for the Bot API, so there is at least some encryption in transit.
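
On the spoofing question specifically: the Bot API's `setWebhook` method accepts a `secret_token`, and Telegram then sends that value in the `X-Telegram-Bot-Api-Secret-Token` header of every webhook request. Checking it before trusting the `telegram_id` in the params could look roughly like the sketch below. `WebhookGuard` is a hypothetical module, and the header list is assumed to be in the lowercase `{name, value}` form that Plug exposes as `conn.req_headers`.

```elixir
defmodule WebhookGuard do
  import Bitwise

  @header "x-telegram-bot-api-secret-token"

  # True only when the request carries the secret that was registered with setWebhook.
  def trusted?(req_headers, expected_secret) do
    case List.keyfind(req_headers, @header, 0) do
      {@header, given} -> secure_equal?(given, expected_secret)
      nil -> false
    end
  end

  # Compares every byte instead of stopping at the first mismatch,
  # so the comparison time does not depend on where the strings differ.
  defp secure_equal?(a, b) when byte_size(a) == byte_size(b) do
    diff =
      :binary.bin_to_list(a)
      |> Enum.zip(:binary.bin_to_list(b))
      |> Enum.reduce(0, fn {x, y}, acc -> acc ||| bxor(x, y) end)

    diff == 0
  end

  defp secure_equal?(_a, _b), do: false
end
```

A controller plug could call `WebhookGuard.trusted?(conn.req_headers, secret)` and reject the request when it returns `false`.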

What do you mean I can send a button?

You can send a message with an inline keyboard that has only one button, whose URL leads to https://botbot.bot/config/IrfkFkHf in my case. Like this: https://core.telegram.org/bots/2-0-intro#url-buttons. In my case, the button text could’ve been “edit subscriptions”.
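
For illustration, the body of such a `sendMessage` call could be built like this. The `reply_markup`, `inline_keyboard`, `text`, and `url` fields are the Bot API's; the module name `ConfigButton` and the message text are made up, and JSON encoding plus the actual HTTP request are left out.

```elixir
defmodule ConfigButton do
  # Base URL taken from the example above.
  @config_url "https://botbot.bot/config/"

  # Builds the params for sendMessage: one row containing a single URL button.
  def message(chat_id, token) do
    %{
      chat_id: chat_id,
      text: "Manage your subscriptions:",
      reply_markup: %{
        inline_keyboard: [
          [%{text: "edit subscriptions", url: @config_url <> token}]
        ]
      }
    }
  end
end
```

The map would be JSON-encoded and POSTed to the bot's `sendMessage` endpoint; the user then sees a button rather than the raw link.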

Oh, I see now! I started something like this in Rails where the bot would respond with a link for them to visit and validate their password.

I am thinking now of creating a button to send to them to validate their password. If they validate it correctly, I would use Guardian to create an auth token and store it as state within a GenServer. Then, when certain commands are sent from Telegram, the app will check the token that is stored.
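
A minimal sketch of that idea, assuming one GenServer per Telegram user registered under its `telegram_id` in a `Registry`. `UserSession` and `SessionRegistry` are hypothetical names, and the token is just an opaque placeholder string here; creating and verifying a real Guardian token is not shown.

```elixir
defmodule UserSession do
  use GenServer

  # Started once the user has validated their password; holds the token as state.
  def start_link(telegram_id, token) do
    GenServer.start_link(__MODULE__, token, name: via(telegram_id))
  end

  # Called when a command arrives: is there a live session holding a token?
  def authorized?(telegram_id) do
    case Registry.lookup(SessionRegistry, telegram_id) do
      [{pid, _value}] -> GenServer.call(pid, :token) != nil
      [] -> false
    end
  end

  @impl true
  def init(token), do: {:ok, token}

  @impl true
  def handle_call(:token, _from, token), do: {:reply, token, token}

  defp via(telegram_id), do: {:via, Registry, {SessionRegistry, telegram_id}}
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: SessionRegistry)
{:ok, _session} = UserSession.start_link(12345, "placeholder-token")
```

This check only proves that a session exists for that `telegram_id`, so it is only as trustworthy as the `telegram_id` arriving in the webhook params.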
