Magnetissimo - Web application that indexes all popular torrent sites and saves the magnet links to a local database

sergio · July 23, 2016, 8:07am

Star and follow the repo for updates to the project.

Since KickassTorrents was killed, I decided to write my own indexer using Elixir and Phoenix.

It’s still a WIP, as I only started last night. You give it a public torrent site URL and it will sniff out all magnet links and save them to the PostgreSQL database.
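As a rough illustration of the "sniff out all magnet links" step, here is a minimal sketch that pulls magnet URIs out of a page's HTML with a regular expression. The module name `MagnetSketch` and the function `extract/1` are hypothetical and are not taken from the Magnetissimo code base. The sketch also assumes the links appear as literal text in the HTML, so entity-encoded characters such as `&amp;` are returned as-is.

```elixir
defmodule MagnetSketch do
  # Hypothetical helper for illustration only; not part of Magnetissimo.

  @doc "Returns the unique magnet links found in an HTML string, in order of appearance."
  def extract(html) when is_binary(html) do
    # Match magnet URIs that carry a BitTorrent info hash (xt=urn:btih:...),
    # stopping at quotes, whitespace, or angle brackets.
    ~r/magnet:\?xt=urn:btih:[^"'\s<>]+/i
    |> Regex.scan(html)
    |> List.flatten()
    |> Enum.uniq()
  end
end
```

Calling `MagnetSketch.extract(html)` on a fetched page body gives a list of strings that could then be stored alongside the torrent name.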

The idea is that anyone will be able to run this locally, on a VPS, or on Heroku and have a working torrent indexer (not a tracker).

No more single point of failure.

I would LOVE some suggestions on improving the code within crawler.ex, I know the code there is ugly as hell.

Thanks!

mkunikow · July 23, 2016, 8:53pm

I used torrents ages ago. I have since moved to NZB (Usenet).
https://www.reddit.com/r/usenet/

sergio · July 23, 2016, 9:21pm

Yep, I’m actually just saving the magnet link. There are no files, just two text fields in the database: name and magnet. Should be super duper ultra turbo magnum fast.

Edit: I still need to figure out how to scale this and run crawls in parallel. Right now I can just feed it a URL and it crawls outward from there sequentially.

thinkpadder1 · July 23, 2016, 9:22pm

Yeah, I misread your GitHub page when I made that comment. Thought I deleted my comment too.

sergio · August 5, 2016, 1:52am

Decided to rewrite the crawler to be a more manual process:

Check it out live here:

https://streamable.com/zy94

At this point, I need to tweak some parser code, as you can see there are some errors in the video, and integrate my brother’s UX design into the app.

Then I’ll look into documenting it and putting up deployment instructions. I have a feeling metaprogramming would help me in crawler.ex, but I’ll get back to that later.

sergio · August 7, 2016, 7:17pm

It seems adding parsers improves the rate of importing linearly!

Added a few more parsers and we’re at 13,000 per minute. Gotta love that GenServer and Exq combo.

If anyone’s feeling up to it, I would love some code review on the current code base. I’m eager to learn some more advanced tricks.

OvermindDL1 · August 8, 2016, 2:09pm

GenStream seems like a good use-case for this for throughput? Just keep increasing the parsers as there is demand.

benwilson512 · August 8, 2016, 7:25pm

Personally, I would look at GenStage; it seems ideal for this. I can’t imagine the Redis persistence Exq gives you is worthwhile here.

sergio · August 8, 2016, 7:43pm

I chose Exq not because it would persist queued tasks, but because it allowed me to rate-limit concurrency on a per-queue basis.

I couldn’t figure out how to accomplish the same thing with pure GenServer.

I don’t want to hit these websites so hard that it affects their performance.
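For comparison, one way to cap concurrency using only the standard library is `Task.async_stream/3` with its `max_concurrency` option. This is a minimal sketch, not what Magnetissimo does: the module `BoundedCrawl` and the `fetch_fun` argument (standing in for whatever downloads and parses one page) are hypothetical. It also only bounds how many requests are in flight at once; it does not enforce a requests-per-second limit.

```elixir
defmodule BoundedCrawl do
  # Hypothetical module for illustration only; not part of Magnetissimo.

  @doc """
  Applies `fetch_fun` to every URL with at most `max_concurrency`
  calls running at the same time. Results keep the order of `urls`.
  """
  def run(urls, fetch_fun, max_concurrency \\ 2) do
    urls
    |> Task.async_stream(fetch_fun, max_concurrency: max_concurrency, timeout: 30_000)
    # A crash or timeout in a task takes the caller down with it,
    # so only {:ok, result} tuples reach this point.
    |> Enum.map(fn {:ok, result} -> result end)
  end
end
```

For example, `BoundedCrawl.run(["a", "b", "c"], &String.upcase/1)` returns `["A", "B", "C"]`. Using one call per site, each with its own `max_concurrency`, would roughly mimic a per-queue limit.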

sergio · August 21, 2016, 2:03pm

So I just fixed a massive bug I didn’t notice until now.

Basically Magnetissimo was queuing and running the initial 6 scrape jobs every hour without waiting for the other in-progress jobs to finish.

This meant that the queue would just get longer and longer until Redis ran out of room and saturated the computer.

With this fix, new scrape jobs are queued only once the scrape jobs already in progress have finished.

Also, it’s going to attempt to fire off the scrape jobs every 300 seconds. This means much sooner updates on latest torrents inside Magnetissimo.

Much better!

https://github.com/sergiotapia/magnetissimo/blob/master/lib/crawler.ex#L23
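The general idea of "only start the next round once the previous one has finished" can be sketched with a plain GenServer that re-arms its timer after each round completes. This is an illustration under stated assumptions, not the code at the link above: the module `ScrapeScheduler` and the zero-arity `scrape_fun` (assumed to run one full round of scrape jobs and return only when they are done) are hypothetical, and here the 300-second interval is measured from the end of a round.

```elixir
defmodule ScrapeScheduler do
  # Hypothetical scheduler for illustration only; not part of Magnetissimo.
  use GenServer

  def start_link(scrape_fun, interval_ms \\ 300_000) do
    GenServer.start_link(__MODULE__, {scrape_fun, interval_ms})
  end

  @impl true
  def init({scrape_fun, interval_ms}) do
    # Kick off the first round right after startup.
    send(self(), :scrape)
    {:ok, %{scrape_fun: scrape_fun, interval_ms: interval_ms}}
  end

  @impl true
  def handle_info(:scrape, state) do
    # Runs synchronously inside the GenServer, so a second round
    # cannot start while this one is still in progress.
    state.scrape_fun.()
    # The timer is armed only after the round has finished.
    Process.send_after(self(), :scrape, state.interval_ms)
    {:noreply, state}
  end
end
```

Because the next `:scrape` message is scheduled only after `scrape_fun` returns, rounds can never pile up, however long a single round takes.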

sergio · January 24, 2018, 8:55am

After a hiatus, and lots of learning (thanks to this forum!), I’m back with a vengeance to work on Magnetissimo. If you’re new to Elixir and want to learn how it can be used for “real projects” check out our source code.

I also added Theophile as a contributor and he’s done a tremendous job adding some really valuable things to the project. Recently we migrated the project to Phoenix 1.3.

Lots to do and lots of ideas to execute!

sergio · March 5, 2019, 9:21pm

Back with some news if you’re into that!

We have our Discord server now: Magnetissimo / Torrentinim
Join us to discuss new features, bugs, tips or just chat about torrenting in general.
New coat of paint, courtesy of Bulma!
Updated to the latest and greatest Phoenix/Elixir.
I use the term updated loosely; it’s a rewrite!
Crawling torrent sources using their RSS feeds where available. This means as soon as a torrent release is out, Magnetissimo will sniff it out for you. Near-instantly!
New Wiki pages on how to run this on systemd, Ubuntu, etc.
A fantastic designer donated his talents and created a gorgeous logo for the project.

Many thanks to Theophile, thatpixelcrown and caskd for their contributions.

axelson · March 6, 2019, 2:19am

That sounds pretty neat! How does that portion work? My understanding (that’s likely flawed) is that RSS feeds are typically polled which could mean a considerable delay.

mischov · March 6, 2019, 3:39pm

It looks like they poll all the RSS feeds every 15 seconds.

NobbZ · March 6, 2019, 5:08pm

Which in turn might violate a lot of usage terms… Most feeds (not torrent related) I have used so far, asked you to not poll more often than once every 15 minutes.

Client software, its developers, and its users need to be aware that each request to the server produces load on the server side and also creates traffic.

CPU and bandwidth do cost money, period.

AstonJ · March 7, 2019, 4:04pm

Hi Sergio,

I hope you don’t mind but I have unlisted the thread as somebody has been in contact querying whether it may put the forum in a compromising position legally, particularly since more recently you have added screenshots which show copyrighted material.

You can still use this thread to get help etc, as those who have interacted with the thread will still see your replies.

If there was no mention or screengrabs of copyrighted material we could re-list it, but I understand if you don’t want to do that, and so hope you don’t mind me unlisting the thread on the forum.

Thanks for your understanding.