Need help with web scraping and parsing data in Elixir

I have been trying to build a side project for about 6 months now and am extremely frustrated and not sure where to turn for help.

I originally wrote the application in Rails and had what I thought was a decent working version, but the code became a mess and I couldn’t rely on the data I was getting back from my web scraping.

You can see an example of my scraping code here.

I decided that I would try to rebuild this application in Elixir because I wanted to build something that was more reliable and I wanted to learn more about Elixir.

Well, I have now spent a few weeks building it out and have run into the same issues I was having with my Rails app, only worse. (FYI, I am new to programming.)

I am trying to scrape two tables in particular: this one and this one.

You can see examples of my scrape logic here and here.

The issue is that there are small inconsistencies in the tables, and in where they are placed within the DOM, that I do not know how to account for. I tried to write tests for these modules, but there is only so much I can test. I am also selecting a lot of this data by walking down through parent classes, so if any one of the parent classes is different the whole selection breaks.

After reviewing the results of my scraped data, I realized that much of the information I was collecting was either wrong or null.

I am at my wits’ end and don’t want to give up on this project, but there is no point in continuing to build it out if I can’t figure out a reliable way to scrape this data.

I am sorry for the long, rambling post. I was just curious whether anyone would be able to help me out. I would honestly pay for your time at this point, just so I could learn the proper way to go about doing something like this.

Hi there. I’m also relatively new, and I haven’t looked at your code yet as I’m on my phone, but if the HTML is valid(ish) XML, then an XML library such as SweetXml could be a good approach. You could then use XPath to select the elements you need. XPath is very flexible and can help you avoid relying on brittle page structure.
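To make the XPath idea a bit more concrete, here is a minimal sketch. It is not based on your actual pages: the HTML fragment, the `stats` table id, the column names, and the `ScrapeSketch` module are all made up for illustration, and it only works if the markup parses as XML. The point is that `//table[@id='stats']` finds the table wherever it sits in the DOM, so a change in the parent classes does not break the selection, and each row is checked so that bad values are flagged instead of silently stored.

```elixir
# Run as a script with `elixir scrape_sketch.exs` (needs Elixir 1.12+ for Mix.install).
Mix.install([{:sweet_xml, "~> 0.7"}])

defmodule ScrapeSketch do
  import SweetXml

  # Hypothetical, well-formed fragment standing in for a scraped page.
  # The wrapper divs, class names, and table id are invented for this example.
  @html """
  <html>
    <body>
      <div class="wrapper">
        <div class="inner">
          <table id="stats">
            <tr><th>Name</th><th>Score</th></tr>
            <tr><td>Alice</td><td>10</td></tr>
            <tr><td>Bob</td><td></td></tr>
          </table>
        </div>
      </div>
    </body>
  </html>
  """

  def sample_html, do: @html

  # Select data rows by the table's own id rather than by its ancestors,
  # then pull each cell out relative to the row.
  def rows(html) do
    html
    |> xpath(
      ~x"//table[@id='stats']//tr[td]"l,
      name: ~x"./td[1]/text()"s,
      score: ~x"./td[2]/text()"s
    )
    |> Enum.map(&validate/1)
  end

  # Tag each row so missing or malformed values are visible to the caller.
  defp validate(%{name: name, score: score} = row) do
    case Integer.parse(to_string(score)) do
      {n, ""} when is_binary(name) and name != "" ->
        {:ok, %{name: name, score: n}}

      _ ->
        {:error, row}
    end
  end
end

ScrapeSketch.sample_html() |> ScrapeSketch.rows() |> IO.inspect()
```

Rows that come back tagged `:error` can be logged and inspected later, which makes it easier to see which inconsistencies in the tables you still need to handle.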

Thanks, David, I will check that out!

Did somebody help you? Or did you do it yourself?