Announcing Crawly - A high-level web crawling & scraping framework for Elixir

What JSON output does Crawly support? With some sites, I like Scrapy’s default style of a JSON object per page. But with other sites, I want to create a single JSON tree representing the whole site. This is possible in Scrapy with a few tricks. How hard would it be to do with Crawly?

Hey @dogweather,

We support JL (JSON Lines) and CSV output formats, with one line per item.
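For anyone looking for the concrete setting: JL output is driven by the item pipelines in the application config. A minimal sketch, assuming the `JSONEncoder` and `WriteToFile` pipeline modules from Crawly's documentation (the exact options may differ between versions):

```elixir
# config/config.exs — a sketch; pipeline names and options follow
# Crawly's documentation and may vary between versions.
import Config

config :crawly,
  pipelines: [
    # Encodes each scraped item as a JSON string
    Crawly.Pipelines.JSONEncoder,
    # Appends each encoded item as one line of a .jl file
    {Crawly.Pipelines.WriteToFile, extension: "jl", folder: "/tmp"}
  ]
```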

At the moment we don’t support creating a single root object with all items inside. As far as I can see, that could get complex for large crawls.
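One workaround until then: keep the spider's items flat (one map per page, each carrying a reference to its parent), write them out as JL as usual, and fold the decoded lines into a single tree afterwards. A minimal sketch in plain Elixir; the `kind`, `chapter`, and `division` keys are assumptions standing in for whatever parent references your spider records on each item:

```elixir
defmodule TreeBuilder do
  @moduledoc """
  Folds a flat list of scraped items (one map per page) into a single
  nested tree: chapters contain divisions, divisions contain rules.
  The "kind", "chapter" and "division" keys are hypothetical; they stand
  in for whatever parent references the spider records on each item.
  """

  def build(items) do
    by_kind = Enum.group_by(items, & &1["kind"])

    rules_by_division = Enum.group_by(by_kind["Rule"] || [], & &1["division"])
    divisions_by_chapter = Enum.group_by(by_kind["Division"] || [], & &1["chapter"])

    chapters =
      for chapter <- by_kind["Chapter"] || [] do
        divisions =
          for division <- Map.get(divisions_by_chapter, chapter["number"], []) do
            Map.put(division, "rules", Map.get(rules_by_division, division["number"], []))
          end

        Map.put(chapter, "divisions", divisions)
      end

    %{"chapters" => chapters}
  end
end
```

Decoding the `.jl` file (one `Jason.decode!/1` or similar per line) before the fold keeps the crawl itself simple, and ~20,000 small maps fit comfortably in memory.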

Also, just in case, check out our experimental UI. It still has quite basic styling, but we’re migrating some parts to LiveView, so it will get better soon!


Sounds very nice. I replied to your other comment, but here’s a shortened example of a scrape where I get just one object. The real result has info for 20,000 or so web pages:

{
  "date_accessed": "2019-03-21",
  "chapters": [
    {
      "kind": "Chapter",
      "db_id": "36",
      "number": "101",
      "name": "Oregon Health Authority, Public Employees' Benefit Board",
      "url": "https://secure.sos.state.or.us/oard/displayChapterRules.action?selectedChapter=36",
      "divisions": [
        {
          "kind": "Division",
          "db_id": "1",
          "number": "1",
          "name": "Procedural Rules",
          "url": "https://secure.sos.state.or.us/oard/displayDivisionRules.action?selectedDivision=1",
          "rules": [
            {
              "kind": "Rule",
              "number": "101-001-0000",
              "name": "Notice of Proposed Rule Changes",
              "url": "https://secure.sos.state.or.us/oard/view.action?ruleNumber=101-001-0000",
              "authority": [
                "ORS 243.061 - 243.302"
              ],
              "implements": [
                "ORS 183.310 - 183.550",
                "192.660",
                "243.061 - 243.302",
                "292.05"
              ],
              "history": "PEBB 2-2009, f. 7-29-09, cert. ef. 8-1-09<br>PEBB 1-2009(Temp), f. &amp; cert. ef. 2-24-09 thru 8-22-09<br>PEBB 1-2004, f. &amp; cert. ef. 7-2-04<br>PEBB 1-1999, f. 12-8-99, cert. ef. 1-1-00"
            }
          ]
        }
      ]
    }
  ]
}

As part of the project development, I’ve decided to create a short cookbook of scraping recipes. If you’re using Crawly, or doing scraping in general, these articles might be useful for you (I am including Medium friend links, so everyone can read them):


20 claps for this article on Medium from me :wink:

Thanks for your excellent work :slight_smile:


Would you have a link to working Crawly demo code? I tried several examples, including the one in the README, but I don’t get any output from it. It also had a bug or two that I fixed to get it to compile. I’m ready to try Crawly out, but haven’t seen it work yet.

Sorry to say it, but I don’t have time (mostly due to the ongoing war in my country :() to work on Crawly.

1 Like