I want to do some web scraping, for which I want to use the hound package, as it has everything required for web scraping.
I know how to use it in a Phoenix project for testing, which I learnt from *Phoenix Inside Out*, thanks to @shankardevy. In a Phoenix project we had to put `config :hound, driver: "phantomjs"` in `config/config.exs` if we were using phantomjs.
But here the situation is a bit different.
It's not a Phoenix project.
I want to use it in a file inside `lib/`, not `test/`, since this time I'm using it for web scraping rather than testing.
No, it doesn't work on its own; it needs the help of a webdriver server such as Selenium or PhantomJS.
From the docs:

> When you run `mix test`, Hound is automatically started. **You'll need a webdriver server** running, like Selenium Server or Chrome Driver. If you aren't sure what it is, then [read this](https://github.com/HashNuke/hound/wiki/Starting-a-webdriver-server).
I have only used it for testing. Someone else tried using it to retrieve content from a page, and it worked up to a certain point. You can see our discussion here:
It's an Elixir project (not a Phoenix one). It has the following file tree (excluding the `_build` and `deps` directories), and it doesn't have a `config.exs` file.
Of course, `:hound` will then fall back to the defaults.
If you do not want to use the defaults, you need to configure them.
Though I'd check whether Hound has an API that allows runtime configuration of individual scrapers rather than a global boot-time configuration…
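On the runtime-configuration question: Hound's settings live in the ordinary application environment, so one option is to set them with `Application.put_env/3` instead of in `config/config.exs`. This is a sketch under an assumption: that Hound reads `:driver` when the `:hound` application starts, which is why it restarts Hound after changing the value. `ScrapConfig`, `configure/1` and `restart_hound/1` are hypothetical names, not part of Hound, and this is still a global setting rather than a per-scraper one.

```elixir
defmodule ScrapConfig do
  # Hypothetical helper: sets Hound's application env at runtime
  # instead of in config/config.exs.
  def configure(driver \\ "phantomjs") do
    # Returns :ok; works even if :hound has not been started yet.
    Application.put_env(:hound, :driver, driver)
  end

  # Assumption: Hound only reads :driver at application start,
  # so stop it (if running), change the env, and start it again.
  def restart_hound(driver \\ "phantomjs") do
    # Returns {:error, {:not_started, :hound}} if it wasn't running; ignored.
    _ = Application.stop(:hound)
    :ok = configure(driver)
    Application.ensure_all_started(:hound)
  end
end
```

In `iex -S mix` you could then try `ScrapConfig.restart_hound("chrome_driver")` before starting a session.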
Phoenix is just a library. It doesn't make your projects different in any way; they are just plain OTP applications.
I'll read through the docs and see whether there is a way to configure it, either by adding it to some file or by running something in `iex`.
I'll come back later and report whether it worked.
Just my experience: I use hound for testing in the same way you do (it's a Django app that I'm testing), with chromedriver. I have written `Chromedriver.ensure_started` and `Chromedriver.ensure_dead` functions, which use `System.cmd/2` to make sure chromedriver is running before the test suite runs.
I also prefer Wallaby's syntax, but I find that Wallaby sometimes fails on our Django/Angular stack. It has some poorly written parts and even 750 ms latencies, which I think don't play well with Wallaby's "wait for all JavaScript to finish". I'm also not the best at frontend, so maybe there's just something I'm missing.
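The `Chromedriver.ensure_started`/`ensure_dead` helpers mentioned above could look roughly like this. It is a hypothetical reconstruction, not the poster's code, and it assumes a Unix-like machine with `sh`, `pgrep`, `pkill` and a `chromedriver` binary on the `PATH`.

```elixir
defmodule Chromedriver do
  # Hypothetical reconstruction; assumes sh, pgrep, pkill and chromedriver exist.
  @name "chromedriver"

  # pgrep -x exits with 0 when a process with exactly this name is running.
  def running? do
    {_output, status} = System.cmd("pgrep", ["-x", @name])
    status == 0
  end

  def ensure_started(wait_ms \\ 500) do
    if running?() do
      :already_running
    else
      # System.cmd/2 blocks until the command exits, so let the shell
      # background chromedriver with its output sent to /dev/null.
      {_output, 0} = System.cmd("sh", ["-c", "#{@name} > /dev/null 2>&1 &"])
      # Crude wait for it to open its port (9515 by default).
      Process.sleep(wait_ms)
      if running?(), do: :started, else: {:error, :not_running}
    end
  end

  def ensure_dead do
    # pkill exits with 1 when nothing matched, which is fine here.
    case System.cmd("pkill", ["-x", @name]) do
      {_output, status} when status in [0, 1] -> :ok
      {_output, status} -> {:error, status}
    end
  end
end
```

Calling `Chromedriver.ensure_started()` before `Hound.start_session()` and `Chromedriver.ensure_dead()` afterwards keeps the driver lifecycle in one place; the fixed sleep is a simple stand-in for polling the driver's port.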
I don't know whether it was like this from the start or changed in recent versions (I've mostly created Phoenix apps, not bare Elixir apps), but `mix new something` doesn't add a `config.exs` by default.
Add a `config.exs` under a `config/` folder with the Hound configuration. If you're using something other than phantomjs, edit it accordingly.
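A minimal `config/config.exs` sketch, assuming Elixir 1.9 or later (for `import Config`) and a PhantomJS webdriver server that is already running. Mix loads `config/config.exs` by default, so `mix.exs` shouldn't need changes.

```elixir
# config/config.exs
import Config

# Tell Hound which webdriver server to talk to. This assumes PhantomJS;
# use "chrome_driver" or "selenium" instead if that is what you run.
config :hound, driver: "phantomjs"
```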
```
iex(1)> Scrap.start
starting
** (Hound.NoSuchElementError) No element found for name 'email'
    (hound) lib/hound/helpers/page.ex:51: Hound.Helpers.Page.find_element/3
    (scrap) lib/scrap.ex:9: Scrap.start/0
iex(1)>
```
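`find_element` raises `Hound.NoSuchElementError` when nothing matches, which is what the trace shows. Below is a minimal sketch of a `lib/scrap.ex` that uses `search_element/2` to get an `{:error, _}` tuple instead of an exception. The default URL, the `"email"` field name and the filled-in value are placeholders, and it assumes the webdriver server configured in `config/config.exs` is already running.

```elixir
defmodule Scrap do
  # Sketch only; URL, field name and value are placeholders.
  use Hound.Helpers

  def start(url \\ "https://example.com/login") do
    # In case Hound isn't running yet; returns {:ok, []} if it already is.
    {:ok, _apps} = Application.ensure_all_started(:hound)
    Hound.start_session()

    try do
      navigate_to(url)

      # search_element/2 returns {:ok, element} or {:error, reason}
      # instead of raising like find_element does.
      case search_element(:name, "email") do
        {:ok, element} ->
          fill_field(element, "user@example.com")
          {:ok, page_title()}

        {:error, reason} ->
          # The field may be missing, named differently, rendered later by
          # JavaScript, or inside an iframe; page_source/0 helps check.
          {:error, {:email_field_not_found, reason}}
      end
    after
      Hound.end_session()
    end
  end
end
```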