Where do I put the driver configuration for Hound in a (non-Phoenix) Elixir project used for web scraping?

Hi everyone!

I want to do some web scraping, and I'd like to use the Hound package since it has everything I need for it.
I know how to use it for testing in a Phoenix project, which I learned from Phoenix Inside Out, thanks to @shankardevy. In a Phoenix project using PhantomJS, we had to put `config :hound, driver: "phantomjs"` in `config/config.exs`.

But here the situation is a bit different.

  1. It's not a Phoenix project.
  2. I want to use it in a file inside `lib/`, not `test/`, because this time I'm using it for web scraping rather than testing.

Thank you!

It should just work…

Make sure it's not marked `runtime: false` or `only: :test` in your deps.
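For reference, here is a minimal `mix.exs` sketch for a plain (non-Phoenix) project where Hound is an ordinary runtime dependency. The app name `:scrap` matches this thread; the `"~> 1.1"` Hound requirement and the `"~> 1.9"` Elixir requirement are assumptions, so check Hex for the current release.

```elixir
defmodule Scrap.MixProject do
  use Mix.Project

  def project do
    [
      app: :scrap,
      version: "0.1.0",
      # Assumed minimum Elixir version.
      elixir: "~> 1.9",
      start_permanent: Mix.env() == :prod,
      deps: deps()
    ]
  end

  # Mix starts runtime dependencies such as :hound automatically,
  # so nothing Hound-specific is needed here.
  def application do
    [extra_applications: [:logger]]
  end

  defp deps do
    [
      # No `only: :test` and no `runtime: false`: the scraper lives in lib/
      # and needs Hound started in every environment.
      {:hound, "~> 1.1"}
    ]
  end
end
```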


Yes, neither `runtime: false` nor `only: :test` is there.

Here is my code:

```elixir
defmodule Scrap do
  use Hound.Helpers


  def start do
    IO.puts "starting"
    Hound.start_session()
    navigate_to("https://somewebsite.com/sign-in")
    find_element(:name, "email") |> fill_field("someone@something.com")
    find_element(:name, "password") |> fill_field("somepassword")
    find_element(:css, "[type='submit']") |> click()  # Hound has no :type strategy; match the attribute via CSS
    IO.puts "Logged in!"
  end
end
```

And this is the error:

```
iex(1)> Scrap.start
starting

14:21:57.299 [error] GenServer Hound.SessionServer terminating
** (RuntimeError) could not create a new session: econnrefused, check webdriver is running
    (hound) lib/hound/session_server.ex:101: Hound.SessionServer.create_session/2
    (hound) lib/hound/session_server.ex:78: Hound.SessionServer.handle_call/3
    (stdlib) gen_server.erl:661: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:690: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message (from #PID<0.208.0>): {:change_session, #PID<0.208.0>, :default, []}
State: %{}
Client #PID<0.208.0> is alive

    (stdlib) gen.erl:167: :gen.do_call/4
    (elixir) lib/gen_server.ex:1006: GenServer.call/3
    (scrap) lib/scrap.ex:7: Scrap.start/0
    (stdlib) erl_eval.erl:680: :erl_eval.do_apply/6
    (elixir) src/elixir.erl:275: :elixir.eval_forms/4
    (iex) lib/iex/evaluator.ex:257: IEx.Evaluator.handle_eval/5
    (iex) lib/iex/evaluator.ex:237: IEx.Evaluator.do_eval/3
    (iex) lib/iex/evaluator.ex:215: IEx.Evaluator.eval/3
** (exit) exited in: GenServer.call(Hound.SessionServer, {:change_session, #PID<0.208.0>, :default, []}, 60000)
    ** (EXIT) an exception was raised:
        ** (RuntimeError) could not create a new session: econnrefused, check webdriver is running
            (hound) lib/hound/session_server.ex:101: Hound.SessionServer.create_session/2
            (hound) lib/hound/session_server.ex:78: Hound.SessionServer.handle_call/3
            (stdlib) gen_server.erl:661: :gen_server.try_handle_call/4
            (stdlib) gen_server.erl:690: :gen_server.handle_msg/6
            (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
    (elixir) lib/gen_server.ex:1009: GenServer.call/3
    (scrap) lib/scrap.ex:7: Scrap.start/0
iex(1)>
```

Looks like :hound hasn’t been started.

How do you start IEx, and what does the `application/0` function in your `mix.exs` look like?


You don't have a webdriver running, for example `phantomjs --wd`.

This is the part of the error I mean, for reference:

```
** (RuntimeError) could not create a new session: econnrefused, check webdriver is running
```
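`econnrefused` is what the operating system reports when nothing is listening on the host and port Hound tried. One way to tell "driver not running" apart from "Hound is pointed at the wrong port" is to probe the port yourself from IEx. A minimal sketch using Erlang's `:gen_tcp`; the module name `WebdriverProbe` is hypothetical and not part of Hound.

```elixir
defmodule WebdriverProbe do
  # Hypothetical helper, not part of Hound: returns true if something
  # accepts TCP connections on host:port within `timeout` milliseconds.
  @spec listening?(String.t(), :inet.port_number(), timeout()) :: boolean()
  def listening?(host, port, timeout \\ 500) do
    case :gen_tcp.connect(String.to_charlist(host), port, [:binary, active: false], timeout) do
      {:ok, socket} ->
        :ok = :gen_tcp.close(socket)
        true

      # e.g. :econnrefused (nothing listening), :timeout, :nxdomain
      {:error, _reason} ->
        false
    end
  end
end
```

For example, `WebdriverProbe.listening?("localhost", 8910)` checks PhantomJS's default WebDriver port. If that returns `true` but Hound still reports `econnrefused`, Hound is most likely using a different driver or port than the one you started.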

I was under the assumption that Hound would manage the webdriver's lifecycle.


No, it doesn't do that on its own; it needs a separately running webdriver such as Selenium or PhantomJS.

From the docs:
When you run `mix test`, Hound is automatically started. **You'll need a webdriver server**
running, like Selenium Server or Chrome Driver. If you aren't sure what it is, then 
[read this](https://github.com/HashNuke/hound/wiki/Starting-a-webdriver-server).

I'm running the webdriver, but maybe Hound isn't finding it, or maybe I need to configure it explicitly like we do in Phoenix apps:

```
➜  bin ./phantomjs --wd
[INFO  - 2020-01-11T12:41:45.744Z] GhostDriver - Main - running on port 8910
```

I have only used it for testing. Someone else tried using it to fetch content from a page, and it worked up to a point; you can see our discussion here:


Again, please tell us how you start the IEx session in which you are trying out Hound.

Also, from that same IEx session, please run `Application.get_env(:hound, :driver)`.

If it returns anything other than `"phantomjs"`, your configuration is not set up properly.
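To get a clearer message than `econnrefused` when the setting is missing, the scraper could check it before starting a session. A minimal sketch; `ScrapConfig` and `driver!/0` are hypothetical names. `Application.fetch_env/2` returns `:error` for an unset key, which is the case where `Application.get_env/2` returns `nil`.

```elixir
defmodule ScrapConfig do
  # Hypothetical helper: reads the Hound driver from application config and
  # raises a descriptive error instead of silently using Hound's default.
  @spec driver!() :: String.t()
  def driver! do
    case Application.fetch_env(:hound, :driver) do
      {:ok, driver} when is_binary(driver) ->
        driver

      {:ok, other} ->
        raise ArgumentError, "expected :hound :driver to be a string, got: #{inspect(other)}"

      :error ->
        raise ArgumentError,
              "no :hound :driver configured; add `config :hound, driver: \"phantomjs\"` to config/config.exs"
    end
  end
end
```

Calling `ScrapConfig.driver!()` at the top of `Scrap.start/0` would turn a missing config into an immediate, readable error.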


It returns `nil`:

```
iex(1)> Application.get_env(:hound, :driver)
nil
```

Looks like I need to add the config somewhere.

Do you want to use only Hound, or are you open to other scraping solutions?
If you want to stick with Hound, please show your `config.exs`.

Do you have `config :hound, driver: "phantomjs"` in it?


It's a plain Elixir project (not a Phoenix one) with the following file tree (excluding the `_build` and `deps` directories), and it doesn't have a `config.exs` file.

```
➜  scrap tree .
.
├── lib
│   └── scrap.ex
├── mix.exs
├── mix.lock
├── README.md
└── test
    ├── scrap_test.exs
    └── test_helper.exs
```

Then of course `:hound` falls back to its defaults.

If you do not want to use the defaults, you need to configure them.

That said, I'd check whether Hound has an API for configuring individual scrapers at runtime, rather than relying on a global boot-time configuration.

Phoenix is just a library. It doesn't make your project different in any way; both are just plain OTP applications.


I'll read through the docs to see if there is a way to configure it, either in a file or from IEx.
I'll come back later and report whether it worked.

Thank you @NobbZ and @wolfiton!


It seems you can use `Hound.Helpers.Session.start_session/1` to start a session configured at runtime.

PS: I really don't like Hound's API, as it has a lot of hidden magic. I prefer Wallaby's explicit passing around of the responsible client/connection.


Just my experience: I use Hound for testing the same way you do (it's a Django app that I'm testing), with ChromeDriver. I wrote `Chromedriver.ensure_started` and `Chromedriver.ensure_dead` functions that use `System.cmd/2` to make sure ChromeDriver is running before the test suite runs.

I also prefer Wallaby's syntax, but I find Wallaby sometimes fails on our Django/Angular stack (it has some poorly written parts and even 750 ms latencies, which I think don't play well with Wallaby's "wait for all JavaScript to finish"). I'm also not the best at frontend, so maybe there's just something I'm missing.
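The `ensure_started`/`ensure_dead` idea could look roughly like the sketch below. It is not the poster's code: the module name `DriverLifecycle` is hypothetical, and it assumes a Unix-like system with `pgrep` and `pkill` on the `PATH`. The driver runs inside an unlinked `Task` because `System.cmd/3` blocks until the command exits.

```elixir
defmodule DriverLifecycle do
  # Hypothetical helper. Assumes a Unix-like OS with `pgrep` and `pkill`.

  # Starts `name` (e.g. "chromedriver") in the background unless a process
  # with exactly that name is already running. :started only means the
  # process was spawned; the driver may need a moment before it accepts
  # connections.
  @spec ensure_started(String.t(), [String.t()]) ::
          :started | :already_running | {:error, :not_found}
  def ensure_started(name, args \\ []) do
    cond do
      System.find_executable(name) == nil ->
        {:error, :not_found}

      running?(name) ->
        :already_running

      true ->
        # System.cmd/3 blocks until the driver exits, so run it in a Task
        # that is not linked to the caller.
        {:ok, _pid} = Task.start(fn -> System.cmd(name, args) end)
        :started
    end
  end

  # Kills every process whose name is exactly `name`.
  @spec ensure_dead(String.t()) :: :killed | :not_running
  def ensure_dead(name) do
    if running?(name) do
      System.cmd("pkill", ["-x", name])
      :killed
    else
      :not_running
    end
  end

  # pgrep exits with status 0 when at least one process matches.
  defp running?(name) do
    {_output, status} = System.cmd("pgrep", ["-x", name])
    status == 0
  end
end
```

Typical use would be `DriverLifecycle.ensure_started("chromedriver", ["--port=9515"])` in `test_helper.exs` (or before scraping) and `DriverLifecycle.ensure_dead("chromedriver")` when done.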

The problem is that he hasn't configured Hound to use PhantomJS or any other webdriver.

Try this: https://github.com/jaydorsey/elixir_scraper. It is also an Elixir project using Hound.

I don't know whether it has always been this way or changed in recent versions (I have mostly created Phoenix apps, not bare Elixir apps), but `mix new something` doesn't create a `config.exs` by default (since Elixir 1.9, `mix new` no longer generates a `config/` directory).

Add a `config.exs` file under a `config/` folder with the following code. If you're using something other than PhantomJS, edit it accordingly.

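```elixir
# config/config.exs
# `import Config` is the Elixir 1.9+ replacement for `use Mix.Config`.
import Config

config :hound, driver: "phantomjs"
```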
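If the webdriver is something other than PhantomJS, only the `config :hound` line should need to change. The sketch below uses the driver names `"phantomjs"`, `"chrome_driver"` and `"selenium"` as listed in Hound's README; the `port:` option and the port numbers (the tools' usual defaults) are assumptions worth checking against the Hound docs for your version. Keep one of them active; a later `config :hound` call overrides the same key from an earlier one.

```elixir
# config/config.exs (sketch; uncomment the variant that matches your driver)
import Config

# PhantomJS started with `phantomjs --wd` (listens on 8910 by default)
config :hound, driver: "phantomjs"

# ChromeDriver instead (usually listens on 9515):
# config :hound, driver: "chrome_driver"

# Selenium server, Hound's default driver (usually listens on 4444):
# config :hound, driver: "selenium", port: 4444
```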

Also let us know if it worked or not.


Now the output is:

```
iex(1)> Scrap.start
starting
** (Hound.NoSuchElementError) No element found for name 'email'
    (hound) lib/hound/helpers/page.ex:51: Hound.Helpers.Page.find_element/3
    (scrap) lib/scrap.ex:9: Scrap.start/0
iex(1)>
```

This shows that the webdriver error is gone.
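The new error means the session works, but `find_element` found no element with `name="email"` at that moment, for example because the form is rendered later by JavaScript or uses a different attribute. Hound also offers `search_element/3`, which returns `{:ok, element}` or `{:error, reason}` instead of raising. If the form appears only after a longer delay, a small polling wrapper around such a tuple-returning call could help. A sketch; `Poll.until_ok/3` is a hypothetical name and the attempt and delay defaults are arbitrary.

```elixir
defmodule Poll do
  # Hypothetical helper: calls `fun` until it returns {:ok, value},
  # sleeping `delay_ms` between attempts. After `attempts` failed calls
  # it returns the last {:error, reason}.
  @spec until_ok((-> {:ok, term()} | {:error, term()}), pos_integer(), non_neg_integer()) ::
          {:ok, term()} | {:error, term()}
  def until_ok(fun, attempts \\ 10, delay_ms \\ 500) when attempts > 0 do
    case fun.() do
      {:ok, _value} = ok ->
        ok

      {:error, _reason} = error when attempts == 1 ->
        error

      {:error, _reason} ->
        Process.sleep(delay_ms)
        until_ok(fun, attempts - 1, delay_ms)
    end
  end
end
```

Inside `Scrap`, that might be used as `{:ok, email} = Poll.until_ok(fn -> search_element(:name, "email") end)` followed by `fill_field(email, "someone@something.com")`.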

Thank you!