ModestEx - Pipeable transformations on html strings (with CSS selectors)

Hello!

I just published my first draft of ModestEx, an Elixir/Erlang binding to lexborisov’s Modest library.

Modest is a fast HTML renderer implemented as a pure C99 library with no outside dependencies.

ModestEx exposes features for pipeable transformations on HTML strings using CSS selectors, e.g. find(), prepend(), append(), replace(), etc.

iex> ModestEx.find("<p><a>Hello</a> World</p>", "p a")
{:ok, "<a>Hello</a>"}

iex> ModestEx.serialize("<div>Hello<span>World")
{:ok, "<html><head></head><body><div>Hello<span>World</span></div></body></html>"}

The binding is implemented as a C-Node following the excellent example in @Overbryd's package nodex. If you want to learn how to set up bindings to C/C++, you should definitely check it out.

Before that, I experimented with a lot of other HTML parser libraries, such as gumbo-parser, gumbo-query and GQ. I even implemented a binding package called gumbo_query_ex.
However, Modest is currently the most active and most promising.

This project is under development!
Stay tuned for more features like ModestEx.remove, ModestEx.prepend and ModestEx.append.

Tell me what you think :relaxed:

Best, F34nk

8 Likes

Ah it looks like a C-based variant of Meeseeks except it can also create HTML too, quite nice! :slight_smile:

2 Likes

Modest is the full project of which Alexander Borisov’s (lexborisov) myhtml, which also has an Elixir wrapper, is a part. It’s closer to being a C variant of… maybe Servo? It’s larger in scope than just parsing HTML.

Everything I’ve seen suggests that Modest (and probably ModestEx) will be fast, with low resource usage. That’s pretty great.

2 Likes

Thanks guys!

I have a use case where a request reads HTML from a database, and before rendering I need to make some changes to the HTML string. In my case the HTML is quite large and the changes are quite extensive.

The idea for ModestEx is to implement a set of features that just do transformations on an HTML string. Each transformation feature will be done in C.

Something like:

result = ModestEx.find("<p><a>Hello</a> World</p>", "p a")
|> ModestEx.attribute("href", "https://elixir-lang.org")

will return:

{:ok,  "<a href=\"https://elixir-lang.org\">Hello</a>"}

… ready to render in a template.

Or you could also serialize it:

ModestEx.serialize(result)

and return:

{:ok,  "<html><head></head><body><a href=\"https://elixir-lang.org\">Hello</a></body></html>"}

Which is already a (more or less) valid page!

Of course, if you need further decoding, htmlex, floki or Meeseeks are great!
I see ModestEx as a useful addition to the landscape of HTML tools in Elixir.

3 Likes

For sure, I have been very hesitant to try adding transformations to Meeseeks, so I’m glad somebody’s doing it. :slight_smile:

2 Likes

Oooo, that’s fascinating!

Yeah this looks very useful! :slight_smile:

Heh, yeah it’s quite a thing to tackle. ^.^

1 Like

Hey @mischov @OvermindDL1

I just published ModestEx v0.0.2-dev.

Thanks again for your input. It’s a lot clearer now what the main strength of the library actually is!

I added two new features: ModestEx.get_attribute and ModestEx.set_attribute.

And you can actually pipe them together.

iex> ModestEx.find("<p><a>Hello</a><a>World</a></p>", "p a") |> 
...> ModestEx.set_attribute("href", ["https://elixir-lang.org", "https://google.de"])
["<html><head></head><body><a href=\"https://elixir-lang.org\">Hello</a></body></html>", "<html><head></head><body><a href=\"https://google.de\">World</a></body></html>"]
3 Likes

Wohoo! This is great news :slight_smile:

Will we get Elixir-based end-to-end headless browser testing soon? :smiley:

Thanks for the reference and I am delighted to see a binding to Modest.

I have a few use cases where I will get back to it for sure.

1 Like

Hey guys,

the first major release is coming soon and I hope to publish it before ElixirConf EU in April.

I also implemented a new CSS selector for :contains(text) in Modest PR#42.

I’ll keep you updated!

1 Like

Release v1.0.0

This release is stable.

def deps do
  [
    {:modest_ex, "~> 1.0.0"}
  ]
end

A total of 16 features are implemented. See complete feature list.

A total of 38 selector patterns are implemented (including the custom selector :contains(text)).
See complete list of supported CSS selectors.

The package includes all binding code under the folder target/modest_worker.
All Modest related features are implemented in a single C library called modest_html.

This way, all features are tested in a C environment using CMake/CTest with memory tracking enabled using a library called dmt.

Please feel invited to check it out :relaxed:

Best, F34nk

4 Likes

Greetings @f34nk , I have been trying to get modest_ex to work with Erlang 24.x. I know my C/C++ is rusty, but I’ve spent a good half-day on this without any progress. I am currently stuck looking for vec.h, which doesn’t seem to exist anywhere on my Mac or Linux boxes. Any pointers here? Thank you!

Hey,

vec.h is part of another git repo, which is included in this repo as a git submodule.

Did you init the git submodules, after cloning the repo?

As described here: GitHub - f34nk/modest_ex: Elixir library to do pipeable transformations on html strings (with CSS selectors)
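A quick way to check whether the submodules were actually initialised is to look for empty submodule directories. The following is only a sketch in POSIX shell: the function name uninitialised_submodules is made up for illustration, and it assumes the usual .gitmodules layout with one "path = ..." line per submodule.

```shell
# Hypothetical helper: print every submodule path listed in .gitmodules
# whose directory is missing or empty (i.e. not initialised yet).
uninitialised_submodules() {
  repo="${1:-.}"   # path to the checkout, defaults to the current directory
  # Extract the value of each "path = ..." line from .gitmodules.
  sed -n 's/^[[:space:]]*path[[:space:]]*=[[:space:]]*//p' "$repo/.gitmodules" |
  while IFS= read -r sub; do
    # An initialised submodule contains at least a .git entry,
    # so an empty listing means it was never checked out.
    if [ -z "$(ls -A "$repo/$sub" 2>/dev/null)" ]; then
      printf '%s\n' "$sub"
    fi
  done
}
```

If it prints anything (for example libs/vec), running git submodule update --init --recursive from the repo root should populate those directories.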

I hope this helps.

Oh, I see.
The submodules are pinned to the latest Modest master.
And, since I stopped maintaining modest_ex years ago, this is not working anymore.

:frowning:

However, I actually forked Modest back then, and based development on this fork.

First, I would recommend changing the Modest submodule to point to the fork.

Please make this change to .gitmodules:

diff --git a/.gitmodules b/.gitmodules
index ed877d0..189ffb9 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -1,6 +1,6 @@
 [submodule "libs/Modest"]
        path = libs/Modest
-       url = https://github.com/lexborisov/Modest.git
+       url = https://github.com/f34nk/Modest.git
 [submodule "libs/vec"]
        path = libs/vec
        url = https://github.com/rxi/vec.git

Then

cd libs/Modest
git checkout 87d75c95854d7d97a12ba8d78a4c7f553a4eb945

This is the last commit in which I introduced my changes.

But when I run ./configure, I still get errors:

source/mycss/selectors/function_parser.c:469:57: error: cast to smaller integer type 'mycss_selectors_function_drop_type_t' (aka 'enum mycss_selectors_function_drop_type') from 'void *' [-Werror,-Wvoid-pointer-to-enum-cast]
        mycss_selectors_function_drop_type_t drop_val = mycss_selector_value_drop(selector->value);

To solve this, you need to change libs/Modest/Makefile.cfg and add -Wno-int-to-void-pointer-cast -Wno-void-pointer-to-enum-cast.

diff --git a/Makefile.cfg b/Makefile.cfg
index 92d425d..30c2ab1 100644
--- a/Makefile.cfg
+++ b/Makefile.cfg
@@ -12,7 +12,7 @@ PROJECT_VERSION_STRING := $(PROJECT_VERSION_MAJOR).$(PROJECT_VERSION_MINOR).$(PR
 #********************
 # Flags
 #***************
-MODEST_CFLAGS ?= -Wall -Werror -pipe -pedantic
+MODEST_CFLAGS ?= -Wall -Werror -pipe -pedantic -Wno-int-to-void-pointer-cast -Wno-void-pointer-to-enum-cast
 MODEST_LFLAGS ?=

After that, run ./clean and then ./configure. Modest should now compile without errors.
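For anyone who would rather script the two file edits from the diffs above, here is a minimal sketch in POSIX shell. The function names point_modest_at_fork and relax_modest_cflags are made up for illustration, and the sketch assumes .gitmodules and Makefile.cfg still look like the versions shown in the diffs.

```shell
# Hypothetical helper: rewrite the Modest submodule URL in the given
# .gitmodules file so that it points at the f34nk fork.
point_modest_at_fork() {
  gitmodules="$1"
  # -i.bak works with both GNU and BSD (macOS) sed; it leaves a .bak copy behind.
  sed -i.bak \
    's#https://github.com/lexborisov/Modest.git#https://github.com/f34nk/Modest.git#' \
    "$gitmodules"
}

# Hypothetical helper: append the two -Wno-... flags to the
# MODEST_CFLAGS line of the given Makefile.cfg.
relax_modest_cflags() {
  cfg="$1"
  # Do nothing if the flags are already present, so re-running is harmless.
  if grep -q -- '-Wno-void-pointer-to-enum-cast' "$cfg"; then
    return 0
  fi
  # "&" re-inserts the matched line; the new flags are appended after it.
  sed -i.bak \
    's/^MODEST_CFLAGS ?= .*/& -Wno-int-to-void-pointer-cast -Wno-void-pointer-to-enum-cast/' \
    "$cfg"
}

# Possible order of use from the modest_ex root (an assumption, adjust as needed):
#   point_modest_at_fork .gitmodules
#   git submodule sync && git submodule update --init
#   (cd libs/Modest && git checkout 87d75c95854d7d97a12ba8d78a4c7f553a4eb945)
#   relax_modest_cflags libs/Modest/Makefile.cfg
```

Makefile.cfg lives inside the submodule, so it can only be patched after the submodule has been checked out at the fork commit.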

1 Like

Wow, thank you so much for the detailed response! Your directions did enable a clean compile right up until near the end:

/Users/coldham/src/elixir/modest_ex/target/modest_worker/utils/eterm_vec.h:22:10: fatal error: 'erl_interface.h' file not found
#include "erl_interface.h"

This appears to be related to recent deprecations and removal of the older erl_interface components. I might be able to make some more progress here. Simply removing #include "erl_interface.h" changes the errors to

error: unknown type name 'ETERM'
typedef vec_t(ETERM*) vec_eterm_t;

as well as lots of missing function definitions like erl_iolist_to_string, erl_free, erl_print_term and so forth.

I’m having trouble finding documentation or examples for migrating from the older erl_interface APIs to the newest ei APIs. I know erl_interface was deprecated a long time ago and only recently removed.

(Uff, this project is really outdated and obviously, I did not pin any versions.)

I think I used Erlang 19.
But looking at the Travis file, it might also work with other versions.

Then you need to export the ERLANG_PATH variable,
as used here:

Oh look.

It’s in the README under target-dependencies

So would you recommend focusing on

with

as the backend instead? I want to parse an HTML document, fixup some links internally, and write it back out.