External DSLs

StefanHoutzager · February 14, 2018, 10:20am

I was thinking a bit about external DSLs (like UML, BPMN, XSLT, regular expressions, etc.). I agree with the Dutch computer scientist Erik Meijer, who says “External DSLs on the other hand are like puppies, they all start out cute and happy, but without exception turn into vicious beasts as they grow up”.

http://lambda-the-ultimate.org/node/4560

OvermindDL1 · February 14, 2018, 5:27pm

I take exception to that exception. ^.^

Look at Racket: it makes DSLs for everything, yet they remain small and work well.

gon782 · February 14, 2018, 5:35pm

Just to clarify, I think the keyword here is external. It’s things like make, etc., that he’s talking about (as far as I can understand):

Erik Meijer: I fully ascribe to Hudak-style embedded DSL, which really just are well-designed APIs. External DSLs on the other hand are like puppies, they all start out cute and happy, but without exception turn into vicious beasts as they grow up (make, XSLT, regular expressions, …).

While Racket DSLs can certainly change a lot, they can (and do) piggyback on everything that is Racket. You can modify the reader and all that, but fundamentally it’ll all have to be reduced to the same Racket that every other language is running.
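
To make the "embedded DSLs are really just well-designed APIs" idea concrete, here is a minimal sketch in C++ (the language used for the code later in this thread), not anything taken from Racket or from Meijer. The names `Rule`, `rule`, `depends_on`, `run` and `needs` are all made up for illustration. The point is only that the "language" is ordinary host-language data, so the rest of the program can use it without any parsing step.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical embedded "build rule" DSL: nothing but ordinary C++ types.
struct Rule {
    std::string target;
    std::vector<std::string> deps;
    std::string command;

    // Each call returns the rule itself so calls can be chained.
    Rule& depends_on(std::string dep) {
        deps.push_back(std::move(dep));
        return *this;
    }
    Rule& run(std::string cmd) {
        command = std::move(cmd);
        return *this;
    }
};

// Entry point of the DSL: starts a rule for the given target.
inline Rule rule(std::string target) {
    assert(!target.empty() && "a rule needs a target name");
    return Rule{std::move(target), {}, {}};
}

// Because a rule is plain data, the host program can inspect it directly.
inline bool needs(const Rule& r, const std::string& dep) {
    for (const auto& d : r.deps)
        if (d == dep) return true;
    return false;
}
```

Something like `Rule app = rule("app").depends_on("main.o").depends_on("util.o").run("cc -o app main.o util.o");` then reads a bit like a makefile rule, and `needs(app, "util.o")` can query it from any other C++ code.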

OvermindDL1 · February 14, 2018, 5:58pm

Racket’s are external; each has to be in its own specific file with its own #lang ... declaration at the top.

Isn’t that the same for any other DSL, built in whatever language it was built in?

gon782 · February 14, 2018, 8:50pm

No, #lang ... is only a shorthand for (module name initial-import decl ...). You can have several of those in one file and also nest them.

OvermindDL1 · February 14, 2018, 8:52pm

Exactly, it sets up the compiler for a set of macros, read-macros, and more. It can become an entirely different language; just look at Scribble, an entirely different domain-specific language that is LaTeX-y with embedded code. ^.^

In a constrained scope yep, which is why this is so much more useful.

gon782 · February 14, 2018, 8:55pm

My point was that this:

each has to be in its own specific file with its own #lang … declaration at the top.

is false.

Now, I’m not 100% sure what the definition of external is supposed to be and I can’t find much clarity on that, but all languages written in Racket can seamlessly interact with each other. They’re all embedded on the Racket platform, which makes them fundamentally different to, say, make. make runs in its own context without any possibility of interacting with the platform it’s running on or with other things made to run on that platform.

Edit:

What I’m trying to get at is that a language written in Racket that completely modifies the syntax will still be generating requireable definitions for modules written in any other language hosted on the Racket VM. There’s nothing external about it, as it’s all internally consistent within the VM.

You don’t load anything from a makefile unless you specifically write a parser for it. With Racket you don’t have to write the parser because the parser bit is already just a modification to the same reader you’re running and it will load modules that are usable for you out of the box.

OvermindDL1 · February 14, 2018, 9:27pm

Yet make interacts with whatever language it was written in, through whatever calls were exposed, just like languages built in Racket, even if that means shelling out to external programs (if such a call is exposed).

It may expand out to standard Racket, but then you could say that make just runs in an interpreter that just runs calls to machine code. So what defines external? Perhaps it is only pipes/IP or so?

make is designed to shell out to things to interact with them, and it does so splendidly. Such a DSL in Racket could do the same, with no other possible interactions as well; it depends on how the DSL works and is designed. You could implement make as a DSL in Racket without too much difficulty, and it would work and act identically.

gon782 · February 14, 2018, 9:44pm

Of course you could. The point isn’t whether or not you can make it, but rather what you get out of that process.

You make a language called Obstinate on the Racket VM.
You create a module called tired using Obstinate.
You import the definitions from tired in a module called getting-somewhere defining a language called Going.
You use Going to define a module called are-we-there-yet.
You import are-we-there-yet in a “normal” Racket module

None of the above need to share a reader, yet transparently all their data and definitions can be shared amongst each other. They’re all embedded DSLs because they interact seamlessly with anything else.

You have a makefile and you want to pull metadata out of that makefile as well as use the result of it.
You can’t pull any of the data from the makefile unless you have/write a parser for it, so you end up just using the bindings that one of the make rules creates.

You have a CSS file and you want to pull out something from it, or get data about calculations that the CSS file contains behind the scenes.
You can pull out data if you have a parser, but there is no data to pull out that is not literally specified in the file.
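
For the makefile scenario, a rough sketch of the kind of parser you would have to write yourself just to get at the dependency metadata could look like the following. It is written in C++ purely for illustration; `DepGraph`, `parse_rules` and `check_example` are hypothetical names, and it only understands plain `target: dep dep` lines. Variables, pattern rules, includes and everything else that real make supports are not handled, which is rather the point.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Maps each target to the prerequisites listed for it.
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Parses only lines of the form "target: dep1 dep2 ...".
// Recipe lines (starting with a tab), '#' comments and blank lines are skipped.
// Anything else that contains a ':' (e.g. "CC := gcc") is misread as a rule.
DepGraph parse_rules(const std::string& text) {
    DepGraph graph;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty() || line[0] == '\t' || line[0] == '#') continue;
        const auto colon = line.find(':');
        if (colon == std::string::npos) continue;

        std::istringstream target_part(line.substr(0, colon));
        std::string target;
        if (!(target_part >> target)) continue;  // nothing before the ':'

        std::istringstream deps(line.substr(colon + 1));
        std::vector<std::string>& out = graph[target];
        for (std::string dep; deps >> dep;) out.push_back(dep);
    }
    return graph;
}

// Small usage example with a sanity check.
inline void check_example() {
    const DepGraph g = parse_rules("app: main.o util.o\n\tcc -o app main.o util.o\n");
    assert(g.at("app").size() == 2);
}
```

Even this toy version only recovers what is literally written in the file; nothing that make computes while running is available to the calling program.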

Have you just not used Racket or do you honestly not see any difference between these scenarios? I think there’s a pretty clear difference between these cases.

StefanHoutzager · February 17, 2018, 6:13am

Another one on external DSLs:

There’s a natural tension for any domain-specific language between staying highly focused on its problem domain and growing to accommodate the needs of users who want to stretch the language in new directions. [...] as the language grows, the rationale for using a domain specific language rather than a general purpose language becomes more and more diluted.

cdegroot · February 17, 2018, 5:31pm

Yeah, I think that this discussion should’ve started there ;-). Going by Fowler, as long as you don’t go beyond macros you’re certainly internal in Elixir; if it runs in a separate process with a separate parser/interpreter (his reference to Unix DSLs: sed, awk, make, …) it’s certainly external. Now what about something I cobble together with (if they existed) LFE reader macros that compiles down to BEAM bytecode? I dunno.

Regardless of semantics, I do tend to think that if (e)DSLs are perceived to be dangerous, we must be doing something wrong - writing a language that’s problem specific and then solving your problem in that language sounds like an excellent idea. Insert obligatory reference to the research that Kay and friends are/have been doing.

gon782 · February 17, 2018, 5:57pm

As you alluded to, different Racket languages have different readers, but they’re running the exact same compiler and can be used together in the same compile, as dependencies of each other. It is the same thing as having an inline eDSL, except that how to parse the code is determined at the module level, so it’s as if we had something like @compile :lang_name at the top of an Elixir module and that told the compiler to check for rules associated with the language specified.

You’re running the same compiler, everything can be done in the same pass as if this was a normal Elixir module and the code it generates behaves the same way on the BEAM. It’s as internal as it gets; it’s just way more powerful than just having macros.

cdegroot · February 17, 2018, 6:08pm

Well… the gist of an eDSL, to me, is not which bytecode it compiles to or how easily the toolchain can be invoked. It’s the fact that you now are maintaining a lexer/parser and probably a run-time library or at least bindings to a standard library (like sed, awk, … have bindings to various C run-time library functions). The invocation of a Racket language is very simple; the implementation, I’m sure, of say Python-in-Racket less so. I think that this is why people warn against eDSLs, and from that point of view one might argue that the DSL is external rather than internal.

gon782 · February 17, 2018, 9:05pm

In the vast majority of cases, no. Even for the more interesting readers you have tooling that does most of the work for you and you’re not really maintaining much. In even more cases you’ll simply choose a reader (S-expressions, @-expressions, etc.) and you simply supply the actual functions for that language, which is exactly the thing you’d be doing with only macros. Scribble is a good example of this. It’s a complete language for writing documentation and it uses the @-reader. Anyone can make a language using the same reader just by specifying that it uses that one.

You’re running on the Racket VM with that run-time, there is nothing to maintain except your internal functions and macros, which is a matter of importing the right things from the Racket libraries (and excluding + replacing the things that differ).

Implementing Python in Racket isn’t necessarily a walk in the park for sure, but we weren’t even originally talking about that. Scribble was brought up as an example and while I’m sure making Scribble was not an easy task originally, the tooling that exists today in Racket allows you to implement something like that with less push-back than using external tooling for writing documentation, provided you want the same functionality (seamless linking to library identifiers, etc.), for sure.

No matter what one might argue about classification, I think it’s a matter of simply making a language that runs on the Racket VM and realizing that the amount of actual maintenance and hardship that’s been shed by that is significant enough that it can’t be likened to implementing it standalone, where the utility is also far lower in terms of how you can use it in unison with other code you have (on the Racket VM).

cdegroot · February 17, 2018, 9:44pm

I just looked at Scribble’s source code; it seems to be a pretty beefy package. And whether the tooling is good or not is, of course, not the discussion: you will end up having to maintain the reader/lexer/parser (whatever you call it; something that is built to parse sexps is not magically going to parse something that’s in a completely different format). I’m not making calls on how much work it is, I’m just saying it is extra ballast you need to carry around when you implement a new language.

I think the thread was generally about eDSLs vs iDSLs and started with a quote saying that eDSLs were seen as generally bad, and then went on with some confusion about the difference.

gon782 · February 17, 2018, 10:49pm

Even if you do modify the read-table, it’s a very straightforward procedure and the majority of the work is being done on the Racket end regardless. It’s an issue of simply implementing a few functions and telling Racket that it should use those instead of its standard ones. For the majority of syntax changes you could want, the reader code you will have to write is minimal, if you even have to write any at all. The at-reader, as I said before, is already included with the language, so for Scribble’s syntax you’d have to write exactly zero lines of code to modify the reader. So no, it’s not a given that implementing a language in Racket means carrying all this extra ballast.

A similar assumption would be to say that because you can create a query language with very clear rules about precedence of keywords and whatnot in Elixir, all macro writing means that you will have to do this. Just because it’s possible that you might, doesn’t mean that you always will.

Being that the entire point of DSLs is that you want less resistance, it’s likely that you wouldn’t want to do extra work for nothing, and likely you’d use a default reader and implement your actual language in terms of functionality. Some might argue that’s not a new language and I think that’s the entire point of this: The line between a new language and simply a DSL is blurred here, so much so that these classifications are largely useless when pointed at Racket.

Tons of languages are simply about replacing keywords, changing the behavior of one or two basic forms and then providing functions and macros to solve a task better, i.e. exactly what an eDSL is. The only difference here is that Racket also extends that behavior to languages with entirely different read-tables, because it’s pluggable. Some of those read-tables (ones that change the syntax entirely) are included in the language distribution, some are not. You don’t have to use the ones that change things.

It very clearly veered into the direction of debating what Racket VM languages were. By the way, eDSL means embedded DSL, not external.

In the end, I’m not at all concerned about hypotheticals. The reality is that building a language in Racket is entirely different from what the classic process would be. Most people have no idea, but much like engine programmers who’ve never touched anything but C and C++ can’t believe Erlang allows you to handle multi-threading much more easily than in those languages, people will assume the same about Racket and creating languages.

ShalokShalom · February 18, 2018, 1:48am

Which tools do you mean here specifically?

Thanks a lot, Gonz, for the very comprehensive and, imho, competent clarification.

OvermindDL1 · February 19, 2018, 5:56pm

Uh…

(These are all entirely valid and proper C/C++)

Using Boost.Fiber (C++):

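#include <boost/fiber/all.hpp>
#include <functional>
#include <iostream>
#include <string>
#include <variant>

namespace fibers = boost::fibers;

// Message type; add whatever else you need to the variant.
using message = std::variant<int, float, std::string>;

// Erlang'ism names!
using process = fibers::fiber;
using pid = fibers::buffered_channel<message>;

// Channels are not copyable, so both mailboxes are passed by reference.
void pinger(pid &self, std::string name, pid &ping_to) {
  message m;
  while (fibers::channel_op_status::success == self.pop(m)) {
    std::cout << name << ": " << std::get<std::string>(m) << std::endl;
    ping_to.push(m);
  }
}

int main() {
  pid joe{16}; // buffered_channel capacity must be a power of two
  pid sam{16};
  process p_joe(std::bind(pinger, std::ref(joe), "joe", std::ref(sam)));
  process p_sam(std::bind(pinger, std::ref(sam), "sam", std::ref(joe)));
  // Kick off the infinite ping loop
  joe.push(message{std::string{"ping!"}});
  p_joe.join();
  p_sam.join();
  return 0; // Never reaches here as the pingers ping forever
}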

Or perhaps something C-like (this has channels like the above too, I’m just being lazy since I already showed them above)?

coroutine void worker(const char *text) {
    while(1) {
        printf("%s\n", text);
        msleep(now() + random() % 500);
    }
}

int main() {
    go(worker("Hello!"));
    go(worker("World!"));
    msleep(now() + 5000);
    return 0;
}

And there are tons and tons and tons of other formats. C/C++ have some excellent multi-processing libraries; just because std::thread is super low-level doesn’t mean everything is.
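
For comparison, here is roughly what the same mailbox idea looks like with nothing but the standard library, as a sketch rather than a recommendation. It assumes C++17 (for `std::optional`), and `Channel`, `pinger` and `ping_pong` are made-up names, not part of any library. Unlike the fiber version above, it uses real OS threads and stops after a fixed number of hops.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// Minimal unbounded channel: push never blocks, pop blocks until a value
// arrives or the channel has been closed and drained.
template <typename T>
class Channel {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        ready_.notify_one();
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Returns std::nullopt once the channel is closed and empty.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// Receives a hop counter, counts the message, and passes it on decremented.
// Whoever receives 0 closes both mailboxes so both loops end.
void pinger(Channel<int>& self, Channel<int>& peer, int& handled) {
    while (auto hops = self.pop()) {
        ++handled;
        if (*hops == 0) {
            peer.close();
            self.close();
            break;
        }
        peer.push(*hops - 1);
    }
}

// Runs the exchange and returns how many messages each side handled.
std::pair<int, int> ping_pong(int hops) {
    assert(hops >= 0);
    Channel<int> joe, sam;
    int joe_handled = 0, sam_handled = 0;
    std::thread t_joe(pinger, std::ref(joe), std::ref(sam), std::ref(joe_handled));
    std::thread t_sam(pinger, std::ref(sam), std::ref(joe), std::ref(sam_handled));
    joe.push(hops);  // kick off the exchange
    t_joe.join();
    t_sam.join();
    return {joe_handled, sam_handled};
}
```

Calling `ping_pong(3)` bounces the counter 3, 2, 1, 0 between the two threads. It is noticeably more plumbing than the library versions, which is the low-level part, but none of it is exotic.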

EDIT: Wow these forums fail on a lot of syntax parsing/coloring… >.>
Need more addons!