StructuredIO, an Elixir API for consuming structured input streams

StructuredIO is a Hex package. What problem does it solve? In a nutshell, it simplifies working with structured input (such as markup or binary data) that streams from another process or computer. In such situations your application needs to tolerate input arriving piecemeal rather than in complete, well-formed data elements. Input fragments may even split multibyte characters and thereby cause encoding errors.

How do you tame this incidental complexity so that your application can just focus on using data elements instead of the error-prone tedium of reassembling them from fragments? StructuredIO can help.

There are two main features of this library.

  1. It provides a stateful process with a writer function for writing binary data. (If that sounds to you like the IO module in the Elixir standard library, you’re right.)
  2. It provides a variety of reader functions for conditional reading according to a specified data structure. If a complete data structure has not (yet) been written to the process, nothing is read.

The combination of these two features makes it easy for your application to allow data to trickle into a StructuredIO process, and to consume data elements only as each arrives in its entirety.
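
To give a feel for the API, here is a rough sketch of the intended usage, modeled on the kind of example the readme shows. The exact function names (for instance StructuredIO.read_between/3) and return values are assumptions taken from the readme and may differ between releases:

```elixir
# Sketch only: function names and return values are assumptions based on the
# readme and may differ from the current release.
{:ok, io} = StructuredIO.start_link()

# A fragment arrives. The element is not yet complete, so nothing is read.
StructuredIO.write(io, "<elem>fo")
StructuredIO.read_between(io, "<elem>", "</elem>")  # => ""

# The rest of the element arrives. Now the complete element can be read.
StructuredIO.write(io, "o</elem>")
StructuredIO.read_between(io, "<elem>", "</elem>")  # => "foo"
```

The first read returns nothing because only a fragment has been written; once the closing delimiter arrives, the whole element is read at once.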

You’ll find detailed examples in the project readme as well as in the project API reference.


Huh, that’s a cool design. It sounds kind of like gen_tcp with a parsing step (and no TCP). :slight_smile:


Thanks! Yes, I’m now working on adding parsers that use the building blocks of the existing API. So there will be one-liner streaming access to well-known wire formats such as JSON, XML, CSV, BER-TLV, DER, and ASN.1.


Whooo cool. :slight_smile:


It would be nice if this could be used as a Framing component within a Flow. But the more I look at Flow, the less straightforward that appears.

Does Flow not have something that allows custom GenStages (or sub-Flows) to be composed? I hope I’m just missing something.

That is an intriguing question that I need to investigate.

@CptnKirk, does GenStage presume that published events are atomic data elements rather than possibly fragmentary ones? Can a stage aggregate multiple events from its producer into a single event for its consumer? This is the main problem StructuredIO solves: declarative accumulation of data-element fragments until a full element is available for publication.

Yes. A GenStage can accumulate data and emit events downstream in a new format.
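
For example, a producer_consumer stage can buffer incoming binary fragments and emit only complete elements downstream. Here is a rough sketch; it is not part of StructuredIO, and the module name and the newline framing rule are invented purely for illustration:

```elixir
defmodule FramingStage do
  use GenStage

  def start_link(opts \\ []) do
    GenStage.start_link(__MODULE__, :ok, opts)
  end

  @impl true
  def init(:ok) do
    # The state is the buffer of bytes not yet emitted as complete elements.
    {:producer_consumer, ""}
  end

  @impl true
  def handle_events(fragments, _from, buffer) do
    # Append incoming fragments to the buffer, then split off every complete
    # newline-terminated element; the trailing remainder stays buffered.
    data = Enum.reduce(fragments, buffer, &(&2 <> &1))
    {complete, rest} = split_complete(data)
    {:noreply, complete, rest}
  end

  defp split_complete(data) do
    case :binary.split(data, "\n", [:global]) do
      # No delimiter yet: nothing to emit, keep buffering.
      [incomplete] ->
        {[], incomplete}

      # Everything before the last (possibly empty) remainder is complete.
      parts ->
        {elements, [rest]} = Enum.split(parts, -1)
        {elements, rest}
    end
  end
end
```

A stage like this slots between a producer of raw fragments and any consumer that expects whole elements.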

This would be a valuable addition to GenStage pipelines.

My thinking was that Flow would be the high-level API for constructing GenStage networks, but that doesn’t actually appear to be the case.

Just wanted to give my +1 to the idea of the library. It sounds well thought out and useful, and I’ll keep it in mind for the future. :slight_smile:


I’ll look into the Flow connection further. I need to wrap my head around it, because it still seems far afield from the reason I started StructuredIO: I needed a TCP buffer with a set of functions for conditionally reading buffered content (the standard library’s IO reads unconditionally).
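
To make that motivation concrete, here is a hypothetical sketch of the TCP-buffer use: bytes arriving from a gen_tcp socket in active mode are written into a StructuredIO process, and complete newline-terminated lines are read out as they become available. The StructuredIO calls (read_through/2, and the readers returning an empty binary when nothing complete is buffered) are assumptions based on the readme, so check the API reference for the exact names:

```elixir
defmodule LineReader do
  # Assumes `socket` is a :gen_tcp socket in active mode and `io` is a
  # started StructuredIO process. Names and reader functions are illustrative.
  def run(socket, io) do
    receive do
      {:tcp, ^socket, data} ->
        # Buffer whatever fragment arrived, then drain any complete lines.
        StructuredIO.write(io, data)
        drain(io)
        run(socket, io)

      {:tcp_closed, ^socket} ->
        :ok
    end
  end

  # Read complete lines for as long as they are available; an incomplete
  # trailing fragment stays buffered in the StructuredIO process.
  defp drain(io) do
    case StructuredIO.read_through(io, "\n") do
      "" ->
        :ok

      line ->
        IO.inspect(line, label: "complete line")
        drain(io)
    end
  end
end
```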

But more generally, aren’t you interested in turning a stream of TCP bytes into a stream of structured content? Structure packing is at the core of nearly every streaming protocol.

Unfortunately, I don’t see Flow providing a convenient API for this, although it would still have value as a GenStage. At the very least it would be nice if there were a way for StructuredIO to take on the GenStage behavior, even if it remains available outside of GenStage by default.
