One thing I’ve missed from the Python ecosystem is the ability to connect to a remote Apache Spark cluster from a locally running notebook and run queries and data transformations on the cluster.
I’d like to present a library I created: a native Elixir client for the Spark Connect protocol, with Livebook integration via optional support for the Explorer and Kino libraries.
Current status:
- Most of the protocol is exposed and working (except Python-specific APIs)
- DataFrame and Column API, including queries, joins, transformations, aggregations, reading and writing, and JAR upload
- Catalog API
- Structured Streaming
- Telemetry
- Tested with Spark 3.5, 4.0 and 4.1
Check out the demo notebook, notebooks/spark_ex_demo.livemd, on the main branch of the lukaszsamson/spark_ex repository on GitHub.