Explorer - Series (1D) and dataframes (2D) for fast and elegant data exploration in Elixir

The Explorer library has been out for some time. But we just released the latest version (see below), and we thought we’d start posting updates to the forum.

Wait, Explorer is new to me. What are Series and DataFrames?

Explorer is a DataFrame library for Elixir.

DataFrame libraries are common in languages that focus on data manipulation.

If you’d like a more in-depth tutorial, there’s an excellent LiveBook called Ten Minutes to Explorer that you can play with:

But we’ll provide a quick overview here.

Briefly, you can think of a DataFrame like an in-memory table. Its purpose is to facilitate common data exploration and analysis tasks. As such, it’s a column-oriented table.

Column-oriented tables

If you’re unfamiliar with column-oriented tables, suppose you have a table of pet data like this:

type  age  color
cat   5    black
dog   2    brown
dog   3    brindle

A row-oriented organization of that data might look like this in Elixir:

rows = [
  [type: "cat", age: 5, color: "black"],
  [type: "dog", age: 2, color: "brown"],
  [type: "dog", age: 3, color: "brindle"]
]

It matches the original table fairly one-to-one. But the column-oriented version might instead look like:

columns = [
  type: ["cat", "dog", "dog"],
  age: [5, 2, 3],
  color: ["black", "brown", "brindle"]
]

It has the same information, but “transposed”.

Column-orientation is beneficial if you’re asking questions that require a lot of number-crunching like “What’s the average age of all pets?”. In the row-oriented version, finding the average age would require first looking through the entire contents of the table to collect the relevant data. But in the column-oriented version, those values have already been co-located in memory.
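To make the difference concrete, here is a minimal sketch in plain Elixir (no Explorer needed). The module name `PetStats` and its two functions are made up for illustration. Note that Elixir lists are linked lists, so this only mirrors the access pattern; it does not reproduce the contiguous memory layout a real columnar backend gives you.

```elixir
defmodule PetStats do
  # Row-oriented: visit every row and pull out the one field we care about.
  def mean_age_from_rows(rows) do
    ages = Enum.map(rows, fn row -> Keyword.fetch!(row, :age) end)
    Enum.sum(ages) / length(ages)
  end

  # Column-oriented: the ages are already stored together under one key.
  def mean_age_from_columns(columns) do
    ages = Keyword.fetch!(columns, :age)
    Enum.sum(ages) / length(ages)
  end
end

rows = [
  [type: "cat", age: 5, color: "black"],
  [type: "dog", age: 2, color: "brown"],
  [type: "dog", age: 3, color: "brindle"]
]

columns = [
  type: ["cat", "dog", "dog"],
  age: [5, 2, 3],
  color: ["black", "brown", "brindle"]
]

IO.inspect(PetStats.mean_age_from_rows(rows), label: "from rows")
IO.inspect(PetStats.mean_age_from_columns(columns), label: "from columns")
```

Both calls give the same answer; the row version just has to touch every field of every row on the way there.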

Series and DataFrames: columns and tables

In dataframe parlance, a “series” is a single column and a “dataframe” is a collection of named series, aka a table.

Our example above would look like this:

type = Explorer.Series.from_list(["cat", "dog", "dog"])
age = Explorer.Series.from_list([5, 2, 3])
color = Explorer.Series.from_list(["black", "brown", "brindle"])

df = Explorer.DataFrame.new(type: type, age: age, color: color)
# #Explorer.DataFrame<
#   Polars[3 x 3]
#   type string ["cat", "dog", "dog"]
#   age s64 [5, 2, 3]
#   color string ["black", "brown", "brindle"]
# >

Some things to note:

  • Each series has a corresponding data type or “dtype”, e.g. type has the dtype string.
  • The word “Polars” appears. That indicates that this dataframe is using the backend powered by the fantastic Polars library (the default backend).

And if we really did want to know the average age of the pets, that would look like this:

Explorer.Series.mean(df["age"])
# 3.3333333333333335

Features and design

Preliminaries out of the way, here are Explorer’s high-level features:

  • Simply typed series: :binary, :boolean, :category, :date, :datetime, :duration, floats of 32 and 64 bits ({:f, size}), integers of 8, 16, 32 and 64 bits ({:s, size}, {:u, size}), :null, :string, :time, :list, and :struct.

  • A powerful but constrained and opinionated API, so you spend less time looking for the right function and more time doing data manipulation.

  • Support for CSV, Parquet, NDJSON, and Arrow IPC formats

  • Integration with external databases via ADBC and direct connection to file storages such as S3

  • Pluggable backends, providing a uniform API whether you’re working in-memory or (forthcoming) on remote databases or even Spark dataframes.

  • The first (and default) backend is based on NIF bindings to the blazing-fast polars library.

The API is heavily influenced by Tidy Data and borrows much of its design from dplyr.
The philosophy is heavily influenced by this passage from dplyr’s documentation:

  • By constraining your options, it helps you think about your data manipulation challenges.

  • It provides simple “verbs”, functions that correspond to the most common data manipulation tasks, to help you translate your thoughts into code.

  • It uses efficient backends, so you spend less time waiting for the computer.

The aim here isn’t to have the fastest dataframe library around (though it certainly helps that we’re building on Polars, one of the fastest).
Instead, we’re aiming to bridge the best of many worlds:

  • the elegance of dplyr
  • the speed of polars
  • the joy of Elixir

That means you can expect the guiding principles to be ‘Elixir-ish’. For example, you won’t see the underlying data mutated, even if that’s the most efficient implementation. Explorer functions will always return a new dataframe or series.
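As a small illustration of the “no mutation” principle, the sketch below adds one to a column and then checks that the original dataframe is untouched. It assumes Explorer is installed through `Mix.install/1` in a script or Livebook; the variable names are arbitrary.

```elixir
Mix.install([:explorer])

# mutate/2 is a macro, so the module has to be required.
require Explorer.DataFrame, as: DF
alias Explorer.Series

df = DF.new(type: ["cat", "dog", "dog"], age: [5, 2, 3])

# Returns a new dataframe; `df` itself is left as it was.
older = DF.mutate(df, age: age + 1)

IO.inspect(Series.to_list(df["age"]), label: "original")
IO.inspect(Series.to_list(older["age"]), label: "after mutate")
```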

Links:

Acknowledgements

Explorer is an extensive library and there’s much more we could say. But for now, we’d just like to thank the dozens of contributors who’ve added wonderful improvements over the years. :heart:


Explorer - version 0.8

Explorer has released version 0.8!

Added

  • Add explode/2 to Explorer.DataFrame. This function is useful to expand the contents of a {:list, inner_dtype} series into an “inner_dtype” series.

  • Add the new series functions all?/1 and any?/1, to work with boolean series.

  • Add support for the “struct” dtype. This new dtype represents the struct dtype from Polars/Arrow.

  • Add map/2 and map_with/2 to the Explorer.Series module.
    This change enables the usage of the Explorer.Query features in a series.

  • Add sort_by/2 and sort_with/2 to the Explorer.Series module.
    This change enables the usage of the lazy computations and the Explorer.Query module.

  • Add unnest/2 to Explorer.DataFrame. It works by taking the fields of a “struct” - the new dtype - and transforming them into columns.

  • Add pairwise correlation - Explorer.DataFrame.correlation/2 - to calculate the correlation between numeric columns inside a data frame.

  • Add pairwise covariance - Explorer.DataFrame.covariance/2 - to calculate the covariance between numeric columns inside a data frame.

  • Add support for more integer dtypes. This change introduces new signed and unsigned integer dtypes:

    • {:s, 8}, {:s, 16}, {:s, 32}
    • {:u, 8}, {:u, 16}, {:u, 32}, {:u, 64}.

    The existing :integer dtype is now represented as {:s, 64}, and it’s still the default dtype for integers. But series and data frames can now work with the new dtypes. Short names for these new dtypes can be used in functions like Explorer.Series.from_list/2. For example, {:u, 32} can be represented with the atom :u32.

    This may bring more interoperability with Nx, and with Arrow related things, like ADBC and Parquet.

  • Add ewm_standard_deviation/2 and ewm_variance/2 to Explorer.Series.
    They calculate the “exponentially weighted moving” variance and standard deviation.

  • Add support for :skip_rows_after_header option for the CSV reader functions.

  • Support {:list, numeric_dtype} for Explorer.Series.frequencies/1.

  • Support pins in cond, inside the context of Explorer.Query.

  • Introduce the :null dtype. This is a special dtype from Polars and Apache Arrow to represent “all null” series.

  • Add Explorer.DataFrame.transpose/2 to transpose a data frame.
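For the new integer dtypes, a quick sketch of what the release notes describe: integers still default to {:s, 64}, and a short name such as :u32 can be passed through the :dtype option. This assumes Explorer 0.8 or later installed via `Mix.install/1`; the variable names are arbitrary.

```elixir
Mix.install([{:explorer, "~> 0.8"}])

alias Explorer.Series

# No :dtype given, so integers fall back to the default signed 64-bit dtype.
default = Series.from_list([1, 2, 3])

# The short name :u32 stands for {:u, 32}.
small = Series.from_list([1, 2, 3], dtype: :u32)

IO.inspect(Series.dtype(default), label: "default dtype")
IO.inspect(Series.dtype(small), label: "requested dtype")
```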

Changed

  • Rename the functions related to sorting/arranging of the Explorer.DataFrame.
    Now arrange_with is named sort_with, and arrange is sort_by.

    sort_by/3 is a macro that works through the Explorer.Query module, whereas sort_with/2 takes a callback function.

  • Remove unnecessary casts to {:s, 64} now that we support more integer dtypes.
    It affects some functions, like the following in the Explorer.Series module:

    • argsort
    • count
    • rank
    • day_of_week, day_of_year, week_of_year, month, year, hour, minute, second
    • abs
    • clip
    • lengths
    • slice
    • n_distinct
    • frequencies

    And also some functions from the Explorer.DataFrame module:

    • mutate - mostly because of series changes
    • summarise - mostly because of series changes
    • slice
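To illustrate the renamed sorting functions, the sketch below sorts the same dataframe twice, once with the sort_by macro and once with the sort_with callback. It assumes Explorer 0.8 or later installed via `Mix.install/1`; the sample data is made up.

```elixir
Mix.install([{:explorer, "~> 0.8"}])

# sort_by is a macro, so the module has to be required.
require Explorer.DataFrame, as: DF
alias Explorer.Series

df = DF.new(type: ["cat", "dog", "dog"], age: [5, 2, 3])

# Macro form: column names are written as bare variables inside the query.
by_macro = DF.sort_by(df, desc: age)

# Callback form: the function receives the dataframe and returns the sort spec.
by_callback = DF.sort_with(df, &[desc: &1["age"]])

IO.inspect(Series.to_list(by_macro["age"]), label: "sort_by")
IO.inspect(Series.to_list(by_callback["age"]), label: "sort_with")
```

The callback form is the handier one when the column name is only known at runtime.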

Fixed

  • Fix inspection of series and data frames between nodes.

  • Fix cast of :string series to {:datetime, any()}

  • Fix mismatched types in Explorer.Series.pow/2, making it more consistent.

  • Normalize sorting options.

  • Fix functions with dtype mismatching the result from Polars.
    This fix is affecting the following functions:

    • quantile/2 in the context of a lazy series
    • mode/1 inside a summarisation
    • strftime/2 in the context of a lazy series
    • mutate_with/2 when creating a column from a NaiveDateTime or Explorer.Duration.

Contributors

Thank you to everyone who opened a PR:

And thank you to the first-time contributors!:

Changelogs

Full Changelog: Comparing v0.7.2...v0.8.0 · elixir-explorer/explorer · GitHub
Official Changelog: Changelog — Explorer v0.8.0


[Blog] Explorer 0.8: The dtype release


[Release] Explorer - v0.9

Added

  • Add initial support for SQL queries.

    The Explorer.DataFrame.sql/3 is a function that accepts a dataframe and a SQL query. The SQL is not validated by Explorer, so the queries will be backend dependent. Right now we have only Polars as the backend.

  • Add support for remote series and dataframes.

    Automatically transfer data between nodes for remote series and dataframes and perform distributed garbage collection.

    The functions in Explorer.DataFrame and Explorer.Series will automatically move operations on remote dataframes to the nodes they belong to.
    The Explorer.Remote module provides additional conveniences for manual placement.

  • Add FLAME integration, so we automatically track remote series and dataframes returned from FLAME calls when the :track_resources option is enabled.
    See FLAME for more.

  • Add Explorer.DataFrame.transform/3 that applies an Elixir function to each row. This function is similar to Explorer.Series.transform/2, and as such, it’s considered an expensive operation. So it’s recommended only if there is no similar dataframe or series operation available.

  • Improve performance of Explorer.Series.from_list/2 for most of the cases where the :dtype option is given. This is especially true when the dtype is :binary.

Changed

  • Stop inference of dtypes if the :dtype option is given by the user.
    The main goal of this change is to improve performance. We are now delegating the job of decoding the terms as the given :dtype to the backend.

  • Explorer.Series.pow/2 no longer casts to float when the exponent is a signed integer. We are following the way Polars works now, which is to try to execute the operation or raise an exception in case the exponent is negative.

  • Explorer.DataFrame.pivot_wider/4 no longer includes the names_from column name in the new columns when values_from is a list of columns. This is more consistent with its behaviour when values_from is a single column.

  • Explorer.Series.substring/3 no longer cycles to the end of the string if the negative offset surpasses the beginning of that string. In that case, an empty string is returned.

  • The Explorer.Series.ewm_* functions no longer replace nil values with the value at the previous index. They now propagate nil values through to the result series.

  • Saving a dataframe as a Parquet file to S3 services no longer works when streaming is enabled. This is temporary due to a bug in Polars. An exception should be raised instead.

Contributors

And thank you to the first-time contributors!:

Changelogs


[Release] Explorer - v0.10

Added

  • Add support for the decimals data type.

    Decimals dtypes are represented by the {:decimal, precision, scale} tuple,
    where precision can be a positive integer from 0 to 38, and is the maximum number
    of digits that can be represented by the decimal. The scale is the number of
    digits after the decimal point.

    With this addition, we also added the :decimal package as a new dependency.
    The Explorer.Series.from_list/2 function accepts decimal numbers from that
    package as values - %Decimal{}.

    This version has a small number of operations, but is a good foundation.

  • Allow the usage of queries and lazy series outside callbacks and macros.
    This is an improvement to functions that were originally designed to accept callbacks.
    With this change you can now reuse lazy series across different “queries”.
    See the Explorer.Query docs for details.

    The affected functions are:

    • Explorer.DataFrame.filter_with/2
    • Explorer.DataFrame.mutate_with/2
    • Explorer.DataFrame.sort_with/2
    • Explorer.DataFrame.summarise_with/2
  • Allow accessing the dataframe inside query.

  • Add “lazy read” support for Parquet and NDJSON from HTTP(s).

  • Expose more options for Explorer.Series.cut/3 and Explorer.Series.qcut/3.
    These options were available in Polars, but not in our APIs.

Fixed

  • Fix creation of series where a nil value inside a list - for a {:list, any()} dtype -
    could result in an incompatible dtype. This fix will prevent panics for list of lists with
    nil entries.

  • Fix Explorer.DataFrame.dump_ndjson/2 when date time is in use.

  • Fix Explorer.Series.product/1 for lazy series.

  • Accept %FSS.HTTP.Entry{} structs in functions like Explorer.DataFrame.from_parquet/2.

  • Fix encode of binaries to terms from series of the {:struct, any()} dtype.
    In case the inner fields of the struct had any binary (:binary dtype), it was
    causing a panic.

Changed

  • Change the defaults of the functions Explorer.Series.cut/3 and Explorer.Series.qcut/3
    to not have “break points” column in the resultant dataframe.
    So the :include_breaks is now false by default.

Contributors

New Contributors

Changelogs


I have a question regarding the development of new “primitive” functions.

I want to write functions that sample vectors (series?) of values from probability distributions. I’m most interested in the Beta and Dirichlet distributions.

I’ve thought of a format that handles both univariate (e.g. Beta) and multivariate (e.g. Dirichlet) distributions.

For univariate distributions it would be something like this:

DataFrame.draw_from_beta(a, b, n) = DataFrame.new(draws: [1,2,3,..., n], values: [...])

For multivariate distributions it would be something like this:

# For length(alphas) == 4
DataFrame.draw_from_dirichlet(alphas, n) =
  DataFrame.new(
    draws: [1,1,1,1,1,2,2,2,2, ... , n,n,n,n], # length = 4 * n
    i: [1,2,3,4, 1,2,3,4, ...], # length = 4 * n
    values: [...] # length = 4 * n
   )

Using a dataframe instead of a series allows me to have a more or less uniform encoding for both univariate and multivariate distributions that plays well with the Explorer functions.

In order to implement this efficiently, I am building the dataframe fully in rust using a rust function and binding it to the Native module.

My questions are:

  1. Is there any way of implementing this outside of Explorer? I believe there isn’t, because the Rust types are defined inside Explorer’s source tree

  2. Is there any way the base types could be split into a rust crate so that they could be used in different elixir packages? I understand it’s probably more work than it’s worth

  3. Would Explorer accept the inclusion of these functions as part of a larger group of statistical functions (random variable sampling, CDF and PDF evaluation, etc.)

  4. If so, should those functions live in the Explorer.DataFrame module (because they generate dataframes) or in a new Explorer.Distribution module or something (like Explorer.Math or Explorer.Statistics)

Some final thoughts: I wonder if dataframes are the right API for this, but the truth is that Elixir doesn’t have bindings to a good array library, and even if it did, representing multidimensional arrays as columns in a dataframe indexed by multiple columns doesn’t seem that terrible, and it might lead to an API which is actually better than some messy things I’ve had to do with NumPy arrays.


I wonder if you could implement these functions in Nx? This means you get to implement them in Elixir and they should be highly efficient as well.

Although we discussed ideas for exposing an Elixir API that would allow you to pass user defined functions, implemented in Rust, to Polars, so you can extend them.


I wonder if you could implement these functions in Nx?

I’ve just checked and Nx does provide a function to generate uniform random numbers between 0 and 1, so yeah, I could probably sample from whatever distributions I wanted. However, I have never actually been able to make Nx work with the EXLA compiler (or with any other compiler backend, IIRC). I didn’t actually try that hard, though.

I can try to make it work again, I guess…

Although we discussed ideas for exposing an Elixir API that would allow you to pass user defined functions, implemented in Rust, to Polars, so you can extend them.

How would that work? Would you use the C-ABI to interface between dynamically linked rust libraries from different Elixir packages?

How would [exposing an Elixir API that would allow you to pass user defined functions] work?

Polars has a plugin feature. IIRC the plan was to try and piggyback off that. But we’d not worked out the details.

I’ll also echo that this sounds more like a case for Nx.Tensors. I would start by trying to implement draw_from_beta, draw_from_dirichlet, etc. as defns as that seems like the more natural setting to me.


Hm… Maybe, I don’t know… Drawing random numbers and putting them in a dataframe for further analysis (let’s say those numbers are posterior samples of a parameter in the context of a Bayesian analysis) seems like quite a natural thing to do.

But I’ll give Nx a try, anyway

I still can’t get the EXLA compiler to run on my system (64bit WSL running on Windows), so I went ahead and implemented the random distributions in a fork. I’ll polish it a bit and make a case for inclusion in Explorer.

I think Explorer should also support KDE for random samples, with decent bandwidth estimations. I have successfully implemented KDEs using Silverman’s rule for bandwidth estimation in Elixir (on top of Explorer, of course), and it’s quite performant, but it’s important to support better methods like the improved Sheather-Jones method, which definitely needs to be implemented in “raw” Rust with access to sophisticated Rust packages.

Having good support for KDEs (or for binning in histograms) is important for plotting random distributions.
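For readers who haven’t met it, here is a rough sketch of a Gaussian KDE with Silverman’s rule-of-thumb bandwidth, h = 0.9 · min(σ, IQR / 1.34) · n^(-1/5). It is written in plain Elixir over lists rather than on top of Explorer, so it is not the implementation mentioned above; the module name `KdeSketch`, its functions, and the sample values are all made up for illustration. It does not handle degenerate samples where the spread is zero.

```elixir
defmodule KdeSketch do
  # Sample standard deviation (n - 1 in the denominator).
  def std(xs) do
    n = length(xs)
    mean = Enum.sum(xs) / n
    sum_sq = xs |> Enum.map(fn x -> (x - mean) * (x - mean) end) |> Enum.sum()
    :math.sqrt(sum_sq / (n - 1))
  end

  # Quantile with linear interpolation between the two nearest order statistics.
  def quantile(xs, q) do
    sorted = Enum.sort(xs)
    pos = q * (length(sorted) - 1)
    lo = floor(pos)
    hi = ceil(pos)
    lo_value = Enum.at(sorted, lo)
    hi_value = Enum.at(sorted, hi)
    lo_value + (hi_value - lo_value) * (pos - lo)
  end

  # Silverman's rule of thumb for the bandwidth.
  def silverman_bandwidth(xs) do
    n = length(xs)
    iqr = quantile(xs, 0.75) - quantile(xs, 0.25)
    spread = min(std(xs), iqr / 1.34)
    0.9 * spread * :math.pow(n, -0.2)
  end

  # Density estimate at point x: the average of Gaussian kernels centred on each sample.
  def density(xs, x, h) do
    total = xs |> Enum.map(fn xi -> gaussian((x - xi) / h) end) |> Enum.sum()
    total / (length(xs) * h)
  end

  defp gaussian(u), do: :math.exp(-0.5 * u * u) / :math.sqrt(2 * :math.pi())
end

samples = [0.98, -0.41, -0.27, 0.14, -0.06, 0.55, -0.47, 1.14, -1.05, -0.49]
h = KdeSketch.silverman_bandwidth(samples)

IO.inspect(h, label: "bandwidth")
IO.inspect(KdeSketch.density(samples, 0.0, h), label: "density at 0.0")
```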

Since the Explorer team doesn’t seem to be interested in adding random variable sampling or advanced scientific functions, I have started my own project inspired by SciPy with the goal of adding efficient computation over multidimensional arrays in Elixir. The code is based on bindings to the Rust library ndarray and supporting packages. I’m linking it here in case anyone finds it interesting.

Forum topic here: SciEx - scientific programming for Elixir (based on bindings to rust's ndarray)

The following snippet shows an example of what’s possible (keep in mind this is in the very early stages):

iex(4)> a = SciEx.Random.draw_from_normal(0.0, 1.0, 10)
#SciEx.F64.Array1<[0.980546308591242, -0.4124865578379766, -0.2678229458353892, 0.13895119159920113, -0.05895938590440134, 0.5463383504051798, -0.47139019152800626, 1.1432875805108502, -1.053279376911445, -0.4890897953025381], shape=[10], strides=[1], layout=CFcf (0xf), const ndim=1>
iex(5)> b = SciEx.Random.draw_from_normal(0.0, 1.0, 10)
#SciEx.F64.Array1<[-0.6112979257481231, 0.04187616378275234, 0.024658043411414837, -0.7423086166282216, 2.5287898587121993, -1.5968438074887625, 0.3028680311098233, 1.1657682958696634, -1.3245816353940532, 1.1693983808728805], shape=[10], strides=[1], layout=CFcf (0xf), const ndim=1>
iex(6)> use SciEx.Operators
SciEx
iex(7)> 0.5 * a + 0.7 * b  
#SciEx.F64.Array1<[0.06236460627193485, -0.17692996427106167, -0.1166508425297042, -0.45014043584015456, 1.7406732081463385, -0.8446214900395438, -0.02368747398712684, 1.3876815973641894, -1.4538468332315597, 0.5740339689597472], shape=[10], strides=[1], layout=CFcf (0xf), const ndim=1>

[Release] Explorer - v0.11.0

Version 0.11.0 has been released!

This one is mostly minor improvements and bugfixes. One notable change is that the DataFrame print format was altered to save vertical space and to hint when rows were hidden.

Before:

iex> Explorer.Datasets.iris() |> Explorer.DataFrame.print()
+-----------------------------------------------------------------------+
|              Explorer DataFrame: [rows: 150, columns: 5]              |
+--------------+-------------+--------------+-------------+-------------+
| sepal_length | sepal_width | petal_length | petal_width |   species   |
|    <f64>     |    <f64>    |    <f64>     |    <f64>    |  <string>   |
+==============+=============+==============+=============+=============+
| 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa |
+--------------+-------------+--------------+-------------+-------------+
| 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa |
+--------------+-------------+--------------+-------------+-------------+
| 4.7          | 3.2         | 1.3          | 0.2         | Iris-setosa |
+--------------+-------------+--------------+-------------+-------------+
| 4.6          | 3.1         | 1.5          | 0.2         | Iris-setosa |
+--------------+-------------+--------------+-------------+-------------+
| 5.0          | 3.6         | 1.4          | 0.2         | Iris-setosa |
+--------------+-------------+--------------+-------------+-------------+

After:

iex> Explorer.Datasets.iris() |> Explorer.DataFrame.print()
+--------------------------------------------------------------------------+
|               Explorer DataFrame: [rows: 150, columns: 5]                |
+--------------+-------------+--------------+-------------+----------------+
| sepal_length | sepal_width | petal_length | petal_width |    species     |
|    <f64>     |    <f64>    |    <f64>     |    <f64>    |    <string>    |
+==============+=============+==============+=============+================+
| 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa    |
| 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa    |
| 4.7          | 3.2         | 1.3          | 0.2         | Iris-setosa    |
| …            | …           | …            | …           | …              |
| 6.2          | 3.4         | 5.4          | 2.3         | Iris-virginica |
| 5.9          | 3.0         | 5.1          | 1.8         | Iris-virginica |
+--------------+-------------+--------------+-------------+----------------+

See here for details on the new format:

Added

  • Explorer.DataFrame.estimated_size/1 - Estimates memory size of a DataFrame
  • Explorer.DataFrame.to_table_string/2 - Represents a DataFrame as a string
    for printing
  • Explorer.Series.degrees/1 - Converts radians to degrees
  • Explorer.Series.radians/1 - Converts degrees to radians
  • :quote_style option to CSV functions

Fixed

  • Fix bug where :region was incorrectly required in %FSS.S3.Entry{}
  • Fix trigonometric functions to not raise on f32
  • Fix warning from :table_rex dependency when printing
  • Fix formatting of Explorer.DataFrame.mutate_with/2 options
  • Explorer.Series.fill_missing/2 now works for all integer and float dtypes
  • Explorer.Series.frequencies/1 now works for {:list, _} dtype
  • Fix inefficiency with categorization
  • Fix typespecs
    • Explorer.DataFrame.select/2
    • Explorer.DataFrame.ungroup/1
    • Explorer.Series functions that may return lazy series

Changed

  • Printing a DataFrame looks different
    • Adds a row of … to indicate there are hidden rows. Includes a new option
      limit_dots: :bottom | :split to specify how to do this.
    • Drops the row separators except when composite dtypes are present.
    • Allows you to pass through valid options to TableRex.render!/2. This
      gives you a little more flexibility in case you don’t like the defaults.
  • Explorer.DataFrame.print/1 now documents its default :limit of 5 rows
  • Explorer.DataFrame.concat_rows/1 has improved error messages
  • Accessing a DataFrame with a range now raises if the range is out of bounds

New Contributors

Full Changelog

The full changelog includes all contributions with individual attributions. Special shoutout to @mhanberg for helping out with some gnarly version wrangling!


[Release] Explorer - v0.11.1

Version 0.11.1 released!

This is a small one, mostly because we broke printing for lazy dataframes :grimacing: (thanks for the patch, @mhanberg!). But it comes with a few improvements too.

Of particular note is that Explorer.DataFrame.group_by is no longer stable by default. Before, groups would always be returned in their original order which is nice but has a non-trivial performance penalty. Now groups are returned in a random order for a performance boost (thanks, @petrkozorezov!). But you can always set group_by(..., stable: true) if stability is needed.
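As a sketch of opting back into stable grouping, the snippet below groups the pet data by type with stable: true and summarises the mean age per group. It assumes Explorer 0.11.1 or later installed via `Mix.install/1`; the data and column names are made up. With the stable option the groups should come back in order of first appearance (cat, then dog); without it the order is not guaranteed.

```elixir
Mix.install([{:explorer, "~> 0.11.1"}])

# summarise/2 is a macro, so the module has to be required.
require Explorer.DataFrame, as: DF
alias Explorer.Series

df = DF.new(type: ["cat", "dog", "dog"], age: [5, 2, 3])

result =
  df
  |> DF.group_by("type", stable: true)
  |> DF.summarise(mean_age: mean(age))

IO.inspect(Series.to_list(result["type"]), label: "groups")
IO.inspect(Series.to_list(result["mean_age"]), label: "mean ages")
```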

Added

  • Explorer.DataFrame.dump_ipc_schema
  • Explorer.DataFrame.dump_ipc_record_batch
  • Explorer.Series.cumulative_count
  • :stable option for Explorer.DataFrame.group_by

Fixed

  • Fix printing lazy data frame with new default print options (as of v0.11.0)
  • Fix mutate docs formatting

New Contributors

Full Changelog

The full changelog includes all contributions with individual attributions.


For my SciEx package I’d like to have “cheap” conversions between series of floats and integers (and complex numbers if Explorer supports such things, which I think it doesn’t) into 1D ndarrays. I don’t expect Explorer to deal with ndarrays internally; that’s completely outside the scope of Explorer. But I wonder if there could be a “portable” way of converting a series into a Rust Vec<f64> or Vec<f32> which could be returned to Elixir as a reference and passed into a Rust function that would accept a Vec<_>. Making a 1D ndarray from a Vec<_> is effectively free. Currently, the only way of getting a Rust Vec<_> for use with SciEx is to convert the Series into an Elixir list of floats and then use Rustler to convert the list of floats into a Vec<_>.

I think one could use the C ABI for that, but I’m not totally sure (?). I’m not sure whether converting an Arrow array into a Vec<_> is cheap or not, but I believe it must be reasonably cheap, or at least cheaper than converting into an Elixir list and from that into a Vec<_>.

Is this something the Explorer team would be interested in?

@tmbb

[…] complex numbers if Explorer supports such things, which I think it doesn’t

No, neither Explorer nor Polars support them. You could “make your own” with the struct dtype, but you’d need to implement all the relevant functions yourself.

Is [exposing the underlying Vec<_>s] something the Explorer team would be interested in?

I would say no at this time. The problem is that this functionality would be highly specific to the Polars backend.

I wonder though, since SciEx and the PolarsBackend are both written with Rustler, could SciEx add a NIF that takes a DataFrame as an argument? Then it would be trivial to convert one into an ndarray.

Another thought: using an external file format rather than routing through Elixir. If SciEx worked with parquet or ipc, then it might be faster to write-to-then-read-from a file.

Ok, fair enough, no complex numbers going back and forth. Which makes sense, given how rarely complex numbers come up in real datasets of the kind Explorer is used to.

I understand, but do you really plan on having different backends for Explorer? I don’t think that’s a likely possibility, and I was assuming you’d want to keep using Polars, despite it being decoupled in practice.

It would only be “trivial” if I can rely on the fact that a dataframe consists of a number of ChunkedArrays, which again is specific to the Polars backend. The only way to convert data from Explorer to SciEx “cheaply” is by assuming something about the layout of the data on the Rust side.

And in any case, one of the problems is that because Polars and SciEx are compiled independently, they may actually use different Rust compilers or simply different memory representations. Any kind of data conversion would have to serialize the data to a file (as you suggest) or use a fixed representation which could be read from the other side.

I guess I like the serialization idea and I’ll pursue it.
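From the Explorer side, the serialization route might look roughly like the sketch below, which round-trips a dataframe through an in-memory Arrow IPC binary. It assumes Explorer is installed via `Mix.install/1`. The receiving side is not shown: whatever SciEx would do with the binary is hypothetical here, so the example simply loads it back into Explorer to show that the data survives the trip.

```elixir
Mix.install([:explorer])

alias Explorer.DataFrame, as: DF

df = DF.new(draws: [1, 2, 3], values: [0.12, 0.5, 0.88])

# Serialize to the Arrow IPC format as an in-memory binary.
{:ok, binary} = DF.dump_ipc(df)

# A consumer that understands Arrow IPC could read `binary` directly.
# Here it is loaded back into Explorer only to check the round trip.
{:ok, restored} = DF.load_ipc(binary)

IO.inspect(byte_size(binary), label: "IPC bytes")
IO.inspect(DF.to_columns(restored), label: "restored columns")
```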