High Performance Iterations Over 10,000 Row Data Set


import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(10000,5), columns=('A','B','C','D','E'))

%timeit df['E'] = np.where(df['E'].values < .5, 0, 1)

93.2 µs ± 527 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

(Excerpted from: How to iterate over a Pandas DataFrame)
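
For anyone reproducing the number outside IPython, where `%timeit` is not available, here is a minimal sketch using the standard `timeit` module. The fixed seed, the helper names `threshold_vectorized` and `threshold_loop`, and the repeat count of 100 are assumptions for illustration, not part of the original excerpt. Unlike the excerpt, the sketch does not write the result back into `df['E']`. The row-by-row loop is included only as a baseline for comparison.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the data set in the question; the seed is an assumption
# added here so that repeated runs see the same data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10000, 5)), columns=list("ABCDE"))


def threshold_vectorized(frame):
    # Single vectorized pass over the underlying NumPy array.
    return np.where(frame["E"].values < 0.5, 0, 1)


def threshold_loop(frame):
    # Python-level loop over the same values, shown only as a baseline.
    return np.array([0 if value < 0.5 else 1 for value in frame["E"].values])


result = threshold_vectorized(df)
baseline = threshold_loop(df)

# Average seconds per call over 100 calls, reported in microseconds.
runs = 100
vectorized_time = timeit.timeit(lambda: threshold_vectorized(df), number=runs) / runs
loop_time = timeit.timeit(lambda: threshold_loop(df), number=runs) / runs
print(f"vectorized: {vectorized_time * 1e6:.1f} us per call")
print(f"loop:       {loop_time * 1e6:.1f} us per call")
```

Absolute timings will vary by machine, so the figure quoted above should be treated as indicative rather than as a fixed target.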

What Elixir libraries/functions can be used to achieve equal/better performance given the same data set?

Well, for the NumPy part, I would say that you could use Nx, but I think for now it is still an early-stage tool, so I would not expect the same performance for a while.

For pandas, though, I don’t think there is an equivalent tool yet, but I would not be surprised if one gets released in 2 or 3 weeks :)