Hello
I am playing with Explorer, coming from pandas, and I am having some difficulty finding the right way to do some of my usual workflow steps.
Reading data
When using pandas in Python, I use a lot of .loc[] and .iloc[] calls to access data.
What is the idiomatic way to access data in Explorer?
The closest approach I could find is to grab the underlying series and then ask for the value by index (which sounds like a mix of loc and iloc to me), for example:
df["year"][0]
but it doesn’t feel very natural.
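For context, this is the kind of pandas access I am trying to reproduce. It is only a minimal sketch: the data, the index labels and the "value" column are made up for illustration.

```python
import pandas as pd

# Hypothetical data: index labels and the "value" column are invented for this example.
df = pd.DataFrame(
    {"year": [2019, 2020, 2021], "value": [10, 20, 30]},
    index=["a", "b", "c"],
)

# Positional access: first row, first column.
first_year = df.iloc[0, 0]

# Label-based access: row labelled "b", column "year".
year_b = df.loc["b", "year"]

# Boolean mask on rows combined with a column selection.
recent_values = df.loc[df["year"] > 2019, ["value"]]
```

The first two calls return a single value, while the last one returns a smaller data frame.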
Cleaning data
Most of the time when I work in pandas, the first step in the notebook is to clean the data before processing.
Does anybody know how to do it in Explorer?
More precisely, I am looking for the equivalent of the following commands:
- Drop invalid columns (columns with only NaN or nil values), equivalent to pandas'
df.dropna(how='all', axis=1)
- Same thing for invalid rows:
df.dropna(how='all', axis=0)
- Data replacement with fillna: replace all NaN values with a given value,
df.fillna(0)
or do it by column,
df.fillna({'col1': 0, 'col2': 3})
etc.
- Conversion from string to integer: an equivalent to pandas.to_numeric (with coercion of invalid values) would be useful, as would equivalents to the “str methods” (to do some replacements in the strings before parsing them to numbers).
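To make the target concrete, here is a rough pandas sketch of the cleaning steps listed above, chained together. The column names and values are invented for illustration, and the thousands-separator cleanup is just one example of a string replacement.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are made up for this example.
df = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0, np.nan],
    "col2": ["1 000", "abc", None, None],
    "empty": [np.nan, np.nan, np.nan, np.nan],
})

# Drop columns that contain only missing values ("empty" goes away).
df = df.dropna(how="all", axis=1)

# Drop rows that contain only missing values (the last row goes away).
df = df.dropna(how="all", axis=0)

# Remove spaces from the strings, then parse to numbers;
# values that cannot be parsed (such as "abc") become NaN.
cleaned = df["col2"].str.replace(" ", "", regex=False)
df["col2"] = pd.to_numeric(cleaned, errors="coerce")

# Fill the remaining missing values, with a different default per column.
df = df.fillna({"col1": 0, "col2": 3})
```

After these steps the frame has three rows and two numeric columns, with no missing values left.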
If anybody can help with these, I would be grateful.
I understand that these are newbie questions, but I could not figure them out on my own… I’m also biased towards the “pandas way” since I’ve been using it for a long time, so there are probably ways I just don’t see.
Kind regards,
Aurélien