We've released v0.0.14. This release adds several new pandas-like date & time operations to the Bach modeling library that provide more granular control when wrangling/exploring time-related data.
Like any Bach operation, they work on the full SQL dataset.
A few highlights of the features that we’ve added:
- TimeDelta: get days/seconds/microseconds/components between two time series.
- quantile: return values at the given quantile of numeric columns.
- stack()/unstack(): reshape a DataFrame or Series that has a multi-level index, pivoting its columns back and forth.
- cut(): bin values into discrete intervals, e.g. to segment into groups of age ranges.
- qcut(): bin values into equal-sized buckets based on rank or quantiles.
- dropna(): drop rows that contain null values in a DataFrame or Series.
- fillna(): fill null gaps, either with a value or with the ffill and bfill methods.
- sort_by and ascending parameters for DataFrame.drop_duplicates().
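Since these operations are modeled on their pandas namesakes, a plain pandas sketch gives a feel for what several of them do. Note that this is pandas running in memory, not Bach, and the column names and data are made up for illustration; Bach's exact signatures may differ, so check the Bach docs before relying on any of this.

```python
import pandas as pd

# Illustrative data only; none of these names come from Bach.
df = pd.DataFrame({
    "age": [17, 25, 42, 68],
    "session_start": pd.to_datetime([
        "2022-06-01 10:00:00", "2022-06-01 11:00:00",
        "2022-06-02 09:30:00", "2022-06-03 08:00:00",
    ]),
    "session_end": pd.to_datetime([
        "2022-06-01 10:05:30", "2022-06-01 11:45:00",
        "2022-06-02 09:31:00", "2022-06-03 10:00:00",
    ]),
    "score": [1.0, None, 3.0, None],
})

# Timedelta: the difference between two datetime series.
duration = df["session_end"] - df["session_start"]
duration_seconds = duration.dt.total_seconds()
duration_parts = duration.dt.components  # days, hours, minutes, seconds, ...

# cut(): bin into explicit intervals, here (0, 18], (18, 40], (40, 65], (65, 120].
age_group = pd.cut(df["age"], bins=[0, 18, 40, 65, 120])

# qcut(): bin into buckets holding (roughly) the same number of rows.
age_half = pd.qcut(df["age"], q=2)

# dropna() / fillna(): remove or fill null values.
without_nulls = df.dropna(subset=["score"])
filled_value = df["score"].fillna(0.0)
filled_forward = df["score"].ffill()
```

The difference between cut() and qcut() is worth noting: cut() fixes the interval edges and lets the bucket sizes vary, while qcut() fixes the bucket sizes and lets the edges vary.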
An example of the new quantile support:
Modeling session duration distribution with quantiles in Bach
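To make the idea concrete, here is a minimal sketch of a session duration distribution summarized with quantiles. It uses plain pandas with made-up durations rather than Bach, so treat it as an illustration of the concept and not as the Bach API.

```python
import pandas as pd

# Made-up session durations in seconds (illustrative only).
session_duration = pd.Series([30, 60, 90, 120, 600], name="session_duration")

# Quartiles of the distribution; pandas interpolates linearly by default.
quantiles = session_duration.quantile([0.25, 0.5, 0.75])

# The mean is pulled up by the single long session; the median is not.
mean_duration = session_duration.mean()
median_duration = quantiles.loc[0.5]
```

With skewed data like session durations, quantiles tend to describe a typical session better than the mean, which is one reason they are useful for this kind of modeling.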
Other features for exploration and modeling that were added:
- display_sql_markdown() support: display SQL results as Markdown, making them easier to read in notebooks.
- describe() for all values: generates descriptive statistics that summarize the shape of a dataset’s distribution, such as count, max, min, mean, and standard deviation.
- Variable time aggregation in models in the model hub.
The details
Check out:
- Bach in our repo, for how to run the library.
- The Bach modeling library docs, for how to use all the new functionality.
- The full changelog.
Enjoy!