ibis a pythonic query language
import ibis.omniscidb, dask, intake, sqlalchemy, pandas, pyarrow as arrow, altair, h5py as hdf5
The design of ibis, and other open source projects like intake and dask, begin with the pioneering
"blaze" ecosystem provided interfaces to store, describe, query, and process data.
"blaze" had ambitious goals and is now defunct, but its influences resonate throughout
the scientific python community development.
dask emerged directly from "blaze"; meanwhile
contributors can now be found working on projects like ibis and arrow.
ibis took the role of providing a generic query language for data in python, that
is independent of the computation.
In this document we'll describe the different ibis backend architectures and what it means to be independent of compute.
The design of ibis backends.¶
Currently, ibis supports >10 backends.
>>> dir(ibis)
[...HDFS...WebHDFS...bigquery...clickhouse...hdf5...impala...omniscidb...pandas...pyspark...spark...sql...sqlite...]
A backend takes an ibis query expression and applies computation, and the query is independent of the computation.
A backend implementation, that can be queried with ibis, has one of the three following architectures.
- Direct execution backends -
pandas and hdf5. - Expression generating backends that create
sqlalchemyexpressions. - String generating backends -
ibis.bigquery and ibis.omniscidb
In the next few sections we'll unravel some of the different capiabilities of each approach.
A data-driven history of ibis compatability.¶
The table below looks at over 2000 issues in the ibis project.
It provides an annual looked at the issues tagged in ibis
for different backends over 6 years.
| omnisci | spark | postgres | bigquery | pandas | sqlite | impala | kudu | geospatial | clickhouse | mysql | sqlalchemy | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| year | ||||||||||||
| 2015 | 2 | 2 | 25 | 52 | 17 | |||||||
| 2016 | 3 | 2 | 4 | 3 | ||||||||
| 2017 | 1 | 21 | 15 | 49 | 10 | 15 | 8 | 10 | ||||
| 2018 | 31 | 10 | 71 | 35 | 8 | 17 | 9 | 2 | 2 | |||
| 2019 | 33 | 22 | 17 | 12 | 32 | 1 | 4 | 7 | 1 | 2 | 5 | |
| 2020 | 38 | 3 | 4 | 2 | 4 | 1 | 2 | 1 | 3 | 4 | 4 |
We note an early focus ibis.sqlite, sqlalchemy and ibis.impala.
Later, work began on the pandas backend rounding out the three different types of backgrounds.
From this point, improvements were made to these key backends as ibis.clickhouse, ibis.spark and "postgres". Recently,
For the past 3 years, Quansight, in partnership with OmniSci, added the ibis.omniscidb
string generating backend. Since Quansight Labs has taken on a role as a community maintainer
for ibis. This collaboration introduced geospatial functionality to ibis, and we
have on going efforts to introduce SQL Server support
Currently, there is an ongoing effort to a . What backends what you like to see?
dask?altair?
Learn more about working Geospatial Data in
ibis.
ibis direct execution.¶
ibis direct execution backends like pandas and hdf5 operate on conventional in-memory python objects.
pandas is the gold standard for structured data in python, and inspires the api for ibis.
pd = ibis.pandas.connect({'A': pandas.util.testing.makeDataFrame()})
pd is an ibis backend based off pandas.DataFrame objects.
expression = pd.table('A').head()
expression is an ibis query, that has expression.compile and expression.execute methods.
We'll recognize the execute method when we return pandas.DataFrames from ibis expression.
The compile method does not trigger any computation, rather it constructs an intermediate form
that is interpretted by a backend.
>>> assert isinstance(expression.compile(), ibis.expr.types.TableExpr)
In the case of direction execution backends, the expression compiles to an the original ibis
expression. And the computation is carried out based on a set of recipes defined in ibis.
In general, we would typically do this work directly in pandas, however this work is
practical in mocking tests for expressions independent of backends.
Learn more about the HDF5 backend in the tests.
ibis expression generating backends.¶
db = ibis.sqlite.connect('lahmansbaseballdb.sqlite')
expression = db.table('halloffame').head()
Expression generating backends operate on SQL databases that interoperator with sqlalchemy.
>>> assert isinstance(expression.compile(), sqlalchemy.sql.Select)
In the case of expression generating backends, the intermediate representation is a sqlalchemy object.
sqlalchemy is The Database Toolkit for Python, and ibis leverages it compatability
with traditional SQL databases.
ibis string generating backends.¶
pip install --upgrade ibis-framework[omniscidb]
String generating backends allow ibis to interface with big data systems that manage
their own computation. For example, we may connect to an example omnisci database.
omnisci = ibis.omniscidb.connect(host='metis.omnisci.com', user='demouser', password='HyperInteractive', port=443, database='omnisci', protocol='https')
omnisci is described as a string generating backend because the intermediate representation of the
query is a flavor of SQL.
expression = omnisci.table('upstream_reservoir').head()
A string generating expression compiles to ibis.omniscidb flavored SQL, while ibis.bigquery may have a different string representatin.
>>> expression.compile()
'SELECT *\nFROM upstream_reservoir\nLIMIT 5'
Major credit goes to [@xmnlab] in his heroic PR to introduce
ibis.omniscidbintoibis. You can watch the drama play out in this Github Issue. If you'd like to learn more about OmniSci andibis.omniscidbcheckout the following links.
Conclusion¶
We'd like to thank the maintainers of the ibis for
their and effort in supporting the ibis community.
Comments
Comments powered by Disqus