ibis a pythonic query language
import ibis.omniscidb, dask, intake, sqlalchemy, pandas, pyarrow as arrow, altair, h5py as hdf5
The design of ibis
, and other open source projects like intake
and dask
, begin with the pioneering
"blaze"
ecosystem provided interfaces to store, describe, query, and process data.
"blaze" had ambitious goals and is now defunct, but its influences resonate throughout
the scientific python community development.
dask
emerged directly from "blaze"
; meanwhile
contributors can now be found working on projects like ibis
and arrow
.
ibis
took the role of providing a generic query language for data in python, that
is independent of the computation.
In this document we'll describe the different ibis
backend architectures and what it means to be independent of compute.
The design of ibis
backends.¶
Currently, ibis
supports >10 backends.
>>> dir(ibis)
[...HDFS...WebHDFS...bigquery...clickhouse...hdf5...impala...omniscidb...pandas...pyspark...spark...sql...sqlite...]
A backend takes an ibis
query expression and applies computation, and the query is independent of the computation.
A backend implementation, that can be queried with ibis
, has one of the three following architectures.
- Direct execution backends -
pandas and hdf5
. - Expression generating backends that create
sqlalchemy
expressions. - String generating backends -
ibis.bigquery and ibis.omniscidb
In the next few sections we'll unravel some of the different capiabilities of each approach.
A data-driven history of ibis
compatability.¶
The table below looks at over 2000 issues in the ibis project.
It provides an annual looked at the issues tagged in ibis
for different backends over 6 years.
omnisci | spark | postgres | bigquery | pandas | sqlite | impala | kudu | geospatial | clickhouse | mysql | sqlalchemy | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
year | ||||||||||||
2015 | 2 | 2 | 25 | 52 | 17 | |||||||
2016 | 3 | 2 | 4 | 3 | ||||||||
2017 | 1 | 21 | 15 | 49 | 10 | 15 | 8 | 10 | ||||
2018 | 31 | 10 | 71 | 35 | 8 | 17 | 9 | 2 | 2 | |||
2019 | 33 | 22 | 17 | 12 | 32 | 1 | 4 | 7 | 1 | 2 | 5 | |
2020 | 38 | 3 | 4 | 2 | 4 | 1 | 2 | 1 | 3 | 4 | 4 |
We note an early focus ibis.sqlite, sqlalchemy and ibis.impala
.
Later, work began on the pandas
backend rounding out the three different types of backgrounds.
From this point, improvements were made to these key backends as ibis.clickhouse, ibis.spark and "postgres"
. Recently,
For the past 3 years, Quansight, in partnership with OmniSci, added the ibis.omniscidb
string generating backend. Since Quansight Labs has taken on a role as a community maintainer
for ibis
. This collaboration introduced geospatial functionality to ibis
, and we
have on going efforts to introduce SQL Server support
Currently, there is an ongoing effort to a . What backends what you like to see?
dask
?altair
?
Learn more about working Geospatial Data in
ibis
.
ibis
direct execution.¶
ibis
direct execution backends like pandas and hdf5
operate on conventional in-memory python objects.
pandas
is the gold standard for structured data in python, and inspires the api for ibis
.
pd = ibis.pandas.connect({'A': pandas.util.testing.makeDataFrame()})
pd
is an ibis
backend based off pandas.DataFrame
objects.
expression = pd.table('A').head()
expression
is an ibis
query, that has expression.compile
and expression.execute
methods.
We'll recognize the execute method when we return pandas.DataFrame
s from ibis
expression.
The compile method does not trigger any computation, rather it constructs an intermediate form
that is interpretted by a backend.
>>> assert isinstance(expression.compile(), ibis.expr.types.TableExpr)
In the case of direction execution backends, the expression
compiles to an the original ibis
expression. And the computation is carried out based on a set of recipes defined in ibis
.
In general, we would typically do this work directly in pandas
, however this work is
practical in mocking tests for expressions independent of backends.
Learn more about the HDF5 backend in the tests.
ibis
expression generating backends.¶
db = ibis.sqlite.connect('lahmansbaseballdb.sqlite')
expression = db.table('halloffame').head()
Expression generating backends operate on SQL databases that interoperator with sqlalchemy
.
>>> assert isinstance(expression.compile(), sqlalchemy.sql.Select)
In the case of expression generating backends, the intermediate representation is a sqlalchemy
object.
sqlalchemy
is The Database Toolkit for Python, and ibis
leverages it compatability
with traditional SQL databases.
ibis
string generating backends.¶
pip install --upgrade ibis-framework[omniscidb]
String generating backends allow ibis
to interface with big data systems that manage
their own computation. For example, we may connect to an example omnisci
database.
omnisci = ibis.omniscidb.connect(host='metis.omnisci.com', user='demouser', password='HyperInteractive', port=443, database='omnisci', protocol='https')
omnisci
is described as a string generating backend because the intermediate representation of the
query is a flavor of SQL.
expression = omnisci.table('upstream_reservoir').head()
A string generating expression compiles to ibis.omniscidb
flavored SQL, while ibis.bigquery
may have a different string representatin.
>>> expression.compile()
'SELECT *\nFROM upstream_reservoir\nLIMIT 5'
Major credit goes to [@xmnlab] in his heroic PR to introduce
ibis.omniscidb
intoibis
. You can watch the drama play out in this Github Issue. If you'd like to learn more about OmniSci andibis.omniscidb
checkout the following links.
Conclusion¶
We'd like to thank the maintainers of the ibis
for
their and effort in supporting the ibis
community.
Comments
Comments powered by Disqus