Analysis of ibis issues.

This document explores how issues are managed in the ibis project. It produces a CSV file, "annual_tags.csv", that tallies the backend-labelled issues created each year.

import ibis, pandas; from poser import *

Cache the responses after we fetch them so that re-running the analysis does not hit the API again. Delete the cache file to reset the cache.

λ['requests_cache.install_cache']('issues')

We'll restrict the analysis to the following backends.

backends =\

omnisci spark postgres bigquery pandas sqlite impala kudu geospatial clickhouse mysql sqlalchemy

get_issues: (

Paginated request for the ibis issues.

) = (
    λ
    ['https://api.github.com/repos/ibis-project/ibis/issues?page={}&state=all'.format]
    .partial('requests.get', params=dict(access_token=__import__('os').environ['ACCESS_TOKEN']))
    [Λ.json()]
)
i, last = 0, range(30)
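For readers unfamiliar with the poser composition, the request it describes can be sketched in plain Python. This is only an approximation of `get_issues`: the helper name `build_issues_request` is hypothetical, the request is built but never sent, and the token is placed in an `Authorization` header rather than the `access_token` query parameter, on the assumption that GitHub's later deprecation of query-parameter authentication applies.

```python
import os
import urllib.parse
import urllib.request

API = "https://api.github.com/repos/ibis-project/ibis/issues"


def build_issues_request(page, token=None):
    """Build (but do not send) a request for one page of issues."""
    query = urllib.parse.urlencode({"page": page, "state": "all"})
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Assumption: send the token as a header instead of a query parameter.
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(f"{API}?{query}", headers=headers)


request = build_issues_request(2, token=os.environ.get("ACCESS_TOKEN"))
print(request.full_url)
```

Passing the returned object to `urllib.request.urlopen` and decoding the body as JSON would yield the same list of issue dictionaries that `get_issues` returns.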

Iterate through the issues. The GitHub API returns 30 results per page by default, so a full page of 30 means there may be more results on the next page. This loop loads the list of issues and caches them.

while len(last) == 30:
    i +=1
    last = get_issues(i)
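The stopping rule can be checked without touching the network. The sketch below is a stand-in, not the notebook's code: `fetch_all` and `fake_get_page` are hypothetical names, and the "API" is a list of 65 fake issues served 30 at a time.

```python
def fetch_all(get_page, page_size=30):
    """Collect pages 1, 2, ... until a page comes back shorter than page_size."""
    pages = []
    page = 0
    while True:
        page += 1
        batch = get_page(page)
        pages.append(batch)
        if len(batch) < page_size:
            break  # a short (or empty) page means there is nothing further
    return pages


# Stand-in for the API: 65 fake issues served 30 at a time.
fake_issues = [{"id": n} for n in range(65)]


def fake_get_page(page, size=30):
    start = (page - 1) * size
    return fake_issues[start:start + size]


pages = fetch_all(fake_get_page)
print([len(p) for p in pages])  # [30, 30, 5]
```

Note that the final, shorter page is kept here. When the total is an exact multiple of 30, the last page fetched is simply empty.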

Tidy the issues into a dataframe.

The issue data returned from GitHub
id 605865017 605864409 605456457 605156532 605153217 605013516 604931573 604318267 603230786 603156074 ... 69922370 69922366 69922363 69922358 69922355 69922353 69922347 69922344 69922336 69922332
url https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... ... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis...
repository_url https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis ... https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis https://api.github.com/repos/ibis-project/ibis
labels_url https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... ... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis...
comments_url https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... ... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis...
events_url https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... ... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis... https://api.github.com/repos/ibis-project/ibis...
html_url https://github.com/ibis-project/ibis/pull/2196 https://github.com/ibis-project/ibis/pull/2195 https://github.com/ibis-project/ibis/pull/2194 https://github.com/ibis-project/ibis/pull/2193 https://github.com/ibis-project/ibis/issues/2192 https://github.com/ibis-project/ibis/issues/2191 https://github.com/ibis-project/ibis/pull/2190 https://github.com/ibis-project/ibis/pull/2189 https://github.com/ibis-project/ibis/issues/2188 https://github.com/ibis-project/ibis/pull/2187 ... https://github.com/ibis-project/ibis/issues/15 https://github.com/ibis-project/ibis/issues/14 https://github.com/ibis-project/ibis/issues/13 https://github.com/ibis-project/ibis/issues/12 https://github.com/ibis-project/ibis/issues/11 https://github.com/ibis-project/ibis/issues/10 https://github.com/ibis-project/ibis/issues/9 https://github.com/ibis-project/ibis/issues/8 https://github.com/ibis-project/ibis/issues/7 https://github.com/ibis-project/ibis/issues/6
node_id MDExOlB1bGxSZXF1ZXN0NDA4MTkxMDQw MDExOlB1bGxSZXF1ZXN0NDA4MTkwNTIx MDExOlB1bGxSZXF1ZXN0NDA3ODU0MjE0 MDExOlB1bGxSZXF1ZXN0NDA3NjE0OTg4 MDU6SXNzdWU2MDUxNTMyMTc= MDU6SXNzdWU2MDUwMTM1MTY= MDExOlB1bGxSZXF1ZXN0NDA3NDMwODIz MDExOlB1bGxSZXF1ZXN0NDA2OTM1MTgz MDU6SXNzdWU2MDMyMzA3ODY= MDExOlB1bGxSZXF1ZXN0NDA1OTk3MzUw ... MDU6SXNzdWU2OTkyMjM3MA== MDU6SXNzdWU2OTkyMjM2Ng== MDU6SXNzdWU2OTkyMjM2Mw== MDU6SXNzdWU2OTkyMjM1OA== MDU6SXNzdWU2OTkyMjM1NQ== MDU6SXNzdWU2OTkyMjM1Mw== MDU6SXNzdWU2OTkyMjM0Nw== MDU6SXNzdWU2OTkyMjM0NA== MDU6SXNzdWU2OTkyMjMzNg== MDU6SXNzdWU2OTkyMjMzMg==
number 2196 2195 2194 2193 2192 2191 2190 2189 2188 2187 ... 15 14 13 12 11 10 9 8 7 6
title CI: Removing backends from docs build to fix d... Fix CondaBuild CI job CI: Split tests in two groups by backend CI: Speeding up conda solver time by removing ... CI: Unpin arrow builds Access to Twitter account CI: Testing if previous arrow builds fix arrow... CI: Trying to find a point where the disk spac... FEAT: Create rule for tuples FIX: fix regexp expression at 'test_string' ... Method for user to indicate format of data to ... Ibis UDA persistence in Impala Consider (and benchmark) IPC alternative to se... Semaphore array management and cleanup in prod... Enable configurable row batchsize for passing ... Create user API for specifying tabular data to... [CLOSED] Cherry pick to cdh5-trunk Impala Code Style ibis.server doesn't close when Impala killed Maintaining aggregation state between task calls
user {'login': 'datapythonista', 'id': 10058240, 'n... {'login': 'xmnlab', 'id': 5209757, 'node_id': ... {'login': 'datapythonista', 'id': 10058240, 'n... {'login': 'datapythonista', 'id': 10058240, 'n... {'login': 'datapythonista', 'id': 10058240, 'n... {'login': 'datapythonista', 'id': 10058240, 'n... {'login': 'datapythonista', 'id': 10058240, 'n... {'login': 'datapythonista', 'id': 10058240, 'n... {'login': 'emilyreff7', 'id': 50638962, 'node_... {'login': 'dchigarev', 'id': 62142979, 'node_i... ... {'login': 'wesm', 'id': 329591, 'node_id': 'MD... {'login': 'wesm', 'id': 329591, 'node_id': 'MD... {'login': 'wesm', 'id': 329591, 'node_id': 'MD... {'login': 'wesm', 'id': 329591, 'node_id': 'MD... {'login': 'wesm', 'id': 329591, 'node_id': 'MD... {'login': 'wesm', 'id': 329591, 'node_id': 'MD... {'login': 'wesm', 'id': 329591, 'node_id': 'MD... {'login': 'wesm', 'id': 329591, 'node_id': 'MD... {'login': 'wesm', 'id': 329591, 'node_id': 'MD... {'login': 'wesm', 'id': 329591, 'node_id': 'MD...
labels [{'id': 623935379, 'node_id': 'MDU6TGFiZWw2MjM... [] [{'id': 623935379, 'node_id': 'MDU6TGFiZWw2MjM... [{'id': 623935379, 'node_id': 'MDU6TGFiZWw2MjM... [{'id': 623935379, 'node_id': 'MDU6TGFiZWw2MjM... [] [{'id': 623935379, 'node_id': 'MDU6TGFiZWw2MjM... [] [] [] ... [{'id': 200986080, 'node_id': 'MDU6TGFiZWwyMDA... [{'id': 200986080, 'node_id': 'MDU6TGFiZWwyMDA... [] [] [{'id': 200986080, 'node_id': 'MDU6TGFiZWwyMDA... [] [{'id': 200986080, 'node_id': 'MDU6TGFiZWwyMDA... [{'id': 200986080, 'node_id': 'MDU6TGFiZWwyMDA... [{'id': 200986078, 'node_id': 'MDU6TGFiZWwyMDA... []
state open open open closed open open closed closed open open ... closed closed closed closed open open closed closed closed closed
locked False False False False False False False False False False ... False False False False False False False False False False
assignee None None None None None None None None None None ... None None None None None None None None None None
assignees [] [] [] [] [] [] [] [] [] [] ... [] [] [] [] [] [] [] [] [] []
milestone None None {'url': 'https://api.github.com/repos/ibis-pro... {'url': 'https://api.github.com/repos/ibis-pro... None None None None None None ... None None None None {'url': 'https://api.github.com/repos/ibis-pro... {'url': 'https://api.github.com/repos/ibis-pro... {'url': 'https://api.github.com/repos/ibis-pro... None None None
comments 1 0 0 5 0 0 1 4 0 0 ... 1 1 1 1 1 0 1 0 2 2
created_at 2020-04-23 20:54:44+00:00 2020-04-23 20:53:50+00:00 2020-04-23 11:28:14+00:00 2020-04-23 00:29:55+00:00 2020-04-23 00:18:57+00:00 2020-04-22 19:36:09+00:00 2020-04-22 17:26:16+00:00 2020-04-21 22:12:13+00:00 2020-04-20 13:17:09+00:00 2020-04-20 11:14:57+00:00 ... 2015-04-21 18:49:29+00:00 2015-04-21 18:49:28+00:00 2015-04-21 18:49:28+00:00 2015-04-21 18:49:27+00:00 2015-04-21 18:49:27+00:00 2015-04-21 18:49:26+00:00 2015-04-21 18:49:25+00:00 2015-04-21 18:49:25+00:00 2015-04-21 18:49:24+00:00 2015-04-21 18:49:23+00:00
updated_at 2020-04-24 00:42:19+00:00 2020-04-23 23:01:30+00:00 2020-04-24 01:25:57+00:00 2020-04-23 19:46:48+00:00 2020-04-23 00:19:18+00:00 2020-04-22 19:36:09+00:00 2020-04-22 23:59:50+00:00 2020-04-23 19:52:11+00:00 2020-04-20 13:17:09+00:00 2020-04-24 06:59:35+00:00 ... 2017-06-26 18:38:48+00:00 2017-06-26 18:37:05+00:00 2017-06-26 18:35:26+00:00 2017-06-26 18:35:45+00:00 2018-01-29 18:47:11+00:00 2015-05-25 23:10:12+00:00 2015-04-21 18:52:55+00:00 2015-10-07 00:27:27+00:00 2017-02-27 14:58:40+00:00 2017-06-28 15:38:28+00:00
closed_at NaT NaT NaT 2020-04-23 19:40:10+00:00 NaT NaT 2020-04-22 23:59:50+00:00 2020-04-23 19:52:11+00:00 NaT NaT ... 2017-06-26 18:38:48+00:00 2017-06-26 18:37:05+00:00 2017-06-26 18:35:26+00:00 2017-06-26 18:35:45+00:00 NaT NaT 2015-04-21 18:52:55+00:00 2015-10-07 00:27:27+00:00 2017-02-27 14:58:40+00:00 2017-06-28 15:38:28+00:00
author_association COLLABORATOR COLLABORATOR COLLABORATOR COLLABORATOR COLLABORATOR COLLABORATOR COLLABORATOR COLLABORATOR COLLABORATOR CONTRIBUTOR ... MEMBER MEMBER MEMBER MEMBER MEMBER MEMBER MEMBER MEMBER MEMBER MEMBER
pull_request {'url': 'https://api.github.com/repos/ibis-pro... {'url': 'https://api.github.com/repos/ibis-pro... {'url': 'https://api.github.com/repos/ibis-pro... {'url': 'https://api.github.com/repos/ibis-pro... NaN NaN {'url': 'https://api.github.com/repos/ibis-pro... {'url': 'https://api.github.com/repos/ibis-pro... NaN {'url': 'https://api.github.com/repos/ibis-pro... ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
body Currently condabuild ci job is failing because... Testing if splitting tests in this naive way s... xref #2181\r\n\r\nRunning `conda env create --... Arrow 0.16.0 builds in conda-forge were causin... Does anyone have access to the Ibis Twitter ac... This will fail the Linux builds, but if the Wi... Seems like the master builds have been failing... ibis.expr.rules does not contain a rule for tu... `\\d+` means string that has `\ddd...dd` as su... ... <a href="http://github.mtv.cloudera.com/wesm">... <a href="http://github.mtv.cloudera.com/wesm">... <a href="http://github.mtv.cloudera.com/wesm">... <a href="http://github.mtv.cloudera.com/wesm">... <a href="http://github.mtv.cloudera.com/wesm">... <a href="http://github.mtv.cloudera.com/wesm">... <a href="http://github.mtv.cloudera.com/bittor... <a href="http://github.mtv.cloudera.com/bittor... <a href="http://github.mtv.cloudera.com/bittor... <a href="http://github.mtv.cloudera.com/wesm">...

23 rows × 2190 columns

# range(1, i+1) so that the final, partially filled page i is included
issues = λ.range(1, i+1).map(λ+get_issues+pandas.DataFrame)[pandas.concat]().set_index('id')
times = λ(issues.columns) / λ.endswith('_at') + list + ...
issues[times] = issues[times].apply(pandas.to_datetime)
del times
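The timestamp step selects every column whose name ends in `_at` and parses it as a datetime. A minimal plain-pandas sketch of the same idea, on a made-up two-row frame (the data and the `utc=True` argument are assumptions for illustration):

```python
import pandas as pd

frame = pd.DataFrame({
    "id": [1, 2],
    "created_at": ["2020-04-23T20:54:44Z", "2015-04-21T18:49:23Z"],
    "closed_at": [None, "2017-06-28T15:38:28Z"],
    "state": ["open", "closed"],
})

# Pick the timestamp columns by their suffix, as the notebook does.
time_columns = [c for c in frame.columns if c.endswith("_at")]

# Parse each selected column; missing values become NaT.
frame[time_columns] = frame[time_columns].apply(pd.to_datetime, utc=True)
print(frame.dtypes)
```

Open issues have no `closed_at` value, which is why that column shows `NaT` in the table above.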

We retrieved 2190 issues and pull requests from the ibis project.

Extract the label_names from the issues.

Label dummies
id 69922336 69922344 69922347 69922355 69922366 69922370 69922372 69922378 69922379 69922384 ... 594078317 596089546 596634656 597615538 598181367 604931573 605153217 605156532 605456457 605865017
analytics 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
bigquery 0 0 0 0 0 0 0 0 0 0 ... 0 0 1 0 0 0 0 0 0 0
bug 1 0 0 0 0 0 1 0 1 0 ... 0 0 0 0 0 0 0 0 0 0
ci 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 1 1 1 1 1 1 1
clickhouse 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
community 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
compatibility 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
complexity-high 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
complexity-low 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
complexity-medium 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
coverage 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
ddl 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
developer-api 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
discussion 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
docker 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
documentation 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
duplicate 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
enhancement 0 1 1 1 1 1 0 0 0 1 ... 1 1 0 0 0 0 0 0 0 0
etl 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
expressions 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
geospatial 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
good first issue 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
hdfs 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
help wanted 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
hive 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
impala 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
impala-udf 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
invalid 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
kudu 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
meeting 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
mysql 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
omnisci 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
packaging 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
pandas 0 0 0 0 0 0 0 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
parquet 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
performance 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
postgis 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
postgres 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
presto 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
pyspark 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
question 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
redshift 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
refactoring 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
security 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
spark 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
sql-support 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
sqlalchemy 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
sqlite 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
testing 0 0 0 0 0 0 0 0 0 0 ... 0 0 1 0 0 0 0 0 0 0
usability 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
use-case 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
window functions 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
windows os 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
wontfix 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0

54 rows × 1707 columns

label_names = issues['labels'].apply(λ['pandas.Series']).stack().apply(λ['pandas.Series']).name
label_names = label_names.pipe(λ['pandas.get_dummies']).pipe(lambda df: df.groupby(df.index.get_level_values(0)).sum())
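The same reshaping can be written with `Series.explode`, which may be easier to follow than the stacked `apply` calls. This is a sketch on made-up data, not the notebook's code; the three issue ids and their labels are invented for illustration.

```python
import pandas as pd

# Each issue carries a list of label dictionaries, as in the GitHub payload.
labels = pd.Series(
    {
        101: [{"name": "bug"}, {"name": "impala"}],
        102: [],
        103: [{"name": "impala"}],
    },
    name="labels",
)

# One row per (issue, label); issues with no labels become NaN and are dropped.
names = labels.explode().dropna().map(lambda label: label["name"])

# One indicator column per label, then collapse back to one row per issue.
label_dummies = pd.get_dummies(names).groupby(level=0).sum()
print(label_dummies)
```

Unlabelled issues drop out of the result, which is consistent with the dummy table above covering 1707 of the 2190 records.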

Join the labels with the original dataframe and compose an annual analysis of the backends.

omnisci spark postgres bigquery pandas sqlite impala kudu geospatial clickhouse mysql sqlalchemy
year
2015 2 2 25 52 17
2016 3 2 4 3
2017 1 21 15 49 10 15 8 10
2018 31 10 71 35 8 17 9 2 2
2019 33 22 17 12 32 1 4 7 1 2 5
2020 38 3 4 2 4 1 2 1 3 4 4
issues_tags = label_names.join(issues).set_index('created_at').groupby(pandas.Grouper(freq='1Y')).sum()[backends.split()]
issues_tags = issues_tags.set_index(pandas.Index(issues_tags.index.year, name='year'))
issues_tags = issues_tags.replace({0:''})
issues_tags.to_csv('annual_tags.csv')
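The annual tally amounts to summing the label indicators per creation year. The sketch below groups on the year extracted from the timestamp instead of using `pandas.Grouper(freq='1Y')`, since frequency aliases have changed between pandas versions; the three records are invented for illustration.

```python
import pandas as pd

# Creation timestamps and label indicators, both indexed by issue id.
created = pd.to_datetime(
    pd.Series({
        101: "2015-04-21T18:49:23Z",
        103: "2015-06-01T00:00:00Z",
        104: "2020-04-23T20:54:44Z",
    }),
    utc=True,
)
dummies = pd.DataFrame(
    {"impala": [1, 1, 0], "pandas": [0, 0, 1]},
    index=[101, 103, 104],
)

# Group the indicator rows by the year each issue was created and sum them.
annual = dummies.groupby(created.dt.year.rename("year")).sum()
print(annual)
```

Grouping on the year directly also produces the `year` index in one step, so no separate `set_index` call is needed.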
