tick.dataset

This module provides easy access to some datasets used as benchmarks in tick. These datasets are hosted on the following separate repository:

https://github.com/X-DataInitiative/tick-datasets

and are easily accessible using the following function:

dataset.fetch_tick_dataset(dataset_path[, …])

Fetch a dataset from the tick-datasets GitHub repository.
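A minimal usage sketch, assuming a fetched binary classification dataset comes back as a `(features, labels)` pair, as the indexing `train_set[0]`, `train_set[1]` in the example on this page suggests. The helper name `load_train_test` and its `fetch` parameter are hypothetical, introduced only so the sketch can be tried without network access; when `fetch` is omitted, `tick.dataset.fetch_tick_dataset` is used.

```python
def load_train_test(train_path, test_path, fetch=None):
    """Fetch a train/test pair and unpack each into (features, labels).

    `fetch` is a hypothetical hook added for illustration; when it is
    omitted, tick's own fetch_tick_dataset is imported and used.
    """
    if fetch is None:
        # Imported lazily so the helper can be exercised without tick installed.
        from tick.dataset import fetch_tick_dataset as fetch

    # Assumption: each fetched dataset is a (features, labels) pair.
    X_train, y_train = fetch(train_path)
    X_test, y_test = fetch(test_path)
    return (X_train, y_train), (X_test, y_test)
```

With tick installed, calling `load_train_test('binary/adult/adult.trn.bz2', 'binary/adult/adult.tst.bz2')` would download the two adult dataset files used in the example on this page.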

Some datasets that need special handling also have a dedicated loader function.

dataset.fetch_hawkes_bund_data()

Load Hawkes-formatted Bund data from https://github.com/X-DataInitiative/tick-datasets/tree/master/hawkes/bund
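As a hedged sketch of how the loaded data might be inspected: the code below assumes `fetch_hawkes_bund_data()` returns a list of realizations, each being a list with one array of event timestamps per node. That structure is an assumption, not something this page states. Both `count_events` and `bund_event_counts` are hypothetical helper names.

```python
def count_events(timestamps_list):
    """Return the number of events per node for each realization.

    Assumes `timestamps_list` is a list of realizations, each one a list
    holding one sequence of event times per node.
    """
    return [[len(node_times) for node_times in realization]
            for realization in timestamps_list]


def bund_event_counts():
    """Download the Bund data with tick and count its events."""
    # Imported lazily: requires tick and network access.
    from tick.dataset import fetch_hawkes_bund_data

    return count_events(fetch_hawkes_bund_data())
```

Calling `bund_event_counts()` gives one row of counts per realization, which is a quick sanity check before fitting a Hawkes model on the data.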

The following datasets are easily downloadable using fetch_tick_dataset (for now, only for binary classification):

Example

"""
==============================================
Binary classification with logistic regression
==============================================

This code performs binary classification on the adult dataset with the
logistic regression learner (`tick.linear_model.LogisticRegression`).
"""

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

from tick.linear_model import LogisticRegression
from tick.dataset import fetch_tick_dataset

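# Each dataset is returned as a (features, labels) pair
train_set = fetch_tick_dataset('binary/adult/adult.trn.bz2')
test_set = fetch_tick_dataset('binary/adult/adult.tst.bz2')

# Fit the learner on the training features and labels
learner = LogisticRegression()
learner.fit(train_set[0], train_set[1])

# Score the test set with the predicted probability of the positive class
predictions = learner.predict_proba(test_set[0])
fpr, tpr, _ = roc_curve(test_set[1], predictions[:, 1])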

plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, lw=2)
plt.title("ROC curve on adult dataset (area = {:.2f})".format(auc(fpr, tpr)))
plt.ylabel("True Positive Rate")
plt.xlabel("False Positive Rate")

plt.show()


Figure: ROC curve on adult dataset.