tick.preprocessing.LongitudinalFeaturesLagger

class tick.preprocessing.LongitudinalFeaturesLagger(n_lags, n_jobs=-1)

Transforms longitudinal exposure features to add columns representing lagged features.

This preprocessor transforms an input list of n_cases numpy ndarrays or scipy.sparse.csr_matrices of shape (n_intervals, n_features) by adding columns representing the lagged exposures. It outputs a list of n_cases numpy arrays or csr_matrices of shape (n_intervals, sum(n_lags + 1)), that is, n_features + sum(n_lags) columns.
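As a minimal sketch of this transformation, the plain-NumPy function below lags a single dense case. It is an illustrative re-implementation, not tick's code: the name `lag_dense` is hypothetical, and the column layout (each feature's lag 0 through lag n_lags[i] columns placed in one contiguous block) is inferred from the example output further down. Each lag column is the original feature column shifted down by `lag` rows, and exposures shifted past the last interval are dropped.

```python
import numpy as np


def lag_dense(x, n_lags):
    """Hypothetical helper: lag one dense case of shape (n_intervals, n_features).

    Returns an array of shape (n_intervals, sum(n_lags + 1)).
    """
    x = np.asarray(x, dtype="float64")
    n_intervals, n_features = x.shape
    n_lags = np.asarray(n_lags, dtype="int64")
    out = np.zeros((n_intervals, int((n_lags + 1).sum())))
    col = 0
    for j in range(n_features):
        # Feature j gets n_lags[j] + 1 consecutive columns: lag 0, 1, ..., n_lags[j].
        for lag in range(n_lags[j] + 1):
            if lag < n_intervals:
                # Shift column j down by `lag` rows; rows pushed past the end are lost.
                out[lag:, col] = x[: n_intervals - lag, j]
            col += 1
    return out


print(lag_dense([[1, 1, 0], [0, 0, 1], [0, 0, 0]], [2, 1, 0]))
```

Called on the second input matrix of the example with n_lags = [2, 1, 0], this sketch returns the same 3 x 6 array as the second matrix of the example output.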

Exposure can take two forms:

- short repeated exposures: each column of the numpy arrays or csr matrices can contain multiple ones, each one representing an exposure during a particular time bucket.
- infinite unique exposures: each column of the numpy arrays or csr matrices can contain only a single one, corresponding to the starting date of the exposure.
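The two forms differ only in how many ones a column may hold. The hypothetical helper below (not part of tick) checks whether a dense matrix follows the infinite-unique-exposure form. It assumes that a column with no one at all, i.e. a feature the case was never exposed to, is also acceptable, and it handles dense arrays only.

```python
import numpy as np


def is_infinite_unique_exposure(x):
    """Hypothetical check: every column of a dense (n_intervals, n_features)
    matrix holds at most one nonzero entry, the start date of the exposure."""
    x = np.asarray(x)
    return bool(np.all(np.count_nonzero(x, axis=0) <= 1))


# Short repeated exposures: one drug taken in time buckets 0, 1 and 3.
repeated = np.array([[1], [1], [0], [1]], dtype="float64")
# Infinite unique exposure: the same drug, started in bucket 0 and never stopped.
unique = np.array([[1], [0], [0], [0]], dtype="float64")

print(is_infinite_unique_exposure(repeated), is_infinite_unique_exposure(unique))
```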

Parameters

n_lags : numpy.ndarray, shape=(n_features,), dtype="uint64"

Number of lags to compute: for each feature i, the preprocessor keeps the original (lag 0) column and adds columns representing lag = 1, …, n_lags[i]. If n_lags is a zero vector, this preprocessor does nothing. n_lags must be non-negative.
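The lag columns for feature i thus form a block of n_lags[i] + 1 columns, so the output has sum(n_lags + 1) columns in total. The sketch below computes where each block starts. The helper `lag_column_layout` is hypothetical and not part of tick, and the feature-by-feature block ordering is inferred from the example output. A zero n_lags gives exactly one column per feature, which is why the preprocessor then leaves the features unchanged.

```python
import numpy as np


def lag_column_layout(n_lags):
    """Hypothetical helper: starting output column of each feature's lag block,
    and the total number of output columns."""
    # Feature i contributes n_lags[i] + 1 columns: lag 0 (the original column) up to lag n_lags[i].
    widths = np.asarray(n_lags, dtype="int64") + 1
    # Blocks are assumed to be laid out contiguously, feature after feature.
    offsets = np.concatenate(([0], np.cumsum(widths)[:-1]))
    return offsets, int(widths.sum())


offsets, n_columns = lag_column_layout(np.array([2, 1, 0], dtype="uint64"))
print(offsets, n_columns)
```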

Examples

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> from tick.preprocessing.longitudinal_features_lagger import LongitudinalFeaturesLagger
>>> features = [csr_matrix([[0, 1, 0],
...                         [0, 0, 0],
...                         [0, 0, 1]], dtype="float64"),
...             csr_matrix([[1, 1, 0],
...                         [0, 0, 1],
...                         [0, 0, 0]], dtype="float64")
...             ]
>>> censoring = np.array([3, 2], dtype="uint64")  # not passed to fit_transform below
>>> n_lags = np.array([2, 1, 0], dtype='uint64')
>>> lfl = LongitudinalFeaturesLagger(n_lags)
>>> product_features, _, _ = lfl.fit_transform(features)
>>> # output comes as a list of sparse matrices or 2D numpy arrays
>>> product_features.__class__
<class 'list'>
>>> [x.toarray() for x in product_features]
[array([[0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.]]), array([[1., 0., 0., 1., 0., 0.],
       [0., 1., 0., 0., 1., 1.],
       [0., 0., 1., 0., 0., 0.]])]
__init__(n_lags, n_jobs=-1)

Initialize the lagger. See the class description for the meaning of n_lags.

fit(features, labels=None, censoring=None)

Fit the feature lagger on the given list of feature matrices.

Parameters

features : list of numpy.ndarray or list of scipy.sparse.csr_matrix

The list of feature matrices, of length n_cases; each element has shape=(n_intervals, n_features).

censoring : numpy.ndarray, shape=(n_cases,), dtype="uint64"

The censoring data. This array should contain integers in [1, n_intervals]. If censoring[i] is equal to n_intervals, then there is no censoring for sample i. If censoring[i] = c < n_intervals, then the observation of sample i stops at interval c, that is, at row c - 1 of the corresponding matrix. The last n_intervals - c rows are then set to 0.
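A minimal sketch of this censoring rule for one case, written with NumPy: rows 0 to c - 1 are kept and the last n_intervals - c rows are set to 0. The helper name `apply_censoring` is hypothetical, and the sketch works on a dense matrix only; it does not reproduce tick's internals.

```python
import numpy as np


def apply_censoring(x, c):
    """Hypothetical helper: zero every row of x from row c onward."""
    x = np.array(x, dtype="float64")  # work on a copy; the input is not modified
    n_intervals = x.shape[0]
    if not 1 <= c <= n_intervals:
        raise ValueError("censoring must lie in [1, n_intervals]")
    # Observation stops at interval c, i.e. row c - 1; later rows are set to 0.
    x[c:] = 0
    return x


features = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 1]], dtype="float64")
print(apply_censoring(features, 2))
```

With c = n_intervals (here 3), the matrix is returned unchanged, which is the no-censoring case.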

Returns

output : LongitudinalFeaturesLagger

The current instance, now fitted.

transform(features, labels=None, censoring=None)

Add lagged features to the given list of feature matrices.

Parameters

features : list of numpy.ndarray or list of scipy.sparse.csr_matrix

The list of feature matrices, of length n_cases; each element has shape=(n_intervals, n_features).

censoring : numpy.ndarray, shape=(n_cases,), dtype="uint64", default=None

The censoring data. This array should contain integers in [1, n_intervals]. If censoring[i] is equal to n_intervals, then there is no censoring for sample i. If censoring[i] = c < n_intervals, then the observation of sample i stops at interval c, that is, at row c - 1 of the corresponding matrix. The last n_intervals - c rows are then set to 0.

Returns

output : [numpy.ndarrays] or [csr_matrices], shape=(n_intervals, sum(n_lags + 1))

The list of feature matrices with the lagged feature columns added.
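Putting the pieces together, the function below is an illustrative sketch of what transform computes for a list of cases. It is not tick's implementation, and `lag_features` is a hypothetical name. It lags each case feature by feature, then, assuming censoring is applied to the lagged output, zeroes the rows from censoring[i] onward. Sparse inputs are densified for simplicity and converted back to csr_matrix, which is fine for small examples but wasteful for large ones. Without censoring it reproduces the output of the example for this class.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def lag_features(features, n_lags, censoring=None):
    """Hypothetical sketch of transform(): lag every case, then apply censoring."""
    n_lags = np.asarray(n_lags, dtype="int64")
    n_cols = int((n_lags + 1).sum())
    output = []
    for i, x in enumerate(features):
        sparse = issparse(x)
        dense = x.toarray() if sparse else np.asarray(x, dtype="float64")
        n_intervals = dense.shape[0]
        # No censoring array means every case is observed until the last interval.
        c = n_intervals if censoring is None else int(censoring[i])
        out = np.zeros((n_intervals, n_cols))
        col = 0
        for j in range(dense.shape[1]):
            # Lag 0 (original column) through lag n_lags[j], shifted down row by row.
            for lag in range(n_lags[j] + 1):
                if lag < n_intervals:
                    out[lag:, col] = dense[: n_intervals - lag, j]
                col += 1
        out[c:] = 0  # observation stops at row c - 1
        output.append(csr_matrix(out) if sparse else out)
    return output


features = [csr_matrix([[0, 1, 0], [0, 0, 0], [0, 0, 1]], dtype="float64"),
            csr_matrix([[1, 1, 0], [0, 0, 1], [0, 0, 0]], dtype="float64")]
lagged = lag_features(features, np.array([2, 1, 0], dtype="uint64"))
print([m.toarray() for m in lagged])
```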