tick.survival.SimuSCCS

class tick.survival.SimuSCCS(n_cases, n_intervals, n_features, n_lags, time_drift=None, exposure_type='single_exposure', distribution='multinomial', sparse=True, censoring_prob=0, censoring_scale=None, coeffs=None, hawkes_exp_kernels=None, n_correlations=0, batch_size=None, seed=None, verbose=True)

Simulation of a Self-Controlled Case Series (SCCS) model. This simulator can produce exposures (features), outcomes (labels) and censoring data. The feature matrices are returned as a list of n_cases numpy arrays (dense case) or csr_matrices (sparse case), each of shape (n_intervals, n_features), containing the exposures to each feature. Exposure can take two forms:

- infinite unique exposures (single_exposure): each column of the numpy arrays or csr matrices can contain only a single one, corresponding to the starting date of the exposure.
- short repeated exposures (multiple_exposures): each column of the numpy arrays or csr matrices can contain multiple ones, each one representing an exposure for a particular time bucket.
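As a small illustration of the two exposure encodings described above, here is a numpy sketch for a single feature column. The arrays are made up for the example, and reading a start-date column as "exposed from that date onwards" is an interpretation of the text, not something taken from the tick implementation.

```python
import numpy as np

# Short repeated exposures: one 1 per exposed time bucket.
repeated = np.array([0, 1, 1, 0, 1, 0])

# Infinite unique exposure: a single 1 marking the starting date.
start_only = np.array([0, 0, 1, 0, 0, 0])

# Interpretation (assumption): the case stays exposed from the start date on.
exposed_since_start = np.maximum.accumulate(start_only)
```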

Parameters

n_cases : int

Number of cases to generate. A case is a sample who experiences at least one adverse event.

n_intervals : int

Number of time intervals used to generate features and outcomes.

n_features : int

Number of features to simulate for each case.

n_lags : numpy.ndarray, shape=(n_features,), dtype="uint64"

Number of lags per feature. The model will regress labels on the last observed values of the features over their corresponding n_lags time intervals. n_lags values must be between 0 and n_intervals - 1.
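To make the notion of lags concrete, the following numpy sketch expands one feature column into n_lags + 1 lagged columns. The helper `lag_feature` is hypothetical and written for illustration only; it is not the routine tick uses internally.

```python
import numpy as np

def lag_feature(column, n_lags):
    """Hypothetical helper: stack a column with its n_lags shifted copies."""
    n_intervals = column.shape[0]
    lagged = np.zeros((n_intervals, n_lags + 1), dtype=column.dtype)
    for lag in range(n_lags + 1):
        # Row t of column `lag` holds the exposure observed at time t - lag.
        lagged[lag:, lag] = column[:n_intervals - lag]
    return lagged

exposure = np.array([0., 1., 0., 0., 0.])
lagged = lag_feature(exposure, n_lags=2)
```

With n_lags=2, an exposure at interval 1 can therefore influence the labels at intervals 1, 2 and 3, each through its own coefficient.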

exposure_type : {'single_exposure', 'multiple_exposures'}, default='single_exposure'

Either 'single_exposure' for infinite unique exposures or 'multiple_exposures' for short repeated exposures.

distribution : {'multinomial', 'poisson'}, default='multinomial'

Distribution used to generate the outcomes. In the 'multinomial' case, the Poisson process used to generate the events is conditioned on the total number of events per sample, which is set to one. In that case, the simulation matches the SCCS model hypotheses exactly. In the 'poisson' case, the outcomes are generated from a Poisson process, which can result in more than one outcome tick per sample. In this case, the first event is kept and the others are discarded.
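The difference between the two outcome distributions can be sketched with numpy as follows. This is a simplified illustration, not the simulator's code: the per-interval intensities and the `scale` factor are hypothetical values chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_intervals = 6
# Assumed relative incidence per interval (hypothetical values).
intensities = np.exp(np.array([0.0, 0.5, 1.0, 0.5, 0.0, 0.0]))

# 'multinomial' style: exactly one event, placed proportionally to the intensities.
probs = intensities / intensities.sum()
multinomial_label = rng.multinomial(1, probs)

# 'poisson' style: independent counts per interval, keep only the first event.
scale = 0.1  # hypothetical baseline scaling
counts = rng.poisson(intensities * scale)
poisson_label = np.zeros(n_intervals, dtype=int)
if counts.any():
    poisson_label[np.argmax(counts > 0)] = 1
```

The multinomial draw always yields a case, whereas the Poisson draw may produce no event at all, which is why such samples have to be discarded.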

sparse : boolean, default=True

Generate sparse or dense features.

censoring_prob : float, default=0.

Probability that a sample is censored. Should be in [0, 1]. If 0, no censoring is applied. When > 0, SimuSCCS simulates a censoring vector. In that case, the features and outcomes are simulated, then right-censored according to the simulated censoring dates.

censoring_scale : float, default=None

The number of censored time intervals is drawn from a Poisson distribution with intensity equal to censoring_scale. The higher the value, the more intervals are censored. If None, no censoring is applied.
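One way to read censoring_prob and censoring_scale together is sketched below with numpy. How the two draws are combined, and the clipping that keeps at least one observed interval, are assumptions made for this illustration rather than a description of tick's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_intervals = 8, 10
censoring_prob, censoring_scale = 0.5, 2.0

# Each case is censored with probability censoring_prob.
is_censored = rng.random(n_cases) < censoring_prob
# Number of censored intervals per case, drawn from a Poisson distribution.
n_censored = rng.poisson(censoring_scale, size=n_cases)

# Uncensored cases keep n_intervals; censored ones lose n_censored intervals,
# clipped so that at least one interval stays observed (assumption).
censoring = np.where(is_censored,
                     np.clip(n_intervals - n_censored, 1, n_intervals),
                     n_intervals).astype("uint64")
```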

coeffs : list containing numpy.ndarray, default=None

Can be used to provide your own set of coefficients. Element i of the list should be a 1-d numpy.ndarray of shape (n_lags[i] + 1,), where n_lags[i] is the number of lags associated with feature i. If set to None, the simulator will generate coefficients randomly.
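A list with the expected structure can be built with numpy as in the sketch below. The lag counts and the normal draws are arbitrary illustrative choices.

```python
import numpy as np

# Three features with 2, 0 and 1 lags respectively (illustrative values).
n_lags = np.array([2, 0, 1], dtype="uint64")
rng = np.random.default_rng(42)

# One 1-d array per feature, of length n_lags[i] + 1.
coeffs = [rng.normal(size=int(lag) + 1) for lag in n_lags]
```

Such a list would then be passed as coeffs=coeffs together with n_features=3 and n_lags=n_lags.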

hawkes_exp_kernels : SimuHawkesExpKernels, default=None

Features are simulated with exponential kernel Hawkes processes. This parameter can be used to specify your own kernels (see SimuHawkesExpKernels documentation). If None, random kernels are generated. The same kernels are used to generate features for the whole generated population.

n_correlations : int, default=0

If hawkes_exp_kernels is None, random kernels are generated. This parameter controls the number of non-null non-diagonal kernels.

batch_size : int, default=None

When generating outcomes with the Poisson distribution, the simulator discards samples for which no event has occurred. In this case, the simulator generates successive batches of samples until it reaches a total of n_cases cases. This parameter can be used to set the batch size.
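The batching logic described above amounts to a rejection loop, sketched here with numpy. The function `simulate_cases` and its `event_prob` parameter are hypothetical stand-ins for the real outcome simulation.

```python
import numpy as np

def simulate_cases(n_cases, batch_size, event_prob, rng):
    """Hypothetical rejection loop: keep only samples with an event."""
    kept = []
    n_batches = 0
    while len(kept) < n_cases:
        # Stand-in for simulating one batch and checking who had an event.
        has_event = rng.random(batch_size) < event_prob
        kept.extend(np.flatnonzero(has_event) + n_batches * batch_size)
        n_batches += 1
    # Drop any surplus cases from the last batch.
    return np.array(kept[:n_cases]), n_batches

rng = np.random.default_rng(0)
cases, n_batches = simulate_cases(n_cases=10, batch_size=4,
                                  event_prob=0.3, rng=rng)
```

A larger batch size means fewer iterations but more discarded samples in the last batch.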

seed : int, default=None

The seed of the random number generator.

verbose : bool, default=True

If True, print information during the simulation.

Examples

>>> import numpy as np
>>> from tick.survival import SimuSCCS
>>> n_lags = np.repeat(2, 2).astype('uint64')
>>> sim = SimuSCCS(n_cases=5, n_intervals=3, n_features=2, n_lags=n_lags,
...                seed=42, sparse=False, exposure_type="multiple_exposures",
...                verbose=False)
>>> features, labels, outcomes, censoring, _coeffs = sim.simulate()
>>> print(features)
[array([[0., 0.],
       [1., 0.],
       [1., 1.]]), array([[1., 0.],
       [1., 0.],
       [1., 1.]]), array([[1., 1.],
       [1., 1.],
       [1., 1.]]), array([[0., 0.],
       [1., 1.],
       [1., 0.]]), array([[1., 0.],
       [0., 0.],
       [0., 0.]])]
>>> print(censoring)
[3 3 3 3 3]
>>> print(_coeffs)
[array([ 0.54738557, -0.15109073,  0.71345739]), array([ 1.67633284, -0.25656871, -0.25655065])]

simulate()

Launch simulation of the data.

Returns

features : list of numpy.ndarray or list of scipy.sparse.csr_matrix

The list of feature matrices. List of length n_cases, each element of shape=(n_intervals, n_features).

labels : list of numpy.ndarray

The label vectors. List of length n_cases, each element of shape=(n_intervals,).

censoring : numpy.ndarray, shape=(n_cases,), dtype="uint64"

The censoring data. This array should contain integers in [1, n_intervals]. If censoring[i] is equal to n_intervals, then there is no censoring for sample i. If censoring[i] = c < n_intervals, then the observation of sample i is stopped at interval c, that is, at row c - 1 of the corresponding matrix. The last n_intervals - c rows are then set to 0.
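The effect of a censoring value on one feature matrix can be sketched with numpy as follows. The matrix and the value of c are made up for the example.

```python
import numpy as np

n_intervals = 4
features = np.ones((n_intervals, 2))
c = 2  # censoring value for this case, with c < n_intervals

censored_features = features.copy()
# Rows 0 .. c - 1 stay observed; the last n_intervals - c rows are set to 0.
censored_features[c:] = 0
```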

_coeffs : numpy.ndarray, shape=(n_features * (n_lags + 1),)

The coefficients used to simulate the data.

simulate_features(n_samples)

Simulates features of either the single_exposure or the multiple_exposures type.