tick.survival.SimuSCCS(n_cases, n_intervals, n_features, n_lags, time_drift=None, exposure_type='single_exposure', distribution='multinomial', sparse=True, censoring_prob=0, censoring_scale=None, coeffs=None, hawkes_exp_kernels=None, n_correlations=0, batch_size=None, seed=None, verbose=True)
Simulation of a Self-Controlled Case Series (SCCS) model. This simulator can
produce exposure (features), outcomes (labels) and censoring data.
The feature matrices are returned as a list of length n_cases of numpy arrays (dense case) or csr_matrices (sparse case), each of shape (n_intervals, n_features), containing the exposures to each feature.
Exposure can take two forms:
- short repeated exposures (multiple_exposure): in that case, each column of the numpy arrays or csr matrices can contain multiple ones, each one representing an exposure for a particular time bucket.
- infinite unique exposures (single_exposure): in that case, each column of the numpy arrays or csr matrices can only contain a single one, corresponding to the starting date of the exposure.
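As a rough illustration of the two forms, the toy matrices below are written by hand for a single case with 4 intervals and 2 features. They are not produced by SimuSCCS, and the variable names are only illustrative.

```python
import numpy as np

# Hand-written toy exposures for one case: 4 intervals (rows), 2 features (columns).
# Short repeated exposures: a column may hold several ones, one per exposed interval.
repeated = np.array([[0., 1.],
                     [1., 0.],
                     [0., 0.],
                     [1., 1.]])

# Infinite unique exposures: at most a single one per column, marking the start date.
unique_start = np.array([[0., 0.],
                         [1., 0.],
                         [0., 0.],
                         [0., 1.]])

ones_per_column_repeated = repeated.sum(axis=0)
ones_per_column_unique = unique_start.sum(axis=0)
print(ones_per_column_repeated)  # number of exposed intervals per feature
print(ones_per_column_unique)    # never more than one start date per feature
```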
Parameters
n_cases : int
Number of cases to generate. A case is a sample that experiences at least one adverse event.
n_intervals : int
Number of time intervals used to generate features and outcomes.
n_features : int
Number of features to simulate for each case.
n_lags : numpy.ndarray, shape=(n_features,), dtype="uint64"
Number of lags per feature. The model will regress labels on the last observed values of the features over their corresponding n_lags time intervals. n_lags values must be between 0 and n_intervals - 1.
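The constraints on n_lags can be checked before building a simulator. The sketch below assumes a hypothetical helper, check_n_lags, which is not part of tick; it only encodes the shape, dtype and range stated above.

```python
import numpy as np

def check_n_lags(n_lags, n_features, n_intervals):
    """Hypothetical helper: validate n_lags against the documented constraints."""
    n_lags = np.asarray(n_lags)
    if n_lags.shape != (n_features,):
        raise ValueError("n_lags must have shape (n_features,)")
    if n_lags.dtype != np.uint64:
        raise ValueError("n_lags must have dtype uint64")
    if np.any(n_lags > n_intervals - 1):
        raise ValueError("n_lags values must be between 0 and n_intervals - 1")
    return n_lags

# Same construction as in the Examples section: two features, two lags each.
n_lags = np.repeat(2, 2).astype("uint64")
checked = check_n_lags(n_lags, n_features=2, n_intervals=3)
```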
exposure_type : {'single_exposure', 'multiple_exposure'}, default='single_exposure'
Either 'single_exposure' for infinite unique exposures or 'multiple_exposure' for short repeated exposures. Note that the example below passes the plural spelling "multiple_exposures".
distribution : {'multinomial', 'poisson'}, default='multinomial'
Distribution used to generate the outcomes. In the 'multinomial' case, the Poisson process used to generate the events is conditioned on the total number of events per sample, which is set to one. In that case, the simulation matches the SCCS model hypotheses exactly. In the 'poisson' case, the outcomes are generated from a Poisson process, which can result in more than one outcome per sample. In this case, the first event is kept and the others are discarded.
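The difference between the two settings can be sketched with plain NumPy. This is an illustration under stated assumptions, not the code tick runs: the intensities vector and both function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-interval relative incidences for one case (not tick output).
intensities = np.array([0.2, 0.5, 1.0, 0.3])

def multinomial_outcome(intensities, rng):
    """Exactly one event, placed with probability proportional to the intensity."""
    probs = intensities / intensities.sum()
    return rng.multinomial(1, probs)

def poisson_outcome(intensities, rng):
    """Independent Poisson counts per interval; keep only the first event."""
    counts = rng.poisson(intensities)
    outcome = np.zeros_like(counts)
    event_intervals = np.flatnonzero(counts)
    if event_intervals.size > 0:
        outcome[event_intervals[0]] = 1
    return outcome

y_multi = multinomial_outcome(intensities, rng)
y_pois = poisson_outcome(intensities, rng)
```

In this sketch the multinomial draw always yields exactly one event, while the Poisson draw can yield none, which is why such samples have to be discarded and redrawn.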
sparse : boolean, default=True
If True, generate sparse features (csr_matrices); otherwise, generate dense features (numpy arrays).
censoring_prob : float, default=0.
Probability that a sample is censored. Should be in [0, 1]. If 0, no censoring is applied. When > 0, SimuSCCS simulates a censoring vector. In that case, the features and outcomes are simulated, then right-censored according to the simulated censoring dates.
censoring_scale : float, default=None
The number of censored time intervals is drawn from a Poisson distribution with intensity equal to censoring_scale. The higher the value, the more intervals will be censored. If None, no censoring is applied.
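How censoring_prob and censoring_scale might combine into a censoring vector can be sketched as follows. The function sketch_censoring is hypothetical, and clipping the result to at least one observed interval is an assumption made here so that values stay in [1, n_intervals]; tick's internal procedure may differ.

```python
import numpy as np

def sketch_censoring(n_cases, n_intervals, censoring_prob, censoring_scale, rng):
    """Hypothetical sketch of a censoring vector with values in [1, n_intervals]."""
    # Each case is censored independently with probability censoring_prob.
    is_censored = rng.random(n_cases) < censoring_prob
    # Number of censored trailing intervals, used only for censored cases.
    n_censored = rng.poisson(censoring_scale, size=n_cases)
    censoring = n_intervals - np.where(is_censored, n_censored, 0)
    # Assumption: keep at least one observed interval per case.
    return np.clip(censoring, 1, n_intervals).astype("uint64")

rng = np.random.default_rng(42)
censoring = sketch_censoring(n_cases=5, n_intervals=10, censoring_prob=0.5,
                             censoring_scale=2.0, rng=rng)
no_censoring = sketch_censoring(n_cases=5, n_intervals=10, censoring_prob=0.0,
                                censoring_scale=2.0, rng=rng)
```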
coeffs : list containing numpy.ndarray, default=None
Can be used to provide your own set of coefficients. Element i of the list should be a 1-d numpy.ndarray of shape (n_lags[i] + 1,), where n_lags[i] is the number of lags associated with feature i. If set to None, the simulator will generate coefficients randomly.
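A coefficient list with the expected layout can be built as in the sketch below. The values are random placeholders chosen only to show the shapes, and the n_lags array is an illustrative choice.

```python
import numpy as np

n_lags = np.array([2, 0, 1], dtype="uint64")
rng = np.random.default_rng(0)

# One 1-d array per feature, with n_lags[i] + 1 entries (lag 0 up to lag n_lags[i]).
coeffs = [rng.normal(size=int(k) + 1) for k in n_lags]

# Check each element against the documented shape before passing the list on.
for i, c in enumerate(coeffs):
    assert c.shape == (int(n_lags[i]) + 1,)
shapes = [c.shape for c in coeffs]
```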
hawkes_exp_kernels : SimuHawkesExpKernels, default=None
Features are simulated with exponential kernel Hawkes processes. This parameter can be used to specify your own kernels (see the SimuHawkesExpKernels documentation). If None, random kernels are generated. The same kernels are used to generate features for the whole generated population.
n_correlations : int, default=0
If hawkes_exp_kernels is None, random kernels are generated. This parameter controls the number of non-null non-diagonal kernels.
batch_size : int, default=None
When generating outcomes with the Poisson distribution, the simulator discards samples for which no event has occurred. In this case, the simulator generates successive batches of samples until it reaches a total of n_cases samples. This parameter can be used to set the batch size.
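The batching logic can be pictured as a simple rejection loop. The sketch below is hypothetical: simulate_in_batches and event_prob are stand-ins for the real simulation, used only to show how batches accumulate until enough samples with an event are available.

```python
import numpy as np

def simulate_in_batches(n_cases, batch_size, event_prob, rng):
    """Hypothetical sketch: draw batches and keep only samples with an event."""
    kept = []
    n_batches = 0
    while len(kept) < n_cases:
        # Stand-in for a real simulation: True means the sample had an event.
        has_event = rng.random(batch_size) < event_prob
        # Store a global index for each kept sample of this batch.
        kept.extend(np.flatnonzero(has_event) + n_batches * batch_size)
        n_batches += 1
    # Surplus samples from the last batch are dropped.
    return kept[:n_cases], n_batches

rng = np.random.default_rng(1)
cases, n_batches = simulate_in_batches(n_cases=20, batch_size=8,
                                       event_prob=0.4, rng=rng)
```

A larger batch size means fewer loop iterations but more surplus samples thrown away at the end.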
seed : int, default=None
The seed of the random number generator.
verbose : bool, default=True
If True, print information during the simulation.
Examples
>>> import numpy as np
>>> from tick.survival import SimuSCCS
>>> n_lags = np.repeat(2, 2).astype('uint64')
>>> sim = SimuSCCS(n_cases=5, n_intervals=3, n_features=2, n_lags=n_lags,
...                seed=42, sparse=False, exposure_type="multiple_exposures",
...                verbose=False)
>>> features, labels, outcomes, censoring, _coeffs = sim.simulate()
>>> print(features)
[array([[0., 0.],
       [1., 0.],
       [1., 1.]]), array([[1., 0.],
       [1., 0.],
       [1., 1.]]), array([[1., 1.],
       [1., 1.],
       [1., 1.]]), array([[0., 0.],
       [1., 1.],
       [1., 0.]]), array([[1., 0.],
       [0., 0.],
       [0., 0.]])]
>>> print(censoring)
[3 3 3 3 3]
>>> print(_coeffs)
[array([ 0.54738557, -0.15109073, 0.71345739]), array([ 1.67633284, -0.25656871, -0.25655065])]
simulate()
Launch simulation of the data.
Returns
features : list of numpy.ndarray or list of scipy.sparse.csr_matrix, list of length n_cases, each element of the list of shape=(n_intervals, n_features)
The list of feature matrices.
labels : list of numpy.ndarray, list of length n_cases, each element of the list of shape=(n_intervals,)
The label vectors.
censoring : numpy.ndarray, shape=(n_cases,), dtype="uint64"
The censoring data. This array should contain integers in [1, n_intervals]. If censoring[i] is equal to n_intervals, then there is no censoring for sample i. If censoring[i] = c < n_intervals, then the observation of sample i is stopped at interval c, that is, at row c - 1 of the corresponding matrix. The last n_intervals - c rows are then set to 0.
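The row convention described above can be made concrete with a short sketch. The helper right_censor is hypothetical and works on dense arrays only; it zeroes every row from index c onward, so rows 0 to c - 1 stay observed.

```python
import numpy as np

def right_censor(features, censoring):
    """Zero the rows of each case's matrix from row index c onward."""
    censored = []
    for x, c in zip(features, censoring):
        x = x.copy()
        x[int(c):] = 0  # rows 0 .. c - 1 stay observed
        censored.append(x)
    return censored

n_intervals = 4
features = [np.ones((n_intervals, 2)), np.ones((n_intervals, 2))]
# First case is uncensored (c == n_intervals), second is stopped at interval 2.
censoring = np.array([4, 2], dtype="uint64")
out = right_censor(features, censoring)
```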
_coeffs : numpy.ndarray, shape=(n_features * (n_lags + 1),)
The coefficients used to simulate the data.