tick.robust.ModelHuber

class tick.robust.ModelHuber(fit_intercept: bool = True, threshold: float = 1, n_threads: int = 1)

Huber loss for robust regression. This model is particularly useful for datasets that contain outliers. The class provides first-order information (loss and gradient) for this model and can be passed to any solver through the solver’s set_model method.

Given training data \((x_i, y_i) \in \mathbb R^d \times \mathbb R\) for \(i=1, \ldots, n\), this model considers the goodness-of-fit

\[f(w, b) = \frac 1n \sum_{i=1}^n \ell(y_i, b + x_i^\top w),\]

where \(w \in \mathbb R^d\) is a vector containing the model weights, \(b \in \mathbb R\) is the intercept (used only when fit_intercept=True) and \(\ell : \mathbb R^2 \rightarrow \mathbb R\) is the loss given by

\[\begin{split}\ell(y, y') = \begin{cases} \frac 12 (y' - y)^2 &\text{ if } |y' - y| \leq \delta \\ \delta (|y' - y| - \frac 12 \delta) &\text{ if } |y' - y| > \delta \end{cases}\end{split}\]

for \(y, y' \in \mathbb R\), where \(\delta > 0\) can be tuned using the threshold argument. Data is passed to this model through the fit(X, y) method, where X is the features matrix (dense or sparse) and y is the vector of labels.
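As a rough illustration of the formulas above, the loss \(\ell\) and the goodness-of-fit \(f(w, b)\) can be sketched in plain NumPy. This is not tick's implementation: the names huber_loss and huber_objective are hypothetical helpers that only mirror the notation used here, and the tiny dataset is made up for the example.

```python
import numpy as np


def huber_loss(y, y_pred, threshold=1.0):
    """Elementwise Huber loss l(y, y') with threshold delta (hypothetical helper)."""
    r = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y, dtype=float))
    quadratic = 0.5 * r ** 2                    # branch for |y' - y| <= delta
    linear = threshold * (r - 0.5 * threshold)  # branch for |y' - y| > delta
    return np.where(r <= threshold, quadratic, linear)


def huber_objective(w, b, X, y, threshold=1.0):
    """Goodness-of-fit f(w, b): mean Huber loss of the predictions b + X @ w."""
    return huber_loss(y, b + X @ w, threshold).mean()


# Made-up data: the last label is an outlier.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 13.0])
w = np.array([1.0, 2.0])
b = 0.0

# Predictions are [1, 2, 3], so the residuals are [0, 0, -10].
per_sample = huber_loss(y, b + X @ w)   # [0.0, 0.0, 9.5]
value = huber_objective(w, b, X, y)     # 9.5 / 3
```

With \(\delta = 1\), the outlier contributes 9.5 to the sum, whereas a squared loss \(\frac 12 (y' - y)^2\) would contribute 50.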

Parameters:

fit_intercept : bool, default=True

If True, the model uses an intercept.

threshold : float, default=1.

Positive threshold of the loss, see above for details.
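One way to see what the threshold does: the derivative of \(\ell(y, y')\) with respect to \(y'\) is the residual \(y' - y\) clipped to \([-\delta, \delta]\), so no single sample can pull the gradient by more than \(\delta\). The sketch below illustrates this in plain NumPy under that observation. The helper huber_grad and the data are hypothetical and are not part of tick.

```python
import numpy as np


def huber_grad(w, b, X, y, threshold=1.0):
    """Gradient of the mean Huber loss with respect to (w, b) (hypothetical helper)."""
    residual = b + X @ w - y
    # The derivative of the loss in y' is the residual clipped to [-delta, delta].
    clipped = np.clip(residual, -threshold, threshold)
    grad_w = X.T @ clipped / len(y)
    grad_b = clipped.mean()
    return grad_w, grad_b


# Made-up data: the last label is an outlier (residuals are [-0.5, 0, -10]).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.5, 2.0, 13.0])
w = np.array([1.0, 2.0])

# Small threshold: the outlier's residual is clipped from -10 to -1.
grad_w_small, grad_b_small = huber_grad(w, 0.0, X, y, threshold=1.0)

# Very large threshold: nothing is clipped, same gradient as least squares.
grad_w_large, grad_b_large = huber_grad(w, 0.0, X, y, threshold=100.0)
```

Here the intercept gradient is -0.5 with threshold=1.0 and -3.5 with threshold=100.0, which shows how a smaller threshold limits the influence of the outlier.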

Attributes:

features : {numpy.ndarray, scipy.sparse.csr_matrix}, shape=(n_samples, n_features)

The features matrix, either dense or sparse

labels : numpy.ndarray, shape=(n_samples,) (read-only)

The labels vector

n_samples : int (read-only)

Number of samples

n_features : int (read-only)

Number of features

n_coeffs : int (read-only)

Total number of coefficients of the model

n_threads : int, default=1 (read-only)

Number of threads used for parallel computation.

  • if int <= 0: the number of threads available on the CPU

  • otherwise the desired number of threads
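The n_threads convention above can be sketched as follows. The helper resolve_n_threads is hypothetical, and using os.cpu_count() is an assumption made for illustration: tick may determine the number of available threads differently.

```python
import os


def resolve_n_threads(n_threads=1):
    """Hypothetical helper: map the n_threads convention to a thread count."""
    if n_threads <= 0:
        # Assumption: "threads available on the CPU" is approximated by
        # os.cpu_count(), which may return None, hence the fallback to 1.
        return os.cpu_count() or 1
    # Otherwise the requested number of threads is used as given.
    return n_threads
```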
