tick.solver.SDCA(l_l2sq: float, epoch_size: int = None, rand_type: str = 'unif', tol: float = 1e-10, max_iter: int = 10, verbose: bool = True, print_every: int = 1, record_every: int = 1, seed: int = -1)[source]¶Stochastic Dual Coordinate Ascent
For the minimization of objectives of the form
where the functions \(f_i\) have smooth gradients and \(g\) is
prox-capable. This solver actually requires more than that, since it is
working in a Fenchel dual formulation of the primal problem given above.
First, it requires that some ridge penalization is used, hence the mandatory
parameter l_l2sq below: SDCA will actually minimize the objective
where \(\lambda\) is tuned with the l_l2sq (see below). Now, putting
\(h(w) = g(w) + \lambda \|w\|_2^2 / 2\), SDCA maximize
the Fenchel dual problem
where \(f_i^*\) and \(h^*\) and the Fenchel duals of \(f_i\)
and \(h\) respectively.
Function \(f = \frac 1n \sum_{i=1}^n f_i\) corresponds
to the model.loss method of the model (passed with set_model to the
solver) and \(g\) corresponds to the prox.value method of the
prox (passed with the set_prox method). One iteration of
SDCA corresponds to the
following iteration applied epoch_size times:
where \(i\) is sampled at random (strategy depends on rand_type) at
each iteration. The ridge regularization \(\lambda\) can be tuned with
l_l2sq, the seed of the random number generator for generation
of samples \(i\) can be seeded with seed. The iterations stop
whenever tolerance tol is achieved, or after max_iter epochs
(namely max_iter \(\times\) epoch_size iterates).
The obtained solution \(w\) is returned by the solve method, and is
also stored in the solution attribute of the solver. The dual solution
\(\alpha\) is stored in the dual_solution attribute.
Internally, SDCA has dedicated code when
the model is a generalized linear model with sparse features, and a
separable proximal operator: in this case, each iteration works only in the
set of non-zero features, leading to much faster iterates.
l_l2sq : float
Level of L2 penalization. L2 penalization is mandatory for SDCA. Convergence properties of this solver are deeply connected to this parameter, which should be understood as the “step” used by the algorithm.
tol : float, default=1e-10
The tolerance of the solver (iterations stop when the stopping criterion is below it)
max_iter : int, default=10
Maximum number of iterations of the solver, namely maximum number of epochs (by default full pass over the data, unless
epoch_sizehas been modified from default)
verbose : bool, default=True
If
True, solver verboses history, otherwise nothing is displayed, but history is recorded anyway
seed : int, default=-1
The seed of the random sampling. If it is negative then a random seed (different at each run) will be chosen.
epoch_size : int, default given by model
Epoch size, namely how many iterations are made before updating the variance reducing term. By default, this is automatically tuned using information from the model object passed through
set_model.
rand_type : {‘unif’, ‘perm’}, default=’unif’
How samples are randomly selected from the data
if
'unif'samples are uniformly drawn among all possibilitiesif
'perm'a random permutation of all possibilities is generated and samples are sequentially taken from it. Once all of them have been taken, a new random permutation is generated
print_every : int, default=1
Print history information every time the iteration number is a multiple of
print_every. Used only isverboseis True
record_every : int, default=1
Save history information every time the iteration number is a multiple of
record_every
model : Model
The model used by the solver, passed with the
set_modelmethod
prox : Prox
Proximal operator used by the solver, passed with the
set_proxmethod
solution : numpy.array, shape=(n_coeffs,)
Minimizer found by the solver
dual_solution : numpy.array
Dual vector corresponding to the primal solution obtained by the solver
history : dict-like
A dict-type of object that contains history of the solver along iterations. It should be accessed using the
get_historymethod
time_start : str
Start date of the call to
solve()
time_elapsed : float
Duration of the call to
solve(), in seconds
time_end : str
End date of the call to
solve()
dtype : {'float64', 'float32'}, default=’float64’
Type of the arrays used. This value is set from model and prox dtypes.
References
S. Shalev-Shwartz and T. Zhang, Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization, ICML 2014
__init__(l_l2sq: float, epoch_size: int = None, rand_type: str = 'unif', tol: float = 1e-10, max_iter: int = 10, verbose: bool = True, print_every: int = 1, record_every: int = 1, seed: int = -1)[source]¶Initialize self. See help(type(self)) for accurate signature.
dual_objective(dual_coeffs)[source]¶Compute the dual objective at dual_coeffs
dual_coeffs : numpy.ndarray, shape=(n_samples,)
The dual objective objective is computed at this point
output : float
Value of the dual objective at given
dual_coeffs
get_history(key=None)¶Returns history of the solver
key : str, default=None
If
Noneall history is returned as adictIf
str, name of the history element to retrieve
output : list or dict
If
keyis None orkeyis not in history then output is a dict containing history of all keysIf
keyis the name of an element in the history, output is alistcontaining the history of this element
objective(coeffs, loss: float = None)[source]¶Compute the objective minimized by the solver at coeffs
coeffs : numpy.ndarray, shape=(n_coeffs,)
The objective is computed at this point
loss : float, default=`None`
Gives the value of the loss if already known (allows to avoid its computation in some cases)
output : float
Value of the objective at given
coeffs
set_model(model: tick.base_model.model.Model)¶Set model in the solver
model : Model
Sets the model in the solver. The model gives the first order information about the model (loss, gradient, among other things)
output : Solver
The
Solverwith given model
set_prox(prox: tick.prox.base.prox.Prox)¶Set proximal operator in the solver
prox : Prox
The proximal operator of the penalization function
output : Solver
The solver with given prox
Notes
In some solvers, set_model must be called before
set_prox, otherwise and error might be raised
solve(x0=None, step=None)¶Launch the solver
x0 : np.array, shape=(n_coeffs,), default=`None`
Starting point of the solver
step : float, default=`None`
Step-size or learning rate for the solver. This can be tuned also using the
stepattribute
output : np.array, shape=(n_coeffs,)
Obtained minimizer for the problem, same as
solutionattribute
tick.solver.SDCA¶