Optimizer using Evolution Strategies
EvolutionStrategiesOptimizer
class l2l.optimizers.evolutionstrategies.optimizer.EvolutionStrategiesOptimizer(traj, optimizee_create_individual, optimizee_fitness_weights, parameters, optimizee_bounding_func=None)
Bases: l2l.optimizers.optimizer.Optimizer
Class implementing the evolution strategies optimizer, as in: Salimans, T., Ho, J., Chen, X. & Sutskever, I. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv:1703.03864 [cs, stat] (2017).
In pseudocode, the algorithm does the following:
- For n_iteration iterations, do:
Perturb the current individual by adding noise with zero mean and standard deviation noise_std.
If mirrored sampling is enabled, also perturb the current individual by subtracting the same noise values that were added in the previous step.
Evaluate all perturbed individuals to obtain their fitness.
Update the current individual as
theta_{t+1} <- theta_t + alpha * sum_{i=1}^{n} (F_i * e_i) / (n * sigma^2)
where F_i is the fitness and e_i the perturbation of the i-th individual, alpha is learning_rate, sigma is noise_std, and n is the number of perturbed individuals.
If fitness shaping is enabled, F_i in the update step is replaced with the utility u_i, which is calculated as:
u_i = max(0, log(n/2 + 1) - log(i)) / sum_{k=1}^{n} max(0, log(n/2 + 1) - log(k)) - 1/n
where i and k are the ranks of the individuals sorted in descending order of fitness F_i (rank 1 is the fittest).
As in: Wierstra, D. et al. Natural Evolution Strategies. Journal of Machine Learning Research 15, 949–980 (2014).
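The loop can be sketched in NumPy as follows. This is a minimal sketch, not the L2L implementation: the names `utilities` and `es_step` are hypothetical, the optimizee is replaced by a plain Python function `fitness_fn`, and the individual is a flat NumPy array rather than an Individual-Dict. Ties in fitness (for example between a mirrored pair) are broken arbitrarily by `np.argsort`.

```python
import numpy as np


def utilities(n):
    """Rank-based utilities u_1..u_n for fitness shaping (index 0 = rank 1 = best)."""
    ranks = np.arange(1, n + 1)
    raw = np.maximum(0.0, np.log(n / 2 + 1) - np.log(ranks))
    return raw / raw.sum() - 1.0 / n  # the utilities sum to zero


def es_step(theta, fitness_fn, learning_rate, noise_std, pop_size, rng,
            mirrored_sampling=False, fitness_shaping=False):
    """One evolution-strategies update of the current individual `theta`."""
    # Perturbations e_i with zero mean and standard deviation noise_std.
    perturbations = rng.normal(0.0, noise_std, size=(pop_size, theta.size))
    if mirrored_sampling:
        # Mirrored sampling: evaluate theta - e_i as well as theta + e_i.
        perturbations = np.concatenate([perturbations, -perturbations])
    n = len(perturbations)

    # Evaluate every perturbed individual.
    fitness = np.array([fitness_fn(theta + e) for e in perturbations])

    if fitness_shaping:
        # Replace F_i with the utility of its rank in descending fitness order.
        order = np.argsort(-fitness)   # indices from best to worst
        weights = np.empty(n)
        weights[order] = utilities(n)  # the best individual receives u_1
    else:
        weights = fitness

    # theta_{t+1} = theta_t + alpha * sum_i(F_i * e_i) / (n * sigma^2)
    return theta + learning_rate * weights @ perturbations / (n * noise_std ** 2)


# Usage on a toy problem: maximize -||x - target||^2.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0])


def fitness(x):
    return -np.sum((x - target) ** 2)  # maximal (0) at x == target


theta = np.zeros(2)
for _ in range(100):
    theta = es_step(theta, fitness, learning_rate=0.1, noise_std=0.1,
                    pop_size=50, rng=rng, mirrored_sampling=True)
print(theta)  # should be close to target
```

With mirrored sampling the shared baseline F(theta) cancels within each +e/-e pair, which removes a large source of variance from the update.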
NOTE: This is not the most communication-efficient implementation, since the new parameters, rather than the random seed as in the paper, are communicated to the individuals.
NOTE: Fitness shaping and mirrored sampling are not yet included.
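For comparison, the seed-based scheme from the paper can be sketched as below. This is a hedged illustration of the idea, not how this class communicates: `perturbation_from_seed` and `update_from_seeds` are hypothetical names, and NumPy's `default_rng` stands in for whatever generator a real distributed setup would share.

```python
import numpy as np


def perturbation_from_seed(seed, dim, noise_std):
    # A generator created from the same integer seed yields the same draws,
    # so every worker can rebuild e_i locally from the seed alone.
    return np.random.default_rng(seed).normal(0.0, noise_std, size=dim)


def update_from_seeds(theta, seeds, fitnesses, learning_rate, noise_std):
    # Only integer seeds and scalar fitness values are needed here,
    # not the perturbed parameter vectors themselves.
    perturbations = np.stack(
        [perturbation_from_seed(s, theta.size, noise_std) for s in seeds])
    n = len(seeds)
    return theta + learning_rate * np.asarray(fitnesses) @ perturbations / (n * noise_std ** 2)


# Two workers holding the same theta receive the same (seed, fitness) pairs
# and therefore compute identical updates without exchanging parameter vectors.
theta = np.zeros(3)
seeds = [11, 12, 13, 14]
fitnesses = [0.5, -0.2, 1.3, 0.0]
worker_a = update_from_seeds(theta, seeds, fitnesses, learning_rate=0.1, noise_std=0.05)
worker_b = update_from_seeds(theta.copy(), seeds, fitnesses, learning_rate=0.1, noise_std=0.05)
```

The message per individual shrinks from a full parameter vector to one integer and one scalar, which is the saving the paper relies on.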
Parameters
traj (Trajectory) – Use this trajectory to store the parameters of the specific runs. The parameters should be initialized based on the values in parameters.
optimizee_create_individual – Function that creates a new individual. All parameters of the Individual-Dict returned should be of numpy.float64 type.
optimizee_fitness_weights – Fitness weights. The fitness returned by the Optimizee is multiplied by these values (one for each element of the fitness vector).
parameters – Instance of namedtuple() EvolutionStrategiesParameters containing the parameters needed by the Optimizer.
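A minimal sketch of what the optimizee-side arguments might look like, under stated assumptions: the `create_individual` function and the weight values are hypothetical, and combining the weighted fitness elements into one scalar by summing them (a dot product) is an assumption of this sketch, not something this page specifies.

```python
import numpy as np

rng = np.random.default_rng(0)


def create_individual():
    # Hypothetical optimizee: returns an Individual-Dict whose values are all
    # numpy.float64, as the optimizer requires.
    return {
        'x': np.float64(rng.uniform(-1.0, 1.0)),
        'y': np.float64(rng.uniform(-1.0, 1.0)),
    }


# Hypothetical weights: one per element of the optimizee's fitness vector.
fitness_weights = (1.0, -0.1)


def weighted_fitness(fitness_vector, weights):
    # Multiply each fitness element by its weight; summing the products into a
    # single scalar (np.dot) is an assumption made for this sketch.
    return float(np.dot(fitness_vector, weights))


individual = create_individual()
score = weighted_fitness((2.0, 10.0), fitness_weights)  # 2.0*1.0 + 10.0*(-0.1)
```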
post_process(traj, fitnesses_results)
See post_process().
EvolutionStrategiesParameters
class l2l.optimizers.evolutionstrategies.optimizer.EvolutionStrategiesParameters(learning_rate, noise_std, mirrored_sampling_enabled, fitness_shaping_enabled, pop_size, n_iteration, stop_criterion, seed)
Bases: tuple
Parameters
learning_rate – Learning rate (alpha in the update rule).
noise_std – Standard deviation of the perturbation (sigma in the update rule); the perturbation has zero mean.
mirrored_sampling_enabled – Whether to enable mirrored sampling, i.e. sampling both e and -e.
fitness_shaping_enabled – Whether to enable fitness shaping, i.e. replacing each fitness F_i with the rank-based utility u_i when updating the current individual.
pop_size – Number of individuals per simulation.
n_iteration – Number of iterations to perform.
stop_criterion – (Optional) Stop if this fitness is reached.
seed – The random seed used for generating new individuals.
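Because EvolutionStrategiesParameters is a namedtuple, its fields can be passed by keyword and read back as properties. The sketch below defines a local stand-in with the same field names in the same order as the signature above so that it runs without L2L installed; in the library the class lives in l2l.optimizers.evolutionstrategies.optimizer. The values are illustrative only, not recommended defaults.

```python
from collections import namedtuple

# Local stand-in with the same fields, in the same order, as the documented class.
EvolutionStrategiesParameters = namedtuple(
    'EvolutionStrategiesParameters',
    ['learning_rate', 'noise_std', 'mirrored_sampling_enabled',
     'fitness_shaping_enabled', 'pop_size', 'n_iteration',
     'stop_criterion', 'seed'])

# Illustrative values only.
parameters = EvolutionStrategiesParameters(
    learning_rate=0.1,
    noise_std=1.0,
    mirrored_sampling_enabled=True,
    fitness_shaping_enabled=True,
    pop_size=20,
    n_iteration=100,
    stop_criterion=float('inf'),  # assumption: a fitness that is never reached
    seed=42)

print(parameters.noise_std)  # prints 1.0
```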
property fitness_shaping_enabled
property learning_rate
property mirrored_sampling_enabled
property n_iteration
property noise_std
property pop_size
property seed
property stop_criterion