Optimizer using Evolution Strategies

EvolutionStrategiesOptimizer

class l2l.optimizers.evolutionstrategies.optimizer.EvolutionStrategiesOptimizer(traj, optimizee_create_individual, optimizee_fitness_weights, parameters, optimizee_bounding_func=None)[source]

Bases: l2l.optimizers.optimizer.Optimizer

Class implementing the evolution strategies optimizer, as in: Salimans, T., Ho, J., Chen, X. & Sutskever, I. Evolution Strategies as a Scalable Alternative to Reinforcement Learning. arXiv:1703.03864 [cs, stat] (2017).

In pseudocode, the algorithm does the following:

For n_iteration iterations, do:
  • Perturb the current individual by adding noise with zero mean and standard deviation noise_std

  • If mirrored sampling is enabled, also perturb the current individual by subtracting the same values that were added in the previous step

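  • Evaluate the individuals and get their fitness

  • Update the current individual as

    theta_{t+1} <- theta_t + alpha * sum{F_i * e_i} / (n * sigma^2)

    where F_i is the fitness and e_i is the perturbation of individual i, alpha is learning_rate, sigma is noise_std and n is the number of evaluated individuals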

  • If fitness shaping is enabled, F_i is replaced in the previous step with the utility u_i, which is calculated as:

    u_i = max(0, log(n/2 + 1) - log(i)) / sum_{k=1}^{n}{max(0, log(n/2 + 1) - log(k))} - 1 / n

    where i and k index the individuals in descending order of fitness F_i, as in: Wierstra, D. et al. Natural Evolution Strategies. Journal of Machine Learning Research 15, 949–980 (2014).
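The steps above can be sketched in a few lines of NumPy. This is a minimal illustration of the pseudocode under stated assumptions, not the l2l implementation: the function names `shaped_utilities` and `es_step`, the quadratic test fitness, and all numeric values are hypothetical, and the fitness is assumed to be a scalar that should be maximized.

```python
import numpy as np


def shaped_utilities(fitnesses):
    """Rank-based utilities u_i; the fittest individual gets the largest utility."""
    n = len(fitnesses)
    ranks = np.arange(1, n + 1)                       # rank 1 is the fittest
    raw = np.maximum(0.0, np.log(n / 2 + 1) - np.log(ranks))
    utilities_by_rank = raw / raw.sum() - 1.0 / n
    order = np.argsort(fitnesses)[::-1]               # indices in descending fitness
    utilities = np.empty(n)
    utilities[order] = utilities_by_rank              # map utilities back to individuals
    return utilities


def es_step(theta, fitness_fn, rng, learning_rate, noise_std, pop_size,
            mirrored_sampling=False, fitness_shaping=False):
    """One iteration: perturb, evaluate, and update the current individual."""
    # Zero-mean Gaussian perturbations e_i with standard deviation noise_std
    perturbations = rng.normal(0.0, noise_std, size=(pop_size, theta.size))
    if mirrored_sampling:
        # Also evaluate theta - e_i for every sampled e_i
        perturbations = np.concatenate([perturbations, -perturbations])
    fitnesses = np.array([fitness_fn(theta + e) for e in perturbations])
    weights = shaped_utilities(fitnesses) if fitness_shaping else fitnesses
    n = len(perturbations)
    # theta_{t+1} <- theta_t + alpha * sum(F_i * e_i) / (n * sigma^2)
    return theta + learning_rate * (weights @ perturbations) / (n * noise_std ** 2)


# Hypothetical usage: maximize -||x - target||^2
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0])
theta = np.zeros(2)
for _ in range(200):
    theta = es_step(theta, lambda x: -np.sum((x - target) ** 2), rng,
                    learning_rate=0.05, noise_std=0.1, pop_size=20,
                    mirrored_sampling=True)
```

In this sketch mirrored sampling doubles the number of evaluated individuals, so n in the update is 2 * pop_size when it is enabled. Whether the library counts the population this way is not stated here.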

NOTE: This is not the most efficient implementation in terms of communication, since the new parameters are communicated to the individuals rather than the seed as in the paper.

NOTE: Doesn’t yet contain fitness shaping and mirrored sampling
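To illustrate the communication point in the first note, the sketch below shows how a perturbation could be regenerated from an integer seed instead of being sent as a full vector. It only demonstrates the idea from the paper and is not what this optimizer does; the helper `perturbation_from_seed` and the values used are hypothetical.

```python
import numpy as np


def perturbation_from_seed(seed, size, noise_std):
    """Regenerate the same zero-mean Gaussian perturbation from an integer seed."""
    return np.random.default_rng(seed).normal(0.0, noise_std, size=size)


n_params = 1000
seed = 12345

# Sender side: draws the perturbation and transmits only the seed
sent = perturbation_from_seed(seed, n_params, noise_std=0.1)

# Receiver side: rebuilds the identical perturbation from the seed alone
rebuilt = perturbation_from_seed(seed, n_params, noise_std=0.1)
```

Under this scheme a single integer is exchanged per individual rather than n_params floating-point values.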

Parameters
  • traj (Trajectory) – Use this trajectory to store the parameters of the specific runs. The parameters should be initialized based on the values in parameters

  • optimizee_create_individual – Function that creates a new individual. All parameters of the Individual-Dict returned should be of numpy.float64 type

  • optimizee_fitness_weights – Fitness weights. The fitness returned by the Optimizee is multiplied by these values (one for each element of the fitness vector)

  • parameters – Instance of namedtuple() EvolutionStrategiesParameters containing the parameters needed by the Optimizer
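As a rough illustration of two of the arguments above, the sketch below shows a function that returns an individual whose values are all numpy.float64, and how a fitness vector is multiplied element by element with the fitness weights. The function `create_individual`, the keys `x` and `y`, and the numbers are hypothetical; summing the weighted terms into one scalar is an assumption, not something this page states.

```python
import numpy as np


def create_individual():
    """Hypothetical optimizee callback: every value is a numpy.float64."""
    return {"x": np.float64(0.5), "y": np.float64(-1.5)}


individual = create_individual()

fitness = np.array([2.0, 10.0])    # fitness vector returned by an optimizee
fitness_weights = (1.0, -0.1)      # one weight per element of the fitness vector
weighted = fitness * np.asarray(fitness_weights)

# Assumption: the weighted terms are combined into one scalar by summation
scalar_fitness = weighted.sum()
```

A negative weight, as for the second element here, turns a quantity that should be minimized into one that is maximized.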

post_process(traj, fitnesses_results)[source]

See Optimizer.post_process()

end(traj)[source]

See Optimizer.end()

EvolutionStrategiesParameters

class l2l.optimizers.evolutionstrategies.optimizer.EvolutionStrategiesParameters(learning_rate, noise_std, mirrored_sampling_enabled, fitness_shaping_enabled, pop_size, n_iteration, stop_criterion, seed)

Bases: tuple

Parameters
  • learning_rate – Learning rate

  • noise_std – Standard deviation of the perturbation added to the current individual (the perturbation has zero mean)

  • mirrored_sampling_enabled – Whether to enable mirrored sampling, i.e. sampling both e and -e

  • fitness_shaping_enabled – Whether to enable fitness shaping, i.e. replacing the fitness F_i with the rank-based utility u_i when updating the current individual

  • pop_size – Number of individuals per simulation.

  • n_iteration – Number of iterations to perform

  • stop_criterion – (Optional) Stop if this fitness is reached.

  • seed – The random seed used for generating new individuals

property fitness_shaping_enabled
property learning_rate
property mirrored_sampling_enabled
property n_iteration
property noise_std
property pop_size
property seed
property stop_criterion
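The fields listed above can be filled in as shown below. To keep the example self-contained, it defines a stand-in namedtuple with the same field names as the signature; in real use the class would be imported from l2l.optimizers.evolutionstrategies.optimizer instead. All values are hypothetical, and using infinity for stop_criterion so that the run is never stopped early is an assumption.

```python
from collections import namedtuple

# Stand-in with the same fields as the documented signature (assumption:
# in real code, import the class from the l2l package instead).
EvolutionStrategiesParameters = namedtuple(
    "EvolutionStrategiesParameters",
    ["learning_rate", "noise_std", "mirrored_sampling_enabled",
     "fitness_shaping_enabled", "pop_size", "n_iteration",
     "stop_criterion", "seed"])

parameters = EvolutionStrategiesParameters(
    learning_rate=0.1,
    noise_std=1.0,
    mirrored_sampling_enabled=True,
    fitness_shaping_enabled=True,
    pop_size=20,
    n_iteration=100,
    stop_criterion=float("inf"),   # assumption: an unreachable fitness disables early stopping
    seed=42)

# Fields are readable as properties and the tuple is immutable
summary = (parameters.pop_size, parameters.n_iteration)
```

Because the class is a tuple, a modified copy is obtained with `parameters._replace(pop_size=50)` rather than by assignment.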