core.algorithm package

Submodules

core.algorithm.A2C module

class core.algorithm.A2C.A2CAgent(state_size, action_size, actor, critic, load_model=True, training_mode=True, discount_factor=0.99, actor_lr=0.001, critic_lr=0.005, file_path_actor='', file_path_critic='', **kwargs)

Bases: core.common.agent.Agent

actor_optimizer()

Creates the optimizer for the actor network model.

# Returns
Keras function (object)
backward(reward, terminal)

Updates the critic and actor networks. See the details in agent.py

compile(optimizer, metrics=[])

Compiles an agent and the underlying models to be used for training and testing.

# Arguments
optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
critic_optimizer()

Creates the optimizer for the critic network model.

# Returns
Keras function (object)
forward(observation)

Takes an observation from the environment and returns the action to be taken next. See the details in agent.py

load_weights(filepath)

Loads the weights of an agent from an HDF5 file.

# Arguments
filepath (str or list): The path to the HDF5 files. For algorithms using multiple models, this can be a list of model paths.
filename (str or list): The name of the HDF5 files. For algorithms using multiple models, this can be a list of model names.
save_weights(filepath, overwrite=False)

Saves the weights of an agent as an HDF5 file.

# Arguments
filepath (str): The path to where the weights should be saved.
overwrite (boolean): If False and filepath already exists, raises an error.
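
The following is a minimal usage sketch for A2CAgent, assuming a toy environment with a 4-dimensional state and 2 discrete actions; the actor/critic networks and the training loop are illustrative placeholders, and only the constructor and forward/backward signatures documented above are used.

# Hypothetical A2CAgent setup (models and environment are placeholders).
from keras.models import Sequential
from keras.layers import Dense
from core.algorithm.A2C import A2CAgent

state_size, action_size = 4, 2

actor = Sequential([Dense(24, activation='relu', input_dim=state_size),
                    Dense(action_size, activation='softmax')])
critic = Sequential([Dense(24, activation='relu', input_dim=state_size),
                     Dense(1, activation='linear')])

agent = A2CAgent(state_size, action_size, actor, critic,
                 load_model=False, training_mode=True,
                 discount_factor=0.99, actor_lr=0.001, critic_lr=0.005)

# Inside a training loop (env is assumed to expose the usual reset/step API):
#     action = agent.forward(observation)
#     observation, reward, done, _ = env.step(action)
#     agent.backward(reward, done)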

core.algorithm.DDPG module

Reference: https://github.com/pemami4911/deep-rl/blob/master/ddpg/ddpg.py

class core.algorithm.DDPG.DDPGAgent(actor, critic, action_shape, memory, critic_action_input, policy=None, test_policy=None, discount_factor=0.99, learning_rate=0.001, batch_size=32, train_interval=1, delta_clip=inf, nb_warmup_critic_step_cnt=500, nb_warmup_actor_step_cnt=500, random_process=None, tau_for_actor=0.001, tau_for_critic=0.001, **kwargs)

Bases: core.common.agent.Agent

backward(reward, terminal)

Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.

# Arguments
reward (float): The observed reward after executing the action returned by forward.
terminal (boolean): True if the new state of the environment is terminal.
# Returns
List of metrics values
compile(optimizer, metrics=[])

Compiles an agent and the underlying models to be used for training and testing.

# Arguments
optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
forward(observation)

Selects the next action: retrieves the recent state from memory via self.memory.get_recent_state(observation), chooses an action with self.select_action(state), records the observation and action for book-keeping, and returns the action.

# Argument
observation (object): The current observation from the environment.
# Returns
The next action to be executed in the environment.
layers

Returns all layers of the underlying model(s).

If the concrete implementation uses multiple internal models, this method returns them in a concatenated list.

# Returns
A list of the model’s layers
load_weights(filepath, filename)

Loads the weights of an agent from an HDF5 file.

# Arguments
filepath (str or list): The path to the HDF5 files. For algorithms using multiple models, this can be a list of model paths.
filename (str or list): The name of the HDF5 files. For algorithms using multiple models, this can be a list of model names.
process_state_batch(batch)
reset_states()

Resets all internally kept states after an episode is completed.

save_weights(filepath, filename, yyyymmdd=None, overwrite=False)

Saves the weights of an agent as an HDF5 file.

# Arguments
filepath (str): The path to where the weights should be saved.
overwrite (boolean): If False and filepath already exists, raises an error.
update_target_model_hard()
uses_learning_phase
core.algorithm.DDPG.ddpg_distance_metric(actions1, actions2)

Computes the “distance” between the actions taken by two policies at the same states. Expects numpy arrays.

core.algorithm.DDPG.hard_update(target, source)
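
As a small illustration of the module-level helper, the sketch below compares the actions produced by two policies at the same batch of states using ddpg_distance_metric; the action values are made up for the example.

import numpy as np
from core.algorithm.DDPG import ddpg_distance_metric

# Actions that two policies produced for the same two states (illustrative values).
actions_policy_a = np.array([[0.10, -0.30], [0.50, 0.20]])
actions_policy_b = np.array([[0.20, -0.10], [0.40, 0.30]])

distance = ddpg_distance_metric(actions_policy_a, actions_policy_b)
print(distance)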

core.algorithm.DQN module

class core.algorithm.DQN.DQNAgent(model, nb_actions, memory, discount_factor=0.99, batch_size=32, train_interval=1000, target_model_update=10000, delta_clip=inf, warmup_step_cnt=1000, enable_dueling=False, memory_interval=1, enable_double=False, dueling_type='avg', policy=None, test_policy=None, enable_encouraged_action=False, enable_discouraged_action=False, action_affected_observation_space=None, enable_pop_art=False, **kwargs)

Bases: core.common.agent.Agent

append_replay_memory(reward, terminal)
Parameters:
  • reward
  • terminal
Returns:

backward(reward, terminal)
Parameters:
  • reward
  • terminal
Returns:

compile(optimizer, metrics=[])

Compiles an agent and the underlying models to be used for training and testing.

# Arguments
optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
forward(observation)

Chooses an action.

# Argument
observation: The observation the agent uses to choose an action.
# Returns
The chosen action

load_weights(filepath)
Parameters:
  • filepath
Returns:
policy
process_state_batch(batch)
reset_states()

You can specify any logic to be run whenever an episode ends.

save_weights(filepath, overwrite=False, force=False)
Parameters:
  • filepath
  • overwrite
  • force
Returns:

test_policy
update_target_model_hard()
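
A hedged setup sketch for DQNAgent: the Q-network below and the replay memory are placeholders (the memory object must come from the library's common package before this runs); only the constructor parameters and compile signature documented above are used.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from core.algorithm.DQN import DQNAgent

nb_actions = 4
model = Sequential([Dense(32, activation='relu', input_dim=8),
                    Dense(nb_actions, activation='linear')])

memory = None  # placeholder: replace with the library's replay-memory object

agent = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
                 discount_factor=0.99, batch_size=32, enable_double=True)
agent.compile(Adam(lr=0.001))

# Training loop (env is a placeholder with the usual reset/step API):
#     action = agent.forward(observation)
#     observation, reward, done, _ = env.step(action)
#     agent.backward(reward, done)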

core.algorithm.Deep_sarsa module

class core.algorithm.Deep_sarsa.DeepSARSAgent(action_size, model, load_model=True, discount_factor=0.99, learning_rate=0.001, epsilon=1, epsilon_decay=0.999, epsilon_min=0.01, file_path='', training_mode=True, **kwargs)

Bases: core.common.agent.Agent

backward(reward, terminal)

Updates the agent’s network

compile(optimizer, metrics=[])

Compile the model

forward(observation)

Gets the action to be taken from the observation. See the description in agent.py

load_weights(filepath)

Loads trained weights from an HDF5 file.

save_weights(filepath, overwrite=False)

Saves trained weights to an HDF5 file.
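
An illustrative DeepSARSAgent setup, assuming a small Keras model over a 10-dimensional state with 5 actions; the model and the training loop are placeholders, and only the constructor, compile, forward and backward signatures documented above are used.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from core.algorithm.Deep_sarsa import DeepSARSAgent

action_size = 5
model = Sequential([Dense(30, activation='relu', input_dim=10),
                    Dense(action_size, activation='linear')])

agent = DeepSARSAgent(action_size=action_size, model=model,
                      load_model=False, discount_factor=0.99,
                      learning_rate=0.001, epsilon=1, epsilon_decay=0.999,
                      epsilon_min=0.01, training_mode=True)
agent.compile(Adam(lr=0.001))

# Training loop (env is a placeholder with the usual reset/step API):
#     action = agent.forward(observation)
#     observation, reward, done, _ = env.step(action)
#     agent.backward(reward, done)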

core.algorithm.MADDPG module

Based on Deep DPG as described by Lillicrap et al. (2015): http://arxiv.org/pdf/1509.02971v2.pdf, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.646.4324&rep=rep1&type=pdf

class core.algorithm.MADDPG.MA_DDPGAgent(nb_agents, nb_actions, actor, critic, critic_action_input, memory, gamma=0.99, batch_size=32, nb_steps_warmup_critic=1000, nb_steps_warmup_actor=1000, train_interval=1, memory_interval=1, delta_range=None, delta_clip=inf, random_process=None, custom_model_objects={}, target_model_update=0.001, **kwargs)

Bases: rl.core.Agent

backward(reward, terminal=False)

Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.

# Arguments
reward (float): The observed reward after executing the action returned by forward.
terminal (boolean): True if the new state of the environment is terminal.
# Returns
List of metrics values
compile(optimizer, metrics=[])

Compiles an agent and the underlying models to be used for training and testing.

# Arguments
optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
forward(observation)

Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.

# Argument
observation (object): The current observation from the environment.
# Returns
The next action to be executed in the environment.
layers

Returns all layers of the underlying model(s).

If the concrete implementation uses multiple internal models, this method returns them in a concatenated list.

# Returns
A list of the model’s layers
load_weights(filepath)

Loads the weights of an agent from an HDF5 file.

# Arguments
filepath (str): The path to the HDF5 file.
metrics_names

The human-readable names of the agent’s metrics. Must return as many names as there are metrics (see also compile).

# Returns
A list of metric names (string)
process_state_batch(batch)
reset_states()

Resets all internally kept states after an episode is completed.

save_weights(filepath, overwrite=False)

Saves the weights of an agent as an HDF5 file.

# Arguments
filepath (str): The path to where the weights should be saved.
overwrite (boolean): If False and filepath already exists, raises an error.
select_action(state)
update_target_models_hard()
uses_learning_phase
core.algorithm.MADDPG.mean_q(y_true, y_pred)
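
The mean_q helper defined in this module is a Keras-style metric (y_true, y_pred), so one plausible use, shown as a hedged sketch, is passing it to compile() together with a Keras optimizer; the MA_DDPGAgent construction itself is omitted because it needs environment-specific actor/critic models, a replay memory and a critic action input.

from keras.optimizers import Adam
from core.algorithm.MADDPG import MA_DDPGAgent, mean_q

# Assuming `agent` is an already-constructed MA_DDPGAgent
# (construction omitted; see the class signature above):
#     agent.compile(Adam(lr=0.001), metrics=[mean_q])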

core.algorithm.MADQN module

Based on the implementation of the DQN agent as described in Mnih (2013) and Mnih (2015): http://arxiv.org/pdf/1312.5602.pdf, http://arxiv.org/abs/1509.06461

class core.algorithm.MADQN.AbstractMA_DQNAgent(nb_agents, nb_actions, memory, gamma=0.99, batch_size=32, nb_steps_warmup=1000, train_interval=1, memory_interval=1, target_model_update=10000, delta_range=None, delta_clip=inf, custom_model_objects={}, **kwargs)

Bases: core.common.agent.Agent

compute_batch_q_values(state_batch)
compute_q_values(state)
get_config()
process_state_batch(batch)
class core.algorithm.MADQN.MA_DQNAgent(model, policy=None, test_policy=None, enable_double_dqn=True, enable_dueling_network=False, dueling_type='avg', enable_encouraged_action=False, enable_discouraged_action=False, action_affected_observation_space=None, *args, **kwargs)

Bases: core.algorithm.MADQN.AbstractMA_DQNAgent

backward(reward, terminal)

Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.

# Arguments
reward (float): The observed reward after executing the action returned by forward.
terminal (boolean): True if the new state of the environment is terminal.
# Returns
List of metrics values
compile(optimizer, metrics=[])

Compiles an agent and the underlying models to be used for training and testing.

# Arguments
optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
forward(observation)

Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.

# Argument
observation (object): The current observation from the environment.
# Returns
The next action to be executed in the environment.
get_config()
layers

Returns all layers of the underlying model(s).

If the concrete implementation uses multiple internal models, this method returns them in a concatenated list.

# Returns
A list of the model’s layers
load_weights(filepath)

Loads the weights of an agent from an HDF5 file.

# Arguments
filepath (str or list): The path to the HDF5 files. For algorithms using multiple models, this can be a list of model paths.
filename (str or list): The name of the HDF5 files. For algorithms using multiple models, this can be a list of model names.
metrics_names
policy
reset_states()

You can specify any logic to be run whenever an episode ends.

save_weights(filepath, overwrite=False)

Saves the weights of an agent as an HDF5 file.

# Arguments
filepath (str): The path to where the weights should be saved.
overwrite (boolean): If False and filepath already exists, raises an error.
test_policy
update_target_model_hard()
core.algorithm.MADQN.mean_q(y_true, y_pred)
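
A hedged construction sketch for MA_DQNAgent: the per-agent Q-network and the replay memory are placeholders, and the keyword arguments are taken from the class signatures above (nb_agents, nb_actions and memory are forwarded to AbstractMA_DQNAgent via *args/**kwargs).

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from core.algorithm.MADQN import MA_DQNAgent, mean_q

nb_agents, nb_actions = 2, 3
model = Sequential([Dense(64, activation='relu', input_dim=16),
                    Dense(nb_actions, activation='linear')])

memory = None  # placeholder: replace with the library's replay-memory object

agent = MA_DQNAgent(model=model, nb_agents=nb_agents, nb_actions=nb_actions,
                    memory=memory, enable_double_dqn=True,
                    enable_dueling_network=False)
agent.compile(Adam(lr=0.001), metrics=[mean_q])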

core.algorithm.PPO module

class core.algorithm.PPO.PPOAgent(state_size, action_size, continuous, actor, critic, gamma=0.99, loss_clipping=0.2, epochs=10, noise=1.0, entropy_loss=0.001, buffer_size=256, batch_size=64, load_model=True, training_mode=True, file_path_actor='', file_path_critic='', **kwargs)

Bases: core.common.agent.Agent

backward(reward, terminal)

Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.

# Arguments
reward (float): The observed reward after executing the action returned by forward.
terminal (boolean): True if the new state of the environment is terminal.
# Returns
List of metrics values
compile(optimizer, metrics=[])
# Arguments
optimizer (object): [0] = actor optimizer, [1] = critic optimizer.
metrics (Tensor): [0] = Keras tensor for the advantage, [1] = Keras tensor for the old_prediction.
# Returns
None
discounted_reward()
forward(observation)

Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.

# Argument
observation (object): The current observation from the environment.
# Returns
The next action to be executed in the environment.
load_weights(filepath, filename)

Loads the weights of an agent from an HDF5 file.

# Arguments
filepath (str or list): The path to the HDF5 files. For algorithms using multiple models, this can be a list of model paths.
filename (str or list): The name of the HDF5 files. For algorithms using multiple models, this can be a list of model names.
proximal_policy_optimization_loss(advantage, old_prediction)
proximal_policy_optimization_loss_continuous(advantage, old_prediction)
reset_env()
save_weights(filepath, filename=None, overwrite=False)

Saves the weights of an agent as an HDF5 file.

# Arguments
filepath (str): The path to where the weights should be saved.
overwrite (boolean): If False and filepath already exists, raises an error.
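
A hedged PPOAgent setup sketch for a discrete-action problem: the actor/critic models are placeholders, and per the compile() description above, the optimizer argument is assumed to be a pair [actor optimizer, critic optimizer] and the metrics a pair of Keras tensors [advantage, old_prediction] built by the caller.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from core.algorithm.PPO import PPOAgent

state_size, action_size = 8, 4

actor = Sequential([Dense(64, activation='relu', input_dim=state_size),
                    Dense(action_size, activation='softmax')])
critic = Sequential([Dense(64, activation='relu', input_dim=state_size),
                     Dense(1, activation='linear')])

agent = PPOAgent(state_size, action_size, continuous=False,
                 actor=actor, critic=critic, gamma=0.99,
                 loss_clipping=0.2, epochs=10, buffer_size=256,
                 batch_size=64, load_model=False, training_mode=True)

# advantage_tensor and old_prediction_tensor are placeholders for the Keras
# tensors described in compile() above:
#     agent.compile([Adam(lr=1e-4), Adam(lr=1e-4)],
#                   metrics=[advantage_tensor, old_prediction_tensor])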

core.algorithm.QLearning module

class core.algorithm.QLearning.QLearningAgent(actions, learning_rate=0.01, discount_factor=0.9, epsilon=0.9)

Bases: object

static arg_max(state_action)
choose_action(state)
learn(state, action, reward, done, next_state)
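
A minimal tabular loop for QLearningAgent, assuming an environment with the usual reset()/step() interface; the environment and episode count are placeholders, while choose_action and learn are used exactly as documented above.

from core.algorithm.QLearning import QLearningAgent

agent = QLearningAgent(actions=list(range(4)),
                       learning_rate=0.01, discount_factor=0.9, epsilon=0.9)

# for episode in range(1000):
#     state = env.reset()
#     done = False
#     while not done:
#         action = agent.choose_action(state)
#         next_state, reward, done, _ = env.step(action)
#         agent.learn(state, action, reward, done, next_state)
#         state = next_state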

core.algorithm.REINFORCE module

class core.algorithm.REINFORCE.ReinforceAgent(state_size, action_size, model, load_model=True, discount_factor=0.99, learning_rate=0.001, training_mode=True, file_path='', **kwargs)

Bases: core.common.agent.Agent

backward(reward, terminal)

Updates the agent. See the description in agent.py

compile(optimizer, metrics=[])

Compiles the agent. Defines a new optimizer instead of the one passed in. See the description in agent.py

discount_rewards(rewards)

Calculates discounted rewards.

# Argument
rewards (list of float): List of rewards.
# Returns
List of discounted rewards
forward(observation)

Gets the action to be taken from the observation. See the description in agent.py

load_weights(file_path)

Loads the weights of an agent from an HDF5 file.

# Arguments
filepath (str or list): The path to the HDF5 files. For algorithms using multiple models, this can be a list of model paths.
filename (str or list): The name of the HDF5 files. For algorithms using multiple models, this can be a list of model names.
save_weights(file_path, overwrite=False)

Saves the weights of an agent as an HDF5 file.

# Arguments
filepath (str): The path to where the weights should be saved.
overwrite (boolean): If False and filepath already exists, raises an error.
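
A hedged sketch for ReinforceAgent: the policy network and the episode loop are placeholders, while the constructor, compile, forward, backward and discount_rewards calls follow the signatures documented above (compile defines its own optimizer, so the one passed in is effectively ignored).

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
from core.algorithm.REINFORCE import ReinforceAgent

state_size, action_size = 4, 2
model = Sequential([Dense(24, activation='relu', input_dim=state_size),
                    Dense(action_size, activation='softmax')])

agent = ReinforceAgent(state_size, action_size, model,
                       load_model=False, discount_factor=0.99,
                       learning_rate=0.001, training_mode=True)
agent.compile(Adam(lr=0.001))  # replaced internally, per compile() above

# discount_rewards works directly on a list of rewards:
discounted = agent.discount_rewards([1.0, 0.0, 0.0, 1.0])

# Episode loop (env is a placeholder with the usual reset/step API):
#     action = agent.forward(observation)
#     observation, reward, done, _ = env.step(action)
#     agent.backward(reward, done)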

Module contents