core.algorithm package¶
Submodules¶
core.algorithm.A2C module¶
-
class core.algorithm.A2C.A2CAgent(state_size, action_size, actor, critic, load_model=True, training_mode=True, discount_factor=0.99, actor_lr=0.001, critic_lr=0.005, file_path_actor='', file_path_critic='', **kwargs)¶
Bases: core.common.agent.Agent
-
actor_optimizer
()¶ Creates the optimizer for the actor network model.
- # Returns
- Keras function (object)
-
backward
(reward, terminal)¶ Updates the critic and actor networks. See the details in agent.py.
-
compile
(optimizer, metrics=[])¶ Compiles an agent and the underlying models to be used for training and testing.
- # Arguments
- optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
- metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
-
critic_optimizer
()¶ Creates the optimizer for the critic network model.
- # Returns
- Keras function (object)
-
forward
(observation)¶ Takes an observation from the environment and returns the action to be taken next. See the details in agent.py.
-
load_weights
(filepath)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str or list): The path to the HDF5 file(s). For algorithms that use multiple models, this may be a list of model paths.
- filename (str or list): The name of the HDF5 file(s). For algorithms that use multiple models, this may be a list of model names.
-
save_weights
(filepath, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
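Taken together, these methods follow the generic forward/backward Agent cycle. The following is a minimal usage sketch only: it assumes small standalone-Keras actor and critic models, the classic Gym step API, and that the constructor arguments behave exactly as listed above; it is not taken from the package's own examples.

    import gym
    from keras.models import Sequential
    from keras.layers import Dense
    from core.algorithm.A2C import A2CAgent

    env = gym.make('CartPole-v1')                      # 4-dim state, 2 discrete actions
    state_size, action_size = 4, 2

    # Tiny illustrative networks; real experiments would use the project's own models.
    actor = Sequential([Dense(24, activation='relu', input_dim=state_size),
                        Dense(action_size, activation='softmax')])
    critic = Sequential([Dense(24, activation='relu', input_dim=state_size),
                         Dense(1, activation='linear')])

    agent = A2CAgent(state_size, action_size, actor, critic,
                     load_model=False, training_mode=True,
                     discount_factor=0.99, actor_lr=0.001, critic_lr=0.005)

    observation = env.reset()
    for step in range(1000):
        action = agent.forward(observation)              # pick the next action
        observation, reward, done, _ = env.step(action)  # apply it to the environment
        agent.backward(reward, terminal=done)            # update actor and critic
        if done:
            observation = env.reset()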
core.algorithm.DDPG module¶
Referenced from https://github.com/pemami4911/deep-rl/blob/master/ddpg/ddpg.py.
-
class core.algorithm.DDPG.DDPGAgent(actor, critic, action_shape, memory, critic_action_input, policy=None, test_policy=None, discount_factor=0.99, learning_rate=0.001, batch_size=32, train_interval=1, delta_clip=inf, nb_warmup_critic_step_cnt=500, nb_warmup_actor_step_cnt=500, random_process=None, tau_for_actor=0.001, tau_for_critic=0.001, **kwargs)¶
Bases: core.common.agent.Agent
-
backward
(reward, terminal)¶ Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.
- # Argument
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
- # Returns
- List of metrics values
-
compile
(optimizer, metrics=[])¶ Compiles an agent and the underlying models to be used for training and testing.
- # Arguments
- optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
- metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
-
forward
(observation)¶ Selects the next action: retrieves the recent state from memory, selects an action from it via the policy, stores the observation and action for book-keeping, and returns the action.
- # Argument
- observation (object): The current observation from the environment.
- # Returns
- The next action to be executed in the environment.
-
layers
¶ Returns all layers of the underlying model(s).
If the concrete implementation uses multiple internal models, this method returns them in a concatenated list.
- # Returns
- A list of the model’s layers
-
load_weights
(filepath, filename)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str or list): The path to the HDF5 file(s). For algorithms that use multiple models, this may be a list of model paths.
- filename (str or list): The name of the HDF5 file(s). For algorithms that use multiple models, this may be a list of model names.
-
process_state_batch
(batch)¶
-
reset_states
()¶ Resets all internally kept states after an episode is completed.
-
save_weights
(filepath, filename, yyyymmdd=None, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
update_target_model_hard
()¶
-
uses_learning_phase
¶
-
-
core.algorithm.DDPG.ddpg_distance_metric(actions1, actions2)¶ Computes the “distance” between actions taken by two policies at the same states. Expects numpy arrays.
-
core.algorithm.DDPG.hard_update(target, source)¶
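hard_update and the tau_for_actor / tau_for_critic arguments point at the standard DDPG target-network updates. The sketch below shows what those updates conventionally do with Keras models; it is an illustration of the technique, not this module's actual code.

    def hard_update(target, source):
        # Copy the source network's weights into the target network verbatim.
        target.set_weights(source.get_weights())

    def soft_update(target, source, tau=0.001):
        # Polyak averaging: target <- tau * source + (1 - tau) * target.
        mixed = [tau * s + (1.0 - tau) * t
                 for s, t in zip(source.get_weights(), target.get_weights())]
        target.set_weights(mixed)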
core.algorithm.DQN module¶
-
class core.algorithm.DQN.DQNAgent(model, nb_actions, memory, discount_factor=0.99, batch_size=32, train_interval=1000, target_model_update=10000, delta_clip=inf, warmup_step_cnt=1000, enable_dueling=False, memory_interval=1, enable_double=False, dueling_type='avg', policy=None, test_policy=None, enable_encouraged_action=False, enable_discouraged_action=False, action_affected_observation_space=None, enable_pop_art=False, **kwargs)¶
Bases: core.common.agent.Agent
-
append_replay_memory
(reward, terminal)¶ Stores the most recent transition, together with the given reward and terminal flag, in the replay memory.
- # Arguments
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
-
backward
(reward, terminal)¶ Updates the agent after having executed the action returned by forward.
- # Arguments
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
- # Returns
- List of metrics values
-
compile
(optimizer, metrics=[])¶ Compiles an agent and the underlying models to be used for training and testing.
- # Arguments
- optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
- metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
-
forward
(observation)¶ Chooses the next action.
- # Argument
- observation (object): The observation the agent uses to choose an action.
- # Returns
- The chosen action.
-
load_weights
(filepath)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str): The path to the HDF5 file.
-
policy
¶
-
process_state_batch
(batch)¶
-
reset_states
()¶ Runs whenever an episode ends; you can specify any end-of-episode logic here.
-
save_weights
(filepath, overwrite=False, force=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
test_policy
¶
-
update_target_model_hard
()¶
-
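The enable_double flag points at Double DQN targets, where the online model selects the greedy action and the target model evaluates it. The sketch below illustrates that computation in general NumPy/Keras terms; it is not lifted from DQNAgent.backward.

    import numpy as np

    def double_dqn_targets(model, target_model, next_states, rewards, terminals, gamma=0.99):
        # Online network selects the greedy action for each next state...
        best_actions = np.argmax(model.predict(next_states), axis=1)
        # ...and the target network evaluates that action's value.
        next_q = target_model.predict(next_states)
        selected_q = next_q[np.arange(len(next_states)), best_actions]
        # No bootstrap past terminal states.
        return rewards + gamma * (1.0 - terminals) * selected_q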
core.algorithm.Deep_sarsa module¶
-
class core.algorithm.Deep_sarsa.DeepSARSAgent(action_size, model, load_model=True, discount_factor=0.99, learning_rate=0.001, epsilon=1, epsilon_decay=0.999, epsilon_min=0.01, file_path='', training_mode=True, **kwargs)¶
Bases: core.common.agent.Agent
-
backward
(reward, terminal)¶ Updates the agent’s network.
-
compile
(optimizer, metrics=[])¶ Compiles the model.
-
forward
(observation)¶ Gets the action to take from the observation. See the description in agent.py.
-
load_weights
(filepath)¶ Loads trained weights from an HDF5 file.
-
save_weights
(filepath, overwrite=False)¶ Saves trained weights to an HDF5 file.
-
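The epsilon, epsilon_decay and epsilon_min arguments describe a decaying epsilon-greedy exploration schedule. A generic sketch of that schedule follows; it is an assumption about, not a copy of, DeepSARSAgent.forward.

    import numpy as np

    def epsilon_greedy(model, state, action_size, epsilon, epsilon_decay=0.999, epsilon_min=0.01):
        if np.random.rand() <= epsilon:
            action = np.random.randint(action_size)           # explore
        else:
            q_values = model.predict(state[np.newaxis, :])    # exploit greedily
            action = int(np.argmax(q_values[0]))
        epsilon = max(epsilon_min, epsilon * epsilon_decay)   # decay toward epsilon_min
        return action, epsilon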
core.algorithm.MADDPG module¶
Based on Deep DPG as described by Lillicrap et al. (2015): http://arxiv.org/pdf/1509.02971v2.pdf, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.646.4324&rep=rep1&type=pdf
-
class core.algorithm.MADDPG.MA_DDPGAgent(nb_agents, nb_actions, actor, critic, critic_action_input, memory, gamma=0.99, batch_size=32, nb_steps_warmup_critic=1000, nb_steps_warmup_actor=1000, train_interval=1, memory_interval=1, delta_range=None, delta_clip=inf, random_process=None, custom_model_objects={}, target_model_update=0.001, **kwargs)¶
Bases: rl.core.Agent
Write me
-
backward
(reward, terminal=False)¶ Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.
- # Argument
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
- # Returns
- List of metrics values
-
compile
(optimizer, metrics=[])¶ Compiles an agent and the underlying models to be used for training and testing.
- # Arguments
- optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
- metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
-
forward
(observation)¶ Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.
- # Argument
- observation (object): The current observation from the environment.
- # Returns
- The next action to be executed in the environment.
-
layers
¶ Returns all layers of the underlying model(s).
If the concrete implementation uses multiple internal models, this method returns them in a concatenated list.
- # Returns
- A list of the model’s layers
-
load_weights
(filepath)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str): The path to the HDF5 file.
-
metrics_names
¶ The human-readable names of the agent’s metrics. Must return as many names as there are metrics (see also compile).
- # Returns
- A list of metric names (strings)
-
process_state_batch
(batch)¶
-
reset_states
()¶ Resets all internally kept states after an episode is completed.
-
save_weights
(filepath, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
select_action
(state)¶
-
update_target_models_hard
()¶
-
uses_learning_phase
¶
-
-
core.algorithm.MADDPG.mean_q(y_true, y_pred)¶
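mean_q is used as a training metric. In keras-rl, which this code base mirrors, mean_q reports the mean of the largest predicted Q-value per sample; the one-liner below shows that conventional definition as an assumption about what this function computes.

    from keras import backend as K

    def mean_q(y_true, y_pred):
        # Average over the batch of the maximum predicted Q-value per sample.
        return K.mean(K.max(y_pred, axis=-1))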
core.algorithm.MADQN module¶
Based on the implementation of the DQN agent as described in Mnih (2013) and Mnih (2015): http://arxiv.org/pdf/1312.5602.pdf, http://arxiv.org/abs/1509.06461
-
class core.algorithm.MADQN.AbstractMA_DQNAgent(nb_agents, nb_actions, memory, gamma=0.99, batch_size=32, nb_steps_warmup=1000, train_interval=1, memory_interval=1, target_model_update=10000, delta_range=None, delta_clip=inf, custom_model_objects={}, **kwargs)¶
Bases: core.common.agent.Agent
Write me
-
compute_batch_q_values
(state_batch)¶
-
compute_q_values
(state)¶
-
get_config
()¶
-
process_state_batch
(batch)¶
-
-
class core.algorithm.MADQN.MA_DQNAgent(model, policy=None, test_policy=None, enable_double_dqn=True, enable_dueling_network=False, dueling_type='avg', enable_encouraged_action=False, enable_discouraged_action=False, action_affected_observation_space=None, *args, **kwargs)¶
Bases: core.algorithm.MADQN.AbstractMA_DQNAgent
-
backward
(reward, terminal)¶ Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.
- # Argument
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
- # Returns
- List of metrics values
-
compile
(optimizer, metrics=[])¶ Compiles an agent and the underlying models to be used for training and testing.
- # Arguments
- optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
- metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
-
forward
(observation)¶ Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.
- # Argument
- observation (object): The current observation from the environment.
- # Returns
- The next action to be executed in the environment.
-
get_config
()¶
-
layers
¶ Returns all layers of the underlying model(s).
If the concrete implementation uses multiple internal models, this method returns them in a concatenated list.
- # Returns
- A list of the model’s layers
-
load_weights
(filepath)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str or list): The path to the HDF5 file(s). For algorithms that use multiple models, this may be a list of model paths.
- filename (str or list): The name of the HDF5 file(s). For algorithms that use multiple models, this may be a list of model names.
-
metrics_names
¶
-
policy
¶
-
reset_states
()¶ Runs whenever an episode ends; you can specify any end-of-episode logic here.
-
save_weights
(filepath, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
test_policy
¶
-
update_target_model_hard
()¶
-
-
core.algorithm.MADQN.mean_q(y_true, y_pred)¶
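The dueling_type argument ('avg', plus the usual 'max' and 'naive' variants) refers to how a dueling network combines the state value and the advantages (Wang et al., 2016). A generic sketch of that aggregation, not this module's code:

    import numpy as np

    def dueling_q(value, advantages, dueling_type='avg'):
        # Q(s, a) = V(s) + A(s, a) - agg_a A(s, a), where agg depends on dueling_type.
        if dueling_type == 'avg':
            return value + advantages - advantages.mean(axis=-1, keepdims=True)
        if dueling_type == 'max':
            return value + advantages - advantages.max(axis=-1, keepdims=True)
        return value + advantages  # 'naive': no normalisation term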
core.algorithm.PPO module¶
-
class core.algorithm.PPO.PPOAgent(state_size, action_size, continuous, actor, critic, gamma=0.99, loss_clipping=0.2, epochs=10, noise=1.0, entropy_loss=0.001, buffer_size=256, batch_size=64, load_model=True, training_mode=True, file_path_actor='', file_path_critic='', **kwargs)¶
Bases: core.common.agent.Agent
-
backward
(reward, terminal)¶ Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.
- # Argument
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
- # Returns
- List of metrics values
-
compile
(optimizer, metrics=[])¶
- # Arguments
- optimizer (object): [0] = actor optimizer, [1] = critic optimizer
- metrics (Tensor): [0] = Keras tensor for the advantage, [1] = Keras tensor for the old_prediction
- # Returns
- None
-
discounted_reward
()¶
-
forward
(observation)¶ Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.
- # Argument
- observation (object): The current observation from the environment.
- # Returns
- The next action to be executed in the environment.
-
load_weights
(filepath, filename)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str or list): The path to the HDF5 file(s). For algorithms that use multiple models, this may be a list of model paths.
- filename (str or list): The name of the HDF5 file(s). For algorithms that use multiple models, this may be a list of model names.
-
proximal_policy_optimization_loss
(advantage, old_prediction)¶
-
proximal_policy_optimization_loss_continuous
(advantage, old_prediction)¶
-
reset_env
()¶
-
save_weights
(filepath, filename=None, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
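proximal_policy_optimization_loss(advantage, old_prediction), together with the loss_clipping and entropy_loss constructor arguments, corresponds to the standard clipped PPO surrogate for a discrete (softmax) actor. The sketch below shows that loss in Keras-backend form; it is an assumption about the shape of the module's loss, not its exact code.

    from keras import backend as K

    def ppo_loss(advantage, old_prediction, loss_clipping=0.2, entropy_loss=1e-3):
        # advantage and old_prediction are extra Keras input tensors, matching the
        # metrics described for PPOAgent.compile above; y_true is the one-hot action.
        def loss(y_true, y_pred):
            prob = K.sum(y_true * y_pred, axis=-1, keepdims=True)              # pi(a|s)
            old_prob = K.sum(y_true * old_prediction, axis=-1, keepdims=True)  # pi_old(a|s)
            ratio = prob / (old_prob + 1e-10)
            clipped = K.clip(ratio, 1.0 - loss_clipping, 1.0 + loss_clipping)
            surrogate = K.minimum(ratio * advantage, clipped * advantage)
            entropy = -K.sum(y_pred * K.log(y_pred + 1e-10), axis=-1, keepdims=True)
            return -K.mean(surrogate + entropy_loss * entropy)
        return loss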
core.algorithm.QLearning module¶
core.algorithm.REINFORCE module¶
-
class core.algorithm.REINFORCE.ReinforceAgent(state_size, action_size, model, load_model=True, discount_factor=0.99, learning_rate=0.001, training_mode=True, file_path='', **kwargs)¶
Bases: core.common.agent.Agent
-
backward
(reward, terminal)¶ Updates the agent. See the description in agent.py.
-
compile
(optimizer, metrics=[])¶ Compiles the agent. Defines a new optimizer instead of using the input optimizer. See the description in agent.py.
-
discount_rewards
(rewards)¶ Calculates discounted rewards.
- # Argument
- rewards (list of float): The rewards collected during an episode.
- # Returns
- List of discounted rewards
-
forward
(observation)¶ Gets the action to take from the observation. See the description in agent.py.
-
load_weights
(file_path)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str or list): The path to the HDF5 file(s). For algorithms that use multiple models, this may be a list of model paths.
- filename (str or list): The name of the HDF5 file(s). For algorithms that use multiple models, this may be a list of model names.
-
save_weights
(file_path, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
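discount_rewards computes the returns-to-go used by the policy-gradient update. A plain NumPy version of the same calculation is shown here only to make the formula concrete; the module's implementation may additionally standardize the result.

    import numpy as np

    def discount_rewards(rewards, discount_factor=0.99):
        # G_t = r_t + gamma * G_{t+1}, computed backwards over the episode.
        discounted = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + discount_factor * running
            discounted[t] = running
        return discounted

    # Example: discount_rewards([0, 0, 1]) -> [0.9801, 0.99, 1.0]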