core.algorithm package¶
Submodules¶
core.algorithm.A2C module¶
-
class core.algorithm.A2C.A2CAgent(state_size, action_size, actor, critic, load_model=True, training_mode=True, discount_factor=0.99, actor_lr=0.001, critic_lr=0.005, file_path_actor='', file_path_critic='', **kwargs)¶
Bases: core.common.agent.Agent
-
actor_optimizer
()¶ Creates the optimizer for the actor network model.
- # Returns
- Keras function (object)
-
backward
(reward, terminal)¶ Updates the critic and actor networks. See the details in agent.py.
-
compile
(optimizer, metrics=[])¶ Compiles an agent and the underlying models to be used for training and testing.
- # Arguments
- optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
- metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
-
critic_optimizer
()¶ Creates the optimizer for the critic network model.
- # Returns
- Keras function (object)
-
forward
(observation)¶ Takes an observation from the environment and returns the action to be taken next. See the details in agent.py.
-
load_weights
(filepath)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str or list): The path to the HDF5 file(s). For algorithms that use multiple models, this may be a list of model paths.
- filename (str or list): The name of the HDF5 file(s). For algorithms that use multiple models, this may be a list of model names.
-
save_weights
(filepath, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
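Taken together, these methods follow the generic forward/backward Agent cycle. The following is a minimal usage sketch only: it assumes small standalone-Keras actor and critic models, the classic Gym step API, and that the constructor arguments behave exactly as listed above; it is not taken from the package's own examples.

    import gym
    from keras.models import Sequential
    from keras.layers import Dense
    from core.algorithm.A2C import A2CAgent

    env = gym.make('CartPole-v1')                      # 4-dim state, 2 discrete actions
    state_size, action_size = 4, 2

    # Tiny illustrative networks; real experiments would use the project's own models.
    actor = Sequential([Dense(24, activation='relu', input_dim=state_size),
                        Dense(action_size, activation='softmax')])
    critic = Sequential([Dense(24, activation='relu', input_dim=state_size),
                         Dense(1, activation='linear')])

    agent = A2CAgent(state_size, action_size, actor, critic,
                     load_model=False, training_mode=True,
                     discount_factor=0.99, actor_lr=0.001, critic_lr=0.005)

    observation = env.reset()
    for step in range(1000):
        action = agent.forward(observation)              # pick the next action
        observation, reward, done, _ = env.step(action)  # apply it to the environment
        agent.backward(reward, terminal=done)            # update actor and critic
        if done:
            observation = env.reset()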
core.algorithm.DDPG module¶
Referenced from https://github.com/pemami4911/deep-rl/blob/master/ddpg/ddpg.py.
-
class core.algorithm.DDPG.DDPGAgent(actor, critic, action_shape, memory, critic_action_input, policy=None, test_policy=None, discount_factor=0.99, learning_rate=0.001, batch_size=32, train_interval=1, delta_clip=inf, nb_warmup_critic_step_cnt=500, nb_warmup_actor_step_cnt=500, random_process=None, tau_for_actor=0.001, tau_for_critic=0.001, **kwargs)¶
Bases: core.common.agent.Agent
-
backward
(reward, terminal)¶ Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.
- # Argument
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
- # Returns
- List of metrics values
-
compile
(optimizer, metrics=[])¶ Compiles an agent and the underlying models to be used for training and testing.
- # Arguments
- optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
- metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
-
forward
(observation)¶ Selects the next action: retrieves the recent state from memory, selects an action from it via the policy, stores the observation and action for book-keeping, and returns the action.
- # Argument
- observation (object): The current observation from the environment.
- # Returns
- The next action to be executed in the environment.
-
layers
¶ Returns all layers of the underlying model(s).
If the concrete implementation uses multiple internal models, this method returns them in a concatenated list.
- # Returns
- A list of the model’s layers
-
load_weights
(filepath, filename)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str or list): The path to the HDF5 file(s). For algorithms that use multiple models, this may be a list of model paths.
- filename (str or list): The name of the HDF5 file(s). For algorithms that use multiple models, this may be a list of model names.
-
process_state_batch
(batch)¶
-
reset_states
()¶ Resets all internally kept states after an episode is completed.
-
save_weights
(filepath, filename, yyyymmdd=None, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
update_target_model_hard
()¶
-
uses_learning_phase
¶
-
-
core.algorithm.DDPG.ddpg_distance_metric(actions1, actions2)¶ Computes the “distance” between actions taken by two policies at the same states. Expects numpy arrays.
-
core.algorithm.DDPG.hard_update(target, source)¶
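hard_update and the tau_for_actor / tau_for_critic arguments point at the standard DDPG target-network updates. The sketch below shows what those updates conventionally do with Keras models; it is an illustration of the technique, not this module's actual code.

    def hard_update(target, source):
        # Copy the source network's weights into the target network verbatim.
        target.set_weights(source.get_weights())

    def soft_update(target, source, tau=0.001):
        # Polyak averaging: target <- tau * source + (1 - tau) * target.
        mixed = [tau * s + (1.0 - tau) * t
                 for s, t in zip(source.get_weights(), target.get_weights())]
        target.set_weights(mixed)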
core.algorithm.DQN module¶
-
class core.algorithm.DQN.DQNAgent(model, nb_actions, memory, discount_factor=0.99, batch_size=32, train_interval=1000, target_model_update=10000, delta_clip=inf, warmup_step_cnt=1000, enable_dueling=False, memory_interval=1, enable_double=False, dueling_type='avg', policy=None, test_policy=None, enable_encouraged_action=False, enable_discouraged_action=False, action_affected_observation_space=None, enable_pop_art=False, **kwargs)¶
Bases: core.common.agent.Agent
-
append_replay_memory
(reward, terminal)¶ Stores the most recent transition, together with the given reward and terminal flag, in the replay memory.
- # Arguments
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
-
backward
(reward, terminal)¶ Updates the agent after having executed the action returned by forward.
- # Arguments
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
- # Returns
- List of metrics values
-
compile
(optimizer, metrics=[])¶ Compiles an agent and the underlying models to be used for training and testing.
- # Arguments
- optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
- metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
-
forward
(observation)¶ Chooses the next action.
- # Argument
- observation (object): The observation the agent uses to choose an action.
- # Returns
- The chosen action.
-
load_weights
(filepath)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str): The path to the HDF5 file.
-
policy
¶
-
process_state_batch
(batch)¶
-
reset_states
()¶ Runs whenever an episode ends; you can specify any end-of-episode logic here.
-
save_weights
(filepath, overwrite=False, force=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
test_policy
¶
-
update_target_model_hard
()¶
-
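The enable_double flag points at Double DQN targets, where the online model selects the greedy action and the target model evaluates it. The sketch below illustrates that computation in general NumPy/Keras terms; it is not lifted from DQNAgent.backward.

    import numpy as np

    def double_dqn_targets(model, target_model, next_states, rewards, terminals, gamma=0.99):
        # Online network selects the greedy action for each next state...
        best_actions = np.argmax(model.predict(next_states), axis=1)
        # ...and the target network evaluates that action's value.
        next_q = target_model.predict(next_states)
        selected_q = next_q[np.arange(len(next_states)), best_actions]
        # No bootstrap past terminal states.
        return rewards + gamma * (1.0 - terminals) * selected_q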
core.algorithm.Deep_sarsa module¶
-
class core.algorithm.Deep_sarsa.DeepSARSAgent(action_size, model, load_model=True, discount_factor=0.99, learning_rate=0.001, epsilon=1, epsilon_decay=0.999, epsilon_min=0.01, file_path='', training_mode=True, **kwargs)¶
Bases: core.common.agent.Agent
-
backward
(reward, terminal)¶ Updates the agent’s network.
-
compile
(optimizer, metrics=[])¶ Compiles the model.
-
forward
(observation)¶ Gets the action to take from the observation. See the description in agent.py.
-
load_weights
(filepath)¶ Loads trained weights from an HDF5 file.
-
save_weights
(filepath, overwrite=False)¶ Saves trained weights to an HDF5 file.
-
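The epsilon, epsilon_decay and epsilon_min arguments describe a decaying epsilon-greedy exploration schedule. A generic sketch of that schedule follows; it is an assumption about, not a copy of, DeepSARSAgent.forward.

    import numpy as np

    def epsilon_greedy(model, state, action_size, epsilon, epsilon_decay=0.999, epsilon_min=0.01):
        if np.random.rand() <= epsilon:
            action = np.random.randint(action_size)           # explore
        else:
            q_values = model.predict(state[np.newaxis, :])    # exploit greedily
            action = int(np.argmax(q_values[0]))
        epsilon = max(epsilon_min, epsilon * epsilon_decay)   # decay toward epsilon_min
        return action, epsilon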
core.algorithm.MADDPG module¶
Based on Deep DPG as described by Lillicrap et al. (2015): http://arxiv.org/pdf/1509.02971v2.pdf, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.646.4324&rep=rep1&type=pdf
-
class core.algorithm.MADDPG.MA_DDPGAgent(nb_agents, nb_actions, actor, critic, critic_action_input, memory, gamma=0.99, batch_size=32, nb_steps_warmup_critic=1000, nb_steps_warmup_actor=1000, train_interval=1, memory_interval=1, delta_range=None, delta_clip=inf, random_process=None, custom_model_objects={}, target_model_update=0.001, **kwargs)¶
Bases: rl.core.Agent
Write me
-
backward
(reward, terminal=False)¶ Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.
- # Argument
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
- # Returns
- List of metrics values
-
compile
(optimizer, metrics=[])¶ Compiles an agent and the underlying models to be used for training and testing.
- # Arguments
- optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
- metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
-
forward
(observation)¶ Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.
- # Argument
- observation (object): The current observation from the environment.
- # Returns
- The next action to be executed in the environment.
-
layers
¶ Returns all layers of the underlying model(s).
If the concrete implementation uses multiple internal models, this method returns them in a concatenated list.
- # Returns
- A list of the model’s layers
-
load_weights
(filepath)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str): The path to the HDF5 file.
-
metrics_names
¶ The human-readable names of the agent’s metrics. Must return as many names as there are metrics (see also compile).
- # Returns
- A list of metric names (strings)
-
process_state_batch
(batch)¶
-
reset_states
()¶ Resets all internally kept states after an episode is completed.
-
save_weights
(filepath, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
select_action
(state)¶
-
update_target_models_hard
()¶
-
uses_learning_phase
¶
-
-
core.algorithm.MADDPG.mean_q(y_true, y_pred)¶
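mean_q is used as a training metric. In keras-rl, which this code base mirrors, mean_q reports the mean of the largest predicted Q-value per sample; the one-liner below shows that conventional definition as an assumption about what this function computes.

    from keras import backend as K

    def mean_q(y_true, y_pred):
        # Average over the batch of the maximum predicted Q-value per sample.
        return K.mean(K.max(y_pred, axis=-1))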
core.algorithm.MADQN module¶
Based on the implementation of the DQN agent as described in Mnih (2013) and Mnih (2015): http://arxiv.org/pdf/1312.5602.pdf, http://arxiv.org/abs/1509.06461
-
class core.algorithm.MADQN.AbstractMA_DQNAgent(nb_agents, nb_actions, memory, gamma=0.99, batch_size=32, nb_steps_warmup=1000, train_interval=1, memory_interval=1, target_model_update=10000, delta_range=None, delta_clip=inf, custom_model_objects={}, **kwargs)¶
Bases: core.common.agent.Agent
Write me
-
compute_batch_q_values
(state_batch)¶
-
compute_q_values
(state)¶
-
get_config
()¶
-
process_state_batch
(batch)¶
-
-
class core.algorithm.MADQN.MA_DQNAgent(model, policy=None, test_policy=None, enable_double_dqn=True, enable_dueling_network=False, dueling_type='avg', enable_encouraged_action=False, enable_discouraged_action=False, action_affected_observation_space=None, *args, **kwargs)¶
Bases: core.algorithm.MADQN.AbstractMA_DQNAgent
-
backward
(reward, terminal)¶ Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.
- # Argument
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
- # Returns
- List of metrics values
-
compile
(optimizer, metrics=[])¶ Compiles an agent and the underlying models to be used for training and testing.
- # Arguments
- optimizer (keras.optimizers.Optimizer instance): The optimizer to be used during training.
- metrics (list of functions lambda y_true, y_pred: metric): The metrics to run during training.
-
forward
(observation)¶ Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.
- # Argument
- observation (object): The current observation from the environment.
- # Returns
- The next action to be executed in the environment.
-
get_config
()¶
-
layers
¶ Returns all layers of the underlying model(s).
If the concrete implementation uses multiple internal models, this method returns them in a concatenated list.
- # Returns
- A list of the model’s layers
-
load_weights
(filepath)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str or list): The path to the HDF5 file(s). For algorithms that use multiple models, this may be a list of model paths.
- filename (str or list): The name of the HDF5 file(s). For algorithms that use multiple models, this may be a list of model names.
-
metrics_names
¶
-
policy
¶
-
reset_states
()¶ Runs whenever an episode ends; you can specify any end-of-episode logic here.
-
save_weights
(filepath, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
test_policy
¶
-
update_target_model_hard
()¶
-
-
core.algorithm.MADQN.mean_q(y_true, y_pred)¶
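The dueling_type argument ('avg', plus the usual 'max' and 'naive' variants) refers to how a dueling network combines the state value and the advantages (Wang et al., 2016). A generic sketch of that aggregation, not this module's code:

    import numpy as np

    def dueling_q(value, advantages, dueling_type='avg'):
        # Q(s, a) = V(s) + A(s, a) - agg_a A(s, a), where agg depends on dueling_type.
        if dueling_type == 'avg':
            return value + advantages - advantages.mean(axis=-1, keepdims=True)
        if dueling_type == 'max':
            return value + advantages - advantages.max(axis=-1, keepdims=True)
        return value + advantages  # 'naive': no normalisation term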
core.algorithm.PPO module¶
-
class core.algorithm.PPO.PPOAgent(state_size, action_size, continuous, actor, critic, gamma=0.99, loss_clipping=0.2, epochs=10, noise=1.0, entropy_loss=0.001, buffer_size=256, batch_size=64, load_model=True, training_mode=True, file_path_actor='', file_path_critic='', **kwargs)¶
Bases: core.common.agent.Agent
-
backward
(reward, terminal)¶ Updates the agent after having executed the action returned by forward. If the policy is implemented by a neural network, this corresponds to a weight update using back-prop.
- # Argument
- reward (float): The observed reward after executing the action returned by forward.
- terminal (boolean): True if the new state of the environment is terminal.
- # Returns
- List of metrics values
-
compile
(optimizer, metrics=[])¶
- # Arguments
- optimizer (object): [0] = actor optimizer, [1] = critic optimizer
- metrics (Tensor): [0] = Keras tensor for the advantage, [1] = Keras tensor for the old_prediction
- # Returns
- None
-
discounted_reward
()¶
-
forward
(observation)¶ Takes an observation from the environment and returns the action to be taken next. If the policy is implemented by a neural network, this corresponds to a forward (inference) pass.
- # Argument
- observation (object): The current observation from the environment.
- # Returns
- The next action to be executed in the environment.
-
load_weights
(filepath, filename)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str or list): The path to the HDF5 file(s). For algorithms that use multiple models, this may be a list of model paths.
- filename (str or list): The name of the HDF5 file(s). For algorithms that use multiple models, this may be a list of model names.
-
proximal_policy_optimization_loss
(advantage, old_prediction)¶
-
proximal_policy_optimization_loss_continuous
(advantage, old_prediction)¶
-
reset_env
()¶
-
save_weights
(filepath, filename=None, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
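proximal_policy_optimization_loss(advantage, old_prediction), together with the loss_clipping and entropy_loss constructor arguments, corresponds to the standard clipped PPO surrogate for a discrete (softmax) actor. The sketch below shows that loss in Keras-backend form; it is an assumption about the shape of the module's loss, not its exact code.

    from keras import backend as K

    def ppo_loss(advantage, old_prediction, loss_clipping=0.2, entropy_loss=1e-3):
        # advantage and old_prediction are extra Keras input tensors, matching the
        # metrics described for PPOAgent.compile above; y_true is the one-hot action.
        def loss(y_true, y_pred):
            prob = K.sum(y_true * y_pred, axis=-1, keepdims=True)              # pi(a|s)
            old_prob = K.sum(y_true * old_prediction, axis=-1, keepdims=True)  # pi_old(a|s)
            ratio = prob / (old_prob + 1e-10)
            clipped = K.clip(ratio, 1.0 - loss_clipping, 1.0 + loss_clipping)
            surrogate = K.minimum(ratio * advantage, clipped * advantage)
            entropy = -K.sum(y_pred * K.log(y_pred + 1e-10), axis=-1, keepdims=True)
            return -K.mean(surrogate + entropy_loss * entropy)
        return loss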
core.algorithm.QLearning module¶
core.algorithm.REINFORCE module¶
-
class core.algorithm.REINFORCE.ReinforceAgent(state_size, action_size, model, load_model=True, discount_factor=0.99, learning_rate=0.001, training_mode=True, file_path='', **kwargs)¶
Bases: core.common.agent.Agent
-
backward
(reward, terminal)¶ Updates the agent. See the description in agent.py.
-
compile
(optimizer, metrics=[])¶ Compiles the agent. Defines a new optimizer instead of using the input optimizer. See the description in agent.py.
-
discount_rewards
(rewards)¶ Calculates discounted rewards.
- # Argument
- rewards (list of float): The rewards collected during an episode.
- # Returns
- List of discounted rewards
-
forward
(observation)¶ Gets the action to take from the observation. See the description in agent.py.
-
load_weights
(file_path)¶ Loads the weights of an agent from an HDF5 file.
- # Arguments
- filepath (str or list): The path to the HDF5 file(s). For algorithms that use multiple models, this may be a list of model paths.
- filename (str or list): The name of the HDF5 file(s). For algorithms that use multiple models, this may be a list of model names.
-
save_weights
(file_path, overwrite=False)¶ Saves the weights of an agent as an HDF5 file.
- # Arguments
- filepath (str): The path to where the weights should be saved.
- overwrite (boolean): If False and filepath already exists, raises an error.
-
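discount_rewards computes the returns-to-go used by the policy-gradient update. A plain NumPy version of the same calculation is shown here only to make the formula concrete; the module's implementation may additionally standardize the result.

    import numpy as np

    def discount_rewards(rewards, discount_factor=0.99):
        # G_t = r_t + gamma * G_{t+1}, computed backwards over the episode.
        discounted = np.zeros(len(rewards))
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + discount_factor * running
            discounted[t] = running
        return discounted

    # Example: discount_rewards([0, 0, 1]) -> [0.9801, 0.99, 1.0]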