core package

Subpackages

Submodules

core.callbacks module
class core.callbacks.DrawTrainMovingAvgPlotCallback(file_path, plot_interval=10000, time_window=1000, l_label=['reward', 'kill_cnts', 'hps'], save_raw_data=False, title='')

    Bases: core.common.callback.Callback

    on_episode_end(episode, logs)

        Parameters:
            - episode – episode index
            - logs – map of values keyed by labels; each key is also used as the plot label.
class core.callbacks.DrawTrainPlotCallback(file_path=None, plot_interval=10000, data_for_plot=['episode_reward', 'nb_episode_steps'])

    Bases: core.common.callback.Callback

    on_episode_end(episode, logs)
        Called at the end of each episode.
class core.callbacks.FileLogger(filepath, interval=None)

    Bases: core.common.callback.Callback

    on_episode_begin(episode, logs={})
        Initialize metrics at the beginning of each episode.

    on_episode_end(episode, logs={})
        Compute and print metrics at the end of each episode.

    on_step_end(step, logs={})
        Append metrics at the end of each step.

    on_train_begin(logs={})
        Initialize model metrics before training.

    on_train_end(logs={})
        Save the model at the end of training.

    save_data()
        Save metrics in a JSON file.
class core.callbacks.History(agent=None, *args, **kwargs)

    Bases: core.common.callback.Callback

    Callback that records events into a History object.

    This callback is automatically applied to every Keras model. The History object is returned by the fit method of models.

    on_epoch_end(epoch, logs=None)

    on_train_begin(logs=None)
class core.callbacks.ModelIntervalCheckpoint(filepath, step_interval=None, episode_interval=None, condition=None, condition_count=0, verbose=0, **kwargs)

    Bases: core.common.callback.Callback

    on_episode_end(episode, logs={})
        Called at the end of each episode.

    on_step_end(step, logs={})
        Called at the end of each step.
class core.callbacks.TestLogger(agent=None, *args, **kwargs)

    Bases: core.common.callback.Callback

    Logger class for testing.

    on_episode_end(episode, logs={})
        Print logs at the end of each episode.

    on_train_begin(logs={})
        Print logs at the beginning of training.
class core.callbacks.TrainEpisodeLogger

    Bases: core.common.callback.Callback

    on_episode_begin(episode, logs={})
        Reset environment variables at the beginning of each episode.

    on_episode_end(episode, logs={})
        Compute and print training statistics of the episode when it is done.

    on_step_end(step, logs={})
        Update episode statistics after each step.

    on_train_begin(logs={})
        Print training values at the beginning of training.

    on_train_end(logs={})
        Print the training time at the end of training.
class core.callbacks.TrainIntervalLogger(interval=10000)

    Bases: core.common.callback.Callback

    on_episode_end(episode, logs={})
        Update the reward value at the end of each episode.

    on_step_begin(step, logs={})
        Print metrics if the interval is over.

    on_step_end(step, logs={})
        Update the progress bar at the end of each step.

    on_train_begin(logs={})
        Initialize training statistics at the beginning of training.

    on_train_end(logs={})
        Print the training duration at the end of training.

    reset()
        Reset statistics.
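Example — a minimal sketch of wiring these callbacks into training. The callback constructors follow the signatures documented above; the agent and its fit() interface are assumed (keras-rl style) and are not part of this module.

    from core.callbacks import FileLogger, ModelIntervalCheckpoint, TrainIntervalLogger

    callbacks = [
        FileLogger('train_log.json', interval=100),                 # metrics are written out via save_data()
        ModelIntervalCheckpoint('checkpoint.h5', step_interval=10000),
        TrainIntervalLogger(interval=10000),                        # periodic console statistics
    ]

    # Hypothetical training call; the agent API is assumed and not defined in core.callbacks.
    # agent.fit(env, nb_steps=100000, callbacks=callbacks)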
core.memories module
class core.memories.MinSegmentTree(capacity)

    Bases: core.memories.SegmentTree

    min(start=0, end=None)
        Returns min(arr[start], ..., arr[end])
class core.memories.SegmentTree(capacity, operation, neutral_element)

    Bases: object

    reduce(start=0, end=None)
        Returns the result of applying self.operation to a contiguous subsequence of the array:
        self.operation(arr[start], operation(arr[start+1], operation(..., arr[end])))

        # Arguments
            start (int): beginning of the subsequence
            end (int): end of the subsequence

        # Returns
            reduced (obj): result of reducing self.operation over the specified range of array elements
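Example — a minimal sketch of reduce(). Item assignment (tree[i] = value) is assumed to be supported, as is usual for this data structure, but it is not documented in this module.

    from core.memories import SegmentTree

    # A min-tree over 4 slots: operation=min, neutral element is +inf.
    tree = SegmentTree(capacity=4, operation=min, neutral_element=float('inf'))
    tree[0], tree[1], tree[2] = 5.0, 2.0, 7.0   # assumed __setitem__ support
    smallest = tree.reduce(0, 2)                # min of the stored values in the queried range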
class core.memories.SequentialMemory(limit, enable_per=False, per_alpha=0.6, per_beta=0.4, **kwargs)

    Bases: core.common.memory.Memory

    append(observation, action, reward, terminal, training=True)
        Append an observation to the memory.

        # Arguments
            observation (dict): Observation returned by the environment
            action (int): Action taken to obtain this observation
            reward (float): Reward obtained by taking this action
            terminal (boolean): Whether the resulting state is terminal

    get_config()
        Return the configuration of SequentialMemory.

        # Returns
            Dict of config

    nb_entries
        Return the number of observations.

        # Returns
            Number of observations

    sample(batch_size, batch_idxs=None)
        Return a randomized batch of experiences.

        # Arguments
            batch_size (int): Size of the batch
            batch_idxs (int): Indexes to extract
            per_beta (float): Prioritized Experience Replay hyperparameter controlling how much the importance weights are used (0 – no correction, 1 – full correction)

        # Returns
            A list of randomly selected experiences

    update_priorities(idxes, priorities)
        Update the priorities of sampled transitions: sets the priority of the transition at index idxes[i] in the buffer to priorities[i].

        # Arguments
            idxes ([int]): List of indexes of sampled transitions
            priorities ([float]): List of updated priorities corresponding to the transitions at the sampled indexes denoted by idxes
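Example — a minimal sketch of the append/sample cycle using the signatures documented above; the observation contents are placeholders.

    from core.memories import SequentialMemory

    memory = SequentialMemory(limit=50000, enable_per=True)   # prioritized experience replay enabled

    # Store one transition (placeholder values; observation is a dict, per append()).
    memory.append(observation={'screen': [0, 0]}, action=1, reward=0.5, terminal=False)

    if memory.nb_entries > 32:
        experiences = memory.sample(batch_size=32)
        # With enable_per=True, priorities of the sampled transitions can later be
        # refreshed from new TD errors via update_priorities(idxes, priorities).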
class core.memories.SumSegmentTree(capacity)

    Bases: core.memories.SegmentTree

    find_prefixsum_idx(prefixsum)
        Find the highest index i in the array such that
        sum(arr[0] + arr[1] + ... + arr[i - 1]) <= prefixsum

        If the array values are probabilities, this function allows sampling indexes according to the discrete probability distribution efficiently.

        # Arguments
            prefixsum (float): upper bound on the sum of the array prefix

        # Returns
            idx (int): highest index satisfying the prefixsum constraint

    sum(start=0, end=None)
        Returns arr[start] + ... + arr[end]
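The sampling idea behind find_prefixsum_idx, shown with plain NumPy rather than the tree itself: draw a uniform number below the total sum and locate the first index whose prefix sum exceeds it. The tree performs the same search in O(log capacity); the snippet below is only an O(n) illustration of the concept.

    import numpy as np

    priorities = np.array([0.1, 0.4, 0.2, 0.3])            # example priority values
    prefixsum = np.random.uniform(0.0, priorities.sum())
    # O(n) counterpart of find_prefixsum_idx(prefixsum):
    idx = int(np.searchsorted(np.cumsum(priorities), prefixsum, side='right'))
    # Indexes with larger priorities are drawn proportionally more often.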
core.policies module
class core.policies.AdvEpsGreedyPolicy(max_score, min_score=0, score_queue_size=100, score_name='episode_reward', score_type='mean', str_eps=1, nb_agents=1, **kwargs)

    Bases: core.policies.LinearAnnealedPolicy

    Implements the AdvEpsGreedyPolicy.

    The eps-greedy policy either:
    - takes a random action with probability epsilon
    - takes the current best action with probability (1 - epsilon)

    Here epsilon is calculated as max(epsilon-greedy value, score-based value).

    get_current_value()
        Return the current annealing value.

        # Returns
            Value to use in annealing

    on_episode_end(episode, logs={})
class core.policies.BoltzmannGumbelQPolicy(C=1.0)

    Bases: core.common.policy.Policy

    Implements Boltzmann-Gumbel exploration (BGE) adapted for Q-learning, based on the paper Boltzmann Exploration Done Right (https://arxiv.org/pdf/1705.10257.pdf).

    BGE is invariant with respect to the mean of the rewards but not their variance. The parameter C, which defaults to 1, can be used to correct for this, and should be set to the least upper bound on the standard deviation of the rewards.

    BGE is only available for training, not testing. For testing purposes, you can achieve approximately the same result as BGE after training for N steps on K actions with parameter C by using the BoltzmannQPolicy and setting tau = C/sqrt(N/K).

    get_config()
        Return the configuration of BoltzmannGumbelQPolicy.

        # Returns
            Dict of config

    select_action(q_values)
        Return the selected action.

        # Arguments
            q_values (np.ndarray): List of the estimations of Q for each action

        # Returns
            Selected action
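Example — the test-time substitution mentioned above, with N and K as placeholder values.

    import math
    from core.policies import BoltzmannQPolicy

    C = 1.0       # least upper bound on the reward standard deviation
    N = 100000    # training steps (placeholder)
    K = 4         # number of actions (placeholder)

    # Approximate BGE at test time with a Boltzmann policy, per the note above.
    test_policy = BoltzmannQPolicy(tau=C / math.sqrt(N / K))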
class core.policies.BoltzmannQPolicy(tau=1.0, clip=(-500.0, 500.0))

    Bases: core.common.policy.Policy

    Implements the Boltzmann Q policy.

    The Boltzmann Q policy builds a probability law over the Q-values and returns an action selected randomly according to this law.

    get_config()
        Return the configuration of BoltzmannQPolicy.

        # Returns
            Dict of config

    select_action(q_values)
        Return the selected action.

        # Arguments
            q_values (np.ndarray): List of the estimations of Q for each action

        # Returns
            Selected action
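A NumPy sketch of the probability law described above: a softmax over q_values/tau, with clipping mirroring the constructor defaults. The boltzmann_probs helper is illustrative only; the class's internal implementation may differ in detail.

    import numpy as np

    def boltzmann_probs(q_values, tau=1.0, clip=(-500.0, 500.0)):
        # Softmax over q_values / tau; clipping guards exp() against overflow.
        scaled = np.clip(q_values / tau, clip[0], clip[1])
        exp_q = np.exp(scaled - scaled.max())   # shift for numerical stability
        return exp_q / exp_q.sum()

    probs = boltzmann_probs(np.array([1.0, 2.0, 0.5]))
    action = np.random.choice(len(probs), p=probs)   # action drawn according to this law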
class core.policies.EpsGreedyQPolicy(eps=0.1)

    Bases: core.common.policy.Policy

    Implements the epsilon-greedy policy.

    The eps-greedy policy either:
    - takes a random action with probability epsilon
    - takes the current best action with probability (1 - epsilon)

    get_config()
        Return the configuration of EpsGreedyQPolicy.

        # Returns
            Dict of config

    select_action(q_values)
        Return the selected action.

        # Arguments
            q_values (np.ndarray): List of the estimations of Q for each action

        # Returns
            Selected action
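Example — a minimal usage sketch with placeholder Q-values.

    import numpy as np
    from core.policies import EpsGreedyQPolicy

    policy = EpsGreedyQPolicy(eps=0.1)
    q_values = np.array([0.2, 0.8, 0.1])      # placeholder Q estimates for 3 actions
    action = policy.select_action(q_values)   # argmax with probability 0.9, random otherwise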
class core.policies.GreedyQPolicy

    Bases: core.common.policy.Policy

    Implements the greedy policy.

    The greedy policy returns the current best action according to q_values.

    select_action(q_values)
        Return the selected action.

        # Arguments
            q_values (np.ndarray): List of the estimations of Q for each action

        # Returns
            Selected action
class core.policies.LinearAnnealedPolicy(inner_policy, attr, value_max, value_min, value_test, nb_steps)

    Bases: core.common.policy.Policy

    Implements the linear annealing policy.

    The linear annealing policy computes a current threshold value and transfers it to an inner policy which chooses the action. The threshold value follows a linear function decreasing over time.

    get_config()
        Return the configuration of LinearAnnealedPolicy.

        # Returns
            Dict of config

    get_current_value()
        Return the current annealing value.

        # Returns
            Value to use in annealing

    metrics
        Return metric values.

        # Returns
            List of metric values

    metrics_names
        Return the names of the metrics.

        # Returns
            List of metric names

    select_action(**kwargs)
        Choose an action to perform.

        # Returns
            Action to take (int)
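Example — a minimal construction sketch that anneals the exploration rate of an EpsGreedyQPolicy; attr='eps' is assumed from the inner policy's constructor parameter.

    from core.policies import EpsGreedyQPolicy, LinearAnnealedPolicy

    policy = LinearAnnealedPolicy(
        inner_policy=EpsGreedyQPolicy(),
        attr='eps',          # attribute of the inner policy to anneal (assumed name)
        value_max=1.0,       # start fully exploratory
        value_min=0.1,       # final exploration rate reached after nb_steps
        value_test=0.05,     # value used at test time
        nb_steps=100000,     # steps over which the value decreases linearly
    )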
class core.policies.MA_BoltzmannQPolicy(tau=1.0, clip=(-500.0, 500.0))

    Bases: core.common.policy.Policy

    get_config()
        Return the configuration of the policy.

        # Returns
            Configuration as dict

    select_action(q_values)

    select_action_agent(q_value)
class core.policies.MA_EpsGreedyQPolicy(eps=0.1)

    Bases: core.common.policy.Policy

    get_config()
        Return the configuration of the policy.

        # Returns
            Configuration as dict

    select_action(q_values)
class core.policies.MA_GreedyQPolicy

    Bases: core.common.policy.Policy

    select_action(q_values)
class core.policies.MA_MaxBoltzmannQPolicy(eps=0.1, tau=1.0, clip=(-500.0, 500.0))

    Bases: core.common.policy.Policy

    A combination of the eps-greedy and Boltzmann Q policies.

    Wiering, M.: Explorations in Efficient Reinforcement Learning. PhD thesis, University of Amsterdam, Amsterdam (1999)
    https://pure.uva.nl/ws/files/3153478/8461_UBA003000033.pdf

    get_config()
        Return the configuration of the policy.

        # Returns
            Configuration as dict

    select_action(q_values)

    select_action_agent(q_value)
class core.policies.MaxBoltzmannQPolicy(eps=0.1, tau=1.0, clip=(-500.0, 500.0))

    Bases: core.common.policy.Policy

    A combination of the eps-greedy and Boltzmann Q policies.

    Wiering, M.: Explorations in Efficient Reinforcement Learning. PhD thesis, University of Amsterdam, Amsterdam (1999)
    https://pure.uva.nl/ws/files/3153478/8461_UBA003000033.pdf

    get_config()
        Return the configuration of MaxBoltzmannQPolicy.

        # Returns
            Dict of config

    select_action(q_values)
        Return the selected action. The selected action follows the BoltzmannQPolicy with probability epsilon, and the greedy policy with probability (1 - epsilon).

        # Arguments
            q_values (np.ndarray): List of the estimations of Q for each action

        # Returns
            Selected action
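Example — a minimal usage sketch with placeholder Q-values.

    import numpy as np
    from core.policies import MaxBoltzmannQPolicy

    policy = MaxBoltzmannQPolicy(eps=0.1, tau=1.0)
    q_values = np.array([0.2, 0.8, 0.1])      # placeholder Q estimates
    # With probability eps the action is drawn from the Boltzmann distribution,
    # otherwise the greedy (argmax) action is returned, as described above.
    action = policy.select_action(q_values)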
class core.policies.NoisePolicy(random_process, ratio_of_pure_action=1.0)

    Bases: core.common.policy.Policy

    Implements a policy based on an Ornstein-Uhlenbeck process. This policy returns the action with added noise, for exploration in DDPG.

    reset_states()

    select_action(pure_action)
        Return the selected action.

        # Arguments
            pure_action: action to which noise from the random process is added

        # Returns
            Selected action
class core.policies.starcraft_multiagent_eGreedyPolicy(nb_agents, nb_actions, eps=0.1)

    Bases: core.common.policy.Policy

    Implements the epsilon-greedy policy.

    The eps-greedy policy either:
    - takes a random action with probability epsilon
    - takes the current best action with probability (1 - epsilon)

    nb_actions = (64*64, 3)

    get_config()
        Return the configuration of EpsGreedyPolicy.

        # Returns
            Dict of config

    select_action(q_values)
        Return the selected action.

        # Arguments
            q_values (list): [action_xy (np.array), action_type (np.array)] with shapes [(1, nb_agents, actions), (1, nb_agents, actions)]

        # Returns
            Selected action: [(x, y), nothing/attack/move] with shapes [(nb_agents, 1), (nb_agents, 1)]
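Example — a minimal usage sketch following the documented q_values format, with random placeholder values.

    import numpy as np
    from core.policies import starcraft_multiagent_eGreedyPolicy

    nb_agents = 2
    policy = starcraft_multiagent_eGreedyPolicy(nb_agents=nb_agents,
                                                nb_actions=(64 * 64, 3), eps=0.1)

    q_xy = np.random.rand(1, nb_agents, 64 * 64)    # per-agent Q values over screen positions
    q_type = np.random.rand(1, nb_agents, 3)        # per-agent Q values over nothing/attack/move
    actions = policy.select_action([q_xy, q_type])  # -> [(x, y) per agent, action type per agent]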