explainable_rl.agents package

Submodules

explainable_rl.agents.double_q_learner module

class DoubleQLearner(env, gamma, verbose=False)[source]

Bases: TD

Double Q-Learner agent.

__init__(env, gamma, verbose=False)[source]

Initialise the agent class.

Parameters

env (MDP) – MDP object.
gamma (float) – Discount factor.
verbose (bool) – Defines whether print statements should be called.

_step(epsilon, lr, use_uncertainty=False)[source]

Perform a step in the environment.

Parameters

epsilon (float) – Epsilon-greedy policy parameter.
lr (float) – Learning rate.
use_uncertainty (bool) – Whether to use uncertainty informed policy.

Returns

Defines whether the episode is finished.

Return type

bool

_update_q_values(state, action, next_state, reward, epsilon, lr, **kwargs)[source]

Update the Q table.

Parameters

state (list) – Current state of the agent.
action (int) – Selected action.
next_state (list) – Next state of the agent.
reward (float) – Reward for the selected action.
epsilon (float) – The exploration parameter.
lr (float) – Learning rate.
**kwargs (dict) – The keyword arguments.

create_tables(verbose=False)[source]

Initialize the agent.

This resets the environment, creates the q-table and the state to action mapping.

Parameters: verbose (bool) – Print information.

explainable_rl.agents.q_learner module

class QLearningAgent(env, gamma, verbose=False)[source]

Bases: TD

Q-Learning agent.

__init__(env, gamma, verbose=False)[source]

Initialise the agent class.

Parameters

env (MDP) – MDP object.
gamma (float) – Discount factor.
verbose (bool) – Defines whether print statements should be called.

_update_q_values(state, action, next_state, reward, epsilon, lr, **kwargs)[source]

Update the Q table.

Parameters

state (list) – Current state of the agent.
action (int) – Selected action.
next_state (list) – Next state of the agent.
reward (float) – Reward for the selected action.
epsilon (float) – The exploration parameter.
lr (float) – Learning rate.
**kwargs (dict) – The keyword arguments.

explainable_rl.agents.sarsa module

class SarsaAgent(env, gamma, verbose=False)[source]

Bases: TD

Sarsa agent.

__init__(env, gamma, verbose=False)[source]

Initialise the agent class.

Parameters

env (MDP) – MDP object.
gamma (float) – Discount factor.
verbose (bool) – Defines whether print statements should be called.

_update_q_values(state, action, next_state, reward, epsilon, lr, **kwargs)[source]

Update the Q table.

Parameters

state (list) – Current state of the agent.
action (int) – Selected action.
next_state (list) – Next state of the agent.
reward (float) – Reward for the selected action.
epsilon (float) – The exploration parameter.
lr (float) – Learning rate.
**kwargs (dict) – The keyword arguments.

explainable_rl.agents.sarsa_lambda module

class SarsaLambdaAgent(env, gamma, verbose=False, lambda_=0.9)[source]

Bases: TD

Sarsa Lambda agent.

__init__(env, gamma, verbose=False, lambda_=0.9)[source]

Initialise the agent class.

Parameters

env (MDP) – MDP object.
gamma (float) – Discount factor.
verbose (bool) – Defines whether print statements should be called.

_update_q_values(state, action, next_state, reward, epsilon, lr, **kwargs)[source]

Update the Q table.

Parameters

state (list) – Current state of the agent.
action (int) – Selected action.
next_state (list) – Next state of the agent.
reward (float) – Reward for the selected action.
epsilon (float) – The exploration parameter.
lr (float) – Learning rate.
**kwargs (dict) – The keyword arguments.

explainable_rl.agents.td module

class TD(env, gamma, verbose=False)[source]

Bases: Agent

Agent class to store and update q-table.

__init__(env, gamma, verbose=False)[source]

Initialise the agent.

Parameters

env (Environment) – Environment object.
gamma (float) – Discount factor.
verbose (bool) – Print training information.

_epsilon_greedy_policy(state=None, epsilon=0.1, Q=None)[source]

Epsilon-greedy policy.

Parameters

state (int) – State.
epsilon (float) – Epsilon of epsilon-greedy policy. Defaults to 0 for pure exploitation.

_get_action_scores(possible_actions, q_importance, q_values_weights, uncertainty_weights)[source]

Get the score for each action from a state.

Parameters

possible_actions (set) – the possible actions for an agent in a state.
q_importance (float) – the weighting of the q value vs the amount a state has been seen.
q_values_weights (dict) – the q-weight of each state-action pair.
uncertainty_weights (dict) – the count-weight of each state-action pair.

Returns

the weighted score of each possible action: from the state.

Return type

action_scores (dict)

_get_possible_actions(state)[source]

Get the possible actions from a state.

Parameters

state (list) – current state of the agent.

Returns

the possible actions that the agent can: take from the state.

Return type

possible_actions (set)

_get_q_value_weights(sum_possible_q, state, possible_actions)[source]

Get the q value of each action as a percentage of the total q value.

Parameters

sum_possible_q (float) – the sum of the q values for the state.
state (list) – the state of the agent.
possible_actions (set) – the possible actions that the agent can take from the state.

Returns

count of how many times a state-action pair has appeared. q_values_weights (dict): the q-weight of each state-action pair.

Return type

state_action_counts (dict)

static _get_uncertainty_weights(state_action_counts)[source]

Get uncertainty weight of an action from a state.

This is defined as the proportion of times a state is visited in the historical data vs the total state visits of the possible next states.

Parameters: state_action_counts (dict) – the number of times a state has been visited in the historical data.
Returns: uncertainty weight of each possible state.
Return type: dict

_step(epsilon, lr, use_uncertainty)[source]

Perform a step in the environment.

Parameters

epsilon (float) – Epsilon-greedy policy parameter.
lr (float) – Learning rate.
use_uncertainty (bool) – Whether to use uncertainty informed policy.

Returns

Defines whether the episode is finished.

Return type

bool

_update_q_values(state, action, next_state, reward, epsilon, lr, **kwargs)[source]

Update the Q table.

Parameters

state (list) – Current state of the agent.
action (int) – Selected action.
next_state (list) – Next state of the agent.
reward (float) – Reward for the selected action.
epsilon (float) – The exploration parameter.
lr (float) – Learning rate.
**kwargs (dict) – The keyword arguments.

create_tables(verbose=False)[source]

Initialize the agent.

This resets the environment, creates the q-table and the state to action mapping.

Parameters: verbose (bool) – Print information.

fit(agent_hyperparams, training_hyperparams, verbose=False, pbar=None)[source]

Fit agent to the dataset.

Parameters

agent_hyperparams (dict) – Dictionary of agent hyperparameters.
training_hyperparams (dict) – Dictionary of training hyperparameters.
verbose (bool) – Print training information.
pbar (tqdm) – Progress bar.

uncertainty_informed_policy(state=None, epsilon=0.1, use_uncertainty=False, q_importance=0.7)[source]

Get epsilon greedy policy that favours more densely populated state-action pairs.

Parameters

state (list) – Current state of the agent.
epsilon (float) – The exploration parameter.
use_uncertainty (bool) – Whether to use uncertainty informed policy.
q_importance (float) – The importance of the q value in the policy.

Returns

selected action.

Return type

action (int)

explainable_rl.agents package

Submodules

explainable_rl.agents.double_q_learner module

explainable_rl.agents.q_learner module

explainable_rl.agents.sarsa module

explainable_rl.agents.sarsa_lambda module

explainable_rl.agents.td module

Module contents