explainable_rl.foundation package

Submodules

explainable_rl.foundation.agent module

class Agent(env, gamma, verbose=False)

Bases: object

Parent of all child agents (e.g. Q-learner, SARSA).

__init__(env, gamma, verbose=False)

Initialise the agent.

Parameters

env (Environment) – Environment object.
gamma (float) – Discount factor.
verbose (bool) – Print training information.

static _convert_to_string(state)

Convert a state to a string.

Parameters: state (list) – The state to convert.
Returns: The state as a string.
Return type: str

_epsilon_greedy_policy(state, epsilon)

Epsilon-greedy policy.

Parameters

state (int) – State.
epsilon (float) – Probability of taking a random (exploratory) action; 0 gives pure exploitation.
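A minimal sketch of epsilon-greedy selection over a tabular Q-table. It assumes the table is a NumPy array of shape (*state_bins, n_actions), filled with zeros in the way _init_q_table describes, and that a state is a list of bin indices. The helper name epsilon_greedy_action is hypothetical, and the library's own method may store and index its table differently.

```python
import numpy as np


def epsilon_greedy_action(q_table, state, epsilon, rng=None):
    """Hypothetical helper: epsilon-greedy action for one binned state.

    Assumes q_table has shape (*state_bins, n_actions) and state is a
    sequence of bin indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    q_values = q_table[tuple(state)]  # Q-values of all actions in this state
    if rng.random() < epsilon:
        # Explore: pick an action uniformly at random
        return int(rng.integers(len(q_values)))
    # Exploit: highest Q-value (np.argmax breaks ties by lowest index)
    return int(np.argmax(q_values))


# Usage: 3 x 4 state bins, 2 actions, Q-table initialised with zeros
q_table = np.zeros((3, 4, 2))
q_table[1, 2] = [0.1, 0.5]
greedy = epsilon_greedy_action(q_table, [1, 2], epsilon=0)  # 1
```

Because rng.random() draws from [0, 1), epsilon=0 never explores. This matches the pure-exploitation case.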

_init_q_table(): Initialize the q-table with zeros.

fit(agent_hyperparams, training_hyperparams, verbose=False, pbar=None)

Fit the agent to the dataset.

Parameters

agent_hyperparams (dict) – Dictionary of agent hyperparameters.
training_hyperparams (dict) – Dictionary of training hyperparameters.
verbose (bool) – Print training information.
pbar (tqdm) – Progress bar.
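For a Q-learner, fitting typically repeats a tabular update like the one below over many steps and episodes. This is a sketch of the standard Q-learning rule. It assumes a NumPy Q-table of shape (*state_bins, n_actions). The parameter name learning_rate is hypothetical, because the keys expected in agent_hyperparams are not documented here.

```python
import numpy as np


def q_learning_update(q_table, state, action, reward, next_state, gamma, learning_rate):
    """Hypothetical helper: one in-place tabular Q-learning update.

    Assumes q_table has shape (*state_bins, n_actions) and states are
    sequences of bin indices.
    """
    idx = tuple(state) + (action,)
    # Bootstrapped target: reward plus discounted best value of the next state
    target = reward + gamma * np.max(q_table[tuple(next_state)])
    # Move the current estimate a fraction learning_rate towards the target
    q_table[idx] += learning_rate * (target - q_table[idx])
    return q_table


# Usage: 2 states (one binned feature), 2 actions
q = np.zeros((2, 2))
q[1] = [0.0, 1.0]
q_learning_update(q, [0], 1, reward=1.0, next_state=[1], gamma=0.9, learning_rate=0.5)
# q[0, 1] is now 0.5 * (1.0 + 0.9 * 1.0), i.e. about 0.95
```

A SARSA agent would instead build its target from the Q-value of the action actually taken in the next state, rather than from the maximum.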

predict_actions(states, epsilon=0)

Predict actions for a list of states using the epsilon-greedy policy.

Parameters

states (list) – States (binned).
epsilon (float) – Epsilon of epsilon-greedy policy. Defaults to 0 for pure exploitation.

Returns

List of recommended actions.

Return type

list

predict_rewards(states, actions)

Predict rewards for a list of state-action pairs.

This function uses the average reward matrix (which simulates a real-life scenario).

Parameters

states (list) – States (binned).
actions (list) – Actions (binned).

Returns

List of predicted rewards.

Return type

list
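One way to picture the average reward matrix is as a table of mean observed rewards for each binned state-action pair. The sketch below builds such a matrix from logged transitions and then looks rewards up in it. Both helper names and the (*state_bins, n_actions) layout are assumptions, not the library's documented internals.

```python
import numpy as np


def build_avg_reward_matrix(states, actions, rewards, shape):
    """Hypothetical helper: mean observed reward for each (state, action) cell."""
    sums = np.zeros(shape)
    counts = np.zeros(shape)
    for s, a, r in zip(states, actions, rewards):
        idx = tuple(s) + (a,)
        sums[idx] += r
        counts[idx] += 1
    # Cells never observed in the data keep an average of 0
    return np.divide(sums, counts, out=np.zeros(shape), where=counts > 0)


def lookup_rewards(avg_reward_matrix, states, actions):
    """Hypothetical helper: predicted reward for each state-action pair."""
    return [float(avg_reward_matrix[tuple(s) + (a,)]) for s, a in zip(states, actions)]


# Usage: one binned state feature with 2 bins, 2 actions
matrix = build_avg_reward_matrix([[0], [0], [1]], [1, 1, 0], [4.0, 6.0, -1.0], shape=(2, 2))
predicted = lookup_rewards(matrix, [[0], [1]], [1, 0])  # [5.0, -1.0]
```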

uncertainty_informed_policy(state=None, epsilon=0.1, use_uncertainty=False, q_importance=0.7)

Select an action with an epsilon-greedy policy that favours more densely populated state-action pairs.

Parameters

state (list) – Current state of the agent.
epsilon (float) – The exploration parameter.
use_uncertainty (bool) – Whether to use the uncertainty-informed policy.
q_importance (float) – Weight of the Q-value in the policy.

Returns

Selected action.

Return type

int
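The docstring does not say how Q-values and data density are combined. One plausible reading, sketched here purely as an assumption, is a weighted blend of min-max-normalised Q-values and per-action data counts inside the usual epsilon-greedy rule. In this blend, q_importance weights the Q-values and 1 - q_importance weights the counts. The function names and inputs are hypothetical.

```python
import numpy as np


def _min_max(x):
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    # A constant vector expresses no preference, so map it to zeros
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def uncertainty_informed_action(q_values, counts, epsilon=0.1, q_importance=0.7, rng=None):
    """Hypothetical: epsilon-greedy choice that also favours well-populated actions.

    q_values and counts hold, for one state, the Q-value and the number of
    data points of each action (assumed inputs).
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore uniformly at random
    # Blend: q_importance on the Q-values, the rest on data density
    score = q_importance * _min_max(q_values) + (1 - q_importance) * _min_max(counts)
    return int(np.argmax(score))


q_values, counts = [1.0, 0.9], [1, 50]
value_led = uncertainty_informed_action(q_values, counts, epsilon=0, q_importance=0.7)    # 0
density_led = uncertainty_informed_action(q_values, counts, epsilon=0, q_importance=0.2)  # 1
```

With q_importance=1 the blend reduces to plain epsilon-greedy on the Q-values. Lowering it shifts the choice towards actions that are better supported by data.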

explainable_rl.foundation.engine module

class Engine(dh, hyperparam_dict, verbose=False)

Bases: object

Responsible for creating the agent and environment instances and running the training loop.

__init__(dh, hyperparam_dict, verbose=False)

Initialise the engine.

Parameters

dh (DataHandler) – DataHandler to be given to the Environment.
hyperparam_dict (dict) – Dictionary containing all hyperparameters.
verbose (bool) – Whether to print information about the program flow.

_evaluate_total_agent_reward()

Calculate the total reward obtained on the evaluation states using the agent’s policy.

Returns: Total (not scaled) cumulative reward.
Return type: float

_evaluate_total_hist_reward()

Calculate the total reward obtained on the evaluation states using the historical actions in the data.

Returns: Total (not scaled) cumulative reward based on historical data.
Return type: float
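The two evaluation methods produce totals that can be compared directly. This sketch makes two assumptions: the agent's total is read from an average reward matrix for the actions it recommends, and the historical total is the sum of rewards recorded in the data for the same states. The helper name and the matrix layout are hypothetical.

```python
import numpy as np


def compare_total_rewards(avg_reward_matrix, eval_states, agent_actions, hist_rewards):
    """Hypothetical helper: returns (total_agent_reward, total_hist_reward)."""
    # Agent: estimated reward of the recommended action in each evaluation state
    total_agent_reward = sum(
        float(avg_reward_matrix[tuple(s) + (a,)]) for s, a in zip(eval_states, agent_actions)
    )
    # History: rewards actually recorded for the same states
    total_hist_reward = float(sum(hist_rewards))
    return total_agent_reward, total_hist_reward


matrix = np.array([[0.0, 5.0], [2.0, 0.0]])  # 2 state bins x 2 actions
agent_total, hist_total = compare_total_rewards(matrix, [[0], [1]], [1, 0], [3.0, 1.0])
# agent_total == 7.0, hist_total == 4.0
```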

_get_bins(): Get the bins for the states and actions.

build_evaluation(): Save data for evaluation.

create_agent(): Create an agent and store it in Engine.

create_env(): Create an env and store it in Engine.

create_world(): Create the Agent and MDP instances for the given task.

inverse_scale_feature(values, labels)

De-bin and de-normalize feature values.

Parameters

values (list) – list of (scaled) feature values.
labels (list) – list of feature labels.

Returns

De-binned and de-normalized values for all feature labels.

Return type

list
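A sketch of the two inverse steps. It assumes each feature was first min-max normalised to [0, 1] using a known (min, max) pair and then split into n_bins equal-width bins. De-binning maps a bin index to its bin centre, and de-normalising undoes the min-max scaling. The function name and its n_bins and feature_ranges parameters are hypothetical.

```python
def inverse_scale_sketch(values, labels, n_bins, feature_ranges):
    """Hypothetical: map bin indices back to approximate original-scale values.

    Assumes each feature was min-max normalised to [0, 1] using the
    (min, max) in feature_ranges[label] and then split into n_bins
    equal-width bins.
    """
    result = []
    for value, label in zip(values, labels):
        low, high = feature_ranges[label]
        normalised = (value + 0.5) / n_bins  # de-bin: centre of the bin in [0, 1]
        result.append(low + normalised * (high - low))  # undo min-max scaling
    return result


prices = inverse_scale_sketch([0, 9], ["price", "price"], n_bins=10,
                              feature_ranges={"price": (0.0, 100.0)})
# prices is approximately [5.0, 95.0]
```

Binning discards information, so the result is only the bin's representative value, not the exact original.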

train_agent(): Train the agent for a chosen number of steps and episodes.

explainable_rl.foundation.environment module

class MDP(dh)

Bases: object

Define the MDP superclass from which all particular MDPs should inherit.

__init__(dh)

Initialise the MDP class.

Parameters: dh (DataHandler) – Data handler object.

initialise_env(): Create the environment given the MDP information.

reset()

Reset environment.

Returns: Randomised initial state.
Return type: list

step(state, action)

Take a step in the environment.

A True done flag indicates that the environment terminated.

Parameters

state (list) – Current state values of the agent.
action (int) – Action for the agent to take.

Returns

Current state, action, next state and done flag.

Return type

tuple
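A toy environment with the same reset/step shape can be handy when testing an agent. This is a standalone sketch. A real environment would subclass MDP and build its dynamics from the DataHandler, which this example does not attempt. CounterMDP and its dynamics are hypothetical.

```python
import random


class CounterMDP:
    """Hypothetical toy MDP with a reset/step interface like MDP's."""

    def __init__(self, n_states=5, seed=None):
        self.n_states = n_states
        self.rng = random.Random(seed)

    def reset(self):
        # Randomised initial state (one binned feature), never the terminal bin
        return [self.rng.randrange(self.n_states - 1)]

    def step(self, state, action):
        # Action 1 moves up one bin, action 0 stays put
        next_state = [min(state[0] + action, self.n_states - 1)]
        # The episode terminates once the top bin is reached
        done = next_state[0] == self.n_states - 1
        return state, action, next_state, done


mdp = CounterMDP(n_states=3, seed=0)
start = mdp.reset()
transition = mdp.step([1], 1)  # ([1], 1, [2], True)
```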

explainable_rl.foundation.library module

explainable_rl.foundation.utils module

convert_to_list(state_str)

Convert a state string to a list.

Parameters: state_str (str) – State as a string.
Returns: State as a list.
Return type: list

convert_to_string(state)

Convert a state to a string.

Parameters: state (list) – State to convert.
Returns: State as a string.
Return type: str
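String states are convenient as dictionary keys, because Python lists are not hashable. The exact format these helpers use is not documented here. The sketch below assumes comma-separated bin indices, and the names state_to_key and key_to_state are hypothetical.

```python
def state_to_key(state):
    """Hypothetical: [3, 0, 2] -> "3,0,2" (assumed comma-separated format)."""
    return ",".join(str(v) for v in state)


def key_to_state(state_str):
    """Hypothetical inverse: "3,0,2" -> [3, 0, 2]."""
    return [int(v) for v in state_str.split(",")] if state_str else []


q_values = {state_to_key([3, 0, 2]): 1.5}  # lists cannot be dict keys, strings can
restored = key_to_state(next(iter(q_values)))  # [3, 0, 2]
```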

decay_param(param, decay, min_param)

Decay a parameter.

Parameters

param (float) – Parameter to decay.
decay (float) – Decay rate.
min_param (float) – Minimum value of the parameter.

Returns

Updated parameter.

Return type

float
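The docstring does not specify the decay rule. A common choice, assumed here, is multiplicative decay clipped at the minimum, as often used for epsilon or learning-rate schedules. decay_with_floor is a hypothetical name.

```python
def decay_with_floor(param, decay, min_param):
    """Multiply param by decay, but never go below min_param (assumed rule)."""
    return max(param * decay, min_param)


epsilon = 1.0
schedule = []
for _ in range(3):
    epsilon = decay_with_floor(epsilon, decay=0.5, min_param=0.2)
    schedule.append(epsilon)
# schedule == [0.5, 0.25, 0.2]
```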

load_data(data_path, n_samples, delimiter=',')

Load data from file.

Parameters

data_path (str) – Path to data file.
n_samples (int) – Number of samples to load.
delimiter (str) – Character that separates columns.
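A minimal sketch with pandas. It assumes the file is delimited text and that the first n_samples rows are wanted; the library may select rows differently. pandas.read_csv supports this directly through its sep and nrows arguments. load_first_rows is a hypothetical name.

```python
import io

import pandas as pd


def load_first_rows(data_path, n_samples, delimiter=","):
    """Hypothetical helper: read up to n_samples data rows of a delimited file."""
    return pd.read_csv(data_path, sep=delimiter, nrows=n_samples)


# Usage with an in-memory file; a path string works the same way
sample_file = io.StringIO("price;demand\n10;5\n12;4\n15;2\n")
df = load_first_rows(sample_file, n_samples=2, delimiter=";")
```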

load_engine(path_name)

Load an engine.

Parameters: path_name (str or list of str) – Path from which to load the engine.

save_engine(engine, path_name=None)

Save an engine.

Parameters

engine (Engine) – Engine to save.
path_name (str) – Path to save the engine.
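An engine is an ordinary Python object, so one plausible persistence scheme is pickle. This is an assumption, not something the library documents. In the sketch, a list of paths loads one object per path, and a dict stands in for an Engine instance. Both helper names are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, path_name):
    """Hypothetical: serialise obj to path_name with pickle."""
    with open(path_name, "wb") as f:
        pickle.dump(obj, f)


def load_object(path_name):
    """Hypothetical: load one pickled object, or a list of them for a list of paths."""
    if isinstance(path_name, (list, tuple)):
        return [load_object(p) for p in path_name]
    with open(path_name, "rb") as f:
        return pickle.load(f)


# Usage: a dict stands in for an Engine instance
path = os.path.join(tempfile.mkdtemp(), "engine.pkl")
save_object({"gamma": 0.9}, path)
restored_engine = load_object(path)
restored_list = load_object([path, path])
```

Unpickling can execute arbitrary code, so only load files from trusted sources.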

split_train_test(dataset, train_test_split=0.2)

Split a dataset into train and test sets.

Parameters

dataset (pd.DataFrame) – Dataset.
train_test_split (float) – Proportion of test data.

Returns

train_dataset (pd.DataFrame) – Train dataset.
test_dataset (pd.DataFrame) – Test dataset.

Return type

tuple
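A sketch of a random split in which train_test_split is the share of rows held out for testing. It uses pandas DataFrame.sample and DataFrame.drop and assumes the DataFrame index is unique. split_sketch and its seed parameter are hypothetical, and the library may shuffle or split differently.

```python
import pandas as pd


def split_sketch(dataset, train_test_split=0.2, seed=0):
    """Hypothetical: random train/test split; train_test_split is the test share."""
    test_dataset = dataset.sample(frac=train_test_split, random_state=seed)
    # Everything not drawn into the test set is used for training
    train_dataset = dataset.drop(test_dataset.index)
    return train_dataset, test_dataset


df = pd.DataFrame({"x": range(10)})
train_df, test_df = split_sketch(df)
```

scikit-learn's sklearn.model_selection.train_test_split (test_size argument) is a common alternative when stratification is needed.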

Module contents