explainable_rl.environments package

Submodules

explainable_rl.environments.strategic_pricing module

class StrategicPricing(dh, bins=None)[source]

Bases: MDP

Environment for Strategic Pricing.

__init__(dh, bins=None)[source]

Initialise the Strategic Pricing MDP class.

Parameters

dh (DataHandler) – Data handler object.

_bin_state_action_space(zipped)[source]

Bin the state-action pairs.

Parameters

zipped (list) – Group of states and actions per datapoint.

Returns

Binned state-action pairs.

Return type

np.array

_create_average_reward_matrix(bins_dict)[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Parameters

bins_dict (dict) – Dictionary of counts of datapoints per bin and sum of the associated rewards.

Returns

Sparse matrix of binned state-action pairs and their associated average reward.

Return type

sparse.COO
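
As an illustration of the averaging step, the following is a minimal sketch and not the library's implementation. It assumes a hypothetical layout for bins_dict in which each key is a tuple of bin indices and each value is a [count, reward_sum] pair. It returns coordinate/value pairs, which is the kind of data a coordinate-format (COO) sparse matrix stores.

```python
def average_reward_entries(bins_dict):
    """Return (coords, data) for a sparse average-reward matrix.

    Assumption (not confirmed by the docs): bins_dict maps a tuple of
    bin indices to a [count, reward_sum] pair.
    """
    coords = sorted(bins_dict)
    # Average reward per bin = summed reward / number of datapoints.
    data = [bins_dict[c][1] / bins_dict[c][0] for c in coords]
    return coords, data


bins_dict = {(0, 1, 2): [2, 30.0], (3, 0, 1): [1, 5.0]}
coords, data = average_reward_entries(bins_dict)
```

Only bins that actually occur in the data get an entry, which is why a sparse representation suits a high-dimensional binned state-action space.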

_debin_state(b_state, idxs=None)[source]

Debin a single state.

Parameters

b_state (list) – Binned state to de-bin.

Returns

Debinned state.

Return type

list

_get_counts_and_rewards_per_bin(binned)[source]

Create a dictionary of counts of datapoints per bin and sum the associated rewards.

Parameters

binned (np.array) – Binned state-action pairs.

Returns

Counts of datapoints per bin and sums of the associated rewards.

Return type

dict
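
The accumulation described above can be sketched as follows. This is an illustrative example under stated assumptions, not the library's code: each row of binned is assumed to hold the bin indices of a state-action pair followed by the observed reward as its last element, and the [count, reward_sum] value layout is hypothetical.

```python
from collections import defaultdict


def counts_and_rewards_per_bin(binned):
    """Accumulate [count, reward_sum] for each binned state-action pair.

    Assumption: every row is (bin indices..., reward), reward last.
    """
    bins_dict = defaultdict(lambda: [0, 0.0])
    for row in binned:
        *key, reward = row
        entry = bins_dict[tuple(key)]
        entry[0] += 1          # one more datapoint in this bin
        entry[1] += reward     # running sum of rewards for this bin
    return dict(bins_dict)


binned = [
    (0, 1, 2, 10.0),
    (0, 1, 2, 20.0),
    (3, 0, 1, 5.0),
]
bins_dict = counts_and_rewards_per_bin(binned)
```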

_get_state_to_action(binned)[source]

Create a dictionary of states and their associated actions.

Parameters

binned (np.array) – Binned state-action pairs.

Returns

States and their associated actions.

Return type

state_to_action (dict)
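
A minimal sketch of such a mapping is shown below. It assumes, for illustration only, that each binned row consists of the state bin indices followed by a single action bin index in the last position; the actual row layout is not specified here.

```python
from collections import defaultdict


def get_state_to_action(binned):
    """Map each binned state to the set of actions observed for it.

    Assumption: every row is (state bin indices..., action bin index).
    """
    mapping = defaultdict(set)
    for *state, action in binned:
        mapping[tuple(state)].add(action)
    return dict(mapping)


binned = [(0, 1, 2), (0, 1, 3), (4, 0, 2)]
mapping = get_state_to_action(binned)
```

Tuples are used as keys because lists are not hashable in Python.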

_join_state_action()[source]

Join the state and action pairs together.

Returns

Group of states and actions per datapoint.

Return type

list

_make_rewards_from_data()[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Returns

Sparse matrix of binned state-action pairs and their associated average reward.

Return type

sparse.COO

_transform_df_to_numpy()[source]

Transform the MDP data from a dataframe to a numpy array.

bin_state(state, idxs=None)[source]

Bin a single state.

The states are binned according to the number of bins of each feature.

Parameters
  • state (list) – State to bin.

  • idxs (list) – Indexes of the state dimensions. Use this argument when the state list contains only some of the features (e.g. only actions).

Returns

Binned state.

Return type

binned (list)
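
To illustrate how per-feature bin counts and idxs can interact, here is a hedged sketch rather than the method's actual code. It assumes every feature has been scaled to the range [0, 1] and that bins are of equal width; the bins list and the clamping rule are assumptions made for this example.

```python
def bin_state(state, bins, idxs=None):
    """Bin one state using equal-width bins per feature.

    Assumptions: features lie in [0, 1]; bins[i] is the number of bins
    for feature i; idxs names the features present in `state`.
    """
    if idxs is None:
        idxs = range(len(state))
    binned = []
    for value, i in zip(state, idxs):
        n = bins[i]
        # Clamp so that a value of exactly 1.0 falls in the last bin.
        binned.append(min(int(value * n), n - 1))
    return binned


bins = [10, 5, 4]
full = bin_state([0.25, 0.5, 1.0], bins)       # all three features
partial = bin_state([0.99], bins, idxs=[2])    # only the third feature
```

Passing idxs tells the function which bin count applies to each value when only a subset of the features is supplied.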

bin_states(states, idxs=None)[source]

Bin a list of states.

Parameters
  • states (list[list]) – States to bin.

  • idxs (list) – Indexes of the state dimensions. Use this argument when the state lists contain only some of the features (e.g. only actions).

Returns

Binned states.

Return type

b_states (list)

debin_states(b_states, idxs=None)[source]

Debin a list of binned states.

Parameters
  • b_states (list[list]) – Binned states to debin.

  • idxs (list) – Indexes of the state dimensions. Use this argument when the state lists contain only some of the features (e.g. only actions).

Returns

Debinned states.

Return type

states (list)
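
Debinning cannot recover the original values exactly, so some representative value per bin has to be chosen. The sketch below assumes equal-width bins over [0, 1] and returns bin midpoints; whether the library returns midpoints, bin edges, or something else is not stated here, so treat this purely as an illustration.

```python
def debin_states(b_states, bins, idxs=None):
    """Map bin indices back to representative feature values.

    Assumptions: equal-width bins over [0, 1]; the midpoint of each bin
    is used as the debinned value; bins[i] is the bin count of feature i.
    """
    states = []
    for b_state in b_states:
        dims = range(len(b_state)) if idxs is None else idxs
        states.append([(b + 0.5) / bins[i] for b, i in zip(b_state, dims)])
    return states


bins = [10, 5, 4]
states = debin_states([[2, 2, 3], [0, 4, 1]], bins)
```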

initialise_env()[source]

Create the environment given the MDP information.

reset()[source]

Reset environment.

Returns

Randomised initial state.

Return type

list

step(state, action)[source]

Take a step in the environment.

Parameters
  • state (list) – Current state values of the agent.

  • action (int) – Action for agent to take.

explainable_rl.environments.strategic_pricing_prediction module

class StrategicPricingPredictionMDP(dh, bins=None, verbose=False)[source]

Bases: StrategicPricing

Environment for Strategic Pricing (prediction task).

__init__(dh, bins=None, verbose=False)[source]

Initialise Strategic Pricing Environment.

Parameters
  • dh (DataHandler) – Data Handler instance.

  • verbose (bool) – Whether to print messages about the program flow.

_create_average_reward_matrix(bins_dict)[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Parameters

bins_dict (dict) – Counts of datapoints per bin and sum of the associated rewards.

Returns

Sparse matrix of binned state-action pairs and their associated average reward.

Return type

sparse.COO

_get_counts_and_rewards_per_bin(binned)[source]

Create a dictionary of counts of datapoints per bin and sum the associated rewards.

Parameters

binned (np.array) – Binned state-action pairs.

Returns

Dictionary of counts of datapoints per bin and sums of the associated rewards.

Return type

dict

_make_rewards_from_data()[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Returns

Sparse matrix of binned state-action pairs and their associated average reward.

Return type

sparse.COO

_transform_df_to_numpy()[source]

Transform the MDP data from a dataframe to a numpy array.

step(state, action)[source]

Take a step in the environment.

Note that the last element of the returned tuple (the done flag) is always True, because the prediction problem uses single-step episodes.

Parameters
  • state (list) – Current state values of agent.

  • action (int) – Action for agent to take.

Returns

Current state, action, next state, done flag.

Return type

tuple

explainable_rl.environments.strategic_pricing_suggestion module

class StrategicPricingSuggestionMDP(dh, bins=None, verbose=False)[source]

Bases: StrategicPricing

Environment for Strategic Pricing (suggestion task).

__init__(dh, bins=None, verbose=False)[source]

Initialise Strategic Pricing Environment.

_create_average_reward_matrix(bins_dict)[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Parameters

bins_dict (dict) – Counts of datapoints per bin and sum of the associated rewards.

Returns

Binned state-action pairs and their associated average reward.

Return type

sparse.COO

_find_next_state(state, action)[source]

Look up whether the next state exists in the state-action space matrix.

Parameters
  • state (list) – Current state values of agent.

  • action (int) – Action for agent to take.

Returns

Next state for the agent to visit, and a bool indicating whether the environment has terminated.

Return type

list

_get_counts_and_rewards_per_bin(binned)[source]

Create a dictionary of counts of datapoints per bin and sum the associated rewards.

Parameters

binned (np.array) – Binned state-action pairs.

Returns

Counts of datapoints per bin and sums of the associated rewards.

Return type

dict

_make_rewards_from_data()[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Returns

Binned state-action pairs and their associated average reward.

Return type

sparse.COO

_transform_df_to_numpy()[source]

Transform the MDP data from a dataframe to a numpy array.

step(state, action)[source]

Take a step in the environment.

A done flag of True means that the environment has terminated.

Parameters
  • state (list) – Current state values of agent.

  • action (int) – Action for agent to take.

Returns

Current state, action, next state, done flag.

Return type

tuple
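
The sketch below shows how a caller might loop on step until the done flag is True. ToyEnv is a hypothetical stand-in written for this example, not StrategicPricingSuggestionMDP: it only mimics the documented return order (current state, action, next state, done flag), and its termination rule (the next state is not among a set of known states) is an assumption.

```python
class ToyEnv:
    """Hypothetical one-dimensional environment used only for illustration."""

    def __init__(self, known_states):
        self.known_states = set(known_states)

    def reset(self):
        # A fixed start state keeps the example deterministic.
        return [0]

    def step(self, state, action):
        next_state = [state[0] + action]
        # Assumed rule: terminate when the next state was never observed.
        done = tuple(next_state) not in self.known_states
        return state, action, next_state, done


env = ToyEnv(known_states=[(0,), (1,), (2,)])
state = env.reset()
done = False
steps = 0
while not done:
    _, _, state, done = env.step(state, action=1)
    steps += 1
```

A real training loop would choose the action from an agent's policy instead of the constant used here.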

Module contents