explainable_rl.environments package

Submodules

explainable_rl.environments.strategic_pricing module

class StrategicPricing(dh, bins=None)[source]

Bases: MDP

Environment for Strategic Pricing.

__init__(dh, bins=None)[source]

Initialise the Strategic Pricing MDP class.

Parameters: dh (DataHandler) – Data handler object.

_bin_state_action_space(zipped)[source]

Bin the state-action pairs.

Parameters: zipped (list) – Group of states and actions per datapoint.
Returns: Binned state-action pairs.
Return type: np.array

_create_average_reward_matrix(bins_dict)[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Parameters: bins_dict (dict) – Dictionary of counts of datapoints per bin and sum of the associated rewards.
Returns: Sparse matrix of binned state-action pairs and their associated average reward.
Return type: sparse.COO
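
As a rough illustration of the averaging step, the sketch below turns a dictionary of per-bin counts and reward sums into coordinate-format (COO) data: one coordinate tuple and one average reward per non-empty bin. The `[count, reward_sum]` layout of the dictionary values and the helper name `average_reward_coo` are assumptions made for this example. The real method returns a `sparse.COO` object, which is replaced here by plain lists so the example has no dependencies.

```python
def average_reward_coo(bins_dict):
    """Convert {bin_tuple: [count, reward_sum]} into COO-style (coords, data)."""
    coords = []
    data = []
    # Sort the bins so the output order is deterministic
    for bin_idx, (count, reward_sum) in sorted(bins_dict.items()):
        coords.append(bin_idx)
        # Average reward of all datapoints that fell into this bin
        data.append(reward_sum / count)
    return coords, data


coords, data = average_reward_coo({(0, 1): [2, 6.0], (3, 2): [1, 1.5]})
```

Only bins that received at least one datapoint appear in the output, which is why a sparse format suits a large, mostly empty state-action space.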

_debin_state(b_state, idxs=None)[source]

Debin a single state.

Parameters: b_state (list) – Binned state to de-bin.
Returns: Debinned state.
Return type: list

_get_counts_and_rewards_per_bin(binned)[source]

Create a dictionary of counts of datapoints per bin and sum the associated rewards.

Parameters: binned (np.array) – Binned state-action pairs.
Returns: Counts of datapoints per bin and the sums of the associated rewards.
Return type: dict
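
The accumulation described above could look roughly like the following sketch. It assumes each binned row holds the bin indices followed by the reward in the last position. That layout and the helper name `count_and_sum_rewards` are hypothetical and not taken from the library.

```python
from collections import defaultdict


def count_and_sum_rewards(binned_rows):
    """Return {bin_tuple: [count, reward_sum]} for rows of (*bin_indices, reward)."""
    bins_dict = defaultdict(lambda: [0, 0.0])
    for row in binned_rows:
        # Assumption: the reward is stored in the last position of each row
        *bin_idx, reward = row
        entry = bins_dict[tuple(bin_idx)]
        entry[0] += 1        # one more datapoint in this bin
        entry[1] += reward   # running sum of rewards for this bin
    return dict(bins_dict)


rows = [(0, 1, 2.0), (0, 1, 4.0), (3, 2, 1.5)]
bins_dict = count_and_sum_rewards(rows)
```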

_get_state_to_action(binned)[source]

Create a dictionary of states and their associated actions.

Parameters: binned (np.array) – Binned state-action pairs.
Returns: States and their associated actions.
Return type: state_to_action (dict)
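
A minimal sketch of such a mapping is shown below. It assumes each binned row stores the state dimensions first and the action immediately after them. The helper name `build_state_to_action` and the `n_state_dims` parameter are hypothetical.

```python
def build_state_to_action(binned_rows, n_state_dims):
    """Map each binned state (as a tuple) to the set of actions observed with it."""
    state_to_action = {}
    for row in binned_rows:
        state = tuple(row[:n_state_dims])
        # Assumption: the action sits directly after the state dimensions
        action = row[n_state_dims]
        state_to_action.setdefault(state, set()).add(action)
    return state_to_action


rows = [(0, 1, 2), (0, 1, 3), (4, 4, 2)]
state_to_action = build_state_to_action(rows, n_state_dims=2)
```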

_join_state_action()[source]

Join the state and action pairs together.

Returns: Group of states and actions per datapoint.
Return type: list

_make_rewards_from_data()[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Returns: Sparse matrix of binned state-action pairs and their associated average reward.
Return type: sparse.COO

_transform_df_to_numpy()[source]: Transform the MDP data from a dataframe to a numpy array.

bin_state(state, idxs=None)[source]

Bin a single state.

The states are binned according to the number of bins of each feature.

Parameters

state (list) – State to bin.
idxs (list) – Indexes of the state dimensions. This argument can be used if the state list contains only certain features (e.g. only actions).

Returns

Binned state.

Return type

binned (list)
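
To make the role of `idxs` concrete, here is a minimal sketch of per-feature binning. It assumes every feature has already been scaled to the range [0, 1] and that bins are of equal width. Both are assumptions for illustration, as is passing the per-feature bin counts as an explicit `bins` argument.

```python
def bin_state(state, bins, idxs=None):
    """Bin each value of `state` using the bin count of its feature dimension."""
    if idxs is None:
        # By default the state is assumed to contain every feature, in order
        idxs = range(len(state))
    binned = []
    for value, dim in zip(state, idxs):
        n_bins = bins[dim]
        # Clamp so that a value of exactly 1.0 falls into the last bin
        binned.append(min(int(value * n_bins), n_bins - 1))
    return binned


bins = [10, 4, 5]
full = bin_state([0.05, 0.5, 1.0], bins)       # all three dimensions
partial = bin_state([0.3], bins, idxs=[2])     # only the third dimension
```

Passing `idxs=[2]` tells the function that the single value supplied belongs to the third feature, so it is binned with that feature's bin count rather than the first one's.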

bin_states(states, idxs=None)[source]

Bin a list of states.

Parameters

states (list[list]) – States to bin.
idxs (list) – Indexes of the state dimensions. This argument can be used if the state list contains only certain features (e.g. only actions).

Returns

Binned states.

Return type

b_states (list)

debin_states(b_states, idxs=None)[source]

Debin a list of binned states.

Parameters

b_states (list[list]) – Binned states to debin.
idxs (list) – Indexes of the state dimensions. This argument can be used if the state list contains only certain features (e.g. only actions).

Returns

Debinned states.

Return type

states (list)
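
Debinning cannot recover the original value exactly, so some representative value per bin has to be chosen. The sketch below assumes equal-width bins over [0, 1] and returns each bin's midpoint. Whether the library uses midpoints, bin edges or another convention is not stated here, so treat this purely as an illustration with a hypothetical `bins` argument.

```python
def debin_states(b_states, bins, idxs=None):
    """Map lists of bin indices back to representative values (bin midpoints)."""
    states = []
    for b_state in b_states:
        dims = range(len(b_state)) if idxs is None else idxs
        # Assumption: the midpoint of each equal-width bin represents the bin
        states.append([(b + 0.5) / bins[dim] for b, dim in zip(b_state, dims)])
    return states


states = debin_states([[0, 2, 4], [9, 0, 0]], bins=[10, 4, 5])
```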

initialise_env()[source]: Create the environment given the MDP information.

reset()[source]

Reset environment.

Returns: Randomised initial state.
Return type: list
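
One simple way to produce a randomised initial state is to sample from the states seen in the data, as in this sketch. Sampling from observed states is an assumption, and the `observed_states` and `rng` parameters are hypothetical. The actual method takes no arguments.

```python
import random


def reset(observed_states, rng=None):
    """Return a randomly chosen initial state as a list."""
    # A seeded random.Random can be passed in for reproducible episodes
    rng = rng or random.Random()
    return list(rng.choice(observed_states))


observed = [(0, 1, 2), (3, 3, 1), (4, 0, 0)]
initial_state = reset(observed, rng=random.Random(0))
```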

step(state, action)[source]

Take a step in the environment.

Parameters

state (list) – Current state values of the agent.
action (int) – Action for agent to take.

explainable_rl.environments.strategic_pricing_prediction module

class StrategicPricingPredictionMDP(dh, bins=None, verbose=False)[source]

Bases: StrategicPricing

Environment for Strategic Pricing (prediction task).

__init__(dh, bins=None, verbose=False)[source]

Initialise Strategic Pricing Environment.

Parameters

dh (DataHandler) – Data Handler instance.
verbose (bool) – Whether print statements about the program flow should be displayed.

_create_average_reward_matrix(bins_dict)[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Parameters: bins_dict (dict) – Counts of datapoints per bin and sum of the associated rewards.
Returns: Sparse matrix of binned state-action pairs and their associated average reward.
Return type: sparse.COO

_get_counts_and_rewards_per_bin(binned)[source]

Create a dictionary of counts of datapoints per bin and sum the associated rewards.

Parameters: binned (np.array) – Binned state-action pairs.
Returns: Dictionary of counts of datapoints per bin and the sums of the associated rewards.
Return type: dict

_make_rewards_from_data()[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Returns: Sparse matrix of binned state-action pairs and their associated average reward.
Return type: sparse.COO

_transform_df_to_numpy()[source]: Transform the MDP data from a dataframe to a numpy array.

step(state, action)[source]

Take a step in the environment.

Note that the last element of the returned tuple (the done flag) is always True, because the prediction problem uses single-step episodes.

Parameters

state (list) – Current state values of agent.
action (int) – Action for agent to take.

Returns

Current state, action, next state, done flag.

Return type

tuple
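
The effect of an always-True done flag on a training loop can be seen in the toy sketch below. The `step` function here is a stand-in that follows the documented return order (current state, action, next state, done flag). Leaving the next state equal to the current state is an assumption, and `run_episode` is a hypothetical helper.

```python
def step(state, action):
    """Toy single-step transition: the episode ends immediately."""
    next_state = list(state)  # assumption: the state is left unchanged
    done = True               # single-step episodes always terminate
    return state, action, next_state, done


def run_episode(initial_state, policy):
    """Run one episode and return the number of steps taken."""
    state, done, n_steps = initial_state, False, 0
    while not done:
        action = policy(state)
        _, _, state, done = step(state, action)
        n_steps += 1
    return n_steps


n_steps = run_episode([1, 0, 3], policy=lambda s: 2)
```

Because the flag is True after the first call, the loop body runs exactly once per episode.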

explainable_rl.environments.strategic_pricing_suggestion module

class StrategicPricingSuggestionMDP(dh, bins=None, verbose=False)[source]

Bases: StrategicPricing

Environment for Strategic Pricing (suggestion task).

__init__(dh, bins=None, verbose=False)[source]: Initialise Strategic Pricing Environment.

_create_average_reward_matrix(bins_dict)[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Parameters: bins_dict (dict) – Counts of datapoints per bin and sum of the associated rewards.
Returns: Binned state-action pairs and their associated average reward.
Return type: sparse.COO

_find_next_state(state, action)[source]

Lookup whether the next state exists in the state-action space matrix.

Parameters

state (list) – Current state values of agent.
action (int) – Action for agent to take.

Returns

Next state for the agent to visit (list), and whether the environment has terminated (bool).

Return type

list
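
A lookup of this kind might be sketched as follows. The example assumes that taking an action overwrites one dimension of the binned state and that the episode terminates when the resulting combination was never observed. Both are assumptions made for illustration, and the `known_states` and `action_dim` parameters are hypothetical.

```python
def find_next_state(state, action, known_states, action_dim=-1):
    """Return (next_state, done) for a binned state and a binned action."""
    candidate = list(state)
    # Assumption: the chosen action replaces one dimension of the state
    candidate[action_dim] = action
    if tuple(candidate) in known_states:
        return candidate, False
    # Unseen combination: keep the current state and signal termination
    return list(state), True


known = {(0, 1, 2), (0, 1, 3)}
seen = find_next_state([0, 1, 2], 3, known)
unseen = find_next_state([0, 1, 2], 4, known)
```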

_get_counts_and_rewards_per_bin(binned)[source]

Create a dictionary of counts of datapoints per bin and sum the associated rewards.

Parameters: binned (np.array) – Binned state-action pairs.
Returns: Counts of datapoints per bin and the sums of the associated rewards.
Return type: dict

_make_rewards_from_data()[source]

Create a sparse matrix of the state-action pairs and associated rewards from the input dataset.

Returns: Binned state-action pairs and their associated average reward.
Return type: sparse.COO

_transform_df_to_numpy()[source]: Transform the MDP data from a dataframe to a numpy array.

step(state, action)[source]

Take a step in the environment.

A done flag of True means that the environment has terminated.

Parameters

state (list) – Current state values of agent.
action (int) – Action for agent to take.

Returns

Current state, action, next state, done flag.

Return type

tuple
