explainable_rl.environments package
Submodules
explainable_rl.environments.strategic_pricing module
- class StrategicPricing(dh, bins=None)[source]
Bases: MDP
Environment for Strategic Pricing.
- __init__(dh, bins=None)[source]
Initialise the Strategic Pricing MDP class.
- Parameters
dh (DataHandler) – Data handler object.
- _bin_state_action_space(zipped)[source]
Bin the state-action pairs.
- Parameters
zipped (list) – Group of states and actions per datapoint.
- Returns
Binned state-action pairs.
- Return type
np.array
- _create_average_reward_matrix(bins_dict)[source]
Create a sparse matrix of the state-action pairs and associated rewards from the inputted dataset.
- Parameters
bins_dict (dict) – Dictionary of counts of datapoints per bin and sum of the associated rewards.
- Returns
Sparse matrix of binned state-action pairs and their associated average reward.
- Return type
sparse.COO
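The averaging step described above can be sketched in plain Python. This is only an illustration: the function name `average_reward_coo` and the `[count, reward_sum]` layout of `bins_dict` are assumptions, and the sketch stops at building coordinate-format lists rather than constructing an actual `sparse.COO` object.

```python
def average_reward_coo(bins_dict):
    """Return (coords, data) in coordinate (COO) layout.

    Assumption (not from the library): bins_dict maps a tuple of bin
    indices to [count, reward_sum].
    """
    keys = sorted(bins_dict)
    ndim = len(keys[0]) if keys else 0
    # One list of indices per dimension, as a COO layout expects.
    coords = [[key[d] for key in keys] for d in range(ndim)]
    # Average reward per bin = sum of rewards / number of datapoints.
    data = [bins_dict[key][1] / bins_dict[key][0] for key in keys]
    return coords, data


coords, data = average_reward_coo({(0, 1, 2): [2, 30.0], (3, 0, 1): [1, 5.0]})
```

Only bins that actually contain datapoints appear in `coords`, which is why a sparse format suits a large, mostly empty state-action space.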
- _debin_state(b_state, idxs=None)[source]
Debin a single state.
- Parameters
b_state (list) – Binned state to de-bin.
- Returns
Debinned state.
- Return type
list
- _get_counts_and_rewards_per_bin(binned)[source]
Create a dictionary of counts of datapoints per bin and sum the associated rewards.
- Parameters
binned (np.array) – Binned state-action pairs.
- Returns
Counts of datapoints per bin and sums of the associated rewards.
- Return type
dict
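A minimal sketch of how such a dictionary might be accumulated is shown here. It assumes each binned row ends with the reward and that the remaining entries are the state-action bin indices. That row layout and the name `counts_and_rewards_per_bin` are hypothetical, not taken from the library.

```python
def counts_and_rewards_per_bin(binned):
    """Map each state-action bin to [count, reward_sum].

    Assumption (not from the library): each row has the form
    (*state_action_bins, reward).
    """
    bins_dict = {}
    for row in binned:
        key = tuple(row[:-1])          # state-action bin indices
        reward = row[-1]
        entry = bins_dict.setdefault(key, [0, 0.0])
        entry[0] += 1                  # datapoints seen in this bin
        entry[1] += reward             # running sum of rewards
    return bins_dict


binned = [(0, 1, 2, 10.0), (0, 1, 2, 20.0), (3, 0, 1, 5.0)]
bins_dict = counts_and_rewards_per_bin(binned)
```

Keeping the count and the sum separately lets the average reward be computed later in a single pass.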
- _get_state_to_action(binned)[source]
Create a dictionary of states and their associated actions.
- Parameters
binned (np.array) – Binned state-action pairs.
- Returns
States and their associated actions.
- Return type
state_to_action (dict)
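One way to picture the state-to-action dictionary is the sketch below. It assumes each binned row holds the state bins followed by a single action bin. The function name and row layout are illustrative assumptions.

```python
def state_to_action(binned):
    """Map each binned state to the set of actions observed for it.

    Assumption (not from the library): each row has the form
    (*state_bins, action_bin).
    """
    mapping = {}
    for row in binned:
        state, action = tuple(row[:-1]), row[-1]
        mapping.setdefault(state, set()).add(action)
    return mapping


mapping = state_to_action([(0, 1, 2), (0, 1, 3), (4, 0, 2)])
```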
- _join_state_action()[source]
Join the state and action pairs together.
- Returns
Group of states and actions per datapoint.
- Return type
list
- _make_rewards_from_data()[source]
Create a sparse matrix of the state-action pairs and associated rewards from the inputted dataset.
- Returns
Sparse matrix of binned state-action pairs and their associated average reward.
- Return type
sparse.COO
- bin_state(state, idxs=None)[source]
Bin a singular state.
The states are binned according to the number of bins of each feature.
- Parameters
state (list) – State to bin.
idxs (list) – Indexes of the state dimensions. This argument can be used if the state list contains only certain features (e.g. only actions).
- Returns
Binned state.
- Return type
binned (list)
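As an illustration of binning by the number of bins per feature, the sketch below assumes feature values are normalised to [0, 1] and that `bins[i]` is the number of bins for feature `i`. Both are assumptions made for this example, and the standalone function is not the library's implementation.

```python
def bin_state(state, bins, idxs=None):
    """Bin each normalised feature value into an integer bin index.

    Assumptions (not from the library): values lie in [0, 1];
    bins[i] is the number of bins for feature i; idxs names the
    features that `state` contains, in order.
    """
    if idxs is None:
        idxs = range(len(state))
    binned = []
    for value, i in zip(state, idxs):
        n = bins[i]
        # Clamp so that value == 1.0 falls into the last bin.
        binned.append(min(int(value * n), n - 1))
    return binned


full = bin_state([0.05, 0.5, 1.0], bins=[10, 4, 5])
only_last = bin_state([0.99], bins=[10, 4, 5], idxs=[2])
```

The second call shows the role of `idxs`: the state list holds a single feature, and `idxs=[2]` tells the function to use the bin count of the third dimension.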
- bin_states(states, idxs=None)[source]
Bin a list of states.
- Parameters
states (list[list]) – States to bin.
idxs (list) – Indexes of the state dimensions. This argument can be used if the state list contains only certain features (e.g. only actions).
- Returns
Binned states.
- Return type
b_states (list)
- debin_states(b_states, idxs=None)[source]
Debin a list of binned states.
- Parameters
b_states (list[list]) – Binned states to debin.
idxs (list) – Indexes of the state dimensions. This argument can be used if the state list contains only certain features (e.g. only actions).
- Returns
Debinned states.
- Return type
states (list)
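Debinning can be pictured as the inverse mapping from bin indices back to representative feature values. The sketch below assumes features normalised to [0, 1] and uses the midpoint of each bin as its representative value. The midpoint choice, the `bins` argument, and the function itself are assumptions for illustration only.

```python
def debin_states(b_states, bins, idxs=None):
    """Map bin indices back to representative feature values.

    Assumptions (not from the library): features are normalised to
    [0, 1] and each bin is represented by its midpoint.
    """
    states = []
    for b_state in b_states:
        cols = range(len(b_state)) if idxs is None else idxs
        states.append([(b + 0.5) / bins[i] for b, i in zip(b_state, cols)])
    return states


states = debin_states([[0, 2, 4], [9, 0, 0]], bins=[10, 4, 5])
```

Debinning is lossy: every value that fell into the same bin comes back as the same representative value.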
explainable_rl.environments.strategic_pricing_prediction module
- class StrategicPricingPredictionMDP(dh, bins=None, verbose=False)[source]
Bases: StrategicPricing
Environment for Strategic Pricing (prediction task).
- __init__(dh, bins=None, verbose=False)[source]
Initialise Strategic Pricing Environment.
- Parameters
dh (DataHandler) – Data Handler instance.
verbose (bool) – Whether print statements about the program flow should be displayed.
- _create_average_reward_matrix(bins_dict)[source]
Create a sparse matrix of the state-action pairs and associated rewards from the inputted dataset.
- Parameters
bins_dict (dict) – Counts of datapoints per bin and sum of the associated rewards.
- Returns
Sparse matrix of binned state-action pairs and their associated average reward.
- Return type
sparse.COO
- _get_counts_and_rewards_per_bin(binned)[source]
Create a dictionary of counts of datapoints per bin and sum the associated rewards.
- Parameters
binned (np.array) – Binned state-action pairs.
- Returns
Dictionary of counts of datapoints per bin and sums of the associated rewards.
- Return type
dict
- _make_rewards_from_data()[source]
Create a sparse matrix of the state-action pairs and associated rewards from the inputted dataset.
- Returns
Sparse matrix of binned state-action pairs and their associated average reward.
- Return type
sparse.COO
- step(state, action)[source]
Take a step in the environment.
Note that the last element of the returned tuple (the done flag) is always True, because the prediction task uses single-step episodes.
- Parameters
state (list) – Current state values of agent.
action (int) – Action for agent to take.
- Returns
Current state, action, next state, done flag.
- Return type
tuple
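The single-step behaviour can be illustrated with a toy environment. `OneStepEnv` is a hypothetical stand-in rather than the library class, and treating the next state as a copy of the current state is an assumption made for this sketch.

```python
class OneStepEnv:
    """Toy environment whose episodes always end after one step."""

    def step(self, state, action):
        # Assumption: the action does not change the state in this
        # toy example, so next_state is a copy of state.
        next_state = list(state)
        done = True  # single-step episode: always terminate
        return state, action, next_state, done


env = OneStepEnv()
state = [3, 1, 4]
steps = 0
done = False
while not done:
    state, action, next_state, done = env.step(state, action=2)
    steps += 1
```

Because `done` is True after the first call, an episode loop written this way always runs exactly once.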
explainable_rl.environments.strategic_pricing_suggestion module
- class StrategicPricingSuggestionMDP(dh, bins=None, verbose=False)[source]
Bases: StrategicPricing
Environment for Strategic Pricing (suggestion task).
- _create_average_reward_matrix(bins_dict)[source]
Create a sparse matrix of the state-action pairs and associated rewards from the inputted dataset.
- Parameters
bins_dict (dict) – Counts of datapoints per bin and sum of the associated rewards.
- Returns
Binned state-action pairs and their associated average reward.
- Return type
sparse.COO
- _find_next_state(state, action)[source]
Look up whether the next state exists in the state-action space matrix.
- Parameters
state (list) – Current state values of agent.
action (int) – Action for agent to take.
- Returns
Next state for the agent to visit, and whether the environment has terminated.
- Return type
tuple (list, bool)
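The lookup can be sketched as follows, under the assumption that known transitions are stored in a plain dictionary keyed by `(state, action)`. The `transitions` argument and the choice to return the unchanged state on termination are hypothetical. The library performs the lookup against its state-action space matrix instead.

```python
def find_next_state(state, action, transitions):
    """Return (next_state, done) for a state-action pair.

    Assumption (not from the library): `transitions` maps
    (state_tuple, action) to the next state tuple. If the pair is
    unknown, the episode terminates and the state is returned as is.
    """
    key = (tuple(state), action)
    if key in transitions:
        return list(transitions[key]), False
    return list(state), True


transitions = {((0, 1), 2): (0, 2)}
known = find_next_state([0, 1], 2, transitions)
unknown = find_next_state([5, 5], 2, transitions)
```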
- _get_counts_and_rewards_per_bin(binned)[source]
Create a dictionary of counts of datapoints per bin and sum the associated rewards.
- Parameters
binned (np.array) – Binned state-action pairs.
- Returns
Counts of datapoints per bin and sums of the associated rewards.
- Return type
dict