aprel.querying package

aprel.querying.acquisition_functions module

This module contains a set of acquisition functions that determine the value of a given query, which is useful for active query optimization.

aprel.querying.acquisition_functions.disagreement(weights: numpy.array, logprobs: List[float], **kwargs) → float

This function returns the disagreement value between two sets of reward weights. It is useful as an acquisition function when a trajectory planner is available and the desired query contains only two trajectories. The pair of weights with the highest disagreement is found, and the best trajectories according to those weights form the optimized query.

This is implemented based on the following paper:
Parameters
  • weights (numpy.array) – 2 x d array where each row is a set of reward weights. The disagreement between these two weights will be calculated.

  • logprobs (List[float]) – log probabilities of the given reward weights under the belief.

  • **kwargs

    acquisition function hyperparameters:

    • lambda (float): The tradeoff parameter. The higher lambda, the more important the disagreement between the weights; the lower lambda, the more important their log probabilities. Defaults to 0.01.

Returns

the disagreement value (always nonnegative)

Return type

float

Raises

AssertionError – if weights and logprobs have mismatching number of elements.
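
The exact scoring formula is not given above, so the sketch below combines the two ingredients (weight disagreement and log probabilities, traded off by lambda) in one illustrative way. The product form and the Euclidean-distance disagreement term are assumptions for illustration, not APReL's actual computation:

```python
import numpy as np

def disagreement_sketch(weights, logprobs, lam=0.01):
    """Illustrative disagreement-style score: trades off how different the
    two weight vectors are against how probable they are under the belief.
    NOT APReL's exact formula -- a hedged sketch only."""
    assert weights.shape[0] == len(logprobs) == 2
    w1, w2 = weights
    # Joint (unnormalized) probability of the two weight samples.
    joint_prob = np.exp(logprobs[0] + logprobs[1])
    # Disagreement term: Euclidean distance between the weight vectors,
    # emphasized more as lam grows.
    dist = np.linalg.norm(w1 - w2)
    return joint_prob * dist ** lam  # nonnegative, as documented above

weights = np.array([[1.0, 0.0], [0.0, 1.0]])
score = disagreement_sketch(weights, [-1.0, -2.0], lam=0.5)
```

Note how identical weight vectors yield a score of zero, matching the intuition that there is no disagreement to exploit.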

aprel.querying.acquisition_functions.mutual_information(belief: aprel.learning.belief_models.Belief, query: aprel.learning.data_types.Query, **kwargs) → float

This function returns the mutual information between the given belief distribution and the query. Maximum mutual information is often desired for data-efficient learning.

This is implemented based on the following paper:
Parameters
  • belief (Belief) – the current belief distribution over the reward function

  • query (Query) – a query to ask the user

  • **kwargs – none used currently

Returns

the mutual information value (always nonnegative)

Return type

float
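
For a two-trajectory preference query, mutual information can be estimated from belief samples in the standard way: the entropy of the averaged response distribution minus the average entropy of the per-sample response distributions. The sketch below assumes a softmax response model over trajectory features; the function and variable names are illustrative, not APReL's API:

```python
import numpy as np

def entropy(p):
    """Shannon entropy along the last axis, safe for zero probabilities."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def mutual_information_sketch(weight_samples, query_features):
    """Estimate I(response; weights) for a two-trajectory preference query.
    weight_samples: (n, d) samples from the belief over reward weights.
    query_features: (2, d) feature vectors of the two trajectories."""
    # Softmax response model: p(choose i | w) is proportional to exp(w . phi_i).
    utilities = weight_samples @ query_features.T           # (n, 2)
    utilities -= utilities.max(axis=1, keepdims=True)       # numerical stability
    probs = np.exp(utilities)
    probs /= probs.sum(axis=1, keepdims=True)
    # Entropy of the mean response distribution minus the mean entropy
    # of the per-sample response distributions.
    return entropy(probs.mean(axis=0)) - entropy(probs).mean()

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 3))
phis = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
mi = mutual_information_sketch(samples, phis)
```

The value is always nonnegative and at most log 2 for a two-option query; a query with two identical trajectories yields zero mutual information, which is why this objective avoids the pathology noted for volume removal below.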

aprel.querying.acquisition_functions.random()

This function does nothing, but is added so that aprel.querying.query_optimizer can use it as a check.

aprel.querying.acquisition_functions.regret(weights: numpy.array, logprobs: List[float], planned_trajectories: List[aprel.basics.trajectory.Trajectory], **kwargs) → float

This function returns the regret value between two sets of reward weights. It is useful as an acquisition function when a trajectory planner is available and the desired query contains only two trajectories. The pair of weights with the highest regret is found, and the best trajectories according to those weights form the optimized query.

This is implemented based on the following paper:
TODO

This acquisition function requires all rewards to be positive, but there is no check for that.

Parameters
  • weights (numpy.array) – 2 x d array where each row is a set of reward weights. The regret between these two weights will be calculated.

  • logprobs (List[float]) – log probabilities of the given reward weights under the belief.

  • planned_trajectories (List[Trajectory]) – the optimal trajectories under the given reward weights.

  • **kwargs – none used currently

Returns

the regret value

Return type

float

Raises

AssertionError – if weights, logprobs and planned_trajectories have mismatching number of elements.
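
The requirement above that all rewards be positive suggests a ratio-style formulation. The sketch below shows one such formulation as an illustration (an assumption, not necessarily APReL's exact computation): the relative reward loss each weight vector assigns to the other's planned trajectory.

```python
import numpy as np

def regret_sketch(weights, trajectory_features):
    """Illustrative ratio-based regret between two reward weight vectors.
    weights: (2, d); trajectory_features: (2, d), row i being the features
    of the trajectory planned to be optimal under weights[i].
    Requires all rewards to be positive (division below), mirroring the docs."""
    rewards = weights @ trajectory_features.T  # rewards[i, j] = w_i . phi_j
    # Regret of following trajectory j under weights i, relative to the
    # trajectory planned for weights i:
    r01 = 1.0 - rewards[0, 1] / rewards[0, 0]
    r10 = 1.0 - rewards[1, 0] / rewards[1, 1]
    return max(r01, r10)

weights = np.array([[2.0, 1.0], [1.0, 2.0]])
phis = np.array([[3.0, 1.0], [1.0, 3.0]])  # each row optimal for its weights
val = regret_sketch(weights, phis)
```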

aprel.querying.acquisition_functions.thompson()

This function does nothing, but is added so that aprel.querying.query_optimizer can use it as a check.

aprel.querying.acquisition_functions.volume_removal(belief: aprel.learning.belief_models.Belief, query: aprel.learning.data_types.Query, **kwargs) → float

This function returns the expected volume removal from the unnormalized belief distribution. Maximum volume removal is often desired for data-efficient learning.

This is implemented based on the following two papers:
Note

As Bıyık et al. (2019) pointed out, volume removal has trivial global maximizers when the query maximizes the uncertainty for the user, e.g., when all trajectories in the slate of a PreferenceQuery are identical. Hence, optimizations with volume removal are often ill-posed.

Parameters
  • belief (Belief) – the current belief distribution over the reward function

  • query (Query) – a query to ask the user

  • **kwargs – none used currently

Returns

the expected volume removal value (always nonnegative)

Return type

float
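
Under a two-option softmax response model, a common simplification of expected volume removal in this line of work reduces to 1 minus the sum of squared predictive response probabilities. The sketch below uses that form purely as an illustration (not necessarily APReL's exact estimator); note how it exhibits the pathology described in the note above, since identical trajectories give uniform response probabilities and hence the maximal value:

```python
import numpy as np

def volume_removal_sketch(weight_samples, query_features):
    """Monte Carlo sketch of expected volume removal for a two-trajectory
    preference query, using the simplification 1 - sum_r p_hat(r)^2,
    where p_hat is the predictive response distribution. Illustrative only."""
    utilities = weight_samples @ query_features.T
    utilities -= utilities.max(axis=1, keepdims=True)
    probs = np.exp(utilities)
    probs /= probs.sum(axis=1, keepdims=True)
    p_hat = probs.mean(axis=0)        # predictive response distribution
    return 1.0 - np.sum(p_hat ** 2)   # maximal when p_hat is uniform

rng = np.random.default_rng(1)
samples = rng.normal(size=(500, 3))
# Pathological query: both trajectories identical -> uniform responses.
identical = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
distinct = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
vr_identical = volume_removal_sketch(samples, identical)
vr_distinct = volume_removal_sketch(samples, distinct)
```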

aprel.querying.query_optimizer module

This module contains classes that optimize the queries to ask the human.

class aprel.querying.query_optimizer.QueryOptimizer

Bases: object

An abstract class for query optimizer frameworks.

acquisition_functions

keeps name-function pairs for the acquisition functions. If new acquisition functions are implemented, they should be added to this dictionary.

Type

Dict
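
The attribute can be pictured as plain dictionary dispatch: acquisition functions are looked up by name and called uniformly. A generic sketch of the pattern (the class and the placeholder scorers below are illustrative, not APReL's implementations):

```python
from typing import Callable, Dict

# Placeholder scorers standing in for the real acquisition functions.
def mutual_information(belief, query, **kwargs) -> float:
    return 0.0  # placeholder

def volume_removal(belief, query, **kwargs) -> float:
    return 0.0  # placeholder

class QueryOptimizerSketch:
    def __init__(self):
        # New acquisition functions are registered by adding entries here.
        self.acquisition_functions: Dict[str, Callable] = {
            'mutual_information': mutual_information,
            'volume_removal': volume_removal,
        }

    def score(self, name: str, belief, query, **kwargs) -> float:
        # Dispatch by name, as optimize() does with acquisition_func_str.
        return self.acquisition_functions[name](belief, query, **kwargs)
```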

class aprel.querying.query_optimizer.QueryOptimizerDiscreteTrajectorySet(trajectory_set: aprel.basics.trajectory.TrajectorySet)

Bases: aprel.querying.query_optimizer.QueryOptimizer

Query optimization framework that assumes a discrete set of trajectories is available. The query optimization is then performed over this discrete set.

Parameters

trajectory_set (TrajectorySet) – The set of trajectories from which the queries will be optimized. This set defines the possible set of trajectories that may show up in the optimized query.

trajectory_set

The set of trajectories from which the queries are optimized. This set defines the possible set of trajectories that may show up in the optimized query.

Type

TrajectorySet

argplanner(user: aprel.learning.user_models.User) → int

Given a user model, returns the index of the trajectory that best fits the user in the trajectory set.

Parameters

user (User) – The user object for whom the optimal trajectory is being searched.

Returns

The index of the optimal trajectory in the trajectory set.

Return type

int
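
Under a linear reward model, searching a discrete trajectory set for the best trajectory reduces to an argmax over feature dot products. A minimal sketch (assuming the user's reward is linear in trajectory features; names are illustrative):

```python
import numpy as np

def argplanner_sketch(user_weights, trajectory_features):
    """Index of the trajectory with the highest reward w . phi(traj)
    over a discrete trajectory set. trajectory_features: (n, d)."""
    return int(np.argmax(trajectory_features @ user_weights))

feats = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
best = argplanner_sketch(np.array([0.2, 0.8]), feats)
```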

boundary_medoids_batch(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Uses the boundary medoids method to find a batch of queries. See Batch Active Preference-Based Learning of Reward Functions for more information about the method.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized by each individual query.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the batch size of the output.

  • **kwargs

    Hyperparameters reduced_size, distance, and extra arguments needed for specific acquisition functions.

    • reduced_size (int): The hyperparameter B in the original method. This method first greedily chooses B queries from the feasible set of queries out of the trajectory set, and then applies the boundary medoids selection. Defaults to 100.

    • distance (Callable): A distance function which returns a pairwise distance matrix (numpy.array) when inputted a list of queries. Defaults to aprel.utils.batch_utils.default_query_distance().

Returns

  • List[Query]: The optimized batch of queries as a list.

  • numpy.array: An array of floats that keep the acquisition function values corresponding to the output queries.

Return type

2-tuple
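
The distance hyperparameter expects a callable that maps a list of queries to a pairwise distance matrix. A sketch of such a callable, assuming each query exposes the feature vectors of its two trajectories (the attribute name and toy query class below are illustrative, not APReL's API):

```python
import numpy as np

def feature_difference_distance(queries):
    """Pairwise distances between queries, where each query is represented
    by the feature difference of its two trajectories. `queries` is any
    sequence of objects with a .features attribute of shape (2, d)."""
    diffs = np.stack([q.features[0] - q.features[1] for q in queries])
    # Euclidean distance matrix between the difference vectors.
    return np.linalg.norm(diffs[:, None, :] - diffs[None, :, :], axis=-1)

class ToyQuery:
    def __init__(self, features):
        self.features = np.asarray(features)

qs = [ToyQuery([[1.0, 0.0], [0.0, 0.0]]), ToyQuery([[0.0, 1.0], [0.0, 0.0]])]
D = feature_difference_distance(qs)
```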

dpp_batch(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Uses the determinantal point process (DPP) based method to find a batch of queries. See Batch Active Learning Using Determinantal Point Processes for more information about the method.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized by each individual query.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the batch size of the output.

  • **kwargs

    Hyperparameters reduced_size, distance, gamma, and extra arguments needed for specific acquisition functions.

    • reduced_size (int): The hyperparameter B in the original method. This method first greedily chooses B queries from the feasible set of queries out of the trajectory set, and then applies the DPP-based selection. Defaults to 100.

    • distance (Callable): A distance function which returns a pairwise distance matrix (numpy.array) when inputted a list of queries. Defaults to aprel.utils.batch_utils.default_query_distance().

    • gamma (float): The hyperparameter gamma in the original method. The higher gamma, the more important the acquisition function values. The lower gamma, the more important the diversity of queries. Defaults to 1.

Returns

  • List[Query]: The optimized batch of queries as a list.

  • numpy.array: An array of floats that keep the acquisition function values corresponding to the output queries.

Return type

2-tuple
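
The gamma tradeoff can be illustrated with a generic greedy DPP-style selection: build a kernel whose diagonal is boosted by the gamma-scaled acquisition values and whose off-diagonal encodes similarity, then repeatedly pick the item with the largest marginal determinant gain. This is a sketch of the general technique, not APReL's implementation:

```python
import numpy as np

def greedy_dpp_sketch(scores, dist_matrix, batch_size, gamma=1.0):
    """Greedily select batch_size indices approximately maximizing the
    determinant of the kernel L = diag(q) S diag(q), where q = exp(gamma *
    scores) and S is a similarity matrix derived from the distances."""
    quality = np.exp(gamma * np.asarray(scores))
    similarity = np.exp(-np.asarray(dist_matrix))  # RBF-style similarity
    L = quality[:, None] * similarity * quality[None, :]
    selected = []
    for _ in range(batch_size):
        best, best_det = None, -np.inf
        for i in range(len(scores)):
            if i in selected:
                continue
            idx = selected + [i]
            # Larger gamma favors high-score items; smaller gamma favors
            # diversity through the similarity structure.
            det = np.linalg.det(L[np.ix_(idx, idx)])
            if det > best_det:
                best, best_det = i, det
        selected.append(best)
    return selected

scores = [1.0, 0.9, 0.1]
D = np.array([[0.0, 0.1, 2.0], [0.1, 0.0, 2.0], [2.0, 2.0, 0.0]])
batch = greedy_dpp_sketch(scores, D, batch_size=2, gamma=1.0)
```

Here the second pick skips the nearly redundant item 1 (score 0.9, but very close to item 0) in favor of the distant item 2, showing the diversity effect.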

exhaustive_search(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Searches over the possible queries to find the single best query.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • **kwargs – extra arguments needed for specific acquisition functions.

Returns

  • List[Query]: The optimal query as a list of one Query.

  • numpy.array: An array of floats that keep the acquisition function value corresponding to the output query.

Return type

2-tuple
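
Exhaustive search over a discrete candidate set is an argmax loop. A generic sketch (the acquisition function and query list below are placeholders), mirroring the list-based return convention described above:

```python
def exhaustive_search_sketch(acquisition_func, candidate_queries, **kwargs):
    """Evaluate every candidate query and return the best one (as a
    one-element list) together with its acquisition value."""
    values = [acquisition_func(q, **kwargs) for q in candidate_queries]
    best = max(range(len(values)), key=values.__getitem__)
    return [candidate_queries[best]], [values[best]]

# Toy usage: queries are numbers, acquisition value is closeness to 0.5.
queries = [0.1, 0.45, 0.9]
best_q, best_v = exhaustive_search_sketch(lambda q: -abs(q - 0.5), queries)
```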

greedy_batch(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Uses the greedy method to find a batch of queries by selecting the batch_size individually most optimal queries.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized by each individual query.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the batch size of the output.

  • **kwargs – extra arguments needed for specific acquisition functions.

Returns

  • List[Query]: The optimized batch of queries as a list.

  • numpy.array: An array of floats that keep the acquisition function values corresponding to the output queries.

Return type

2-tuple
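
Greedy batching reduces to taking the top batch_size queries by acquisition value. A minimal sketch of the selection step:

```python
import numpy as np

def greedy_batch_sketch(acquisition_values, batch_size):
    """Indices of the batch_size individually best queries, best first."""
    order = np.argsort(acquisition_values)[::-1]  # descending by value
    return order[:batch_size].tolist()

vals = [0.3, 0.9, 0.1, 0.7]
batch = greedy_batch_sketch(vals, 2)
```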

medoids_batch(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Uses the medoids method to find a batch of queries. See Batch Active Preference-Based Learning of Reward Functions for more information about the method.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized by each individual query.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the batch size of the output.

  • **kwargs

    Hyperparameters reduced_size, distance, and extra arguments needed for specific acquisition functions.

    • reduced_size (int): The hyperparameter B in the original method. This method first greedily chooses B queries from the feasible set of queries out of the trajectory set, and then applies the medoids selection. Defaults to 100.

    • distance (Callable): A distance function which returns a pairwise distance matrix (numpy.array) when inputted a list of queries. Defaults to aprel.utils.batch_utils.default_query_distance().

Returns

  • List[Query]: The optimized batch of queries as a list.

  • numpy.array: An array of floats that keep the acquisition function values corresponding to the output queries.

Return type

2-tuple

optimize(acquisition_func_str: str, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int = 1, optimization_method: str = 'exhaustive_search', **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

This function generates the optimal query or the batch of queries to ask to the user given a belief distribution about them. It also returns the acquisition function values of the optimized queries.

Parameters
  • acquisition_func_str (str) –

    the name of the acquisition function used to decide the value of each query. Currently implemented options are: disagreement, mutual_information, random, regret, thompson, volume_removal.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the number of queries to return.

  • optimization_method (str) –

    the name of the method used to select queries. Currently implemented options are:

    • exhaustive_search: Used for exhaustively searching a single query.

    • greedy: Exhaustively searches for the top batch_size queries in terms of the acquisition function.

    • medoids: Batch generation method based on Bıyık et al. (2018).

    • boundary_medoids: Batch generation method based on Bıyık et al. (2018).

    • successive_elimination: Batch generation method based on Bıyık et al. (2018).

    • dpp: Batch generation method based on Bıyık et al. (2019).

  • **kwargs – extra arguments needed for specific optimization methods or acquisition functions.

Returns

  • List[Query]: The list of optimized queries. Note: even if batch_size is 1, a list is returned.

  • numpy.array: An array of floats that keeps the acquisition function values corresponding to the output queries.

Return type

2-tuple

planner(user: aprel.learning.user_models.User) → aprel.basics.trajectory.Trajectory

Given a user model, returns the trajectory in the trajectory set that best fits the user.

Parameters

user (User) – The user object for whom the optimal trajectory is being searched.

Returns

The optimal trajectory in the trajectory set.

Return type

Trajectory

successive_elimination_batch(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Uses the successive elimination method to find a batch of queries. See Batch Active Preference-Based Learning of Reward Functions for more information about the method.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized by each individual query.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the batch size of the output.

  • **kwargs

    Hyperparameters reduced_size, distance, and extra arguments needed for specific acquisition functions.

    • reduced_size (int): The hyperparameter B in the original method. This method first greedily chooses B queries from the feasible set of queries out of the trajectory set, and then applies the successive elimination. Defaults to 100.

    • distance (Callable): A distance function which returns a pairwise distance matrix (numpy.array) when inputted a list of queries. Defaults to aprel.utils.batch_utils.default_query_distance().

Returns

  • List[Query]: The optimized batch of queries as a list.

  • numpy.array: An array of floats that keep the acquisition function values corresponding to the output queries.

Return type

2-tuple
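
The successive elimination idea can be sketched generically: repeatedly find the closest pair among the remaining candidates and drop the one with the lower acquisition value, until batch_size candidates remain. This is a sketch of the general technique, not APReL's implementation:

```python
import numpy as np

def successive_elimination_sketch(scores, dist_matrix, batch_size):
    """Select batch_size diverse, high-value indices by repeatedly
    eliminating the lower-scored member of the closest remaining pair."""
    remaining = list(range(len(scores)))
    D = np.asarray(dist_matrix, dtype=float)
    while len(remaining) > batch_size:
        # Find the closest pair among the remaining candidates.
        sub = D[np.ix_(remaining, remaining)]  # copy, so D stays intact
        np.fill_diagonal(sub, np.inf)
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        a, b = remaining[i], remaining[j]
        # Drop whichever of the pair has the lower acquisition value.
        remaining.remove(a if scores[a] < scores[b] else b)
    return remaining

scores = [0.9, 0.8, 0.2]
D = np.array([[0.0, 0.1, 5.0], [0.1, 0.0, 5.0], [5.0, 5.0, 0.0]])
batch = successive_elimination_sketch(scores, D, batch_size=2)
```

In the toy run, items 0 and 1 are near-duplicates, so the lower-scored item 1 is eliminated and the batch keeps the diverse pair.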