aprel.querying package

aprel.querying.acquisition_functions module

This module contains a set of acquisition functions that determine the value of a given query, which is useful for active query optimization.

aprel.querying.acquisition_functions.disagreement(weights: numpy.array, logprobs: List[float], **kwargs) → float

This function returns the disagreement value between two sets of reward weights. It is useful as an acquisition function when a trajectory planner is available and the desired query contains only two trajectories. The pair of weights with the highest disagreement is found, and the best trajectories according to those weights form the optimized query.

This is implemented based on the following paper:
Parameters
  • weights (numpy.array) – 2 x d array where each row is a set of reward weights. The disagreement between these two weights will be calculated.

  • logprobs (List[float]) – log probabilities of the given reward weights under the belief.

  • **kwargs

    acquisition function hyperparameters:

    • lambda (float): The tradeoff parameter. The higher lambda, the more important the disagreement between the weights; the lower lambda, the more important their log probabilities. Defaults to 0.01.

Returns

the disagreement value (always nonnegative)

Return type

float

Raises

AssertionError – if weights and logprobs have mismatching number of elements.
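
The exact scoring formula is not given above, so the sketch below combines the two ingredients (weight disagreement and log probabilities, traded off by lambda) in one illustrative way. The product form and the Euclidean-distance disagreement term are assumptions for illustration, not APReL's actual computation:

```python
import numpy as np

def disagreement_sketch(weights, logprobs, lam=0.01):
    """Illustrative disagreement-style score: trades off how different the
    two weight vectors are against how probable they are under the belief.
    NOT APReL's exact formula -- a hedged sketch only."""
    assert weights.shape[0] == len(logprobs) == 2
    w1, w2 = weights
    # Joint (unnormalized) probability of the two weight samples.
    joint_prob = np.exp(logprobs[0] + logprobs[1])
    # Disagreement term: Euclidean distance between the weight vectors,
    # emphasized more as lam grows.
    dist = np.linalg.norm(w1 - w2)
    return joint_prob * dist ** lam  # nonnegative, as documented above

weights = np.array([[1.0, 0.0], [0.0, 1.0]])
score = disagreement_sketch(weights, [-1.0, -2.0], lam=0.5)
```

Note how identical weight vectors yield a score of zero, matching the intuition that there is no disagreement to exploit.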

aprel.querying.acquisition_functions.mutual_information(belief: aprel.learning.belief_models.Belief, query: aprel.learning.data_types.Query, **kwargs) → float

This function returns the mutual information between the given belief distribution and the query. Maximum mutual information is often desired for data-efficient learning.

This is implemented based on the following paper:
Parameters
  • belief (Belief) – the current belief distribution over the reward function

  • query (Query) – a query to ask the user

  • **kwargs – none used currently

Returns

the mutual information value (always nonnegative)

Return type

float
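
For a two-trajectory preference query, mutual information can be estimated from belief samples in the standard way: the entropy of the averaged response distribution minus the average entropy of the per-sample response distributions. The sketch below assumes a softmax response model over trajectory features; the function and variable names are illustrative, not APReL's API:

```python
import numpy as np

def entropy(p):
    """Shannon entropy along the last axis, safe for zero probabilities."""
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def mutual_information_sketch(weight_samples, query_features):
    """Estimate I(response; weights) for a two-trajectory preference query.
    weight_samples: (n, d) samples from the belief over reward weights.
    query_features: (2, d) feature vectors of the two trajectories."""
    # Softmax response model: p(choose i | w) is proportional to exp(w . phi_i).
    utilities = weight_samples @ query_features.T           # (n, 2)
    utilities -= utilities.max(axis=1, keepdims=True)       # numerical stability
    probs = np.exp(utilities)
    probs /= probs.sum(axis=1, keepdims=True)
    # Entropy of the mean response distribution minus the mean entropy
    # of the per-sample response distributions.
    return entropy(probs.mean(axis=0)) - entropy(probs).mean()

rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 3))
phis = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
mi = mutual_information_sketch(samples, phis)
```

The value is always nonnegative and at most log 2 for a two-option query; a query with two identical trajectories yields zero mutual information, which is why this objective avoids the pathology noted for volume removal below.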

aprel.querying.acquisition_functions.random()

This function does nothing, but is added so that aprel.querying.query_optimizer can use it as a check.

aprel.querying.acquisition_functions.regret(weights: numpy.array, logprobs: List[float], planned_trajectories: List[aprel.basics.trajectory.Trajectory], **kwargs) → float

This function returns the regret value between two sets of reward weights. It is useful as an acquisition function when a trajectory planner is available and the desired query contains only two trajectories. The pair of weights with the highest regret is found, and the best trajectories according to those weights form the optimized query.

This is implemented based on the following paper:
TODO

This acquisition function requires all rewards to be positive, but there is no check for that.

Parameters
  • weights (numpy.array) – 2 x d array where each row is a set of reward weights. The regret between these two weights will be calculated.

  • logprobs (List[float]) – log probabilities of the given reward weights under the belief.

  • planned_trajectories (List[Trajectory]) – the optimal trajectories under the given reward weights.

  • **kwargs – none used currently

Returns

the regret value

Return type

float

Raises

AssertionError – if weights, logprobs and planned_trajectories have mismatching number of elements.
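
The requirement above that all rewards be positive suggests a ratio-style formulation. The sketch below shows one such formulation as an illustration (an assumption, not necessarily APReL's exact computation): the relative reward loss each weight vector assigns to the other's planned trajectory.

```python
import numpy as np

def regret_sketch(weights, trajectory_features):
    """Illustrative ratio-based regret between two reward weight vectors.
    weights: (2, d); trajectory_features: (2, d), row i being the features
    of the trajectory planned to be optimal under weights[i].
    Requires all rewards to be positive (division below), mirroring the docs."""
    rewards = weights @ trajectory_features.T  # rewards[i, j] = w_i . phi_j
    # Regret of following trajectory j under weights i, relative to the
    # trajectory planned for weights i:
    r01 = 1.0 - rewards[0, 1] / rewards[0, 0]
    r10 = 1.0 - rewards[1, 0] / rewards[1, 1]
    return max(r01, r10)

weights = np.array([[2.0, 1.0], [1.0, 2.0]])
phis = np.array([[3.0, 1.0], [1.0, 3.0]])  # each row optimal for its weights
val = regret_sketch(weights, phis)
```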

aprel.querying.acquisition_functions.thompson()

This function does nothing, but is added so that aprel.querying.query_optimizer can use it as a check.

aprel.querying.acquisition_functions.volume_removal(belief: aprel.learning.belief_models.Belief, query: aprel.learning.data_types.Query, **kwargs) → float

This function returns the expected volume removal from the unnormalized belief distribution. Maximum volume removal is often desired for data-efficient learning.

This is implemented based on the following two papers:
Note

As Bıyık et al. (2019) pointed out, volume removal has trivial global maximizers when the query maximizes the uncertainty for the user, e.g., when all trajectories in the slate of a PreferenceQuery are identical. Hence, optimizations with volume removal are often ill-posed.

Parameters
  • belief (Belief) – the current belief distribution over the reward function

  • query (Query) – a query to ask the user

  • **kwargs – none used currently

Returns

the expected volume removal value (always nonnegative)

Return type

float
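
Under a two-option softmax response model, a common simplification of expected volume removal in this line of work reduces to 1 minus the sum of squared predictive response probabilities. The sketch below uses that form purely as an illustration (not necessarily APReL's exact estimator); note how it exhibits the pathology described in the note above, since identical trajectories give uniform response probabilities and hence the maximal value:

```python
import numpy as np

def volume_removal_sketch(weight_samples, query_features):
    """Monte Carlo sketch of expected volume removal for a two-trajectory
    preference query, using the simplification 1 - sum_r p_hat(r)^2,
    where p_hat is the predictive response distribution. Illustrative only."""
    utilities = weight_samples @ query_features.T
    utilities -= utilities.max(axis=1, keepdims=True)
    probs = np.exp(utilities)
    probs /= probs.sum(axis=1, keepdims=True)
    p_hat = probs.mean(axis=0)        # predictive response distribution
    return 1.0 - np.sum(p_hat ** 2)   # maximal when p_hat is uniform

rng = np.random.default_rng(1)
samples = rng.normal(size=(500, 3))
# Pathological query: both trajectories identical -> uniform responses.
identical = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
distinct = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
vr_identical = volume_removal_sketch(samples, identical)
vr_distinct = volume_removal_sketch(samples, distinct)
```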

aprel.querying.query_optimizer module

This module contains classes that optimize the queries to ask the human.

class aprel.querying.query_optimizer.QueryOptimizer

Bases: object

An abstract class for query optimizer frameworks.

acquisition_functions

keeps name-function pairs for the acquisition functions. If new acquisition functions are implemented, they should be added to this dictionary.

Type

Dict
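
The attribute can be pictured as plain dictionary dispatch: acquisition functions are looked up by name and called uniformly. A generic sketch of the pattern (the class and the placeholder scorers below are illustrative, not APReL's implementations):

```python
from typing import Callable, Dict

# Placeholder scorers standing in for the real acquisition functions.
def mutual_information(belief, query, **kwargs) -> float:
    return 0.0  # placeholder

def volume_removal(belief, query, **kwargs) -> float:
    return 0.0  # placeholder

class QueryOptimizerSketch:
    def __init__(self):
        # New acquisition functions are registered by adding entries here.
        self.acquisition_functions: Dict[str, Callable] = {
            'mutual_information': mutual_information,
            'volume_removal': volume_removal,
        }

    def score(self, name: str, belief, query, **kwargs) -> float:
        # Dispatch by name, as optimize() does with acquisition_func_str.
        return self.acquisition_functions[name](belief, query, **kwargs)
```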

class aprel.querying.query_optimizer.QueryOptimizerDiscreteTrajectorySet(trajectory_set: aprel.basics.trajectory.TrajectorySet)

Bases: aprel.querying.query_optimizer.QueryOptimizer

Query optimization framework that assumes a discrete set of trajectories is available. The query optimization is then performed over this discrete set.

Parameters

trajectory_set (TrajectorySet) – The set of trajectories from which the queries will be optimized. This set defines the possible set of trajectories that may show up in the optimized query.

trajectory_set

The set of trajectories from which the queries are optimized. This set defines the possible set of trajectories that may show up in the optimized query.

Type

TrajectorySet

argplanner(user: aprel.learning.user_models.User) → int

Given a user model, returns the index of the trajectory that best fits the user in the trajectory set.

Parameters

user (User) – The user object for whom the optimal trajectory is being searched.

Returns

The index of the optimal trajectory in the trajectory set.

Return type

int
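
Under a linear reward model, searching a discrete trajectory set for the best trajectory reduces to an argmax over feature dot products. A minimal sketch (assuming the user's reward is linear in trajectory features; names are illustrative):

```python
import numpy as np

def argplanner_sketch(user_weights, trajectory_features):
    """Index of the trajectory with the highest reward w . phi(traj)
    over a discrete trajectory set. trajectory_features: (n, d)."""
    return int(np.argmax(trajectory_features @ user_weights))

feats = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
best = argplanner_sketch(np.array([0.2, 0.8]), feats)
```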

boundary_medoids_batch(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Uses the boundary medoids method to find a batch of queries. See Batch Active Preference-Based Learning of Reward Functions for more information about the method.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized by each individual query.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the batch size of the output.

  • **kwargs

    Hyperparameters reduced_size, distance, and extra arguments needed for specific acquisition functions.

    • reduced_size (int): The hyperparameter B in the original method. This method first greedily chooses B queries from the feasible set of queries out of the trajectory set, and then applies the boundary medoids selection. Defaults to 100.

    • distance (Callable): A distance function which returns a pairwise distance matrix (numpy.array) when inputted a list of queries. Defaults to aprel.utils.batch_utils.default_query_distance().

Returns

  • List[Query]: The optimized batch of queries as a list.

  • numpy.array: An array of floats that keep the acquisition function values corresponding to the output queries.

Return type

2-tuple
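
The distance hyperparameter expects a callable that maps a list of queries to a pairwise distance matrix. A sketch of such a callable, assuming each query exposes the feature vectors of its two trajectories (the attribute name and toy query class below are illustrative, not APReL's API):

```python
import numpy as np

def feature_difference_distance(queries):
    """Pairwise distances between queries, where each query is represented
    by the feature difference of its two trajectories. `queries` is any
    sequence of objects with a .features attribute of shape (2, d)."""
    diffs = np.stack([q.features[0] - q.features[1] for q in queries])
    # Euclidean distance matrix between the difference vectors.
    return np.linalg.norm(diffs[:, None, :] - diffs[None, :, :], axis=-1)

class ToyQuery:
    def __init__(self, features):
        self.features = np.asarray(features)

qs = [ToyQuery([[1.0, 0.0], [0.0, 0.0]]), ToyQuery([[0.0, 1.0], [0.0, 0.0]])]
D = feature_difference_distance(qs)
```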

dpp_batch(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Uses the determinantal point process (DPP) based method to find a batch of queries. See Batch Active Learning Using Determinantal Point Processes for more information about the method.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized by each individual query.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the batch size of the output.

  • **kwargs

    Hyperparameters reduced_size, distance, gamma, and extra arguments needed for specific acquisition functions.

    • reduced_size (int): The hyperparameter B in the original method. This method first greedily chooses B queries from the feasible set of queries out of the trajectory set, and then applies the DPP-based selection. Defaults to 100.

    • distance (Callable): A distance function which returns a pairwise distance matrix (numpy.array) when inputted a list of queries. Defaults to aprel.utils.batch_utils.default_query_distance().

    • gamma (float): The hyperparameter gamma in the original method. The higher gamma, the more important the acquisition function values. The lower gamma, the more important the diversity of queries. Defaults to 1.

Returns

  • List[Query]: The optimized batch of queries as a list.

  • numpy.array: An array of floats that keep the acquisition function values corresponding to the output queries.

Return type

2-tuple
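
The gamma tradeoff can be illustrated with a generic greedy DPP-style selection: build a kernel whose diagonal is boosted by the gamma-scaled acquisition values and whose off-diagonal encodes similarity, then repeatedly pick the item with the largest marginal determinant gain. This is a sketch of the general technique, not APReL's implementation:

```python
import numpy as np

def greedy_dpp_sketch(scores, dist_matrix, batch_size, gamma=1.0):
    """Greedily select batch_size indices approximately maximizing the
    determinant of the kernel L = diag(q) S diag(q), where q = exp(gamma *
    scores) and S is a similarity matrix derived from the distances."""
    quality = np.exp(gamma * np.asarray(scores))
    similarity = np.exp(-np.asarray(dist_matrix))  # RBF-style similarity
    L = quality[:, None] * similarity * quality[None, :]
    selected = []
    for _ in range(batch_size):
        best, best_det = None, -np.inf
        for i in range(len(scores)):
            if i in selected:
                continue
            idx = selected + [i]
            # Larger gamma favors high-score items; smaller gamma favors
            # diversity through the similarity structure.
            det = np.linalg.det(L[np.ix_(idx, idx)])
            if det > best_det:
                best, best_det = i, det
        selected.append(best)
    return selected

scores = [1.0, 0.9, 0.1]
D = np.array([[0.0, 0.1, 2.0], [0.1, 0.0, 2.0], [2.0, 2.0, 0.0]])
batch = greedy_dpp_sketch(scores, D, batch_size=2, gamma=1.0)
```

Here the second pick skips the nearly redundant item 1 (score 0.9, but very close to item 0) in favor of the distant item 2, showing the diversity effect.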

exhaustive_search(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Searches over the possible queries to find the single best query.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • **kwargs – extra arguments needed for specific acquisition functions.

Returns

  • List[Query]: The optimal query as a list of one Query.

  • numpy.array: An array of floats that keep the acquisition function value corresponding to the output query.

Return type

2-tuple
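
Exhaustive search over a discrete candidate set is an argmax loop. A generic sketch (the acquisition function and query list below are placeholders), mirroring the list-based return convention described above:

```python
def exhaustive_search_sketch(acquisition_func, candidate_queries, **kwargs):
    """Evaluate every candidate query and return the best one (as a
    one-element list) together with its acquisition value."""
    values = [acquisition_func(q, **kwargs) for q in candidate_queries]
    best = max(range(len(values)), key=values.__getitem__)
    return [candidate_queries[best]], [values[best]]

# Toy usage: queries are numbers, acquisition value is closeness to 0.5.
queries = [0.1, 0.45, 0.9]
best_q, best_v = exhaustive_search_sketch(lambda q: -abs(q - 0.5), queries)
```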

greedy_batch(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Uses the greedy method to find a batch of queries by selecting the batch_size individually most optimal queries.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized by each individual query.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the batch size of the output.

  • **kwargs – extra arguments needed for specific acquisition functions.

Returns

  • List[Query]: The optimized batch of queries as a list.

  • numpy.array: An array of floats that keep the acquisition function values corresponding to the output queries.

Return type

2-tuple
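
Greedy batching reduces to taking the top batch_size queries by acquisition value. A minimal sketch of the selection step:

```python
import numpy as np

def greedy_batch_sketch(acquisition_values, batch_size):
    """Indices of the batch_size individually best queries, best first."""
    order = np.argsort(acquisition_values)[::-1]  # descending by value
    return order[:batch_size].tolist()

vals = [0.3, 0.9, 0.1, 0.7]
batch = greedy_batch_sketch(vals, 2)
```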

medoids_batch(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Uses the medoids method to find a batch of queries. See Batch Active Preference-Based Learning of Reward Functions for more information about the method.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized by each individual query.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the batch size of the output.

  • **kwargs

    Hyperparameters reduced_size, distance, and extra arguments needed for specific acquisition functions.

    • reduced_size (int): The hyperparameter B in the original method. This method first greedily chooses B queries from the feasible set of queries out of the trajectory set, and then applies the medoids selection. Defaults to 100.

    • distance (Callable): A distance function which returns a pairwise distance matrix (numpy.array) when inputted a list of queries. Defaults to aprel.utils.batch_utils.default_query_distance().

Returns

  • List[Query]: The optimized batch of queries as a list.

  • numpy.array: An array of floats that keep the acquisition function values corresponding to the output queries.

Return type

2-tuple

optimize(acquisition_func_str: str, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int = 1, optimization_method: str = 'exhaustive_search', **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

This function generates the optimal query or the batch of queries to ask to the user given a belief distribution about them. It also returns the acquisition function values of the optimized queries.

Parameters
  • acquisition_func_str (str) –

    the name of the acquisition function used to decide the value of each query. Currently implemented options are: disagreement, mutual_information, random, regret, thompson, volume_removal.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the number of queries to return.

  • optimization_method (str) –

    the name of the method used to select queries. Currently implemented options are:

    • exhaustive_search: Used for exhaustively searching a single query.

    • greedy: Exhaustively searches for the top batch_size queries in terms of the acquisition function.

    • medoids: Batch generation method based on Bıyık et al. (2018).

    • boundary_medoids: Batch generation method based on Bıyık et al. (2018).

    • successive_elimination: Batch generation method based on Bıyık et al. (2018).

    • dpp: Batch generation method based on Bıyık et al. (2019).

  • **kwargs – extra arguments needed for specific optimization methods or acquisition functions.

Returns

  • List[Query]: The list of optimized queries. Note: even if batch_size is 1, a list is returned.

  • numpy.array: An array of floats that keeps the acquisition function values corresponding to the output queries.

Return type

2-tuple

planner(user: aprel.learning.user_models.User) → aprel.basics.trajectory.Trajectory

Given a user model, returns the trajectory in the trajectory set that best fits the user.

Parameters

user (User) – The user object for whom the optimal trajectory is being searched.

Returns

The optimal trajectory in the trajectory set.

Return type

Trajectory

successive_elimination_batch(acquisition_func: Callable, belief: aprel.learning.belief_models.Belief, initial_query: aprel.learning.data_types.Query, batch_size: int, **kwargs) → Tuple[List[aprel.learning.data_types.Query], numpy.array]

Uses the successive elimination method to find a batch of queries. See Batch Active Preference-Based Learning of Reward Functions for more information about the method.

Parameters
  • acquisition_func (Callable) – the acquisition function to be maximized by each individual query.

  • belief (Belief) – the current belief distribution over the user.

  • initial_query (Query) – an initial query such that the output query will have the same type.

  • batch_size (int) – the batch size of the output.

  • **kwargs

    Hyperparameters reduced_size, distance, and extra arguments needed for specific acquisition functions.

    • reduced_size (int): The hyperparameter B in the original method. This method first greedily chooses B queries from the feasible set of queries out of the trajectory set, and then applies the successive elimination. Defaults to 100.

    • distance (Callable): A distance function which returns a pairwise distance matrix (numpy.array) when inputted a list of queries. Defaults to aprel.utils.batch_utils.default_query_distance().

Returns

  • List[Query]: The optimized batch of queries as a list.

  • numpy.array: An array of floats that keep the acquisition function values corresponding to the output queries.

Return type

2-tuple
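
The successive elimination idea can be sketched generically: repeatedly find the closest pair among the remaining candidates and drop the one with the lower acquisition value, until batch_size candidates remain. This is a sketch of the general technique, not APReL's implementation:

```python
import numpy as np

def successive_elimination_sketch(scores, dist_matrix, batch_size):
    """Select batch_size diverse, high-value indices by repeatedly
    eliminating the lower-scored member of the closest remaining pair."""
    remaining = list(range(len(scores)))
    D = np.asarray(dist_matrix, dtype=float)
    while len(remaining) > batch_size:
        # Find the closest pair among the remaining candidates.
        sub = D[np.ix_(remaining, remaining)]  # copy, so D stays intact
        np.fill_diagonal(sub, np.inf)
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        a, b = remaining[i], remaining[j]
        # Drop whichever of the pair has the lower acquisition value.
        remaining.remove(a if scores[a] < scores[b] else b)
    return remaining

scores = [0.9, 0.8, 0.2]
D = np.array([[0.0, 0.1, 5.0], [0.1, 0.0, 5.0], [5.0, 5.0, 0.0]])
batch = successive_elimination_sketch(scores, D, batch_size=2)
```

In the toy run, items 0 and 1 are near-duplicates, so the lower-scored item 1 is eliminated and the batch keeps the diverse pair.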