aprel.learning package

aprel.learning.belief_models module

This file contains Belief classes, which store and update the belief distributions about the user whose reward function is being learned.

TODO: A GaussianBelief class will be implemented so that the library also includes the following work: E. Biyik, N. Huynh, M. J. Kochenderger, D. Sadigh; “Active Preference-Based Gaussian Process Regression for Reward Learning”, RSS’20.

class aprel.learning.belief_models.Belief

Bases: object

An abstract class for Belief distributions.

update(data: Union[aprel.learning.data_types.QueryWithResponse, List[aprel.learning.data_types.QueryWithResponse]], **kwargs)

Updates the belief distribution with the given feedback, either a single QueryWithResponse or a list of them.

class aprel.learning.belief_models.LinearRewardBelief

Bases: aprel.learning.belief_models.Belief

An abstract class for Belief distributions in problems where the reward function is assumed to be a linear function of the features.

property mean: Dict

Returns the mean parameters with respect to the belief distribution.

class aprel.learning.belief_models.SamplingBasedBelief(user_model: aprel.learning.user_models.User, dataset: List[aprel.learning.data_types.QueryWithResponse], initial_point: Dict, logprior: Callable = <function uniform_logprior>, num_samples: int = 100, **kwargs)

Bases: aprel.learning.belief_models.LinearRewardBelief

A class for sampling based belief distributions.

In this model, the entire dataset of user feedback is stored and used to calculate the exact (unnormalized) posterior value for any given set of parameters. Parameter samples are then drawn from this posterior using the Metropolis-Hastings algorithm.

Parameters

user_model (User) – The user response model that will be assumed by this belief distribution.
dataset (List[QueryWithResponse]) – A list of user feedback (query-response pairs).
initial_point (Dict) – An initial set of user parameters for Metropolis-Hastings to start from.
logprior (Callable) – The logarithm of the prior distribution over the user parameters. Defaults to a uniform distribution over the hyperball.
num_samples (int) – The number of parameter samples that will be sampled using Metropolis-Hastings.
**kwargs –
Hyperparameters for Metropolis-Hastings, which include:
- burnin (int): The number of initial samples that will be discarded to reduce the dependence on the initial parameter set.
- thin (int): Only every thin-th sample will be kept, to reduce the autocorrelation between the samples.
- proposal_distribution (Callable): The proposal distribution for the steps in Metropolis-Hastings.
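The sampling procedure described above can be illustrated with a minimal random-walk Metropolis-Hastings sketch that shows how burnin and thin shape the returned chain. It is not APREL's implementation: the function name metropolis_hastings, the Gaussian proposal controlled by proposal_std, the default values, the toy posterior, and the {'weights': ...} parameter layout are all assumptions made for illustration.

```python
import numpy as np
from typing import Callable, Dict, List, Tuple


def metropolis_hastings(
    logposterior: Callable[[Dict], float],
    initial_point: Dict,
    num_samples: int = 100,
    burnin: int = 200,
    thin: int = 20,
    proposal_std: float = 0.05,
    seed: int = 0,
) -> Tuple[List[Dict], List[float]]:
    """Random-walk Metropolis-Hastings over parameter dicts of the form {'weights': array}."""
    rng = np.random.default_rng(seed)
    current = {'weights': np.asarray(initial_point['weights'], dtype=float)}
    current_logp = logposterior(current)
    samples, logprobs = [], []
    for step in range(burnin + num_samples * thin):
        # Symmetric Gaussian proposal, so the Hastings correction term cancels.
        noise = proposal_std * rng.standard_normal(current['weights'].shape)
        proposal = {'weights': current['weights'] + noise}
        proposal_logp = logposterior(proposal)
        # Accept with probability min(1, exp(proposal_logp - current_logp)).
        if np.log(rng.uniform()) < proposal_logp - current_logp:
            current, current_logp = proposal, proposal_logp
        # Drop the burn-in steps, then keep one state out of every `thin`.
        if step >= burnin and (step - burnin) % thin == 0:
            samples.append({'weights': current['weights'].copy()})
            logprobs.append(float(current_logp))
    return samples, logprobs


# Hypothetical unnormalized log-posterior: uniform prior on the unit ball plus one
# softmax preference (beta = 5) for feature vector a over feature vector b.
def toy_logposterior(params: Dict) -> float:
    w = params['weights']
    if np.linalg.norm(w) > 1:
        return -np.inf
    a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    return 5.0 * (w @ a) - np.logaddexp(5.0 * (w @ a), 5.0 * (w @ b))


samples, logprobs = metropolis_hastings(toy_logposterior, {'weights': np.zeros(2)})
mean_weights = np.mean([s['weights'] for s in samples], axis=0)
```

Because only the difference of log-posteriors enters the acceptance test, the normalizing constant of the posterior is never needed, which is what makes storing the raw dataset and re-evaluating it workable.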

user_model

The user response model that is assumed by the belief distribution.

Type: User

dataset

A list of user feedback (query-response pairs).

Type: List[QueryWithResponse]

num_samples

The number of parameter samples that will be sampled using Metropolis-Hastings.

Type: int

sampling_params

Hyperparameters for Metropolis-Hastings, which include:

burnin (int): The number of initial samples that will be discarded to reduce the dependence on the initial parameter set.
thin (int): Only every thin-th sample will be kept, to reduce the autocorrelation between the samples.
proposal_distribution (Callable): The proposal distribution for the steps in Metropolis-Hastings.

Type: Dict

create_samples(initial_point: Dict) → Tuple[List[Dict], List[float]]

Draws num_samples sets of user parameters from the posterior using Metropolis-Hastings.

Parameters

initial_point (Dict) – initial point to start the chain for Metropolis-Hastings.

Returns

List[Dict]: dictionaries where each dictionary is a sample of user parameters.
List[float]: float values where each entry is the log-probability of the corresponding sample.

Return type

2-tuple

property mean: Dict

Returns the mean of the belief distribution by taking the mean over the samples generated by Metropolis-Hastings.

update(data: Union[aprel.learning.data_types.QueryWithResponse, List[aprel.learning.data_types.QueryWithResponse]], initial_point: Optional[Dict] = None)

Updates the belief distribution based on the new feedback (query-response pairs) by adding it to the current dataset and then re-sampling with Metropolis-Hastings.

Parameters

data (QueryWithResponse or List[QueryWithResponse]) – One or more query-response pairs, each containing a query and the user’s response to it (e.g., for a preference query, the index of the trajectory the user selected as most preferred).
initial_point (Dict) – The initial point to start Metropolis-Hastings from. If None, it is set to the mean of the previous distribution.
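The update flow above (append the new feedback to the stored dataset, then restart the chain from the previous mean) can be sketched with a small self-contained class. ToySamplingBelief, preference_loglikelihood, the unit-ball prior, and the fixed burn-in/thinning values are hypothetical stand-ins, not APREL's SamplingBasedBelief.

```python
import numpy as np
from typing import Callable, Dict, List, Optional


class ToySamplingBelief:
    """Hypothetical, simplified stand-in for a sampling-based belief (not APREL's class)."""

    def __init__(self, loglikelihood: Callable, initial_point: Dict,
                 num_samples: int = 50, seed: int = 0):
        self.loglikelihood = loglikelihood
        self.dataset: List = []
        self.num_samples = num_samples
        self.rng = np.random.default_rng(seed)
        self.create_samples(initial_point)

    def logposterior(self, params: Dict) -> float:
        # Assumed uniform prior over the unit ball, plus the whole stored dataset.
        if np.linalg.norm(params['weights']) > 1:
            return -np.inf
        return float(sum(self.loglikelihood(params, d) for d in self.dataset))

    def create_samples(self, initial_point: Dict) -> None:
        # Random-walk Metropolis-Hastings: 100 burn-in steps, then keep every 10th state.
        w = np.asarray(initial_point['weights'], dtype=float)
        logp = self.logposterior({'weights': w})
        self.samples = []
        for step in range(100 + 10 * self.num_samples):
            w_new = w + 0.05 * self.rng.standard_normal(w.shape)
            logp_new = self.logposterior({'weights': w_new})
            if np.log(self.rng.uniform()) < logp_new - logp:
                w, logp = w_new, logp_new
            if step >= 100 and (step - 100) % 10 == 0:
                self.samples.append({'weights': w.copy()})

    @property
    def mean(self) -> Dict:
        return {'weights': np.mean([s['weights'] for s in self.samples], axis=0)}

    def update(self, data, initial_point: Optional[Dict] = None) -> None:
        # Accept a single feedback item or a list of them, and keep all of them.
        self.dataset.extend(data if isinstance(data, list) else [data])
        # Restart the chain from the previous posterior mean unless told otherwise.
        if initial_point is None:
            initial_point = self.mean
        self.create_samples(initial_point)


# Hypothetical feedback: a (preferred_features, other_features) pair with a softmax likelihood.
def preference_loglikelihood(params: Dict, pair) -> float:
    preferred, other = pair
    r_pref, r_other = params['weights'] @ preferred, params['weights'] @ other
    return r_pref - np.logaddexp(r_pref, r_other)


pair = (np.array([1.0, 0.0]), np.array([0.0, 1.0]))
belief = ToySamplingBelief(preference_loglikelihood, {'weights': np.zeros(2)})
belief.update(pair)           # a single feedback item
belief.update([pair] * 3)     # a list of feedback items
```

Starting from the previous mean is a warm start: the new posterior usually differs only slightly from the old one, so the chain begins in a high-probability region.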

aprel.learning.data_types module

Classes for queries and user responses.

TODO: OrdinalQuery classes will be implemented so that the library will include ordinal data, which was used for reward learning in: K. Li, M. Tucker, E. Biyik, E. Novoseller, J. W. Burdick, Y. Sui, D. Sadigh, Y. Yue, A. D. Ames; “ROIAL: Region of Interest Active Learning for Characterizing Exoskeleton Gait Preference Landscapes”, ICRA’21.

class aprel.learning.data_types.Demonstration(trajectory: aprel.basics.trajectory.Trajectory, query: Optional[aprel.learning.data_types.DemonstrationQuery] = None)

Bases: aprel.learning.data_types.QueryWithResponse

The trajectory generated by the DemonstrationQuery, along with the DemonstrationQuery that prompted the user with the initial state.

For preference-based reward learning initialized with demonstrations, this class should be used (without actually querying the user). First, the demonstration should be collected as a Trajectory object. Then, a Demonstration instance should be created with this trajectory without specifying the query parameter, in which case the query is automatically set to a DemonstrationQuery whose initial state is the initial state of the trajectory.

Parameters

trajectory (Trajectory) – The demonstrated trajectory.
query (DemonstrationQuery) – The query that led to the trajectory, i.e., the initial state of the trajectory.

trajectory

The demonstrated trajectory.

Type: Trajectory

features

The features of the demonstrated trajectory.

Type: numpy.array

Raises: AssertionError – if the initial state of the trajectory does not match the query.
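The Demonstration behavior described above (derive the query from the trajectory's first state when none is given, otherwise check that they agree) can be mirrored with a few hypothetical classes. ToyTrajectory, ToyDemonstrationQuery, and ToyDemonstration are illustrative stand-ins; the representation of a trajectory as a list of (state, action) pairs and the use of np.allclose for the comparison are assumptions, not APREL's code.

```python
import numpy as np
from typing import List, Optional, Tuple


class ToyTrajectory:
    """Hypothetical trajectory: (state, action) pairs plus a feature vector."""

    def __init__(self, steps: List[Tuple[np.ndarray, np.ndarray]], features: np.ndarray):
        self.trajectory = steps
        self.features = np.asarray(features)


class ToyDemonstrationQuery:
    """Hypothetical query that only carries the initial state shown to the user."""

    def __init__(self, initial_state: np.ndarray):
        self.initial_state = np.asarray(initial_state)


class ToyDemonstration:
    """Hypothetical demonstration record mirroring the behavior documented above."""

    def __init__(self, trajectory: ToyTrajectory, query: Optional[ToyDemonstrationQuery] = None):
        first_state = trajectory.trajectory[0][0]
        if query is None:
            # No query given: derive it from the trajectory's own initial state.
            query = ToyDemonstrationQuery(first_state)
        else:
            # A supplied query must agree with where the trajectory starts.
            assert np.allclose(query.initial_state, first_state), \
                'The initial state of the trajectory does not match the query.'
        self.query = query
        self.trajectory = trajectory
        self.features = trajectory.features


s0 = np.array([0.0, 1.0])
demo_trajectory = ToyTrajectory([(s0, np.array([1.0])), (np.array([0.5, 1.0]), np.array([0.0]))],
                                features=np.array([0.2, 0.8]))
demo = ToyDemonstration(demo_trajectory)  # query is filled in automatically
```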

class aprel.learning.data_types.DemonstrationQuery(initial_state: numpy.array)

Bases: aprel.learning.data_types.Query

A demonstration query is one where the initial state is given to the user, and they are asked to control the robot.

Although not practical for optimization, this class is defined for coherence with other query types.

Parameters: initial_state (numpy.array) – The initial state of the environment.

initial_state

The initial state of the environment.

Type: numpy.array

class aprel.learning.data_types.FullRanking(query: aprel.learning.data_types.FullRankingQuery, response: List[int])

Bases: aprel.learning.data_types.QueryWithResponse

A Full Ranking feedback.

Contains the FullRankingQuery the user responded to and the response.

Parameters

query (FullRankingQuery) – The query for which the feedback was given.
response (List[int]) – The response of the user to the query, as trajectory indices from the most preferred to the least.

response

The response of the user to the query, indices from the most preferred to the least.

Type: numpy.array

Raises: AssertionError – if the response is not in the response set of the query.

class aprel.learning.data_types.FullRankingQuery(slate: Union[aprel.basics.trajectory.TrajectorySet, List[aprel.basics.trajectory.Trajectory]])

Bases: aprel.learning.data_types.Query

A full ranking query is one where the user is presented with multiple trajectories and asked for a ranking from their most preferred trajectory to the least.

Parameters: slate (TrajectorySet or List[Trajectory]) – The set of trajectories that will be presented to the user.

K

The number of trajectories in the query.

Type: int

response_set

The set of possible responses to the query, which is all permutations of the trajectory indices in the slate (K! rankings in total).

Type: numpy.array

Raises: AssertionError – if slate has fewer than 2 trajectories.
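Since a full ranking orders every trajectory in the slate, each possible response is a permutation of the indices 0, …, K-1. A short sketch with itertools.permutations; full_ranking_response_set is a hypothetical helper name, and storing one ranking per row is an assumed layout.

```python
import itertools

import numpy as np


def full_ranking_response_set(K: int) -> np.ndarray:
    """All K! orderings of the trajectory indices 0..K-1, one ranking per row."""
    assert K >= 2, 'A full ranking query needs at least 2 trajectories.'
    return np.array([list(order) for order in itertools.permutations(range(K))])


responses = full_ranking_response_set(3)
print(responses.shape)  # (6, 3)
print(responses[0])     # [0 1 2]: trajectory 0 most preferred, trajectory 2 least
```

The response set grows factorially: a slate of 5 trajectories already has 120 possible rankings.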

property slate: aprel.basics.trajectory.TrajectorySet

Returns a TrajectorySet of the trajectories in the query.

visualize(delay: float = 0.0) → List[int]

Visualizes the query and interactively asks for a response.

Parameters: delay (float) – The waiting time between each trajectory visualization in seconds.
Returns: The response of the user, as a list from the most preferred to the least.
Return type: List[int]

class aprel.learning.data_types.Preference(query: aprel.learning.data_types.PreferenceQuery, response: int)

Bases: aprel.learning.data_types.QueryWithResponse

A Preference feedback.

Contains the PreferenceQuery the user responded to and the response.

Parameters

query (PreferenceQuery) – The query for which the feedback was given.
response (int) – The response of the user to the query.

response

The response of the user to the query.

Type: int

Raises: AssertionError – if the response is not in the response set of the query.

class aprel.learning.data_types.PreferenceQuery(slate: Union[aprel.basics.trajectory.TrajectorySet, List[aprel.basics.trajectory.Trajectory]])

Bases: aprel.learning.data_types.Query

A preference query is one where the user is presented with multiple trajectories and asked for their favorite among them.

Parameters: slate (TrajectorySet or List[Trajectory]) – The set of trajectories that will be presented to the user.

K

The number of trajectories in the query.

Type: int

response_set

The set of possible responses to the query, i.e., the indices of the trajectories in the slate.

Type: numpy.array

Raises: AssertionError – if slate has fewer than 2 trajectories.

property slate: aprel.basics.trajectory.TrajectorySet

Returns a TrajectorySet of the trajectories in the query.

visualize(delay: float = 0.0) → int

Visualizes the query and interactively asks for a response.

Parameters: delay (float) – The waiting time between each trajectory visualization in seconds.
Returns: The response of the user.
Return type: int

class aprel.learning.data_types.Query

Bases: object

An abstract parent class that is useful for typing.

A query is a question to the user.

copy()

Returns a deep copy of the query.

visualize(delay: float = 0.0)

Visualizes the query, i.e., asks it to the user.

Parameters: delay (float) – The waiting time between each trajectory visualization in seconds.

class aprel.learning.data_types.QueryWithResponse(query: aprel.learning.data_types.Query)

Bases: object

An abstract parent class that is useful for typing.

An instance of this class holds both the query and the user’s response to that query.

Parameters: query (Query) – The query.

query

The query.

Type: Query

class aprel.learning.data_types.WeakComparison(query: aprel.learning.data_types.WeakComparisonQuery, response: int)

Bases: aprel.learning.data_types.QueryWithResponse

A Weak Comparison feedback.

Contains the WeakComparisonQuery the user responded to and the response.

Parameters

query (WeakComparisonQuery) – The query for which the feedback was given.
response (int) – The response of the user to the query.

response

The response of the user to the query.

Type: int

Raises: AssertionError – if the response is not in the response set of the query.

class aprel.learning.data_types.WeakComparisonQuery(slate: Union[aprel.basics.trajectory.TrajectorySet, List[aprel.basics.trajectory.Trajectory]])

Bases: aprel.learning.data_types.Query

A weak comparison query is one where the user is presented with two trajectories and asked for their favorite among them, but also given an option to say ‘they are about equal’.

Parameters: slate (TrajectorySet or List[Trajectory]) – The set of trajectories that will be presented to the user.

K

The number of trajectories in the query. It is always equal to 2 and kept for consistency with PreferenceQuery and FullRankingQuery.

Type: int

response_set

The set of possible responses to the query, which is always equal to [-1, 0, 1], where -1 represents the About Equal option and 0 and 1 are the indices of the preferred trajectory in the slate.

Type: numpy.array

Raises: AssertionError – if slate does not have exactly 2 trajectories.
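The weak comparison response encoding and the documented assertion conditions can be checked with a tiny validator. check_weak_comparison and WEAK_COMPARISON_RESPONSE_SET are hypothetical names, and representing the slate as a plain list is a simplification; this only mirrors the documented rules, not the library's code.

```python
import numpy as np

# Response encoding documented above: -1 means "about equal", 0 or 1 picks a trajectory.
WEAK_COMPARISON_RESPONSE_SET = np.array([-1, 0, 1])


def check_weak_comparison(slate: list, response: int) -> int:
    """Hypothetical validator mirroring the documented AssertionError conditions."""
    assert len(slate) == 2, 'A weak comparison query needs exactly 2 trajectories.'
    assert response in WEAK_COMPARISON_RESPONSE_SET, 'The response is not in the response set.'
    return response


check_weak_comparison(['trajectory A', 'trajectory B'], -1)  # "about equal" is accepted
```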

property slate: aprel.basics.trajectory.TrajectorySet

Returns a TrajectorySet of the trajectories in the query.

visualize(delay: float = 0.0) → int

Visualizes the query and interactively asks for a response.

Parameters: delay (float) – The waiting time between each trajectory visualization in seconds.
Returns: The response of the user.
Return type: int

aprel.learning.data_types.isinteger(input: str) → bool

Returns whether input is an integer.

Note: This function returns False if input is a string of a float, e.g., ‘3.0’.
TODO: Should this go to utils?
Parameters: input (str) – The string to be checked for being an integer.
Returns: True if the input is an integer, False otherwise.
Return type: bool
Raises: AssertionError – if the input is not a string.
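One way to get the documented behavior (integers accepted, '3.0' rejected, non-strings raising AssertionError) is to try int() and catch ValueError. is_integer_string is a hypothetical name for this sketch; the library's own isinteger may be implemented differently.

```python
def is_integer_string(text: str) -> bool:
    """Return True if `text` parses as a base-10 integer ('3.0' does not)."""
    assert isinstance(text, str), 'The input must be a string.'
    try:
        int(text)
    except ValueError:
        return False
    return True


print(is_integer_string('42'), is_integer_string('-7'), is_integer_string('3.0'))  # True True False
```

Note that int() also accepts surrounding whitespace and digit-group underscores (e.g., '1_000'); a stricter check would need a regular expression instead.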

aprel.learning.user_models module

Classes for user response models, including human users.

class aprel.learning.user_models.HumanUser(delay: float = 0.0)

Bases: aprel.learning.user_models.User

Human user class whose response model is unknown. This class is useful for interactive runs, where a real human responds to the queries rather than simulated user models.

Parameters: delay (float) – The waiting time between each trajectory visualization during querying in seconds.

delay

The waiting time between each trajectory visualization during querying in seconds.

Type: float

respond(queries: Union[aprel.learning.data_types.Query, List[aprel.learning.data_types.Query]]) → List

Interactively asks for the user’s responses to the given queries.

Parameters

queries (Query or List[Query]) – A query or a list of queries for which the user’s response(s) is/are requested.

Returns

A list of user responses, where the i-th response corresponds to the i-th query in queries.

Note: The return type is always a list, even if the input is a single query.

Return type

List

class aprel.learning.user_models.SoftmaxUser(params_dict: Dict)

Bases: aprel.learning.user_models.User

Softmax user class whose response model follows the softmax choice rule, i.e., when presented with multiple trajectories, this user chooses each trajectory with a probability that is proportional to the exponential of the reward of that trajectory (scaled by the rationality coefficient beta).

Parameters: params_dict (Dict) – The parameters of the softmax user model, which are:
- weights (numpy.array): The weights of the linear reward function.
- beta (float): The rationality coefficient for comparisons and rankings.
- beta_D (float): The rationality coefficient for demonstrations.
- delta (float): The perceivable difference parameter for weak comparison queries.
Raises: AssertionError – if a weights parameter is not provided in the params_dict.
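For a single-choice preference query, the softmax choice rule with a linear reward can be sketched as below, including how a simulated user might draw its response. preference_probabilities and simulate_preference are hypothetical helpers, not SoftmaxUser methods; the sketch covers only the beta term and ignores beta_D and delta, whose exact role in the library's formulas is not shown here.

```python
import numpy as np


def preference_probabilities(weights: np.ndarray, slate_features: np.ndarray, beta: float) -> np.ndarray:
    """Softmax choice rule: P(i) is proportional to exp(beta * reward_i), reward_i = weights . features_i."""
    rewards = slate_features @ weights
    scaled = beta * rewards
    scaled -= scaled.max()  # shift before exponentiating to avoid overflow
    probs = np.exp(scaled)
    return probs / probs.sum()


def simulate_preference(weights, slate_features, beta, rng) -> int:
    """Draw the index of the chosen trajectory, as a simulated softmax user would."""
    probs = preference_probabilities(weights, slate_features, beta)
    return int(rng.choice(len(probs), p=probs))


weights = np.array([1.0, -1.0])
slate_features = np.array([[1.0, 0.0],   # trajectory 0: reward 1
                           [0.0, 1.0]])  # trajectory 1: reward -1
probs = preference_probabilities(weights, slate_features, beta=2.0)
choice = simulate_preference(weights, slate_features, 2.0, np.random.default_rng(0))
```

With two trajectories this reduces to a logistic function of the reward difference: here P(0) = 1 / (1 + exp(-2 · 2)), roughly 0.98. A larger beta makes the simulated user more deterministic.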

loglikelihood(data: aprel.learning.data_types.QueryWithResponse) → float

Overrides the parent’s method. See User for more information.

Note: The loglikelihood value is the logarithm of the unnormalized likelihood if the input is a demonstration. Otherwise, it is the exact loglikelihood.

response_logprobabilities(query: aprel.learning.data_types.Query) → numpy.array

Overrides the parent’s method. See User for more information.

reward(trajectories: Union[aprel.basics.trajectory.Trajectory, aprel.basics.trajectory.TrajectorySet]) → Union[float, numpy.array]

Returns the reward of a trajectory or a set of trajectories conditioned on the user.

Parameters: trajectories (Trajectory or TrajectorySet) – The trajectories for which the reward will be calculated.
Returns: the reward value of the trajectories conditioned on the user.
Return type: numpy.array or float

class aprel.learning.user_models.User(params_dict: Optional[Dict] = None)

Bases: object

An abstract class to model the user whose reward function is being learned.

Parameters: params_dict (Dict) – parameters of the user model.

copy()

likelihood(data: aprel.learning.data_types.QueryWithResponse) → float

Returns the likelihood of the given user feedback under the user.

Parameters: data (QueryWithResponse) – The data (which keeps a query and a response) for which the likelihood is going to be calculated.
Returns: The likelihood of data under the user.
Return type: float

likelihood_dataset(dataset: List[aprel.learning.data_types.QueryWithResponse]) → float

Returns the likelihood of the given feedback dataset under the user.

Parameters: dataset (List[QueryWithResponse]) – The dataset (which keeps a list of feedbacks) for which the likelihood is going to be calculated.
Returns: The likelihood of dataset under the user.
Return type: float

loglikelihood(data: aprel.learning.data_types.QueryWithResponse) → float

Returns the loglikelihood of the given user feedback under the user.

Parameters: data (QueryWithResponse) – The data (which keeps a query and a response) for which the loglikelihood is going to be calculated.
Returns: The loglikelihood of data under the user.
Return type: float

loglikelihood_dataset(dataset: List[aprel.learning.data_types.QueryWithResponse]) → float

Returns the loglikelihood of the given feedback dataset under the user.

Parameters: dataset (List[QueryWithResponse]) – The dataset (which keeps a list of feedbacks) for which the loglikelihood is going to be calculated.
Returns: The loglikelihood of dataset under the user.
Return type: float
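A dataset likelihood is naturally the product of per-feedback likelihoods when feedback items are assumed conditionally independent given the user's parameters, which makes its logarithm a sum. The sketch below assumes that definition; loglikelihood_dataset and likelihood_dataset here are standalone functions that take a per-item log-likelihood callable, not the User methods themselves, and the "dataset" is just a list of hypothetical per-item likelihood values.

```python
import math

import numpy as np


def loglikelihood_dataset(loglikelihood, dataset) -> float:
    """Sum of per-feedback log-likelihoods (feedback assumed conditionally independent)."""
    return float(sum(loglikelihood(item) for item in dataset))


def likelihood_dataset(loglikelihood, dataset) -> float:
    """Product of per-feedback likelihoods, computed as exp of the summed logs."""
    return math.exp(loglikelihood_dataset(loglikelihood, dataset))


# Hypothetical feedback items that simply carry their own likelihood under the user.
dataset = [0.9, 0.8, 0.5]
print(likelihood_dataset(math.log, dataset))  # ~0.36 (= 0.9 * 0.8 * 0.5)

# Why the log form matters: many feedback items make the raw product underflow.
long_dataset = [0.5] * 2000
print(np.prod(long_dataset))                          # 0.0 (underflow)
print(loglikelihood_dataset(math.log, long_dataset))  # about -1386.29
```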

property params

Returns the parameters of the user.

respond(queries: Union[aprel.learning.data_types.Query, List[aprel.learning.data_types.Query]]) → List

Simulates the user’s responses to the given queries.

Parameters

queries (Query or List[Query]) – A query or a list of queries for which the user’s response(s) is/are requested.

Returns

A list of user responses, where the i-th response corresponds to the i-th query in queries.

Note: The return type is always a list, even if the input is a single query.

Return type

List

response_logprobabilities(query: aprel.learning.data_types.Query) → numpy.array

Returns the log probability for each response in the response set for the query under the user.

Parameters

query (Query) – The query for which the log-probabilities are being calculated.

Returns

An array, where each entry is the log-probability of the corresponding response in the query’s response set.

Return type

numpy.array

response_probabilities(query: aprel.learning.data_types.Query) → numpy.array

Returns the probability for each response in the response set for the query under the user.

Parameters

query (Query) – The query for which the probabilities are being calculated.

Returns

An array, where each entry is the probability of the corresponding response in the query’s response set.

Return type

numpy.array
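The two methods above are related by exponentiation: the probabilities are the exponentiated log-probabilities, and computing log-probabilities first keeps softmax-style models numerically stable. The sketch assumes each response has a real-valued score (for a softmax user, e.g., beta times the reward); the standalone functions below reuse the documented names only for readability and are not the User methods.

```python
import numpy as np


def response_logprobabilities(scores) -> np.ndarray:
    """Log-softmax: log p_i = s_i - log(sum_j exp(s_j)), computed with the max-shift trick."""
    scores = np.asarray(scores, dtype=float)
    shift = scores.max()
    return scores - (shift + np.log(np.exp(scores - shift).sum()))


def response_probabilities(scores) -> np.ndarray:
    """Probabilities are just the exponentiated log-probabilities."""
    return np.exp(response_logprobabilities(scores))


# Scores this large would overflow a naive exp(), but the shifted version is stable.
print(response_probabilities([1000.0, 1000.0]))       # [0.5 0.5]
print(response_logprobabilities([0.0, np.log(3.0)]))  # equals [log(1/4), log(3/4)]
```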