aprel.learning package

aprel.learning.belief_models module

This file contains Belief classes, which store and update the belief distributions about the user whose reward function is being learned.

TODO

The GaussianBelief class will be implemented so that the library includes the following work: E. Biyik, N. Huynh, M. J. Kochenderfer, D. Sadigh; “Active Preference-Based Gaussian Process Regression for Reward Learning”, RSS’20.

class aprel.learning.belief_models.Belief

Bases: object

An abstract class for Belief distributions.

update(data: Union[aprel.learning.data_types.QueryWithResponse, List[aprel.learning.data_types.QueryWithResponse]], **kwargs)

Updates the belief distribution with a given feedback or a list of feedbacks.

class aprel.learning.belief_models.LinearRewardBelief

Bases: aprel.learning.belief_models.Belief

An abstract class for Belief distributions for problems where the reward function is assumed to be a linear function of the features.

property mean: Dict

Returns the mean parameters with respect to the belief distribution.

class aprel.learning.belief_models.SamplingBasedBelief(user_model: aprel.learning.user_models.User, dataset: List[aprel.learning.data_types.QueryWithResponse], initial_point: Dict, logprior: Callable = <function uniform_logprior>, num_samples: int = 100, **kwargs)

Bases: aprel.learning.belief_models.LinearRewardBelief

A class for sampling based belief distributions.

In this model, the entire dataset of user feedback is stored and used to calculate the true posterior value for any given set of parameters. A set of parameter samples is then drawn from this true posterior using the Metropolis-Hastings algorithm.

Parameters
  • user_model (User) – The user response model that will be assumed by this belief distribution.

  • dataset (List[QueryWithResponse]) – A list of user feedbacks.

  • initial_point (Dict) – An initial set of user parameters for Metropolis-Hastings to start.

  • logprior (Callable) – The logarithm of the prior distribution over the user parameters. Defaults to a uniform distribution over the hyperball.

  • num_samples (int) – The number of parameter samples that will be sampled using Metropolis-Hastings.

  • **kwargs

    Hyperparameters for Metropolis-Hastings, which include:

    • burnin (int): The number of initial samples that will be discarded to remove the correlation with the initial parameter set.

    • thin (int): One in every thin samples will be kept to reduce the autocorrelation between the samples.

    • proposal_distribution (Callable): The proposal distribution for the steps in Metropolis-Hastings.

user_model

The user response model that is assumed by the belief distribution.

Type

User

dataset

A list of user feedbacks.

Type

List[QueryWithResponse]

num_samples

The number of parameter samples that will be sampled using Metropolis-Hastings.

Type

int

sampling_params

Hyperparameters for Metropolis-Hastings, which include:

  • burnin (int): The number of initial samples that will be discarded to remove the correlation with the initial parameter set.

  • thin (int): One in every thin samples will be kept to reduce the autocorrelation between the samples.

  • proposal_distribution (Callable): The proposal distribution for the steps in Metropolis-Hastings.

Type

Dict

create_samples(initial_point: Dict) → Tuple[List[Dict], List[float]]

Samples num_samples many user parameters from the posterior using Metropolis-Hastings.

Parameters

initial_point (Dict) – initial point to start the chain for Metropolis-Hastings.

Returns

  • List[Dict]: dictionaries where each dictionary is a sample of user parameters.

  • List[float]: float values where each entry is the log-probability of the corresponding sample.

Return type

2-tuple
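The sampling procedure described above can be illustrated with a small, self-contained sketch. This is not the library's implementation: the function names, the Gaussian random-walk proposal, the step size, and the toy posterior are all assumptions made for illustration. It only shows how burnin, thin, and a uniform log-prior over the unit ball fit together in a Metropolis-Hastings loop that returns samples along with their log-probabilities.

```python
import numpy as np

def uniform_logprior_sketch(weights):
    """Log-density (up to a constant) of a uniform prior over the unit ball."""
    return 0.0 if np.linalg.norm(weights) <= 1.0 else -np.inf

def metropolis_hastings_sketch(logposterior, initial_point, num_samples,
                               burnin=200, thin=20, step=0.05, seed=0):
    """Returns (samples, logprobs) using a Gaussian random-walk proposal."""
    rng = np.random.default_rng(seed)
    current = np.asarray(initial_point, dtype=float)
    current_logp = logposterior(current)
    samples, logprobs = [], []
    for t in range(burnin + thin * num_samples):
        proposal = current + step * rng.standard_normal(current.shape)
        proposal_logp = logposterior(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if np.log(rng.uniform()) < proposal_logp - current_logp:
            current, current_logp = proposal, proposal_logp
        # Discard the first `burnin` steps, then keep one in every `thin` steps.
        if t >= burnin and (t - burnin) % thin == 0:
            samples.append(current.copy())
            logprobs.append(current_logp)
    return samples, logprobs

# Toy (hypothetical) posterior: the user preferred features [1, 0] over [0, 1] five times.
def toy_logposterior(w):
    diff = np.array([1.0, 0.0]) - np.array([0.0, 1.0])
    loglikelihood = -5.0 * np.log1p(np.exp(-(w @ diff)))
    return uniform_logprior_sketch(w) + loglikelihood

samples, logprobs = metropolis_hastings_sketch(
    toy_logposterior, initial_point=[0.0, 0.0], num_samples=100)
```

Because proposals outside the unit ball have a log-posterior of negative infinity, they are always rejected, so every kept sample satisfies the prior's support.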

property mean: Dict

Returns the mean of the belief distribution by taking the mean over the samples generated by Metropolis-Hastings.
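As a rough illustration of averaging over samples, the sketch below takes a list of parameter dictionaries and averages each key. The helper name and the sample values are hypothetical, and any post-processing the library may apply to the mean (for example, normalization) is not shown here.

```python
import numpy as np

def mean_of_samples_sketch(samples):
    """Averages each parameter key over a list of parameter dictionaries."""
    return {key: np.mean([s[key] for s in samples], axis=0) for key in samples[0]}

# Two hypothetical parameter samples.
example_samples = [
    {"weights": np.array([1.0, 0.0])},
    {"weights": np.array([0.0, 1.0])},
]
mean_params = mean_of_samples_sketch(example_samples)
```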

update(data: Union[aprel.learning.data_types.QueryWithResponse, List[aprel.learning.data_types.QueryWithResponse]], initial_point: Optional[Dict] = None)

Updates the belief distribution based on the new feedback (query-response pairs) by adding it to the current dataset and then re-sampling with Metropolis-Hastings.

Parameters
  • data (QueryWithResponse or List[QueryWithResponse]) – One or more QueryWithResponse objects, each of which contains multiple trajectory options and the index of the one the user selected as the best.

  • initial_point (Dict) – The initial point to start Metropolis-Hastings from. If None, it is set to the mean of the previous distribution.

aprel.learning.data_types module

Modules for queries and user responses.

TODO

OrdinalQuery classes will be implemented so that the library will include ordinal data, which was used for reward learning in: K. Li, M. Tucker, E. Biyik, E. Novoseller, J. W. Burdick, Y. Sui, D. Sadigh, Y. Yue, A. D. Ames; “ROIAL: Region of Interest Active Learning for Characterizing Exoskeleton Gait Preference Landscapes”, ICRA’21.

class aprel.learning.data_types.Demonstration(trajectory: aprel.basics.trajectory.Trajectory, query: Optional[aprel.learning.data_types.DemonstrationQuery] = None)

Bases: aprel.learning.data_types.QueryWithResponse

The trajectory generated by the DemonstrationQuery, along with the DemonstrationQuery that prompted the user with the initial state.

For preference-based reward learning initialized with demonstrations, this class should be used (without actually querying the user). First, the demonstration should be collected as a Trajectory object. Then, a Demonstration instance should be created with this trajectory without specifying the query parameter, in which case it is automatically assigned as the initial state of the trajectory.

Parameters
  • trajectory (Trajectory) – The demonstrated trajectory.

  • query (DemonstrationQuery) – The query that led to the trajectory, i.e., the initial state of the trajectory.

trajectory

The demonstrated trajectory.

Type

Trajectory

features

The features of the demonstrated trajectory.

Type

numpy.array

Raises

AssertionError – if the initial state of the trajectory does not match with the query.

class aprel.learning.data_types.DemonstrationQuery(initial_state: numpy.array)

Bases: aprel.learning.data_types.Query

A demonstration query is one where the initial state is given to the user, and they are asked to control the robot.

Although not practical for optimization, this class is defined for coherence with other query types.

Parameters

initial_state (numpy.array) – The initial state of the environment.

initial_state

The initial state of the environment.

Type

numpy.array

class aprel.learning.data_types.FullRanking(query: aprel.learning.data_types.FullRankingQuery, response: List[int])

Bases: aprel.learning.data_types.QueryWithResponse

A Full Ranking feedback.

Contains the FullRankingQuery the user responded to and the response.

Parameters
  • query (FullRankingQuery) – The query for which the feedback was given.

  • response (numpy.array) – The response of the user to the query, indices from the most preferred to the least.

response

The response of the user to the query, indices from the most preferred to the least.

Type

numpy.array

Raises

AssertionError – if the response is not in the response set of the query.

class aprel.learning.data_types.FullRankingQuery(slate: Union[aprel.basics.trajectory.TrajectorySet, List[aprel.basics.trajectory.Trajectory]])

Bases: aprel.learning.data_types.Query

A full ranking query is one where the user is presented with multiple trajectories and asked for a ranking from their most preferred trajectory to the least.

Parameters

slate (TrajectorySet or List[Trajectory]) – The set of trajectories that will be presented to the user.

K

The number of trajectories in the query.

Type

int

response_set

The set of possible responses to the query, which is all permutations of the K trajectory indices in the slate.

Type

numpy.array

Raises

AssertionError – if slate has fewer than 2 trajectories.
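Since a full ranking response lists every trajectory index from the most preferred to the least, the candidate responses for a slate of K trajectories can be enumerated as orderings of the indices. The snippet below is a minimal sketch under that reading, with a hypothetical slate size; it does not reproduce the library's code.

```python
import itertools
import numpy as np

K = 3  # hypothetical number of trajectories in the slate

# Each row is one possible ranking, from the most preferred index to the least.
ranking_response_set = np.array(list(itertools.permutations(range(K))))
```

For K trajectories this yields K! rows, so the response set grows quickly with the slate size.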

property slate: aprel.basics.trajectory.TrajectorySet

Returns a TrajectorySet of the trajectories in the query.

visualize(delay: float = 0.0) → List[int]

Visualizes the query and interactively asks for a response.

Parameters

delay (float) – The waiting time between each trajectory visualization in seconds.

Returns

The response of the user, as a list from the most preferred to the least.

Return type

List[int]

class aprel.learning.data_types.Preference(query: aprel.learning.data_types.PreferenceQuery, response: int)

Bases: aprel.learning.data_types.QueryWithResponse

A Preference feedback.

Contains the PreferenceQuery the user responded to and the response.

Parameters
  • query (PreferenceQuery) – The query for which the feedback was given.

  • response (int) – The response of the user to the query.

response

The response of the user to the query.

Type

int

Raises

AssertionError – if the response is not in the response set of the query.

class aprel.learning.data_types.PreferenceQuery(slate: Union[aprel.basics.trajectory.TrajectorySet, List[aprel.basics.trajectory.Trajectory]])

Bases: aprel.learning.data_types.Query

A preference query is one where the user is presented with multiple trajectories and asked for their favorite among them.

Parameters

slate (TrajectorySet or List[Trajectory]) – The set of trajectories that will be presented to the user.

K

The number of trajectories in the query.

Type

int

response_set

The set of possible responses to the query.

Type

numpy.array

Raises

AssertionError – if slate has fewer than 2 trajectories.

property slate: aprel.basics.trajectory.TrajectorySet

Returns a TrajectorySet of the trajectories in the query.

visualize(delay: float = 0.0) → int

Visualizes the query and interactively asks for a response.

Parameters

delay (float) – The waiting time between each trajectory visualization in seconds.

Returns

The response of the user.

Return type

int

class aprel.learning.data_types.Query

Bases: object

An abstract parent class that is useful for typing.

A query is a question to the user.

copy()

Returns a deep copy of the query.

visualize(delay: float = 0.0)

Visualizes the query, i.e., asks it to the user.

Parameters

delay (float) – The waiting time between each trajectory visualization in seconds.

class aprel.learning.data_types.QueryWithResponse(query: aprel.learning.data_types.Query)

Bases: object

An abstract parent class that is useful for typing.

An instance of this class holds both the query and the user’s response to that query.

Parameters

query (Query) – The query.

query

The query.

Type

Query

class aprel.learning.data_types.WeakComparison(query: aprel.learning.data_types.WeakComparisonQuery, response: int)

Bases: aprel.learning.data_types.QueryWithResponse

A Weak Comparison feedback.

Contains the WeakComparisonQuery the user responded to and the response.

Parameters
  • query (WeakComparisonQuery) – The query for which the feedback was given.

  • response (int) – The response of the user to the query.

response

The response of the user to the query.

Type

int

Raises

AssertionError – if the response is not in the response set of the query.

class aprel.learning.data_types.WeakComparisonQuery(slate: Union[aprel.basics.trajectory.TrajectorySet, List[aprel.basics.trajectory.Trajectory]])

Bases: aprel.learning.data_types.Query

A weak comparison query is one where the user is presented with two trajectories and asked for their favorite among them, but also given an option to say ‘they are about equal’.

Parameters

slate (TrajectorySet or List[Trajectory]) – The set of trajectories that will be presented to the user.

K

The number of trajectories in the query. It is always equal to 2 and kept for consistency with PreferenceQuery and FullRankingQuery.

Type

int

response_set

The set of possible responses to the query, which is always equal to [-1, 0, 1] where -1 represents the About Equal option.

Type

numpy.array

Raises

AssertionError – if slate does not have exactly 2 trajectories.
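To make the three-way response set concrete, the sketch below computes probabilities for the responses [-1, 0, 1] from two trajectory rewards and a perceivable-difference threshold. The specific formula is an assumption chosen for illustration (a common way to model an About Equal option); this page does not confirm that the library uses it, and the function name is hypothetical.

```python
import numpy as np

def weak_comparison_probs_sketch(reward_0, reward_1, delta=1.0):
    """Returns probabilities for the responses [-1, 0, 1].

    -1 is the About Equal option; 0 and 1 are the trajectory indices.
    The formula is an illustrative assumption, not the library's confirmed model.
    """
    p0 = 1.0 / (1.0 + np.exp(delta + reward_1 - reward_0))
    p1 = 1.0 / (1.0 + np.exp(delta + reward_0 - reward_1))
    p_equal = (np.exp(2.0 * delta) - 1.0) * p0 * p1
    return np.array([p_equal, p0, p1])

weak_probs = weak_comparison_probs_sketch(reward_0=0.4, reward_1=0.1, delta=1.0)
```

Under this assumed model the three probabilities sum to one, and setting delta to 0 removes the About Equal option entirely.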

property slate: aprel.basics.trajectory.TrajectorySet

Returns a TrajectorySet of the trajectories in the query.

visualize(delay: float = 0.0) → int

Visualizes the query and interactively asks for a response.

Parameters

delay (float) – The waiting time between each trajectory visualization in seconds.

Returns

The response of the user.

Return type

int

aprel.learning.data_types.isinteger(input: str) → bool

Returns whether input is an integer.

Note

This function returns False if input is a string of a float, e.g., ‘3.0’.

TODO

Should this go to utils?

Parameters

input (str) – The string to be checked for being an integer.

Returns

True if the input is an integer, False otherwise.

Return type

bool

Raises

AssertionError – if the input is not a string.
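The documented behavior (strings of floats such as ‘3.0’ are rejected, and non-string inputs raise an AssertionError) can be reproduced with a short sketch. The function below is a hypothetical stand-in, not the library's code, and edge cases such as surrounding whitespace may be handled differently by the real function.

```python
def is_integer_string_sketch(text):
    """Returns True if `text` parses as an integer, False otherwise."""
    assert isinstance(text, str), "input must be a string"
    try:
        int(text)  # raises ValueError for strings such as '3.0' or 'abc'
        return True
    except ValueError:
        return False
```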

aprel.learning.user_models module

Modules for user response models, including human users.

class aprel.learning.user_models.HumanUser(delay: float = 0.0)

Bases: aprel.learning.user_models.User

Human user class whose response model is unknown. This class is useful for interactive runs, where a real human responds to the queries rather than simulated user models.

Parameters

delay (float) – The waiting time between each trajectory visualization during querying in seconds.

delay

The waiting time between each trajectory visualization during querying in seconds.

Type

float

respond(queries: Union[aprel.learning.data_types.Query, List[aprel.learning.data_types.Query]]) → List

Interactively asks for the user’s responses to the given queries.

Parameters

queries (Query or List[Query]) – A query or a list of queries for which the user’s response(s) is/are requested.

Returns

A list of user responses, where each response corresponds to a query in queries.

Note

The return type is always a list, even if the input is a single query.

Return type

List

class aprel.learning.user_models.SoftmaxUser(params_dict: Dict)

Bases: aprel.learning.user_models.User

Softmax user class whose response model follows the softmax choice rule, i.e., when presented with multiple trajectories, this user chooses each trajectory with a probability that is proportional to the exponential of the reward of that trajectory.

Parameters

params_dict (Dict) –

The parameters of the softmax user model, which are:

  • weights (numpy.array): the weights of the linear reward function.

  • beta (float): rationality coefficient for comparisons and rankings.

  • beta_D (float): rationality coefficient for demonstrations.

  • delta (float): the perceivable difference parameter for weak comparison queries.

Raises

AssertionError – if a weights parameter is not provided in the params_dict.
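The softmax choice rule with a linear reward can be sketched in a few lines. The function name, feature values, and weights below are hypothetical, and the placement of the rationality coefficient as a multiplier on the reward is an assumption about how beta enters the model.

```python
import numpy as np

def softmax_logprobs_sketch(features, weights, beta=1.0):
    """Log-probabilities of choosing each trajectory under a softmax rule.

    `features` has one row per trajectory; the reward is assumed linear in them.
    """
    rewards = beta * (features @ weights)
    shifted = rewards - rewards.max()  # subtract the max for numerical stability
    return shifted - np.log(np.exp(shifted).sum())

# Hypothetical slate of three trajectories with two features each.
slate_features = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
user_weights = np.array([0.8, -0.6])
choice_logprobs = softmax_logprobs_sketch(slate_features, user_weights)
choice_probs = np.exp(choice_logprobs)
```

A larger beta concentrates the probability mass on the highest-reward trajectory, while beta near zero makes the choice close to uniform.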

loglikelihood(data: aprel.learning.data_types.QueryWithResponse)float

Overrides the parent’s method. See User for more information.

Note

The loglikelihood value is the logarithm of the unnormalized likelihood if the input is a demonstration. Otherwise, it is the exact loglikelihood.

response_logprobabilities(query: aprel.learning.data_types.Query) → numpy.array

Overrides the parent’s method. See User for more information.

reward(trajectories: Union[aprel.basics.trajectory.Trajectory, aprel.basics.trajectory.TrajectorySet]) → Union[float, numpy.array]

Returns the reward of a trajectory or a set of trajectories conditioned on the user.

Parameters

trajectories (Trajectory or TrajectorySet) – The trajectories for which the reward will be calculated.

Returns

the reward value of the trajectories conditioned on the user.

Return type

numpy.array or float

class aprel.learning.user_models.User(params_dict: Optional[Dict] = None)

Bases: object

An abstract class to model the user whose reward function is being learned.

Parameters

params_dict (Dict) – parameters of the user model.

copy()

likelihood(data: aprel.learning.data_types.QueryWithResponse) → float

Returns the likelihood of the given user feedback under the user.

Parameters

data (QueryWithResponse) – The data (which keeps a query and a response) for which the likelihood is going to be calculated.

Returns

The likelihood of data under the user.

Return type

float

likelihood_dataset(dataset: List[aprel.learning.data_types.QueryWithResponse]) → float

Returns the likelihood of the given feedback dataset under the user.

Parameters

dataset (List[QueryWithResponse]) – The dataset (which keeps a list of feedbacks) for which the likelihood is going to be calculated.

Returns

The likelihood of dataset under the user.

Return type

float

loglikelihood(data: aprel.learning.data_types.QueryWithResponse) → float

Returns the loglikelihood of the given user feedback under the user.

Parameters

data (QueryWithResponse) – The data (which keeps a query and a response) for which the loglikelihood is going to be calculated.

Returns

The loglikelihood of data under the user.

Return type

float

loglikelihood_dataset(dataset: List[aprel.learning.data_types.QueryWithResponse]) → float

Returns the loglikelihood of the given feedback dataset under the user.

Parameters

dataset (List[QueryWithResponse]) – The dataset (which keeps a list of feedbacks) for which the loglikelihood is going to be calculated.

Returns

The loglikelihood of dataset under the user.

Return type

float
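The relationship between the per-feedback and dataset-level quantities can be illustrated numerically. The sketch below assumes that feedback items are conditionally independent given the user parameters, which this page does not state explicitly, and the likelihood values are made up for the example.

```python
import numpy as np

# Hypothetical likelihoods of three individual feedback items under one user.
per_feedback_likelihoods = np.array([0.7, 0.6, 0.9])

# Under the independence assumption, the dataset likelihood is a product
# and the dataset loglikelihood is the corresponding sum of logs.
dataset_likelihood = np.prod(per_feedback_likelihoods)
dataset_loglikelihood = np.sum(np.log(per_feedback_likelihoods))
```

Working with the sum of logs avoids numerical underflow when the dataset contains many feedback items.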

property params

Returns the parameters of the user.

respond(queries: Union[aprel.learning.data_types.Query, List[aprel.learning.data_types.Query]]) → List

Simulates the user’s responses to the given queries.

Parameters

queries (Query or List[Query]) – A query or a list of queries for which the user’s response(s) is/are requested.

Returns

A list of user responses, where each response corresponds to a query in queries.

Note

The return type is always a list, even if the input is a single query.

Return type

List
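One way to picture a simulated response is as a draw from the response set according to the response probabilities. The sketch below uses plain dictionaries as hypothetical stand-ins for queries (the real Query classes are not used here) and shows why wrapping a single query makes the return type always a list.

```python
import numpy as np

def respond_sketch(queries, rng):
    """Samples one response per query; always returns a list."""
    if not isinstance(queries, list):
        queries = [queries]  # wrap a single query so the output is a list
    responses = []
    for query in queries:
        index = rng.choice(len(query["response_set"]), p=query["probs"])
        responses.append(query["response_set"][index])
    return responses

rng = np.random.default_rng(0)
# Hypothetical query: two possible responses with their probabilities.
toy_query = {"response_set": np.array([0, 1]), "probs": np.array([0.9, 0.1])}
single_response = respond_sketch(toy_query, rng)
batch_responses = respond_sketch([toy_query, toy_query], rng)
```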

response_logprobabilities(query: aprel.learning.data_types.Query) → numpy.array

Returns the log probability for each response in the response set for the query under the user.

Parameters

query (Query) – The query for which the log-probabilities are being calculated.

Returns

An array, where each entry is the log-probability of the corresponding response in the query’s response set.

Return type

numpy.array

response_probabilities(query: aprel.learning.data_types.Query) → numpy.array

Returns the probability for each response in the response set for the query under the user.

Parameters

query (Query) – The query for which the probabilities are being calculated.

Returns

An array, where each entry is the probability of the corresponding response in the query’s response set.

Return type

numpy.array