aprel.learning package¶
aprel.learning.belief_models module¶
This file contains Belief classes, which store and update the belief distributions about the user whose reward function is being learned.
- TODO
GaussianBelief class will be implemented so that the library will include the following work: E. Biyik, N. Huynh, M. J. Kochenderfer, D. Sadigh; “Active Preference-Based Gaussian Process Regression for Reward Learning”, RSS’20.
- class aprel.learning.belief_models.Belief¶
Bases:
object
An abstract class for Belief distributions.
- update(data: Union[aprel.learning.data_types.QueryWithResponse, List[aprel.learning.data_types.QueryWithResponse]], **kwargs)¶
Updates the belief distribution with a given feedback or a list of feedbacks.
- class aprel.learning.belief_models.LinearRewardBelief¶
Bases:
aprel.learning.belief_models.Belief
An abstract class for Belief distributions for problems where the reward function is assumed to be a linear function of the features.
- property mean: Dict¶
Returns the mean parameters with respect to the belief distribution.
- class aprel.learning.belief_models.SamplingBasedBelief(user_model: aprel.learning.user_models.User, dataset: List[aprel.learning.data_types.QueryWithResponse], initial_point: Dict, logprior: Callable = <function uniform_logprior>, num_samples: int = 100, **kwargs)¶
Bases:
aprel.learning.belief_models.LinearRewardBelief
A class for sampling based belief distributions.
In this model, the entire dataset of user feedback is stored and used to calculate the true posterior value for any given set of parameters. Parameter samples are then drawn from this true posterior using the Metropolis-Hastings algorithm.
- Parameters
user_model (User) – The user response model that will be assumed by this belief distribution.
dataset (List[QueryWithResponse]) – A list of user feedbacks.
initial_point (Dict) – An initial set of user parameters for Metropolis-Hastings to start from.
logprior (Callable) – The logarithm of the prior distribution over the user parameters. Defaults to a uniform distribution over the hyperball.
num_samples (int) – The number of parameter samples that will be sampled using Metropolis-Hastings.
**kwargs –
Hyperparameters for Metropolis-Hastings, which include:
burnin (int): The number of initial samples that will be discarded to remove the correlation with the initial parameter set.
thin (int): One in every thin samples will be kept to reduce the autocorrelation between the samples.
proposal_distribution (Callable): The proposal distribution for the steps in Metropolis-Hastings.
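As a rough illustration of what a uniform log-prior over a hyperball can look like, here is a minimal sketch. The function name `uniform_logprior_sketch`, the unit radius, and the `"weights"` key are assumptions made for this example; the library's own `uniform_logprior` may differ.

```python
import math

def uniform_logprior_sketch(params):
    """Log of an unnormalized uniform prior over the unit ball (illustrative only)."""
    weights = params["weights"]  # assumed key, not confirmed by the documentation
    norm = math.sqrt(sum(w * w for w in weights))
    # Inside the ball every point is equally likely, so the log-density is a constant (0 here).
    # Outside the ball the density is zero, so the log-density is -infinity.
    return 0.0 if norm <= 1.0 else -math.inf

inside = uniform_logprior_sketch({"weights": [0.3, 0.0, -0.4]})   # norm 0.5
outside = uniform_logprior_sketch({"weights": [1.0, 1.0, 0.0]})   # norm about 1.41
```

Returning -infinity outside the ball means a Metropolis-Hastings step that proposes such a point is always rejected.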
- dataset¶
A list of user feedbacks.
- Type
List[QueryWithResponse]
- num_samples¶
The number of parameter samples that will be sampled using Metropolis-Hastings.
- Type
int
- sampling_params¶
Hyperparameters for Metropolis-Hastings, which include:
burnin (int): The number of initial samples that will be discarded to remove the correlation with the initial parameter set.
thin (int): One in every thin samples will be kept to reduce the autocorrelation between the samples.
proposal_distribution (Callable): The proposal distribution for the steps in Metropolis-Hastings.
- Type
Dict
- create_samples(initial_point: Dict) → Tuple[List[Dict], List[float]]¶
Samples num_samples many user parameters from the posterior using Metropolis-Hastings.
- Parameters
initial_point (Dict) – initial point to start the chain for Metropolis-Hastings.
- Returns
List[Dict]: dictionaries where each dictionary is a sample of user parameters.
List[float]: float values where each entry is the log-probability of the corresponding sample.
- Return type
2-tuple
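The roles of burnin and thin can be sketched with a plain random-walk Metropolis-Hastings loop. This is an illustrative sketch, not the library's implementation: the function name, the Gaussian proposal, the `step` and `seed` arguments, and the use of plain lists instead of parameter dictionaries are all assumptions.

```python
import math
import random

def metropolis_hastings_sketch(logposterior, initial_point, num_samples,
                               burnin=200, thin=20, step=0.05, seed=0):
    """Draw num_samples points from an unnormalized log-posterior (illustrative only)."""
    rng = random.Random(seed)
    current = list(initial_point)
    current_logp = logposterior(current)
    samples, logprobs = [], []
    total_steps = burnin + thin * num_samples
    for i in range(total_steps):
        # Symmetric Gaussian random-walk proposal around the current point.
        proposal = [x + rng.gauss(0.0, step) for x in current]
        proposal_logp = logposterior(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        log_ratio = proposal_logp - current_logp
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_logp = proposal, proposal_logp
        # Discard the first burnin steps, then keep one in every thin steps.
        if i >= burnin and (i - burnin) % thin == thin - 1:
            samples.append(list(current))
            logprobs.append(current_logp)
    return samples, logprobs

def toy_logposterior(w):
    # Hypothetical target: an isotropic Gaussian with standard deviation 0.3.
    return -sum(x * x for x in w) / (2 * 0.3 ** 2)

samples, logprobs = metropolis_hastings_sketch(toy_logposterior, [0.0, 0.0], num_samples=100)
```

As in create_samples, the sketch returns the kept samples together with their log-probabilities, and the chain runs for burnin + thin * num_samples steps in total.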
- property mean: Dict¶
Returns the mean of the belief distribution by taking the mean over the samples generated by Metropolis-Hastings.
- update(data: Union[aprel.learning.data_types.QueryWithResponse, List[aprel.learning.data_types.QueryWithResponse]], initial_point: Optional[Dict] = None)¶
Updates the belief distribution based on the new feedback (query-response pairs), by adding these to the current dataset and then re-sampling with Metropolis-Hastings.
- Parameters
data (QueryWithResponse or List[QueryWithResponse]) – One or more QueryWithResponse, each of which contains multiple trajectory options and the index of the one the user selected as most optimal.
initial_point (Dict) – The initial point to start Metropolis-Hastings from. If None, it is set to the mean of the previous distribution.
aprel.learning.data_types module¶
Modules for queries and user responses.
- TODO
OrdinalQuery classes will be implemented so that the library will include ordinal data, which was used for reward learning in: K. Li, M. Tucker, E. Biyik, E. Novoseller, J. W. Burdick, Y. Sui, D. Sadigh, Y. Yue, A. D. Ames; “ROIAL: Region of Interest Active Learning for Characterizing Exoskeleton Gait Preference Landscapes”, ICRA’21.
- class aprel.learning.data_types.Demonstration(trajectory: aprel.basics.trajectory.Trajectory, query: Optional[aprel.learning.data_types.DemonstrationQuery] = None)¶
Bases:
aprel.learning.data_types.QueryWithResponse
The trajectory generated by the DemonstrationQuery, along with the DemonstrationQuery that prompted the user with the initial state.
For preference-based reward learning initialized with demonstrations, this class should be used (without actually querying the user). First, the demonstration should be collected as a Trajectory object. Then, a Demonstration instance should be created with this trajectory without specifying the query parameter, in which case it is automatically assigned as the initial state of the trajectory.
- Parameters
trajectory (Trajectory) – The demonstrated trajectory.
query (DemonstrationQuery) – The query that led to the trajectory, i.e., the initial state of the trajectory.
- trajectory¶
The demonstrated trajectory.
- Type
Trajectory
- features¶
The features of the demonstrated trajectory.
- Type
numpy.array
- Raises
AssertionError – if the initial state of the trajectory does not match with the query.
- class aprel.learning.data_types.DemonstrationQuery(initial_state: numpy.array)¶
Bases:
aprel.learning.data_types.Query
A demonstration query is one where the initial state is given to the user, and they are asked to control the robot.
Although not practical for optimization, this class is defined for coherence with other query types.
- Parameters
initial_state (numpy.array) – The initial state of the environment.
- initial_state¶
The initial state of the environment.
- Type
numpy.array
- class aprel.learning.data_types.FullRanking(query: aprel.learning.data_types.FullRankingQuery, response: List[int])¶
Bases:
aprel.learning.data_types.QueryWithResponse
A Full Ranking feedback.
Contains the FullRankingQuery the user responded to and the response.
- Parameters
query (FullRankingQuery) – The query for which the feedback was given.
response (numpy.array) – The response of the user to the query, indices from the most preferred to the least.
- response¶
The response of the user to the query, indices from the most preferred to the least.
- Type
numpy.array
- Raises
AssertionError – if the response is not in the response set of the query.
- class aprel.learning.data_types.FullRankingQuery(slate: Union[aprel.basics.trajectory.TrajectorySet, List[aprel.basics.trajectory.Trajectory]])¶
Bases:
aprel.learning.data_types.Query
A full ranking query is one where the user is presented with multiple trajectories and asked for a ranking from their most preferred trajectory to the least.
- Parameters
slate (TrajectorySet or List[Trajectory]) – The set of trajectories that will be presented to the user.
- K¶
The number of trajectories in the query.
- Type
int
- response_set¶
The set of possible responses to the query, which is all permutations of the K trajectory indices in the slate.
- Type
numpy.array
- Raises
AssertionError – if slate has fewer than 2 trajectories.
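Since a full ranking orders all K trajectories from most to least preferred, one way to enumerate the possible responses is to list every ordering of the trajectory indices. The sketch below uses only the standard library; the variable names are illustrative and the library's internal representation (a numpy.array) may differ.

```python
from itertools import permutations

K = 3  # number of trajectories in the slate (example value)

# Each response lists trajectory indices from most to least preferred,
# so there are K! possible responses in total.
response_set = [list(order) for order in permutations(range(K))]
```

The response set grows factorially with K, which is worth keeping in mind when choosing the slate size for full ranking queries.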
- property slate: aprel.basics.trajectory.TrajectorySet¶
Returns a TrajectorySet of the trajectories in the query.
- visualize(delay: float = 0.0) → List[int]¶
Visualizes the query and interactively asks for a response.
- Parameters
delay (float) – The waiting time between each trajectory visualization in seconds.
- Returns
The response of the user, as a list from the most preferred to the least.
- Return type
List[int]
- class aprel.learning.data_types.Preference(query: aprel.learning.data_types.PreferenceQuery, response: int)¶
Bases:
aprel.learning.data_types.QueryWithResponse
A Preference feedback.
Contains the PreferenceQuery the user responded to and the response.
- Parameters
query (PreferenceQuery) – The query for which the feedback was given.
response (int) – The response of the user to the query.
- response¶
The response of the user to the query.
- Type
int
- Raises
AssertionError – if the response is not in the response set of the query.
- class aprel.learning.data_types.PreferenceQuery(slate: Union[aprel.basics.trajectory.TrajectorySet, List[aprel.basics.trajectory.Trajectory]])¶
Bases:
aprel.learning.data_types.Query
A preference query is one where the user is presented with multiple trajectories and asked for their favorite among them.
- Parameters
slate (TrajectorySet or List[Trajectory]) – The set of trajectories that will be presented to the user.
- K¶
The number of trajectories in the query.
- Type
int
- response_set¶
The set of possible responses to the query.
- Type
numpy.array
- Raises
AssertionError – if slate has fewer than 2 trajectories.
- property slate: aprel.basics.trajectory.TrajectorySet¶
Returns a TrajectorySet of the trajectories in the query.
- visualize(delay: float = 0.0) → int¶
Visualizes the query and interactively asks for a response.
- Parameters
delay (float) – The waiting time between each trajectory visualization in seconds.
- Returns
The response of the user.
- Return type
int
- class aprel.learning.data_types.Query¶
Bases:
object
An abstract parent class that is useful for typing.
A query is a question to the user.
- copy()¶
Returns a deep copy of the query.
- visualize(delay: float = 0.0)¶
Visualizes the query, i.e., asks it to the user.
- Parameters
delay (float) – The waiting time between each trajectory visualization in seconds.
- class aprel.learning.data_types.QueryWithResponse(query: aprel.learning.data_types.Query)¶
Bases:
object
An abstract parent class that is useful for typing.
An instance of this class holds both the query and the user’s response to that query.
- Parameters
query (Query) – The query.
- class aprel.learning.data_types.WeakComparison(query: aprel.learning.data_types.WeakComparisonQuery, response: int)¶
Bases:
aprel.learning.data_types.QueryWithResponse
A Weak Comparison feedback.
Contains the WeakComparisonQuery the user responded to and the response.
- Parameters
query (WeakComparisonQuery) – The query for which the feedback was given.
response (int) – The response of the user to the query.
- response¶
The response of the user to the query.
- Type
int
- Raises
AssertionError – if the response is not in the response set of the query.
- class aprel.learning.data_types.WeakComparisonQuery(slate: Union[aprel.basics.trajectory.TrajectorySet, List[aprel.basics.trajectory.Trajectory]])¶
Bases:
aprel.learning.data_types.Query
A weak comparison query is one where the user is presented with two trajectories and asked for their favorite among them, but also given an option to say ‘they are about equal’.
- Parameters
slate (TrajectorySet or List[Trajectory]) – The set of trajectories that will be presented to the user.
- K¶
The number of trajectories in the query. It is always equal to 2 and kept for consistency with PreferenceQuery and FullRankingQuery.
- Type
int
- response_set¶
The set of possible responses to the query, which is always equal to [-1, 0, 1] where -1 represents the About Equal option.
- Type
numpy.array
- Raises
AssertionError – if slate does not have exactly 2 trajectories.
- property slate: aprel.basics.trajectory.TrajectorySet¶
Returns a TrajectorySet of the trajectories in the query.
- visualize(delay: float = 0.0) → int¶
Visualizes the query and interactively asks for a response.
- Parameters
delay (float) – The waiting time between each trajectory visualization in seconds.
- Returns
The response of the user.
- Return type
int
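To make the three-way response set concrete, here is a sketch of one common weak-comparison choice model, in which a perceivable-difference threshold delta widens the "About Equal" region. Treat it as an assumption rather than the library's confirmed formula: the function name is hypothetical, the rationality coefficient is omitted, and mapping responses 0 and 1 to the first and second trajectory is assumed from the [-1, 0, 1] response set.

```python
import math

def weak_comparison_probs_sketch(reward_0, reward_1, delta=1.0):
    """Response probabilities for a two-trajectory weak comparison (illustrative only)."""
    diff = reward_0 - reward_1
    # Probability of strictly preferring each trajectory shrinks as delta grows.
    p0 = 1.0 / (1.0 + math.exp(delta - diff))
    p1 = 1.0 / (1.0 + math.exp(delta + diff))
    # The remaining probability mass goes to the About Equal response (-1).
    p_equal = (math.exp(2.0 * delta) - 1.0) * p0 * p1
    return {-1: p_equal, 0: p0, 1: p1}

probs = weak_comparison_probs_sketch(reward_0=1.5, reward_1=0.5, delta=1.0)
```

With delta set to 0 the About Equal probability vanishes and the model reduces to an ordinary two-option softmax comparison.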
- aprel.learning.data_types.isinteger(input: str) → bool¶
Returns whether input is an integer.
- Note
This function returns False if input is a string of a float, e.g., ‘3.0’.
- TODO
Should this go to utils?
- Parameters
input (str) – The string to be checked for being an integer.
- Returns
True if the input is an integer, False otherwise.
- Return type
bool
- Raises
AssertionError – if the input is not a string.
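The documented behavior can be approximated with a small helper like the one below. This is a sketch under the stated contract only (string input, False for float-like strings such as '3.0'); the name `isinteger_sketch` is hypothetical and the library's actual implementation may handle edge cases differently.

```python
def isinteger_sketch(text):
    """Return True if text parses as an integer, False otherwise (illustrative only)."""
    assert isinstance(text, str), "input must be a string"
    try:
        int(text)  # int() rejects float-like strings such as '3.0'
        return True
    except ValueError:
        return False

checks = [isinteger_sketch("3"), isinteger_sketch("-2"), isinteger_sketch("3.0")]
```

A check of this kind is useful when validating keyboard input during interactive querying, where the user is expected to type a response index.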
aprel.learning.user_models module¶
Modules for user response models, including human users.
- class aprel.learning.user_models.HumanUser(delay: float = 0.0)¶
Bases:
aprel.learning.user_models.User
Human user class whose response model is unknown. This class is useful for interactive runs, where a real human responds to the queries rather than simulated user models.
- Parameters
delay (float) – The waiting time between each trajectory visualization during querying in seconds.
- delay¶
The waiting time between each trajectory visualization during querying in seconds.
- Type
float
- respond(queries: Union[aprel.learning.data_types.Query, List[aprel.learning.data_types.Query]]) → List¶
Interactively asks for the user’s responses to the given queries.
- Parameters
queries (Query or List[Query]) – A query or a list of queries for which the user’s response(s) is/are requested.
- Returns
A list of user responses where each response corresponds to the respective query in queries.
- Note
The return type is always a list, even if the input is a single query.
- Return type
List
- class aprel.learning.user_models.SoftmaxUser(params_dict: Dict)¶
Bases:
aprel.learning.user_models.User
Softmax user class whose response model follows the softmax choice rule, i.e., when presented with multiple trajectories, this user chooses each trajectory with a probability that is proportional to the exponential of the reward of that trajectory.
- Parameters
params_dict (Dict) – The parameters of the softmax user model, which are:
weights (numpy.array): The weights of the linear reward function.
beta (float): The rationality coefficient for comparisons and rankings.
beta_D (float): The rationality coefficient for demonstrations.
delta (float): The perceivable difference parameter for weak comparison queries.
- Raises
AssertionError – if a weights parameter is not provided in the params_dict.
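The softmax choice rule for a preference query can be sketched as follows, assuming a linear reward (the dot product of weights and trajectory features) scaled by the rationality coefficient beta. The function name and the plain-list inputs are illustrative assumptions; the library operates on Trajectory objects and numpy arrays.

```python
import math

def softmax_choice_probs_sketch(weights, slate_features, beta=1.0):
    """Choice probabilities proportional to exp(beta * reward) (illustrative only)."""
    # Linear reward: dot product of the weights with each trajectory's features.
    rewards = [sum(w * f for w, f in zip(weights, feats)) for feats in slate_features]
    scaled = [beta * r for r in rewards]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [math.exp(s - log_norm) for s in scaled]

# Three hypothetical trajectories with two features each; rewards are 1.0, -0.5 and 0.25.
probs = softmax_choice_probs_sketch([1.0, -0.5], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

A larger beta makes the simulated user more deterministic, while beta close to 0 makes every trajectory almost equally likely to be chosen.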
- loglikelihood(data: aprel.learning.data_types.QueryWithResponse) → float¶
Overrides the parent’s method. See User for more information.
- Note
The loglikelihood value is the logarithm of the unnormalized likelihood if the input is a demonstration. Otherwise, it is the exact loglikelihood.
- response_logprobabilities(query: aprel.learning.data_types.Query) → numpy.array¶
Overrides the parent’s method. See User for more information.
- reward(trajectories: Union[aprel.basics.trajectory.Trajectory, aprel.basics.trajectory.TrajectorySet]) → Union[float, numpy.array]¶
Returns the reward of a trajectory or a set of trajectories conditioned on the user.
- Parameters
trajectories (Trajectory or TrajectorySet) – The trajectories for which the reward will be calculated.
- Returns
The reward value of the trajectories conditioned on the user.
- Return type
numpy.array or float
- class aprel.learning.user_models.User(params_dict: Optional[Dict] = None)¶
Bases:
object
An abstract class to model the user whose reward function is being learned.
- Parameters
params_dict (Dict) – parameters of the user model.
- copy()¶
- likelihood(data: aprel.learning.data_types.QueryWithResponse) → float¶
Returns the likelihood of the given user feedback under the user.
- Parameters
data (QueryWithResponse) – The data (which keeps a query and a response) for which the likelihood is going to be calculated.
- Returns
The likelihood of data under the user.
- Return type
float
- likelihood_dataset(dataset: List[aprel.learning.data_types.QueryWithResponse]) → float¶
Returns the likelihood of the given feedback dataset under the user.
- Parameters
dataset (List[QueryWithResponse]) – The dataset (which keeps a list of feedbacks) for which the likelihood is going to be calculated.
- Returns
The likelihood of dataset under the user.
- Return type
float
- loglikelihood(data: aprel.learning.data_types.QueryWithResponse) → float¶
Returns the loglikelihood of the given user feedback under the user.
- Parameters
data (QueryWithResponse) – The data (which keeps a query and a response) for which the loglikelihood is going to be calculated.
- Returns
The loglikelihood of data under the user.
- Return type
float
- loglikelihood_dataset(dataset: List[aprel.learning.data_types.QueryWithResponse]) → float¶
Returns the loglikelihood of the given feedback dataset under the user.
- Parameters
dataset (List[QueryWithResponse]) – The dataset (which keeps a list of feedbacks) for which the loglikelihood is going to be calculated.
- Returns
The loglikelihood of dataset under the user.
- Return type
float
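If the feedback items are assumed to be conditionally independent given the user (an assumption made for this sketch, not something the documentation states), the dataset loglikelihood is the sum of the per-item loglikelihoods, and the dataset likelihood is its exponential. The helper and the toy likelihood table below are hypothetical.

```python
import math

def loglikelihood_dataset_sketch(loglikelihood, dataset):
    """Sum per-item loglikelihoods over a dataset (illustrative only)."""
    return sum(loglikelihood(item) for item in dataset)

# Hypothetical per-item likelihoods standing in for a user model.
toy_likelihoods = {"a": 0.5, "b": 0.25}

def toy_loglikelihood(item):
    return math.log(toy_likelihoods[item])

total_loglik = loglikelihood_dataset_sketch(toy_loglikelihood, ["a", "b", "a"])
total_lik = math.exp(total_loglik)  # equals 0.5 * 0.25 * 0.5 up to rounding
```

Working in log space avoids numerical underflow when the dataset is large, since the product of many small likelihoods quickly approaches zero.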
- property params¶
Returns the parameters of the user.
- respond(queries: Union[aprel.learning.data_types.Query, List[aprel.learning.data_types.Query]]) → List¶
Simulates the user’s responses to the given queries.
- Parameters
queries (Query or List[Query]) – A query or a list of queries for which the user’s response(s) is/are requested.
- Returns
A list of user responses where each response corresponds to the respective query in queries.
- Note
The return type is always a list, even if the input is a single query.
- Return type
List
- response_logprobabilities(query: aprel.learning.data_types.Query) → numpy.array¶
Returns the log probability for each response in the response set for the query under the user.
- Parameters
query (Query) – The query for which the log-probabilities are being calculated.
- Returns
An array, where each entry is the log-probability of the corresponding response in the query’s response set.
- Return type
numpy.array
- response_probabilities(query: aprel.learning.data_types.Query) → numpy.array¶
Returns the probability for each response in the response set for the query under the user.
- Parameters
query (Query) – The query for which the probabilities are being calculated.
- Returns
An array, where each entry is the probability of the corresponding response in the query’s response set.
- Return type
numpy.array