General constructor.
The single argument specifies the name (with path if not in the current directory) of the text file with the structural parameters of the CMAC, such as the number of tilings and the number of tiles along each dimension for each tiling. It is assumed that the static variable State::dimensionality has been set to the appropriate value before the call to this constructor. The text file must have the following format (example for two state variables):
TilingsNumber = 3
Number of tiles along each dimension:
Tiling 0
25 25
Tiling 1
25 25
Tiling 2
25 25
The lines following Tiling n list the number of tiles along each dimension for the corresponding nth tiling.
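To make the file layout concrete, here is a minimal sketch of how such a file could be read. The function name `readTilingLayout` and its `dims` parameter (standing in for State::dimensionality) are hypothetical and not part of the CMAC class; the actual constructor may parse the file differently.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of the documented CMAC class.
// Returns tiles[t][d]: the number of tiles of tiling t along dimension d.
// `dims` plays the role of State::dimensionality.
std::vector<std::vector<int>> readTilingLayout(std::istream& in, int dims) {
    assert(dims > 0);
    std::string line;

    // First line: "TilingsNumber = <n>"
    if (!std::getline(in, line)) throw std::runtime_error("empty input");
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos) throw std::runtime_error("missing '='");
    const int tilings = std::stoi(line.substr(eq + 1));

    // Second line is the caption "Number of tiles along each dimension:"
    if (!std::getline(in, line)) throw std::runtime_error("missing caption");

    std::vector<std::vector<int>> tiles;
    for (int t = 0; t < tilings; ++t) {
        // A "Tiling <t>" header, then one line holding `dims` integers.
        if (!std::getline(in, line)) throw std::runtime_error("missing tiling header");
        if (!std::getline(in, line)) throw std::runtime_error("missing tile counts");
        std::istringstream row(line);
        std::vector<int> counts(dims);
        for (int d = 0; d < dims; ++d) {
            if (!(row >> counts[d])) throw std::runtime_error("too few tile counts");
        }
        tiles.push_back(counts);
    }
    return tiles;
}
```

Passing a `std::ifstream` opened on the parameter file, or a `std::istringstream` holding the example text, would yield three rows of two tile counts each.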
Sets bounds on the input variables. The argument left is an array of lower bounds and right is an array of upper bounds, with one entry per input variable.
Usage: CMAC::setInputBounds(left, right)
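As an illustration of why the bounds matter, the sketch below maps one input variable to a tile index within a single tiling. It assumes uniform tile widths, no offset between tilings, and clamping of out-of-range values; none of this is stated in the documentation, and `tileIndex` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper: maps x in [left, right] to a tile index in
// [0, tiles - 1] for one unshifted tiling along one dimension.
// Values outside the bounds are clamped to the first or last tile.
int tileIndex(double x, double left, double right, int tiles) {
    assert(right > left && tiles > 0);
    if (x <= left) return 0;
    if (x >= right) return tiles - 1;
    const double width = (right - left) / tiles;
    const int idx = static_cast<int>((x - left) / width);
    return idx < tiles ? idx : tiles - 1;  // guard against rounding at the edge
}
```

With bounds of -1 and 1 and 25 tiles, an input of 0 falls in the middle tile (index 12).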
static void deleteInputBounds()
Deallocates memory used for bounds on input variables.
Usage: CMAC::deleteInputBounds()
int getSize()
Implements the corresponding pure virtual function of the base class Approximator. Returns the total number of tunable parameters in this CMAC architecture.
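For intuition about the size, the following sketch counts parameters under the assumption that every tile has exactly one weight and that no hashing is used. The documentation does not confirm this, and `countTiles` is a hypothetical helper rather than the implementation of getSize.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: tiles[t][d] is the number of tiles of tiling t along
// dimension d. Assumes one weight per tile and no hashing.
long countTiles(const std::vector<std::vector<int>>& tiles) {
    long total = 0;
    for (const std::vector<int>& tiling : tiles) {
        long perTiling = 1;
        for (int n : tiling) {
            assert(n > 0);
            perTiling *= n;  // tiles in this tiling = product over dimensions
        }
        total += perTiling;
    }
    return total;
}
```

Under that assumption, three tilings of 25 x 25 tiles give 3 * 625 = 1875 parameters.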
void predict(const State& s, double& output)
Implements the corresponding pure virtual function of the base class Approximator. Predicts an output value for a given input s.
void learn(const State& s, const double target)
Implements the corresponding pure virtual function of the base class Approximator. Learns an input-output pair, where the input is state s and the target value is passed in the target argument. The update to each tunable parameter is found by gradient descent on the mean squared error criterion and is then multiplied by the learning rate and the eligibility trace of that parameter. The learning rate is updated automatically according to the schedule specified in the arguments to the setLearningParameters function. By default, all eligibility traces are initialized to 1 and stay at 1 unless the user code changes them explicitly with the trace functions described below. If a learning algorithm with eligibility traces is desired (e.g. replacing or accumulating traces), the traces have to be updated separately with those functions.
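The update rule can be sketched for a linear tile coder, where the prediction is the sum of the weights of the active tiles. This is an illustration under that assumption, not the library's code; the struct `TileWeights` and the functions `predictValue` and `learnStep` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the approximator's internal state.
struct TileWeights {
    std::vector<double> weight;  // one tunable parameter per tile
    std::vector<double> trace;   // eligibility trace per parameter (default 1)
    std::vector<double> alpha;   // learning rate per parameter
};

// Prediction of a linear tile coder: sum of the weights of the active tiles.
double predictValue(const TileWeights& m, const std::vector<std::size_t>& active) {
    double sum = 0.0;
    for (std::size_t i : active) {
        assert(i < m.weight.size());
        sum += m.weight[i];
    }
    return sum;
}

// One gradient-descent step on 0.5 * (target - prediction)^2.
// The gradient of the prediction w.r.t. an active weight is 1, so the raw
// step is the prediction error, scaled by learning rate and trace.
void learnStep(TileWeights& m, const std::vector<std::size_t>& active, double target) {
    const double error = target - predictValue(m, active);
    for (std::size_t i : active) {
        m.weight[i] += m.alpha[i] * m.trace[i] * error;
    }
}
```

With all traces left at 1, this reduces to plain gradient descent on the active weights only.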
Implements the corresponding pure virtual function of the base class Approximator. Computes the gradient vector with respect to the architecture parameters at input s and the current parameter values. The computed gradient values are returned in the argument GradientVector. The user code has to make sure that this array has the correct size, namely the number of tunable parameters in this architecture. This number can be obtained with the getSize function.
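The caller-allocates convention can be illustrated as follows. The sketch assumes a linear tile coder, whose output gradient is 1 for each active weight and 0 elsewhere; `fillGradient` is a hypothetical function, not the member documented above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: fills a caller-allocated buffer of length `size`
// with the gradient of a linear tile coder's output w.r.t. its weights.
void fillGradient(const std::vector<std::size_t>& active,
                  double* gradient, std::size_t size) {
    assert(gradient != nullptr);
    for (std::size_t i = 0; i < size; ++i) gradient[i] = 0.0;
    for (std::size_t i : active) {
        assert(i < size);   // the buffer must cover every parameter
        gradient[i] = 1.0;  // output is the sum of the active weights
    }
}
```

The caller would size the buffer from the parameter count first, for example with a `std::vector<double>` of that length, and pass its `data()` pointer.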
void updateParameters(double* delta)
Implements the corresponding pure virtual function of the base class Approximator. Increases each tunable parameter by the corresponding amount in the delta array, multiplied by the learning rate and eligibility trace of that parameter.
Implements the corresponding pure virtual function of the base class Approximator. Sets the eligibility traces of the tunable parameters activated by input state s to the value replace.
void decayTraces(double factor)
Implements the corresponding pure virtual function of the base class Approximator. Multiplies all eligibility traces by factor.
Implements the corresponding pure virtual function of the base class Approximator. Increases the traces of the parameters activated by input state s by amount.
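The three trace operations (replace, decay, accumulate) can be sketched on a plain vector of traces. The free functions below are hypothetical illustrations that take the indices of the active parameters directly; they are not the member functions of the class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Replacing traces: active parameters get their trace set to `value`.
void replaceActive(std::vector<double>& trace,
                   const std::vector<std::size_t>& active, double value) {
    for (std::size_t i : active) {
        assert(i < trace.size());
        trace[i] = value;
    }
}

// Accumulating traces: active parameters get their trace increased by `amount`.
void accumulateActive(std::vector<double>& trace,
                      const std::vector<std::size_t>& active, double amount) {
    for (std::size_t i : active) {
        assert(i < trace.size());
        trace[i] += amount;
    }
}

// Decay: every trace, active or not, is multiplied by `factor`.
void decayAll(std::vector<double>& trace, double factor) {
    for (double& e : trace) e *= factor;
}
```

A typical per-step sequence in an algorithm with eligibility traces is to decay all traces and then either replace or accumulate the traces of the currently active parameters.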
Implements the corresponding pure virtual function of the base class Approximator. Loads the parameters of the architecture from a text file. The parameters to this function have a command-line-like format, where argc is the number of arguments supplied in the array argv. In this case one argument is expected (argc=1), and argv[0] must be the name of the file from which the parameters are to be read.
Implements the corresponding pure virtual function of the base class Approximator. Saves the parameters of the architecture to a text file. The parameters to this function have a command-line-like format, where argc is the number of arguments supplied in the array argv. In this case one argument is expected (argc=1), and argv[0] must be the name of the file to which the parameters are to be written.
Implements the corresponding pure virtual function of the base class Approximator. Sets the learning parameters. The parameters to this function have a command-line-like format, where argc is the number of arguments supplied in the array argv.
Format and meaning of the parameters passed in "argv":
schedule=value : schedule for the learning rates. Acceptable values are: constant, decrease and visitation.
alpha=value : initial value for learning rates (in (0,1])
f=value : frequency of decreasing learning rates with decrease schedule (in terms of the number of seen training examples).
d=value : factor (>1) by which learning rates should be decreased with decrease schedule.
v=value : constant used in the visitation schedule for the learning rate decrease: v/(v+number of visits to the parameter).
decay=value : decay factor for eligibility traces (usually set directly by RL agent functions, since it depends on the discount factor gamma).
These parameters may be specified as command-line arguments to main(int argc, char *argv[]) and then passed directly to this function.
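The key=value format and the two decreasing schedules can be sketched as below. All three functions are hypothetical. In particular, dividing the rate by d once every f examples is only one reading of the decrease schedule, and the documentation does not say whether the visitation factor is multiplied by alpha.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: splits "key=value" arguments into a map.
// Entries without '=' are ignored.
std::map<std::string, std::string> parseKeyValues(int argc, const char* const argv[]) {
    std::map<std::string, std::string> out;
    for (int i = 0; i < argc; ++i) {
        const std::string arg(argv[i]);
        const std::string::size_type eq = arg.find('=');
        if (eq == std::string::npos) continue;
        out[arg.substr(0, eq)] = arg.substr(eq + 1);
    }
    return out;
}

// Visitation schedule factor as given above: v / (v + visits).
double visitationFactor(double v, long visits) {
    assert(v > 0.0 && visits >= 0);
    return v / (v + static_cast<double>(visits));
}

// Assumed reading of the decrease schedule: divide alpha by d once for
// every f training examples seen so far.
double decreaseRate(double alpha, double d, long f, long examplesSeen) {
    assert(d > 1.0 && f > 0 && examplesSeen >= 0);
    double rate = alpha;
    for (long k = 0; k < examplesSeen / f; ++k) rate /= d;
    return rate;
}
```

For example, the arguments `schedule=decrease alpha=0.5 f=100 d=2` would, under this reading, leave a rate of 0.125 after 250 training examples.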
static void helpLearningParameters()
Print out the format for command-line specification of the learning parameters.
Usage: CMAC::helpLearningParameters();