# Conceptual Background

Potential Fields

Potential fields as applied to robot navigation are an alternative to older, planning-intensive conceptions of navigation. Rather than directing motion with rules (e.g. "if the goal is to the right, then the right motor gets 10 cm/sec of rotation"), or using a map to directly determine motor commands, potential fields navigation is a fast, continuous method of directing motion. A potential field is composed of vectors. The motor commands of the robot at any position in a potential field correspond to the vector at which the robot is situated. Goals attract, and thus will have vectors pointing towards them; obstacles repulse, and will be surrounded by vectors pointing away. Forces can be of constant magnitude at every point in the field, or can operate on a gradient. Many potential fields implementations have goals exert a constant force across the entire field, while obstacles exert greater outward force the nearer a position is to the obstacle. The vector at any given position is computed by summing all the forces exerted at that position. In a simple world with one goal and a single obstacle, the vectors will generally point directly towards the goal except in the area around the obstacle. A robot would move through this world towards the goal by following vectors primarily influenced by the goal position. If the robot encountered the obstacle, it would be pushed away by vectors primarily influenced by the obstacle, until it moved around the obstacle and could again move towards the goal (Dudek and Jenkin 138-140).
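The force summation described above can be sketched in a few lines of Python. The gains, the obstacle's radius of influence, and all function names here are illustrative choices, not part of any standard implementation: the goal exerts a constant-magnitude pull, and each obstacle pushes outward with strength growing as the position nears it.

```python
import math

def attract(pos, goal, gain=1.0):
    # Constant-magnitude pull toward the goal, regardless of distance.
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    d = math.hypot(dx, dy)
    if d == 0:
        return (0.0, 0.0)
    return (gain * dx / d, gain * dy / d)

def repulse(pos, obstacle, influence=2.0, gain=1.0):
    # Push away from the obstacle, stronger the nearer the position is;
    # zero outside the obstacle's radius of influence.
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    d = math.hypot(dx, dy)
    if d == 0 or d >= influence:
        return (0.0, 0.0)
    strength = gain * (influence - d) / influence
    return (strength * dx / d, strength * dy / d)

def field_vector(pos, goal, obstacles):
    # The commanded motion vector is the sum of all forces at this position.
    vx, vy = attract(pos, goal)
    for obs in obstacles:
        rx, ry = repulse(pos, obs)
        vx, vy = vx + rx, vy + ry
    return (vx, vy)
```

Far from any obstacle the summed vector points straight at the goal; near an obstacle the repulsive term bends it away.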

It may seem at first that this method could be slow and require substantial internal state, as it appears that vectors must be computed based on a map of the entire region. But a potential fields method can be easily adapted to a reactive system. There is no real need to factor in forces outside of the robot's perceptual range, though true potential fields require computing vectors based on the entire field (Dudek and Jenkin 141). A goal outside the perceptual range may need to be maintained, but obstacles that the robot cannot sense do not necessarily need to affect the motion vectors. Thus the vector accompanying the current position of the robot can be computed using only the sensory information from a single time step. This makes the method reactive, in that it requires no internal state, and allows the vector to be computed quite quickly. In the reactive formulation of the potential fields method, no full potential field is ever computed, only the vector at the robot's actual position.
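A reactive formulation might look like the following sketch, where the single motion vector is built only from the current time step's (bearing, range) sonar readings; the function name, the maximum sonar range, and the gain are assumptions for illustration.

```python
import math

def reactive_vector(goal_bearing, sonar, max_range=3.0, gain=1.0):
    # Only the current time step's readings matter: no map, no stored state.
    # goal_bearing: direction to the goal in radians (robot frame).
    # sonar: list of (bearing, range) pairs sensed this time step.
    vx = gain * math.cos(goal_bearing)
    vy = gain * math.sin(goal_bearing)
    for bearing, rng in sonar:
        if rng < max_range:
            # Nearer echoes push harder, directly away from the reading.
            strength = (max_range - rng) / max_range
            vx -= strength * math.cos(bearing)
            vy -= strength * math.sin(bearing)
    return (vx, vy)
```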

Motor Schema

The potential fields method is sufficient to get good behavior when the environment consists of only simple attractive and repulsive forces, but it is not designed to support more complicated and situationally dependent forces. The motor schema method is an attempt to create a more general, behavior-based conception of the potential fields method. A motor schema consists of an activation portion that examines the sensory data, and an effector portion that computes an action vector based on those stimuli. The effector portion gives an action vector based on a function that corresponds to the particular goal of the motor schema (Arkin 141-4). For instance, an avoid-obstacle schema might examine sonar data for especially low readings, and give an action response that guides the robot away from the obstacle. Each motor schema will examine a part of the sensor data, and respond in an appropriate manner. Generally, a motor schema will have a continuous response, meaning that there can be an infinite number of possible action vectors for each schema (Arkin 142). In the avoid-obstacle schema above, an obstacle 2 meters away might result in a low-magnitude vector pushing away from the obstacle, while an obstacle 30 centimeters away might result in a much stronger push, following a continuous range. A robot operating on a motor schema based system may have many schemas, each of which examines a part of the sensory data and gives an action vector. In this way sensory information from a variety of different modalities, such as vision, IR, sonar, and bump sensors, can be integrated into motor command decisions. A vision-based navigation schema can examine camera data in its activation portion, and give a motor vector based on the perceived distance to the nearest obstacle, while the sonar data can filter into motor commands in a similar fashion. The behavior of the robot is computed by summing all of the action vectors together and acting according to the result. Thus all the schemas are blended together in the end (Arkin 150).
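As a rough sketch of the blending, assume a hypothetical percept dictionary and two toy schemas; each schema pairs its activation test with a continuous effector, and the robot's action is the vector sum. The schema names, the percept keys, and the "safe distance" threshold are all invented for the example.

```python
import math

def avoid_obstacle_schema(percept):
    # Activation: look for low sonar readings; effector: push away from
    # the nearest reading, with magnitude growing continuously as it nears.
    readings = percept.get("sonar", [])
    if not readings:
        return (0.0, 0.0)
    bearing, rng = min(readings, key=lambda r: r[1])
    safe = 2.0  # hypothetical distance beyond which the schema is silent
    if rng >= safe:
        return (0.0, 0.0)
    strength = (safe - rng) / safe
    return (-strength * math.cos(bearing), -strength * math.sin(bearing))

def move_to_goal_schema(percept):
    # Effector: unit vector toward the goal bearing.
    b = percept["goal_bearing"]
    return (math.cos(b), math.sin(b))

SCHEMAS = [avoid_obstacle_schema, move_to_goal_schema]

def blended_action(percept):
    # All schemas run on the same percept; their action vectors are summed.
    vectors = [s(percept) for s in SCHEMAS]
    return (sum(v[0] for v in vectors), sum(v[1] for v in vectors))
```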

If a single set of schemas does not seem sufficient to get the desired behavior, motor schemas can be grouped together into true behaviors, where a behavior consists of a number of different schemas. A wall-following behavior could consist of a goal-push schema that would push the robot beyond the wall, an obstacle-avoidance schema that would push the robot away from the wall, and a schema to turn the robot in one direction or the other when encountering a wall. In a true behavior-based system there would be many such behaviors, each consisting of a variety of schemas. A single behavior would be in effect at any one time, and the decision as to which behavior should be functioning would be made by a behavior sequencer (Arkin 162).

Motor Schema Versus Subsumption

Comparison with the more widely known subsumption architecture can highlight some of the distinctive aspects of a motor schema architecture. The subsumptive style of behavior-based architecture was pioneered by Rodney Brooks in an attempt to find a fast, low- or no-state architecture for robot control. Older AI methods largely consisted of a sense-plan-act style of control, which was slow and often could not adapt to rapidly changing and inhospitable real-world environments. Subsumption architectures consist of a behavioral hierarchy, with each behavior running simultaneously and in parallel. As in a motor schema approach, each behavior has independent access to the sensory data, and uses that data to formulate an action to be taken, or it may recommend no action at all. But unlike the motor schema approach, a subsumption-style architecture creates a hierarchy of behaviors, consisting of low-level behaviors that have no knowledge of the higher-level behaviors. Coordination of behaviors occurs according to the priority hierarchy, in a winner-take-all approach. Thus if a more complex, high-level behavior decides to act, it suppresses and subsumes the lower-level behaviors, and the motor values at that time step will be those recommended by the highest-priority active layer. In the motor schema approach the action outputs of each of the schemas are blended together, whereas in subsumption one behavior controls the motor outputs at each time step. In a subsumption architecture the robot is doing one thing at a time; in a motor schema system the schemas are thoroughly blended (Arkin 139-142).
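The winner-take-all arbitration can be contrasted with the blending above in a minimal sketch; the two behaviors, their ordering, and the 1-meter threshold are invented for the example, and `None` stands for a layer declining to recommend an action.

```python
def wander(percept):
    # Lowest level: always active, drifts forward.
    return (0.5, 0.0)

def avoid(percept):
    # Higher level: only speaks up when a sonar echo is close.
    ranges = [rng for _, rng in percept.get("sonar", [])]
    if ranges and min(ranges) < 1.0:
        return (-0.5, 0.0)  # back away
    return None  # no recommendation; defer to lower layers

# Highest priority first; the first layer that recommends an action
# suppresses everything below it (winner-take-all).
LAYERS = [avoid, wander]

def subsumption_step(percept):
    for behavior in LAYERS:
        action = behavior(percept)
        if action is not None:
            return action
    return (0.0, 0.0)
```

Unlike `blended_action` in the motor schema sketch, exactly one behavior's output reaches the motors at each time step.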

Layered Architectures

Motor schema can be used to create a successful reactive system, and can do impressive things operating exclusively within a reactive framework, especially when combined with an able sequencer. Yet the motor schema architecture is completely reactive, which can ultimately limit its function in important ways. Reactive architectures are fast, cheap, and durable, but many complex tasks seem impossible to solve without planning and keeping an internal state, both of which are expressly non-reactive.

A so-called hybrid architecture attempts to combine planning and reaction to harness the power of both methods. The most pertinent hybrid architectures for this paper are layered architectures that consist of a fast, low-level reactive layer and a slower, high-level deliberative layer. The deliberative layer examines an often extensive internal state, creates plans, and passes the plan on to the reactive level, which then autonomously executes it. Thus the layered architecture attempts to let the reactive level do what it does best, which is negotiate with a hostile and rapidly changing environment, while also giving the capability of solving problems that seem to require planning and evaluation of an internal state (Arkin 206-7).

Many different types of layered hybrid architectures have been implemented, but the basic concepts have remained constant. Sensory information passes from the low-level sensors and updates an internal state kept by the deliberative layer. The sensory information is also used by the reactive layer to create the actual motor responses to the environment. The planning layer examines the accumulated sensory information in real time, and at any point can suggest a change in the wider goals of the system. Many layered architectures will include a middle layer that is responsible for taking the goals established by the deliberative level and translating those goals into sequences of smaller actions that the reactive system then executes. Different systems will contain more or fewer layers, and connect the layers in a variety of interesting ways, but the basic schematic remains fairly constant over all hybrid architectures (Arkin 213).
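That basic schematic might be sketched as follows, with the planner, sequencer, and reactive layer supplied as plain functions. The class name and the tick-based structure are an illustration of the data flow, not any particular published architecture.

```python
class LayeredController:
    # Deliberative layer keeps state and sets goals; a middle sequencer
    # turns goals into small steps; the reactive layer turns the current
    # percept and current step into a motor command.
    def __init__(self, planner, sequencer, reactive):
        self.planner = planner
        self.sequencer = sequencer
        self.reactive = reactive
        self.world_state = {}
        self.steps = []

    def tick(self, percept):
        # Sensor data updates the internal state kept for deliberation...
        self.world_state.update(percept)
        # ...the planner may revise the wider goal at any point...
        goal = self.planner(self.world_state)
        # ...the sequencer breaks the goal into executable steps...
        if not self.steps:
            self.steps = self.sequencer(goal)
        # ...and the reactive layer executes, seeing only the current percept.
        step = self.steps.pop(0) if self.steps else None
        return self.reactive(percept, step)
```

A usage sketch: a planner that always wants the door, a sequencer that splits that goal into "turn" then "forward", and a reactive layer that simply executes the current step.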

Mapping

A reactive robot does not keep an internal state, as reactivity means responding to the environment based only on the sensory input at that moment. The deliberative layer of a layered architecture, however, needs to accumulate data over time in order to make planning decisions. A primary method of organizing sensory data concerning the shape of a world is through a map of that world. Constructing a map requires that sensory data about the position of things in the world be collected over time, and organized into a consistent representation. There are many types of maps that can be constructed using sensory information, as the sensory information can be accumulated using a number of different methodologies. Some maps attempt to represent the space in absolute metric terms, and others try to represent it using shapes, or even by creating graphs that represent spaces and the connections between them.

Spatial Occupancy Grids

Spatial occupancy grids are an example of a model that attempts to represent the world in absolute terms without trying to identify any individual objects. A spatial occupancy grid consists of a two-dimensional grid of cells, each cell corresponding to a small region of the world. Each cell contains a value that describes its occupancy: whether or not that region of the world is occupied by an object. In the spatial occupancy representation of most interest to this paper, evidence grids, the cells each contain a probability that the cell is occupied. A cell that is almost certainly occupied will contain a high value, and one that is almost certainly empty will contain a low value. As the robot moves through the world, the sensory data that it accumulates is used to update the probabilities of the cells. Evidence grids attempt to compensate for the noise and random errors that often accompany sensors' interactions with the real world. A single free reading will not cause a cell to be forever judged empty. Only repeated, consistent readings will cause a cell's occupancy probability to change substantially, which ensures that mistaken readings and noise will not compromise the map representation. Spatial occupancy grids are useful because they don't depend on identifying any particular objects or features of the world, and take a pragmatic attitude toward the reliability of actual robot sensors. There are drawbacks to this kind of model, though. It requires substantial processing power, needs a good localization scheme (see below) for real accuracy, and is difficult to plan with given the vast amount of uninterpreted data in the model (Dudek and Jenkin 214-7).
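One common way to implement such an update, though not the only one, is to store each cell's occupancy probability in log-odds form, so that repeated readings simply add up and a single noisy reading nudges a cell only slightly. The class name and the update weight below are arbitrary choices for illustration.

```python
import math

class EvidenceGrid:
    # Each cell holds the log-odds of occupancy; 0.0 means "unknown"
    # (probability 0.5). Log-odds turn repeated Bayesian updates into
    # simple addition.
    def __init__(self, width, height):
        self.logodds = [[0.0] * width for _ in range(height)]

    def update(self, x, y, hit, weight=0.4):
        # A "hit" reading raises the cell's log-odds, a "miss" lowers it.
        self.logodds[y][x] += weight if hit else -weight

    def probability(self, x, y):
        # Convert log-odds back to an occupancy probability.
        return 1.0 / (1.0 + math.exp(-self.logodds[y][x]))
```

A single hit moves a cell only modestly from 0.5; only a run of consistent hits drives it toward certainty, matching the noise tolerance described above.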

Geometric Representations

Geometric models move away from evidence grids, as they try to identify coherent objects in the world. Geometric models attempt to fit certain primitive geometric forms to an environment. These forms may include things such as doors, walls, or streets. As the robot moves through the world, the sensory data is interpreted using a list of primitives. A primitive is designated at a certain position in the world if the sensory data suggest that an object of that type actually resides there. Thus a map is constructed of primitives, representations of objects that try to fit the sensor data. Geometric representations are a more compact form of mapping than spatial occupancy grids, as there is no need to keep and update all the individual grid cells. Because the sensor data is being examined and interpreted into primitives, planning is easier as well, as the inner representation of the world has been substantially simplified (Dudek and Jenkin 219-221).
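A crude sketch of fitting one primitive: the test below accepts a run of sensed points as a single "wall" if every point lies near the straight line through the run's endpoints. The tolerance and the fitting rule are illustrative simplifications of what a real primitive-fitting scheme would do.

```python
import math

def fits_wall(points, tolerance=0.05):
    # Try to explain a run of sensed (x, y) points with one "wall"
    # primitive: the line through the first and last point. Accept the
    # primitive only if every point lies within `tolerance` of that line.
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return False
    for x, y in points:
        # Perpendicular distance from (x, y) to the endpoint line.
        dist = abs((x - x0) * (y1 - y0) - (x1 - x0) * (y - y0)) / length
        if dist > tolerance:
            return False
    return True
```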

Unfortunately, the simplifications that make geometric mapping nicely compact introduce a host of problems. One must be extremely careful in the design of the primitives; if the world is not well described by the available primitives, the map will probably be a poor reflection of the actual world. Also, the programmer is left with the difficult task of giving instructions for interpreting sensor data into the primitives. It is difficult, for instance, to specify what exactly constitutes a wall, or a door. If one is careful, though, and designs primitives that describe the world accurately, geometric representations can be valuable.

Topological Representations

Both spatial occupancy grids and geometric mapping try to define the world in absolute terms, where a grid cell or a set amount of distance corresponds to a metric measurement. Unfortunately, sensor noise and localization issues often skew the metric data that both these methods need to remain accurate representations of a space. Topological mapping seeks to avoid the pitfalls of metric representation by instead focusing on regions or objects, and the connections between them. For robot navigation, knowing that one place is accessible from another may be more important than precise metric knowledge of an environment. Thus, a topological representation depends largely on the notion of connectedness of regions. A topological map may look like a graph, with vertices representing features of the environment, such as landmarks (discussed below), and edges representing connections between those vertices. Often, an aspect of metricality will be added to a topological representation, so that vertices will be situated relationally in space, and edges will have length and orientation proportional to their metric equivalents (Dudek and Jenkin 224-7).
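Such a graph can be held in a plain adjacency list and searched with standard techniques; the landmark names below are invented, and breadth-first search stands in for whatever search a real planner might use.

```python
from collections import deque

# A topological map as an adjacency list: vertices are landmarks,
# edges are traversable connections between them.
topo_map = {
    "hall": ["door_a", "door_b"],
    "door_a": ["hall", "lab"],
    "door_b": ["hall"],
    "lab": ["door_a"],
}

def route(graph, start, goal):
    # Breadth-first search returns a shortest sequence of landmarks,
    # or None when the goal is unreachable.
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None
```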

Topological representations are best used when connectivity of features is the most important reason for mapping the environment. If this is the primary reason for mapping, then this information can be held quite compactly by a topological representation, and can be searched easily using all the standard graph search techniques. The main difficulties that a topological representation faces involve the creation of the representation. Creating schemes for feature detection and identifying connectivity in non-ideal environments can be extremely difficult. An ideal landmark for a topological scheme is one that is static, uniquely identifiable upon repeated visits, and usable for deducing position and orientation. A final requirement is that these landmarks can be sensed by the robot fairly easily. These stringent requirements combine to make landmark detection no easy task. The most complete systems use vision and other robotic sensor systems to identify landmarks, but these systems are subject to all the normal difficulties facing robotic sensing systems. These systems also face the added difficulty of dealing with sensorially homogeneous environments like office buildings, which hamper unique feature identification, and can lead to many false positive feature recognitions. Another method of landmark creation attempts to overcome the difficulties associated with sensing landmarks in homogeneous environments by physically placing uniquely recognizable objects, thus creating landmarks that resemble the ideal ones above. Finally, an easier method to engineer for landmark identification is the doctoring of an environment with objects that are readily sensed and identified, like bright patches of color or bar codes (Dudek and Jenkin 225-7, 117).

Localization

All mapping methods must consider how the positional information of the robot is maintained, as all mapping methods depend on the consistency and accuracy of positional information. The primary method of localization on most robots is the use of dead-reckoning. Dead-reckoning uses the wheel encoders of the robot, which record how many revolutions each of the wheels has gone forward and backward. This information is used to compute a relative position of the robot. This method does not reference the outside world in any way, hence the name dead-reckoning. Unfortunately, the wheel encoders on any robot are prone to error, and these errors are compounded over time if not corrected. Thus, the robot's estimation of its own position monotonically worsens as time passes in a system that uses only dead-reckoning. A robot that cannot reference the outside world to correct its estimation of its own position will inevitably become so inaccurate that maps constructed on the faulty estimation will be virtually useless (Dudek and Jenkin 175-176).
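For a differential-drive robot, one simple form of the dead-reckoning update looks like the sketch below; the midpoint integration used here is one of several common approximations, and the function name and units are our own.

```python
import math

def dead_reckon(pose, left_dist, right_dist, wheelbase):
    # Integrate one encoder reading into the pose estimate (x, y, heading).
    # left_dist / right_dist: distance each wheel rolled this time step.
    x, y, theta = pose
    forward = (left_dist + right_dist) / 2.0
    turn = (right_dist - left_dist) / wheelbase
    # Midpoint integration: assume the heading during the step was the
    # average of old and new. Small errors here compound over time, which
    # is why pure dead-reckoning degrades without outside correction.
    x += forward * math.cos(theta + turn / 2.0)
    y += forward * math.sin(theta + turn / 2.0)
    theta += turn
    return (x, y, theta)
```

Equal wheel distances translate the robot straight ahead; opposite distances spin it in place.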

Because the accurate estimation of position is so important for mapping tasks, every robotic mapping system that needs accuracy in the long term must use some method of localization, using information from the outside world to make position estimates more accurate. Ideally, some system like the Global Positioning System could be used to determine the position of a robot to a very high degree of absolute accuracy, but unfortunately such systems do not work indoors. Many methods of localization have been implemented, but the ones of most interest for this paper involve the use of landmarks to help determine position more accurately. If consistent, static, easily detectable landmarks can be discovered or placed close to the starting position of the robot, then their positions can be recorded while dead-reckoning errors have not had time to compound. Subsequent visits to these landmarks can then be used to localize the robot. If vision is being used to detect landmarks, then depth and orientation information can sometimes be deduced from landmarks that are a substantial distance from the robot. If two or more landmarks are detected simultaneously, then triangulation methods can be used to determine the position of the robot, if not the orientation (Dudek and Jenkin 176-183).
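A sketch of the distance-based case, often called trilateration: intersecting the two circles of measured range around two known landmarks yields the robot's position up to a two-way ambiguity, which must be broken by other information (a third landmark, or the dead-reckoned estimate). The geometry is standard circle intersection; the function name is illustrative.

```python
import math

def trilaterate(lm1, d1, lm2, d2):
    # lm1, lm2: known (x, y) landmark positions; d1, d2: measured ranges.
    # Returns both candidate robot positions, or None if the circles
    # do not intersect (inconsistent or too-noisy measurements).
    (x1, y1), (x2, y2) = lm1, lm2
    base = math.hypot(x2 - x1, y2 - y1)
    if base == 0 or base > d1 + d2 or base < abs(d1 - d2):
        return None
    # Distance along the landmark-landmark line to the chord midpoint.
    a = (d1**2 - d2**2 + base**2) / (2 * base)
    h = math.sqrt(max(d1**2 - a**2, 0.0))
    mx = x1 + a * (x2 - x1) / base
    my = y1 + a * (y2 - y1) / base
    # The two intersections sit symmetrically off that midpoint.
    off_x = h * (y2 - y1) / base
    off_y = h * (x2 - x1) / base
    return ((mx + off_x, my - off_y), (mx - off_x, my + off_y))
```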

All material on these pages was created by Gil Jones, Swarthmore College, January 2001