Potential Fields
Potential fields as applied to robot navigation are an alternative to
older, planning-intensive conceptions of navigation. Rather than
directing motion with rules (i.e. "If the goal is to the right, then
right motor gets 10 cm/sec right rotation"), or using a map to
directly determine motor commands, potential fields navigation is a fast,
continuous method of directing motion. A potential field is
composed of vectors. The motor commands of the robot at any position
in a potential field
correspond to the vector on which the robot is situated. Goals
attract, and thus the goals will have vectors pointing towards them;
obstacles repulse, and will be surrounded by vectors pointing away.
Forces can be of constant magnitude on every vector in a system, or operate on a
gradient. Many potential fields implementations have the goal exert a
constant force across the entire field, while obstacles exert
greater outward force the nearer the vector is to the obstacle. The
vector at any given position is computed by summing all the forces
exerted on that vector. In a simple world with one goal and a single
obstacle, the vectors will generally point directly towards the goal
except in the area around the obstacle. A robot would move through
this world towards the goal by following vectors primarily influenced
by the goal position. If the robot encountered an obstacle, it would
be pushed away from the obstacle by vectors primarily influenced by
the obstacle, until it moved around the obstacle and could again move
towards the goal (Dudek and Jenkin 138-140).
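The force summation described above can be sketched in a few lines of Python. This is a minimal, illustrative model, not an implementation from the text: the gains, the obstacle radius of influence, and the function names are all assumptions, but the structure follows the description of a constant goal pull plus gradient obstacle repulsion.

```python
import math

def field_vector(pos, goal, obstacles,
                 goal_gain=1.0, obstacle_gain=0.5, influence=2.0):
    """Sum the forces acting at `pos` (all parameter names illustrative).

    The goal exerts a constant-magnitude pull; each obstacle pushes
    outward with magnitude growing as the robot nears it, and exerts
    no force beyond its radius of influence.
    """
    # Constant-magnitude attraction toward the goal.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    d = math.hypot(gx, gy)
    vx, vy = (goal_gain * gx / d, goal_gain * gy / d) if d > 0 else (0.0, 0.0)

    # Gradient repulsion: stronger the nearer the obstacle.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0 < d < influence:
            push = obstacle_gain * (influence - d) / influence
            vx += push * dx / d
            vy += push * dy / d
    return vx, vy
```

Far from any obstacle the result is a unit vector toward the goal; near an obstacle the repulsive term bends the vector away, matching the simple one-goal, one-obstacle world above.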
It may seem at first that this method could be slow and require
substantial internal state, as it appears that vectors must
be computed based on a map of the entire region. But a potential
fields method can easily be adapted to a reactive system.
There is no real need to factor in forces outside of the robot's
perceptual range, though true potential fields require computing vectors
based on the entire field (Dudek and Jenkin 141). A goal outside the perceptual
range may need to be
maintained, but obstacles that the robot cannot sense do not
necessarily need to affect the motion vectors. Thus the vector accompanying the
current position of the robot can be computed using only the sensory
information from a single time step. This makes this method reactive,
in that it requires no internal state, and allows the potential field
to be computed quite quickly. In the reactive formulation of the
potential fields method, no real potential field is computed, only the
vector at the robot's actual position.
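The reactive formulation can be sketched as well: only the vector at the robot's own position is computed, from a maintained goal bearing and whatever the sensors report this time step. The sonar format (bearing, range pairs), the range limit, and the gain are illustrative assumptions.

```python
import math

def reactive_vector(goal_bearing, sonar_hits, max_range=3.0, repulse_gain=0.8):
    """One time step of a reactive potential field (a sketch).

    `goal_bearing` is the remembered direction to the goal (radians);
    `sonar_hits` are (bearing, range) pairs for currently sensed
    obstacles. Obstacles outside sensor range simply never appear.
    """
    # Unit attraction along the goal bearing.
    vx, vy = math.cos(goal_bearing), math.sin(goal_bearing)
    for bearing, rng in sonar_hits:
        if rng < max_range:  # ignore anything beyond perceptual range
            push = repulse_gain * (max_range - rng) / max_range
            vx -= push * math.cos(bearing)
            vy -= push * math.sin(bearing)
    return vx, vy
```

No field is ever stored; each call uses only the current scan, which is what makes the method reactive.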
Motor Schema
The potential fields method is sufficient to get good behavior when
the environment consists of only simple attractive and repulsive forces,
but is not designed to support more complicated and
situationally-dependent forces. The motor
schema method is an attempt to create a more general, behavior-based,
conception of the potential fields method. A motor schema consists of
an activation portion that examines the sensory data, and an effector
portion that computes an action vector based on those stimuli. The
effector portion gives an action vector based on a function that
corresponds to the particular goal of the motor scheme (Arkin 141-4). For instance,
an avoid-obstacle scheme might examine sonar data for especially low
sonar readings, and give an action response that guides the robot away
from the obstacle. Each motor scheme will examine a part of the
sensor data, and respond in an appropriate manner. Generally, a motor
schema will have a continuous response, meaning that there can be an
infinite number of action vectors for each motor scheme (Arkin 142). In the
avoid-obstacle scheme above, an obstacle 2 meters away might result in
a low magnitude vector pushing away from the obstacle, and an obstacle
30 centimeters away might result in a much stronger push, following a
continuous range.
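The avoid-obstacle schema just described might look like the following sketch. The sonar data format (bearing mapped to distance in centimeters), the 2-meter danger threshold, and the gain are assumptions made for illustration.

```python
import math

def avoid_obstacle_schema(sonar_cm, danger_cm=200.0, gain=1.0):
    """Illustrative avoid-obstacle motor schema.

    Activation portion: scan the readings (bearing -> distance, cm)
    for anything closer than `danger_cm`.
    Effector portion: return a vector pointing away from the nearest
    threat, with magnitude growing continuously as distance shrinks.
    """
    threats = {b: d for b, d in sonar_cm.items() if d < danger_cm}
    if not threats:
        return (0.0, 0.0)  # schema contributes nothing this step
    bearing, dist = min(threats.items(), key=lambda kv: kv[1])
    magnitude = gain * (danger_cm - dist) / danger_cm
    return (-magnitude * math.cos(bearing), -magnitude * math.sin(bearing))
```

An obstacle 30 cm dead ahead yields a much stronger push than one at 190 cm, reflecting the continuous response the text describes.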
A robot operating on a motor schema based system may have many
schemes, each of which examines a part of the sensory data and gives
an action vector. In this way sensory information from a variety of
different modalities, such as vision, IR, sonar, and bump sensors, can
be integrated into motor command decisions. A vision-based navigation
system can examine camera data in its activation portion, and give a motor
vector based on perceived distance to the nearest obstacle, while sonar data
can filter into motor commands in a similar fashion.
The behavior of the robot is computed by summing
all of the action vectors together and acting according to the result.
Thus all the schema are blended together in the end (Arkin 150).
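The blending step itself is just vector summation; a minimal sketch, where the example vectors stand in for hypothetical goal and obstacle schema outputs:

```python
def blend(action_vectors):
    """Blend motor schemas by summing their action vectors."""
    return (sum(v[0] for v in action_vectors),
            sum(v[1] for v in action_vectors))

# e.g. a goal-attraction vector and an obstacle-repulsion vector,
# combined into a single motor command
command = blend([(1.0, 0.0), (-0.3, 0.4)])
```

Every active schema contributes to the final command, so no single schema ever fully controls the robot.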
If a single set of schema doesn't seem sufficient to get the desired behavior,
motor schema can be clumped together into true
behaviors, where a behavior consists of a number of different schema.
A wall-following behavior could consist of a goal-push scheme that
would push the robot beyond the wall, an obstacle-avoidance scheme
that would push the robot away from the wall, and a scheme to turn the
robot in one direction or the other when encountering a
wall. In a true behavior-based system there would be many such
behaviors, each consisting of a variety of schemas. A single behavior
would be in effect at any one time, and the decision as to which
behavior to have functioning would be made by a behavior sequencer (Arkin 162).
Motor Schema Versus Subsumption
Comparison with the more widely known subsumption architecture can
highlight some of the distinctive aspects of a motor schema
architecture. The subsumptive style of behavior-based architecture
was pioneered by Rodney Brooks in an attempt to find a fast, low- or
no-state architecture for robot control. Older AI methods largely
consisted of a sense-plan-act style of control, which was slow and
often could not adapt to the rapidly changing and inhospitable
real-world environments. Subsumption architectures consist of a
behavioral hierarchy, with each behavior running simultaneously and
in parallel. As in a motor schema approach, each behavior has
independent access to the sensory data, and uses that data to
formulate an action to be taken, or it may decide to recommend no
action at all. But unlike the motor schema approach, a subsumption
style architecture creates a hierarchy of behaviors, consisting of
low-level behaviors that have no knowledge of the higher level
behaviors. Coordination of behaviors occurs according to the priority
hierarchy, in a winner-take-all approach. Thus if a more complex,
high-level behavior decides to act, it suppresses and subsumes the
lower-level behavior, and the motor values at that time step will be
those recommended by the highest-priority active layer. In the motor
schema approach the action outputs of all of the schemas are blended
together, whereas in subsumption one behavior controls the motor
outputs at each time step. In a subsumption architecture, the robot
is doing one thing at a time; in a motor schema system the schemas are thoroughly
blended (Arkin 139-142).
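The winner-take-all coordination can be sketched directly, contrasting with the summation used by motor schemas. Here `None` stands for a behavior that declines to act; the ordering and return convention are illustrative.

```python
def subsumption_arbitrate(layers):
    """Winner-take-all arbitration (a sketch).

    `layers` is ordered highest priority first; each entry is an
    action vector, or None when that behavior declines to act.
    """
    for action in layers:
        if action is not None:
            return action  # this layer suppresses everything below it
    return (0.0, 0.0)  # no behavior active
```

Given one suppressed layer and two active ones, only the highest-priority active layer's vector reaches the motors, whereas a motor schema system would sum all of them.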
Layered Architectures
Motor schema can be used to create a successful reactive system,
and can do impressive things operating exclusively within a reactive
framework, especially when combined with an able sequencer.
Yet the motor schema architecture is completely reactive,
which can ultimately limit its function in important ways. Reactive
architectures are fast, cheap, and durable, but many complex tasks
seem impossible to solve without planning and keeping an internal
state, both of which are expressly non-reactive.
A so-called hybrid architecture attempts to combine planning and
reaction to harness the power of both methods. The most pertinent
hybrid architectures for this paper are layered architectures that
consist of a fast lower-level reactive layer, and a slower, high-level
deliberative layer. The deliberative layer examines an often
extensive internal state, creates plans, and passes the plan on to the
reactive level, which then autonomously executes it. Thus the layered
architecture attempts to let the reactive level do what it does best,
which is negotiate with a hostile and rapidly changing environment,
while also giving the capability of solving problems that seem to
require planning and evaluation of an internal state (Arkin 206-7).
Many different types of layered hybrid architectures have been
implemented, but the basic concepts have remained constant. Sensory
information passes from the low-level sensors, and updates an internal
state kept by the deliberative layer. The sensory information also is
used by the reactive layer to create the actual motor responses to the
environment. The planning layer examines the accumulated sensory
information in real time, and at any point can suggest a change in the
wider goals
of the system. Many layered architectures will include a middle-layer
that is responsible for taking the goals established by the
deliberative level and translating those goals into sequences of
smaller actions that the reactive system then executes. Different
systems will contain more or fewer layers, and connect the layers in a
variety of interesting ways, but the basic schematic remains fairly
constant over all hybrid architectures (Arkin 213).
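One control cycle of such a layered system might be sketched as follows. The function signatures are hypothetical: `reactive` and `deliberate` stand in for the two layers, and the internal state is just a dictionary here.

```python
def hybrid_step(sensors, world_state, current_goal, reactive, deliberate):
    """One cycle of a layered hybrid architecture (illustrative).

    Sensor data both updates the deliberative layer's internal state
    and feeds the reactive layer directly; the deliberative layer may
    revise the goal, but the reactive layer always produces the motor
    command.
    """
    world_state.update(sensors)            # accumulate internal state
    new_goal = deliberate(world_state)     # slow layer may change the goal
    goal = new_goal if new_goal is not None else current_goal
    command = reactive(sensors, goal)      # fast layer drives the motors
    return command, goal
```

The deliberative call could be run less often, or in a separate thread, without changing the basic schematic: state flows up, goals flow down, and the reactive layer keeps the motors running throughout.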
Mapping
A reactive robot does not keep an internal state, as reactivity means
responding to the environment based only on the sensory input at that
moment. The deliberative layer of a layered architecture, however,
needs to accumulate data over time in order to make planning
decisions. A primary method of organizing sensory data concerning the
shape of a world is through a map of that world. Constructing a map
requires that sensory data about the position of things in the world
be collected over time, and organized into a consistent
representation. There are many types of maps that can be constructed
using sensory information, as the sensory information can be accumulated using
a number of different methodologies. Some maps
attempt to represent the space in absolute metric terms, and others
try to represent it using shapes, or even creating graphs that
represent spaces and the connections between them.
Spatial Occupancy Grids
Spatial occupancy grids are an example of a model that attempts to
represent the world in absolute terms without trying to identify any
individual objects. A spatial occupancy grid consists of a two-dimensional
grid
of cells, each cell corresponding to a small region of the world. Each
cell contains a value that identifies some property of its occupancy,
whether or not the cell is occupied by an object. In the spatial
occupancy representation of most interest to this paper, evidence
grids, the cells each contain a probability that the cell is
occupied. A cell that is almost certainly occupied will contain a
high value, and one that is most certainly empty will contain a low
value. As the robot moves through the world, the sensory data that it
accumulates is used to update the probabilities of the cells.
Evidence grids attempt to compensate for the noise and random errors
that often accompany sensors' interactions with the real world. A
single free reading will not cause a cell to be forever judged
empty. Only repeated, consistent readings will cause a cell's occupancy
probability to change substantially, which ensures that mistaken
readings and noise will not compromise the map representation. Spatial
occupancy grids are useful because they don't depend on identifying
any particular objects or features of the world, and take a pragmatic
attitude toward the reliability of actual robot sensors. There are
drawbacks to this kind of model though. It requires substantial
processing power, needs a good localization scheme (see below) for
real accuracy, and is difficult to plan with given the large amount of
uninterpreted data in the model (Dudek and Jenkin 214-7).
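A single cell's probability update can be written as a Bayesian update; this sketch assumes an illustrative sensor model (the probability the sensor reports "occupied" for a truly occupied cell, and for an empty one), not values from the text.

```python
def update_cell(p_occupied, sensor_says_occupied,
                p_hit=0.7, p_false=0.4):
    """Bayesian update of one evidence-grid cell (illustrative model).

    `p_hit`: assumed probability the sensor reports "occupied" when the
    cell truly is occupied; `p_false`: when it is actually empty.
    A single reading only nudges the probability; repeated consistent
    readings are needed to move it far.
    """
    if sensor_says_occupied:
        l_occ, l_emp = p_hit, p_false
    else:
        l_occ, l_emp = 1.0 - p_hit, 1.0 - p_false
    numer = l_occ * p_occupied
    return numer / (numer + l_emp * (1.0 - p_occupied))
```

Starting from an unknown cell at 0.5, one "occupied" reading raises the estimate only modestly; a second consistent reading raises it further, which is how the grid resists sensor noise.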
Geometric Representations
Geometric models move away from evidence grids, as they try to identify
coherent objects in the world.
Geometric models attempt to fit certain primitive geometric forms to
an environment. These forms may include things such as doors, walls, or
streets. As the robot moves through the world, the sensory data is
interpreted using a list of primitives. A primitive is designated at
a certain position in the world if the sensory data suggest that an
object of that type actually resides in the world. Thus a map is
constructed of primitives, representations of objects that try to fit
the sensor data. Geometric representations are a more compact form of
mapping than spatial occupancy grids, as there is no need to keep
and update all the individual grid cells. Because the sensor data is
being examined and interpreted into primitives, planning is easier as
well, as the inner representation of the world has been substantially
simplified (Dudek and Jenkin 219-221).
Unfortunately, the simplifications that make geometric mapping nicely
compact introduce a host of problems. One must be extremely careful
in the design of the primitives; if the world is not well described by
the available primitives, the map will probably be a poor reflection
of the actual world. Also, the programmer is left with the difficult
task of giving instructions for the interpreting of sensor data into
the primitives. It is difficult, for instance, to specify what
exactly constitutes a wall, or a door. If one is careful though, and
designs primitives that describe the world accurately, geometric
representations can be valuable.
Topological Representations
Both spatial occupancy grids and geometric mapping try to define the
world in absolute terms, where a grid cell or a set amount of distance
corresponds to a metric measurement. Unfortunately, sensor noise and
localization issues often skew the metric data that both these methods
need to remain accurate representations of a space. Topological mapping
seeks to avoid the pitfalls of metric representation by instead focusing on
regions or objects, and the connections between them. For robot
navigation, knowing that one place is accessible from another may be
more important than precise metric knowledge of an environment. Thus, the
representations of a topological map depend largely on the notion of
connectedness of regions. A topological map may look like a graph,
with vertices representing a feature of the environment, such as a
landmark (discussed below) and edges representing connections between
those vertices. Often, an aspect of metricality will be added to a
topological representation, so that vertices will be situated
relationally in a space, and edges will have length and orientation
proportional to their metric equivalents (Dudek and Jenkin 224-7).
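A toy topological map of this kind, with hypothetical landmark names and approximate edge lengths and orientations, might look like the sketch below, along with the kind of connectivity query such a representation makes cheap.

```python
# Vertices are landmarks; each edge carries an approximate
# (length in meters, heading in radians). All values are invented.
topo_map = {
    "doorway":     {"hall_corner": (5.0, 0.0)},
    "hall_corner": {"doorway": (5.0, 3.14159), "lab": (3.0, 1.5708)},
    "lab":         {"hall_corner": (3.0, -1.5708)},
}

def connected(a, b, graph):
    """Is landmark `b` reachable from `a`? (simple graph search)"""
    frontier, seen = [a], {a}
    while frontier:
        node = frontier.pop()
        if node == b:
            return True
        for nbr in graph.get(node, {}):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return False
```

Any standard graph search works here, which is precisely the compactness and searchability advantage described below.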
Topological representations are best used when connectivity of
features is the most important reason for mapping the environment. If
this is the primary reason for mapping, then this information can be
held quite compactly by a topological representation, and can be
searched easily using all the standard graph search techniques. The
main difficulties that a topological representation faces involve the
creation of the representation. Creating schemes for feature
detection and identifying connectivity in non-ideal environments can
be extremely difficult. An ideal landmark for a topological scheme
is one that is static, uniquely identifiable upon repeated visits, and
can be used to deduce position and orientation. A final requirement
is that these landmarks be able to be sensed by the robot fairly
easily. These stringent requirements combine to make landmark
detection no easy task. The most complete systems use vision and
other robotic sensor systems to identify landmarks, but these systems
are subject to all the normal difficulties facing robotic sensing
systems. These systems also face the added difficulty of dealing with
sensorially homogeneous environments like office buildings, which hamper
unique feature identification, and can lead to many false positive
feature recognitions. Another method of landmark creation attempts to
overcome the difficulties associated with sensing obstacles in
homogenous environments by physically placing uniquely recognizable
objects, thus creating landmarks that resemble the ideal ones above.
Finally, an easier method to engineer for landmark identification is
the doctoring of an environment with obstacles that are readily sensed
and identified, like bright patches of color or bar codes
(Dudek and Jenkin 225-7, 117).
Localization
All mapping methods must consider how the positional information of
the robot is maintained, as all mapping methods depend on a
consistency and accuracy of positional information. The primary
method of localization on most robots is the use of dead-reckoning.
Dead-reckoning uses the wheel encoders of the robot, which record how
many revolutions each of the wheels has gone forward and backward.
This information is used to compute a relative position of the robot.
This method does not reference the outside world in any way, hence the
name dead-reckoning. Unfortunately, the wheel encoders on any robot
are prone to error, and these errors are compounded over time if not
corrected. Thus, the robot's estimations of its own position
monotonically worsen as time passes in a system that uses only
dead-reckoning. A robot that cannot reference the outside world to
correct its estimation of its own position will inevitably become so
inaccurate that maps constructed on the faulty estimation will be
virtually useless (Dudek and Jenkin 175-176).
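A dead-reckoning update for a differential-drive robot can be sketched as follows. The encoder resolution and wheel separation are illustrative values; note that nothing in the computation references the outside world, which is exactly why the errors compound.

```python
import math

def dead_reckon(x, y, theta, left_ticks, right_ticks,
                ticks_per_meter=1000.0, wheel_base=0.3):
    """Update a pose estimate from wheel-encoder counts (a sketch).

    Assumes a differential drive with the stated (invented) encoder
    resolution and wheel separation. Errors in the tick counts feed
    directly into the pose and are never corrected.
    """
    dl = left_ticks / ticks_per_meter
    dr = right_ticks / ticks_per_meter
    dist = (dl + dr) / 2.0              # distance moved by robot center
    dtheta = (dr - dl) / wheel_base     # change in heading
    x += dist * math.cos(theta + dtheta / 2.0)
    y += dist * math.sin(theta + dtheta / 2.0)
    return x, y, theta + dtheta
```

Equal tick counts move the robot straight ahead; opposite counts turn it in place. A small systematic error in either wheel's count skews every subsequent pose estimate.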
Because the accurate estimation of position is so important for
mapping tasks, every robotic mapping system that needs accuracy
in the long term must use some method of localization, using
information from the outside world to make position estimates more
accurate. Ideally, some system like Global Positioning could be used
to determine the position of a robot to a very high-degree of absolute
accuracy, but unfortunately such systems do not work indoors. Many
methods of localization have been implemented, but the ones of most
interest for this paper involve the use of landmarks to help determine
position more accurately. If consistent, static, easily detectable
landmarks can be discovered or placed close to the starting position of the
robot, then their positions can be recorded while dead-reckoning
errors have not had time to compound. Subsequent visits to these
landmarks can then be used to localize the robot. If vision is being
used to determine landmarks, then depth and orientation information
can sometimes be deduced from obstacles that are a substantial
distance from the robot. If two or more landmarks are discovered
simultaneously, then triangulation methods can be used to determine
the position of the robot if not the orientation (Dudek and Jenkin 176-183).
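One concrete form of such triangulation is trilateration from the measured ranges to two known landmarks; a sketch under the assumption that ranges (not bearings) are available. Two range circles intersect in up to two points, so a third landmark or a prior position estimate would be needed to disambiguate.

```python
import math

def trilaterate(lm1, r1, lm2, r2):
    """Candidate positions from ranges to two known landmarks.

    Returns the (up to two) points where the range circles intersect,
    or an empty list when the circles cannot intersect. Illustrative
    only; real sensors would add noise to r1 and r2.
    """
    dx, dy = lm2[0] - lm1[0], lm2[1] - lm1[1]
    d = math.hypot(dx, dy)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []  # coincident landmarks or non-intersecting circles
    a = (r1**2 - r2**2 + d**2) / (2 * d)   # offset along the baseline
    h = math.sqrt(max(r1**2 - a**2, 0.0))  # offset perpendicular to it
    mx, my = lm1[0] + a * dx / d, lm1[1] + a * dy / d
    return [(mx + h * dy / d, my - h * dx / d),
            (mx - h * dy / d, my + h * dx / d)]
```

As the text notes, this recovers position but not orientation; bearing measurements to the same landmarks would be needed for that.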
All material on these pages was created by Gil Jones, Swarthmore
College, January 2001