How can robotic systems learn to move their own bodies in purposeful ways, to gain facility in basic tasks such as moving through the environment and manipulating objects? We are investigating representations and methods with which an agent can initiate motor actions while observing the consequences of these actions in its sensory inputs, in effect learning how a given motor action changes the relationship between the agent's body and the external environment.
A long-standing problem in robotics is the realization of purposeful behaviors, i.e., the selection and execution of motor actions that advance an embodied agent toward its goals. Given the vast number of distinct motor actions that a humanoid mobile manipulator can exhibit, systems considerations suggest that explicitly programming each action is not a scalable approach; nor would it be intellectually interesting to do so.
We propose instead to investigate methods through which an agent can initiate motor actions while observing the consequences of these actions in its sensory inputs, in effect learning how a given motor action changes the relationship between the agent's body and the external environment. The key technical challenge is to craft an appropriate representation for encoding this relationship, and to develop an algorithm that can efficiently exercise the agent's motor behaviors and draw useful inferences from the observed consequences. A measure of success is the agent's ability to exploit these inferences to produce causal (purposeful) behaviors, for example: turning to look at a sound source; tracking an object with coordinated eye and head motion; or moving in order to approach, reach for, and grasp an object.

We propose to study these questions through the development of experimental software on a suitable humanoid mobile manipulation platform. We make a minimal set of assumptions about the motor and sensory repertoire of the agent, patterned after analogous capabilities of a human infant or toddler. Specifically, we assume that the agent has a body with articulated limbs, and the ability to sense: its body-relative joint angles (kinesthetic sense); its head motion with respect to an inertial frame (vestibular sense); vision (including static and dynamic salient visual feature detectors, as well as the ability to sense motion in the periphery and detail in the fovea); hearing (including a salient auditory feature detector); and touch (including contact force). We further assume that the agent can form a persistent internal representation of any salient feature in its environment and generate a body-relative pose estimate for that feature, and that changes in sense data can be detected.
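The representational assumptions above can be made concrete with a minimal sketch. The class below is purely illustrative (the name `SalientFeature` and its methods are our invention, not part of any existing system): a persistent record for one salient feature, holding a body-relative pose estimate and just enough history to detect a change in sense data.

```python
from dataclasses import dataclass, field

@dataclass
class SalientFeature:
    """Illustrative persistent record for a salient environmental feature."""
    modality: str                 # e.g. "visual", "auditory", "tactile"
    pose: tuple                   # body-relative (x, y, z) pose estimate
    history: list = field(default_factory=list)

    def observe(self, new_pose):
        """Record a fresh body-relative pose estimate for this feature."""
        self.history.append(self.pose)
        self.pose = new_pose

    def pose_changed(self, tol=1e-3):
        """Change detection: has the pose moved since the last observation?"""
        if not self.history:
            return False
        prev = self.history[-1]
        return any(abs(a - b) > tol for a, b in zip(self.pose, prev))
```

A real implementation would track full six-degree-of-freedom poses with uncertainty; the tuple-and-tolerance form here is only meant to show the two assumed capabilities in one place: a persistent body-relative estimate, and detectability of change.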
In terms of mechanisms for learning, motor babble provides a powerful source of statistical signal. At early stages, motor babble gives rise to frequent discrepancies between actual and anticipated sensory inputs following motor actions. These discrepancies can guide initial correspondences across preliminary representations of the sensing modalities, and help build associative cross-modal representations. At later stages, motor babble becomes more refined, involving segments of purposeful movement. The resulting discrepancies are temporally more distant, and serve better to guide the estimation of causal models whose purpose is to establish expectations of future sensory inputs in response to motor actions across spatio-temporal scales. Across stages, the statistical signal inherent in motor babble initiates learning at low enough levels to force the segmentation of more complex motor tasks at their appropriate sensory feedback points. The resulting representations will facilitate planning on the basis of desired sensory goals rather than goals expressed in typical Cartesian coordinate frames.
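The early-stage mechanism can be sketched under strong simplifying assumptions. Below, a one-dimensional toy "plant" with an unknown gain stands in for the agent's body; the agent babbles random motor commands and reduces the discrepancy between anticipated and actual sensory input to fit a forward model. The names, the linear model, and the learning rate are all illustrative choices, not the proposed system.

```python
import random

TRUE_GAIN = 2.0       # sensory consequence of a command; unknown to the agent
LEARNING_RATE = 0.05  # illustrative step size

def execute(u):
    """Simulated body: the actual sensory consequence of motor command u."""
    return TRUE_GAIN * u

def motor_babble(n_steps=2000, seed=0):
    """Fit a forward model s_hat = w * u from random motor babble,
    driven by the discrepancy between anticipated and actual input."""
    rng = random.Random(seed)
    w = 0.0  # forward-model parameter, initially uninformative
    for _ in range(n_steps):
        u = rng.uniform(-1.0, 1.0)      # babbled motor command
        predicted = w * u               # anticipated sensory input
        actual = execute(u)             # observed sensory input
        error = actual - predicted      # discrepancy drives learning
        w += LEARNING_RATE * error * u  # gradient step on squared error
    return w
```

Even this toy version exhibits the property the proposal relies on: no supervision is required beyond the agent's own actions and the resulting sensory signal, and the learned model supports prediction of future sensory inputs in response to motor commands.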