End-to-end visuomotor policies trained using behavior cloning have shown a remarkable ability to generate complex, multi-modal low-level robot behaviors. However, at deployment time, these policies still struggle to act reliably when faced with out-of-distribution (OOD) visuals induced by objects, backgrounds, or environment changes. Prior works in interactive imitation learning solicit corrective expert demonstrations under the OOD conditions, but this can be costly and inefficient. We observe that task success under OOD conditions does not always warrant novel robot behaviors: in-distribution (ID) behaviors can be transferred directly to OOD conditions that share functional similarities with ID conditions. For example, behaviors trained to interact with ID pens can apply to interacting with a visually OOD pencil. The key challenge lies in disambiguating which ID observations functionally correspond to the OOD observation for the task at hand. We propose that an expert can provide this OOD-to-ID functional correspondence. Thus, instead of collecting new demonstrations and re-training at every OOD encounter, our method: (1) detects the need for feedback by checking whether current observations are OOD and whether the most similar training observations show divergent behaviors, (2) solicits functional correspondence feedback to disambiguate between those behaviors, and (3) intervenes on the OOD observations with the functionally corresponding ID observations to perform deployment-time generalization. We validate our method across diverse real-world robotic manipulation tasks with a Franka Panda robotic manipulator. Our results show that test-time functional correspondences can improve the generalization of a vision-based diffusion policy to OOD objects and environment conditions with a small amount of expert feedback.
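To make the three deployment-time steps concrete, the sketch below outlines one plausible way to implement the loop. The `policy` and `expert` interfaces, the thresholds, and the spread-based ambiguity measure are illustrative assumptions, not the exact implementation.

```python
import numpy as np

def deployment_step(obs_emb, train_embs, train_actions, policy, expert,
                    ood_thresh=0.85, mode_gap_thresh=0.5, k=5):
    """Illustrative sketch of the three deployment-time steps.

    obs_emb:       embedding of the current observation, shape (d,)
    train_embs:    embeddings of the training observations, shape (N, d)
    train_actions: actions associated with the training observations
    policy/expert: hypothetical interfaces (not the paper's exact API)
    """
    # Cosine similarity between the current embedding and every training embedding.
    sims = (train_embs @ obs_emb) / (
        np.linalg.norm(train_embs, axis=1) * np.linalg.norm(obs_emb) + 1e-8)
    topk = np.argsort(-sims)[:k]

    # (1) Feedback is needed only if the observation is OOD *and* the most
    #     similar training observations imply divergent behaviors.
    is_ood = sims[topk[0]] < ood_thresh
    spread = np.max(np.linalg.norm(
        train_actions[topk] - train_actions[topk].mean(axis=0), axis=-1))

    if is_ood and spread > mode_gap_thresh:
        # (2) Solicit an OOD-to-ID functional correspondence from the expert.
        id_emb = expert.give_correspondence(obs_emb, candidates=topk)
        # (3) Intervene on the observation and execute the resulting plan.
        return policy.predict(id_emb)
    return policy.predict(obs_emb)
```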
Adapting by Analogy consists of four key phases. (left) First, we run a fast OOD detector by checking the cosine similarity between the current observation and the training observations. (center, top-left) Given a correspondence description l, we establish OOD-to-ID functional correspondences to retrieve corresponding ID observations (center, bottom). We refine the correspondences with the expert as long as there is ambiguity in the predicted behavior mode (center, top-right). Once finalized, we intervene on the observations and execute the planned actions (right).
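The expert-in-the-loop refinement described in the center panels could be sketched roughly as follows; the `retrieve` and `expert` interfaces and the ambiguity threshold are hypothetical placeholders, and the spread of predicted actions is only one possible ambiguity measure.

```python
import numpy as np

def refine_correspondence(description, retrieve, policy, expert,
                          ambiguity_thresh=0.5, max_rounds=3):
    """Iterate with the expert while the retrieved ID observations still
    imply conflicting behavior modes. All interfaces here are assumed.

    description: correspondence description l provided by the expert
    retrieve:    function mapping a description to candidate ID observations
    """
    for _ in range(max_rounds):
        id_obs = retrieve(description)                        # candidate ID observations
        actions = np.stack([policy.predict(o) for o in id_obs])
        # Ambiguity: spread of the predicted actions around their mean.
        ambiguity = np.max(np.linalg.norm(actions - actions.mean(axis=0), axis=-1))
        if ambiguity < ambiguity_thresh:
            break                                             # behavior mode is unambiguous
        description = expert.refine(description, id_obs)      # ask the expert to refine l
    return id_obs, actions.mean(axis=0)
```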
The training demonstrations for our two tasks, with their sub-goals (A, B, C). For the object in cup task, the pen is grasped below the center-of-mass and is dropped into the mug from the front; the marker is grasped above the center-of-mass and is dropped into the mug from the bottom. For the sweep trash task, paper (i.e., recycling) is swept up, and M&Ms (i.e., organic) are swept down.
Vanilla Policy: Marker in cup, Success
ABA: Marker in cup, Success
Vanilla Policy: Pen in cup, Failure *Vanilla Policy picks the incorrect mode; the pen needed to be dropped into the cup from the top*
ABA: Pen in cup, Success
Vanilla Policy: Marker in cup, Failure *Vanilla Policy fails to grasp the marker*
ABA: Marker in cup, Success
Vanilla Policy: Pen in cup, Failure *Vanilla Policy fails to grasp the pen*
ABA: Pen in cup, Success
Vanilla Policy: Pencil in cup, Failure *Vanilla Policy picks the incorrect mode; the pencil needed to be dropped into the cup from the top*
ABA: Pencil in cup, Success
Vanilla Policy: Battery in cup, Failure *Vanilla Policy picks the incorrect mode; the battery needed to be dropped into the cup from the top*
ABA: Battery in cup, Success
Vanilla Policy: Jenga Block in cup, Failure *Vanilla Policy picks the incorrect mode; the Jenga block needed to be dropped into the cup from the top*
ABA: Jenga Block in cup, Success
Vanilla Policy: Sweep Paper
ABA: Sweep Paper, Success
Vanilla Policy: Sweep M&Ms
ABA: Sweep M&Ms, Success
Vanilla Policy: Sweep Paper, Success
ABA: Sweep Paper, Success
Vanilla Policy: Sweep M&Ms, Failure *Vanilla Policy picks the incorrect mode and sweeps the M&Ms as recycling trash*
ABA: Sweep M&Ms, Success
Vanilla Policy: Sweep napkin, Failure *Vanilla Policy picks the incorrect mode and sweeps the napkin towards organic trash*
ABA: Sweep napkin, Success
Vanilla Policy: Sweep doritos, Failure *Vanilla Policy picks the incorrect mode and sweeps the doritos towards recycling trash*
ABA: Sweep doritos, Success
Vanilla Policy: Sweep Thumb Tacks, Failure *Vanilla Policy picks the incorrect mode and sweeps the thumb tacks towards organic trash*
ABA: Sweep Thumb Tacks, Success
The plot compares the L2 distance between the action predicted using the most aligned retrieval and the action predicted after aggregating M retrievals, for the ID environments of the Object in Cup task. The difference stabilizes as M increases. Note that the policy predicts a 16 (timestep) × 10 (dimension) action.
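As a rough illustration of the quantity plotted above, the snippet below computes the gap between the action predicted from the most aligned retrieval and the action obtained by aggregating the top-M retrievals, assuming each prediction is a 16 × 10 action array and that mean-aggregation is the combination rule (an assumption made for illustration).

```python
import numpy as np

def retrieval_aggregation_gap(actions_by_alignment, M):
    """L2 distance between the action from the most aligned retrieval and the
    mean of the top-M retrievals' actions. Assumes `actions_by_alignment` is
    sorted by alignment and has shape (num_retrievals, 16, 10); mean
    aggregation is an illustrative choice, not necessarily the paper's rule."""
    best = actions_by_alignment[0]                    # most aligned retrieval
    aggregated = actions_by_alignment[:M].mean(axis=0)
    return float(np.linalg.norm(best - aggregated))   # L2 over the 16 x 10 action

# Example: the gap typically flattens out as M grows.
rng = np.random.default_rng(0)
acts = rng.normal(size=(20, 16, 10))
gaps = [retrieval_aggregation_gap(acts, M) for M in range(1, 21)]
```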
BibTeX Coming Soon