End-to-end visuomotor policies trained using behavior cloning have shown a remarkable ability to generate complex, multi-modal low-level robot behaviors. However, at deployment time, these policies still struggle to act reliably when faced with out-of-distribution (OOD) visuals induced by objects, backgrounds, or environment changes. Prior works in interactive imitation learning solicit corrective expert demonstrations under the OOD conditions, but this can be costly and inefficient. We observe that task success under OOD conditions does not always warrant novel robot behaviors: in-distribution (ID) behaviors can be transferred directly to OOD conditions that share functional similarities with ID conditions. For example, behaviors trained to interact with ID pens can apply to interacting with a visually OOD pencil. The key challenge lies in disambiguating which ID observations functionally correspond to the OOD observation for the task at hand. We propose that an expert can provide this OOD-to-ID functional correspondence. Thus, instead of collecting new demonstrations and re-training at every OOD encounter, our method: (1) detects the need for feedback by checking whether current observations are OOD and whether the most similar training observations show divergent behaviors, (2) solicits functional correspondence feedback to disambiguate between those behaviors, and (3) intervenes on the OOD observations with the functionally corresponding ID observations to perform deployment-time generalization. We validate our method across diverse real-world robotic manipulation tasks with a Franka Panda robotic manipulator. Our results show that test-time functional correspondences can improve the generalization of a vision-based diffusion policy to OOD objects and environment conditions with a small amount of expert feedback.
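To make the three deployment-time steps concrete, the sketch below outlines one plausible way to implement the loop. The `policy` and `expert` interfaces, the thresholds, and the spread-based ambiguity measure are illustrative assumptions, not the exact implementation.

```python
import numpy as np

def deployment_step(obs_emb, train_embs, train_actions, policy, expert,
                    ood_thresh=0.85, mode_gap_thresh=0.5, k=5):
    """Illustrative sketch of the three deployment-time steps.

    obs_emb:       embedding of the current observation, shape (d,)
    train_embs:    embeddings of the training observations, shape (N, d)
    train_actions: actions associated with the training observations
    policy/expert: hypothetical interfaces (not the paper's exact API)
    """
    # Cosine similarity between the current embedding and every training embedding.
    sims = (train_embs @ obs_emb) / (
        np.linalg.norm(train_embs, axis=1) * np.linalg.norm(obs_emb) + 1e-8)
    topk = np.argsort(-sims)[:k]

    # (1) Feedback is needed only if the observation is OOD *and* the most
    #     similar training observations imply divergent behaviors.
    is_ood = sims[topk[0]] < ood_thresh
    spread = np.max(np.linalg.norm(
        train_actions[topk] - train_actions[topk].mean(axis=0), axis=-1))

    if is_ood and spread > mode_gap_thresh:
        # (2) Solicit an OOD-to-ID functional correspondence from the expert.
        id_emb = expert.give_correspondence(obs_emb, candidates=topk)
        # (3) Intervene on the observation and execute the resulting plan.
        return policy.predict(id_emb)
    return policy.predict(obs_emb)
```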
Adapting by Analogy consists of four key phases. (left) First, we run a fast OOD detector by checking the cosine similarity between the current observation and the training observations. (center, top-left) Given a correspondence description l, we establish OOD-to-ID functional correspondences to retrieve corresponding ID observations (center, bottom). We refine the correspondences with the expert as long as there is ambiguity in the predicted behavior mode (center, top-right). Once finalized, we intervene on the observations and execute the planned actions (right).
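The expert-in-the-loop refinement described in the center panels could be sketched roughly as follows; the `retrieve` and `expert` interfaces and the ambiguity threshold are hypothetical placeholders, and the spread of predicted actions is only one possible ambiguity measure.

```python
import numpy as np

def refine_correspondence(description, retrieve, policy, expert,
                          ambiguity_thresh=0.5, max_rounds=3):
    """Iterate with the expert while the retrieved ID observations still
    imply conflicting behavior modes. All interfaces here are assumed.

    description: correspondence description l provided by the expert
    retrieve:    function mapping a description to candidate ID observations
    """
    for _ in range(max_rounds):
        id_obs = retrieve(description)                        # candidate ID observations
        actions = np.stack([policy.predict(o) for o in id_obs])
        # Ambiguity: spread of the predicted actions around their mean.
        ambiguity = np.max(np.linalg.norm(actions - actions.mean(axis=0), axis=-1))
        if ambiguity < ambiguity_thresh:
            break                                             # behavior mode is unambiguous
        description = expert.refine(description, id_obs)      # ask the expert to refine l
    return id_obs, actions.mean(axis=0)
```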
The training demonstrations for our two tasks, with their sub-goals (A, B, C). For the object in cup task, the pen is grasped below the center-of-mass and is dropped into the mug from the front; the marker is grasped above the center-of-mass and is dropped into the mug from the bottom. For the sweep trash task, paper (i.e., recycling) is swept up, and M&Ms (i.e., organic) are swept down.
Vanilla Policy: Marker in cup, Success
ABA: Marker in cup, Success
Vanilla Policy: Pen in cup, Failure *Vanilla Policy picks the incorrect mode; the pen needed to be dropped into the cup from the top*
ABA: Pen in cup, Success
Vanilla Policy: Marker in cup, Failure *Vanilla Policy fails to grasp the marker*
ABA: Marker in cup, Success
Vanilla Policy: Pen in cup, Failure *Vanilla Policy fails to grasp the pen*
ABA: Pen in cup, Success
Vanilla Policy: Pencil in cup, Failure *Vanilla Policy picks the incorrect mode; the pencil needed to be dropped into the cup from the top*
ABA: Pencil in cup, Success
Vanilla Policy: Battery in cup, Failure *Vanilla Policy picks the incorrect mode; the battery needed to be dropped into the cup from the top*
ABA: Battery in cup, Success
Vanilla Policy: Jenga Block in cup, Failure *Vanilla Policy picks the incorrect mode; the Jenga block needed to be dropped into the cup from the top*
ABA: Jenga Block in cup, Success
Vanilla Policy: Sweep Paper
ABA: Sweep Paper, Success
Vanilla Policy: Sweep M&Ms
ABA: Sweep M&Ms, Success
Vanilla Policy: Sweep Paper, Success
ABA: Sweep Paper, Success
Vanilla Policy: Sweep M&Ms, Failure *Vanilla Policy picks the incorrect mode and sweeps the M&Ms as recycling trash*
ABA: Sweep M&Ms, Success
Vanilla Policy: Sweep napkin, Failure *Vanilla Policy picks the incorrect mode and sweeps the napkin towards organic trash*
ABA: Sweep napkin, Success
Vanilla Policy: Sweep doritos, Failure *Vanilla Policy picks the incorrect mode and sweeps the doritos towards recycling trash*
ABA: Sweep doritos, Success
Vanilla Policy: Sweep Thumb Tacks, Failure *Vanilla Policy picks the incorrect mode and sweeps the thumb tacks towards organic trash*
ABA: Sweep Thumb Tacks, Success
The plot compares the L2 distance between the action predicted using the most aligned retrieval and the action predicted after aggregating M retrievals, for the ID environments of the Object in Cup task. The difference stabilizes as M increases. Note that the policy predicts a 16 (timestep) × 10 (dimension) action.
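As a rough illustration of the quantity plotted above, the snippet below computes the gap between the action predicted from the most aligned retrieval and the action obtained by aggregating the top-M retrievals, assuming each prediction is a 16 × 10 action array and that mean-aggregation is the combination rule (an assumption made for illustration).

```python
import numpy as np

def retrieval_aggregation_gap(actions_by_alignment, M):
    """L2 distance between the action from the most aligned retrieval and the
    mean of the top-M retrievals' actions. Assumes `actions_by_alignment` is
    sorted by alignment and has shape (num_retrievals, 16, 10); mean
    aggregation is an illustrative choice, not necessarily the paper's rule."""
    best = actions_by_alignment[0]                    # most aligned retrieval
    aggregated = actions_by_alignment[:M].mean(axis=0)
    return float(np.linalg.norm(best - aggregated))   # L2 over the 16 x 10 action

# Example: the gap typically flattens out as M grows.
rng = np.random.default_rng(0)
acts = rng.normal(size=(20, 16, 10))
gaps = [retrieval_aggregation_gap(acts, M) for M in range(1, 21)]
```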
BibTeX Coming Soon