Lookalike3D: Seeing Double in 3D



Technical University of Munich

Abstract

3D object understanding and generation methods produce impressive results, yet they often overlook a pervasive source of information in real-world scenes: repeated objects. We introduce the task of lookalike object detection in 3D scenes, which leverages repeated and complementary cues from identical and near-identical object pairs. Given a 3D scene, the task is to classify pairs of objects as identical, similar, or different using multiview images as input. To address this, we present Lookalike3D, a multiview image transformer that distinguishes such object pairs by harnessing strong semantic priors from large image foundation models. To support this task, we collected the 3DTwins dataset, containing 76k manually annotated identical, similar, and different object pairs based on ScanNet++, on which our method improves IoU by 104% over baselines. We demonstrate how our method improves downstream tasks such as joint 3D object reconstruction and part co-segmentation, turning repeated and lookalike objects into a powerful cue for consistent, high-quality 3D perception.

Video

Repeated Objects in Indoor Scenes

Repeated objects are extremely common in indoor environments, for instance the three small tables in this scene. However, state-of-the-art 3D perception and generation methods such as SAM3D Objects largely handle objects independently, resulting in inconsistent reconstructions within a scene. Lookalike3D explicitly detects repeated objects and enables joint reasoning across them; in this case, it enables joint object reconstruction that is consistent across the scene.

Detecting Identical and Similar Objects

We categorize object pairs into three types: identical, similar, and different. Identical pairs have exactly the same shape and appearance. Similar object pairs share semantic structure, but differ in finer-grained geometry or appearance details. Lookalike3D takes multiview images of object pairs as input and encodes them with a pretrained DINOv2 model into multiview patch tokens. These patch tokens are passed through alternating attention layers, which operate at the single-view, multiview, and global levels. We aggregate these features across the views and compare them between objects to obtain a similarity score. Each object pair is then classified as identical, similar, or different. We train the model using triplet and score-alignment losses.
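The pipeline above can be sketched in a minimal NumPy toy, with heavy simplifications: the real model encodes views with DINOv2 and aggregates patch tokens through learned alternating attention layers, whereas this sketch substitutes plain mean pooling, and the function names, cosine-similarity comparison, margin, and class thresholds are illustrative assumptions rather than the paper's actual values.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def aggregate_views(view_tokens):
    # view_tokens: list of (num_patches, dim) arrays, one per view.
    # Stand-in for the alternating single-view / multiview / global
    # attention layers: mean-pool over patches, then over views.
    return np.mean([t.mean(axis=0) for t in view_tokens], axis=0)

def classify_pair(tokens_a, tokens_b, t_identical=0.9, t_similar=0.6):
    # Thresholds are illustrative, not taken from the paper.
    s = cosine_similarity(aggregate_views(tokens_a),
                          aggregate_views(tokens_b))
    if s >= t_identical:
        return "identical", s
    if s >= t_similar:
        return "similar", s
    return "different", s

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Standard margin-based triplet loss on cosine similarities:
    # push anchor-positive similarity above anchor-negative by `margin`.
    return max(0.0, cosine_similarity(anchor, negative)
                    - cosine_similarity(anchor, positive) + margin)
```

In the actual model, the pooled embeddings and the similarity head are trained jointly so that the triplet and score-alignment losses shape the feature space; the fixed thresholds here only mimic the final three-way decision.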

3DTwins Dataset

To enable the lookalike object detection task, we collected the 3DTwins dataset, containing 76k manually annotated identical, similar, and different object pairs based on ScanNet++.

Joint 3D Reconstruction

SAM3D reconstructs objects independently, often producing variations in shape and scale across repeated instances. Lookalike3D enables joint reconstruction of identical objects across their input images, producing a single clean representation of repeated objects. SAM3D also tends to produce incomplete objects for occluded views, while our method recovers complete object geometry. At test time, our method can operate on single-view images or multiview images from a single 3D scene.

Joint Part Segmentation

P3-SAM produces fine-grained part segmentations of 3D objects, but it may undersegment some of them. We use similar object pairs with shared structure to produce consistent part segmentations. For example, structures such as fused chair arms and legs can cause segmentation errors; in these cases, our method recovers finer-grained object parts from correctly segmented objects with similar structure.

Quantitative Results

We evaluate lookalike object detection using intersection-over-union on all three types of object pairs. Our method significantly outperforms all baselines on both ground-truth instances and predicted instances from MaskClustering.

BibTeX

@misc{yeshwanth2026lookalike3d,
  title={Lookalike3D: Seeing Double in 3D},
  author={Chandan Yeshwanth and Angela Dai},
  year={2026},
  eprint={},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={},
}