AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment

Abstract

Single-view RGB model-based object pose estimation methods achieve strong generalization but are fundamentally limited by depth ambiguity, clutter, and occlusions. Multi-view pose estimation methods have the potential to resolve these issues, but existing works either rely on precise single-view pose estimates or do not generalize to unseen objects. We address these challenges with three contributions. First, we introduce AlignPose, a 6D object pose estimation method that aggregates information from multiple extrinsically calibrated RGB views and requires neither object-specific training nor symmetry annotations. Second, the key component of this approach is a new multi-view feature-metric refinement designed specifically for object pose. It optimizes a single, consistent world-frame object pose by minimizing the discrepancy between object features rendered on the fly and observed image features across all views simultaneously. Third, we report extensive experiments on four datasets (YCB-V, T-LESS, ITODD-MV, HouseCat6D) using the BOP benchmark evaluation and show that AlignPose outperforms other published methods, especially on challenging industrial datasets where multiple views are readily available in practice.
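To make the multi-view feature-metric objective concrete, the following is a minimal sketch, not the AlignPose implementation. It assumes a pinhole camera model, nearest-pixel feature lookup, and features attached to sparse model points as a stand-in for on-the-fly feature rendering. The names `transform`, `multiview_feature_cost`, and the dictionary keys are hypothetical and chosen only for illustration.

```python
def transform(T, p):
    """Apply a 4x4 rigid transform (nested lists) to a 3D point."""
    return [sum(T[r][c] * p[c] for c in range(3)) + T[r][3] for r in range(3)]


def multiview_feature_cost(T_wo, views, model):
    """Sum of squared feature differences over all views for one world-frame pose.

    T_wo:  4x4 object-to-world pose, the single unknown shared by every view
    views: list of dicts with 'T_cw' (4x4 world-to-camera extrinsics, assumed
           calibrated), 'f', 'cx', 'cy' (pinhole intrinsics, an assumption) and
           'feat' (H x W x D observed feature map as nested lists)
    model: list of (point_in_object_frame, feature_vector) pairs; a simplified
           stand-in for rendering object features
    """
    total = 0.0
    for view in views:
        fmap = view["feat"]
        h, w = len(fmap), len(fmap[0])
        for p_o, f_model in model:
            # Object frame -> world frame -> camera frame of this view.
            x, y, z = transform(view["T_cw"], transform(T_wo, p_o))
            if z <= 1e-6:
                continue  # point is behind the camera
            # Pinhole projection, rounded to the nearest pixel.
            u = round(view["f"] * x / z + view["cx"])
            v = round(view["f"] * y / z + view["cy"])
            if not (0 <= u < w and 0 <= v < h):
                continue  # point projects outside the image
            f_obs = fmap[v][u]
            total += sum((a - b) ** 2 for a, b in zip(f_obs, f_model))
    return total
```

Because every view is evaluated against the same `T_wo`, the residuals from all cameras constrain one pose jointly rather than producing per-view estimates that must be fused afterwards. A refinement step would minimize this cost over `T_wo` with an iterative optimizer, which this sketch leaves out.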

Qualitative Results

[Figure: qualitative results on YCB-V, HouseCat6D, and T-LESS. Columns: input image, input pose candidates, our refined poses.]

BibTeX

@misc{AlignPose2025,
      title={AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment}, 
      author={Anna Šárová Mikeštíková and Médéric Fourmy and Martin Cífka and Josef Sivic and Vladimir Petrik},
      year={2025},
      eprint={2512.20538},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.20538}, 
}

Acknowledgements

This work was supported by the European Union’s Horizon Europe projects AGIMUS (No. 101070165), euROBIN (No. 101070596), ERC FRONTIER (No. 101097822), and ELLIOT (No. 101214398). It was further supported by the Grant Agency of the Czech Technical University in Prague (SGS25/152/OHK3/3T/13). Compute resources and infrastructure were supported by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (ID:90254) and by the European Union’s Horizon Europe project CLARA (No. 101136607).