Grounding Keypoint Descriptors into 3D-Gaussian Splatting for Visual Localization in Dynamic Indoor/Outdoor Environments

M. Mohrat; G. K. Sidorov; D. D. Gridusov; S. A. Kolyubin

doi:10.17586/0021-3454-2025-68-9-781-791



Abstract

Robust visual localization in real-world conditions remains challenging, especially in the presence of dynamic objects and transient distractors. Although neural scene representations such as 3D Gaussian Splatting (3DGS) and NeRF encode scene geometry and appearance compactly, they rely on photometric consistency and therefore degrade when the static-world assumption is violated. This work presents a robust visual localization framework that uses 3DGS with semantically aware masking to improve accuracy in dynamic scenes. The proposed approach builds on GSplatLoc and is a two-stage pipeline. In the first stage, dense, lightweight keypoint descriptors obtained from the XFeat network are integrated into the 3DGS representation, which enables efficient 2D-3D matching for coarse pose estimation. To reduce the influence of dynamic distractors, semantic masks generated by pre-trained diffusion models exclude inconsistent regions during construction of the 3D scene. In the second stage, the initial pose is refined using a rendering-based photometric alignment objective. Experiments on dynamic indoor and outdoor datasets show that the proposed method outperforms the baseline in challenging dynamic conditions.
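The first-stage idea (matching query descriptors against map descriptors while ignoring map content that came from masked, dynamic regions) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: random vectors stand in for XFeat descriptors and for the descriptors stored in the 3DGS map, the function name `match_to_masked_map` and the `min_sim` threshold are hypothetical, and mutual nearest-neighbour matching on cosine similarity is an assumed matching rule. Descriptors are assumed to be nonzero.

```python
import numpy as np


def match_to_masked_map(query_desc, map_desc, map_keep, min_sim=0.8):
    """Match 2D query descriptors to 3D map descriptors (illustrative only).

    query_desc: (N, D) descriptors of keypoints in the query image.
    map_desc:   (M, D) descriptors attached to 3D map points.
    map_keep:   (M,) bool, False for map points that originate from
                masked (dynamic) regions and must not be matched.
    Returns an array of (query_index, map_index) pairs.
    """
    # L2-normalize so that the dot product is the cosine similarity.
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    m = map_desc / np.linalg.norm(map_desc, axis=1, keepdims=True)
    sim = q @ m.T                      # shape (N, M)
    sim[:, ~map_keep] = -np.inf        # masked map points can never match
    best_m = sim.argmax(axis=1)        # best map point per query keypoint
    best_q = sim.argmax(axis=0)        # best query keypoint per map point
    idx = np.arange(len(q))
    mutual = best_q[best_m] == idx     # keep mutual nearest neighbours only
    strong = sim[idx, best_m] >= min_sim
    keep = mutual & strong
    return np.stack([idx[keep], best_m[keep]], axis=1)


# Synthetic data: 5 map points, 4 query keypoints (assumed values).
rng = np.random.default_rng(0)
map_desc = rng.normal(size=(5, 64))
truth = np.array([2, 0, 4, 1])         # map point seen by each query keypoint
query_desc = map_desc[truth] + 0.01 * rng.normal(size=(4, 64))
map_keep = np.array([True, True, True, True, False])  # point 4 is a distractor
matches = match_to_masked_map(query_desc, map_desc, map_keep)
print(matches)
```

In this toy setup, query keypoint 2 observes the masked map point 4 and therefore yields no correspondence. In a full pipeline the surviving 2D-3D pairs would typically be passed to a PnP solver with outlier rejection to obtain the coarse pose; that step is omitted here and is an assumption about how the matches are consumed.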

Keywords

localization, Gaussian splatting, neural network model

About the authors

M. Mohrat

ITMO University
Russia

Malik Mohrat, PhD student; Faculty of Control Systems and Robotics

Saint Petersburg

G. K. Sidorov

ITMO University
Russia

Gennady Konstantinovich Sidorov, Master's student; Faculty of Control Systems and Robotics; E-mail: gksidorov@itmo.ru

Saint Petersburg

D. D. Gridusov

ITMO University
Russia

Denis Dmitrievich Gridusov, Bachelor; Faculty of Control Systems and Robotics

Saint Petersburg

S. A. Kolyubin

ITMO University
Russia

Sergey Alekseevich Kolyubin, Dr. Sci. (Eng.), Professor; Faculty of Control Systems and Robotics

Saint Petersburg



For citation:

Mohrat M., Sidorov G.K., Gridusov D.D., Kolyubin S.A. Grounding Keypoint Descriptors into 3D-Gaussian Splatting for Visual Localization in Dynamic Indoor/Outdoor Environments. Journal of Instrument Engineering (Известия высших учебных заведений. Приборостроение). 2025;68(9):781-791. (In Russ.) https://doi.org/10.17586/0021-3454-2025-68-9-781-791

ISSN 0021-3454 (Print)
ISSN 2500-0381 (Online)
