Split0: 30 scenes (val). Split1: 200 scenes (train).

| Method | Split0 AP50 | Split0 AP25 | Split0 AR50 | Split0 AR25 | Split0 mIoU | Split1 AP50 | Split1 AP25 | Split1 AR50 | Split1 AR25 | Split1 mIoU |
|---|---|---|---|---|---|---|---|---|---|---|
| Training-based methods: | | | | | | | | | | |
| TASA-72B | 26.9 | 28.6 | - | - | 19.7 | trained on split1 | | | | |
| AffordBot-72B | 20.91 | 24.76 | 18.99 | 22.84 | 14.42 | trained on split1 | | | | |
| Training-free methods: | | | | | | | | | | |
| Fun3DU-9B | 16.9 | 33.3 | 38.2 | 46.7 | 15.2 | 12.6 | 23.1 | 32.9 | 40.5 | 11.5 |
| UniFunc3D-8B (Ours) | 23.82 | 44.04 | 46.07 | 55.51 | 20.92 | 16.24 | 29.02 | 38.91 | 48.15 | 14.23 |
| UniFunc3D-30B (Ours) | 31.24 | 51.01 | 46.97 | 58.88 | 24.30 | 21.32 | 35.76 | 40.03 | 51.00 | 17.09 |
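
The column names appear to follow the common convention in which the number after AP or AR is an IoU threshold in percent, so AP50 would score a predicted mask as correct at an IoU of about 0.5 and AP25 at about 0.25, while mIoU averages the IoU itself. The sketch below illustrates only that matching step. It assumes masks are given as sets of 3D point indices and that a match means IoU at or above the threshold; the function names and toy data are hypothetical, and this is not the benchmark's evaluation code.

```python
from typing import Set


def mask_iou(pred: Set[int], gt: Set[int]) -> float:
    """Intersection over union of two masks given as sets of point indices."""
    union = pred | gt
    if not union:
        # Two empty masks: define IoU as 0.0 to avoid dividing by zero.
        return 0.0
    return len(pred & gt) / len(union)


def is_match(pred: Set[int], gt: Set[int], threshold: float) -> bool:
    """Assumed rule: a prediction matches when its IoU reaches the threshold."""
    return mask_iou(pred, gt) >= threshold


# Toy example with hypothetical point indices (not data from the benchmark).
pred_mask = {1, 2, 3, 4, 5, 6}
gt_mask = {4, 5, 6, 7, 8, 9}

iou = mask_iou(pred_mask, gt_mask)  # 3 shared points out of 9 in the union
match_at_25 = is_match(pred_mask, gt_mask, 0.25)
match_at_50 = is_match(pred_mask, gt_mask, 0.50)
```

In this toy case the same prediction is counted at the 0.25 threshold but not at 0.5, which is one reason the AP25 and AR25 columns are consistently higher than their AP50 and AR50 counterparts.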
Figure: qualitative comparison. Columns: AffordBot, Fun3DU, Ours, GT (ground truth). Rows, one per task description:

"Open the top left drawer of the cabinet with the beauty products on top"

"Turn on the ceiling light"

"Control the water flow in the bathtub using the drain control dial"

"Select a washing program"

"Flush the toilet"

@InProceedings{Lin_UniFunc3D,
author = {Lin, Jiaying and Xu, Dan},
title = {UniFunc3D: Unified Active Spatial-Temporal Grounding for 3D Functionality Segmentation},
booktitle = {arXiv},
year = {2026},
}