Existing methods may fail when explicit physical cues (e.g., boundaries and reflections) are unreliable. In the top row, the center region has glass-like boundaries, and existing methods wrongly predict it as glass. In the bottom row, the smaller window on the right lacks an obvious reflection and is therefore difficult to detect. By learning semantics, our model accurately detects glass surfaces in both situations: it is not distracted by regions with glass-like boundaries, and it generalizes well even to glass regions without obvious reflections.
News
We have further improved our proposed model for better performance. Please check the v2 branch of our code for more details. We have also fixed some minor issues (e.g., typos) in the paper; please refer to the revised version.
We visualize the features representing glass surfaces and semantics in the above figure. The semantic backbone is able to approximate the semantic context from segmentation. In later levels, the model disambiguates and gradually localizes the target area accurately through refinement. One might argue that the semantic backbone alone would suffice for the glass surface detection task with further fine-tuning. Note, however, that the backbone is meant for general context aggregation and is not designed for the specific objective of glass surface detection. We believe that the proposed model is able to learn the semantics of glass surfaces and generalize well to unseen scenes.
Abstract
Glass surfaces are omnipresent in our daily lives and often go unnoticed by most of us. While humans are generally able to infer their locations and thus avoid collisions, it can be difficult for current object detection systems to handle them due to the transparent nature of glass surfaces. Previous methods approached the problem by extracting global context information to obtain priors such as object boundaries and reflections. However, their performance cannot be guaranteed when these deterministic features are not available. We observe that humans often reason through the semantic context of the environment, which offers insights into the categories of, and proximity between, entities that are expected to appear in the surroundings. For example, the odds of glass windows co-occurring with walls and curtains are generally higher than with objects such as cars and trees, which have relatively little semantic relevance.
Based on this observation, we propose a model ('GlassSemNet') that integrates the contextual relationships of scenes for glass surface detection with two novel modules: (1) a Scene Aware Activation (SAA) Module to adaptively filter critical channels with respect to spatial and semantic features, and (2) a Context Correlation Attention (CCA) Module to progressively learn the contextual correlations among objects both spatially and semantically. In addition, we propose a large-scale glass surface detection dataset named Glass Surface Detection - Semantics ('GSD-S'), which contains 4,519 real-world RGB glass surface images from diverse scenes with detailed annotations for both glass surface detection and semantic segmentation. Experimental results show that our model outperforms contemporary works, notably with a 42.6% MAE improvement on our proposed GSD-S dataset.
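The abstract describes the two modules only at a high level. As a rough illustration (not the authors' implementation; all shapes, the fusion scheme, and the function names below are our assumptions), the channel-gating idea behind SAA and the pairwise spatial-correlation idea behind CCA can be sketched in NumPy:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scene_aware_activation(spatial_feat, semantic_feat, w, b):
    """SAA-style gating (sketch): globally pool the spatial and semantic
    features, fuse them, and use the result to re-weight (filter) the
    channels of the spatial features."""
    # Global average pooling over the spatial dimensions -> (C,)
    pooled = spatial_feat.mean(axis=(1, 2)) + semantic_feat.mean(axis=(1, 2))
    gate = sigmoid(w @ pooled + b)               # per-channel weights in (0, 1)
    return spatial_feat * gate[:, None, None]    # broadcast over H x W

def context_correlation_attention(feat):
    """CCA-style correlation (sketch): non-local attention in which each
    spatial location aggregates context from all others, weighted by
    feature similarity."""
    c, h, wdt = feat.shape
    x = feat.reshape(c, h * wdt)                 # (C, N) with N = H * W
    attn = x.T @ x / np.sqrt(c)                  # (N, N) pairwise correlations
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over locations
    out = x @ attn.T                             # aggregate correlated context
    return out.reshape(c, h, wdt)

# Toy usage with random features and random gating parameters.
rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
spatial = rng.standard_normal((C, H, W))
semantic = rng.standard_normal((C, H, W))
w, b = 0.1 * rng.standard_normal((C, C)), np.zeros(C)

gated = scene_aware_activation(spatial, semantic, w, b)
refined = context_correlation_attention(gated)
print(gated.shape, refined.shape)  # (8, 4, 4) (8, 4, 4)
```

This only conveys the intuition: the gate filters channels using the fused spatial and semantic context, and the correlation attention lets semantically related regions reinforce each other. The actual modules in the paper are learned jointly and operate across multiple feature levels.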
BibTeX
@inproceedings{neurips2022:gsds2022,
  author    = {Lin, Jiaying and Yeung, Yuen-Hei and Lau, Rynson W.H.},
  title     = {Exploiting Semantic Relations for Glass Surface Detection},
  booktitle = {NeurIPS},
  year      = {2022},
}