Title: Fast Semantic Segmentation for Scene Perception
Authors: Zhang, Xuetao; Chen, Zhenxue; Wu, Q. M. Jonathan; Cai, Lei; Lu, Dan; Li, Xianming
Author affiliations: [Zhang, Xuetao; Chen, Zhenxue; Lu, Dan; Li, Xianming] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Shandong, Peoples R China.; [Wu, Q. M. Jo…
Corresponding author: Chen, Zhenxue; Chen, ZX
Corresponding author address: [Chen, ZX] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Shandong, Peoples R China.
Source: IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
Keywords: Convolutional neural network (CNN); real-time; ResNet; scene perception; semantic segmentation
Abstract: Semantic segmentation is a challenging problem in computer vision. Many applications, such as autonomous driving and robot navigation in urban road scenes, need accurate and efficient segmentation. Most state-of-the-art methods focus on accuracy rather than efficiency. In this paper, we propose a more efficient neural network architecture with fewer parameters for semantic segmentation of urban road scenes. Our model uses an asymmetric encoder-decoder structure based on ResNet. In the first stage of the encoder, continuous factorized blocks extract low-level features. Continuous dilated blocks are applied in the second stage, which gives the model a larger field of view while keeping it small and shallow. The decoder upsamples the downsampled encoder features to an output of the same size as the input image and refines the details. Our model can be trained end-to-end and pixel-to-pixel from scratch, without pretraining. The model has only 0.2M parameters, 100x fewer than models such as SegNet. Experiments are conducted on five public road scene datasets (CamVid, CityScapes, Gatech, KITTI Road Detection, and KITTI Semantic Segmentation), and the results demonstrate that our model can achieve better performance.
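
Note: The abstract attributes the small parameter count to factorized blocks and the larger field of view to dilated blocks, but gives no layer details. The sketch below is only an illustration of why these two ideas help, under assumptions not stated in the record: a "factorized" 3x3 convolution is taken to mean a 3x1 convolution followed by a 1x3 convolution, the channel width of 64 and the dilation rates 1, 2, 4, 8 are hypothetical, and the helper names `conv_params` and `receptive_field` are made up for this example.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=True):
    """Learnable parameters in one 2D convolution layer."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


def receptive_field(kernel_sizes, dilations):
    """Receptive field along one axis for stacked stride-1 convolutions."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        # Each layer widens the field by (kernel size - 1) * dilation.
        rf += (k - 1) * d
    return rf


channels = 64  # hypothetical width, not taken from the paper

# Standard 3x3 convolution versus an assumed 3x1 + 1x3 factorization.
full = conv_params(3, 3, channels, channels)
factorized = (conv_params(3, 1, channels, channels)
              + conv_params(1, 3, channels, channels))

# Four stacked 3x3 layers: plain versus hypothetical dilation rates.
plain_rf = receptive_field([3, 3, 3, 3], [1, 1, 1, 1])
dilated_rf = receptive_field([3, 3, 3, 3], [1, 2, 4, 8])

print(f"3x3 conv parameters:        {full}")
print(f"3x1 + 1x3 conv parameters:  {factorized}")
print(f"receptive field, plain:     {plain_rf}")
print(f"receptive field, dilated:   {dilated_rf}")
```

Under these assumptions the factorized pair needs roughly two thirds of the parameters of the full 3x3 layer, and the dilated stack sees a much wider context with the same number of layers and weights. The actual block design in the paper may differ.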