Transportation Scene Recognition Based on High-Level Feature Representation

doi:10.11959/j.issn.2096-6652.201943

Abstract

With the development of intelligent transportation, recognizing complex traffic scenes quickly and accurately has become an urgent problem. Many scene recognition methods have been proposed in recent years to improve traffic scene recognition, but most of them cannot extract the semantic features of visual concepts, which leads to low recognition accuracy. To address this, a traffic scene recognition algorithm is proposed that extracts high-level semantic and structural information to improve accuracy. To reduce the "semantic gap" between high-level and low-level feature representations, a system for semantically describing scene classes is first built. A multi-label classification network is then trained by minimizing an element-wise logistic loss function to obtain high-level feature representations of traffic scene images. Finally, experiments on four large-scale scene recognition datasets show that the proposed algorithm outperforms the other methods compared.

Keywords: scene recognition, convolutional neural network (CNN), high-level feature, low-level feature
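
The abstract names an element-wise logistic loss as the training objective for the multi-label network. The snippet below is a minimal sketch of such a loss, not the authors' implementation: it assumes one raw score (logit) per semantic attribute, 0/1 attribute labels, and averaging over attributes. The attribute names, the function name `elementwise_logistic_loss`, and the choice of mean rather than sum are all illustrative assumptions.

```python
import math


def elementwise_logistic_loss(scores, labels):
    """Average binary logistic loss, computed independently for each label.

    scores: raw network outputs (logits), one per semantic attribute.
    labels: 0/1 ground-truth indicators, same length as scores.
    Averaging over attributes is an assumption; a sum is equally common.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal in length")
    total = 0.0
    for z, y in zip(scores, labels):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]:
        # max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(scores)


if __name__ == "__main__":
    # Hypothetical attributes for one traffic image; not taken from the paper.
    attributes = ["road", "vehicle", "pedestrian"]
    logits = [2.0, -1.0, 0.5]   # made-up network outputs
    targets = [1, 0, 1]         # made-up ground-truth labels
    loss = elementwise_logistic_loss(logits, targets)
    print(f"loss over {len(attributes)} attributes: {loss:.4f}")
```

Because each attribute contributes its own independent term, an image can carry several positive labels at once, which is what separates this objective from a softmax loss over mutually exclusive classes.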

CLC number: TP30

LIU Wenhua, LI Yidong, WANG Tao, et al. Transportation scene recognition based on high level feature representation[J]. Chinese Journal of Intelligent Science and Technology, 2019, 1(4): 392-399.

References

[1]	AKER A, GAIZAUSKAS R. Generating image descriptions using dependency relational patterns[C]// Meeting of the Association for Computational Linguistics, July 11-16, 2010, Uppsala, Sweden. [S.l.:s.n.], 2010: 1250-1258.
[2]	AMRINE D E, WHITE B J, LARSON R L. Comparison of classification algorithms to predict outcomes of feedlot cattle identified and treated for bovine respiratory disease[J]. Computers and Electronics in Agriculture, 2014, 105: 9-19.
[3]	BELONGIE S, MALIK J, PUZICHA J. Shape matching and object recognition using shape contexts[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2010, 24(4): 509-522.
[4]	BO L F, REN X F, FOX D. Hierarchical matching pursuit for image classification: architecture and fast algorithms[C]// NIPS, December 3-6, 2012, Lake Tahoe, USA. [S.l.:s.n.], 2012: 2115-2123.
[5]	CHERIYADAT A M. Unsupervised feature learning for aerial scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2014, 52(1): 439-451.
[6]	DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]// 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), June 20-25, 2009, Miami, USA. Piscataway: IEEE Press, 2009: 248-255.
[7]	DIXIT M, RASIWASIA N, VASCONCELOS N. Adapted Gaussian models for image classification[C]// IEEE Conference on Computer Vision and Pattern Recognition, June 20-25, 2011, Colorado Springs, USA. Piscataway: IEEE Press, 2011: 937-943.
[8]	FANG H, PLATT J C, ZITNICK C L. From captions to visual concepts and back[J]. Computer Science, 2015, 12(7): 1473-1482.
[9]	FARHADI A, HEJRATI M, SADEGHI M A, et al. Every picture tells a story: generating sentences from images[C]// European Conference on Computer Vision, September 5-11, 2010, Crete, Greece. Heidelberg: Springer, 2010: 15-29.
[10]	LI F F, PERONA P. A Bayesian hierarchical model for learning natural scene categories[C]// CVPR, June 20-26, 2005, San Diego, USA. Piscataway: IEEE Press, 2005: 524-531.
[11]	FELZENSZWALB P F, GIRSHICK R, MCALLESTER D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Software Engineering, 2014, 32(9): 1627-1645.
[12]	GAO S H, CHIA L T, TSANG I W. Multi-layer group sparse coding for concurrent image classification and annotation[C]// IEEE Conference on Computer Vision & Pattern Recognition, June 20-25, 2011, Colorado Springs, USA. Piscataway: IEEE Press, 2011: 2809-2816.
[13]	GLOROT X, BENGIO Y. Understanding the difficulty of training deep feed forward neural networks[J]. Journal of Machine Learning Research, 2010, 9: 249-256.
[14]	HAN Y, LIU G Z. Efficient learning of sample-specific discriminative features for scene classification[J]. Signal Processing Letters IEEE, 2011, 18(11): 683-686.
[15]	HAUPTMAN N, YAN R, LIN W H, et al. Can high-level concepts fill the semantic gap in video retrieval? a case study with broadcast news[J]. IEEE Transactions on Multimedia, 2007, 9(5): 958-966.
[16]	HINTON G E, SRIVASTAVA N, KRIZHEVSKY A, et al. Improving neural networks by preventing co-adaptation of feature detectors[J]. Computer Science, 2012, 3(4): 212-223.
[17]	JIA Y Q, SHELHAMER E, DONAHUE J, et al. Caffe: convolutional architecture for fast feature embedding[C]// ACM International Conference on Multimedia, October 23-27, 2006, Santa Barbara, USA. New York: ACM Press, 2014: 675-678.
[18]	KULKARNI G, PREMRAJ V, ORDONEZ V. Babytalk: understanding and generating simple image descriptions[C]// IEEE Conference on Computer Vision & Pattern Recognition, June 20-25, 2011, Colorado Springs, USA. Piscataway: IEEE Press, 2011: 1601-1608.
[19]	LAMPERT C H, NICKISCH H, HARMELING S. Attribute-based classification for zero-shot visual object categorization[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2014, 36(3): 453-465.
[20]	LAPIN M, SCHIELE B, HEIN M. Scalable multitask representation learning for scene classification[C]// IEEE Conference on Computer Vision and Pattern Recognition, June 23-28, 2014, Columbus, USA. Piscataway: IEEE Press, 2014: 1434-1441.
[21]	LAZEBNIK S, SCHMID C, PONCE J. Beyond bags of features: spatial pyramid matching for recognizing natural scene categories[C]// IEEE Computer Society Conference on Computer Vision & Pattern Recognition, June 17-22, 2006, New York, USA. Piscataway: IEEE Press, 2006: 2169-2178.
[22]	ZHAO F, ZHANG W K, YAN Z Y, et al. Multi-feature map pyramid fusion deep network for semantic segmentation on remote sensing data[J]. Journal of Electronics and Information Technology, 2019, 41(10): 2525-2531.
[23]	LI J L, LI F F. What, where and who? classifying events by scene and object recognition[C]// 2007 IEEE 11th International Conference on Computer Vision, October 14-21, 2007, Rio De Janeiro, Brazil. Piscataway: IEEE Press, 2007: 1-8.
[24]	LI J L, SU H, LIM Y W, et al. Object bank: an object-level image representation for high-level visual recognition[J]. International Journal of Computer Vision, 2014, 107(1): 20-39.
[25]	LI J L, SU H, XING E P, et al. Object bank: a high-level image representation for scene classification & semantic feature sparsification[C]// The 24th Annual Conference on Neural Information Processing Systems 2010, December 6-9, 2010, Vancouver, Canada. [S.l.:s.n.], 2010.
[26]	OLIVA A, TORRALBA A. Modeling the shape of the scene: a holistic representation of the spatial envelope[J]. International Journal of Computer Vision, 2001, 42(3): 145-175.
[27]	PANDEY M, LAZEBNIK S. Scene recognition and weakly supervised object localization with deformable part-based models[C]// IEEE International Conference on Computer Vision, November 6-13, 2011, Barcelona, Spain. Piscataway: IEEE Press, 2011: 1307-1314.
[28]	PAVLOPOULOU C, YU S X. Indoor-outdoor classification with human accuracies: image or edge GIST[C]// IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, June 13-18, 2010, San Francisco, USA. Piscataway: IEEE Press, 2010: 41-47.
[29]	QUATTONI A, TORRALBA A. Recognizing indoor scenes[C]// IEEE Conference on Computer Vision and Pattern Recognition, June 20-25, 2009, Miami, USA. Piscataway: IEEE Press, 2009: 413-420.
[30]	RAMANA B V, BABU M S P. A critical study of selected classification algorithms for liver disease diagnosis[J]. International Journal of Database Management Systems, 2011, 3(2): 324-335.
[31]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. Computer Science, 2015.
[32]	TORRESANI L, SZUMMER M, FITZGIBBON A. Efficient object category recognition using classemes[C]// ECCV, September 5-11, 2010, Crete, Greece. Heidelberg: Springer, 2010: 776-789.
[33]	WU J X, REHG M. Centrist: a visual descriptor for scene categorization[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2011, 33(8): 1489-1501.
[34]	XIAO J X, HAYS J, KRISTA A, et al. Sun database: large-scale scene recognition from abbey to zoo[C]// IEEE Conference on Computer Vision & Pattern Recognition, June 13-18, 2010, San Francisco, USA. Piscataway: IEEE Press, 2010: 3485-3492.
[35]	YANG Y Z, CHING L T, HAL D III, et al. Corpus-guided sentence generation of natural images[C]// Conference on Empirical Methods in Natural Language Processing, July 27-29, 2011, Edinburgh, UK. New York: ACM Press, 2011: 444-454.
[36]	ZHANG F, DU B, ZHANG L P. Saliency-guided unsupervised feature learning for scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2015, 53(4): 2175-2184.
[37]	ZHANG L, ZHEN X, SHAO L. Learning object-to-class kernels for scene classification[J]. IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society, 2014, 23(8): 3241-3253.

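Dataset	Description
Sport	Contains 8 sports scene categories
Indoor	15,620 images in total across 67 indoor scene categories, with at least 100 images per category
Outdoor	8 outdoor scene categories, 2,688 images in total
15 Scene	Created based on reference [11]; contains 15 natural scene categories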

Method	Sport	Indoor	Outdoor	15 Scene
GIST	82.6%	5.7%	81.9%	73.3%
SIFT	82.8%	44.2%	15.67%	82.4%
CENTRIST	86.2%	31.9%	89.57%	83.9%
OB	77.5%	33.3%	88.12%	82.0%
KCL	86.0%	32.4%	88.8%	88.8%
Attributes_Finetune	96.23%	68.32%	98.83%	91.92%

Recognition accuracy by network architecture
Network architecture	Sport	Indoor	Outdoor	15 Scene
CaffeNet	94.91%	60.05%	94.54%	88.62%
AlexNet	94.06%	58.15%	93.42%	86.84%
VGGNet16	96.21%	64.13%	92.30%	91.23%
Places-CNN	94.12%	68.24%	95.23%	90.19%
Soft_attribute	93.87%	66.43%	98.71%	88.48%
Attributes_Finetune	96.23%	68.32%	98.83%	91.92%

Algorithm	Training data	Sport	Indoor	Outdoor	15 Scene
Attributes_Finetune	50%	85.32%	46.07%	92.61%	14.53%
Attributes_Finetune	70%	89.73%	57.72%	93.68%	89.92%
Attributes_Finetune	100%	96.23%	68.32%	98.83%	91.92%
Feature_GIST	50%	59.95%	4.89%	75.68%	58.93%
Feature_GIST	70%	67.59%	5.69%	81.96%	67.68%
Feature_GIST	100%	68.86%	5.92%	83.53%	69.27%

Recognition accuracy by classification algorithm
Dataset	Linear SVM	Linear binary tree
Sport	92.95%	96.23%
Indoor	73.38%	68.72%
Outdoor	93.38%	98.43%
15 Scene	93.86%	91.22%