Survey of video behavior recognition

doi:10.11959/j.issn.1000-436x.2018107

Abstract

Behavior recognition is developing rapidly, and a number of algorithms that learn features automatically with deep networks have been proposed. Deep learning methods require large amounts of training data and place high demands on computer storage and computing power. After a brief review of the currently popular deep-network-based behavior recognition methods, this survey focuses on traditional behavior recognition methods. Traditional methods usually follow a pipeline of video feature extraction, feature modeling, and classification. Following this basic pipeline, the recognition process is reviewed in the following steps: feature sampling, feature descriptors, feature processing, descriptor aggregation, and vector coding. The benchmark data sets commonly used to evaluate algorithm performance are also summarized.

Key words: behavior recognition, handcrafted, deep network, data set

CLC Number: TP391

Huilan LUO, Chanjuan WANG, Fei LU. Survey of video behavior recognition[J]. Journal on Communications, 2018, 39(6): 169-180.

Figures/Tables 8

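Dataset	Year	Actions	Description	Citations, 2015-2017
CMU MoBo^[58]	2001	4	Contains 4 types of behavior: slow walking, fast walking, incline walking, and walking with a ball, performed by 25 actors on a treadmill in the CMU 3D room	62
KTH^[59]	2004	6	Contains 6 action classes and 2,391 video samples in total, performed by 25 actors in 4 different scenarios. The samples include variation in scale, clothing, and illumination, but the background is fairly uniform and the camera angle is fixed	492
Weizmann^[33]	2005	10	Contains 10 action classes with 9 different samples per class. The camera view is fixed, the background is relatively simple, and only one person acts in each frame. The data set includes class labels, silhouettes, and background sequences	230
IXMAS^[60]	2006	14	A multi-view data set containing 14 action classes performed by 11 actors, with each action repeated 3 times. Cameras are placed at 5 positions: the 4 corners of the room and overhead	4
UCF-Sports^[61]	2008	10	Videos are taken from the TV channels ESPN and BBC and cover 10 sports action classes	220
Hollywood1^[36]	2008	8	Contains 8 action classes collected from 32 movies	674
Hollywood2^[62]	2009	12	Contains 12 action classes and 3,669 videos in total, all extracted from 69 Hollywood movies. The actors' expressions, poses, and clothing, as well as camera motion, illumination, occlusion, and background, vary widely and approach real-world conditions, which makes behavior analysis and recognition highly challenging	272
HumanEva^[63]	2009	6	Videos were captured with 3 color cameras and 4 grayscale cameras; 4 actors perform 6 action classes	81
UCF-YouTube^[64]	2009	11	Contains 11 action classes; the videos exhibit camera shake, visual variation, illumination changes, and background occlusion	167
MSR Action3D^[65]	2010	20	Contains 20 action classes and 557 depth-map video sequences in total	309
HMDB51^[42]	2011	51	Contains 51 action classes and 6,849 videos in total, mostly taken from movies and from online video sources such as YouTube; each action class has at least 101 clips	390
UCF101^[11]	2012	101	One of the largest publicly available data sets; its videos come from YouTube and cover 101 action classes	459
UCF-50^[66]	2013	50	Videos come from YouTube and are divided into 50 action classes according to their tags, with 6,618 video sequences in total	161
UCF Kinect^[67]	2013	16	Skeleton sequences were captured with a single Kinect and the OpenNI framework; there are 16 behaviors in total, all designed for gaming scenarios	70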

References 89

[1]	MOESLUND T B, HILTON A, KRUGER V. A survey of advances in vision-based human motion capture and analysis[J]. Computer Vision & Image Understanding, 2006, 104(2): 90-126.
[2]	CHENG G C, WAN Y F, SAUDAGAR A N, et al. Advances in human action recognition: a survey[J]. Computer Science, 2015, 2015(1): 1-30.
[3]	JI X, LIU H. Advances in view-invariant human motion analysis: a review[J]. IEEE Transactions on Systems Man & Cybernetics Part C, 2009, 40(1): 13-24.
[4]	DHAMSANIA C J, RATANPARA T V. A survey on human action recognition from videos[C]// Online International Conference on Green Engineering and Technologies. 2017: 1-5.
[5]	CANDAMO J, SHREVE M, GOLDGOF D B, et al. Understanding transit scenes: a survey on human behavior recognition algorithms[J]. IEEE Transactions on Intelligent Transportation Systems, 2010, 11(1): 206-224.
[6]	POPPE R. A survey on vision-based human action recognition[J]. Image & Vision Computing, 2010, 28(6): 976-990.
[7]	WEINLAND D, RONFARD R, BOYER E. A survey of vision-based methods for action representation, segmentation and recognition[J]. Computer Vision & Image Understanding, 2011, 115(2): 224-241.
[8]	CHAUDHARY A, RAHEJA J L, DAS K, et al. A survey on hand gesture recognition in context of soft computing[C]// International Conference on Computer Science and Information Technology. 2011: 46-55.
[9]	LAPTEV I. On space-time interest points[J]. International Journal of Computer Vision, 2005, 64(2-3): 107-123.
[10]	HARRIS C J. A combined corner and edge detector[J]. Proc Alvey Vision Conf, 1988, 1988(3): 147-151.
[11]	SOOMRO K, ZAMIR A R, SHAH M. UCF101: a dataset of 101 human actions classes from videos in the wild[J]. Computer Science, 2012.
[12]	OIKONOMOPOULOS A, PATRAS I, PANTIC M. Spatiotemporal salient points for visual recognition of human actions[J]. IEEE Transactions on Systems Man & Cybernetics Part B, 2006, 36(3): 710-719.
[13]	DOLLAR P, RABAUD V, COTTRELL G, et al. Behavior recognition via sparse spatio-temporal features[C]// IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. 2006: 65-72.
[14]	RAPANTZIKOS K, AVRITHIS Y, KOLLIAS S. Spatiotemporal saliency for event detection and representation in the 3D wavelet domain: potential in human action recognition[C]// ACM International Conference on Image and Video Retrieval. 2007: 294-301.
[15]	RAPANTZIKOS K, AVRITHIS Y, KOLLIAS S. Dense saliency-based spatiotemporal feature points for action recognition[C]// Computer Vision and Pattern Recognition. 2009: 1454-1461.
[16]	WILLEMS G, TUYTELAARS T, GOOL L. An efficient dense and scale-invariant spatio-temporal interest point detector[C]// European Conference on Computer Vision. 2008: 650-663.
[17]	WANG H, KLASER A, SCHMID C, et al. Action recognition by dense trajectories[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2011: 3169-3176.
[18]	MURTHY O V R, GOECKE R. Ordered trajectories for human action recognition with large number of classes[J]. Image & Vision Computing, 2015, 42(C): 22-34.
[19]	CHO J, LEE M, CHANG H J, et al. Robust action recognition using local motion and group sparsity[J]. Pattern Recognition, 2014, 47(5): 1813-1825.
[20]	WANG H, SCHMID C. Action recognition with improved trajectories[C]// IEEE International Conference on Computer Vision. 2014: 3551-3558.
[21]	FERNANDO B, GAVVES E, ORAMAS M J, et al. Modeling video evolution for action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 5378-5387.
[22]	JHUANG H, SERRE T, WOLF L, et al. A biologically inspired system for action recognition[C]// International Conference on Computer Vision. 2007: 1-8.
[23]	PENG X, QIAO Y, PENG Q, et al. Exploring motion boundary based sampling and spatial-temporal context descriptors for action recognition[C]// British Machine Vision Conference. 2013.
[24]	ALI S, BASHARAT A, SHAH M. Chaotic invariants for human action recognition[C]// International Conference on Computer Vision. 2007: 1-8.
[25]	YILMAZ A, SHAH M. Recognizing human actions in videos acquired by uncalibrated moving cameras[C]// Tenth IEEE International Conference on Computer Vision. 2005: 150-157.
[26]	JHUANG H, GALL J, ZUFFI S, et al. Towards understanding action recognition[C]// IEEE International Conference on Computer Vision. 2014: 3192-3199.
[27]	SINGH V K, NEVATIA R. Action recognition in cluttered dynamic scenes using pose-specific part models[C]// International Conference on Computer Vision. 2011: 113-120.
[28]	DU Y, WANG W, WANG L. Hierarchical recurrent neural network for skeleton based action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1110-1118.
[29]	WU D, SHAO L. Leveraging hierarchical parametric networks for skeletal joints based action segmentation and recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2014: 724-731.
[30]	WANG C, WANG Y, YUILLE A L. An approach to pose-based action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2013: 915-922.
[31]	JIANG Z, LIN Z, DAVIS L S. Recognizing human actions by learning and matching shape-motion prototype trees[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2012, 34(3): 533-547.
[32]	HUANG M, SU S Z, CAI G R, et al. Meta-action descriptor for action recognition in RGBD video[J]. IET Computer Vision, 2017, 11(4): 301-308.
[33]	GORELICK L, BLANK M, SHECHTMAN E, et al. Actions as space-time shapes[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2007, 29(12): 2247-2253.
[34]	DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]// Computer Vision and Pattern Recognition. 2005: 886-893.
[35]	DALAL N, TRIGGS B, SCHMID C. Human detection using oriented histograms of flow and appearance[C]// European Conference on Computer Vision. 2006: 428-441.
[36]	LAPTEV I, MARSZALEK M, SCHMID C, et al. Learning realistic human actions from movies[C]// Computer Vision and Pattern Recognition. 2008: 1-8.
[37]	PENG X, WANG L, WANG X, et al. Bag of visual words and fusion methods for action recognition: comprehensive study and good practice[J]. Computer Vision & Image Understanding, 2016, 150(C): 109-125.
[38]	PERRONNIN F, MENSINK T. Improving the Fisher kernel for large-scale image classification[C]// European Conference on Computer Vision. 2010: 143-156.
[39]	JEGOU H, DOUZE M, SCHMID C, et al. Aggregating local descriptors into a compact image representation[C]// Computer Vision and Pattern Recognition. 2010: 3304-3311.
[40]	SIMONYAN K, ZISSERMAN A. Two-stream convolutional networks for action recognition in videos[J]. Neural Information Processing Systems, 2014, 1(4): 568-576.
[41]	WANG L, GE L, LI R, et al. Three-stream CNNs for action recognition[J]. Pattern Recognition Letters, 2017, 92(C): 33-40.
[42]	KUEHNE H, JHUANG H, STIEFELHAGEN R, et al. HMDB51: a large video database for human motion recognition[C]// IEEE International Conference on Computer Vision. 2011: 2556-2563.
[43]	GKIOXARI G, GIRSHICK R, MALIK J. Contextual action recognition with R*CNN[J]. CoRR, 2016, 40(1): 1080-1088.
[44]	GKIOXARI G, GIRSHICK R, MALIK J. Actions and attributes from wholes and parts[C]// International Conference on Computer Vision. 2015: 2470-2478.
[45]	HOAI M. Regularized max pooling for image categorization[C]// British Machine Vision Conference. 2014: 94-100.
[46]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. Computer Science, 2014.
[47]	OQUAB M, BOTTOU L, LAPTEV I, et al. Learning and transferring mid-level image representations using convolutional neural networks[C]// Conference on Computer Vision and Pattern Recognition. 2014: 1717-1724.
[48]	CHERON G, LAPTEV I, SCHMID C. P-CNN: pose-based CNN features for action recognition[C]// International Conference on Computer Vision. 2015: 3218-3226.
[49]	ROHRBACH M, AMIN S, ANDRILUKA M, et al. A database for fine grained activity detection of cooking activities[C]// Conference on Computer Vision and Pattern Recognition. 2012: 1194-1201.
[50]	ZHOU Y, NI B, HONG R, et al. Interaction part mining: a mid-level approach for fine-grained action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3323-3331.
[51]	ZHOU Y, NI B, YAN S, et al. Pipelining localized semantic features for fine-grained action recognition[C]// European Conference on Computer Vision. 2014: 481-496.
[52]	GRAVES A, MOHAMED A, HINTON G. Speech recognition with deep recurrent neural networks[C]// IEEE International Conference on Acoustics, Speech and Signal Processing. 2013: 6645-6649.
[53]	HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[54]	NIEBLES J C, WANG H, LI F F. Unsupervised learning of human action categories using spatial-temporal words[J]. International Journal of Computer Vision, 2008, 79(3): 299-318.
[55]	DONAHUE J, HENDRICKS L A, GUADARRAMA S, et al. Long-term recurrent convolutional networks for visual recognition and description[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 2625-2634.
[56]	NG Y H, HAUSKNECHT M, VIJAYANARASIMHAN S, et al. Beyond short snippets: deep networks for video classification[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4694-4702.
[57]	YU S, CHENG Y, XIE L, et al. A novel recurrent hybrid network for feature fusion in action recognition[J]. Journal of Visual Communication & Image Representation, 2017, 49: 192-203.
[58]	GROSS R, SHI J. The CMU motion of body (MoBo) database[J]. Monumenta Nipponica, 2001, 45(4).
[59]	SCHULDT C, LAPTEV I, CAPUTO B. Recognizing human actions: a local SVM approach[C]// International Conference on Pattern Recognition. 2004: 32-36.
[60]	WEINLAND D, RONFARD R, BOYER E. Free viewpoint action recognition using motion history volumes[J]. Computer Vision & Image Understanding, 2011, 104(2): 249-257.
[61]	RODRIGUEZ M D, AHMED J, SHAH M. Action MACH: a spatio-temporal maximum average correlation height filter for action recognition[C]// Conference on Computer Vision and Pattern Recognition. 2008: 1-8.
[62]	MARSZALEK M, LAPTEV I, SCHMID C. Actions in context[C]// Conference on Computer Vision and Pattern Recognition. 2009: 2929-2936.
[63]	SIGAL L, BALAN A O, BLACK M J. HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion[J]. International Journal of Computer Vision, 2006, 87(1-2): 4-27.
[64]	LIU J, LUO J, SHAH M. Recognizing realistic actions from videos in the wild[C]// Computer Vision and Pattern Recognition. 2009: 1996-2003.
[65]	LI W, ZHANG Z, LIU Z. Action recognition based on a bag of 3D points[C]// Conference on Computer Vision and Pattern Recognition. 2010: 9-14.
[66]	REDDY K K, SHAH M. Recognizing 50 human action categories of web videos[J]. Machine Vision & Applications, 2013, 24(5): 971-981.
[67]	ELLIS C, MASOOD S Z, TAPPEN M F, et al. Exploring the trade-off between accuracy and observational latency in action recognition[J]. International Journal of Computer Vision, 2013, 101(3): 420-436.
[68]	PENG X, ZOU C, QIAO Y, et al. Action recognition with stacked Fisher vectors[C]// European Conference on Computer Vision. 2014: 581-595.
[69]	DUTA I C, IONESCU B, AIZAWA K, et al. Spatio-temporal VLAD encoding for human action recognition in videos[C]// International Conference on Multimedia Modeling. 2017: 365-378.
[70]	BILEN H, FERNANDO B, GAVVES E, et al. Action recognition with dynamic image networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, PP(99): 1.
[71]	WU X, XU D, DUAN L, et al. Action recognition using context and appearance distribution features[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2011: 489-496.
[72]	LIU J, KUIPERS B, SAVARESE S. Recognizing human actions by attributes[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2011: 3337-3344.
[73]	CORSO J J. Action bank: a high-level representation of activity in video[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2012: 1234-1241.
[74]	CHEN M, GONG L, WANG T, et al. Action recognition using Lie algebrized Gaussians over dense local spatio-temporal features[J]. Multimedia Tools & Applications, 2015, 74(6): 2127-2142.
[75]	ZHANG Z, TAO D. Slow feature analysis for human action recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2012, 34(3): 436-450.
[76]	JI S, XU W, YANG M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 35(1): 221-231.
[77]	HASAN M, ROY-CHOWDHURY A K. Continuous learning of human activity models using deep nets[C]// European Conference on Computer Vision. 2014: 705-720.
[78]	SUN L, JIA K, CHAN T H, et al. DL-SFA: deeply-learned slow feature analysis for action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2014: 2625-2632.
[79]	JIANG Y G, DAI Q, XUE X, et al. Trajectory-based modeling of human actions with motion reference points[C]// European Conference on Computer Vision. 2012: 425-438.
[80]	WANG L M, QIAO Y, TANG X. Motionlets: mid-level 3D parts for human motion recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2013: 2674-2681.
[81]	SUN L, JIA K, YEUNG D Y, et al. Human action recognition using factorized spatio-temporal convolutional networks[C]// IEEE International Conference on Computer Vision. 2015: 4597-4605.
[82]	WANG L, QIAO Y, TANG X. Action recognition with trajectory-pooled deep-convolutional descriptors[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4305-4314.
[83]	PARK E, HAN X, BERG T L, et al. Combining multiple sources of knowledge in deep CNNs for action recognition[C]// IEEE Winter Conference on Applications of Computer Vision. 2016: 1-8.
[84]	SOUZA C R D, GAIDON A, VIG E, et al. Sympathy for the details: dense trajectories and hybrid classification architectures for action recognition[C]// European Conference on Computer Vision. 2016: 697-716.
[85]	YU S, CHENG Y, SU S, et al. Stratified pooling based deep convolutional neural networks for human action recognition[J]. Multimedia Tools & Applications, 2017, 76(11): 13367-13382.
[86]	MURTHY O V R, GOECKE R. Ordered trajectories for large scale human action recognition[C]// IEEE International Conference on Computer Vision. 2014: 412-419.
[87]	PENG X, WANG L, QIAO Y, et al. Boosting VLAD with supervised dictionary learning and high-order statistics[C]// European Conference on Computer Vision. 2014: 660-674.
[88]	LAN Z, LIN M, LI X, et al. Beyond Gaussian pyramid: multi-skip feature stacking for action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 204-212.
[89]	FEICHTENHOFER C, PINZ A, WILDES R P. Spatiotemporal multiplier networks for video action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7445-7454.

Method type	Reference	Venue	Year	Accuracy
Handcrafted feature representation methods	Ref. [71]	IEEE Conference on Computer Vision and Pattern Recognition	2011	94.5%
	Ref. [72]	IEEE Conference on Computer Vision and Pattern Recognition	2011	91.59%
	Ref. [17]	IEEE Conference on Computer Vision and Pattern Recognition	2011	95%
	Ref. [73]	IEEE Conference on Computer Vision and Pattern Recognition	2012	98.2%
	Ref. [23]	British Machine Vision Conference	2013	95.6%
	Ref. [74]	Multimedia Tools & Applications	2015	97.41%
Deep network learned feature representation methods	Ref. [75]	IEEE Transactions on Pattern Analysis & Machine Intelligence	2012	93.5%
	Ref. [76]	IEEE Transactions on Pattern Analysis & Machine Intelligence	2013	90.2%
	Ref. [77]	European Conference on Computer Vision	2014	96.6%
	Ref. [78]	IEEE Conference on Computer Vision and Pattern Recognition	2014	93.1%

Method type	Reference	Venue	Year	Accuracy
Handcrafted feature representation methods	Ref. [17]	IEEE Conference on Computer Vision and Pattern Recognition	2011	46.6%
	Ref. [79]	European Conference on Computer Vision	2012	40.7%
	Ref. [73]	IEEE Conference on Computer Vision and Pattern Recognition	2012	26.9%
	Ref. [20]	IEEE International Conference on Computer Vision	2013	57.2%
	Ref. [80]	IEEE Conference on Computer Vision and Pattern Recognition	2013	33.7%
	Ref. [68]	European Conference on Computer Vision	2014	66.79%
	Ref. [21]	IEEE Conference on Computer Vision and Pattern Recognition	2015	63.7%
Deep network learned feature representation methods	Ref. [40]	Neural Information Processing Systems	2014	59.4%
	Ref. [81]	IEEE International Conference on Computer Vision	2015	59.1%
	Ref. [82]	IEEE Conference on Computer Vision and Pattern Recognition	2015	65.9%
	Ref. [83]	IEEE Winter Conference on Applications of Computer Vision	2016	54.9%
	Ref. [84]	European Conference on Computer Vision	2016	70.4%
	Ref. [85]	Multimedia Tools & Applications	2016	74.7%
	Ref. [70]	IEEE Transactions on Pattern Analysis & Machine Intelligence	2016	74.9%

Method type	Reference	Venue	Year	Accuracy
Handcrafted feature representation methods	Ref. [79]	ICCV 2013 workshop of THUMOS'13 Action Recognition Challenge	2013	87.46%
	Ref. [86]	IEEE International Conference on Computer Vision	2014	73.1%
	Ref. [87]	European Conference on Computer Vision	2014	87.7%
	Ref. [88]	IEEE Conference on Computer Vision and Pattern Recognition	2015	89.1%
	Ref. [37]	Computer Vision & Image Understanding	2016	87.9%
	Ref. [69]	International Conference on MultiMedia Modeling	2017	91.5%
Deep network learned feature representation methods	Ref. [40]	Neural Information Processing Systems	2014	88%
	Ref. [56]	IEEE Conference on Computer Vision and Pattern Recognition	2015	88.6%
	Ref. [82]	IEEE Conference on Computer Vision and Pattern Recognition	2015	91.5%
	Ref. [83]	IEEE Winter Conference on Applications of Computer Vision	2016	89.1%
	Ref. [85]	Multimedia Tools & Applications	2016	91.6%
	Ref. [89]	IEEE Conference on Computer Vision and Pattern Recognition	2017	94.6%
	Ref. [70]	IEEE Transactions on Pattern Analysis & Machine Intelligence	2016	96%
