Journal on Communications ›› 2018, Vol. 39 ›› Issue (6): 169-180. doi: 10.11959/j.issn.1000-436x.2018107
• Comprehensive Reviews •
Huilan LUO,Chanjuan WANG,Fei LU
Revised: 2018-05-16
Online: 2018-06-01
Published: 2018-07-09
Huilan LUO, Chanjuan WANG, Fei LU. Survey of video behavior recognition[J]. Journal on Communications, 2018, 39(6): 169-180.

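Dataset | Year | No. of actions | Description | Citations, 2015-2017 |
CMU MoBo[58] | 2001 | 4 | Contains 4 classes of behavior: slow walk, fast walk, incline walk, and walking with a ball, performed by 25 actors on a treadmill in the CMU 3D room | 62 |
KTH[59] | 2004 | 6 | Contains 6 action classes and 2,391 video samples in total, performed by 25 actors in 4 different scenarios. The samples include variations in scale, clothing, and illumination, but the background is fairly uniform and the camera angle is fixed | 492 |
Weizmann[33] | 2005 | 10 | Contains 10 action classes with 9 different samples per class. The camera viewpoint is fixed, the background is relatively simple, and only one person acts in each frame. The dataset includes class labels, silhouettes, and background sequences | 230 |
IXMAS[60] | 2006 | 14 | A multi-view dataset with 14 action classes performed by 11 actors, each action repeated 3 times. Cameras are placed at 5 positions: the 4 corners of the room and overhead | 4 |
UCF-Sports[61] | 2008 | 10 | Videos come from the TV channels ESPN and BBC and cover 10 sports action classes | 220 |
Hollywood1[36] | 2008 | 8 | Contains 8 action classes collected from 32 movies | 674 |
Hollywood2[62] | 2009 | 12 | Contains 12 action classes and 3,669 videos in total, all extracted from 69 Hollywood movies. The actors' expressions, poses, and clothing, as well as camera motion, illumination, occlusion, and background, vary widely and approach real-world conditions, which makes behavior analysis and recognition highly challenging | 272 |
HumanEva[63] | 2009 | 6 | Videos were captured with 3 color cameras and 4 grayscale cameras; 4 actors perform 6 action classes | 81 |
UCF-YouTube[64] | 2009 | 11 | Contains 11 action classes; the videos exhibit camera shake, visual variation, illumination changes, and background occlusion | 167 |
MSR Action3D[65] | 2010 | 20 | Contains 20 action classes and 557 depth-map video sequences in total | 309 |
HMDB51[42] | 2011 | 51 | Contains 51 action classes and 6,849 videos in total, most of them from movies and online video libraries such as YouTube; each action class contains at least 101 clips | 390 |
UCF101[11] | 2012 | 101 | One of the largest publicly available datasets at present; its videos come from YouTube and cover 101 action classes | 459 |
UCF-50[66] | 2013 | 50 | Videos come from YouTube and are divided into 50 action classes according to their tags, for 6,618 video sequences in total | 161 |
UCF Kinect[67] | 2013 | 16 | Skeleton sequences were captured with a single Kinect and the OpenNI framework; there are 16 behaviors in total, all designed for gaming scenarios | 70 |
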
Method type | Reference | Venue | Year | Accuracy |
Hand-crafted feature representation | Ref. [71] | IEEE Conference on Computer Vision and Pattern Recognition | 2011 | 94.5% |
Hand-crafted feature representation | Ref. [72] | IEEE Conference on Computer Vision and Pattern Recognition | 2011 | 91.59% |
Hand-crafted feature representation | Ref. [17] | IEEE Conference on Computer Vision and Pattern Recognition | 2011 | 95% |
Hand-crafted feature representation | Ref. [73] | IEEE Conference on Computer Vision and Pattern Recognition | 2012 | 98.2% |
Hand-crafted feature representation | Ref. [23] | British Machine Vision Conference | 2013 | 95.6% |
Hand-crafted feature representation | Ref. [74] | Multimedia Tools & Applications | 2015 | 97.41% |
Deep-network learned feature representation | Ref. [75] | IEEE Transactions on Pattern Analysis & Machine Intelligence | 2012 | 93.5% |
Deep-network learned feature representation | Ref. [76] | IEEE Transactions on Pattern Analysis & Machine Intelligence | 2013 | 90.2% |
Deep-network learned feature representation | Ref. [77] | European Conference on Computer Vision | 2014 | 96.6% |
Deep-network learned feature representation | Ref. [78] | IEEE Conference on Computer Vision and Pattern Recognition | 2014 | 93.1% |

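Method type | Reference | Venue | Year | Accuracy |
Hand-crafted feature representation | Ref. [17] | IEEE Conference on Computer Vision and Pattern Recognition | 2011 | 46.6% |
Hand-crafted feature representation | Ref. [79] | European Conference on Computer Vision | 2012 | 40.7% |
Hand-crafted feature representation | Ref. [73] | IEEE Conference on Computer Vision and Pattern Recognition | 2012 | 26.9% |
Hand-crafted feature representation | Ref. [20] | IEEE International Conference on Computer Vision | 2013 | 57.2% |
Hand-crafted feature representation | Ref. [80] | IEEE Conference on Computer Vision and Pattern Recognition | 2013 | 33.7% |
Hand-crafted feature representation | Ref. [68] | European Conference on Computer Vision | 2014 | 66.79% |
Hand-crafted feature representation | Ref. [21] | IEEE Conference on Computer Vision and Pattern Recognition | 2015 | 63.7% |
Deep-network learned feature representation | Ref. [40] | Neural Information Processing Systems | 2014 | 59.4% |
Deep-network learned feature representation | Ref. [81] | IEEE International Conference on Computer Vision | 2015 | 59.1% |
Deep-network learned feature representation | Ref. [82] | IEEE Conference on Computer Vision and Pattern Recognition | 2015 | 65.9% |
Deep-network learned feature representation | Ref. [83] | IEEE Winter Conference on Applications of Computer Vision | 2016 | 54.9% |
Deep-network learned feature representation | Ref. [84] | European Conference on Computer Vision | 2016 | 70.4% |
Deep-network learned feature representation | Ref. [85] | Multimedia Tools & Applications | 2016 | 74.7% |
Deep-network learned feature representation | Ref. [70] | IEEE Transactions on Pattern Analysis & Machine Intelligence | 2016 | 74.9% |
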
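Method type | Reference | Venue | Year | Accuracy |
Hand-crafted feature representation | Ref. [79] | ICCV 2013 workshop of THUMOS'13 Action Recognition Challenge | 2013 | 87.46% |
Hand-crafted feature representation | Ref. [86] | IEEE International Conference on Computer Vision | 2014 | 73.1% |
Hand-crafted feature representation | Ref. [87] | European Conference on Computer Vision | 2014 | 87.7% |
Hand-crafted feature representation | Ref. [88] | IEEE Conference on Computer Vision and Pattern Recognition | 2015 | 89.1% |
Hand-crafted feature representation | Ref. [37] | Computer Vision & Image Understanding | 2016 | 87.9% |
Hand-crafted feature representation | Ref. [69] | International Conference on MultiMedia Modeling | 2017 | 91.5% |
Deep-network learned feature representation | Ref. [40] | Neural Information Processing Systems | 2014 | 88% |
Deep-network learned feature representation | Ref. [56] | IEEE Conference on Computer Vision and Pattern Recognition | 2015 | 88.6% |
Deep-network learned feature representation | Ref. [82] | IEEE Conference on Computer Vision and Pattern Recognition | 2015 | 91.5% |
Deep-network learned feature representation | Ref. [83] | IEEE Winter Conference on Applications of Computer Vision | 2016 | 89.1% |
Deep-network learned feature representation | Ref. [85] | Multimedia Tools & Applications | 2016 | 91.6% |
Deep-network learned feature representation | Ref. [89] | IEEE Conference on Computer Vision and Pattern Recognition | 2017 | 94.6% |
Deep-network learned feature representation | Ref. [70] | IEEE Transactions on Pattern Analysis & Machine Intelligence | 2016 | 96% |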
[1] | MOESLUND T B , HILTON A , KRUGER V . A survey of advances in vision-based human motion capture and analysis[J]. Computer Vision & Image Understanding, 2006,104(2): 90-126. |
[2] | CHENG G C , WAN Y F , SAUDAGAR A N ,et al. Advances in human action recognition:a survey[J]. Computer Science, 2015,2015(1): 1-30. |
[3] | JI X , LIU H . Advances in view-invariant human motion analysis:a review[J]. IEEE Transactions on Systems Man & Cybernetics Part C, 2009,40(1): 13-24. |
[4] | DHAMSANIA C J , RATANPARA T V . A survey on human action recognition from videos[C]// Online International Conference on Green Engineering and Technologies. 2017: 1-5. |
[5] | CANDAMO J , SHREVE M , GOLDGOF D B ,et al. Understanding transit scenes:a survey on human behavior recognition algorithms[J]. IEEE Transactions on Intelligent Transportation Systems, 2010,11(1): 206-224. |
[6] | POPPE R . A survey on vision-based human action recognition[J]. Image & Vision Computing, 2010,28(6): 976-990. |
[7] | WEINLAND D , RONFARD R , BOYER E . A survey of vision-based methods for action representation,segmentation and recognition[J]. Computer Vision & Image Understanding, 2011,115(2): 224-241. |
[8] | CHAUDHARY A , RAHEJA J L , DAS K ,et al. A survey on hand gesture recognition in context of soft computing[C]// International Conference on Computer Science and Information Technology. 2011: 46-55. |
[9] | LAPTEV I . On space-time interest points[J]. International Journal of Computer Vision, 2005,64(2-3): 107-123. |
[10] | HARRIS C J . A combined corner and edge detector[J]. Proc Alvey Vision Conf, 1988,1988(3): 147-151. |
[11] | SOOMRO K , ZAMIR A R , SHAH M . UCF101:a dataset of 101 human actions classes from videos in the wild[J]. Computer Science, 2012. |
[12] | OIKONOMOPOULOS A , PATRAS I , PANTIC M . Spatiotemporal salient points for visual recognition of human actions[J]. IEEE Transactions on Systems Man & Cybernetics Part B, 2006,36(3): 710-719. |
[13] | DOLLAR P , RABAUD V , COTTRELL G ,et al. Behavior recognition via sparse spatio-temporal features[C]// IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. 2006: 65-72. |
[14] | RAPANTZIKOS K , AVRITHIS Y , KOLLIAS S . Spatiotemporal saliency for event detection and representation in the 3d wavelet domain:potential in human action recognition[C]// ACM International Conference on Image and Video Retrieval. 2007: 294-301. |
[15] | RAPANTZIKOS K , AVRITHIS Y , KOLLIAS S . Dense saliency-based spatiotemporal feature points for action recognition[C]// Computer Vision and Pattern Recognition. 2009: 1454-1461. |
[16] | WILLEMS G , TUYTELAARS T , GOOL L . An efficient dense and scale-invariant spatio-temporal interest point detector[C]// European Conference on Computer Vision. 2008: 650-663. |
[17] | WANG H , KLASER A , SCHMID C ,et al. Action recognition by dense trajectories[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2011: 3169-3176. |
[18] | MURTHY O V R , GOECKE R . Ordered trajectories for human action recognition with large number of classes[J]. Image & Vision Computing, 2015,42(C): 22-34. |
[19] | CHO J , LEE M , CHANG H J ,et al. Robust action recognition using local motion and group sparsity[J]. Pattern Recognition, 2014,47(5): 1813-1825. |
[20] | WANG H , SCHMID C . Action recognition with improved trajectories[C]// IEEE International Conference on Computer Vision. 2014: 3551-3558. |
[21] | FERNANDO B , GAVVES E , ORAMAS M J ,et al. Modeling video evolution for action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 5378-5387. |
[22] | JHUANG H , SERRE T , WOLF L ,et al. A biologically inspired system for action recognition[C]// International Conference on Computer Vision. 2007: 1-8. |
[23] | PENG X , QIAO Y , PENG Q ,et al. Exploring motion boundary based sampling and spatial-temporal context descriptors for action recognition[C]// British Machine Vision Conference. 2013. |
[24] | ALI S , BASHARAT A , SHAH M . Chaotic invariants for human action recognition[C]// International Conference on Computer Vision. 2007: 1-8. |
[25] | YILMAZ A , SHAH M . Recognizing human actions in videos acquired by uncalibrated moving cameras[C]// Tenth IEEE International Conference on Computer Vision. 2005: 150-157. |
[26] | JHUANG H , GALL J , ZUFFI S ,et al. Towards understanding action recognition[C]// IEEE International Conference on Computer Vision. 2014: 3192-3199. |
[27] | SINGH V K , NEVATIA R . Action recognition in cluttered dynamic scenes using pose-specific part models[C]// International Conference on Computer Vision. 2011: 113-120. |
[28] | DU Y , WANG W , WANG L . Hierarchical recurrent neural network for skeleton based action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 1110-1118. |
[29] | WU D , SHAO L . Leveraging hierarchical parametric networks for skeletal joints based action segmentation and recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2014: 724-731. |
[30] | WANG C , WANG Y , YUILLE A L . An approach to pose-based action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2013: 915-922. |
[31] | JIANG Z , LIN Z , DAVIS L S . Recognizing human actions by learning and matching shape-motion prototype trees[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2012,34(3): 533-547. |
[32] | HUANG M , SU S Z , CAI G R ,et al. Meta-action descriptor for action recognition in RGBD video[J]. IET Computer Vision, 2017,11(4): 301-308. |
[33] | GORELICK L , BLANK M , SHECHTMAN E ,et al. Actions as space-time shapes[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2007,29(12): 2247-2253. |
[34] | DALAL N , TRIGGS B . Histograms of oriented gradients for human detection[C]// Computer Vision and Pattern Recognition. 2005: 886-893. |
[35] | DALAL N , TRIGGS B , SCHMID C . Human detection using oriented histograms of flow and appearance[C]// European Conference on Computer Vision. 2006: 428-441. |
[36] | LAPTEV I , MARSZALEK M , SCHMID C ,et al. Learning realistic human actions from movies[C]// Computer Vision and Pattern Recognition. 2008: 1-8. |
[37] | PENG X , WANG L , WANG X ,et al. Bag of visual words and fusion methods for action recognition:comprehensive study and good practice[J]. Computer Vision & Image Understanding, 2016,150(C): 109-125. |
[38] | PERRONNIN F , MENSINK T . Improving the fisher kernel for large-scale image classification[C]// European Conference on Computer Vision. 2010: 143-156. |
[39] | JEGOU H , DOUZE M , SCHMID C ,et al. Aggregating local descriptors into a compact image representation[C]// Computer Vision and Pattern Recognition. 2010: 3304-3311. |
[40] | SIMONYAN K , ZISSERMAN A . Two-stream convolutional networks for action recognition in videos[J]. Neural Information Processing Systems, 2014,1(4): 568-576. |
[41] | WANG L , GE L , LI R ,et al. Three-stream CNNs for action recognition[J]. Pattern Recognition Letters, 2017,92(C): 33-40. |
[42] | KUEHNE H , JHUANG H , STIEFELHAGEN R ,et al. HMDB51:a large video database for human motion recognition[C]// IEEE International Conference on Computer Vision. 2011: 2556-2563. |
[43] | GKIOXARI G , GIRSHICK R , MALIK J . Contextual action recognition with R*CNN[J]. CoRR, 2016,40(1): 1080-1088. |
[44] | GKIOXARI G , GIRSHICK R , MALIK J . Actions and attributes from wholes and parts[C]// International Conference on Computer Vision. 2015: 2470-2478. |
[45] | HOAI M . Regularized max pooling for image categorization[C]// British Machine Vision Conference. 2014: 94-100. |
[46] | SIMONYAN K , ZISSERMAN A . Very deep convolutional networks for large-scale image recognition[J]. Computer Science, 2014. |
[47] | OQUAB M , BOTTOU L , LAPTEV I ,et al. Learning and transferring mid-level image representations using convolutional neural networks[C]// Conference on Computer Vision and Pattern Recognition. 2014: 1717-1724. |
[48] | CHERON G , LAPTEV I , SCHMID C . P-CNN:pose-based CNN features for action recognition[C]// International Conference on Computer Vision. 2015: 3218-3226. |
[49] | ROHRBACH M , AMIN S , ANDRILUKA M ,et al. A database for fine grained activity detection of cooking activities[C]// Conference on Computer Vision and Pattern Recognition. 2012: 1194-1201. |
[50] | ZHOU Y , NI B , HONG R ,et al. Interaction part mining:a mid-level approach for fine-grained action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3323-3331. |
[51] | ZHOU Y , NI B , YAN S ,et al. Pipelining localized semantic features for fine-grained action recognition[C]// European Conference on Computer Vision. 2014: 481-496. |
[52] | GRAVES A , MOHAMED A , HINTON G . Speech recognition with deep recurrent neural networks[C]// IEEE International Conference on Acoustics,Speech and Signal Processing. 2013: 6645-6649. |
[53] | HOCHREITER S , SCHMIDHUBER J . Long short-term memory[J]. Neural Computation, 1997,9(8): 1735-1780. |
[54] | NIEBLES J C , WANG H , LI F F . Unsupervised learning of human action categories using spatial-temporal words[J]. International Journal of Computer Vision, 2008,79(3): 299-318. |
[55] | DONAHUE J , HENDRICKS L A , GUADARRAMA S ,et al. Long-term recurrent convolutional networks for visual recognition and description[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 2625-2634. |
[56] | NG Y H , HAUSKNECHT M , VIJAYANARASIMHAN S ,et al. Beyond short snippets:deep networks for video classification[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4694-4702. |
[57] | YU S , CHENG Y , XIE L ,et al. A novel recurrent hybrid network for feature fusion in action recognition[J]. Journal of Visual Communication & Image Representation, 2017,49: 192-203. |
[58] | GROSS R , SHI J . The CMU motion of body (MoBo) database[J]. Monumenta Nipponica, 2001,45(4). |
[59] | SCHULDT C , LAPTEV I , CAPUTO B . Recognizing human actions:a local SVM approach[C]// International Conference on Pattern Recognition. 2004: 32-36. |
[60] | WEINLAND D , RONFARD R , BOYER E . Free viewpoint action recognition using motion history volumes[J]. Computer Vision & Image Understanding, 2006,104(2): 249-257. |
[61] | RODRIGUEZ M D , AHMED J , SHAH M . Action MACH:a spatio-temporal maximum average correlation height filter for action recognition[C]// Conference on Computer Vision and Pattern Recognition. 2008: 1-8. |
[62] | MARSZALEK M , LAPTEV I , SCHMID C . Actions in context[C]// Conference on Computer Vision and Pattern Recognition. 2009: 2929-2936. |
[63] | SIGAL L , BALAN A O , BLACK M J . HumanEva:synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion[J]. International Journal of Computer Vision, 2010,87(1-2): 4-27. |
[64] | LIU J , LUO J , SHAH M . Recognizing realistic actions from videos in the wild[C]// Computer Vision and Pattern Recognition. 2009: 1996-2003. |
[65] | LI W , ZHANG Z , LIU Z . Action recognition based on a bag of 3D points[C]// Conference on Computer Vision and Pattern Recognition. 2010: 9-14. |
[66] | REDDY K K , SHAH M . Recognizing 50 human action categories of web videos[J]. Machine Vision & Applications, 2013,24(5): 971-981. |
[67] | ELLIS C , MASOOD S Z , TAPPEN M F ,et al. Exploring the trade-off between accuracy and observational latency in action recognition[J]. International Journal of Computer Vision, 2013,101(3): 420-436. |
[68] | PENG X , ZOU C , QIAO Y ,et al. Action recognition with stacked fisher vectors[C]// European Conference on Computer Vision. 2014: 581-595. |
[69] | DUTA I C , IONESCU B , AIZAWA K ,et al. Spatio-temporal VLAD encoding for human action recognition in videos[C]// International Conference on Multimedia Modeling. 2017: 365-378. |
[70] | BILEN H , FERNANDO B , GAVVES E ,et al. Action recognition with dynamic image networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017,PP(99): 1. |
[71] | WU X , XU D , DUAN L ,et al. Action recognition using context and appearance distribution features[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2011: 489-496. |
[72] | LIU J , KUIPERS B , SAVARESE S . Recognizing human actions by attributes[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2011: 3337-3344. |
[73] | SADANAND S , CORSO J J . Action bank:a high-level representation of activity in video[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2012: 1234-1241. |
[74] | CHEN M , GONG L , WANG T ,et al. Action recognition using Lie algebrized Gaussians over dense local spatio-temporal features[J]. Multimedia Tools & Applications, 2015,74(6): 2127-2142. |
[75] | ZHANG Z , TAO D . Slow feature analysis for human action recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2012,34(3): 436-450. |
[76] | JI S , XU W , YANG M ,et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013,35(1): 221-231. |
[77] | HASAN M , ROY-CHOWDHURY A K . Continuous learning of human activity models using deep nets[C]// European Conference on Computer Vision. 2014: 705-720. |
[78] | SUN L , JIA K , CHAN T H ,et al. DL-SFA:deeply-learned slow feature analysis for action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2014: 2625-2632. |
[79] | JIANG Y G , DAI Q , XUE X ,et al. Trajectory-based modeling of human actions with motion reference points[C]// European Conference on Computer Vision. 2012: 425-438. |
[80] | WANG L M , QIAO Y , TANG X . Motionlets:mid-level 3d parts for human motion recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2013: 2674-2681. |
[81] | SUN L , JIA K , YEUNG D Y ,et al. Human action recognition using factorized spatio-temporal convolutional networks[C]// IEEE International Conference on Computer Vision. 2015: 4597-4605. |
[82] | WANG L , QIAO Y , TANG X . Action recognition with trajectory-pooled deep-convolutional descriptors[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 4305-4314. |
[83] | PARK E , HAN X , BERG T L ,et al. Combining multiple sources of knowledge in deep CNNs for action recognition[C]// IEEE Winter Conference on Applications of Computer Vision. 2016: 1-8. |
[84] | SOUZA C R D , GAIDON A , VIG E ,et al. Sympathy for the details:dense trajectories and hybrid classification architectures for action recognition[C]// European Conference on Computer Vision. 2016: 697-716. |
[85] | YU S , CHENG Y , SU S ,et al. Stratified pooling based deep convolutional neural networks for human action recognition[J]. Multimedia Tools & Applications, 2017,76(11): 13367-13382. |
[86] | MURTHY O V R , GOECKE R . Ordered trajectories for large scale human action recognition[C]// IEEE International Conference on Computer Vision. 2014: 412-419. |
[87] | PENG X , WANG L , QIAO Y ,et al. Boosting VLAD with supervised dictionary learning and high-order statistics[C]// European Conference on Computer Vision. 2014: 660-674. |
[88] | LAN Z , LIN M , LI X ,et al. Beyond gaussian pyramid:multi-skip feature stacking for action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2015: 204-212. |
[89] | FEICHTENHOFER C , PINZ A , WILDES R P . Spatiotemporal multiplier networks for video action recognition[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2017: 7445-7454. |