Research on synthesis data generation method for logo recognition

doi:10.11959/j.issn.2096-109x.2018043

Abstract:

To address the scarcity of training samples for Logo recognition under the deep learning framework, a context-based Logo data synthesis algorithm is proposed. The algorithm uses several types of context information to guide the synthesis of Logo images: the interior of the Logo object, the neighborhood around it, the relationship between the Logo and other objects, and the scene in which the Logo appears. Experimental results on the FlickrLogos-32 dataset show that the proposed algorithm improves the performance of the Logo recognition algorithm (mAP increases by 8.5%) without relying on additional manual annotation, which verifies the effectiveness of the synthesis algorithm.

Key words: Logo recognition, data synthesis, context, deep learning, data augmentation
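
The abstract does not give implementation details, so the following is only a minimal sketch of the generic compositing step behind this kind of data synthesis: a logo is alpha-blended into a scene image, and its bounding box is recorded as an annotation that needs no manual labelling. The function `paste_logo`, the toy grayscale data, and the uniformly random placement are all assumptions made for illustration. In particular, random placement is a simplification and is not the context-guided selection the paper proposes.

```python
import random


def paste_logo(scene, logo, alpha, top, left):
    """Alpha-blend a grayscale logo onto a grayscale scene (hypothetical helper).

    scene, logo: 2-D lists of pixel intensities (0-255).
    alpha: 2-D list of opacities in [0, 1], same shape as logo.
    Returns the composited image and the logo bounding box
    (left, top, right, bottom), which serves as a free annotation.
    """
    out = [row[:] for row in scene]  # copy so the input scene is not modified
    for i, logo_row in enumerate(logo):
        for j, value in enumerate(logo_row):
            a = alpha[i][j]
            background = out[top + i][left + j]
            out[top + i][left + j] = round(a * value + (1 - a) * background)
    bbox = (left, top, left + len(logo[0]), top + len(logo))
    return out, bbox


# Toy data (assumed): an 8x8 gray scene and a 2x3 white logo whose
# opacity fades from fully opaque to fully transparent.
scene = [[100] * 8 for _ in range(8)]
logo = [[255] * 3 for _ in range(2)]
alpha = [[1.0, 0.5, 0.0] for _ in range(2)]

rng = random.Random(0)  # fixed seed so the example is reproducible
top = rng.randint(0, len(scene) - len(logo))
left = rng.randint(0, len(scene[0]) - len(logo[0]))
image, bbox = paste_logo(scene, logo, alpha, top, left)
```

A context-based method would replace the random choice of `top` and `left` (and of the scene itself) with a choice informed by the scene and the surrounding objects.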

CLC number:

TP391

Yuchao JIANG, Lixin JI, Chao GAO, Shaomei LI. Research on synthesis data generation method for logo recognition[J]. Chinese Journal of Network and Information Security, 2018, 4(5): 21-31.

Figures/Tables (14)

Table 2 Experimental results of the proposed synthesis algorithm and comparison with reference [12]

Method	Training/test split (images per class)	AP								mAP
		Adidas	Aldi	Apple	Becks	BMW	Carls	Chim	Coke
		Corona	DHL	Erdi	Esso	Fedex	Ferra	Ford	Fost
		Google	Guin	Hein	HP	Milka	Nvid	Paul	Pepsi
		Ritt	Shell	Sing	Starb	Stel	Texa	Tsin	Ups
RealImg ([12])	Train: 10 real	23.7	57.5	63.0	69.6	63.7	50.6	55.2	26.8	50.4%
	Test: 60 real	79.0	25.8	61.2	44.2	45.9	80.6	64.3	43.2
		47.7	58.2	61.8	21.3	19.4	17.4	48.2	17.8
		34.8	45.8	71.8	70.2	79.6	56.7	56.9	52.2
SynImg-32Cls ([12])	Train: 100 synthetic	9.4	47.3	9.6	70.3	39.9	28.3	15.8	21.7	27.6%
	Test: 60 real	6.1	11.1	4.1	44.7	22.9	60.9	43.6	28.8
		23.0	16.7	43.1	9.9	4.6	1.1	39.1	9.7
		22.7	38.3	15.5	65.6	28.7	55.1	27.4	20.1
SynImg-32Cls + RealImg ([12])	Train: 100 synthetic	26.8	63.7	65.8	72.7					54.8%
	Fine-tune: 10 real	76.0	31.5	63.0	52.2				46.6
	Test: 60 real	58.0	52.6	65.2	23.2	24.0	12.5	54.1
		37.9		75.0		79.0	64.2	57.4	54.4
RealImg (Ours)	Train: 10 real	24.4	57.2	66.6	72.0	70.8	42.8	55.3	24.8	50.5%
	Test: 60 real	82.8	29.5	62.5	44.1	42.7	87.2	59.3	39.9
		51.2	54.6	65.1	24.2	15.5	16.9	52.3	17.4
		32.3	44.7	72.7	69.7	77.3	62.5	52.6	44.3
SynImg-32Cls (Ours)	Train: 100 synthetic		54.0	34.6	44.1	17.6	18.4	33.9	20.6	32.6%
	Test: 60 real	8.2	21.5	12.1	49.6	16.6	28.2	35.4	50.3
		46.4	40.0	33.1	26.1	9.9	18.8	64.5	22.3
		39.4	45.5	15.8	55.5	17.6	47.5	38.7	39.9
SynImg-32Cls + RealImg (Ours)	Train: 100 synthetic	31.0				76.6	51.1	62.1	29.8
	Fine-tune: 10 real					49.0	87.9	76.9
	Test: 60 real								22.8
			45.5		72.5
SynImg-32Cls + RealImg (fusion) (Ours)	Train: 100 synthetic + 10 real (mixed)	36.2	61.3	30.1	46.9	25.3	30.8	32.2	22.3	34.2%
	Test: 60 real	11.4	16.7	19.6	47.1	28.0	33.2	33.8	50.1
		45.7	41.6	37.1	25.6	13.0	16.3	63.7	20.9
		33.4	45.0	14.1	62.7	22.8	46.4	43.5	36.6
SynImg-32Cls + RealImg (fusion) + RealImg (Ours)	Train: 100 synthetic + 10 real (mixed)	31.1	71.2	71.6	71.7	84.4	48.0	66.3	28.5	58.9%
	Fine-tune: 10 real	84.4	33.3	82.0	51.7	51.0	90.3	76.4	54.8
	Test: 60 real	59.6	67.8	69.4	37.8	23.2	22.9	66.0	23.6
		40.6	46.0	75.3	77.2	83.5	67.8	63.6	64.2
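
To make the relationship between the AP cells and the mAP column concrete, the short sketch below averages the 32 per-class AP values listed for RealImg ([12]) in Table 2. It assumes that mAP here is the unweighted mean of the per-class AP values, which is the usual definition and agrees with the reported figure. The variable names are illustrative only.

```python
from statistics import mean

# Per-class AP values for RealImg ([12]) as listed in Table 2,
# in the table's class order (Adidas ... Ups).
ap_values = [
    23.7, 57.5, 63.0, 69.6, 63.7, 50.6, 55.2, 26.8,
    79.0, 25.8, 61.2, 44.2, 45.9, 80.6, 64.3, 43.2,
    47.7, 58.2, 61.8, 21.3, 19.4, 17.4, 48.2, 17.8,
    34.8, 45.8, 71.8, 70.2, 79.6, 56.7, 56.9, 52.2,
]

# Assumption: mAP is the plain arithmetic mean over the 32 classes.
map_percent = mean(ap_values)
print(f"mAP = {map_percent:.1f}%")  # prints "mAP = 50.4%"
```

The same check can be applied to any other row of the table whose 32 AP cells are all present.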

References (29)

[1]	FU Y B. Design and implementation of violence and fear video recognition system based on Logo mark detection[D]. Beijing: Beijing Jiaotong University, 2016.
[2]	GAO Y, WANG F, LUAN H, et al. Brand data gathering from live social media streams[C]// ACM International Conference on Multimedia Retrieval. 2014: 169.
[3]	PAN C, YAN Z, XU X, et al. Vehicle logo recognition based on deep learning architecture in video surveillance for intelligent traffic system[C]// IET International Conference on Smart and Sustainable City. 2013: 123-126.
[4]	HE K, GKIOXARI G, DOLLAR P, et al. Mask R-CNN[C]// IEEE International Conference on Computer Vision. 2017: 2980-2988.
[5]	WANG X, SHRIVASTAVA A, GUPTA A. A-Fast-RCNN: hard positive generation via adversary for object detection[C]// IEEE Conference on Computer Vision and Pattern Recognition. 2017: 3039-3048.
[6]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot MultiBox detector[M]// Computer Vision-ECCV 2016. Springer International Publishing, 2016: 21-37.
[7]	JOLY A, BUISSON O. Logo retrieval with a contrario visual query expansion[C]// International Conference on Multimedia 2009. 2009: 581-584.
[8]	KALANTIDIS Y, PUEYO L G, TREVISIOL M, et al. Scalable triangulation-based logo recognition[C]// ACM International Conference on Multimedia Retrieval. 2011: 1-7.
[9]	ROMBERG S, PUEYO L G, LIENHART R, et al. Scalable logo recognition in real-world images[C]// ACM International Conference on Multimedia Retrieval. 2011: 25.
[10]	HOI S C H, WU X, LIU H, et al. LOGO-Net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2015, 46(5): 2403-2412.
[11]	BIANCO S, BUZZELLI M, MAZZINI D, et al. Deep learning for logo recognition[J]. Neurocomputing, 2017, 245(C): 23-30.
[12]	SU H, ZHU X, GONG S. Deep learning logo detection with data expansion by synthesising context[C]// IEEE Winter Conference on Applications of Computer Vision. 2017: 530-539.
[13]	CHEN X, GUPTA A. Webly supervised learning of convolutional networks[C]// IEEE International Conference on Computer Vision. 2016: 1431-1439.
[14]	SHRIVASTAVA A, GUPTA A, GIRSHICK R. Training region-based object detectors with online hard example mining[C]// IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). 2016: 761-769.
[15]	GUPTA A, VEDALDI A, ZISSERMAN A. Synthetic data for text localisation in natural images[C]// IEEE Computer Vision and Pattern Recognition. 2016: 2315-2324.
[16]	JADERBERG M, SIMONYAN K, VEDALDI A, et al. Reading text in the wild with convolutional neural networks[J]. International Journal of Computer Vision, 2016, 116(1): 1-20.
[17]	GEORGAKIS G, MOUSAVIAN A, BERG A C, et al. Synthesizing training data for object detection in indoor scenes[C]// Robotics: Science and Systems. 2017.
[18]	EGGERT C, WINSCHEL A, LIENHART R. On the benefit of synthetic data for company logo detection[C]// ACM International Conference on Multimedia. 2015: 1283-1286.
[19]	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[C]// International Conference on Neural Information Processing Systems. 2015: 91-99.
[20]	BENGIO Y, COLLOBERT R, WESTON J. Curriculum learning[C]// ACM International Conference on Machine Learning. 2009: 41-48.
[21]	LIU B. Modest proposal for the principle of logo design[J]. Packaging Engineering, 2005, 127(2): 222-222.
[22]	OLIVA A, TORRALBA A. The role of context in object recognition[J]. Trends in Cognitive Sciences, 2007, 11(12): 520.
[23]	MOTTAGHI R, CHEN X, LIU X, et al. The role of context for object detection and semantic segmentation in the wild[C]// IEEE Computer Vision and Pattern Recognition. 2014: 891-898.
[24]	KATTI H, PEELEN M V, ARUN S P. How do targets, nontargets, and scene context influence real-world object detection?[J]. Attention Perception & Psychophysics, 2017(2): 1-16.
[25]	ZHOU B, LAPEDRIZA A, KHOSLA A, et al. Places: a 10 million image database for scene recognition[J]. IEEE Trans Pattern Anal Mach Intell, 2017, 99: 1-1.
[26]	GUO J, GOULD S. Deep CNN ensemble with data augmentation for object detection[J]. Computer Science, 2015.
[27]	OLIVEIRA G, FRAZÃO X, PIMENTEL A, et al. Automatic graphic logo detection via fast region-based convolutional networks[C]// IEEE International Joint Conference on Neural Networks. 2016.
[28]	MUNNEKE J, BRENTARI V, PEELEN M. The influence of scene context on object recognition is independent of attentional focus[J]. Frontiers in Psychology, 2013, 4(8): 552.
[29]	NGUYEN H V, HO H T, PATEL V M, et al. DASH-N: joint hierarchical domain adaptation and feature learning[J]. IEEE Transactions on Image Processing, 2015, 24(12): 5479-5491.

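Dataset	Logo classes	Objects	Images	Publicly available
BelgaLogos [7]	37	2 695	1 951	Yes
FlickrLogos-27 [8]	27	4 671	1 080	Yes
FlickrLogos-32 [9]	32	3 404 (Note 1)	2 240	Yes
LOGO-NET [10]	160	130 608	73 414	No
Logos-32plus [11]	32	12 302	7 830	Yes
TopLogo10 [12]	10	863	700	Yes
Note 1: The authors of reference [12] were contacted and confirmed that the object count reported in [12] for the FlickrLogos-32 dataset is incorrect; the correct number is 3 404.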

Method	AP								mAP
	Adidas	Aldi	Apple	Becks	BMW	Carls	Chim	Coke
	Corona	DHL	Erdi	Esso	Fedex	Ferra	Ford	Fost
	Google	Guin	Hein	HP	Milka	Nvid	Paul	Pepsi
	Ritt	Shell	Sing	Starb	Stel	Texa	Tsin	Ups
SynImg-32Cls + RealImg (Our Baseline)	31.0	69.0	74.3		76.6	51.1		29.8
		35.3	70.9	53.5	49.0		76.9	52.9
	61.7	66.9	69.3		25.8			22.8
	41.8	45.5		72.5	79.7	69.2	62.1
Transparent Only + RealImg	32.8	69.7		77.6	76.8		61.8	29.1	57.8%
	88.0	35.9		52.9	43.9	87.5	76.3	47.8
	59.3	61.5	68.3	33.2	24.8	24.3	67.4	18.5
		47.5	78.6	73.6	80.3	69.2	61.3	61.2
Pixel-level Only + RealImg	32.7	66.7	73.8	75.5	77.4	48.3	59.7		58.1%
	87.5		71.7	53.6		87.7	72.2	53.5
		65.9	71.3	29.7		25.4	62.9
	36.5		75.4		79.3	66.6	63.4	59.3
Random Context + RealImg	28.5	65.2	68.2	74.5		46.6	60.0	30.8	56.7%
	86.4	33.5	70.1		50.4	83.5	75.9	52.2
	61.6	63.3	69.8	28.5	26.3	22.5	62.6	23.8
	35.6	47.2	74.6	74.0	79.5		61.0	56.1
No Logo Transform + RealImg	30.5	63.7	72.0	73.3	73.4	49.0	57.2	27.1	56.3%
	85.1	33.9	68.1	52.8	47.6	85.0	75.4	48.6
	56.2	62.2	69.1	33.9	24.0	26.0	67.6	23.6
	34.0	46.3	73.5	72.9	80.5	69.0	61.8	58.4
Random Position + RealImg			73.6	74.6	77.0	50.0	61.3	30.5	58.2%
	83.0	35.8	71.7	53.9	47.0	86.6
	57.9			33.7	20.4	24.9	64.3	24.1
	41.7	47.5	77.3	74.4		68.4		60.2