Journal on Communications ›› 2024, Vol. 45 ›› Issue (1): 63-76. doi: 10.11959/j.issn.1000-436x.2024008
• Academic paper •
Zhengxin LI1,2, Gang HU1, Fengming ZHANG1, Xiaofeng ZHANG1, Yongmei ZHAO1
Revised: 2023-07-04
Online: 2024-01-01
Published: 2024-01-01
About the author:
Zhengxin LI (1982- ), male, born in Xinyang, Henan, holds a Ph.D. and is an associate professor and master's supervisor at Air Force Engineering University. His main research interests include time series pattern recognition, data mining, and machine learning.
Supported by:
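Abstract:
Traditional dimensionality reduction methods cannot be applied directly to multivariate time series (MTS), and existing MTS dimensionality reduction methods struggle to reduce the data dimension substantially while keeping the reduction effective. To address these problems, an unsupervised dimensionality reduction method for MTS based on global and local scatter is proposed. First, a feature sequence extraction method is proposed: the upper triangular elements of the covariance matrix of each MTS are extracted and combined into a feature sequence. Then, following the basic idea of "minimum local scatter, maximum global scatter", an unsupervised dimensionality reduction model is proposed that retains as much global information as possible while preserving local neighborhood relationships. Taking the feature sequences as input, the model minimizes the sum of the neighborhood variances of all sample points and maximizes the variance of the neighborhood center points. The projection matrix obtained by solving the model performs the dimensionality reduction of the MTS. Finally, the proposed method is verified experimentally on 20 public datasets. The results show that it reduces the dimension of MTS by a large margin while keeping the reduction effective.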
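The two steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' exact formulation: the function names, the inclusion of the diagonal in the upper triangle, the Euclidean k-nearest-neighbor neighborhoods (each sample counted in its own neighborhood), the ratio objective solved as a generalized eigenproblem, and the small ridge term `reg` are all assumptions made for the example.

```python
import numpy as np


def covariance_feature(series):
    """Turn one MTS sample (time steps x variables) into a feature sequence.

    The result holds the upper-triangular entries of the covariance matrix,
    so its length depends only on the number of variables, not on the
    series length. Assumption: the diagonal is included.
    """
    cov = np.atleast_2d(np.cov(np.asarray(series, dtype=float), rowvar=False))
    rows, cols = np.triu_indices(cov.shape[0])
    return cov[rows, cols]


def global_local_projection(features, n_neighbors=5, n_components=2, reg=1e-6):
    """Hypothetical solver: small local scatter, large global scatter."""
    x = np.asarray(features, dtype=float)
    dim = x.shape[1]

    # k nearest neighbours by Euclidean distance; a sample is at distance 0
    # from itself, so it normally appears first in its own neighbourhood.
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    neighbours = np.argsort(dist, axis=1)[:, :n_neighbors]
    groups = x[neighbours]                     # (samples, k, dim)
    centres = groups.mean(axis=1)              # neighbourhood centre points

    # Local scatter: summed scatter of every neighbourhood around its centre.
    diffs = groups - centres[:, None, :]
    local = np.einsum("nki,nkj->ij", diffs, diffs)

    # Global scatter: scatter of the neighbourhood centres around their mean.
    centred = centres - centres.mean(axis=0)
    global_ = centred.T @ centred

    # Solve global_ w = lambda (local + ridge I) w by Cholesky whitening.
    ridge = reg * (np.trace(local) / dim + 1.0)
    chol = np.linalg.cholesky(local + ridge * np.eye(dim))
    chol_inv = np.linalg.inv(chol)
    whitened = chol_inv @ global_ @ chol_inv.T
    _, vecs = np.linalg.eigh((whitened + whitened.T) / 2)

    # eigh returns ascending eigenvalues; keep the largest n_components.
    return chol_inv.T @ vecs[:, ::-1][:, :n_components]
```

With a list `samples` of arrays shaped (time steps, variables), one would stack `covariance_feature(s)` over all samples into a matrix `features` and obtain the reduced representation as `features @ global_local_projection(features)`. Because the covariance matrix has a fixed size, series of different lengths yield feature sequences of the same length.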
Cite this article:
Zhengxin LI, Gang HU, Fengming ZHANG, Xiaofeng ZHANG, Yongmei ZHAO. Unsupervised dimensionality reduction method for multivariate time series based on global and local scatter[J]. Journal on Communications, 2024, 45(1): 63-76.
Table 1
MTS dataset information
No. | Dataset | Classes | Feature dimension | Series length | Mean series length | Samples |
1 | LP1 | 4 | 6 | 15 | 15 | 88 |
2 | LP2 | 5 | 6 | 15 | 15 | 47 |
3 | LP3 | 4 | 6 | 15 | 15 | 47 |
4 | LP4 | 5 | 6 | 15 | 15 | 47 |
5 | LP5 | 4 | 6 | 15 | 15 | 47 |
6 | DSA | 19 | 45 | 125 | 125 | 2 400 |
7 | FM | 2 | 28 | 50 | 50 | 416 |
8 | HMD | 4 | 10 | 400 | 400 | 234 |
9 | NATOPS | 6 | 24 | 51 | 51 | 360 |
10 | Cricket | 12 | 6 | 1197 | 1197 | 180 |
11 | RS | 4 | 6 | 30 | 30 | 303 |
12 | Epilepsy | 4 | 3 | 206 | 206 | 275 |
13 | BM | 4 | 6 | 100 | 100 | 80 |
14 | LSST | 14 | 6 | 36 | 36 | 4 925 |
15 | AWR | 25 | 9 | 144 | 144 | 575 |
16 | EEGeye | 2 | 14 | 20~2 401 | 624 | 24 |
17 | Wafer | 2 | 6 | 104~198 | 137 | 1 194 |
18 | WR | 2 | 62 | 128~1 918 | 368 | 44 |
19 | KP | 2 | 62 | 274~841 | 427 | 26 |
20 | ASL | 95 | 22 | 45~136 | 57 | 2 565 |
Table 2
Experimental results on dimensionality reduction effectiveness
Dataset No. | GLSUP | PCA | CPCA | PPCA | SVD_LPP | PBLDA | VPCA |
1 | 0.81 | 0.76 | 0.83 | 0.85 | 0.64 | 0.60 | 0.90 |
2 | 0.64 | 0.66 | 0.66 | 0.60 | 0.55 | 0.51 | 0.66 |
3 | 0.70 | 0.68 | 0.68 | 0.62 | 0.53 | 0.53 | 0.68 |
4 | 0.91 | 0.89 | 0.85 | 0.84 | 0.86 | 0.78 | 0.91 |
5 | 0.62 | 0.63 | 0.65 | 0.60 | 0.58 | 0.51 | 0.63 |
6 | 0.99 | 0.48 | 0.99 | 0.50 | 0.99 | 0.82 | 0.78 |
7 | 0.54 | 0.46 | 0.50 | 0.56 | 0.53 | 0.52 | 0.50 |
8 | 0.30 | 0.30 | 0.29 | 0.30 | 0.32 | 0.29 | 0.28 |
9 | 0.78 | 0.73 | 0.78 | 0.73 | 0.57 | 0.21 | 0.83 |
10 | 0.98 | 0.67 | 0.78 | 0.67 | 0.87 | 1.00 | 0.96 |
11 | 0.79 | 0.55 | 0.76 | 0.56 | 0.57 | 0.60 | 0.81 |
12 | 0.88 | 0.44 | 0.59 | 0.47 | 0.78 | 0.42 | 0.59 |
13 | 1.00 | 0.69 | 0.70 | 0.69 | 0.64 | 0.51 | 0.73 |
14 | 0.44 | 0.35 | 0.41 | 0.34 | 0.45 | 0.33 | 0.36 |
15 | 0.91 | 0.58 | 0.95 | 0.55 | 0.60 | 0.11 | 0.96 |
16 | 0.79 | 0.75 | 0.79 | 0.50 | 0.75 | 0.50 | — |
17 | 0.98 | 0.98 | 0.98 | 0.97 | 0.98 | 0.93 | — |
18 | 0.96 | 0.89 | 0.98 | 0.73 | 0.75 | 0.93 | — |
19 | 0.85 | 0.69 | 0.80 | 0.62 | 0.62 | 0.65 | — |
20 | 0.93 | 0.31 | 0.49 | 0.57 | 0.73 | 0.50 | — |
Table 3
Experimental results on the degree of dimensionality reduction
Dataset No. | GLSUP | PCA | CPCA | PPCA | SVD_LPP | PBLDA | VPCA |
1 | 0.96 | 0.50 | 0.83 | 0.50 | 0.97 | 0.83 | 0.80 |
2 | 0.90 | 0.50 | 0.67 | 0.50 | 0.94 | 0.67 | 0.93 |
3 | 0.93 | 0.50 | 0.67 | 0.50 | 0.94 | 0.17 | 0.93 |
4 | 0.91 | 0.50 | 0.83 | 0.50 | 0.94 | 0.83 | 0.80 |
5 | 0.94 | 0.50 | 0.83 | 0.50 | 0.94 | 0.67 | 0.80 |
6 | 0.96 | 0.51 | 0.84 | 0.51 | 1.00 | 0.89 | 0.99 |
7 | 0.75 | 0.71 | 0.71 | 0.61 | 1.00 | 0.14 | 0.98 |
8 | 0.99 | 0.70 | 0.80 | 0.70 | 1.00 | 0.20 | 0.94 |
9 | 0.95 | 0.92 | 0.92 | 0.88 | 0.99 | 0.92 | 0.96 |
10 | 1.00 | 0.33 | 0.67 | 0.33 | 1.00 | 0.83 | 1.00 |
11 | 0.93 | 0.33 | 0.33 | 0.33 | 0.97 | 0.00 | 0.77 |
12 | 1.00 | 0.00 | 0.33 | 0.00 | 1.00 | 0.67 | 0.87 |
13 | 1.00 | 0.33 | 0.50 | 0.33 | 0.99 | 0.00 | 0.86 |
14 | 0.91 | 0.33 | 0.67 | 0.33 | 0.97 | 0.33 | 0.97 |
15 | 0.98 | 0.56 | 0.44 | 0.56 | 0.99 | 0.11 | 0.99 |
16 | 0.99 | 0.64 | 0.86 | 0.64 | 1.00 | 0.64 | — |
17 | 0.98 | 0.67 | 0.67 | 0.67 | 0.99 | 0.00 | — |
18 | 1.00 | 0.92 | 0.92 | 0.94 | 1.00 | 0.90 | — |
19 | 1.00 | 0.94 | 0.94 | 0.95 | 1.00 | 0.87 | — |
20 | 0.88 | 0.86 | 0.86 | 0.86 | 0.98 | 0.00 | — |
Note: a degree of dimensionality reduction of 1.00 is a value rounded to two decimal places.
Table 4
Time cost results (reduction time, classification time, and total time, in seconds)
Dataset | GLSUP reduction/s | GLSUP classification/s | GLSUP total/s | PCA reduction/s | PCA classification/s | PCA total/s | CPCA reduction/s | CPCA classification/s | CPCA total/s | PPCA reduction/s | PPCA classification/s | PPCA total/s |
DSA | 17.47 | 2.68 | 20.15 | 1.35 | 386.24 | 387.59 | 0.16 | 82.85 | 83.01 | 417.77 | 414.56 | 832.32 |
LSST | 0.25 | 7.55 | 7.80 | 0.09 | 108.52 | 108.61 | 0.03 | 54.41 | 54.43 | 0.15 | 130.87 | 131.03 |
HMD | 0.04 | 0.03 | 0.07 | 0.02 | 1.93 | 1.95 | 0.01 | 1.51 | 1.52 | 0.02 | 1.42 | 1.44 |
Cricket | 0.04 | 0.01 | 0.05 | 0.02 | 1.92 | 1.93 | 0.01 | 1.91 | 1.92 | 0.02 | 1.62 | 1.64 |
FM | 0.77 | 0.21 | 0.98 | 0.05 | 1.83 | 1.87 | 0.01 | 1.81 | 1.82 | 0.06 | 2.73 | 2.79 |
NATOPS | 0.24 | 0.05 | 0.29 | 0.03 | 0.33 | 0.36 | 0.00 | 0.33 | 0.34 | 0.04 | 0.58 | 0.62 |
Table 4 (continued)
Dataset | SVD_LPP reduction/s | SVD_LPP classification/s | SVD_LPP total/s | PBLDA reduction/s | PBLDA classification/s | PBLDA total/s | VPCA reduction/s | VPCA classification/s | VPCA total/s |
DSA | 1.84 | 2.60 | 4.44 | 26.41 | 48.08 | 74.49 | 4.31 | 1.96 | 6.27 |
LSST | 0.26 | 10.90 | 11.16 | 5.72 | 98.83 | 104.55 | 0.67 | 7.19 | 7.86 |
HMD | 0.37 | 0.03 | 0.39 | 4.50 | 2.39 | 6.89 | 0.15 | 0.64 | 0.79 |
Cricket | 1.23 | 0.02 | 1.25 | 5.75 | 0.09 | 5.84 | 0.73 | 0.10 | 0.82 |
FM | 0.07 | 0.08 | 0.15 | 0.46 | 6.10 | 6.55 | 0.08 | 0.06 | 0.14 |
NATOPS | 0.05 | 0.07 | 0.11 | 0.11 | 0.33 | 0.45 | 0.06 | 0.24 | 0.30 |