大数据 ›› 2021, Vol. 7 ›› Issue (5): 111-130. doi: 10.11959/j.issn.2096-0271.2021052
孔芳1, 杨悦然1, 陈卫2, 李帅1
About the author:
KONG Fang (1998- ), female, is a Ph.D. student at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University. Her main research interests include combinatorial online learning and online influence maximization.
Fang KONG1, Yueran YANG1, Wei CHEN2, Shuai LI1
Online: 2021-09-15
Published: 2021-09-01
Abstract:
Combinatorial online learning studies how to learn unknown parameters while interacting with the environment and thereby gradually identify the optimal combination. The problem has a wide range of applications, such as advertisement placement, search, and recommendation. This survey first presents the definition of the combinatorial online learning problem and its framework, the combinatorial multi-armed bandit problem, and summarizes the classical algorithms and research progress within this framework; it then introduces two practical applications of the problem, online influence maximization and online learning to rank, along with their research progress; finally, it outlines future research directions for combinatorial online learning.
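To make the combinatorial multi-armed bandit framework surveyed here concrete, below is a minimal Python sketch in the spirit of the CUCB-style algorithms discussed in the survey (see reference [16]): base-arm statistics are maintained with upper confidence bounds, and an offline oracle selects the super arm each round. The oracle, the Bernoulli arm simulator, and the top-2 usage example are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def cucb(num_arms, oracle, sample_arm, horizon):
    """CUCB-style combinatorial bandit loop with semi-bandit feedback.

    num_arms   -- number of base arms
    oracle     -- offline oracle: maps a vector of optimistic arm estimates
                  to a super arm (list of base-arm indices); assumed given
    sample_arm -- environment callback returning a stochastic outcome in
                  [0, 1] for one base arm (assumption for this sketch)
    horizon    -- number of interaction rounds
    """
    counts = [0] * num_arms   # how many times each base arm has been observed
    means = [0.0] * num_arms  # empirical mean outcome of each base arm

    for t in range(1, horizon + 1):
        # Upper confidence bounds; unobserved arms get the optimistic value 1.
        ucb = [
            min(1.0, means[i] + math.sqrt(3 * math.log(t) / (2 * counts[i])))
            if counts[i] > 0 else 1.0
            for i in range(num_arms)
        ]
        super_arm = oracle(ucb)  # combinatorial optimization on optimistic estimates
        for i in super_arm:      # semi-bandit feedback: observe every played base arm
            x = sample_arm(i)
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]
    return means, counts

if __name__ == "__main__":
    # Hypothetical example: repeatedly pick the best 2 of 5 Bernoulli arms.
    true_means = [0.1, 0.9, 0.5, 0.7, 0.3]
    top2_oracle = lambda v: sorted(range(len(v)), key=lambda i: -v[i])[:2]
    bernoulli = lambda i: 1.0 if random.random() < true_means[i] else 0.0
    est, cnt = cucb(num_arms=5, oracle=top2_oracle, sample_arm=bernoulli, horizon=1000)
    print("estimated means:", [round(m, 2) for m in est])
```

The point of this structure is that exploration and exploitation are handled at the level of individual base arms, while the combinatorial structure of the decision set is delegated entirely to the offline oracle.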
孔芳, 杨悦然, 陈卫, 李帅. 基于优化反馈的组合在线学习[J]. 大数据, 2021, 7(5): 111-130.
Fang KONG, Yueran YANG, Wei CHEN, Shuai LI. Combinatorial online learning based on optimizing feedbacks[J]. Big Data Research, 2021, 7(5): 111-130.
[1] | ALON N , CESA-BIANCHI N , DEKEL O ,et al. Online learning with feedback graphs:beyond bandits[J]. arXiv preprint,2015,arXiv:1502.07617. |
[2] | LYKOURIS T , TARDOS É , WALI D . Feedback graph regret bounds for Thompson sampling and UCB[C]// Proceedings of the 31st International Conference on Algorithmic Learning Theory.[S.l.:s.n.], 2020. |
[3] | BARTÓK G , FOSTER D P , PÁL D ,et al. Partial monitoring-classification,regret bounds,and algorithms[J]. Mathematics of Operations Research, 2014,39(4): 967-997. |
[4] | AUER P , CESA-BIANCHI N , FISCHER P . Finite-time analysis of the multiarmed bandit problem[J]. Machine Learning, 2002,47(2): 235-256. |
[5] | THOMPSON W R . On the likelihood that one unknown probability exceeds another in view of the evidence of two samples[J]. Biometrika, 1933,25(3/4): 285-294. |
[6] | BUBECK S , MUNOS R , STOLTZ G . Pure exploration in multi-armed bandits problems[M]// Lecture Notes in Computer Science. Heidelberg: Springer, 2009: 23-37. |
[7] | AUDIBERT J Y , BUBECK S , MUNOS R . Best arm identification in multi-armed bandits[C]// Proceedings of the 23rd Annual Conference on Learning Theory.[S.l.:s.n.], 2010: 41-53. |
[8] | KARNIN Z , KOREN T , SOMEKH O . Almost optimal exploration in multiarmed bandits[C]// Proceedings of the International Conference on Machine Learning. New York:ACM Press, 2013: 1238-1246. |
[9] | GARIVIER A , KAUFMANN E . Optimal best arm identification with fixed confidence[C]// Proceedings of the 29th Annual Conference on Learning Theory.[S.l.:s.n.], 2016. |
[10] | WU Y F , SHARIFF R , LATTIMORE T ,et al. Conservative bandits[C]// Proceedings of the 33rd International Conference on Machine Learning. New York:ACM Press, 2016. |
[11] | AUER P , CESA-BIANCHI N , FREUND Y ,et al. The nonstochastic multiarmed bandit problem[J]. SIAM Journal on Computing, 2002,32(1): 48-77. |
[12] | LI L H , CHU W , LANGFORD J ,et al. A contextual-bandit approach to personalized news article recommendation[C]// Proceedings of the 19th International Conference on World Wide Web. New York:ACM Press, 2010. |
[13] | GAI Y , KRISHNAMACHARI B , JAIN R . Learning multiuser channel allocations in cognitive radio networks:a combinatorial multi-armed bandit formulation[C]// Proceedings of 2010 IEEE Symposium on New Frontiers in Dynamic Spectrum. Piscataway:IEEE Press, 2010: 1-9. |
[14] | CHEN W , HU W , LI F ,et al. Combinatorial multi-armed bandit with general reward functions[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems.[S.l.:s.n.], 2016. |
[15] | WANG Q S , CHEN W . Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. New York:ACM Press, 2017: 1161-1171. |
[16] | CHEN W , WANG Y J , YUAN Y . Combinatorial multi-armed bandit:general framework and applications[C]// Proceedings of the 30th International Conference on Machine Learning.[S.l.:s.n.], 2013: 151-159. |
[17] | GAI Y , KRISHNAMACHARI B , JAIN R . Combinatorial network optimization with unknown variables:multi-armed bandits with linear rewards and individual observations[J]. IEEE/ACM Transactions on Networking, 2012,20(5): 1466-1478. |
[18] | CHEN W , WANG Y J , YUAN Y ,et al. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms[J]. Journal of Machine Learning Research, 2016,17(1): 1746-1778. |
[19] | ABBASI-YADKORI Y , PÁL D , SZEPESVÁRI C . Improved algorithms for linear stochastic bandits[C]// Proceedings of the 24th International Conference on Neural Information Processing Systems. New York:ACM Press, 2011: 2312-2320. |
[20] | DANI V , HAYES T P , KAKADE S M . Stochastic linear optimization under bandit feedback[Z]. 2008. |
[21] | RUSMEVICHIENTONG P , TSITSIKLIS J N . Linearly parameterized bandits[J]. Mathematics of Operations Research, 2010,35(2): 395-411. |
[22] | KVETON B , WEN Z , ASHKAN A ,et al. Matroid bandits:fast combinatorial optimization with learning[C]// Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence.[S.l.:s.n.], 2014. |
[23] | CHEN S , LIN T , KING I ,et al. Combinatorial pure exploration of multi-armed bandits[C]// Proceedings of the 28th Conference on Neural Information Processing Systems.[S.l.:s.n.], 2014: 379-387. |
[24] | GABILLON V , LAZARIC A , GHAVAMZADEH M ,et al. Improved learning complexity in combinatorial pure exploration bandits[C]// Proceedings of the 19th International Conference on Artificial Intelligence and Statistics.[S.l.:s.n.], 2016. |
[25] | KUROKI Y , XU L Y , MIYAUCHI A ,et al. Polynomial-time algorithms for multiple-arm identification with full-bandit feedback[J]. Neural Computation, 2020,32(9): 1733-1773. |
[26] | REJWAN I , MANSOUR Y . Top-k combinatorial bandits with full-bandit feedback[C]// Proceedings of the 31st International Conference on Algorithmic Learning Theory.[S.l.:s.n.], 2020. |
[27] | HUANG W R , OK J , LI L ,et al. Combinatorial pure exploration with continuous and separable reward functions and its applications[C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. New York:ACM Press, 2018. |
[28] | DU Y H , KUROKI Y , CHEN W . Combinatorial pure exploration with full-bandit or partial linear feedback[J]. arXiv preprint,2020,arXiv:2006.07905. |
[29] | CHEN W , DU Y H , HUANG L B ,et al. Combinatorial pure exploration for dueling bandit[C]// Proceedings of the 37th International Conference on Machine Learning.[S.l.:s.n.], 2020. |
[30] | COMBES R , TALEBI M S , PROUTIERE A ,et al. Combinatorial bandits revisited[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. New York:ACM Press, 2015: 2116-2124. |
[31] | GARIVIER A , CAPPÉ O . The KL-UCB algorithm for bounded stochastic bandits and beyond[C]// Proceedings of the 24th Annual Conference on Learning Theory.[S.l.:s.n.], 2011. |
[32] | CUVELIER T , COMBES R , GOURDIN E . Statistically efficient,polynomial-time algorithms for combinatorial semi-bandits[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2021,5(1): 1-31. |
[33] | QIN L J , CHEN S Y , ZHU X Y . Contextual combinatorial bandit and its application on diversified online recommendation[C]// Proceedings of the 2014 SIAM International Conference on Data Mining. Philadelphia:Society for Industrial and Applied Mathematics, 2014. |
[34] | CHEN L X , XU J , LU Z . Contextual combinatorial multi-armed bandits with volatile arms and submodular reward[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. New York:ACM Press, 2018: 3251-3260. |
[35] | ZUO J H , LIU X T , JOE-WONG C ,et al. Online competitive influence maximization[J]. arXiv preprint,2020,arXiv:2006.13411. |
[36] | KOMIYAMA J , HONDA J , NAKAGAWA H . Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays[C]// Proceedings of the 32nd International Conference on Machine Learning. New York:ACM Press, 2015: 1152-1161. |
[37] | WANG S W , CHEN W . Thompson sampling for combinatorial semi-bandits[C]// Proceedings of the 35th International Conference on Machine Learning.[S.l.:s.n.], 2018. |
[38] | WANG Q S , CHEN W . Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. New York:ACM Press, 2017: 1161-1171. |
[39] | HÜYÜK A , TEKIN C . Analysis of Thompson sampling for combinatorial multi-armed bandit with probabilistically triggered arms[C]// Proceedings of Machine Learning Research.[S.l.:s.n.], 2019. |
[40] | HÜYÜK A , TEKIN C . Thompson sampling for combinatorial network optimization in unknown environments[J]. IEEE/ACM Transactions on Networking, 2020,28(6): 2836-2849. |
[41] | WANG S W , CHEN W . Thompson sampling for combinatorial semi-bandits[C]// Proceedings of the 35th International Conference on Machine Learning.[S.l.:s.n.], 2018: 5114-5122. |
[42] | ZHOU H Z , WANG L D , VARSHNEY L ,et al. A near-optimal change-detection based algorithm for piecewise-stationary combinatorial semi-bandits[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020,34(4): 6933-6940. |
[43] | CHEN W , WANG L W , ZHAO H Y ,et al. Combinatorial semi-bandit in the non-stationary environment[J]. arXiv preprint,2020,arXiv:2002.03580. |
[44] | ZHANG X J , LI S , LIU W W . Contextual combinatorial conservative bandits[J]. arXiv preprint,2019,arXiv:1911.11337. |
[45] | CESA-BIANCHI N , LUGOSI G . Combinatorial bandits[J]. Journal of Computer and System Sciences, 2012,78(5): 1404-1422. |
[46] | BUBECK S , CESA-BIANCHI N , KAKADE S M . Towards minimax policies for online linear optimization with bandit feedback[J]. Journal of Machine Learning Research, 2012,23. |
[47] | SAKAUE S , ISHIHATA M , MINATO S I . Practical adversarial combinatorial bandit algorithm via compression of decision sets[J]. arXiv preprint,2017,arXiv:1707.08300. |
[48] | LIN T , ABRAHAO B , KLEINBERG R ,et al. Combinatorial partial monitoring game with linear feedback and its applications[C]// Proceedings of the 31st International Conference on Machine Learning.[S.l.:s.n.], 2014. |
[49] | CHAUDHURI S , TEWARI A . Phased exploration with greedy exploitation in stochastic combinatorial partial monitoring games[J]. arXiv preprint,2016,arXiv:1608.06403. |
[50] | CHEN X Y , ZHENG K , ZHOU Z X ,et al. (Locally) Differentially private combinatorial semi-bandits[C]// Proceedings of the 37th International Conference on Machine Learning.[S.l.:s.n.], 2020. |
[51] | CHEN Y W , HOFMANN K . Online learning to rank:absolute vs. relative[C]// Proceedings of the 24th International Conference on World Wide Web. New York:ACM Press, 2015. |
[52] | KVETON B , SZEPESVARI C , WEN Z ,et al. Cascading bandits:learning to rank in the cascade model[C]// Proceedings of the 32nd International Conference on Machine Learning. New York:ACM Press, 2015: 767-776. |
[53] | KVETON B , WEN Z , ASHKAN A ,et al. Combinatorial cascading bandits[J]. Advances in Neural Information Processing Systems, 2015,28: 1450-1458. |
[54] | LI S , WANG B X , ZHANG S Y ,et al. Contextual combinatorial cascading bandits[C]// Proceedings of the 33rd International Conference on Machine Learning. New York:ACM Press, 2016: 1245-1253. |
[55] | KATARIYA S , KVETON B , SZEPESVÁRI C ,et al. DCM bandits:learning to rank with multiple clicks[C]// Proceedings of the 33rd International Conference on Machine Learning. New York:ACM Press, 2016: 1215-1224. |
[56] | CAO J Y , SUN W , SHEN Z J M ,et al. Fatigue-aware bandits for dependent click models[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020,34(4): 3341-3348. |
[57] | KVETON B , LI C , LATTIMORE T ,et al. BubbleRank:safe online learning to rerank via implicit click feedback[C]// Proceedings of the 35th Uncertainty in Artificial Intelligence Conference.[S.l.:s.n.], 2020. |
[58] | ZOGHI M , TUNYS T , GHAVAMZADEH M ,et al. Online learning to rank in stochastic click models[C]// Proceedings of the 34th International Conference on Machine Learning.[S.l.:s.n.], 2017. |
[59] | ZHU Z A , CHEN W Z , MINKA T ,et al. A novel click model and its applications to online advertising[C]// Proceedings of the 3rd ACM International Conference on Web Search and Data Mining. New York:ACM Press, 2010. |
[60] | LI S , LATTIMORE T , SZEPESVARI C . Online learning to rank with features[C]// Proceedings of the 36th International Conference on Machine Learning.[S.l.:s.n.], 2019. |
[61] | LATTIMORE T , KVETON B , LI S ,et al. TopRank:a practical algorithm for online stochastic ranking[C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems.[S.l.:s.n.], 2018. |
[62] | YUE Y S , GUESTRIN C . Linear submodular bandits and their application to diversified retrieval[C]// Proceedings of the 24th International Conference on Neural Information Processing Systems. New York:ACM Press, 2011: 2483-2491. |
[63] | YU B S , FANG M , TAO D C . Linear submodular bandits with a knapsack constraint[C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence. New York:ACM Press, 2016: 1380-1386. |
[64] | CHEN L , KRAUSE A , KARBASI A . Interactive submodular bandit[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems.[S.l.:s.n.], 2017. |
[65] | TAKEMORI S , SATO M , SONODA T ,et al. Submodular bandit problem under multiple constraints[C]// Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence.[S.l.:s.n.], 2020. |
[66] | 孔芳, 李奇之, 李帅 . 在线影响力最大化研究综述[J]. 计算机科学, 2020,47(5): 7-13. (KONG F , LI Q Z , LI S . Survey on online influence maximization[J]. Computer Science, 2020,47(5): 7-13.) |
[67] | KEMPE D , KLEINBERG J , TARDOS É . Maximizing the spread of influence through a social network[C]// Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.[S.l.:s.n.], 2003: 137-146. |
[68] | WEN Z , KVETON B , VALKO M ,et al. Online influence maximization under independent cascade model with semi-bandit feedback[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems.[S.l.:s.n.], 2017: 3026-3036. |
[69] | WU Q Y , LI Z G , WANG H Z ,et al. Factorization bandits for online influence maximization[C]// Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York:ACM Press, 2019: 636-646. |
[70] | VASWANI S , LAKSHMANAN L V S , SCHMIDT M . Influence maximization with bandits[J]. arXiv preprint,2015,arXiv:1503.00024. |
[71] | LI S , KONG F , TANG K J ,et al. Online influence maximization under linear threshold model[J]. arXiv preprint,2020,arXiv:2011.06378. |
[72] | VASWANI S , KVETON B , WEN Z ,et al. Model-independent online learning for influence maximization[C]// Proceedings of the 34th International Conference on Machine Learning. New York:ACM Press, 2017: 3530-3539. |