Chinese Journal of Network and Information Security ›› 2021, Vol. 7 ›› Issue (5): 57-76. doi: 10.11959/j.issn.2096-109x.2021087
Weilin YUAN, Zhiyong LIAO, Wei GAO, Tingting WEI, Junren LUO, Wanpeng ZHANG, Jing CHEN

Revised: 2021-03-06
Online: 2021-10-15
Published: 2021-10-01

About the first author: Weilin YUAN (1994−), male, from Qujing, Yunnan, is a Ph.D. candidate at the National University of Defense Technology. His research interests include cognitive decision-making and intelligent gaming, opponent modeling, reinforcement learning, and multi-agent systems.
Abstract:
Computer games are the "fruit fly" of artificial intelligence: they attract sustained attention from AI researchers and have become a fertile platform for studying cognitive intelligence. Poker-style adversarial problems can be modeled as imperfect-information dynamic games with well-defined boundaries and fixed rules, so a computer poker AI must be capable of dynamic decision-making under incomplete information, recognizing an opponent's deceptive and bluffing behavior, and managing chips and risk across multiple betting rounds. This survey first reviews the development of intelligent game-playing for computer poker, taking Texas Hold'em as the representative variant; it then analyzes the typical models and algorithms, key techniques, and main open problems; finally, it discusses future trends and application prospects of intelligent computer poker.
Weilin YUAN, Zhiyong LIAO, Wei GAO, Tingting WEI, Junren LUO, Wanpeng ZHANG, Jing CHEN. Survey on intelligent game of computer poker[J]. Chinese Journal of Network and Information Security, 2021, 7(5): 57-76.
Table 1 Summary of AI work on Texas Hold'em poker
| Organization | Year | Program | Setting | Key techniques | Performance | Hardware |
| --- | --- | --- | --- | --- | --- | --- |
| UA | 1997 | Loki | M-L | Adaptive opponent modeling | Top 10% on IRC server | — |
| UA | 1999 | Poki | M-L | Miximax search, neural-network opponent modeling | Top 10% on IRC server | — |
| UA | 2002 | PsOpti | 2-L | Bucketing abstraction | Beat Poki | — |
| UA | 2003 | Vexbot | 2-L | Expectimax search | Beat Poki and PsOpti | — |
| UA | 2006 | Hyperborean | 2-L | CFR, ε-Nash equilibrium | ACPC 2006 winner | 4 CPUs in parallel, ~14 days |
| UA | 2007 | Polaris | 2-L | CFR, frequentist best response, restricted Nash response | ACPC 2007 winner | 4 CPUs in parallel, >14 days |
| UA | 2008 | Hyperborean No-Limit | 2-NL | fcpa betting abstraction, hard translation | ACPC 2008 winner | — |
| UA | 2009 | Hyperborean | 3-L | FASS CFR | ACPC 2009 winner | — |
| UA | 2015 | Cepheus | 2-L | CFR+ | — | Cluster of 200 compute nodes, 32 GB RAM, 1 TB memory; 68.5 days |
| UA | 2017 | DeepStack | 2-NL | Counterfactual value network (CFV net) | Beat 33 professional players | Turn network: 6,144 CPUs, >175 CPU core-years; flop network: 20 GPUs, 0.5 GPU-years |
| CMU | 2007 | Tartanian | 2-NL | Potential-aware automated abstraction | ACPC 2007 runner-up | — |
| CMU | 2014 | Tartanian7 | 2-NL | Hierarchical abstraction, MCCFR | ACPC 2014 winner | 1,153,200 core-hours, 128 GB RAM |
| CMU | 2015 | Claudico | 2-NL | Hierarchical abstraction, MCCFR | ACPC 2014 winner | 2–3 million core-hours, 128 GB RAM |
| CMU | 2016 | Baby Tartanian8 | 2-NL | Asymmetric action abstraction, RBS | ACPC 2016 winner | 2 million core-hours, 128 GB RAM |
| CMU | 2017 | Libratus | 2-NL | Subgame solving, MCCFR | Beat top professionals | 25 million core-hours |
| CMU | 2019 | Pluribus | 6-NL | Depth-limited search, MCCFR | Beat top professionals | 25 million core-hours, 128 GB RAM |
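Most of the programs in Table 1 from Hyperborean onward rest on counterfactual regret minimization (CFR). The per-information-set update at the heart of CFR is plain regret matching: accumulate, for each action, the regret of not having played it, then play actions in proportion to their positive cumulative regret, and report the time-averaged strategy. As a minimal illustrative sketch (not code from any of the surveyed systems; the function names and the rock-paper-scissors stand-in game are our own assumptions), the following Python runs sampled regret matching in self-play; the average strategy converges toward the game's uniform Nash equilibrium.

```python
import random

def get_strategy(regrets):
    """Regret matching: play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n

def train_rps(iterations=50000, seed=0):
    """Self-play regret matching on rock-paper-scissors.

    Returns the row player's time-averaged strategy, which converges to the
    uniform Nash equilibrium (1/3, 1/3, 1/3) as the number of iterations grows.
    """
    # Row player's payoff matrix; the column player's payoff is its negation.
    payoff = [[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]]
    rng = random.Random(seed)
    regrets = [[0.0] * 3 for _ in range(2)]
    strategy_sum = [[0.0] * 3 for _ in range(2)]

    def sample(probs):
        r, cum = rng.random(), 0.0
        for i, p in enumerate(probs):
            cum += p
            if r < cum:
                return i
        return len(probs) - 1

    for _ in range(iterations):
        strategies = [get_strategy(regrets[0]), get_strategy(regrets[1])]
        for player in range(2):
            for a in range(3):
                strategy_sum[player][a] += strategies[player][a]
        a0, a1 = sample(strategies[0]), sample(strategies[1])
        # Regret of each alternative action against the opponent's actual action.
        for a in range(3):
            regrets[0][a] += payoff[a][a1] - payoff[a0][a1]
            regrets[1][a] += payoff[a0][a1] - payoff[a0][a]
    total = sum(strategy_sum[0])
    return [s / total for s in strategy_sum[0]]
```

The same update, applied at every information set of the game tree and weighted by counterfactual reach probabilities, gives CFR; MCCFR in Table 1 replaces the full tree walk with sampling, which is what made the abstracted no-limit games tractable.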