Big Data Research, 2021, Vol. 7, Issue (5): 131-149. doi: 10.11959/j.issn.2096-0271.2021053
• COLUMN: DATA-DRIVEN OPTIMIZATION •
Jinyu WANG, Xinran WEI, Wenlei SHI, Jia ZHANG
Online: 2021-09-15
Published: 2021-09-01
Jinyu WANG, Xinran WEI, Wenlei SHI, Jia ZHANG. Applications of reinforcement learning in the field of resource optimization[J]. Big Data Research, 2021, 7(5): 131-149.
[1] | CRAINIC T G , LAPORTE G . Planning models for freight transportation[J]. European Journal of Operational Research, 1997,97(3): 409-438. |
[2] | EPSTEIN R , NEELY A , WEINTRAUB A ,et al. A strategic empty container logistics optimization in a major shipping company[J]. Interfaces, 2012,42(1): 5-16. |
[3] | LI J G , LEUNG S C H , WU Y ,et al. Allocation of empty containers between multi-ports[J]. European Journal of Operational Research, 2007,182(1): 400-412. |
[4] | POWELL W B . Toward a unified modeling framework for real-time logistics control[J]. Military Operations Research, 1996,1(4): 69-79. |
[5] | LEE D H , WANG H , CHEU R L ,et al. Taxi dispatch system based on current demands and real-time traffic conditions[J]. Transportation Research Record:Journal of the Transportation Research Board, 2004,1882(1): 193-200. |
[6] | ZHANG L Y , HU T , MIN Y ,et al. A taxi order dispatch model based on combinatorial optimization[C]// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press, 2017: 2151-2159. |
[7] | DAVIS T . Effective supply chain management[J]. MIT Sloan Management Review, 1993,34(4): 35-35. |
[8] | POIRIER C C , REITER S E . Supply chain optimization:building the strongest total business network[M]. San Francisco: Berrett-Koehler Publishers, 1996. |
[9] | ZHOU Z Y , CHENG S W , HUA B . Supply chain optimization of continuous process industries with sustainability considerations[J]. Computers & Chemical Engineering, 2000,24(2-7): 1151-1158. |
[10] | DE LA VEGA W F , LUEKER G S . Bin packing can be solved within 1 + ε in linear time[J]. Combinatorica, 1981,1(4): 349-355. |
[11] | MARTELLO S , PISINGER D , VIGO D . The three-dimensional bin packing problem[J]. Operations Research, 2000,48(2): 256-267. |
[12] | SILVER D , SCHRITTWIESER J , SIMONYAN K ,et al. Mastering the game of Go without human knowledge[J]. Nature, 2017,550(7676): 354-359. |
[13] | MNIH V , KAVUKCUOGLU K , SILVER D ,et al. Human-level control through deep reinforcement learning[J]. Nature, 2015,518(7540): 529-533. |
[14] | LIU Z Y , MU C X , SUN C Y . An overview on algorithms and applications of deep reinforcement learning[J]. Chinese Journal of Intelligent Science and Technology, 2020,2(4): 314-326. |
[15] | SUTTON R S , BARTO A G . Reinforcement learning:an introduction[M]. Cambridge: MIT Press, 1998. |
[16] | WILLIAMS R J . Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992,8(3): 229-256. |
[17] | WATKINS C J C H , DAYAN P . Q-learning[J]. Machine Learning, 1992,8(3): 279-292. |
[18] | SCHULMAN J , LEVINE S , MORITZ P ,et al. Trust region policy optimization[C]// Proceedings of the 31st International Conference on Machine Learning.[S.l.:s.n.], 2015: 1889-1897. |
[19] | HEESS N , TB D , SRIRAM S ,et al. Emergence of locomotion behaviours in rich environments[J]. arXiv preprint,2017,arXiv:1707.02286. |
[20] | SCHULMAN J , WOLSKI F , DHARIWAL P ,et al. Proximal policy optimization algorithms[J]. arXiv preprint,2017,arXiv:1707.06347. |
[21] | MNIH V , BADIA A P , MIRZA M ,et al. Asynchronous methods for deep reinforcement learning[C]// Proceedings of the 32nd International Conference on Machine Learning.[S.l.:s.n.], 2016: 1928-1937. |
[22] | LONG Y , LEE L H , CHEW E P . The sample average approximation method for empty container repositioning with uncertainties[J]. European Journal of Operational Research, 2012,222(1): 65-75. |
[23] | SONG D P , DONG J X . Empty container repositioning[M]// Handbook of ocean container transport logistics.[S.l.:s.n.], 2015: 163-208. |
[24] | LI X H , ZHANG J , BIAN J ,et al. A cooperative multi-agent reinforcement learning framework for resource balancing in complex logistics network[J]. arXiv preprint,2019,arXiv:1903.00714. |
[25] | JIANG J C , DUN C , HUANG T J ,et al. Graph convolutional reinforcement learning[J]. arXiv preprint,2018,arXiv:1810.09202. |
[26] | SHI W L , WEI X R , ZHANG J ,et al. Cooperative policy learning with pretrained heterogeneous observation representations[J]. arXiv preprint,2020,arXiv:2012.13099. |
[27] | CONTARDO C , MORENCY C , ROUSSEAU L M . Balancing a dynamic public bikesharing system[M]. Montreal: CIRRELT, 2012. |
[28] | ERDOĞAN G , BATTARRA M , CALVO R W . An exact algorithm for the static rebalancing problem arising in bicycle sharing systems[J]. European Journal of Operational Research, 2015,245(3): 667-679. |
[29] | GHOSH S , TRICK M , VARAKANTHAM P . Robust repositioning to counter unpredictable demand in bike sharing systems[C]// Proceedings of the 25th International Joint Conference on Artificial Intelligence. Palo Alto:AAAI Press, 2016: 3096-3102. |
[30] | LIU J M , SUN L L , CHEN W W ,et al. Rebalancing bike sharing systems:a multi-source data smart optimization[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press, 2016: 1005-1014. |
[31] | LI Y X , ZHENG Y , YANG Q . Dynamic bike reposition:a spatio-temporal reinforcement learning approach[C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York:ACM Press, 2018: 1724-1733. |
[32] | RAINER-HARBACH M , PAPAZEK P , HU B ,et al. Balancing bicycle sharing systems:a variable neighborhood search approach[C]// Proceedings of the 2013 European Conference on Evolutionary Computation in Combinatorial Optimization. Heidelberg:Springer, 2013: 121-132. |
[33] | SCHUIJBROEK J , HAMPSHIRE R C , VAN HOEVE W J . Inventory rebalancing and vehicle routing in bike sharing systems[J]. European Journal of Operational Research, 2017,257(3): 992-1004. |
[34] | CHEMLA D , MEUNIER F , PRADEAU T ,et al. Self-service bike sharing systems:simulation,repositioning,pricing[Z]. 2013. |
[35] | FRICKER C , GAST N . Incentives and redistribution in homogeneous bike-sharing systems with stations of finite capacity[J]. EURO Journal on Transportation and Logistics, 2016,5(3): 261-291. |
[36] | PAN L , CAI Q P , FANG Z X ,et al. A deep reinforcement learning framework for rebalancing dockless bike sharing systems[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto:AAAI Press, 2019: 1393-1400. |
[37] | SINGLA A , SANTONI M , BARTOK G ,et al. Incentivizing users for balancing bike sharing systems[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto:AAAI Press, 2015: 723-729. |
[38] | WASERHOLE A , JOST V . Pricing in vehicle sharing systems:optimization in queuing networks with product forms[J]. EURO Journal on Transportation and Logistics, 2016,5(3): 293-320. |
[39] | GHOSH S , VARAKANTHAM P , ADULYASAK Y ,et al. Dynamic repositioning to reduce lost demand in bike sharing systems[J]. Journal of Artificial Intelligence Research, 2017,58: 387-430. |
[40] | LOWALEKAR M , VARAKANTHAM P , GHOSH S ,et al. Online repositioning in bike sharing systems[C]// Proceedings of the 27th International Conference on Automated Planning and Scheduling.[S.l.:s.n.], 2017: 200-208. |
[41] | LILLICRAP T P , HUNT J J , PRITZEL A ,et al. Continuous control with deep reinforcement learning[J]. arXiv preprint,2015,arXiv:1509.02971. |
[42] | CHUNG L C . GPS taxi dispatch system based on A* shortest path algorithm[Z]. 2005. |
[43] | LIAO Z . Taxi dispatching via global positioning systems[J]. IEEE Transactions on Engineering Management, 2001,48(3): 342-347. |
[44] | ALSHAMSI A , ABDALLAH S , RAHWAN I . Multiagent self-organization for a taxi dispatch system[C]// Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems.[S.l.:s.n.], 2009: 21-28. |
[45] | LIN K X , ZHAO R Y , XU Z ,et al. Efficient large-scale fleet management via multiagent deep reinforcement learning[C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York:ACM Press, 2018: 1774-1783. |
[46] | LI M , QIN Z W , JIAO Y ,et al. Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning[C]// Proceedings of the 2019 World Wide Web Conference. New York:ACM Press, 2019: 983-994. |
[47] | YANG Y D , LUO R , LI M ,et al. Mean field multi-agent reinforcement learning[C]// Proceedings of the 34th International Conference on Machine Learning.[S.l.:s.n.], 2018: 5571-5580. |
[48] | ZHOU M , JIN J R , ZHANG W N ,et al. Multi-agent reinforcement learning for order-dispatching via order-vehicle distribution matching[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. New York:ACM Press, 2019: 2645-2653. |
[49] | GIJSBRECHTS J , BOUTE R N , MIEGHEM J A ,et al. Can deep reinforcement learning improve inventory management? Performance and implementation of dual sourcing-mode problems[J]. SSRN Electronic Journal, 2018. |
[50] | WU D , CHEN C , YANG X ,et al. A multiagent reinforcement learning method for impression allocation in online display advertising[J]. arXiv preprint,2018,arXiv:1809.03152. |
[51] | YAKOVLEVA D , POPOV A , FILCHENKOV A . Real-time bidding with soft actor-critic reinforcement learning in display advertising[C]// Proceedings of 2019 25th Conference of Open Innovations Association. Piscataway:IEEE Press, 2019: 373-382. |
[52] | ZHAO X Y , GU C S , ZHANG H ,et al. DEAR:deep reinforcement learning for online advertising impression in recommender systems[J]. arXiv preprint,2019,arXiv:1909.03602. |
[53] | ZHAO X Y , ZHENG X D , YANG X W ,et al. Jointly learning to recommend and advertise[C]// Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York:ACM Press, 2020: 3319-3327. |
[54] | KEMMER L , KLEIST H , ROCHEBOUËT D ,et al. Reinforcement learning for supply chain optimization[C]// Proceedings of 2018 European Workshop on Reinforcement Learning.[S.l.:s.n.], 2018. |
[55] | PENG Z D , ZHANG Y , FENG Y P ,et al. Deep reinforcement learning approach for capacitated supply chain optimization under demand uncertainty[C]// Proceedings of 2019 Chinese Automation Congress. Piscataway:IEEE Press, 2019: 3512-3517. |
[56] | ALVES J C , MATEUS G R . Deep reinforcement learning and optimization approach for multi-echelon supply chain with uncertain demands[C]// Proceedings of 2020 International Conference on Computational Logistics. Heidelberg:Springer, 2020: 584-599. |
[57] | JOHNSON D S , DEMERS A , ULLMAN J D ,et al. Worst-case performance bounds for simple one-dimensional packing algorithms[J]. SIAM Journal on Computing, 1974,3(4): 299-325. |
[58] | HU H Y , ZHANG X D , YAN X W ,et al. Solving a new 3D bin packing problem with deep reinforcement learning method[J]. arXiv preprint,2017,arXiv:1708.05930. |
[59] | SOLOZABAL R , CEBERIO J , TAKÁČ M . Constrained combinatorial optimization with reinforcement learning[J]. arXiv preprint,2016,arXiv:1611.09940. |
[60] | HU H Y , DUAN L , ZHANG X D ,et al. A multi-task selected learning approach for solving new type 3D bin packing problem[J]. arXiv preprint,2018,arXiv:1804.06896. |
[61] | LATERRE A , FU Y G , JABRI M K ,et al. Ranked reward:enabling self-play reinforcement learning for combinatorial optimization[J]. arXiv preprint,2018,arXiv:1807.01672. |
[62] | LI D D , REN C W , GU Z Q ,et al. Solving packing problems by conditional query learning[Z]. 2019. |
[63] | CAI Q P , HANG W , MIRHOSEINI A ,et al. Reinforcement learning driven heuristic optimization[J]. arXiv preprint,2019,arXiv:1906.06639. |
[64] | KUNDU O , DUTTA S , KUMAR S . DeepPack:a vision-based 2D online bin packing algorithm with deep reinforcement learning[C]// Proceedings of 2019 28th IEEE International Conference on Robot and Human Interactive Communication. Piscataway:IEEE Press, 2019: 1-7. |
[65] | VERMA R , SINGHAL A , KHADILKAR H ,et al. A generalized reinforcement learning algorithm for online 3D bin-packing[J]. arXiv preprint,2020,arXiv:2007.00463. |
[66] | ZHAO H , SHE Q J , ZHU C Y ,et al. Online 3D bin packing with constrained deep reinforcement learning[J]. arXiv preprint,2020,arXiv:2006.14978. |