[1] |
CRAINIC T G , LAPORTE G . Planning models for freight transportation[J]. European Journal of Operational Research, 1997,97(3): 409-438.
|
[2] |
EPSTEIN R , NEELY A , WEINTRAUB A ,et al. A strategic empty container logistics optimization in a major shipping company[J]. Interfaces, 2012,42(1): 5-16.
|
[3] |
LI J G , LEUNG S C H , WU Y ,et al. Allocation of empty containers between multi-ports[J]. European Journal of Operational Research, 2007,182(1): 400-412.
|
[4] |
POWELL W B . Toward a unified modeling framework for real-time logistics control[J]. Military Operations Research, 1996,1(4): 69-79.
|
[5] |
LEE D H , WANG H , CHEU R L ,et al. Taxi dispatch system based on current demands and real-time traffic conditions[J]. Transportation Research Record:Journal of the Transportation Research Board, 2004,1882(1): 193-200.
|
[6] |
ZHANG L Y , HU T , MIN Y ,et al. A taxi order dispatch model based on combinatorial optimization[C]// Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press, 2017: 2151-2159.
|
[7] |
DAVIS T . Effective supply chain management[J]. MIT Sloan Management Review, 1993,34(4): 35-35.
|
[8] |
POIRIER C C , REITER S E . Supply chain optimization:building the strongest total business network[M]. San Francisco: Berrett-Koehler Publishers, 1996.
|
[9] |
ZHOU Z Y , CHENG S W , HUA B . Supply chain optimization of continuous process industries with sustainability considerations[J]. Computers & Chemical Engineering, 2000,24(2-7): 1151-1158.
|
[10] |
DE LA VEGA W F , LUEKER G S . Bin packing can be solved within 1 + ε in linear time[J]. Combinatorica, 1981,1(4): 349-355.
|
[11] |
MARTELLO S , PISINGER D , VIGO D . The three-dimensional Bin packing problem[J]. Operations Research, 2000,48(2): 256-267.
|
[12] |
SILVER D , SCHRITTWIESER J , SIMONYAN K ,et al. Mastering the game of Go without human knowledge[J]. Nature, 2017,550(7676): 354-359.
|
[13] |
MNIH V , KAVUKCUOGLU K , SILVER D ,et al. Human-level control through deep reinforcement learning[J]. Nature, 2015,518(7540): 529-533.
|
[14] |
LIU Z Y , MU C X , SUN C Y . An overview on algorithms and applications of deep reinforcement learning[J]. Chinese Journal of Intelligent Science and Technology, 2020,2(4): 314-326.
|
[15] |
SUTTON R S , BARTO A G . Reinforcement learning:an introduction[M]. Cambridge: MIT Press, 1998.
|
[16] |
WILLIAMS R J . Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992,8(3): 229-256.
|
[17] |
WATKINS C J C H , DAYAN P . Q-learning[J]. Machine Learning, 1992,8(3): 279-292.
|
[18] |
SCHULMAN J , LEVINE S , MORITZ P ,et al. Trust region policy optimization[C]// Proceedings of the 32nd International Conference on Machine Learning.[S.l.:s.n.], 2015: 1889-1897.
|
[19] |
HEESS N , TB D , SRIRAM S ,et al. Emergence of locomotion behaviours in rich environments[J]. arXiv preprint,2017,arXiv:1707.02286.
|
[20] |
SCHULMAN J , WOLSKI F , DHARIWAL P ,et al. Proximal policy optimization algorithms[J]. arXiv preprint,2017,arXiv:1707.06347.
|
[21] |
MNIH V , BADIA A P , MIRZA M ,et al. Asynchronous methods for deep reinforcement learning[C]// Proceedings of the 33rd International Conference on Machine Learning.[S.l.:s.n.], 2016: 1928-1937.
|
[22] |
LONG Y , LEE L H , CHEW E P . The sample average approximation method for empty container repositioning with uncertainties[J]. European Journal of Operational Research, 2012,222(1): 65-75.
|
[23] |
SONG D P , DONG J X . Empty container repositioning[M]// Handbook of ocean container transport logistics.[S.l.:s.n.], 2015: 163-208.
|
[24] |
LI X H , ZHANG J , BIAN J ,et al. A cooperative multi-agent reinforcement learning framework for resource balancing in complex logistics network[J]. arXiv preprint,2019,arXiv:1903.00714.
|
[25] |
JIANG J C , DUN C , HUANG T J ,et al. Graph convolutional reinforcement learning[J]. arXiv preprint,2018,arXiv:1810.09202.
|
[26] |
SHI W L , WEI X R , ZHANG J ,et al. Cooperative policy learning with pretrained heterogeneous observation representations[J]. arXiv preprint,2020,arXiv:2012.13099.
|
[27] |
CONTARDO C , MORENCY C , ROUSSEAU L M . Balancing a dynamic public bikesharing system[M]. Montreal: CIRRELT, 2012.
|
[28] |
ERDOĞAN G , BATTARRA M , CALVO R W . An exact algorithm for the static rebalancing problem arising in bicycle sharing systems[J]. European Journal of Operational Research, 2015,245(3): 667-679.
|
[29] |
GHOSH S , TRICK M , VARAKANTHAM P . Robust repositioning to counter unpredictable demand in bike sharing systems[C]// Proceedings of the 25th International Joint Conference on Artificial Intelligence. Palo Alto:AAAI Press, 2016: 3096-3102.
|
[30] |
LIU J M , SUN L L , CHEN W W ,et al. Rebalancing bike sharing systems:a multi-source data smart optimization[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York:ACM Press, 2016: 1005-1014.
|
[31] |
LI Y X , ZHENG Y , YANG Q . Dynamic bike reposition:a spatio-temporal reinforcement learning approach[C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York:ACM Press, 2018: 1724-1733.
|
[32] |
RAINER-HARBACH M , PAPAZEK P , HU B ,et al. Balancing bicycle sharing systems:a variable neighborhood search approach[C]// Proceedings of the 2013 European Conference on Evolutionary Computation in Combinatorial Optimization. Heidelberg:Springer, 2013: 121-132.
|
[33] |
SCHUIJBROEK J , HAMPSHIRE R C , VAN HOEVE W J . Inventory rebalancing and vehicle routing in bike sharing systems[J]. European Journal of Operational Research, 2017,257(3): 992-1004.
|
[34] |
CHEMLA D , MEUNIER F , PRADEAU T ,et al. Self-service bike sharing systems:simulation,repositioning,pricing[Z]. 2013.
|
[35] |
FRICKER C , GAST N . Incentives and redistribution in homogeneous bike-sharing systems with stations of finite capacity[J]. EURO Journal on Transportation and Logistics, 2016,5(3): 261-291.
|
[36] |
PAN L , CAI Q P , FANG Z X ,et al. A deep reinforcement learning framework for rebalancing dockless bike sharing systems[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Palo Alto:AAAI Press, 2019: 1393-1400.
|
[37] |
SINGLA A , SANTONI M , BARTOK G ,et al. Incentivizing users for balancing bike sharing systems[C]// Proceedings of the 29th AAAI Conference on Artificial Intelligence. Palo Alto:AAAI Press, 2015: 723-729.
|
[38] |
WASERHOLE A , JOST V . Pricing in vehicle sharing systems:optimization in queuing networks with product forms[J]. EURO Journal on Transportation and Logistics, 2016,5(3): 293-320.
|
[39] |
GHOSH S , VARAKANTHAM P , ADULYASAK Y ,et al. Dynamic repositioning to reduce lost demand in bike sharing systems[J]. Journal of Artificial Intelligence Research, 2017,58: 387-430.
|
[40] |
LOWALEKAR M , VARAKANTHAM P , GHOSH S ,et al. Online repositioning in bike sharing systems[C]// Proceedings of the 27th International Conference on Automated Planning and Scheduling.[S.l.:s.n.], 2017: 200-208.
|
[41] |
LILLICRAP T P , HUNT J J , PRITZEL A ,et al. Continuous control with deep reinforcement learning[J]. arXiv preprint,2015,arXiv:1509.02971.
|
[42] |
CHUNG L C . GPS taxi dispatch system based on A* shortest path algorithm[Z]. 2005.
|
[43] |
LIAO Z . Taxi dispatching via global positioning systems[J]. IEEE Transactions on Engineering Management, 2001,48(3): 342-347.
|
[44] |
ALSHAMSI A , ABDALLAH S , RAHWAN I . Multiagent self-organization for a taxi dispatch system[C]// Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems.[S.l.:s.n.], 2009: 21-28.
|
[45] |
LIN K X , ZHAO R Y , XU Z ,et al. Efficient large-scale fleet management via multiagent deep reinforcement learning[C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York:ACM Press, 2018: 1774-1783.
|
[46] |
LI M , QIN Z W , JIAO Y ,et al. Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning[C]// Proceedings of the 2019 World Wide Web Conference. New York:ACM Press, 2019: 983-994.
|
[47] |
YANG Y D , LUO R , LI M ,et al. Mean field multi-agent reinforcement learning[C]// Proceedings of the 35th International Conference on Machine Learning.[S.l.:s.n.], 2018: 5571-5580.
|
[48] |
ZHOU M , JIN J R , ZHANG W N ,et al. Multi-agent reinforcement learning for order-dispatching via order-vehicle distribution matching[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. New York:ACM Press, 2019: 2645-2653.
|
[49] |
GIJSBRECHTS J , BOUTE R N , MIEGHEM J A ,et al. Can deep reinforcement learning improve inventory management? Performance and implementation of dual sourcing-mode problems[J]. SSRN Electronic Journal, 2018.
|
[50] |
WU D , CHEN C , YANG X ,et al. A multiagent reinforcement learning method for impression allocation in online display advertising[J]. arXiv preprint,2018,arXiv:1809.03152.
|
[51] |
YAKOVLEVA D , POPOV A , FILCHENKOV A . Real-time bidding with soft actor-critic reinforcement learning in display advertising[C]// Proceedings of 2019 25th Conference of Open Innovations Association. Piscataway:IEEE Press, 2019: 373-382.
|
[52] |
ZHAO X Y , GU C S , ZHANG H ,et al. DEAR:deep reinforcement learning for online advertising impression in recommender systems[J]. arXiv preprint,2019,arXiv:1909.03602.
|
[53] |
ZHAO X Y , ZHENG X D , YANG X W ,et al. Jointly learning to recommend and advertise[C]// Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York:ACM Press, 2020: 3319-3327.
|
[54] |
KEMMER L , KLEIST H , ROCHEBOUËT D ,et al. Reinforcement learning for supply chain optimization[C]// Proceedings of 2018 European Workshop on Reinforcement Learning.[S.l.:s.n.], 2018.
|
[55] |
PENG Z D , ZHANG Y , FENG Y P ,et al. Deep reinforcement learning approach for capacitated supply chain optimization under demand uncertainty[C]// Proceedings of 2019 Chinese Automation Congress. Piscataway:IEEE Press, 2019: 3512-3517.
|
[56] |
ALVES J C , MATEUS G R . Deep reinforcement learning and optimization approach for multi-echelon supply chain with uncertain demands[C]// Proceedings of 2020 International Conference on Computational Logistics. Heidelberg:Springer, 2020: 584-599.
|
[57] |
JOHNSON D S , DEMERS A , ULLMAN J D ,et al. Worst-case performance bounds for simple one-dimensional packing algorithms[J]. SIAM Journal on Computing, 1974,3(4): 299-325.
|
[58] |
HU H Y , ZHANG X D , YAN X W ,et al. Solving a new 3D Bin packing problem with deep reinforcement learning method[J]. arXiv preprint,2017,arXiv:1708.05930.
|
[59] |
SOLOZABAL R , CEBERIO J , TAKÁČ M . Constrained combinatorial optimization with reinforcement learning[J]. arXiv preprint,2016,arXiv:1611.09940.
|
[60] |
HU H Y , DUAN L , ZHANG X D ,et al. A multi-task selected learning approach for solving new type 3D Bin packing problem[J]. arXiv preprint,2018,arXiv:1804.06896.
|
[61] |
LATERRE A , FU Y G , JABRI M K ,et al. Ranked reward:enabling self-play reinforcement learning for combinatorial optimization[J]. arXiv preprint,2018,arXiv:1807.01672.
|
[62] |
LI D D , REN C W , GU Z Q ,et al. Solving packing problems by conditional query learning[Z]. 2019.
|
[63] |
CAI Q P , HANG W , MIRHOSEINI A ,et al. Reinforcement learning driven heuristic optimization[J]. arXiv preprint,2019,arXiv:1906.06639.
|
[64] |
KUNDU O , DUTTA S , KUMAR S . Deeppack:a vision-based 2D online Bin packing algorithm with deep reinforcement learning[C]// Proceedings of 2019 28th IEEE International Conference on Robot and Human Interactive Communication. Piscataway:IEEE Press, 2019: 1-7.
|
[65] |
VERMA R , SINGHAL A , KHADILKAR H ,et al. A generalized reinforcement learning algorithm for online 3D Bin-packing[J]. arXiv preprint,2020,arXiv:2007.00463.
|
[66] |
ZHAO H , SHE Q J , ZHU C Y ,et al. Online 3D Bin packing with constrained deep reinforcement learning[J]. arXiv preprint,2020,arXiv:2006.14978.
|