Chinese Journal of Intelligent Science and Technology ›› 2024, Vol. 6 ›› Issue (2): 115-133. doi: 10.11959/j.issn.2096-6652.202424
• Review and Perspective •
Jun HUANG1,2, Fei LIN1, Jing YANG3,4,5, Xingxia WANG3,4,5, Qinghua NI1,2, Yutong WANG3,4, Yonglin TIAN3,4, Juanjuan LI3,4, Fei-Yue WANG1,3,5
Received: 2024-04-07
Revised: 2024-05-26
Online: 2024-06-15
Published: 2024-07-31
Contact: Fei-Yue WANG, E-mail: feiyue.wang@ia.ac.cn
Abstract:
Large language models (LLMs) and vision-language models (VLMs) have shown great potential across a wide range of applications and have become a research hotspot. However, problems such as hallucination, knowledge transfer, and alignment with human intent still limit the performance of large models. This paper first examines the basic principles of prompt engineering and alignment techniques, and proposes a guidance concept built on prompt optimization, expert feedback mechanisms, and real-time adjustment mechanisms to improve the performance of LLMs in cross-domain applications. It then analyzes the core techniques of prompt engineering in depth, such as the principles by which multi-step reasoning handles complex tasks. Next, it reviews the current state of prompt engineering in practical applications across domains. Finally, it summarizes the challenges facing prompt engineering and outlines its future development directions. Advances in both the theory and application of prompt engineering provide a comprehensive solution for improving the performance of large models in real-world applications.
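The multi-step reasoning (chain-of-thought) prompting discussed above can be illustrated with a minimal sketch: worked demonstrations, each pairing a question with a step-by-step rationale, are concatenated before the target question so the model is nudged to emit intermediate reasoning. The demonstration text and helper name below are hypothetical, shown only to make the prompt-assembly pattern concrete.

```python
def build_cot_prompt(demonstrations, question):
    """Assemble a few-shot chain-of-thought prompt.

    Each demonstration pairs a question with a worked rationale ending
    in the final answer; concatenating them before the target question
    encourages the model to produce intermediate reasoning steps.
    """
    parts = []
    for demo_question, rationale in demonstrations:
        parts.append(f"Q: {demo_question}\nA: {rationale}\n")
    # The trailing "A:" invites the model to continue with its own reasoning.
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)


# Hypothetical demonstration pair; in practice these would be curated
# worked examples from the target domain.
demos = [(
    "Roger has 5 balls and buys 2 cans of 3 balls each. How many balls now?",
    "Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. "
    "The answer is 11.",
)]
prompt = build_cot_prompt(
    demos, "A farm has 3 pens of 4 hens each. How many hens?"
)
```

The resulting string would then be sent to a model; the same assembly pattern extends to self-consistency decoding by sampling several completions and taking a majority vote over the extracted answers.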
Jun HUANG, Fei LIN, Jing YANG, et al. From prompt engineering to generative artificial intelligence for large models: the state of the art and perspective[J]. Chinese Journal of Intelligent Science and Technology, 2024, 6(2): 115-133.
1 | NADKARNI P M, OHNO-MACHADO L, CHAPMAN W W. Natural language processing: an introduction[J]. Journal of the American Medical Informatics Association, 2011, 18(5): 544-551. |
2 | GOLDBERG Y. A primer on neural network models for natural language processing[J]. Journal of Artificial Intelligence Research, 2016, 57: 345-420. |
3 | ZHAI C X. Statistical language models for information retrieval a critical review[J]. Foundations and Trends in Information Retrieval, 2007, 2(3): 137-213. |
4 | KIROS R, SALAKHUTDINOV R, ZEMEL R. Multimodal neural language models[C]//Proceedings of the 31st International Conference on International Conference on Machine Learning. New York: ACM, 2014: 595-603. |
5 | KAPLAN J, MCCANDLISH S, HENIGHAN T, et al. Scaling laws forneural language models[EB]. arXiv preprint, 2020, arXiv: 2001.08361.. |
6 | CHURCH K W. Word2vec[J]. Natural Language Engineering, 2017, 23(1): 155-162. |
7 | PENNINGTON J, SOCHER R, MANNING C. Glove: global vectors for word representation[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing(EMNLP). Stroudsburg: Association for Computational Linguistics, 2014: 1532-1543. |
8 | LIU X, ZHENG Y N, DU Z X, et al. GPT understands, too[EB]. arXivpreprint, 2023, arXiv: 2103.10385.. |
9 | DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deepbidirectional transformers for language understanding[EB]. arXiv pre‐print, 2018, arXiv: 1810.04805. |
10 | BROWN T B, MANN B, RYDER N, et al. Language models are fewshot learners[EB]. arXiv preprint, 2020, arXiv: 2005.14165.. |
11 | TOUVRON H, MARTIN L, STONE K, et al. Llama 2: open foundation and fine-tuned chat models[EB]. arXiv preprint, 2023, arXiv: 2307.09288. |
12 | WEI J, TAY Y, BOMMASANI R, et al. Emergent abilities of large language models[EB]. arXiv preprint, 2022, arXiv: 2206.07682. |
13 | SCHAEFFER R, MIRANDA B, KOYEJO O. Are emergent abilities of large language models a mirage? [C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2023: 55565- 55581. |
14 | YU H K, LIU X Y, TIAN Y L, et al. Sora-based parallel vision for smart sensing of intelligent vehicles: from foundation models to foundation intelligence[J]. IEEE Transactions on Intelligent Vehicles, 2024, 9(2): 3123-3126. |
15 | ZHOU K Y, YANG J K, LOY C C, et al. Conditional prompt learning for vision-language models[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022: 16795-16804. |
16 | ZHOU K Y, YANG J K, LOY C C, et al. Learning to prompt for vision-language models[J]. International Journal of Computer Vision, 2022, 130(9): 2337-2348. |
17 | ZHANG J Y, HUANG J X, JIN S, et al. Vision-language models for vision tasks: a survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(8): 5625-5644. |
18 | ALAYRAC J B, DONAHUE J, LUC P, et al. Flamingo: a visual language model for few-shot learning[C]//Proceedings of 2022 Advances In Neural Information Processing Systems. New York: Curran Associates, Inc., 2022: 23716-23736. |
19 | SHTEDRITSKI A, RUPPRECHT C, VEDALDI A. What does CLIP know about a red circle? Visual prompt engineering for VLMs[C]//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2023: 11987-11997. |
20 | RAMESH A, PAVLOV M, GOH G, et al. Zero-shot text-to-image generation[C]//Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021: 8821-8831. |
21 | RAMESH A, DHARIWAL P, NICHOL A, et al. Hierarchical text-conditional image generation with CLIP latents[EB]. arXiv preprint, 2022, arXiv: 2204.06125. |
22 | JI Z W, LEE N, FRIESKE R, et al. Survey of hallucination in natural language generation[J]. ACM Computing Surveys, 2023, 55(12): 1-38. |
23 | OPENAI, ACHIAM J, ADLER S, et al. GPT-4 technical report[EB]. arXiv preprint, 2023, arXiv: 2303.08774. |
24 | LIU P F, YUAN W Z, FU J L, et al. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing[J]. ACM Computing Surveys, 2023, 55(9): 1-35. |
25 | SHEN T H, JIN R R, HUANG Y F, et al. Large language model alignment: a survey[EB]. arXiv preprint, 2023, arXiv: 2309.15025 |
26 | WANG Y F, ZHONG W J, LI L Y, et al. Aligning large language models with human: a survey[EB]. arXiv preprint, 2023, arXiv: 2307.12966. |
27 | WHITE J, FU Q C, HAYS S, et al. A prompt pattern catalog to enhance prompt engineering with ChatGPT[EB]. arXiv preprint, 2023, arXiv: 2302.11382. |
28 | ZHOU Y C, MURESANU A, HAN Z W, et al. Large language models are human-level prompt engineers[EB]. arXiv preprint, 2022, arXiv: 2211.01910. |
29 | GU J D, HAN Z, CHEN S, et al. A systematic survey of prompt engineering on vision-language foundation models[EB]. arXiv preprint, 2023, arXiv: 2307.12980. |
30 | SCHULHOFF S, ILIE M, BALEPUR N, et al. The prompt report: a systematic survey of prompting techniques[EB]. arXiv preprint, 2024, arXiv: 2406.06608. |
31 | DONG Q X, LI L, DAI D M, et al. A survey on In-context learning[EB]. arXiv preprint, 2022, arXiv: 2301.00234. |
32 | LESTER B, AL-RFOU R, CONSTANT N. The power of scale for parameter-efficient prompt tuning[EB]. arXiv preprint, 2021, arXiv: 2104.08691. |
33 | YU F Y, PAN K J. The effects of student question-generation with online prompts on learning[J]. Journal of Educational Technology & Society, 2014, 17(3): 267–279. |
34 | SUN H. Offline prompt evaluation and optimization with inverse reinforcement learning[EB]. arXiv preprint, 2023, arXiv: 2309.06553. |
35 | ZIEGLER D M, STIENNON N, WU J, et al. Fine-tuning language models from human preferences[EB]. arXiv preprint, 2019, arXiv: 1909.08593. |
36 | GIRAY L. Prompt engineering with ChatGPT: a guide for academic writers[J]. Annals of Biomedical Engineering, 2023, 51(12): 2629-2633. |
37 | ZHANG Y H, ZHOU K Y, LIU Z W. What makes good examples for visual in-context learning? [C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2023: 17773-17794. |
38 | FENG G H, ZHANG B H, GU Y T, et al. Towards revealing the mystery behind chain of thought: a theoretical perspective[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2023: 70757-70798. |
39 | TIAN Y L, LI X, ZHANG H, et al. VistaGPT: generative parallel transformers for vehicles with intelligent systems for transport automation[J]. IEEE Transactions on Intelligent Vehicles, 2023, 8(9): 4198-4207. |
40 | REYNOLDS L, MCDONELL K. Prompt programming for large language models: beyond the few-shot paradigm[C]//Proceedings of the Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. New York: ACM, 2021: 1-7. |
41 | SHIN T, RAZEGHI Y, LOGAN IV R L, et al. AutoPrompt: eliciting knowledge from language models with automatically generated prompts[EB]. arXiv preprint, 2020, arXiv: 2010.15980. |
42 | ZHANG Z S, ZHANG A, LI M, et al. Automatic chain of thought prompting in large language models[EB]. arXiv preprint, 2022, arXiv: 2210.03493. |
43 | PRASAD A, HASE P, ZHOU X, et al. GrIPS: gradient-free, edit-based instruction search for prompting large language models[EB]. arXiv preprint, 2022, arXiv: 2203.07281. |
44 | PRYZANT R, ITER D, LI J, et al. Automatic prompt optimization with "gradient descent" and beam search[EB]. arXiv preprint, 2023, arXiv: 2305.03495. |
45 | LI C, LIU X M, WANG Y C, et al. Dialogue for prompting: a policy-gradient-based discrete prompt generation for few-shot learning[EB]. arXiv preprint, 2023, arXiv: 2308.07272. |
46 | SHAH D, SCHWARTZ H A, HOVY D. Predictive biases in natural language processing models: a conceptual framework and overview[EB]. arXiv preprint, 2019, arXiv: 1912.11078. |
47 | GEHMAN S, GURURANGAN S, SAP M, et al. RealToxicityPrompts: evaluating neural toxic degeneration in language models[EB]. arXiv preprint, 2020, arXiv: 2009.11462. |
48 | DALE R. GPT-3: what's it good for?[J]. Natural Language Engineering, 2021, 27(1): 113-118. |
49 | AKYüREK E, BOLUKBASI T, LIU F, et al. Towards tracing factual knowledge in language models back to the training data[EB]. arXiv preprint, 2022, arXiv: 2205.11482. |
50 | POESIA G, POLOZOV O, LE V, et al. Synchromesh: reliable code generation from pre-trained language models[EB]. arXiv preprint, 2022, arXiv: 2201.11227. |
51 | LIU H T, LI C Y, WU Q Y, et al. Visual instruction tuning[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2023: 34892-34916. |
52 | LI J N, LI D X, SILVIO S, et al. Blip-2: bootstrapping language-image pre-training with frozen image encoders and large language models[C]//Proceedings of the 40th International Conference on Machine Learning. New York: PMLR, 2023: 19730-19742. |
53 | WIENER N. Some moral and technical consequences of automation: as machines learn they may develop unforeseen strategies at rates that baffle their programmers[J]. Science, 1960, 131(3410): 1355-1358. |
54 | RUSSELL S J, NORVIG P. Artificial intelligence: a modern approach[M]. New Jersey: Prentice Hall, 2016. |
55 | OUYANG L, WU JEFF, XU J, et al. Training language models to follow instructions with human feedback[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2022: 27730-27744. |
56 | LIU R B, JIA C Y, ZHANG G, et al. Second thoughts are best: learning to re-align with human values from text edits[C]//Proceedings of 2022 Advances In Neural Information Processing Systems. New York: Curran Associates, Inc., 2022: 181-196. |
57 | LIU H, SFERRAZZA C, ABBEEL P. Chain of hindsight aligns language models with feedback[EB]. arXiv preprint, 2023, arXiv: 2302.02676. |
58 | STIENNON N, OUYANG L, WU J, et al., Learning to summarize with human feedback[C]//Proceedings of 2020 Advances In Neural Information Processing Systems. New York: Curran Associates, Inc., 2020: 3008-3021. |
59 | SUN Z Q, SHEN Y K, ZHOU Q H, et al. Principle-driven self-alignment of language models from scratch with minimal human supervision[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York: Curran Associates, Inc., 2023: 2511-2565. |
60 | DU Y L, LI S, TORRALBA A, et al. Improving factuality and reasoning in language models through multiagent debate[EB]. arXiv preprint, 2023, arXiv: 2305.14325. |
61 | KHAN A, HUGHES J, VALENTINE D, et al. Debating with more persuasive LLMs leads to more truthful answers[EB]. arXiv preprint, 2024, arXiv: 2402.06782. |
62 | ELHAGE N, NANDA N, OLSSON C, et al. A mathematical framework for transformer circuits[J]. Transformer Circuits Thread, 2021, 1(1): 12. |
63 | VILONE G, LONGO L. Explainable artificial intelligence: a systematic review[EB]. arXiv preprint, 2020, arXiv: 2006.00093. |
64 | WALLACE E, FENG S, KANDPAL N, et al. Universal adversarial triggers for attacking and analyzing NLP[EB]. arXiv preprint, 2019, arXiv: 1908.07125. |
65 | AKHTAR N, MIAN A. Threat of adversarial attacks on deep learning in computer vision: a survey[J]. IEEE Access, 2018, 6: 14410-14430. |
66 | ZHU B H, JORDAN M, JIAO J T. Principled reinforcement learning with human feedback from pairwise or k-wise comparisons[C]//Proceedings of the 40th International Conference on Machine Learning. New York: PMLR, 2023: 43037-43067. |
67 | LIPTON Z C. The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery[J]. Queue, 2018, 16(3): 31-57. |
68 | CRITCH A, KRUEGER D. AI research considerations for human existential safety (ARCHES)[EB]. arXiv preprint, 2020, arXiv: 2006.04948. |
69 | CASPER S, DAVIES X, SHI C, et al. Open problems and fundamental limitations of reinforcement learning from human feedback[EB]. arXiv preprint, 2023, arXiv: 2307.15217. |
70 | ZOU A, WANG Z F, CARLINI N, et al. Universal and transferable adversarial attacks on aligned language models[EB]. arXiv preprint, 2023, arXiv: 2307.15043. |
71 | SUZGUN M, MELAS-KYRIAZI L, JURAFSKY D. Prompt-and-rerank: a method for zero-shot and few-shot arbitrary textual style transfer with small language models[EB]. arXiv preprint, 2022, arXiv: 2205.11503. |
72 | WANG Z M, PENG Z Y, QUE H R, et al. RoleLLM: benchmarking, eliciting, and enhancing role-playing abilities of large language models[EB]. arXiv preprint, 2023, arXiv: 2310.00746. |
73 | JIANG H, ZHANG X J, CAO X B, et al. PersonaLLM: investigating the ability of large language models to express personality traits[EB]. arXiv preprint, 2023, arXiv: 2305.02547. |
74 | LU A, ZHANG H X, ZHANG Y Z, et al. Bounding the capabilities of large language models in open text generation with prompt constraints[EB]. arXiv preprint, 2023, arXiv: 2302.09185. |
75 | LI C, WANG J D, ZHANG Y X, et al. Large language models understand and can be enhanced by emotional stimuli[EB]. arXiv preprint, 2023, arXiv: 2307.11760. |
76 | HUANG J T, LAM M H, LI E J, et al. Emotionally numb or empathetic? evaluating how LLMs feel using EmotionBench[EB]. arXiv preprint, 2023, arXiv: 2308.03656. |
77 | WILF A, LEE S S, LIANG P P, et al. Think twice: perspective-taking improves large language models' theory-of-mind capabilities[EB]. arXiv preprint, 2023, arXiv: 2311.10227. |
78 | WESTON J, SUKHBAATAR S. System 2 attention (is something you might need too)[EB]. arXiv preprint, 2023, arXiv: 2311.11829. |
79 | DENG Y H, ZHANG W T, CHEN Z X, et al. Rephrase and respond: let large language models ask better questions for themselves[EB]. arXiv preprint, 2023, arXiv: 2311.04205. |
80 | XU X H, TAO C Y, SHEN T, et al. Re-reading improves reasoning in large language models[EB]. arXiv preprint, 2023, arXiv: 2309.06275. |
81 | PRESS O, ZHANG M R, MIN S, et al. Measuring and narrowing the compositionality gap in language models[EB]. arXiv preprint, 2022, arXiv: 2210.03350. |
82 | ZHAO T, WALLACE E, FENG S, et al. Calibrate before use: improving few-shot performance of language models[C]//Proceedings of the 38th International Conference on Machine Learning. New York: PMLR, 2021: 12697-12706. |
83 | LEE D H, KADAKIA A, TAN K M, et al. Good examples make a faster learner: simple demonstration-based learning for low-resource NER[EB]. arXiv preprint, 2021, arXiv: 2110.08454. |
84 | EISENSTEIN J, ANDOR D, BOHNET B, et al. Honest students from untrusted teachers: learning an interpretable question-answering pipeline from a pretrained language model[EB]. arXiv preprint, 2022, arXiv: 2210.02498. |
85 | ZHANG H X, ZHANG Y Z, ZHANG R Y, et al. Robustness of demonstration-based learning under limited data scenario[EB]. arXiv preprint, 2022, arXiv: 2210.10693. |
86 | LI S Y, CHEN J S, SHEN Y L, et al. Explanations from large language models make small reasoners better[EB]. arXiv preprint, 2022, arXiv: 2210.06726. |
87 | DAI Z Y, ZHAO V Y, MA J, et al. Promptagator: few-shot dense retrieval from 8 examples[EB]. arXiv preprint, 2022, arXiv: 2209.11755. |
88 | YU W H, ITER D, WANG S H, et al. Generate rather than retrieve: large language models are strong context generators[EB]. arXiv preprint, 2022, arXiv: 2209.10063. |
89 | WEI J, WANG X Z, SCHUURMANS D, et al. Chain-of-thought prompting elicits reasoning in large language models[C]//Proceedings of the 36th International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2022: 24824-24837. |
90 | LAMPINEN A K, DASGUPTA I, CHAN S C Y, et al. Can language models learn from explanations in context?[EB]. arXiv preprint, 2022, arXiv: 2204.02329. |
91 | ZHOU D, SCH?RLI N, HOU L, et al. Least-to-most prompting enables complex reasoning in large language models[EB]. arXiv preprint, 2022, arXiv: 2205.10625. |
92 | LI F F, FERGUS R, PERONA P. One-shot learning of object categories[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(4): 594-611. |
93 | LIU J C, SHEN D H, ZHANG Y Z, et al. What makes good In-context examples for GPT-3? [EB]. arXiv preprint, 2021, arXiv: 2101.06804. |
94 | LU Y, BARTOLO M, MOORE A, et al. Fantastically ordered prompts and where to find them: overcoming few-shot prompt order sensitivity[EB]. arXiv preprint, 2021, arXiv: 2104.08786. |
95 | MIN S, LYU X X, HOLTZMAN A, et al. Rethinking the role of demonstrations: what makes in-context learning work?[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Stroudsburg: Association for Computational Linguistics, 2022: 11048-11064. |
96 | YOO K M, KIM J, KIM H J, et al. Ground-truth labels matter: a deeper look into input-label demonstrations[EB]. arXiv preprint, 2022, arXiv: 2205.12685. |
97 | JIANG Z B, XU F F, ARAKI J, et al. How can we know what language models know?[J]. Transactions of the Association for Computational Linguistics, 2020, 8: 423-438. |
98 | MIN S, LYU X X, HOLTZMAN A, et al. Rethinking the role of demonstrations: what makes In-context learning work? [EB]. arXiv preprint, 2022, arXiv: 2202.12837. |
99 | SU H J, KASAI J, WU C H, et al. Selective annotation makes language models better few-shot learners[EB]. arXiv preprint, 2022, arXiv: 2209.01975. |
100 | KIM H J, CHO H, KIM J, et al. Self-generated in-context learning: leveraging auto-regressive language models as a demonstration generator[EB]. arXiv preprint, 2022, arXiv: 2206.08082. |
101 | LI X N, QIU X P. Finding support examples for in-context learning[EB]. arXiv preprint, 2023, arXiv: 2302.13539. |
102 | LI X N, LV K, YAN H, et al. Unified demonstration retriever for in-context learning[EB]. arXiv preprint, 2023, arXiv: 2305.04320. |
103 | ZHANG Y M, FENG S, TAN C H. Active example selection for in-context learning[EB]. arXiv preprint, 2022, arXiv: 2211.04486. |
104 | CHOWDHERY A, NARANG S, DEVLIN J, et al. Palm: scaling language modeling with pathways[J]. Journal of Machine Learning Research, 2023, 24(240): 1-113. |
105 | COBBE K, KOSARAJU V, BAVARIAN M, et al. Training verifiers to solve math word problems[EB]. arXiv preprint, 2021, arXiv: 2110.14168. |
106 | WANG X Z, WEI J, SCHUURMANS D, et al. Self-consistency improves chain of thought reasoning in language models[EB]. arXiv preprint, 2022, arXiv: 2203.11171. |
107 | PATEL A, BHATTAMISHRA S, GOYAL N. Are NLP models really able to solve simple math word problems? [EB]. arXiv preprint, 2021, arXiv: 2103.07191. |
108 | LING W, YOGATAMA D, DYER C, et al. Program induction by rationale generation: learning to solve and explain algebraic word problems[EB]. arXiv preprint, 2017, arXiv: 1705.04146. |
109 | GEVA M, GUPTA A, BERANT J. Injecting numerical reasoning skills into language models[EB]. arXiv preprint, 2020, arXiv: 2004.04487. |
110 | YAO S Y, YU D, ZHAO J, et al. Tree of thoughts: deliberate problem solving with large language models[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2023: 11809-11822. |
111 | YE Q Y, AXMED M, PRYZANT R, et al. Prompt engineering a prompt engineer[EB]. arXiv preprint, 2023, arXiv: 2311.05661. |
112 | GAO Y F, XIONG Y, GAO X Y, et al. Retrieval-augmented generation for large language models: a survey[EB]. arXiv preprint, 2023, arXiv: 2312.10997. |
113 | CHEN W H, HU H X, SAHARIA C, et al. Re-imagen: retrieval-augmented text-to-image generator[EB]. arXiv preprint, 2022, arXiv: 2209.14491. |
114 | HUANG R J, HUANG J W, YANG D C, et al., Make-an-audio: text-to-audio generation with prompt-enhanced diffusion models[C]//Proceedings of the 40th International Conference on Machine Learning. New York: PMLR, , 2023: 13916-13932. |
115 | ZHU Y H, REN C Y, XIE S Y, et al. REALM: rag-driven enhancement of multimodal electronic health records analysis via large language models[EB]. arXiv preprint, 2024, arXiv: 2402.07016. |
116 | RYU C, LEE S, PANG S, et al. Retrieval-based evaluation for LLMs: a case study in Korean legal QA[C]//Proceedings of the Natural Legal Language Processing Workshop 2023. Stroudsburg: Association for Computational Linguistics, 2023: 132-137. |
117 | MA A, A L. A rag chatbot for precision medicine of multiple myeloma[EB]. medRxiv preprint, 2024, medRxiv: 24304293 . |
118 | DAI X Y, GUO C, TANG Y, et al. VistaRAG: toward safe and trustworthy autonomous driving through retrieval-augmented generation[J]. IEEE Transactions on Intelligent Vehicles, 2024, 9(4): 4579-4582. |
119 | LI X, LIU E L, SHEN T Y, et al. ChatGPT-based scenario engineer: a new framework on scenario generation for trajectory prediction[J]. IEEE Transactions on Intelligent Vehicles, 2024, 9(3): 4422-4431. |
120 | WANG X, HUANG J, TIAN Y L, et al. Parallel driving with big models and foundation intelligence in cyber-physical-social spaces[J]. Research, 2024, 7(3): 1-16. |
121 | WANG X, HUANG J, TIAN Y L, et al. AGI in metaverse for smart cities and societies: a cyber physical social approach[C]//Proceedings of the 2024 Australian & New Zealand Control Conference (ANZCC). Piscataway: IEEE Press, 2024: 61-66. |
122 | 黄峻, 田永林, 戴星原, 等. 基于深度学习的自动驾驶多模态轨迹预测方法:现状及展望[J]. 智能科学与技术学报, 2023, 5(2): 180-199. |
HUANG J, TIAN Y L, DAI X Y, et al. Deep learning-based multimodal trajectory prediction methods for autonomous driving: state of the art and perspectives[J]. Chinese Journal of Intelligent Science and Technology, 2023, 5(2): 180-199. | |
123 | PENG C, YANG X, YU Z H, et al. Clinical concept and relation extraction using prompt-based machine reading comprehension[J]. Journal of the American Medical Informatics Association, 2023, 30(9): 1486-1493. |
124 | VASWANI A, SHAZEER N, NIKI P, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: Curran Associates, Inc., 2017: 6000-6010. |
125 | LI Y F, YIN Y J, LI J, et al. Prompt-driven neural machine translation[C]//Proceedings of the Findings of the Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics, 2022: 2579-2590. |
126 | ZHANG Y Z, SUN S Q, GALLEY M, et al. DialoGPT: large-scale generative pre-training for conversational response generation[EB]. arXiv preprint, 2019, arXiv: 1911.00536. |
127 | FATOUROS G, SOLDATOS J, KOUROUMALI K, et al. Transforming sentiment analysis in the financial domain with ChatGPT[J]. Machine Learning with Applications, 2023, 14: 100508. |
128 | ZHANG P F, CHAI T T, XU Y D. Adaptive prompt learning-based few-shot sentiment analysis[J]. Neural Processing Letters, 2023, 55(6): 7259-7272. |
129 | GU X, CHEN X L, LU P, et al. AGCVT-prompt for sentiment classification: automatically generating chain of thought and verbalizer in prompt learning[J]. Engineering Applications of Artificial Intelligence, 2024, 132: 107907. |
130 | WANG Y T, WANG J G, CAO Y S, et al. Integrated inspection on PCB manufacturing in cyber-physical-social systems[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2023, 53(4): 2098-2106. |
131 | DING M, YANG Z Y, HONG W Y, et al. Cogview: mastering text-to-image generation via transformers[C]//Proceedings of the 35th Conference on Neural Information Processing Systems. New York: Curran Associates, Inc., 2021: 19822-19835. |
132 | HINZ T, HEINRICH S, WERMTER S. Semantic object accuracy for generative text-to-image synthesis[J]. IEEE Trans Pattern Anal Mach Intell, 2022, 44(3): 1552-1565. |
133 | TAO M, TANG H, WU F, et al. DF-GAN: a simple and effective baseline for text-to-image synthesis[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR). Piscataway: IEEE Press, 2022: 16494-16504. |
134 | LI B W, QI X J, THOMAS L, et al. Controllable text-to-image generation[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. New York: Curran Associates Inc., 2019: 2665-2675. |
135 | ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022: 10674-10685. |
136 | LI X J, YIN X, LI C Y, et al. Oscar: object-semantics aligned pre-training for vision-language tasks[C]//Proceedings of the 2020 European Conference on Computer Vision. Cham: Springer, 2020: 121-137. |
137 | KHALIL M, KHALIL A, NGOM A. A comprehensive study of vision transformers in image classification tasks[EB]. arXiv preprint, 2023, arXiv: 2312.01232. |
138 | CROWSON K, BIDERMAN S, KORNIS D, et al. VQGAN-CLIP: open domain image generation and editing with natural language guidance[C]//Proceedings of 2022 European Conference on Computer Vision. Cham: Springer, 2022: 88-105. |
139 | KWON G, YE J C. CLIPstyler: image style transfer with a single text condition[C]//Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022: 18041-18050. |
140 | BAR-TAL O, OFRI-AMAR D, FRIDMAN R, et al. Text2LIVE: text-driven layered image and video editing[C]//Proceedings of 2022 European Conference on Computer Vision. Cham: Springer, 2022: 707-723. |
141 | LI C X, LIU H Y, LIU Y F, et al. Endora: video generation models as endoscopy simulators[EB]. arXiv preprint, 2024, arXiv: 2403.11050. |
142 | LIANG J Y, FAN Y C, ZHANG K, et al. MoVideo: motion-aware video generation with diffusion models[EB]. arXiv preprint, 2023, arXiv: 2311.11325. |
143 | WU J Z, GE Y X, WANG X T, et al. Tune-A-video: one-shot tuning of image diffusion models for text-to-video generation[C]//Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision(ICCV). Piscataway: IEEE Press, 2023: 7589-7599. |
144 | CHENG J X, XIAO T J, HE T. Consistent video-to-video transfer using synthetic dataset[EB]. arXiv preprint, 2023, arXiv: 2311.00213. |
145 | YOUSAF A, NASEER M, KHAN S, et al. Videoprompter: an ensemble of foundational models for zero-shot video understanding[EB]. arXiv preprint, 2023, arXiv: 2310.15324. |
146 | MI Y C, LI Y, SHU Y, et al. CLiF-VQA: enhancing video quality assessment by incorporating high-level semantic information related to human feelings[EB]. arXiv preprint, 2023, arXiv: 2311.07090. |
147 | KIM W, CHOI C, LEE W, et al. An image grid can be worth a video: zero-shot video question answering using a VLM[EB]. arXiv preprint, 2024, arXiv: 2403.18406. |
148 | ZHANG C, LU T X, ISLAM M M, et al. A simple LLM framework for long-range video question-answering[EB]. arXiv preprint, 2023, arXiv: 2312.17235. |
149 | WANG X H, ZHANG Y H, ZOHAR O, et al. VideoAgent: long-form video understanding with large language model as agent[EB]. arXiv preprint, 2024, arXiv: 2403.10517. |
150 | 倪清桦, 郭超, 王飞跃. 平行戏剧: 新时代戏剧的人机协同创作与智能管理[J]. 智能科学与技术学报, 2023, 5(4): 436-445. |
NI Q H, GUO C, WANG F Y. Parallel theaters: human-machine collaborative creation and intelligent management for theatrical art[J]. Chinese Journal of Intelligent Science and Technology, 2023, 5(4): 436-445. | |
151 | AGOSTINELLI A, DENK T I, BORSOS Z, et al. MusicLM: generating music from text[EB]. arXiv preprint, 2023, arXiv: 2301.11325. |
152 | AHN J, VERMA R, LOU R Z, et al. Large language models for mathematical reasoning: progresses and challenges[EB]. arXiv preprint, 2024, arXiv: 2402.00157. |
153 | SAHOO P, SINGH A K, SAHA S, et al. A systematic survey of prompt engineering in large language models: techniques and applications[EB]. arXiv preprint, 2024, arXiv: 2402.07927. |
154 | WANG X Y, AMAYUELAS A, ZHANG K X, et al. Understanding reasoning ability of language models from the perspective of reasoning paths aggregation[EB]. arXiv preprint, 2024, arXiv: 2402.03268. |
155 | WANG S Y, WEI Z Y, CHOI Y, et al. Can LLMs reason with rules? logic scaffolding for stress-testing and improving LLMs[EB]. arXiv preprint, 2024, arXiv: 2402.11442. |
156 | WAN Y X, WANG W X, YANG Y L, et al. A & B == B & A: triggering logical reasoning failures in large language models[EB]. arXiv preprint, 2024, arXiv: 2401.00757. |
157 | DENG S J, DONG H H, SI X J. Enhancing and evaluating logical reasoning abilities of large language models[C]//Proceedings of ICLR 2024 Workshop on Secure and Trustworthy Large Language Models. [S.l.:s.n.], 2024. |
158 | LI Y H, ZHANG R, LIU J Y. An enhanced prompt-based LLM reasoning scheme via knowledge graph-integrated collaboration[EB]. arXiv preprint, 2024, arXiv: 2402.04978. |
159 | IMANI S, DU L, SHRIVASTAVA H. MathPrompter: mathematical reasoning using large language models[EB]. arXiv preprint, 2023, arXiv: 2303.05398. |
160 | WU Z Y, JIANG M, SHEN C. Get an A in math: progressive rectification prompting[C]//Proceedings of the 28th AAAI Conference on Artificial Intelligence. California: AAAI, 2024: 19288-19296. |
161 | SRIVASTAVA S S, GANDHI A. MathDivide: improved mathematical reasoning by large language models[EB]. arXiv preprint, 2024, arXiv: 2405.13004. |
162 | WU H, YU X G. Prompt incorporates math knowledge to improve efficiency and quality of large language models to translate math word problems[C]//Proceedings of the 2023 International Conference on Intelligent Education and Intelligent Research(IEIR). Piscataway: IEEE Press, 2023: 1-5. |
163 | SUTTON R T, PINCOCK D, BAUMGART D C, et al. An overview of clinical decision support systems: benefits, risks, and strategies for success[J]. NPJ Digital Medicine, 2020, 3: 17. |
164 | 林飞, 王飞跃, 田永林, 等. 平行药物系统: 基于大语言模型和三类人的框架与方法[J]. 智能科学与技术学报, 2024, 6(1): 88-99. |
LIN F, WANG F Y, TIAN Y L, et al. Parallel drug systems: framework and methods based on large language models and three types of humans[J]. Chinese Journal of Intelligent Science and Technology, 2024, 6(1): 88-99. | |
165 | LIN F, GAO T, SUN D, et al. Parallel medical devices and instruments: integrating edge and cloud intelligence for smart treatment and health systems[J]. IEEE/CAA Journal of Automatica Sinica, 2024, accepted. |
166 | NAZARY F, DELDJOO Y, DI NOIA T. ChatGPT-HealthPrompt. harnessing the power of XAI in prompt-based healthcare decision support using ChatGPT[C]//Proceedings of 2023 European Conference on Artificial Intelligence International Workshops. Cham: Springer, 2024: 382-397. |
167 | WANG S Y, ZHU Y X, LI Z H, et al. ChatGPT as your vehicle co-pilot: an initial attempt[J]. IEEE Transactions on Intelligent Vehicles, 2023, 8: 4706-4721. |
168 | GARRETT C R, LOZANO-PéREZ T, KAELBLING L P. PDDLStream: integrating symbolic planners and blackbox samplers via optimistic adaptive planning[C]//Proceedings of the 30th International Conference on Automated Planning and Scheduling. California: AAAI, 2020: 440-448. |
169 | LEVINE S, CHELSEA F, TREVOR D, et al. End-to-end training of deep visuomotor policies[J]. Journal of Machine Learning Research, 2016, 17(1): 1334-1373. |
170 | SINGH I, BLUKIS V, MOUSAVIAN A, et al. ProgPrompt: generating situated robot task plans using large language models[C]//Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA). Piscataway: IEEE Press, 2023: 11523-11530. |
171 | CHEN B Y, XIA F, ICHTER B, et al. Open-vocabulary queryable scene representations for real world planning[C]//Proceedings of the 2023 IEEE International Conference on Robotics and Automation(ICRA). Piscataway: IEEE Press, 2023: 11509-11522. |
172 | KANNAN S S, VENKATESH V L N, MIN B C. SMART-LLM: smart multi-agent robot task planning using large language models[EB]. arXiv preprint, 2023, arXiv: 2309.10062. |
173 | LIN F, TIAN Y, WANG Y, et al. AirVista: empowering UAVs with 3D spatial reasoning abilities through multimodal large language model agent[C]//Proceedings of the 2024 IEEE 27th International Conference on Intelligent Transportation Systems. Piscataway: IEEE Press, 2024. |
174 | TIAN Y, LIN F, ZHANG X, et al. LogisticsVista: 3D terminal delivery services with UAVs, UGVs and USVs based on foundation models and scenarios engineering[C]//Proceedings of 2024 IEEE International Conference on Service Operations and Logistics, and Informatics. Piscataway: IEEE Press, 2024. |
175 | WANG X, YANG J, HAN J P, et al. Metaverses and DeMetaverses: from digital twins in CPS to parallel intelligence in CPSS[J]. IEEE Intelligent Systems, 2022, 37(4): 97-102. |
176 | YANG J, WANG X, TIAN Y L, et al. Parallel intelligence in CPSSs: being, becoming, and believing[J]. IEEE Intelligent Systems, 2023, 38(6): 75-80. |
177 | SOLDAINI L, KINNEY R, BHAGIA A, et al. Dolma: an open corpus of three trillion tokens for language model pretraining research[EB]. arXiv preprint, 2024, arXiv: 2402.00159. |
178 | HUANG S H, DONG L, WANG W H, et al. Language is not all you need: aligning perception with language models[EB]. arXiv preprint, 2023, arXiv: 2302.14045. |
179 | GIRDHAR R, EL-NOUBY A, LIU Z, et al. ImageBind: one embedding space to bind them all[C]//Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2023: 15180-15190. |
180 | HAN J M, ZHANG R R, SHAO W Q, et al. ImageBind-LLM: multi-modality instruction tuning[EB]. arXiv preprint, 2023, arXiv: 2309.03905. |
181 | YANG J, WANG Y T, WANG X X, et al. Generative AI empowering parallel manufacturing: building a "6S" collaborative production ecology for manufacturing 5.0[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2024: 1-15. |
182 | WANG X, WANG Y T, YANG J, et al. The survey on multi-source data fusion in cyber-physical-social systems: foundational infrastructure for industrial metaverses and industries 5.0[J]. Information Fusion, 2024, 107: 102321. |
183 | 皮佩定, 倪清桦, 杨静, 等. 平行夏尔希里:生态资源智能管护及其可持续发展新途径[J]. 智能科学与技术学报, 2023, 5(3): 283-292. |
PI P D, NI Q H, YANG J, et al. Parallel Sharhili: a new approach to sustainable development and intelligent management of ecological resources[J]. Chinese Journal of Intelligent Science and Technology, 2023, 5(3): 283-292. |
184 | 李娟娟, 管桑田, 秦蕊, 等. 智能区块链与区块链智能:构筑DePIN的基础设施智能[J]. 智能科学与技术学报, 2024, 6(1): 5-16. |
LI J J, GUAN S T, QIN R, et al. Intelligent blockchains and blockchain intelligence: the infrastructure intelligence for DePIN[J]. Chinese Journal of Intelligent Science and Technology, 2024, 6(1): 5-16. |
185 | 秦蕊, 梁小龙, 李娟娟, 等. 平行科研院所:从数字化转型到智能化变革[J]. 智能科学与技术学报, 2023, 5(2): 212-221. |
QIN R, LIANG X L, LI J J, et al. Parallel scientific research institutes: from digital transformation to intelligent revolution[J]. Chinese Journal of Intelligent Science and Technology, 2023, 5(2): 212-221. |
186 | 李娟娟, 王戈, 王晓, 等. 加密管理: 一种基于区块链的新型组织管理模式[J]. 智能科学与技术学报, 2022, 4(2): 145-156. |
LI J J, WANG G, WANG X, et al. Crypto management: a novel organizational management model based on blockchain[J]. Chinese Journal of Intelligent Science and Technology, 2022, 4(2): 145-156. |
187 | 杨静, 王晓, 王雨桐, 等. 平行智能与CPSS: 三十年发展的回顾与展望[J]. 自动化学报, 2023, 49(3): 614-634. |
YANG J, WANG X, WANG Y T, et al. Parallel intelligence and CPSS in 30 years: an ACP approach[J]. Acta Automatica Sinica, 2023, 49(3): 614-634. |
188 | 田永林, 陈苑文, 杨静, 等. 元宇宙与平行系统: 发展现状、对比及展望[J]. 智能科学与技术学报, 2023, 5(1): 121-132. |
TIAN Y L, CHEN Y W, YANG J, et al. Metaverses and parallel systems: the state of the art, comparisons and prospects[J]. Chinese Journal of Intelligent Science and Technology, 2023, 5(1): 121-132. |
189 | 缪青海, 王兴霞, 杨静, 等. 从基础智能到通用智能: 基于大模型的GenAI和AGI之现状与展望[J]. 自动化学报, 2024, 50(4): 674-687. |
MIAO Q H, WANG X X, YANG J, et al. From foundation intelligence to general intelligence: the state-of-art and perspectives of GenAI and AGI based on foundation models[J]. Acta Automatica Sinica, 2024, 50(4): 674-687. |
190 | GE J W, CHANG C, ZHANG J W, et al. LLM-based operating systems for automated vehicles: a new perspective[J]. IEEE Transactions on Intelligent Vehicles, 2024, 9(4): 4563-4567. |
191 | TENG S Y, YAN R, ZHANG X T, et al. Sora for hierarchical parallel motion planner: a safe end-to-end method against OOD events[J]. IEEE Transactions on Intelligent Vehicles, 2024, 9(4): 4573-4576. |
192 | 王飞跃, 王艳芬, 陈薏竹, 等. 联邦生态: 从联邦数据到联邦智能[J]. 智能科学与技术学报, 2020, 2(4): 305-313. |
WANG F Y, WANG Y F, CHEN Y Z, et al. Federated ecology: from federated data to federated intelligence[J]. Chinese Journal of Intelligent Science and Technology, 2020, 2(4): 305-313. |
193 | 邓攀, 刘俊廷, 王晓, 等. STCTN: 一种基于时域偏倚校正与空域因果传递的时空因果表示学习方法[J]. 计算机学报, 2023, 46(12): 2535-2550. |
DENG P, LIU J T, WANG X, et al. STCTN: a spatio-temporal causal representation learning method based on temporal bias adjustment and spatial causal transition[J]. Chinese Journal of Computers, 2023, 46(12): 2535-2550. |
194 | FAN J, TU J H, LI G L, et al. Unicorn: a unified multi-tasking matching model[J]. ACM SIGMOD Record, 2024, 53(1): 44-53. |