Tactical Missile Technology

2022, No. 01 (Issue 211): 120-130

A reinforcement learning algorithm for 2V2 close-range air combat

Tang Wenquan^1, Sun Ying^2,3, Yang Qi^4,3, Li Hui^1,5, Wang Zhuang^1, He Li^1

1. College of Computer Science, Sichuan University; 2. National University of Defense Technology; 3. No. 31001 Troop of the PLA; 4. Army Engineering University of PLA; 5. National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University

DOI: 10.16358/j.issn.1009-1300.20210081

Abstract:

Deep reinforcement learning algorithms used for decision-making in multi-aircraft close-range air combat have difficulty handling high-dimensional state spaces and converging. To address these problems, a proximal policy optimization (PPO) algorithm based on an attention mechanism is proposed. Building on the classical PPO algorithm, the idea of attention is introduced: an attention model based on the air-combat threat degree is constructed to perform attention distribution and information aggregation over the air-combat situation information in multi-aircraft engagements, so that the algorithm does not have to process the high-dimensional state space directly. Simulation results of 2V2 close-range air combat show that the model trained with the attention-based PPO algorithm can drive the agent to make correct maneuvers in response to the opponent's strategy and gain a dominant position. The algorithm outperforms the traditional PPO algorithm in convergence speed and stability. Introducing the attention mechanism can improve both algorithm performance and air-combat decision-making efficiency.
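
The sketch below illustrates, under stated assumptions, the two ingredients the abstract names: a threat-weighted attention pooling that compresses the states of a variable number of aircraft into one fixed-size vector, and the standard PPO clipped surrogate objective (Schulman et al.). It is not the authors' model. The threat formula (a product of a range term and an aspect-angle term), the 3000 m range scale, the softmax temperature, the feature layout and all function names are hypothetical choices made for illustration; the paper's actual attention model may use learned query/key projections instead of a fixed softmax over threat scores.

```python
import math
from typing import List, Sequence


def threat_degree(distance_m: float, aspect_deg: float,
                  max_range_m: float = 3000.0) -> float:
    """Hypothetical threat score in [0, 1] for one opposing aircraft.

    distance_m: range from our aircraft to the opponent.
    aspect_deg: angle between the opponent's nose and its line of sight
                to us (0 = pointing straight at us). Assumed definition.
    """
    # Range term: 1 at zero range, decreasing linearly to 0 at max_range_m.
    range_term = max(0.0, 1.0 - distance_m / max_range_m)
    # Angle term: 1 when the opponent points at us, 0 when it points away.
    angle_term = 1.0 - min(abs(aspect_deg), 180.0) / 180.0
    return range_term * angle_term


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax; a lower temperature sharpens the weights."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_by_threat(features: Sequence[Sequence[float]],
                        threats: Sequence[float],
                        temperature: float = 0.1) -> List[float]:
    """Threat-weighted attention pooling of per-aircraft feature vectors.

    The output length equals one feature vector's length, independent of
    how many aircraft are observed, so the policy sees a fixed-size input.
    """
    weights = softmax(threats, temperature)
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]


def ppo_clip_objective(ratios: Sequence[float], advantages: Sequence[float],
                       eps: float = 0.2) -> float:
    """PPO clipped surrogate objective (to be maximized), averaged over samples.

    ratios: pi_new(a|s) / pi_old(a|s) for each sampled action.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        # clip(r, 1 - eps, 1 + eps)
        clipped = max(1.0 - eps, min(r, 1.0 + eps))
        # Pessimistic bound: take the smaller of unclipped and clipped terms.
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Two opponents: one close and nose-on, one far and pointing away.
    threats = [threat_degree(800.0, 10.0), threat_degree(2500.0, 150.0)]
    feats = [[800.0, 10.0], [2500.0, 150.0]]
    print(threats)
    print(aggregate_by_threat(feats, threats))
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

Because the pooled vector has the same length whether two or four aircraft are observed, the policy network's input size stays fixed; this is one simple way to avoid feeding the full concatenated multi-aircraft state to PPO directly.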

Keywords: close-range air combat; PPO; reinforcement learning; attention mechanism; artificial intelligence

Basic Information:

China Classification Code: TP18; E91

Citation Information:

Tang Wenquan, Sun Ying, Yang Qi, et al. A reinforcement learning algorithm for 2V2 close-range air combat[J]. Tactical Missile Technology, 2022, No. 211(01): 120-130. DOI: 10.16358/j.issn.1009-1300.20210081.