Supervisor: Southwest Ordnance Industry Bureau
Organizer: Chongqing Ordnance Industry Society
Chongqing University of Technology

Research on a multi-UAV cooperative confrontation algorithm based on improved reinforcement learning

DOI: 10.11809/bqzbgcxb2023.05.033
Keywords: unmanned aerial vehicle swarm; reinforcement learning; cooperative control; swarm intelligence; attack-defense countermeasure
Abstract: Research on combat cooperation among multiple UAVs mainly covers flight cooperation, reconnaissance cooperation and interference cooperation. As the number of UAVs and the scope of cooperative decisions increase, the dimensions of the state space and action space of the multi-agent reinforcement learning model grow exponentially. As a result, multi-agent reinforcement learning algorithms converge with difficulty during training, and the level of cooperative decision-making is hard to improve significantly. This paper builds its model on the multi-agent deep deterministic policy gradient (MADDPG) algorithm and, on that basis, proposes a multi-agent deep deterministic policy gradient algorithm with a selective experience storage policy (SES-MADDPG). The algorithm selectively stores the experience entering the experience pool by setting recycling storage criteria and selectivity factors, which alleviates the problem of reward sparsity. Simulation experiments show that, with the time complexity of the algorithm guaranteed, SES-MADDPG converges better than other reinforcement learning algorithms and shows an increase of 25.427% in task completion rate compared with the MADDPG algorithm.
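The selective experience storage idea described in the abstract can be illustrated with a minimal replay-buffer sketch. The abstract does not state the actual storage criteria or how the selectivity factors are defined, so everything here is an assumption made for illustration: the class name `SelectiveReplayBuffer`, the `reward_threshold` criterion, and the single `selectivity` probability are hypothetical and are not taken from the paper. The sketch always keeps transitions whose reward magnitude exceeds a threshold and keeps the remaining (sparse-reward) transitions only with a fixed probability, so that informative experience makes up a larger share of the pool.

```python
import random
from collections import deque


class SelectiveReplayBuffer:
    """Illustrative replay buffer that filters transitions before storing them.

    Assumption (not from the paper): a transition is "informative" when
    |reward| > reward_threshold; all other transitions are stored only with
    probability `selectivity`.
    """

    def __init__(self, capacity, reward_threshold=0.0, selectivity=0.2, seed=None):
        # deque with maxlen drops the oldest transition once capacity is reached
        self.buffer = deque(maxlen=capacity)
        self.reward_threshold = reward_threshold
        self.selectivity = selectivity
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        """Store the transition if it passes the filter; return True if stored."""
        informative = abs(reward) > self.reward_threshold
        if informative or self.rng.random() < self.selectivity:
            self.buffer.append((state, action, reward, next_state, done))
            return True
        return False

    def sample(self, batch_size):
        """Uniformly sample up to batch_size stored transitions."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)
```

In a MADDPG-style training loop, each agent's transition (or the joint transition) would pass through `add` in place of an unconditional append. With `selectivity=1.0` the buffer behaves like an ordinary replay buffer, which gives a simple baseline for comparison.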
Issue: Vol. 44 No. 5 (2023)
Published: 2023-05-28