DISTRIBUTED EVOLUTIONARY POLICY OPTIMIZATION FOR EFFICIENT TRAINING OF MULTI-AGENT REASONING MODELS

ICTACT Journal on Soft Computing (Volume: 16, Issue: 2)

Abstract

Multi-Agent Reinforcement Learning (MARL) has emerged as a key paradigm for solving complex real-world problems in which multiple agents interact in dynamic environments. However, training MARL models, especially for cooperative reasoning tasks, remains computationally intensive and sample-inefficient because of non-stationarity, credit assignment, and policy coupling. Conventional policy gradient methods struggle with convergence and scalability in multi-agent settings, and centralized training frameworks suffer from bottlenecks and synchronization overheads. Evolutionary algorithms, while more robust to non-differentiable objectives, are often too slow when run on a single node. To address these challenges, we propose Distributed Co-evolutionary Policy Optimization (DCPO), a hybrid learning framework that distributes evolutionary computation across multiple nodes. DCPO decomposes the global policy search into parallel sub-population explorations, with each node evolving a subset of agent policies using fitness-driven mutation, crossover, and local policy gradient updates. A global coordinator periodically aggregates the top-performing policies to ensure that cooperative learning converges. DCPO was tested on standard cooperative MARL benchmarks such as StarCraft II Micromanagement and Multi-Agent Particle Environments (MPE). Compared to traditional baselines such as MADDPG, QMIX, MAPPO, COMA, and EPOpt, DCPO showed up to 37% faster convergence, 25% higher final cumulative rewards, and enhanced generalization in unseen environments.
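
The loop described in the abstract (per-node evolution with mutation, crossover, and a local gradient update, plus periodic aggregation by a global coordinator) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the toy quadratic `fitness` stands in for episode return, `local_gradient_step` uses the analytic gradient of that toy objective in place of a policy gradient, the nodes are simulated sequentially in one process, and all function names and hyperparameters are hypothetical.

```python
import random

TARGET = [0.5, -1.0, 2.0]  # hypothetical optimum standing in for a good joint policy


def fitness(params):
    """Toy stand-in for episode return: higher is better, maximum 0 at TARGET."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def local_gradient_step(params, lr=0.1):
    """Stand-in for a local policy-gradient update (gradient ascent on the toy fitness)."""
    return [p + lr * (-2.0 * (p - t)) for p, t in zip(params, TARGET)]


def crossover(a, b, rng):
    """Uniform crossover: each coordinate is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(params, rng, sigma=0.1):
    """Gaussian mutation applied to every coordinate."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def evolve_node(population, rng, n_elite=2):
    """One generation on one node: keep the elites, refill with refined offspring."""
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:n_elite]
    children = []
    while len(elites) + len(children) < len(population):
        a, b = rng.sample(elites, 2)
        child = mutate(crossover(a, b, rng), rng)
        children.append(local_gradient_step(child))
    return elites + children


def coordinate(nodes, top_k=2):
    """Global coordinator: gather the best policies and overwrite each node's worst ones."""
    everyone = [ind for pop in nodes for ind in pop]
    best = sorted(everyone, key=fitness, reverse=True)[:top_k]
    synced = []
    for pop in nodes:
        ranked = sorted(pop, key=fitness, reverse=True)
        synced.append(ranked[: len(pop) - top_k] + [list(b) for b in best])
    return synced


def run_dcpo_sketch(n_nodes=3, pop_size=6, generations=30, sync_every=5, seed=0):
    """Run the toy loop and return (best initial fitness, best final fitness)."""
    rng = random.Random(seed)
    dim = len(TARGET)
    nodes = [[[rng.uniform(-3, 3) for _ in range(dim)] for _ in range(pop_size)]
             for _ in range(n_nodes)]
    initial_best = max(fitness(ind) for pop in nodes for ind in pop)
    for gen in range(1, generations + 1):
        # In a distributed setting each node would run this step in parallel.
        nodes = [evolve_node(pop, rng) for pop in nodes]
        if gen % sync_every == 0:
            nodes = coordinate(nodes)
    final_best = max(fitness(ind) for pop in nodes for ind in pop)
    return initial_best, final_best
```

Calling `run_dcpo_sketch()` returns the best fitness before and after training. Because elites are carried over unchanged and the coordinator only overwrites each node's worst individuals, the best fitness in this sketch never decreases. A real system would replace `fitness` with environment rollouts and run `evolve_node` on separate workers.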

Authors

A. Rajavel¹, P.T. Kalaivaani²
¹Kamaraj College of Engineering and Technology, India; ²Vivekanandha College of Engineering for Women, India

Keywords

Multi-Agent Reinforcement Learning, Evolutionary Algorithms, Distributed Learning, Policy Optimization, Cooperative Reasoning

Published By
ICTACT
Date of Publication
July 2025
Pages
3893 - 3898