Abstract
Multi-Agent Reinforcement Learning (MARL) has emerged as a key
paradigm for solving complex real-world problems involving multiple
agents interacting in dynamic environments. However, training MARL
models, especially for cooperative reasoning tasks, remains
computationally intensive and sample-inefficient due to nonstationarity, credit assignment, and policy coupling issues.
Conventional policy gradient methods struggle with convergence and
scalability in multi-agent settings. Centralized training frameworks
suffer from bottlenecks and synchronization overheads. Evolutionary
algorithms, while more robust to non-differentiable objectives, are
often too slow when applied in single-node environments. To address
these challenges, we propose Distributed Co-evolutionary Policy
Optimization (DCPO), a hybrid learning framework that distributes
evolutionary computation across multiple nodes. DCPO decomposes
the global policy search into sub-population-based parallel
explorations, with each node evolving a subset of agent policies using
fitness-driven mutation, crossover, and local policy gradient updates. A
global coordinator periodically aggregates top-performing policies to promote convergence toward cooperative behavior. DCPO was evaluated on standard cooperative MARL benchmarks, including StarCraft II micromanagement and the Multi-Agent Particle Environment (MPE).
Compared with established baselines such as MADDPG, QMIX, MAPPO, COMA, and EPOpt, DCPO achieved up to 37% faster convergence, 25% higher final cumulative rewards, and improved generalization to unseen environments.
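To make the co-evolutionary loop described above concrete, the sketch below simulates it in plain NumPy: each "node" evolves its own sub-population of policy parameter vectors through fitness-driven selection, uniform crossover, and Gaussian mutation, and a coordinator periodically reinserts the per-node elites into every sub-population. The toy fitness function, population sizes, and aggregation interval are illustrative assumptions only; the actual method additionally interleaves local policy gradient updates and computes fitness from rollouts in the MARL benchmarks rather than from a synthetic objective.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an environment rollout: fitness of a joint policy
# parameter vector (higher is better). A real DCPO setup would instead roll out
# the agents' policies in the benchmark environment.
def fitness(theta):
    return -np.sum((theta - 1.0) ** 2)

def evolve_subpopulation(pop, n_elite=4, sigma=0.1):
    """One local generation: select elites, recombine, and mutate."""
    scores = np.array([fitness(p) for p in pop])
    elites = pop[np.argsort(scores)[-n_elite:]]
    children = []
    for _ in range(len(pop) - n_elite):
        a, b = elites[rng.integers(n_elite, size=2)]
        child = np.where(rng.random(a.shape) < 0.5, a, b)   # uniform crossover
        child += sigma * rng.standard_normal(a.shape)       # Gaussian mutation
        children.append(child)
    return np.vstack([elites] + children)

# --- distributed co-evolution, simulated here as a loop over nodes ---
n_nodes, pop_size, dim = 4, 16, 8
subpops = [rng.standard_normal((pop_size, dim)) for _ in range(n_nodes)]

for generation in range(50):
    # Each node evolves its sub-population independently (parallel in practice).
    subpops = [evolve_subpopulation(pop) for pop in subpops]

    # Periodic aggregation: the coordinator gathers each node's best policy
    # and broadcasts those elites back into every sub-population.
    if generation % 10 == 9:
        global_elites = np.vstack(
            [pop[np.argmax([fitness(p) for p in pop])] for pop in subpops])
        subpops = [np.vstack([global_elites, pop[len(global_elites):]])
                   for pop in subpops]

best = max((p for pop in subpops for p in pop), key=fitness)
print("best fitness:", fitness(best))
```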
Authors
A. Rajavel¹, P.T. Kalaivaani²
¹Kamaraj College of Engineering and Technology, India; ²Vivekanandha College of Engineering for Women, India
Keywords
Multi-Agent Reinforcement Learning, Evolutionary Algorithms, Distributed Learning, Policy Optimization, Cooperative Reasoning