Agent-Oriented Centralized Critic for Asynchronous Multi-Agent Reinforcement Learning

Published: 13 Mar 2024 · Last Modified: 22 Apr 2024 · ALA 2024 · CC BY 4.0
Keywords: Agent-Oriented Centralized Critic, MacDec-POMDP, Asynchronous Multi-Agent Reinforcement Learning
TL;DR: We propose a novel approach named agent-oriented centralized critic for asynchronous multi-agent reinforcement learning.
Abstract: Multi-agent reinforcement learning (MARL) has been actively developed and successfully applied in various fields. In the conventional MARL setting, which most previous works consider, all agents take their actions simultaneously at every timestep because every action has the same duration. However, real-world scenarios often involve agents executing actions of different durations, resulting in asynchronous action selection across agents. The macro-action decentralized partially observable Markov decision process (MacDec-POMDP) provides a framework for modeling multi-agent decision-making in which action selection among the agents occurs asynchronously across time. While several works have explored MARL methods for MacDec-POMDP, existing methods handle this asynchronicity by focusing on how trajectories are used for training while simply adopting conventional MARL architectures. In this paper, we propose a novel approach named agent-oriented centralized critic (AOCC) for MacDec-POMDP, which 1) explicitly encodes each agent's observation history together with the timestep at which the agent started performing its current macro-action, and 2) explicitly aggregates these encodings for agent-oriented critic learning. Our experimental evaluation on a macro-action-based multi-agent benchmark demonstrates that the proposed approach significantly outperforms other baseline methods for MacDec-POMDP.
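The abstract's two ingredients of AOCC, per-agent encoding of the observation history together with the macro-action start timestep, and explicit aggregation of those encodings in a centralized critic, can be illustrated with a minimal PyTorch sketch. The GRU encoder, the concatenation-based aggregation, and all layer sizes below are our own assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AgentEncoder(nn.Module):
    """Encodes one agent's macro-observation history plus the timestep at which
    its current macro-action began (hypothetical encoder; layer sizes assumed)."""
    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.gru = nn.GRU(obs_dim + 1, hidden_dim, batch_first=True)

    def forward(self, obs_history: torch.Tensor, start_timestep: torch.Tensor) -> torch.Tensor:
        # obs_history: (batch, seq_len, obs_dim); start_timestep: (batch, 1)
        t = start_timestep.unsqueeze(1).expand(-1, obs_history.size(1), -1)
        _, h = self.gru(torch.cat([obs_history, t], dim=-1))
        return h.squeeze(0)  # (batch, hidden_dim)

class AgentOrientedCritic(nn.Module):
    """Aggregates the per-agent encodings into a centralized value estimate."""
    def __init__(self, n_agents: int, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoders = nn.ModuleList(
            [AgentEncoder(obs_dim, hidden_dim) for _ in range(n_agents)]
        )
        self.value_head = nn.Sequential(
            nn.Linear(n_agents * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, obs_histories, start_timesteps):
        # obs_histories: list of (batch, seq_len, obs_dim) tensors, one per agent
        # start_timesteps: list of (batch, 1) tensors, one per agent
        encodings = [enc(o, t) for enc, o, t in zip(self.encoders, obs_histories, start_timesteps)]
        return self.value_head(torch.cat(encodings, dim=-1))  # (batch, 1)

if __name__ == "__main__":
    n_agents, obs_dim = 3, 10
    critic = AgentOrientedCritic(n_agents, obs_dim)
    obs = [torch.randn(4, 7, obs_dim) for _ in range(n_agents)]
    starts = [torch.randint(0, 20, (4, 1)).float() for _ in range(n_agents)]
    print(critic(obs, starts).shape)  # torch.Size([4, 1])
```

Concatenation is used here only as the simplest aggregation; the paper's agent-oriented aggregation may differ.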
Type Of Paper: Full paper (max 8 pages)
Anonymous Submission: Anonymized submission.
Submission Number: 11