Information Channels
- Information usually comes from multiple sources: observed entities, different sensors, communication channels.
- Agents can tell information channels apart because each channel always occupies the same positions in the observation vector.
$$ \begin{eqnarray} \mathbf{O}_t^a = \begin{bmatrix} ent_1 \\ \vdots \\ ent_k \end{bmatrix}_t^a = \begin{bmatrix} f_{1,1} & \cdots & f_{1,z} \\ \vdots & \ddots & \vdots \\ f_{k,1} & \cdots & f_{k,z} \end{bmatrix} _t^a \end{eqnarray} $$
$$ \begin{eqnarray} \mathbf{S}_t = \begin{bmatrix} ent_1 \\ \vdots \\ ent_k \end{bmatrix}_t = \begin{bmatrix} f_{1,1} & \cdots & f_{1,z} \\ \vdots & \ddots & \vdots \\ f_{k,1} & \cdots & f_{k,z} \end{bmatrix}_t \end{eqnarray} $$
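The entity-matrix layout above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper name `build_entity_matrix` and the two features per entity (a relative x and y position) are hypothetical and not taken from the slides.

```python
import numpy as np

def build_entity_matrix(entities):
    """Stack k per-entity feature vectors (each of length z) into a (k, z) matrix."""
    return np.stack([np.asarray(e, dtype=np.float32) for e in entities], axis=0)

# Hypothetical features per entity: (relative x, relative y), so z = 2.
obs_3 = build_entity_matrix([[0.0, 0.0], [1.0, -0.5], [0.2, 0.7]])
obs_5 = build_entity_matrix(
    [[0.0, 0.0], [1.0, -0.5], [0.2, 0.7], [-1.0, 0.3], [0.4, 0.4]]
)

# Flattening recovers the usual fixed-position observation vector,
# whose length k * z changes with the number of entities.
flat_3 = obs_3.reshape(-1)
flat_5 = obs_5.reshape(-1)

print(obs_3.shape, obs_5.shape)    # row length z stays the same
print(flat_3.shape, flat_5.shape)  # vector length grows with k
```

In the matrix form only the number of rows changes with the number of entities, while each row keeps the same z features.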
Graph-Approach Advantages | Disadvantages |
---|---|
1. Better represents coordination problems. | 1. Cannot differentiate entities a priori in observations such as raw images (although we can for states). |
2. Allows the use of more appropriate NNs (GNNs, Transformers). | 2. Cannot directly include the last action and the one-hot encoding of the agent id in the observation. |
3. Makes the NN parameters invariant to the number of agents. | |
4. Makes transfer learning and curriculum learning easy to implement. | |
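To illustrate why the parameters can be invariant to the number of agents, here is a minimal single-head self-attention sketch in NumPy. It is not the architecture from the slides: the sizes `z` and `d`, the random weights, and the helper `mlp_first_layer_params` are assumptions chosen only to compare parameter counts.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over the k rows (entities) of X, shape (k, z)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Wq.shape[1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
z, d = 4, 8  # hypothetical feature and embedding sizes
Wq, Wk, Wv = (rng.normal(size=(z, d)) for _ in range(3))
attn_params = Wq.size + Wk.size + Wv.size  # depends only on z and d

# The same weights process 3 or 6 entities without any change.
out_3 = self_attention(rng.normal(size=(3, z)), Wq, Wk, Wv)
out_6 = self_attention(rng.normal(size=(6, z)), Wq, Wk, Wv)

def mlp_first_layer_params(k, z, d):
    """Weights plus biases of a dense layer fed the flattened (k * z) vector."""
    return k * z * d + d

print(attn_params, mlp_first_layer_params(3, z, d), mlp_first_layer_params(6, z, d))
```

The attention weights are shaped by the feature size alone, so adding entities adds rows to the input but no parameters, whereas the dense layer on the flattened vector must grow with k.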
$$ \begin{eqnarray} f_{i,\texttt{IS_SELF}}^a = \begin{cases} 1, & \text{if } i = a\\ 0, & \text{otherwise.} \end{cases} \end{eqnarray} $$
$$ \begin{eqnarray} f_{i, \texttt{IS_AGENT}}^a = \begin{cases} 1, & \text{if } i \in A\\ 0, & \text{otherwise.} \end{cases} \end{eqnarray} $$
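The two indicator features defined above could be appended to an entity matrix as follows. This is a sketch, assuming entities are indexed from 0 and that the set of agent indices A is known; the function name `add_entity_flags` is hypothetical.

```python
import numpy as np

def add_entity_flags(features, agent_index, agent_indices):
    """Append IS_SELF and IS_AGENT columns to a (k, z) entity matrix.

    agent_index   -- row of the observing agent a
    agent_indices -- rows that belong to the agent set A
    """
    k = features.shape[0]
    rows = np.arange(k)
    is_self = (rows == agent_index).astype(features.dtype)
    is_agent = np.isin(rows, list(agent_indices)).astype(features.dtype)
    return np.column_stack([features, is_self, is_agent])

# Hypothetical scene: 4 entities with 2 features each; rows 0 and 1 are agents.
features = np.zeros((4, 2), dtype=np.float32)
obs = add_entity_flags(features, agent_index=1, agent_indices={0, 1})
print(obs[:, 2])  # IS_SELF column
print(obs[:, 3])  # IS_AGENT column
```

Because the flags are per-row features, they carry the agent's identity without a one-hot id whose length would depend on the number of agents.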
Transformer hypernetwork:
Similar architecture to UPDeT, but:
[Figure panels: Spread 3v3, Spread 4v4, Spread 5v5, Spread 6v6, 5m_vs_6m, 8m_vs_9m, 27m_vs_30m, 6h_vs_8z, 5s10z, 3s5z_vs_3s6z, MMM2, corridor]
Model | Agent parameters | Mixer parameters |
---|---|---|
TransfQMix | 50k | 50k |
QMix | 27k | 18k |
QPlex | 27k | 251k |
O-CWQMix | 27k | 179k |
Spread 3v3
Model | Agent parameters | Mixer parameters |
---|---|---|
TransfQMix | 50k | 50k |
QMix | 28k | 56k |
QPlex | 28k | 597k |
O-CWQMix | 28k | 301k |
Spread 6v6
Model | Agent parameters | Mixer parameters |
---|---|---|
TransfQMix | 50k | 50k |
QMix | 49k | 283k |
QPlex | 49k | 3184k |
O-CWQMix | 49k | 1021k |
SC2 27m_vs_30m
Model | 3v3 | 4v4 | 5v5 | 6v6 |
---|---|---|---|---|
TransfQMix (3v3) | 0.98 | 0.88 | 0.80 | 0.75 |
TransfQMix (4v4) | 0.96 | 0.93 | 0.90 | 0.86 |
TransfQMix (5v5) | 0.88 | 0.85 | 0.82 | 0.82 |
TransfQMix (6v6) | 0.91 | 0.88 | 0.85 | 0.84 |
TransfQMix (CL) | 0.88 | 0.88 | 0.87 | 0.87 |
State-of-the-art | 0.76 | 0.45 | 0.36 | 0.33 |
Spread: Zero-Shot Transfer (POL)
SC2: 8m_vs_9m to 5m_vs_6m
SC2: 5s10z to 3s5z_vs_3s6z