Similar architecture to UPDeT, but:
Transformer hypernetwork:
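A minimal numpy sketch of the idea (all names, shapes, and the single-head/no-bias simplifications are my own assumptions, not the paper's exact architecture): self-attention runs over entity tokens, and a linear head maps the attended vectors to the weights of the QMIX mixing layer, with `abs()` enforcing the monotonicity constraint.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    # single-head scaled dot-product self-attention over entity tokens
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

rng = np.random.default_rng(0)
n_agents, emb = 4, 8

# one embedded token per agent, fed to the mixer's transformer
tokens = rng.normal(size=(n_agents, emb))
Wq, Wk, Wv = (rng.normal(size=(emb, emb)) * 0.1 for _ in range(3))

h = self_attention(tokens, Wq, Wk, Wv)   # (n_agents, emb)

# hypernetwork head: each attended token yields one mixing weight;
# abs() keeps the mixing monotonic in the per-agent Q-values (QMIX)
W_head = rng.normal(size=(emb, 1)) * 0.1
w1 = np.abs(h @ W_head).squeeze(-1)      # (n_agents,) non-negative weights

q_agents = rng.normal(size=n_agents)     # per-agent Q-values
q_tot = q_agents @ w1                    # first mixing layer (no bias here)
```

Because the mixing weights are generated per token, the same hypernetwork parameters work for any number of agents.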
Setup:
- Spread 3v3
- Spread 4v4
- Spread 5v5
- Spread 6v6
Setup:
- 5m_vs_6m
- 8m_vs_9m
- 27m_vs_30m
- 6h_vs_8z

- 5m_vs_6m
- 3s5z_vs_3s6z
- MMM2
- corridor
Spread 3v3 (number of parameters):

| Model | Agent | Mixer |
|---|---|---|
| TransfQMix | 50k | 50k |
| QMix | 27k | 18k |
| QPlex | 27k | 251k |
| O-CWQMix | 27k | 179k |
Spread 6v6 (number of parameters):

| Model | Agent | Mixer |
|---|---|---|
| TransfQMix | 50k | 50k |
| QMix | 28k | 56k |
| QPlex | 28k | 597k |
| O-CWQMix | 28k | 301k |
SC2 27m_vs_30m (number of parameters):

| Model | Agent | Mixer |
|---|---|---|
| TransfQMix | 50k | 50k |
| QMix | 49k | 283k |
| QPlex | 49k | 3184k |
| O-CWQMix | 49k | 1021k |
Spread: the learned policy is transferable between different teams of agents.

| Model | 3v3 | 4v4 | 5v5 | 6v6 |
|---|---|---|---|---|
| TransfQMix (3v3) | 0.98 | 0.88 | 0.8 | 0.75 |
| TransfQMix (4v4) | 0.96 | 0.93 | 0.9 | 0.86 |
| TransfQMix (5v5) | 0.88 | 0.85 | 0.82 | 0.82 |
| TransfQMix (6v6) | 0.91 | 0.88 | 0.85 | 0.84 |
| TransfQMix (CL) | 0.88 | 0.88 | 0.87 | 0.87 |
| State-of-the-art | 0.76 | 0.45 | 0.36 | 0.33 |
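One way to see why the policy transfers across team sizes (a numpy sketch under my own assumptions, not the paper's code): observations are encoded as one token per entity, so attention parameters are shared across tokens and their shapes never depend on how many agents or landmarks are in the scene.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(tokens, Wq, Wk, Wv):
    # scaled dot-product self-attention over a variable-length token set
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

rng = np.random.default_rng(1)
feat = 6  # hypothetical per-entity feature size (position, velocity, ...)
Wq, Wk, Wv = (rng.normal(size=(feat, feat)) * 0.1 for _ in range(3))

# the SAME parameters process a 3v3 and a 6v6 Spread scene:
# team size only changes the number of tokens, never the weight shapes
out_3v3 = attend(rng.normal(size=(3 + 3, feat)), Wq, Wk, Wv)  # 3 agents + 3 landmarks
out_6v6 = attend(rng.normal(size=(6 + 6, feat)), Wq, Wk, Wv)  # 6 agents + 6 landmarks
```

This is what makes zero-shot transfer (the off-diagonal cells above) possible at all: a fixed-size-input MLP policy would need retraining or padding for each team size.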
SC2: learning can be sped up by transferring a learned policy.
- 8m_vs_9m to 5m_vs_6m
- 5s10z to 3s5z_vs_3s6z
Results shown: SC2 5m_vs_6m; Spread 6v6.
If agents can differentiate between information sources, why not process them coherently?