Agents for The Resistance: MiniMax, Reinforcement Q-Learning, and
Hidden Markov Model
MiniMax Agent
While the MiniMax algorithm is typically used for zero-sum games, it
can be employed here to generate a game state as a decision tree
where nodes reflect mission proposals, voting outcomes, and mission
success or failure. The agent aims to maximize its team’s chances of
success:
-
For the resistance team, it minimizes the chance of spies
infiltrating missions.
-
For the spy team, it maximizes the likelihood of sabotaging
missions without revealing their identity.
However, since this is not a zero-sum game with only two players,
this approach results in very high space and time complexity.
Reinforcement Q-Learning Agent
A Reinforcement Learning (RL) agent tracks rewards for its actions
after each round. For example:
-
Voting "Yes" for a proposed team: If the team
fails, the Q-value for voting "Yes" for teams containing the same
players decreases. Similarly, the Q-value for proposing such teams
also decreases.
- Conversely, if the mission succeeds, the Q-values increase.
If the agent is a spy, it tracks when missions fail, maintaining an
internal state of the team size, number of betrayals causing
failure, and number of spies in the team. It rewards missions with a
favorable ratio of spies to team members for voting to betray.
Hidden Markov Model (HMM) Agent
The HMM agent is well-suited for identifying spies, leveraging
probabilistic reasoning to make decisions based on player behavior.
-
Hidden Variable: Whether a player is a spy or
not.
-
Observations: Key observations include:
- The player is part of a failed team.
- The player voted for a failed team.
- The player proposed a failed team.
Initial Belief: The agent starts with a 1/3
probability that any player is a spy.
Belief Updating: Trust increases for players part
of successful missions and decreases for those involved in failed
missions or who voted for a failing team.
Actions:
-
Votes for or against proposed teams based on trust levels of the
players involved.
-
Proposes teams prioritizing players with the highest trust levels.
-
If the agent is a spy, it betrays only when the required number of
betrayals matches to ensure the mission’s failure.
Performance
The HMM agent outperforms reference agents across tournaments,
achieving the highest win-rate across 10 tournaments.