This project applies multiagent reinforcement learning to traffic signal control with the goal of reducing travel time. We model the road network as a collection of agents, one for each signalized junction. The agents learn to set phases that jointly maximize a reward function that favors short vehicle queuing delays and short queues at all junctions. The first approach we tested exploits the fact that the reward function can be split into per-agent contributions: junctions are modeled as vertices of a coordination graph, and the joint action is found with the variable elimination algorithm. The second method exploits the principle of locality, computing the best action for an agent as its best response in a two-player game with each member of its neighborhood.
We apply the learning methods to a simulated network of six intersections, using data from the Transit Department of Bogotá, Colombia. These methods obtained significant reductions in queuing delay with respect to fixed-time control and, in general, achieved shorter travel times across the network than some other reinforcement-learning-based methods found in the literature.
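To illustrate the first approach, the following is a minimal sketch of variable elimination on a coordination graph, not the implementation used in this work. It assumes a hypothetical chain of three junctions (`J1`, `J2`, `J3`), two phases per junction (`NS`, `EW`), and made-up pairwise payoff tables standing in for the learned per-edge value functions. The function name `variable_elimination` and its arguments are likewise illustrative.

```python
from itertools import product


def variable_elimination(actions, payoffs, order):
    """Maximize a sum of local payoff tables over the joint action.

    actions: dict mapping each agent to its list of available actions.
    payoffs: list of (scope, table) pairs; scope is a tuple of agents and
             table maps a tuple of their actions to a numeric payoff.
    order:   elimination order covering every agent.
    Returns (joint_action, value).
    """
    factors = [(tuple(scope), dict(table)) for scope, table in payoffs]
    responses = []  # conditional best actions, in elimination order

    for agent in order:
        # Collect the factors that mention this agent and drop them from the pool.
        involved = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        # Remaining (not yet eliminated) agents that share a factor with it.
        scope = tuple(sorted({a for s, _ in involved for a in s if a != agent}))

        new_table, argmax_table = {}, {}
        for context in product(*(actions[a] for a in scope)):
            assignment = dict(zip(scope, context))
            best_value, best_action = None, None
            for action in actions[agent]:
                assignment[agent] = action
                value = sum(t[tuple(assignment[a] for a in s)] for s, t in involved)
                if best_value is None or value > best_value:
                    best_value, best_action = value, action
            new_table[context] = best_value
            argmax_table[context] = best_action

        # The maximized-out agent is replaced by a factor over its neighbors.
        factors.append((scope, new_table))
        responses.append((agent, scope, argmax_table))

    # Every remaining factor now has an empty scope; their sum is the optimum.
    total = sum(table[()] for _, table in factors)

    # Backward pass: recover each agent's action given later-eliminated agents.
    joint = {}
    for agent, scope, argmax_table in reversed(responses):
        joint[agent] = argmax_table[tuple(joint[a] for a in scope)]
    return joint, total


# Hypothetical example: three junctions in a chain, two phases each.
phases = ["NS", "EW"]
actions = {"J1": phases, "J2": phases, "J3": phases}
payoffs = [
    (("J1", "J2"), {("NS", "NS"): 3.0, ("NS", "EW"): 0.0,
                    ("EW", "NS"): 1.0, ("EW", "EW"): 2.0}),
    (("J2", "J3"), {("NS", "NS"): 0.0, ("NS", "EW"): 1.0,
                    ("EW", "NS"): 4.0, ("EW", "EW"): 0.5}),
]
joint, value = variable_elimination(actions, payoffs, ["J1", "J2", "J3"])
print(joint, value)
```

Each elimination step only enumerates the actions of one junction and its remaining neighbors, so on sparse graphs the cost depends on the neighborhood size rather than on the full joint action space.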