Every day when we leave our house we can observe one grave issue: roads are often jammed, traffic is slow-moving, and the world around us steams and sometimes honks impatiently. Traffic affects us all, whether through its contribution to worsening air quality or through the stress it causes everyone involved.
Today's implementations of adaptive traffic signal control often don't take into account the actual traffic situation on the streets. Current approaches rely on historical data and statistics derived from it by local authorities, supplemented by loop detectors or other sensors at the intersection. This data feeds genetic algorithms that optimize for a modeled environment, and the resulting models are then used to phase the signals at the intersection. But traffic is such a dynamic system that these approaches can't capture the reality on the street, which in turn leads to the issues mentioned above.
trive.traffic: a reinforcement learning approach to optimizing traffic flow through cities
Here at trive.me we always look for innovative approaches to mitigate problems in mobility.
So far, intelligent transportation systems (ITS) make very little use of the flood of data available nowadays. Big Data has not reached the ITS world (yet). Ample research has shown that reinforcement learning algorithms can provide an alternative to genetic algorithms for optimizing traffic flow at intersections.
Reinforcement learning algorithms are a separate "crew" in the machine learning world. They learn by experiencing their environment and the changes their actions cause in it, much like we learn from our own experience. This means we don't need labels, which is great, as labels are often hard to come by in real-world settings. Typical city traffic lends itself to such a reinforcement learning approach: our intersection is governed by an agent (the traffic signal), and we need to find a policy mapping, i.e. appropriate actions for the states of our environment.
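In code, this agent–environment setup is just a loop: observe a state, look the action up in the policy, apply it, collect the reward. Here is a toy sketch of that idea; the state names, phase actions, and the tiny deterministic environment are purely illustrative, not our actual system:

```python
# Toy intersection: states are coarse queue lengths, actions are signal phases.
STATES = ["low_queue", "medium_queue", "high_queue"]
ACTIONS = ["extend_green_ns", "extend_green_ew"]

def toy_environment(state, action):
    """Hypothetical dynamics: extending green for the busy direction
    shrinks the queue; the reward is a (negative) waiting penalty."""
    if action == "extend_green_ns" and state != "low_queue":
        next_state = STATES[STATES.index(state) - 1]  # queue shrinks one level
        reward = -1.0
    else:
        next_state = state
        reward = -2.0  # cars kept waiting
    return next_state, reward

# A policy is simply a mapping from states to actions.
policy = {s: "extend_green_ns" for s in STATES}

state = "high_queue"
total_reward = 0.0
for _ in range(3):  # run a short episode
    action = policy[state]
    state, reward = toy_environment(state, action)
    total_reward += reward
print(state, total_reward)
```

The point is the shape of the interaction, not the environment itself: in a real setting the `toy_environment` function is replaced by the actual intersection (or a traffic simulation).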
Each move of our traffic signal is rewarded or punished, and since our agent is greedy, it wants to learn the actions that lead to the overall maximum reward, for example a cumulative reduction in wait times for traffic participants. Combining this idea with Big Data to augment the data currently available from cities lets us represent the current traffic situation much more accurately. Using reinforcement learning, we aim to optimize the streets for both the traffic participants and the environment.
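The "overall maximum reward" our greedy agent chases is usually formalized as the discounted return: future rewards are summed, but each step further into the future is weighted down by a factor gamma. A minimal sketch, with made-up per-step rewards standing in for negative wait times:

```python
# Discounted return: G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
# gamma < 1 makes the agent prefer rewards that arrive sooner.

def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for r in reversed(rewards):  # work backwards: G_t = r_t + gamma * G_{t+1}
        g = r + gamma * g
    return g

# Hypothetical per-step rewards, e.g. negative total wait times per signal cycle:
print(discounted_return([-3.0, -2.0, -1.0], gamma=0.9))
```

Maximizing this quantity is what "cumulative reduction in wait times" means in practice: a policy that clears queues early scores better than one that defers the pain.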
The nitty-gritty details
In the simplest case we can generate a lookup table by discretizing our state space using methods such as vector quantization, and find the associated rewards. However, traffic states are not trivial to model: the problem is stochastic, because applying the same action to the exact same environment is usually not guaranteed to yield the same reward. That's what probabilities are for, and a Markov decision process can model exactly these transition probabilities, i.e. with a certain probability we will choose this action for a certain traffic state, leading us to the next, hopefully better, traffic state. By visiting each state often enough we can estimate the transitions that maximize our cumulative reward, and with time we converge on a good policy mapping. The below function definition is what we approximate using Q-learning.
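To make the lookup-table idea concrete, here is a tabular Q-learning sketch over such a discretized state space. Everything in it is illustrative: the state and action names, the hand-written stochastic environment, and the learning parameters are assumptions for the sake of the example, not our production setup:

```python
import random

random.seed(42)

# Discretized traffic states (e.g. obtained via vector quantization) and phases.
STATES = ["low_queue", "medium_queue", "high_queue"]
ACTIONS = ["extend_green_ns", "extend_green_ew"]

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

# Q-table: estimated discounted return for each (state, action) pair.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def env_step(state, action):
    """Hypothetical stochastic environment: the same action on the same
    state does not always yield the same next state or reward."""
    if action == "extend_green_ns" and random.random() < 0.8:
        next_state = STATES[max(0, STATES.index(state) - 1)]  # queue shrinks
        return next_state, -1.0
    return "high_queue", -3.0  # queues pile up again

for episode in range(500):
    state = random.choice(STATES)
    for _ in range(20):
        # epsilon-greedy: mostly exploit the table, occasionally explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward = env_step(state, action)
        # Q-learning update: move Q(s,a) towards r + gamma * max_a' Q(s',a')
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# The greedy policy read off the table should favor the busier direction.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(policy)
```

Note that the agent never sees the transition probabilities inside `env_step`; it only experiences their consequences, which is exactly why Q-learning fits a stochastic setting like traffic.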