The Problem

Every day when we leave our homes, we can observe one grave issue. Roads are often jammed, traffic moves slowly, and the world around us steams and sometimes honks impatiently. Traffic affects us all, whether through its contribution to declining air quality or through the stress it causes everyone involved.

Today, implementations of adaptive traffic signal control often don’t take into account the actual traffic situation on the streets. Current approaches rely on historical data provided by local authorities and statistics derived from it, as well as on loop detectors or other detectors at the intersection. This data feeds genetic algorithms that optimize for the modeled environment, and the resulting models are then used to phase the signals at the intersection. Traffic, however, is a highly dynamic system, and these approaches can’t fully capture the reality on the street, which in turn leads to the issues described above.

trive.traffic: a reinforcement learning approach to optimizing traffic flow through cities

Here at we always look for innovative approaches to mitigate problems in mobility.
So far, intelligent transportation systems (ITS) make very little use of the flood of data available today. Big Data has not reached the ITS world (yet). A substantial body of research has shown that reinforcement learning algorithms can provide an alternative to genetic algorithms for optimizing traffic flow at intersections.

Reinforcement learning algorithms are a separate “crew” in the machine learning world. They learn by interacting with their environment and observing how it changes in response to their actions, much like we learn from experience. This means we don’t need labeled data, which is great, since labels are often hard to come by in real-world settings. Urban traffic lends itself well to such a reinforcement learning approach. Our intersection is governed by an agent (the traffic signal), and we need to find a policy mapping (i.e., a mapping from states of our environment to appropriate actions).
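The agent-environment framing can be sketched as a simple loop: observe the state, pick an action, receive a reward and the next state. This is a toy illustration under stated assumptions, not a real traffic simulator: `ToyIntersection`, its arrival and departure numbers, and the reward (negative number of queued vehicles) are all hypothetical. The hand-written "serve the longer queue" rule stands in for a policy mapping; reinforcement learning would learn this mapping instead of having it written by hand.

```python
import random
from dataclasses import dataclass, field

# Hypothetical phases: 0 = north-south green, 1 = east-west green.
PHASES = (0, 1)


@dataclass
class ToyIntersection:
    """A toy stochastic intersection; the state is the queue length per approach."""
    max_departures: int = 3  # assumed: vehicles released per step on green
    max_arrivals: int = 2    # assumed: up to this many new vehicles per approach per step
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    queues: list = field(default_factory=lambda: [0, 0])

    def reset(self) -> tuple:
        self.queues = [0, 0]
        return tuple(self.queues)

    def step(self, phase: int) -> tuple:
        # Vehicles on the approach with green leave the intersection.
        self.queues[phase] = max(0, self.queues[phase] - self.max_departures)
        # Random arrivals make the environment stochastic.
        for i in range(len(self.queues)):
            self.queues[i] += self.rng.randint(0, self.max_arrivals)
        # Reward: fewer queued (waiting) vehicles is better.
        reward = -sum(self.queues)
        return tuple(self.queues), reward


def run_episode(env: ToyIntersection, policy, steps: int = 20) -> float:
    """Agent-environment loop: observe the state, act, collect the reward."""
    state = env.reset()
    total = 0.0
    for _ in range(steps):
        action = policy(state)
        state, reward = env.step(action)
        total += reward
    return total


def longest_queue(state: tuple) -> int:
    """Hand-written baseline policy: give green to the longer queue."""
    return 0 if state[0] >= state[1] else 1


if __name__ == "__main__":
    print(run_episode(ToyIntersection(), longest_queue))
```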

Each action of our traffic signal is rewarded or punished. Our agent is very greedy: he wants to learn the actions that lead to the maximum overall reward, for example a cumulative reduction in waiting times for traffic participants. Combining this idea with Big Data to augment the data currently available from cities lets us represent the current traffic situation much more accurately. Using reinforcement learning, we aim to optimize the streets for both traffic participants and the environment.
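To make "cumulative reduction in waiting times" concrete, here is a minimal sketch assuming the reward at each step is the drop in total waiting time since the previous step, and that future rewards are discounted by a factor `gamma` (an assumption; `gamma = 1` gives the plain sum). The waiting times below are made-up numbers.

```python
def step_reward(prev_wait: float, new_wait: float) -> float:
    """Reward = reduction in total waiting time (positive if waiting dropped)."""
    return prev_wait - new_wait


def discounted_return(rewards, gamma: float = 0.9) -> float:
    """Cumulative reward G = r_0 + gamma * r_1 + gamma**2 * r_2 + ..."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical total waiting times (seconds) observed after consecutive signal actions.
waits = [120.0, 100.0, 90.0, 95.0]
rewards = [step_reward(a, b) for a, b in zip(waits, waits[1:])]  # [20.0, 10.0, -5.0]
print(discounted_return(rewards, gamma=0.9))
```

With `gamma = 1` the per-step rewards telescope to the overall drop in waiting time (here 120 - 95 = 25 seconds); a `gamma` below 1 makes the agent value improvements that arrive sooner more highly.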

The nitty-gritty details

In the simplest case, we can build a lookup table by discretizing our state space, using methods such as vector quantization, and recording the associated rewards. However, traffic states are not trivial to model. The problem is stochastic: applying the same action in the exact same environment is not guaranteed to yield the same reward. That’s what probabilities are for. A Markov decision process (MDP) models exactly these transition probabilities, i.e., the probability that taking a given action in a given traffic state leads to a particular next (hopefully better) traffic state. By visiting states often enough, we gather enough experience about these transitions and their rewards to learn which actions maximize our cumulative reward, and over time we converge on a good policy mapping. The function we approximate using Q-learning is the action-value function Q(s, a).
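The lookup-table idea and the stochastic transitions can be sketched together. This is a minimal sketch, assuming a hypothetical four-entry codebook of representative states `(queue_ns, queue_ew)`; in practice the centroids could be learned from data (for example with k-means). `quantize` performs vector quantization by nearest-centroid lookup, and the hypothetical `TabularModel` keeps per-(state, action) tables of mean reward and empirical transition probabilities P(s' | s, a) estimated by counting.

```python
import math
from collections import Counter, defaultdict

# Hypothetical codebook of representative traffic states (queue_ns, queue_ew).
# In practice the centroids could be learned from data, e.g. with k-means.
CODEBOOK = [(0.0, 0.0), (5.0, 1.0), (1.0, 5.0), (8.0, 8.0)]


def quantize(state):
    """Vector quantization: map a continuous state to its nearest codebook index."""
    return min(range(len(CODEBOOK)), key=lambda i: math.dist(state, CODEBOOK[i]))


class TabularModel:
    """Lookup tables over discrete states: mean reward and empirical P(s' | s, a)."""

    def __init__(self):
        self.visits = defaultdict(int)           # (s, a) -> visit count
        self.reward_sum = defaultdict(float)     # (s, a) -> summed reward
        self.next_counts = defaultdict(Counter)  # (s, a) -> Counter of next states

    def record(self, s, a, reward, s_next):
        self.visits[(s, a)] += 1
        self.reward_sum[(s, a)] += reward
        self.next_counts[(s, a)][s_next] += 1

    def mean_reward(self, s, a):
        n = self.visits.get((s, a), 0)
        return self.reward_sum[(s, a)] / n if n else 0.0

    def transition_prob(self, s, a, s_next):
        n = self.visits.get((s, a), 0)
        return self.next_counts[(s, a)][s_next] / n if n else 0.0


if __name__ == "__main__":
    s = quantize((5.2, 0.8))  # nearest centroid is (5.0, 1.0), i.e. index 1
    model = TabularModel()
    # Same state, same action (phase 0), yet different rewards and next states.
    for reward, s_next in [(-4.0, 0), (-6.0, 0), (-5.0, 0), (-9.0, 1)]:
        model.record(s, 0, reward, s_next)
    print(model.transition_prob(s, 0, 0))  # 0.75
    print(model.mean_reward(s, 0))         # -6.0
```

Q-learning itself is model-free and never needs these probabilities explicitly; counting them here only makes the MDP view tangible.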

Q(s, a) estimates the value of a state-action pair: the expected cumulative reward of taking action a in state s and acting well thereafter. Its parameters (weights) can be learned by a neural network, our reinforcement learning agent. Our state is a vector of features, and our actions are the traffic light phases. The agent gathers information on how these state-action pairs affect future rewards, states, and subsequent actions, and uses it to update the Q-function. It first explores the space of possible actions and, as training progresses, increasingly chooses actions that led to better outcomes, given an appropriate parameterization of this exploration (for example, an exploration rate that decays over time). With sufficient experience, we stop exploring and converge on weights that let us exploit the shape of the Q-function. This means we have an idea of which actions are optimal for states similar to those visited before.
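Since the exact function definition is not spelled out here, the following assumes one standard formulation: a Q-function that is linear in the weights, with one weight vector per phase over state features φ(s), updated with the Q-learning (semi-gradient) rule:

$$Q(s, a; \mathbf{w}) = \mathbf{w}_a^\top \boldsymbol{\phi}(s)$$
$$\mathbf{w}_a \leftarrow \mathbf{w}_a + \alpha \big( r + \gamma \max_{a'} Q(s', a'; \mathbf{w}) - Q(s, a; \mathbf{w}) \big)\, \boldsymbol{\phi}(s)$$

The sketch below implements this with epsilon-greedy exploration whose rate decays over training. A linear model is a simpler stand-in for the neural network mentioned above; the features, toy dynamics (`toy_step`), and all hyperparameters (`alpha`, `gamma`, the epsilon schedule) are hypothetical choices for illustration.

```python
import random

N_ACTIONS = 2  # hypothetical phases: 0 = north-south green, 1 = east-west green


def features(state):
    """Hypothetical feature vector phi(s): bias term plus scaled queue lengths."""
    q_ns, q_ew = state
    return [1.0, q_ns / 10.0, q_ew / 10.0]


class LinearQAgent:
    """Linear Q-learning: Q(s, a; w) = w[a] . phi(s), one weight vector per action."""

    def __init__(self, n_features, alpha=0.05, gamma=0.9,
                 eps_start=1.0, eps_end=0.05, eps_decay=0.995, seed=0):
        self.w = [[0.0] * n_features for _ in range(N_ACTIONS)]
        self.alpha, self.gamma = alpha, gamma
        self.eps, self.eps_end, self.eps_decay = eps_start, eps_end, eps_decay
        self.rng = random.Random(seed)

    def q(self, phi, a):
        return sum(wi * xi for wi, xi in zip(self.w[a], phi))

    def act(self, phi):
        # Epsilon-greedy: explore with probability eps, otherwise exploit Q.
        if self.rng.random() < self.eps:
            return self.rng.randrange(N_ACTIONS)
        return max(range(N_ACTIONS), key=lambda a: self.q(phi, a))

    def update(self, phi, a, reward, phi_next):
        # Q-learning target: bootstrap from the best action in the next state.
        best_next = max(self.q(phi_next, b) for b in range(N_ACTIONS))
        td_error = reward + self.gamma * best_next - self.q(phi, a)
        # Semi-gradient step: the gradient of w[a] . phi w.r.t. w[a] is phi.
        self.w[a] = [wi + self.alpha * td_error * xi for wi, xi in zip(self.w[a], phi)]
        # Decay exploration so the agent gradually shifts to exploitation.
        self.eps = max(self.eps_end, self.eps * self.eps_decay)
        return td_error


def toy_step(state, phase, rng, cap=20):
    """Hypothetical toy dynamics: green approach discharges 3 cars, random arrivals."""
    queues = list(state)
    queues[phase] = max(0, queues[phase] - 3)
    queues = [min(cap, x + rng.randint(0, 2)) for x in queues]
    return tuple(queues), -float(sum(queues))  # reward: fewer queued cars is better


if __name__ == "__main__":
    env_rng = random.Random(1)
    agent = LinearQAgent(n_features=3)
    state = (0, 0)
    for _ in range(2000):
        phi = features(state)
        action = agent.act(phi)
        state, reward = toy_step(state, action, env_rng)
        agent.update(phi, action, reward, features(state))
    print(f"final epsilon: {agent.eps:.3f}")
    print("weights per phase:", agent.w)
```

Keeping one weight vector per phase is equivalent to a single feature vector φ(s, a) with a separate block per action; swapping the linear model for a neural network changes only how Q is computed and how its gradient is taken.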

Let’s harness the power of AI to solve real-world problems.

Reinforcement learning has been applied successfully to other problems, such as playing Go, where it surpassed human-level performance by defeating professional players; deep Q-learning in particular has reached human-level performance on many Atari video games. This doesn’t mean it’s simple, and it often requires considerable compute power, but it shows that we can harness the power of AI for new tasks and provide solutions to real-world problems. Ultimately, it’s humans who decide how we harness the potential of this technology.

Research has already shown that these techniques work well for learning traffic light policies and optimizing traffic flow, which can in turn reduce CO2 emissions. These are problems that concern all of us, and cities and companies interested in this solution are welcome to contact us for further details. Naturally, using Big Data sources comes with its own set of complexities; we will give insight into this in a separate blog post.