that's me


my name is Jacobo. I design and program web applications.

Dynamic Programming with Python

May 07, 2014

Dynamic Programming is an algorithmic strategy that can help solving problems which define a state space. These problems present a new subproblem with each configuration of their parameters, and in order to solve them we must explore this space and evaluate the different decisions we may take at every point. This technique takes advantage of the fact that many of these states may actually be the same, so no time is wasted in solving overlapping subproblems.

As we will see it also makes a heavy use of recursion, which is always fun.

To illustrate the strategy I’ll walk you through a cool problem, which was challenge 14 in the Tuenti Challenge #4, a programming contest I recently took part on [1].

Train Empire

We are presented with a board game called Train Empire, in which you must plan the most efficient routes for your trains in order to deliver a series of wagons that rest in each station. The rules are simple:

  • each station has a wagon waiting, which must be delivered to some other station.
  • each wagon awards the player some points when delivered, but any station may hold any wagon.
  • a train works a single route, can carry one wagon at a time, and can cover a limited number of kilometers.

We can pretty much picture the problem already. We will need to know which wagons to take and where to leave them, in order to win the maximum score with our limited fuel.

Train Empire example problem

As we can see in the picture, we have two trains working routes red and blue. The stations are located at certain coordinates so it’s easy to figure out the distance between them. Each station has a wagon called by the name of its destination, along with the score that we will be granted in case we deliver it.

Now, say our trains can cover 3 kilometers in distance. Train in route red can deliver wagon from A to its destination, E (5 pts); train in route blue can deliver wagon C (10 pts), and then pick up wagon B to deliver it as well (5 pts). Maximum score of 20 points.

Representing the states

The position of the trains, the distance covered and the wagons in each station form what we call a problem state. Change any of these values, and we’ll still have the same problem, but with different parameters. We see that each time we advance a train, our problem evolves into a different subproblem. To figure out what are the best movements, we must explore these states and take decisions based on what we learn in each of them. Let’s get to it.

We’ll start by defining a way to represent the train routes. As these are not lineal, a graph is a great way to represent them.

The TrainRoute class implements a very basic digraph which stores the stations as vertices on a set, and the connections between them in a dictionary. Note how we add both the edge (u, v) and (v, u), since train tracks can be travelled both ways.

An interesting note on the next_stations method. Here I made use of the cool Python 3 feature yield from. This allows a generator to delegate into another generator or iterator. Since each station maps to a set of stations, we’ll just be iterating over it.

Let’s have a look at the main class:

I’ve taken out some code, but we can see some interesting things already. Two named tuples will help us keep our data tidy and simple. The main class takes the maximum distance that our trains can travel, fuel, and the definition of the routes and stations. The method maximum_score, which just sums up the scores for each route, will be the interface to the solver. So we have:

  • a main class that holds our routes and its connections
  • a station tuple, with its name, position and a list of wagons that currently holds
  • a train wagon, with a value and a destination station

Dynamic programming

I’ve tried to explain how the key to dynamic programming is exploring the state space efficiently and take informed decisions based on what we find. We have a space definition —the position of the train, the fuel that remains available, and the position of each wagon to be delivered— so we can already modelate our initial state.

We must now think about the multiple choices that we face in each station. Should we pick up an available wagon and deliver it? What if we find a more valuable wagon in the next station? Should we take it back or move forward? Or maybe move without carrying a wagon?

Obviously, the answer to these questions is whatever makes us score more points. To reach the solution, we must evaluate all possible scenarios that derive from each situation or state that we get into. Of course, the way to evaluate how good each state is, is using the scoring function score.

All together:

From each state there are a couple of moves we could make. Either move to a connecting station with a wagon, or move without one. Staying in place does not yield a new state, since everything stays the same. If the current station holds several wagons, moving each one of them will represent a different state.

next_states is a generator that takes a state tuple and yields all the possible options that derive from it. Note how it stops when all wagons have been delivered, or how it only yields states within fuel reach. The wagon_choices function may seem complex, but it is merely producing all the combinations that would result from moving each wagon from the current station to the next one.

With this we have all the ingredients to implement our dynamic programming algorithm. From our initial state we will explore our options, and then choose the one that produces a higher score. Behold! The initial state will evolve in a different state, which in turn will evolve in a different state too! We are designing a recursive algorithm:

  • Take our state
  • Evaluate our choices
  • Decide wisely to win as many points as possible

It’s clear that each evaluation of a next state will happen to go through this list. Our recursive function will stop when a state is terminal. That is, when we run out of fuel or all wagons have been delivered.

One last trick completes the strategy of dynamic programming. In the code you can see I used a max_score dictionary that acts as a cache for each state the algorithm goes through. This is done so we don’t go through all our options over and over again each time we step on a situation we have seen already.

While exploring our state space, a station may be reached many times, and in some of them, it will be reached with the same fuel and wagon configuration. It doesn’t matter how the train got there, only what its options are at that time. If we evaluate that state once and save its outcome, we don’t need to go over the process of exploring its subspace again.

If we hadn’t made use of this memoization, we would have performed a huge amount of identical explorations. This usually makes the problem intractable by our algorithm.


These trains and wagons provided a perfect example of how dynamic programming can help in the resolution of state space problems with subproblem overlapping. Python expressivity once again allows us to translate our thinking easily, and write an algorithm that is not only clear but also efficient.

Grab the complete code at the contest repository.