Exposing a far too frequent and erroneous geospatial algorithm
My Initial Presentation
I presented what I coined the Legend of The Buffer last GIS Day at my alma mater, Sherbrooke University. You can find my presentation on YouTube. This article strips away the algorithmic complexities and goes straight to my main ideas.
The main takeaway:
Just don’t use a buffer. The most efficient way to make an algorithm inefficient is to use the wrong data structure. A buffer is almost never the right tool for the job.
Let’s start unraveling the consequences of some common misconceptions about using a buffer.
What is a buffer?
A buffer is most likely the first operation you were introduced to if you followed any GIS course. A buffer takes an input geometry and a distance and outputs a new geometry. The input geometry can be of any type, whether it be a point, line, or polygon.
Buffer(Point, Distance) → Polygon
A common use of a buffer is finding geometries near our input geometry. We apply an intersection using the generated buffer. Geometries selected by the intersection should fall within our buffer.
The buffer-intersect algorithm operates on the assumption that our buffer is a perfect circle. That assumption is a mistake.
Mistake no. 1: A buffer is not a circle
If a buffer were a circle, we could use the radius to find all geometries within it.
However, a buffer is not a circle; it is a polygon. A polygon is composed of vertices, the points that define its border. For example, take this Well-Known Text definition of a polygon:
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
The vertices are the coordinates delimited by commas. For a polygon to be a circle, we would need an infinite number of vertices, and an infinite number of vertices would require an infinite amount of memory. Zoom in on the output of a buffer and you will see its straight edges. So a buffer is not a perfect circle, but what does that entail?
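To make the point concrete, here is a minimal sketch in plain Python, with no GIS library involved. The helper `circle_buffer` is hypothetical: it assumes a point buffer is built as a regular polygon whose vertices sit on the circle, which is a simplification of what real buffer implementations do. It counts the vertices and compares the polygon's area with the area of the true circle.

```python
import math

def circle_buffer(cx, cy, radius, segments=32):
    """Approximate a circular buffer with a regular polygon (hypothetical helper)."""
    ring = [
        (cx + radius * math.cos(2 * math.pi * i / segments),
         cy + radius * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]
    ring.append(ring[0])  # close the ring, as a WKT polygon does
    return ring

def ring_area(ring):
    """Shoelace formula for the area of a closed ring."""
    return abs(sum(x1 * y2 - x2 * y1
                   for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2

ring = circle_buffer(0.0, 0.0, 100.0, segments=32)
polygon_area = ring_area(ring)
circle_area = math.pi * 100.0 ** 2

print(len(ring) - 1, "distinct vertices")
print(f"the polygon covers {polygon_area / circle_area:.2%} of the true circle")
```

Whatever value is chosen for `segments`, the vertex count stays finite and the polygon's area stays below that of the circle.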
Mistake no. 2: Excluding some geometries within the buffer distance
Our buffer-intersect algorithm will exclude some geometries. Only the vertices of the buffer lie at the full buffer distance from the input geometry; every point along the edges between them is closer. Geometries that fall between those edges and the border of our imagined circle are within the desired distance, yet the intersection excludes them.
Some buffer algorithms try to approximate a circle more precisely than others. For example, the GDAL algorithm in QGIS will generate many more vertices than the default algorithm used by QGIS. Does having more vertices solve the issue with excluded geometries? No. The buffer is still only an approximation of a circle.
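The exclusion can be reproduced with a small standard-library Python sketch. It assumes, as a simplification, that the buffer is a regular polygon with its vertices on the circle; `circle_buffer`, `contains`, and `point_between_vertices` are hypothetical helpers written for this illustration, not functions from any GIS package. For each vertex count, it places a point strictly within the search distance but just beyond the middle of an edge, then checks whether the polygon contains it.

```python
import math

def circle_buffer(cx, cy, radius, segments):
    """Regular polygon with its vertices on the circle (hypothetical helper)."""
    return [
        (cx + radius * math.cos(2 * math.pi * i / segments),
         cy + radius * math.sin(2 * math.pi * i / segments))
        for i in range(segments)
    ]

def contains(ring, x, y):
    """Ray-casting point-in-polygon test on an unclosed ring."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def point_between_vertices(segments, distance):
    """Point at `distance` from the origin, halfway between vertex 0 and vertex 1."""
    angle = math.pi / segments
    return distance * math.cos(angle), distance * math.sin(angle)

radius = 1000.0  # search distance, in meters
results = {}
for segments in (32, 360):
    ring = circle_buffer(0.0, 0.0, radius, segments)
    # Apothem: distance from the center to the middle of an edge.
    apothem = radius * math.cos(math.pi / segments)
    # Strictly within the search distance, but beyond the polygon's edge.
    distance = (apothem + radius) / 2
    x, y = point_between_vertices(segments, distance)
    results[segments] = (distance < radius, contains(ring, x, y))
    print(f"{segments} vertices: within distance={distance < radius}, "
          f"inside buffer={contains(ring, x, y)}")
```

With more vertices, the gap between the edges and the circle shrinks, but a point can always be placed inside it.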
Mistake no. 3: Travel cost is (almost) never equal to a bird-flight distance
My friends came to visit our most popular climbing crag near Quebec City: Kamouraska. Kamouraska is on the south shore of the St-Lawrence River and gets completely swamped during weekends and holidays. The camping spots had run out, so my friends looked for cheap accommodation on the web. They found an interesting deal and booked it on the spot. After arriving, they realized their great deal was not so convenient. The motel, even though it was within 30 kilometers of Kamouraska, was a 3-hour drive away. The website filtered its search results by bird-flight distance, so it included results on the opposite side of the river.
My friends booking a motel is a single case gone wrong. Thousands of users could be going through the same unfortunate experience. I have seen too many production websites making this same mistake. We should do better. We have the responsibility as developers to deliver great experiences to our customers.
Better alternatives to a buffer exist
Sure, the use of a buffer will probably give you a result resembling the one you expected… in most cases. It is only a matter of time before you run into an edge case where the consequences of the mistake are important. Here are some common examples to illustrate some alternatives to buffers.
Solution for finding geometries within a distance
To find the geometries within a given distance, throw your buffer-intersection into the trash. There is a simple and correct alternative:
- Calculate the distance between your input geometry and the other geometries
- Select only those with a distance less than your desired distance
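The two steps above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the geometries are points in a projected coordinate system with units in meters, and the function name `within_distance` and the sample places are made up for the example. Lines and polygons would need a proper geometry-to-geometry distance function from a GIS library.

```python
import math

def within_distance(origin, candidates, max_distance):
    """Return (name, distance) pairs closer to origin than max_distance, nearest first."""
    selected = []
    for name, point in candidates.items():
        # Step 1: distance between the input geometry and the candidate.
        distance = math.dist(origin, point)
        # Step 2: keep only the candidates closer than the desired distance.
        if distance < max_distance:
            selected.append((name, distance))
    return sorted(selected, key=lambda item: item[1])

# Hypothetical sample data, planar coordinates in meters.
origin = (0.0, 0.0)
candidates = {
    "campground": (300.0, 400.0),
    "motel": (999.0 * math.cos(0.1), 999.0 * math.sin(0.1)),
    "hotel": (800.0, 700.0),
}

nearby = within_distance(origin, candidates, max_distance=1000.0)
for name, distance in nearby:
    print(f"{name}: {distance:.1f} m")
```

No polygon is generated at any point, so there is no approximation of a circle to get wrong: the comparison is made directly on the measured distance.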
Solution for a better distance representation
Bird-flight distance rarely represents the actual cost of our use cases. You will (almost) never travel in a straight line to get from point A to point B, even if you are flying. You are most likely interested in the cost of traveling from A to B. Travel times or some other cost would be better for most use cases.
Graphs are a classic data structure where different nodes are connected by edges (also called arcs). When these arcs have costs, called weights, the graph is a weighted graph. We can use weighted graphs to calculate travel times. In fact, a graph can represent a street network. The weights of the street network could be the distance or, better, the time necessary to reach the next node. We can then compute shortest paths to find the cost of getting from a single origin to one or even multiple destinations.
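As an illustration, here is a minimal sketch of Dijkstra's algorithm, one classic shortest-path algorithm for graphs with non-negative weights, using only Python's standard library. The street network is a made-up toy example with travel times in minutes; the node names and weights are assumptions for this sketch, not real data.

```python
import heapq

def travel_costs(graph, source):
    """Dijkstra's algorithm: cheapest cost from source to every reachable node."""
    costs = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return costs

# Hypothetical street network; weights are travel times in minutes.
graph = {
    "crag": {"village": 10, "ferry": 25},
    "village": {"crag": 10, "campground": 5, "bridge": 90},
    "ferry": {"crag": 25},
    "campground": {"village": 5},
    "bridge": {"village": 90, "motel": 80},
    "motel": {"bridge": 80},
}

costs = travel_costs(graph, "crag")
max_minutes = 30
reachable = sorted(
    node for node, cost in costs.items()
    if cost <= max_minutes and node != "crag"
)
print(reachable)
print(f"motel: {costs['motel']:.0f} minutes")
```

A single run from the origin yields the travel cost to every destination, so filtering search results by travel time becomes a simple comparison on `costs`.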
When should we use a buffer?
Eliminating incorrect usages, I have found only two justified applications of a buffer:
- You want to display the region around a given geometry
- You want to perform some spatial operation that absolutely needs a buffer
Even then, display remains the safest use: using a buffer for algorithmic purposes will most likely return incorrect results.
When should you not use a buffer?
If it was not already clear, you should almost always find an alternative to using a buffer. Get out there and fix our broken geometry search implementations.
I challenge you to name another legitimate use of a buffer. I am still struggling to understand how it became so prominent. Please reach out if you have some historical background on how the buffer was introduced and popularized.