Not the Shortest Path: Convert GPX Files for 3D Animation
Not the Shortest Path — Ep. 1 🌎 📡 💾
What is ‘Not the Shortest Path’?
Welcome to the 2nd episode of Not the Shortest Path where we explore the trials and tribulations behind our geospatial applications.
We developers consistently simplify the stories of how we got to an expected result. There is a ton of learning that happens off the beaten trail from point A to B. Let’s shine a light on the meandering path that occurs when building working software.
In the past year, I was deeply inspired by the work of Craig Taylor. Craig works in 3D animation (4D) with anything from Tour de France data to urban mobility. Coming from a world where we overemphasize the value of our analytical models and underemphasize the emotional impact of our applications, I was burning to create my own visualizations.
The tutorials by Craig on Mapzilla were a great starting point to understand the structure of his workflows. I wanted to follow the tutorials more directly, but I simply was unable to get up and running with Houdini, the software Craig uses more extensively. Fortunately, the logic he uses was easily transferable to other software.
The technology stack I’ve been using is open-source, heavily customizable, and easy to scale up:
- GDAL/OGR: for initial data loading and manipulations
- PostGIS: for geospatial data manipulation
- DBT: for structuring and automating data pipelines
- Blender: for creating 3D animations
Being an avid outdoor enthusiast, I immediately thought of applying some of the same patterns Craig had used with his work on urban mobility to my GPX tracks. This episode of Not the Shortest Path covers how I tried, and eventually managed, to create a data pipeline to transform GPX files for 3D animation in Blender.
Even though some geospatial data importers do exist within Blender’s add-ons ecosystem, they usually fall short or simply can’t match all of our potential needs for animations. There are many creative ways of working with geospatial data once you consider the extra dimensions of elevation and, especially, time.
It might be hard to picture without any concrete reference to a Blender script. Most of the time, we want to define the procedural rules behind our animation while importing the mesh data. You could always select the objects you want to animate after importing, but there are limitations to this approach, and it's easier to define our rules programmatically at the outset.
Our requirement for importing the data is a CSV where each feature is a point along the track. The attributes for each point are:
- X coordinate
- Y coordinate
- point number (just in case)
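As a rough illustration of that input format, here is a minimal Python sketch of how an importer might read such a CSV before handing the points to Blender. The column names `x`, `y`, and `num_point` and the function name `read_track_points` are assumptions made for this example, not the names used by my add-on.

```python
import csv
import io


def read_track_points(csv_file):
    """Return a list of (num_point, x, y) tuples sorted by point number."""
    reader = csv.DictReader(csv_file)
    points = [
        (int(row["num_point"]), float(row["x"]), float(row["y"]))
        for row in reader
    ]
    # Sorting by point number keeps the track in travel order
    # even if the rows were shuffled by an earlier processing step.
    return sorted(points)


# Two made-up points, deliberately out of order.
sample = io.StringIO("x,y,num_point\n2.0,3.0,1\n0.5,1.5,0\n")
track = read_track_points(sample)
print(track)
```

Keeping the point number around is what makes the "just in case" attribute useful: it lets us restore the order of the track at any stage.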
Following Craig’s best practice, I prefer having my data in a metric coordinate system. Ideally, a local metric projection is used to limit the distortions. Given that some of my Strava activities are spread across different regions, I will use EPSG:3857 for the project.
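To make the trade-off concrete, the sketch below implements the standard spherical Web Mercator formulas behind EPSG:3857 in plain Python, along with the scale factor that quantifies the distortion at a given latitude. In the actual pipeline the reprojection is left to PostGIS; the function names here are made up for the illustration.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS84 longitude/latitude (degrees) to EPSG:3857 metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


def mercator_scale_factor(lat_deg):
    """Factor by which Web Mercator stretches distances at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


x_edge, _ = lonlat_to_web_mercator(180.0, 0.0)
print(round(x_edge, 2))
print(round(mercator_scale_factor(60.0), 3))
```

The scale factor grows with latitude, which is why a local projection would be preferable if all the activities were in one region.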
All of the data manipulations below are geared towards transforming the entire set of my Strava activities into CSVs that can be easily imported and animated in Blender.
I won’t cover the details of how I’ve been toying around with Blender. I would like to dedicate a separate post to Python scripting, add-ons, and other Blender features.
Retrieving Strava Data
There seem to be two common ways to retrieve data from Strava. You can either use their API or obtain a data dump of your profile. Going through the API is most likely the only option when building out full-scale applications.
The authentication flow of the API is not as straightforward as it could be. There are a ton of examples online if you are interested in the approach.
For simplicity, I decided to request a copy of all my Strava data. You can follow Strava’s official support documentation to do the same. There is a lot of different data available, but we are mainly concerned with the files under the activities directory.
Converting a Single GPX File
GPX to SHP
After some careful searching, the first command I used to convert an example GPX file to a Shapefile was the following:
ogr2ogr -f "ESRI SHAPEFILE" 1150471868 data/strava_dump/activities/1150471868.gpx
The command created a directory with multiple shapefiles within it:
Not all of the transformed layers are useful. The waypoints.shp files are all empty. Only the tracks.shp files contain any information.
Our import still seems to have a subtle problem. Many warnings were raised by the command, but the significant one concerns the DateTime field added to the points: the time field only contains the date of the activity and not the actual timestamp. This can be problematic for our animation down the road if we want to differentiate pace throughout the activity. If we simply want to visualize the activity at a constant speed, we could avoid resolving the issue.
The fix is relatively straightforward and covered in the official GDAL documentation:
ogr2ogr -f "ESRI SHAPEFILE" activity_1150471868 data/strava_dump/activities/1150471868.gpx -fieldTypeToString DateTime
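To show why the full timestamp matters, here is a small Python sketch that derives a speed for each segment of a track from consecutive points. It assumes the coordinates are already in a metric projection and that the time strings follow the ISO 8601 form used inside GPX files (for example `2017-08-26T14:00:02Z`); the exact format written by ogr2ogr may differ, so treat `TIME_FORMAT` as a placeholder. Distances measured in a projected system are also only approximate.

```python
import math
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format


def segment_speeds(points):
    """Return the speed in m/s for each pair of consecutive points.

    points is a list of (x, y, time_string) tuples in a metric projection.
    A segment with no elapsed time yields None instead of a speed.
    """
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        start = datetime.strptime(t0, TIME_FORMAT)
        end = datetime.strptime(t1, TIME_FORMAT)
        elapsed = (end - start).total_seconds()
        distance = math.hypot(x1 - x0, y1 - y0)
        speeds.append(distance / elapsed if elapsed > 0 else None)
    return speeds


# Made-up points: 5 m covered in 2 s, then a repeated timestamp.
demo = [
    (0.0, 0.0, "2017-08-26T14:00:00Z"),
    (3.0, 4.0, "2017-08-26T14:00:02Z"),
    (6.0, 8.0, "2017-08-26T14:00:02Z"),
]
print(segment_speeds(demo))
```

If every point only carried the date, every segment would have zero elapsed time and no pace could be computed at all.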
GPX to CSV
Shapefiles might be the reluctant de-facto standard, but that does not mean we have to conform to it. Why shouldn’t we just use a CSV in this case and have a single file containing all the information we could need?
I initially changed the output driver used by the ogr2ogr command:
ogr2ogr -f "CSV" activity_1150471868 data/strava_dump/activities/1150471868.gpx -fieldTypeToString DateTime
The activity directory slimmed down to only 5 files compared to the SHP equivalent of 20:
However, we lost all information about the geometries of the points with the new command:
The fix is once again relatively straightforward. We have to add a configuration flag to the command:
ogr2ogr -f "CSV" activity_1150471868 data/strava_dump/activities/1150471868.gpx -fieldTypeToString DateTime -lco GEOMETRY=AS_XYZ
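If you would rather drive the conversion from Python than from the shell, the same call can be assembled as an argument list and passed to `subprocess`. This is only a sketch: the helper names are invented for the example, and the flags are exactly the ones from the command above.

```python
import shutil
import subprocess
from pathlib import Path


def build_gpx_to_csv_command(gpx_path, output_dir):
    """Assemble the ogr2ogr arguments for a GPX to CSV conversion."""
    return [
        "ogr2ogr",
        "-f", "CSV",
        str(output_dir),
        str(gpx_path),
        "-fieldTypeToString", "DateTime",  # keep the full timestamp
        "-lco", "GEOMETRY=AS_XYZ",         # write coordinates as columns
    ]


def convert_gpx_to_csv(gpx_path, output_dir):
    """Run the conversion, failing early if GDAL is not installed."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found on PATH; install GDAL first")
    subprocess.run(build_gpx_to_csv_command(gpx_path, output_dir), check=True)


command = build_gpx_to_csv_command(
    Path("data/strava_dump/activities/1150471868.gpx"),
    Path("activity_1150471868"),
)
print(" ".join(command))
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.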
PostGIS Pipeline with DBT
PostGIS and DBT are progressively becoming my bread and butter for automating GIS processes. To quote Paul Ramsey: “PostGIS is your GIS without the GIS” and DBT is the engine to glue everything together.
We can get a simple running instance of PostGIS using Docker Compose.
Once the database instance is running (docker-compose up), we can upload a raw CSV file with
At the moment, we only need a single model to process our converted GPX track_points.csv. Our goal is to transform the x and y coordinates to EPSG:3857 and filter out the unused empty attributes:
Once our dbt environment is configured, we can run our simple data pipeline with the command dbt run.
Finally, we can export our database table to a CSV that is ready to be imported by Blender:
Converting a Directory of GPX Files
We can use the building blocks from the previous process to create a batch script that converts all the activities in a directory:
The resulting directory contains every activity converted to match our initial requirement. We could play around with the filename but it is sufficient to start importing the data into Blender.
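The batch idea can be sketched in Python as a planning step that pairs every GPX file with its own output directory. The function name, the `activity_` prefix on the output directories, and the second file name below are assumptions for the example; each pair would then be handed to the ogr2ogr command shown earlier.

```python
import tempfile
from pathlib import Path


def plan_batch_conversion(activities_dir, output_dir):
    """Return (gpx_path, csv_directory) pairs for every GPX file found."""
    jobs = []
    for gpx_path in sorted(Path(activities_dir).glob("*.gpx")):
        # One output directory per activity, named after the GPX file.
        jobs.append((gpx_path, Path(output_dir) / f"activity_{gpx_path.stem}"))
    return jobs


# Demonstrate on a throwaway directory with two GPX files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1150471868.gpx", "1150471900.gpx", "notes.txt"):
        (Path(tmp) / name).touch()
    jobs = plan_batch_conversion(tmp, "converted")
    planned = [(gpx.name, out.name) for gpx, out in jobs]

print(planned)
```

Separating the planning from the execution makes it easy to check which files will be converted before running anything.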
Refactoring our Pipeline
The above pipeline is sufficient for working with 5 GPS tracks but not for working with my entire set of activities. The main issue is running a separate dbt pipeline for every activity. A better approach would be combining the activities into a single database table and running the pipeline only once. I initially used the above pipeline because my Blender add-on expected the file path of a single exported activity.
There are two main challenges in refactoring our design:
- Finding an efficient way to distinguish which activity every point belongs to within the database table.
- Adapting the Blender data importer to not directly use the file path (not covered here).
The simplest way I found to distinguish the activities is by adding a column for the filename with a simple sed command: sed -i -e "s/^/$file_name,/" "$tmp_directory/track_points.csv". We also have to modify the DDL used to create our database track table:
-- auto-generated definition
create table public.track_points
alter table track_points
owner to postgres;
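One detail worth knowing about the sed approach is that it prepends the filename to every line, including the header row. A possible Python alternative, sketched below, labels the header separately. The column names in the sample data and the function name are assumptions for the example.

```python
import csv
import io


def add_filename_column(rows, file_name):
    """Yield rows with file_name prepended; the header gets a 'filename' label."""
    rows = iter(rows)
    header = next(rows, None)
    if header is None:
        return  # empty input: nothing to tag
    yield ["filename"] + header
    for row in rows:
        yield [file_name] + row


# Made-up single-point file for the demonstration.
source = io.StringIO("X,Y,Z,time\n-73.98,40.75,12.0,2017-08-26T14:00:00Z\n")
tagged = list(add_filename_column(csv.reader(source), "1150471868"))
print(tagged)
```

Either way, the result is a file where every point carries the identifier of the activity it belongs to.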
After the refactoring, this is what the pipeline script looks like:
Limiting The Number of Points
Most activities, or at least mine, collect the coordinates of your movement every second. Though interesting for some use cases, we do not need as much information to produce interesting animations. The processing time required for that frequency of collection outweighs its visual benefits.
There are a couple of different ways we could go about limiting the number of points. We could use a bash script that eliminates some of the rows in the exported CSV file. I preferred modifying the dbt model directly by adding a filtering condition on the point number using the modulo operator:
with points as (
st_transform(st_setsrid(st_makepoint(x, y, ele), 4326), 3857) geom,
select filename, num_point, st_x(geom) x, st_y(geom) y, st_z(geom) z, time
where num_point % 20 = 0
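The same modulo filter is easy to reproduce outside the database. Here is a minimal Python sketch, assuming each point is a dictionary with a `num_point` key; the function name and the `step` parameter are made up for the example.

```python
def thin_points(points, step=20):
    """Keep only the points whose number is a multiple of step."""
    return [point for point in points if point["num_point"] % step == 0]


# 100 made-up points, one per second of activity.
points = [{"num_point": n} for n in range(100)]
kept = thin_points(points)
print([point["num_point"] for point in kept])  # [0, 20, 40, 60, 80]
```

With one point recorded per second, a step of 20 keeps roughly one point every 20 seconds.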
After all this divergent experimentation, what principles seem to be relevant to other applications?
For starters, we should work our way from the expected specification needed for the 3D animation tool. Knowing the details of the data needed to create our visualizations is our starting point. Not only can we map out our ETLs based on these requirements, but we can also define the multidimensional granularity of our datasets needed to produce the desired visualization.
There are a bunch of ways to transform data, but few offer as many possibilities as the FOSS4G stack. GDAL/OGR offers the entire range of operations you could desire and more.
By integrating the FOSS4G stack within other data frameworks (PostGIS, dbt, and bash scripting), we can shape our pipelines to our own creative ends.
Creating data pipelines is all fun and games, but it is only as useful as the applications it enables. I hope to share more on creating 3D animation within Blender in the next Not the Shortest Path.