I’ve been using Trainline (formerly Capitaine Train) to book my train tickets across Europe since July 2011. They offer to book train tickets for a handful of countries with no fuss so I’ve used them almost every time I took the train since then. I’m interested by the GDPR law, and it gives us an amazing opportunity to take back control of our personal data from companies thanks to Article 20 – Right to data portability. I asked Trainline for my personal data and decided to analyze my train usage across these years.
Asking for your personal data
You can contact the company by email or through their contact form and send an email which looks like this:
My username on your service is xxxx and I would like to exercise my right to get my personal data thanks to the GDPR law. Please send me this data by email in a machine-readable format. If you’ve got any kind of documentation associated with it, I’m interested in this as well.
If I don’t get a reply in 30 days or if you refuse to comply to my request, I’ll contact my Data Protection Authority.
Have a nice day.
Most services will ask you for a proof of your identity so it’s good to attach an identity document to avoid another exchange. Keep a proof that you sent this email in case you need to reach your Data Protection Authority to enforce your rights.
Playing with my data
The Trainline team got back to me under 48h hours with an automated email with a ZIP archive, which contained a JSON with my searches, credit cards, bookings etc. This is really rare so it must be noted that they did a good job in order to be ready for GDPR. JSON files are naturally verbose but mine had more than 100k lines so I had to create smaller and more usable files before being able to analyze my data.
I was only interested in journeys I’ve indeed traveled (not booked or canceled for example) so my first focus was on this. I wanted to have a simple CSV file, with one line per train journey. In order to achieve this result, I wrote a small Python file which takes the source JSON file and outputs a CSV file containing my traveled train journeys with basic information: departure time, arrival time, departure station, arrival station, carrier, train number, train type, travel class, CO2 emission, departure and arrival countries and the number of times I’ve done this specific train journey in total (or booked by Trainline to be exact). You can find this Python script on my GitHub under the MIT license.
Armed with a CSV containing my completed train journeys, I wanted to get a rough idea of my train usage. My main interest was seeing where and how often I traveled. I chose to use kepler.gl from Uber Engineering because it’s straightforward to use: you go to their website, choose your CSV file with latitude and longitude columns and you’re ready to play. Your data stays on your computer because it’s just a frontend application.
Here are my train journeys split by carrier:
My most common journey: Paris-Rouen. Explained by the fact that I live in Paris but come from Normandy and therefore I go there often.
And I’ve done a timelapse of my train journeys, month by month.
It’s quite uncommon for now to be able to play with your data that you got back from companies, for many reasons:
- people are not interested by their data
- people don’t know about GDPR
- companies have not yet (or don’t plan) to implement an “export your data” function
- it often requires coding skills to be able to play with your data
- your data does not come with documentation
But I’m sure it’ll be better in a few years! In the meantime, I highly encourage you to request your personal data from companies and see what they’ve stored about you!
Note: I shared this story originally on my Twitter. You can follow me for more news like this!