Personally, I’m a fan of trains. They’re a nice, albeit slow, way to get around the country. Canada isn’t the best candidate for rail transit, given the rather large distance between coasts, but Via Rail does operate regular train service in its corridor between Windsor and Quebec City.
Unfortunately, passenger rail has to yield to freight rail in Canada, which often causes delays. After noticing that some trains are delayed very frequently, it seemed like it would be useful to know the typical performance of each Via train. Via does not provide this data publicly.
However, they do provide some data about arrival and departure times. By digging into the data available to any browser visiting the Via Rail site, it was possible to query for past scheduled and actual arrival data. The result is TrainStats.ca, a display of Via’s on-time performance. Join me after the break as I go over how this all works, and how to pick a winner when buying your next train ticket.
Getting the Data
Via does provide status data for the previous, current, and next day on its status page. This would let us build up a set of trip data, but only one day at a time. Fortunately, we can fire up Chrome’s inspector and find this GET request:
http://reservia.viarail.ca/tsi/GetTrainStatus.aspx?l=en&TsiCCode=VIA&TsiTrainNumber=87&DepartureDate=2015-12-01&ArrivalDate=2015-12-01&TrainInstanceDate=2015-12-01&t=1449033500354
There are a few juicy parameters here. TsiTrainNumber is obviously the train number we’re looking at. DepartureDate is the date the train left, and ArrivalDate is when it arrived. TrainInstanceDate also appears to be set to the date the train left. With this in mind, it’s time to jump into Python and use the excellent requests library to make some requests.
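```python
import requests

# Example values taken from the request above.
train_number = 87
trip_date = '2015-12-01'

payload = {'l': 'en',
           'TsiCCode': 'VIA',
           'TsiTrainNumber': train_number,
           'DepartureDate': trip_date,
           'ArrivalDate': trip_date,
           'TrainInstanceDate': trip_date}
r = requests.get('http://reservia.viarail.ca/tsi/GetTrainStatus.aspx',
                 params=payload, timeout=30)
r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
```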
This code lets us fetch data for any train number on any date. After some testing, we found that Via’s data goes back to April 2015, which gives us over 6 months of data. For every trip, we get the scheduled and actual arrival and departure times for each station. With that information, we can easily calculate how late the trains are.
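The post doesn’t show the delay calculation itself, so here is a minimal sketch of one way to do it. It assumes each time comes as an `HH:MM` string alongside the trip date; the helper name `delay_minutes` and the midnight-rollover rule are illustrative assumptions, not Via’s format or the original code.

```python
from datetime import datetime, timedelta


def delay_minutes(trip_date, scheduled, actual):
    """Minutes late (negative if early) for one stop on one trip.

    Hypothetical helper: assumes trip_date is 'YYYY-MM-DD' and both times
    are 'HH:MM' strings on that date.
    """
    sched = datetime.strptime(f'{trip_date} {scheduled}', '%Y-%m-%d %H:%M')
    act = datetime.strptime(f'{trip_date} {actual}', '%Y-%m-%d %H:%M')
    # An actual time more than 12 hours "before" the schedule almost certainly
    # means the train ran past midnight, so move it to the next day.
    if act - sched < timedelta(hours=-12):
        act += timedelta(days=1)
    return int((act - sched).total_seconds() // 60)
```

For example, `delay_minutes('2015-12-01', '14:30', '14:42')` returns 12, and `delay_minutes('2015-12-01', '23:50', '00:15')` returns 25 because the actual time is treated as falling on the next day.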
With the page data fetched as HTML, a script was hacked together using BeautifulSoup to extract all the values. This script then creates objects for the trip data and stores them in a PostgreSQL database using SQLAlchemy. This makes it easy and efficient to access the data later.
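The real scraper used BeautifulSoup with SQLAlchemy and PostgreSQL. To keep the example self-contained, the sketch below swaps in the standard library’s `html.parser` and `sqlite3` instead. The table layout (one row per station holding station name, scheduled time, and actual time) and the `stops` schema are assumptions for illustration; the markup Via actually returns isn’t shown in this post.

```python
import sqlite3
from html.parser import HTMLParser


class StopTableParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row.

    Assumes (hypothetically) one row per station: name, scheduled, actual.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:  # header rows built from <th> end up empty; skip them
                self.rows.append(tuple(self._row))
            self._row = None


def store_stops(conn, train_number, trip_date, html):
    """Parse one trip's page and insert its stops; returns rows inserted."""
    parser = StopTableParser()
    parser.feed(html)
    stops = [row for row in parser.rows if len(row) == 3]
    conn.execute(
        'CREATE TABLE IF NOT EXISTS stops ('
        'train INTEGER, trip_date TEXT, station TEXT, '
        'scheduled TEXT, actual TEXT)')
    conn.executemany(
        'INSERT INTO stops VALUES (?, ?, ?, ?, ?)',
        [(train_number, trip_date, *row) for row in stops])
    conn.commit()
    return len(stops)
```

Keeping parsing and storage in separate pieces mirrors the post’s split between extracting values and persisting trip objects, so either half can be replaced (for example with BeautifulSoup or an ORM) without touching the other.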
The last step was to iterate over all the train numbers and days to pull the data. This script just uses some nested loops to grab the data and store it. Another script grabs the previous day’s data and stores it in the database. This runs as a cron job, so the database stays fresh.
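A minimal sketch of that structure follows: nested loops over train numbers and days for the backfill, plus a daily job that only grabs yesterday. `fetch` and `store` are hypothetical callables standing in for the request and database code, and the one-second pause between requests is an assumed courtesy so the scraper doesn’t hammer Via’s server.

```python
import time
from datetime import date, timedelta


def daterange(start, end):
    """Yield each date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def backfill(train_numbers, start, end, fetch, store, delay_s=1.0):
    """Fetch and store every (train, day) pair with nested loops.

    fetch(train_number, trip_date) -> html and
    store(train_number, trip_date, html) are hypothetical callables.
    """
    count = 0
    for train_number in train_numbers:
        for day in daterange(start, end):
            trip_date = day.isoformat()  # 'YYYY-MM-DD', as the query expects
            html = fetch(train_number, trip_date)
            store(train_number, trip_date, html)
            count += 1
            time.sleep(delay_s)  # be polite between requests
    return count


def fetch_yesterday(train_numbers, fetch, store, today=None, delay_s=1.0):
    """The daily job: grab only the previous day's data."""
    yesterday = (today or date.today()) - timedelta(days=1)
    return backfill(train_numbers, yesterday, yesterday, fetch, store, delay_s)
```

Scheduled from cron, a line such as `15 4 * * * python3 /path/to/daily.py` (the script path and time are placeholders) would run the daily job every morning at 4:15.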
Building a (cheap) Website
[Screenshot: the trainstats.ca website]
At this point, we have arrival data on over 12,000 trips. While we can manually run queries and write scripts to generate plots, it’s far more fun to put the data online. That means it’s time to build a website. Making things look good on the Web is not my forte, so [Phil Everson] jumped in to do some web development.
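As an example of the kind of query involved, the sketch below computes each train’s trip count, average delay, and on-time percentage. It assumes a hypothetical `arrivals` table with one row per trip holding the delay at the final station in minutes, and it treats "on time" as five minutes late or less; the post does not say which threshold TrainStats uses. It runs on `sqlite3` for convenience. The SQL itself should also work in PostgreSQL, though a driver such as psycopg2 uses `%s` placeholders instead of `?`.

```python
import sqlite3

ON_TIME_QUERY = """
SELECT train,
       COUNT(*) AS trips,
       AVG(delay_min) AS avg_delay,
       100.0 * SUM(CASE WHEN delay_min <= ? THEN 1 ELSE 0 END) / COUNT(*)
           AS pct_on_time
FROM arrivals
GROUP BY train
ORDER BY pct_on_time DESC
"""


def on_time_stats(conn, threshold_min=5):
    """Return (train, trips, avg_delay, pct_on_time) rows, best train first.

    threshold_min is an assumed cutoff: a trip counts as on time if it
    arrived at most this many minutes late.
    """
    return conn.execute(ON_TIME_QUERY, (threshold_min,)).fetchall()
```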
To add a constraint, we wanted to make the site as cheap as possible to run. Platform-as-a-service offerings like Heroku ran about $20 a month. A virtual private server from DigitalOcean would cost at least $5. The cheapest option was to make a static site.
A static website is a trip back to the days of GeoCities. You can host files, but you can’t do any processing on the server. Fortunately, this worked well for the type of data we were presenting. All the aggregated trip data could be exported to JSON files, and JavaScript on the client side can load the data and display plots.
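Here is a sketch of such an export step, assuming aggregated rows shaped like `(train, trips, avg_delay, pct_on_time)`. The file layout (`index.json` plus one file per train under `trains/`) is a hypothetical choice; the point is that each file is small and static, so any plain file host can serve it.

```python
import json
from pathlib import Path


def export_json(stats, out_dir):
    """Write one small JSON file per train plus an index of train numbers.

    stats: iterable of (train, trips, avg_delay, pct_on_time) tuples.
    The directory layout is hypothetical. Returns the number of trains written.
    """
    out = Path(out_dir)
    (out / 'trains').mkdir(parents=True, exist_ok=True)
    index = []
    for train, trips, avg_delay, pct_on_time in stats:
        record = {'train': train,
                  'trips': trips,
                  'avg_delay_min': round(avg_delay, 1),
                  'pct_on_time': round(pct_on_time, 1)}
        (out / 'trains' / f'{train}.json').write_text(json.dumps(record))
        index.append(train)
    (out / 'index.json').write_text(json.dumps(sorted(index)))
    return len(index)
```

On the client side, the page could then load one train’s numbers with the browser’s `fetch('trains/87.json')` and pass the parsed JSON to whatever plotting library it uses.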
The TrainStats site consists of some HTML, CSS, and JavaScript that runs in your browser, plus a collection of JSON files with the data. The dataset is generated daily by another cron job, which allows all the processing to happen in one go on a local computer. Then the Amazon Web Services Command Line Interface is used to push the data to S3, where users can retrieve it. Since the datasets are small and S3 is cheap, this keeps costs lower than normal hosting.
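The post doesn’t give the exact AWS CLI invocation. One plausible choice is `aws s3 sync`, which uploads only files that have changed; the sketch below wraps it so the daily cron job can call it from Python. The bucket name is a placeholder, and using `--delete` (remove bucket files that no longer exist locally) is an assumption about how the site is kept tidy.

```python
import subprocess


def sync_command(local_dir, bucket, dry_run=False):
    """Build an `aws s3 sync` command mirroring local_dir into the bucket.

    --delete removes objects from the bucket that no longer exist locally;
    --dryrun only prints what would change. Credentials come from the
    AWS CLI's own configuration.
    """
    cmd = ['aws', 's3', 'sync', str(local_dir), f's3://{bucket}', '--delete']
    if dry_run:
        cmd.append('--dryrun')
    return cmd


def publish(local_dir, bucket):
    """Upload the regenerated site; raises CalledProcessError on failure."""
    subprocess.run(sync_command(local_dir, bucket), check=True)
```

Running it once with `dry_run=True` is a cheap way to check which files would be uploaded or deleted before pointing the cron job at the real bucket.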
The Results
This hack was mainly built for fun, but it has turned up a few interesting findings. On my usual Ottawa to Toronto route, I’m much more likely to pick the train that’s on time 84% of the time over the one that rolls into the station on time on only 28% of trips. Other travellers may find the stats useful as well. Either way, it was an interesting exercise in scraping together a dataset and providing a web service on the cheap.
If you’re interested in the source, it’s all up on GitHub for the taking. We kindly ask that you don’t DDoS Via Rail with it.