“We accomplished the best Christmas and New Year's check-in wait times of the last 5-6 years without additional investment in software, training or additional resources.”
I was working as an intern analyst at a major airline at the time, and I couldn't believe what my boss was telling me.
That was the moment I decided to pursue a career in data.
What is data (to me)?
I mentioned in the previous post that one of the focus points of data semantics is the meaning of data.
As a data professional, I think it is important to start with the meaning of data for us personally because this has a direct impact on our motivation to learn and grow.
In this post I’m sharing why I fell in love with data in the first place.
I'm doing this for two reasons:
To help you understand my background and where I'm coming from for future posts.
I would love for you to share in the comments or in your own posts why/how you got into working with data.
Riding the truck: Individuals and interactions over process and tools
I love airports.
When I was growing up, I loved long layovers between flights whenever I traveled with my family. There were always so many different people, the stores were full of interesting things, and I really enjoyed sleeping on benches (I was clearly much smaller back then).
Naturally, I jumped at the opportunity of an internship working in an airport.
My main tasks were focused on collecting and analyzing data feeding into KPIs of airport operations like check-in wait time or baggage reclaim wait time. Every month I also had to work on optimizing the scheduling for 150+ coworkers at the airport.
Unlike my direct teammates, I would frequently go out and work from different places in the airport instead of our little office, which was accessible through a hidden elevator.
Seeing the operations in person reminded me that there was meaning behind the numbers I looked at every day.
John Giles, author of The Nimble Elephant, explains it like this when commenting on the values of the Agile Manifesto:
Individuals and interactions: A good friend of mine frequently used the phrase that we should “ride the truck”. Instead of sitting in the office and talking to people about their work, he encouraged data architects to get out and see what life was really like for the front-line workers.
I did not know about the Agile Manifesto back then, but I found much value in “riding the truck”.
Great solution, wrong assumptions
I started noticing a pattern in our check-in booths. As the queues started to get longer, we had surprisingly few people in the booths. Then, when the queues became shorter, we had more people than we needed.
This seemed very odd to me.
Luckily, I had access to both the scheduling and check-in wait time data and I decided to compare the two.
Below you can see a simplified version of what I found. Note that this is simulated data that I designed specifically to illustrate the point. It removes a lot of the real-world variation, applies simplifying assumptions, and does not include any actual data.
The first graph shows the average waiting time in the check-in line. Peaks in this graph mean long and painful queues.
The second graph shows the percentage of available check-in booths in use. Tall bars indicate that most or all of the check-in booths are in use, while low bars mean that only a few booths are active.
Why is this interesting?
You probably picked up on the fact that waiting times increase relatively fast, which indicates that the active booths are working at full capacity, and that the inactive booths only become operational once the peak has already passed.
According to queueing theory, when available resources are working close to or at full capacity, wait times tend to explode due to the inherent variability in service times and arrival rates of customers.
At this stage, even small fluctuations in arrivals or service times (e.g., a slightly longer transaction at a check-in booth) can cause queues to grow very rapidly because there's no spare capacity to absorb these variations.
This is sometimes described as a "clogging effect" or, borrowing a term from networking, "congestion collapse".
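To make the "explosion" concrete, here is a minimal sketch using the textbook Erlang C formula for an M/M/c queue (random arrivals, exponential service times, c identical booths). It is only an illustration, not the simulation used at the airport: the function name `mean_queue_wait`, the five booths, and the two-minute average check-in time are all made-up assumptions.

```python
from math import factorial


def mean_queue_wait(arrival_rate, service_rate, booths):
    """Mean time spent waiting in line for an M/M/c queue (Erlang C).

    arrival_rate: passengers arriving per minute (assumed Poisson)
    service_rate: passengers one booth can serve per minute
    booths:       number of open booths
    """
    offered_load = arrival_rate / service_rate
    utilization = offered_load / booths
    if utilization >= 1:
        # Demand meets or exceeds capacity: the queue grows without bound.
        return float("inf")
    # Erlang C: probability that an arriving passenger has to wait at all.
    tail = offered_load**booths / factorial(booths) / (1 - utilization)
    head = sum(offered_load**k / factorial(k) for k in range(booths))
    p_wait = tail / (head + tail)
    # Mean wait = P(wait) divided by the spare capacity.
    return p_wait / (booths * service_rate - arrival_rate)


# Hypothetical numbers: 5 booths, 2 minutes per passenger on average.
BOOTHS = 5
SERVICE_RATE = 0.5

for utilization in (0.50, 0.80, 0.90, 0.95, 0.99):
    arrival_rate = utilization * BOOTHS * SERVICE_RATE
    wait = mean_queue_wait(arrival_rate, SERVICE_RATE, BOOTHS)
    print(f"utilization {utilization:.0%}: mean wait {wait:.1f} min")
```

Running it shows the mean wait growing slowly at moderate utilization and then climbing steeply as utilization approaches 100%, because the spare capacity in the denominator shrinks toward zero.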
A better scenario is what we can see around 13:00, where the number of active booths increases in response to a higher flow of travelers, but the average waiting time keeps going down.
By assigning new resources before a bottleneck is created, it is possible to avoid peaks altogether.
Armed with this information, I prepared a presentation with my boss, which led to a few conversations with other coworkers in charge of simulating the queues. The output of these simulations was used as input for the scheduling optimization.
It might be interesting for you to know that the people in charge of these simulations did not work at the airport.
To make matters worse, some of the main assumptions for the simulations were the business rules defined by teams working at HQ, located in another country.
The simulations were good at predicting rising demand. It was an incredibly smart and powerful solution.
But, by failing to connect simulations and business rules to reality, we were systematically underestimating the severity of the queues, allocating resources too late, and keeping them there longer than needed.
My first taste of data-driven decisions
We updated the assumptions of the model and changed the scheduling. Just in time for the busiest time of the year: Christmas and New Year’s.
My boss offered me the full experience of clocking in early with the operational crew during the holiday period, starting shifts at 4:00. I was extremely happy to do this for a couple of weeks :)
The official reason was to make sure that we were collecting the data correctly, but I also helped the crew in whichever way I could. I remember even running around the airport helping a couple with a cancelled flight.
When all of this was over, we sat down to look at the data and were amazed by the results.
Did we eliminate the queues? No, they were still extremely long. This is a tough problem, especially with limited resources.
But updating the assumptions meant that we were able to provision new booths in time to absorb more of the unexpected variations early.
The improved waiting times continued after the holidays as well.
A simple question that I was able to investigate with data led to a good amount of additional value, almost “for free”.
On top of this, data empowered an intern like me, just starting his career, to influence operations at an international airport!
I decided that I could help answer interesting questions and solve valuable problems by mastering data. This is the craft I have dedicated myself to.
Why is data important to you?
Now you know why I fell in love with data in the first place. In future posts I can share why I keep going.
I have found that by leveraging quality data, we can shorten the feedback loop, learn faster, discover unknowns, and accelerate the work we are doing.
Now I’m curious about you.
How do you work with data and what does data mean to you?
In other words, why is data important to you?