One of the most hotly debated topics in Formula 1 is who deserves more of the plaudits for success: the car or the driver. Opinions span the entire spectrum, from those who believe that the best car would win even with the worst driver to those who think that the best driver can single-handedly drag a middling car to greatness.
As a Formula 1 fan, I have always been intrigued by this question. The goal of the analysis in this paper is to take a quantitative look at this age-old question and to lay a foundation for further empirical research on the topic. This project is adapted from one I worked on for a college Statistics for Data Science class.
In this project, I looked at the last 30 years of Formula 1 data, spanning the 1994 to 2023 seasons. I considered various statistics that I thought would be insightful and eliminated metrics that were redundant or offered little avenue for exploration. Having settled on the metrics I thought would be most useful for analysis, I narrowed the scope of the data. For each season from 1994 to 2023, I collected data for the winner of the Drivers' Championship, the runner-up, and the best teammate of each. Where the runner-up was the championship winner's teammate, I used the third-place finisher and their teammate instead. Because Formula 1 seats are transient, drivers lower down the order are often replaced mid-season, which could skew the data. I therefore limited myself to the top finishers in the championship, whose team and driver line-ups are more stable throughout the season.
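The selection rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used in the original project: the Standing record, its field names, and the select_drivers function are all hypothetical, and the sketch assumes each driver raced for a single team for the whole season.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Standing:
    """One driver's final championship result (hypothetical record layout)."""
    driver: str
    constructor: str  # assumption: one team per driver per season
    position: int  # final Drivers' Championship position, 1 = champion


def select_drivers(standings: list[Standing]) -> list[Standing]:
    """Pick the champion, the best-placed rival from another team, and each one's teammate."""
    ordered = sorted(standings, key=lambda s: s.position)
    champion = ordered[0]
    # The rival is the first driver down the order who is not on the champion's team,
    # so if the runner-up was the champion's teammate, this falls through to third place.
    rival = next(s for s in ordered[1:] if s.constructor != champion.constructor)

    selected = []
    for lead in (champion, rival):
        selected.append(lead)
        # Best teammate: the highest-placed other driver from the same team.
        teammate = next(
            (s for s in ordered if s.constructor == lead.constructor and s.driver != lead.driver),
            None,
        )
        if teammate is not None:
            selected.append(teammate)
    return selected
```

Driving the choice of rival off team membership rather than a fixed position keeps the runner-up and third-place cases in one rule.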
Preliminary Variable Analysis
Before analyzing the data in depth, I felt it was important to first examine each variable individually and identify trends in it, so that I could readily connect the results of the later analysis with the domain-specific knowledge I already possessed. The variables in my data were:
1. Position: The place the driver secured in the final Drivers' Championship standings for that season. It is a ranking, so lower numbers are more prestigious, and 1 indicates that the driver won the Drivers' Championship that season.
2. Team: I created this metric to make the data easier to sort and analyze. Since my data always contained exactly two teams per season, I assigned 1 to the team of that season's champion and 2 to the rival team (the runner-up's team, or the third-place finisher's team when the runner-up was the champion's teammate). Each team's second driver received the same code as their team.
3. Points: The total number of championship points a driver scored over the season, awarded primarily for finishing positions in each race. Points decide the Drivers' Championship, so I made them a key part of my analysis. Because Formula 1 has used several different scoring systems over the years, I also introduced another variable to account for changes in the system.
4. Wins: This shows how many wins each driver had in that specific season.
5. Win Percentage: The percentage of that season's races the driver won, calculated by dividing the number of races won by the total number of races that year and multiplying by 100.
6. Poles: This shows how many pole positions each driver had in that specific season. A pole position indicates that a driver will start at the front of the grid for the race.
7. Pole Percentage: The percentage of that season's races in which the driver took pole position, calculated by dividing the number of pole positions by the total number of races that year and multiplying by 100.
8. Point System: I introduced this variable to account for some of the drastic fluctuations in points scored. Over the past thirty years, Formula 1 has made a few minor changes to its scoring system and one major change that had a large impact on points totals. This variable accounts for the sudden shift in scores between those periods.
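The derived variables in the list above (Team, Win Percentage and Pole Percentage) reduce to simple arithmetic. The sketch below is a minimal illustration under assumptions: the DriverSeason record, its field names, and the team_codes helper are hypothetical, and races is assumed to be the number of races held that season.

```python
from dataclasses import dataclass


def percentage(count: int, races: int) -> float:
    """Count as a percentage of the season's races: count / races * 100."""
    return 100.0 * count / races


@dataclass
class DriverSeason:
    """One driver-season row (hypothetical field names)."""
    season: int
    driver: str
    constructor: str
    wins: int
    poles: int
    races: int  # assumption: total races held that season

    @property
    def win_percentage(self) -> float:
        return percentage(self.wins, self.races)

    @property
    def pole_percentage(self) -> float:
        return percentage(self.poles, self.races)


def team_codes(rows: list[DriverSeason], champion_constructor: str) -> dict[str, int]:
    """Team variable: 1 for the champion's team, 2 for the other team in the sample."""
    return {r.driver: 1 if r.constructor == champion_constructor else 2 for r in rows}
```

For a hypothetical driver who won 5 of 20 races, win_percentage is 25.0. Computing the percentages as properties keeps them consistent with the raw counts they are derived from.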
After finalizing the variables, I plotted graphs of them to identify macro trends within the data.
The next graph I looked at showed the points totals of only the Championship winners in each season. While most of the trends observed in the earlier graph also held here, one pattern stands out. Under scoring system 2, which was in effect from 2003 to 2009, the points scored by Championship winners decreased almost every season, in sharp contrast to the rest of the data. Under every other scoring system, the number of points required to win a championship followed a general upward trend. This makes the 2003-2009 seasons stand out even more, as there is no obvious reason why the points totals kept decreasing.
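One way to check the 2003-2009 pattern numerically is to count season-over-season declines in the champion's points separately for each scoring system, since totals are only comparable within a single system. This sketch assumes the data sit in two hypothetical dicts keyed by season (champion points and a scoring-system code), and that each system covers one contiguous run of seasons with no gaps; count_declines is an illustrative name, not part of the original analysis.

```python
from itertools import groupby


def count_declines(
    champion_points: dict[int, float], system_of: dict[int, int]
) -> dict[int, tuple[int, int]]:
    """Count season-over-season drops in the champion's points, per scoring system.

    Returns {system: (declines, comparisons)}. A season is only compared with the
    previous season under the same system, so a change of rules is never counted.
    Assumes each system covers one contiguous run of seasons with no gaps.
    """
    seasons = sorted(champion_points)
    result: dict[int, tuple[int, int]] = {}
    # groupby splits the sorted seasons into consecutive runs sharing a system code.
    for system, run in groupby(seasons, key=lambda year: system_of[year]):
        years = list(run)
        diffs = [champion_points[b] - champion_points[a] for a, b in zip(years, years[1:])]
        result[system] = (sum(d < 0 for d in diffs), len(diffs))
    return result
```

The (declines, comparisons) pair turns "decreased almost every season" into a count that can be compared across systems.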
Next, I looked at the graph of pole positions and wins secured by the championship winner, season by season. I thought this would be an interesting area to explore, as Formula 1 teams sometimes eschew qualifying pace to set up their car for the race. This graph also showed something very interesting: although the championship winner's pole positions and race wins mostly followed roughly the same path, there were only three seasons in the data in which the champion had the same number of pole positions as wins.
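Counting the seasons in which the champion's pole positions equalled their wins is a one-line filter. A minimal sketch, assuming a hypothetical dict that maps each season to the champion's (poles, wins):

```python
def equal_poles_and_wins(champions: dict[int, tuple[int, int]]) -> list[int]:
    """Seasons in which the champion's pole count equalled their win count.

    champions maps season -> (poles, wins); this layout is hypothetical.
    """
    return sorted(season for season, (poles, wins) in champions.items() if poles == wins)
```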
This graph is similar to the graph of pole positions and wins above, except that it shows the pole position and win data as percentages. One thing I noticed is that in quite a few seasons the championship winner secured neither 50 percent of the pole positions nor 50 percent of the wins. This happened more often in the earlier seasons covered by the data, likely because cars were less reliable than in more recent Formula 1 seasons.
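The observation above can be checked with a short filter. This sketch assumes a hypothetical dict mapping each season to the champion's (poles, wins, races), and reads "neither 50 percent" as strictly below 50 percent on both measures; below_half_on_both is an illustrative name.

```python
def below_half_on_both(champions: dict[int, tuple[int, int, int]]) -> list[int]:
    """Seasons where the champion took under 50% of both pole positions and wins.

    champions maps season -> (poles, wins, races); this layout is hypothetical.
    """
    result = []
    for season, (poles, wins, races) in sorted(champions.items()):
        pole_pct = 100.0 * poles / races
        win_pct = 100.0 * wins / races
        # Assumption: exactly 50 percent counts as reaching the threshold.
        if pole_pct < 50 and win_pct < 50:
            result.append(season)
    return result
```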