Paris 2024: How to Improve Gymnastics Judging

As the 2024 Olympic Games have now come to a close, we can take some time to reflect on lessons to be learned and how to improve the processes for better gymnastics judging in the future.

As at many Olympic Games, the spotlight on the biggest athletic stage in the world exposes misjudgments and processes that don't function as well as they should. The easy thing is to blame the judges, who have been selected for their expertise and from whom the world expects perfection more than from the gymnasts themselves. But just as the gymnasts are human and bound to make errors, so are the judges who evaluate them. The goal is to minimize those errors, and systems can be put in place that could have eliminated some of the judging errors made at the 2024 Olympic Games.

The ultimate goal of judging is to give every gymnast the most accurate scores possible so that the best performances are awarded the appropriate ranking and medals. Unfortunately, some processes currently in place prevent that from happening. Let's look at three specific changes that could have eliminated certain judging errors had they been in place.

Out of Bounds Errors

In the floor exercise final, Sabrina Maneca-Voinea of Romania received a -0.1 penalty for going out of bounds after one of her tumbling passes. At the Olympics and World Championship competitions, the line judges are not seated directly on the competition floor. Instead, they are seated alongside the other judges on the panel and view the line boundaries in real time on a screen at their station. Depending on camera positions, this can give the judges a better view of all four corners of the floor than they would have from their traditional floor-side perspective. Additionally, not having judges seated at the corners of the floor exercise podium reduces distractions for the gymnasts and provides a "cleaner" look for TV.

The misjudgment likely occurred because the out-of-bounds call was ambiguous and the line judge had to render a judgment from one fixed camera angle in real time. In fact, the line infraction was so unclear that the Romanian delegation did not specifically ask to have the line penalty reviewed, and it was only upon closer inspection, with zoomed-in, frame-by-frame analysis, that the out-of-bounds call was determined to be false.

So what is the solution?

The solution is simple: give the line judges, who are already evaluating line violations via video on a screen, the ability to review potential line violations before submitting a penalty. The technology is certainly there. In a typical competition, very few line penalties are borderline and would need to be reviewed. Each one occurs at a specific time and place, so it could be rapidly pinpointed in a video review and accurately determined in seconds. Line judges have time after the routine, while the other judges are computing their scores, to conduct such a review, and the end result would be more accurate line penalties.

It is ridiculous that the fans watching a television broadcast can see more detailed replays of what happened than the judges who are supposed to render the judgment. In other sports, officials can review plays and line judgments using videos from multiple angles with the ability to zoom in to determine infractions. It is reasonable to expect gymnastics judges to have the same video review capacities, especially since employing this would not add any time to the overall evaluation process.

Difficulty Judges (D Jury) Errors

Evaluating the Difficulty Score (D score) of a gymnastics routine is a complicated process. The judges who make these evaluations at the World and Olympic levels are the highest-rated judges in the world and are truly experts at what they do. There are two D judges on the panel who must agree on how to credit what was done to determine the D score value of the routine. Having two experts working together cuts down on the likelihood of mistakes. However, like line judging, there are still borderline judgments that must be made for skills that have been performed with less-than-perfect execution. In many cases, those faults are penalized with deductions, but there are times when the faults are large enough to warrant downgrading or not recognizing an element.

The difference can be a turn being performed to 269° versus 271°. How confident would you be rendering such a precise measurement from a single viewing of a dynamic object in motion (i.e., not a still position) without any measurement tools other than your eyeballs? This is what D judges are tasked to do, and they are amazingly good at it. However, there are instances when it would be beneficial to have a second look. Unfortunately, like the line judges, the D judges are not permitted to review their judgments on video before submitting their scores.

In NBA basketball, you will sometimes see a referee circle their hand in the air. That is their signal that they would like to have an action reviewed that they didn't get a good view of. There are then officials in a mission control-type center who can instantaneously video review actions on the court. The result is a more accurate evaluation of what occurred.

Why can't D jury gymnastics judges at the most high-stakes gymnastics competitions on the planet be afforded a similar double-check when a borderline decision is to be made? Like the line judges, not every element needs to be reviewed; most elements are quite clear in their presentations, and the expert judges are fully capable of evaluating them correctly in real time. However, there are the occasional borderline decisions that could benefit from a second look and perhaps a freeze-frame view.

I have judged at high-level competitions where D Jury video reviews were possible and permissible before the score was finalized. This technology was not overused, nor did it add any time to the evaluation, because quick access to a frame-by-frame video review eliminated the need for the two D judges to debate the performance. "Oh, you thought it was 91° short of completion; I thought it was at 89°." A freeze frame of the video can end the debate immediately with the correct evaluation.

Without such a definitive video review, the judges are left to their best judgment based on a one-time live-action viewing. In such cases, the moral thing for the judges to do is not give credit for the element. The gymnast's coaches can then submit an inquiry to have the routine reviewed (by video and even AI evaluations in some cases) to have an accurate judgment made. I say the "moral thing to do" because if the judges give credit in a case where it wasn't warranted, no coach is going to submit an inquiry to have something downgraded, and a potentially incorrect score will go unchallenged. Giving credit in an ambiguous situation opens up the possibility that a gymnast wins a medal based on a judgment error instead of how their routine was actually performed. However, if a gymnast is not awarded credit for a borderline element, it would be wise for the coach to submit an inquiry to have it double-checked to see if credit should be awarded. After all, what do they have to lose, apart from the financial cost of a failed inquiry?

So what is the solution?

Allow D judges to quickly review specific elements on video, or to signal for the apparatus supervisor to perform that review, before submitting their D score. This is the same quick double-check afforded to officials in other sports. After all, the ultimate goal is to have every gymnast receive the most accurate score possible for their performance. Why are there norms that unnecessarily hinder the most accurate evaluation possible? The technology is there; it is used to review score inquiries. Why not give some of those technological tools to the judges themselves so that they can render the most accurate score possible from the start?

As with the line judges, it is ludicrous that someone watching a competition on their television or a streaming platform has access to better information and video replays than the judges actually tasked with evaluating the routines!

Apparatus Supervisors

Lastly, let's look at the role of the Apparatus Supervisors.

Apparatus Supervisors oversee the judging panels on each apparatus. There is one Apparatus Supervisor for each gymnastics apparatus. They serve as a triple-check against errors and can block inaccurate scores before they are posted.

The Apparatus Supervisors at the World Championships and Olympic Games are not appointed based on their merit and expertise but are elected members of the Technical Committees (TC) that oversee each gymnastics discipline. The fact that they are elected puts them in a compromised position. As with any elected official, promises are often made, and favors are exchanged in order to acquire votes. Overseeing the judging of gymnasts from the same federations whose votes the TC members need for re-election is a potential conflict of interest. This is not to say that the members of the TCs are corrupt. Indeed, most of them have the best interest of the sport and athletes at heart. They are former athletes, coaches, and judges who want the best for the sport. However, in order to see their vision for the sport come to fruition, they still need to attain the votes required to be elected.

Another thing to consider is that one of the primary tasks of the TCs in the year leading up to the Olympics is to finalize the rules for the next Olympic Cycle that will go into effect the year following the Olympics. Therefore, the Apparatus Supervisors have a double task of evaluating exercises under one set of rules while creating a different set of rules to be used in the future.

This is not to question the gymnastics judging expertise of the members of the TC, but to consider that they are placed in a more cognitively demanding situation (juggling two sets of rules), and that they face circumstances unrelated to judging (i.e., being elected) that can consciously or unconsciously influence their judgment.

All of these considerations could lead one to believe that perhaps there are better candidates to be tasked with the roles of Apparatus Supervisors than the elected members of the TCs.

What is the solution?

The Apparatus Supervisors should be judges who are appointed based on their merit, judging integrity, and proven expertise. Of course, there is always the potential for corrupt or favored appointments, but having the Apparatus Supervisors be separate from the members of the TC, who are evaluating their performance, adds one more layer of oversight to protect against corruption. Currently, the TC members serve the double role of Apparatus Supervisors and are essentially evaluating themselves. That is not an ideal situation in any organization.

Corruption is more likely when people are put in compromised situations. Inaccurate evaluations are far more prevalent when the decisions being made are ambiguous. Modifying these situations can significantly improve the circumstances for judges to render the most accurate scores possible. At present, the judges and the apparatus supervisors are all put in compromising positions where the best tools and motivations are not readily available.

All of these proposed solutions are doable. The technology exists to give judges on the competition floor better tools to render accurate judgments, and leadership structures can be modified to reduce conflicts of interest for those in supervisory positions. Each Olympic cycle brings an opportunity to learn, change, and improve the processes for the future. Let's see what changes are made to set up the gymnasts to receive the most accurate scores at the Olympics in 2028.

Photo Credits: Enis Hodzic Lederer