Note: This article was originally published on StickItMedia.com in February 2020.
There is much conversation on the Gymternet about the proposed 10.0 conversion system for Men’s NCAA. Minnesota gave it a test drive last weekend and Penn State just hosted a competition using the 10.0 conversion. As someone who studies judging psychology, I found it interesting to think about some of the biases which have a greater potential to occur when judging under a 10.0 system, some of the unique nuances brought about by the conversion, and what measures would be necessary to counter them.
Of course, judges always need to remain as neutral as possible. However, judges are human, and some cognitive biases are subconscious in nature. A key factor in reducing those biases is awareness that they have the potential to exist. Cognitive biases are most likely to come into play when making borderline decisions under pressure. Some judging decisions are clear, others are more ambiguous, and all of them are made under pressure. The more ambiguous the decision, the more likely it is to be influenced by non-performance factors such as the reputation of the gymnast, the order in the line-up, the cheering of the audience, or compensation for a previous judging decision. By preparing for those situations, judges can counter the influence of non-performance factors on their judgments. Just as a high-performing athlete prepares for both the physical and mental aspects of a competition, a high-performing judge prepares not only for the technical aspects of judging but for the mental aspects as well.
The attitudes surrounding a 10.0 can vary widely. A 10.0 can be something avoided to escape controversy, or an achievement to be liberally bestowed upon those thought of as deserving. Historically, after the first 10.0 was awarded, the threshold was crossed and subsequent 10.0s became more frequent. We saw this phenomenon internationally after Nadia Comaneci received the first Olympic 10.0 in 1976. The floodgates were opened and 10.0s were awarded with increased frequency through the 1980s. The pendulum then swung in the other direction in international judging culture, with Execution scores steadily decreasing. Nowadays it is rare to see international Execution scores in the 9.0s, let alone a 10.0. There is no guarantee which direction a 10.0 culture would go in Men’s NCAA with the conversion, or whether the 10.0 would be something encouraged or avoided in the judging culture. On one hand, talk about the conversion shows an intention to push the 10.0 towards something to be awarded with some frequency, similar to how it is regarded in Women’s NCAA. On the other hand, the common reference to it as a “perfect 10” brings forth a whole different set of psychological principles, possibly positioning it as a holy grail that is incredibly difficult to attain.
The way the conversion is currently structured, any routine that scores a 15.0 or higher will be awarded a 10.0. Routines that are clearly in the upper 15.0 range (or higher) will undoubtedly receive scores that convert to 10.0s; it is the routines that fall right at or around the 15.0 mark, with borderline decisions that need to be made, which will be the most interesting to observe from a psychological standpoint. If the judge is aware that 15.0 is the threshold score for a 10.0, what would otherwise be normal decisions become decisions about a threshold for perfection, adding mental pressure for routines in that range. The most difficult situation will be a routine with obvious flaws that has the potential to score a 15.0. If a 15.0 will be presented to the audience as a perfect 10.0, how can a judge allow a routine with an obvious error to receive a perfect 10.0? To award such a routine a perfect 10.0 would give the appearance that the judge was either incompetent or corrupt. It is easy to imagine how these circumstances could sway a judge towards heavier deductions on a borderline error. With a slight tilt of the head, a 45° error can be seen as 46°, warranting a much harsher evaluation.
The above bias is not the only one that could plausibly occur. Some other specific cognitive biases have an increased potential to come into play when judging in the 10.0 system. (Think of how many of these you can easily observe in Women’s NCAA judging.)
Difficulty bias: A difficulty bias occurs when a more difficult routine is judged more favorably than an easier routine and judges overlook errors due to the increased difficulty of the skills being performed. In recognition of the limits inherent in the 10.0 system, the 1964 MAG Code of Points stated: “Series of value presenting great risks or marked originality will be judged more favorably, in the matter of small faults in execution than those lacking originality, risk and value”. We see this in Women’s NCAA judging when a more difficult routine is more likely to be awarded a 10.0 than one which barely meets the 10.0 Start Value requirements. Since the 10.0 scoring system does not adequately differentiate to reward difficulty, the judges sometimes take it upon themselves to do so. The 10.0 conversion has a built-in difficulty bias.
Reputation bias: A reputation bias exists in the open-ended scoring system too, but it is more pronounced in the 10.0 scoring system because there is a psychological barrier surrounding the 10.0. Again, look at the Women’s NCAA judging. The judges hesitate to award a gymnast (or school) their first 10.0 for a routine, but after a gymnast has received one 10.0 for a routine it becomes psychologically easier for judges to award that gymnast 10.0s for subsequent routines after the barrier has been torn down.
Sequential order bias: Sequential (within team) order bias becomes more pronounced under the 10.0 system. Judges’ scores get “boxed in” and reach the upper limit more quickly in the 10.0 system. There is less wiggle room, which leads to a greater tendency for judges to score the first routine more harshly to leave room for a potentially better routine later. How many times do you see the first Women’s NCAA routine in a lineup get a 10.0? The first routine is sometimes even called the sacrificial lamb because it is understood to be less likely to get a 10.0 than if it were to appear later in the line-up. Under the 10.0 conversion system, I could imagine the judges might be tempted to “find” an extra deduction that could keep a borderline first routine below the 10.0 conversion threshold.
The Minnesota Flashback Friday meet did not have any gymnasts with scores that were borderline or crossed the 15.0/10.0 threshold. Penn State, however, has a gymnast who certainly can. Stephen Nedoroscik’s Pommel Horse routine has already scored well into the 15 range several times this season, hitting 15.35, 15.5, and even as high as 15.9. It is a pity that under the 10.0 conversion system there is no differentiation between these routines; they would have all been 10.0s. One fan on Facebook mentioned how exciting it was to watch his score increase throughout the season.
All of that excitement would have been lost in the 10.0 conversion. But more importantly, Nedoroscik can perform a routine with a 6.8 Difficulty. That gives him a 1.8 cushion on his E score to still get a 15.0 final score, which would convert to a 10.0. Hypothetically, Nedoroscik could fall and still get a 10.0! That would be a difficulty bias taken to the utmost extreme! But it wouldn’t be a result of the judges being biased, it would be a bias that would be imposed upon their scores by the conversion itself.
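The cushion arithmetic above can be checked with a minimal sketch. It assumes only what the article states (final score = D + E, and a final score of 15.0 or higher converts to a 10.0). The function name `execution_cushion` and the 1.0 fall deduction are assumptions added for illustration; the article does not state the size of a fall deduction.

```python
from decimal import Decimal

THRESHOLD = Decimal("15.0")  # final score that converts to a 10.0, per the article
MAX_E = Decimal("10.0")      # maximum possible Execution score
FALL = Decimal("1.0")        # assumed fall deduction (not stated in the article)


def execution_cushion(d_score: Decimal) -> Decimal:
    """Execution deductions a routine can absorb and still total THRESHOLD.

    A negative result means the routine cannot reach THRESHOLD even with
    a flawless Execution score.
    """
    return d_score + MAX_E - THRESHOLD


cushion = execution_cushion(Decimal("6.8"))
print(cushion)          # 1.8
print(cushion >= FALL)  # True: a fall alone would not drop the routine below 15.0
```

Decimal values are used here so that tenths add up exactly; ordinary floating-point arithmetic can introduce small rounding artifacts in sums like these.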
Perhaps the worst hypothetical scenario would be if a routine actually earns a perfect 10.0 Execution score. If the Difficulty score is less than a 5.0, then that amazing, flawless routine would be converted to less than a 10.0. That monumental 10.0 E score would be entirely lost in the conversion - talk about a travesty! Which brings me to the next point.
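As a rough illustration of this scenario, the sketch below compares a flawless routine with low difficulty against a flawed routine with high difficulty. It relies only on the 15.0 threshold described in the article. The helper name `converts_to_ten` and the 4.9 Difficulty score are hypothetical, chosen for illustration.

```python
from decimal import Decimal

THRESHOLD = Decimal("15.0")  # final score that converts to a 10.0, per the article


def converts_to_ten(d_score: Decimal, e_score: Decimal) -> bool:
    """Return True if D + E reaches the threshold for a converted 10.0."""
    return d_score + e_score >= THRESHOLD


# Hypothetical flawless routine with a Difficulty score just under 5.0
print(converts_to_ten(Decimal("4.9"), Decimal("10.0")))  # False

# A 6.8 Difficulty routine that gives up 1.8 in Execution deductions
print(converts_to_ten(Decimal("6.8"), Decimal("8.2")))   # True
```

The flawless routine misses the converted 10.0 while the routine with 1.8 in deductions receives it, which is the mismatch between "10.0" and "perfection" that the conversion creates.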
Any final score, whether it is a FIG score or a score out of 10.0 via the proposed conversion system, lacks meaning without the context of a D/E breakdown. The 10.0 conversion system was developed, in part, to appease a segment of the fan base that longs to give Men’s NCAA gymnasts the opportunity to score a “perfect” 10.0. But the 10.0 conversion scale has nothing to do with a 10.0 equaling perfection. Without the context of the D/E scores, a converted 9.9 tells you no more than a 14.6 does about how close a routine was to being perfect. In fact, it does the opposite and gives a false impression of what perfection is.
As a judge, I focus on D and E scores, and rarely even look at final scores. What they add up to is irrelevant when looking at the quality of the routine. The D and E scores tell me all of the information I need. I realize most other people want to know how the final score ultimately compares to the others competing, since that is what determines who wins. But because I strive to remain neutral as a judge, I don’t care who wins. However, I do care about the evaluation of how difficult a routine was and how close it came to perfection. The converted 10.0 score tells me neither of these things.
If there is any change to be made in how scores are presented, I would like to see the D/E breakdown displayed more prominently alongside the final scores. Most teams have the D/E breakdown on their in-house electronic score displays but fail to provide it in their social media coverage and post-competition write-ups. Often, the only way to locate the D/E breakdown is to hunt down the official meet score sheet PDF, which may or may not be posted on Road to Nationals or the team’s website. I found it frustrating (and others found it confusing) that Minnesota went to the effort of posting the 10.0 converted score alongside their final scores on social media (which some fans interpreted as the E score), but did not make the effort to post the D/E breakdown, which gives more meaningful information. By not readily providing the D/E breakdown, you are not only depriving the knowledgeable followers of the information they want, you are short-changing the casual fans of the information they need to evolve into educated fans and limiting the growth of the sport. Presenting the FIG score as D+E=Final Score is easily understood, accurately depicts how the score is formulated, furthers understanding of the judgment, and is recognized by gymnastics fans world-wide. A convoluted conversion to a 10.0 scale, in which the 10.0 does not represent perfection, adds an unnecessary complication for those trying to understand the scoring.
Supporters of the 10.0 conversion system cite the excitement that scores of 10.0 bring to the Women’s NCAA competitions as a rationale for the use of a 10.0 conversion in Men’s NCAA. However, they fail to recognize the judging bias issues which are notoriously present in Women’s NCAA judging that are propagated in part by the existence of the 10.0. Measures such as dividing D & E panel duties, alternating gymnasts competing from different teams, and the use of the open-ended scoring system are all systems that have been employed at FIG competitions to reduce judging bias. The Women’s NCAA does none of these and has a difficult time countering the biased judging such systems could help to reduce. Over the past 3 seasons, the NGJA has made a concerted effort to bring Men’s NCAA judging more in line with FIG standards. It is worthwhile to consider the potential unraveling of these efforts which would need to be countered if a 10.0 conversion were to be instated for Men’s NCAA scoring. The Men’s NCAA community needs to contemplate if they would like to institute a conversion system that could draw Men’s NCAA judging closer to the Women’s NCAA norms. Judges need to be responsible and strive to judge with as little bias as is humanly possible, but it should be realized that poorly thought-out rules and competition structures can be counter-productive to judges attaining that goal.
Don’t get me wrong, using the 10.0 conversion as a gimmick for a themed competition such as Minnesota’s Flashback Friday competition is a great idea. It has the potential to be a fun promotional tool. But to institute it as a new universal scoring scale for Men’s NCAA would bring more complications than the problems its creators seek to solve. I am all for innovative change to increase the excitement, understandability, and entertainment value of Men’s NCAA gymnastics, but there are other competition structures that could be employed to achieve those goals while reducing, rather than increasing, the potential for judging bias.
Kathi-Sue Rupp is a FIG Brevet Judge (MAG). She holds an MS in Sports Psychology with an emphasis on Officiating Psychology.