Leave it up to an expert on randomness and a statistician turned winemaker to make an astonishing case that wine ratings are inconsistent and unreliable. The debate on the value of ratings has been legitimately extended by a couple of primary and secondary research studies presented by Robert Hodgson in the Journal of Wine Economics. Hodgson’s work, along with conversations with a few high-profile wine critics and winemakers, was nicely organized in a recent Wall Street Journal article by Leonard Mlodinow entitled “A Hint of Hype, A Taste of Illusion.” And this time, the maligning of rating reliability is not aimed at any one system or critic, but more broadly at the fundamental inconsistency and susceptibility of the human brain when it comes to sensory interpretation.
Hodgson’s research provides a degree of satisfaction and validation that my personal sensibility for ignoring marginal point differences in wine reviews has not left anything on the table. I have attempted to uphold a pattern for discussing wines in this blog and elsewhere that sidesteps numbers and avoids drawing concrete lines in the sand. Without self-contradiction, I also respect the 100-point scale and other numerical systems that other reviewers are comfortable with, because they provide a general indication of how much a reviewer likes or dislikes a wine. I look at a 96 and a 92 point rating knowing anyone could appreciate either one over the other.
Hodgson ran a conclusive experiment over four different years with panels of 70 judges from the California State Fair Wine Competition. He served them 100 wines over a few-day period, employing the same blind-tasting rigors they are subject to in the actual competition. But in his study, every wine was presented to each judge three different times from the same bottle, to be judged and awarded point scores. The findings are profound, but not surprising:
The judges’ wine ratings typically varied by ±4 points on a standard ratings scale running from 80 to 100. A wine rated 91 on one tasting would often be rated an 87 or 95 on the next. Some of the judges did much worse, and only about one in 10 regularly rated the same wine within a range of ±2 points. …..the judges whose ratings were most consistent in any given year landed in the middle of the pack in other years, suggesting that their consistent performance that year had simply been due to chance.
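That last line deserves a moment. To convince myself that an apparently consistent judge really can be nothing more than luck, I put together a quick toy simulation (my own made-up numbers, not Hodgson’s data or method): give every judge identical scoring noise of a few points, run the triplicate tasting four “years” in a row, and see who looks most consistent each time.

```python
import random

random.seed(1)

NUM_JUDGES = 70   # about the size of the judging pool described above
NUM_WINES = 100   # wines per "year", each poured three times from the same bottle
NOISE_SD = 2.5    # assumed per-pour scoring noise; yields spreads of roughly +/- 4 points

def average_spread():
    """Average (max - min) across the three pours of each wine, for one judge."""
    spreads = []
    for _ in range(NUM_WINES):
        true_score = random.uniform(84, 96)  # the wine's "real" quality
        pours = [true_score + random.gauss(0, NOISE_SD) for _ in range(3)]
        spreads.append(max(pours) - min(pours))
    return sum(spreads) / len(spreads)

for year in range(1, 5):
    spreads = {j: average_spread() for j in range(1, NUM_JUDGES + 1)}
    best = min(spreads, key=spreads.get)
    print(f"Year {year}: most consistent was judge #{best} "
          f"(average spread {spreads[best]:.1f} points)")
```

Every simulated judge here is built from exactly the same noise, yet someone tops the consistency list each year, and it is rarely the same someone twice. That is precisely the pattern Hodgson reported: one good year says almost nothing about the next.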
It is my personal experience that the same bottle of wine will often taste different even within reasonably tight windows of time. Open versus blind tastings of the same wine will produce different results, and side-by-side tasting with other peer-group wines can also alter perception. Drinking the same wine in different moods, at different times of the day, with varied aeration periods, and in different quantities and environments all changes interpretations and conclusions about quality.
Hodgson went further and studied the track records of specific wines after they were submitted for judging across several competitions. His study showed that a wine’s chances of winning a gold medal are statistically indistinguishable from random chance:
…..he made a bar graph of the number of wines winning 0, 1, 2, etc. gold medals in those competitions. The graph was nearly identical to the one you’d get if you simply made five flips of a coin weighted to land on heads with a probability of 9%. The distribution of medals, he wrote, “mirrors what might be expected should a gold medal be awarded by chance alone.”
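The weighted-coin comparison is just the binomial distribution in disguise. Under the assumptions implied by the quote (each wine enters roughly five competitions and has about a 9% chance of gold in each, independent of merit), the back-of-the-envelope version looks like this:

```python
from math import comb

# Chance-alone model implied by the quote: ~5 competitions per wine,
# ~9% probability of a gold medal in each, independent of the wine itself.
n, p = 5, 0.09

for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p)**(n - k)  # binomial probability of k golds
    print(f"{k} gold medal(s): expected for {prob:.1%} of wines by chance alone")
```

The sketch prints roughly 62% of wines with no gold, 31% with one, 6% with two, and a rapidly vanishing tail. Hodgson’s point was that the real-world medal counts hugged a curve like this one, which is hard to square with the idea that a gold medal reliably marks a superior wine.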
I am flabbergasted by the specificity and range of flavors promulgated by the umpteen thousands of reviewers, professional and otherwise, offering their opinions about specific wine characteristics in print, online, and in person. Not meaning to single out this recent example review by a reliably experienced and respected wine writer and founder of Palate Press, David Honig, I reference it only as representative of a widely embraced genre of reviews. While it most surely was Honig’s honest experience, I (1) can almost guarantee these will not be my specific flavor perceptions and (2) get the feeling it leans toward “base covering” for the multiple impressions that came and went during the tasting experience:
This comes at you [in] waves of flavor, starting with blackberries, coffee and plums. Fruits sweeten on the mid-palate, adding some blueberry to the blackberry. The espresso changes to unsweetened cocoa. Leather shows up at the end of the mid-palate and lingers with black fruit on the finish.
A favored review style, presented by one wine writer, The Brooklyn Wine Guy, leans toward broad sensory reactions combined with a dominant flavor characteristic or two, weaving in the context of his tasting experience to transmit a conclusion that is easier to embrace. Here is one example review:
Levi and I both had this as 1st choice during the tasting. I thought it clearly stood out above the rest – it was completely harmonious, subtly quite intense, and very beautiful. The nose was spicy with pomegranate fruit, very elegant, there was good acidity, and great length – the floral finish really lingered in my nostrils. The funny thing is, everyone agreed that this wine fell off over the course of the evening, and was perhaps overshadowed rather than enhanced by our dinner (biryani-style rice with beef, watermelon radishes, green salad).
Mlodinow references a 1996 study in the Journal of Experimental Psychology that predates Hodgson’s work, and offers up a current example that suggests ignoring the very specific flavor-nuance claims of enthusiasts and professional critics alike:
….. a 1996 study in the Journal of Experimental Psychology showed that even flavor-trained professionals cannot reliably identify more than three or four components in a mixture, although wine critics regularly report tasting six or more. There are eight in this description, from The Wine News, as quoted on wine.com, of a Silverado Limited Reserve Cabernet Sauvignon 2005 that sells for more than $100 a bottle: “Dusty, chalky scents followed by mint, plum, tobacco and leather. Tasty cherry with smoky oak accents…” Another publication, The Wine Advocate, describes a wine as having “promising aromas of lavender, roasted herbs, blueberries, and black currants.” What is striking about this pair of descriptions is that, although they are very different, they are descriptions of the same Cabernet. One taster lists eight flavors and scents, the other four, and not one of them coincide.
So much of the criticism on this subject to date has been directed at one critic or another’s track record, or at performance measured against self-proclaimed superiority. This was in evidence in Tyler Colman’s recent post about the Robert Parker-led 2005 Executive Wine Seminar tasting, which Mlodinow also references in his WSJ piece and which I wrote about here in a post entitled “A Roadside Bomb.” Even the most heralded wine critic of our time will show point variation and wide swings in perception when subjected to blind tastings of the same wines multiple times.
The Mlodinow piece was refreshing for not picking on one system over another and for not maligning specific critics or the commercial realities inherent in reviewing and selling wine. Instead, it helped me and others feel good about the favored strategy of connecting with educated wine friends, sellers, and critics who align with our personal style preferences, and tasting what they recommend. So, it’s safe to go back in the water: taste everything you can, make your own decisions about what you enjoy, and stay clear of the rating/review mousetrap as a defense against the lurking influences pushing you to drink something you simply don’t like or understand.