Using Data to Your Advantage: Winning at Wordle

Written by Benjamin Simon

One of the most popular trends of the first half of 2022 has been the rise of Wordle. Created in 2021 by Josh Wardle, the word-guessing game now has hundreds of thousands, even millions, of daily users. Wordle’s rapid increase in popularity can be attributed to how easy it is to share results with others through text or social media. The ability to share the Wordle experience brings users to the site and encourages competition, which in turn motivates people to apply strategy to their game. As social scientists, we can use data to our advantage when playing Wordle to improve our outcomes. Benjamin guides us through how we can leverage data science concepts to become better players and get the most out of every turn.

 

Wordle gives players six guesses to enter the correct word of the day. In each guess, a letter entered in the correct spot will appear green; a letter that is in the word but is placed incorrectly will be yellow; and any letters that are not in the word remain grey. Opening strategies differ. One of the most common is starting with a four-vowel word, such as “ADIEU” or “AUDIO.” Some people choose the first word at random, and still others enter four words that together cover 20 unique letters of the English alphabet. To identify a handful of the most efficient words to try on the first guess, we can use letter frequency.
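The coloring rules above can be sketched in a few lines of Python. This is a minimal illustration, not the game's actual code: the function name `score_guess` and the G/Y/- output format are hypothetical, and the handling of repeated letters (a letter is marked yellow only as many times as it remains unmatched in the answer) is an assumption that the text above does not spell out.

```python
from collections import Counter


def score_guess(guess: str, answer: str) -> str:
    """Return one character per letter: G (green), Y (yellow), - (grey)."""
    guess, answer = guess.upper(), answer.upper()
    result = ["-"] * len(guess)

    # Answer letters that were not matched in place stay available for yellows.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)

    # First pass: exact-position matches are green.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"

    # Second pass: misplaced letters are yellow while unmatched copies remain
    # (assumed rule for repeated letters).
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g != a and remaining[g] > 0:
            result[i] = "Y"
            remaining[g] -= 1

    return "".join(result)


print(score_guess("ORATE", "STORY"))  # YY-Y-
```

Here “O,” “R,” and “T” are all in “STORY” but in different positions, so they come back yellow, while “A” and “E” stay grey.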

Wordle’s software contains a list of 2,315 words from which one is chosen at random each day. This means a few things for Wordle players. On one hand, the game currently has an expiration date in the summer of 2027. On the other hand, the list gives us a fixed set of 11,575 characters (2,315 words of five letters each) from which we can calculate the frequency of all 26 letters in the alphabet. To skew your guess distribution in your favor, the best strategy is to rely on the data.
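As a rough sketch of that calculation, the snippet below counts letters across a word list. The six words in `SAMPLE_WORDS` are a hypothetical stand-in for the real 2,315-word list, which is not reproduced here, so the counts it prints describe only the sample and not Wordle itself. The name `letter_frequency` is likewise just an illustrative choice.

```python
from collections import Counter

# Hypothetical sample; swap in the full answer list to get real frequencies.
SAMPLE_WORDS = ["orate", "story", "audio", "adieu", "crane", "jazzy"]

WORD_COUNT = 2315   # size of the answer list, as stated in the article
WORD_LENGTH = 5
TOTAL_CHARACTERS = WORD_COUNT * WORD_LENGTH  # 11,575 characters


def letter_frequency(words):
    """Count how often each letter appears across all words."""
    return Counter("".join(words).upper())


counts = letter_frequency(SAMPLE_WORDS)
total = sum(counts.values())

# Show the five most common letters in the sample with their share.
for letter, n in counts.most_common(5):
    print(f"{letter}: {n} ({n / total:.1%})")
```

Run against the full list, the same function would yield the letter rankings discussed in this article.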

In order, the five most common letters in the word set are “E,” “A,” “R,” “O,” and “T.” There are two five-letter words that contain each of those letters: “ORATE” and “OATER.” Using overall letter frequency, those two words give the best chance of containing the correct characters. There is also a handful of letters to avoid. In descending order, each of these characters is found 40 or fewer times: “Z,” “X,” “Q,” and “J.” Consider the best case for “J”: even if all 27 “J’s” featured in the Wordle list were found in different words, less than 1.2% of Wordle words (27 out of 2,315) would contain the letter “J.” Steer clear of these four low-frequency letters!
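Both steps in this paragraph can be checked with a short sketch: filtering a word list for words that use all of the top five letters, and computing the upper bound on the share of words containing “J.” The helper names `words_covering` and `max_share` and the short `SAMPLE_WORDS` list are hypothetical; only the figures 27 and 2,315 come from the article.

```python
# Hypothetical sample; the real search would run over all five-letter words.
SAMPLE_WORDS = ["orate", "oater", "story", "audio", "crane", "tread"]


def words_covering(letters, words):
    """Return the words that contain every one of the given letters."""
    target = set(letters.upper())
    return [w for w in words if target <= set(w.upper())]


def max_share(letter_count, word_count):
    """Upper bound on the share of words containing a letter,
    assuming every occurrence falls in a different word."""
    return letter_count / word_count


print(words_covering("EAROT", SAMPLE_WORDS))  # ['orate', 'oater']
print(f"{max_share(27, 2315):.2%}")           # 1.17%
```

The bound is a ceiling: any word with two “J’s” would push the true share lower.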

Another way that data can inform your guess is through the frequency with which certain letters appear in certain positions in a word. For instance, just seven letters of the alphabet account for the first character in over half of the words on the Wordle list. The most common first letter is “S,” even though it is not one of the five most common characters overall. With this in mind, you can calculate how often each other letter appears in words that begin with a given letter. At the end of a word, “Y” is among the most commonly used letters, and you can run the same calculation for words that end in “Y.” The same analysis works for any letter in any position in a word, including words with duplicate letters.
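A minimal sketch of these positional calculations follows. The function names and the six-word `SAMPLE_WORDS` list are hypothetical, so the numbers it produces describe only the sample, not the real Wordle list.

```python
from collections import Counter

# Hypothetical sample; swap in the full answer list for real results.
SAMPLE_WORDS = ["story", "stony", "sassy", "crane", "orate", "party"]


def position_frequency(words, position):
    """Count which letters appear at one position (0 = first, -1 = last)."""
    return Counter(w[position].upper() for w in words)


def letters_needed_for_share(counts, share=0.5):
    """How many of the most common letters it takes to exceed `share`."""
    total = sum(counts.values())
    running = 0
    for i, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running / total > share:
            return i
    return len(counts)


def letters_given(words, position, letter):
    """Count the other letters in words with `letter` at `position`."""
    counts = Counter()
    for w in words:
        w = w.upper()
        idx = position % len(w)  # normalize negative positions
        if w[idx] == letter.upper():
            counts.update(w[:idx] + w[idx + 1:])
    return counts


first = position_frequency(SAMPLE_WORDS, 0)
print(first.most_common(1))                                 # [('S', 3)]
print(letters_needed_for_share(first))                      # 2
print(letters_given(SAMPLE_WORDS, -1, "Y").most_common(2))  # [('S', 5), ('T', 3)]
```

Because `letters_given` keeps repeated letters, a word such as “sassy” contributes each of its remaining “S’s” to the tally.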

So, how will you approach tomorrow’s word? Will you try “ORATE” to test the five most common letters? Another possibility is to use “STORY,” keeping in mind the likelihood of words starting with “S” or ending with “Y.” There is also nothing wrong with testing four vowels with one entry. Whether playing for fun or competitively, I urge you to give these strategies a try. Ultimately, whoever has the data has the upper hand.