Just exactly how many tries should one expect to solve one Wordle problem?

TL;DR: I actually don't know the exact number but I'm fairly certain that it takes roughly 4.18 tries.

NOTE: I base my modelling on the game itself with no other information - no source code peeking, no asking its devs, no nothing.

If you don't know Wordle is an online game where you guess a five-letter word. The biggest appeal is obviously the fact that it's daily - one problem per 24 hours. One can easily make an "unlimited" edition but I can clearly see how the fun collapses about such a simple game when presented without a restriction so I'd rather save my time for something else.

I myself have been participating the game for days along side with a helper script containing 8000+ words that allows me to search through words easily. One day I looked at the 3/6 score I just got and wondered what is the theoretical average score; but the math is rather tricky here because:

If guessed properly, each guess introduces new information.
The exact probability is tied to the dictionary Wordle is using.

so I decided to Monte Carlo like every good (read: lazy) programmer does.

Finding a good dictionary

The dictionary I used for my helper script is for Scrabble which contains lots of rarely used words, or at least words that definitely isn't in Wordle's dictionary (e.g. Italy; I've tried that one before), so to get an (at least more) accurate result I have to find another one. I was surprised you can actually find one on Project Gutenberg, but it's too old to be of any use, So I used English Lexicon Project to get a (hopefully complete enough) list of words.

# requires python 3.6+
with open('Items.csv', 'r') as f:
    s = f.read()

is_word = lambda x: all('a' <= i <= 'z' for i in x)
# because of the format of the csv file i can do this little shortcut...
word_list = [eval(f'[{i}]') for i in s.strip().split('\n')]
word_list = [i[0] for i in word_list if len(i[0]) == 5 and is_word(i[0])]

This produces a list of 4612 words.

Modelling the guessing process

The guessing process is as follows:

Randomly select a word as the target.
Randomly select another word to check with the word selected in step 1.
Filter the word list according to the result of the check.
Repeat step 1~3 until I have an exact match or I run out of attempts.

Randomly selecting a word

This part is easy:

import math
import random

# `random.random` generates a number in the interval [0, 1)
select_word = lambda ss: ss[math.floor(random.random() * len(ss))]

Check a word against another word & filtering the word list

def check_word(word1, word2, word_list):
    '''
    Returns `True` if `word1` and `word2` is an exact match; otherwise
    returns the filtered `word_list`.
    '''
    if word1 == word2: return True

    for i, ch in enumerate(word1):
        if ch not in word2:
            # the current character does not exist in word2
            word_list = [w for w in word_list if ch not in w]
        elif word1[i] == word2[i]:
            # the current character exists in the same place in word2
            word_list = [w for w in word_list if w[i] == ch]
        else:
            # the current character exists in word2 but it's not in the same place
            word_list = [w for w in word_list if ch in w and w[i] != ch]
    return word_list

Repeat until having exact match

def play(word_list):
    '''
    Simulates a game. Returns the number of attemps.
    '''
    target = select_word(word_list)
    current_attempt = 0
    while True:
        check_res = check_word(select_word(word_list), target, word_list)
        current_attempt += 1
        if check_res is True:
            return current_attempt
        else:
            word_list = check_res
    return False

Note that in Wordle you can at most try 6 times but I didn't put the restriction here; this is for easy calculation in further steps.

The simulation

def simulate(n, word_list):
    '''
    Simulate `n` games with the list of words `word_list`. Returns a dict of
    number of guesses. For example a probable result could be like this:

        >>> simulate(100, ss)
        {3: 23, 4: 32, 5: 28, 6: 12, 7: 4, 2: 1}

    This means in the 100 simulated games, 23 of them ended with 3 guesses, 32
    of them ended with 4 guesses, etc..
    '''
    counter = {}
    for _ in range(n):
        play_res = play(word_list)
        if not counter.get(play_res):
            counter[play_res] = 0
        counter[play_res] += 1
    return counter

def calculate(n, word_list):
    '''
    Calculates the average of # of guesses in `n` games.
    '''
    simulate_result = simulate(n, word_list)
    result = 0
    for i in simulate_result.keys():
        n_of_games = simulate_result[i]
        result += i * (n_of_games / n)
    return result

And finally:

>>> r = [calculate(10000, ss) for _ in range(10)]
>>> r
[4.5733, 4.582299999999999, 4.589299999999998, 4.5735, 4.58, 4.5895, 4.5582, 4.5870999999999995, 4.5811, 4.571999999999999]
>>> sum(r) / len(r)
4.5786299999999995

The result seems to be roughly 4.579 tries.

Using a smaller dictionary

What if Wordle uses a much smaller dictionary, like instead of 4600+ words it's ~2500 words? Would the result be then different?

Good thing the data from English Lexicon Project comes with word frequencies so we can select the most used 2500 words:

word_list = [eval(f'[{i}]') for i in s.strip().split('\n')]
# change starts from this line:
word_list = [i for i in word_list if len(i[0]) == 5 and is_word(i[0])]
word_list = [i[0] for i in sorted(word_list, key=lambda i: i[3])[-2500:]]

>>> r = [calculate(10000, ss) for _ in range(10)]
>>> r
[4.1902, 4.1827000000000005, 4.1943, 4.1728000000000005, 4.197099999999999, 4.1675, 4.197900000000001, 4.1916, 4.1792, 4.1639]
>>> sum(r) / len(r)
4.18372

Now it's roughly 4.18 tries. This fits with our intuition: the lesser the choice the more easier it is to get the right answer.

2022.1.13