Wordle Solver

This is a simple Wordle solver written in Julia. The code is in the notebook wordle_solver.ipynb, which contains all the necessary instructions. For more information about how the solver works, keep reading!

About the strategy

Wordle uses two word lists.

  • solutions.txt is the set of words that might appear as solutions to the puzzle. This list contains 2315 words.
  • nonsolutions.txt is the set of words that can be used as guesses but will never appear as solutions. This list contains 10657 words.

I extracted these word lists directly from the Wordle source code (see this article for details). Every time we guess a word, we learn which of its letters appear in the solution and whether they are in the correct position, similar to the game Mastermind but with words instead of colors. This feedback narrows down the list of possible solutions. Of course, we want to narrow down the list as much as possible, but the amount of narrowing depends on the feedback we receive.

Example: Initially, there are 2315 possible solutions. Suppose we try "STUMP" as our first word. Here are some possibilities:

  • "00000": none of the letters were correct. This still gives us useful information, and we can narrow down the list of possible solutions to 730 words.
  • "10000": only the "S" is in the solution, but it is in the wrong spot. This narrows down the list of possible solutions to 87 words.
  • "10201": the "U" is in the correct spot, and the "S" and "P" belong to the solution but are in the wrong spot. This narrows down our solution to only two possible words ("PAUSE" and "PLUSH").
  • "12000": the "T" is in the correct spot, and the "S" belongs to the solution but is in the wrong spot. Although this seems less restrictive than the previous case, there is only a single word that fits this description! ("ETHOS")

So whenever we pick a word, there is a distribution of possible outcomes. Each outcome (e.g. "10201") has an associated number of remaining possible solutions. If we get lucky, that number will be small. In the case of "STUMP", the worst possible case is that we strike out and are still left with 730 possible solutions.
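To make this concrete, here is a minimal Julia sketch of how such a feedback string can be computed. This is hypothetical code, not the notebook's measure_word function; it uses the two-pass rule discussed in the comments below, so repeated letters are scored the way the real game scores them.

    # Score a guess against a solution: '2' = right letter, right spot;
    # '1' = right letter, wrong spot; '0' = miss. Greens are assigned
    # first, then yellows consume the remaining letter counts, so a
    # repeated guess letter is only credited as often as it appears.
    function score(guess::String, solution::String)
        g, s = collect(guess), collect(solution)
        result = fill('0', length(g))
        counts = Dict{Char,Int}()
        for i in eachindex(g)
            if g[i] == s[i]
                result[i] = '2'
            else
                counts[s[i]] = get(counts, s[i], 0) + 1
            end
        end
        for i in eachindex(g)
            if result[i] == '0' && get(counts, g[i], 0) > 0
                result[i] = '1'
                counts[g[i]] -= 1
            end
        end
        return String(result)
    end

    score("STUMP", "PAUSE")  # returns "10201", as in the example above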

Max-size prioritization

One possible strategy is to consider this "worst possible case" and pick the word whose worst case leaves the smallest list of possible solutions. Doing this leads to a first move of "ARISE" or "RAISE". With either of these first moves, the worst possible outcome is again that we strike out, except striking out now eliminates as many words as possible (mostly because R, S, A, I, E are very common letters). Starting with "ARISE" or "RAISE", we are guaranteed to reduce the list of possible solutions to no more than 168.
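As a rough sketch (reusing the hypothetical score function above, not the notebook's actual find_move / get_group_sizes code), the heuristic might look like this:

    # For each candidate guess, bucket the remaining solutions by the
    # feedback pattern they would produce and record the largest bucket;
    # the best guess is the one whose largest bucket is smallest.
    function worst_case_size(guess::String, pool::Vector{String})
        sizes = Dict{String,Int}()
        for solution in pool
            pat = score(guess, solution)
            sizes[pat] = get(sizes, pat, 0) + 1
        end
        return maximum(values(sizes))
    end

    best_max_size_guess(guesses, pool) = argmin(g -> worst_case_size(g, pool), guesses)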

Max-entropy prioritization

Another possible approach is to view the distribution over outcomes as a probability mass function (PMF) and to pick the word that leads to a PMF with maximum entropy. This biases the choice towards distributions that are closer to being uniform. By making the distribution close to uniform, we are ensuring that all possible outcomes are similarly bad (i.e. none of them are too bad). The first move choice with the largest entropy is either "RAISE" or "SOARE", depending on whether you only use words in solutions.txt or allow all possible words, respectively.
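The same bucketing gives a hypothetical sketch of this heuristic too; only the figure of merit changes:

    # Treat the normalized bucket sizes as a PMF over feedback patterns
    # and compute its Shannon entropy; the best guess maximizes it.
    function pattern_entropy(guess::String, pool::Vector{String})
        sizes = Dict{String,Int}()
        for solution in pool
            pat = score(guess, solution)
            sizes[pat] = get(sizes, pat, 0) + 1
        end
        n = length(pool)
        return -sum((c / n) * log2(c / n) for c in values(sizes))
    end

    best_max_entropy_guess(guesses, pool) = argmax(g -> pattern_entropy(g, pool), guesses)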

How well do these strategies work?

If we prioritize Max-size and use Max-entropy as a tiebreaker, we obtain the following histograms of results, depending on whether we only use guesses from solutions.txt or allow all admissible words as valid guesses. The code for generating these histograms is in performance.ipynb.

[Histograms of guesses required: "using only solution words as guesses" (left), "using any guess" (right)]

As we can see, we always finish in at most 5 moves, no matter what the unknown word is. If instead we prioritize max-entropy, we obtain slightly different results, shown below.

[Histograms of guesses required: "using only solution words as guesses" (left), "using any guess" (right)]

When prioritizing entropy, we get better average performance, but the worst-case performance is worse. Specifically, prioritizing entropy leads to a worst-case that may take 6 turns, although this is quite rare. Here is a summary of the four different cases considered above:

| Guesses allowed | First guess | Heuristic used | Average guesses | # > 4 guesses |
| ------------------------ | ----------- | -------------- | --------------- | ------------- |
| Only from solutions list | "RAISE" | Max-size | 3.551 | 101 |
| Only from solutions list | "RAISE" | Max-entropy | 3.495 | 93* |
| All 5-letter words | "RAISE" | Max-size | 3.521 | 73 |
| All 5-letter words | "SOARE" | Max-entropy | 3.463 | 62* |

When using the Max-entropy heuristic (*), the worst case takes 6 turns rather than 5.

Adversarial Wordle

Since the strategies above plan against all remaining valid words, they are just as effective if the hidden word is chosen adversarially. That is, the hidden word can change to escape detection, as long as it remains consistent with the feedback from past guesses. Such "evil" variants exist, for example Absurdle. If we try the first heuristic from the table above (Max-size, using only guesses from the solutions list), we can solve Absurdle in 5 moves, as expected:

[Screenshot: Absurdle solved in 5 moves]
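For intuition, here is a hypothetical sketch of the adversary's side (not Absurdle's actual code): it answers each guess with the feedback pattern whose bucket of still-consistent solutions is largest.

    # An Absurdle-style adversary never commits to a solution; it reports
    # the score that keeps the largest set of candidates alive.
    function adversary_response(guess::String, pool::Vector{String})
        buckets = Dict{String,Vector{String}}()
        for solution in pool
            push!(get!(buckets, score(guess, solution), String[]), solution)
        end
        pat = argmax(p -> length(buckets[p]), collect(keys(buckets)))
        return pat, buckets[pat]  # the reported score and the surviving pool
    end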

How well can we hope to do?

The heuristics presented above are fundamentally greedy; we only look one move ahead when making each decision. For example, reducing the candidate word list as much as possible at every turn does not guarantee that the resulting smaller set of words will be easy to reduce in subsequent turns. I suspect there exist better strategies that achieve, e.g., a smaller expected number of moves or a smaller probability of needing 5 moves.

It would be interesting to see if a more complicated strategy is able to guarantee a solution in 4 turns!

Comments

  • Factor out common functions in notebooks + Perf Improvements + max-splits heuristic

    Jan 12, 2022

    • We factor out common functions in performance.ipynb and wordle_solver.ipynb into a single utils file. This will make code reviews in the future simpler, since diffs of .jl files display better than those for .ipynb files.
    • Significantly improve the performance of the code in performance.ipynb by using a 2d-array to store the precomputed scores rather than a dictionary, and by making the score a UInt8, which takes up less space.
      • w/o hard mode, computing using all words goes from 900s -> 200s
      • Implementation:
        • get_group_sizes, find_move, trim_pool, apply_strategy and get_num_turns make use of type polymorphism to handle either an Int (corresponding to the index of the word in the list of all words) or a String (the word itself).
        • In addition, to get the score to fit in a UInt8, we treat the scores as ternary (e.g. 22222 in base 3 is 242 in base 10); see the snippet after this list.
    • Add charts for max-splits heuristic + hard mode with all guesses.
      • Max-splits seems to perform consistently best on every metric for hard mode.
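    A hypothetical snippet (not the repo's code) illustrating the packing:

      # A five-digit ternary score has at most 3^5 = 243 values, so it
      # fits in a UInt8 (0..255). Julia's parse accepts a base keyword:
      encode(pattern::String) = parse(UInt8, pattern, base=3)

      encode("22222")  # 0xf2 == 242
      encode("10201")  # 0x64 == 100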

    NOTE: You'll see that there are some differences in our histograms. I haven't been able to dig into exactly what the issues are, but I suspect it has to do with the change to tiebreaks (from lexical order on strings to the position of the words in the guessing list).

  • Fix typo: acheive -> achieve

    Jan 9, 2022

  • Code to generate plots?

    Jan 9, 2022

    Would you share the code that you used to generate the plots? I wanted to try out different optimizations that might improve the average guesses required.

  • possible bug in measure_word function

    Jan 10, 2022

    @vtjeng I think I might have made an error in measure_word. I don't think it handles all cases of repeated letters correctly. For example, if the correct word is "STAGE" and you guess "SPASM", the code returns "20210". But the actual Wordle game would return "20200", which signals that there is only one "S" in the solution. Such scenarios don't come up very often... but if we are making histograms over all possible outcomes, I think it will definitely affect the results.

    The best solution here would be to dig into the Wordle source code to figure out exactly how word scoring is handled, but that might take some time.

    Edit: I posted this as a question on Twitter and got this response: https://twitter.com/LaurentLessard/status/1480609143027388420. So it seems that measure_word does indeed need to be updated.
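    For reference, a two-pass rule like the hypothetical score sketch in the strategy section above (greens first, then yellows consuming the leftover letter counts) reproduces the game's behavior on this example:

      score("SPASM", "STAGE")  # returns "20200": only the green "S" is credited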

  • Add option to prioritize entropy rather than worst-case candidate count.

    Jan 10, 2022

    We introduce a heuristic where we prioritize maximizing entropy ahead of minimizing the worst-case number of remaining candidates (a.k.a. the "maximum group size").

    Performance comparison:

    | Guesses allowed | First guess | Heuristic used | Average Guesses | # > 4 guesses |
    | ------------------------ | ----------- | ------------------------- | --------------- | ------------- |
    | Only from solutions list | raise | Prioritize entropy | 3.501 | 89 |
    | Only from solutions list | raise | Prioritize max group size | 3.550 | 111 |
    | All 5-letter words | soare | Prioritize entropy | 3.472 | 65* |
    | All 5-letter words | raise | Prioritize max group size | 3.522 | 73 |

    * One of the words takes 6 guesses. The first guess in this case is "soare". Hard-coding it to "raise" gives slightly worse performance, and we still end up with one word that required 6 guesses.

    Different heuristics can be selected by modifying the heuristic parameter in get_move. We change the default behavior to maximize entropy, since this leads to the best performance.

    Additional Changes

    • Quicker Iteration: Improve performance (e.g. computing cached wordscores takes ~85s -> ~35s; computing the best starting word takes ~20s -> ~2s; overall runtime against largelist ~1300s -> ~900s), via:
      • measure_word: use an integer rather than a vector of strings that needs to be joined.
      • get_groups -> get_group_sizes: we do not use the words in the group, so we can just update the size of each group.
      • find_move: compute grplist and entrlist (renamed maximum_group_size and entropy) via a map, allowing preallocation of a typed vector.
    • Easier Monitoring: Add progress bars (via @showprogress) to long-lived operations
    • Less Repetition: Wrapped code in functions for:
      • parse_word_list: Parsing word list
      • get_num_turns: Computing the number of required turns
      • plot_num_turns: Plotting number of turns

    Note that the results with "any guess allowed" are slightly better in my implementation, with one more word requiring only 3 guesses and one fewer word requiring 5. I haven't looked into this deeply, but I am guessing it is caused by a subtle change to the tiebreaking rules when solution scores are tied in find_move (we always pick the lexicographically first word in the new implementation, while the previous implementation depended on the sort order).

  • Is this the correct wordlist?

    Jan 9, 2022

    The repo uses a "5 letter wordlist" with 2315 words. That seems rather small to me. Using the SOWPODS dictionary (or something very close to it) that I have on my local system, I find:

    % word '.*' | wc -l
    267751
    % word '.....' | wc -l
    12478
    

    I don't know if the Wordle authors have announced what wordlist they use, but mine is apparently about 5x as large as Laurent Lessard's. (word is a small script on my system that basically just wraps egrep ^$1$).

    Exploring further with my larger wordlist:

    % word '.....' | histogram
    6441 e
    6427 s
    5738 a
    4248 o
    4027 r
    3595 i
    3269 l
    3206 t
    2854 n
    2399 u
    2364 d
    1992 y
    1947 c
    1945 p
    1908 m
    1682 h
    1576 g
    1547 b
    1426 k
    1071 f
    1013 w
    661 v
    412 z
    270 j
    268 x
    104 q
    

    This probably suggests AROSE as a good starting point. But it seems probable that also considering digrams among 5-letter words might affect the optimal strategy.
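    A hypothetical Julia version of the letter histogram above (run against the repo's solutions.txt rather than the larger local word list used here):

      # Count total letter occurrences across all words, then print the
      # letters from most to least frequent.
      words = readlines("solutions.txt")
      freq = Dict{Char,Int}()
      for w in words, c in lowercase(w)
          freq[c] = get(freq, c, 0) + 1
      end
      for (c, n) in sort(collect(freq); by = last, rev = true)
          println(n, " ", c)
      end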
