
I have analysed commonly used start words for the game Wordle, which can be played here. The key concept in this analysis is that there is a fixed pool of possible solutions (1,735 at the time of writing, reducing by 1 each day). Each guess you make reduces the number of possible solutions, and once only 1 remains you have the answer.
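As a rough illustration of how a guess narrows the field, the sketch below scores a guess against a candidate answer and keeps only the candidates that would have produced the same colour pattern. The names `feedback` and `remaining` and the tiny word list are made up for this example, and the handling of repeated letters follows the usual Wordle rule as I understand it (each answer letter can colour at most one guess letter).

```python
from collections import Counter

def feedback(guess: str, answer: str) -> str:
    """Colour pattern for a guess: g = green, y = yellow, b = black."""
    result = ["b"] * len(guess)
    unmatched = Counter()  # answer letters not already used by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "g"
        else:
            unmatched[a] += 1
    # Second pass: a non-green letter turns yellow only while the answer
    # still has an unused copy of that letter.
    for i, g in enumerate(guess):
        if result[i] != "g" and unmatched[g] > 0:
            result[i] = "y"
            unmatched[g] -= 1
    return "".join(result)

def remaining(guess: str, pattern: str, candidates: list[str]) -> list[str]:
    """Candidates that would have produced `pattern` for `guess`."""
    return [w for w in candidates if feedback(guess, w) == pattern]

# Hypothetical mini solution list, for illustration only.
candidates = ["parse", "purse", "paste", "crate", "heart"]
print(remaining("trace", "yggyg", candidates))  # ['crate']
```

Here the pattern yggyg for TRACE leaves a single candidate, which is the "once only 1 remains" situation described above.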
The Criteria
For each chosen start word this analysis looks at the number of Remaining Solutions for each possible colour pattern and records 5 criteria:
- Max – The worst case number of Remaining Solutions.
- Entropy – The higher the number, the higher the probability that the guess will bring you closer to the correct answer than a guess with a lower value. This isn’t a concept that can be described easily in a few words, but essentially it adds up the probability of getting each colour pattern (e.g. bbbbb, bbbbg, bbbby,… where b is black, g green and y yellow) in such a way that the flattest distribution gives the highest value. Mathematicians call it the amount of information the guess will provide. Watch this video if you would like to know more about information theory entropy and how it is calculated.
- One – The number of colour patterns that will uniquely identify the solution (Remaining Solutions of 1), i.e. there will be times when, given the chosen guess and the pattern that comes back, there is only 1 possible solution remaining.
- Two – The number of solution words, out of the 1,735 (at time of writing) available, that will have two or fewer Remaining Solutions after using the starting word.
- Three – The number of solution words, out of the 1,735 (at time of writing) available, that will lead to three or fewer Remaining Solutions.
- I have also indicated whether or not the potential best starting word can be the answer. There are 2 reasons that the answer can be No. Firstly, Wordle has a much longer list of words that will be accepted as guesses, but which will not solve the puzzle. Secondly, Wordle does not reuse answers, so once a word has already been a solution it can no longer be a solution.
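To make the definitions above concrete, here is a minimal sketch that derives the five criteria from a list of Remaining Solution group sizes, one size per colour pattern. The function name `criteria` and the sample sizes are hypothetical. The entropy here is plain Shannon entropy in bits, which is an assumption on my part: the Entropy column in the results uses a scale that is not spelled out here, so the numbers will not match it directly. As the definitions read, One counts patterns while Two and Three count solution words.

```python
from math import log2

def criteria(group_sizes: list[int]) -> dict[str, float]:
    """Summarise one start word from its Remaining Solution group sizes."""
    total = sum(group_sizes)  # number of possible solutions
    return {
        "Max": max(group_sizes),
        # Shannon entropy in bits (assumed scale, see the note above).
        "Entropy (bits)": -sum(n / total * log2(n / total) for n in group_sizes),
        # Patterns that leave exactly one solution.
        "One": sum(1 for n in group_sizes if n == 1),
        # Solution words that end up in a group of two or fewer.
        "Two": sum(n for n in group_sizes if n <= 2),
        # Solution words that end up in a group of three or fewer.
        "Three": sum(n for n in group_sizes if n <= 3),
    }

# Hypothetical group sizes, for illustration only.
sizes = [5, 3, 2, 2, 1, 1, 1]
print(criteria(sizes))
```

With these made-up sizes, Max is 5, One is 3, Two is 7 and Three is 10.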
Results
Word | Max | Entropy | One | Two | Three | Answer? |
---|---|---|---|---|---|---|
about | 284 | 751 | 25 | 51 | 66 | y |
adept | 234 | 739 | 22 | 46 | 61 | y |
adieu | 218 | 600 | 16 | 30 | 45 | n |
aisle | 147 | 929 | 21 | 43 | 79 | y |
arise | 116 | 939 | 23 | 43 | 61 | y |
arose | 143 | 936 | 22 | 44 | 71 | y |
audio | 331 | 694 | 21 | 41 | 59 | y |
clamp | 449 | 803 | 28 | 54 | 72 | y |
clasp | 347 | 911 | 34 | 48 | 81 | y |
close | 190 | 938 | 20 | 60 | 102 | y |
cones | 206 | 919 | 28 | 66 | 84 | n |
depot | 284 | 769 | 24 | 46 | 79 | n |
hates | 212 | 864 | 28 | 46 | 61 | n |
heart | 208 | 1,094 | 30 | 64 | 103 | y |
lance | 213 | 1,039 | 31 | 61 | 91 | y |
leant | 164 | 1,047 | 29 | 53 | 110 | y |
meaty | 252 | 787 | 21 | 53 | 68 | y |
ocean | 182 | 870 | 34 | 56 | 70 | y |
opera | 180 | 816 | 29 | 53 | 62 | y |
ouija | 395 | 448 | 16 | 30 | 42 | n |
parse | 206 | 1,166 | 32 | 76 | 118 | y |
pious | 354 | 685 | 18 | 38 | 68 | n |
plaid | 343 | 874 | 26 | 54 | 90 | y |
(missing) | 311 | 865 | 22 | 52 | 79 | n |
recap | 243 | 861 | 23 | 53 | 86 | n |
roate | 156 | 936 | 21 | 47 | 65 | n |
salet | 165 | 1,147 | 31 | 69 | 96 | n |
scalp | 347 | 823 | 27 | 47 | 71 | y |
serai | 130 | 847 | 22 | 40 | 63 | n |
slate | 165 | 1,146 | 32 | 66 | 93 | n |
slice | 221 | 998 | 25 | 53 | 83 | y |
soare | 143 | 981 | 21 | 41 | 68 | n |
stale | 165 | 1,094 | 30 | 64 | 106 | n |
stare | 173 | 994 | 18 | 44 | 80 | y |
steal | 166 | 936 | 25 | 45 | 72 | y |
store | 187 | 938 | 22 | 52 | 82 | n |
strap | 271 | 896 | 28 | 58 | 88 | y |
tales | 165 | 917 | 23 | 51 | 63 | n |
trace | 203 | 1,169 | 32 | 76 | 100 | n |
train | 204 | 1,060 | 33 | 55 | 94 | n |
tramp | 330 | 788 | 27 | 47 | 74 | y |
trice | 217 | 1,051 | 29 | 59 | 95 | n |
tried | 243 | 1,044 | 32 | 64 | 79 | n |
Interpretation
Potential Answer
Theoretically, a guess that can’t be an answer could still be a good guess, but the evidence above does not support this. Words with “n” in the Answer? column don’t stand out enough in this list to justify an unwinnable first guess. Thus, whilst the unwinnable TRACE has the highest Entropy score and ties with PARSE for the highest Two score, it is only slightly better than the winnable PARSE, so I consider PARSE to be the better option.
Criteria Usefulness
None of these criteria directly measures the aim of the game. I play to find the word in as few guesses as I can, so the best measure would be the number of guesses needed, which you would want to be less than 7. The criteria here give numbers that can be much bigger than 7, so small differences between words may rarely, if ever, change the number of guesses needed.
Number-of-guesses criteria are not provided because such measures require dealing with very large numbers and using calculation techniques that are not easily emulated with spreadsheets.
The Max (Worst Case) criterion can be useful; however, it is the distribution of Remaining Solution sizes that is the key issue. Two (not real) extremes that illustrate this would be a word that splits the potential answers into 17 groups of 100 and 1 group of 35, versus another that splits them into 1 group of 400 and 1,335 groups of 1. The second distribution provides a better than 3 in 4 chance of finding the answer with one more guess, but its Worst Case is 4 times greater.
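The arithmetic behind those two extremes can be checked with a short sketch. The function name `summarise` is made up for this example, and "chance" here means the share of answers that are left as the only remaining candidate after the first guess.

```python
def summarise(group_sizes: list[int]) -> tuple[int, float]:
    """Return (worst case, share of answers left in a group of exactly 1)."""
    total = sum(group_sizes)
    worst = max(group_sizes)
    singles = sum(1 for n in group_sizes if n == 1)
    return worst, singles / total

first = [100] * 17 + [35]      # 17 groups of 100 and 1 group of 35
second = [400] + [1] * 1335    # 1 group of 400 and 1,335 groups of 1

for name, sizes in (("first", first), ("second", second)):
    worst, chance = summarise(sizes)
    print(f"{name}: total={sum(sizes)}, worst case={worst}, chance={chance:.1%}")
```

Both splits cover all 1,735 solutions. The second has a Worst Case of 400 against 100, yet about 77% of answers are pinned down immediately, against none for the first.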
The Entropy criterion is a mathematically rigorous attempt to evaluate which distributions are best. However, depending on your strategy, it could be rational to put an even greater weight on small Remaining Solutions to improve the chances of quick answers at the cost of occasional slow answers, hence the One, Two and Three criteria.
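As a worked example, and assuming Entropy means Shannon entropy measured in bits (the Entropy figures in the results use a scale that is not stated, so only the ordering is comparable), the two extreme distributions described earlier in this section come out as follows. Here n_i is the size of each group of Remaining Solutions and N = 1,735.

```latex
H = -\sum_i p_i \log_2 p_i, \qquad p_i = \frac{n_i}{N}, \qquad N = 1735

H_{\text{first}} = 17 \cdot \frac{100}{1735} \log_2 \frac{1735}{100}
                 + \frac{35}{1735} \log_2 \frac{1735}{35} \approx 4.15 \text{ bits}

H_{\text{second}} = \frac{400}{1735} \log_2 \frac{1735}{400}
                  + 1335 \cdot \frac{1}{1735} \log_2 1735 \approx 8.77 \text{ bits}
```

On this measure the second distribution scores far higher even though its Worst Case is 4 times greater, which is the sense in which entropy rewards many small groups.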
Here is a table showing the number of words in categories of Remaining Solution sizes for a selection of the words in the table above.
Remaining Solutions | trace | parse | salet | heart | train | raise | irate | arise | meaty |
---|---|---|---|---|---|---|---|---|---|
Entropy | 1,169 | 1,166 | 1,147 | 1,094 | 1,060 | 997 | 969 | 939 | 787 |
Answer? | n | y | n | y | y | y | y | y | y |
>160 | 203 | 206 | 165 | 208 | 204 | 0 | 0 | 0 | 447 |
81-160 | 278 | 397 | 197 | 452 | 315 | 298 | 341 | 408 | 480 |
41-80 | 288 | 323 | 472 | 282 | 483 | 554 | 404 | 561 | 229 |
21-40 | 408 | 240 | 305 | 263 | 227 | 262 | 504 | 200 | 145 |
11-20 | 221 | 244 | 255 | 197 | 197 | 350 | 203 | 286 | 233 |
5-10 | 201 | 171 | 205 | 202 | 167 | 156 | 190 | 183 | 109 |
3-4 | 60 | 78 | 67 | 67 | 87 | 63 | 39 | 54 | 39 |
0-2 | 76 | 76 | 69 | 64 | 55 | 52 | 54 | 43 | 53 |
TOTAL | 1735 | 1735 | 1735 | 1735 | 1735 | 1735 | 1735 | 1735 | 1735 |
0-10 | 337 | 325 | 341 | 333 | 309 | 271 | 283 | 280 | 201 |
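A table like this can be built by adding each group's size to the band it falls in, so that every solution word is counted once. The sketch below assumes the group sizes for a start word are already known. The names `BANDS` and `words_per_band` and the sample sizes are hypothetical.

```python
# Bands as (label, low, high); high=None means no upper limit.
BANDS = [
    (">160", 161, None),
    ("81-160", 81, 160),
    ("41-80", 41, 80),
    ("21-40", 21, 40),
    ("11-20", 11, 20),
    ("5-10", 5, 10),
    ("3-4", 3, 4),
    ("0-2", 0, 2),
]

def words_per_band(group_sizes: list[int]) -> dict[str, int]:
    """Count solution words by the size of the group they land in."""
    counts = {label: 0 for label, _, _ in BANDS}
    for n in group_sizes:
        for label, low, high in BANDS:
            if n >= low and (high is None or n <= high):
                counts[label] += n  # all n words in this group share the band
                break
    return counts

# Hypothetical group sizes, for illustration only.
sizes = [203, 100, 50, 30, 15, 7, 3, 1, 1]
table = words_per_band(sizes)
print(table)
print("TOTAL", sum(table.values()))
```

The band totals always add up to the number of possible solutions, which is a handy check on the TOTAL row.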
Conclusions
- There doesn’t appear to be a reason to use the allowed words that aren’t answers
- Entropy is the most reliable criterion that is easily calculated
- PARSE appears to be the best choice at the moment
- The vast majority of high Entropy words have only two vowels and they are usually A and E
- Three vowel higher Entropy words usually have A, E and I
- Four vowel words are universally poor choices
- Some three-vowel words have reasonably high Entropy and a significantly lower Max (good), but they also tend to have fewer small Remaining Solutions. Their lower Entropy suggests that the low Max doesn’t make up for the poorer distribution.
Exit
Please leave comments if you feel there are ways that this analysis can be improved or corrected. Also, if you have start words that you would like me to analyse, let me know in the comments.