How does ZOS Generate Puzzles?
moultano
Creator of ns_shiva. Join Date: 2002-12-14 Member: 10806Members, NS1 Playtester, Contributor, Constellation, NS2 Playtester, Squad Five Blue, Reinforced - Shadow, WC 2013 - Gold, NS2 Community Developer, Pistachionauts
<div class="IPBDescription">A little bit of geeky interest.</div>I recall reading a while back that a generalized form of Sudoku is an NP-Complete problem. That seems like it would make it difficult to generate puzzles even on the limited size of a standard Sudoku board. How does the generation of a puzzle in ZOS work?
Comments
just my guess though <img src="style_emoticons/<#EMO_DIR#>/wow.gif" style="vertical-align:middle" emoid=":0" border="0" alt="wow.gif" />
Nanites.
<!--QuoteEnd--></div><!--QuoteEEnd-->
*hides*
Generate full board
while "number of desired empty fields" is smaller than x do the following
{
Remove random number
check if this can be solves within x steps of recursion
if it can, then repeat
else reinsert the number
}
print
If you have a code, that can solve solvable sodokus, then you can use it to create puzzles by recursion. The problem with this method is, that it is cpu and time consuming and the generation can fail. So you need quite a bit of code to catch failed generations <img src="style_emoticons/<#EMO_DIR#>/wink-fix.gif" style="vertical-align:middle" emoid=";)" border="0" alt="wink-fix.gif" />
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 - - - - - -
4 5 6 - - - - - -
7 8 9 - - - - - -
<!--c2--></div><!--ec2-->
The mean the top row only has 4 5 6 7 8 9 left for choices (6 factorial). The second row has 1 2 3 7 8 9 left and the third row has 1 2 3 4 5 6 left.
Radomly pick the remaining top row from the remaining 4-9 choices.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 - - - - - -
7 8 9 - - - - - -
<!--c2--></div><!--ec2-->
Now look at the second 3x3 cell.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
9 8 7
- - -
- - -
<!--c2--></div><!--ec2-->
There is a very small subset of possible items. 6 5 4 3 2 and 1 must be used but 6 5 4 cannot be used on row and 7 8 9 cannot be used in the third row. That means that the second row has 3 factorial possibilities left for combinations or 1 2 3 and the third row has the same for 4 5 6. 3 factorial squared?
Example:
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 - - -
7 8 9 4 5 6 - - -
<!--c2--></div><!--ec2-->
No matter the combination, there is no dirrect effect on the remaining 6 empty choices; however, the 3 remaining for the second row have more possibilities: 1-3 and 7-9 for each item. There are only 3 factorial choices remaining for the third row for 1-3. Once you've chosen the bottom row, however, you actually finally reduce the second row back to 3 factorial choices. In the example below, only 7 8 9 are possibilities now.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 - - -
7 8 9 4 5 6 - - -
becomes
1 2 3 | 9 8 7 | 6 5 4
4 5 6 | 1 2 3 | 7 8 9
7 8 9 | 4 5 6 | 1 2 3
<!--c2--></div><!--ec2-->
Now, use <i>almost</i> the same exact process for the columns below the seed cell.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
2 - -
3 - -
5 - -
6 - -
8 - -
9 - -
<!--c2--></div><!--ec2-->
If you wanted to be shifty, you could mirror image the game board and use the same algorithm above and then re-invert it. The only difference between the two is that the first 3 columns would be full when inverted.
Column 2 has 1 3 4 6 7 9 for choices.
Column 3 has 1 2 4 5 7 8 for choices.
The first 3 choices for both columns must contain 1 4 6 7 8 9
The second 3 choices for both columns must contain 1 2 3 4 5
Note that 1,4,7 are shared by both column choices and cannot be used in the same cells.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
2 1 4
3 6 7
5 9 8
6 - -
8 - -
9 - -
<!--c2--></div><!--ec2-->
Note that only 9 and 8 must be in a specific column. 1,4,6,7 can go anywhere.
Now, column 2 has 3 4 7 for choices.
Column 3 has 1 2 5 for choices.
Neither column has a limitation for choice of placement.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
2 1 4 - - - - - -
3 6 7 - - - - - -
5 9 8 - - - - - -
6 3 1
8 4 2
9 7 5
<!--c2--></div><!--ec2-->
Row 3 must contain 3 5 6 7 8 9.
- 9 must not be in the first choice
- 8 or 5 cannot be the second choice
- 7 3 or 6 cannot be the third choice
- 6 or 7 cannot be the fourth choice
- 5 or 8 cannot be the fifth choice
- 9 or 3 cannot be the sixth choice
Row 4 must contain 1 2 4 5 8 9.
Row 5 must contain 1 2 3 4 6 7.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
-------------------
2 1 4 3 6 8 5 9 7
3 6 7 - - - - - -
5 9 8 - - - - - -
6 3 1
8 4 2
9 7 5
<!--c2--></div><!--ec2-->
Note that the only decision really had to be made above was to use 9 before 7. Every other choice simply picked the first available number not used in the row and column.
Now, can the new "3 6 8" cell be a legal cell? It must contain 1 2 4 5 7 9 and there must be both a row and a column that do not use the remaining numbers.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
-------------------
2 1 4 3 6 8 5 9 7
3 6 7 2 4 5 - - -
5 9 8 7 1 9 - - -
6 3 1
8 4 2
9 7 5
<!--c2--></div><!--ec2-->
Row 5 must use 1 4 8
Row 6 must use 2 3 6
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
-------------------
2 1 4 3 6 8 5 9 7
3 6 7 2 9 5 4 1 8
5 9 8 7 1 4 2 3 6
-------------------
6 3 1 - - - - - -
8 4 2
9 7 5
<!--c2--></div><!--ec2-->
Row 7 must contain 2 4 5 7 8 9.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
-------------------
2 1 4 3 6 8 5 9 7
3 6 7 2 9 5 4 1 8
5 9 8 7 1 4 2 3 6
-------------------
6 3 1 5 4 9 8 7 2
8 4 2
9 7 5
<!--c2--></div><!--ec2-->
Once again, I chose the first available legal number from the set.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
-------------------
2 1 4 3 6 8 5 9 7
3 6 7 2 9 5 4 1 8
5 9 8 7 1 4 2 3 6
-------------------
6 3 1 5 4 9 8 7 2
8 4 2 - - -
9 7 5 - - -
<!--c2--></div><!--ec2-->
The cell must contain 1 2 3 6 7 8.
Column 3 must contain 6 8.
Column 4 must contain 3 7
Column 5 must contain 1 2.
Now, the choices are simple.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
-------------------
2 1 4 3 6 8 5 9 7
3 6 7 2 9 5 4 1 8
5 9 8 7 1 4 2 3 6
-------------------
6 3 1 5 4 9 8 7 2
8 4 2 6 7 1 - - -
9 7 5 8 3 2 - - -
<!--c2--></div><!--ec2-->
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
-------------------
2 1 4 3 6 8 5 9 7
3 6 7 2 9 5 4 1 8
5 9 8 7 1 4 2 3 6
-------------------
6 3 1 5 4 9 8 7 2
8 4 2 6 7 1 - - -
9 7 5 8 3 2 - - -
<!--c2--></div><!--ec2-->
Only 1 3 4 5 6 9 remain for choices.
Column 7 must contain 3 9
Column 8 must contain 4 6
Column 9 must contain 1 5
We've finally unresolved conflict(s).
The 9th row has both 9 and 3 making column 7 impossible to solve.
The 8th row was both 4 and 6 making column 8 impossible to solve.
I'll go ahead and fill in the remaining columns with conflicts and then do some simple flipping between the items to find a legal final solution.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 9 8 7 6 5 4
4 5 6 1 2 3 7 8 9
7 8 9 4 5 6 1 2 3
-------------------
2 1 4 3 6 8 5 9 7
3 6 7 2 9 5 4 1 8
5 9 8 7 1 4 2 3 6
-------------------
6 3 1 5 4 9 8 7 2
8 4 2 6 7 1 3 6 5 (2 6's)
9 7 5 8 3 2 9 4 1 (2 9's)
<!--c2--></div><!--ec2-->
Now, to resolve the final conflicts - the final 2 items in every column can be inverted until the the remaining cell is no longer conflicting.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 | 9 8 7 | 6 5 4
4 5 6 | 1 2 3 | 7 8 9
7 8 9 | 4 5 6 | 1 2 3
-------------------
2 1 4 | 3 6 8 | 5 9 7
3 6 7 | 2 9 5 | 4 1 8
5 9 8 | 7 1 4 | 2 3 6
-------------------
6 3 1 | 5 4 9 | 8 7 2
9 4 2 | 8 7 1 | 3 6 5
8 7 5 | 6 3 2 | 9 4 1
<!--c2--></div><!--ec2-->
I flipped 9 and 8 (end of column 1), 8 and 6 (end of column 4) and it appears all conflicts have been resolved. Both items share the common 8, so a new conflict is not created. There were only about 3 locations in the algorithm where you couldn't blindly place numbers and some conflict resolution had to be performed.
To sum it up:
- Choose a seed
- Generate the first incomplete row
- Finish the two remaining 3x3 cells connected to the row
- Invert (repeating steps 2 and 3)
- Invert again
- Generate the first incomplete row
- Finish the two remaining 3x3 cells connected to the row
- Generate the first incomplete row
- Finish the two remaining 3x3 cells connected to the row - ignore incosistancies for the final cell
- Resolve final remaining inconsistancies by flipping the last 2 column items using the final cell's last 3 possibilites
If I had to guess, this on-the-fly algorithm should be able to handle most conflicts. As you noticed, only generating the final cell produced inconsistancies that could only be resolved by altering the previous 2 cells. The final cell has the most restrictions on it, so I'd expect problems to manifest most in this step.
The final math resolves down to much much less than (6 * 5 * 4 ) * 6! * 8 [691,200] once a seed is chosen to complete all cells. That's (6 choose 3) for each cell's first row (position is important), 6 factorial for the remaining 2 rows of the cell, 8 cells total (9 cells minus 1 seed). None of these calculations take into account the prunning done by excluding numbers when illegal combinations are detected.
To clarify, that's not <i>the</i> way, but <b>a</b> way. I tend to find clever - yet still simple - shorcuts to reach the same goal. You could just as well keep trying exhaustive recursive trial-and-error options. With today's processing power, you can get away with a lot more gross misuses of horsepower.
<!--QuoteEnd--></div><!--QuoteEEnd-->
Looks good for a start, but then once you have a completed board how do you go about removing tiles in such a way that ensures a unique solution? I'm also not entirely convinced that the swapping to avoid conflicts won't go on through many iterations.
<!--quoteo--><div class='quotetop'>QUOTE</div><div class='quotemain'><!--quotec-->i guess the "usual" approach with backtracking and heuristics? ^^<!--QuoteEnd--></div><!--QuoteEEnd-->
This seemed like a likely approach to me, but it still isn't clear to be how you ensure that your final board has a unique solution. I suppose you could just generate a set of candidate boards and then run a solver to find those with multiple solutions, then add tiles until all but one solution are excluded. I don't know how long a sudoku solver would take per board. I'd probably go with a constraint propagation model + backtracking. Then we've still got the problem of making boards of various difficulty, but I'd guess that the number of open tiles is a decent heuristic for that.
Looks good for a start, but then once you have a completed board how do you go about removing tiles in such a way that ensures a unique solution? I'm also not entirely convinced that the swapping to avoid conflicts won't go on through many iterations.<!--QuoteEnd--></div><!--QuoteEEnd-->
There's an unproved theory that you need a minimum of 17 hints to ensure there is only 1 (one) unique solution, but some puzzles do require more. Also, the number or tiles present doesn't indicate the difficulty of solving the problem. Swaping can be more involved (I'll refer to this later as a dependency chain).
The "wise" thing to do to would be to show nodes that have the highest number of dependancies first. This means you need to create another data structure to store the metadata (or the data about other data) and a graph is the obvious choice. Every individual number depends on the row, the column, and the surounding 3x3 cells. You can easily argue that ever item indirectly depends on ever other possible item. You need a good heuristic for choosing after you disperse your first guess hints and your first guess should also use hueristics as well.
If you look at a few puzzles, you'll start to notice some things are evident. I'd be surprised to find a puzzle where every number (1-9) is not shown, the average hints shown per row is not relatively close to 4, the average hints per column is not relatively close to 4, and the average numbers per 3x3 is not relatively close to 4, and the average appearance of each number is not relatively close to 4. For example, if the standard deviation of number per row/column/cell is wide, then that might mean more possibilities must exist for those rows/columns/cells that have few hints. Those cases have the ability to exponentially increasing the total possibilities.
You could conclude that you'll need to distribute your hints well before you even begin testing for uniqueness. There are also some easy situations to identify that will cause multiple solutions. For example, if there exists mirror image pairs X and Y in the same 2 rows, they can be interchanged so one of the numbers must be displayed.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 | 9 8 7 | 6 5 4 <<<
4 5 6 | 1 2 3 | 7 8 9
7 8 9 | 4 5 6 | 1 2 3
-------------------
2 1 4 | 3 6 8 | 5 9 7<<<
3 6 7 | 2 9 5 | 4 1 8
5 9 8 | 7 1 4 | 2 3 6
-------------------
6 3 1 | 5 4 9 | 8 7 2
9 4 2 | 8 7 1 | 3 6 5
8 7 5 | 6 3 2 | 9 4 1
^ ^
^ ^
<!--c2--></div><!--ec2-->
Using the example above, look at the numbers where the "arrows" intersect. If you swap 2x1 and 1x2, the puzzle now has another possible solution. So, one of those 4 numbers <b>must</b> be displayed.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
2 1 3 | 9 8 7 | 6 5 4
^ ^
4 5 6 | 1 2 3 | 7 8 9
7 8 9 | 4 5 6 | 1 2 3
-------------------
1 2 4 | 3 6 8 | 5 9 7
^ ^
3 6 7 | 2 9 5 | 4 1 8
5 9 8 | 7 1 4 | 2 3 6
-------------------
6 3 1 | 5 4 9 | 8 7 2
9 4 2 | 8 7 1 | 3 6 5
8 7 5 | 6 3 2 | 9 4 1
<!--c2--></div><!--ec2-->
In the first building example, remember the last 2 rows and the conflicts found and their solution? (see below)
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 | 9 8 7 | 6 5 4
4 5 6 | 1 2 3 | 7 8 9
7 8 9 | 4 5 6 | 1 2 3
-------------------
2 1 4 | 3 6 8 | 5 9 7
3 6 7 | 2 9 5 | 4 1 8
5 9 8 | 7 1 4 | 2 3 6
-------------------
6 3 1 | 5 4 9 | 8 7 2
9 4 2 | 8 7 1 | 3 6 5
8 7 5 | 6 3 2 | 9 4 1
<!--c2--></div><!--ec2-->
Both 9/8 and 8/6 contain an eight. If swapped, the generate a conflict in the right-most, bottom-most cell. If goes without saying that if you can show one of those pairs and likely lock it into a single solution.
Still, look at the same 2 last rows by columns and look at the dependency chain. To swap 9 and 8, you'd have to swap 8 and 6. So it's given that you'd need to swap 6 and 4 and you'd need to swap 4 and 7, then 7 and 3, then 9 and 3 to have a legal puzzle.
<!--c1--><div class='codetop'>CODE</div><div class='codemain'><!--ec1-->
1 2 3 | 9 8 7 | 6 5 4
4 5 6 | 1 2 3 | 7 8 9
7 8 9 | 4 5 6 | 1 2 3
-------------------
2 1 4 | 3 6 8 | 5 9 7
3 6 7 | 2 9 5 | 4 1 8
5 9 8 | 7 1 4 | 2 3 6
-------------------
6 3 1 | 5 4 9 | 8 7 2
8 7 2 | 6 3 1 | 9 4 5
9 4 5 | 8 7 2 | 3 6 1
<!--c2--></div><!--ec2-->
So, the last 2 numbers of columns 1, 2, 4,5, 7 and 8 are dependant: now reading as 8/9 7/4 6/8 3/7 9/3 and 4/6. So, just to swap the 8's in the last 2 rows, you have to inturn swap 6 columns in total to ensure a legal. That means that 2/5 1/2 and 5/1 are the second dependancy chain. I'd say it goes without saying that you must also show some of these number to ensure uniqueness. I don't know offhand what percentage of the numbers you'd need to show. I'd also say that the 3 column dependancy would be much easier to solve than the 7 column dependancy chain. But a 2 column mirror image pair would be the *easiest* to solve. So, there's a genuine heuristic that can be used for difficulty that will work.
Also, you can't forget about the dependancies between the 7'th and the 8'th rows. All 3 rows (7-8-9) are dependant via the cells and can be swapped without changing the column requirements (just the row requirements are changed). The chains in the above solution are [6/8 8/9 9/1 1/2 2/5 5/6] and [3/7 7/4 4/3] for rows 7 and 8.
Now, you can do the same process between columns instead of rows. The rules are the same for the 3x3 boundaries.
----
The rows and columns must present enough information so that certain blank items will have only 1 possible solution. There's an excellent example here in the first picture: <a href="http://www.sudoku.com/howtosolve.htm" target="_blank">http://www.sudoku.com/howtosolve.htm</a>
Every time you remove an item from the puzzle, you can increase the possibilites in other blank cells. Storing these possible legal numbers per item also gives you a rather good heuristic for puzzle dificulty. The more "only one possible solution" items that exist, the easier the puzzle. The average possibilities per item also gives you a rather good heuristic (keeping standard deviation in mind) for total difficulty. You'll notice that you need to take into account 3 rows that comprise three 3x3 cells (and the same goes for 3 columns).
Conversely, every time you add an item to the puzzle, you decrease the possibilites for every item the the same cell, row, and column. Puzzles that are likely to create more single possibility items as the puzzle is traversed also reduce the overall complexity.
----
Anyways, that's a long way of saying you need to show stuff that reduces the puzzles complexity which also reduces the possibitilities to a state where checking for uniqueness will be the "making sure" step because it heuristicly looks like it will produce a unique solution.