The Hidden Logic of Sudoku
A new conceptual framework
for solving Sudoku puzzles with pure logic
Denis Berthier
Second, revised and
extended edition: November
2007
29.95 euros
(paperback, 416 pages, Royal: 6.14" x 9.21" - 15.6 cm x 23.4 cm)
ISBN : 978-1-84799-214-7
Lulu.com Publishers
Support print-on-demand publishing, buy this book directly from
Lulu:
(First edition published by Lulu.com, May 2007, ISBN :
978-1-84753-472-9)
© Denis
Berthier. All the material in this page and the pages it gives
access to are the property of the author and may not be
re-published or re-posted without his prior written permission.
Document1
The Hidden Logic of Sudoku
provides the first systematic perspective of the logical
foundations and of the symmetries of the popular game. These
are fully exploited to define new kinds of resolution rules,
new graphical representations that may ease the solving
process and a precedence ordering of the rules consistent with
their logical complexity.
In addition to a few elementary rules,
the very classical and basic pattern of xy-chains has been
extended as far as its underlying logic allowed, into a
homogeneous set of chain rules of progressively increasing
complexity. It suffices to solve almost any puzzle without
making guesses, dealing with chains of subsets or with nets or
assuming the uniqueness of a solution.
These rules are illustrated with a
hundred puzzles, together with their full resolution paths.
They have been tested in an Artificial Intelligence (AI)
engine and tens of thousands of puzzles have been processed,
leading to a precise evaluation of the efficiency of each
rule.
This book is intended for Sudoku players
of all levels: they will discover many new facets of the game
and new types of resolution rules - introduced in a
pedagogical way and set in a uniform conceptual framework
based on patterns. It is also intended for teachers or
students of Logic or AI: they will appreciate the strict
logical foundations.
Keywords: Logic Games,
Sudoku, Artificial Intelligence.
What's new in the second edition (pdf)
Contents (pdf)
Introduction (pdf)
Conclusion (pdf)
Backcover (pdf)
Press release (pdf)
Press release (french version)
(pdf)
Extended
Sudoku
Board (pdf)
This is a Sudoku Board specifically designed for the easy
application of the Sudoku resolution method defined in the book.
Using this board, complex hidden patterns will be made obvious. How
it can be built and used is described in full detail in the book.
This board is copyrighted with the only goal that it will not be
patented by anybody and will always remain available to the whole
Sudoku community. Feel free to use it and to tell me about your
experience with it.
Frequently
Asked Questions
errata:
p. 48, section III.1.6.1, a bug in the pdf generator makes the sort
axioms illegible. They should read:
forall n
(n=1n ∨ n=2n ∨ n=3n
∨
n=4 n ∨ n=5n ∨ n=6n ∨ n=7n ∨ n=8n ∨ n=9n)
forall r
(r=1r ∨ r=2r ∨ r=3r ∨ r=4r ∨ r=5r ∨ r=6r ∨ r=7r ∨ r=8r ∨ r=9r)
forall c
(c=1c ∨ c=2c ∨ c=3c ∨ c=4c ∨ c=5c ∨ c=6c ∨ c=7c ∨ c=8c ∨ c=9c)
forall b
(b=1b ∨ b=2b ∨ b=3b ∨ b=4b ∨ b=5b ∨ b=6b ∨ b=7b ∨ b=8b ∨ b=9b)
forall s
(s=1s ∨ s=2s ∨ s=3s ∨ s=4s ∨ s=5s ∨ s=6s ∨ s=7s ∨ s=8s ∨ s=9s)
forall ut
(ut=row ∨ ut=col ∨ ut=blk)
************************************************************************************************
**********************
**********************
**********************
ONLINE
SUPPLEMENTS TO THE
BOOK
**********************
**********************
for
advanced players (not the good place to start with)
**********************
**********************
**********************
************************************************************************************************
WARNINGS:
1) Part of the material presented here is more complex than what you
can find in the book and it is for advanced players. The basics are
in the book. Although some definitions are recalled here, this is
not done systematically. Said otherwise, this is not the good place
to start with.
2) Most of the material here is a compilation of ideas that I have
already published on Web Forums and/or in international conferences.
Some of it already appears in the second edition of the book.
I compiled this material here in order to make it more easily
available than lots of posts disseminated in Web forums, which are
always plagued with tons of unrelated stuff.
Notice however that, although a part of it can be understood
independently by advanced players, this is not designed as a fully
self-contained set of pages but as a set of supplements to the book
(but, with time and several re-writings, this has become more
self-contained).
3) WORK IN PROGRESS. At a few places, the writing may be sketchy,
but all the definitions are precise.
4) Part of the material below is superseded by the contents of my
more recent books: "Constraint
Resolution Theories" and "Pattern-Based
Constraint Satisfaction". But the book itself is not
superseded by these newer ones, because it contains Sudoku specific
aspects that are not recalled in them. These new books are best
understood as a sequel to "The Hidden Logic of Sudoku".
1) GENERAL CONCEPTS AND TECHNIQUES
1.1) The concept of a
resolution rule
discusses the general concept of a resolution rule and its specificity
with respect to a "resolution technique".
The concepts of a resolution theory and a resolution path are also
defined, together with the very important property a resolution
theory can have: the confluence
property.
Resolution rules can be classified into various large families. The
classification introduced in the resolution rules page is not
exhaustive, new families might be discovered. But it is useful to
understand that there are several very different types of rules.
Basic patterns (such as naked or hidden subsets and "fish") have
been known for a long time. But when basic patterns are not enough
to solve a puzzle, chain patterns become the next tool. This is why
much emphasis will be put on chains (although, of course, my
conceptual framework is not limited to chains).
The resolution rules page introduces various important properties a
resolution rule for chains can have, such as reversibility,
non-anticipativeness
(or no look-ahead), composability.
All the chains, whips and braids introduced below are
non-anticipative and composable. In addition, the chains and whips
that do not contain the t- extension (xy, xyz, nrc, nrcz) are
reversible.
The most basic chains are the standard xy-chains, i.e. sequences of
bivalue cells linked in rc-space, and with contents {x1 y1} - {x2
y2} - {x3 y3} - {x4 y4} - ..... - {xn yn}, such that for all n>1,
xn+1=yn and yn=x1. A target is a candidate z equal to both x1 and yn
and located in a cell linked to both the first and the last cells;
the xy-chain rule says that any target can be eliminated.
The (very standard) logic of an xy-chain
(i.e. the once and for all proof of the xy-chain theorem) is as
follows:
If target
then not x1
then y1
then not x2
then y2
......
then
not xn
then
yn
then
not target.
Therefore not target
Once the logic of an xy-chain is understood, then the logic of all
the other types of chains, whips and braids is easily understood,
because all of them are conceptually simple generalisations of
xy-chains.
Chains with the t- extension (xyt, hxyt, xyzt, hxyzt, nrct, nrczt)
can also be called oriented chains, because, contrary to xy, xyz,
nrc or nrcz chains, they cannot be reversed from head to tail.
Here are the most basic definitions relevant to chains:
Definition: two candidates n1r1c1 and n2r2c2 are nrc-linked if they are different
and:
- either n1=n2 and the two rc-cells (r1, c1) and (r2, c2) share a
unit in rc-space,
- or n1 ≠ n2 and the rc-cells (r1, c1) and (r2, c2) are the
same.
Being nrc-linked is the most general, fully super-symmetric support
for the immediate detection of a contradiction between two
nrc-candidates. It is quasi "physical" in the sense that it depends
only on the grid structure.
Definition: a chain is a sequence (i.e. a
strictly ordered set) of candidates, such that the first and the
last candidates in the sequence are different (there is no global
loop) and any two consecutive candidates are nrc-linked.
It can easily be checked that the above definition of an xy-chain
can be re-written in this format.
A chain in this purely descriptive sense thus satisfies the two
basic conditions without which the notion of a chain would loose any
meaning: linearity
and nrc-continuity.
There is also an implicit third general condition that any specific
type of chain should satisfy: consistency between the factual
description and the proof of the eliminations, i.e. one should be
able to prove the associated resolution rule in a way that
consistently follows the linear structure (e.g. without having to
anticipate on later parts of the sequence). All the types of
chains/lassos and whips we shall define will satisfy all these
conditions.
Definition: a target of a chain is a
candidate that does not belong to the chain and that is nrc-linked
to both endpoints of the chain.
Of course, chains of specific types will satisfy additional
conditions allowing to prove associated resolution rules. These
resolution rules will always conclude that any target can de
deleted.
1.2) xyt-, xyz- and xyzt-
chains have been defined in the first edition of
the book. They are also discussed in this
thread of the Sudoku Player's Forum. I haven't yet had time to
write a synthesis of this thread.
1.3) hidden-chains (hxyt,
hxyz, hxyzt) have been defined in the first edition
of the book. They are also discussed in this
thread of the Sudoku Player's Forum. I haven't yet had time to
write a synthesis of this thread.
1.4) nrczt-chains
and lassos have been defined in the second edition of
the book. They are the "3D" counterpart of the xy-, xyt-, xyz- and
xyzt- chains.
1.5) Whips are mainly another
view of chains and lassos, unifying them into a single concept.
1.6) Subsumption
theorems prove that nrczt-whips subsume (are more
general than) many previously known techniques.
1.7) Braids
are an extension of whips and are precisely related to the Trial and
Error procedure via the braids vs
T&E theorem. Naturally defined nrczt-braid resolution
theories have the confluence property.
1.8) x2y2-belts
are my interpretation of Steve's "hidden pairs loops" (introduced to
solve EasterMonster). But this page also contains indications on how
to define formally a non ambiguous resolution rule when we only have
examples of what it should do.
1.9) The
strategic level discusses different ways of defining
resolution strategies based on a set of resolution rules
2) TECHNIQUES FOR EXCEPTIONALLY HARD PUZZLES
Basic nrczt-whips are enough for solving all the puzzles that a
human solver is able to solve - and much more, as shown by my classification results. based
on ~ 10,000,000 random puzzles.
However, as there has been some interest for exceptionally hard
puzzles, I have developed a few additional techniques for dealing
with them.
2.1) On the
notion of a strong T-backdoor and its application to
EasterMonster
introduces the notion of a strong T-backdoor, where T is any
resolution theory, and shows how it can be applied, with the
EasterMonster example.
2.2) The
general zt-ing principle; generalised whips and braids
defines the general zt-ing principle, gives definitions of
zt-whips(FP) and zt-braids(FP) for any family of patterns FP, states
the general zt-braids(FP) vs
T&E(FP) theorem , the confluence property for some
zt-braid(FP) resolution theories and gives classification results
for the hardest known puzzles (gsf's collection) with respect to an
increasing sequence of FP families.
3) RATING RESOLUTION RULES AND PUZZLES; CLASSIFICATION RESULTS
3.1) Rating resolution
rules and puzzles; ordering the rules
introduces several ratings of puzzles, studies their correlation and
justifies the choice of length as a measure of complexity of a chain
rule.
This page is not yet revised and it keeps a strong flavour of its
historical development on the Sudoku Player's Forum.
3.2) Classification
results
gives various classification results, shows that different kinds of
generators of minimal puzzles lead to different statitics (and thus
proves that there is some bias), introduces the notion of a
controlled-bias generator, uses it to produce unbiased statistics.
As a collateral result, gives a close estimate of the real
distribution of minimal puzzles.
3.3) The nrczt-rating of
nrczt-whips as a guide for defining the rating of any chain or
any pattern
How can one rate chains based on subsets (e.g. ALS-chains) with
different lengths and different maximal sizes for the subsets they
use, in a way consistent with the rating of nrczt chains or whips?
Remember first that the complexity of a resolution rule is in its
condition pattern, not in its conclusion.
Remember also that the kind of complexity I am considering is mainly
complexity of resolution rules and that it is extended to complexity
of puzzles in the SER style (= complexity of the most complex rule
necessary to solve it).
The question then is mainly: how can the presence of subsets in a
chain be taken into account in the definition of chain length? For
ALS-chains, e.g., there are two kinds of parameters: chain length
and size of each of the ALSs; how should they be combined?
After all the time and work on the subject of ratings, the answer
now seems very clear to me: as the nrczt-whips can solve almost all
the puzzles (in more than 10 million minimal puzzles randomy
generated by various kinds of generators - bottom-up, top-down,
conrollled-bias - no puzzle was found that couldn't be solved by
nrczt-whips), the NRCZT-rating constitutes an almost universal
rating, to which all the other ratings can be compared
statistically.
It has very good properties, the first three of which are not shared
with the widely used SER:
- pure logic definition,
- full supersymmetry,
- very strong correlation (0.95) with log(number of chains), which
shows that it is statistically a logarithmic measure of complexity,
- strong correlation (0.895) with the well established SER for the
human solvable puzzles, in spite of their a priori very different
definitions,
- almost confluence of the underlying resolution theories
(i.e. almost independence of the implementation); if one wants
strict confluence and indepence, just replace erywwhere in this
section whips by braids and NRCZT by B-NRCZT.
It is therefore justified to take it as a reference for defining
ratings based on different kinds of chains or patterns.
3.3.1) Consider first chains of ALS, AHS and A-Fish.
- define the length of such a chain to be the sum of the sizes of
all its defining subsets, a single being considered as a subset of
length 1 (see remarks below for precisions);
- define the LS-rating of a puzzle as the length of the longest such
chain necessary to solve it.
Remarks:
- the "restricted commons" don't count in the subset sizes, they
more or less play the roles of left-linking candidates,
- for the most classical complementarity reasons: NS(5) = HS(4),
HS(5) = NS(4), NS(6) = HS(3) ... (here, NS= Naked Subset, HS=Hidden
Subset)
- for supersymmetry reasons: for n = 1, 2, 3, 4, NS(n), HS(n) and
SHS(n) (i.e. Fish(n)) are all counted as a NS(n).
This rating is supersymmetric and consistent with the NRCZT-rating:
whenever a chain can be viewed according to the two points of view
(which, according to the general subsumption results, is the case
for almost all the chains of ALS, AHS and A-Fish), its length will
be the same according to the two points of view.
Although there is currently no program computing ratings of
ALS-chains in a way consistent with this definiton, it is almost
certain (due to the subsumption theorems) that this LS-rating is
very strongly correlated with the NRCZT-rating.
3.3.2) Now, consider the more general case of nrczt-whips(Subsets),
as defined in section 2 of the zt-ing
page. In such chains, a naked, hidden and super-hidden (fish)
subset (modulo the target and all the previous right-linking
candidates) can be taken as a right-linking object, in lieu of a
mere candidate.
Define the length of such a whip(Subsets) as the sum of all the
sizes of the right-linking objects (Subsets) it contains (a single
candidate being still considered as a subset of length 1).
Define the NRCZT(Subsets)-rating as the length of the maximal
whip(Subsets) necessary to solve it.
Again, this rating is supersymmetric and consistent: any chain that
can be viewed according to several points of view (whether some
parts of it are considered as subsets modulo the target and the
previous rlc's or as mere nrczt-chains) will have the same ratings
for all these points of view.
We know that, most of the extremely rare puzzles that cannot be
solved with simple whips (fewer than one in 10 millions) can be
solved with zt-whips(Subsets).
It is thus very likely that most of the puzzles that can be solved
by mere nrczt-whips will have the same ratings when they can be
considered according to the two points of view.
The nrczt(Subsets)-rating is thus almost an extension of the
nrczt-rating.
3.3.3) This definition obviously applies to Paul Isaacson's whips
with ALS inserts, which are a secial case of zt-whips(Subsets).
3.3.4) Finally, this definition should apply to Allan Barker's set
cover patterns (networks of 2D-cells, i.e. rc-, rn-, cn- and bn-
cells - "base sets" in set cover terminology, renamed "truths" in
his vocabulary), for which one could take the number of 2D-cells as
the "length" of the pattern, disregarding the number of "cover sets"
(in set cover terminology - renamed "linksets" in Allan's
vocabulary).
This definition of the NRCZT-rating, thus extended to other patterns
in a consistent way, allows comparisons of the complexities of the
solutions obtained with the corresponding patterns.
4) ADDITIONAL EXAMPLES
With a complete listing of the full resolution paths of the 66
hardest puzzles in the famous Magictour/Top1465 collection, we
explore the limits of pure nrczt-whips.
5) Extension of the techniques developed for Sudoku to
the general Constraints Satisfaction Problem (CSP)
Many real world problems, such as resource allocation, temporal
reasoning or scheduling, naturally appear as Constraint Satisfaction
Problems (CSP).
Constraints Satisfaction is a main sub-area of Artificial
Intelligence.
A finite CSP is defined by a finite number of variables with values
in some fixed finite domains and a finite set of constraints (i.e.
of relations they must satisfy); it consists of finding a value for
each of these variables, such that they globally satisfy all the
constraints.
Sudoku is obviously a CSP. The "natural" variables are the Xrc (one
variable for each row r and each column c), with domains the 9
numbers that can a priori occupy these cells; but it is convenient
to add auxiliary variables Xbs, Xrn, Xcn, Xbn to take into account
the natural symmetries and super-symmetries of the problem.
Variants of Sudoku and many other "logic games" are CSPs.
Most of what I've done for Sudoku
can be extended to any finite CSP:
<> the whole general conceptual framework in Multi-Sorted
First Order Logic (MS-FOL);
<> definition of candidates and elaboration of their logical
epistemic status;
<> definition of candidates being linked (nrc-linked in
Sudoku);
<> definitions of a resolution rule, a resolution theory, the
solution of a CSP according to a resolution theory, a resolution
strategy;
<> definition of a chain and a target;
<> definition of a bivalue variable and a bivalue chain (which
corresponds to an xy-chain or to an nrc-chain, depending on which
set of variables we take as primitive);
<> definitions of t-, z- and zt- chains, whips and braids;
<> proof of the zt- chain, whip and braid elimination
theorems;
<> proof of confluence property for naturally defined
zt-braids theories;
<> proof of the T&E vs zt-braids elimination theorem.
Details can be found in two presentations I have done at the
CISSE'08 international conference (http://www.cisse2008online.org):
From Constraints to
Resolution Rules. Part I: Conceptual Framework
From Constraints to Resolution Rules. Part II: Chains,
Braids, Confluence and T&E
<> Moreover, even though the numerical values of the classification results
obtained for Sudoku can obviously not be transferred to any CSP, the
general ideas about top-down, bottom-up and controlled-bias
generators suggest that it may be very difficult to obtain unbiased
statistics for any CSP. See another presentation I have done at the
CISSE'09 international conference (http://www.cisse2009online.org):
Unbiased Statistics of a
CSP - a Controlled-Bias Generator
<> I have now published a whole book about these
generalisations: Constraint Resolution Theories
and a more recent, revised and largely extended version of it: Pattern-Based
Constraint Constraint Satisfaction and Logic Puzzles
in which applications to other logic puzzles (N-Queens, Futoshiki,
Kakuro, Numbrix, Hidato, graph colouring) are also dealt with in
detail.
Home(Denis Berthier)