The Hidden Logic of Sudoku
A new conceptual framework
for solving Sudoku puzzles with pure logic
Denis Berthier
First edition: May 2007
Second edition: November 2007
Online Supplements
The Concept of a
Resolution Rule
© Denis
Berthier. All the material in this page and the pages it gives
access to are the property of the author and may not be
re-published or re-posted without his prior written permission.
1) INTRODUCTION: GENERAL FRAMEWORK
The mathematical backgound is fisrt order logic (FOL). For a logic
game, that should not be too surprising.
As there are other approaches to sudoku solving, there can exist
different frameworks (graph theory and colouring problems, exact
cover sets, ...) but I think FOL is the most natural one, the one
closest to what players are used to do. This was already the
inspiring idea in the first edition of my book and it has never
changed: keeping the formal concepts as close as possible to the
natural ones.
In this framework, there is a very limited number of basic concepts:
cell, row, column, block, number, value, candidate. They should not
sound too unfamiliar.
All the other concepts are defined from the above basic ones:
bivalue, bilocation, naked pair, hidden pair, swordfish, ....,
nrc-link, chain, many specific types of chains, .... This allows to
give very clear and non ambiguous definitions (e.g. a degenerated
quad consisting of 2 pairs is not called a quad).
A major aspect of the framework is that all the concepts used are
purely factual: they describe a situation on a grid in terms of what
can be observed. Patterns are observables, they are not defined in
terms of what wants to do with them (as is the case with views
centred on a supposed "inference level").
Axioms: there are only the four axioms that have always been the
Sudoku axioms everywhere in the world since the beginning of Sudoku
and that remain the constraints you can see in any newspaper or any
puzzles book: only one value in each cell, only one occurrence of
each number in each row, column, block.
These axioms satisfy symmetry and super-symmetry relations that can
be exploited in the definition of the resolution rules.
Symmetry can also be exploited if one chooses to use the extended
Sudoku board I have designed with the four rc, rn, cn and bn spaces.
Adopting rules based on the assumption of uniqueness is an option.
For a player, uniqueness can only be an assumption: he has no
control on it. Either he accepts this oracle or he doens't.
2) REMARKS ON FIRST ORDER LOGIC (FOL)
First order logic (FOL), also called "predicate calculus", is the
background of the framework described here. More precisely, I use a
notational variant of FOL, MS-FOL (Multi-Sorted First Order Logic).
If you don't know anything about it, an elementary reference
is: http://en.wikipedia.org/wiki/First-order_logic
First order logic should not be confused with common sense logic.
One thing you should keep in mind is that if something is not
illogical in the commonsense of this word, it doesn't entail it can
be expressed in FOL.
The reason is that FOL has a strict formalism.
A first order theory is restricted to a well defined topic, which
you may discuss only in a well defined language specifically defined
for and adapted to this topic.
In particular, it doesn't include other mathematical theories, such
as number theory or set theory, and FOL doesn't allow arbitrary
mathematical operations.
As Algebra, Differential Geometry or lots of other topics,
Mathematical Logic is a specialised field of mathematics. Most of
the mathematicians are very specialised in their own field and they
don't know as much about other fields. Moreover, generally, they
don't know much about FOL.
The reason is that they work within the framework of "naive set
theory", where they can build sets of subsets, sets of functions and
so on, they can use quantifiers on functions, subsets, subsets of
subsets and so on, and they don't want to be restricted in these
possibilities.
But FOL doesn't allow such constructions.
FOL appears naturally as the proper formalism for Sudoku because
most players are looking for solutions using only "pure logic". FOL
provides the kind of restrictions able to give a precise meaning to
the idea that a solution has to be obtained by "pure logic".
3) RESOLUTION RULES AND RESOLUTION PATHS
Formal definition: A resolution rule is a formula of
first order logic (MS-FOL - or typed predicate calculus), in
language of Sudoku, that satisfies the two conditions:
- it is a logical consequence of the Sudoku axioms,
- it is written in the"condition => action" form, where "action"
can only be the assertion of value(s) and the negation of
candidate(s) in some cells associated with the pattern (the target
cells).
The condition is also called the pattern
of the rule.
Quantifiers can appear in the rule. Any variable appearing in the
action part must already appear in the condition part.
The language of Sudoku is built on a very limited number of concepts
with immediate intuitive meaning: row, column, block, number.
It has a very restricted number of primary predicates and associated
atomic formulæ:
n1 = n2
r1 = r2
c1 = c2
b1 = b2
value(n, r, c)
candidate(n, r, c)
where n, r, c,… are variables (and = is the usual equality sign).
This is a natural modelling choice, justified by the fact that they
constitute the minimum common to all players.
According to Occam's razor principle, any addition to these basic
predicates should be justified by some necessity.
Auxiliary predicates can of course be defined. Formally, they are
just formulæ with free variables.
For instance, share-a-unit(r1, c1, b1, r2, c2, b2) is defined as: r1
= r2 or c1 = c2 or b1 = b2
Other predicates that can be easily defined are:
- bivalue
- bilocation (or conjugate)
- nrc-linked
- nrc-conjugate,…
If the above definition, the condition part (which is also a FOL
formula in itself) describes a possible situation on a grid, e.g. a
Naked-Triplets, an xy4-chain, an nrczt5-chain, … Patterns describe
purely factual situations, they are the observables of Sudoku.
A resolution rule is proven once and for all, and the way it has
been proven, by a valid logical method, has no impact of any kind on
its validity.
In particular, whether or not reasoning by cases has been used in
the proof is totally irrelevant to its validity.
Being logically valid doesn't prevent a rule from being more or less
easy to apply, but this has nothing to do with the way it has been
proven. The problems
- of proving a rule once and for all (generally these proofs are
very straightforward)
- and of spotting the occurences (instantiations) of its defining
pattern on a real grid
are totally independent.
Definition: A resolution path
is a sequence of applications of resolution rules, starting from a
given puzzle and ending, hopefully, in a solution (or in a sate
where no rule can be applied) - with no additional ad hoc pieces of
reasoning. A resolution path is thus a logical, constructive proof of the solution.
Unfortunately, examples given in forums very rarely satisfy this
definition, although they often could.
This is mainly due to:
- ambiguous examples with incomplete resolution paths and inadapted
notations,
- a generalised confusion between the description of a purely
factual situation (e.g. the existence of a general link in my sense,
i.e. a unit being shared between two cells, or of aconjugacy link)
and the way this can be used in an inference step. In this regard,
the notions of strong and weak links are certainly the most
confusing ones that can have been introduced - leading e.g. to
hyper-realistic debates on whether a strong link is also a weak
link.
Definition: a resolution theory
is a set of resolution rules.
Definition: a puzzle is solvable by
a resolution theory T if there is a resolution path using
only rules in T and leading to a solution.
Remarks.
The above definitions don't have to be understood in an integrist
sense - which doesn't mean they must be relaxed:
1) A resolution rule can be written formally in MS-FOL, but it can
also be written in plain English. In this case, to be sure there is
a real unambiguously defind resolution rule behind it, the only
three things you have to be careful about are:
- respect the "condition => action" form;
- in the condition part, or when you define new auxiliary predicates
from the elementary ones (e.g. a new type of chain), use only
boolean combinations and quantifications; always use only precise
factual conditions (and not undefined concepts such as "weak" or
"strong" link);
- in the action part, put only assertion of values and deletion of
candidates; there can be no "or".
2) Logical formulæ can use auxiliary predicates, which can be
given any convenient intuitive form.
The graphical representations I use for chains ARE logical
formulæ - and they are very short.
As an elementary example, here is the formula for an xy-chain[4]:
xy-chain[4] r1c1{1 2} - r1c4{2 3} - r3c5{3 4} - r3c2{4
1}
3) Basic AICs or NLs (what I call nrc-chains) are resolution rules.
Of course, xy, xyt, xyzt, xhy, hxyt, hxyzt, nrc, nrct, nrcz and
nrczt chains are resolution rules.
4) As soon as you consider that you can shift the rows or columns,
you have variables and you are already doing FOL (and not
propositional calculus). There is no difference between Math and
Logic in the use of variables. A variable in a formula is the same
thing as a variable in a polynomial.
5) There is no reason to broaden the definition of a resolution rule
(e.g. by accepting "or statements" in the action part; that would
radically change the framework).
The problem is not to extend the definition but to find additional,
not necessarily formal, conditions a valid resolution rule should
satisfy in order to be useful in practice. See e.g. the next section
about chains.
6) Due to the natural elementary symmetries of Sudoku, this is mere
common sense, but it may be useful to state it once formally: the
statement of a resolution rule should rely on no particular row,
column, block or number. Practical statement: the formula expressing
a chain rule should only have symbols for variables, not for
constants.
4) A FEW OBJECTIVE
PROPERTIES OF CHAINS
The definitions below will be clearer if you have first a quick look
on chains or whips.
Warning: chains should not be confused with chain rules. A chain
rule is merely valid or not valid, which depends neither on the way
it has been proven nor on the properties of the underlying chain
that will be defined below. There's nevertheless an obvious
(syntactic) property of chain rules that should be respected. Due to
the natural elementary symmetries of Sudoku, this is mere common
sense, but it may be useful to state it once formally: the statement
of a chain rule should rely on no particular row, column, block or
number. Practical statement: the formula expressing a chain rule
should only have symbols for variables, not for constants.
A non valid chain rule is merely useless. But, independently of its
logical validity, a valid chain rule can be more a less useful, more
or less easy to apply, more or less acceptable. As these are purely
subjective criteria, they can only lead to confusion or wars of
words.
Here, I shall be concerned only with purely objective properties of
chains (and not of chain rules) that may be relevant to estimate
their usefulness.
Notice that these objective properties can be the subject of much
debate when it comes to subjectively evaluating their impact on
usefulness or acceptability.
4.1 Reversibility
Definitions:
- given a (2D or 3D) chain C, the reversed chain is the chain
obtained by reversing the order of the candidates; in the process,
left (resp. right) linking candidates become right (resp. left)
linking candidates.
- a chain type is called reversible
if for any chain of this type, the reversed chain is of this type.
Theorem: xy-chains, xyz-chains,
nrc-chains, nrcz-chains are reversible.
Proof: obvious.
4.2 Non-anticipativeness
Definition: a given type of (2D or 3D) chain is called non-anticipative if, when you
build a chain of this type from left to right, all that you have to
check to add the next candidate depends only on the previous
candidates (and not on the potential future ones).
Comments:
- this is a very strong criterion for acceptability, from the points
of view of both human players and programmers;
- non-anticipativeness could also be called no-look-ahead;
- the definition does not imply that adding a candidate will always
allow you to finally get a complete chain, but it guarantees that,
up to the new candidate, the partial chain is of the given type.
Theorem: all the (h)xy(z)(t) and
nrc(z)(t) chains/lassos/whips are non-anticipative.
Proof: obvious. Indeed they were defined so as to be
non-anticipative.
Theorem: any reversible chain is
non-anticipative.
Proof: obvious
4.3 Extendability
Definition: a given type of (2D or 3D) chain is called left-extendable if, when you
already have a partial chain of this type, you can add cells or
candidates or another partial chain of the same type not only to its
right but also to its left (of course, respecting the linking
conditions at the junction).
Theorem: (h)xy(z)(t)-chains and nrc(z)(t)-chains are
left-extendable.
Proof: obvious. Indeed they were defined so as to be
non-anticipative.
Theorem: any reversible chain is
left-extendable.
Proof: obvious
Theorem: any non-anticipative chain
is left-extendable.
Proof: obvious
4.3 Composability
Definition: a given type of (2D or 3D) chain is called composable if, when two partial
chains of this type are given, they can be combined into a single
chain of this type (of course, respecting the linking conditions on
left- and right- linking candidates for chains of this type at the
junction and having the same target in case they are built around a
target).
Theorem: all the (h)xy(z)(t) and nrc(z)(t) chains are composable.
This is a very important property of chains. It can be used to build
long chains from shorter ones.
Its practical impact is mainly for chains with the t-extension: when
additional t-candidates are justified by previous right-linking
candidates of a partial chain, they will still be justified by the
same candidates if another partial chain of the same type is added
to its left. Of course, not all chains with the t-extension can be
obtained by combining shorter chains of the same type, but looking
first for chains with short distance t-interactions may be a
valuable strategy.
5) CONFLUENCE, A MAJOR PROPERTY OF RESOLUTION THEORIES
This section is now in a separate page.
Theorem: Braid resolution
theories have the confluence property. Braids are
defined here and the confluence property
of their resolution theories is proven here.
6) RESOLUTION RULE vs RESOLUTION TECHNIQUE vs REPRESENTATION
TECHNIQUE
Whereas I have given the phrase "resolution
rule" a precise, purely logical definition, it is not yet
the case for the word "technique". First, I think two different
kinds of techniques should be defined.
A "resolution technique" is
a procedure that can help get closer to a solution by eliminating
candidates or asserting values.
Some colouring techniques, depending on what one means by this, can
be resolution techniques in this sense.
A "representation technique"
is a representation that can be the support for several resolution
techniques.
Marking conjugate candidates with upper and lower case letters is a
technique in this sense. It can be used for finding Nice Loops or
AICs.
The Extended Sudoku Board introduced in HLS, with its 2D-spaces
(rc-, rn-, cn- and bn- spaces), is a representation techniques in
this sense.
One important notion is that of a resolution technique being
associated with or being the implementation of a resolution rule.
A given resolution rule can, in practice, be applied in various
ways. Said otherwise, different resolution techniques can sometimes
be attached to a single resolution rule
This distinction is important. Forgetting it may lead to the
confusion between an algorithm and a logical formula - two notions
that do not belong to the same universe of discourse (Computer
Science vs Logic).
A formal definition is the following:
A resolution technique is the implementation of a resolution rule if
for any partially filled grid with candidates, applying the
technique to this grid has the same final effect (values asserted
and candidates deleted) as what can be logically concluded by
applying the resolution rule.
Validity of a resolution rule is based on logic. Validity of a
resolution technique is based either on its conformity with the
underlying resolution rule if there is one OR on the algorithm
describing it.
For a technique that is not based on a resolution rule, it may be
very difficult to guarantee that it doesn't amount to a form of
trial and error.
Most, but not all, of the usual techniques could be re-written as
resolution rules.
7)
RECURSIVE-TRIAL-AND-ERROR-WITH GUESSING (RTEG)
Definition: RTEG is the any structured search procedure (e.g.
depth-first or breadth-first): recursively make ad hoc hypotheses on
values that you do not know to be true at the time you make them,
write all their consequences (candidates eliminated and values
added), with the possibility of retracting each of these hypotheses
and their consequences if they lead to a contradiction; accept the
solution if one is found (this is the "guessing" part of this
procedure).
RTEG is a resolution technique in the above sense.
In fact, RTEG is a family of resolution techniques, depending on how
you write "all the consequences" of an hypothesis, i.e. on which
resolution theory T (i.e. on which family T of resolution rules) you
adopt for this. This is not really important, because such
consequences are only used to prune the tree of possibilities, i.e.
to restrict the search.
There may also be variants of RTEG depending on how many solutions
you want in case the puzzle has several: this will be defined by the
exit conditions of the algorithm.
Being an algorithm, Recursive-Trial-and-Error-with-guessing can
obviously not be a resolution rule: the two pertain to different
domains of speech (algorithmic vs logic). But we have the stronger:
RTEG theorem: RTEG is a resolution technique that
cannot be the implementation of any resolution rule.
Proof: since RTEG can be applied any time a cell has at least two
candidates, if it was expressible as a resolution rule, the
condition of this rule could only be:
"if a cell has at least two candidates".
But it is obvious that, from such a condition, no general conclusion
can be obtained (no value can be asserted and no candidate can be
eliminated).
Here is a stronger version and an alternative proof.
RTEG theorem (stronger version): RTEG is a resolution technique that cannot be the
implementation of any set of resolution rules.
Proof: if a puzzle has more than one solution, RTEG is guaranteed to
find one (or several or all, depending on the exit condition we put
on the RTEG algorithm - again, RTEG is a family of algorithms, and
the theorem applies to any variant).
On the contrary, as a set S of resolution rules can only lead to
conclusions that are logical consequences of the axioms and the
entries (i.e. that are true in all its models - all the solutions),
if a puzzle has several solutions, S cannot find any.
E.g. if there are two solutions such that r1c1 is 1 in the first and
2 in the second, S cannot prove that r1c1=1 (nor that r1c1=2). It
can therefore find none of these solutions.
q.e.d.
Remark: Notice that another
procedure (non-recursive Trial-and-Error with no guessing)
corresponding to the standard notion of T&E is defined here and shown to be representable by
braids.
8) FAMILIES OF RESOLUTION RULES
First, let me state that I am only speaking here of resolution rules
and not of resolution techniques.
Moreover, I am considering only rules that can be considered as
first order, in conformance with my general framework, i.e. whose
logical formulation does not require quantifiers on subsets (such as
the cover set approach).
Resolution rules can be classified into various large families. This
classification is not exhaustive, new families might be discovered.
But it is useful to understand that there are several very different
types.
In my book and in SudoRules, I have mainly been considering only
four large families of resolution rules:
1) the elementary constraints propagation rules or ECP (direct
contradictions along rows, columns and blocks),
2) the subset rules: Naked, Hidden and Super-Hidden (i.e. Fish)
versions of Singles, Pairs, Triplets, Quads,
3) the elementary interaction rules (between rows and blocks and
between columns and blocks): BI,
4) the xy-to-nrczt (xy, hxy, xyt, hxyt, xyz, hxyz, xyzt, hxyzt, nrc,
nrct, nrcz, nrczt) family of chain rules.
I have shown the unity of the subset rules through symmetry and
super-symmetry.
I have also shown why all the rules in the xy-to-nrczt family can be
considered as generalisations of the basic xy-chain rule. This
includes nrc-chains, equivalent to the basic NLs or AICs (basic
meaning "with no ALSs"). Although nrczt-chains subsume the whole
family, the other chains are easier to find and should therefore not
be forgotten. This family is thus organised in a pedagogical
hierarchy.
In my book, I have also mentionned that the subset rules are
extensions of the xy2 to xy4 patterns.
Moreover, most (but not all) cases of the subset patterns are
special cases of nrczt-chains (see the subsumption
page)
This shows that families 2 and 4 are very closely related.
What I want to stress now is that, in spite of being so close,
families 2 and 4 require very different forms of reasoning:
- the standard proofs of the subset rules require considering the
pattern globally; they don't proceed from one cell to the next;
- the proofs of the xy-to-nrczt rules follow the linear chain
structure in such a way that each step doesn't anticipate on the
future elements of the chain; I consider this non-anticipativeness
as a key property of chains; it is what makes them useable in
practice.
As they correspond to different forms of reasoning, the subsets and
the nrczt-chains could be subsumed under a common pattern only if we
accepted chains that anticipe on their future elements (which is an
abomination from the point of view of information theory) OR if we
define a new type of pattern (see the next section).
The rules in the above four families are enough to solve almost any
randomly generated puzzle (indeed, in several tens of thousands of
randomly generated puzzles, I have met none that they could not
solve).
From experiments with hundreds of puzzles taken from various forums,
it appears that they are enough to solve puzzles upto SER = 9.3. In
particular, they were used daily by human players on the French
sudoku-factory.com forum (before it suddenly disappeared) to solve
puzzles at SER 9.0 to 9.3. They can solve any puzzle that any expert
human player can solve and probably even much more.
All this shows that, in spite of the simplicity of the general
framework, it allows to define simple and very powerful resolution
rules.
It is important to recall this, because the next section will
discuss more complex patterns and I want to make it clear that such
patterns, especially braids, are not necessary but for exceptionally
hard puzzles.
9) MORE COMPLEX FAMILIES OF RESOLUTION RULES
Considering the current interest for exceptional puzzles, I have
extended my set of resolution rules in two directions.
1) Firstly, I have generalised my idea of additional z- and t-
candidates and I have introduced a very general principle, zt-ing, that allows to define new
patterns and associated new chain rules from any family FP of basic
patterns: zt-whip(FP) and zt-braid(FP). zt-whipping or zt-braiding
is a general method (more general than the classical almost-ing) for
including in chains/whips/braids any pattern having an associated
resolution rule.
In essence, the generalisation to zt-whip(FP) consists of allowing
the right-linking candidates of a whip to be whole patterns instead
of mere candidates.
If one takes FP=ECP+NS+HS, one gets the ordinary nrczt-whips.
If one takes FP=ECP+NS+HS+BI, one gets the grouped or hinged
zt-whips.
If one takes FP = ECP+NS+HS+BI+SubsetRules, one gets a new family of
chain patterns, whip(ECP+NS+HS+BI+SubsetRules), more general than
nrczt-chains, AICs with ALSs and than their grouped or hinged
counterparts. Indeed, these whips contain any AAAALS, AHHHHS and any
AAAAAA-Fish, with as many A's or H's as one wants.
These patterns are chains, in exactly the same sense as any
nrczt-chain or any AIC. They can be considered as defining a new set
of levels in the hierarchy defined by family 4.
Such generalisations of nrczt-chains have also already been used in
the late sudoku-factory forum - although in these puzzles ordinary
nrczt-whips were enough.
2) Secondly, with the introduction of nrczt-braids, I have added a
fifth family, composed of a (relatively mild) kind of nets, still
with some linear structure. See the page on braids.
3) The above two extensions can be combined, leading to
zt-braids(FP) for any family of patterns with associated resolution
rules.
I have also shown the close relationship between braids and
Trial-and-Error (T&E) - the standard T&E procedure (which is
not itself a resolution rule but only a "resolution technique") with
no guessing and no recursion, not to be confused with the RTE
procedure:
Theorem: for any resolution theory
T based on a family FP of elementary patterns, any elimination
that can be done by T&E(T) can be done by some zt-braid(FP).
See the page on braids.
For players who use T&E for hard puzzles, this should soften
their pain when they read that T&E is the abomination. The only
thing that remains the abomination in this framework is guessing.
All the above patterns still have two very important properties,
helpful for finding them:
- they are non-anticipative, i.e. the validity of a partial
whip/braid depends only on the previous candidates (and it can
therefore be checked on-the-fly),
- they are composable, i.e. partial whips/braids can be linkd
together to make longer ones, with obvious compatibility conditions
at the junction.
Home(The Hidden Logic of Sudoku)
Home(Denis
Berthier)