The Critical Factorization Theorem is a fundamental result in combinatorics on words that relates the presence of local repetitions within a word to the overall periodicity of the word.

The theorem was first proven in 1978 by Césari and Vincent [CV]. For more recent treatments, see [CP] and [HN].

Before stating the theorem, we define some preliminaries.

Definitions

Given a word \(w\), we let \(|w|\) denote the length of the word. If \(i\) is an integer \(1\leq i\leq|w|\), we let \(w[i]\) denote the \(i\)-th letter (i.e. \(i\) is a \(1\)-based index). We say an integer \(d>0\) is a period of \(w\) if \(w[i] = w[i + d]\) for all \(i\) when both sides are defined. Observe that if \(d\) is a period of \(w\), then \(w\) takes the form of multiple copies of a word of length \(d\) concatenated together (possibly with the last copy of the word cut short). Define the global period of \(w\) to be the smallest possible \(d\) that is a period. For example, the word "abababa" of length 7 has global period 2, but 4 and 6 are also periods of the word.

Given a word, we define a point or position \(p\) to be the location between letters that is immediately after the \(p\)-th letter and before the \(p+1\)-th letter. For example, the position \(p=1\) means the location between the first and second letters.

Given a position \(p\) of a word \(w\), we say an integer \(r>0\) is a local period at \(p\) if \(w[i]=w[i+r]\) for all \(i\) with \(p-r+1\leq i \leq p\), and when both are sides defined.

This figure shows examples illustrating the four possible configurations a local period can have, depending on whether the repetition word is truncated at the beginning and/or at the end. In each example, the arrow points to the position \(p\). In the case 1 example (no truncation), the local period at \(p\) is 2 with repetition word "ab." In case 2 (truncation on the left), the local period is 3 with repetition word "abc." In case 3 (truncation on the right), the local period is 4 with repetition word "abcd." Lastly, in case 4 (truncation on the left and right), the local period is 5 with repetition word "abcde."

An equivalent formulation is as follows. An integer \(r>0\) is a local period at \(p\) if there is a word \(u\) of length \(r\) such that \(p\) is both immediately preceded by and immediately followed by \(u\) (but allowing \(u\) to be truncated at the beginning and/or end if the beginning and/or end of \(w\) is reached). In this case, we say \(u\) is a repetition word at \(p\). The figure above illustrates the four possible configurations a local period can have, depending on whether the repetition word is truncated at the beginning and/or at the end. We say \(r\) is the local period at \(p\) if it is the smallest possible \(r\) that is a local period.

It is easy to see that the local period at a point is always less than or equal to the global period. One can also see that the repetition word corresponding to the local period is irreducible in the sense it can't have a prefix that is also a suffix. In other words, the repetition word is a bifix-free word in the local period case. 

Finally, we say that a point \(p\) is a critical point of \(w\) if the local period at \(p\) equals the global period of \(w\). Roughly speaking, a critical point represents a transition to a point in the word of least possible local repetition.

Statement of the Theorem

We can now state the theorem:

Critical Factorization Theorem. Let \(w\) be a word with \(|w|\geq 2\) and global period \(d\). Then there is a critical point \(p\) with \(1\leq p < d\).

One way of looking at the theorem is as follows. Say the global period \(d\) of a word \(w\) is known to be bounded above by some integer \(d^{\prime}\) (i.e. so \(d\leq d^{\prime}\)). For example, \(d^{\prime}\) can be a period that isn't necessarily global, or it can be \(|w|\). Further assume there is an integer \(k\) with \(k<d^{\prime}\) such that the local period of \(w\) at \(p\) is less than or equal to \(k\) for all points \(p<d^{\prime}\). Then the Critical Factorization Theorem says the global period \(d\) satisfies \(d\leq k\). In other words, the theorem lets us "patch together" the local periods to get a global one; we can draw a global conclusion from a local property that holds everywhere.

Example

This figure shows the local period and corresponding repetition word at each position of the word "abaabaa." The first row illustrates how the word has global period 3. The next six rows show the repetition word for the local period at position \(p\), for positions one through six, respectively. They are "ba," "aab," "a," "baa," "aab," and "a." Positions 2, 4, and 5 are the critical points, with the critical point at position 2 the one guaranteed by the Critical Factorization Theorem.

The figure to the right illustrates some of this article's concepts for the word "abaabaa," which is one of the examples in [CP]. The word has global period 3 and critical points at 2, 4, and 5. The critical point at position 2 is the one guaranteed by the Critical Factorization Theorem (since 2 is less than the period 3).

References

  • [CV] Y. Césari and M. Vincent, "Une caractérisation des mots périodiques," Comptes Rendus Hebdomadaires des Séances de l’Académie des Sciences, 286 Ser. A, 1978, 1175–1177.
  • [CP] M. Crochemore and D. Perrin, "Two-way string-matching," Journal of the Association for Computing Machinery, Vol. 38, No. 1, July 1991, 651–675.
  • [HN] T. Harju and D. Nowotka, "Density of Critical Factorizations," RAIRO - Theoretical Informatics and Applications, Vol. 36, No. 3, July/Sept 2002, 315-327.