Counting
strings that contain intervals of same letter repetitions
Description of the problem
Today, I will look at a simple combinatorics problem that came up in
my discrete math class.
Assume we have an alphabet of size . How many different strings of length
can I construct in which the
intervals of same letter repetitions have length at most ?
For example if and the strings “ababa” “aabaa” “aabba”
are ok but “aaaab” is not because the upper limit for lengths of
intervals of same letter repetitions was .
A solution
First, I need to determine the shape of the string. For that I need
to count the number of solutions for the equation The number of
solutions of this equation counts the number of strings of the shape
where . If is the number of solutions for
the equation above, then the number of strings of this shape is Then the number of strings I want
is
I will describe the function recursively. We have a simple
recurrence relation of the form with the base cases The recursive equation comes
from the following argument: in the equation if we reduce every
term by one, the upper bound is
also reduced by one and we get Now, we have to control the terms which are equal
to and we go over all
possibilities in which ’s are 0
in the sum.
There is also a functional relation of the form because the set of solutions to the equation
with the constraint is bijective to the set
of solutions of the equation with the constraint since every in the first equation has a unique
conjugate in the second equation. By increasing the elements by 1 we get with
the constraint .
An implementation
I will start by implementing a function for the binomial numbers.
(defun binom (n k)
(cond ((< (- n k) k) (binom n (- n k)))
((= k 0) 1)
((= k 1) n)
(t (reduce '* (loop for i from 0 to (1- k)
collect (/ (- n i)
(- k i)))))))
BINOM
Here is a simple implementation for the number of solutions for the
integer-linear equation I was considering earlier with built-in
memoization:
(let ((lookup (make-hash-table :test 'equal)))
(defun solutions (n m k)
(cond ((or (<= k 0) (<= m 0) (<= n 0) (> n (* m k))) 0)
((= m 1) (if (< k n) 0 1))
((= m n) 1)
((= n (* m k)) 1)
((> n (- (* m (1+ k)) n)) (solutions (- (* m (1+ k)) n) m k))
(t (or (gethash (list n m k) lookup)
(setf (gethash (list n m k) lookup)
(loop for i from 0 to (1- m)
sum (* (binom m i)
(solutions (- n m) (- m i) (1- k))))))))))
SOLUTIONS
And finally the number of words
(defun count-solutions (n k r)
(loop for m from (truncate (/ n k)) to n
sum (* r
(solutions n m k)
(expt (1- r) (1- m)))))
COUNT-SOLUTIONS
The following is a test function.
(defun test-solutions (a b)
(loop for i from 1 to a collect (count-solutions a i b)))
TEST-SOLUTIONS
For the list of bit strings
of length 3 are
101 010
001 110 100 011
000 111
grouped together according to the largest repeating intervals. For
the list is
1010 0101
0010 1101 0100 1011 0011 1100 0110 1001
0001 1110 1000 0111
0000 1111
And finally for
10101 01010
00101 11010 10100 01011 00110 11001 01100 10011 10010 01101 01001 10110 00100 11011
00010 11101 01000 10111 10001 01110 00111 11000 11100 00011
00001 11110 10000 01111
00000 11111
The numbers below confirm these results.
(test-solutions 3 2)
(2 6 8)
(test-solutions 4 2)
(2 10 14 16)
(test-solutions 5 2)
(2 16 26 30 32)
The following is the number of strings of length 50 using only the
letters ‘a’, 'c’, 'g’ and ‘t’ that contain same letter intervals of
length at most 5.
(count-solutions 50 5 4)
1226096552838244254073677630864
This is approximately 96.72 percent of all words of length 50 using
the alphabet “a,c,g,t”,
Older Posts
[2025-02-23] Counting Matroids
[2025-02-12] Sampling from
a Random Variable
[2025-01-29] Markov Numbers
[2024-12-24] Number of isomorphism classes
of simple graph (continued)
[2024-12-22] Counting Isomorphism
Classes of Graphs
[2024-11-25] Connected Components
of Graphs
[2024-11-24] Counting connected
components of a graph
[2024-11-18] Counting Isomorphism
Classes of m-ary Trees
[2024-11-16] Number of
Isomorphism Classes of Ternary Trees
[2024-11-12] Hosoya Index of Balanced
Binary Trees
[2024-11-11] Hosoya Index of a
Graph
[2024-10-29] The Clique Number of a Simple
Graph
[2024-10-28] The Size of
Maximally Independent Subsets in a Graph
[2023-11-03] Graph
Algorithms in JGraphT with Common Lisp
[2023-10-28] An
Implementation of Pandas’ cut
and qcut
in
Lisp
[2023-07-24] A
Collatz-like Conjecture for the Projective Line
[2023-03-06] Twin
Primes, Cousin Primes, Sexy Primes, and Prime Triplets
[2023-03-02] Set
of All Partitions of a Finite Set
[2023-02-14] Non-crossing
Partitions and Dyck Words
[2023-02-13] Non-crossing
Linear Chords
[2023-02-04] Clojure/Python
Interop Examples
[2023-01-14] Graph
Algorithms in Clojure with JGraphT
[2022-03-29] 2D-Random Walk
[2022-03-28] Trade
Deficit vs Exchange Rate Curve
[2022-03-16] Working
with World Bank Data in Python
[2022-03-09] Working
with European Central Bank data in python (revisited)
[2022-01-24] A
Clique Analysis of Quakers in early modern Britain (1500-1700)
[2021-12-05] Boyer–Moore
and Misra-Gries Algorithms in Clojure
[2021-09-12] Tension in Text
Plotted
[2021-09-02] Statistical
Distributions using Apache Commons Math in Clojure
[2021-08-31] Reduce
with Intermediate Results in Common Lisp
[2021-08-21] Multivariate
Regression Implemented in Clojure
[2021-05-29] Using
Neural Networks to Detect Graph Properties
[2021-04-17] Fast
Null-Space Calculation via LU-Decomposition
[2021-02-24] Stoer-Wagner
Algorithm in Clojure
[2021-02-19] Calculating
Vertex Covers in Clojure
[2021-02-18] Listing All
Paths in a Graph
[2021-02-14] Strict
Dyck Words and Fibonacci Numbers
[2021-02-14] Kruskal’s
Algorithm in Common Lisp
[2021-02-13] Kruskal's
Algorithm Implemented in Clojure
[2021-02-10] An
integer dynamical system of integers
[2021-02-08] Binary
Symmetrization
[2021-01-28] Prüfer
Encoding and Decoding of a Tree in Clojure
[2021-01-27] Counting
Cycle-Free Paths in a Graph
[2021-01-27] Counting
Connected Labeled Graphs
[2020-12-18] Counting
Graphs with a Prescribed Degree Sequence
[2020-12-13] Havel–Hakimi
Algorithm in Clojure
[2020-12-12] Havel–Hakimi
Algorithm in Common-Lisp
[2020-10-23] The Quadratic
Casimir Element
[2020-07-04] Collatz Sequence
in Binary
[2020-07-02] A Lazy
Sequence of Primes in Clojure
[2020-06-10] Yet
Another Fizz-Buzz in Common Lisp
[2020-05-12] ECB
Data with Clojure and Vega-Lite
[2020-05-06] Processing
ECB Data with Common Lisp
[2020-04-17] Next
Permutation in the Lexicographical Ordering
[2020-04-13] Turkish
Hyphenation in Common Lisp
[2020-04-01] Using JavaPlex
with Clojure
[2019-11-05] Constricted
Arithmetic Progressions
[2019-11-03] The
Number of Arithmetic Progressions of Integers
[2019-05-06] Bron-Kerbosch
Algorithm in Clojure
[2019-05-01] An
Implementation of Ford-Fulkerson Algorithm in Clojure
[2019-04-22] Document
Summarization via Nonnegative Matrix Factorization
[2019-04-20] Latent
Semantic Analysis in Clojure
[2019-04-13] K-Nearest
Neighbors Algorithm in Clojure
[2019-04-06] K-Means
Implemented in Clojure
[2019-03-19] Prüfer
Encoding/Decoding of a Tree in Common Lisp
[2019-03-05] Gale-Shaply
Algorithm in Common Lisp
[2019-03-02] Calculating
The Correct Rank of a Matrix
[2018-12-04] Feed-forward
and back-propagation in neural networks as left- and right-fold
[2018-10-31] Nonnegative
Matrix Decomposition in Clojure
[2018-10-30] Non-negative
Matrix Decomposition in Scala
[2018-08-30] Working
with European Central Bank Data in Scala
[2018-07-30] Perverse
Sequences
[2018-05-28] Online
Perceptron in Common Lisp
[2018-05-26] Online Perceptron
[2018-05-18] Online Regression
[2018-05-06] Knut’s
Algorithm-S in Common Lisp
[2018-02-28] Irreducible Dyck
Words
[2018-02-19] Optimization
with GNU Scientific Library for Lisp
[2018-02-10] Van Eck’s
Sequence
[2018-02-09] Hiring
networks in mathematics
[2018-02-08] Linus Sequence
[2018-02-05] Egyptian
Fractions
[2018-02-01] Listing all
Young Tableaux
[2018-01-23] Collatz sequence
(yet again)
[2018-01-15] Hofstadter's Q
sequence
[2018-01-09] Farey Sequence
[2018-01-09] Catalan's
Triangle
[2018-01-06] The
Shoelace Formula for the Area of a Polygon
[2017-10-01] Working
with European Central Bank Data in Python
[2017-09-27] Expected
Value of the Diameter of a Tree
[2017-09-26] Using
Quandl with kixi.stats on Clojure
[2017-09-22] Using Quandl
with Common Lisp
[2017-08-05] Solving
Linear Equations in Natural Numbers
[2017-07-31] Transitive
Closure of a Directed Graph or a Relation
[2017-07-20] Steenrod-Milnor
and Tournament Sequences
[2017-07-15] A
lower bound on the radius of a graph
[2017-07-08] All partitions
of an integer
[2017-07-06] Some Hasse
Diagrams
[2017-07-04] Shuffles
[2017-07-03] Kaprekar Sequence
[2017-07-01] Lattice of Dyck
Words
[2017-06-28] The
poset of connected subgraphs of a connected graph
[2017-06-21] Calculating
the Diameter and the Radius of a Graph Using Tropic Linear
Algebra
[2017-06-19] Generating
random regular graphs
[2017-06-14] Estimating
the maximum element of a large list
[2017-06-09] A
Stochastic Gradient Descent Implementation in Clojure
[2017-06-06] A topology
problem
[2017-04-22] Listing duplicate
files
[2017-03-14] My First Idris
Proof
[2016-12-02] Distinguishing
hash functions (part II)
[2016-12-01] Distinguishing
hash functions
[2016-10-20] Hofstadter-Conway
$10,000 sequence
[2016-08-18] A
Solution for Problem 171 of 4Clojure
[2016-08-16] Puzzles and Group
Theory
[2016-08-13] Using Weka within
Lisp
[2016-07-12] Funniest
and Unfunniest Jokes in the Jester Dataset
[2016-07-05] Generating
Uniformly Random Connected Graphs
[2016-06-16] The
Robinson-Schensted Algorithm
[2016-06-01] Conjugate
Partitions
[2016-04-27] Using Word2Vec
from Clojure
[2016-04-24] Using
Word2Vec from Common Lisp
[2016-04-18] A Migration
Analysis
[2016-04-11] Basic
Data Analysis with CL without Frameworks
[2016-03-25] Parallel
map-reduce in Common Lisp
[2016-02-22] Text
Summarization and Topic Analysis
[2016-01-27] Set Covering
Problem
[2016-01-25] Kolmogorov-Smirnov
Test
[2016-01-20] Eigen-values
of the Laplacian and Connected Components of a Graph
[2015-12-12] Dual Graphs
[2015-10-26] Longest
Increasing Subsequence Revisited
[2015-10-16] Document
Summarization via Markov Chains
[2015-10-07] Computational
Literary Analysis
[2015-09-30] Library of
Babel in Common Lisp
[2015-09-28] Merging
Association Lists in Common Lisp
[2015-07-22] Cheapest
Paths via Tropic Matrices
[2015-07-21] Hidden
Markov Models via Tropic Matrices
[2015-07-08] A non-technical
post
[2015-06-28] An
implementation of the Viterbi algorithm in Common Lisp
[2015-05-28] Greatest
Common Divisor of Two Rational Numbers
[2015-05-21] Partitions
of Equal Measure Whatever the Measure May Be
[2015-05-14] Finding Cliques
in a Graph
[2015-05-12] Set Cover Problem
[2015-05-03] Threading
Macros in Common Lisp
[2015-05-03] Happy Numbers
[2015-05-01] Collatz Primes
[2015-04-23] Splitting Streams
[2015-04-06] Hamming
Distance and Double Hashing
[2015-04-05] Hamming
Distance and Hashing Functions
[2015-04-05] Hamming
Derivative of Hashing Functions
[2015-04-02] A Topology
Problem
[2015-03-21] Curve
Fitting is a Gram-Schmidt Reduction
[2015-03-08] Maximum
number of characters using keystrokes A, Ctrl+A, Ctrl+C and
Ctrl+V
[2015-03-06] Eccentricity,
Radius and Diameter in a Graph, Revisited
[2015-03-01] Graphs and
Entropy
[2015-02-22] Math PhD
Hiring Network (Part 3)
[2015-02-19] Math PhD
Hiring Network (Part 2)
[2015-02-18] Math PhD
Hiring Network (Part 1)
[2015-02-17] Faculty
Networks and Inequality in Hiring Practices in Universities
[2015-02-10] Functional
Streams in Lisp Explained
[2015-02-05] Collatz-type
Conjectures (Continued)
[2015-02-04] Collatz-type
Conjectures (Continued)
[2015-01-31] Collatz-type
Conjectures (Continued)
[2015-01-30] Collatz-type
Conjectures
[2015-01-28] Experiments
with Infinite Recursive Sequences (continued)
[2015-01-17] Experiments
with Infinite Recursive Sequences
[2015-01-10] Goldbach Pairs
[2015-01-02] Collatz Lengths
(Continued)
[2015-01-01] Functional
Streams
[2014-12-27] Polarization
in the US Congress
[2014-12-18] Partition a
sequence
[2014-11-28] Uniformly
Random Permutations
[2014-11-22] An
Implementation of Ford-Fulkerson Algorithm in Common Lisp
[2014-11-17] Tropic
Calculation of Cheapest Paths
[2014-11-05] Longest
common subsequence of two sequences
[2014-10-30] Counting
Spanning Trees of a Graph
[2014-10-26] Longest
Increasing Subsequence
[2014-10-24] The
Number of Inversions in a Sequence
[2014-10-22] Hashes and
Entropy
[2014-10-09] Estimating
Cardinality with Constant Memory Complexity
[2014-09-30] Landau's Function
[2014-09-29] A
Problem on Substitution Ciphers and Group Theory
[2014-09-28] A Morse Code
Translator
[2014-09-23] A
Memoization Macro for Common Lisp
[2014-09-21] Reducers are
Monoid Morphisms
[2014-09-18] Number
of isomorphism classes of binary trees
[2014-09-07] CONS is your
friend
[2014-08-22] A Zipf's Law
Simulation
[2014-08-07] Generating
Uniformly Random Trees
[2014-07-09] A Solution
for Project Euler #463
[2014-06-12] Entropy of
truncated MD5 hashing
[2014-06-08] Hexadecimal digits
of π
[2014-02-11] Information
content of n-grams
[2014-02-08] Turkish
Sentiment Analysis Using Thesaurus Distance
[2014-02-01] Sentiment
analysis using word distances
[2014-01-27] Phase
transitions in entropy
[2013-12-13] Optimal length of
n-grams
[2013-12-10] Counting
strings that contain intervals of same letter repetitions
[2013-12-02] Patterns
Separating Large Texts
[2013-11-23] Collatz
Sequences (Continued)
[2013-11-11] Entropy
and approximately one-to-one maps
[2013-10-23] Tree Isomorphism
[2013-10-15] Self Organizing
Maps
[2013-09-15] Euler Project
#401
[2013-09-15] An
additively recursive definition of the Moebius function
[2013-09-11] An
Unsuccessful Attempt for Solving Euler Project #401
[2013-09-04] Uniform
Sampling from Parametrized Submanifolds in Scala
[2013-09-04] Uniform
Sampling from Parametrized Submanifolds
[2013-08-30] Randomly
Generated Points Obeying a Distribution
[2013-08-25] Simulated
Annealing in Lisp
[2013-08-21] Eigenvalues
and Eigenvectors in GSLL
[2013-08-16] Reservoir
Sampling
[2013-08-11] Gibbs
sampling in lisp compared with C
[2013-08-10] Logistic
Regression in lisp
[2013-08-10] Linear
Discriminant Analysis in R
[2013-07-17] A
Gradient Descent Implementation in Lisp
[2013-07-01] k-Nearest
Neighbor Classification Algorithm Implemented in Lisp
[2013-05-19] Newton-Raphson
Method
[2013-05-07] Levenshtein
Distance
[2013-04-15] Cut points in a
graph
[2013-04-01] Experiments
on Collatz Lengths (Continued)
[2013-02-18] The
sound of the torsion parts of homotopy groups of spheres
[2013-02-12] Monadic Units
[2013-02-07] Distribution
of Collatz Lengths (continued)
[2013-02-03] Distribution
of Collatz Lengths
[2013-01-31] Quotients
of polynomial algebras
[2013-01-12] Path ideals
[2013-01-10] McCarthy91
Terminates
[2013-01-09] Finding
all paths in a directed graph
[2013-01-04] A
Simple Monte-Carlo Integration Implementation in Lisp
[2012-12-30] A
simple problem in Kolmogorov-Chaitin complexity
[2012-12-29] From walks to
paths
[2012-12-16] Higher
order functions, functors and monads
[2012-12-13] Eccentricity,
Radius and Diameter in an Undirected Graph
[2012-11-29] Untitled
[2012-11-25] Strictly
Increasing Labels of Directed Graphs
[2012-11-19] Strictly
Increasing Labellings of Directed Graphs
[2012-11-17] Nilpotent
elements in an artinian algebra
[2012-11-04] Local
rings, idempotents and non-invertible elements
[2012-10-18] An
implementation of the fixed-radius near neighbor clustering algorithm in
lisp
[2012-10-15] Reducing directed
graphs
[2012-10-10] An
implementation of the k-means
clustering algorithm in lisp
[2012-10-08] A
comparison of different map functions in lisp
[2012-10-03] Source code
entropy
[2012-09-28] Collisions in
random walks
[2012-09-26] Transitive
closure of a directed graph
[2012-09-26] Solving
linear equations in ℕ
[2012-09-26] Listing
partitions
[2012-09-26] Inverting
formal power series
[2012-09-26] Hasse
subgraph of a directed graph