Puzzle Me This: String Replacement

If you love numbers, logic and puzzles, this one is for you.

Itzik Ben-Gan

April 9, 2015

12 Min Read
Puzzle Me This: String Replacement

My dad, Gabriel Ben-Gan, passed away recently. He loved numbers, logic and puzzles, and used to solve problems in his own unique way. This article is about a puzzle that incorporates the above ingredients. Dad, this one’s for you, and is in your memory.

I’d like to thank David Shipley, who originally sent me this beautiful puzzle. The problem is pretty simple, but the solution requires some creativity.

As part of the solution to the puzzle I use an auxiliary table called dbo.Nums, which contains a sequence of integers starting with 1 in a column called n. You can find this auxiliary table in the sample database TSQLV3. You can download the source code to create this sample database here.

The Puzzle

The puzzle is about coping with typographical errors (typos) when a keyword is entered for search purposes.

You get as inputs three strings:

1. A keyword that the user entered for search (@keyword)

2. A source word (@src)

3. A synonym for the source word (@synonym)

Here’s an example for possible input values:

DECLARE

@keyword AS NVARCHAR(1000) = N'wwAMxxAMyyAMzz',

@src AS NVARCHAR(1000) = N'AM',

@synonym AS NVARCHAR(1000) = N'AN';

Your task is to generate all possible permutations of the input @keyword where each occurrence of @src is either replaced with @synonym or not. For example, for the above inputs the desired output is:

wwAMxxAMyyAMzz

wwANxxAMyyAMzz

wwAMxxANyyAMzz

wwANxxANyyAMzz

wwAMxxAMyyANzz

wwANxxAMyyANzz

wwAMxxANyyANzz

wwANxxANyyANzz

To work…

Step 1: Generate permutations

The solution to this puzzle can be broken into five main steps.

The first step is to return a sequence of integers in the range 0 through num_permutations - 1, where num_permutations represents the number of permutations of @keyword that need to be created. The num_permutations value can be computed as 2^num_ocurrences, where num_occurrences is the number of occurrences of @src in @keyword. With our sample input values, @src is AM and @keyword is wwAMxxAMyyAMzz, so num_ocurrences = 3, and therefore num_permutations = 8.

If you’re wondering, “What’s the logic behind computing the number of permutations?” think of the template for the permutations as having placeholders for all occurrences of @src in @keyword: ww?xx?yy?zz. Each placeholder is like a bit in an integer bitmap. When the bit is not set, you replace the placeholder with @src; when it’s set you replace it with @synonym. With three bits, the possible integers fall into the range 0 through 7 (2^3 - 1). The bitmaps of the integers in this range represent the permutations of the aforementioned template that you need to create.

The following expression computes num_ocurrences:

(LEN(@keyword) - LEN(REPLACE(@keyword, @src, N''))) / LEN(@src)

The following query generates the integers whose bitmaps represent the permutations that you need to create:

SELECT n-1 AS permutation
FROM TSQLV3.dbo.Nums
WHERE n <= POWER(2, (LEN(@keyword) - LEN(REPLACE(@keyword, @src, N''))) / LEN(@src));

Table 1 shows the output of this query with the bitmap representation of the integers.

Table 1: Permutations
permutation binperm
------------- --------
0 000
1 001
2 010
3 011
4 100
5 101
6 110
7 111

 

Step 2: Compute injection positions

Call the result of the first step P. The second step creates a row for every combination of permutation in P and number n in the range 1 through num_ocurrences. Each occurrence of a @src element in @keyword may be preceded by some non-@src element and may be followed by some non-@src element. So, for every n in the range 1 through num_ocurrences we can think of the positions of the @src elements among the @src and non-@src elements as being 2 * n. For example, in our input @keyword we have the following elements:

1 - ww

2 - AM

3 - xx

4 - AM

5 - yy

6 - AM

7 - zz

As you can see, the three occurrences of @src are in element positions 2, 4 and 6. If a non-@src element doesn’t exist before or after a @src element, you can think of it as existing but being an empty string.

As mentioned, the second step of our solution generates a row for every permutation in P and occurrence n of @src in @keyword. You compute the element position of the replacement element (call it pos) as 2 * n. Then depending on the state of the nth bit in permutation, the replacement element is either @src (bit isn’t set) or @synonym (bit is set). The code in Listing 2 implements the second step of the solution.

[Listing 1: Step 2 - Compute injection positions]

WITH P AS

(

SELECT n-1 AS permutation

FROM TSQLV3.dbo.Nums

WHERE n <= POWER(2, (LEN(@keyword) - LEN(REPLACE(@keyword, @src, N''))) / LEN(@src))

)

SELECT *

FROM P

CROSS APPLY

(

SELECT N.n, N.n * 2 AS pos, -- * 2 to account for a preceeding element

CASE permutation & POWER(2, N.n - 1)

WHEN 0 THEN @src

ELSE @synonym

END AS element

FROM TSQLV3.dbo.Nums AS N

WHERE N.n <= (LEN(@keyword) - LEN(REPLACE(@keyword, @src, N''))) / LEN(@src)

) AS A

ORDER BY permutation, pos, element;

The code in Listing 1 defines a CTE called P based on the query that implements the first step in the solution. The outer query against P uses the CROSS APPLY operator to create for every permutation and occurrence of @src in @keyword the replacement elements. The APPLY operator applies a query against Nums (call it N). As mentioned, for occurrence N.n, the target element position (pos) is N.n * 2. To know whether to use @src or @sysnonym as the target element, you need to check whether the nth bit is set. You do so with the following expression:

permutation & POWER(2, N.n - 1)

You then use a CASE expression to return either @src or @synonym based on the result of this expression.

The output of the code in Listing 2 with our sample input values is shown in Table 2 in abbreviate form.

[Table 2: Injection positions]

Table 2: Injection positionspermutation binperm n   pos         element----------- ------- --- ----------- -------0           000     1   2           AM0           000     2   4           AM0           000     3   6           AM1           001     1   2           AN1           001     2   4           AM1           001     3   6           AM2           010     1   2           AM2           010     2   4           AN2           010     3   6           AM

Again, I added a column called binperm that shows the bitmap representation of the permutation value.

Step 3: Split original elements

The third step creates the set of elements that precede or follow the @src elements in @keyword. It also computes corresponding element positions, leaving room for the replacement elements to be injected. To achieve this you use classic string-splitting logic based on an auxiliary table of numbers (you can find coverage of such string splitting solutions here and here). In this case, you split the string stored in @keyword, using @src as the separator. The following code implements the third step in the solution:

SELECT (ROW_NUMBER() OVER(ORDER BY n) - 1) * 2 + 1 AS pos,

SUBSTRING(@keyword, n, CHARINDEX(@src, @keyword + @src, n) - n) AS element

FROM TSQLV3.dbo.Nums
WHERE n <= LEN(@keyword) + 1

AND SUBSTRING(@src + @keyword, n, LEN(@src)) = @src;

This code, applied to our sample input values, generates the output shown in Table 3.

[Table 3: Split string]

Table 3: Split stringpos                  element-------------------- --------1                    ww3                    xx5                    yy7                    zz

Step 4: Unifies original elements and elements to inject

The fourth step simply combines the result of step 2 (replacement elements to inject) with the result of step 3 (original elements). The code in Listing 4 implements this step.

[Listing 2: Step 4 - Combine steps 2 and 3]

WITH P AS -- Permutations

(

SELECT n-1 AS permutation

FROM TSQLV3.dbo.Nums

WHERE n < =

POWER(2, (LEN(@keyword) - LEN(REPLACE(@keyword, @src, N'')))

/ LEN(@src))

),

E AS -- Orignal split elements minus @src occurrences

(

SELECT (ROW_NUMBER() OVER(ORDER BY n) - 1) * 2 + 1 AS pos,

SUBSTRING(@keyword, n,

CHARINDEX(@src, @keyword + @src, n) - n) AS element

FROM TSQLV3.dbo.Nums

WHERE n <= LEN(@keyword) + 1

AND SUBSTRING(@src + @keyword, n, LEN(@src)) = @src

)

SELECT *

FROM P

CROSS APPLY ( -- Elements to inject

SELECT N.n * 2 AS pos,

CASE permutation & POWER(2, N.n - 1)

WHEN 0 THEN @src

ELSE @synonym

END AS element

FROM TSQLV3.dbo.Nums AS N

WHERE N.n < =

(LEN(@keyword)

- LEN(REPLACE(@keyword, @src, N'')))

/ LEN(@src)  

UNION ALL

-- Original elements

SELECT pos, element FROM E) AS A

ORDER BY permutation, pos, element;

Like before, the CTE P represents the bitmaps for the permutations of @keyword that need to be created. The CTE E represents the original split elements (the query implementing step 3). The outer query uses a CROSS APPLY operator that applies logic to generate the elements to inject unified with the original elements from E. The output of the code in Listing 2, applied to our sample input values is shown in Table 4.

[Table 4: Unified original elements and elements to inject]

Table 4: Unified original elements and elements to injectpermutation pos                  element----------- -------------------- -------0           1                    ww0           2                    AM0           3                    xx0           4                    AM0           5                    yy0           6                    AM0           7                    zz1           1                    ww1           2                    AN1           3                    xx1           4                    AM1           5                    yy1           6                    AM1           7                    zz2           1                    ww2           2                    AM2           3                    xx2           4                    AN2           5                    yy2           6                    AM2           7                    zz

Step 5: Concatenate elements

The fifth and final step concatenates the elements of each permutation to one string based on pos ordering. I use the classic FOR XML PATH technique to concatenate the elements. Listing 3 has the complete solution for the puzzle.

[Listing 3: Complete solution]

WITH P AS -- Permutations

(

SELECT n-1 AS permutation

FROM TSQLV3.dbo.Nums

WHERE n <=

POWER(2,

(LEN(@keyword)

- LEN(REPLACE(@keyword, @src, N'')))

/ LEN(@src))

),

E AS -- Orignal split elements minus @src occurrences

(

SELECT (ROW_NUMBER() OVER(ORDER BY n) - 1) * 2 + 1 AS pos,

SUBSTRING(@keyword, n,

CHARINDEX(@src, @keyword + @src, n) - n) AS element

FROM TSQLV3.dbo.Nums

WHERE n <= LEN(@keyword) + 1

AND SUBSTRING(@src + @keyword, n, LEN(@src)) = @src

)

SELECT result FROM P

CROSS APPLY

(

SELECT element AS [text()]

FROM

(

SELECT N.n * 2 AS pos,

CASE permutation & POWER(2, N.n - 1)

WHEN 0 THEN @src

ELSE @synonym

END AS element

FROM TSQLV3.dbo.Nums AS N

WHERE N.n <=

(LEN(@keyword)

- LEN(REPLACE(@keyword, @src, N'')))

/ LEN(@src)

UNION ALL

SELECT pos, element

FROM E

) AS D

ORDER BY pos

FOR XML PATH('')

 

) AS A(result);

 

Table 5 shows the output of the solution for the sample input values.

[Table 5: Desired result]

Table 5: Desired resultwwAMxxAMyyAMzzwwANxxAMyyAMzzwwAMxxANyyAMzzwwANxxANyyAMzzwwAMxxAMyyANzzwwANxxAMyyANzzwwAMxxANyyANzzwwANxxANyyANzz

Conclusion

I find the string replacement puzzle that I presented in this article to be a beautiful puzzle. It requires you to apply your knowledge of some classic T-SQL techniques such as the ones used for string splitting and string concatenation, and also to incorporate some new ideas. It involves numbers, logic, bitwise manipulation and love. I can only hope to face many more puzzles like this in the future.

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like