TSQL Proximity Puzzle
Itzik provides a TSQL challenge involving matching rows based on proximity of values.
June 29, 2010
I got this puzzle from my good friend Herbert Albert. Herbert addressed a more detailed version of the problem with a customer of his; here I’ll present a simplified form of the problem.
Consider the following tables and sample data:
set nocount on;
use tempdb;
if object_id('dbo.t1', 'u') is not null drop table dbo.t1;
if object_id('dbo.t2', 'u') is not null drop table dbo.t2;
go
create table dbo.t1
(
col1 int not null primary key,
col2 varchar(1) not null
);
create table dbo.t2
(
col1 int not null primary key,
col2 varchar(1) not null
);
-- small sample data to check validity of solution
insert into dbo.t1(col1, col2) values
(10, 'a'),(20, 'b'),(30, 'c'),(40, 'd'),(50, 'e');
insert into dbo.t2(col1, col2) values
(9, 'z'),(12, 'y'),(20, 'w'),(35, 'v'),(47, 'u'),(51, 't'),(1759, 's');
go
The challenge is to write a query that matches to each row from t1 the row in t2 with the “closest” value in t2.col1 to the one in t1.col1. In “closest” I mean the smallest absolute difference. In case of ties, simply the smallest t2.col1 value wins. For the given sample data, the desired result is:
t1_col1 t1_col2 t2_col1 t2_col2
----------- ------- ----------- -------
10 a 9 z
20 b 20 w
30 c 35 v
40 d 35 v
50 e 51 t
Use the given sample data and desired result to verify the validity of the solution. But to check the performance of the solutions, I will use the following sample data:
-- Virtual Auxiliary Table of Numbers
IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;
GO
CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE
AS
RETURN
WITH
L0 AS(SELECT 1 AS c UNION ALL SELECT 1),
L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),
L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),
L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),
L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),
L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),
Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)
SELECT TOP (@n) n FROM Nums ORDER BY n;
GO
-- Populate Tables
truncate table dbo.t1;
truncate table dbo.t2;
insert into dbo.t1 with (tablock) (col1, col2)
select
n * 10000 + abs(checksum(newid())) % 10000,
char(ascii('a') + abs(checksum(newid())) % (ascii('z') - ascii('a') + 1))
from dbo.getnums(10000);
insert into dbo.t2 with (tablock) (col1, col2)
select
n * 100 + abs(checksum(newid())) % 100,
char(ascii('a') + abs(checksum(newid())) % (ascii('z') - ascii('a') + 1))
from dbo.getnums(1000000);
I’ll cover the solutions in a post next week.
Good luck!
BG
About the Author
You May Also Like