TSQL Proximity Puzzle

Itzik provides a TSQL challenge involving matching rows based on proximity of values.

ITPro Today

June 29, 2010

8 Min Read
ITPro Today logo in a gray background | ITPro Today

I got this puzzle from my good friend Herbert Albert. Herbert addressed a more detailed version of the problem with a customer of his; here I’ll present a simplified form of the problem.

Consider the following tables and sample data:

set nocount on;

use tempdb;

 

if object_id('dbo.t1', 'u') is not null drop table dbo.t1;

if object_id('dbo.t2', 'u') is not null drop table dbo.t2;

go

 

create table dbo.t1

(

  col1 int        not null primary key,

  col2 varchar(1) not null

);

 

create table dbo.t2

(

  col1 int not null primary key,

  col2 varchar(1) not null

);

 

-- small sample data to check validity of solution

insert into dbo.t1(col1, col2) values

  (10, 'a'),(20, 'b'),(30, 'c'),(40, 'd'),(50, 'e');

insert into dbo.t2(col1, col2) values

  (9, 'z'),(12, 'y'),(20, 'w'),(35, 'v'),(47, 'u'),(51, 't'),(1759, 's'); 

go

 

The challenge is to write a query that matches to each row from t1 the row in t2 with the “closest” value in t2.col1 to the one in t1.col1. In “closest” I mean the smallest absolute difference. In case of ties, simply the smallest t2.col1 value wins. For the given sample data, the desired result is:

t1_col1     t1_col2 t2_col1     t2_col2

----------- ------- ----------- -------

10          a       9           z

20          b       20          w

30          c       35          v

40          d       35          v

50          e       51          t

 

Use the given sample data and desired result to verify the validity of the solution. But to check the performance of the solutions, I will use the following sample data:

-- Virtual Auxiliary Table of Numbers

IF OBJECT_ID('dbo.GetNums') IS NOT NULL DROP FUNCTION dbo.GetNums;

GO

CREATE FUNCTION dbo.GetNums(@n AS BIGINT) RETURNS TABLE

AS

RETURN

  WITH

  L0   AS(SELECT 1 AS c UNION ALL SELECT 1),

  L1   AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B),

  L2   AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B),

  L3   AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B),

  L4   AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B),

  L5   AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B),

  Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS n FROM L5)

  SELECT TOP (@n) n FROM Nums ORDER BY n;

GO

 

-- Populate Tables

truncate table dbo.t1;

truncate table dbo.t2;

 

insert into dbo.t1 with (tablock) (col1, col2)

  select

    n * 10000 + abs(checksum(newid())) % 10000,

    char(ascii('a') + abs(checksum(newid())) % (ascii('z') - ascii('a') + 1))

  from dbo.getnums(10000);

 

insert into dbo.t2 with (tablock) (col1, col2)

  select

    n * 100 + abs(checksum(newid())) % 100,

    char(ascii('a') + abs(checksum(newid())) % (ascii('z') - ascii('a') + 1))

  from dbo.getnums(1000000);

I’ll cover the solutions in a post next week.

Good luck!

BG

 

Sign up for the ITPro Today newsletter
Stay on top of the IT universe with commentary, news analysis, how-to's, and tips delivered to your inbox daily.

You May Also Like