Cross Joins are rarely used, eh?
In the TSQL
MOC for SQL 2000, Microsoft states that “While cross joins are rarely used on a
normalized database, you can use them to generate test data for a database or
lists of all possible combinations for checklists or business templates.”
Although I’d dispute the “rarely”, I’ve never really thought much about
checklists or business templates part until yesterday when I was working on a
report for a client. They wanted to find aggregations based on ranges, but
wanted to display the domain of ranges regardless of whether values existed or
not.
I thought of
a couple of ways to do it, but decided why not try an old trick that I haven’t
used in awhile. The Cross Join portion of the following is an adaptation of a
piece of code that I stole from Itzik
Ben-Gan without permission, and have been using for years, mostly to create
time dimensions for data marts. (Sorry Mr. Ben-Gan, I didn’t dig for the article
in my stack of SQL Mags dating back to 1998, sitting in a closet floor at home,
and was too lazy to look it up on SQLmag.com).
Run the
following script and then review the results. Notice that there are gaps in the
reporting ranges. The code checks for 10 “bands”, yet only four show in the
result set.
SET NOCOUNT
ON
DROP TABLE
#loans
CREATE TABLE
#loans
(LoanId int
identity (1,1) NOT NULL
,CustomerId int
,LoanPrincipal money NOT NULL DEFAULT
0
,LoanDate smalldatetime NOT
NULL DEFAULT
current_timestamp
,LoanOffice char(1) DEFAULT 'D'
,Constraint
pk_loans PRIMARY KEY (LoanId, CustomerId)
)
--***Insert data
INSERT INTO #loans
VALUES (1,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (2,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (4,150,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (4,200,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (2,200,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (4,200,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (3,500,DEFAULT,DEFAULT)
--***Has 10 or more
loans
INSERT INTO #loans
VALUES (10,500,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,400,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,500,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,600,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
INSERT INTO #loans
VALUES (10,100,DEFAULT,DEFAULT)
--***Gives you the correct
count but doesn't fill in the missing --ranges
SELECT "PaydayLoanFrequency"= CASE WHEN CNT =1 THEN '1
times'
WHEN CNT =2 THEN '2 times'
WHEN CNT =3 THEN '3 times'
WHEN CNT =4 THEN '4 times'
WHEN CNT =5 THEN '5 times'
WHEN CNT =6 THEN '6 times'
WHEN CNT =7 THEN '7 times'
WHEN CNT =8 THEN '8 times'
WHEN CNT =9 THEN '9 times'
WHEN CNT >=10
THEN '10 times or
more'
END,
COUNT(ID) AS cnt
FROM (
SELECT COUNT(d.LoanId) as cnt
,CustomerId as
ID
FROM #Loans d with (nolock)
GROUP BY
customerid
) X
GROUP BY
cnt
That’s
result set might be ok sometimes, but not good for the display result that they
were looking for. You could do it in they reporting tool of your choice, but I
chose to do it in TSQL.
--*** A cartesian product is
just the trick
--***
Notice how I use f.id
SELECT F.id
"Payday Loan Frequency"
,ISNULL(z.cnt,0) "Total Number Of Borrowers"
FROM (
SELECT "PaydayLoanFrequency"= CASE
WHEN CNT =1 THEN '1
time'
WHEN CNT =2 THEN '2 times'
WHEN CNT =3 THEN '3 times'
WHEN CNT =4 THEN '4 times'
WHEN CNT =5 THEN '5 times'
WHEN CNT =6 THEN '6 times'
WHEN CNT =7 THEN '7 times'
WHEN CNT =8 THEN '8 times'
WHEN CNT =9 THEN '9 times'
WHEN CNT >=10
THEN '10 times or
more'
END,
COUNT(ID) AS cnt
FROM (
SELECT COUNT(d.LoanId) as
cnt
,CustomerId as
ID
FROM #Loans d with
(nolock)
GROUP BY
customerid
)
X
GROUP BY
cnt
) z
--*** Here is my product
RIGHT OUTER JOIN
(
SELECT CASE
WHEN CAST((a1.id + a0.id) as varchar (4))
+ ' times' = '10 times' then '10 times or
more'
WHEN CAST((a1.id + a0.id) as varchar (4))
+ ' times' = '1 times' then '1 time'
ELSE CAST((a1.id + a0.id) as varchar (4))
+ ' times' end
"id"
FROM
(SELECT 0 id UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7 UNION ALL
SELECT 8 UNION ALL
SELECT 9
)
a0
CROSS JOIN
(SELECT 0 id UNION ALL
SELECT 10 UNION
ALL
SELECT 20 UNION
ALL
SELECT 30 UNION
ALL
SELECT 40 UNION
ALL
SELECT 50
)
a1
) f
on z.paydayloanfrequency = f.id
WHERE CAST(LEFT(id,2) AS INT) BETWEEN 1 and 10
ORDER BY CAST(LEFT(id,2) AS INT)
The
table aliased “f” is the table to obtain my discrete values. The f.ID
attribute, along with the RIGHT OUTER JOIN, guarantees me that I’ll get all of
my values regardless of whether the data exists. From here, I created three
tables in Microsoft Reporting Services, and set them side-by-side, each with
different ranges of values for the report that I was looking for. (RIGHT JOIN can be substituted for LEFT OUTER JOIN, of course).
Lee
-----------------------
Interested in a bridge I have for sale in Arizona???