Dinamically grouping in T-SQL

Extracting, manipulating and presenting data from a SQL database could be a complicated operation, expecially if we talk about business intelligence, reports and graphics for the final users.

If we couldn’t use a BI tool and we need to write extraction queries manually, we could have some problem to write complex queries. Writing a query that dinamically group a set of data based on a user selection is no very easy.

Suppose that the application have an interface with a multivalue field where the user can choose a max of three grouping from a list of ten, after that the filters will be stored in a dedicated table. Then the extraction query will be executed, will read the groupings from the table and will have to apply the grouping dinamically to the dataset.

The columns of the result dataset should are:

  • Grouping 1
  • Grouping 2 (optional)
  • Grouping 3 (optional)
  • Totals

In order to achieve this result I’ll use a SQL Function and table variables.

Function

The function will returns a table variable like this:


CREATE FUNCTION [dbo].[report_summary] ()
RETURNS @Aggregate TABLE (
FirstGrouping NVARCHAR(50) null
,SecondGrouping NVARCHAR(50) null
,ThirdGrouping NVARCHAR(50) null
,Total int
)
AS
BEGIN
.....
END

The first step is retrieving the groupings from the filters table:


DECLARE @Groupings TABLE (
Grouping NVARCHAR(50)
)

INSERT INTO @Groupings
SELECT Grouping, ROW_NUMBER() OVER(ORDER BY DefaultOrder) AS RowNumber from dbo.Groupings
WHERE ExtractionId = @ExtractionId
ORDER BY DefaultOrder

So I’ve declared a table variable filled with a list of groupings retrieved from the Groupings table. The table has an identity column DefaultOrder as well that is useful to assign an order to the groupings and a RowNumber with OVER clause that assign the first, second and third position.

Based on a predefinded list of possible groupings, I set a bunch of variables that will be useful later:

DECLARE @BirthCityGrouping NVARCHAR(30) = 'Birth city';
DECLARE @AgeGrouping NVARCHAR(30) = 'Age';
DECLARE @ZipCodeGrouping NVARCHAR(30) = 'Zip code';
DECLARE @CountryGrouping NVARCHAR(30) = 'Country';
DECLARE @SubscriptionYearGrouping NVARCHAR(30) = 'Subscription year';
.....

DECLARE @IsBirthCityGrouping smallint set @IsBirthCityGrouping = (select count(Grouping) from @Groupings where Grouping = @BirthCityGrouping);
DECLARE @IsAgeGrouping smallint set @IsAgeGrouping = (select count(Grouping) from @Groupings where Grouping = @AgeGrouping);
DECLARE @IsZipCodeGrouping smallint set @IsZipCodeGrouping = (select count(Grouping) from @Groupings where Grouping = @ZipCodeGrouping);
.....
DECLARE @FirstGrouping nvarchar(50) = (select Grouping from @Groupings where RowNumber = 1);
DECLARE @SecondGrouping nvarchar(50) = (select Grouping from @Groupings where RowNumber = 2);
DECLARE @ThirdGrouping nvarchar(50) = (select Grouping from @Groupings where RowNumber = 3);

I can use the first set of variables to build an intermediate table variable with the values to be grouped:


DECLARE @SelectedRecords TABLE (
Id UNIQUEIDENTIFIER
,BirthCity NVARCHAR(50)
,Age smallint
,ZipCode NVARCHAR(5)
,Country NVARCHAR(50)
,SubscriptionYear SMALLINT
)

INSERT INTO @SelectedRecords select Id
,CASE WHEN (@IsBirthCityGrouping > 0) THEN BirthCity ELSE 'ALL' END
,CASE WHEN (@IsAgeGrouping > 0) THEN Age ELSE 'ALL' END
,CASE WHEN (@IsZipCodeGrouping > 0) THEN ZipCode ELSE 'ALL' END
.....
from dbo.MyTable

With the variables declarated above I can check if the grouping have to be done or not; in the first case I select the effective value, otherwise I set a default value ‘ALL’ for all the rows, that will be not considered in the group by clause.

Now I’m ready to populate the Aggregate table:


INSERT INTO @Aggregate
SELECT
(CASE
WHEN @FirstGrouping = @BirthCityGrouping THEN BirthCity
WHEN @FirstGrouping = @AgeGrouping THEN Age
WHEN @FirstGrouping = @ZipCodeGrouping THEN ZipCode
.....
ELSE '' END) as FirstGrouping
,(CASE
WHEN @SecondGrouping = @BirthCityGrouping THEN BirthCity 
WHEN @SecondGrouping = @AgeGrouping THEN Age 
WHEN @SecondGrouping = @ZipCodeGrouping THEN ZipCode .....
ELSE '' END) as SecondGrouping 
,(CASE
WHEN @ThirdGrouping = @BirthCityGrouping THEN BirthCity 
WHEN @ThirdGrouping = @AgeGrouping THEN Age 
WHEN @ThirdGrouping = @ZipCodeGrouping THEN ZipCode .....
ELSE '' END) as ThirdGrouping 
,count(Id)
FROM @SelectedRecords
GROUP BY
BirthCity,Age,ZipCode,Country,SubscriptionYear

In the first three colums I need to determine the groupings and show the correct values and in the last one I calculate the total. The group by clause will group for the enhanced columns and will ignore those that they have the ALL value. In this way I will be able to generate a table variable with three dynamic grouping.

Summary

The problem of dynamically grouping sql records can be solved with table variables and replacing some default values instead of the originals where we do not need to do groupings.

We can manage a default grouping order with an identity column in a table that guarantee the order which the rows are created.

Once the intermediate table variables are ready, we can count the subtotals by leveraging the default values defined, that not influence the count.

 

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Create a website or blog at WordPress.com

Up ↑

%d bloggers like this: