Functions (Analytic)

Introduction#

You use analytic functions to determine values based on groups of values. For example, you can use this type of function to determine running totals, percentages, or the top result within a group.

Syntax#

FIRST_VALUE ( scalar_expression )

OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] ) 2. LAST_VALUE ( scalar_expression ) OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] ) 3. LAG (scalar_expression [,offset] [,default]) OVER ( [ partition_by_clause ] order_by_clause ) 4. LEAD ( scalar_expression [ ,offset ] , [ default ] ) OVER ( [ partition_by_clause ] order_by_clause ) 5. PERCENT_RANK( ) OVER ( [ partition_by_clause ] order_by_clause ) 6. CUME_DIST( ) OVER ( [ partition_by_clause ] order_by_clause ) 7. PERCENTILE_DISC ( numeric_literal ) WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] ) OVER ( [ <partition_by_clause> ] ) 8. PERCENTILE_CONT ( numeric_literal ) WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] ) OVER ( [ <partition_by_clause> ] )

FIRST_VALUE

You use the FIRST_VALUE function to determine the first value in an ordered result set, which you identify using a scalar expression.

SELECT StateProvinceID, Name, TaxRate,
       FIRST_VALUE(StateProvinceID)
        OVER(ORDER BY TaxRate ASC) AS FirstValue
FROM SalesTaxRate;

In this example, the FIRST_VALUE function is used to return the ID of the state or province with the lowest tax rate. The OVER clause is used to order the tax rates to obtain the lowest rate.

StateProvinceID	Name	TaxRate	FirstValue
74	Utah State Sales Tax	5.00	74
36	Minnesota State Sales Tax	6.75	74
30	Massachusetts State Sales Tax	7.00	74
1	Canadian GST	7.00	74
57	Canadian GST	7.00	74
63	Canadian GST	7.00	74

LAST_VALUE

The LAST_VALUE function provides the last value in an ordered result set, which you specify using a scalar expression.

SELECT TerritoryID, StartDate, BusinessentityID,
       LAST_VALUE(BusinessentityID) 
        OVER(ORDER BY TerritoryID) AS LastValue
FROM SalesTerritoryHistory;

This example uses the LAST_VALUE function to return the last value for each rowset in the ordered values.

TerritoryID	StartDate	BusinessentityID	LastValue
1	2005-07-01 00.00.00.000	280	283
1	2006-11-01 00.00.00.000	284	283
1	2005-07-01 00.00.00.000	283	283
2	2007-01-01 00.00.00.000	277	275
2	2005-07-01 00.00.00.000	275	275
3	2007-01-01 00.00.00.000	275	277

LAG and LEAD

The LAG function provides data on rows before the current row in the same result set. For example, in a SELECT statement, you can compare values in the current row with values in a previous row.

You use a scalar expression to specify the values that should be compared. The offset parameter is the number of rows before the current row that will be used in the comparison. If you don’t specify the number of rows, the default value of one row is used.

The default parameter specifies the value that should be returned when the expression at offset has a NULL value. If you don’t specify a value, a value of NULL is returned.

The LEAD function provides data on rows after the current row in the row set. For example, in a SELECT statement, you can compare values in the current row with values in the following row.

You specify the values that should be compared using a scalar expression. The offset parameter is the number of rows after the current row that will be used in the comparison.

You specify the value that should be returned when the expression at offset has a NULL value using the default parameter. If you don’t specify these parameters, the default of one row is used and a value of NULL is returned.

SELECT BusinessEntityID, SalesYTD,
       LEAD(SalesYTD, 1, 0) OVER(ORDER BY BusinessEntityID) AS "Lead value",
       LAG(SalesYTD, 1, 0) OVER(ORDER BY BusinessEntityID) AS "Lag value"
FROM SalesPerson;

This example uses the LEAD and LAG functions to compare the sales values for each employee to date with those of the employees listed above and below, with records ordered based on the BusinessEntityID column.

BusinessEntityID	SalesYTD	Lead value	Lag value
274	559697.5639	3763178.1787	0.0000
275	3763178.1787	4251368.5497	559697.5639
276	4251368.5497	3189418.3662	3763178.1787
277	3189418.3662	1453719.4653	4251368.5497
278	1453719.4653	2315185.6110	3189418.3662
279	2315185.6110	1352577.1325	1453719.4653

PERCENT_RANK and CUME_DIST

The PERCENT_RANK function calculates the ranking of a row relative to the row set. The percentage is based on the number of rows in the group that have a lower value than the current row.

The first value in the result set always has a percent rank of zero. The value for the highest-ranked – or last – value in the set is always one.

The CUME_DIST function calculates the relative position of a specified value in a group of values, by determining the percentage of values less than or equal to that value. This is called the cumulative distribution.

SELECT BusinessEntityID, JobTitle, SickLeaveHours,
PERCENT_RANK() OVER(PARTITION BY JobTitle ORDER BY SickLeaveHours DESC)
       AS "Percent Rank",
CUME_DIST() OVER(PARTITION BY JobTitle ORDER BY SickLeaveHours DESC)
       AS "Cumulative Distribution"
FROM Employee;

In this example, you use an ORDER clause to partition – or group – the rows retrieved by the SELECT statement based on employees’ job titles, with the results in each group sorted based on the numbers of sick leave hours that employees have used.

BusinessEntityID	JobTitle	SickLeaveHours	Percent Rank	Cumulative Distribution
267	Application Specialist	57	0	0.25
268	Application Specialist	56	0.333333333333333	0.75
269	Application Specialist	56	0.333333333333333	0.75
272	Application Specialist	55	1	1
262	Assitant to the Cheif Financial Officer	48	0	1
239	Benefits Specialist	45	0	1
252	Buyer	50	0	0.111111111111111
251	Buyer	49	0.125	0.333333333333333
256	Buyer	49	0.125	0.333333333333333
253	Buyer	48	0.375	0.555555555555555
254	Buyer	48	0.375	0.555555555555555

The PERCENT_RANK function ranks the entries within each group. For each entry, it returns the percentage of entries in the same group that have lower values.

The CUME_DIST function is similar, except that it returns the percentage of values less than or equal to the current value.

PERCENTILE_DISC and PERCENTILE_CONT

The PERCENTILE_DISC function lists the value of the first entry where the cumulative distribution is higher than the percentile that you provide using the numeric_literal parameter.

The values are grouped by rowset or partition, as specified by the WITHIN GROUP clause.

The PERCENTILE_CONT function is similar to the PERCENTILE_DISC function, but returns the average of the sum of the first matching entry and the next entry.

SELECT BusinessEntityID, JobTitle, SickLeaveHours,
       CUME_DIST() OVER(PARTITION BY JobTitle ORDER BY SickLeaveHours ASC)
       AS "Cumulative Distribution",
       PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY SickLeaveHours)
          OVER(PARTITION BY JobTitle) AS "Percentile Discreet"
FROM Employee;

To find the exact value from the row that matches or exceeds the 0.5 percentile, you pass the percentile as the numeric literal in the PERCENTILE_DISC function. The Percentile Discreet column in a result set lists the value of the row at which the cumulative distribution is higher than the specified percentile.

BusinessEntityID	JobTitle	SickLeaveHours	Cumulative Distribution	Percentile Discreet
272	Application Specialist	55	0.25	56
268	Application Specialist	56	0.75	56
269	Application Specialist	56	0.75	56
267	Application Specialist	57	1	56

To base the calculation on a set of values, you use the PERCENTILE_CONT function. The “Percentile Continuous” column in the results lists the average value of the sum of the result value and the next highest matching value.

SELECT BusinessEntityID, JobTitle, SickLeaveHours,
       CUME_DIST() OVER(PARTITION BY JobTitle ORDER BY SickLeaveHours ASC)
       AS "Cumulative Distribution",
       PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY SickLeaveHours) 
          OVER(PARTITION BY JobTitle) AS "Percentile Discreet",
       PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY SickLeaveHours) 
          OVER(PARTITION BY JobTitle) AS "Percentile Continuous"
FROM Employee;

BusinessEntityID	JobTitle	SickLeaveHours	Cumulative Distribution	Percentile Discreet	Percentile Continuous
272	Application Specialist	55	0.25	56	56
268	Application Specialist	56	0.75	56	56
269	Application Specialist	56	0.75	56	56
267	Application Specialist	57	1	56	56