Window functions
Window function
Background
The window function is a special class of built-in functions. Similar to the aggregation function, it also does calculations on multiple input rows to get a single data value. The difference is that the window function processes the input data within a specific window, rather than using the "group by" method. The data in each window can be sorted and grouped using the over() clause. The window function computes a separate value for each row, rather than computing one value for each group. This flexibility allows users to add additional columns to the select clause and further filter the result set. The window function can only appear in the select list and the outermost position of a clause. It takes effect at the end of the query, that is, after the join
, where
, and group by
operations are performed. The window function is often used to analyze trends, calculate outliers, and perform bucketing analyses on large-scale data.
Usage
Syntax of the window function:
function(args) OVER([partition_by_clause] [order_by_clause] [order_by_clause window_clause])
partition_by_clause ::= PARTITION BY expr [, expr ...]
order_by_clause ::= ORDER BY expr [ASC | DESC] [, expr [ASC | DESC] ...]
Functions
Currently supported functions include:
- MIN(), MAX(), COUNT(), SUM(), AVG()
- FIRST_VALUE(), LAST_VALUE(), LEAD(), LAG()
- ROW_NUMBER(), RANK(), DENSE_RANK()
- CUME_DIST(), PERCENT_RANK(), QUALIFY()
- NTILE()
- VARIANCE(), VAR_SAMP(), STD(), STDDEV_SAMP(), COVAR_SAMP(), COVAR_POP(), CORR()
PARTITION BY clause
The Partition By clause is similar to Group By. It groups the input rows by one or more specified columns. Rows with the same value are grouped together.
ORDER BY clause
The Order By
clause is basically the same as the outer Order By
. It defines the order of the input rows. If Partition By
is specified, Order By
defines the order within each Partition grouping. The only difference is that Order By n
(n is a positive integer) in the OVER
clause is equivalent to no operation, whereas n
in the outer Order By
indicates sorting by the nth column.
Example:
This example shows adding an id column to the select list with values of 1, 2, 3, etc., sorted by the date_and_time
column in the events table.
SELECT row_number() OVER (ORDER BY date_and_time) AS id,
c1, c2, c3, c4
FROM events;
Window clause
The window clause is used to specify a range of rows for operations (the preceding and following rows based on the current row). It supports the following syntaxes: AVG(), COUNT(), FIRST_VALUE(), LAST_VALUE(), and SUM(). For MAX() and MIN(), the window clause can specify the start to UNBOUNDED PRECEDING
.
Syntax:
ROWS BETWEEN [ { m | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | n } FOLLOWING] ]
Example:
Suppose we have the following stock data, the stock symbol is JDR, and the closing price is the daily closing price.
create table stock_ticker (
stock_symbol string,
closing_price decimal(8,2),
closing_date timestamp);
-- ...load some data...
select *
from stock_ticker
order by stock_symbol, closing_date
The raw data was shown as follows:
+--------------+---------------+---------------------+
| stock_symbol | closing_price | closing_date |
+--------------+---------------+---------------------+
| JDR | 12.86 | 2014-10-02 00:00:00 |
| JDR | 12.89 | 2014-10-03 00:00:00 |
| JDR | 12.94 | 2014-10-04 00:00:00 |
| JDR | 12.55 | 2014-10-05 00:00:00 |
| JDR | 14.03 | 2014-10-06 00:00:00 |
| JDR | 14.75 | 2014-10-07 00:00:00 |
| JDR | 13.98 | 2014-10-08 00:00:00 |
+--------------+---------------+---------------------+
This query uses the window function to generate the moving_average column whose value is the 3-day (previous day, current day, and next day) average stock price. The first day does not have the value of its previous day, and the last day does not have the value of the day after, so these two rows only calculate the average value of two days. Here Partition By
does not take effect, because all the data is JDR data. However, if there is other stock information, Partition By
will ensure that the window function is operated within each Partition.
select stock_symbol, closing_date, closing_price,
avg(closing_price)
over (partition by stock_symbol
order by closing_date
rows between 1 preceding and 1 following
) as moving_average
from stock_ticker;