T-SQL 101, Lesson 3
Most database administrators and SQL Server administrators know that they can use T-SQL's SELECT statement to retrieve the data stored in Microsoft SQL Server databases. What database administrators and SQL Server administrators might not realize is that they can also use SELECT statements to aggregate the data they retrieve. T-SQL's COUNT, MIN, MAX, AVG, and SUM functions make that possible.
April 28, 2008
In their simplest form, SELECT statements let you retrieve data stored in databases. However, you can use them for so much more, including aggregating data. Aggregating data simply means bringing data together and summarizing it. Aggregate functions available in T-SQL include COUNT, MIN, MAX, AVG, and SUM. There are other functions, but for now, I’ll show you how to use these five.
The Prerequisites
To help you follow the examples I present, I created a sample Employee table for you to use. I’ll assume you have a database to work with and the permissions needed to create and modify tables in it. To create the sample Employee table, follow these steps:
1. Download the CodeToCreateEmployeeTable.sql and CodeToPopulateEmployeeTable.sql files. Click the"Download the Code" hotlink at the top of the article.
2. Create the Employee table. Open SQL Server 2005’s SQL Server Management Studio (SSMS) or SQL Server 2000’s Query Analyzer and copy the code in CodeToCreateEmployeeTable.sql, which Listing 1 shows, in the query window. In the code at callout A in Listing 1, change MyDB to the name of your database. Execute the code.
3. Populate the Employee table. To do this, run CodeToPopulateEmployeeTable.sql. As Listing 2 shows, this code uses single-record INSERT statements to add the fictitious employee data. (See "T-SQL 101, Lesson 2," April 2008, for information about this type of INSERT statement.)
After you’ve created and populated the Employee table, take a minute to familiarize yourself with the table’s layout.
Counting Records
You can use the COUNT function in a SELECT statement to obtain the number of items in a group. You enclose the item you want to count in parentheses. The item can be just about any expression (i.e., column name, function, constant value, or any combination thereof), but usually the item is a single column in a table. You can specify an asterisk (*) if you want to count all the records in the table. For example, if you want to determine how many records are in the Employee table, you’d run the code
SELECT COUNT(*) AS 'Employees' FROM Employee
In this statement, the AS 'Employees' clause specifies that you want the results displayed under the column name of Employees. (See "T-SQL 101, Lesson 1," March 2008, for information about using the AS clause to display different column names in result sets.) The FROM Employee clause specifies the target table. As Figure 1 shows, the results show that the Employee table has 8 records.
Suppose that you only want to count the number of employees that make under $30,000 a year. You can add a WHERE clause that specifies the value in the Salary column must be less than $30,000:
SELECT COUNT(*) AS 'Impoverished' FROM Employee WHERE Salary < 30000
In this case, the result is 1, which is displayed under the column name of Impoverished.
When you specify *, the COUNT function counts all rows, even if columns within the rows contain NULL values (i.e., entries that have no explicitly assigned value). If you specify a column name, however, only non-NULL values are counted. When you use the DISTINCT keyword, the number of unique non-NULL values is determined. The expression specified must be a column name and not an arithmetic expression. For example, you can use the DISTINCT keyword to determine how many unique last names are in the LastName column in the Employee table:
SELECT COUNT(DISTINCT LastName) AS 'Last Names' FROM Employee
The result is 7 because two of the employees have the same last name (i.e., Smith), as callout A in Listing 2 shows.
Note that at the beginning of this section, I mentioned you can count just about any expression. I said just about because you can’t use expressions of type uniqueidentifier, text, ntext, or image. If you’re unfamiliar with these data types, see the Data Types (Transact-SQL) Web page at msdn2.microsoft.com/ en-us/library/ms187752.aspx.
Determining Minimum Values
When you use the MIN function in a SELECT statement, you can find out the minimum value for a specified column or arithmetic expression, which you enclose in parentheses. For example, to determine the lowest salary in the Employee table, you can use the query
SELECT MIN(Salary) AS 'Minimum Salary' FROM Employee
The result is $23,500. Although you can use the DISTINCT keyword with the MIN function, there’s no point. By the function’s very definition, there can be only one value.
Unlike the COUNT function, the MIN function always ignores NULL values. (Similarly, the MAX, AVG, and SUM functions always ignore NULL values.) Like the COUNT function, the MIN function is quite versatile in that you can include just about any expression to specify the item for which you want to find the minimum value. For example, you can use an expression that contains the DATEDIFF and GETDATE functions to determine the number of months that the most-recent hired employee has been with our fictitious company. First, you use DATEDIFF and GETDATE to calculate the number of months that have elapsed since each employee was hired. Then, you use MIN to determine the lowest number out of all the month values just calculated.
The DATEDIFF function takes three parameters. You use the first parameter to specify the time period being tracked. In this case, you need to specify m for months. You use the second and third parameters to specify the start and end dates, respectively. In this case, the start date is the value in the Employee table’s HireDate column and the end date is the current date, which you obtain with the GETDATE function. So, the query is
SELECT MIN(DateDiff (m, HireDate, GETDATE())) AS 'Number of Months' FROM Employee
If you run this query on, say, May 20, 2008, the result is 26 months.
To learn more about the MIN, DATEDIFF, GETDATE, or any of the other functions discussed here, highlight the function in your query window and press Shift+F1. This will invoke SQL Server Books Online (BOL) context-sensitive help, which will bring you to the appropriate documentation.
Determining Maximum Values
You can use the MAX function in a SELECT statement to obtain the maximum value for a specified column or arithmetic expression. For example, the query
SELECT MAX(Salary) AS 'Maximum Salary' FROM Employee
reveals that the highest salary in the Employee table is $250,000.
Using the DATEDIFF and GETDATE functions, you can determine how many years the most senior employee has been with the fictitious company. The code is similar to that used to obtain how long the most recently hired employee has been with the company, except that MAX rather than MIN is used and the time-period argument is in years (represented by yy) rather than months. So, the query
SELECT MAX(DATEDIFF (yy, HireDate, GETDATE())) AS 'Number of Years' FROM Employee
reveals that the most senior employee has been with the company for 18 years.
Determining Average Values
You can use the AVG function in a SELECT statement to obtain the average value for a specified column or arithmetic expression. (The values you’re averaging must be numeric.) For example, to determine the average number of years employees have been with the company, you’d run the query
SELECT AVG(DATEDIFF (yy, HireDate, GETDATE())) AS 'Average Years of Service' FROM Employee
The result is an average of 5 years. Note that the data type returned by AVG is determined by the type of data being evaluated. In this case, the Salary field is the data type of integer, so the data returned is an integer.
Let’s look at another example for AVG. Suppose you want to find the average salary of all the employees. You’d run the query
SELECT AVG(Salary) AS 'AverageSalary' FROM Employee
The result is $90,562. You might be tempted to say that our fictitious company pays a decent average salary, but looking at the minimum ($23,500) and maximum ($250,000) salaries reveals that there’s quite a gap between the two. In such cases, you can use the MIN and MAX functions in a WHERE clause to filter out the top and bottom salaries:
SELECT AVG(Salary) AS 'AdjustedAverageSalary' FROM Employee WHERE Salary < (SELECT MAX(Salary) FROM Employee) AND Salary > (SELECT MIN(Salary) FROM Employee)
As you can see, the WHERE clause consists of two components. The first component—Salary < (SELECT MAX(Salary) FROM Employee)—uses MAX to find the maximum salary, then uses the < (less than) operator to select the salaries lower than that value. The second component—Salary > (SELECT MIN(Salary) FROM Employee)—uses MIN to find the minimum salary, then uses the > (greater than) operator to select the salaries higher than that value. By joining these two components with the AND operator, the WHERE statement retrieves only those values that are lower than the maximum salary and higher than the minimum salary. The AVG function then uses those values to determine the average salary. In this case, the result is $75,166, which is quite a bit lower than previously reported $90,562. (See Lesson 1 for background information on how to use the AND, <, and > operators in WHERE clauses.)
Determining the Sum of Values
When you use the SUM function in a SELECT statement, you can obtain the total of all the values in the specified column or arithmetic expression. (The values being totaled must be numeric.) For example, the query
SELECT SUM(Salary) AS 'Salary Total' FROM Employee
shows that the sum of all the salaries in the Employee table is $724,500. Like the AVG function, the data type returned by the SUM function is determined by the type of data being evaluated.
With the sum of all the salaries in hand, you might be tempted to divide that total by the number of records returned by COUNT(*) to get an average salary. However, this practice can lead to problems because COUNT(*) and SUM handle NULL values differently. As I mentioned previously, COUNT(*) includes NULL values whereas SUM ignores them. So, for example, if the HR department didn’t get a chance to enter all the salary data for employees, using the SUM and COUNT(*) functions to calculate the average salary would lead to an inaccurate calculation because you’d be totaling only the available salary data and dividing that total by all the employee records. Thus, it’s safer to use the AVG function instead because it ignores NULL values altogether.
Beyond the Aggregate Basics
In this lesson, I showed you examples of how to use the COUNT, MIN, MAX, AVG, and SUM functions in SELECT queries. Although these queries are useful from a teaching perspective, they only scratch the surface when it comes to showing how useful aggregate functions can be when they’re used in more complex queries. In Lesson 4, I’ll show you how to tap into the true power of T-SQL by using aggregate functions with the Group By clause in SELECT statements.
About the Author
You May Also Like