GROUP BY Clause in MySQL
Introduction
Grouping data helps analyse patterns and summaries from large datasets.
The GROUP BY clause in MySQL groups rows with identical values into summary rows.
It is widely used for:
- Aggregate calculations (COUNT, SUM, AVG, MAX, MIN)
- Data analysis and reporting
- Dashboard statistics
What is the GROUP BY Clause
The GROUP BY clause groups rows that have the same values in specified columns into summary rows.
Aggregate functions like COUNT, SUM, and AVG are applied to each group.
Basic Syntax
sql
SELECT column_name, AGGREGATE_FUNCTION(column)
FROM table_name
GROUP BY column_name;
Example
sql
SELECT city, COUNT(*) as student_count
FROM Students
GROUP BY city;
This returns the number of students per city.
GROUP BY with Multiple Columns
Groups by a combination of columns.
sql
SELECT city, department, AVG(marks) as avg_marks
FROM Students
GROUP BY city, department;
Groups by city AND department combination.
Common Aggregate Functions
- COUNT(*): Total rows in group
- SUM(column): Total of numeric column
- AVG(column): Average of numeric column
- MAX(column): Highest value
- MIN(column): Lowest value
Example with Multiple Aggregates
sql
SELECT city,
COUNT(*) as total_students,
AVG(marks) as avg_marks,
MAX(marks) as highest_mark
FROM Students
GROUP BY city;
GROUP BY with WHERE Clause
WHERE filters before grouping.
sql
SELECT city, COUNT(*) as count
FROM Students
WHERE marks > 80
GROUP BY city;
GROUP BY with ORDER BY
Sort grouped results.
sql
SELECT city, AVG(marks) as avg_marks
FROM Students
GROUP BY city
ORDER BY avg_marks DESC;
GROUP BY with JOIN
Group joined table results.
sql
SELECT Courses.course_name, COUNT(Students.id) as enrollment
FROM Students
JOIN Courses ON Students.course_id = Courses.id
GROUP BY Courses.course_name;
GROUP BY with Subqueries
Use grouped results in a subquery.
sql
SELECT * FROM (
SELECT city, AVG(marks) as avg_marks
FROM Students
GROUP BY city
) AS city_stats
WHERE avg_marks > 75;
GROUP BY with ROLLUP
ROLLUP adds subtotals and totals.
sql
SELECT city, department, AVG(marks)
FROM Students
GROUP BY city, department WITH ROLLUP;
Shows city subtotals, department subtotals, and total.
Execution Order with GROUP BY
GROUP BY is applied after:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
LIMIT
Performance Considerations
- Index GROUP BY columns for faster grouping
- Avoid GROUP BY on large text columns
- Use WHERE to reduce rows before grouping
- Temporary tables created for complex GROUP BY
Important Notes
- All non-aggregate columns must appear in GROUP BY
- HAVING filter groups (after GROUP BY)
- ORDER BY can reference aliases
- NULL values form their own group
- MySQL allows SELECT * with GROUP BY (uses arbitrary row)
Example Scenario
Get average marks per city with total students:
sql
SELECT city,
COUNT(*) as total_students,
ROUND(AVG(marks), 2) as avg_marks
FROM Students
GROUP BY city
ORDER BY avg_marks DESC;
Common Mistakes
- Forgetting GROUP BY for aggregate functions
- Using WHERE instead of HAVING for aggregates
- Non-grouped columns in SELECT without aggregate
- GROUP BY before WHERE (wrong execution order)
- Large datasets without proper indexing
Key Points to Remember
- GROUP BY groups identical values into summary rows
- Aggregate functions work on each group
- WHERE filters before grouping, HAVING filters after
- Multiple columns: groups by complete combination
- ROLLUP adds subtotals and grand totals
- Index GROUP BY columns for performance