Union vs Join in MySQL
Introduction
Combining data has two strategies: vertical stacking vs horizontal matching.
UNION stacks result sets vertically, JOIN matches horizontally.
They are widely used for:
- Different data combination needs
- Performance optimization
- Normalized vs denormalized data
What is Union vs Join
| Aspect | UNION | JOIN |
| Direction | Vertical (stack) | Horizontal (match) |
| Input | Multiple SELECTs | Related tables |
| Output | Single column set | Multiple columns |
| Duplicates | Removes (UNION) | Keeps (depends) |
Basic UNION Example
sql
SELECT name FROM Students
UNION
SELECT name FROM Graduates;
Single column, stacked results.
Basic JOIN Example
sql
SELECT s.name, c.course_name
FROM Students s
JOIN Courses c ON s.course_id = c.id;
Multiple columns, matched rows.
UNION Use Case
Combine similar structured data.
sql
SELECT 'Student' as type, name, marks FROM Students WHERE marks > 90
UNION
SELECT 'Graduate' as type, name, score FROM Graduates WHERE score > 90
ORDER BY marks DESC;
JOIN Use Case
Related table data combination.
sql
SELECT s.name, c.course_name, d.department
FROM Students s
JOIN Courses c ON s.course_id = c.id
JOIN Departments d ON c.dept_id = d.id;
Performance Comparison
| Scenario | UNION | JOIN | Winner |
| 1000+1000 rows | Slow (dedupe) | Fast | JOIN |
| Different tables | N/A | Fast | JOIN |
| Same structure | Fast (UNION ALL) | Complex | UNION |
| Normalized data | Complex | Fast | JOIN |
Column Requirements
sql
-- UNION: Same # columns, compatible types
SELECT name FROM Students
UNION
SELECT name FROM Graduates; -- 1 column only
-- JOIN: Related by key, any # columns
SELECT s.name, c.course_name FROM Students s JOIN Courses c;
UNION with ORDER BY
Final result sorting only.
sql
SELECT name FROM Students
UNION
SELECT name FROM Graduates
ORDER BY name; -- Applies to combined result
JOIN with ORDER BY
Sorts matched results.
sql
SELECT s.name, c.course_name
FROM Students s JOIN Courses c ON s.course_id = c.id
ORDER BY s.name, c.course_name;
When to Use UNION
- Same structure, different sources
- Log files from different dates
- Category/type discrimination
sql
SELECT 'Sales' as source, amount FROM JanSales
UNION ALL
SELECT 'Sales' as source, amount FROM FebSales;
When to Use JOIN
- Normalized relational data
- Master-detail relationships
- Foreign key relationships
sql
SELECT o.order_id, c.customer_name
FROM Orders o JOIN Customers c ON o.customer_id = c.id;
Memory Usage
| Operation | Sort Buffer | Index Usage |
| UNION | High | Per SELECT |
| JOIN | Low | Join columns |
| UNION ALL | Low | Per SELECT |
Execution Order Comparison
UNION: SELECT1 → SELECT2 → Combine → ORDER BY → LIMIT
JOIN: FROM → JOIN → ON → WHERE → GROUP BY → ORDER BY → LIMIT
Performance Considerations
- JOIN faster for normalized data
- UNION ALL fastest for stacking
- UNION slow due to deduplication
- Index join columns (JOIN) vs filter columns (UNION)
Important Notes
- UNION requires identical column structure
- JOIN needs relationship (foreign keys)
- UNION ALL avoids duplicate removal overhead
- Different purposes, not interchangeable
- JOIN = horizontal expansion, UNION = vertical stacking
Example Scenario
Sales report from multiple sources vs customer orders:
sql
-- UNION: Stack monthly sales
SELECT 'Jan' month, SUM(amount) FROM JanSales
UNION ALL
SELECT 'Feb' month, SUM(amount) FROM FebSales;
-- JOIN: Customer sales detail
SELECT c.name, SUM(o.amount)
FROM Customers c JOIN Orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Common Mistakes
- Using UNION for related table data
- JOIN without relationship (cartesian)
- UNION ALL when deduplication needed
- Performance confusion between methods
- Wrong direction (vertical vs horizontal)
Key Points to Remember
- UNION stacks vertically (same columns)
- JOIN expands horizontally (related tables)
- UNION ALL fastest (no dedupe)
- Index JOIN columns, filter UNION sources
- Different data models, different tools
- Choose based on data relationships