This is a beta course, so its structure, chapters, and examples may continue to change.
External Sorting
Idea
External sorting handles data too large for memory, using external storage (disk). Split the big file into memory-fit chunks, sort each, then multiway-merge.
外部排序演示
块1:
84
块2:
57
块3:
13
块4:
62
当前阶段:分块排序
Common approach
Multiway merge sort
- Split into chunks, load each into memory and sort (e.g., merge sort).
- Merge sorted chunks (“runs”) via multiway merge.
- Use 2-way, 4-way, or multiway merging to reduce passes.
Scenarios
- Large DBs, search logs, big-data analytics
- Any sort where the whole dataset cannot fit in memory
Complexity
- Sorting phase: (per chunk)
- Merge phase: , = merge ways
- Overall:
- Space: dominated by disk I/O and buffer sizes
Exercises
Exercise 1
Describe the basic flow and use cases of external sorting.
参考答案
思路:分块排序 + 多路归并,适合大数据。
步骤:
- 分块排序,生成归并段
- 多路归并成大文件
- 适用于内存放不下的场景
答案:分块+归并,适合大数据排序。
Exercise 2
Common merge ways and their complexity?
参考答案
思路:二路/多路归并与复杂度。
步骤:
- 二路、四路、多路归并
- 复杂度 , 为路数
答案:常用多路归并,复杂度 。