External Sort/Merge Algorithms
This chapter discusses external sort/merge algorithms used for sorting large data files that could not fit into the main memory of the computer. The algorithms presented are: two-way sort/merge, the balanced two-way sort/merge, the balanced k-way sort/merge, and the polyphase sort/merge. The advantages of sorting records include the following: sorting transactions to match the key order of the master file can reduce the time required to locate a matching master record; sorted transactions can improve the maintenance of master files with other types of file organization; and many reports generated from files require that the file be sorted, as it is more presentable and readable when sorted.
A sort/merge algorithm involves two steps:
1. The records in the file to be sorted are divided into several groups, called run, and each run fits into main memory. An internal sort is applied to each run, and the resulting sorted runs are distributed to two external files.
2. One run at a time from each of the external files created in step 1 merge into larger runs of sorted records. The result is stored in a third external file. The data are distributed from the third file back into the first two files, and the merge continues until all records are in one large run.
Suppose we have a file of records with the following keys to be sorted in ascending order:
50 110 95 10 100 36 153 40 120 60 70 130 22 140 80
Let us assume that the size of the run is 3 records/run. By using step 1, we can divide the keys into groups of 3 as follows: (50,110,95), (10,100,36), (153,40,120), (60,70,130), (22,140,80). The number of groups indicates that we have 5 runs. These 5 runs are then loaded into 2 files: File 1 contains run 1(50,110,95), run 3(153,40,120), and run 5(22,140,80); File 2 contains run 2(10,100,36) and run 4(60,70,130). All in sorted order. The distribution of runs in each file is alternated in order to have the same...