Divide and Conquer - Tue, January 26


Given a list of n numbers a_1, ..., a_n, output them in increasing order. The idea is to split the list into two halves, recursively sort each half, and merge the two sorted halves. This is known as merge sort.

MergeSort(a_1, ..., a_n):
    if n = 1: return a_1
    m := floor(n/2)
    S_L := MergeSort(a_1, ..., a_m)
    S_R := MergeSort(a_{m + 1}, ..., a_n)
    S := merge(S_L, S_R)
    return S

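Here, merge(S_L, S_R) combines two sorted lists into a single sorted list and runs in time O(n), where n is the total number of elements in S_L and S_R.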
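The pseudocode above can be sketched in Python as follows. This is a minimal illustration rather than a tuned implementation; the function names merge and merge_sort are chosen here for the example, and the input is assumed to be a Python list of mutually comparable items.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list.

    Each element is appended exactly once, so the running time is
    linear in len(left) + len(right).
    """
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these two slices is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(a):
    """Return a new sorted list containing the elements of a."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))
```

For example, merge_sort([5, 2, 4, 7, 1, 3, 2, 6]) splits the list into [5, 2, 4, 7] and [1, 3, 2, 6], sorts each recursively, and merges the results.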

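The running time of the algorithm satisfies T(n) = 2T(n/2) + O(n), which is the balanced case of the Master Theorem (a = 2, b = 2, d = 1, so a = b^d and the work at each of the log n levels is n^d = n). Applying the Master Theorem gives us T(n) = O(n log n).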
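As a sanity check, the same bound can be read off the recursion tree directly. The sketch below assumes that n is a power of 2 and that merging lists with m elements in total costs at most cm for some constant c (both are simplifying assumptions made here):

```latex
% Level k of the recursion tree has 2^k subproblems, each of size n / 2^k.
\begin{align*}
T(n) &= \sum_{k=0}^{\log_2 n - 1} 2^k \cdot c\,\frac{n}{2^k} \;+\; O(n)
       && \text{(merging at each level, plus the $n$ base cases)} \\
     &= \sum_{k=0}^{\log_2 n - 1} c\,n \;+\; O(n) \\
     &= c\,n \log_2 n + O(n) \;=\; O(n \log n).
\end{align*}
```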

All the real work is in merging, as nothing happens until the recursion hits the base case. This naturally leads to an iterative algorithm that maintains a queue of lists:

(Figure omitted; its bottom line is labeled "sorted lists of 4 elements".)
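One way to realize the queue-based idea is sketched below. This is an assumption about the intended algorithm, not a transcription of it: the queue starts with n one-element lists, and the two lists at the front are repeatedly removed, merged, and appended to the back until a single list remains. The name iterative_merge_sort is chosen here for illustration.

```python
from collections import deque


def merge(left, right):
    """Merge two sorted lists into one sorted list in linear time."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def iterative_merge_sort(a):
    """Sort a by repeatedly merging the two lists at the front of a queue."""
    if not a:
        return []
    # Every single element is a sorted list of length 1.
    queue = deque([x] for x in a)
    while len(queue) > 1:
        first = queue.popleft()
        second = queue.popleft()
        queue.append(merge(first, second))
    return queue.popleft()
```

When n is a power of 2, one pass over the queue turns sorted lists of length 1 into sorted lists of length 2, the next pass produces lists of length 4, and so on.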

Can we do better than merge sort? No and yes:

Theorem: Any comparison sorting algorithm requires Omega(n log n) comparisons to sort lists of n elements.

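Proof: Fix an algorithm A. Without loss of generality, focus on input lists a_1, ..., a_n whose elements are distinct. The computation of A on a_1, ..., a_n defines a permutation pi: the rearrangement that puts the input in sorted order. Here, every permutation is a possible output, so there are n! possibilities.

Let P denote the set of possible permutations at a given point in A's computation, that is, the permutations consistent with the answers to all comparisons made so far. Before the algorithm starts, |P| = n!. At each comparison, P splits into two sets: P_yes (the permutations consistent with the answer "yes") and P_no (those consistent with the answer "no").

Since P is the union of P_yes and P_no, we know that |P_yes| >= |P|/2 or |P_no| >= |P|/2. That is, one of them has to be at least half as large as P. So, in the worst case, a comparison divides the number of possible permutations by at most 2.

Hence, the number of comparisons until |P| = 1 is at least log_2(n!) = Omega(n log n) [ 1 ]. QED.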
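The last step of the proof relies on the fact that the base-2 logarithm of n! grows like n log n. A short derivation, keeping only the largest half of the factors of n!, is sketched here:

```latex
\begin{align*}
\log_2(n!) &= \sum_{i=1}^{n} \log_2 i
            \;\ge\; \sum_{i=\lceil n/2 \rceil}^{n} \log_2 i \\
           &\ge\; \frac{n}{2} \log_2 \frac{n}{2}
            && \text{(at least $n/2$ terms, each at least $\log_2(n/2)$)} \\
           &=\; \Omega(n \log n).
\end{align*}
% In the other direction, \log_2(n!) \le n \log_2 n, so \log_2(n!) = \Theta(n \log n).
```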

This theorem implies that merge sort is asymptotically optimal among comparison sorts.

Median Finding

Given a list S of n numbers, output the median: an element m of S such that half of S is smaller and half of S is larger than m.

We can use divide and conquer to solve a harder, more general problem of selection: for an input set S of n numbers and an index k, output the kth smallest element in S.

Select(S, k):
    pick a \in S and compute S_L, S_a, S_R (this step is O(n))
        (S_L, S_a, S_R: the elements of S smaller than, equal to, and larger than a)
    if k <= |S_L|: return Select(S_L, k)
    if |S_L| < k <= |S_L| + |S_a|: return a
    if |S_L| + |S_a| < k: return Select(S_R, k - |S_L| - |S_a|)
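A minimal Python sketch of this procedure is given below, with the pivot a chosen uniformly at random. The function name select, the 1-indexed convention for k, and the ValueError on out-of-range k are choices made here for illustration. Note that when recursing into the right part, the rank is shifted by the number of discarded elements.

```python
import random


def select(s, k):
    """Return the k-th smallest element of the list s (k is 1-indexed)."""
    if not 1 <= k <= len(s):
        raise ValueError("k must satisfy 1 <= k <= len(s)")
    a = random.choice(s)  # random pivot
    # Three-way partition around the pivot: linear time in len(s).
    s_l = [x for x in s if x < a]
    s_a = [x for x in s if x == a]
    s_r = [x for x in s if x > a]
    if k <= len(s_l):
        return select(s_l, k)
    if k <= len(s_l) + len(s_a):
        return a
    # The k-th smallest of s is the (k - |S_L| - |S_a|)-th smallest of S_R.
    return select(s_r, k - len(s_l) - len(s_a))
```

With an odd number of elements n, the median is select(s, (n + 1) // 2). Since s_a always contains the pivot, each recursive call is on a strictly smaller list, so the recursion terminates for any sequence of random choices.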

We go from a list of size |S| to a list of size at most max(|S_L|, |S_R|). But how do we select a?

The problem? Picking the ideal a, one that splits S evenly, requires... finding the median, which is the very problem we are trying to solve. Let's pick a at random instead!

We will say that a is good if it lies between the 25th and 75th percentiles of S. When a is good, our new set shrinks by a constant factor: max(|S_L|, |S_R|) <= (3/4)|S|.

There are many good elements: in fact, the probability that a is good is 1/2, so it takes ~2 tries in expectation to get a good a. Expected running time: T(n) <= T(3n/4) + O(n), which solves to T(n) = O(n).

On any input list S and integer k, Select(S, k) (with a chosen randomly) returns the correct answer in an expected O(n) number of steps.
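The linear bound can be checked by unrolling the recurrence. The sketch below writes T(n) for the expected running time on inputs of size n, and assumes that one partitioning step on m elements costs at most cm for some constant c and that about 2 random pivots are needed in expectation before a good one is found:

```latex
\begin{align*}
T(n) &\le T\!\left(\tfrac{3}{4}n\right) + 2cn
       && \text{(expected cost until the list shrinks to at most $\tfrac{3}{4}n$)} \\
     &\le 2cn \sum_{i=0}^{\infty} \left(\tfrac{3}{4}\right)^{i}
       && \text{(unrolling the recurrence)} \\
     &= 2cn \cdot \frac{1}{1 - \tfrac{3}{4}} \;=\; 8cn \;=\; O(n).
\end{align*}
```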

[ 1 ] This is a worst-case lower bound (the depth of the deepest leaf in the tree of comparisons is Omega(n log n)), but it can be improved to an average-case lower bound (the average depth of a leaf is Omega(n log n)).