# Divide and Conquer - Tue, January 26

## Sorting

Given a list of $n$ numbers $a_1, \dots, a_n$, output them in increasing order. The idea is to split the list into two halves, recursively sort each half, and merge the two sorted halves. This is known as merge sort.

    MergeSort(a_1, ..., a_n):
        if n = 1: return a_1
        S_L := MergeSort(a_1, ..., a_{n/2})
        S_R := MergeSort(a_{n/2 + 1}, ..., a_n)
        S := merge(S_L, S_R)
        return S

Here, merge(S_L, S_R) runs in time $O(n)$.
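The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a tuned implementation: the names `merge` and `merge_sort` mirror the pseudocode, while the use of list slicing (which copies) and of `<=` to break ties are assumptions made here for brevity.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list.

    Each loop iteration appends one element, so the running time is
    O(len(left) + len(right)).
    """
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these two tails is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Sort a list by recursively sorting each half and merging."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))
```

For example, `merge_sort([5, 2, 4, 7, 1, 3, 2, 6])` sorts the eight inputs by merging two sorted halves of four elements each.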

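The running time of the algorithm is $T(n) = 2T(n/2) + O(n)$, which is the balanced case of the Master Theorem ($a = 2$, $b = 2$, $d = 1$, so $a = b^d$ and the work at each of the $\log_2 n$ levels is $O(n^d) = O(n)$). Applying the Master Theorem gives us $T(n) = O(n \log n)$.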

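All the real work is in merging, as nothing happens until the recursion hits the base case. This naturally leads to an iterative algorithm that maintains a queue of lists: start with $n$ one-element lists, then repeatedly remove the two lists at the front of the queue, merge them, and append the result to the back. (In the accompanying figure, the bottom row shows sorted lists of 4 elements.)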
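A minimal Python sketch of this queue-based variant is given here. The function name `iterative_merge_sort` and the use of `collections.deque` as the queue are choices made for this illustration; `merge` is repeated so that the block stands on its own.

```python
from collections import deque


def merge(left, right):
    """Merge two sorted lists into one sorted list in linear time."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def iterative_merge_sort(items):
    """Sort by repeatedly merging the two lists at the front of a queue."""
    if not items:
        return []
    # Start with n sorted lists of one element each.
    queue = deque([x] for x in items)
    while len(queue) > 1:
        first = queue.popleft()
        second = queue.popleft()
        # The merged list goes to the back of the queue.
        queue.append(merge(first, second))
    return queue[0]
```

No recursion is needed: every list in the queue is sorted at all times, and each pass over the queue roughly doubles the length of the lists it contains.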

Can we do better than merge sort? No and yes:

• No: merge sort is a comparison sort, i.e., an algorithm in which the only operations performed on the inputs are comparisons, and (by the theorem below) no comparison sort can do asymptotically better
• Yes: there are sorting algorithms that are not solely based on comparisons
  • For example, if the elements are $k$ bits long, then:
    • radix sort uses $O(nk)$ bit operations
    • merge sort uses $O(nk \log n)$ bit operations, as a comparison costs $O(k)$ bit operations
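To make the radix sort bullet concrete, here is a short Python sketch of one common variant, least-significant-bit-first binary radix sort. The notes do not say which variant is meant, so this choice is an assumption, as are the function name `radix_sort` and the restriction to non-negative integers that fit in `k` bits.

```python
def radix_sort(values, k):
    """Sort non-negative integers that each fit in k bits.

    Makes k passes, one per bit position, starting from the least
    significant bit. Each pass inspects one bit of each of the n
    values and never compares two elements with each other.
    """
    for bit in range(k):
        # Stable partition: values with a 0 in this position keep their
        # relative order and come first, followed by those with a 1.
        zeros = [v for v in values if not (v >> bit) & 1]
        ones = [v for v in values if (v >> bit) & 1]
        values = zeros + ones
    return values
```

Stability of each pass is what makes the result correct: after processing bit $i$, the values are sorted by their lowest $i + 1$ bits.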

Theorem: Any comparison sorting algorithm requires $\Omega(n \log n)$ comparisons to sort lists of $n$ elements.

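Proof: Fix an algorithm $A$. Without loss of generality, focus on input lists $a_1, \dots, a_n$, where the elements are distinct. The computation of $A$ on $a_1, \dots, a_n$ defines a permutation $\pi$ such that $a_{\pi(1)} < a_{\pi(2)} < \dots < a_{\pi(n)}$. Here, every permutation is a possible output.

Let $P$ denote the set of possible permutations at a given point in $A$'s computation. Before the algorithm starts, $|P| = n!$. At each comparison of some $a_i$ with some $a_j$:

• if $a_i < a_j$, $P$ shrinks to $P_{<}$, the permutations in $P$ consistent with $a_i < a_j$
• if $a_i > a_j$, $P$ shrinks to $P_{>}$, the permutations in $P$ consistent with $a_i > a_j$

Since $P = P_{<} \cup P_{>}$, we know that $|P_{<}| \ge |P|/2$ or $|P_{>}| \ge |P|/2$. As in, one of them has to be at least half as large as $P$. So, in the worst case, a comparison divides the number of possible permutations by at most $2$.

Hence, the number of comparisons until $|P| = 1$ is at least $\log_2(n!) = \Omega(n \log n)$ [ 1 ]. QED.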
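The bound can be checked numerically for small $n$. The sketch below computes $\lceil \log_2(n!) \rceil$ exactly and compares it with the worst-case number of comparisons made by a straightforward merge sort over all permutations. All function names here (`comparison_lower_bound`, `merge_sort_count`, `worst_case_comparisons`) are hypothetical helpers introduced for this illustration.

```python
import math
from itertools import permutations


def comparison_lower_bound(n):
    """Return ceil(log2(n!)), computed exactly with integer arithmetic.

    For an integer x >= 1, ceil(log2(x)) equals (x - 1).bit_length().
    """
    return (math.factorial(n) - 1).bit_length()


def merge_sort_count(items):
    """Return (sorted list, number of element comparisons performed)."""
    if len(items) <= 1:
        return list(items), 0
    mid = len(items) // 2
    left, left_count = merge_sort_count(items[:mid])
    right, right_count = merge_sort_count(items[mid:])
    merged = []
    count = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        count += 1  # one comparison between two input elements
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count


def worst_case_comparisons(n):
    """Maximum comparison count of merge_sort_count over all n! inputs."""
    return max(merge_sort_count(list(p))[1] for p in permutations(range(n)))
```

By the theorem, `worst_case_comparisons(n)` can never be smaller than `comparison_lower_bound(n)`; the brute-force check is only practical for small `n` because it enumerates all $n!$ permutations.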

This theorem implies that merge sort is optimal among comparison sorts.

## Median Finding

Given a list $S$ of $n$ numbers, output an element $m$ such that half of $S$ is smaller and half of $S$ is larger than $m$.

• We can sort the list and take the middle element, which is $O(n \log n)$
• However, we don't care about the order of elements above and below the median

We can use divide and conquer to solve a harder, more general problem of selection: for an input set $S$ of $n$ numbers and index $k$, output the $k$th smallest element in $S$.

• Pick $a \in S$ and split $S$ into:
  • $S_L$: the elements in $S$ smaller than $a$
  • $S_a$: the elements in $S$ equal to $a$
  • $S_R$: the elements in $S$ larger than $a$
• Then recurse in a straightforward way:

    Select(S, k):
        pick a \in S and compute S_L, S_a, S_R (this step is O(n))
        if k <= |S_L|: return Select(S_L, k)
        if |S_L| < k <= |S_L| + |S_a|: return a
        if |S_L| + |S_a| < k: return Select(S_R, k - |S_L| - |S_a|)

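We go from a list of size $n$ to a list of size at most $\max(|S_L|, |S_R|)$. But how do we select $a$?

• The worst case is if $a$ is always the largest or smallest element of the current set: $T(n) = T(n - 1) + O(n) = O(n^2)$
• The best case is if $a$ always splits $S$ roughly in half ($|S_L|, |S_R| \approx n/2$): $T(n) = T(n/2) + O(n) = O(n)$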

The problem? Picking a good $a$ requires... finding the median, which is the very problem we are trying to solve. Let's pick it at random instead!

We will say that $a$ is good if it is between the 25th and 75th percentiles of $S$. When $a$ is good, our new set shrinks by a constant factor: $|S_L|, |S_R| \le 3n/4$.

There are many good elements -- in fact, the probability that $a$ is good is $1/2$, so it takes ~2 tries to get a good $a$. Expected running time: $T(n) \le T(3n/4) + O(n)$, which gives $T(n) = O(n)$.
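The expected running time can be unrolled explicitly. As a sketch, assume (this constant is not in the notes) that splitting a set of size $n$ around $a$ costs at most $cn$ steps, and let $T(n)$ denote the expected running time on a set of size $n$. Since a good $a$ takes about 2 tries in expectation, roughly $2cn$ expected work is spent before the set shrinks to size at most $3n/4$:

$$
\begin{aligned}
T(n) &\le T(3n/4) + 2cn \\
     &\le 2cn \left( 1 + \tfrac{3}{4} + \left(\tfrac{3}{4}\right)^2 + \cdots \right) \\
     &= 2cn \cdot \frac{1}{1 - 3/4} = 8cn = O(n).
\end{aligned}
$$

The geometric series converges because each good choice shrinks the set by a constant factor; the base case contributes only a constant.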

On any input list $S$ and integer $k$, Select(S, k) (with $a$ chosen uniformly at random) returns the correct answer in an expected $O(n)$ number of steps.
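A minimal Python sketch of randomized selection follows. It assumes `k` is 1-indexed, as in the pseudocode, and draws $a$ with `random.choice`; the names `select` and `median` are illustrative, and taking the lower middle element for even-length lists is an assumption made here.

```python
import random


def select(items, k):
    """Return the k-th smallest element of items (k is 1-indexed)."""
    if not 1 <= k <= len(items):
        raise ValueError("k must satisfy 1 <= k <= len(items)")
    pivot = random.choice(items)  # the element a, chosen uniformly at random
    smaller = [x for x in items if x < pivot]   # S_L
    equal = [x for x in items if x == pivot]    # S_a
    larger = [x for x in items if x > pivot]    # S_R
    if k <= len(smaller):
        return select(smaller, k)
    if k <= len(smaller) + len(equal):
        return pivot
    # The answer lies in S_R, so skip the elements already ruled out.
    return select(larger, k - len(smaller) - len(equal))


def median(items):
    """Return the lower median of a non-empty list."""
    return select(items, (len(items) + 1) // 2)
```

The answer is always correct regardless of the random choices; only the running time depends on them.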

[ 1 ] this is a worst-case lower bound (the depth of the deepest leaf in the algorithm's comparison tree is $\Omega(n \log n)$), but this can be improved to an average-case lower bound (the average depth of a leaf is $\Omega(n \log n)$).