sort

the bubble sort

def bubble_sort(data):
    n = len(data)
    for idx1 in range(n-1):
        for idx2 in range(idx1+1,n):
            if data[idx1] > data[idx2]:
                data[idx1], data[idx2] = data[idx2], data[idx1]
    return data

data = [2,7,1,6,6,3,0,6,9]
print(bubble_sort(data))

[0, 1, 2, 3, 6, 6, 6, 7, 9]

A bubble sort can be modified to stop early if it finds that the list has become sorted. This means that for lists that require just a few passes, a bubble sort may have an advantage in that it will recognize the sorted list and stop.

def short_bubble(data):
    n = len(data)
    exchange = True
    compare_num = n - 1 # at most n-1 compare times
    while compare_num > 0 and exchange:
        exchange = False
        for idx in range(compare_num):
            if data[idx] > data[idx+1]:
                exchange = True
                data[idx], data[idx+1] = data[idx+1], data[idx]
        compare_num -= 1
    return data

data = [2,7,1,6,6,3,0,6,9]
print(short_bubble(data))

[0, 1, 2, 3, 6, 6, 6, 7, 9]

the selection sort

def select_sort(data):
    n = len(data)
    exchange_times = n-1
    for fill_idx in range(n-1, 0,-1):
        max_idx = 0
        for idx in range(1,fill_idx+1):
            if data[idx] > data[max_idx]:
                max_idx = idx
        data[fill_idx], data[max_idx] = data[max_idx], data[fill_idx]
    return data

data = [2,7,1,6,6,3,0,6,9,-1]
print(select_sort(data))

[-1, 0, 1, 2, 3, 6, 6, 6, 7, 9]

the insertion sort

It always maintains a sorted sublist in the lower positions of the list. Each new item is then “inserted” back into the previous sublist such that the sorted sublist is one item larger.

def insert_sort(data):
    n = len(data)
    for idx1 in range(1,n):
        for idx2 in range(idx1):
            if data[idx1] < data[idx2]:
                data[idx1],data[idx2] = data[idx2],data[idx1]
    return data

def insertionSort(alist):
   for index in range(1,len(alist)):

     currentvalue = alist[index]
     position = index

     while position>0 and alist[position-1]>currentvalue:
         alist[position]=alist[position-1]
         position = position-1

     alist[position]=currentvalue


data = [2,7,1,6,6,3,0,6,9,-1]
alist = [54,26,93,17,77,31,44,55,20]
print(insert_sort(data))  
print(insert_sort(alist))

[-1, 0, 1, 2, 3, 6, 6, 6, 7, 9]
[17, 20, 26, 31, 44, 54, 55, 77, 93]

the shell sort

The shell sort, sometimes called the “diminishing increment sort,” improves on the insertion sort by breaking the original list into a number of smaller sublists, each of which is sorted using an insertion sort. The unique way that these sublists are chosen is the key to the shell sort. Instead of breaking the list into sublists of contiguous items, the shell sort uses an increment i, sometimes called the gap, to create a sublist by choosing all items that are i items apart.

the complexity of shell sort tends to fall somewhere between O(n) and O(n2) , based on the behavior of the element distribution.

def gap_shell_sort(data, start_pos, gap_len):
    n = len(data)
    for idx1 in range(start_pos+gap_len, n, gap_len):
        for idx2 in range(0, idx1+1, gap_len):
            if data[idx2] > data[idx1]:
                data[idx2], data[idx1] = data[idx1], data[idx2]    

def shell_sort(data):
    sublist_len = len(data) // 2
    while sublist_len > 0:
        for idx in range(sublist_len):
            gap_shell_sort(data, idx, sublist_len)
        sublist_len //= 2


alist = [54,26,93,17,77,31,44,55,20]
shell_sort(alist)
print(alist)

[17, 20, 26, 31, 44, 54, 55, 77, 93]

the merge sort

the merge sort achives a better time complixity O(nlog(n)) .
however, the space complixity is O(nlog(n)) .
the core of merge sort algorithm is inpired by divid and conqure algorithm.

def merge_sort(data):
    n = len(data)
    if n<2:
        return data
    left = merge_sort(data[:n//2])
    right = merge_sort(data[n//2:])

    idx_left = idx_right = 0
    idx = 0
    while idx_left < len(left) and idx_right < len(right):
        if left[idx_left] < right[idx_right]:
            data[idx] = left[idx_left]
            idx_left += 1
            idx += 1
        else:
            data[idx] = right[idx_right]
            idx_right += 1
            idx += 1

    while idx_left < len(left):
        data[idx] = left[idx_left]
        idx_left += 1
        idx += 1

    while idx_right < len(right):
        data[idx] = right[idx_right]
        idx_right += 1
        idx += 1

    return data

alist = [54,26,93,17,77,31,44,55,20]
merge_sort(alist)
print(alist)

[17, 20, 26, 31, 44, 54, 55, 77, 93]

the quick sort

The quick sort uses divide and conquer to gain the same advantages as the merge sort, while not using additional storage. As a trade-off, however, it is possible that the list may not be divided in half. When this happens, we will see that performance is diminished.

A quick sort first selects a value, which is called the pivot value. Although there are many different ways to choose the pivot value, we will simply use the first item in the list. The role of the pivot value is to assist with splitting the list. The actual position where the pivot value belongs in the final sorted list, commonly called the split point, will be used to divide the list for subsequent calls to the quick sort.

To analyze the quickSort function, note that for a list of length n, if the partition always occurs in the middle of the list, there will again be logn divisions. In order to find the split point, each of the n items needs to be checked against the pivot value. The result is O(nlogn) . In addition, there is no need for additional memory as in the merge sort process.

Unfortunately, in the worst case, the split points may not be in the middle and can be very skewed to the left or the right, leaving a very uneven division. In this case, sorting a list of n items divides into sorting a list of 0 items and a list of n−1 items. Then sorting a list of n−1 divides into a list of size 0 and a list of size n−2, and so on. The result is an O(n2) sort with all of the overhead that recursion requires.

We mentioned earlier that there are different ways to choose the pivot value. In particular, we can attempt to alleviate some of the potential for an uneven division by using a technique called median of three. To choose the pivot value, we will consider the first, the middle, and the last element in the list. In our example, those are 54, 77, and 20. Now pick the median value, in our case 54, and use it for the pivot value (of course, that was the pivot value we used originally).

Ealier version of python used quick sort to implement the sort() method of the list structure. In python 3.2, a hybrid algorithm that combines the insertion and merge sort algorithms is used instead.

def quick_sort(data):
    """
    using data[0] as the 
    """
    def qsort(data, start, end):
        if end - start < 1:
            return 
        pivot = data[start]
        left_idx = start 
        right_idx = end
        # partition
        while left_idx < right_idx:
            while data[right_idx] >= pivot and left_idx < right_idx:
                right_idx -= 1
            data[left_idx] = data[right_idx]
            while data[left_idx] <= pivot and left_idx < right_idx:
                left_idx += 1
            data[right_idx] = data[left_idx]
        data[left_idx] = pivot 
        qsort(data, 0, left_idx-1)
        qsort(data, left_idx+1, end)


    qsort(data, 0, len(data)-1)


alist = [54,26,93,17,77,31,44,55,20]
quick_sort(alist)
print(alist)              

[17, 20, 26, 31, 44, 54, 55, 77, 93]



def quickSort(alist):
   quickSortHelper(alist,0,len(alist)-1)

def quickSortHelper(alist,first,last):
   if first<last:

       splitpoint = partition(alist,first,last)

       quickSortHelper(alist,first,splitpoint-1)
       quickSortHelper(alist,splitpoint+1,last)


def partition(alist,first,last):
   pivotvalue = alist[first]

   leftmark = first+1
   rightmark = last

   done = False
   while not done:

       while leftmark <= rightmark and \
               alist[leftmark] <= pivotvalue:
           leftmark = leftmark + 1

       while alist[rightmark] >= pivotvalue and \
               rightmark >= leftmark:
           rightmark = rightmark -1

       if rightmark < leftmark:
           done = True
       else:
           temp = alist[leftmark]
           alist[leftmark] = alist[rightmark]
           alist[rightmark] = temp

   temp = alist[first]
   alist[first] = alist[rightmark]
   alist[rightmark] = temp


   return rightmark

alist = [54,26,93,17,77,31,44,55,20]
quickSort(alist)
print(alist)  

[17, 20, 26, 31, 44, 54, 55, 77, 93]



def qsort(data):
    n = len(data)

    def mid_idx(data, start, end):
        mid = (start + end)//2
        if data[mid]<= max(data[start], data[end]) and data[mid]>= min(data[start], data[end]):
            return mid
        if data[start]<= max(data[mid], data[end]) and data[start]>= min(data[mid], data[end]):
            return start
        if data[end]<= max(data[start], data[mid]) and data[end]>= min(data[start], data[mid]):
            return end

    def qsort_helper(data, start, end):
        if end - start < 1:
            return 
        pivot_idx = mid_idx(data, start, end)
        data[start], data[pivot_idx] = data[pivot_idx], data[start]
        pivot = data[start]
        left = start
        right = end
        # partition
        while left < right:
            while data[right] >= pivot and left < right:
                right -= 1
            data[left] = data[right]
            while data[left] <= pivot and left < right:
                left += 1
            data[right] = data[left]

        data[left] = pivot
        qsort_helper(data, start, left-1)
        qsort_helper(data, left+1, end)

    qsort_helper(data, 0, n-1)

alist = [54,26,93,17,77,31,44,55,20]
qsort(alist)
print(alist)                 

[17, 20, 26, 31, 44, 54, 55, 77, 93]

the Radix sort

the comparison sort algorithms achieve their goal by comparing the individual sort keys to other keys in the list. We have reviewed five sorting algorithms above. The first three ——- bubble, selection, and insertion ——have a worst case time of O(n2) while the merge sort has a worst time of O(nlogn) . The quick sort, the more commonly used algorithm in language libraries, is O(n2)) in the worst case but it has an expected or average time of O(nlogn) .

the natural question is can we do better than O(nlogn) ? For a comparison sort, the answer is no. It can be shown, with the use of a decision tree and examining the permutations of all possible comparisons among the sort keys, that the worst case time for a comparison sort can be no better than O(nlogn) .

this dose not mean, however, that the sorting operation cannot be done faster than O(nlogn) . It simply means that we cannot achieve this with a comparison sort. Actually, there is a distribution sort algorithms that works in linear time. Distribution sort algorithms use techniques other than comparisons among the keys themselves to sort the sequence of keys. while these distribution algorithms are fast, they are not general purpose sorting algorithms. In other word, they cannot be applied to just any sequence of keys. Typically, these algorithms are used when the keys have certain characteristics and for specific types of applications.

radix sort is a fast distribution sorting algorithm that orders keys by examining the individual components of the keys instead of comparing the keys themselves. For example, when sorting integer keys, the individual digits of the keys are compared from least significant to most significant. This is a special purpose sorting algorithm but can be used to sort many types of keys, including positive integers, strings, and floating-point values.

from collections import deque

def radix_sort(data, digit_num):
    slots = [deque() for i in range(10)]
    col_num = 1
    for _ in range(digit_num):
        for ele in data:
            key = (ele // col_num) % 10
            slots[key].append(ele)

        idx = 0
        for que in slots:
            while que:
                data[idx] = que.popleft()
                idx += 1
        col_num *= 10
    return data

if __name__ == "__main__":
    data = [23,4,67,1,4,9,49,21,18]
    print(radix_sort(data, 2))

[笔记]sort