Wyo Data Structures - Ch. 1 Notes
Objective #1: Understand and explain the formal definition of big-O
notation.
- Formal definition of big-O: Suppose there exists some function f(n) for
all n > 0 such that the number of operations required by an algorithm for
an input of size n is less than or equal to some constant C multiplied by
f(n) for all but finitely many n. The algorithm is then considered to be
an O(f(n)) algorithm relative to the number of operations it requires to
execute, and is classified as O(f(n)).
- There is some point where the graph of C * f(n) crosses the graph of the
algorithm's operation count and "bounds" it from then on. That is, the
function lies on or above the algorithm's count for every n beyond that point.
- If n is really small, it does not really matter what algorithm is used to
sort or search since the time difference will be negligible.
Objective #2: Recognize and differentiate the big-O categories
- Big-O notation is a method of classifying algorithms with regard to the
time and space that they require. The O refers to "on the order of".
- Common big-O categories:
  O(1)       | constant    | push & pop stack operations; insert & remove
                             queue operations
  O(log n)   | logarithmic | binary search; insert & find for a binary
                             search tree; insert & remove for a heap
  O(n)       | linear      | traversal of a list or a tree; sequential
                             search; finding the max or min element of a list
  O(n log n) | "n log n"   | quicksort; mergesort
  O(n^2)     | quadratic   | selection sort; insertion sort; bubble sort
  O(n^3)     | cubic       | (e.g. naive matrix multiplication)
  O(a^n)     | exponential | recursive Fibonacci; Towers of Hanoi
Note that linear, quadratic, and cubic algorithms are also called polynomial
algorithms.
- By the change-of-base rule, for any bases a, b > 0 with a, b <> 1:
      log2 n = log10 n / log10 2
  and therefore
      log10 n = C log2 n
  where C = log10 2 is a constant. Since logarithms of different bases differ
  only by a constant factor, O(log2 n) is really the same as O(log10 n), so
  the base is omitted and we simply write O(log n).
- O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(a^n)
- Divide and conquer algorithms like quicksort, mergesort, & binary search
are very efficient since repeatedly halving the problem contributes only a
logarithmic factor to their growth.
- If an algorithm is O(n^2) then it is also O(n^a) for any a > 2, since n^a
bounds n^2.
Objective #3: Analyze the common search and sorting algorithms and
determine their big-O categories.
- To determine an algorithm's big-O category, it is most useful to analyze
its loops.
- In analyzing the loops of an algorithm, you must come up with some function
f(n) where C * f(n) parallels (or "bounds") the actual number of
operations within the algorithm as closely as possible. The constant C is
known as the constant of proportionality and ultimately makes no difference
when comparing two functions with the same order of magnitude. The order of
magnitude is the degree of the function (the largest exponent of n) or a
logarithmic or exponential term.
- In any derived function f(n), one term will be the dominant term. A dominating
function is then a function consisting of the dominant term of the original
derived function. For example, if you come up with the function
f(n) = n^2 + n log2 n + 5, the dominating function is n^2 since n^2 is the
dominant term in f(n).
Examples: n dominates log n, n log n dominates n, n^3 dominates n^2, and
a^n dominates n^m.
- When an outer loop iterates from 1 to n-1 and a nested loop iterates from
0 to i-1 (where i is the loop variable in the outer loop), the inner loop
will iterate a total of 1 + 2 + 3 + ... + (n - 1) times. This sum equals
n(n - 1)/2, which is the number of terms multiplied by the average of the
first and last terms. Since n^2 ends up being the dominant term in the
resulting polynomial, this explains the O(n^2) category of several of the
algorithms we will be analyzing.
Objective #4: Recognize and understand the common sorting algorithms including
the selection sort, insertion sort, bubble sort, & the radix sort.
- Bubble Sort characteristics:
- adjacent elements are compared
- many exchanges (i.e. swaps)
- very inefficient when data is really out of order
- Boolean variable allows outer loop to exit early possibly saving time
- the variable k is not used in the inner loop
- Selection Sort characteristics
- only 1 exchange (swap) at the end of each pass
- non-adjacent elements are compared
- no Boolean variable is used to allow for an early exit
- the variable minPosition is used as a marker
- Insertion Sort characteristics
- Boolean variable allows inner loop to exit early saving a bit of time
here and there
- compares non-adjacent elements
- no exchanges (swaps), rather many elements must shift
- a lot of comparisons are made slowing down this algorithm
Objective #5: Recognize and understand the common searching algorithms including
the sequential search and the binary search as well as the key-to-address transformation.
- Sequential Search
- The sequential search algorithm is relatively easy to code.
- Unfortunately the sequential search algorithm is O(n) in its worst
case since the target key may be found in the "last" position
that is probed.
- Notice that the author uses a bool variable to exit from the indeterminate
while loop when the target key is found. Also, notice that the author
returns a bool value from his sequentialSearch algorithm on p. 58 to indicate
whether or not the target key was found. The object parameter is
passed by reference and assigned the target value if it is found. The
author chose not to return the position within the apvector where
the target was found. Also notice that the apvector list was passed by
constant reference in order to save memory while preserving the original
values of the apvector.
- Binary Search
- The binary search algorithm requires the following preconditions:
- The data must be sorted.
- The number of elements in the list must be stored in a separate
variable.
- The programmer must be able to randomly access elements in the list
by relative position.
- Since the binary search is O(log n) in its worst case, sorting the data
may be worth the price of the above preconditions.
- However, keeping a list in physical sorted order may be costly. If many
insertions and deletions need to be made to the list over time and if
the elements are large (structs or objects rather than int's or double's)
then a binary search may not be worthwhile. If a list of pointers is used
to keep the elements in logical (rather than physical) sorted order, extra
memory will be required by the additional pointers.
- Key-to-Address Transformation
- The key-to-address transformation search technique can only be used
in certain situations. However, it is O(1) even in its worst case!
- A hash function is used to convert a data element to a position within
a one or two-dimensional array.
- Problems occur though if collisions occur wherein two or more elements
"hash" to the same position within the array.
- A severe disadvantage to this search technique is that large amounts
of memory may need to be used for the array.