Wyo Data Structures - Ch. 1 Notes
Objective #1: Understand and explain the formal definition of big-O
notation.
- Formal definition of big-O: Suppose there exists some function f(n) for
all n > 0 such that the number of operations required by an algorithm for
an input of size n is less than or equal to some constant C multiplied by
f(n) for all but finitely many n. The algorithm is then considered to be
an O(f(n)) algorithm relative to the number of operations it requires to
execute, and is classified as O(f(n)).
- There is some point where the graph of C * f(n) crosses the graph of the
algorithm's operation count and "bounds" it from then on. That is, the
function lies on or above the algorithm's count for every n beyond that point.
- If n is really small, it does not really matter what algorithm is used to
sort or search since the time difference will be negligible.
Objective #2: Recognize and differentiate the big-O categories
- Big-O notation is a method of classifying algorithms with regard to the
time and space that they require. The O refers to "on the order of".
- Common big-O categories:
  O(1)       | constant    | push & pop stack operations; insert & remove
                             queue operations
  O(log n)   | logarithmic | binary search; insert & find for a binary
                             search tree; insert & remove for a heap
  O(n)       | linear      | traversal of a list or a tree; sequential
                             search; finding the max or min element of a list
  O(n log n) | "n log n"   | quicksort; mergesort
  O(n^2)     | quadratic   | selection sort; insertion sort; bubble sort
  O(n^3)     | cubic       | (e.g. naive matrix multiplication)
  O(a^n)     | exponential | recursive Fibonacci; Towers of Hanoi
Note that linear, quadratic, and cubic algorithms are also called polynomial
algorithms.
- By the change-of-base rule, for any bases a, b > 0 with a, b <> 1:
      log2 n = log10 n / log10 2
  and therefore
      log10 n = C log2 n
  where C = log10 2 is a constant. Since logarithms of different bases differ
  only by a constant factor, O(log2 n) is really the same as O(log10 n), so
  the base is omitted and we simply write O(log n).
- O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(a^n)
- Divide and conquer algorithms like quicksort, mergesort, & binary search
are very efficient since repeatedly halving the problem contributes only a
logarithmic factor to their growth.
- If an algorithm is O(n^2) then it is also O(n^a) for any a > 2, since n^a
bounds n^2.
Objective #3: Analyze the common search and sorting algorithms and
determine their big-O categories.
- To determine an algorithm's big-O category, it is most useful to analyze
its loops.
- In analyzing the loops of an algorithm, you must come up with some function
f(n) where C * f(n) parallels (or "bounds") the actual number of
operations within the algorithm as closely as possible. The constant C is
known as the constant of proportionality and ultimately makes no difference
when comparing two functions with the same order of magnitude. The order of
magnitude is the degree of the function (the largest exponent of n) or a
logarithmic or exponential term.
- In any derived function f(n), one term will be the dominant term. A dominating
function is then a function consisting of the dominant term of the original
derived function. For example, if you come up with the function
f(n) = n^2 + n log2 n + 5, the dominating function is n^2 since n^2 is the
dominant term in f(n).
Examples: n dominates log n, n log n dominates n, n^3 dominates n^2, and
a^n dominates n^m.
- When an outer loop iterates from 1 to n-1 and a nested loop iterates from
0 to i-1 (where i is the loop variable in the outer loop), the inner loop
will iterate a total of 1 + 2 + 3 + ... + (n - 1) times. This sum equals
n(n - 1)/2, which is the number of terms multiplied by the average of the
first and last terms. Since n^2 ends up being the dominant term in the
resulting polynomial, this explains the O(n^2) category of several of the
algorithms we will be analyzing.
Objective #4: Recognize and understand the common sorting algorithms including
the selection sort, insertion sort, bubble sort, & the radix sort.
- Bubble Sort characteristics:
- adjacent elements are compared
- many exchanges (i.e. swaps)
- very inefficient when data is really out of order
- Boolean variable allows outer loop to exit early possibly saving time
- the variable k is not used in the inner loop
- Selection Sort characteristics
- only 1 exchange (swap) at the end of each pass
- non-adjacent elements are compared
- no Boolean variable is used to allow for an early exit
- the variable minPosition is used as a marker
- Insertion Sort characteristics
- Boolean variable allows inner loop to exit early saving a bit of time
here and there
- compares non-adjacent elements
- no exchanges (swaps), rather many elements must shift
- a lot of comparisons are made slowing down this algorithm
Objective #5: Recognize and understand the common searching algorithms including
the sequential search and the binary search as well as the key-to-address transformation.
- Sequential Search
- The sequential search algorithm is relatively easy to code.
- Unfortunately the sequential search algorithm is O(n) in its worst
case since the target key may be found in the "last" position
that is probed.
- Notice that the author uses a bool variable to exit from the indeterminate
while loop when the target key is found. Also, notice that the author
returns a bool value from his sequentialSearch algorithm on p. 58 to indicate
whether or not the target key was found. The object parameter is
passed by reference and assigned the target value if it is found. The
author chose not to return the position within the apvector where
the target was found. Also notice that the apvector list was passed by
constant reference in order to save memory while preserving the original
values of the apvector.
- Binary Search
- The binary search algorithm requires the following preconditions:
- The data must be sorted.
- The number of elements in the list must be stored in a separate
variable.
- The programmer must be able to randomly access elements in the list
by relative position.
- Since the binary search is O(log n) in its worst case, sorting the data
may be worth the price of the above preconditions.
- However, keeping a list in physical sorted order may be costly. If many
insertions and deletions need to be made to the list over time and if
the elements are large (structs or objects rather than int's or double's)
then a binary search may not be worthwhile. If a list of pointers is used
to keep the elements in logical (rather than physical) sorted order, extra
memory will be required by the additional pointers.
- Key-to-Address Transformation
- The key-to-address transformation search technique can only be used
in certain situations. However, it is O(1) even in its worst case!
- A hash function is used to convert a data element to a position within
a one or two-dimensional array.
- Problems occur though if collisions occur wherein two or more elements
"hash" to the same position within the array.
- A severe disadvantage to this search technique is that large amounts
of memory may need to be used for the array.