Introduction
These notes are from COP4533 Algorithm Abstraction & Design with Professor Ungor at the University of Florida. What I have learned from this course has been condensed into these notes.
Stable Matching Problem
- Given a set of preferences among hospitals and med-school students, we have to design a self-reinforcing admissions process.
- A matching $M$ is a set of ordered pairs, $h-s$, where $h \in H$ and $s \in S$, such that:
  - Each hospital $h \in H$ appears in at most one pair of $M$.
  - Each student $s \in S$ appears in at most one pair of $M$.
- A matching $M$ is perfect if $|M| = |H| = |S| = n$.
- Unstable pair: Given a hospital $h$ and a student $s$, they form an unstable pair if both:
  - $h$ prefers $s$ to one of its admitted students.
  - $s$ prefers $h$ to their assigned hospital.
- Stable assignment: An assignment (a set of pairs) in which there are no unstable pairs.
- Stable matching: A perfect matching with no unstable pairs.
- Tl;dr: given a list of preferences with $n$ hospitals and $n$ students, return a stable matching.
Input
- The input to the stable matching problem is a set of $n$ hospitals, $H$, and a set of $n$ students, $S$.
  - Each hospital $h \in H$ ranks students.
  - Each student $s \in S$ ranks hospitals.
- An example with $n = 2$:
- The set of hospitals, $H$, each ranking the two students:
  - Ranking is in order of preference, i.e. Chicago prefers Coleton over Daniel, but Tampa prefers Daniel over Coleton.

Hospitals | 1st | 2nd |
---|---|---|
Chicago | Coleton | Daniel |
Tampa | Daniel | Coleton |

- The set of students, $S$, each ranking the two hospitals:
  - Ranking, just like before, is in order of preference.

Students | 1st | 2nd |
---|---|---|
Coleton | Tampa | Chicago |
Daniel | Chicago | Tampa |
Perfect Matching
What is a perfect matching for the above table example? It could be:
- $M = \{ \text{Chicago} - \text{Coleton},~ \text{Tampa} - \text{Daniel}\}$
The following, with the pairs swapped, is also a perfect matching:
- $M = \{ \text{Tampa} - \text{Coleton},~ \text{Chicago} - \text{Daniel}\}$
Unstable Pair
- In the above example, both possible matchings are perfect and lack unstable pairs (i.e. both are stable matchings.)
  - In the first example, although Coleton preferred Tampa and Daniel preferred Chicago, it is still stable as each $h \in H$ received its preferred student.
  - In the second example, although Tampa preferred Daniel and Chicago preferred Coleton, it is still stable as each $s \in S$ received their preferred hospital.
- How about an example where this is not the case?
Hospitals | 1st | 2nd | 3rd |
---|---|---|---|
Chicago | Ella | Daniel | Coleton |
Tampa | Daniel | Coleton | Ella |
Columbus | Daniel | Ella | Coleton |

Students | 1st | 2nd | 3rd |
---|---|---|---|
Coleton | Tampa | Chicago | Columbus |
Daniel | Chicago | Tampa | Columbus |
Ella | Chicago | Columbus | Tampa |
A possible perfect matching for this would be:
- $M = \{ \text{Columbus} - \text{Ella},~\text{Tampa} - \text{Daniel},~\text{Chicago} - \text{Coleton} \}$
A possible unstable pair for $M$ would be $\text{Chicago} - \text{Ella}$: the hospital (Chicago) prefers Ella to its matched student (Coleton), and the student (Ella) prefers Chicago to her matched hospital (Columbus). This means that although the matching is, indeed, perfect, it is not a stable matching.
- The key point: the two members of an unstable pair $h-s$ could each improve by a joint deviation, which we want to avoid. A quick way to check this is sketched below.
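To make the definition concrete, here is a minimal Python sketch (the function name and preference-list format are my own, not from the course) that finds every unstable pair of a perfect matching, using the three-hospital example above:

```python
def unstable_pairs(matching, hospital_prefs, student_prefs):
    """Return all unstable pairs h-s for a perfect matching.

    matching:       dict student -> hospital
    hospital_prefs: dict hospital -> list of students, most preferred first
    student_prefs:  dict student -> list of hospitals, most preferred first
    """
    admitted = {h: s for s, h in matching.items()}  # hospital -> its student
    pairs = []
    for h, prefs in hospital_prefs.items():
        for s in prefs:
            if s == admitted[h]:
                break  # everyone below this point is less preferred than h's student
            # h prefers s to its admitted student; unstable if s also prefers h
            if student_prefs[s].index(h) < student_prefs[s].index(matching[s]):
                pairs.append((h, s))
    return pairs

hospital_prefs = {"Chicago": ["Ella", "Daniel", "Coleton"],
                  "Tampa": ["Daniel", "Coleton", "Ella"],
                  "Columbus": ["Daniel", "Ella", "Coleton"]}
student_prefs = {"Coleton": ["Tampa", "Chicago", "Columbus"],
                 "Daniel": ["Chicago", "Tampa", "Columbus"],
                 "Ella": ["Chicago", "Columbus", "Tampa"]}
M = {"Ella": "Columbus", "Daniel": "Tampa", "Coleton": "Chicago"}
print(unstable_pairs(M, hospital_prefs, student_prefs))
# [('Chicago', 'Ella'), ('Chicago', 'Daniel')] -- perfect, but not stable
```

Note that besides $\text{Chicago}-\text{Ella}$, the check also uncovers $\text{Chicago}-\text{Daniel}$ as a second unstable pair in this instance.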
Stable Roommate Problem
Given a different problem setup, do stable matchings always exist? Even with the prior stable matching example, the answer is not obvious.
This brings us to the stable roommate problem:
- $2n$ people; each person ranks the others from $1$ to $2n - 1$.
- The goal: assign the roommate pairs such that there are no unstable pairs.
Let’s take a look at an example where a stable matching does not exist:
People | 1st | 2nd | 3rd |
---|---|---|---|
Anthony | Bailey | Carl | Darryl |
Bailey | Carl | Anthony | Darryl |
Carl | Anthony | Bailey | Darryl |
Darryl | Anthony | Bailey | Carl |
- $\text{Anthony}-\text{Bailey},~\text{Carl}-\text{Darryl} \rightarrow \text{Bailey}-\text{Carl}~\text{unstable}$
- $\text{Anthony}-\text{Carl},~\text{Bailey}-\text{Darryl} \rightarrow \text{Anthony}-\text{Bailey}~\text{unstable}$
- $\text{Anthony}-\text{Darryl},~\text{Bailey}-\text{Carl} \rightarrow \text{Anthony}-\text{Carl}~\text{unstable}$
- $\Rightarrow$ No perfect matching is stable.
- $\Rightarrow$ Stable matchings need not exist.
- Key point: even changing a little about a problem can make it impossible to guarantee a criterion. The brute-force check below confirms this instance has no stable matching.
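Since the instance is tiny, we can verify all three cases above by brute force. A quick Python sketch under the same assumptions (preference lists taken from the table; the helper name is my own):

```python
from itertools import combinations

prefs = {"Anthony": ["Bailey", "Carl", "Darryl"],
         "Bailey": ["Carl", "Anthony", "Darryl"],
         "Carl": ["Anthony", "Bailey", "Darryl"],
         "Darryl": ["Anthony", "Bailey", "Carl"]}

def blocking_pairs(matching):
    """All pairs (a, b) not roomed together who both prefer each other to their roommates."""
    roommate = {}
    for a, b in matching:
        roommate[a], roommate[b] = b, a
    return [(a, b) for a, b in combinations(prefs, 2)
            if roommate[a] != b
            and prefs[a].index(b) < prefs[a].index(roommate[a])
            and prefs[b].index(a) < prefs[b].index(roommate[b])]

# The three ways to split four people into two rooms:
matchings = [[("Anthony", "Bailey"), ("Carl", "Darryl")],
             [("Anthony", "Carl"), ("Bailey", "Darryl")],
             [("Anthony", "Darryl"), ("Bailey", "Carl")]]
for m in matchings:
    print(m, "->", blocking_pairs(m))  # every matching has at least one blocking pair
```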
Gale-Shapley Deferred Acceptance Algorithm
This method is guaranteed to find a stable matching:
initialize $M$ as empty matching;
while some hospital $h$ is unmatched and hasn’t proposed to every student do
$\quad s \leftarrow$ first student on $h$’s list to whom $h$ has not yet proposed;
$\quad$if $s$ is unmatched then
$\qquad$Add $h-s$ to matching $M$;
$\quad$else if $s$ prefers $h$ to current partner $h'$ then
$\qquad$Replace $h'-s$ with $h-s$ in matching $M$;
$\quad$else
$\qquad s$ rejects $h$;
$\quad$end
end
return stable matching $M$;
Effectively, this iterates as long as there exists a hospital $h$ that is unmatched and hasn’t run out of students to propose to. It grabs the first student $s$ that $h$ hasn’t yet proposed to; as the algorithm continues, each $h \in H$ works down its list: 1st choice, 2nd choice, and so on. Next, a few conditions are checked. If $s$ is unmatched, add the pair $h-s$ to our matching $M$. If $s$ is matched but prefers $h$ over its current partner (i.e. the hospital $h'$ it matched with previously), replace the old pair $h'-s$ with $h-s$ in $M$. Finally, if $s$ doesn’t prefer $h$ over $h'$, then $s$ rejects $h$ and the loop continues. When the loop ends, return the stable matching.
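Here is a minimal Python sketch of the loop above (hospital-proposing deferred acceptance; the data format matches the earlier sketch and the function name is my own):

```python
def gale_shapley(hospital_prefs, student_prefs):
    """Hospital-proposing deferred acceptance. Returns dict student -> hospital."""
    # rank[s][h] = position of h on s's list; lower means more preferred
    rank = {s: {h: i for i, h in enumerate(prefs)}
            for s, prefs in student_prefs.items()}
    next_proposal = {h: 0 for h in hospital_prefs}  # next index on h's list
    match = {}                                      # student -> hospital
    unmatched = list(hospital_prefs)

    while unmatched:
        h = unmatched.pop()
        s = hospital_prefs[h][next_proposal[h]]  # most preferred not-yet-proposed student
        next_proposal[h] += 1
        if s not in match:                       # s is unmatched: tentatively accept
            match[s] = h
        elif rank[s][h] < rank[s][match[s]]:     # s trades up to h
            unmatched.append(match[s])
            match[s] = h
        else:                                    # s rejects h; h tries again later
            unmatched.append(h)
    return match

hospitals = {"Chicago": ["Coleton", "Daniel"], "Tampa": ["Daniel", "Coleton"]}
students = {"Coleton": ["Tampa", "Chicago"], "Daniel": ["Chicago", "Tampa"]}
print(gale_shapley(hospitals, students))
# {'Daniel': 'Tampa', 'Coleton': 'Chicago'} -- each hospital gets its 1st choice
```

With complete $n \times n$ preference lists, the indexing never runs off the end of a hospital's list, which matches the $n^2$ termination bound proven next.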
Proof of Correctness: Termination
Some observations: first, hospitals always propose in decreasing order of preference (i.e. start with the most preferred, then go down). Second, once a student is matched, the student will never become unmatched. At worst (or best, I guess), they will just trade up to a more preferred hospital.
- Claim: The algorithm terminates after at most $n^2$ iterations of the while loop.
- Proof:
  - There are $n$ students and $n$ hospitals.
  - Each time through the while loop, a hospital $h$ proposes to a new student $s$.
  - Thus, there are at most $n^2$ possible proposals, and the loop will terminate. $\blacksquare$
Proof of Correctness: Perfect Matching
- Claim: Gale-Shapley outputs a matching $M$.
- Proof:
  - A hospital proposes only if it is unmatched $\Rightarrow$ matched to $\le 1$ student.
  - A student keeps only the best hospital $\Rightarrow$ matched to $\le 1$ hospital. $\blacksquare$
- Claim: With Gale-Shapley, all hospitals will be matched.
- Proof: [by contradiction]
  - There are $n$ students and $n$ hospitals.
  - Suppose that some hospital $h \in H$ is unmatched once Gale-Shapley terminates.
  - Then, because $|H| = |S| = n$, some student $s \in S$ is also unmatched.
  - Since a student stays matched once proposed to, $s$ was never proposed to.
  - But $h$, being unmatched at termination, must have proposed to every student, including $s$. $\Rightarrow\!\Leftarrow$ $\blacksquare$
- Claim: With Gale-Shapley, all students will be matched.
- Proof: [by counting]
  - There are $n$ students and $n$ hospitals.
  - By the previous claim, all $n$ hospitals will be matched.
  - Since $|H| = |S| = n$, all $n$ students get matched as well. $\blacksquare$
Proof of Correctness: Stability
- Claim: The matching $M$ returned by Gale-Shapley contains no unstable pairs.
- Proof: Consider any pair $h-s$ that is not in $M$. (Recall that an unstable pair must not be in $M$, so these are the only candidates.)
  - Case 1: $h$ never proposed to $s$:
    - Because hospitals propose in decreasing order of preference, $h$ must prefer its Gale-Shapley partner, $s'$, over $s$.
    - $\Rightarrow$ $h-s$ is not unstable.
  - Case 2: $h$ proposed to $s$:
    - $s$ must have rejected $h$, either right away or at a subsequent iteration.
    - Because students only trade up, $s$ must prefer its Gale-Shapley partner, $h'$, over $h$.
    - $\Rightarrow$ $h-s$ is not unstable.
  - In either case, any pair $h-s \notin M$ is not unstable. $\blacksquare$
Hospital Optimality
- For a given problem, there may be several stable matchings:
Students | 1st | 2nd | 3rd |
---|---|---|---|
Anthony | Atlanta | Boston | Chicago |
Bailey | Boston | Atlanta | Chicago |
Carl | Atlanta | Boston | Chicago |

Hospitals | 1st | 2nd | 3rd |
---|---|---|---|
Atlanta | Bailey | Anthony | Carl |
Boston | Anthony | Bailey | Carl |
Chicago | Anthony | Bailey | Carl |
- Two stable matchings:
  - First possible stable matching: $\{ \text{Anthony}-\text{Atlanta},~\text{Bailey}-\text{Boston},~\text{Carl}-\text{Chicago} \}$.
  - Second possible stable matching: $\{ \text{Anthony}-\text{Boston},~\text{Bailey}-\text{Atlanta},~\text{Carl}-\text{Chicago} \}$.
- Student $s$ is a valid partner for hospital $h$ if there exists any stable matching in which $h$ and $s$ are matched.
  - Atlanta and Boston are valid partners for Anthony.
  - Atlanta and Boston are valid partners for Bailey.
  - Chicago is the only valid partner for Carl.
- The best valid partner is the most preferred valid partner, i.e. the best valid partner for Anthony is Atlanta.
- Hospital-optimal assignment: Each hospital receives its best valid partner (i.e. best valid student!)
  - Some interesting questions: is such an assignment perfect, and is it even stable?
- This leads to the claim: All executions of Gale-Shapley yield the hospital-optimal assignment.
- Subsequent corollary: The hospital-optimal assignment is a stable matching :)
- Proof: [by contradiction]
  - Suppose a hospital is matched with a student other than its best valid partner.
  - Hospitals propose in decreasing order of preference.
    - $\Rightarrow$ some hospital is rejected by a valid partner during the run of Gale-Shapley.
  - Let $h$ be the first such hospital, and let $s$ be the first valid partner that rejects $h$.
  - Let $M$ be a stable matching where $h$ and $s$ are matched.
  - When $s$ rejects $h$ in Gale-Shapley, $s$ forms (or re-affirms) a commitment to some hospital, say $h'$.
    - $\Rightarrow$ $s$ prefers $h'$ to $h$. Remember, students trade up.
  - Let $s'$ be the partner of $h'$ in $M$.
  - $h'$ had not been rejected by any valid partner (including $s'$) at the point when $h$ is rejected by $s$, because that was the first rejection by a valid partner.
  - Thus, $h'$ had not yet proposed to $s'$ when $h'$ proposed to $s$.
    - $\Rightarrow$ $h'$ prefers $s$ to $s'$, because hospitals propose in decreasing order of preference.
  - Thus, $h'-s$ is unstable in $M$, which is a contradiction. $\blacksquare$
In more understandable phrasing: suppose a stable matching $M$ contains $h-s$ and $h'-s'$. For $M$ to be stable, $h'$ and $s$ must not prefer each other over their assigned partners. Now, $s$ is the first valid partner that rejects a hospital ($h$) in our setup. Since this is the first rejection by a valid partner, $h'$ had not been rejected by any valid partner (including $s'$) when $h'$ proposed to $s$. This must mean that $h'$ prefers $s$ to $s'$, and as a result, $h'-s$ forms an unstable pair in $M$, contradicting the fact that $M$ is a stable matching.
Student Pessimality
- Does hospital-optimality come at the expense of the students? Yes!
- Student-pessimal assignment: The inverse of hospital-optimal; each student receives their worst valid partner.
- Claim: Gale–Shapley finds the student-pessimal stable matching $M^*$.
- Proof: [by contradiction]
  - Suppose $h-s$ is matched in $M^*$ but $h$ is not the worst valid partner for $s$.
  - Then there exists a stable matching $M$ in which $s$ is paired with a hospital, say $h'$, whom $s$ prefers less than $h$.
    - $\Rightarrow~s$ prefers $h$ to $h'$.
  - Let $s'$ be the partner of $h$ in $M$.
  - By hospital-optimality, $s$ is the best valid partner for $h$.
    - $\Rightarrow~h$ prefers $s$ to $s'$.
  - Thus, $h-s$ is an unstable pair in $M$, a contradiction. $\blacksquare$
- In more understandable phrasing: assume $h-s$ is matched in $M^*$ and that $h$ is not the worst valid partner for $s$. Notice, this does not mean $h$ is $s$’s best, just not their worst. Next assume there exists a stable matching $M = \{ h'-s,~h-s' \}$ in which $s$ prefers $h'$ less than $h$; then $s$ must prefer $h$ to $h'$. For $h-s$ to be unstable in $M$, $h$ must also prefer $s$ over $s'$. By hospital-optimality, $s$ is the best valid partner of $h$, so $h$ does prefer $s$ to $s'$, and therefore $h-s$ is an unstable pair in $M$.
Deceit
- Fact 1: No hospital can improve by lying about its preference list.
- Fact 2: A devious student can possibly improve by lying.
- Fact 3: No stable-matching algorithm exists for which truth-telling is the best strategy for every agent.
Algorithm Analysis
Computational Tractability
Models of Computation
Turing Machines
- Deterministic Turing machine: a simple and idealistic model.
- Running time: The number of steps.
- Memory: The number of tape cells utilized.
- Caveat: There is no random access of memory.
Word RAM
- RAM = random-access machine.
- Each memory location and input/output cell stores a $w$-bit integer (we assume $w \ge \log_2 n$.)
- Primitive operations: arithmetic/logic operations, read/write memory, array indexing, following a pointer, conditional branch… constant-time C-style operations ($w = 64$).
- Running time: The number of primitive operations.
- Memory: The number of cells utilized.
- Caveat: At times, we need a more refined model (e.g. multiplying $n$-bit integers).
  - We just assume they are all the same in this model, a primitive.
  - In Gale-Shapley, we assumed proposals to be a primitive operation.
  - “Primitive” means zooming out; going deeper is zooming in. There are many things happening under the hood, and we have to draw the line at some point regarding what is constant and what isn’t. See the coastline paradox.
  - A loop is most definitely not a primitive, for instance.
Brute Force
- Brute force: For many nontrivial problems, there is a brute-force search algorithm that solves every instance by checking every possible solution.
  - Typically takes $2^n$ steps (or worse) for inputs of size $n$.
  - This is unacceptable in practice.
- Example: In the stable matching problem, brute force would test all $n!$ perfect matchings. This is bad!
Polynomial Running Time
- Desirable scaling property: When the input size doubles, the algorithm should slow down by at most some multiplicative constant factor $C$.
- Def: An algorithm is poly-time if the following scaling property holds:
  - There exist constants $a > 0$ and $b > 0$ such that, for every input of size $n$, the algorithm performs $\le a~n^b$ primitive computational steps. This corresponds to $C = 2^b$.
  - I.e. for an algorithm with a complexity of $\Theta(n^b)$, $C = 2^b$.
  - For example, given an algorithm with a time complexity of $\Theta(n^4)$, $C = 2^4 = 16$.
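To see where $C = 2^b$ comes from, take the running time to be $T(n) = a \cdot n^b$ and double the input size:

$$T(2n) = a \cdot (2n)^b = 2^b \cdot \left(a \cdot n^b\right) = 2^b \cdot T(n) = C \cdot T(n)$$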
- This level of slowdown is a price we are willing to pay; what we are not willing to pay is factorial complexity (brute force, for instance).
- We say that an algorithm is “efficient” if it has a polynomial running time.
- Theory: The definition is relatively insensitive to the model of computation.
- Practice: It works!
  - The poly-time algorithms that people develop have both small constants and small exponents.
  - Breaking through the exponential barrier of brute force into polynomial time typically exposes crucial structure of the problem.
- Exceptions: Some poly-time algorithms in the wild have galactic constants and/or huge exponents.
- Q. Which would we prefer: $f(n) = 20n^{120}$ or $g(n) = n^{1 + 0.02 \text{ ln } n}$?
  - We’d prefer the latter, as in practice it is faster.
    - $f(n) \approx 2.66 \times 10^{37}$.
    - $g(n) \approx 3.02$.
- Key point: just because an algorithm is polynomial doesn’t mean it is fast; we have to look at it on a case-by-case basis.
Worst-Case Analysis
- Worst case: A running-time guarantee for any input of size $n$.
  - Generally captures efficiency in practice.
- Exceptions: Some exponential-time algorithms are used widely in practice because the worst-case instances don’t arise often.
  - Linear programming: Optimize a function with respect to a set of constraints. A lot of problems can be expressed as linear programs.
    - See Cohen, Lee, and Song for what the worst-case time complexity can look like for this.
  - Simplex method: The worst case is exponential, but it is so rare that the method is still used to great effect.
Other Types of Analyses
- Probabilistic: Expected running time of a randomized algorithm.
  - Example: The expected number of compares to quicksort $n$ elements is $2n~\text{ln}~n$.
- Amortized: Worst-case running time for any sequence of $n$ operations (see the sketch after this list).
  - Example: Starting from an empty stack, any sequence of $n$ push and pop operations takes $O(n)$ primitive computational steps using a resizing array.
- Average-case analysis, smoothed analysis, competitive analysis (for online algorithms), etc.
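A small Python sketch of the resizing-array example (the write-counter instrumentation is my own): counting element writes shows that $n$ pushes cost $O(n)$ total, i.e. amortized $O(1)$ each, even though an individual push that triggers a resize costs time proportional to the current size.

```python
class ResizingArray:
    """Toy doubling array that counts primitive element writes."""
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None]
        self.writes = 0  # element writes, including copies during resizes

    def push(self, x):
        if self.size == self.capacity:      # full: double the capacity
            self.capacity *= 2
            new_data = [None] * self.capacity
            for i in range(self.size):      # copy every element over
                new_data[i] = self.data[i]
                self.writes += 1
            self.data = new_data
        self.data[self.size] = x
        self.writes += 1
        self.size += 1

arr = ResizingArray()
n = 10_000
for i in range(n):
    arr.push(i)
print(arr.writes / n)  # ~2.6: constant writes per push on average, so n pushes take O(n)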
Asymptotic Order of Growth
Big O Notation
- Upper bounds: $f(n)$ is $O(g(n))$ if there exist constants $c \gt 0$ and $n_0 \ge 0$ such that $0 \le f(n) \le c \cdot g(n)$ for all $n \ge n_0$.
- Ex: $f(n) = 29n^2 + 11n + 5$.
  - $f(n)$ is $O(n^2)$.
  - $f(n)$ is neither $O(n)$ nor $O(n~\text{log}~n)$.
- Typical usage: Insertion sort makes $O(n^2)$ compares to sort $n$ elements.
[Figure: upper-bound illustration of $f(n)$ vs. $c \cdot g(n)$, sourced from Programiz]
- Notice how $f(n)$ is both greater than and less than $c \cdot g(n)$ before $n_0$, but then for all $n \ge n_0$, $f(n) \le c \cdot g(n)$.
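As a sanity check on the example above, one valid choice of witnesses (among many) is $c = 45$ and $n_0 = 1$, since for all $n \ge 1$:

$$29n^2 + 11n + 5 \;\le\; 29n^2 + 11n^2 + 5n^2 \;=\; 45n^2 = c \cdot n^2$$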
Big O Notational Abuses
- One-way “equality”: $O(g(n))$ is a set of functions, but we computer scientists often have the nasty habit of writing $f(n) = O(g(n))$ instead of $f(n) \in O(g(n))$.
- Ex. Consider $g_1(n) = 7n^3$ and $g_2(n) = 2n^2$.
  - We have $g_1(n) = O(n^3)$ and $g_2(n) = O(n^3)$.
  - But do not conclude that $g_1(n) = g_2(n)$; this is just not true.
- Domain and codomain: $f$ and $g$ are real-valued functions.
  - The domain is typically the natural numbers: $\mathbf{N} \rightarrow \mathbf{R}$.
  - Sometimes we extend to the reals: $\mathbf{R}_{\ge 0} \rightarrow \mathbf{R}$.
- Remember: It is ok to abuse notation, but not ok to misuse it (i.e. to draw conclusions through misused notation.)
Big O Notation: Properties
- The “ceiling”, if you will, of an algorithm.
- Reflexivity: $f$ is $O(f)$.
- Products: If $f_1$ is $O(g_1)$ and $f_2$ is $O(g_2)$, then $f_1 f_2$ is $O(g_1 g_2)$.
  - Proof.
    - $\exists~c_1 \gt 0$ and $n_1 \ge 0$ such that $0 \le f_1(n) \le c_1 \cdot g_1(n)$ for all $n \ge n_1$.
    - $\exists~c_2 \gt 0$ and $n_2 \ge 0$ such that $0 \le f_2(n) \le c_2 \cdot g_2(n)$ for all $n \ge n_2$.
    - Then, $0 \le f_1(n) \cdot f_2(n) \le c_1 \cdot c_2 \cdot g_1(n) \cdot g_2(n)$ for all $n \ge \text{max}(n_1, n_2)$. $\blacksquare$
- Sums: If $f_1$ is $O(g_1)$ and $f_2$ is $O(g_2)$, then $f_1 + f_2$ is $O(\text{max}(g_1, g_2))$.
- Transitivity: If $f$ is $O(g)$ and $g$ is $O(h)$, then $f$ is $O(h)$.
- Example: $f(n) = 2n^3 + 10000n^2 + 80n + 4721$ is $O(n^3)$.
Big Omega Notation
- The “floor”, if you will, of an algorithm.
- Lower bounds: $f(n)$ is $\Omega (g(n))$ if there exist constants $c \gt 0$ and $n_0 \ge 0$ such that $f(n) \ge c \cdot g(n)\ge 0$ for all $n \ge n_0$.
- Ex: $f(n) = 26n^2 + 4n + 100$.
  - $f(n)$ is both $\Omega(n^2)$ and $\Omega(n)$.
  - $f(n)$ is not $\Omega(n^3)$.
- Typical usage: Any compare-based sorting algorithm requires $\Omega(n~\text{log}~n)$ compares in the worst case.
- $f(n)$ is $\Omega(g(n))$ iff $g(n)$ is $O(f(n))$. This is the primary relationship between Big O and Big Omega notation.
[Figure: lower-bound illustration of $f(n)$ vs. $c \cdot g(n)$, sourced from Programiz]
- Notice how $f(n)$ is both greater than and less than $c \cdot g(n)$ before $n_0$, but then for all $n \ge n_0$, $f(n) \ge c \cdot g(n)$.
Big Theta Notation
- The “room”, if you will, of an algorithm: both a ceiling and a floor.
- Tight bounds: $f(n)$ is $\Theta(g(n))$ if there exist constants $c_1 \gt 0,~c_2 \gt 0$, and $n_0 \ge 0$ such that $0 \le c_1 \cdot g(n) \le f(n) \le c_2 \cdot g(n)$ for all $n \ge n_0$.
- Ex: $f(n) = 16n^{2} + 17n + 1$.
  - $f(n)$ is $\Theta(n^2)$; choose $c_1 = 16$, $c_2 = 34$, $n_0 = 1$.
  - $f(n)$ is neither $\Theta(n)$ nor $\Theta(n^3)$.
- Typical usage: Mergesort takes $\Theta(n~\text{log}~n)$ compares to sort $n$ elements.
[Figure: tight-bound illustration of $c_1 \cdot g(n) \le f(n) \le c_2 \cdot g(n)$, sourced from Programiz]
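Checking the constants chosen above: for all $n \ge 1$ we have $17n \le 17n^2$ and $1 \le n^2$, so

$$16n^2 \;\le\; 16n^2 + 17n + 1 \;\le\; 16n^2 + 17n^2 + n^2 \;=\; 34n^2,$$

which witnesses $c_1 = 16$, $c_2 = 34$, $n_0 = 1$.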
Graphs
Basic Definitions and Applications
Undirected Graphs
Notation: $G = (V, E)$ where:
- $V =$ nodes (also called vertices).
- $E =$ edges (also called arcs) between pairs of nodes.
- Captures a “pairwise” relationship between objects.
- For reference, the graph size parameters used later are $n = |V|$ and $m = |E|$, i.e. $n$ is the number of nodes and $m$ is the number of edges.
Graph Representation
Adjacency Matrix
An $n$-by-$n$ matrix $A$ such that $A_{u,~v} = 1$ if $(u,~v)$ is an edge.
- Two representations of each edge (i.e. $A_{u,~v}$ and $A_{v,~u}$).
- Space complexity is $\Theta(n^2)$.
- Checking if $(u,~v)$ is an edge takes $\Theta(1)$ time.
- Identifying all edges takes $\Theta(n^2)$ time.
Adjacency Lists
An array of lists, indexed by node.
- Two representations of each edge.
- Space is $\Theta(m + n)$.
- Checking if $(u, v)$ is an edge takes $O(\text{degree}(u))$ time, where the degree is the number of neighbors of $u$.
- Identifying all edges takes $\Theta(m + n)$ time.
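A minimal Python sketch of both representations side by side (the node labels and example graph are my own):

```python
# A small undirected graph on nodes 0..3 with edges (0,1), (0,2), (1,2), (2,3).
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: Theta(n^2) space, Theta(1) edge lookup.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = 1  # two representations of each edge

# Adjacency list: Theta(m + n) space, O(degree(u)) edge lookup.
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[1][2] == 1)   # True: (1, 2) is an edge, checked in constant time
print(3 in adj[2])         # True: checked by scanning adj[2], i.e. O(degree(2)) = O(3)
```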
Paths and Connectivity
- Definition: A path in an undirected graph $G = (V,~E)$ is a sequence of nodes $v_0,~v_1,~...,~v_k$ such that each consecutive pair, $v_{i -1},~v_i$, is joined by an edge in $E$.
- Definition: A path is simple if all nodes are distinct, i.e. no duplicate nodes in the sequence.
- Definition: An undirected graph is regarded as connected if for every pair of nodes $u$ and $v$ there exists a path between $u$ and $v$. Notice I said a path, not an edge.
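Connectivity can be checked by testing whether every node is reachable from a single start node. A minimal Python sketch using an iterative depth-first traversal (a technique these notes haven't formally covered yet), on the adjacency-list representation from above:

```python
def is_connected(adj):
    """True iff the undirected graph (adjacency-list form) has a path between every pair of nodes."""
    n = len(adj)
    if n == 0:
        return True
    seen = {0}
    stack = [0]                # iterative depth-first traversal from node 0
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n      # connected iff every node is reachable from node 0

print(is_connected([[1, 2], [0, 2], [0, 1, 3], [2]]))  # True
print(is_connected([[1], [0], [3], [2]]))              # False: {0,1} and {2,3} are separate
```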
Cycles
- Definition: A cycle is a path ($v_1,~v_2,~...,~v_k$) in which $v_1 = v_k$ and $k \ge 2$. This would break down if $k$ equaled one.
- Definition: A cycle is regarded as simple if all nodes are distinct, i.e. no duplicate nodes in the sequence with the exception of $v_1$ and $v_k$.
Trees
- Definition: An undirected graph is a tree if it is connected and doesn’t contain a cycle.
- Theorem: Let $G$ be an undirected graph on $n$ nodes. If any two of the following are true, the third is implied:
  - $G$ is connected.
  - $G$ doesn’t contain a cycle.
  - $G$ has $n-1$ edges, i.e. $m = n - 1$.
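Combined with the theorem, this gives a cheap tree test: check connectivity and count edges, and acyclicity comes for free. A sketch reusing the hypothetical `is_connected` helper from the previous block:

```python
def is_tree(adj):
    """Apply the theorem: connected + (m = n - 1) together imply acyclic, hence a tree."""
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj) // 2  # each edge is stored twice
    return m == n - 1 and is_connected(adj)            # is_connected from the earlier sketch

print(is_tree([[1], [0, 2], [1, 3], [2]]))    # True: a path on 4 nodes
print(is_tree([[1, 2], [0, 2], [0, 1], []]))  # False: a triangle plus an isolated node
```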