## Hardness of approximating independent set

I will start my blog by illustrating that NP-completeness has more implications than just exact hardness of computational problems. Through clever “hardness amplification” techniques, one can extend assumptions about the difficulty of exactly solving NP-hard problems like 3-satisfiability (3SAT) to difficulty of coming up with approximate solutions in polynomial time.

By approximate solutions, I mean algorithms that are guaranteed to output an answer that is within some guaranteed factor of the best possible answer. Note that this is a worst-case result, in the sense that the approximation algorithms that we are interested in always run quickly no matter the input (polynomial time in the size of the input) and output an answer that is always within a given factor of the best possible answer (typically a constant factor, like always at least 1/2 of the best possible answer).

In this post, I will look at one particular computational problem called the independent set problem. In this problem, we are given a graph ${G}$ with vertex set ${V}$ and edge set ${E}$ (denoted ${G = (V, E)}$). An independent set of ${G}$ is a subset of ${V}$ such that no two vertices are adjacent in ${G}$. Our goal is to compute an independent set of ${G}$ with maximum possible size.

This is one of Karp’s original NP-complete problems. Instead of trying to solve this problem exactly, we will reason about whether constant factor approximation algorithms exist, i.e. ones that always output an independent set that is ${c}$ times the largest independent set for some fixed constant ${c < 1}$. In this post, I will prove the following result:

Theorem 1 For any constant ${\epsilon > 0}$, assuming that ${P\ne NP}$, there is no worst-case polynomial time algorithm in ${|V| + |E|}$ for distinguishing between the following two sets of graphs for some constant ${c_{\epsilon}}$:

1. Graphs ${G = (V, E)}$ with an independent set of size at least ${c_{\epsilon} |V|}$.

2. Graphs ${G = (V, E)}$ for which all independent sets have at most ${c_{\epsilon}\epsilon|V|}$ vertices.

Note that not all graphs are in the union of these sets. Given a graph that is in one of these two classes, it is NP-hard to decide whether it is in the first class or the second.

This immediately implies that the independent set problem has no constant factor approximation algorithm! If we did have an algorithm that gave us an independent set with size at least ${\rho}$ times the maximum independent set in polynomial time for some constant ${\rho > 0}$, picking any ${\epsilon < \rho}$ would allow us to distinguish between the two sets given in the theorem.

In order to prove this theorem, we will need a theorem about the difficulty of producing “good” solutions to 3SAT that satisfy a large number of clauses. Instead of thinking about a 3SAT formula as a single boolean formula, we can think of it as a set of constraints (corresponding to 3 variable ORed together) that must be simultaneously satisfied. Instead of thinking about satisfying all of the constraints, we can try satisfying as many as we can. More formally,

Definition 2 A 3SAT instance ${\phi}$ can be viewed as a set of functions ${\{\phi_j\}_{j = 1}^m}$, ${\phi_j: \{0, 1\}^n\rightarrow \{0, 1\}}$ that depend on at most three variables from the set ${\{x_i\}_{i = 1}^n}$. Let ${val(\phi)}$ be the maximum fraction of constraints satisfied by any assignment, i.e.

$\displaystyle val(\phi) := \frac{\max_{y\in \{0, 1\}^n} |\{j: 1\le j\le m, \phi_j(y) = 1\}|}{m}$

The optimization problem of computing ${val(\phi)}$ for any 3SAT formula ${\phi}$ is hard to approximate, which follows from the following result:

Theorem 3 There is a constant ${\rho\in (0, 1)}$ such that it is NP-hard to distinguish between the following two sets of 3SAT instances:

1. Formulas ${\phi}$ that are satisfiable (${val(\phi) = 1}$).

2. Formulas ${\phi}$ such that ${val(\phi) \le \rho}$.

I will only outline the proof here, as it is rather lengthy. Since ${3SAT}$ is NP-complete, it follows that it is NP-hard to distinguish between the following two sets of instances:

1. Formulas ${\phi}$ that are satisfiable (${val(\phi) = 1}$).

2. Formulas ${\phi}$ that are not satisfiable, i.e. ${val(\phi)\le 1 - \frac{1}{m}}$ (at least one constraint is not satisfied), where ${m}$ is the number of constraints in ${\phi}$.

From every instance of 3SAT, I claim that there is some constant ${\epsilon_0 > 0}$ for which there exists a linear time computable function ${f}$ on instances of 3SAT that outputs another instance of 3SAT with the following properties:

1. ${\phi}$ is satisfiable implies that ${f(\phi)}$ is satisfiable.

2. ${val(\phi) = 1 - \epsilon}$ for ${\epsilon \in (0, \epsilon_0)}$ implies that ${val(f(\phi)) \le 1 - 2\epsilon}$.

3. ${f(\phi)}$ has at most ${Cm}$ constraints, where ${C}$ is a constant.

This procedure suffices, as applying this procedure as many times as possible (until we are guaranteed that either ${val(\phi) = 1}$ or ${val(\phi)\le 1 - \epsilon_0}$) gives a formula with at most ${C^{O(\log m)}m = poly(m)}$ constraints in the final term, since we will only have to run this procedure ${O(\log m)}$ times to get the desired guarantee on ${val}$.

This procedure is implemented using expander graphs, which intuitively are graphs with high connectivity but a low number of edges (at most a linear function on the number of vertices). First, we convert a given 3SAT instance into an expander graph, slightly weakening the upper bound on ${val(\phi)}$. Then, we use random walks to constuct a 2SAT instances over a large alphabet with an improved upper bound on ${val(\phi)}$. Finally, we turn this 2SAT instance with a large alphabet into a 3SAT instance over a binary alphabet with a slight weakening of the upper bound on ${val(\phi)}$. Composing these steps gives a modest improvement on the upper bound for ${val(\phi)}$ when ${\phi}$ is unsatisfiable, which is the desired result.

Now, I will use this theorem to prove that the maximum independent set problem has no polynomial time approximation algorithm that gets within a factor of ${\rho}$, for the same ${\rho}$ in the previous theorem:

Theorem 4 For the constant ${\rho}$ given in the previous theorem, it is NP-hard to distinguish between the following sets of graphs:

1. Graphs ${G = (V, E)}$ such that there is an independent set with size at least ${\frac{|V|}{7}}$.

2. Graphs ${G = (V, E)}$ such that all independent sets are of size at most ${\frac{\rho |V|}{7}}$.

Proof: It suffices to use the standard reduction from 3SAT to Independent Set. More precisely, for every constraint ${\phi_j}$ in a 3SAT instance ${\phi}$ that depends on exactly three variables, create a block of 7 vertices in a new graph ${G}$ that correspond to satisfying assignments of ${\phi_j}$. For every two vertices in ${G}$, connect them by an edge if and only if they correspond to inconsistent assignments.

First, suppose that a formula ${\phi}$ is satisfiable (in the first set for the previous theorem). This corresponds to consistent assignments for each of the ${m}$ constraints, which gives an independent set of size ${m}$ in the graph obtained through the reduction (which has ${7m}$ vertices). Now, suppose that there is an independent set ${I}$ in the graph ${G = (V, E)}$ resulting from ${\phi}$ with at least ${\frac{\rho |V|}{7}}$ vertices. Note that the seven vertices corresponding to each constraint correspond to pairwise inconsistent assignments, meaning that they form a clique (set of vertices that are pairwise connected by edges). Therefore, ${I}$ has at most one vertex corresponding to an assignment of any particular constraint, so ${I}$ corresponds to a consistent assignment of at least ${\frac{\rho |V|}{7} = m\rho}$ constraints. Therefore, this reduction suffices.

Finally, we need to amplify this result in order to show that the maximum independent set problem cannot be approximated within any constant factor. We will do this by producing a new graph for which every vertex corresponds to a set of ${k}$ vertices from the original graph, where ${k}$ is some nonzero constant. We will see that an independent sets in this graph correspond to independet sets in the old graph, but with increased size. More precisely, we will show the following:

Theorem 5 For any constant positive integer ${k}$, there is a constant ${c_k\in (0, 1)}$ such that it is NP-hard to distinguish the following sets of graphs:

1. Graphs ${G = (V, E)}$ with an independent set of size at least ${c_k |V|}$.

2. Graphs ${G = (V, E)}$ for which all independent sets have at most ${c_k\rho^k |V|}$ vertices.

Proof: For a graph ${G}$, construct a graph ${G_k = (V_k, E_k)}$, with ${V_k = \binom{V}{k}}$ (the set of all size ${k}$ subsets of ${V}$) and with an edge ${\{S_1, S_2\}\in E_k}$ for ${S_1, S_2\in V_k}$ if and only if ${S_1\cup S_2}$ is not an independent set in ${G}$. For every independent set ${I}$ in ${G}$, ${\binom{I}{k}}$ is an independent set in ${G_k}$. Furthermore, if there is an independent set of size ${\binom{r}{k} + 1}$ in ${G_k}$ for some integer ${r}$, there are at least ${r + 1}$ elements in the union of the sets corresponding to these vertices, so ${G}$ has an independent set of size at least ${r + 1}$. For sufficiently large ${n}$, ${\binom{n}{k}}$ is asymptotic to ${\frac{n^k}{k!}}$. Therefore, applying this reduction to both sets of graphs in Theorem 4 for any constant ${k}$ gives the desired NP-hardness of approximation result.

This completes the proof that Maximum Independent Set cannot be approximated within any constant factor. In future blog posts, I will demonstrate more powerful versions of the “hardness amplification” techniques used here to show more hardness of approximation results.

Reference: Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach, chapter 11.