I will start my blog by illustrating that NP-completeness has more implications than just exact hardness of computational problems. Through clever “hardness amplification” techniques, one can extend assumptions about the difficulty of exactly solving NP-hard problems like 3-satisfiability (3SAT) to difficulty of coming up with *approximate* solutions in polynomial time.

By approximate solutions, I mean algorithms that are guaranteed to output an answer within some fixed factor of the best possible answer. Note that this is a worst-case guarantee, in the sense that the approximation algorithms we are interested in always run quickly no matter the input (polynomial time in the size of the input) and always output an answer within a given factor of the best possible answer (typically a constant factor, e.g. always at least 1/2 of the optimum).

In this post, I will look at one particular computational problem called the independent set problem. In this problem, we are given a graph $G$ with vertex set $V$ and edge set $E$ (denoted $G = (V, E)$). An *independent set* of $G$ is a subset $S \subseteq V$ such that no two vertices of $S$ are adjacent in $G$. Our goal is to compute an independent set of $G$ with maximum possible size.

This is one of Karp’s original NP-complete problems. Instead of trying to solve this problem exactly, we will reason about whether constant factor approximation algorithms exist, i.e. ones that always output an independent set of size at least $\alpha$ times that of the largest independent set, for some fixed constant $\alpha > 0$. In this post, I will prove the following result:

**Theorem 1.** For any constant $\alpha \in (0, 1)$, assuming that $P \neq NP$, there is no worst-case polynomial time algorithm (in the number of vertices $n$) for distinguishing between the following two sets of graphs, for some constant $c > 0$:

1. Graphs with an independent set of size at least $cn$.

2. Graphs for which all independent sets have at most $\alpha c n$ vertices.

Note that not all graphs are in the union of these sets. Given a graph that is in one of these two classes, it is NP-hard to decide whether it is in the first class or the second.

This immediately implies that the independent set problem has *no constant factor approximation algorithm*! If we had a polynomial time algorithm that always gave us an independent set of size at least $\rho$ times the maximum independent set for some constant $\rho > 0$, then picking any $\alpha < \rho$ would allow us to distinguish between the two sets given in the theorem: on graphs in the first set the algorithm must output an independent set of size at least $\rho c n > \alpha c n$, while on graphs in the second set its output has at most $\alpha c n$ vertices.
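To make this concrete, here is a small sketch of the distinguisher. The function names are my own, and a brute-force exact solver stands in for the hypothetical $\rho$-approximation algorithm (it is trivially a $\rho = 1$ approximation):

```python
from itertools import combinations

def max_independent_set_size(n, edges):
    """Brute-force maximum independent set size (exponential time).
    Stand-in for a hypothetical rho-approximation algorithm (rho = 1)."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            # subset is independent iff no pair of its vertices is an edge
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                return size
    return 0

def distinguish(n, edges, c, alpha, approx_alg, rho):
    """Decision rule from the theorem: a rho-approximation with rho > alpha
    reports an independent set larger than alpha*c*n exactly on graphs
    from the first class."""
    assert rho > alpha
    found = approx_alg(n, edges)
    return "class 1" if found > alpha * c * n else "class 2"
```

Any genuine polynomial time $\rho$-approximation algorithm could be dropped in for `max_independent_set_size`; the point is that the decision rule itself is trivial.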

In order to prove this theorem, we will need a theorem about the difficulty of producing “good” solutions to 3SAT that satisfy a large number of clauses. Instead of thinking about a 3SAT formula as a single boolean formula, we can think of it as a set of constraints (each the OR of three literals) that must be simultaneously satisfied. Instead of trying to satisfy all of the constraints, we can try to satisfy as many as we can. More formally,

**Definition 2.** A 3SAT instance $\varphi$ can be viewed as a set of functions $\varphi_1, \dots, \varphi_m : \{0, 1\}^n \to \{0, 1\}$, each depending on at most three variables from the set $\{x_1, \dots, x_n\}$. Let $\mathrm{val}(\varphi)$ be the maximum fraction of constraints satisfied by any assignment, i.e.

$$\mathrm{val}(\varphi) = \max_{x \in \{0, 1\}^n} \frac{1}{m} \sum_{i=1}^{m} \varphi_i(x).$$
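As a sanity check on the definition, $\mathrm{val}(\varphi)$ can be computed directly by brute force over all $2^n$ assignments (exponential time, so for illustration only; representing constraints as Python callables is my own choice):

```python
from itertools import product

def val(n, constraints):
    """Compute val(phi) by brute force. Each constraint is a callable
    {0,1}^n -> {0,1} taking a tuple of n bits."""
    best = 0.0
    for x in product((0, 1), repeat=n):
        best = max(best, sum(c(x) for c in constraints) / len(constraints))
    return best

# (x1 OR x2 OR x3) AND (NOT x1 OR NOT x2 OR NOT x3) is satisfiable:
phi = [lambda x: int(x[0] or x[1] or x[2]),
       lambda x: int((not x[0]) or (not x[1]) or (not x[2]))]
```

Here `val(3, phi)` returns `1.0`, witnessed e.g. by the assignment $(1, 0, 0)$.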

The optimization problem of computing $\mathrm{val}(\varphi)$ for a 3SAT formula $\varphi$ is hard to approximate, which follows from the following result:

**Theorem 3.** There is a constant $\epsilon > 0$ such that it is NP-hard to distinguish between the following two sets of 3SAT instances:

1. Formulas that are satisfiable ($\mathrm{val}(\varphi) = 1$).

2. Formulas such that $\mathrm{val}(\varphi) \leq 1 - \epsilon$.

I will only outline the proof here, as it is rather lengthy. Since 3SAT is NP-complete, it follows that it is NP-hard to distinguish between the following two sets of instances:

1. Formulas that are satisfiable ($\mathrm{val}(\varphi) = 1$).

2. Formulas that are not satisfiable, i.e. $\mathrm{val}(\varphi) \leq 1 - 1/m$ (at least one constraint is not satisfied), where $m$ is the number of constraints in $\varphi$.

I claim that there is some constant $\epsilon_0 > 0$ and a linear time computable function $f$ that maps any instance $\varphi$ of 3SAT to another instance $f(\varphi)$ of 3SAT with the following properties:

1. $\varphi$ is satisfiable implies that $f(\varphi)$ is satisfiable.

2. $\mathrm{val}(\varphi) \leq 1 - \delta$ for $\delta \leq \epsilon_0$ implies that $\mathrm{val}(f(\varphi)) \leq 1 - 2\delta$.

3. $f(\varphi)$ has at most $Cm$ constraints, where $C$ is a constant and $m$ is the number of constraints of $\varphi$.

This procedure suffices: starting from the guarantee $\mathrm{val}(\varphi) \leq 1 - 1/m$ for unsatisfiable $\varphi$, each application of $f$ doubles the gap, so applying the procedure $O(\log m)$ times guarantees that either $\mathrm{val} = 1$ or $\mathrm{val} \leq 1 - \epsilon_0$, and the final formula has at most $C^{O(\log m)} \cdot m = \mathrm{poly}(m)$ constraints.
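The counting behind this argument can be checked numerically. The concrete values of $\epsilon_0$ and $C$ below are made up for illustration, since the lemma only promises that such constants exist:

```python
def amplification_rounds(m, eps0, C):
    """Track the counting argument: the gap 1 - val starts at 1/m for an
    unsatisfiable formula, doubles with each application of f, and the
    number of constraints grows by a factor of at most C per round."""
    gap, size, rounds = 1.0 / m, m, 0
    while gap < eps0:
        gap, size, rounds = 2 * gap, C * size, rounds + 1
    return rounds, size

rounds, size = amplification_rounds(m=10**6, eps0=0.01, C=3)
# rounds is ceil(log2(eps0 * m)), so size = C**rounds * m is poly(m)
```

For $m = 10^6$ the loop runs only 14 times, so the size blowup $C^{14}$ is a (large) constant factor.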

This procedure is implemented using expander graphs, which intuitively are graphs with high connectivity but a small number of edges (at most linear in the number of vertices). First, we convert a given 3SAT instance into a constraint satisfaction instance on an expander graph, slightly weakening the upper bound on $\mathrm{val}$. Then, we use random walks to construct a 2SAT instance over a large alphabet with an improved upper bound on $\mathrm{val}$. Finally, we turn this 2SAT instance over a large alphabet back into a 3SAT instance over a binary alphabet, again slightly weakening the upper bound on $\mathrm{val}$. Composing these steps gives a modest improvement on the upper bound for $\mathrm{val}$ when $\varphi$ is unsatisfiable, which is the desired result.

Now, I will use this theorem to prove that the maximum independent set problem has no polynomial time approximation algorithm that gets within a factor of $1 - \epsilon$, for the same $\epsilon$ as in the previous theorem:

**Theorem 4.** For the constant $\epsilon$ given in the previous theorem, it is NP-hard to distinguish between the following sets of graphs on $n$ vertices:

1. Graphs $G$ such that there is an independent set with size at least $n/7$.

2. Graphs $G$ such that all independent sets are of size at most $(1 - \epsilon)n/7$.

**Proof:** It suffices to use the standard reduction from 3SAT to Independent Set. More precisely, for every constraint $\varphi_i$ in a 3SAT instance $\varphi$ that depends on exactly three variables, create a block $B_i$ of 7 vertices in a new graph that correspond to the 7 satisfying assignments of $\varphi_i$. For every two vertices in the graph, connect them by an edge if and only if they correspond to inconsistent assignments (assignments that give some shared variable different values).

First, suppose that a formula $\varphi$ with $m$ constraints is satisfiable (in the first set of the previous theorem). A satisfying assignment restricts to consistent satisfying assignments for each of the $m$ constraints, which gives an independent set of size $m$ in the graph obtained through the reduction (which has $n = 7m$ vertices). Now, suppose that there is an independent set $S$ with at least $(1 - \epsilon)n/7 = (1 - \epsilon)m$ vertices in the graph resulting from $\varphi$. Note that the seven vertices corresponding to each constraint correspond to pairwise inconsistent assignments, meaning that they form a clique (a set of vertices that are pairwise connected by edges). Therefore, $S$ has at most one vertex corresponding to an assignment of any particular constraint, so $S$ corresponds to a consistent assignment satisfying at least $(1 - \epsilon)m$ constraints, i.e. $\mathrm{val}(\varphi) \geq 1 - \epsilon$. Therefore, this reduction suffices.
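The reduction can be sketched directly. The clause encoding, a tuple of signed integers where `(1, -2, 3)` means $x_1 \lor \lnot x_2 \lor x_3$, is my own convention; clauses are assumed to have three distinct variables:

```python
from itertools import combinations, product

def sat3_to_graph(clauses):
    """Standard 3SAT -> Independent Set reduction: one vertex per
    (clause, satisfying assignment) pair; edges join vertices whose
    assignments give some shared variable different values."""
    vertices = []
    for i, clause in enumerate(clauses):
        vars_ = [abs(lit) for lit in clause]
        for bits in product((0, 1), repeat=3):
            assignment = dict(zip(vars_, bits))
            # keep only the 7 assignments that satisfy this clause
            if any(assignment[abs(l)] == (1 if l > 0 else 0) for l in clause):
                vertices.append((i, tuple(sorted(assignment.items()))))

    def inconsistent(u, v):
        du, dv = dict(u[1]), dict(v[1])
        return any(du[x] != dv[x] for x in du if x in dv)

    edges = [(u, v) for u, v in combinations(vertices, 2) if inconsistent(u, v)]
    return vertices, edges
```

Note that the 7 vertices of each block are pairwise inconsistent (they are distinct assignments to the same three variables), so the block cliques arise automatically from the edge rule.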

Finally, we need to amplify this result in order to show that the maximum independent set problem cannot be approximated within *any* constant factor. We will do this by producing a new graph in which every vertex corresponds to a set of $k$ vertices from the original graph, where $k$ is some positive integer constant. We will see that independent sets in this graph correspond to independent sets in the old graph, but with an amplified size gap. More precisely, we will show the following:

**Theorem 5.** For any constant positive integer $k$, there is a constant $c_k > 0$ such that it is NP-hard to distinguish the following sets of graphs on $N$ vertices:

1. Graphs with an independent set of size at least $c_k N$.

2. Graphs for which all independent sets have at most $(1 - \epsilon)^k c_k N$ vertices.

**Proof:** For a graph $G = (V, E)$, construct a graph $G^{(k)}$ with vertex set $\binom{V}{k}$ (the set of all size $k$ subsets of $V$) and with an edge between distinct $S, T \in \binom{V}{k}$ if and only if $S \cup T$ is not an independent set in $G$. For every independent set $U$ in $G$, $\binom{U}{k}$ is an independent set in $G^{(k)}$ of size $\binom{|U|}{k}$. Conversely, if there is an independent set of size greater than $\binom{s}{k}$ in $G^{(k)}$ for some integer $s$, then there are more than $s$ elements in the union of the sets corresponding to these vertices (a set of $s$ elements has only $\binom{s}{k}$ subsets of size $k$), and this union is an independent set in $G$, since every pair of its vertices lies in the union of two of the sets; so $G$ has an independent set of size greater than $s$. For sufficiently large $n$, $\binom{(1-\epsilon)n/7}{k}$ is asymptotic to $(1-\epsilon)^k \binom{n/7}{k}$. Therefore, applying this reduction to both sets of graphs in Theorem 4 for any constant $k$ (note that $G^{(k)}$ has $N = \binom{n}{k} = O(n^k)$ vertices, so it is computable in polynomial time) gives the desired NP-hardness of approximation result.
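A minimal sketch of this construction (the function name and integer vertex labels are my own):

```python
from itertools import combinations

def graph_power(n, edges, k):
    """Amplification construction: vertices of the new graph are all
    size-k subsets of {0, ..., n-1}; two distinct subsets are adjacent
    iff their union is NOT an independent set of the original graph."""
    edge_set = {frozenset(e) for e in edges}

    def independent(vs):
        return all(frozenset(p) not in edge_set for p in combinations(vs, 2))

    nodes = [frozenset(s) for s in combinations(range(n), k)]
    new_edges = [(S, T) for S, T in combinations(nodes, 2)
                 if not independent(S | T)]
    return nodes, new_edges
```

On the 4-cycle (maximum independent set 2), the $k = 2$ power graph has $\binom{4}{2} = 6$ vertices and its maximum independent set shrinks to $\binom{2}{2} = 1$: any two distinct 2-subsets span at least 3 vertices of the cycle, hence contain an edge.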

This completes the proof that Maximum Independent Set cannot be approximated within any constant factor. In future blog posts, I will demonstrate more powerful versions of the “hardness amplification” techniques used here to show more hardness of approximation results.

**Reference:** Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach, chapter 11.