I will start my blog by illustrating that NP-completeness has more implications than just exact hardness of computational problems. Through clever “hardness amplification” techniques, one can extend assumptions about the difficulty of exactly solving NP-hard problems like 3-satisfiability (3SAT) to difficulty of coming up with approximate solutions in polynomial time.
By approximate solutions, I mean algorithms that are guaranteed to output an answer that is within some guaranteed factor of the best possible answer. Note that this is a worst-case result, in the sense that the approximation algorithms that we are interested in always run quickly no matter the input (polynomial time in the size of the input) and output an answer that is always within a given factor of the best possible answer (typically a constant factor, like always at least 1/2 of the best possible answer).
In this post, I will look at one particular computational problem called the independent set problem. In this problem, we are given a graph with vertex set
and edge set
(denoted
). An independent set of
is a subset of
such that no two vertices are adjacent in
. Our goal is to compute an independent set of
with maximum possible size.
This is one of Karp’s original NP-complete problems. Instead of trying to solve this problem exactly, we will reason about whether constant factor approximation algorithms exist, i.e. ones that always output an independent set that is times the largest independent set for some fixed constant
. In this post, I will prove the following result:
Theorem 1 For any constant
, assuming that
, there is no worst-case polynomial time algorithm in
for distinguishing between the following two sets of graphs for some constant
:
1. Graphs
with an independent set of size at least
.
2. Graphs
for which all independent sets have at most
vertices.
Note that not all graphs are in the union of these sets. Given a graph that is in one of these two classes, it is NP-hard to decide whether it is in the first class or the second.
This immediately implies that the independent set problem has no constant factor approximation algorithm! If we did have an algorithm that gave us an independent set with size at least times the maximum independent set in polynomial time for some constant
, picking any
would allow us to distinguish between the two sets given in the theorem.
In order to prove this theorem, we will need a theorem about the difficulty of producing “good” solutions to 3SAT that satisfy a large number of clauses. Instead of thinking about a 3SAT formula as a single boolean formula, we can think of it as a set of constraints (corresponding to 3 variable ORed together) that must be simultaneously satisfied. Instead of thinking about satisfying all of the constraints, we can try satisfying as many as we can. More formally,
Definition 2 A 3SAT instance
can be viewed as a set of functions
,
that depend on at most three variables from the set
. Let
be the maximum fraction of constraints satisfied by any assignment, i.e.
The optimization problem of computing for any 3SAT formula
is hard to approximate, which follows from the following result:
Theorem 3 There is a constant
such that it is NP-hard to distinguish between the following two sets of 3SAT instances:
1. Formulas
that are satisfiable (
).
2. Formulas
such that
.
I will only outline the proof here, as it is rather lengthy. Since is NP-complete, it follows that it is NP-hard to distinguish between the following two sets of instances:
1. Formulas that are satisfiable (
).
2. Formulas that are not satisfiable, i.e.
(at least one constraint is not satisfied), where
is the number of constraints in
.
From every instance of 3SAT, I claim that there is some constant for which there exists a linear time computable function
on instances of 3SAT that outputs another instance of 3SAT with the following properties:
1. is satisfiable implies that
is satisfiable.
2. for
implies that
.
3. has at most
constraints, where
is a constant.
This procedure suffices, as applying this procedure as many times as possible (until we are guaranteed that either or
) gives a formula with at most
constraints in the final term, since we will only have to run this procedure
times to get the desired guarantee on
.
This procedure is implemented using expander graphs, which intuitively are graphs with high connectivity but a low number of edges (at most a linear function on the number of vertices). First, we convert a given 3SAT instance into an expander graph, slightly weakening the upper bound on . Then, we use random walks to constuct a 2SAT instances over a large alphabet with an improved upper bound on
. Finally, we turn this 2SAT instance with a large alphabet into a 3SAT instance over a binary alphabet with a slight weakening of the upper bound on
. Composing these steps gives a modest improvement on the upper bound for
when
is unsatisfiable, which is the desired result.
Now, I will use this theorem to prove that the maximum independent set problem has no polynomial time approximation algorithm that gets within a factor of , for the same
in the previous theorem:
Theorem 4 For the constant
given in the previous theorem, it is NP-hard to distinguish between the following sets of graphs:
1. Graphs
such that there is an independent set with size at least
.
2. Graphs
such that all independent sets are of size at most
.
Proof: It suffices to use the standard reduction from 3SAT to Independent Set. More precisely, for every constraint in a 3SAT instance
that depends on exactly three variables, create a block of 7 vertices in a new graph
that correspond to satisfying assignments of
. For every two vertices in
, connect them by an edge if and only if they correspond to inconsistent assignments.
First, suppose that a formula is satisfiable (in the first set for the previous theorem). This corresponds to consistent assignments for each of the
constraints, which gives an independent set of size
in the graph obtained through the reduction (which has
vertices). Now, suppose that there is an independent set
in the graph
resulting from
with at least
vertices. Note that the seven vertices corresponding to each constraint correspond to pairwise inconsistent assignments, meaning that they form a clique (set of vertices that are pairwise connected by edges). Therefore,
has at most one vertex corresponding to an assignment of any particular constraint, so
corresponds to a consistent assignment of at least
constraints. Therefore, this reduction suffices.
Finally, we need to amplify this result in order to show that the maximum independent set problem cannot be approximated within any constant factor. We will do this by producing a new graph for which every vertex corresponds to a set of vertices from the original graph, where
is some nonzero constant. We will see that an independent sets in this graph correspond to independet sets in the old graph, but with increased size. More precisely, we will show the following:
Theorem 5 For any constant positive integer
, there is a constant
such that it is NP-hard to distinguish the following sets of graphs:
1. Graphs
with an independent set of size at least
.
2. Graphs
for which all independent sets have at most
vertices.
Proof: For a graph , construct a graph
, with
(the set of all size
subsets of
) and with an edge
for
if and only if
is not an independent set in
. For every independent set
in
,
is an independent set in
. Furthermore, if there is an independent set of size
in
for some integer
, there are at least
elements in the union of the sets corresponding to these vertices, so
has an independent set of size at least
. For sufficiently large
,
is asymptotic to
. Therefore, applying this reduction to both sets of graphs in Theorem 4 for any constant
gives the desired NP-hardness of approximation result.
This completes the proof that Maximum Independent Set cannot be approximated within any constant factor. In future blog posts, I will demonstrate more powerful versions of the “hardness amplification” techniques used here to show more hardness of approximation results.
Reference: Sanjeev Arora and Boaz Barak. Computational Complexity: A Modern Approach, chapter 11.