What is NP?

26 May 2017

What is P?

This seems like a silly question. This question seems easily answered by the surprisingly comprehensible explanation on P’s Wikipedia page:

It contains all decision problems that can be solved by a deterministic Turing machine using a polynomial amount of computation time

OK, so it’s the set of all decision problems (yes-or-no questions) that are \(O(n^k)\) for some integer \(k\). For example, sorting, which is \(\Theta(n \log n)\) is P, since it’s also \(O(n^2)\).

But let’s go for an alternate definition. Let’s take a C-like language and get rid of a bunch of features: subroutine calls, while loops, goto, for loops with anything other than i < n in the second field (where n is the length of the input), and modifying the for loop variable within the loop. Effectively, this language’s only loping construct is for(int varname = 0; varname < n; varname++) {/* Code that never modifies varname */}. Let’s call this language C-FOR.

Clearly, every program written in this language terminates (there’s no way to construct an infinite loop). Also, we know that every program in this language runs in polynomial time since we can only nest loops a finite amount of times, and that maximum nesting depth determines how long the program takes to execute. Finally, we know that every program in P can be written in this language as we can keep track of the state of the program (including variables and the current line being executed), and just run it for \(p(n) \in O(n^k)\) steps using \(k\) nested loops. Thus, the set of algorithms writable in C-FOR is precisely the set of polynomial-time algorithms.

Sometimes, writing a program in C-FOR will inevitably result in hideous programs that end up using the loops just to run some unrelated calculation for a certain amount of time. However, most commonly-written programs in P are naturally written in nested-for-loop form. For example, here’s a verifier for 3SAT written in C-FOR:

/* 3sat-verify.h */
#include <stdbool.h>
typedef unsigned variable_t;
typedef struct { variable_t name; bool not_negated; } literal_t;
typedef struct { literal_t first, second, third; } clause_t;
typedef struct { unsigned num_vars, num_clauses; clause_t clauses[]; } instance_3sat_t;
bool verify_3SAT(instance_3sat_t *inst, bool *soln);
/* 3sat-verify.c */
#include "3sat-verify.h"

bool verify_3SAT(instance_3sat_t *inst, bool *soln) {
    for (int i = 0; i < inst->num_clauses; i++) {
        clause_t c = inst->clauses[i];
        bool clause_satisfiable = c.first.not_negated == soln[c.first.name];
        clause_satisfiable |= c.second.not_negated == soln[c.second.name];
        clause_satisfiable |= c.third.not_negated == soln[c.third.name];
        if (!clause_satisfiable) {
            return false;
        }
    }
    return true;
}

What is NP?

I never really liked the name NP. It stands for “nondeterministic-polynomial time” which sounds like the set of problems that run in polynomial time on a machine that can sometimes act randomly. In fact, this is a completely different complexity class (named BPP). NP is defined as the set of all decision problems with some fixed verifier in P and a proof for every “yes” answer that the verifier can check. For example, 3SAT is in NP because we can verify a result in polynomial time (see above).

This is a fairly intuitive definition, but what does it have to do with the “nondeterministic” part of “nondeterministic-polynomial”?

Well, a nondeterministic Turing machine is defined as one that can guess correctly at every step. Equivalently, it can try multiple things in parallel at every step and discard incorrect branches. We can add this capability to our language to create the language C-FOR-FORK. We add two primitives:

bool try_both(void);, which returns both true and false in separate threads.
bool wait_all(bool ret), which does a large OR over all possible values for ret collected in various processes.

As it turns out, you can actually implement these commands in C, as such:

#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>
#include "try-both.h"

bool is_parent = true;

bool try_both(void) {
    bool result = !!fork();
    is_parent &= result;
    return result;
}

bool wait_all(bool ret) {
    int subprocess_return_code;
    while(wait(&subprocess_return_code) != -1) {
        ret |= subprocess_return_code;
    }
    if (!is_parent) {
        exit(ret);
    }
    return ret;
}

In any case, we can now write any NP problem in C-FOR-FORK. As a proof, we present a solution to the NP-Complete problem, 3SAT:

#include <stdlib.h>
#include <stdbool.h>
#include "3sat-verify.h"
#include "try-both.h"

bool solve_3SAT(instance_3sat_t *inst) {
    bool *soln = malloc(inst->num_vars * sizeof(bool));
    for (int i = 0; i < inst->num_vars; i++) {
        soln[i] = try_both();
    }
    // Technically, we aren't allowed to call the verify_3SAT subroutine,
    //      but I did that rather than copy-pasting the code here for
    //      brevity. If this bothers you, you should be able to confirm
    //      that copy-pasting the code here doesn't change anything
    bool works = verify_3SAT(inst, soln);
    return wait_all(works);
}

This pattern can be used for any NP problem (or you could just reduce that one to 3SAT and use the above implementation).

Thus, we have it: NP is the set of all problems solvable in a language with only bounded iteration, but also the ability to fork arbitrarily and join at the end.

If you want to look at the code examples and see a 3SAT solver in action, one is implemented here. (Don’t run it on too large an instance, it might crash your computer.)