An early look at the Dart programming language

To content | To menu | To search

Strongly Connected Components and the 2-SAT Problem in Dart

We develop an algorithm to compute the strongly connected components of a directed graph on large graph sizes using the iterative version of our Kosaraju class. We then make use of this algorithm to solve, in linear time, a special case of the boolean satisfiability problem, known as the 2-SAT problem. Computational times of this algorithm are reported for directed graphs in excess of 5 million edges and 2-SAT data structures with up to a 1 million clauses and an equal number of variables.

Introduction


In our previous article, we added a class to our graphlab library which implements an iterative DFS approach to the well known Kosaraju algorithm. In this article, we make use of this class to, first, develop an algorithm to compute the strongly connected components of a directed graph [1], and second, to make use of this algorithm to solve, in linear time, a special case of the boolean satisfiability problem, known as the 2-SAT problem [2] [3]. We will also report the computational times of this algorithm for directed graphs which contain in excess of 5 million edges, and in the case of the 2-SAT problem, data structures which contain up to 1 million clauses with an equal number of variables.

Computing Strongly Connected Components with Dart


We will assume that our edge list has been provided as a 2D array - a List of Lists. The exact format of the input data will tend to be application specific and, although there are more efficient structures for 2D arrays of objects in Dart [4], this article is intended as an example of a very general case. Note that depending on the application, the size of the input graph, and the amount of heap space available, we found that it may be necessary to clear the input edge list once the graph has been mapped to the directed graph data structure to free up memory. Once the components are computed, we then determine the size of each connected group and sort them in descending order. Finally, our top level function scc() returns a Future object of type SccResults:

part of graphlab;

Future<SccResults> scc(List<List> edgeList) {
  // Check if the edge list data is valid.
  if (edgeList == null || edgeList.isEmpty) {
    throw new ArgumentError("Edge list data is not valid.");
  }
  // Variables.
  HashMap components;
  List sccValues = [];
  List sumComponents = [];
  var offset = 0;

  // Create two new directed graphs.
  DirectedGraph graph = new DirectedGraph();
  DirectedGraph grev = new DirectedGraph();

  /// Populate the directed graph with the vertices from the edge list.
  for (List list in edgeList) {
    for (var element in list) {
      graph.addNode(element);
    }
  }
  /// Populate the directed graph with the directed edges.
  for (List list in edgeList) {
    graph.addEdge(list[0], list[1]);
  }
  /// Populate the reversed graph with the vertices from the edge list.
  for (List list in edgeList) {
    for (var element in list) {
      grev.addNode(element);
    }
  }
  /// Populate the reversed graph with the directed edges in reverse order.
  for (List list in edgeList) {
    grev.addEdge(list[1], list[0]);
  }
  // If heap space becomes an issue, clear the edge list.
  if (edgeList.length > 500000) {
    edgeList.clear();
  }

  /// Compute the strongly connected components of the
  /// directed graph using Kosaraju's alogorithm.
  components = new _Kosaraju().computeSCC(graph, grev: grev);

  /// Compute the size of each scc group and store it in a sorted list.
  sccValues = components.values.toList()
      ..sort();
  var N = sccValues.length;
  for (var i = 0; i < N - 1; i++) {
    if (sccValues[i] != sccValues[i + 1]) {
      sumComponents.add(i + 1 - offset);
      offset = i + 1;
    }
  }
  sumComponents..add(N - offset)..sort((a, b) => b.compareTo(a));

  /// Return the SccResults object.
  return new Future(() =>
      new SccResults(sumComponents, N, components, graph));
}

where SccResults() is a simple data structure of the form:

class SccResults extends GraphLabResults{
  final sccNodes;
  final graph;

  SccResults(data, value, this.sccNodes, this.graph) : super(data, value);
}

At this point, our simple graphlab library looks like the following:

library graphlab;

import 'dart:collection';
import 'dart:async';

part 'src/algorithms/directed_graph.dart';
part 'src/algorithms/kosaraju.dart';
part 'src/algorithms/scc.dart';

part 'src/utilities/graphlab_results.dart';

part 'src/exceptions/graphlab_exception.dart';
part 'src/exceptions/no_such_element_exception.dart';

Let's try a simple test of our algorithm on the directed graph that we had defined in our earlier article on collections in Dart, with the exception that we will read the graph from an external file using a Stream [5]:

import 'package:graphlab/graphlab.dart';
import 'dart:async';
import 'dart:io';
import 'dart:collection';
import 'dart:convert';

void main() {
  String filename = 'example/data/small_scc.txt';
  List<List<int>> sccfile = [];
  Stream stream = new File(filename).openRead();
  stream
      .transform(UTF8.decoder)
      .transform(new LineSplitter())
      .listen((String line) {
        var stringBuffer = line.split(" ");
        var intBuffer = [];
        stringBuffer.forEach((element) {
          if(!element.isEmpty) {
            intBuffer.add(int.parse(element.trim()));
          }
        });
        sccfile.add(intBuffer);
      },
      onDone: () {
        print('There are ${sccfile.length} edges in this graph.');
        scc(sccfile).then((sccResults) {
          print('This graph contains ${sccResults.value} nodes.');
          print('The size of each strongly connected component is ${sccResults.data}.');
          print('The map of nodes with the scc the node belongs in: \n'
                  '${sccResults.sccNodes}');
          HashMap sets = new HashMap();
          for (var value in sccResults.sccNodes.values) {
            if (!sets.containsKey(value)) {
              sets[value] = new HashSet();
              for (var key in sccResults.sccNodes.keys) {
                if (sccResults.sccNodes[key] == value) {
                  sets[value].add(key);
                }
              }
            }
          }
          print('The map of components with a list of the nodes belonging to it: \n'
              '$sets');
        });
      },
      onError: (e) {
        print('There was an error: $e');
      });
}

Prints the following:

There are 29 edges in this graph.
The graph contains 15 nodes.
The size of each strongly connected component is [6, 5, 3, 1].
The map of nodes with the scc the node belongs in:
{1: 2, 2: 2, 3: 2, 4: 3, 5: 1, 6: 1, 7: 1, 8: 1, 9: 1, 10: 1, 11: 0, 12: 0, 13: 0, 14: 0, 15: 0}
The map of components with a list of the nodes belonging to it:
{0: {11, 12, 13, 14, 15}, 1: {8, 9, 10, 5, 6, 7}, 2: {1, 2, 3}, 3: {4}}

The graph would look like the following divided into its strongly connected components.directed_graph_sccs.png

But how efficient is this algorithm? Let's test it on a graph which contains in excess of 5 million edges. The file size of large_scc.txt is 69.2MB. Once again, we read the file in with a Stream, which when completed, passes the 2D array to our scc() function:

import 'package:graphlab/graphlab.dart';
import 'dart:async';
import 'dart:io';
import 'dart:convert';

void main() {
  String filename = 'data/large_scc.txt';
  List<List<int>> sccfile = [];
  Stream stream = new File(filename).openRead();
  stream
      .transform(UTF8.decoder)
      .transform(new LineSplitter())
      .listen((String line) {
        var stringBuffer = line.split(" ");
        var intBuffer = [];
        stringBuffer.forEach((element) {
          if(!element.isEmpty) {
            intBuffer.add(int.parse(element.trim()));
          }
        });
        sccfile.add(intBuffer);
      },
      onDone: () {
        print('There are ${sccfile.length} edges in this graph.');
        scc(sccfile).then((sccResults) {
          print('This graph contains ${sccResults.value} nodes');
          print('The total number of sccs is ${sccResults.data.length}');
        });
      },
      onError: (e) {
        print('There was an error: $e');
      });
}

The file contains 5,105,043 edges with 875,714 nodes. The scc() function computes a total of 371,762 strongly connected components. If we break down the performance of the algorithm running on several different Windows platforms, we can see that reading in the file and building the graph data structure consumes a significant portion of the computation time:

SCC Computation Time (seconds)
Platform Read File Construct Graph First DFS Second DFS Calculate Data Total
Core 2 Duo, 2.53GHz, 32bit, 4GB, Vista 38.2 31.3 7.8 1.5 0.46 78.8
i7-720 Quad, 1.6GHz, 64bit, 4GB, W7 36.5 31.9 7.4 1.3 0.51 77.6
i5-650 Dual, 3.2GHz, 64bit, 8GB, W7 29.2 27.6 6.3 1.3 0.41 64.8

From this data, it certainly seems reasonable that we can expect to achieve acceptable levels of performance for solving general purpose problems with large data sets with Dart. So let's see how well we can do using our algorithm to solve a special case of the boolean satisfiablilty problem.

Computing Solutions to the 2-SAT Problem


The 2-SAT problem is a special case of the boolean satisfiablility problem [6] and, unlike the general case, which is NP-complete, the 2-SAT problem is polynomial time solvable. In fact, computing the solution to the 2-SAT problem in linear time is possible using a variation on our scc() algorithm [7]. For example, if we consider our input edge list as a sequence of clauses of the form (A or B), we can create our DirectedGraph data structure as a collection of implications of the form (~A implies B) and (~B implies A). If, after completing the computation of the strongly connected components on the implication graph, a variable and its negation reside in the same scc, the formula is declared unsatisfiable.

part of graphlab;

Future<bool> twosat(List<List> satlist) {
  // Check if the input data is valid.
  if (satlist == null || satlist.isEmpty) {
    throw new ArgumentError("Input data is not valid.");
  }
  // Variables.
  DirectedGraph clauses = new DirectedGraph();
  HashSet variables = new HashSet();

  /// Add each variable and its negation as a node
  /// to the directed graph clauses.
  for (List list in satlist) {
    for (var element in list) {
      clauses.addNode(element);
      clauses.addNode(-element);
      variables.add(element);
    }
  }
  /// Map each clause (A or B) to (-A ~> B) and (-B ~> A).
  for (List list in satlist) {
    clauses.addEdge(-list[0], list[1]);
    clauses.addEdge(-list[1], list[0]);
  }
  // If running up against max heap size, clear the input data
  // once graph and clauses have been created.
  if (satlist.length > 500000) {
    satlist.clear();
  }

  /// Compute the strongly connected components of the modified clauses in
  /// the directed graph clauses using Kosaraju's alogorithm.
  HashMap scc = new _Kosaraju().computeSCC(clauses, negIndex:true);

  /// Search through the strongly connected components.  If an element
  /// and its negation reside in the same strongly connected component,
  /// the clause is unsatisfiable.
  for (var element in variables) {
    if (scc[element] == (scc[-element])) {
      // Clauses are unsatisfiable.
      return new Future(() => false);
    }
  }
  /// Otherwise there is a variable that will satisfy all clauses.
  // Clauses are satisfiable.
  return new Future(() => true);
}

Let's update our graphlab library to include the file twosat.dart which contains top level function twosat():

library graphlab;

import 'dart:collection';
import 'dart:async';

part 'src/algorithms/directed_graph.dart';
part 'src/algorithms/kosaraju.dart';
part 'src/algorithms/scc.dart';
part 'src/algorithms/twosat.dart';

part 'src/utilities/graphlab_results.dart';

part 'src/exceptions/graphlab_exception.dart';
part 'src/exceptions/no_such_element_exception.dart';

Before we take a look at the satisfiability of some large data files, let's begin with a few simple examples:

import 'package:graphlab/graphlab.dart';
import 'dart:async';

void main() {
  // Unsatisfiable.
  List<List> satfile1 = [[1, 1],
                         [-1, 2],
                         [-1, 3],
                         [-2, -3],
                         [4, 5]];

  // Satisfiable.
  List<List> satfile2 = [[1, 2],
                         [-1, 3],
                         [3, 4],
                         [-2, -4]];

  //  Unsatisfiable.
  List<List> satfile3 = [[-1, -4],
                         [-2, -7],
                         [2, -6],
                         [2, 7],
                         [-6, 7],
                         [1, -5],
                         [1, 7],
                         [-5, 7],
                         [-1, -7],
                         [-3, 6],
                         [3, -4],
                         [3, -6],
                         [-4, -6],
                         [2, 5],
                         [-2, 3],
                         [-3, -5]];

  List graphs = [satfile1, satfile2, satfile3];

  for (var graph in graphs) {
    twosat(graph).then((isSat) {
      if (isSat) {
        print('The graph\n$graph\nis satisfiable.\n');
      } else {
        print('The graph\n$graph\nis not satisfiable.\n');
      }
    });
  }

  // prints:
  //The graph
  //[[1, 1], [-1, 2], [-1, 3], [-2, -3], [4, 5]]
  //is not satisfiable.

  //The graph
  //[[1, 2], [-1, 3], [3, 4], [-2, -4]]
  //is satisfiable.

  //The graph
  //[[-1, -4], [-2, -7], [2, -6], [2, 7], [-6, 7], [1, -5], [1, 7], [-5, 7], 
  // [-1, -7], [-3, 6], [3, -4], [3, -6], [-4, -6], [2, 5], [-2, 3], [-3, -5]]
  //is not satisfiable.
}

But how is the performance for large data sets? We again establish a Stream to read in data from an external file:

import 'package:graphlab/graphlab.dart';
import 'dart:async';
import 'dart:io';
import 'dart:convert';

void main() {
  String filename = 'example/data/2sat1.txt';
  List<List<int>> satfile = [];
  Stream stream = new File(filename).openRead();
  stream
      .transform(UTF8.decoder)
      .transform(new LineSplitter())
      .listen((String line) {
        List<String> stringBuffer = line.split(" ");
        List<int> intBuffer = [];
        stringBuffer.forEach((element) {
          if(!element.isEmpty) intBuffer.add(int.parse(element.trim()));
        });
        satfile.add(intBuffer);
      },
      onDone: () {
        print('There are ${satfile.length} clauses in this graph.');
        twosat(satfile).then((isSat) {
          if (isSat) {
            print('The graph is satisfiable.');
          } else {
            print('The graph is not satisfiable.');
          }
        },
        onError: (e) {
          print('There was an error computing the 2-SAT: $e');
        });
      },
      onError: (e) {
        print('There was an error opening the file: $e');
      });
}

A summary of the performance for a variety of large data sets is in the table below:

2-SAT Computation Time (seconds)
Number of Clauses/Variables Read File Construct Graph First DFS Second DFS Calculate Data Total
100000 0.77 0.38 0.26 0.13 0.02 2.1
200000 1.4 0.94 0.31 0.52 0.001 4.6
400000 3.1 2.8 0.59 1.2 0.06 11.62
600000 4.5 5.9 1.6 1.7 0.09 22.2
800000 5.9 6.3 2.1 2.8 0.02 25.4
1000000 7.5 8.9 1.6 2.8 0.02 29.9

A 10x increase in the size of the input translates into approximately a 10x increase in the total computation time, suggesting that we are achieving an O(n) run time for solving the 2-SAT problem with this algorithm, as we anticipated.

The code for the graphlab library, including the 2-sat solver and the strongly connected components algorithm, can be found on Github [8].

Conclusion


Using a combination of Futures, Streams and collections in Dart, we implemented efficient algorithms to compute the strongly connected components of a directed graph for large graph sizes. We then applied this algorithm to solving, in linear time, the special case of the boolean satisfiability problem known as 2-SAT, also for a variety of large input data sets.


Works Cited

[1] Wikipedia: Strongly Connected Components
[2] Wikipedia: The 2-SAT Problem and Strongly Connected Components
[3] "A linear-time algorithm for testing the truth of certain quantified boolean formulas", Aspvall et all (1979), Information Processing Letters 8: pgs 121–123 (pdf)
[4] Kevin Moore's BOT - The Dart Bag-of-Tricks
[5] Getting Your Feet Wet with Streams - Chris Buckett
[6] Wikipedia: Boolean satisfiability problem
[7] Keith Schwarz (Stanford U) - TwoSat.java
[8] The graphlab library on Github