An Algorithmic Way to Generate Simplexes for Topological Data Analysis

In this article, we present a new algorithm for creating simplicial Vietoris-Rips complexes that are easily parallelizable using computation models like MapReduce and Apache Spark. The algorithm does not involve any computation in homology spaces.

Keywords: data analysis, topology, simplicial complex

The article of Silva and Grist [2] showed that the algebraic topology was a very practical tool. In this paper, its authors analyze the coverage of an area with a network of sensors. This network is interpreted as a simplicial complex. Its homology type is then computed and exploited. If this complex is homotopy equivalent to a point, the coverage is complete. Therefore, the authors translated a technical problem into a mathematical problem.

In the article [3] Grist searches for topology structures in big data sets. Again, this results in building simplicial complexes and analyzing them.

The above-mentioned works inspire us to ask the question of how to build simplicial complexes efficiently and how to analyze them. In this article, w propose a novel algorithm to build the simplicial complex. This algorithm has the unique property that it can be implemented within massive parallel computational models like MapReduce and Apache Spark.

Hereafter we assume that the data set given is a finite set of points P := {x1, . . . , xn} ⊂ R^d. This data can come up as a result of some physical experiments, psychological tests, etc., and can generally be high-dimensional. We have no insight into how the data actually look and how it is embedded, but it would be enough to know their “shape” to say something about how it is arranged in the original space.