
Registration


Opening of the GraphHPC2016
V. Voevodin, Doctor of Sciences, Corresponding member of the Russian Academy of Sciences, MSU RCC
A. Simonov, Ph. D., JSC NICEVT


A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, S. Muthukrishnan, Facebook Inc., USA
Hide
Analyzing large graphs provides valuable insights for social networking and web companies in content ranking and recommendations. While numerous graph processing systems have been developed and evaluated on available benchmark graphs of up to 6.6B edges, they often face significant difficulties in scaling to much larger graphs. Industry graphs can be two orders of magnitude larger  hundreds of billions or up to one trillion edges. In addition to scalability challenges, real world applications often require much more complex graph processing workflows than previously evaluated. In this paper, we describe the usability, performance, and scalability improvements we made to Apache Giraph, an opensource graph processing system, in order to use it on Facebookscale graphs of up to one trillion edges. We also describe several key extensions to the original Pregel model that make it possible to develop a broader range of production graph applications and workflows as well as improve code reuse. Finally, we report on realworld operations as well as performance characteristics of several largescale production applications.


Minimum Spanning Tree Problem Implementation on Supercomputers with Angara Interconnect
A. Semenov, A. Mazeev, DISLab/JSC NICEVT


Optimization on Graphs: Clustering and Reconciliation
V. Lyubetsky, K. Gorbunov, A. Seliverstov, Institute for Information Transmission Problems, Russian Academy of Sciences
Hide
The problems studied in our laboratory will be described. These include multipartite graph clustering (each cluster presents the maximum number of parts and the minimum number of nodes from each part), reconciliation of a large number of trees into a single tree, edit distance between CCgraphs (this distance is widely used for sequences but was recently extended to trees), and optimal extension along a tree of CCgraphs given in the tree leaves. The CCgraph is defined as an oriented graph with the total degree of each node not exceeding 2; investigation of such graphs is of great practical interest. We study these problems mathematically and implemented the results on a supercomputer (available at http://lab6.iitp.ru/). Some results rely on nontrivial implementations of integer linear programming. It is worth to note significance and nontrivial problem of computerbased generation of initial graphs in these problems, which would be impossible to decide without parallel computing.


Coffee Break


V. Weber, A. Pozdneev, T. Laino, IBM
Hide
Many graph analysis methods are expressed via sparse matrixmatrix multiplication (SMMM). For example, the betweenness centrality, strong connected components, and minimum paths problems are solved through a series of sparse matrixmatrix multiplications. The adaptation of a particular SMMM algorithm to a specific application domain can be achieved by choosing an appropriate underlying semiring. In this talk we present a novel highly scalable SMMM algorithm based on a midpoint approach. We apply the algorithm over the real field semiring to the construction of the density matrix in electronic structure theory. The computational benefits of the algorithm come from the matrix distribution over processes and the
communication pattern of the algorithm that follow the interaction structure of the atomic system. We use IBM Blue Gene/Q supercomputer to demonstrate the weak scalability of our SMMM algorithm using up to 59,319 MPI processes and strong scalability using up to 185,193 MPI processes. The talk focuses on the detail
s of the algorithm published in http://dx.doi.org/10.1021/acs.jctc.5b00382.


A. Kolganov, MSU
Hide
Breadthfirst search (BFS) is a core primitive for graph traversal and a basis for many higherlevel graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and datadependent. In the report will be discussed the implementation of the breadthfirst search algorithm (the main test of Graph500 List) for processing large graphs on a single node with multicore processor or graphics card (GPU). Also in the report will be discussed optimization techniques of the implementation for shared memory and some graph`s transformations for accelerating GPU memory access.


A. Osipov, A. Daryin, TPlatforms
Hide
This report is a continuation of last year's work on the implementation of distributed BreadthFirst Search (BFS) developed by TPlatforms. This year we added multithreading and modified backstepping algorithm. Performance evaluation of this implementation on the “AngaraK1” cluster with Angara highspeed interconnect is presented.


M. Chernoskutov, Institute of Mathematics and Mechanics, Ural Department of Russian Academy of Sciences
Hide
This report describes methods which allows to accelerate execution of some parallel graph algorithms on cluster computing systems. Methods, designed to distribute workloads amongst computing nodes will be presented. Also, there will be presented methods to reduce amount of data transfers between computational processes.


Lunch Break


Extension of Charm++ for Finegrained Parallel Applications
A. Frolov, DISLab/JSC NICEVT


A. Klimov, Institute of design problems in microelectronics of Russian academy of sciences
Hide
The difficulties of parallel programming are complicated by existence of a variety of platforms and paradigms, each of which requires writing a program almost from scratch. There is a need for a highlevel language in which one could express only once a mathematical structure of an algorithm and then derive an acceptable parallel code for the desired platform. A draft of such a language called PAMELA (PArallel MEtaLAnguage) is proposed. Its semantics is based on the dataflow paradigm, in which computations are activated on availability of input data. Along with an ordinary textual form, this language has a natural graphical form. Its opportunities are demonstrated by example of a parallel version of the Louvain method of detecting communities in graphs. The graphical form gives visibility and convenience not only for program writing and reading, but also for highlevel debugging, analyzing properties, discussion of possible solutions, identifying performance problems. The clarity and compactness of the language definition opens possibilities of equivalent transformations aimed at the optimization and efficient translation
into different target platforms.


A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, S. Muthukrishnan, Facebook Inc., USA
Hide
Synthetic graph generators facilitate research in graph algorithms and processing systems by providing access to data, for instance, graphs resembling social networks, while circumventing privacy and security concerns. Nevertheless, their practical value lies in their ability to capture important metrics of real graphs, such as degree distribution and clustering properties. Graph generators must also be able to produce such graphs at the scale of realworld industry graphs, that is, hundreds of billions or trillions of edges.
In this work, we propose Darwini, a graph generator that captures a number of core characteristics of real graphs. Importantly, given a source graph, it can reproduce the degree distribution and, unlike existing approaches, the local clustering coefficient and jointdegree distributions. Furthermore, Darwini maintains metrics such node PageRank, eigenvalues and the Kcore decomposition of a source graph. Comparing Darwini with stateoftheart generative models, we show that it can reproduce these characteristics more accurately. Finally, we provide an open source implementation of our approach on the vertexcentric Apache Giraph model that allows us to create synthetic graphs with one trillion edges.


GraphHPC2016 contest closing


GraphHPC2016 closing
