CAVE: concurrency-aware graph processing on SSDs
dc.contributor.author | Papon, Tarikul Islam | en_US |
dc.contributor.author | Chen, Taishan | en_US |
dc.contributor.author | Athanassoulis, Manos | en_US |
dc.date.accessioned | 2024-10-24T15:17:43Z | |
dc.date.available | 2024-10-24T15:17:43Z | |
dc.date.issued | 2024-05-30 | |
dc.identifier.citation | Tarikul Islam Papon, Taishan Chen, Shuo Zhang, and Manos Athanassoulis. 2024. CAVE: Concurrency-Aware Graph Processing on SSDs. Proc. ACM Manag. Data 2, 3, Article 125 (June 2024), 26 pages. https://doi.org/10.1145/3654928 | |
dc.identifier.uri | https://hdl.handle.net/2144/49415 | |
dc.description.abstract | Large-scale graph analytics has become increasingly common in areas like social networks, physical sciences, transportation networks, and recommendation systems. Since many such practical graphs do not fit in main memory, graph analytics performance depends on efficiently utilizing underlying storage devices. These out-of-core graph processing systems employ sharding and sub-graph partitioning to optimize for storage while relying on the efficient sequential access of traditional hard disks. However, today's storage is increasingly based on solid-state drives (SSDs) that exhibit high internal parallelism and efficient random accesses. Yet, state-of-the-art graph processing systems do not explicitly exploit those properties, resulting in subpar performance. In this paper, we develop CAVE, the first graph processing engine that optimally exploits underlying SSD-based storage by harnessing the available storage device parallelism, carefully selecting which I/Os to graph data can be issued concurrently. Thus, CAVE traverses multiple paths and processes multiple nodes and edges concurrently, achieving parallelization at a granular level. We identify two key ways to parallelize graph traversal algorithms based on the graph structure and algorithm: intra-subgraph and inter-subgraph parallelization. The former identifies subgraphs that contain vertices that can be accessed in parallel, while the latter identifies subgraphs that can be processed in their entirety in parallel. To showcase the benefit of our approach, we build within CAVE parallelized versions of five popular graph algorithms (Breadth-First Search, Depth-First Search, Weakly Connected Components, PageRank, Random Walk) that exploit the full bandwidth of the underlying device. CAVE uses a blocked file format based on adjacency lists and employs a concurrent cache pool that is essential to the parallelization of graph algorithms.
By experimenting with different types of graphs on three SSD devices, we demonstrate that CAVE utilizes the available parallelism, and scales to diverse real-world graph datasets. CAVE achieves up to one order of magnitude speedup compared to the popular out-of-core systems Mosaic and GridGraph, and up to three orders of magnitude speedup in runtime compared to GraphChi. | en_US |
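To make the intra-subgraph parallelization idea from the abstract concrete, the following is a minimal, hypothetical sketch (not CAVE's actual code): a level-synchronous BFS that issues the adjacency-list reads for all frontier vertices concurrently, so an SSD with high internal parallelism can serve them in parallel. The in-memory `GRAPH` dictionary, the `read_adjacency` helper, and the `io_depth` parameter are all illustrative stand-ins for the blocked on-disk adjacency-list format and concurrent I/O machinery the paper describes.

```python
# Hypothetical sketch of intra-subgraph parallelization for BFS.
# Adjacency lists of all frontier vertices are fetched with concurrent
# requests, mimicking how concurrent I/Os exploit SSD parallelism.
from concurrent.futures import ThreadPoolExecutor

# Toy "on-disk" graph: vertex -> adjacency list (stands in for blocked reads).
GRAPH = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}

def read_adjacency(v):
    # In a real engine this would be an asynchronous block read from the SSD.
    return GRAPH[v]

def parallel_bfs(source, io_depth=8):
    """Return a dict mapping each reachable vertex to its BFS level."""
    level = {source: 0}
    frontier, depth = [source], 0
    with ThreadPoolExecutor(max_workers=io_depth) as pool:
        while frontier:
            # Issue one concurrent read per frontier vertex; io_depth caps
            # the number of in-flight requests (the device's queue depth).
            adjacency_lists = list(pool.map(read_adjacency, frontier))
            depth += 1
            next_frontier = []
            for neighbors in adjacency_lists:
                for u in neighbors:
                    if u not in level:
                        level[u] = depth
                        next_frontier.append(u)
            frontier = next_frontier
    return level
```

Inter-subgraph parallelization would, by analogy, run such traversals over several independent subgraphs at once rather than parallelizing within a single frontier.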
dc.language.iso | en_US | |
dc.publisher | ACM | en_US |
dc.relation.ispartofseries | Proceedings of the ACM on Management of Data; | |
dc.rights | © 2024 Copyright held by the owner/author(s). This work is licensed under a Creative Commons Attribution 4.0 International License. This article has been published under a Read & Publish Transformative Open Access (OA) Agreement with ACM. | en_US |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | Hardware | en_US |
dc.subject | Information Systems | en_US |
dc.subject | Communication Hardware, Interfaces and Storage | en_US |
dc.subject | External Storage | en_US |
dc.subject | Data Management Systems | en_US |
dc.subject | Database Design and Models | en_US |
dc.subject | Graph-based Database Models | en_US |
dc.subject | Information Storage Systems | en_US |
dc.title | CAVE: concurrency-aware graph processing on SSDs | en_US |
dc.type | Article | en_US |
dc.identifier.doi | 10.1145/3654928 | |
This item appears in the following Collection(s)
BU Open Access Articles [6429]