Files

Overview

This project demonstrates how to use GraphBolt to train and evaluate a GraphSAGE model for node classification task on large graphs, where node features are on-disk and fetched using DiskBasedFeature. GraphBolt utilizes various in-house implemented caching policy algorithms such as SIEVE, S3-FIFO, LRU and CLOCK to cache frequently required features and io_uring to fetch cache-missed features from disk. The SIEVE algorithm is the default option.

Node classification task

This example demonstrates how to run node classification task with GraphBolt.DiskBasedFeature. All results are collected on an AWS EC2 g5.8xlarge instance with 128GB RAM, 32 cores, an 24GB A10G GPU and a instance storage of 250K IOPS.

Run on ogbn-papers100M dataset

Dataset Graph Size Feature Size Feature Dim
ogbn-papers100M 13 GB 53 GB 128

Results with various caching policies

This part trains a three-layer GraphSAGE model for 3 epochs on ogbn-papers100M dataset with 10GB CPU cache, using neighbor sampling.

Run default SIEVE policy

Instruction:

python node_classification.py --gpu-cache-size-in-gigabytes=0 --cpu-cache-size-in-gigabytes=10 --dataset=ogbn-papers100M --epochs=3

Result:

Training: 1178it [03:00,  6.53it/s, num_nodes=671260, gpu_cache_miss=1, cpu_cache_miss=0.0578]                                             
Evaluating: 123it [00:16,  7.47it/s, num_nodes=624816, gpu_cache_miss=1, cpu_cache_miss=0.0569]
Epoch 00, Loss: 1.4173, Approx. Train: 0.5787, Approx. Val: 0.6353, Time: 180.33928060531616s                                              
Training: 1178it [01:39, 11.79it/s, num_nodes=648380, gpu_cache_miss=1, cpu_cache_miss=0.0451]                                             
Evaluating: 123it [00:15,  7.90it/s, num_nodes=625373, gpu_cache_miss=1, cpu_cache_miss=0.0451]
Epoch 01, Loss: 1.1446, Approx. Train: 0.6386, Approx. Val: 0.6382, Time: 99.92613315582275s                                               
Training: 1178it [01:36, 12.15it/s, num_nodes=674194, gpu_cache_miss=1, cpu_cache_miss=0.0408]                                             
Evaluating: 123it [00:15,  8.08it/s, num_nodes=628233, gpu_cache_miss=1, cpu_cache_miss=0.0409]
Epoch 02, Loss: 1.0975, Approx. Train: 0.6507, Approx. Val: 0.6535, Time: 96.95083212852478s

Performance Comparison on four caching polices

Below results demonstrate the epoch time with four different caching policies.

Policy Epoch 1 (s) Epoch 2 (s) Epoch 3 (s)
SIEVE 180.339 99.926 96.951
S3-FiFO 181.438 110.054 108.310
LRU 194.583 138.352 138.369
CLOCK 188.915 129.372 129.388

Results with Layer-Neighbor Sampling

This part trains a three-layer GraphSAGE model for 3 epochs on ogbn-papers100M dataset with 10GB CPU cache, using Layer-Neighbor Sampling and default SIEVE policy.

Run default --batch-dependency=1

Instruction:

python node_classification.py --gpu-cache-size-in-gigabytes=0 --cpu-cache-size-in-gigabytes=10 --dataset=ogbn-papers100M --sample-mode=sample_layer_neighbor --batch-dependency=1 --epochs=3

Result:

Training: 1178it [02:51,  6.88it/s, num_nodes=463495, gpu_cache_miss=1, cpu_cache_miss=0.0774]                                             
Evaluating: 123it [00:15,  7.94it/s, num_nodes=465592, gpu_cache_miss=1, cpu_cache_miss=0.0762]
Epoch 00, Loss: 1.4173, Approx. Train: 0.5774, Approx. Val: 0.6300, Time: 171.11454963684082s                                              
Training: 1178it [01:34, 12.43it/s, num_nodes=474446, gpu_cache_miss=1, cpu_cache_miss=0.0604]                                             
Evaluating: 123it [00:14,  8.45it/s, num_nodes=462042, gpu_cache_miss=1, cpu_cache_miss=0.0603]
Epoch 01, Loss: 1.1463, Approx. Train: 0.6384, Approx. Val: 0.6395, Time: 94.7821741104126s                                                
Training: 1178it [01:31, 12.82it/s, num_nodes=479331, gpu_cache_miss=1, cpu_cache_miss=0.0545]                                             
Evaluating: 123it [00:14,  8.67it/s, num_nodes=463628, gpu_cache_miss=1, cpu_cache_miss=0.0546]
Epoch 02, Loss: 1.1000, Approx. Train: 0.6501, Approx. Val: 0.6516, Time: 91.8746063709259s

Performance Comparison on different --batch-dependency

batch-dependency Epoch 1 (s) Epoch 2 (s) Epoch 3 (s)
1 171.114 94.782 91.875
64 144.241 78.749 75.270
4096 92.494 56.111 57.647

Effect of --layer-dependency

Below results demonstrate the effect of enabling --layer-dependency on epoch time when setting --batch-dependency=1.

layer-dependency Epoch 1 (s) Epoch 2 (s) Epoch 3 (s)
False 171.114 94.782 91.875
True 159.625 86.209 83.171

Compared to In-mem Performance

This part trains a three-layer GraphSAGE model for 3 epochs on ogbn-papers100M dataset with 20GB CPU cache and 5GB GPU cache, using neighbor sampling. We compare it to the in-mem performance with 5GB GPU cache. Following result demonstrates that with sufficient cache memory, the performance of DiskBasedFeature is not bottlenecked by the cache itself and comparable with in-memory feature stores. Note that the first epoch of training initiates the cache, thus taking longer time.

Instruction:

python node_classification.py --gpu-cache-size-in-gigabytes=5 --cpu-cache-size-in-gigabytes=20 --dataset=ogbn-papers100M --epochs=3

Result:

Feature Store Epoch 1 (s) Epoch 2 (s) Epoch 3 (s)
DiskBasedFeature 143.761 32.018 31.889
In-memory 28.861 28.330 28.305