[Test] Basic regression test setup. (#2415)

* add machine name
* update scripts
* update script
* test commit
* change run.sh
* model acc bench for gcn and sage
* get basic pipeline setup for local benchmarking
* try to bridge pytest with asv
* fix deps
* move asv to other folders
* move dir
* update script
* new setup
* delete useless file
* delete outputs
* remove dependency on pytest
* update script
* test commit
* stuck by torch version in dgl-ci-gpu
* update readme
* update asv conf
* missing files
* remove the old regression folder
* api bench
* add batch api bench

Co-authored-by: Jinjing Zhou <VoVAllen@users.noreply.github.com>
2026-06-03 19:34:33 +08:00 · 2020-12-15 14:35:15 +08:00
parent 8ff4798075
commit 6634b984f4
32 changed files with 806 additions and 356 deletions
--- a/benchmarks/.gitignore
+++ b/benchmarks/.gitignore
@@ -0,0 +1,2 @@
+html
+results
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -0,0 +1,117 @@
+DGL Benchmarks
+====
+
+Benchmarking DGL with Airspeed Velocity.
+
+Usage
+---
+
+Before beginning, ensure that airspeed velocity is installed:
+
+```bash
+pip install asv
+```
+
+To run all benchmarks locally, build the project first and then run:
+
+```bash
+asv run -n -e --python=same --verbose
+```
+
+Note that a local run does not write any benchmark results to disk.
+To change the device for benchmarking, set the `DGL_BENCH_DEVICE` environment variable.
+Any valid PyTorch device string is allowed.
+
+```bash
+export DGL_BENCH_DEVICE=cuda:0
+```
+
+DGL runs all benchmarks automatically in a docker container. To run all benchmarks in docker,
+use the `publish.sh` script. It accepts two arguments: a name identifying the test machine
+and a device name.
+
+```bash
+bash publish.sh dev-machine cuda:0
+```
+
+The script outputs two folders, `results` and `html`. The `html` folder contains the
+generated static web pages. View them with:
+
+```bash
+asv preview
+```
+
+
+Adding a new benchmark suite
+---
+
+The benchmark folder is organized as follows:
+
+```
+|-- benchmarks/
+  |-- model_acc/           # benchmarks for model accuracy
+    |-- bench_gcn.py
+    |-- bench_gat.py
+    |-- bench_sage.py
+    ...
+  |-- model_speed/         # benchmarks for model training speed
+    |-- bench_gat.py
+    |-- bench_sage.py
+    ...
+  ...                      # other types of benchmarks
+|-- html/                  # generated html files
+|-- results/               # generated result files
+|-- asv.conf.json          # asv config file
+|-- build_dgl_asv.sh       # script for building dgl in asv
+|-- install_dgl_asv.sh     # script for installing dgl in asv
+|-- publish.sh             # script for running benchmarks in docker
+|-- README.md              # this readme
+|-- run.sh                 # script for calling asv in docker
+|-- ...                    # other aux files
+```
+
+To add a new benchmark, pick a suitable benchmark type and create a python script under
+it. By convention, script names start with the prefix `bench_`. Here is a toy example:
+
+```python
+# bench_range.py
+
+import time
+from .. import utils
+
+@utils.benchmark('time')
+@utils.parametrize('l', [10, 100, 1000])
+@utils.parametrize('u', [10, 100, 1000])
+def track_time(l, u):
+    t0 = time.time()
+    for i in range(l, u):
+        pass
+    return time.time() - t0
+```
+
+* The main entry point of each benchmark script is a `track_*` function. The function
+  can have arbitrary arguments and must return the benchmark result.
+* There are two useful decorators: `utils.benchmark` and `utils.parametrize`.
+* `utils.benchmark` indicates the type of this benchmark. Currently supported types are
+  `'time'` and `'acc'`. The decorator sets the result unit and attaches a setup step
+  that fixes the random seeds before the benchmark runs.
+* `utils.parametrize` specifies the parameters to test.
+  Stacking multiple `parametrize` decorators benchmarks every combination of their values.
+* See `model_acc/bench_gcn.py` and `model_speed/bench_sage.py` for complete examples.
+* ASV's [official guide on writing benchmarks](https://asv.readthedocs.io/en/stable/writing_benchmarks.html)
+  is also very helpful.
+
+
+Tips
+----
+* Pass the flags `-e --verbose` to `asv run` to print stderr and more information. Use the
+  `--bench` flag to run specific benchmarks.
+* When running benchmarks locally (e.g., with `--python=same`), ASV will not write results to disk,
+  so `asv publish` will not generate plots.
+* When running benchmarks in docker, ASV pulls the code from the remote and builds it in a conda
+  environment. The repository to pull is determined by `origin`, so it works with forked repositories.
+  The branches are configured in `asv.conf.json`. To test the performance impact of your local
+  source code changes in docker, do the following before running `publish.sh`:
+    - Commit your local changes and push them to the remote `origin`.
+    - Add the corresponding branch to `asv.conf.json`.
+* Try to make your benchmarks compatible with all the versions being tested.
--- a/benchmarks/asv.conf.json
+++ b/benchmarks/asv.conf.json
@@ -5,10 +5,10 @@
    // The name of the project being benchmarked
    "project": "dgl",
    // The project's homepage
-    "project_url": "https://github.com/dmlc/dgl",
+    "project_url": "https://www.dgl.ai",
    // The URL or local path of the source code repository for the
    // project being benchmarked
-    "repo": ".",
+    "repo": "..",
    // The Python project's subdirectory in your repo.  If missing or
    // the empty string, the project is assumed to be located at the root
    // of the repository.
@@ -16,34 +16,29 @@
    // Customizable commands for building, installing, and
    // uninstalling the project. See asv.conf.json documentation.
    //
-    "install_command": [
-        "/bin/bash {build_dir}/tests/regression/install_dgl_asv.sh"
-    ],
    "build_command": [
-        "/bin/bash {build_dir}/tests/regression/build_dgl_asv.sh"
+        "/bin/bash {conf_dir}/build_dgl_asv.sh"
+    ],
+    "install_command": [
+        "/bin/bash {conf_dir}/install_dgl_asv.sh"
    ],
    "uninstall_command": [
-        "return-code=any python -mpip uninstall -y dgl"
+        "return-code=any python -m pip uninstall -y dgl"
    ],
-    // "build_command": [
-    //     "python setup.py build",
-    //     "PIP_NO_BUILD_ISOLATION=false python -mpip wheel --no-deps --no-index -w {build_cache_dir} {build_dir}"
-    // ],
    // List of branches to benchmark. If not provided, defaults to "master"
    // (for git) or "default" (for mercurial).
-    "branches": ["master"], // for git
-    // "branches": ["default"],    // for mercurial
+    "branches": ["master", "0.5.0", "0.5.2", "0.5.3", "0.4.3.post2"], // for git
    // The DVCS being used.  If not set, it will be automatically
    // determined from "repo" by looking at the protocol in the URL
    // (if remote), or by looking for special directories, such as
    // ".git" (if local).
-    // "dvcs": "git",
+    "dvcs": "git",
    // The tool to use to create environments.  May be "conda",
    // "virtualenv" or other value depending on the plugins in use.
    // If missing or the empty string, the tool will be automatically
    // determined by looking for tools on the PATH environment
    // variable.
-    // "environment_type": "conda",
+    "environment_type": "conda",
    // timeout in seconds for installing any dependencies in environment
    // defaults to 10 min
    "install_timeout": 600,
@@ -104,16 +99,16 @@
    // ],
    // The directory (relative to the current directory) that benchmarks are
    // stored in.  If not provided, defaults to "benchmarks"
-    "benchmark_dir": "tests/regression",
+    // "benchmark_dir": "benchmarks",
    // The directory (relative to the current directory) to cache the Python
    // environments in.  If not provided, defaults to "env"
-    "env_dir": ".asv/env",
+    "env_dir": "env",
    // The directory (relative to the current directory) that raw benchmark
    // results are stored in.  If not provided, defaults to "results".
-    "results_dir": "asv/results",
+    "results_dir": "results",
    // The directory (relative to the current directory) that the html tree
    // should be written to.  If not provided, defaults to "html".
-    "html_dir": "asv/html",
+    "html_dir": "html",
    // The number of characters to retain in the commit hashes.
    // "hash_length": 8,
    // `asv` will cache results of the recent builds in each
--- a/benchmarks/benchmarks/__init__.py
+++ b/benchmarks/benchmarks/__init__.py
--- a/benchmarks/benchmarks/api/__init__.py
+++ b/benchmarks/benchmarks/api/__init__.py
--- a/benchmarks/benchmarks/api/bench_batch.py
+++ b/benchmarks/benchmarks/api/bench_batch.py
@@ -0,0 +1,29 @@
+import time
+import dgl
+import torch
+
+from .. import utils
+
+@utils.benchmark('time')
+@utils.parametrize('batch_size', [4, 32, 256])
+def track_time(batch_size):
+    device = utils.get_bench_device()
+
+    # prepare graph
+    graphs = []
+    for i in range(batch_size):
+        u = torch.randint(20, (40,))
+        v = torch.randint(20, (40,))
+        graphs.append(dgl.graph((u, v)).to(device))
+
+    # dry run
+    for i in range(10):
+        g = dgl.batch(graphs)
+
+    # timing
+    t0 = time.time()
+    for i in range(100):
+        g = dgl.batch(graphs)
+    t1 = time.time()
+
+    return (t1 - t0) / 100
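The warm-up-then-measure pattern used in `bench_batch.py` can be factored into a small helper. The following is only a sketch, not part of this commit: `timed_average` and its `sync` hook are hypothetical names introduced for illustration. The hook is there because work submitted to an accelerator may still be in flight when the wall clock is read; whether that applies to `dgl.batch` is an assumption, not something this diff establishes.

```python
import time


def timed_average(fn, warmup=10, repeat=100, sync=None):
    """Hypothetical helper: return the average wall-clock seconds per call of fn().

    sync is an optional zero-argument callable that blocks until pending
    device work has finished; it is called before the clock starts and
    again before it stops.
    """
    if repeat <= 0:
        raise ValueError('repeat must be positive')

    # Dry run: absorb one-time costs such as lazy initialisation.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    # Timed section.
    t0 = time.perf_counter()
    for _ in range(repeat):
        fn()
    if sync is not None:
        sync()
    t1 = time.perf_counter()

    return (t1 - t0) / repeat
```

With such a helper, the body of `track_time` could reduce to something like `return timed_average(lambda: dgl.batch(graphs))`. On a CUDA device, `torch.cuda.synchronize` is one candidate for the `sync` argument.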
--- a/benchmarks/benchmarks/model_acc/__init__.py
+++ b/benchmarks/benchmarks/model_acc/__init__.py
--- a/benchmarks/benchmarks/model_acc/bench_gat.py
+++ b/benchmarks/benchmarks/model_acc/bench_gat.py
@@ -0,0 +1,98 @@
+import dgl
+from dgl.nn.pytorch import GATConv
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+from .. import utils
+
+class GAT(nn.Module):
+    def __init__(self,
+                 num_layers,
+                 in_dim,
+                 num_hidden,
+                 num_classes,
+                 heads,
+                 activation,
+                 feat_drop,
+                 attn_drop,
+                 negative_slope,
+                 residual):
+        super(GAT, self).__init__()
+        self.num_layers = num_layers
+        self.gat_layers = nn.ModuleList()
+        self.activation = activation
+        # input projection (no residual)
+        self.gat_layers.append(GATConv(
+            in_dim, num_hidden, heads[0],
+            feat_drop, attn_drop, negative_slope, False, self.activation))
+        # hidden layers
+        for l in range(1, num_layers):
+            # due to multi-head, the in_dim = num_hidden * num_heads
+            self.gat_layers.append(GATConv(
+                num_hidden * heads[l-1], num_hidden, heads[l],
+                feat_drop, attn_drop, negative_slope, residual, self.activation))
+        # output projection
+        self.gat_layers.append(GATConv(
+            num_hidden * heads[-2], num_classes, heads[-1],
+            feat_drop, attn_drop, negative_slope, residual, None))
+
+    def forward(self, g, inputs):
+        h = inputs
+        for l in range(self.num_layers):
+            h = self.gat_layers[l](g, h).flatten(1)
+        # output projection
+        logits = self.gat_layers[-1](g, h).mean(1)
+        return logits
+
+def evaluate(model, g, features, labels, mask):
+    model.eval()
+    with torch.no_grad():
+        logits = model(g, features)
+        logits = logits[mask]
+        labels = labels[mask]
+        _, indices = torch.max(logits, dim=1)
+        correct = torch.sum(indices == labels)
+        return correct.item() * 1.0 / len(labels) * 100
+
+@utils.benchmark('acc')
+@utils.parametrize('data', ['cora', 'pubmed'])
+def track_acc(data):
+    data = utils.process_data(data)
+    device = utils.get_bench_device()
+
+    g = data[0].to(device)
+
+    features = g.ndata['feat']
+    labels = g.ndata['label']
+    train_mask = g.ndata['train_mask']
+    val_mask = g.ndata['val_mask']
+    test_mask = g.ndata['test_mask']
+
+    in_feats = features.shape[1]
+    n_classes = data.num_labels
+
+    g = dgl.remove_self_loop(g)
+    g = dgl.add_self_loop(g)
+
+    # create model
+    model = GAT(1, in_feats, 8, n_classes, [8, 1], F.elu,
+                0.6, 0.6, 0.2, False)
+    loss_fcn = torch.nn.CrossEntropyLoss()
+
+    model = model.to(device)
+    model.train()
+
+    # optimizer
+    optimizer = torch.optim.Adam(model.parameters(),
+                                 lr=1e-2,
+                                 weight_decay=5e-4)
+    for epoch in range(200):
+        logits = model(g, features)
+        loss = loss_fcn(logits[train_mask], labels[train_mask])
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+
+    acc = evaluate(model, g, features, labels, test_mask)
+    return acc
--- a/benchmarks/benchmarks/model_acc/bench_gcn.py
+++ b/benchmarks/benchmarks/model_acc/bench_gcn.py
@@ -0,0 +1,91 @@
+import dgl
+from dgl.nn.pytorch import GraphConv
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+from .. import utils
+
+class GCN(nn.Module):
+    def __init__(self,
+                 in_feats,
+                 n_hidden,
+                 n_classes,
+                 n_layers,
+                 activation,
+                 dropout):
+        super(GCN, self).__init__()
+        self.layers = nn.ModuleList()
+        # input layer
+        self.layers.append(GraphConv(in_feats, n_hidden, activation=activation))
+        # hidden layers
+        for i in range(n_layers - 1):
+            self.layers.append(GraphConv(n_hidden, n_hidden, activation=activation))
+        # output layer
+        self.layers.append(GraphConv(n_hidden, n_classes))
+        self.dropout = nn.Dropout(p=dropout)
+
+    def forward(self, g, features):
+        h = features
+        for i, layer in enumerate(self.layers):
+            if i != 0:
+                h = self.dropout(h)
+            h = layer(g, h)
+        return h
+
+def evaluate(model, g, features, labels, mask):
+    model.eval()
+    with torch.no_grad():
+        logits = model(g, features)
+        logits = logits[mask]
+        labels = labels[mask]
+        _, indices = torch.max(logits, dim=1)
+        correct = torch.sum(indices == labels)
+        return correct.item() * 1.0 / len(labels) * 100
+
+@utils.benchmark('acc')
+@utils.parametrize('data', ['cora', 'pubmed'])
+def track_acc(data):
+    data = utils.process_data(data)
+    device = utils.get_bench_device()
+
+    g = data[0].to(device).int()
+
+    features = g.ndata['feat']
+    labels = g.ndata['label']
+    train_mask = g.ndata['train_mask']
+    val_mask = g.ndata['val_mask']
+    test_mask = g.ndata['test_mask']
+
+    in_feats = features.shape[1]
+    n_classes = data.num_labels
+
+    g = dgl.remove_self_loop(g)
+    g = dgl.add_self_loop(g)
+
+    # normalization
+    degs = g.in_degrees().float()
+    norm = torch.pow(degs, -0.5)
+    norm[torch.isinf(norm)] = 0
+    g.ndata['norm'] = norm.unsqueeze(1)
+
+    # create GCN model
+    model = GCN(in_feats, 16, n_classes, 1, F.relu, 0.5)
+    loss_fcn = torch.nn.CrossEntropyLoss()
+
+    model = model.to(device)
+    model.train()
+
+    # optimizer
+    optimizer = torch.optim.Adam(model.parameters(),
+                                 lr=1e-2,
+                                 weight_decay=5e-4)
+    for epoch in range(200):
+        logits = model(g, features)
+        loss = loss_fcn(logits[train_mask], labels[train_mask])
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+
+    acc = evaluate(model, g, features, labels, test_mask)
+    return acc
--- a/benchmarks/benchmarks/model_acc/bench_sage.py
+++ b/benchmarks/benchmarks/model_acc/bench_sage.py
@@ -0,0 +1,89 @@
+import dgl
+from dgl.nn.pytorch import SAGEConv
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+from .. import utils
+
+class GraphSAGE(nn.Module):
+    def __init__(self,
+                 in_feats,
+                 n_hidden,
+                 n_classes,
+                 n_layers,
+                 activation,
+                 dropout,
+                 aggregator_type):
+        super(GraphSAGE, self).__init__()
+        self.layers = nn.ModuleList()
+        self.dropout = nn.Dropout(dropout)
+        self.activation = activation
+
+        # input layer
+        self.layers.append(SAGEConv(in_feats, n_hidden, aggregator_type))
+        # hidden layers
+        for i in range(n_layers - 1):
+            self.layers.append(SAGEConv(n_hidden, n_hidden, aggregator_type))
+        # output layer
+        self.layers.append(SAGEConv(n_hidden, n_classes, aggregator_type)) # activation None
+
+    def forward(self, graph, inputs):
+        h = self.dropout(inputs)
+        for l, layer in enumerate(self.layers):
+            h = layer(graph, h)
+            if l != len(self.layers) - 1:
+                h = self.activation(h)
+                h = self.dropout(h)
+        return h
+
+def evaluate(model, g, features, labels, mask):
+    model.eval()
+    with torch.no_grad():
+        logits = model(g, features)
+        logits = logits[mask]
+        labels = labels[mask]
+        _, indices = torch.max(logits, dim=1)
+        correct = torch.sum(indices == labels)
+        return correct.item() * 1.0 / len(labels) * 100
+
+@utils.benchmark('acc')
+@utils.parametrize('data', ['cora', 'pubmed'])
+def track_acc(data):
+    data = utils.process_data(data)
+    device = utils.get_bench_device()
+
+    g = data[0].to(device)
+
+    features = g.ndata['feat']
+    labels = g.ndata['label']
+    train_mask = g.ndata['train_mask']
+    val_mask = g.ndata['val_mask']
+    test_mask = g.ndata['test_mask']
+
+    in_feats = features.shape[1]
+    n_classes = data.num_labels
+
+    g = dgl.remove_self_loop(g)
+    g = dgl.add_self_loop(g)
+
+    # create model
+    model = GraphSAGE(in_feats, 16, n_classes, 1, F.relu, 0.5, 'gcn')
+    loss_fcn = torch.nn.CrossEntropyLoss()
+
+    model = model.to(device)
+    model.train()
+
+    # optimizer
+    optimizer = torch.optim.Adam(model.parameters(),
+                                 lr=1e-2,
+                                 weight_decay=5e-4)
+    for epoch in range(200):
+        logits = model(g, features)
+        loss = loss_fcn(logits[train_mask], labels[train_mask])
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+
+    acc = evaluate(model, g, features, labels, test_mask)
+    return acc
--- a/benchmarks/benchmarks/model_speed/__init__.py
+++ b/benchmarks/benchmarks/model_speed/__init__.py
--- a/benchmarks/benchmarks/model_speed/bench_gat.py
+++ b/benchmarks/benchmarks/model_speed/bench_gat.py
@@ -0,0 +1,101 @@
+import time
+import dgl
+from dgl.nn.pytorch import GATConv
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+from .. import utils
+
+class GAT(nn.Module):
+    def __init__(self,
+                 num_layers,
+                 in_dim,
+                 num_hidden,
+                 num_classes,
+                 heads,
+                 activation,
+                 feat_drop,
+                 attn_drop,
+                 negative_slope,
+                 residual):
+        super(GAT, self).__init__()
+        self.num_layers = num_layers
+        self.gat_layers = nn.ModuleList()
+        self.activation = activation
+        # input projection (no residual)
+        self.gat_layers.append(GATConv(
+            in_dim, num_hidden, heads[0],
+            feat_drop, attn_drop, negative_slope, False, self.activation))
+        # hidden layers
+        for l in range(1, num_layers):
+            # due to multi-head, the in_dim = num_hidden * num_heads
+            self.gat_layers.append(GATConv(
+                num_hidden * heads[l-1], num_hidden, heads[l],
+                feat_drop, attn_drop, negative_slope, residual, self.activation))
+        # output projection
+        self.gat_layers.append(GATConv(
+            num_hidden * heads[-2], num_classes, heads[-1],
+            feat_drop, attn_drop, negative_slope, residual, None))
+
+    def forward(self, g, inputs):
+        h = inputs
+        for l in range(self.num_layers):
+            h = self.gat_layers[l](g, h).flatten(1)
+        # output projection
+        logits = self.gat_layers[-1](g, h).mean(1)
+        return logits
+
+@utils.benchmark('time')
+@utils.parametrize('data', ['cora', 'pubmed'])
+def track_time(data):
+    data = utils.process_data(data)
+    device = utils.get_bench_device()
+    num_epochs = 200
+
+    g = data[0].to(device)
+
+    features = g.ndata['feat']
+    labels = g.ndata['label']
+    train_mask = g.ndata['train_mask']
+    val_mask = g.ndata['val_mask']
+    test_mask = g.ndata['test_mask']
+
+    in_feats = features.shape[1]
+    n_classes = data.num_labels
+
+    g = dgl.remove_self_loop(g)
+    g = dgl.add_self_loop(g)
+
+    # create model
+    model = GAT(1, in_feats, 8, n_classes, [8, 1], F.elu,
+                0.6, 0.6, 0.2, False)
+    loss_fcn = torch.nn.CrossEntropyLoss()
+
+    model = model.to(device)
+    model.train()
+
+    # optimizer
+    optimizer = torch.optim.Adam(model.parameters(),
+                                 lr=1e-2,
+                                 weight_decay=5e-4)
+
+    # dry run
+    for epoch in range(10):
+        logits = model(g, features)
+        loss = loss_fcn(logits[train_mask], labels[train_mask])
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+
+    # timing
+    t0 = time.time()
+    for epoch in range(num_epochs):
+        logits = model(g, features)
+        loss = loss_fcn(logits[train_mask], labels[train_mask])
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+    t1 = time.time()
+
+    return t1 - t0
--- a/benchmarks/benchmarks/model_speed/bench_sage.py
+++ b/benchmarks/benchmarks/model_speed/bench_sage.py
@@ -0,0 +1,92 @@
+import time
+import dgl
+from dgl.nn.pytorch import SAGEConv
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+
+from .. import utils
+
+class GraphSAGE(nn.Module):
+    def __init__(self,
+                 in_feats,
+                 n_hidden,
+                 n_classes,
+                 n_layers,
+                 activation,
+                 dropout,
+                 aggregator_type):
+        super(GraphSAGE, self).__init__()
+        self.layers = nn.ModuleList()
+        self.dropout = nn.Dropout(dropout)
+        self.activation = activation
+
+        # input layer
+        self.layers.append(SAGEConv(in_feats, n_hidden, aggregator_type))
+        # hidden layers
+        for i in range(n_layers - 1):
+            self.layers.append(SAGEConv(n_hidden, n_hidden, aggregator_type))
+        # output layer
+        self.layers.append(SAGEConv(n_hidden, n_classes, aggregator_type)) # activation None
+
+    def forward(self, graph, inputs):
+        h = self.dropout(inputs)
+        for l, layer in enumerate(self.layers):
+            h = layer(graph, h)
+            if l != len(self.layers) - 1:
+                h = self.activation(h)
+                h = self.dropout(h)
+        return h
+
+@utils.benchmark('time')
+@utils.parametrize('data', ['cora', 'pubmed'])
+def track_time(data):
+    data = utils.process_data(data)
+    device = utils.get_bench_device()
+    num_epochs = 200
+
+    g = data[0].to(device)
+
+    features = g.ndata['feat']
+    labels = g.ndata['label']
+    train_mask = g.ndata['train_mask']
+    val_mask = g.ndata['val_mask']
+    test_mask = g.ndata['test_mask']
+
+    in_feats = features.shape[1]
+    n_classes = data.num_labels
+
+    g = dgl.remove_self_loop(g)
+    g = dgl.add_self_loop(g)
+
+    # create model
+    model = GraphSAGE(in_feats, 16, n_classes, 1, F.relu, 0.5, 'gcn')
+    loss_fcn = torch.nn.CrossEntropyLoss()
+
+    model = model.to(device)
+    model.train()
+
+    # optimizer
+    optimizer = torch.optim.Adam(model.parameters(),
+                                 lr=1e-2,
+                                 weight_decay=5e-4)
+
+    # dry run
+    for i in range(10):
+        logits = model(g, features)
+        loss = loss_fcn(logits[train_mask], labels[train_mask])
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+
+    # timing
+    t0 = time.time()
+    for epoch in range(num_epochs):
+        logits = model(g, features)
+        loss = loss_fcn(logits[train_mask], labels[train_mask])
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+    t1 = time.time()
+
+    return t1 - t0
--- a/benchmarks/benchmarks/utils.py
+++ b/benchmarks/benchmarks/utils.py
@@ -0,0 +1,88 @@
+import os
+import shutil, zipfile
+import requests
+import numpy as np
+import pandas
+import dgl
+import torch
+
+def _download(url, path, filename):
+    fn = os.path.join(path, filename)
+    if os.path.exists(fn):
+        return
+
+    os.makedirs(path, exist_ok=True)
+    f_remote = requests.get(url, stream=True)
+    sz = f_remote.headers.get('content-length')
+    assert f_remote.status_code == 200, 'fail to open {}'.format(url)
+    with open(fn, 'wb') as writer:
+        for chunk in f_remote.iter_content(chunk_size=1024*1024):
+            writer.write(chunk)
+    print('Download finished.')
+
+def get_livejournal():
+    _download('https://snap.stanford.edu/data/soc-LiveJournal1.txt.gz',
+              '/tmp', 'soc-LiveJournal1.txt.gz')
+    df = pandas.read_csv('/tmp/soc-LiveJournal1.txt.gz', sep='\t', skiprows=4, header=None,
+                         names=['src', 'dst'], compression='gzip')
+    src = np.array(df['src'])
+    dst = np.array(df['dst'])
+    print('construct the graph')
+    return dgl.DGLGraph((src, dst), readonly=True)
+
+def get_graph(name):
+    if name == 'livejournal':
+        return get_livejournal()
+    else:
+        print(name + " doesn't exist")
+        return None
+
+def process_data(name):
+    if name == 'cora':
+        return dgl.data.CoraGraphDataset()
+    elif name == 'pubmed':
+        return dgl.data.PubmedGraphDataset()
+    else:
+        raise ValueError('Invalid dataset name: {}'.format(name))
+
+def get_bench_device():
+    return os.environ.get('DGL_BENCH_DEVICE', 'cpu')
+
+def setup_track_time(*args, **kwargs):
+    # fix random seed
+    np.random.seed(42)
+    torch.random.manual_seed(42)
+
+def setup_track_acc(*args, **kwargs):
+    # fix random seed
+    np.random.seed(42)
+    torch.random.manual_seed(42)
+
+TRACK_UNITS = {
+    'time' : 's',
+    'acc' : '%',
+}
+
+TRACK_SETUP = {
+    'time' : setup_track_time,
+    'acc' : setup_track_acc,
+}
+
+def parametrize(param_name, params):
+    def _wrapper(func):
+        if getattr(func, 'params', None) is None:
+            func.params = []
+        func.params.append(params)
+        if getattr(func, 'param_names', None) is None:
+            func.param_names = []
+        func.param_names.append(param_name)
+        return func
+    return _wrapper
+
+def benchmark(track_type):
+    assert track_type in ['time', 'acc']
+    def _wrapper(func):
+        func.unit = TRACK_UNITS[track_type]
+        func.setup = TRACK_SETUP[track_type]
+        return func
+    return _wrapper
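To make the mechanics of `parametrize` concrete, here is a minimal standalone sketch. It copies the append-based decorator from `utils.py` and adds a hypothetical helper, `expand`, that lists the Cartesian product of the stored parameter lists. The assumption is that ASV calls the benchmark once per combination and passes the values positionally in `params` order. `track_range` and its parameter values are made up for the illustration.

```python
import itertools


def parametrize(param_name, params):
    # Same logic as utils.parametrize: append to attributes on the function.
    def _wrapper(func):
        if getattr(func, 'params', None) is None:
            func.params = []
        func.params.append(params)
        if getattr(func, 'param_names', None) is None:
            func.param_names = []
        func.param_names.append(param_name)
        return func
    return _wrapper


@parametrize('l', [10, 100])
@parametrize('u', [1000, 10000])
def track_range(l, u):
    return len(range(l, u))


def expand(func):
    """Hypothetical helper: every argument tuple in the Cartesian product."""
    return list(itertools.product(*func.params))


# Decorators are applied bottom-up, so 'u' is recorded before 'l'.
print(track_range.param_names)
print(expand(track_range))
```

Because decorators apply from the innermost outward, the recorded order here is `['u', 'l']`, the reverse of the function signature. If values are passed positionally, `l` would receive the `u` values and vice versa. Listing the decorators in reverse signature order, or inserting at index 0 instead of appending, are two possible ways to keep the orders aligned.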
--- a/benchmarks/build_dgl_asv.sh
+++ b/benchmarks/build_dgl_asv.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+set -e
+
+. /opt/conda/etc/profile.d/conda.sh
+
+# build
+CMAKE_VARS="-DUSE_CUDA=ON"
+mkdir -p build
+pushd build
+cmake $CMAKE_VARS ..
+make -j
+popd
--- a/benchmarks/install_dgl_asv.sh
+++ b/benchmarks/install_dgl_asv.sh
@@ -0,0 +1,15 @@
+#!/bin/bash
+
+set -e
+
+. /opt/conda/etc/profile.d/conda.sh
+
+pip install -r /asv/torch_gpu_pip.txt
+pip install pandas
+
+# install
+pushd python
+rm -rf build *.egg-info dist
+pip uninstall -y dgl
+python3 setup.py install
+popd
--- a/benchmarks/publish.sh
+++ b/benchmarks/publish.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+
+if [ $# -eq 2 ]; then
+    MACHINE=$1
+    DEVICE=$2
+else
+    echo "publish.sh <machine_name> <device>"
+    exit 1
+fi
+
+WS_ROOT=/asv/dgl
+
+docker run --name dgl-reg                   \
+           --rm --runtime=nvidia            \
+           --hostname=$MACHINE -dit dgllib/dgl-ci-gpu:conda /bin/bash
+docker exec dgl-reg mkdir -p $WS_ROOT
+docker cp ../.git dgl-reg:$WS_ROOT
+docker cp . dgl-reg:$WS_ROOT/benchmarks/
+docker cp torch_gpu_pip.txt dgl-reg:/asv
+docker exec dgl-reg bash $WS_ROOT/benchmarks/run.sh $DEVICE
+docker cp dgl-reg:$WS_ROOT/benchmarks/results .
+docker cp dgl-reg:$WS_ROOT/benchmarks/html .
+docker stop dgl-reg
--- a/benchmarks/run.sh
+++ b/benchmarks/run.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+
+set -e
+
+DEVICE=$1
+ROOT=/asv/dgl
+
+. /opt/conda/etc/profile.d/conda.sh
+
+conda activate base
+pip install --upgrade pip
+pip install asv
+pip uninstall -y dgl
+
+export DGL_BENCH_DEVICE=$DEVICE
+pushd $ROOT/benchmarks
+cat asv.conf.json
+asv machine --yes
+asv run
+asv publish
+popd
--- a/benchmarks/torch_gpu_pip.txt
+++ b/benchmarks/torch_gpu_pip.txt
@@ -0,0 +1,13 @@
+--find-links https://download.pytorch.org/whl/torch_stable.html
+torch==1.5.1+cu101
+torchvision==0.6.1+cu101
+pytest
+nose
+numpy
+cython
+scipy
+networkx
+matplotlib
+nltk
+requests[security]
+tqdm  
--- a/tests/regression/README.md
+++ b/tests/regression/README.md
@@ -1,27 +0,0 @@
-How to add test to regression
-=================================
-
-Official link to [asv](https://asv.readthedocs.io/en/stable/writing_benchmarks.html)
-
-
-## Add test
-
-DGL reuses the ci docker image for the regression test. There are four conda envs, base, mxnet-ci, pytorch-ci, and tensorflow-ci.
-
-The basic use is execute a script, and get the needed results out of the printed results.
-
- Create a new file in the tests/regression/
- Follow the example `bench_gcn.py` or the [official instruction](https://asv.readthedocs.io/en/stable/writing_benchmarks.html)
-  - function name starts with `track` will be used to generate the stats, by the return value
-  - setup function would be execute every time before running track function
-  - Can use params to pass parameter into `setup` and `track_` functions
-
-## Run locally
-
-The default regression branch in asv is `master`. If you need to run on other branch on your fork, please change the `branches` value in the `asv.conf.json` at the root of your repo.
-
-```bash
-bash ./publish.sh <repo> <branch>
-```
-
-The running result will be at `./asv_data/`. You can use `python -m http.server` inside the `html` folder to start a server to see the result
--- a/tests/regression/__init__.py
+++ b/tests/regression/__init__.py
@@ -1 +0,0 @@
-
--- a/tests/regression/asv_data/README.md
+++ b/tests/regression/asv_data/README.md
@@ -1 +0,0 @@
-Empty folder for asv data place holder
--- a/tests/regression/bench_gcn.py
+++ b/tests/regression/bench_gcn.py
@@ -1,61 +0,0 @@
-# Write the benchmarking functions here.
-# See "Writing benchmarks" in the asv docs for more information.
-
-import subprocess
-import os
-from pathlib import Path
-import numpy as np
-import tempfile
-
-base_path = Path("~/regression/dgl/")
-
-
-class GCNBenchmark:
-
-    params = [['pytorch'], ['cora', 'pubmed'], ['0', '-1']]
-    param_names = ['backend', 'dataset', 'gpu_id']
-    timeout = 120
-
-    def __init__(self):
-        self.std_log = {}
-
-    def setup(self, backend, dataset, gpu_id):
-        key_name = "{}_{}_{}".format(backend, dataset, gpu_id)
-        if key_name in self.std_log:
-            return
-        gcn_path = base_path / "examples/{}/gcn/train.py".format(backend)
-        bashCommand = "/opt/conda/envs/{}-ci/bin/python {} --dataset {} --gpu {} --n-epochs 50".format(
-            backend, gcn_path.expanduser(), dataset, gpu_id)
-        process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE,env=dict(os.environ, DGLBACKEND=backend))
-        output, error = process.communicate()
-        print(str(error))
-        self.std_log[key_name] = str(output)
-
-
-    def track_gcn_time(self, backend, dataset, gpu_id):
-        key_name = "{}_{}_{}".format(backend, dataset, gpu_id)
-        lines = self.std_log[key_name].split("\\n")
-
-        time_list = []
-        for line in lines:
-            # print(line)
-            if 'Time' in line:
-                time_str = line.strip().split('|')[1]
-                time = float(time_str.split()[-1])
-                time_list.append(time)
-        return np.array(time_list)[-10:].mean()
-
-    def track_gcn_accuracy(self, backend, dataset, gpu_id):
-        key_name = "{}_{}_{}".format(backend, dataset, gpu_id)
-        lines = self.std_log[key_name].split("\\n")
-
-        test_acc = -1
-        for line in lines:
-            if 'Test accuracy' in line:
-                test_acc = float(line.split()[-1][:-1])
-                print(test_acc)
-        return test_acc
-
-
-GCNBenchmark.track_gcn_time.unit = 's'
-GCNBenchmark.track_gcn_accuracy.unit = '%'
--- a/tests/regression/bench_partition.py
+++ b/tests/regression/bench_partition.py
@@ -1,49 +0,0 @@
-# Write the benchmarking functions here.
-# See "Writing benchmarks" in the asv docs for more information.
-
-import subprocess
-import os
-from pathlib import Path
-import numpy as np
-import tempfile
-
-base_path = Path("~/regression/dgl/")
-
-class PartitionBenchmark:
-
-    params = [['pytorch'], ['livejournal']]
-    param_names = ['backend', 'dataset']
-    timeout = 600
-
-    def __init__(self):
-        self.std_log = {}
-
-    def setup(self, backend, dataset):
-        key_name = "{}_{}".format(backend, dataset)
-        if key_name in self.std_log:
-            return
-        bench_path = base_path / "tests/regression/benchmarks/partition.py"
-        bashCommand = "/opt/conda/envs/{}-ci/bin/python {} --dataset {}".format(
-            backend, bench_path.expanduser(), dataset)
-        process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE,env=dict(os.environ, DGLBACKEND=backend))
-        output, error = process.communicate()
-        print(str(error))
-        self.std_log[key_name] = str(output)
-
-
-    def track_partition_time(self, backend, dataset):
-        key_name = "{}_{}".format(backend, dataset)
-        lines = self.std_log[key_name].split("\\n")
-
-        time_list = []
-        for line in lines:
-            # print(line)
-            if 'Time:' in line:
-                time_str = line.strip().split(' ')[1]
-                time = float(time_str)
-                time_list.append(time)
-        return np.array(time_list).mean()
-
-
-PartitionBenchmark.track_partition_time.unit = 's'
-
--- a/tests/regression/bench_sage.py
+++ b/tests/regression/bench_sage.py
@@ -1,57 +0,0 @@
-# Write the benchmarking functions here.
-# See "Writing benchmarks" in the asv docs for more information.
-
-import subprocess
-import os
-from pathlib import Path
-import numpy as np
-import tempfile
-
-base_path = Path("~/regression/dgl/")
-
-
-class SAGEBenchmark:
-
-    params = [['pytorch'], ['0']]
-    param_names = ['backend', 'gpu']
-    timeout = 1800
-
-    def __init__(self):
-        self.std_log = {}
-
-    def setup(self, backend, gpu):
-        key_name = "{}_{}".format(backend, gpu)
-        if key_name in self.std_log:
-            return
-        run_path = base_path / "examples/{}/graphsage/train_sampling.py".format(backend)
-        bashCommand = "/opt/conda/envs/{}-ci/bin/python {} --num-workers=2 --num-epochs=16 --gpu={}".format(
-            backend, run_path.expanduser(), gpu)
-        process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE,env=dict(os.environ, DGLBACKEND=backend))
-        output, error = process.communicate()
-        print(str(error))
-        self.std_log[key_name] = str(output)
-
-
-    def track_sage_time(self, backend, gpu):
-        key_name = "{}_{}".format(backend, gpu)
-        lines = self.std_log[key_name].split("\\n")
-        time_list = []
-        for line in lines:
-            if line.startswith('Epoch Time'):
-                time_str = line.strip()[15:]
-                time_list.append(float(time_str))
-        return np.array(time_list).mean()
-
-    def track_sage_accuracy(self, backend, gpu):
-        key_name = "{}_{}".format(backend, gpu)
-        lines = self.std_log[key_name].split("\\n")
-        test_acc = 0.
-        for line in lines:
-            if line.startswith('Eval Acc'):
-                acc_str = line.strip()[9:]
-                test_acc = float(acc_str)
-        return test_acc * 100
-
-
-SAGEBenchmark.track_sage_time.unit = 's'
-SAGEBenchmark.track_sage_accuracy.unit = '%'
--- a/tests/regression/benchmarks/partition.py
+++ b/tests/regression/benchmarks/partition.py
@@ -1,17 +0,0 @@
-import dgl
-from dgl import distributed as dgl_distributed
-import argparse, time
-from utils import get_graph
-
-parser = argparse.ArgumentParser(description='partition')
-parser.add_argument("--dataset", type=str, default='livejournal',
-                    help="specify the graph for partitioning")
-parser.add_argument("--num_parts", type=int, default=16,
-                    help="the number of partitions")
-args = parser.parse_args()
-
-g = get_graph(args.dataset)
-print('{}: |V|={}, |E|={}'.format(args.dataset, g.number_of_nodes(), g.number_of_edges()))
-start = time.time()
-dgl_distributed.partition_graph(g, args.dataset, args.num_parts, '/tmp', num_hops=1, part_method="metis")
-print('Time: {} seconds'.format(time.time() - start))
--- a/tests/regression/benchmarks/utils.py
+++ b/tests/regression/benchmarks/utils.py
@@ -1,37 +0,0 @@
-import os
-import shutil, zipfile
-import requests
-import numpy as np
-import pandas
-import dgl
-
-def _download(url, path, filename):
-    fn = os.path.join(path, filename)
-    if os.path.exists(fn):
-        return
-
-    os.makedirs(path, exist_ok=True)
-    f_remote = requests.get(url, stream=True)
-    sz = f_remote.headers.get('content-length')
-    assert f_remote.status_code == 200, 'fail to open {}'.format(url)
-    with open(fn, 'wb') as writer:
-        for chunk in f_remote.iter_content(chunk_size=1024*1024):
-            writer.write(chunk)
-    print('Download finished.')
-
-def get_livejournal():
-    _download('https://snap.stanford.edu/data/soc-LiveJournal1.txt.gz',
-              '/tmp', 'soc-LiveJournal1.txt.gz')
-    df = pandas.read_csv('/tmp/soc-LiveJournal1.txt.gz', sep='\t', skiprows=4, header=None,
-                         names=['src', 'dst'], compression='gzip')
-    src = np.array(df['src'])
-    dst = np.array(df['dst'])
-    print('construct the graph')
-    return dgl.DGLGraph((src, dst), readonly=True)
-
-def get_graph(name):
-    if name == 'livejournal':
-        return get_livejournal()
-    else:
-        print(name + " doesn't exist")
-        return None
--- a/tests/regression/build_dgl_asv.sh
+++ b/tests/regression/build_dgl_asv.sh
@@ -1,10 +0,0 @@
-mkdir build
-
-CMAKE_VARS="-DUSE_CUDA=ON"
-
-rm -rf _download
-
-pushd build
-cmake $CMAKE_VARS ..
-make -j4
-popd
--- a/tests/regression/install_dgl_asv.sh
+++ b/tests/regression/install_dgl_asv.sh
@@ -1,22 +0,0 @@
-#!/bin/bash
-
-set -e
-
-python -m pip install numpy
-
-. /opt/conda/etc/profile.d/conda.sh
-
-pushd python
-for backend in pytorch mxnet tensorflow
-do 
-conda activate "${backend}-ci"
-rm -rf build *.egg-info dist
-pip uninstall -y dgl
-# test install
-python3 setup.py install
-# test inplace build (for cython)
-python3 setup.py build_ext --inplace
-python3 -m pip install -r /root/requirement.txt
-done
-popd
-conda deactivate
--- a/tests/regression/publish.sh
+++ b/tests/regression/publish.sh
@@ -1,21 +0,0 @@
-#!/bin/bash
-
-set -x
-
-if [ $# -ne 2 ]; then
-    REPO=dmlc
-    BRANCH=master
-else
-    REPO=$1
-    BRANCH=$2
-fi
-
-docker run --name dgl-reg --rm --runtime=nvidia --hostname=reg-machine -dit dgllib/dgl-ci-gpu:conda /bin/bash
-docker cp ./asv_data dgl-reg:/root/asv_data/
-docker cp ./run.sh dgl-reg:/root/run.sh
-docker cp ./requirement.txt dgl-reg:/root/requirement.txt
-docker exec dgl-reg bash /root/run.sh $REPO $BRANCH
-docker cp dgl-reg:/root/regression/dgl/asv/. ./asv_data/
-docker stop dgl-reg
-
-
--- a/tests/regression/requirement.txt
+++ b/tests/regression/requirement.txt
@@ -1 +0,0 @@
-pandas
--- a/tests/regression/run.sh
+++ b/tests/regression/run.sh
@@ -1,33 +0,0 @@
-#!/bin/bash
-set -e
-
-if [ $# -ne 2 ]; then
-    echo "run.sh <repo> <branch>"
-    exit 1
-fi
-
-REPO=$1
-BRANCH=$2
-
-. /opt/conda/etc/profile.d/conda.sh
-
-cd ~
-mkdir regression
-cd regression
-# git config core.filemode false
-git clone --recursive https://github.com/$REPO/dgl.git 
-cd dgl
-git checkout $BRANCH
-mkdir asv
-cp -r ~/asv_data/* asv/
-
-conda activate base
-pip install --upgrade pip
-pip install asv numpy
-
-export DGL_LIBRARY_PATH="~/dgl/build"
-
-conda activate base
-asv machine --yes
-asv run
-asv publish