[DGL-Go] Change name to dglgo (#3778)

* add
* remove
* fix
* rework the readme and some changes
* add png
* update png
* add recipe get

Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Quan (Andy) Gan <coin2028@hotmail.com>
2026-06-03 19:34:33 +08:00 · 2022-02-28 10:40:01 +08:00
parent d41d07d0f6
commit 266b21e535
58 changed files with 657 additions and 330 deletions
--- a/dglgo/README.md
+++ b/dglgo/README.md
@@ -0,0 +1,397 @@
+# DGL-Go
+
+
+DGL-Go is a command line tool for users to get started with training, using and
+studying Graph Neural Networks (GNNs). Data scientists can quickly apply GNNs
+to their problems, whereas researchers will find it useful to customize their
+experiments.
+
+
+## Installation and getting started
+
+DGL-Go requires DGL v0.8+, so please make sure DGL is up to date.
+Install DGL-Go with `pip install dglgo`, then type `dgl` in your console:
+```
+Usage: dgl [OPTIONS] COMMAND [ARGS]...
+
+Options:
+  --help  Show this message and exit.
+
+Commands:
+  configure  Generate a configuration file
+  export     Export a runnable python script
+  recipe     Get example recipes
+  train      Launch training
+```
+
+![img](./dglgo.png)
+
+Using DGL-Go is as easy as three steps:
+
+1. Use `dgl configure` to pick the task, dataset and model of interest. It generates
+   a configuration file for later use. You could also use `dgl recipe get` to retrieve
+   a configuration file we provide.
+1. Use `dgl train` to launch training according to the configuration and see the results.
+1. Use `dgl export` to generate a *self-contained, reproducible* Python script for advanced
+   customization, or try the model on custom data stored in CSV format.
+
+Next, we will walk through all these steps one-by-one.
+
+## Training GraphSAGE for node classification on Cora
+
+Let's use one of the most classical setups -- training a GraphSAGE model for node
+classification on the Cora citation graph dataset as an
+example.
+
+### Step 1: `dgl configure`
+
+First, use `dgl configure` to generate a YAML configuration file.
+
+```
+dgl configure nodepred --data cora --model sage --cfg cora_sage.yaml
+```
+
+Note that `nodepred` is the name of a DGL-Go *pipeline*. For now, you can think of
+a pipeline as a training task: `nodepred` is for the node prediction task; other
+options include `linkpred` for the link prediction task, etc. The command will
+generate a configuration file `cora_sage.yaml` which includes:
+
+* Options for the selected dataset (i.e., `cora` here).
+* Model hyperparameters (e.g., number of layers, hidden size, etc.).
+* Training hyperparameters (e.g., learning rate, loss function, etc.).
+
+Different choices of task, model and dataset may give very different options,
+so DGL-Go also adds a comment in the file explaining what each option does.
+At this point you can also change options to explore potential improvements.
+
+The configuration file generated by the command above is shown below.
+
+```yaml
+version: 0.0.1
+pipeline_name: nodepred
+device: cpu
+data:
+  name: cora
+  split_ratio:                # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
+model:
+  name: sage
+  embed_size: -1              # The dimension of created embedding table. -1 means using original node embedding
+  hidden_size: 16             # Hidden size.
+  num_layers: 1               # Number of hidden layers.
+  activation: relu            # Activation function name under torch.nn.functional
+  dropout: 0.5                # Dropout rate.
+  aggregator_type: gcn        # Aggregator type to use (``mean``, ``gcn``, ``pool``, ``lstm``).
+general_pipeline:
+  early_stop:
+    patience: 20              # Steps before early stop
+    checkpoint_path: checkpoint.pth # Early stop checkpoint model file path
+  num_epochs: 200             # Number of training epochs
+  eval_period: 5              # Interval epochs between evaluations
+  optimizer:
+    name: Adam
+    lr: 0.01
+    weight_decay: 0.0005
+  loss: CrossEntropyLoss
+  save_path: model.pth        # Path to save the model
+  num_runs: 1                 # Number of experiments to run
+```
+
+Apart from `dgl configure`, you could also get one of DGL-Go's built-in configuration files
+(called *recipes*) using `dgl recipe`. Two of its sub-commands are used here:
+
+```
+dgl recipe list
+```
+
+will list the available recipes:
+
+```
+➜ dgl recipe list     
+===============================================================================
+| Filename                       |  Pipeline           | Dataset              |
+===============================================================================
+| linkpred_citation2_sage.yaml   |  linkpred           | ogbl-citation2       |
+| linkpred_collab_sage.yaml      |  linkpred           | ogbl-collab          |
+| nodepred_citeseer_sage.yaml    |  nodepred           | citeseer             |
+| nodepred_citeseer_gcn.yaml     |  nodepred           | citeseer             |
+| nodepred-ns_arxiv_gcn.yaml     |  nodepred-ns        | ogbn-arxiv           |
+| nodepred_cora_gat.yaml         |  nodepred           | cora                 |
+| nodepred_pubmed_sage.yaml      |  nodepred           | pubmed               |
+| linkpred_cora_sage.yaml        |  linkpred           | cora                 |
+| nodepred_pubmed_gcn.yaml       |  nodepred           | pubmed               |
+| nodepred_pubmed_gat.yaml       |  nodepred           | pubmed               |
+| nodepred_cora_gcn.yaml         |  nodepred           | cora                 |
+| nodepred_cora_sage.yaml        |  nodepred           | cora                 |
+| nodepred_citeseer_gat.yaml     |  nodepred           | citeseer             |
+| nodepred-ns_product_sage.yaml  |  nodepred-ns        | ogbn-products        |
+===============================================================================
+```
+
+Then use
+
+```
+dgl recipe get nodepred_cora_sage.yaml
+```
+
+to copy the YAML configuration file to your local folder.
+
+### Step 2: `dgl train`
+
+Simply running `dgl train --cfg cora_sage.yaml` will start the training process.
+```log
+...
+Epoch 00190 | Loss 1.5225 | TrainAcc 0.9500 | ValAcc 0.6840
+Epoch 00191 | Loss 1.5416 | TrainAcc 0.9357 | ValAcc 0.6840
+Epoch 00192 | Loss 1.5391 | TrainAcc 0.9357 | ValAcc 0.6840
+Epoch 00193 | Loss 1.5257 | TrainAcc 0.9643 | ValAcc 0.6840
+Epoch 00194 | Loss 1.5196 | TrainAcc 0.9286 | ValAcc 0.6840
+EarlyStopping counter: 12 out of 20
+Epoch 00195 | Loss 1.4862 | TrainAcc 0.9643 | ValAcc 0.6760
+Epoch 00196 | Loss 1.5142 | TrainAcc 0.9714 | ValAcc 0.6760
+Epoch 00197 | Loss 1.5145 | TrainAcc 0.9714 | ValAcc 0.6760
+Epoch 00198 | Loss 1.5174 | TrainAcc 0.9571 | ValAcc 0.6760
+Epoch 00199 | Loss 1.5235 | TrainAcc 0.9714 | ValAcc 0.6760
+Test Accuracy 0.7740
+Accuracy across 1 runs: 0.774 ± 0.0
+```
+
+That's all! Basically you only need two commands to train a graph neural network.
+
+### Step 3: `dgl export` for more advanced customization
+
+That's not everything yet. You may want to open the hood and invoke deeper
+customization. DGL-Go can export a **self-contained, reproducible** Python
+script for you to do anything you like. 
+
+Try `dgl export --cfg cora_sage.yaml --output script.py`,
+and you'll get the script used to train the model. Here's the code snippet:
+
+```python
+...
+
+class GraphSAGE(nn.Module):
+    def __init__(self,
+                 data_info: dict,
+                 embed_size: int = -1,
+                 hidden_size: int = 16,
+                 num_layers: int = 1,
+                 activation: str = "relu",
+                 dropout: float = 0.5,
+                 aggregator_type: str = "gcn"):
+        """GraphSAGE model
+
+        Parameters
+        ----------
+        data_info : dict
+            The information about the input dataset.
+        embed_size : int
+            The dimension of created embedding table. -1 means using original node embedding
+        hidden_size : int
+            Hidden size.
+        num_layers : int
+            Number of hidden layers.
+        dropout : float
+            Dropout rate.
+        activation : str
+            Activation function name under torch.nn.functional
+        aggregator_type : str
+            Aggregator type to use (``mean``, ``gcn``, ``pool``, ``lstm``).
+        """
+        super(GraphSAGE, self).__init__()
+        self.data_info = data_info
+        self.embed_size = embed_size
+        if embed_size > 0:
+            self.embed = nn.Embedding(data_info["num_nodes"], embed_size)
+            in_size = embed_size
+        else:
+            in_size = data_info["in_size"]
+        self.layers = nn.ModuleList()
+        self.dropout = nn.Dropout(dropout)
+        self.activation = getattr(nn.functional, activation)
+
+        for i in range(num_layers):
+            in_hidden = hidden_size if i > 0 else in_size
+            out_hidden = hidden_size if i < num_layers - 1 else data_info["out_size"]
+            self.layers.append(dgl.nn.SAGEConv( in_hidden, out_hidden, aggregator_type))
+
+    def forward(self, graph, node_feat, edge_feat=None):
+        if self.embed_size > 0:
+            dgl_warning(
+                "The embedding for node feature is used, and input node_feat is ignored, due to the provided embed_size.",
+                norepeat=True)
+            h = self.embed.weight
+        else:
+            h = node_feat
+        h = self.dropout(h)
+        for l, layer in enumerate(self.layers):
+            h = layer(graph, h, edge_feat)
+            if l != len(self.layers) - 1:
+                h = self.activation(h)
+                h = self.dropout(h)
+        return h
+
+...
+
+def train(cfg, pipeline_cfg, device, data, model, optimizer, loss_fcn):
+    g = data[0]  # Only train on the first graph
+    g = dgl.remove_self_loop(g)
+    g = dgl.add_self_loop(g)
+    g = g.to(device)
+
+    node_feat = g.ndata.get('feat', None)
+    edge_feat = g.edata.get('feat', None)
+    label = g.ndata['label']
+    train_mask, val_mask, test_mask = g.ndata['train_mask'].bool(
+    ), g.ndata['val_mask'].bool(), g.ndata['test_mask'].bool()
+
+    stopper = EarlyStopping(**pipeline_cfg['early_stop'])
+
+    val_acc = 0.
+    for epoch in range(pipeline_cfg['num_epochs']):
+        model.train()
+        logits = model(g, node_feat, edge_feat)
+        loss = loss_fcn(logits[train_mask], label[train_mask])
+
+        optimizer.zero_grad()
+        loss.backward()
+        optimizer.step()
+
+        train_acc = accuracy(logits[train_mask], label[train_mask])
+        if epoch != 0 and epoch % pipeline_cfg['eval_period'] == 0:
+            val_acc = accuracy(logits[val_mask], label[val_mask])
+
+            if stopper.step(val_acc, model):
+                break
+
+        print("Epoch {:05d} | Loss {:.4f} | TrainAcc {:.4f} | ValAcc {:.4f}".
+              format(epoch, loss.item(), train_acc, val_acc))
+
+    stopper.load_checkpoint(model)
+
+    model.eval()
+    with torch.no_grad():
+        logits = model(g, node_feat, edge_feat)
+        test_acc = accuracy(logits[test_mask], label[test_mask])
+    return test_acc
+
+
+def main():
+    cfg = {
+        'version': '0.0.1',
+        'device': 'cuda:0',
+        'model': {
+            'embed_size': -1,
+            'hidden_size': 16,
+            'num_layers': 2,
+            'activation': 'relu',
+            'dropout': 0.5,
+            'aggregator_type': 'gcn'},
+        'general_pipeline': {
+            'early_stop': {
+                'patience': 100,
+                'checkpoint_path': 'checkpoint.pth'},
+            'num_epochs': 200,
+            'eval_period': 5,
+            'optimizer': {
+                'lr': 0.01,
+                'weight_decay': 0.0005},
+            'loss': 'CrossEntropyLoss',
+            'save_path': 'model.pth',
+            'num_runs': 10}}
+    device = cfg['device']
+    pipeline_cfg = cfg['general_pipeline']
+    # load data
+    data = AsNodePredDataset(CoraGraphDataset())
+    # create model
+    model_cfg = cfg["model"]
+    cfg["model"]["data_info"] = {
+        "in_size": model_cfg['embed_size'] if model_cfg['embed_size'] > 0 else data[0].ndata['feat'].shape[1],
+        "out_size": data.num_classes,
+        "num_nodes": data[0].num_nodes()
+    }
+    model = GraphSAGE(**cfg["model"])
+    model = model.to(device)
+    loss = torch.nn.CrossEntropyLoss()
+    optimizer = torch.optim.Adam(
+        model.parameters(),
+        **pipeline_cfg["optimizer"])
+    # train
+    test_acc = train(cfg, pipeline_cfg, device, data, model, optimizer, loss)
+    torch.save(model, pipeline_cfg["save_path"])
+    return test_acc
+
+...
+```
+
+You can see that everything is collected into one Python script which includes the
+entire `GraphSAGE` model definition, data processing and training loop. Simply running
+`python script.py` will give you the *exact same* result as you saw with `dgl train`.
+At this point, you can change any part as you wish, such as plugging in your own GNN module,
+changing the loss function and so on.
+
+## Use DGL-Go on your own dataset
+
+DGL-Go supports training a model on a custom dataset via DGL's `CSVDataset`.
+
+### Step 1: Prepare your CSV and metadata file.
+
+Follow the tutorial at [Loading data from CSV
+files](https://docs.dgl.ai/en/latest/guide/data-loadcsv.html#guide-data-pipeline-loadcsv)
+to prepare your dataset. Generally, the dataset folder should include:
+* At least one CSV file for node data.
+* At least one CSV file for edge data.
+* A metadata file called `meta.yaml`.
+
+### Step 2: `dgl configure` with `--data csv` option
+Run
+
+```
+dgl configure nodepred --data csv --model sage --cfg csv_sage.yaml
+```
+
+to generate the configuration file. You will see that the file includes a section like
+the following:
+
+```yaml
+...
+data:
+  name: csv
+  split_ratio:                # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
+  data_path: ./               # metadata.yaml, nodes.csv, edges.csv should in this folder
+...
+```
+
+Fill in the `data_path` option with the path to your dataset folder.
+
+If your dataset does not have any native split for training, validation and test sets,
+you can set the split ratio in the `split_ratio` option, which will
+generate a random split for you.
+
+### Step 3: `train` the model / `export` the script
+Then you can do the same as the tutorial above, either train the model by
+`dgl train --cfg csv_sage.yaml` or use `dgl export --cfg csv_sage.yaml
+--output script.py` to get the training script.
+
+## FAQ
+
+**Q: What are the available options for each command?**
+A: You can use `--help` for all commands. For example, use `dgl --help` for the general
+help message; use `dgl configure --help` for the configuration options; use
+`dgl configure nodepred --help` for the configuration options of the node prediction pipeline.
+
+**Q: What exactly is nodepred/linkpred? How many are there?**
+A: They are called DGL-Go pipelines. A pipeline represents the training methodology for
+a certain task. Therefore, its naming convention is *<task_name>[-<method_name>]*. For example,
+`nodepred` trains the selected GNN model for node classification using the full-graph training method,
+while `nodepred-ns` trains the model for node classification using neighbor sampling.
+The first release included three training pipelines (`nodepred`, `nodepred-ns` and `linkpred`)
+but you can expect more to come in the future. Use `dgl configure --help` to see
+all the available pipelines.
+
+**Q: How can I add my model to the official model recipe zoo?**
+A: Currently not supported. We will enable this feature soon. Please stay tuned!
+
+**Q: After training a model on some dataset, how can I apply it to another one?**
+A: The `save_path` option in the generated configuration file allows you to specify where
+to save the model after training. You can then modify the script generated by `dgl export`
+to load the model checkpoint and evaluate it on another dataset.
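As a reading aid for the pipeline naming convention described in the README FAQ above, here is a minimal Python sketch that splits a pipeline name into its task and optional method parts. The helper `split_pipeline_name` is hypothetical and not part of DGL-Go; it assumes only the `<task_name>[-<method_name>]` form stated in the README.

```python
from typing import Optional, Tuple


def split_pipeline_name(name: str) -> Tuple[str, Optional[str]]:
    """Split '<task_name>[-<method_name>]' into (task, method).

    Hypothetical helper for illustration; DGL-Go itself looks pipelines
    up by their full name.
    """
    task, sep, method = name.partition("-")
    # When there is no "-" the method part is absent.
    return task, (method if sep else None)


for name in ("nodepred", "nodepred-ns", "linkpred"):
    print(name, "->", split_pipeline_name(name))
```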
--- a/dglgo/dglgo.png
+++ b/dglgo/dglgo.png
--- a/enter/dglenter/init.py
+++ b/enter/dglenter/init.py
--- a/enter/dglenter/cli/init.py
+++ b/enter/dglenter/cli/init.py
--- a/dglgo/dglgo/cli/cli.py
+++ b/dglgo/dglgo/cli/cli.py
@@ -0,0 +1,20 @@
+import typer
+from ..pipeline import *
+from ..model import *
+from .config_cli import config_app
+from .train_cli import train
+from .export_cli import export
+from .recipe_cli import recipe_app
+
+no_args_is_help = False
+app = typer.Typer(no_args_is_help=True, add_completion=False)
+app.add_typer(config_app, name="configure", no_args_is_help=no_args_is_help)
+app.add_typer(recipe_app, name="recipe", no_args_is_help=True)
+app.command(help="Launch training", no_args_is_help=no_args_is_help)(train)
+app.command(help="Export a runnable python script", no_args_is_help=no_args_is_help)(export)
+
+def main():
+    app()
+
+if __name__ == "__main__":
+    app()
--- a/enter/dglenter/cli/config_cli.py
+++ b/enter/dglenter/cli/config_cli.py
@@ -6,9 +6,9 @@ import typing
 import yaml
 from pathlib import Path

-config_app = typer.Typer(help="Generate the config files")
+config_app = typer.Typer(help="Generate a configuration file")
 for key, pipeline in PipelineFactory.registry.items():
    config_app.command(key, help=pipeline.get_description())(pipeline.get_cfg_func())

 if __name__ == "__main__":
-    config_app()
+    config_app()
--- a/enter/dglenter/cli/export_cli.py
+++ b/enter/dglenter/cli/export_cli.py
@@ -10,8 +10,8 @@ import isort
 import autopep8

 def export(
-    cfg: str = typer.Option("cfg.yml", help="config yaml file name"),
-    output: str = typer.Option("output.py", help="output python file name")
+    cfg: str = typer.Option("cfg.yaml", help="config yaml file name"),
+    output: str = typer.Option("script.py", help="output python file name")
 ):
    user_cfg = yaml.safe_load(Path(cfg).open("r"))
    pipeline_name = user_cfg["pipeline_name"]
--- a/dglgo/dglgo/cli/recipe_cli.py
+++ b/dglgo/dglgo/cli/recipe_cli.py
@@ -0,0 +1,54 @@
+from pathlib import Path
+from typing import Optional
+import typer
+import os
+import shutil
+import yaml
+
+def list_recipes():
+    file_current_dir = Path(__file__).resolve().parent
+    recipe_dir = file_current_dir.parent.parent / "recipes"
+    file_list = list(recipe_dir.glob("*.yaml"))
+    header = "| {:<30} |  {:<18} | {:<20} |".format("Filename", "Pipeline", "Dataset")
+    typer.echo("="*len(header))
+    typer.echo(header)
+    typer.echo("="*len(header))
+    for file in file_list:
+        cfg = yaml.safe_load(Path(file).open("r"))
+        typer.echo("| {:<30} |  {:<18} | {:<20} |".format(file.name, cfg["pipeline_name"], cfg["data"]["name"]))
+    typer.echo("="*len(header))
+
+def copy_recipes(dir: str = typer.Option("dglgo_example_recipes", help="directory name for recipes")):
+    file_current_dir = Path(__file__).resolve().parent
+    recipe_dir = file_current_dir.parent.parent / "recipes"
+    current_dir = Path(os.getcwd())
+    new_dir = current_dir / dir
+    new_dir.mkdir(parents=True, exist_ok=True)
+    for file in recipe_dir.glob("*.yaml"):
+        shutil.copy(file, new_dir)
+    print("Example recipes are copied to {}".format(new_dir.absolute()))
+
+def get_recipe(recipe_name: Optional[str] = typer.Argument(None, help="The recipe filename to get, e.g. nodepred_citeseer_gcn.yaml")):
+    if recipe_name is None:
+        typer.echo("Usage: dgl recipe get [RECIPE_NAME] \n")
+        typer.echo(" Copy the recipe to current directory \n")
+        typer.echo(" Arguments:")
+        typer.echo("  [RECIPE_NAME]  The recipe filename to get, e.g. nodepred_citeseer_gcn.yaml\n")
+        typer.echo("Here are all available recipe filenames:")
+        list_recipes()
+    else:
+        file_current_dir = Path(__file__).resolve().parent
+        recipe_dir = file_current_dir.parent.parent / "recipes"
+        current_dir = Path(os.getcwd())
+        recipe_path = recipe_dir / recipe_name
+        shutil.copy(recipe_path, current_dir)
+        print("Recipe {} is copied to {}".format(recipe_path.absolute(), current_dir.absolute()))
+
+
+recipe_app = typer.Typer(help="Get example recipes")
+recipe_app.command(name="list", help="List all available example recipes")(list_recipes)
+recipe_app.command(name="copy", help="Copy all available example recipes to current directory")(copy_recipes)
+recipe_app.command(name="get", help="Copy the recipe to current directory")(get_recipe)
+
+if __name__ == "__main__":
+    recipe_app()
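The table layout produced by `list_recipes` above can be checked without typer or PyYAML. The sketch below reuses the same format string from the diff on hard-coded rows; the function name `format_recipe_table` and the sample rows are illustrative assumptions, not part of the commit.

```python
# Same row template as in list_recipes(); column widths are 30, 18 and 20.
ROW = "| {:<30} |  {:<18} | {:<20} |"


def format_recipe_table(rows):
    """Return the table as a list of lines (hypothetical helper)."""
    header = ROW.format("Filename", "Pipeline", "Dataset")
    rule = "=" * len(header)  # the rule is as wide as the header
    lines = [rule, header, rule]
    lines += [ROW.format(*row) for row in rows]
    lines.append(rule)
    return lines


table = format_recipe_table([
    ("nodepred_cora_sage.yaml", "nodepred", "cora"),
    ("nodepred-ns_product_sage.yaml", "nodepred-ns", "ogbn-products"),
])
print("\n".join(table))
```

Because every field is left-aligned and padded to a fixed width, each row has the same length as the header as long as no value exceeds its column width.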
--- a/enter/dglenter/cli/train_cli.py
+++ b/enter/dglenter/cli/train_cli.py
@@ -5,12 +5,11 @@ from enum import Enum
 import typing
 import yaml
 from pathlib import Path
-
 import isort
 import autopep8

 def train(
-    cfg: str = typer.Option("cfg.yml", help="config yaml file name"),
+    cfg: str = typer.Option("cfg.yaml", help="config yaml file name"),
 ):
    user_cfg = yaml.safe_load(Path(cfg).open("r"))
    pipeline_name = user_cfg["pipeline_name"]
@@ -18,8 +17,8 @@ def train(

    f_code = autopep8.fix_code(output_file_content, options={'aggressive': 1})
    f_code = isort.code(f_code)
-    exec(f_code,  {'__name__': '__main__'})
-
+    code = compile(f_code, 'dglgo_tmp.py', 'exec')
+    exec(code,  {'__name__': '__main__'})

 if __name__ == "__main__":
    train_app = typer.Typer()
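The `train_cli.py` hunk above switches from `exec(f_code, ...)` to compiling the source with an explicit filename first. One visible effect of that pattern, sketched below with the standard library only, is that traceback frames report the chosen filename instead of `<string>`. The sample source and the helper `run_and_collect_filenames` are assumptions made for illustration; the commit itself does not state its motivation.

```python
import traceback

# Stand-in for the generated training script (illustrative only).
GENERATED_SOURCE = '''
def main():
    raise ValueError("boom")

if __name__ == "__main__":
    main()
'''


def run_and_collect_filenames(source, filename="dglgo_tmp.py"):
    """Compile with an explicit filename, run, and return traceback filenames."""
    code = compile(source, filename, "exec")
    try:
        exec(code, {"__name__": "__main__"})
    except ValueError as exc:
        return [frame.filename for frame in traceback.extract_tb(exc.__traceback__)]
    return []


filenames = run_and_collect_filenames(GENERATED_SOURCE)
print(filenames[-1])
```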
--- a/enter/dglenter/model/init.py
+++ b/enter/dglenter/model/init.py
--- a/enter/dglenter/model/edge_encoder/init.py
+++ b/enter/dglenter/model/edge_encoder/init.py
--- a/enter/dglenter/model/edge_encoder/bilinear.py
+++ b/enter/dglenter/model/edge_encoder/bilinear.py
--- a/enter/dglenter/model/edge_encoder/dot.py
+++ b/enter/dglenter/model/edge_encoder/dot.py
--- a/enter/dglenter/model/edge_encoder/ele.py
+++ b/enter/dglenter/model/edge_encoder/ele.py
--- a/enter/dglenter/model/node_encoder/init.py
+++ b/enter/dglenter/model/node_encoder/init.py
--- a/enter/dglenter/model/node_encoder/gat.py
+++ b/enter/dglenter/model/node_encoder/gat.py
--- a/enter/dglenter/model/node_encoder/gcn.py
+++ b/enter/dglenter/model/node_encoder/gcn.py
@@ -49,7 +49,7 @@ class GCN(nn.Module):
            in_hidden = hidden_size if i > 0 else in_size
            out_hidden = hidden_size if i < num_layers - 1 else data_info["out_size"]

-            self.layers.append(dgl.nn.GraphConv(in_hidden, out_hidden, norm=norm))
+            self.layers.append(dgl.nn.GraphConv(in_hidden, out_hidden, norm=norm, allow_zero_in_degree=True))

        self.dropout = nn.Dropout(p=dropout)
        self.act = getattr(torch, activation)
--- a/enter/dglenter/model/node_encoder/gin.py
+++ b/enter/dglenter/model/node_encoder/gin.py
@@ -12,6 +12,8 @@ class GIN(nn.Module):
                 aggregator_type='sum'):
        """Graph Isomophism Networks

+        Edge feature is ignored in this model.
+
        Parameters
        ----------
        data_info : dict
--- a/enter/dglenter/model/node_encoder/sage.py
+++ b/enter/dglenter/model/node_encoder/sage.py
@@ -55,7 +55,7 @@ class GraphSAGE(nn.Module):
            h = node_feat
        h = self.dropout(h)
        for l, layer in enumerate(self.layers):
-            h = layer(graph, h)
+            h = layer(graph, h, edge_feat)
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
@@ -64,7 +64,7 @@ class GraphSAGE(nn.Module):
    def forward_block(self, blocks, node_feat, edge_feat = None):
        h = node_feat
        for l, (layer, block) in enumerate(zip(self.layers, blocks)):
-            h = layer(block, h)
+            h = layer(block, h, edge_feat)
            if l != len(self.layers) - 1:
                h = self.activation(h)
                h = self.dropout(h)
--- a/enter/dglenter/model/node_encoder/sgc.py
+++ b/enter/dglenter/model/node_encoder/sgc.py
@@ -14,6 +14,8 @@ class SGC(nn.Module):
                 bias=True, k=2):
        """ Simplifying Graph Convolutional Networks

+        Edge feature is ignored in this model.
+
        Parameters
        ----------
        data_info : dict
--- a/enter/dglenter/pipeline/init.py
+++ b/enter/dglenter/pipeline/init.py
--- a/enter/dglenter/pipeline/linkpred/init.py
+++ b/enter/dglenter/pipeline/linkpred/init.py
--- a/enter/dglenter/pipeline/linkpred/gen.py
+++ b/enter/dglenter/pipeline/linkpred/gen.py
@@ -20,6 +20,7 @@ class LinkpredPipelineCfg(BaseModel):
    eval_period: int = 5
    optimizer: dict = {"name": "Adam", "lr": 0.005}
    loss: str = "BCELoss"
+    save_path: str = "model.pth"
    num_runs: int = 1


@@ -29,6 +30,7 @@ pipeline_comments = {
    "train_batch_size": "Edge batch size when training",
    "num_epochs": "Number of training epochs",
    "eval_period": "Interval epochs between evaluations",
+    "save_path": "Path to save the model",
    "num_runs": "Number of experiments to run",
 }

@@ -67,20 +69,18 @@ class LinkpredPipeline(PipelineBase):
        def config(
            data: DataFactory.filter("linkpred").get_dataset_enum() = typer.Option(..., help="input data name"),
            cfg: str = typer.Option(
-                "cfg.yml", help="output configuration path"),
+                "cfg.yaml", help="output configuration path"),
            node_model: NodeModelFactory.get_model_enum() = typer.Option(...,
                                                                         help="Model name"),
            edge_model: EdgeModelFactory.get_model_enum() = typer.Option(...,
                                                                         help="Model name"),
            neg_sampler: NegativeSamplerFactory.get_model_enum() = typer.Option(
-                "uniform", help="Negative sampler name"),
-            device: DeviceEnum = typer.Option(
-                "cpu", help="Device, cpu or cuda"),
+                "persource", help="Negative sampler name"),
        ):
            self.__class__.setup_user_cfg_cls()
            generated_cfg = {
                "pipeline_name": "linkpred",
-                "device": device.value,
+                "device": "cpu",
                "data": {"name": data.name},
                "neg_sampler": {"name": neg_sampler.value},
                "node_model": {"name": node_model.value},
@@ -89,6 +89,7 @@ class LinkpredPipeline(PipelineBase):
            output_cfg = self.user_cfg_cls(**generated_cfg).dict()
            output_cfg = deep_convert_dict(output_cfg)
            comment_dict = {
+                "device": "Torch device name, e.g. cpu or cuda or cuda:0",
                "general_pipeline": pipeline_comments,
                "node_model": NodeModelFactory.get_constructor_doc_dict(node_model.value),
                "edge_model": EdgeModelFactory.get_constructor_doc_dict(edge_model.value),
@@ -99,6 +100,9 @@ class LinkpredPipeline(PipelineBase):
                },
            }
            comment_dict = merge_comment(output_cfg, comment_dict)
+
+            if cfg is None:
+                cfg = "_".join(["linkpred", data.value, node_model.value, edge_model.value]) + ".yaml"
            yaml = ruamel.yaml.YAML()
            yaml.dump(comment_dict, Path(cfg).open("w"))
            print("Configuration file is generated at {}".format(Path(cfg).absolute()))
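The fallback filename added in the `linkpred` hunk above can be sketched in isolation. Note that, as far as this diff shows, the `cfg` option in `linkpred` still defaults to `"cfg.yaml"`, so the `None` branch would only run if that default were `None` as it is in `nodepred`. The function `default_cfg_name` and the sample model names below are hypothetical.

```python
from typing import Optional, Sequence


def default_cfg_name(parts: Sequence[str], cfg: Optional[str] = None) -> str:
    """Return cfg unchanged, or build '<part>_<part>_....yaml' when cfg is None."""
    if cfg is None:
        cfg = "_".join(parts) + ".yaml"
    return cfg


# Pipeline name, dataset, node model, edge model (sample values only).
print(default_cfg_name(["linkpred", "cora", "sage", "dot"]))
# An explicit path always wins over the derived name.
print(default_cfg_name(["linkpred", "cora", "sage", "dot"], cfg="cfg.yaml"))
```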
--- a/enter/dglenter/pipeline/linkpred/linkpred.jinja-py
+++ b/enter/dglenter/pipeline/linkpred/linkpred.jinja-py
@@ -112,6 +112,7 @@ def main():
    loss = torch.nn.{{ loss }}()
    optimizer = torch.optim.Adam(params, **pipeline_cfg["optimizer"])
    test_hits = train(cfg, pipeline_cfg, device, dataset, model, optimizer, loss)
+    torch.save(model, pipeline_cfg["save_path"])
    return test_hits

 if __name__ == '__main__':
--- a/enter/dglenter/pipeline/nodepred/init.py
+++ b/enter/dglenter/pipeline/nodepred/init.py
--- a/enter/dglenter/pipeline/nodepred/gen.py
+++ b/enter/dglenter/pipeline/nodepred/gen.py
@@ -18,6 +18,7 @@ pipeline_comments = {
        "patience": "Steps before early stop",
        "checkpoint_path": "Early stop checkpoint model file path"
    },
+    "save_path": "Path to save the model",
    "num_runs": "Number of experiments to run",
 }

@@ -27,6 +28,7 @@ class NodepredPipelineCfg(BaseModel):
    eval_period: int = 5
    optimizer: dict = {"name": "Adam", "lr": 0.01, "weight_decay": 5e-4}
    loss: str = "CrossEntropyLoss"
+    save_path: str = "model.pth"
    num_runs: int = 1

@PipelineFactory.register("nodepred")
@@ -54,15 +56,14 @@ class NodepredPipeline(PipelineBase):
    def get_cfg_func(self):
        def config(
            data: DataFactory.filter("nodepred").get_dataset_enum() = typer.Option(..., help="input data name"),
-            cfg: str = typer.Option(
-                "cfg.yml", help="output configuration path"),
+            cfg: Optional[str] = typer.Option(
+                None, help="output configuration path"),
            model: NodeModelFactory.get_model_enum() = typer.Option(..., help="Model name"),
-            device: DeviceEnum = typer.Option("cpu", help="Device, cpu or cuda"),
        ):  
            self.__class__.setup_user_cfg_cls()
            generated_cfg = {
                "pipeline_name": self.pipeline_name,
-                "device": device,
+                "device": "cpu",
                "data": {"name": data.name},
                "model": {"name": model.value},
                "general_pipeline": {}
@@ -70,6 +71,7 @@ class NodepredPipeline(PipelineBase):
            output_cfg = self.user_cfg_cls(**generated_cfg).dict()
            output_cfg = deep_convert_dict(output_cfg)
            comment_dict = {
+                "device": "Torch device name, e.g. cpu or cuda or cuda:0",
                "data": {
                    "split_ratio": 'Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset'
                },
@@ -79,6 +81,8 @@ class NodepredPipeline(PipelineBase):
            comment_dict = merge_comment(output_cfg, comment_dict)

            yaml = ruamel.yaml.YAML()
+            if cfg is None:
+                cfg = "_".join(["nodepred", data.value, model.value]) + ".yaml"
            yaml.dump(comment_dict, Path(cfg).open("w"))
            print("Configuration file is generated at {}".format(Path(cfg).absolute()))

@@ -88,7 +92,7 @@ class NodepredPipeline(PipelineBase):
    def gen_script(cls, user_cfg_dict):
        # Check validation
        cls.setup_user_cfg_cls()
-        user_cfg = cls.user_cfg_cls(**user_cfg_dict)
+        user_cfg = cls.user_cfg_cls(**user_cfg_dict)        
        file_current_dir = Path(__file__).resolve().parent
        with open(file_current_dir / "nodepred.jinja-py", "r") as f:
            template = Template(f.read())
@@ -102,6 +106,8 @@ class NodepredPipeline(PipelineBase):
        render_cfg.update(DataFactory.get_generated_code_dict(user_cfg_dict["data"]["name"], '**cfg["data"]'))

        generated_user_cfg = copy.deepcopy(user_cfg_dict)
+        if "split_ratio" in generated_user_cfg["data"]:
+            generated_user_cfg["data"].pop("split_ratio")
        if len(generated_user_cfg["data"]) == 1:
            generated_user_cfg.pop("data")
        else:
@@ -116,9 +122,6 @@ class NodepredPipeline(PipelineBase):

        if user_cfg_dict["data"].get("split_ratio", None) is not None:
            render_cfg["data_initialize_code"] = "{}, split_ratio={}".format(render_cfg["data_initialize_code"], user_cfg_dict["data"]["split_ratio"])
-        if "split_ratio" in generated_user_cfg["data"]:
-            generated_user_cfg["data"].pop("split_ratio")
-
        render_cfg["user_cfg_str"] = f"cfg = {str(generated_user_cfg)}"
        render_cfg["user_cfg"] = user_cfg_dict
        return template.render(**render_cfg)
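The `split_ratio` handling that this diff reorders in `gen_script` can be summarised as two independent steps: remove `split_ratio` from the copy of the config that is embedded in the generated script, and append it to the dataset constructor arguments when it is set. The sketch below assumes plain dictionaries and uses hypothetical helper names; the `else` branch that is elided in the diff is left out here as well.

```python
import copy


def strip_split_ratio(user_cfg):
    """Return a deep copy of the config without data.split_ratio."""
    generated = copy.deepcopy(user_cfg)
    generated["data"].pop("split_ratio", None)
    if len(generated["data"]) == 1:  # only "name" is left
        generated.pop("data")
    return generated


def data_init_code(base_code, user_cfg):
    """Append split_ratio to the constructor arguments when it is set."""
    ratio = user_cfg["data"].get("split_ratio")
    if ratio is not None:
        return "{}, split_ratio={}".format(base_code, ratio)
    return base_code


user_cfg = {"data": {"name": "cora", "split_ratio": [0.8, 0.1, 0.1]}}
print(strip_split_ratio(user_cfg))
# "data_args" is a placeholder for whatever arguments were already rendered.
print(data_init_code("data_args", user_cfg))
```

Working on a deep copy keeps the user's original dictionary intact, which matters because the unmodified `user_cfg_dict` is still consulted afterwards to read `split_ratio`.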
--- a/enter/dglenter/pipeline/nodepred/nodepred.jinja-py
+++ b/enter/dglenter/pipeline/nodepred/nodepred.jinja-py
@@ -112,6 +112,7 @@ def main():
    optimizer = torch.optim.{{ user_cfg.general_pipeline.optimizer.name }}(model.parameters(), **pipeline_cfg["optimizer"])
    # train
    test_acc = train(cfg, pipeline_cfg, device, data, model, optimizer, loss)
+    torch.save(model, pipeline_cfg["save_path"])
    return test_acc

 if __name__ == '__main__':
--- a/enter/dglenter/pipeline/nodepred_sample/__init__.py
+++ b/enter/dglenter/pipeline/nodepred_sample/__init__.py
--- a/enter/dglenter/pipeline/nodepred_sample/gen.py
+++ b/enter/dglenter/pipeline/nodepred_sample/gen.py
@@ -36,6 +36,14 @@ pipeline_comments = {
        "patience": "Steps before early stop",
        "checkpoint_path": "Early stop checkpoint model file path"
    },
+    "sampler": {
+        "fan_out": "List of neighbors to sample per edge type for each GNN layer, with the i-th element being the fanout for the i-th GNN layer. Length should be the same as num_layers in model setting",
+        "batch_size": "Batch size of seed nodes in training stage",
+        "num_workers": "Number of workers to accelerate the graph data processing step",
+        "eval_batch_size": "Batch size of seed nodes in evaluation stage",
+        "eval_num_workers": "Number of workers to accelerate the graph data processing step in evaluation stage"
+    },
+    "save_path": "Path to save the model",
    "num_runs": "Number of experiments to run",
 }

@@ -47,6 +55,7 @@ class NodepredNSPipelineCfg(BaseModel):
    optimizer: dict = {"name": "Adam", "lr": 0.005, "weight_decay": 0.0}
    loss: str = "CrossEntropyLoss"
    num_runs: int = 1
+    save_path: str = "model.pth"

@PipelineFactory.register("nodepred-ns")
 class NodepredNsPipeline(PipelineBase):
@@ -60,7 +69,7 @@ class NodepredNsPipeline(PipelineBase):
        class NodePredUserConfig(UserConfig):
            eval_device: DeviceEnum = Field("cpu")
            data: DataFactory.filter("nodepred-ns").get_pydantic_config() = Field(..., discriminator="name")
-            model : NodeModelFactory.get_pydantic_model_config() = Field(..., discriminator="name")   
+            model : NodeModelFactory.filter(lambda cls: hasattr(cls, "forward_block")).get_pydantic_model_config() = Field(..., discriminator="name")   
            general_pipeline: NodepredNSPipelineCfg

        cls.user_cfg_cls = NodePredUserConfig
@@ -72,16 +81,14 @@ class NodepredNsPipeline(PipelineBase):
    def get_cfg_func(self):
        def config(
            data: DataFactory.filter("nodepred-ns").get_dataset_enum() = typer.Option(..., help="input data name"),
-            cfg: str = typer.Option(
-                "cfg.yml", help="output configuration path"),
-            model: NodeModelFactory.get_model_enum() = typer.Option(..., help="Model name"),
-            device: DeviceEnum = typer.Option(
-                "cpu", help="Device, cpu or cuda"),
+            cfg: Optional[str] = typer.Option(
+                None, help="output configuration path"),
+            model: NodeModelFactory.filter(lambda cls: hasattr(cls, "forward_block")).get_model_enum() = typer.Option(..., help="Model name"),
        ):
            self.__class__.setup_user_cfg_cls()
-            generated_cfg = {
+            generated_cfg = {                
                "pipeline_name": "nodepred-ns",
-                "device": device,
+                "device": "cpu",
                "data": {"name": data.name},
                "model": {"name": model.value},
                "general_pipeline": {"sampler":{"name": "neighbor"}}
@@ -89,14 +96,21 @@ class NodepredNsPipeline(PipelineBase):
            output_cfg = self.user_cfg_cls(**generated_cfg).dict()
            output_cfg = deep_convert_dict(output_cfg)
            comment_dict = {
+                "device": "Torch device name, e.g. cpu or cuda or cuda:0",
                "data": {
                    "split_ratio": 'Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset'
                },
                "general_pipeline": pipeline_comments,
-                "model": NodeModelFactory.get_constructor_doc_dict(model.value)
+                "model": NodeModelFactory.get_constructor_doc_dict(model.value),
            }
            comment_dict = merge_comment(output_cfg, comment_dict)

+            # truncate length fan_out to be the same as num_layers in model
+            if "num_layers" in comment_dict["model"]:
+                comment_dict['general_pipeline']["sampler"]["fan_out"] = [5,10,15,15,15][:int(comment_dict['model']["num_layers"])]
+
+            if cfg is None:
+                cfg = "_".join(["nodepred-ns", data.value, model.value]) + ".yaml"
            yaml = ruamel.yaml.YAML()
            yaml.dump(comment_dict, Path(cfg).open("w"))
            print("Configuration file is generated at {}".format(
@@ -112,6 +126,10 @@ class NodepredNsPipeline(PipelineBase):
            template = Template(f.read())
        pipeline_cfg = NodepredNSPipelineCfg(
            **user_cfg_dict["general_pipeline"])
+        
+        if "num_layers" in user_cfg_dict["model"]:
+            assert user_cfg_dict["model"]["num_layers"] == len(user_cfg_dict["general_pipeline"]["sampler"]["fan_out"]), \
+                "The num_layers in model config should be the same as the length of fan_out in sampler. For example, if num_layers is 1, the fan_out cannot be [5, 10]"          

        render_cfg = copy.deepcopy(user_cfg_dict)
        model_code = NodeModelFactory.get_source_code(
@@ -123,6 +141,8 @@ class NodepredNsPipeline(PipelineBase):
            user_cfg_dict["data"]["name"], '**cfg["data"]'))
        generated_user_cfg = copy.deepcopy(user_cfg_dict)

+        if "split_ratio" in generated_user_cfg["data"]:
+            generated_user_cfg["data"].pop("split_ratio")
        if len(generated_user_cfg["data"]) == 1:
            generated_user_cfg.pop("data")
        else:
@@ -135,8 +155,6 @@ class NodepredNsPipeline(PipelineBase):

        if user_cfg_dict["data"].get("split_ratio", None) is not None:
            render_cfg["data_initialize_code"] = "{}, split_ratio={}".format(render_cfg["data_initialize_code"], user_cfg_dict["data"]["split_ratio"])
-        if "split_ratio" in generated_user_cfg["data"]:
-            generated_user_cfg["data"].pop("split_ratio")

        render_cfg["user_cfg_str"] = f"cfg = {str(generated_user_cfg)}"
        render_cfg["user_cfg"] = user_cfg_dict
@@ -145,4 +163,4 @@ class NodepredNsPipeline(PipelineBase):

    @staticmethod
    def get_description() -> str:
-        return "Node classification sampling pipeline"
+        return "Node classification neighbor sampling pipeline"
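Review note: the `nodepred-ns` changes above tie the sampler's `fan_out` length to the model's `num_layers`, first by truncating a default list and later by asserting equality at export time. A minimal sketch of both steps follows. The helper names are hypothetical, and `ValueError` is used here in place of the `assert` in the diff.

```python
DEFAULT_FAN_OUT = [5, 10, 15, 15, 15]  # default list taken from the diff above


def default_fan_out(num_layers):
    """Truncate the default fan-out list to one entry per GNN layer."""
    return DEFAULT_FAN_OUT[:int(num_layers)]


def check_fan_out(model_cfg, sampler_cfg):
    """Raise ValueError when fan_out does not have one entry per layer."""
    if "num_layers" not in model_cfg:
        return  # models without num_layers are not checked
    if model_cfg["num_layers"] != len(sampler_cfg["fan_out"]):
        raise ValueError(
            "num_layers ({}) must equal len(fan_out) ({})".format(
                model_cfg["num_layers"], len(sampler_cfg["fan_out"])
            )
        )


two_layer = default_fan_out(2)
check_fan_out({"num_layers": 2}, {"fan_out": two_layer})
```

Because the default list has five entries, slicing cannot produce more than five fan-outs, so a model configured with more than five layers would need `fan_out` filled in by hand to pass the check.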
--- a/enter/dglenter/pipeline/nodepred_sample/nodepred-ns.jinja-py
+++ b/enter/dglenter/pipeline/nodepred_sample/nodepred-ns.jinja-py
@@ -157,8 +157,8 @@ def main():
    model = model.to(device)
    loss = torch.nn.{{ user_cfg.general_pipeline.loss }}()
    optimizer = torch.optim.{{ user_cfg.general_pipeline.optimizer.name }}(model.parameters(), **pipeline_cfg["optimizer"])
-    # train
    test_acc = train(cfg, pipeline_cfg, device, data, model, optimizer, loss)
+    torch.save(model, pipeline_cfg["save_path"])
    return test_acc

 if __name__ == '__main__':
--- a/enter/dglenter/utils/__init__.py
+++ b/enter/dglenter/utils/__init__.py
--- a/enter/dglenter/utils/base_model.py
+++ b/enter/dglenter/utils/base_model.py
--- a/enter/dglenter/utils/early_stop.py
+++ b/enter/dglenter/utils/early_stop.py
--- a/enter/dglenter/utils/enter_config.py
+++ b/enter/dglenter/utils/enter_config.py
--- a/enter/dglenter/utils/factory.py
+++ b/enter/dglenter/utils/factory.py
@@ -334,6 +334,14 @@ class ModelFactory:
            type_annotation_dict[k] = param.annotation
        return type_annotation_dict

+    def filter(self, filter_func):
+        new_fac = ModelFactory()
+        for name in self.registry:
+            if filter_func(self.registry[name]):
+                new_fac.registry[name] = self.registry[name]
+                new_fac.code_registry[name] = self.code_registry[name]
+        return new_fac
+

 class SamplerFactory:
    """ The factory class for creating executors"""
@@ -411,7 +419,7 @@ class SamplerFactory:


 NegativeSamplerFactory = SamplerFactory()
-NegativeSamplerFactory.register("uniform")(GlobalUniform)
+NegativeSamplerFactory.register("global")(GlobalUniform)
 NegativeSamplerFactory.register("persource")(PerSourceUniform)

 NodeModelFactory = ModelFactory()
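Review note: the new `ModelFactory.filter` returns a fresh factory holding only the registered classes that satisfy a predicate. The `nodepred-ns` pipeline uses it to keep models that define `forward_block`. The sketch below reproduces the idea with a hypothetical `MiniFactory` and two placeholder model classes. It omits `code_registry` and everything else the real factory carries.

```python
class MiniFactory:
    """Hypothetical stand-in for ModelFactory: registry and filter only."""

    def __init__(self):
        self.registry = {}

    def register(self, name):
        def decorator(cls):
            self.registry[name] = cls
            return cls
        return decorator

    def filter(self, filter_func):
        # Build a new factory so the original registry is left untouched.
        new_fac = MiniFactory()
        for name, cls in self.registry.items():
            if filter_func(cls):
                new_fac.registry[name] = cls
        return new_fac


factory = MiniFactory()


@factory.register("full_only")
class FullGraphModel:
    def forward(self, graph, feat):
        return feat


@factory.register("sampled")
class SampledModel:
    def forward(self, graph, feat):
        return feat

    def forward_block(self, blocks, feat):
        return feat


# Same predicate as in the diff: keep models usable with neighbor sampling.
sampling_models = factory.filter(lambda cls: hasattr(cls, "forward_block"))
```

Returning a new factory rather than mutating in place means the full-graph pipeline still sees every registered model.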
--- a/enter/dglenter/utils/optimizer_config.py
+++ b/enter/dglenter/utils/optimizer_config.py
--- a/enter/dglenter/utils/yaml_dump.py
+++ b/enter/dglenter/utils/yaml_dump.py
--- a/dglgo/recipes/__init__.py
+++ b/dglgo/recipes/__init__.py
--- a/dglgo/recipes/linkpred_citation2_sage.yaml
+++ b/dglgo/recipes/linkpred_citation2_sage.yaml
@@ -31,4 +31,5 @@ general_pipeline:
    name: Adam
    lr: 0.005
  loss: BCELoss
+  save_path: "model.pth"
  num_runs: 1                 # Number of experiments to run
--- a/dglgo/recipes/linkpred_collab_sage.yaml
+++ b/dglgo/recipes/linkpred_collab_sage.yaml
@@ -31,4 +31,5 @@ general_pipeline:
    name: Adam
    lr: 0.005
  loss: BCELoss
+  save_path: "model.pth"
  num_runs: 1                 # Number of experiments to run
--- a/dglgo/recipes/linkpred_cora_sage.yaml
+++ b/dglgo/recipes/linkpred_cora_sage.yaml
@@ -31,4 +31,5 @@ general_pipeline:
    name: Adam
    lr: 0.005
  loss: BCELoss
+  save_path: "model.pth"
  num_runs: 1                 # Number of experiments to run
--- a/dglgo/recipes/nodepred-ns_arxiv_gcn.yaml
+++ b/dglgo/recipes/nodepred-ns_arxiv_gcn.yaml
@@ -31,4 +31,5 @@ general_pipeline:
    lr: 0.005
    weight_decay: 0.0
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 5
--- a/dglgo/recipes/nodepred-ns_product_sage.yaml
+++ b/dglgo/recipes/nodepred-ns_product_sage.yaml
@@ -35,4 +35,5 @@ general_pipeline:
    lr: 0.005
    weight_decay: 0.0
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 5                 # Number of experiments to run
--- a/dglgo/recipes/nodepred_citeseer_gat.yaml
+++ b/dglgo/recipes/nodepred_citeseer_gat.yaml
@@ -28,4 +28,5 @@ general_pipeline:
    lr: 0.005
    weight_decay: 0.0005
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 10               # Number of experiments to run
--- a/dglgo/recipes/nodepred_citeseer_gcn.yaml
+++ b/dglgo/recipes/nodepred_citeseer_gcn.yaml
@@ -24,4 +24,5 @@ general_pipeline:
    lr: 0.01
    weight_decay: 0.0005
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 10                # Number of experiments to run
--- a/dglgo/recipes/nodepred_citeseer_sage.yaml
+++ b/dglgo/recipes/nodepred_citeseer_sage.yaml
@@ -23,4 +23,5 @@ general_pipeline:
    lr: 0.01
    weight_decay: 0.0005
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 10                # Number of experiments to run
--- a/dglgo/recipes/nodepred_cora_gat.yaml
+++ b/dglgo/recipes/nodepred_cora_gat.yaml
@@ -28,4 +28,5 @@ general_pipeline:
    lr: 0.005
    weight_decay: 0.0005
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 10                # Number of experiments to run
--- a/dglgo/recipes/nodepred_cora_gcn.yaml
+++ b/dglgo/recipes/nodepred_cora_gcn.yaml
@@ -24,4 +24,5 @@ general_pipeline:
    lr: 0.01
    weight_decay: 0.0005
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 10                # Number of experiments to run
--- a/dglgo/recipes/nodepred_cora_sage.yaml
+++ b/dglgo/recipes/nodepred_cora_sage.yaml
@@ -23,4 +23,5 @@ general_pipeline:
    lr: 0.01
    weight_decay: 0.0005
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 10                # Number of experiments to run
--- a/dglgo/recipes/nodepred_pubmed_gat.yaml
+++ b/dglgo/recipes/nodepred_pubmed_gat.yaml
@@ -28,4 +28,5 @@ general_pipeline:
    lr: 0.005
    weight_decay: 0.001
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 10                # Number of experiments to run
--- a/dglgo/recipes/nodepred_pubmed_gcn.yaml
+++ b/dglgo/recipes/nodepred_pubmed_gcn.yaml
@@ -24,4 +24,5 @@ general_pipeline:
    lr: 0.01
    weight_decay: 0.0005
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 10                # Number of experiments to run
--- a/dglgo/recipes/nodepred_pubmed_sage.yaml
+++ b/dglgo/recipes/nodepred_pubmed_sage.yaml
@@ -23,4 +23,5 @@ general_pipeline:
    lr: 0.01
    weight_decay: 0.0005
  loss: CrossEntropyLoss
+  save_path: "model.pth"
  num_runs: 10                # Number of experiments to run
--- a/dglgo/setup.py
+++ b/dglgo/setup.py
@@ -3,7 +3,7 @@
 from setuptools import find_packages
 from distutils.core import setup

-setup(name='dglenter',
+setup(name='dglgo',
      version='0.0.1',
      description='DGL',
      author='DGL Team',
@@ -15,12 +15,15 @@ setup(name='dglenter',
          'autopep8>=1.6.0',
          'numpydoc>=1.1.0',
          "pydantic>=1.9.0",
-          "ruamel.yaml>=0.17.20"
+          "ruamel.yaml>=0.17.20",
+          "PyYAML>=5.1"
      ],
-    license='APACHE',
+      package_data={"": ["./*"]},
+      include_package_data=True,
+      license='APACHE',
      entry_points={
          'console_scripts': [
-              "dgl-enter = dglenter.cli.cli:main"
+              "dgl = dglgo.cli.cli:main"
          ]
      },
      url='https://github.com/dmlc/dgl',
--- a/dglgo/tests/cfg.yml
+++ b/dglgo/tests/cfg.yml
@@ -0,0 +1,26 @@
+version: 0.0.1
+pipeline_name: nodepred
+device: cpu
+data:
+  name: cora
+  split_ratio:                # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
+model:
+  name: sage
+  embed_size: -1              # The dimension of created embedding table. -1 means using original node embedding
+  hidden_size: 16             # Hidden size.
+  num_layers: 1               # Number of hidden layers.
+  activation: relu            # Activation function name under torch.nn.functional
+  dropout: 0.5                # Dropout rate.
+  aggregator_type: gcn        # Aggregator type to use (``mean``, ``gcn``, ``pool``, ``lstm``).
+general_pipeline:
+  early_stop:
+    patience: 20              # Steps before early stop
+    checkpoint_path: checkpoint.pth # Early stop checkpoint model file path
+  num_epochs: 200             # Number of training epochs
+  eval_period: 5              # Interval epochs between evaluations
+  optimizer:
+    name: Adam
+    lr: 0.01
+    weight_decay: 0.0005
+  loss: CrossEntropyLoss
+  num_runs: 1                 # Number of experiments to run
--- a/dglgo/tests/run_test.sh
+++ b/dglgo/tests/run_test.sh
@@ -0,0 +1 @@
+python -m pytest --pdb -vv --capture=tee-sys test_pipeline.py::test_recipe
--- a/dglgo/tests/test_pipeline.py
+++ b/dglgo/tests/test_pipeline.py
@@ -0,0 +1,62 @@
+import subprocess
+from typing import NamedTuple
+import pytest
+from pathlib import Path
+# class DatasetSpec:
+
+dataset_spec = {
+    "cora": {"timeout": 30}
+}
+
+
+
+class ExperimentSpec(NamedTuple):
+    pipeline: str
+    dataset: str
+    model: str
+    timeout: int
+    extra_cfg: dict = {}
+
+exps = [ExperimentSpec(pipeline="nodepred", dataset="cora", model="sage", timeout=dataset_spec["cora"]["timeout"])]
+
+@pytest.mark.parametrize("spec", exps)
+def test_train(spec):
+    cfg_path = "/tmp/test.yaml"
+    run = subprocess.run(["dgl", "configure", spec.pipeline, "--data", spec.dataset, "--model", spec.model, "--cfg", cfg_path], timeout=spec.timeout, capture_output=True)
+    assert run.stderr is None or len(run.stderr) == 0, "Found error message: {}".format(run.stderr)
+    output = run.stdout.decode("utf-8")
+    print(output)
+
+    run = subprocess.run(["dgl", "train", "--cfg", cfg_path], timeout=spec.timeout, capture_output=True)
+    assert run.stderr is None or len(run.stderr) == 0, "Found error message: {}".format(run.stderr)
+    output = run.stdout.decode("utf-8")
+    print(output)
+
+TEST_RECIPE_FOLDER = "my_recipes"
+
+@pytest.fixture
+def setup_recipe_folder():
+    run = subprocess.run(["dgl", "recipe", "copy", "--dir", TEST_RECIPE_FOLDER], timeout=15, capture_output=True)
+
+@pytest.mark.parametrize("file", [str(f) for f in Path(TEST_RECIPE_FOLDER).glob("*.yaml")])
+def test_recipe(file, setup_recipe_folder):
+    print("dgl train {}".format(file))
+    try:    
+        run = subprocess.run(["dgl", "train", "--cfg", file], timeout=5, capture_output=True)
+        sh_stdout, sh_stderr = run.stdout, run.stderr
+    except subprocess.TimeoutExpired as e:
+        sh_stdout = e.stdout
+        sh_stderr = e.stderr
+    if sh_stderr is not None and len(sh_stderr) != 0:
+        error_str = sh_stderr.decode("utf-8")
+        lines = error_str.split("\n")
+        for line in lines:
+            line = line.strip()
+            if line.startswith("WARNING") or line.startswith("Aborted") or line.startswith("0%"):
+                continue
+            else:
+                assert len(line) == 0, error_str
+    print("{} stdout: {}".format(file, sh_stdout))
+    print("{} stderr: {}".format(file, sh_stderr))
+
+# test_recipe( , None)
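Review note: `test_recipe` tolerates stderr lines that start with `WARNING`, `Aborted` or `0%` and fails on anything else that is non-empty. The same rule can be written as a small helper, which makes it easier to unit-test in isolation. `unexpected_stderr_lines` is a hypothetical name, not something in the test file.

```python
IGNORED_PREFIXES = ("WARNING", "Aborted", "0%")  # prefixes tolerated by test_recipe


def unexpected_stderr_lines(stderr_bytes):
    """Return the stripped stderr lines that are neither empty nor tolerated."""
    if not stderr_bytes:
        return []
    lines = (line.strip() for line in stderr_bytes.decode("utf-8").split("\n"))
    # str.startswith accepts a tuple, so one call covers every prefix.
    return [line for line in lines if line and not line.startswith(IGNORED_PREFIXES)]


sample = b"WARNING: slow op\n  0%|          | 0/10\n\nTraceback boom\n"
leftover = unexpected_stderr_lines(sample)
```

A test could then end with `assert not unexpected_stderr_lines(sh_stderr)`, which reports only the offending lines instead of the whole stderr dump.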
--- a/enter/README.md
+++ b/enter/README.md
@@ -1,270 +0,0 @@
-# DGL-Enter
-
-(What is DGL-Enter? Why design this? What is it for?)
-
-DGL-Enter is a commanline tool for user to quickly bootstrap models with multiple datasets. And provide full capability for user to customize the pipeline into their own takks.
-
-## Installation guide
-You can install DGL-enter easily by `pip install dglenter`. Then you should be able to use DGL-Enter in you commandline tool by type in `dgl-enter`
-```
-Usage: dgl-enter [OPTIONS] COMMAND [ARGS]...
-
-Options:
-  --help  Show this message and exit.
-
-Commands:
-  config  Generate the config files
-  export  Export the python file from config
-  train   Train the model
-```
-
-
-## Train GraphSAGE on Cora from scratch
-Here we'll use one of the most classic model GraphSAGE and Cora citation graph dataset as an example, to show how easy to train a model with DGL-Enter.
-### Step 1: Use `dgl-enter config` to generate a yaml configuration file
-Run `dgl-enter config nodepred --data cora --model sage --cfg cora_sage.yml`. Then you'll get a configuration file `cora_sage.yml` includes all the configuration to be tuned, with the comments
-
-Optionally, You can change the config as you want to acheive a better performance. Below is a modified sample based on the template generated by the command above.
-The early stop part is removed for simplicity
-
-```yaml
-version: 0.0.1
-pipeline_name: nodepred
-device: cpu
-data:
-  name: cora
-  split_ratio:                # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
-model:
-  name: sage
-  embed_size: -1              # The dimension of created embedding table. -1 means using original node embedding
-  hidden_size: 16             # Hidden size.
-  num_layers: 1               # Number of hidden layers.
-  activation: relu            # Activation function name under torch.nn.functional
-  dropout: 0.5                # Dropout rate.
-  aggregator_type: gcn        # Aggregator type to use (``mean``, ``gcn``, ``pool``, ``lstm``).
-general_pipeline:
-  num_epochs: 200             # Number of training epochs
-  eval_period: 5              # Interval epochs between evaluations
-  optimizer:
-    name: Adam
-    lr: 0.01
-    weight_decay: 0.0005
-  loss: CrossEntropyLoss
-  num_runs: 1                 # Number of experiments to run
-
-```
-
-### Step 2: Use `dgl-enter train` to initiate the training process.   
-
-Simply run `dgl-enter train --cfg cora_sage.yml` will start the training process
-```log
-...
-Epoch 00190 | Loss 1.5225 | TrainAcc 0.9500 | ValAcc 0.6840
-Epoch 00191 | Loss 1.5416 | TrainAcc 0.9357 | ValAcc 0.6840
-Epoch 00192 | Loss 1.5391 | TrainAcc 0.9357 | ValAcc 0.6840
-Epoch 00193 | Loss 1.5257 | TrainAcc 0.9643 | ValAcc 0.6840
-Epoch 00194 | Loss 1.5196 | TrainAcc 0.9286 | ValAcc 0.6840
-EarlyStopping counter: 12 out of 20
-Epoch 00195 | Loss 1.4862 | TrainAcc 0.9643 | ValAcc 0.6760
-Epoch 00196 | Loss 1.5142 | TrainAcc 0.9714 | ValAcc 0.6760
-Epoch 00197 | Loss 1.5145 | TrainAcc 0.9714 | ValAcc 0.6760
-Epoch 00198 | Loss 1.5174 | TrainAcc 0.9571 | ValAcc 0.6760
-Epoch 00199 | Loss 1.5235 | TrainAcc 0.9714 | ValAcc 0.6760
-Test Accuracy 0.7740
-Accuracy across 1 runs: 0.774 ± 0.0
-```
-
-That's all! Basically you only need two line of command to train a graph neural network.
-## Debug your model and advanced customization
-
-That's not everything yet. We belive you may want to change more than the configuration files, to change the training pipeline, calculate new metrics, or look into the code for details.
-DGL-Enter can export a self-contained, runnable python script for you to do anything you like. 
-
-Try `dgl-enter export --cfg cora_sage.yml --output script.py`, and you'll get the script used to train the model, like a magic!
-
-Below 
-```python
-...
-
-def train(cfg, pipeline_cfg, device, data, model, optimizer, loss_fcn):
-    g = data[0]  # Only train on the first graph
-    g = dgl.remove_self_loop(g)
-    g = dgl.add_self_loop(g)
-    g = g.to(device)
-
-    node_feat = g.ndata.get('feat', None)
-    edge_feat = g.edata.get('feat', None)
-    label = g.ndata['label']
-    train_mask, val_mask, test_mask = g.ndata['train_mask'].bool(
-    ), g.ndata['val_mask'].bool(), g.ndata['test_mask'].bool()
-
-    val_acc = 0.
-    for epoch in range(pipeline_cfg['num_epochs']):
-        model.train()
-        logits = model(g, node_feat, edge_feat)
-        loss = loss_fcn(logits[train_mask], label[train_mask])
-
-        optimizer.zero_grad()
-        loss.backward()
-        optimizer.step()
-
-        train_acc = accuracy(logits[train_mask], label[train_mask])
-        if epoch != 0 and epoch % pipeline_cfg['eval_period'] == 0:
-            val_acc = accuracy(logits[val_mask], label[val_mask])
-
-        print("Epoch {:05d} | Loss {:.4f} | TrainAcc {:.4f} | ValAcc {:.4f}".
-              format(epoch, loss.item(), train_acc, val_acc))
-
-    model.eval()
-    with torch.no_grad():
-        logits = model(g, node_feat, edge_feat)
-        test_acc = accuracy(logits[test_mask], label[test_mask])
-    return test_acc
-
-
-def main():
-    cfg = {
-        'version': '0.0.1',
-        'device': 'cpu',
-        'data': {
-            'split_ratio': None},
-        'model': {
-            'embed_size': -1,
-            'hidden_size': 16,
-            'num_layers': 1,
-            'activation': 'relu',
-            'dropout': 0.5,
-            'aggregator_type': 'gcn'},
-        'general_pipeline': {
-            'num_epochs': 200,
-            'eval_period': 5,
-            'optimizer': {
-                'lr': 0.01,
-                'weight_decay': 0.0005},
-            'loss': 'CrossEntropyLoss',
-            'num_runs': 1}}
-    device = cfg['device']
-    pipeline_cfg = cfg['general_pipeline']
-    # load data
-    data = AsNodePredDataset(CoraGraphDataset())
-    # create model
-    model_cfg = cfg["model"]
-    cfg["model"]["data_info"] = {
-        "in_size": model_cfg['embed_size'] if model_cfg['embed_size'] > 0 else data[0].ndata['feat'].shape[1],
-        "out_size": data.num_classes,
-        "num_nodes": data[0].num_nodes()
-    }
-    model = GraphSAGE(**cfg["model"])
-    model = model.to(device)
-    loss = torch.nn.CrossEntropyLoss()
-    optimizer = torch.optim.Adam(
-        model.parameters(),
-        **pipeline_cfg["optimizer"])
-    # train
-    test_acc = train(cfg, pipeline_cfg, device, data, model, optimizer, loss)
-    return test_acc
-
-...
-
-```
-
-## Recipes
-
-We've prepared a set of finetuned config under `enter/recipes`, that you can try easily to get a reproducable result.
-
-For example, using GCN with pubmet dataset, you can use `enter/recipes/nodepred_pubmed_gcn.yml`. 
-
-To try it, type in `dgl-enter train --cfg recipes/nodepred_pubmed_gcn.yml` to train the model, or `dgl-enter export --cfg recipes/nodepred_pubmed_gcn.yml` to get the full training script.
-
-## Use DGL-Enter on your own dataset
-You can modify the generated script in anyway you want. However, we also provided an end2end way to use your own dataset, by using our `CSVDataset`. 
-
-Step 1: Prepare your csv and metadata file.
-
-Following the tutorial at [Loading data from CSV files](https://docs.dgl.ai/en/latest/guide/data-loadcsv.html#guide-data-pipeline-loadcsv`), Prepare your own CSV dataset includes three files minimally, node data csv, edge data csv and the meta data file (meta.yml).
-
-```yml
-dataset_name: my_csv_dataset
-edge_data:
- file_name: edges.csv
-node_data:
- file_name: nodes.csv
-```
-
-Step 2: Choose to csv dataset in the `dgl-enter config` stage
-Try `dgl-enter config nodepred --data csv --model sage --cfg csv_sage.yml`, to use SAGE model for your dataset. You'll see the data part is now the configuration related to CSV dataset. `data_path` is used to specify the data folder, and `./` means the current folder. 
-
-If your dataset doesn't have the builtin split on the nodes for train/val/test, you need to manually set the split ratio in the config yml file, DGL will random generate the split for you.
-
-```yml
-data:
-  name: csv
-  split_ratio:                # Ratio to generate split masks, for example set to [0.8, 0.1, 0.1] for 80% train/10% val/10% test. Leave blank to use builtin split in original dataset
-  data_path: ./               # metadata.yaml, nodes.csv, edges.csv should in this folder
-```
-
-
-Step 3: `train` the model/`export` the script
-Then you can do the same as the tutorial above, either train the model by `dgl-eneter train --cfg csv_sage.yaml` or use `dgl-enter export --cfg csv_sage.yml --output my_dataset.py` to get the training script.
-
-## API Referencce
-
-DGL enter is a new tool for user to bootstrap datasets and common models.
-
-The entry point of enter is `dgl-enter`, and it has three subcommand `config`, `train` and `export`.
-
-### Config
-The config stage is to generate a configuration file on the specific pipeline.
-
-`dgl-enter` currently provides 3 pipelines:
- nodepred (Node prediction tasks, suitable for small dataset to prototype)
- nodepred-ns (Node prediction tasks with sampling method, suitable for medium and large dataset)
- linkpred (Link prediction tasks, to predict whether edge exists among node pairs based on node features)
-
-You can get the full list by `dgl-enter config --help`
-```
-Usage: dgl-enter config [OPTIONS] COMMAND [ARGS]...
-
-  Generate the config files
-
-Options:
-  --help  Show this message and exit.
-
-Commands:
-  linkpred     Link prediction pipeline
-  nodepred     Node classification pipeline
-  nodepred-ns  Node classification sampling pipeline
-```
-
-For each pipeline it will have diffirent options to specified. For example, for node prediction pipeline, you can do `dgl-enter config nodepred --help`, you'll get:
-```
-Usage: dgl-enter config nodepred [OPTIONS]
-
-  Node classification pipeline
-
-Options:
-  --data [cora|citeseer|ogbl-collab|csv|reddit|co-buy-computer]
-                                  input data name  [required]
-  --cfg TEXT                      output configuration path  [default:
-                                  cfg.yml]
-  --model [gcn|gat|sage|sgc|gin]  Model name  [required]
-  --device [cpu|cuda]             Device, cpu or cuda  [default: cpu]
-  --help                          Show this message and exit.
-```
-
-You can always get the detailed help information by adding `--help` to the command line
-
-### Train
-You can train a model on the dataset based on the configuration file generated by `dgl-enter config`, by `dgl-enter train`.
-```
-Usage: dgl-enter train [OPTIONS]
-
-  Train the model
-
-Options:
-  --cfg TEXT  yaml file name  [default: cfg.yml]
-  --help      Show this message and exit.
-```
-
-### Export
-Get the self-contained, runnable python script derived from the configuration file by `dgl-enter export`.
--- a/enter/dglenter/cli/cli.py
+++ b/enter/dglenter/cli/cli.py
@@ -1,18 +0,0 @@
-import typer
-from ..pipeline import *
-from ..model import *
-from .config_cli import config_app
-from .train_cli import train
-from .export_cli import export
-
-no_args_is_help = False
-app = typer.Typer(no_args_is_help=no_args_is_help, add_completion=False)
-app.add_typer(config_app, name="config", no_args_is_help=no_args_is_help)
-app.command(help="Train the model", no_args_is_help=no_args_is_help)(train)
-app.command(help="Export the python file from config", no_args_is_help=no_args_is_help)(export)
-
-def main():
-    app()
-
-if __name__ == "__main__":
-    app()