[DOCS] Add training on CPU sections to docs (#3398)

2026-06-04 19:44:23 +08:00 · 2021-10-14 08:53:16 +02:00
parent 188630698e
commit a47ab71d44
4 changed files with 58 additions and 2 deletions
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -200,12 +200,14 @@ examples_dirs = ['../../tutorials/blitz',
                 '../../tutorials/large',
                 '../../tutorials/dist',
                 '../../tutorials/models',
-                 '../../tutorials/multi']  # path to find sources
+                 '../../tutorials/multi',
+                 '../../tutorials/cpu']  # path to find sources
 gallery_dirs = ['tutorials/blitz/',
                'tutorials/large/',
                'tutorials/dist/',
                'tutorials/models/',
-                'tutorials/multi/']  # path to generate docs
+                'tutorials/multi/',
+                'tutorials/cpu']  # path to generate docs
 reference_url = {
    'dgl' : None,
    'numpy': 'http://docs.scipy.org/doc/numpy/',
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -25,6 +25,7 @@ Welcome to Deep Graph Library Tutorials and Documentation
   guide/index
   guide_cn/index
   tutorials/large/index
+   tutorials/cpu/index
   tutorials/multi/index
   tutorials/dist/index
   tutorials/models/index
--- a/tutorials/cpu/README.txt
+++ b/tutorials/cpu/README.txt
@@ -0,0 +1,2 @@
+Training on CPUs
+=========================
--- a/tutorials/cpu/cpu_best_practises.py
+++ b/tutorials/cpu/cpu_best_practises.py
@@ -0,0 +1,51 @@
+"""
+CPU Best Pratices
+=====================================================
+
+This chapter focus on providing best practises for environment setup
+to get the best performance during training and inference on the CPU.
+
+Intel
+`````````````````````````````
+
+Hyper-treading
+---------------------------
+
+For specific workloads as GNN’s domain, suggested default setting for having best performance
+is to turn off hyperthreading.
+Turning off the hyper threading feature can be done at BIOS [#f1]_ or operating system level [#f2]_ [#f3]_ .
+
+
+OpenMP settings
+---------------------------
+
+During training on CPU, the training and dataloading part need to be maintained simultaneously.
+Best performance of parallelization in OpenMP
+can be achieved by setting up the optimal number of working threads and dataloading workers.
+
+**GNU OpenMP**
+    Default BKM for setting the number of OMP threads with Pytorch backend:
+
+    ``OMP_NUM_THREADS`` = number of physical cores – ``num_workers``
+
+    Number of physical cores can be checked by using ``lscpu`` ("Core(s) per socket")
+    or ``nproc`` command in Linux command line.
+    Below simple bash script example for setting the OMP threads and ``pytorch`` backend dataloader workers:
+
+    .. code:: bash
+
+        cores=`nproc`
+        num_workers=4
+        export OMP_NUM_THREADS=$(($cores-$num_workers))
+        python script.py --gpu -1 --num_workers=$num_workers
+
+    Depending on the dataset, model and CPU optimal number of dataloader workers and OpemMP threads may vary
+    but close to the general default advise presented above [#f4]_ .
+
+.. rubric:: Footnotes
+
+.. [#f1] https://www.intel.com/content/www/us/en/support/articles/000007645/boards-and-kits/desktop-boards.html
+.. [#f2] https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-linux/
+.. [#f3] https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-ec2-windows-instances/
+.. [#f4] https://software.intel.com/content/www/us/en/develop/articles/how-to-get-better-performance-on-pytorchcaffe2-with-intel-acceleration.html
+"""