mirror of
https://github.com/dmlc/dgl.git
synced 2026-06-04 19:44:23 +08:00
[DOCS] Add training on CPU sections to docs (#3398)
@@ -200,12 +200,14 @@ examples_dirs = ['../../tutorials/blitz',
                  '../../tutorials/large',
                  '../../tutorials/dist',
                  '../../tutorials/models',
-                 '../../tutorials/multi']  # path to find sources
+                 '../../tutorials/multi',
+                 '../../tutorials/cpu']  # path to find sources
 gallery_dirs = ['tutorials/blitz/',
                 'tutorials/large/',
                 'tutorials/dist/',
                 'tutorials/models/',
-                'tutorials/multi/']  # path to generate docs
+                'tutorials/multi/',
+                'tutorials/cpu']  # path to generate docs
 reference_url = {
     'dgl' : None,
     'numpy': 'http://docs.scipy.org/doc/numpy/',
@@ -25,6 +25,7 @@ Welcome to Deep Graph Library Tutorials and Documentation
    guide/index
    guide_cn/index
    tutorials/large/index
+   tutorials/cpu/index
    tutorials/multi/index
    tutorials/dist/index
    tutorials/models/index
2
tutorials/cpu/README.txt
Normal file
@@ -0,0 +1,2 @@
Training on CPUs
=========================
51
tutorials/cpu/cpu_best_practises.py
Normal file
@@ -0,0 +1,51 @@
"""
CPU Best Practices
=====================================================

This chapter focuses on best practices for environment setup
to get the best performance during training and inference on the CPU.

Intel
`````````````````````````````

Hyper-threading
---------------------------

For workloads specific to the GNN domain, the suggested default for best performance
is to turn off hyper-threading.
Hyper-threading can be turned off in the BIOS [#f1]_ or at the operating system level [#f2]_ [#f3]_ .

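
Before changing BIOS or operating system settings, it can help to check whether hyper-threading is currently active. The sketch below is a minimal example that assumes a Linux kernel exposing the SMT state in ``/sys/devices/system/cpu/smt/active``; the helper name ``smt_active`` is hypothetical, and on systems without that file it reports the state as unknown.

```python
from pathlib import Path
from typing import Optional

# Assumption: Linux sysfs location of the SMT (hyper-threading) state.
# Other operating systems and older kernels may not provide this file.
SMT_ACTIVE_PATH = "/sys/devices/system/cpu/smt/active"


def smt_active(path: str = SMT_ACTIVE_PATH) -> Optional[bool]:
    # Hypothetical helper: True if SMT is on, False if off, None if unknown.
    try:
        text = Path(path).read_text().strip()
    except OSError:
        # File missing or unreadable: the state cannot be determined.
        return None
    if text not in ("0", "1"):
        return None
    return text == "1"


if __name__ == "__main__":
    state = smt_active()
    if state is None:
        print("SMT state unknown on this system")
    else:
        print("hyper-threading is", "on" if state else "off")
```

A result of ``True`` suggests that hyper-threading is still enabled and that the recommendation above has not been applied yet.
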

OpenMP settings
---------------------------

During training on the CPU, model computation and data loading run simultaneously
and share the available cores. OpenMP parallelization performs best
when the number of worker threads and the number of dataloader workers are chosen together.

**GNU OpenMP**
The default best known method (BKM) for setting the number of OMP threads with the PyTorch backend:

``OMP_NUM_THREADS`` = number of physical cores - ``num_workers``

The number of physical cores can be checked with ``lscpu`` ("Core(s) per socket" multiplied by "Socket(s)")
or, when hyper-threading is turned off, with the ``nproc`` command on the Linux command line
(``nproc`` counts logical CPUs).
Below is a simple bash script that sets the OMP threads and the ``pytorch`` backend dataloader workers:

.. code:: bash

    # nproc counts logical CPUs; this equals the physical core count
    # only when hyper-threading is turned off.
    cores=$(nproc)
    num_workers=4
    export OMP_NUM_THREADS=$((cores - num_workers))
    python script.py --gpu -1 --num_workers=$num_workers

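
The same rule can also be applied from inside Python. The following is a minimal sketch, not part of DGL: the helper names ``omp_threads`` and ``configure_openmp`` are hypothetical, the clamp to at least one thread is an added safeguard, and ``os.cpu_count()`` is used only as a fallback that counts logical CPUs. The environment variable has to be set before the OpenMP runtime starts, which in practice means before importing the backend.

```python
import os


def omp_threads(physical_cores: int, num_workers: int) -> int:
    # Rule from the text: physical cores minus dataloader workers.
    # Clamping to 1 is an added safeguard, not part of the original rule.
    return max(1, physical_cores - num_workers)


def configure_openmp(num_workers: int, physical_cores=None) -> int:
    # Hypothetical helper: export OMP_NUM_THREADS for the current process.
    if physical_cores is None:
        # Assumption: os.cpu_count() counts logical CPUs, so it matches the
        # physical core count only when hyper-threading is turned off.
        physical_cores = os.cpu_count() or 1
    threads = omp_threads(physical_cores, num_workers)
    os.environ["OMP_NUM_THREADS"] = str(threads)
    return threads


if __name__ == "__main__":
    # Example: 16 physical cores and 4 dataloader workers.
    print(configure_openmp(num_workers=4, physical_cores=16))
```

Passing ``physical_cores`` explicitly avoids relying on the logical CPU count on machines where hyper-threading is still enabled.
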
Depending on the dataset, model, and CPU, the optimal number of dataloader workers and OpenMP threads may vary,
but it should be close to the default advice given above [#f4]_ .

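
Since the best values depend on the workload, one option is to try a few settings around the default and time each of them. The sketch below only enumerates candidate pairs; the function ``candidate_settings`` and its ``spread`` parameter are hypothetical, and timing each pair with the actual training script is left to the reader.

```python
def candidate_settings(physical_cores, default_workers=4, spread=2):
    # Hypothetical helper: list (num_workers, omp_threads) pairs near the default.
    # Each pair follows omp_threads = physical_cores - num_workers.
    low = max(0, default_workers - spread)
    # Keep at least one core for OpenMP threads.
    high = min(physical_cores - 1, default_workers + spread)
    return [(workers, physical_cores - workers) for workers in range(low, high + 1)]


if __name__ == "__main__":
    for workers, threads in candidate_settings(16):
        print(f"num_workers={workers} OMP_NUM_THREADS={threads}")
```

Each pair can then be passed to the training script in the same way as in the bash example, with ``OMP_NUM_THREADS`` exported before the run.
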
.. rubric:: Footnotes

.. [#f1] https://www.intel.com/content/www/us/en/support/articles/000007645/boards-and-kits/desktop-boards.html
.. [#f2] https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-linux/
.. [#f3] https://aws.amazon.com/blogs/compute/disabling-intel-hyper-threading-technology-on-amazon-ec2-windows-instances/
.. [#f4] https://software.intel.com/content/www/us/en/develop/articles/how-to-get-better-performance-on-pytorchcaffe2-with-intel-acceleration.html
"""