The Python Oracle

Best practices to benchmark deep models on CPU (and not GPU) in PyTorch?

Become part of the top 3% of the developers by applying to Toptal https://topt.al/25cXVn

--

Music by Eric Matyas
https://www.soundimage.org
Track title: Puzzle Game 2

--

Chapters
00:00 Question
03:12 Accepted answer (Score 5)
07:18 Thank you

--

Full question
https://stackoverflow.com/questions/6188...

Question links:
[link]: https://discuss.pytorch.org/t/execution-...
[documentation]: https://pytorch.org/docs/1.3.1/torch.htm...

Accepted answer links:
[numbers which underflow]: https://en.wikipedia.org/wiki/Denormal_n...
[PyTorch docs]: https://pytorch.org/docs/master/generate...
[this document]: https://pytorch.org/docs/stable/notes/cp...
[threading]: https://docs.python.org/3/library/thread...
[here]: https://pytorch.org/docs/stable/notes/cp...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#python #pytorch

#avk47



ACCEPTED ANSWER

Score 5


  1. Should we use time.time()?

Yes, it's fine for CPU.
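As a sketch of the usual timing pattern (a few warm-up runs, several timed repeats, report the median), using only the standard library; the dummy workload below is a stand-in for the real call, which for a model would be model(input) inside torch.no_grad():

```python
import time
import statistics

def bench(fn, warmup=2, repeats=5):
    """Time fn() with time.time(): warm up first, then report the median."""
    for _ in range(warmup):
        fn()                      # warm-up runs are not timed
    times = []
    for _ in range(repeats):
        start = time.time()
        fn()
        times.append(time.time() - start)
    return statistics.median(times)

# Dummy CPU-bound workload standing in for model(input)
workload = lambda: sum(i * i for i in range(100_000))
print(f"median: {bench(workload):.4f}s")
```

The warm-up matters on CPU too (caches, allocator state), and the median is less sensitive to scheduler noise than the mean.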

  2. Should we use volatile?

As you said, it's deprecated. Since 0.4.0, torch.Tensor has been merged with torch.Variable (which is deprecated as well), and the torch.no_grad context manager should be used instead.

  3. Should the page cache be cleared?

I don't think so, unless you know it's a problem.

  4. Should I remove nn.Sequential() and directly put the layers in forward?

No, torch.nn.Sequential should have no (or negligible) performance burden on your model. Its forward is only:

def forward(self, input):
    for module in self:
        input = module(input)
    return input

If you are running into performance issues with very small numbers, you might try torch.set_flush_denormal(True) to disable denormal floating-point numbers on the CPU.

Flushing denormal numbers (numbers which underflow) means replacing them with 0.0, which might help your performance if you have a lot of really small numbers. Example given by the PyTorch docs:

>>> torch.set_flush_denormal(True)
True
>>> torch.tensor([1e-323], dtype=torch.float64)
tensor([ 0.], dtype=torch.float64)
>>> torch.set_flush_denormal(False)
True
>>> torch.tensor([1e-323], dtype=torch.float64)
tensor(9.88131e-324 *
       [ 1.0000], dtype=torch.float64)

  5. Should torch.set_num_threads(int) be used? If yes, can demo code be provided?

According to this document, it might help if you don't allocate too many threads (probably at most as many as your CPU has cores, so you might try 8).

So this piece at the beginning of your code might help:

torch.set_num_threads(8)

You may want to try different values and check whether, and by how much, each one helps.
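One quick way to pick the value is to sweep candidate thread counts and keep the fastest. A minimal sketch of that loop, where set_threads is a stand-in for torch.set_num_threads so the snippet runs without PyTorch installed:

```python
import os
import time

def set_threads(n):
    # Stand-in for torch.set_num_threads(n); replace with the real call.
    pass

def timed(fn):
    start = time.time()
    fn()
    return time.time() - start

# Dummy CPU-bound workload standing in for a model forward pass
workload = lambda: sum(i * i for i in range(200_000))

candidates = sorted({1, 2, 4, os.cpu_count() or 1})
results = {}
for n in candidates:
    set_threads(n)
    results[n] = min(timed(workload) for _ in range(3))  # best of 3 runs

best = min(results, key=results.get)
print(f"fastest with {best} thread(s)")
```

With the real torch.set_num_threads in place, the sweep usually shows diminishing (or negative) returns past the physical core count.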

  6. What does "These context managers are thread local, so they won't work if you send work to another thread using the threading module, etc." mean, as given in the documentation?

If you use a module like torch.multiprocessing and run torch.multiprocessing.spawn (or similar) and one of your processes doesn't enter the context manager block, the gradient won't be turned off (in the case of torch.no_grad). Likewise, if you use Python's threading, only the threads in which the block is entered will have gradients turned off (or on, depending on the manager).

This code will make it clear for you:

import threading

import torch


def myfunc(i, tensor):
    if i % 2 == 0:
        with torch.no_grad():
            z = tensor * 2
    else:
        z = tensor * 2
    print(i, z.requires_grad)


if __name__ == "__main__":
    tensor = torch.randn(5, requires_grad=True)
    with torch.no_grad():
        for i in range(10):
            t = threading.Thread(target=myfunc, args=(i, tensor))
            t.start()

Which outputs (order may vary):

0 False
1 True
2 False
3 True
4 False
6 False
5 True
7 True
8 False
9 True

Also notice that torch.no_grad() in __main__ has no effect on spawned threads (neither would torch.enable_grad).

  7. Please list any more issues for calculating execution time on CPU.

Converting to TorchScript (see here) might help, as might building PyTorch from source targeted at your architecture and its capabilities, among many other things; this question is too broad to cover them all.
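A minimal sketch of the TorchScript conversion, using a tiny hypothetical model in place of the one being benchmarked; scripted modules run without the Python interpreter in the inner loop, which can reduce per-call CPU overhead:

```python
import torch
import torch.nn as nn

# A tiny model standing in for the one being benchmarked (hypothetical).
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

# Convert to TorchScript; the scripted module can then be timed
# with the same harness as the eager one.
scripted = torch.jit.script(model)

x = torch.randn(4, 10)
with torch.no_grad():
    eager_out = model(x)
    scripted_out = scripted(x)

# Outputs should agree between eager and scripted execution.
print(torch.allclose(eager_out, scripted_out))
```

Whether scripting actually helps depends on the model; small models dominated by Python dispatch overhead tend to benefit the most, so measure both variants.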