Q: When copying memory to the GPU I got the error message CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES.
A: This happens because of the way the GPU code is compiled. Every block has a limited number of registers (a hardware property). If a thread's code is compiled to use many registers, then launching it with too many threads (this HW requires 1024, which is the maximum) leaves too few registers for all of the threads. The solution is either to write code that compiles to fewer registers (shorter and simpler), or to limit the number of registers per thread: @cuda.jit(max_registers=xxx)
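To see why 1024 threads can exhaust the register file, here is a quick sketch of the arithmetic, assuming 65536 registers per block (a common figure for recent NVIDIA GPUs; check the spec of the GPU you actually get):

```python
# Register budget per block -- an assumed, typical value; the real number
# is a hardware property of the specific GPU.
REGISTERS_PER_BLOCK = 65536
THREADS_PER_BLOCK = 1024  # the maximum, as required in this HW

# With a full block, each thread may use at most this many registers:
max_registers_per_thread = REGISTERS_PER_BLOCK // THREADS_PER_BLOCK
print(max_registers_per_thread)  # -> 64

def launch_fits(registers_per_thread,
                threads_per_block=THREADS_PER_BLOCK,
                registers_per_block=REGISTERS_PER_BLOCK):
    """True if a block of this size has enough registers for every thread."""
    return registers_per_thread * threads_per_block <= registers_per_block

print(launch_fits(32))   # -> True:  32 * 1024 = 32768  <= 65536
print(launch_fits(100))  # -> False: 100 * 1024 = 102400 > 65536
```

This is also why @cuda.jit(max_registers=...) helps: capping registers per thread keeps the product under the per-block budget, at the cost of spilling the excess to slower local memory.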
Q: Sometimes my code succeeds, and sometimes it fails.
A: There is more than one type of GPU on the server; code that uses many registers per thread might work on the stronger GPUs but not on the weaker ones.
Q: I got an error that looks like this: SLURM_NNODES environment variable conflicts with allocated node count (2 != 1).
A: There are two ways to run your code on the server. The first is to ask for resources to work with, and then execute your code using those resources: srun -c<xxx> --gres=gpu:<xxx> --pty bash. The second is to submit your code to the job queue managed by the server: srun -K -c<xxx> --gres=gpu:<xxx> --pty python3 <main-file>.py. This error happens when you mix the two options: asking for resources using the first option, and then trying to submit your code to the job queue from inside that allocation.
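The two options, and the broken mix, can be sketched as below. The core/GPU counts and the file name main.py are placeholder values, not the course's required settings:

```shell
# Option 1: interactive allocation, then run your code inside it
srun -c2 --gres=gpu:1 --pty bash       # allocates resources, opens a shell
python3 main.py                        # runs inside that allocation

# Option 2: submit the program itself as a job
srun -K -c2 --gres=gpu:1 --pty python3 main.py

# Broken mix: calling srun from INSIDE an srun shell. The inner srun
# inherits SLURM_* variables (e.g. SLURM_NNODES) from the outer
# allocation, and they conflict with the new allocation's node count.
srun -c2 --gres=gpu:1 --pty bash
srun -K -c2 --gres=gpu:1 --pty python3 main.py   # triggers the error
```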