Nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'

FraaaMazzz1 · January 21, 2022, 4:21pm

When running a 2D classification job, I get the following error:

License is valid.



Launching job on lane default target cryoem-Pro-WS-C621-64L-SAGE-Series ...

Running job on master node hostname cryoem-Pro-WS-C621-64L-SAGE-Series

[CPU: 80.4 MB]   Project P1 Job J10 Started

[CPU: 80.4 MB]   Master running v3.3.1+220118, worker running v3.3.1+220118

[CPU: 80.6 MB]   Working in directory: /home/cryo-em/Desktop/software/scipion_projects/ApoferTestFra-cryo-em/P1/J10

[CPU: 80.6 MB]   Running on lane default

[CPU: 80.6 MB]   Resources allocated: 

[CPU: 80.6 MB]     Worker:  cryoem-Pro-WS-C621-64L-SAGE-Series

[CPU: 80.6 MB]     CPU   :  [0, 1]

[CPU: 80.6 MB]     GPU   :  [0]

[CPU: 80.6 MB]     RAM   :  [0, 1, 2]

[CPU: 80.6 MB]     SSD   :  False

[CPU: 80.6 MB]   --------------------------------------------------------------

[CPU: 80.6 MB]   Importing job module for job type class_2D...

[CPU: 234.2 MB]  Job ready to run

[CPU: 234.2 MB]  ***************************************************************

[CPU: 238.6 MB]  Using random seed of 1940899841

[CPU: 238.6 MB]  Loading a ParticleStack with 3513 items...

[CPU: 239.0 MB]    Done.

[CPU: 239.0 MB]  Windowing particles

[CPU: 239.0 MB]    Done.

[CPU: 239.0 MB]  Using 50 classes.

[CPU: 240.1 MB]  Computing 2D class averages: 

[CPU: 240.1 MB]    Volume Size: 64 (voxel size 2.32A)

[CPU: 240.1 MB]    Zeropadded Volume Size: 128

[CPU: 240.1 MB]    Data Size: 120 (pixel size 1.24A)

[CPU: 240.1 MB]    Using Reconstruction Resolution: 6.00A (24.0 radius)

[CPU: 240.1 MB]    Using Alignment Resolution: 6.00A (24.0 radius)

[CPU: 240.1 MB]    Windowing only corners of 2D classes at each iteration.

[CPU: 240.1 MB]  Using random seed for initialization of 307481108

[CPU: 249.9 MB]    Done in 0.475s.

[CPU: 286.1 MB]  Start of Iteration 0

[CPU: 286.1 MB]  

[CPU: 352.9 MB]  Traceback (most recent call last):
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/tools.py", line 429, in context_dependent_memoize
    return ctx_dict[cur_ctx][args]
KeyError: <pycuda._driver.Context object at 0x7efd6da64450>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/class2D/run.py", line 323, in cryosparc_compute.jobs.class2D.run.run_class_2D
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 964, in cryosparc_compute.engine.engine.process
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 974, in cryosparc_compute.engine.engine.process
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 156, in cryosparc_compute.engine.cuda_core.allocate_gpu
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 549, in fill
    func = elementwise.get_fill_kernel(self.dtype)
  File "<decorator-gen-13>", line 2, in get_fill_kernel
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/tools.py", line 433, in context_dependent_memoize
    result = func(*args)
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/elementwise.py", line 498, in get_fill_kernel
    "fill")
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/elementwise.py", line 163, in get_elwise_kernel
    arguments, operation, name, keep, options, **kwargs)
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/elementwise.py", line 149, in get_elwise_kernel_and_types
    keep, options, **kwargs)
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/elementwise.py", line 76, in get_elwise_module
    options=options, keep=keep, no_extern_c=True)
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/compiler.py", line 291, in __init__
    arch, code, cache_dir, include_dirs)
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/compiler.py", line 254, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/compiler.py", line 78, in compile_plain
    checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
  File "/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/compiler.py", line 55, in preprocess_source
    cmdline, stderr=stderr)
pycuda.driver.CompileError: nvcc preprocessing of /tmp/tmpcc12fscs.cu failed
[command: nvcc --preprocess -arch sm_86 -I/home/cryo-em/Desktop/software/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/cuda /tmp/tmpcc12fscs.cu --compiler-options -P]
[stderr:
b"nvcc fatal   : Value 'sm_86' is not defined for option 'gpu-architecture'\n"]

I have read about similar problems (here and here) but cannot work out how to solve it in this case.
Note that I have already tried running cryosparcm patch, but it did not solve the problem.

The following nvidia-smi output may be useful:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.86       Driver Version: 470.86       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:17:00.0 Off |                  N/A |
| 30%   33C    P8    20W / 350W |     10MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:65:00.0  On |                  N/A |
| 30%   41C    P8    39W / 350W |    364MiB / 24245MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2645      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      3158      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      2645      G   /usr/lib/xorg/Xorg                 53MiB |
|    1   N/A  N/A      3158      G   /usr/lib/xorg/Xorg                175MiB |
|    1   N/A  N/A      3287      G   /usr/bin/gnome-shell               51MiB |
|    1   N/A  N/A      6338      G   ...mviewer/tv_bin/TeamViewer       15MiB |
|    1   N/A  N/A    604397      G   ...AAAAAAAAA= --shared-files       49MiB |
+-----------------------------------------------------------------------------+

Hope someone can help me!

wtempel · January 21, 2022, 6:10pm

@FraaaMazzz1 On the cryoSPARC worker, which version of CUDA has been configured, and what’s the output of uname -a?

FraaaMazzz1 · January 21, 2022, 8:48pm

When I installed cryoSPARC, this is the path I gave for CUDA:

--cudapath /usr/lib/cuda\

But I’m not sure how to check which version of CUDA the cryoSPARC worker has been configured with. Could you tell me how to check it?

Secondly:

uname -a

prints the following:

Linux cryoem-Pro-WS-C621-64L-SAGE-Series 5.13.0-27-generic #29~20.04.1-Ubuntu SMP Fri Jan 14 00:32:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
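
One way to answer the "which CUDA path is configured" question is to read the worker's own configuration. The sketch below assumes that cryosparc_worker/config.sh records the toolkit location in a line of the form export CRYOSPARC_CUDA_PATH="..."; that variable name is an assumption about this install and should be checked against the actual file. The sample text is hypothetical and only mirrors the path mentioned above.

```python
import re
from typing import Optional

# Assumption: the worker's config.sh stores the toolkit location as
#   export CRYOSPARC_CUDA_PATH="/some/path"
_CUDA_PATH_LINE = re.compile(
    r'^\s*export\s+CRYOSPARC_CUDA_PATH=(["\']?)(.*?)\1\s*$', re.MULTILINE
)


def configured_cuda_path(config_text: str) -> Optional[str]:
    """Return the CRYOSPARC_CUDA_PATH value found in config.sh text, or None."""
    match = _CUDA_PATH_LINE.search(config_text)
    return match.group(2) if match else None


# Hypothetical config.sh contents, for illustration only.
sample_config = (
    "export CRYOSPARC_USE_GPU=true\n"
    'export CRYOSPARC_CUDA_PATH="/usr/lib/cuda"\n'
)
print(configured_cuda_path(sample_config))
```

To run it against a real install, pass the contents of the worker's config.sh instead of sample_config; running the nvcc found under the reported path's bin directory with --version then shows the toolkit release.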

Mo_O · January 23, 2022, 9:28am

Hi FraaaMazzz1,

I think your worker node is not using the correct CUDA path.

Update the CUDA path on the worker node (from the cryosparc_worker directory) via
./bin/cryosparcw newcuda
In your case, I think: ./bin/cryosparcw newcuda /usr/local/cuda-11.4/

Run nvcc --version to check whether nvcc is in your $PATH and whether it belongs to the right CUDA version.

You can also check the libraries with this command:
ldconfig -p | grep cuda

If nothing is linked to the CUDA 11.4 libraries, add them to your $LD_LIBRARY_PATH.

Best,

Mo

FraaaMazzz1 · January 23, 2022, 10:22am

Hi Mo_O,
Thank you for your help!! But I think my computer is confusing me a bit…

nvcc --version prints:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
That means I have CUDA 10.1, which might be the reason why cryoSPARC is failing.
If I try to update the NVIDIA toolkit (with sudo apt install nvidia-cuda-toolkit), it tells me I already have the newest version.
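
The release number reported by nvcc explains the error directly: the sm_86 target (compute capability 8.6, the GPUs in this workstation) was introduced with CUDA 11.1, so a 10.1 compiler cannot know it. As a minimal sketch, the check can be written as below; the function names and the MIN_CUDA_FOR_SM_86 constant are illustrative, not part of cryoSPARC or PyCUDA.

```python
import re
from typing import Optional, Tuple

# sm_86 is taken here to require CUDA 11.1 or newer.
MIN_CUDA_FOR_SM_86 = (11, 1)


def parse_nvcc_release(version_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of nvcc --version."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_sm_86(version_output: str) -> bool:
    """True if the reported toolkit release is new enough for -arch sm_86."""
    release = parse_nvcc_release(version_output)
    return release is not None and release >= MIN_CUDA_FOR_SM_86


reported = "Cuda compilation tools, release 10.1, V10.1.243"
print(parse_nvcc_release(reported), supports_sm_86(reported))  # (10, 1) False
```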

In /usr/local I have a cuda directory, but its name does not specify the version. I tried updating my CUDA environment as you suggested anyway (./bin/cryosparcw newcuda /usr/local/cuda/), but it did not solve the problem.

If it can help, this is my $PATH:

bash: /home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/Desktop/software/cryosparc_master/bin:/home/cryo-em/anaconda3/condabin:/usr/lib/cuda/bin:/home/cryo-em/.local/bin:/usr/lib/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin: No such file or directory

this is my $LD_LIBRARY_PATH:

bash: /usr/lib/cuda/lib64:/usr/lib/cuda/lib64:: No such file or directory

and these are my libs:

libicudata.so.66 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.66
libicudata.so.66 (ELF) => /lib/i386-linux-gnu/libicudata.so.66
libcudart.so.10.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudart.so.10.1
libcudart.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudart.so
libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
libcuda.so.1 (libc6) => /lib/i386-linux-gnu/libcuda.so.1
libcuda.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so
libcuda.so (libc6) => /lib/i386-linux-gnu/libcuda.so
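
When reading such a listing, note that grep cuda also matches libicudata (an unrelated ICU library) and libcuda (which ships with the NVIDIA driver); only libcudart belongs to the CUDA toolkit runtime. A small sketch, with an illustrative function name, that picks out the toolkit runtime versions from ldconfig -p output:

```python
import re
from typing import List


def cudart_versions(ldconfig_output: str) -> List[str]:
    """Collect the version suffixes of libcudart entries, e.g. '10.1'."""
    versions: List[str] = []
    for line in ldconfig_output.splitlines():
        # Only versioned libcudart entries count; libcuda and libicudata do not.
        match = re.match(r"\s*libcudart\.so\.(\d+(?:\.\d+)*)\s", line)
        if match and match.group(1) not in versions:
            versions.append(match.group(1))
    return versions


listing = """\
libicudata.so.66 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.66
libcudart.so.10.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudart.so.10.1
libcudart.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudart.so
libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
"""
print(cudart_versions(listing))  # ['10.1']
```

Here the only toolkit runtime registered with the dynamic linker is 10.1, consistent with the nvcc release reported above.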

Mo_O · January 23, 2022, 10:46am

You can install CUDA 11.4 by downloading the installer to the worker node:
wget https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run

Install it:
sudo sh cuda_11.4.3_470.82.01_linux.run

After that, I think you should see the folder /usr/local/cuda-11.4.
Now point cryosparcw to the new path with
./bin/cryosparcw newcuda /usr/local/cuda-11.4/

Put /usr/local/cuda-11.4/bin (the directory that contains nvcc) into your $PATH
Put /usr/local/cuda-11.4/lib64 into your $LD_LIBRARY_PATH
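
Order matters when editing these variables: the shell uses the first nvcc it finds, so the new toolkit's bin directory (where nvcc lives) has to come before any older CUDA entry. The sketch below shows one way to prepend directories to a colon-separated value while dropping repeated entries; prepend_unique is an illustrative helper and the sample value is shortened from the $PATH shown earlier in the thread.

```python
from typing import List


def prepend_unique(path_value: str, new_dirs: List[str]) -> str:
    """Put new_dirs first, keep the remaining order, drop duplicates and empties."""
    seen = set()
    ordered = []
    for entry in list(new_dirs) + path_value.split(":"):
        entry = entry.rstrip("/") or entry  # treat 'dir/' and 'dir' as the same
        if entry and entry not in seen:
            seen.add(entry)
            ordered.append(entry)
    return ":".join(ordered)


old_path = "/usr/lib/cuda/bin:/home/cryo-em/.local/bin:/usr/lib/cuda/bin:/usr/bin"
print(prepend_unique(old_path, ["/usr/local/cuda-11.4/bin"]))
# /usr/local/cuda-11.4/bin:/usr/lib/cuda/bin:/home/cryo-em/.local/bin:/usr/bin
```

The resulting string can be exported from a shell startup file such as ~/.bashrc; the same helper applies to $LD_LIBRARY_PATH with /usr/local/cuda-11.4/lib64.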

Hope it helps.

FraaaMazzz1 · January 24, 2022, 5:39pm

Running sudo sh cuda_11.4.3_470.82.01_linux.run gave me a warning message: “Existing package manager installation of the driver found. It is strongly recommended that you remove this before continuing”. If I select continue, it fails with this error message: Installation failed. See log at /var/log/cuda-installer.log for details.

The log file is the following:

GNU nano 4.8 /var/log/cuda-installer.log
[INFO]: Driver installation detected by command: apt list --installed | grep -e nvidia-driver-[0-9][0-9][0-9] -e nvidia-[0-9][0->
[INFO]: Cleaning up window
[INFO]: Complete
[INFO]: Checking compiler version…
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 8.4.0 (Ubuntu 8.4.0-3ubuntu2)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 470.82.01
[INFO]: Executing NVIDIA-Linux-x86_64-470.82.01.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version->
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 470.82.01 failed, quitting

So I tried installing CUDA via the commands indicated here, which gave me the folder /usr/local/cuda-11.6. I then updated my $PATH and $LD_LIBRARY_PATH accordingly.

Now cryoSPARC seems to be failing with a different error message:

AssertionError: {'code': 500, 'data': None, 'message': "OtherError: cryoem-pro-ws-c621-64l-sage-series:39001: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 61ea9dce1b17e1ecffacdbc7, topology_type: Single, servers: [<ServerDescription ('cryoem-pro-ws-c621-64l-sage-series', 39001) server_type: Unknown, rtt: None, error=AutoReconnect('cryoem-pro-ws-c621-64l-sage-series:39001: [Errno 111] Connection refused')>]>", 'name': 'OtherError'}
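
This message is no longer a CUDA error. The AutoReconnect and TopologyDescription names come from the MongoDB client, and "Connection refused" on port 39001 suggests that nothing was listening on the port cryoSPARC was trying to reach for its database, for example because the cryoSPARC processes were not running at that moment. A minimal sketch for testing whether a TCP port accepts connections follows; the host and port in the usage lines are taken from the error message, and port_is_open is an illustrative helper.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Port number copied from the error message; run this on the master node.
    print(port_is_open("localhost", 39001))
```

If this prints False, checking cryosparcm status and restarting cryoSPARC would be a reasonable next step before looking further at CUDA.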

wtempel · January 24, 2022, 7:55pm

You may sidestep system-wide reconfigurations and conflicts by installing the CUDA toolkit as a non-root user in a custom location of your choice, as long as the cryosparc system user can access that location.
An example non-root installation is described in another forum.
You should adjust parameters as needed and, as @Mo_O mentioned, run cryosparcw newcuda upon successful toolkit installation, specifying the relevant path to the toolkit.
You may want to try CUDA version 11.2 for a balance of version 11 features and cryoSPARC compatibility.

FraaaMazzz1 · January 25, 2022, 12:54pm

I think I solved the problem.
I removed CUDA 10.1 (following the instructions given here) and rebooted my computer. I think I’m still working with CUDA 11.6 (I could not get it to point to 11.2), but I don’t get any errors (for now), so it seems to work!
Thank you!!!