Compiler version unsupported on worker

Hi, we have a master/worker workstation and we have connected a new worker with a better GPU; the two machines share a filesystem. We are running CryoSPARC v4.3.0. 3D Flex jobs run well on the new worker, but a Non-Uniform Refinement (NUR) job failed with the following:
Traceback (most recent call last):
File "/home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/cryosparc_compute/jobs/runcommon.py", line 2118, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1028, in cryosparc_compute.engine.engine.process.work
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 99, in cryosparc_compute.engine.engine.EngineThread.load_image_data_gpu
File "cryosparc_master/cryosparc_compute/engine/cuda_kernels.py", line 1803, in cryosparc_compute.engine.cuda_kernels.prepare_real
File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 425, in cryosparc_compute.engine.cuda_core.context_dependent_memoize.wrapper
File "cryosparc_master/cryosparc_compute/engine/cuda_kernels.py", line 1707, in cryosparc_compute.engine.cuda_kernels.get_util_kernels
File "/home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 290, in __init__
cubin = compile(source, nvcc, options, keep, no_extern_c,
File "/home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 254, in compile
return compile_plain(source, options, keep, nvcc, cache_dir, target)
File "/home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 135, in compile_plain
raise CompileError("nvcc compilation of %s failed" % cu_file_path,
pycuda.driver.CompileError: nvcc compilation of /tmp/tmpscq3vhrt/kernel.cu failed
[command: nvcc --cubin -arch sm_86 -I/home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/cuda kernel.cu]
[stderr:
In file included from /home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/bin/../include/cuda_runtime.h:83,
from :
/home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/bin/../include/crt/host_config.h:132:2: error: #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
132 | #error -- unsupported GNU version! gcc versions later than 11 are not supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.
| ^~~~~
]

Do I need to export a specific environment variable or make a symlink somewhere?
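For context, this is the sort of override I had in mind, just as a sketch (untested on my side; -ccbin and the NVCC_PREPEND_FLAGS environment variable are nvcc features, but whether the CryoSPARC worker would actually pick them up here is my assumption):

    # hypothetical sketch, assuming a supported compiler such as g++-11 were installed
    export NVCC_PREPEND_FLAGS='-ccbin /usr/bin/g++-11'
    # or, at our own risk, suppress the host-compiler version check the error mentions
    # export NVCC_PREPEND_FLAGS='-allow-unsupported-compiler'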

Please can you post:

  • the output (on master) of the command
    cryosparcm cli "get_scheduler_targets()"
    
    [edited: command correction]
  • the outputs of these commands on the new worker
    /home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/bin/cryosparcw call which nvcc
    /home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/bin/cryosparcw call nvcc --version
    which c++
    readlink -e $(which c++)
    uname -a
    
  • the OS version on the new worker

Hello, the first command didn’t work:

-bash: syntax error near unexpected token `('

/home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/bin/nvcc

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

/bin/c++

/usr/bin/x86_64-linux-gnu-g++-12

Linux marcelan 6.1.58 #1 SMP Tue Oct 17 15:49:06 EDT 2023 x86_64 GNU/Linux

This is Debian 12.2.

Apologies, I forgot to include the quotes in that command.
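The quotes matter because bash otherwise tries to interpret the parentheses itself, which is what produced the syntax error you saw; quoted, the whole string is passed through to cryosparcm unchanged:

    cryosparcm cli "get_scheduler_targets()"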

I will take note of that.

I will check with our team for recommendations. It seems a g++-11 package is available for your Debian version, but I am not sure whether installing it would make your OS compatible with CryoSPARC without disrupting other uses of this computer.
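If you do decide to experiment along those lines, a minimal sketch (assuming the Debian package names gcc-11/g++-11, which I have not verified on your system) would be:

    sudo apt install gcc-11 g++-11   # install the older toolchain alongside the default g++-12
    g++-11 --version                 # confirm it is usable
    # nvcc would then still need to be pointed at g++-11, for example via its -ccbin option

Whether that coexists cleanly with the rest of your system toolchain is exactly what I would like to confirm with our team first.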

This is the output of the first command (with quotes):

[{'cache_path': '/localssd/asverzh', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 8513650688, 'name': 'NVIDIA GeForce GTX 1070'}, {'id': 1, 'mem': 8514043904, 'name': 'NVIDIA GeForce GTX 1070'}], 'hostname': 'mazuelo.bcm.umontreal.ca', 'lane': 'default', 'name': 'mazuelo.bcm.umontreal.ca', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7]}, 'ssh_str': 'asverzh@mazuelo.bcm.umontreal.ca', 'title': 'Worker node mazuelo.bcm.umontreal.ca', 'type': 'node', 'worker_bin_path': '/home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/bin/cryosparcw'},
 {'cache_path': '/localssd/{USER}', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'custom_vars': {}, 'desc': None, 'hostname': 'slurmcluster', 'lane': 'slurmcluster', 'name': 'slurmcluster', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n\n#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --partition=root\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --mem=0\n#SBATCH --ntasks={{ num_cpu }}\n\navailable_devs=""\nfor devidx in $(seq 0 3)\ndo\n if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then\n if [[ -z "$available_devs" ]] ; then\n available_devs=$devidx\n else\n available_devs=$available_devs,$devidx\n fi\n fi\ndone\nexport CUDA_VISIBLE_DEVICES=$available_devs\n\nsrun {{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': 'slurmcluster', 'tpl_vars': ['job_uid', 'project_uid', 'run_cmd', 'cluster_job_id', 'command', 'job_log_path_abs', 'num_cpu'], 'type': 'cluster', 'worker_bin_path': '/home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/bin/cryosparcw'},
 {'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 12638093312, 'name': 'NVIDIA GeForce RTX 3060'}], 'hostname': 'marcelan.bcm.umontreal.ca', 'lane': 'default', 'monitor_port': None, 'name': 'marcelan.bcm.umontreal.ca', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'asverzh@marcelan.bcm.umontreal.ca', 'title': 'Worker node marcelan.bcm.umontreal.ca', 'type': 'node', 'worker_bin_path': '/home/jmplab/asverzh/Software/cryoSPARC/cryosparc2_worker/bin/cryosparcw'}]

@AlekS You may want to try upgrading CryoSPARC to v4.4, which should no longer depend on your system-installed compiler.
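If it helps, the update is normally started on the master node, for example (a sketch; please follow the guide for the details of your setup, including the separately installed worker):

    cryosparcm update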

Yes, after the update NUR is working, thanks!
