I’m once again trying to upgrade our cluster from stable 3.4 to version 4. The master process installs fine, but installing the worker process fails with the error below.
I’d prefer to complete the upgrade, as we’re eager to use the new features in v4. Can you offer any ideas or a solution for this error?
-------------------------------
------------------------------------------------------------------------
Preparing to install all pip packages...
------------------------------------------------------------------------
DEPRECATION: --no-binary currently disables reading from the cache of locally built wheels. In the future --no-binary will not influence the wheel cache. pip 23.1 will enforce this behaviour change. A possible replacement is to use the --no-cache-dir option. You can use the flag --use-feature=no-binary-enable-wheel-cache to test the upcoming behaviour. Discussion can be found at https://github.com/pypa/pip/issues/11453
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1.tar.gz
Preparing metadata (setup.py) ... done
Installing collected packages: pycuda
DEPRECATION: pycuda is being installed using the legacy 'setup.py install' method, because the '--no-binary' option was enabled for it and this currently disables local wheel building for projects that don't have a 'pyproject.toml' file. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/11451
Running setup.py install for pycuda ... error
error: subprocess-exited-with-error
× Running setup.py install for pycuda did not run successfully.
│ exit code: 1
╰─> [6718 lines of output]
*************************************************************
*** I have detected that you have not run configure.py.
*************************************************************
*** Additionally, no global config files were found.
*** I will go ahead with the default configuration.
*** In all likelihood, this will not work out.
***
*** See README_SETUP.txt for more information.
***
*** If the build does fail, just re-run configure.py with the
*** correct arguments, and then retry. Good luck!
*************************************************************
*** HIT Ctrl-C NOW IF THIS IS NOT WHAT YOU WANT
*************************************************************
Continuing in 10 seconds...
**** then hundreds of lines of various messages ending with:
gcc -pthread -B /opt/cryoem/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/compiler_compat -Wno-unused-result -Wsign-compare -fwrapv -Wall -O3 -DNDEBUG -fPIC -DBOOST_ALL_NO_LIB=1 -DBOOST_THREAD_BUILD_DLL=1 -DBOOST_MULTI_INDEX_DISABLE_SERIALIZATION=1 -DBOOST_PYTHON_SOURCE=1 -Dboost=pycudaboost -DBOOST_THREAD_DONT_USE_CHRONO=1 -DPYGPU_PACKAGE=pycuda -DPYGPU_PYCUDA=1 -DHAVE_CURAND=1 -Isrc/cpp -Ibpl-subset/bpl_subset -I/cm/shared/apps/cuda11.8/toolkit/11.8.0/include -I/opt/cryoem/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/include -I/opt/cryoem/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/include/python3.8 -c src/cpp/cuda.cpp -o build/temp.linux-x86_64-cpython-38/src/cpp/cuda.o
In file included from src/cpp/cuda.cpp:4:
src/cpp/cuda.hpp:23:10: fatal error: cudaProfiler.h: No such file or directory
#include <cudaProfiler.h>
^~~~~~~~~~~~~~~~
compilation terminated.
error: command '/cm/local/apps/gcc/8.2.0/bin/gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> pycuda
note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
check_install_deps.sh: 66: ERROR: installing python failed.
Could you post the outputs of these commands?
ls -l /cm/shared/apps/cuda11.8/toolkit/11.8.0/
ls -l /cm/shared/apps/cuda11.8/toolkit/11.8.0/include/
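While collecting those listings, a quick check like the following may save a round trip — it tests whether the include directory the build is pointed at actually contains the profiler headers pycuda compiles against (the path is the one from your build log; adjust if your worker uses a different toolkit):

```shell
# Check the CUDA include dir used by the failed build for the profiler headers.
# CUDA_INC is taken from the gcc -I flag in the error output above.
CUDA_INC=/cm/shared/apps/cuda11.8/toolkit/11.8.0/include
for hdr in cudaProfiler.h cuda_profiler_api.h; do
    if [ -f "$CUDA_INC/$hdr" ]; then
        echo "found:   $hdr"
    else
        echo "MISSING: $hdr"
    fi
done
```

If either header reports MISSING, that matches the `fatal error: cudaProfiler.h: No such file or directory` in the build log.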
Somewhat related: we recently became aware of performance issues on CryoSPARC instances configured with CUDA-11.8 that may be avoided by configuring CUDA-11.7 instead.
The include/ directory may be missing cudaProfiler.h, and possibly other files that are present in a runfile-based installation (done without root privileges) that I looked at. If your installation is, directly or indirectly, package-manager-based, could there be a “profiler” package missing (similar to this discussion)?
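For a package-manager-based install, a sketch like this can surface whether a separate profiler package is installed. Package names vary by distribution and CUDA version, so these queries are illustrative rather than authoritative:

```shell
# Illustrative query: look for a separately packaged CUDA profiler component.
# Exact package names differ across distros/CUDA versions.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | grep -i 'cuda.*profiler' || echo "no CUDA profiler package found (dpkg)"
elif command -v rpm >/dev/null 2>&1; then
    rpm -qa | grep -i 'cuda.*profiler' || echo "no CUDA profiler package found (rpm)"
else
    echo "no dpkg/rpm found; check your package manager manually"
fi
```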
We are working on a fix for the 11.8 issue. Because we have not determined a release date, I recommend using an earlier toolkit version.
If you meet the requirements shown here, you could try cryosparcw install-3dflex,
which currently (among other steps) downloads and installs CUDA toolkit v11.7.
Hmm, indeed they are missing. I will correct this problem, but I’m also going to roll back to CUDA 11.2 based on your other advice. It is already installed, and has both cuda_profiler_api.h and cudaProfiler.h present, as indicated.
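For anyone following along, here is a small sanity check I run before re-linking the worker against the older toolkit — it confirms the rollback target actually ships both profiler headers. The CUDA112 path is an example from our module layout; adjust for your cluster, and check the CryoSPARC docs for the exact `cryosparcw` invocation on your version:

```shell
# Verify the rollback toolkit ships the headers the pycuda build needs
# before re-linking the worker against it. Path is an example; adjust.
CUDA112=${CUDA112:-/cm/shared/apps/cuda11.2/toolkit/11.2.0}
missing=0
for hdr in cudaProfiler.h cuda_profiler_api.h; do
    [ -f "$CUDA112/include/$hdr" ] || { echo "missing: $hdr"; missing=1; }
done
if [ "$missing" -eq 0 ]; then
    echo "headers present; safe to re-link the worker against $CUDA112"
fi
```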
This appears to have built the worker process correctly. I will ask users to test and report back.