I see. mokca is on the right path.
After running:
cryosparcw install-3dflex
it now seems to be working.
However, a very long gcc error is printed during the installation; perhaps this should be investigated further.
Hey everyone,
Thanks for reporting.
If you’d like to run 3DFlex jobs, you will need to install the required dependencies with the install-3dflex command, as mentioned here:
There seems to be an issue with the installation on some systems; we’re working on an update to fix this.
Thanks @stephan …
We were able to install and start a run, but eventually get this error which seems related to GPU memory:
cryosparc_compute.jobs.flex_refine.flexmod.TetraSVFunction.forward torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.15 GiB (GPU 0; 10.76 GiB total capacity; 7.15 GiB already allocated; 940.94 MiB free; 9.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Is there a place we should change max_split_size_mb ?
Also, will you post back here when you have a fix for the 3D Flex deps?
Many thanks!
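On the max_split_size_mb question: PyTorch reads this option from the PYTORCH_CUDA_ALLOC_CONF environment variable. One way to pass it to CryoSPARC jobs, assuming (not confirmed in this thread) that cryosparcw sources cryosparc_worker/config.sh before launching jobs, as it does for the other variables in that file, is to export it there. The helper name add_alloc_conf is hypothetical, and the 128 MB default is an arbitrary starting point rather than a recommended value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add a PyTorch CUDA allocator setting to a worker config.sh.
# Assumption: config.sh is sourced before jobs start, so the export reaches them.
add_alloc_conf() {
  local config="$1"          # e.g. /path/to/cryosparc_worker/config.sh
  local split_mb="${2:-128}" # arbitrary example value; tune for your GPU
  local line="export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:${split_mb}"
  # Append only if no PYTORCH_CUDA_ALLOC_CONF export is present yet,
  # so running the helper twice does not add duplicate lines.
  if ! grep -q '^export PYTORCH_CUDA_ALLOC_CONF=' "$config"; then
    printf '%s\n' "$line" >> "$config"
  fi
}
```

For example, `add_alloc_conf /path/to/cryosparc_worker/config.sh 128`. Note this only helps when reserved memory is much larger than allocated memory (fragmentation); it does not raise the card's total capacity.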
Is it expected that the worker environment uses the system gcc/g++ instead of the conda version?
I am running into potential conflicts between /usr/bin/gcc and the conda cuda/nvcc on Ubuntu 20.04. I have removed the possibly conflicting packages from Ubuntu (apt remove nvidia-cuda-toolkit
…), but ./bin/cryosparcw install-3dflex
keeps failing.
Then trying to revert with:
cryoem@myrdal:~/cryosparc2/cryosparc_worker$ ./bin/cryosparcw forcedeps
yields:
...
gcc -pthread -B /home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/compiler_compat -Wl,--sysroot=/ -Wsign-compare -fwrapv -Wall -O3 -DNDEBUG -fPIC -DBOOST_ALL_NO_LIB=1 -DBOOST_THREAD_BUILD_DLL=1 -DBOOST_MULTI_INDEX_DISABLE_SERIALIZATION=1 -DBOOST_PYTHON_SOURCE=1 -Dboost=pycudaboost -DBOOST_THREAD_DONT_USE_CHRONO=1 -DPYGPU_PACKAGE=pycuda -DPYGPU_PYCUDA=1 -DHAVE_CURAND=1 -Isrc/cpp -Ibpl-subset/bpl_subset -I/home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/numpy/core/include -I/home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include/python3.7m -c src/cpp/cuda.cpp -o build/temp.linux-x86_64-3.7/src/cpp/cuda.o
In file included from src/cpp/cuda.cpp:4:
src/cpp/cuda.hpp:14:10: fatal error: cuda.h: No such file or directory
14 | #include <cuda.h>
| ^~~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin/python3.7 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-3334l4mw/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-3334l4mw/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-8lgu7bn9/install-record.txt --single-version-externally-managed --compile --install-headers /home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include/python3.7m/pycuda Check the logs for full command output.
check_install_deps.sh: 59: ERROR: installing python failed.
I had to re-add nvidia-cuda-toolkit and the system-provided CUDA 10.1.
This may be related to this thread: 3DFlex Dependencies; Building pycuda - #10 by qitsweauca
tru@myrdal:~$ dpkg -l nvidia-cuda-toolkit
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===================-============-============-=================================
ii nvidia-cuda-toolkit 10.1.243-3 amd64 NVIDIA CUDA development toolkit
tru@myrdal:~$ dpkg -l gcc g++
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-================-============-=================================
ii g++ 4:9.3.0-1ubuntu2 amd64 GNU C++ compiler
ii gcc 4:9.3.0-1ubuntu2 amd64 GNU C compiler
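The cuda.h failure above (pycuda being built from source by the system gcc after the Ubuntu CUDA toolkit was removed) suggests checking which compilers and which copy of cuda.h a build can actually see. The sketch below does that; the function names are hypothetical, and the directories you pass are guesses about where CUDA headers might live, not paths CryoSPARC is known to search.

```shell
#!/usr/bin/env bash
# Sketch: report which gcc/g++/nvcc are on PATH and where cuda.h is found.
# Print the first <dir>/cuda.h that exists among the given directories.
find_cuda_header() {
  local dir
  for dir in "$@"; do
    if [ -f "$dir/cuda.h" ]; then
      printf '%s\n' "$dir/cuda.h"
      return 0
    fi
  done
  return 1
}

# Show the resolved path of each tool, then the cuda.h search result.
report_toolchain() {
  local tool
  for tool in gcc g++ nvcc; do
    printf '%-4s -> %s\n' "$tool" "$(command -v "$tool" || echo 'not found')"
  done
  find_cuda_header "$@" || echo 'cuda.h not found in the given directories'
}
```

For example, `report_toolchain /usr/include /usr/local/cuda/include`. Running it once in a plain shell and once inside the worker environment (for example after `eval $(./bin/cryosparcw env)`, if your version provides that subcommand) shows whether the two resolve different toolchains.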
Now I can revert to the previous setup:
cryoem@myrdal:~/cryosparc2/cryosparc_worker$ ./bin/cryosparcw forcedeps
Checking dependencies...
Forcing dependencies to be reinstalled...
------------------------------------------------------------------------
Installing anaconda python...
------------------------------------------------------------------------
PREFIX=/home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda
Unpacking payload ...
...
Extracting all conda packages...
------------------------------------------------------------------------
...................................................................................................................................................................................
------------------------------------------------------------------------
Done.
conda packages installation successful.
------------------------------------------------------------------------
Preparing to install all pip packages...
------------------------------------------------------------------------
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1.tar.gz
Preparing metadata (setup.py) ... done
Skipping wheel build for pycuda, due to binaries being disabled for it.
Installing collected packages: pycuda
Running setup.py install for pycuda ... done
Successfully installed pycuda-2020.1
------------------------------------------------------------------------
Done.
pip packages installation successful.
------------------------------------------------------------------------
Main dependency installation completed. Continuing...
------------------------------------------------------------------------
Completed.
Currently checking hash for ctffind
Forcing reinstall for dependency ctffind...
------------------------------------------------------------------------
ctffind 4.1.10 installation successful.
------------------------------------------------------------------------
Completed.
Currently checking hash for cudnn
Forcing reinstall for dependency cudnn...
------------------------------------------------------------------------
cudnn 8.1.0.77 for CUDA 11 installation successful.
------------------------------------------------------------------------
Completed.
Currently checking hash for gctf
Forcing reinstall for dependency gctf...
------------------------------------------------------------------------
Gctf v1.06 installation successful.
------------------------------------------------------------------------
Completed.
Completed dependency check.
Generating '/home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/libtiff/tiff_h_4_4_0.py' from '/home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/../include/tiff.h'
The reverted system yields a non-functional CryoSPARC, with errors such as:
I used the fix provided at 3DFlex Dependencies; Building pycuda - #15 by scaiola,
replacing line 457 of cryosparc_worker/bin/cryosparcw:
conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia
with:
conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia/label/cuda-11.7.0
Running cryosparc_worker/bin/cryosparcw install-3dflex
afterwards seems to have fixed everything and activated the 3D Flex functionality.
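The same one-line edit can be made with sed instead of by hand, which avoids relying on the line number (457) staying the same. This is a minimal sketch: patch_cryosparcw is a hypothetical name, and it assumes the line ends exactly in "-c nvidia"; if your copy of cryosparcw differs, the pattern simply does not match and the file is left unchanged.

```shell
#!/usr/bin/env bash
# Sketch: point the 3D Flex conda install in cryosparcw at the
# nvidia/label/cuda-11.7.0 channel instead of the plain "nvidia" channel.
patch_cryosparcw() {
  local script="$1"   # e.g. cryosparc_worker/bin/cryosparcw
  # -i.bak edits in place and keeps the previous version as <script>.bak.
  # The match is anchored at end of line, so a second run is a no-op.
  sed -i.bak \
    's#\(conda install -y cuda-nvcc=11\.7 cuda-toolkit=11\.7 -c nvidia\)$#\1/label/cuda-11.7.0#' \
    "$script"
  # Print the matching line(s) so the change can be checked by eye.
  grep -n 'cuda-nvcc=11.7' "$script"
}
```

For example, from the worker directory: `patch_cryosparcw ./bin/cryosparcw`, then `./bin/cryosparcw install-3dflex`.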
This fix appears to work for us as well on a CentOS system.
Reference:
I first had to revert the system:
./bin/cryosparcw forcedeps
Then edit cryosparcw and change from:
conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia
To:
conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia/label/cuda-11.7.0
Then finally run the 3dflex installer:
./bin/cryosparcw install-3dflex
No more lengthy errors, and users report that jobs appear ok so far.
Hi,
I have no idea whether this is related to the previous fix, but here is an error reported by our users when using Topaz:
UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors.
This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor.
You may want to copy the array to protect its data or make it writeable before converting it to a tensor.
This type of warning will be suppressed for the rest of this program.
(Triggered internally at /opt/conda/conda-bld/pytorch_1607370156314/work/torch/csrc/utils/tensor_numpy.cpp:141.)
And another one during extraction:
Thanks for the update!
Hey everyone,
I see that v4.1.1 addressed this, but I have run into the same ‘no torch’ issue in CryoSPARC v4.1.2 on CentOS 7 when starting a 3DFlex Train job (Data Prep and Mesh Prep ran fine).
" ModuleNotFoundError: No module named ‘torch’ " is the precise wording in the log.
Before I attempt any of the above solutions, is there one that is currently recommended for v4.1.2?
Thanks!
Please try the following, without editing cryosparcw:
/path/to/cryosparc_worker/bin/cryosparcw install-3dflex 2>&1 | tee install_3dflex.log
Does this work?
The command runs with the following output:
Preparing transaction: …working… done
Verifying transaction: …working… done
Executing transaction: …working… done
==> WARNING: A newer version of conda exists. <==
current version: 4.12.0
latest version: 23.1.0
Please update conda by running
$ conda update -n base -c defaults conda
Found existing installation: pycuda 2020.1
Uninstalling pycuda-2020.1:
Successfully uninstalled pycuda-2020.1
Collecting torch
Downloading torch-1.13.1-cp38-cp38-manylinux1_x86_64.whl (887.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 887.4/887.4 MB 3.3 MB/s eta 0:00:00
Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.
Running 3DFlex Train fails upon initializing torch
This is using a Quadro P6000 with CUDA version 10.1.243 (I'm not sure what “found version 10020” from the above error maps to). Does the workstation need an updated CUDA driver to install PyTorch properly?
EDIT: @wtempel it seems all jobs now fail on this machine, not just 3DFlex Train. See below for a Non-Uniform Refinement; a similar error is seen for Ab Initio. Any suggestions?
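On “found version 10020”: CUDA reports versions as a single integer, 1000 × major + 10 × minor, so 10020 is CUDA 10.2. In PyTorch's “driver too old” message this is most likely the highest CUDA version the installed NVIDIA driver supports, not the 10.1.243 toolkit version. The helper below (a hypothetical name) just does that arithmetic.

```shell
#!/usr/bin/env bash
# Decode an integer CUDA version (1000*major + 10*minor) into "major.minor".
decode_cuda_version() {
  local v="$1"
  printf '%d.%d\n' $(( v / 1000 )) $(( (v % 1000) / 10 ))
}
```

For example, `decode_cuda_version 10020` prints 10.2. To compare with what the worker's PyTorch was built against, something like `./bin/cryosparcw call python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"` can be used; as far as we know, the default torch 1.13.1 wheels on PyPI target CUDA 11.7, which would require a newer driver than one reporting 10.2.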
I had also run the command on a second workstation set up as a worker for the above workstation. The command ran with the following output:
Preparing transaction: …working… done
Verifying transaction: …working… done
Executing transaction: …working… done
==> WARNING: A newer version of conda exists. <==
current version: 4.12.0
latest version: 23.1.0
Please update conda by running
$ conda update -n base -c defaults conda
Found existing installation: pycuda 2020.1
Uninstalling pycuda-2020.1:
Successfully uninstalled pycuda-2020.1
Collecting torch
Downloading torch-1.13.1-cp38-cp38-manylinux1_x86_64.whl (887.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 887.4/887.4 MB 4.6 MB/s eta 0:00:00
Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.
I am able to run 3D Flex Train and other jobs without crashing (at least so far). The worker is CentOS 7, GeForce RTX 2080 Ti with CUDA version 10.2.89.
Hello, @wtempel. What commands are run during “install-3dflex”? I am currently getting a Conda HTTP error due to our company firewall settings. Usually, when creating Conda environments, I need to add “--insecure” to bypass this issue. Is it possible to do so in this case?
@andreym you should be able to do this by adding pypi.org as a trusted host in pip and disabling SSL verification in conda. It should be something like this:
cryosparcw call pip config set global.trusted-host pypi.org
SSL_NO_VERIFY=1 cryosparcw install-3dflex
Let me know how that goes
@nfrasser Thank you for the suggestion. I have tried it, unfortunately I am still running into the error:
" Installing 3D Flex Refine dependencies…
Collecting package metadata (current_repodata.json): failed
CondaHTTPError: HTTP 000 CONNECTION FAILED for url https://conda.anaconda.org/nvidia/label/cuda-11.8.0/linux-64/current_repodata.json
Elapsed: -
An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
It is not your proxy, as https://conda.anaconda.org/nvidia/label/cuda-11.8.0/linux-64/current_repodata.json lands at " The page you are looking for does not exist."
@andreym could you post the full output of the following command?
SSL_NO_VERIFY=1 cryosparcw call conda install -y cuda-nvcc=11.8 cuda-toolkit=11.8 -c nvidia/label/cuda-11.8.0 -v
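A browser landing on “The page you are looking for does not exist” means the server answered (an HTTP 404), whereas conda's “HTTP 000” generally means no HTTP response was received at all. Probing the same URL with curl from the worker machine itself, as the same user, helps tell those two situations apart. This is a sketch: classify_http_code and probe_url are hypothetical helpers, and the category wording is ours.

```shell
#!/usr/bin/env bash
# Sketch: distinguish "no connection" from "server reachable, file missing".
classify_http_code() {
  case "$1" in
    000) echo "no response: blocked, proxy, DNS or TLS problem" ;;
    2??) echo "reachable" ;;
    404) echo "server reachable, but this file does not exist" ;;
    *)   echo "server responded with HTTP $1" ;;
  esac
}

probe_url() {
  local code
  # -s: quiet; -o /dev/null: discard body; -w: print only the status code.
  # curl prints 000 when no response arrives; "|| true" keeps set -e happy.
  code=$(curl -s -o /dev/null --max-time 20 -w '%{http_code}' "$1") || true
  printf '%s %s\n' "$code" "$(classify_http_code "$code")"
}
```

For example, `probe_url https://conda.anaconda.org/nvidia/label/cuda-11.8.0/linux-64/current_repodata.json`. A 000 from the worker node, while a browser elsewhere gets a 404, would point back at the network path (firewall or proxy) rather than the channel.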