No module named 'torch' after 4.1 upgrade

yodamoppet · December 13, 2022, 2:47pm

Greetings.

We upgraded a workstation to 4.1 and we are receiving the following error:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 80, in cryosparc_compute.run.main
  File "/data/opt/cryosparc/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 443, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/data/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1050, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
ModuleNotFoundError: No module named 'torch'

Troubleshooting info:

Single Workstation
Current cryoSPARC version: v4.1.0

uname -a && free -g
Linux sitak.structbio.pitt.edu 4.18.0-372.26.1.el8_6.x86_64 #1 SMP Tue Sep 13 18:09:48 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
              total        used        free      shared  buff/cache   available
Mem:             93           2           0           0          90          90
Swap:            31           0          31

/opt/local/cuda-11.3
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
(11, 3, 0)
Tue Dec 13 09:46:16 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:09:00.0 Off |                  N/A |
| 30%   29C    P8    14W / 250W |     15MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 30%   31C    P8    19W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:41:00.0 Off |                  N/A |
| 30%   30C    P8     1W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:42:00.0 Off |                  N/A |
| 30%   41C    P8    20W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2360      G   /usr/libexec/Xorg                   9MiB |
|    0   N/A  N/A      2928      G   /usr/bin/gnome-shell                4MiB |
+-----------------------------------------------------------------------------+

mokca · December 13, 2022, 3:16pm

We have the same problem. Our machine is Ubuntu 22.04 with CUDA 11.4, upgraded from the most recent cryoSPARC version 3 straight to 4.1.

Edit: The instance test for Worker GPUs fails for both TensorFlow and PyTorch.

The TensorFlow error is:

Testing Tensorflow...
[CPU: 525.0 MB]
    Tensorflow found 0 GPUs.

[CPU: 525.0 MB]
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
  File "/local_slow/cryosparc/cryosparc_worker/cryosparc_compute/jobs/instance_testing/run.py", line 174, in run_gpu_job
    assert devs == check_gpus, f"Tensorflow detected {devs} of {check_gpus} GPUs."
AssertionError: Tensorflow detected 0 of 4 GPUs.

The Torch error is:

Testing PyTorch...

[CPU: 270.9 MB]
Unable to import PyTorch. Run `cryosparcw install-3dflex` to install PyTorch.

[CPU: 270.9 MB]
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
  File "/local_slow/cryosparc/cryosparc_worker/cryosparc_compute/jobs/instance_testing/run.py", line 183, in run_gpu_job
    import torch
ModuleNotFoundError: No module named 'torch'
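
One way to confirm what these tracebacks report, without launching a job, is to ask the interpreter whether it can locate the module at all. The sketch below uses only the Python standard library. The helper name `module_available` is hypothetical, and the result only describes the interpreter that runs it, so it is assumed to be run with the Python from the worker environment (the `cryosparc_worker_env` seen in the paths above).

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if this interpreter's import system can locate `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken parent packages or odd module state
        return False


if __name__ == "__main__":
    # torch and pycuda are the two packages discussed in this thread
    for mod in ("torch", "pycuda"):
        status = "found" if module_available(mod) else "missing"
        print(f"{mod}: {status}")
```

A "missing" result for torch would be consistent with the ModuleNotFoundError above. A "found" result only shows that the package is present, not that it can see a GPU.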

The original error in 3D Flex Refine was:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 80, in cryosparc_compute.run.main
  File "/local_slow/cryosparc/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 443, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/local_slow/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1050, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
ModuleNotFoundError: No module named 'torch'

uname -a && free -g
Linux pegasus 5.4.0-126-generic #142~18.04.1-Ubuntu SMP Thu Sep 1 16:25:16 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
              total        used        free      shared  buff/cache   available
Mem:            267         146          91           0          29         118
Swap:             1           0           1

cryosparc@pegasus:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0

cryosparc@pegasus:~$ nvidia-smi
Tue Dec 13 15:15:52 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:3B:00.0 Off |                  N/A |
|  0%   30C    P8    18W / 350W |      5MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:5E:00.0 Off |                  N/A |
|  0%   29C    P8    17W / 350W |      5MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:86:00.0 Off |                  N/A |
|  0%   29C    P8    17W / 350W |      5MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:AF:00.0 Off |                  N/A |
|  0%   30C    P8    23W / 350W |     23MiB / 24267MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4742      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      4742      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      4742      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      4742      G   /usr/lib/xorg/Xorg                  9MiB |
|    3   N/A  N/A      4972      G   /usr/bin/gnome-shell               12MiB |
+-----------------------------------------------------------------------------+

mjones1993 · December 13, 2022, 4:12pm

Same here. Torch error during 3D-flex training after 4.1 upgrade:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 80, in cryosparc_compute.run.main
  File "/lmb/home/mjones/soft/171122/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 443, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/lmb/home/mjones/soft/171122/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1050, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
ModuleNotFoundError: No module named 'torch'

mokca · December 13, 2022, 4:25pm

By the way, in case anyone is tempted to follow the recommendation in the error message from the instance GPU worker test:

Run `cryosparcw install-3dflex` to install PyTorch.

I tried this and it produced the longest gcc error I’ve ever seen while trying to rebuild the pycuda wheel in the cryoSPARC Anaconda env.

The 3DFlex training seems to work so far though…

EDIT: There’s actually a help page about this: Installing 3D Flex dependencies.

One consideration is that you need CUDA 11.8.

ANOTHER EDIT: That page says you don’t need CUDA 11.8 beforehand; install-3dflex will install it itself. There’s another post in the Install, Configure, and Update forum about the error message you get with install-3dflex.

yodamoppet · December 13, 2022, 5:16pm

I see. mokca is on the right path.

After running:

cryosparcw install-3dflex

it now seems to be working.

However, the command produces a very long gcc error; perhaps this should be investigated further.

stephan · December 13, 2022, 6:59pm

Hey everyone,

Thanks for reporting.

If you’d like to run 3DFlex jobs, you will need to install the dependencies required via the install-3dflex command as mentioned here:

There seems to be an issue with the installation on some systems; we’re working on an update to fix this.

yodamoppet · December 13, 2022, 7:16pm

Thanks @stephan …

We were able to install and start a run, but eventually get this error which seems related to GPU memory:

cryosparc_compute.jobs.flex_refine.flexmod.TetraSVFunction.forward
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.15 GiB (GPU 0; 10.76 GiB total capacity; 7.15 GiB already allocated; 940.94 MiB free; 9.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there a place where we should set max_split_size_mb?
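
For context on this question: as the error message itself hints, PyTorch reads allocator options such as max_split_size_mb from the PYTORCH_CUDA_ALLOC_CONF environment variable, which has to be set before the CUDA allocator is first used. A minimal Python sketch of building and setting that value follows. The helper name `alloc_conf` and the value 128 are illustrative assumptions, not recommendations from this thread, and whether CryoSPARC worker jobs inherit the variable (for example from an export in the worker's shell configuration) is not confirmed here.

```python
import os


def alloc_conf(max_split_size_mb: int) -> str:
    """Build a PYTORCH_CUDA_ALLOC_CONF value for the given split size in MB."""
    if max_split_size_mb <= 0:
        # Basic sanity check only; PyTorch applies its own validation
        raise ValueError("max_split_size_mb must be a positive integer")
    return f"max_split_size_mb:{max_split_size_mb}"


# Set the variable before torch initialises CUDA in this process.
# 128 is an arbitrary example value (assumption).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = alloc_conf(128)
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

Setting the variable from Python only affects the current process and its children, so for jobs launched by a scheduler it would have to be present in the environment those jobs start from.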

Also, will you post back here when you have a fix for the 3D Flex deps?

Many thanks!

Tru · December 14, 2022, 5:22pm

Is it expected that the worker environment is using the system gcc/g++ instead of the conda version?

I am running into potential conflicts between /usr/bin/gcc and the cuda/nvcc from conda on Ubuntu 20.04. I have removed the possibly conflicting packages from Ubuntu (apt remove nvidia-cuda-toolkit…), but ./bin/cryosparcw install-3dflex keeps failing.

Then trying to revert with:
cryoem@myrdal:~/cryosparc2/cryosparc_worker$ ./bin/cryosparcw forcedeps
yields:

...
    gcc -pthread -B /home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/compiler_compat -Wl,--sysroot=/ -Wsign-compare -fwrapv -Wall -O3 -DNDEBUG -fPIC -DBOOST_ALL_NO_LIB=1 -DBOOST_THREAD_BUILD_DLL=1 -DBOOST_MULTI_INDEX_DISABLE_SERIALIZATION=1 -DBOOST_PYTHON_SOURCE=1 -Dboost=pycudaboost -DBOOST_THREAD_DONT_USE_CHRONO=1 -DPYGPU_PACKAGE=pycuda -DPYGPU_PYCUDA=1 -DHAVE_CURAND=1 -Isrc/cpp -Ibpl-subset/bpl_subset -I/home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/numpy/core/include -I/home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include/python3.7m -c src/cpp/cuda.cpp -o build/temp.linux-x86_64-3.7/src/cpp/cuda.o
    In file included from src/cpp/cuda.cpp:4:
    src/cpp/cuda.hpp:14:10: fatal error: cuda.h: No such file or directory
       14 | #include <cuda.h>
          |          ^~~~~~~~
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin/python3.7 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-3334l4mw/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-3334l4mw/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-8lgu7bn9/install-record.txt --single-version-externally-managed --compile --install-headers /home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include/python3.7m/pycuda Check the logs for full command output.
check_install_deps.sh: 59: ERROR: installing python failed.

I had to re-add nvidia-cuda-toolkit and the system-provided CUDA 10.1.

This may be related to the thread 3DFlex Dependencies; Building pycuda - #10 by qitsweauca

Tru · December 14, 2022, 5:26pm

tru@myrdal:~$ dpkg -l nvidia-cuda-toolkit
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                Version      Architecture Description
+++-===================-============-============-=================================
ii  nvidia-cuda-toolkit 10.1.243-3   amd64        NVIDIA CUDA development toolkit
tru@myrdal:~$ dpkg -l gcc g++
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version          Architecture Description
+++-==============-================-============-=================================
ii  g++            4:9.3.0-1ubuntu2 amd64        GNU C++ compiler
ii  gcc            4:9.3.0-1ubuntu2 amd64        GNU C compiler

Now I can revert to the previous setup:

cryoem@myrdal:~/cryosparc2/cryosparc_worker$ ./bin/cryosparcw forcedeps
Checking dependencies...
Forcing dependencies to be reinstalled...
  ------------------------------------------------------------------------
  Installing anaconda python...
  ------------------------------------------------------------------------
PREFIX=/home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda
Unpacking payload ...
...
  Extracting all conda packages...
  ------------------------------------------------------------------------
...................................................................................................................................................................................
  ------------------------------------------------------------------------
    Done.
    conda packages installation successful.  
  ------------------------------------------------------------------------
  Preparing to install all pip packages...   
  ------------------------------------------------------------------------
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1.tar.gz
  Preparing metadata (setup.py) ... done
Skipping wheel build for pycuda, due to binaries being disabled for it.
Installing collected packages: pycuda
    Running setup.py install for pycuda ... done
Successfully installed pycuda-2020.1
  ------------------------------------------------------------------------
    Done.
    pip packages installation successful.
  ------------------------------------------------------------------------
  Main dependency installation completed. Continuing...
  ------------------------------------------------------------------------
Completed.
Currently checking hash for ctffind
Forcing reinstall for dependency ctffind...  
  ------------------------------------------------------------------------
  ctffind 4.1.10 installation successful.
  ------------------------------------------------------------------------
Completed.
Currently checking hash for cudnn
Forcing reinstall for dependency cudnn...
  ------------------------------------------------------------------------
  cudnn 8.1.0.77 for CUDA 11 installation successful.
  ------------------------------------------------------------------------
Completed.
Currently checking hash for gctf
Forcing reinstall for dependency gctf...
  ------------------------------------------------------------------------
  Gctf v1.06 installation successful.
  ------------------------------------------------------------------------
Completed.
Completed dependency check.
Generating '/home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/libtiff/tiff_h_4_4_0.py' from '/home/cryoem/cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/../include/tiff.h'

Tru · December 16, 2022, 11:09am

The reverted system yields a non-functional cryoSPARC with errors such as:

I used the fix provided at 3DFlex Dependencies; Building pycuda - #15 by scaiola,

replacing line 457 of cryosparc_worker/bin/cryosparcw:

conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia

with:

conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia/label/cuda-11.7.0

Running cryosparc_worker/bin/cryosparcw install-3dflex afterwards seems to have fixed everything and activated the 3DFlex functionality.

yodamoppet · December 16, 2022, 2:39pm

This fix appears to work for us as well on a CentOS system.

Reference:

I did have to first revert the system:

./bin/cryosparcw forcedeps

Then edit cryosparcw and change from:

conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia

To:

conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia/label/cuda-11.7.0

Then finally run the 3dflex installer:

./bin/cryosparcw install-3dflex

No more lengthy errors, and users report that jobs appear ok so far.
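
The manual edit described in the posts above can be sketched as a small Python script, which avoids relying on a line number that may differ between installs. This is only a sketch under assumptions: the function name `pin_cuda_channel` is hypothetical, the script is assumed to contain the conda command alone on its line exactly as quoted, and a `.bak` copy is written before anything is changed. It applies to the v4.1.0 `cryosparcw` script discussed in this thread.

```python
import shutil
from pathlib import Path

# Command strings exactly as quoted in the posts above
OLD = "conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia"
NEW = "conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia/label/cuda-11.7.0"


def pin_cuda_channel(script: Path) -> bool:
    """Pin the conda channel in `script`; return True if the file was changed."""
    lines = script.read_text().splitlines(keepends=True)
    changed = False
    for i, line in enumerate(lines):
        # Match the whole command so an already-pinned line is left alone
        if line.strip() == OLD:
            lines[i] = line.replace(OLD, NEW)
            changed = True
    if changed:
        # Keep a backup next to the original before overwriting it
        shutil.copy2(script, script.with_name(script.name + ".bak"))
        script.write_text("".join(lines))
    return changed


# Example call (placeholder path, adjust to your install):
# pin_cuda_channel(Path("/path/to/cryosparc_worker/bin/cryosparcw"))
```

Matching the full line rather than doing a plain substring replace keeps the script safe to run twice, since the old command is a prefix of the new one.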

Tru · December 19, 2022, 12:39pm

Hi

I have no idea whether this is related to the previous fix, but here is an error reported by our users when using Topaz:

UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors.
This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor.
You may want to copy the array to protect its data or make it writeable before converting it to a tensor.
This type of warning will be suppressed for the rest of this program.
(Triggered internally at   /opt/conda/conda-bld/pytorch_1607370156314/work/torch/csrc/utils/tensor_numpy.cpp:141.)

and another one during extraction:

wtempel · December 20, 2022, 4:48pm

We have just released CryoSPARC v4.1.1 to address this issue.

yodamoppet · December 21, 2022, 3:05pm

Thanks for the update!

sergei.pourmal · February 14, 2023, 5:27am

Hey everyone,

I see that v4.1.1 addressed this, but I have run into the same “no torch” issue in CryoSPARC v4.1.2 on CentOS 7 when starting a 3DFlex Train job (Data Prep and Mesh Prep ran fine).

“ModuleNotFoundError: No module named 'torch'” is the precise wording in the log.

Before I attempt any of the above solutions, is there one that is currently recommended for v4.1.2?

Thanks!

wtempel · February 14, 2023, 2:13pm

Please try, without editing cryosparcw,

/path/to/cryosparc_worker/bin/cryosparcw install-3dflex 2>&1 | tee install_3dflex.log

Does this work?

sergei.pourmal · February 14, 2023, 8:54pm

The command runs with the following output:

Preparing transaction: …working… done
Verifying transaction: …working… done
Executing transaction: …working… done

==> WARNING: A newer version of conda exists. <==
current version: 4.12.0
latest version: 23.1.0

Please update conda by running
$ conda update -n base -c defaults conda
Found existing installation: pycuda 2020.1
Uninstalling pycuda-2020.1:
Successfully uninstalled pycuda-2020.1
Collecting torch
Downloading torch-1.13.1-cp38-cp38-manylinux1_x86_64.whl (887.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 887.4/887.4 MB 3.3 MB/s eta 0:00:00
Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.

Running 3DFlex Train fails upon initializing torch

This is using a Quadro P6000 with CUDA Version 10.1.243 (not sure what “found version 10020” maps to from the above error). Seems like the workstation needs an updated CUDA driver to properly install PyTorch?
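
On the “found version 10020” question: CUDA's runtime and driver APIs report versions as a single integer, 1000 × major + 10 × minor, so a value of 10020 would decode to CUDA 10.2. The sketch below shows the arithmetic. The function name `decode_cuda_version` is hypothetical, and it assumes the number in the error message uses that standard encoding.

```python
from typing import Tuple


def decode_cuda_version(code: int) -> Tuple[int, int]:
    """Split a CUDA integer version (1000*major + 10*minor) into (major, minor)."""
    major, rest = divmod(code, 1000)
    minor = rest // 10
    return major, minor


print(decode_cuda_version(10020))  # (10, 2)
```

Under that assumption, 10020 and the 10.1.243 toolkit mentioned above would be two different CUDA versions, which may be worth checking on the machine.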

EDIT: @wtempel seems like all jobs fail now on this machine, not just 3DFlex Train. See below for a Non-Uniform Refinement, and similar error seen for Ab Initio. Suggestion?

sergei.pourmal · February 14, 2023, 9:55pm

I had also run the command on a second workstation set up as a worker for the above workstation. The command ran with the following output:

Preparing transaction: …working… done
Verifying transaction: …working… done
Executing transaction: …working… done

==> WARNING: A newer version of conda exists. <==
current version: 4.12.0
latest version: 23.1.0

Please update conda by running
$ conda update -n base -c defaults conda
Found existing installation: pycuda 2020.1
Uninstalling pycuda-2020.1:
Successfully uninstalled pycuda-2020.1
Collecting torch
Downloading torch-1.13.1-cp38-cp38-manylinux1_x86_64.whl (887.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 887.4/887.4 MB 4.6 MB/s eta 0:00:00
Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.

I am able to run 3D Flex Train and other jobs without crashing (at least, thus far). The worker is CentOS 7, with a GeForce RTX 2080 Ti and CUDA Version 10.2.89.

wtempel · February 14, 2023, 10:38pm

Please ensure your NVIDIA driver version is at least v460 (see guide).
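
As a rough way to check this requirement, the driver version can be read from the header line that nvidia-smi prints, as in the outputs posted earlier in this thread. The Python sketch below parses that text. The function names are hypothetical, the 460 threshold is taken from the post above, and the sample string is copied from an nvidia-smi header shown earlier.

```python
import re
from typing import Optional


def driver_major(smi_text: str) -> Optional[int]:
    """Extract the major driver version from nvidia-smi header text, if present."""
    match = re.search(r"Driver Version:\s*(\d+)(?:\.\d+)*", smi_text)
    return int(match.group(1)) if match else None


def meets_minimum(smi_text: str, minimum: int = 460) -> bool:
    """Return True if a driver version was found and its major part is >= minimum."""
    major = driver_major(smi_text)
    return major is not None and major >= minimum


# Header line as it appears in one of the outputs posted above
header = "| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |"
print(driver_major(header), meets_minimum(header))
```

The text could come from running nvidia-smi and capturing its output. Comparing only the major number is a simplification that is sufficient for a threshold like 460.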

andreym · June 27, 2023, 4:14pm

Hello, @wtempel. What commands are run during “install-3dflex”? Currently I am getting a Conda HTTP error due to the company firewall settings. Usually, when trying to create Conda environments, I need to add “--insecure” to bypass this issue. Is it possible to do so in this case?