Problems installing Flex Dependencies

I’m having trouble installing the Flex dependencies and haven’t found a solution in other posts that works for me. I’m on RHEL 7.9, CUDA is 11.7.1, and the NVIDIA driver version is 515.65.01. When I run the install, everything appears to work until the very end, when it says it can’t detect any GPUs:

Collecting torch
  Using cached torch-1.13.1-cp38-cp38-manylinux1_x86_64.whl (887.4 MB)
Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.

nvidia-smi reports 4 GPUs as usual, so the problem must be with PyTorch:

Fri Feb 24 17:58:19 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:05:00.0 Off |                  N/A |
| 26%   42C    P8     7W / 180W |    152MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
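
In case it helps with diagnosis, I can also query the bundled PyTorch directly through the worker wrapper (as I understand it, cryosparcw call runs the command inside the worker’s environment); just say the word and I’ll post the output:

/home/cryosparc_user/V3.X/cryosparc_worker/bin/cryosparcw call python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"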

The confirmation tests also fail.

Any suggestions what’s wrong here and how to fix it?

Thanks,
-jh-

These 4 GPUs are 1080s with only 8 GB of RAM. Could it be that they are simply not powerful enough to run Flex? I see the minimum listed requirement is now 11 GB of VRAM, with V100s or 2080 Tis recommended.

Thanks,
-jh-

Regarding GPU tasks with lower VRAM requirements:

This message can be ignored if no other errors occurred during installation. Have you tested any GPU jobs?

Thanks, but the installation is definitely not okay. The validation tests fail:

> cryosparcm test workers P1 --test gpu --test-pytorch
Using project P1
Specifying gpu test
Enabling PyTorch test
Running worker tests...
Traceback (most recent call last):
  File "/home/cryosparc_user/V3.X/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/cryosparc_user/V3.X/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/cryosparc_user/V3.X/cryosparc_master/cryosparc_command/instance_tests/worker_test.py", line 309, in <module>
    execute_tests(args.project, test_type, args.targets, log_level, args.test_tensorflow, args.test_pytorch)
  File "/home/cryosparc_user/V3.X/cryosparc_master/cryosparc_command/instance_tests/worker_test.py", line 176, in execute_tests
    workspace_uid = get_testing_workspace(project_uid, cli)
  File "/home/cryosparc_user/V3.X/cryosparc_master/cryosparc_command/instance_tests/worker_test.py", line 60, in get_testing_workspace
    workspace_uid = cli.create_empty_workspace(
  File "/home/cryosparc_user/V3.X/cryosparc_master/cryosparc_tools/cryosparc/command.py", line 112, in func
    assert "error" not in res, f'Error for "{key}" with params {params}:\n' + format_server_error(res["error"])
AssertionError: Error for "create_empty_workspace" with params {'project_uid': 'P1', 'created_by_user_id': 'instance_tester', 'title': 'Instance Testing on 2023-03-02 00:50:35.098922'}:
ServerError: validation error: lock file for P1 not found at /home/cryosparc_user/P1/cs.lock
Traceback (most recent call last):
  File "/home/cryosparc_user/V3.X/cryosparc_master/cryosparc_command/commandcommon.py", line 200, in wrapper
    res = func(*args, **kwargs)
  File "/home/cryosparc_user/V3.X/cryosparc_master/cryosparc_command/command_core/__init__.py", line 4432, in create_empty_workspace
    assert check_project_exists(project_uid), f"Project {project_uid} does not exist."
  File "/home/cryosparc_user/V3.X/cryosparc_master/cryosparc_command/commandcommon.py", line 191, in wrapper
    return func(*args, **kwargs)
  File "/home/cryosparc_user/V3.X/cryosparc_master/cryosparc_command/commandcommon.py", line 251, in wrapper
    assert os.path.isfile(
AssertionError: validation error: lock file for P1 not found at /home/cryosparc_user/P1/cs.lock

as does a simple 2D classification which ran fine previously:

[CPU:   2.04 GB  Avail: 121.09 GB]
Traceback (most recent call last):
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/tools.py", line 429, in context_dependent_memoize
    return ctx_dict[cur_ctx][args]
KeyError: <pycuda._driver.Context object at 0x7ff8882f14a0>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/class2D/run.py", line 336, in cryosparc_compute.jobs.class2D.run.run_class_2D
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 964, in cryosparc_compute.engine.engine.process
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 974, in cryosparc_compute.engine.engine.process
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 156, in cryosparc_compute.engine.cuda_core.allocate_gpu
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/gpuarray.py", line 549, in fill
    func = elementwise.get_fill_kernel(self.dtype)
  File "<decorator-gen-13>", line 2, in get_fill_kernel
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/tools.py", line 433, in context_dependent_memoize
    result = func(*args)
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 493, in get_fill_kernel
    return get_elwise_kernel(
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 162, in get_elwise_kernel
    mod, func, arguments = get_elwise_kernel_and_types(
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 148, in get_elwise_kernel_and_types
    mod = module_builder(arguments, operation, name,
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 45, in get_elwise_module
    return SourceModule("""
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 290, in __init__
    cubin = compile(source, nvcc, options, keep, no_extern_c,
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 254, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 78, in compile_plain
    checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
  File "/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 54, in preprocess_source
    raise CompileError("nvcc preprocessing of %s failed" % source_path,
pycuda.driver.CompileError: nvcc preprocessing of /tmp/tmp0zt44t4o.cu failed
[command: nvcc --preprocess -arch sm_61 -I/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/cuda /tmp/tmp0zt44t4o.cu --compiler-options -P]
[stderr:
b'In file included from <command-line>:0:0:\n/usr/include/stdc-predef.h:40:1: fatal error: cuda_runtime.h: No such file or directory\n #endif\n ^\ncompilation terminated.\n']
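
In case it’s easier to iterate on than a full 2D classification job, the failing step can also be reproduced by hand with the same nvcc preprocessing command on a trivial input (the temporary file name below is just an example; the include path is copied from the command line in the traceback):

# reproduce the nvcc preprocessing step that pycuda performs
cat > /tmp/flexcheck.cu <<'EOF'
#include <cuda_runtime.h>
int main() { return 0; }
EOF
nvcc --preprocess -arch sm_61 -I/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/cuda /tmp/flexcheck.cu --compiler-options -P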

Removing the Flex dependencies and reinstalling the 4.1.2 worker cures the problem, and we can again access the GPUs.

Thanks,
-jh-

For now I assume that the “lock file for P1 not found” error is unrelated to the pycuda/3dflex dependencies issue. For the project lock file issue, please review existing forum discussions and/or open a new topic as needed.
Regarding 3dflex dependencies installation, please can you post the output of these commands (executed as cryosparc_user):

env | grep PATH
which nvcc
/home/cryosparc_user/V3.X/cryosparc_worker/bin/cryosparcw call which nvcc

Thanks.

You’re absolutely right about the P1 lock issue; I figured that one out last night. I’ve posted the outputs you requested below. A few additional comments may be relevant:

1) We also have a separate, global conda install which was used, among other things, to add Topaz 0.25.
2) I’ve updated to 4.2, but that made no difference.
3) Per suggestions in other posts, I’ve tried installing the Flex dependencies both with no CUDA/nvcc on the PATH, LD_LIBRARY_PATH, or anywhere else in the environment (per your installation instructions), and with our global CUDA environment visible but set to 11.7.1 to match what Flex expects.
4) Currently, the global CUDA is visible, and the outputs you requested reflect that.

Thanks!

env | grep PATH
MANPATH=/usr/local/Particle/man:/usr/local/IMOD/man:/usr/local/Particle/man:/usr/local/IMOD/man:/usr/share/lmod/lmod/share/man:/usr/local/Particle/man:/usr/local/Particle/man:/usr/local/IMOD/man:/usr/local/share/man:/usr/share/man/overrides:/usr/share/man
__LMOD_REF_COUNT_MODULEPATH=/etc/modulefiles:1;/usr/share/modulefiles:1;/usr/local/EM_modulefiles:1;/usr/share/modulefiles/Linux:1;/usr/share/modulefiles/Core:1;/usr/share/lmod/lmod/modulefiles/Core:1
MODULEPATH_ROOT=/usr/share/modulefiles
CDPATH=.:~
LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:/software/lib::
PATH=/home/cryosparc_user/V3.X/cryosparc_master/bin:/opt/conda/envs/topaz/bin:/usr/local/cuda-11.7/bin:/software/bin:/etc/profile.d/bin:/home/cryosparc_user/perl5/bin:/usr/local/Particle/bin:/usr/lib64/openmpi/bin:/home/heumannj/bin:/sbin:/opt/conda/condabin:/usr/lib64/qt-3.3/bin:/home/heumannj/perl5/bin:/usr/local/IMOD/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/IMOD/pythonLink:/opt/conda/bin
MODULEPATH=/etc/modulefiles:/usr/share/modulefiles:/usr/local/EM_modulefiles:/usr/share/modulefiles/Linux:/usr/share/modulefiles/Core:/usr/share/lmod/lmod/modulefiles/Core
__LMOD_REF_COUNT_PATH=/opt/conda/envs/topaz/bin:1;/usr/local/cuda-11.7/bin:1;/software/bin:3;/etc/profile.d/bin:3;/home/cryosparc_user/perl5/bin:1;/usr/local/Particle/bin:3;/usr/lib64/openmpi/bin:3;/home/heumannj/bin:2;/sbin:2;/opt/conda/condabin:1;/usr/lib64/qt-3.3/bin:1;/home/heumannj/perl5/bin:1;/usr/local/IMOD/bin:1;/usr/local/bin:1;/usr/bin:1;/usr/local/sbin:1;/usr/sbin:1;/usr/local/IMOD/pythonLink:1;/opt/conda/bin:1
PYTHONPATH=/etc/profile.d/protocols:/://software/lib:/etc/profile.d/protocols:/://software/lib:/etc/profile.d/protocols:/://software/lib:
__LMOD_REF_COUNT_LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64:1;/software/lib:1
QT_PLUGIN_PATH=/usr/lib64/kde4/plugins:/usr/lib/kde4/plugins

which nvcc
/usr/local/cuda-11.7/bin/nvcc

/home/cryosparc_user/V3.X/cryosparc_worker/bin/cryosparcw call which nvcc
/home/cryosparc_user/V3.X/cryosparc_worker
/usr/local/cuda-11.7/bin/nvcc

One other comment which may be relevant: RHEL 7.9 uses an older version of glibc than most other current Linux distros (though I’d be surprised if nobody else had run into this before, if it were the cause of the problem):

ldd --version
ldd (GNU libc) 2.17

Could this be causing the problem?

Regards,
-jh-

Please ensure this environment is not active

  • when CryoSPARC is started
  • inside any fresh shell that may be started for cryosparc_user

There are CUDA-related directories in the PATH and LD_LIBRARY_PATH definitions. Their absence is currently a prerequisite for cryosparcw install-3dflex.
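
If changing the login files right away is inconvenient, one temporary workaround (a sketch only; adjust the path fragments and module name to your site) is to strip those entries from the specific shell in which you run the installation and start CryoSPARC:

# remove the CUDA entries from this shell only, before running cryosparcw install-3dflex
export PATH=$(echo "$PATH" | sed 's#/usr/local/cuda-11.7/bin:##')
export LD_LIBRARY_PATH=$(echo "$LD_LIBRARY_PATH" | sed 's#/usr/local/cuda-11.7/lib64:##')
# if the entries come from an environment module, 'module unload cuda' (module name assumed) may be cleaner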

As I mentioned before, I’d already tried the install both with and without CUDA on PATH and LD_LIBRARY_PATH and got the same results either way. (Another post said they could only get the install to work with them present and with 11.7.1 as the global CUDA install.)

It turns out the extra conda entries on PATH and in other variables were coming from the fact that this system uses environment modules; those settings are created by system-wide initialization files in /etc/profile.d for all users. I’ve modified those files so cryosparc_user is excluded. The cryosparc environment now looks clean:

[cryosparc_user@lomatia cryosparc_worker]$ env | grep PATH
MANPATH=/usr/local/Particle/man:/usr/local/Particle/man:/usr/local/IMOD/man:/usr/local/share/man:/usr/share/man/overrides:/usr/share/man
LD_LIBRARY_PATH=//software/lib:
PATH=/home/cryosparc_user/V3.X/cryosparc_master/bin:/://software/bin:/etc/profile.d/bin:/usr/lib64/qt-3.3/bin:/home/cryosparc_user/perl5/bin:/usr/lib64/openmpi/bin:/usr/local/Particle/bin:/usr/local/IMOD/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/IMOD/pythonLink:/home/cryosparc_user/.local/bin:/home/cryosparc_user/bin
PYTHONPATH=/etc/profile.d/protocols:/://software/lib:
QT_PLUGIN_PATH=/usr/lib64/kde4/plugins:/usr/lib/kde4/plugins

[cryosparc_user@lomatia cryosparc_worker]$ which nvcc
/usr/bin/which: no nvcc in (/home/cryosparc_user/V3.X/cryosparc_master/bin:/://software/bin:/etc/profile.d/bin:/usr/lib64/qt-3.3/bin:/home/cryosparc_user/perl5/bin:/usr/lib64/openmpi/bin:/usr/local/Particle/bin:/usr/local/IMOD/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/IMOD/pythonLink:/home/cryosparc_user/.local/bin:/home/cryosparc_user/bin)

[cryosparc_user@lomatia cryosparc_worker]$ /home/cryosparc_user/V3.X/cryosparc_worker/bin/cryosparcw call which nvcc
which: no nvcc in (deps/anaconda/pkgs/cuda-nvcc-11.7.99-0/bin:/home/cryosparc_user/V3.X/cryosparc_worker/bin:/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/condabin:/home/cryosparc_user/V3.X/cryosparc_master/bin:/://software/bin:/etc/profile.d/bin:/usr/lib64/qt-3.3/bin:/home/cryosparc_user/perl5/bin:/usr/lib64/openmpi/bin:/usr/local/Particle/bin:/usr/local/IMOD/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/IMOD/pythonLink:/home/cryosparc_user/.local/bin:/home/cryosparc_user/bin)

Reinstalling 4.2 with --override succeeds on the master, but now fails on the worker with

 gcc -pthread -B /home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/compiler_compat -Wno-unused-result -Wsign-compare -fwrapv -Wall -O3 -DNDEBUG -fPIC -DBOOST_ALL_NO_LIB=1 -DBOOST_THREAD_BUILD_DLL=1 -DBOOST_MULTI_INDEX_DISABLE_SERIALIZATION=1 -DBOOST_PYTHON_SOURCE=1 -Dboost=pycudaboost -DBOOST_THREAD_DONT_USE_CHRONO=1 -DPYGPU_PACKAGE=pycuda -DPYGPU_PYCUDA=1 -DHAVE_CURAND=1 -Isrc/cpp -Ibpl-subset/bpl_subset -I/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/include -I/home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include/python3.8 -c src/cpp/cuda.cpp -o build/temp.linux-x86_64-cpython-38/src/cpp/cuda.o
  In file included from src/cpp/cuda.cpp:4:0:
  src/cpp/cuda.hpp:14:18: fatal error: cuda.h: No such file or directory
   #include <cuda.h>
                    ^
  compilation terminated.
  error: command '/bin/gcc' failed with exit code 1
  [end of output]
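
For reference, here is the sanity check I can run to see whether the worker’s bundled nvcc and the CUDA headers exist at all under the deps tree (the pkgs path is taken from the cryosparcw call output above; this is just a check on my part, not something from the installer):

ls /home/cryosparc_user/V3.X/cryosparc_worker/deps/anaconda/pkgs/cuda-nvcc-11.7.99-0/bin/
find /home/cryosparc_user/V3.X/cryosparc_worker/deps -name cuda.h -o -name cuda_runtime.h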

Please can you

  • post the full cryosparc_worker/install.sh command you used (with license id redacted)
  • confirm that the host on which you ran cryosparc_worker/install.sh has the nvidia driver installed

The NVIDIA driver is definitely installed:
[cryosparc_user@lomatia cryosparc_worker]$ nvidia-smi
Thu Mar  2 15:12:45 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:05:00.0 Off |                  N/A |
| 27%   41C    P8     7W / 180W |    191MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:06:00.0 Off |                  N/A |
| 27%   42C    P8     7W / 180W |      2MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:09:00.0 Off |                  N/A |
| 29%   39C    P8     7W / 180W |      2MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:0A:00.0 Off |                  N/A |
| 30%   41C    P8     7W / 180W |      2MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

The original install.sh was run way back when V3.X first came out, and it had been working through many updates until now. Sorry, but I can’t tell you the exact command line from back then unless it’s cached in a log file somewhere. Is it? Of course I know most of the critical values like ssdpath, etc. I’m not sure what we were using as the CUDA path back then (we’ve recompiled with new CUDA versions several times). If I recall correctly, the instructions at the time did not include the --standalone option, so we installed master and worker individually even though they’re on the same system.

Currently I’m just running “cryosparcm update --override” or, to try to roll back to 4.1.2, “cryosparcm update --version=v4.1.2”. (The former fails with the error I sent when I switch to the worker and run ./bin/cryosparcw update --override; the latter completes with no error, but the GPU test fails saying pycuda is missing.)

I’d prefer to avoid it if possible, since it requires reconfiguring and re-adding all the users, but I’m willing to start over with a clean install if necessary. However, the instructions caution against 2 installs on the same system using the same port number. Is that safe if you’re careful to ensure that the 2 instances are never started at the same time? Or would we have to choose a new port for the new install?
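
(I’m assuming the relevant setting is CRYOSPARC_BASE_PORT in cryosparc_master/config.sh, so if a separate port were needed, the new instance’s config.sh would get something like the line below, with the value just an example:)

export CRYOSPARC_BASE_PORT=39100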

Thanks!

If you successfully updated your master installation to v4.2.0, and

cat /home/cryosparc_user/V3.X/cryosparc_worker/version

also displays the same version v4.2.0, you can try

/home/cryosparc_user/V3.X/cryosparc_worker/bin/cryosparcw forcedeps 2>&1 | tee forcedeps_20230302.log

and post here the error messages that you may encounter. If no errors occur, you may again try

/home/cryosparc_user/V3.X/cryosparc_worker/bin/cryosparcw install-3dflex

I’d tried rolling back to 4.1.2, so I redid the update to 4.2.0. Both master and worker succeeded, but then at the end I got an error:

Done updating all worker nodes.
If any nodes failed to update, you can manually update them.
Cluster worker installations must be manually updated.
To update manually, copy the cryosparc_worker.tar.gz file into the
cryosparc worker installation directory, and then run
$ bin/cryosparcw update
from inside the worker installation directory.
/home/cryosparc_user/V3.X/cryosparc_master/bin/cryosparcm: line 2205: unexpected EOF while looking for matching `"'

I looked at cryosparcm (not exhaustively) but didn’t find a mismatched quote. In any event, both master and worker show version 4.2.0 and are running. The launch and SSD tests pass with no problem. Reinstalling the Flex dependencies fails; the first error and subsequent warnings are:

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [2500 lines of output]
***************************************************************
*** WARNING: nvcc not in path.
*** May need to set CUDA_INC_DIR for installation to succeed.
***************************************************************
*************************************************************
*** I have detected that you have not run configure.py.
*************************************************************
*** Additionally, no global config files were found.
*** I will go ahead with the default configuration.
*** In all likelihood, this will not work out.

Needless to say, it does not work out. There are various complaints about pycuda, eventually leading to the cuda.h include error. I was going to upload the full log file, but it’s not an allowed type, and at 5153 lines it’s also pretty large to paste here, although I’ll do that if you like. Meanwhile, I’ll reply to you by email with the complete file attached.
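
(From the warning above, I gather pycuda’s build wants CUDA_INC_DIR when nvcc isn’t on the path. If I were to try pointing it somewhere by hand before the install, it would be something like the lines below, although I’m not certain these are the only variables its build honors:)

# hypothetical: point pycuda's source build at a CUDA toolkit's headers
export CUDA_INC_DIR=/usr/local/cuda-11.7/include
export CUDA_ROOT=/usr/local/cuda-11.7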

Thanks again!

Oops, I see email replies are disabled. Please let me know how you’d like me to get this log to you.

In the context of updates from CryoSPARC v4.1.2, this message can be ignored.

I think it’s time to bite the bullet and give up updating this installation. Something has gotten corrupted and it’s taking too much time and effort on both our parts to try to track it down.

Any reason not to shut down the old version, make really sure nothing is running, hide it so nobody else can restart it, and then try a fresh 4.2 install to a new directory, but using the old database and ports?

Thanks,
-jh-

You could try that, but I am concerned that the underlying issue might be outside CryoSPARC, and consequently might not be fixed even by a clean CryoSPARC installation.

… looked no different from a successful installation of 3D Flex dependencies.

I briefly searched on the internet for simultaneous mentions of /usr/include/stdc-predef.h and fatal error: cuda_runtime.h: No such file or directory. I cannot tell whether a more thorough search would enable some actionable conclusions.
In case you have full control of the computer(s) in question:

Are you planning to update the OS in the foreseeable future?

I agree that the problem might be outside cryosparc. A clean install should tell us if that’s the case. If it works, we’re done. If not, we’re no worse off than currently.

I tried to send the forcedeps log to you at the address you briefly posted, but it bounced back saying there was a server problem. Please let me know when / if / where you would like it.

I have full root permissions on this system, but not sole authority. There’s been talk of updating to RHEL 8 or 9 for a long time, but it keeps getting put off in favor of higher-priority work; installing the new OS is easy, but reinstalling and reconfiguring all the apps, etc., is painful. If it turns out that 4.2 + Flex simply won’t run on 7.9, that might finally push us over the edge.

FWIW, I’ve noticed multiple developers and admins complaining over the last 6 months about install issues with conda or pip for various (non-cryosparc) packages with complicated dependencies. Flex certainly seems to fall in that category.

Thanks,
-jh-

Phew! I think we’re finally there. Doing a fresh 4.2 install with a new DB worked. Installing the Flex dependencies still gave the “PyTorch not installed correctly, or NVIDIA GPU not detected” message, but all the installation tests pass, including the PyTorch and TensorFlow options.

At that point, I went ahead and did another 4.2 install from scratch, pointing it to the old DB. That worked pretty much as before. The only glitch was a registration problem:

ERROR: This hostname is already registered! Remove it first.

Jobs would get scheduled but not actually run at that point. Rerunning cryosparcw connect manually with the full hostname (rather than localhost, which didn’t work) and --update seems to have cured this, and now everything runs.
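
For anyone else who hits this, the reconnect command was along these lines (the hostname and port here are placeholders for our actual values):

/home/cryosparc_user/V3.X/cryosparc_worker/bin/cryosparcw connect --worker lomatia.our.domain --master lomatia.our.domain --port 39000 --update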

Thanks for your help!