Cryosparc cannot determine topaz version

Versions running:
cryosparc: 3.3.1, using python 3.7
topaz: variable (I’ve tried 0.2.3, 0.2.4, and 0.2.5)
CUDA: 10.2

I am trying to run a Topaz Train job on cryosparc and the job is failing at the stage of checking the version of topaz. The error message reads as follows:

[CPU: 227.4 MB] Traceback (most recent call last):
File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
File "/home/xxxxxlab/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 115, in run_topaz_wrapper_train
topaz_version = utils.get_topaz_version(topaz_exec_path)
File "/home/xxxxxlab/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 126, in get_topaz_version
f'Cannot determine topaz version, command "{topaz_exec_path} --version" did not produce valid output: "{topaz_version}"'
AssertionError: Cannot determine topaz version, command "/home/xxxxxlab/anaconda3/envs/topaz7/bin/topaz --version" did not produce valid output: "ImportError: /home/xxxxxlab/anaconda3/envs/topaz7/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent"

Putting the command “/home/xxxxxlab/anaconda3/envs/topaz7/bin/topaz --version” into the command line returns the following:

File "/home/xxxxxlab/anaconda3/envs/topaz7/bin/topaz", line 33, in <module>
sys.exit(load_entry_point('topaz-em==0.2.5', 'console_scripts', 'topaz')())
File "/home/xxxxxlab/anaconda3/envs/topaz7/lib/python3.7/site-packages/topaz/main.py", line 60, in main
import topaz.commands.train
File "/home/xxxxxlab/anaconda3/envs/topaz7/lib/python3.7/site-packages/topaz/commands/train.py", line 12, in <module>
import torch
File "/home/xxxxxlab/anaconda3/envs/topaz7/lib/python3.7/site-packages/torch/__init__.py", line 202, in <module>
from torch._C import * # noqa: F403
ImportError: /home/xxxxxlab/anaconda3/envs/topaz7/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

Previously I was having another issue with topaz 0.2.5 train jobs failing at another later step (unable to find the image_list_train.txt file), so I’ve tried installing older versions of topaz to try to fix the issue. Topaz version 0.2.3 has worked before for us, I believe with this same version of cryosparc. I’m not sure what is different about how I’ve tried to install it, but everything I’ve tried has resulted in the above error.

The path to the topaz executable is /home/xxxxxlab/anaconda3/envs/topaz7/bin/topaz
“topaz7” is the name of the environment I made; there have been several attempts using different versions of topaz/python.

When I make the environment, I have tried specifying python versions 2.7, 3.6, and 3.7. The 2.7 did not seem to work well at all but the other two have at least allowed install.
When I install topaz, I specify either version 0.2.3 or 0.2.5 (0.2.3 has worked historically and 0.2.5 has worked for us outside of cryosparc), plus CUDA toolkit for version 10.2. See example below:

conda create -n topazX python=3.7
conda activate topazX
conda install topaz=0.2.3 cudatoolkit=10.2 -c tbepler -c pytorch

I’ve also tried adding -c nvidia to the end of the install command, but same result.

All combinations of topaz/python I’ve tried have resulted in the same error message, so I think something else must be wrong.

Hello,

You seem to have different problems with some of your installations of topaz not working well.

But even with a working installation of topaz you might run into the version problem because cryosparc expects a version number strictly in the form of X.Y.Z with X, Y and Z numbers (this was introduced in a recent version), while topaz --version produces something like TOPAZ 0.2.5. I reported this problem to the developer of topaz and also suggested a fix, but this hasn’t been included in topaz yet.

Until this is corrected in topaz, you can work around this problem with the following wrapper script:

#!/usr/bin/env bash

# If this script is called with --version
# return a string compatible with what cryosparc expects
# and exit.
if [[ $@ == '--version' ]]; then
	echo '0.2.5'
	exit 0
fi

# In every other case, load the conda environment containing topaz
# and pass the arguments to topaz.
source /path/to/your/miniconda3/etc/profile.d/conda.sh
conda activate topaz-0.2.5

topaz "$@"

conda deactivate

Make sure to adjust the path to your topaz environment and the name of the environment in the conda activate command to what is on your system. Then save this script as /somewhere/convenient/topaz-wrapper.sh and make it executable. Then in your cryosparc project, write the path to this script in the box “Path to Topaz Executable”, and it should work.
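
As an aside, the hard-coded version string in the wrapper could in principle be derived from topaz's own banner instead. The following is only a minimal sketch of that idea: the helper name normalize_version is hypothetical, and the exact pattern cryosparc checks against is an assumption based on the X.Y.Z description above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of topaz or cryosparc):
# extract the first X.Y.Z token from a version banner such as "TOPAZ 0.2.5".
normalize_version() {
    # grep -o prints only the matching part; head keeps the first match.
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

normalize_version 'TOPAZ 0.2.5'
```

Inside the wrapper, one could activate the environment first and pass the output of topaz --version through such a helper, at the cost of a slower version check. If topaz itself fails to start, the helper prints nothing, so the hard-coded string remains the simpler option.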

I hope this helps!

Hi Guillaume, thanks so much for your advice. I tried it with the path to the topaz environment written a couple different ways (I wasn’t sure if the path should lead to the environment folder or all the way to the topaz executable within the /bin directory) but both ways produced this error message, so I think I’ve done something wrong:

Traceback (most recent call last):
File "/home/xxxxxlab/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 100, in get_topaz_version
process = subprocess.Popen([topaz_exec_path, '--version'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, close_fds=True, text=True)
File "/home/xxxxxlab/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/subprocess.py", line 800, in __init__
restore_signals, start_new_session)
File "/home/xxxxxlab/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/home/xxxxxlab/topaz-wrapper.sh'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
File "/home/xxxxxlab/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 115, in run_topaz_wrapper_train
topaz_version = utils.get_topaz_version(topaz_exec_path)
File "/home/xxxxxlab/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 102, in get_topaz_version
raise OSError('Input Topaz executable path does not exist.')
OSError: Input Topaz executable path does not exist.

Is it obvious from these error messages what I might have done wrong? I also tried to make the wrapper file executable with the command chmod -x ~/topaz-wrapper.sh but that didn’t seem to help either.

Hi,

To solve this problem, I first suggest you install topaz from scratch in a new conda environment. Following the directions under the “Recommended” section from the README has always been reliable for me.

Once you have done this, confirm that it works:

$ conda activate topaz # adjust the name of the environment if you named it differently
$ which topaz
# should return the path to the topaz executable in the conda environment
# this will likely be a different path on your system than on mine
$ topaz --version
TOPAZ 0.2.5 # or whatever version you have, but I recommend installing the latest one

Then, adjust the paths correctly in the wrapper script I posted before, make this script executable, and point your cryosparc project to this script.

You don’t need to point to the topaz executable with the full path if the conda environment is active. And I am not sure the executable will work properly and find the correct version of python and other dependencies if it is run without having activated the conda environment first. This is why you need the wrapper script: it will take care of activating the correct conda environment (the other reason is because cryosparc only lets you run one command, so you need to put all the commands in the wrapper script and have cryosparc run this script instead).
In this script, you need to source the etc/profile.d/conda.sh file from your anaconda or miniconda installation (the path to which will be specific to your system, I can’t guess it). Once you have this line correctly set up, and the correct name for the environment containing topaz in the conda activate line, then the rest of the wrapper script I posted should work.

Cryosparc needs both read and execute permission on the wrapper script, and it is telling you it doesn’t have them (or you entered the path to it incorrectly in your cryosparc project, double-check this too).

This is how it’s done:

$ chmod go+rx /path/to/topaz-wrapper.sh

chmod changes the permissions on the file you point it to: go means “for the group and other accounts”, + means add the permissions (- means remove them), and rx means “read and execute permissions”. Note that the chmod -x you ran removes the execute permission rather than adding it. More details are in the manual page for chmod (also accessible on your system with the command man chmod).
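
To see the effect of + and - without touching the real wrapper, here is a small sketch on a throwaway file. The variable names are only illustrative, and stat -c is the GNU coreutils form, so this assumes a Linux system.

```shell
#!/usr/bin/env bash
# Demonstration on a temporary file that stands in for topaz-wrapper.sh.
wrapper=$(mktemp)

chmod 644 "$wrapper"                       # start from rw-r--r--
perms_start=$(stat -c '%A' "$wrapper")

chmod a+rx "$wrapper"                      # add read+execute for everyone
perms_added=$(stat -c '%A' "$wrapper")

chmod a-x "$wrapper"                       # remove execute again
perms_removed=$(stat -c '%A' "$wrapper")

echo "start:     $perms_start"
echo "after +rx: $perms_added"
echo "after -x:  $perms_removed"

rm -f "$wrapper"
```

The middle state is the one the wrapper script needs; the last state is what a file looks like after its execute permission has been removed.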

Once you have confirmed that topaz works by itself, and cryosparc can use it through the wrapper script, I suggest you remove all the other conda environments that have non-functioning installations of topaz.

I hope this helps!

Also, I forgot to mention that my fix for this version check was just merged into topaz (see tbepler/topaz#177 and tbepler/topaz#178). So the next released version should contain this and will no longer require the if [[ $@ == '--version' ]] block at the beginning of the wrapper script. But I don’t know when they will make a new release in conda, so keep this block until you have the next version of topaz.

@akw If

  • you have sufficient control over the computer(s) where your CryoSPARC instance is running
  • and your GPUs are recent enough models

may I suggest:

  1. ensuring the nvidia drivers are at version 520.61.05 or above. In case a driver upgrade is required, ensure the computer is rebooted after the driver upgrade and nvidia-smi correctly reports installed GPUs and driver version
  2. then, installing topaz in a new conda environment, as described in Topaz Train - TypeError: concat() takes 1 positional argument but 2 were given - #17 by wtempel
  3. then, confirming topaz function on the command line
  4. then, upgrading CryoSPARC

Hi all,

Thank you so much for your time and effort. Unfortunately I am still having the same problems despite trying the suggested advice. I think I’ve set up the wrapper script and the path to it in the cryosparc job correctly, but cryosparc still is saying the same thing about not having permission to access the wrapper .sh file and then, subsequently, that the executable path does not exist. I also tried to give it permission with chmod but it did not seem to change anything (see cryosparc’s output below):

File "/home/xxxxx/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 100, in get_topaz_version
process = subprocess.Popen([topaz_exec_path, '--version'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, close_fds=True, text=True)
File "/home/xxxxx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/subprocess.py", line 800, in __init__
restore_signals, start_new_session)
File "/home/xxxxx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
PermissionError: [Errno 13] Permission denied: '/home/xxxxx/topaz-wrapper.sh'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
File "/home/xxxxx/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 115, in run_topaz_wrapper_train
topaz_version = utils.get_topaz_version(topaz_exec_path)
File "/home/xxxxx/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 102, in get_topaz_version
raise OSError('Input Topaz executable path does not exist.')
OSError: Input Topaz executable path does not exist.

I have a few more basic questions to see if I can figure out what is wrong on our end:

  1. Regarding the wrapper script: Do I need to add a line specifying where the topaz.sh executable is (within the conda environment in which topaz is installed)? As far as I understand, the wrapper script is telling cryosparc the location of the conda.sh file and then saying to activate a specified topaz conda environment, but then I’m not sure how cryosparc is supposed to know where topaz.sh is if the “path to executable” I’m providing to cryosparc is just the path to the topaz-wrapper.sh, which doesn’t include instructions on how to reach topaz.sh. Maybe this is particular to how our files are organized on our workstation. Apologies if this question is unclear.

  2. More generally, are we supposed to be starting cryosparc after activating the conda environment we created for topaz, or should we start cryosparc from the base environment? I have heard different things from different people on this question. To be clear, I’ve tried this whole process both ways and neither has changed the above output, but I’d like to be clear on this so I don’t have to keep trying it both ways.

Thank you again for your help. I really appreciate a second set of eyes on it.

Is your CryoSPARC instance a “standalone”, combined master/worker installation?

What are the outputs of the commands

ls -l /home/xxxxx/topaz-wrapper.sh
cat /home/xxxxx/topaz-wrapper.sh

on the relevant CryoSPARC worker node, run as the Linux user that also owns CryoSPARC job processes?

In case you are speaking of

cryosparcm start

the command need not be run inside a conda environment. Management of conda environments should be left to CryoSPARC and, potentially, the topaz wrapper script.

Activating the conda environment is what makes the topaz executable discoverable without spelling out its full path. This works because conda activate prepends the environment's bin directory to the PATH environment variable.
You can test this in a new shell session like so:

$ echo $PATH
# will list a series of paths specific to your system, but in which you shouldn't see anything related to topaz yet
$ conda activate topaz # adjust the name of the environment to match the one on your system
$ echo $PATH
# will list a series of paths again, but this time the path to the bin directory in the topaz environment in your conda installation should also be listed
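
The same mechanism can be illustrated without conda at all. This is only a simulation under stated assumptions: it uses a throwaway directory and a made-up executable name (demo-topaz-tool) in place of a real environment, and it mimics the effect of activation by prepending to PATH by hand.

```shell
#!/usr/bin/env bash
# Simulate the PATH change that environment activation performs,
# using a temporary directory and a hypothetical executable name.
envdir=$(mktemp -d)
mkdir -p "$envdir/bin"
printf '#!/usr/bin/env bash\necho "demo tool"\n' > "$envdir/bin/demo-topaz-tool"
chmod a+rx "$envdir/bin/demo-topaz-tool"

# Before: the shell cannot find the command by name.
found_before=$(command -v demo-topaz-tool || true)

# "Activate": put the bin directory at the front of PATH.
old_path=$PATH
PATH="$envdir/bin:$PATH"
hash -r
found_after=$(command -v demo-topaz-tool || true)

# "Deactivate": restore the previous PATH.
PATH=$old_path

echo "before: ${found_before:-<not found>}"
echo "after:  $found_after"

rm -rf "$envdir"
```

This is why the wrapper can simply call topaz by name after the conda activate line, with no explicit path to the executable.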

You can also test that the wrapper script works, again from a new shell session with no conda environment active:

$ /path/to/topaz-wrapper.sh --help
# will take a couple seconds to respond because it activates the conda env first
# and python is also a bit slow to start
# but then you should get the help message from topaz

If calling the wrapper script this way works, it should work fine for cryosparc too, provided that the Linux account running cryosparc has read and execute permissions on this file, and maybe also read permission on all the directories above this file.
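
A rough way to check these conditions in one go is sketched below. The function name check_path is hypothetical; it should be run as the Linux account that runs cryosparc, and it only tests read and execute permission on the file plus traverse (execute) permission on each parent directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the current user can read and execute
# a script and can traverse every directory above it.
check_path() {
    local p=$1
    [ -e "$p" ] || { echo "missing: $p"; return 1; }
    { [ -r "$p" ] && [ -x "$p" ]; } || { echo "not readable+executable: $p"; return 1; }
    local d
    d=$(dirname "$p")
    while :; do
        [ -x "$d" ] || { echo "cannot traverse: $d"; return 1; }
        # Stop at the filesystem root (or "." for a relative path).
        if [ "$d" = "/" ] || [ "$d" = "." ]; then
            break
        fi
        d=$(dirname "$d")
    done
    echo "ok: $p"
}

# Example on a temporary file standing in for the wrapper script.
demo=$(mktemp)
chmod 755 "$demo"
check_path "$demo"
```

Calling check_path /path/to/topaz-wrapper.sh (with the real path substituted) reports the first component that blocks access, which helps distinguish a wrong path from a permission problem.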

On my cryosparc instance (on a shared cluster), the wrapper script I posted above works as it should.

  • Yes, it is a standalone installation.

  • the list command output oddly is:

ls: cannot access ‘/home/xxxxx/topaz-wrapper.sh’: No such file or directory

even though I can clearly see that file listed at that location.

  • the cat command output is displaying the text of the topaz-wrapper.sh file:
#!/usr/bin/env bash

# If this script is called with --version
# return a string compatible with what cryosparc expects
# and exit.
# I am editing the below to 0.2.3 from the original wrapper provided by Guillaume, which said '0.2.5'.
if [[ $@ == '--version' ]]; then
	echo '0.2.3'
	exit 0
fi

# In every other case, load the conda environment containing topaz
# and pass the arguments to topaz.
source /home/xxxxx/anaconda3/etc/profile.d/conda.sh
conda activate topaz2
# Change the environment name above (topazX) as necessary to match the conda environment you're launching. topaz2 contains python 3.6, topaz 0.2.3, and is compiled for CUDA 10.2

topaz $@

conda deactivate

(I added a couple lines of comments to the text to clarify for future users what I was doing)

The fact that you can’t list the contents of /home/xxxxx from the cryosparc account suggests that you probably need to add read permission for other accounts on this directory (from the account that owns it).

Interestingly the “ls” command and “ls -l” work fine but when combined with the absolute path to the wrapper script, then it says ‘cannot access’. Without specifying the file path, ls and ls -l display the list I would expect them to list, which includes the topaz-wrapper.sh file. Also, there is only one account on this workstation. Did I maybe misinterpret the command wtempel asked me to input?

If you run ls -l /home/xxxxx, what do you get? This should list the contents of this directory, and the leftmost part will show you which permissions each file has. If the permissions for the topaz wrapper script look different than rwxr-xr-x (meaning all permissions for the owner, and read+execute permissions for the group and other accounts), then you need to adjust them.

Hi again,

When I looked at it again, I found that the permissions on the wrapper had not been correctly modified. With that corrected, the job now actually runs, but it fails within a few seconds for a different reason. Can you provide any guidance on how to interpret this error message?

[CPU: 229.9 MB]  
Training command complete.

[CPU: 229.9 MB]  Training done in 0.300s.

[CPU: 229.9 MB]  --------------------------------------------------------------
[CPU: 229.9 MB]  Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
  File "/home/xxxxx/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 357, in run_topaz_wrapper_train
    assert len(glob.glob(os.path.join(model_dir, '*'))) > 0, "Training failed, no models were created."
AssertionError: Training failed, no models were created.

Please can you post lines that describe the training process and precede this line in the Event Log.

Yes, thank you for taking a look:


[CPU: 229.9 MB]  raise DistributionNotFound(req, requirers)

[CPU: 229.9 MB]  pkg_resources.DistributionNotFound: The 'future' distribution was not found and is required by torch

[CPU: 229.9 MB]  
Dataset splitting command complete.

[CPU: 229.9 MB]  Train-test splitting done in 0.296s.

[CPU: 229.9 MB]  --------------------------------------------------------------

[CPU: 229.9 MB]  Starting training...

[CPU: 229.9 MB]  Starting training by running command /home/xxxx/topaz-wrapper.sh train --train-images /mnt/disks/data-1/20240222_cryosparc_Screening/P20/J79/image_list_train.txt --train-targets /mnt/disks/data-1/20240222_cryosparc_Screening/P20/J79/topaz_particles_processed_train.txt --test-images /mnt/disks/data-1/20240222_cryosparc_Screening/P20/J79/image_list_test.txt --test-targets /mnt/disks/data-1/20240222_cryosparc_Screening/P20/J79/topaz_particles_processed_test.txt --num-particles 25 --learning-rate 0.0002 --minibatch-size 128 --num-epochs 10 --method GE-binomial --slack -1 --autoencoder 0 --l2 0.0 --minibatch-balance 0.0625 --epoch-size 5000 --model resnet8 --units 32 --dropout 0.0 --bn on --unit-scaling 2 --ngf 32 --num-workers 8 --cross-validation-seed 1412226835 --radius 3 --num-particles 25 --device 0 --no-pretrained --save-prefix=/mnt/disks/data-1/20240222_cryosparc_Screening/P20/J79/models/model -o /mnt/disks/data-1/20240222_cryosparc_Screening/P20/J79/train_test_curve.txt

[CPU: 229.9 MB]  Traceback (most recent call last):

[CPU: 229.9 MB]  File "/home/xxxxx/anaconda3/envs/topaz2/bin/topaz", line 6, in <module>

[CPU: 229.9 MB]  from pkg_resources import load_entry_point

[CPU: 229.9 MB]  File "/home/xxxxx/anaconda3/envs/topaz2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3267, in <module>

[CPU: 229.9 MB]  @_call_aside

[CPU: 229.9 MB]  File "/home/xxxxx/anaconda3/envs/topaz2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3251, in _call_aside

[CPU: 229.9 MB]  f(*args, **kwargs)

[CPU: 229.9 MB]  File "/home/xxxxx/anaconda3/envs/topaz2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3280, in _initialize_master_working_set

[CPU: 229.9 MB]  working_set = WorkingSet._build_master()

[CPU: 229.9 MB]  File "/home/xxxxx/anaconda3/envs/topaz2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 582, in _build_master

[CPU: 229.9 MB]  ws.require(__requires__)

[CPU: 229.9 MB]  File "/home/xxxxx/anaconda3/envs/topaz2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 899, in require

[CPU: 229.9 MB]  needed = self.resolve(parse_requirements(requirements))

[CPU: 229.9 MB]  File "/home/xxxxx/anaconda3/envs/topaz2/lib/python3.6/site-packages/pkg_resources/__init__.py", line 785, in resolve

[CPU: 229.9 MB]  raise DistributionNotFound(req, requirers)

[CPU: 229.9 MB]  pkg_resources.DistributionNotFound: The 'future' distribution was not found and is required by torch
[CPU: 229.9 MB]  
Training command complete.

I suspect this topaz installation is broken. For suggestions, please see this earlier post.