Topaz GPU issue - PyTorch issue?

mjmcleod64 · September 20, 2023, 3:23pm

Hi all,

I am trying to install Topaz on Ubuntu.
The CUDA version is 11.7.

First attempt:
conda create -n topaz python=3.6
conda activate topaz
conda install topaz=0.2.4 cudatoolkit=11.2 -c tbepler -c pytorch -c conda-forge

which topaz
/home/mjmcleod/eman2-sphire-sparx/envs/topaz/bin/topaz

I put this path into cryoSPARC and ran the job.
It ran, but did not use the GPU.

Attempt 2: I realized I had the wrong Python version as well as the wrong CUDA version.
conda remove -n topaz --all
python3 --version
Python 3.9.16
conda create -n topaz python=3.9.16
conda activate topaz
conda install topaz=0.2.4 cudatoolkit=11.7 -c tbepler -c pytorch -c conda-forge

In the log file, preprocessing now says:
Preprocessing over 8 processes…
/home/mjmcleod/eman2-sphire-sparx/envs/topaz/bin/topaz:6: DeprecationWarning: pkg_resources is deprecated as an API. See Package Discovery and Resource Access using pkg_resources - setuptools 68.2.2.post20230912 documentation
from pkg_resources import load_entry_point

With error:
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "/home/mjmcleod/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 307, in run_topaz_wrapper_train
    utils.run_process(split_command)
  File "/home/mjmcleod/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 98, in run_process
    assert process.returncode == 0, f"Subprocess exited with status {process.returncode} ({str_command})"
AssertionError: Subprocess exited with status 1 (/home/mjmcleod/eman2-sphire-sparx/envs/topaz/bin/topaz train_test_split --number 60 --seed 1593130959 --image-dir /media/mjmcleod/870qvo_2/eh12_10pep/CS-eh12-10mmpep/J190/preprocessed /media/mjmcleod/870qvo_2/eh12_10pep/CS-eh12-10mmpep/J190/topaz_particles…)

Looking at what is in the topaz conda environment:
conda list
This is the PyTorch package that is installed, which looks like the CPU-only build:
pytorch conda-forge/linux-64::pytorch-2.0.0-cpu_generic_py39h08b6d46_2
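
One rough way to tell a CPU-only build from a CUDA build is to look at the build string in the `conda list` output. The sketch below is only a heuristic: it assumes CPU-only builds carry "cpu" in the build string and GPU builds carry "cuda", which matches the line above but is not guaranteed for every channel. The function name `classify_pytorch_build` is made up for this example.

```shell
#!/usr/bin/env bash
# Heuristic: classify a conda build string as "cpu", "cuda", or "unknown".
# Assumption: CPU-only builds contain "cpu" and GPU builds contain "cuda".
classify_pytorch_build() {
  case "$1" in
    *cpu*)  echo "cpu" ;;
    *cuda*) echo "cuda" ;;
    *)      echo "unknown" ;;
  esac
}

# Build string taken from the conda list output above.
classify_pytorch_build "cpu_generic_py39h08b6d46_2"
```

A more direct check, run inside the activated topaz environment, is `python -c "import torch; print(torch.cuda.is_available())"`, which prints `False` when PyTorch cannot use a GPU.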

It looks like the first issue was that the PyTorch version wasn't compatible with CUDA. Now, after reinstalling with the proper CUDA version, there is another issue, and I am unsure how to proceed. Any insight would be appreciated.
Matt

wtempel · September 20, 2023, 9:20pm

You may want to instead create the environment exactly as described in the tbepler/topaz GitHub repository:

conda create -n topaz python=3.6

then follow the installation instructions in the tbepler/topaz GitHub repository
(after activating the newly created environment)

conda install topaz -c tbepler -c pytorch

I am not sure whether a GPU and NVIDIA driver need to be installed on the computer where you run the conda install command.
The commands were copied from the GitHub repository on 2023-09-20, but may be outdated by the time you read this.
After the Topaz installation, specify a wrapper script as the Path to Topaz executable and retry the Topaz job.

mjmcleod64 · September 21, 2023, 9:13pm

Hi,

I reinstalled but used:
conda install cudatoolkit=11.7 topaz -c tbepler -c pytorch

since without cudatoolkit the environment would not solve even after 24 hours (I assume this process should take less time). With CUDA specified it took about 20 minutes.

I made the topaz.sh file, saved it in my home directory, and made it executable:
chmod +x topaz.sh

The topaz.sh file contains:
#!/usr/bin/env bash
if command -v conda > /dev/null 2>&1; then
    conda deactivate > /dev/null 2>&1 || true # ignore any errors
    conda deactivate > /dev/null 2>&1 || true # ignore any errors
fi
unset _CE_CONDA
unset CONDA_DEFAULT_ENV
unset CONDA_EXE
unset CONDA_PREFIX
unset CONDA_PROMPT_MODIFIER
unset CONDA_PYTHON_EXE
unset CONDA_SHLVL
unset PYTHONPATH
unset LD_PRELOAD
unset LD_LIBRARY_PATH

source /home/mjmcleod/miniconda3/etc/profile.d/conda.sh
conda activate topaz
exec topaz $@

I tried to rerun Topaz through cryoSPARC and received this error:
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "/home/mjmcleod/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 111, in run_topaz_wrapper_train
    topaz_version = utils.get_topaz_version(topaz_exec_path)
  File "/home/mjmcleod/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 135, in get_topaz_version
    assert semver.VersionInfo.isvalid(topaz_version_for_validation),
AssertionError: Cannot determine topaz version, command "/home/mjmcleod/topaz.sh --version" did not produce valid output: "/home/mjmcleod/topaz.sh: line 19: exec: topaz: not found"

In terminal:
conda activate topaz
topaz --version
TOPAZ 0.2.5a

Do you have any suggestions? I'm not quite sure why it's not finding topaz.

wtempel · September 22, 2023, 12:01am

I am not sure at all. As a sanity check, could you please run this command sequence and post the output:

conda activate topaz
topaz --version
conda info --base

mjmcleod64 · September 22, 2023, 12:30am

TOPAZ 0.2.5a
/home/mjmcleod/eman2-sphire-sparx

Is this telling me that conda is actually in eman2-sphire-sparx, and not in /home/mjmcleod/miniconda3?

mjmcleod64 · September 22, 2023, 12:33am

OK, I switched the path in topaz.sh to eman2-sphire-sparx/etc/conda/profile.d/conda.sh and it is running. I can post an update on whether it is using the GPU once it finishes preprocessing.
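
For anyone hitting the same "exec: topaz: not found" error: the wrapper has to source the conda.sh that belongs to the conda installation holding the topaz environment, which is the directory printed by `conda info --base`. Below is a minimal sketch of a helper that builds and checks that path. It assumes the common layout `<base>/etc/profile.d/conda.sh`; the path reported above differs slightly, so check what actually exists on your system. The name `conda_sh_for_base` is made up for this example.

```shell
#!/usr/bin/env bash
# Print the conda.sh path for a given conda base directory, or fail if missing.
# Assumption: the installation uses the layout <base>/etc/profile.d/conda.sh.
conda_sh_for_base() {
  local base="$1"
  local candidate="$base/etc/profile.d/conda.sh"
  if [ -f "$candidate" ]; then
    echo "$candidate"
  else
    echo "conda.sh not found under $base" >&2
    return 1
  fi
}

# Possible use in topaz.sh, with the base directory reported by `conda info --base`:
#   source "$(conda_sh_for_base /home/mjmcleod/eman2-sphire-sparx)"
#   conda activate topaz
```

Failing early when conda.sh is missing makes the cause visible in the job log, instead of the later and less obvious "topaz: not found".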

mjmcleod64 · September 22, 2023, 10:58am

Looks to be working. Thanks for the help! Now onto optimizing the trained model…