Ab-initio stop without error message

Hi there,

I had some issues during the Ab-initio reconstruction that suddenly failed but I have no error message. I was digging into the discussion, and I found some other users that have faced a similar issue using RTX3090. Any idea what it could be and how to fix it?

Thanks,

cryosparcm joblog P5 J20
========= sending heartbeat
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
========= sending heartbeat
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
**** handle exception rc
set status to failed
========= main process now complete.
========= monitor process now complete.
Waiting for data… (interrupt to abort)

Hi @vitorserrao,

What version of CUDA did you install cryoSPARC with? Can you also paste the output of the file cryosparc_worker/config.sh and the command nvidia-smi?

Hi @stephan,

I used cuda-11.1 and cryosparc v3.2.0. Below are the outputs:

nvidia-smi:
±----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01 Driver Version: 460.73.01 CUDA Version: 11.2 |
|-------------------------------±---------------------±---------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 3090 Off | 00000000:1A:00.0 Off | N/A |
| 0% 42C P8 23W / 350W | 1MiB / 24268MiB | 0% Default |
| | | N/A |
±------------------------------±---------------------±---------------------+
| 1 GeForce RTX 3090 Off | 00000000:68:00.0 Off | N/A |
| 0% 43C P8 21W / 350W | 23MiB / 24267MiB | 0% Default |
| | | N/A |
±------------------------------±---------------------±---------------------+

config.sh:

export CRYOSPARC_LICENSE_ID=“OUR_CORRECT_LICENSE_NUMBER”
export CRYOSPARC_USE_GPU=true
export CRYOSPARC_CUDA_PATH="/usr/local/cuda"
export CRYOSPARC_DEVELOP=false

Thanks for your response. CUDA looks good.
Can you try this job on a different GPU or different dataset? Does the ab-initio job die every time?

It worked well on the T20 tutorial that I used to check the installation. Either cryosparc or cryosparc-live worked nicely and I haven’t had any issues. It’s only dying with this particular dataset (until now). I also tried different initial models (1; 2; 4 and 6) and it’s always dying without an error message.

Hi @vitorserrao,

Thanks for your response. Can you provide screenshots or copy+paste the outputs of the “Overview” tab of the dead Ab-Initio job, that includes where it dies?

Hi @stephan,

Sure, this is the final part of it. Looking at cryosparcm joblog there was no apparent error message, and I just found it after a very long search on the overview (I should have found this earlier, my apologies).
It did run for a while, and then I get this message, however, there is still free space on the cards and disk.

Hi @vitorserrao,

Can you add export CRYOSPARC_NO_PAGELOCK=true to cryosparc_worker/config.sh, then re-run the job?

Hi @stephan,

I already figured it out. We had some memory allocation problems that were already solved.
Thanks anyway!

Vitor

Hi @vitorserrao

How did you solve memory allocation problems??

Hi @nikydna,

It has been awhile, but I think I downsampled the particles and restarted the job.