Hi there,
I updated our cluster from v4.7 to v5.0 yesterday and found that we could no longer queue jobs to our Slurm scheduler. Jobs would reach the "Launched" status and then hang.
The output from job.log is below:
ERROR: ld.so: object '/home/exx/software/cryosparc/cryosparc_master/.pixi/envs/master/lib/libpython3.12.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/home/exx/software/cryosparc/cryosparc_master/.pixi/envs/master/lib/libpython3.12.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
ERROR: ld.so: object '/home/exx/software/cryosparc/cryosparc_master/.pixi/envs/master/lib/libpython3.12.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
/home/exx/software/cryosparc/cryosparc_worker/bin/cryosparcw: line 44: /home/exx/software/cryosparc/cryosparc_master/config.sh: No such file or directory
/home/exx/software/cryosparc/cryosparc_worker/bin/cryosparcw: line 44: install_error: command not found
I was surprised to see the .pixi directory in the errors, since I couldn't find any mention of a switch from conda to pixi in the release notes. Regardless, after some poking around online (and consulting the LLM oracles), it seems the issue comes down to differences in how environment variables are handled between the old system and the new one.
By adding the lines:
unset CRYOSPARC_CONFIG_DIR
unset LD_PRELOAD
unset PYTHONPATH
to our cluster_script.sh, we were able to submit jobs via Slurm as before.
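In case it helps anyone else, here is a sketch of where those lines went in our submission script. The #SBATCH directives are placeholders (keep whatever your site already uses), and the {{ ... }} tokens are CryoSPARC's cluster-script template variables; only the three unset lines are the actual workaround:

```shell
#!/usr/bin/env bash
# Sketch of a Slurm cluster_script.sh with the workaround applied.
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --output={{ job_log_path_abs }}
#SBATCH --error={{ job_log_path_abs }}

# Clear variables inherited from the master's environment so the
# worker builds its own environment instead of preloading libraries
# (and a config dir) that only exist on the master node.
unset CRYOSPARC_CONFIG_DIR
unset LD_PRELOAD
unset PYTHONPATH

{{ run_cmd }}
```

The unsets have to come before {{ run_cmd }}, since that's the line that launches the worker process which was inheriting the bad values.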