Launch stall error messages

Hi devs!

For some reason right now, many - but not all - of the jobs I try to run do not get past the launch stage. I’ve been able to identify a couple of possible reasons why this could happen, but it’s sometimes tough to tell them apart before the job fails. I’m one of many users on my cluster, and I presume that jobs stall in the launch phase when too many jobs are running. However, sometimes my launches stall because I have tried to use the wrong inputs, or because my inputs depend on other jobs that have been deleted. I’m starting to get the hang of recognizing which is which, but in some cases it remains ambiguous.

Error messages are notoriously difficult to create, but I was wondering whether it would be possible to distinguish between launch stall problems quickly, so that if it’s a problem at my end, I’m not left wasting time waiting.

Also, are there other reasons I haven’t considered why my jobs might be stalling in the launch step?

Thank you so much,
Kate

Hi Kate, it’s unusual for jobs to stall in “Launched” status because of invalid inputs or parameters. In those cases the job should generally run and then go into “Failed” status with a clearer error. A job getting stuck in “Launched” status implies a configuration issue related to cryoSPARC’s scheduler.

To help narrow down why your jobs are stalling, there are two additional places to check: the cryoSPARC scheduler logs and the internal job log.

Shortly after a job stalls, run one of these two commands from the command line (replace X and Y with the Project number and Job number of the stuck job, respectively).

cryosparcm log command_core
cryosparcm joblog PX JY

The first command shows the scheduler logs. The second shows the internal job log.
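
If the output is long, a small script can pull out the exception lines so they are easier to spot. The following is a minimal sketch, assuming you have saved the output of one of the commands above to a text file. The function name summarize_exceptions is made up for illustration and is not part of cryoSPARC.

```python
import re

# Matches the last line of a Python traceback, for example:
#   ModuleNotFoundError: No module named 'requests'
EXCEPTION_LINE = re.compile(
    r"^(?P<kind>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<message>.*)$"
)


def summarize_exceptions(log_text):
    """Return the distinct (kind, message) pairs found in log_text, in order.

    Hypothetical helper: it only recognizes lines that look like the final
    line of a Python traceback and ignores everything else in the log.
    """
    seen = []
    for line in log_text.splitlines():
        match = EXCEPTION_LINE.match(line.strip())
        if match:
            pair = (match.group("kind"), match.group("message"))
            if pair not in seen:
                seen.append(pair)
    return seen
```

For example, with the log saved under the hypothetical name joblog.txt, calling summarize_exceptions(open("joblog.txt").read()) returns a short list of the exceptions in the file. A repeated traceback is reported only once.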

Feel free to post any errors or unusual-looking output you see from these commands for interpretation.

Hope that helps,

Nick

Hi Nick,
I just updated to v3.1 and I have the same problem: the job is launched and then stalls there for a while.
I ran the command “cryosparcm joblog PX JY”, and here are the messages I got.
Am I running into a Python issue?

================= CRYOSPARCW =======  2021-02-03 17:41:06.478172  =========
Project P5 Job J261
Master luks-micr-141572 Port 39002
===========================================================================
========= monitor process now starting main process
Process Process-1:
Traceback (most recent call last):
  File "/opt/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "cryosparc_worker/cryosparc_compute/run.py", line 30, in cryosparc_compute.run.main
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/__init__.py", line 8, in <module>
    from . import jobregister
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/jobregister.py", line 33, in <module>
    from . import common
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/common.py", line 358, in <module>
    from ..util import paramdict
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/util/__init__.py", line 103, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
MAINPROCESS PID 35007
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "cryosparc_worker/cryosparc_compute/run.py", line 159, in cryosparc_compute.run.run
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/__init__.py", line 8, in <module>
    from . import jobregister
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/jobregister.py", line 33, in <module>
    from . import common
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/common.py", line 358, in <module>
    from ..util import paramdict
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/util/__init__.py", line 103, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

@xliu it looks like the dependency installation didn’t finish all the way through. Please try the following:

  1. Via command line, log into a machine with GPUs onto which you installed the cryosparc_worker or cryosparc2_worker bundle
  2. Use the cd command to navigate to that installation directory
  3. Run the following command to reinstall dependencies:
    ./bin/cryosparcw forcedeps
    

Once the installation finishes, run that job again. Let me know if you run into any trouble with any of this.
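
If you want to confirm the fix before relaunching, you can check whether the module named in the traceback can be found. This is a minimal sketch, assuming you run it with the Python interpreter from the worker environment (the cryosparc_worker_env path shown in the traceback); how you start that interpreter depends on your installation. The function name missing_modules is made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that this interpreter cannot locate.

    Hypothetical helper: pass top-level module names only (for example
    "requests", not a dotted name like "requests.adapters").
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules(["requests"]) should return an empty list once the dependencies are installed. If "requests" still appears in the result, the reinstall did not complete for that environment.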

Hi!

I thought the issue had resolved itself, but it’s happening again, sadly! Reading back through this thread, it seems relevant to mention that I’m using the browser version of cryoSPARC through a university’s server, and I don’t have access to the computers on which cryoSPARC is installed. I’ve emailed the person who does, but I’m wondering: is there anything I could be doing wrong that causes this? I’m using the same parameters that I’ve used for other projects and which have worked in the past.

Thank you so much!
Best,
Kate