For some reason right now, many - but not all - of the jobs I try to run do not get past the launch stage. I’ve been able to identify a couple of possible reasons why this could happen, but it’s sometimes tough to tell them apart before the job fails. I’m one of many users on my cluster, and I presume that if there are just too many jobs running, jobs stall in the launch phase. However, sometimes my launches stall because I have tried to use the wrong inputs, or because my inputs depend on other jobs that have been deleted. I’m starting to get the hang of recognizing which is which, but in some cases it remains ambiguous.
Error messages are notoriously difficult to create, but I was wondering whether it would be possible to distinguish between launch stall problems quickly, so that if it’s a problem at my end, I’m not left wasting time waiting.
Also, are there other reasons I haven’t considered why my jobs might be stalling in the launch step?
Hi Kate, it’s unusual for jobs to stall in “Launched” status because of invalid inputs or parameters. In those cases the job should generally run and then go into “Failed” status with a clearer error. A job getting stuck in “Launched” status implies a configuration issue related to cryoSPARC’s scheduler.
To help further recognize why your jobs are stalling, you have two additional avenues to check: The cryoSPARC scheduler logs and the internal job log.
Shortly after a job stalls, run one of these two commands via command line (substitute X and Y with the Project number and Job number of the stuck job, respectively).
Hi Nick,
I just updated to v3.1 and got the same problem: the job was launched and has been stalled there for a while.
I ran the command “cryosparcm joblog PX JY”; here are the messages I got.
Am I running into a Python issue?
================= CRYOSPARCW ======= 2021-02-03 17:41:06.478172 =========
Project P5 Job J261
Master luks-micr-141572 Port 39002
===========================================================================
========= monitor process now starting main process
Process Process-1:
Traceback (most recent call last):
  File "/opt/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/opt/cryosparc/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "cryosparc_worker/cryosparc_compute/run.py", line 30, in cryosparc_compute.run.main
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/__init__.py", line 8, in <module>
    from . import jobregister
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/jobregister.py", line 33, in <module>
    from . import common
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/common.py", line 358, in <module>
    from ..util import paramdict
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/util/__init__.py", line 103, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
MAINPROCESS PID 35007
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "cryosparc_worker/cryosparc_compute/run.py", line 159, in cryosparc_compute.run.run
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/__init__.py", line 8, in <module>
    from . import jobregister
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/jobregister.py", line 33, in <module>
    from . import common
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/jobs/common.py", line 358, in <module>
    from ..util import paramdict
  File "/opt/cryosparc/cryosparc2_worker/cryosparc_compute/util/__init__.py", line 103, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
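Both tracebacks above end in the same line, ModuleNotFoundError: No module named 'requests'. That suggests a package is missing from the worker's Python environment, rather than a problem with the job's inputs. As a rough sketch, assuming the joblog output has been saved to a text file, a few lines of Python can pull the missing module names out of a long log. The function name find_missing_modules is hypothetical and is not part of cryoSPARC.

```python
import re

# Matches the final line Python prints when an import fails, e.g.
#   ModuleNotFoundError: No module named 'requests'
MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")


def find_missing_modules(log_text):
    """Return the distinct module names reported missing, in first-seen order.

    `log_text` is assumed to be the saved text of a job log such as the one
    pasted above; this helper is illustrative only and not a cryoSPARC API.
    """
    seen = []
    for name in MISSING_MODULE.findall(log_text):
        if name not in seen:  # the same error is often logged more than once
            seen.append(name)
    return seen
```

Calling find_missing_modules on the text of the log above would report the requests module once, even though the error appears in both tracebacks. An empty result means the stall probably has a different cause.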
I thought the issue had resolved itself, but it’s happening again, sadly! Reading back through this thread, it seems relevant to mention that I’m using the browser version of cryoSPARC through a university’s server, and I don’t have access to the computers on which cryoSPARC is installed. I’ve emailed the person who does, but I’m wondering: is there anything I could be doing wrong that causes this? I’m using the same parameters that I’ve used for other projects and which have worked in the past.