All cryosparc jobs fail

Dear all, I keep encountering a problem where all my cryosparc jobs fail with the same error message (see below). This was working completely normally until today when we restarted the NAS.

I am running Cryosparc version 4.0.1. We have one computer running the cryosparc master, and 6 computers running jobs.
The master computer is running Ubuntu 22.04.1 LTS and has the following config:
Processor - Intel i7-6700 CPU @ 3.4 GHz x 8
Memory - 32.0 GiB
Graphics - Mesa Intel HD Graphics 530 (SKL GT2)
Disk capacity - 512 GB

Data is stored on a 120 TB NAS

Config for lane computers.


We have tried restarting all components multiple times but this has not fixed the problem. We also installed the new patch for cryosparc 4.

Error Message:

Traceback (most recent call last):
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 677, in urlopen
chunked=chunked,
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 426, in _make_request
six.raise_from(e, None)
File “<string>”, line 3, in raise_from
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 421, in _make_request
httplib_response = conn.getresponse()
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/http/client.py”, line 1373, in getresponse
response.begin()
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/http/client.py”, line 319, in begin
version, status, reason = self._read_status()
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/http/client.py”, line 288, in _read_status
raise RemoteDisconnected(“Remote end closed connection without”
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/adapters.py”, line 449, in send
timeout=timeout
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/util/retry.py”, line 410, in increment
raise six.reraise(type(error), error, _stacktrace)
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/packages/six.py”, line 734, in reraise
raise value.with_traceback(tb)
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 677, in urlopen
chunked=chunked,
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 426, in _make_request
six.raise_from(e, None)
File “<string>”, line 3, in raise_from
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py”, line 421, in _make_request
httplib_response = conn.getresponse()
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/http/client.py”, line 1373, in getresponse
response.begin()
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/http/client.py”, line 319, in begin
version, status, reason = self._read_status()
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/http/client.py”, line 288, in _read_status
raise RemoteDisconnected(“Remote end closed connection without”
urllib3.exceptions.ProtocolError: (‘Connection aborted.’, RemoteDisconnected(‘Remote end closed connection without response’))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “cryosparc_worker/cryosparc_compute/run.py”, line 93, in cryosparc_compute.run.main
File “cryosparc_worker/cryosparc_compute/jobs/refine/newrun.py”, line 125, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
File “/home/lnd/Cryosparc/cryosparc_worker/cryosparc_compute/particles.py”, line 88, in read_blobs
u_blob_paths = cache.download_and_return_cache_paths(u_rel_paths)
File “/home/lnd/Cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache.py”, line 111, in download_and_return_cache_paths
rc.cli.cache_sync_in_use(worker_hostname, rc._project_uid, rc._job_uid) # ignore self
File “/home/lnd/Cryosparc/cryosparc_worker/cryosparc_compute/client.py”, line 54, in func
r = requests.post(self.url, data = json.dumps(data, cls=NumpyEncoder), headers=headers, timeout=self.timeout)
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/api.py”, line 119, in post
return request(‘post’, url, data=data, json=json, **kwargs)
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/api.py”, line 61, in request
return session.request(method=method, url=url, **kwargs)
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/sessions.py”, line 530, in request
resp = self.send(prep, **send_kwargs)
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/sessions.py”, line 643, in send
r = adapter.send(request, **kwargs)
File “/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/adapters.py”, line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: (‘Connection aborted.’, RemoteDisconnected(‘Remote end closed connection without response’))

Any advice would be much appreciated,
Sarah

Please can you confirm that the patch was successfully installed on the master and all the workers? For the worker(s), you may run:
cat /home/lnd/Cryosparc/cryosparc_worker/patch
Did this error message occur after application of the patch?
Please can you post a few lines that immediately precede the error message, to clarify at what point the job is failing.
If available, please also post the error message (including some lines preceding it) for a different job type.
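
If there are several workers to check, a loop over SSH may save some typing. The following is only a rough sketch, not a CryoSPARC tool: it assumes passwordless SSH from the machine you run it on to each worker, and that every worker uses the same install path as in the `cat` command above. The function names (`read_patch`, `group_by_patch`, `report`) are made up for this example.

```python
import subprocess
from collections import defaultdict

# Path taken from the cat command above; assumed identical on every worker.
PATCH_FILE = "/home/lnd/Cryosparc/cryosparc_worker/patch"


def read_patch(host, timeout=15):
    """Return the contents of the patch file on `host` via ssh, or None on failure."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", host, "cat", PATCH_FILE],
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def group_by_patch(patches):
    """Group hostnames by the patch string they report (None = could not be read)."""
    groups = defaultdict(list)
    for host, patch in patches.items():
        groups[patch].append(host)
    return dict(groups)


def report(hosts):
    """Print one line per distinct patch string, listing the workers that report it."""
    patches = {host: read_patch(host) for host in hosts}
    for patch, members in group_by_patch(patches).items():
        label = patch if patch is not None else "unreadable"
        print(f"{label}: {', '.join(members)}")
```

Calling `report(["cryo-w3.local"])` with the other worker hostnames added to the list should print a single line if all workers carry the same patch. More than one line, or an "unreadable" entry, would point at the worker(s) to look at first.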

The error occurred before the patch. I believe the patch was successfully installed.

Launching job on lane Cryo-W3 target cryo-w3.local ...
Running job on remote worker node hostname cryo-w3.local
SSD cache : cache successfully synced in_use 
SSD cache : cache successfully synced, found 846255.60MB of files on SSD. 
SSD cache : requested files are locked for past 257s, checking again in 5s
Traceback (most recent call last):
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
httplib_response = conn.getresponse()
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/http/client.py", line 1373, in getresponse
response.begin()
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/http/client.py", line 319, in begin
version, status, reason = self._read_status()
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/http/client.py", line 280, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/util/retry.py", line 410, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/packages/six.py", line 735, in reraise
raise value
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 428, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
self, url, "Read timed out. (read timeout=%s)" % timeout_value
urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='cryo-master.local', port=39002): Read timed out. (read timeout=300)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "cryosparc_worker/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
File "cryosparc_worker/cryosparc_compute/jobs/refine/newrun.py", line 125, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
File "/home/lnd/Cryosparc/cryosparc_worker/cryosparc_compute/particles.py", line 88, in read_blobs
u_blob_paths = cache.download_and_return_cache_paths(u_rel_paths)
File "/home/lnd/Cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache.py", line 119, in download_and_return_cache_paths
compressed_keys = rc.cli.cache_request_check(worker_hostname, rc._project_uid, rc._job_uid, com.compress_paths(rel_paths))
File "/home/lnd/Cryosparc/cryosparc_worker/cryosparc_compute/client.py", line 54, in func
r = requests.post(self.url, data = json.dumps(data, cls=NumpyEncoder), headers=headers, timeout=self.timeout)
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/api.py", line 119, in post
return request('post', url, data=data, json=json, **kwargs)
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/home/lnd/Cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPConnectionPool(host='cryo-master.local', port=39002): Read timed out. (read timeout=300)

Resetting the particle caches may help.
If the reset does not resolve the problems, please can you run:

  1. cryosparcm restart
    (This might cause running CryoSPARC jobs to fail, but presumably all jobs have already failed.)
  2. cryosparcm test i
    (guide)
  3. cryosparcm test workers <project_uid> --target cryo-w3.local
    (guide, please substitute valid <project_uid>, like P3, under which the worker test should run.)

and post the output?
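
Since the traceback shows the worker timing out while talking to `cryo-master.local` on port 39002, it may also be worth checking from a worker that this host and port accept connections at all. Below is a minimal sketch using only the Python standard library. The host and port are copied from the error message, and `check_tcp` is just a name chosen for this example, not part of CryoSPARC.

```python
import socket
import time


def check_tcp(host, port, timeout=5.0):
    """Try to open a TCP connection to host:port; return (ok, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return True, f"connected in {elapsed:.3f}s"
    except OSError as exc:
        # Covers name-resolution failures, refused connections and timeouts.
        return False, f"{type(exc).__name__}: {exc}"
```

Run on the worker, `print(check_tcp("cryo-master.local", 39002))` reports whether a connection could be opened. If the connection succeeds but jobs still hit the 300 s read timeout, that would suggest the master is reachable but slow to answer the cache request, rather than a network or name-resolution problem. This check cannot tell you why the master is slow.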

We had the same problem upgrading from 3.3 to 4.0.2. Resetting the particle cache as explained in the link by wtempel fixed the problem.
