Dear all,
We have been experiencing repeated crashes in different jobs while loading the SSD cache from a NAS over the network. The job usually fails after about 5 minutes with "socket.timeout: timed out" (the traceback passes through http_open: return self.do_open(http.client.HTTPConnection, req)).
I found that if I delete the scratch directory and reboot, only one or two jobs run successfully before the same error appears again, so this is not a practical workaround. I would appreciate any suggestions.
Please find the log file below:
[2023-05-08 10:00:42.47] License is valid.
[2023-05-08 10:00:42.47] Launching job on lane RTX-A5500 target michaelscott ...
[2023-05-08 10:00:42.57] Running job on master node hostname michaelscott
[2023-05-08 10:00:46.26] [CPU: 195.2 MB Avail:1019.75 GB] Job J1087 Started
[2023-05-08 10:00:46.29] [CPU: 195.4 MB Avail:1019.75 GB] Master running v4.2.1, worker running v4.2.1
[2023-05-08 10:00:46.30] [CPU: 195.4 MB Avail:1019.75 GB] Working in directory: /home/cryosparc/working_directory/CS-gamma-turc/J1087
[2023-05-08 10:00:46.31] [CPU: 195.4 MB Avail:1019.75 GB] Running on lane RTX-A5500
[2023-05-08 10:00:46.31] [CPU: 195.4 MB Avail:1019.75 GB] Resources allocated:
[2023-05-08 10:00:46.31] [CPU: 195.4 MB Avail:1019.75 GB] Worker: michaelscott
[2023-05-08 10:00:46.32] [CPU: 195.4 MB Avail:1019.75 GB] CPU : [0, 1, 2, 3]
[2023-05-08 10:00:46.32] [CPU: 195.4 MB Avail:1019.75 GB] GPU : [0]
[2023-05-08 10:00:46.32] [CPU: 195.4 MB Avail:1019.75 GB] RAM : [0, 1, 2]
[2023-05-08 10:00:46.32] [CPU: 195.4 MB Avail:1019.75 GB] SSD : True
[2023-05-08 10:00:46.33] [CPU: 195.4 MB Avail:1019.75 GB] --------------------------------------------------------------
[2023-05-08 10:00:46.33] [CPU: 195.4 MB Avail:1019.75 GB] Importing job module for job type new_local_refine...
[2023-05-08 10:00:48.42] [CPU: 264.2 MB Avail:1019.69 GB] Job ready to run
[2023-05-08 10:00:48.42] [CPU: 264.2 MB Avail:1019.69 GB] ***************************************************************
[2023-05-08 10:00:52.37] [CPU: 736.1 MB Avail:1019.21 GB] Using random seed of 1093838622
[2023-05-08 10:00:52.38] [CPU: 740.7 MB Avail:1019.20 GB] Loading a ParticleStack with 45426 items...
[2023-05-08 10:00:55.13] [CPU: 741.0 MB Avail:1019.15 GB] SSD cache : cache successfully synced in_use
[2023-05-08 10:00:56.14] [CPU: 741.0 MB Avail:1019.14 GB] SSD cache : cache successfully synced, found 0.00MB of files on SSD.
[2023-05-08 10:05:56.41] [CPU: 290.5 MB Avail:1019.65 GB] Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/local_refine/newrun.py", line 123, in cryosparc_compute.jobs.local_refine.newrun.run_local_refine
  File "/home/cryosparc/cryosparc_worker/cryosparc_compute/particles.py", line 114, in read_blobs
    u_blob_paths = cache.download_and_return_cache_paths(u_rel_paths)
  File "/home/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache.py", line 112, in download_and_return_cache_paths
    compressed_keys = get_compressed_keys(worker_hostname, rel_paths)
  File "/home/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache.py", line 285, in get_compressed_keys
    compressed_keys = rc.cli.cache_request_check(worker_hostname, rc._project_uid, rc._job_uid, com.compress_paths(rel_paths))
  File "/home/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 104, in func
    with make_json_request(self, "/api", data=data) as request:
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 165, in make_request
    with urlopen(request, timeout=client._timeout) as response:
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/urllib/request.py", line 1383, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/urllib/request.py", line 1358, in do_open
    r = h.getresponse()
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/http/client.py", line 277, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out
Job summary: run time 05m 07s; lane RTX-A5500; GPUs: 1 (NVIDIA RTX A5500); 3 output groups
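Reading the traceback, the timeout appears to be raised while the worker waits for the status line of an HTTP reply to a JSON request (`cache_request_check`, sent through `urlopen(request, timeout=client._timeout)`), rather than during a file read from the NAS. The log also shows almost exactly 300 s between the last SSD cache message (10:00:56.14) and the traceback (10:05:56.41), which suggests, as an assumption not confirmed against the CryoSPARC source, a client-side timeout of about 5 minutes. The minimal sketch below uses only the Python standard library to compute that gap and to reproduce the same `socket.timeout: timed out` against a local server that accepts TCP connections but never answers. `log_gap_seconds`, `silent_server` and `request_until_timeout` are hypothetical helper names for illustration, not CryoSPARC APIs.

```python
import socket
import time
import urllib.error
import urllib.request
from datetime import datetime

# Bracketed timestamp format used in the event log above.
LOG_TS_FORMAT = "[%Y-%m-%d %H:%M:%S.%f]"


def log_gap_seconds(start, end):
    """Seconds elapsed between two bracketed event-log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


def silent_server():
    """Hypothetical stand-in for an unresponsive server: the kernel completes
    the TCP handshake, but we never accept() or send an HTTP response."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    return srv, srv.getsockname()[1]


def request_until_timeout(url, timeout):
    """Send a GET with a client-side timeout; return (outcome, elapsed_seconds)."""
    # Ignore any http_proxy environment settings for this local experiment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    try:
        with opener.open(url, timeout=timeout) as resp:
            outcome = "HTTP %d" % resp.status
    except socket.timeout as exc:
        # Raised while http.client reads the status line, as in the traceback.
        outcome = "socket.timeout: %s" % exc
    except urllib.error.URLError as exc:
        # Connection-level failures (e.g. refused) are wrapped by urllib.
        outcome = "URLError: %s" % exc.reason
    return outcome, time.monotonic() - start


if __name__ == "__main__":
    print(log_gap_seconds("[2023-05-08 10:00:56.14]", "[2023-05-08 10:05:56.41]"))  # 300.27
    srv, port = silent_server()
    try:
        print(request_until_timeout("http://127.0.0.1:%d/api" % port, timeout=2.0))
    finally:
        srv.close()
```

Because the connection itself succeeds, urllib does not wrap the error in `URLError`; the raw `socket.timeout` escapes from `getresponse()`, which matches the last frames of the traceback above.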