Not all RAM available after update to CryoSPARC 4

We recently upgraded to the latest CryoSPARC, and I have noticed that the Resource Manager only reports 64GB of RAM even though the node has 128GB. I suspect that's why starting more than one job is often not possible: the additional jobs remain queued and report "RAM not available".
Is there a way to make CryoSPARC aware of the additional RAM, or any idea why it may not be aware of it? I should add that we upgraded the system memory from 64GB to 128GB recently (before the CryoSPARC upgrade), and I wonder if the value is hardcoded somewhere so that CryoSPARC is not aware of the upgrade. It's likely that the previous version did not use the full amount of RAM either, but I don't know that for sure.

Thanks,
Albert

CryoSPARC records the amount of RAM when the worker is connected. Please run (guide)

/path/to/cryosparc_worker/bin/cryosparcw connect ... --update 

filling in the ... with parameters/values that you used when you most recently ran the command.
If you do not remember those values, you can recall the current settings with the command
cryosparcm cli "get_scheduler_targets()" (guide)

Thank you! I was not aware that restarting CryoSPARC is not enough to pick up a new system configuration (like added RAM).
I simply had to run:
cryosparcw connect --master NODE_NAME_RUNNING_MASTER --worker NODE_NAME_RUNNING_WORKER --update
and now it sees the new system memory
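Re-running cryosparcm cli "get_scheduler_targets()" afterwards is a quick way to confirm that the additional RAM is now listed.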


Dear colleagues,
@wtempel and @Alchimist79

I tried this command
cryosparcm cli "get_scheduler_targets()"
and got the following output

david@DESKTOP-D1IHD96:~/cryosparc/cryosparc_worker/bin$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 6441926656, 'name': 'NVIDIA GeForce RTX 3060 Laptop GPU'}], 'hostname': 'DESKTOP-D1IHD96.', 'lane': 'default', 'monitor_port': None, 'name': 'DESKTOP-D1IHD96.', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'GPU': [0], 'RAM': [0]}, 'ssh_str': 'david@DESKTOP-D1IHD96.', 'title': 'Worker node DESKTOP-D1IHD96.', 'type': 'node', 'worker_bin_path': '/home/david/cryosparc/cryosparc_worker/bin/cryosparcw'}]
david@DESKTOP-D1IHD96:~/cryosparc/cryosparc_worker/bin$

When running

./cryosparcw connect --master david@DESKTOP-D1IHD96 --worker david@DESKTOP-D1IHD96 --update

I got the following response:

david@DESKTOP-D1IHD96:~/cryosparc/cryosparc_worker/bin$ ./cryosparcw connect --master david@DESKTOP-D1IHD96 --worker david@DESKTOP-D1IHD96 --update

CRYOSPARC CONNECT --------------------------------------------

Attempting to register worker david@DESKTOP-D1IHD96 to command david@DESKTOP-D1IHD96:39002
Connecting as unix user david
Will register using ssh string: david@david@DESKTOP-D1IHD96
If this is incorrect, you should re-run this command with the flag --sshstr

*** CommandClient: (http://david@DESKTOP-D1IHD96:39002/api) URL Error [Errno -2] Name or service not known
Traceback (most recent call last):
File "/home/david/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 104, in func
with make_json_request(self, "/api", data=data) as request:
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/home/david/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 191, in make_request
raise CommandClient.Error(client, error_reason, url=url)
cryosparc_tools.cryosparc.command.Error: *** CommandClient: (http://david@DESKTOP-D1IHD96:39002/api) URL Error [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "bin/connect.py", line 75, in <module>
cli = client.CommandClient(host=master_hostname, port=command_core_port, service="command_core")
File "/home/david/cryosparc/cryosparc_worker/cryosparc_compute/client.py", line 36, in __init__
super().__init__(service, host, port, url, timeout, headers, cls=NumpyEncoder)
File "/home/david/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 91, in __init__
self._reload() # attempt connection immediately to gather methods
File "/home/david/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 118, in _reload
system = self._get_callable("system.describe")()
File "/home/david/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 107, in func
raise CommandClient.Error(
cryosparc_tools.cryosparc.command.Error: *** CommandClient: (http://david@DESKTOP-D1IHD96:39002) Did not receive a JSON response from method "system.describe" with params ()

Could you please explain how to run the command correctly so that the new RAM is registered?

Hi,

Perhaps try it again with the --port option?
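
For instance (just a sketch; if I recall correctly, --port takes the master's base port, which defaults to 39000, and the command port 39002 in your output corresponds to base port + 2):

./cryosparcw connect --master DESKTOP-D1IHD96 --worker DESKTOP-D1IHD96 --port 39000 --update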

Cheers,
Yang


@leetleyang , thank you.

Do you mean - something like this?

./cryosparcw connect --master david@DESKTOP-D1IHD96 --port 39002 --worker david@DESKTOP-D1IHD96 --port 39002 --update

I tried but it does not seem to work for me.

Any more suggestions? :slightly_smiling_face:

Sincerely,

Hi,

I used this command (all in one line):
cryosparcw connect --master NODE_NAME_RUNNING_MASTER --worker NODE_NAME_RUNNING_WORKER --update

No need for a username (i.e. remove david@); just use the name of your node for both the master and the worker - they seem to be the same.

Also, I can't remember whether CryoSPARC had to be shut down first, but I think so.

Albert


Thank you, Albert @Alchimist79

Presumably you mean that I should try the following command:

david@DESKTOP-D1IHD96:~/cryosparc/cryosparc_worker/bin$ ./cryosparcw connect --master DESKTOP-D1IHD96 --worker DESKTOP-D1IHD96 --update

CRYOSPARC CONNECT --------------------------------------------

Attempting to register worker DESKTOP-D1IHD96 to command DESKTOP-D1IHD96:39002
Connecting as unix user david
Will register using ssh string: david@DESKTOP-D1IHD96
If this is incorrect, you should re-run this command with the flag --sshstr

Connected to master.

Current connected workers:
DESKTOP-D1IHD96.

Traceback (most recent call last):
File "bin/connect.py", line 133, in <module>
assert len(target) > 0, "Worker %s has not been registered so cannot be updated." % worker_hostname
AssertionError: Worker DESKTOP-D1IHD96 has not been registered so cannot be updated.

Any more advice is very welcome.

Sincerely,
Dmitry

I have also tried to register the worker but got the following error.

david@DESKTOP-D1IHD96:~/cryosparc/cryosparc_worker/bin$ ./cryosparcw connect --master DESKTOP-D1IHD96 --worker DESKTOP-D1IHD96

CRYOSPARC CONNECT --------------------------------------------

Attempting to register worker DESKTOP-D1IHD96 to command DESKTOP-D1IHD96:39002
Connecting as unix user david
Will register using ssh string: david@DESKTOP-D1IHD96
If this is incorrect, you should re-run this command with the flag --sshstr

Connected to master.

Current connected workers:
DESKTOP-D1IHD96.

Autodetecting available GPUs…
Detected 1 CUDA devices.

id pci-bus name

   0      0000:01:00.0  NVIDIA GeForce RTX 3060 Laptop GPU

All devices will be enabled now.
This can be changed later using --update

Traceback (most recent call last):
File "bin/connect.py", line 225, in <module>
assert args.ssdpath is not None or args.nossd, "Either provide --ssdpath or --nossd"
AssertionError: Either provide --ssdpath or --nossd

You could try:

~/cryosparc/cryosparc_worker/bin$ ./cryosparcw connect --master DESKTOP-D1IHD96 --worker DESKTOP-D1IHD96 --nossd
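
(If a local SSD were available for particle caching, --ssdpath /path/to/ssd_cache - a placeholder path - could be given instead of --nossd.)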

Thank you @wtempel,

this helped.

But now I have two nodes to select from, and one of them is not running :slight_smile: (probably the old one).
Is there a way to remove it?

Another question.
After the update I got the following error.
Any clues how to fix that?

thank you

Traceback (most recent call last):
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/tools.py", line 429, in context_dependent_memoize
return ctx_dict[cur_ctx][args]
KeyError: <pycuda._driver.Context object at 0x7ff2964e07b0>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pytools/prefork.py", line 48, in call_capture_output
popen = Popen(cmdline, cwd=cwd, stdin=PIPE, stdout=PIPE,
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/subprocess.py", line 1704, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'nvcc'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
File "cryosparc_master/cryosparc_compute/jobs/class2D/run.py", line 336, in cryosparc_compute.jobs.class2D.run.run_class_2D
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 964, in cryosparc_compute.engine.engine.process
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 974, in cryosparc_compute.engine.engine.process
File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 156, in cryosparc_compute.engine.cuda_core.allocate_gpu
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/gpuarray.py", line 549, in fill
func = elementwise.get_fill_kernel(self.dtype)
File "", line 2, in get_fill_kernel
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/tools.py", line 433, in context_dependent_memoize
result = func(*args)
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 493, in get_fill_kernel
return get_elwise_kernel(
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 162, in get_elwise_kernel
mod, func, arguments = get_elwise_kernel_and_types(
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 148, in get_elwise_kernel_and_types
mod = module_builder(arguments, operation, name,
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 45, in get_elwise_module
return SourceModule("""
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 290, in __init__
cubin = compile(source, nvcc, options, keep, no_extern_c,
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 254, in compile
return compile_plain(source, options, keep, nvcc, cache_dir, target)
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 78, in compile_plain
checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 50, in preprocess_source
result, stdout, stderr = call_capture_output(cmdline, error_on_nonzero=False)
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pytools/prefork.py", line 226, in call_capture_output
return forker.call_capture_output(cmdline, cwd, error_on_nonzero)
File "/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pytools/prefork.py", line 59, in call_capture_output
raise ExecError("error invoking '%s': %s"
pytools.prefork.ExecError: error invoking 'nvcc --preprocess -arch sm_86 -I/home/david/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/cuda /tmp/tmpk4n1l5_d.cu --compiler-options -P': [Errno 2] No such file or directory: 'nvcc'
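
(As a quick check in the worker shell, which nvcc or nvcc --version shows whether the CUDA toolkit compiler is on the PATH at all; the [Errno 2] above suggests it is not.)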

@wtempel ,

Let me specify the question.

So after the memory upgrade and registering the new RAM configuration in CryoSPARC, I have:
a) the old node (with 8 GB)
b) the new node (with 55 GB)


How do I remove a), the old 8 GB node?

When I try to use b), the new one, by clicking "Run on specific GPU" and selecting the second node, I get the following error:

License is valid.

Launching job on lane default target DESKTOP-D1IHD96 …

Running job on remote worker node hostname DESKTOP-D1IHD96

Failed to launch! 255
ssh: connect to host desktop-d1ihd96 port 22: Connection refused

How to fix that?

Thank you in advance.

Kind regards,
Dmitry

P.S. Not sure if it is important, but a), the old one, is named DESKTOP-D1IHD96. (with a trailing dot "."), while b) is named DESKTOP-D1IHD96 (without).

There is.

  • Identify the name of the node you want to remove
    cryosparcm cli "get_scheduler_targets()"
  • remove the target you want to remove with
    cryosparcm cli "remove_scheduler_target_node('name_of_target_to_delete')"

You can find cli documentation here.
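
For example, given the listing above where the stale entry is the one named 'DESKTOP-D1IHD96.' (note the trailing dot), the call would presumably be:
    cryosparcm cli "remove_scheduler_target_node('DESKTOP-D1IHD96.')"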


Please open a new topic for any new problem that was not previously answered, then remove this portion of the question from this topic (to avoid duplicate appearances of the new question). Thanks.


Dear @wtempel

New issue with memory -

After the RAM upgrade, about 54 GB of RAM is available.

2D refinement runs without issue (just as it did with 8 GB of RAM).

But 3D homogeneous refinement fails with the following error.
Any idea how to fix that?

Thank you

[CPU: 1.12 GB Avail: 52.22 GB]
Traceback (most recent call last):
File "/home/david/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2061, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1084, in cryosparc_compute.engine.engine.process.work
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 346, in cryosparc_compute.engine.engine.EngineThread.compute_error
File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 337, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory

Please post the current output of
cryosparcm cli "get_scheduler_targets()"


david@DESKTOP-D1IHD96:~$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 6441926656, 'name': 'NVIDIA GeForce RTX 3060 Laptop GPU'}], 'hostname': 'DESKTOP-D1IHD96', 'lane': 'default', 'monitor_port': None, 'name': 'DESKTOP-D1IHD96', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6]}, 'ssh_str': 'david@DESKTOP-D1IHD96', 'title': 'Worker node DESKTOP-D1IHD96', 'type': 'node', 'worker_bin_path': '/home/david/cryosparc/cryosparc_worker/bin/cryosparcw'}]

This now looks roughly as I expected after the RAM upgrade (even though I am not 100% sure what to expect on WSL). How many particles are there, and what is the box size?


@wtempel,

Those are strange values, aren't they: 'RAM': [0, 1, 2, 3, 4, 5, 6]?

The number of particles is 5,474, with a box size of 256x256.
Also D2 symmetry.
The ab initio reconstruction ran fine.

2 posts were split to a new topic: Out of memory on cluster