Dear CS team/users,
We just updated to the latest version, 4.0. Everything seems to have gone well, and the new user interface is fantastic! However, all jobs are queued and launched but never actually start running. They get stuck at this step:
Could you let me know how to resolve this issue? I’m happy to provide any additional information that would help with debugging. Thanks!
Best regards,
Wei
Hi Wei,
I had the same issue and realized that the worker didn’t update properly - maybe worth checking that? You can try the update again with the --override flag to force the worker update (or do it manually in the cryosparc_worker directory with bin/cryosparcw update, provided the updated worker archive has been downloaded).
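For reference, the commands would look something like this (a rough sketch; adjust the worker path to your install):

cryosparcm update --override    # on the master: force the update to run again
# or manually, on the worker node:
cd /path/to/cryosparc_worker
./bin/cryosparcw update         # installs from the downloaded worker archive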
Cheers
Oli
Dear Oli,
Thanks for your reply! I actually updated the worker manually, and I just tried the --override option to update it again. Unfortunately, the result is still the same. I don’t know whether this could be related to the Linux version; mine is running CentOS 7. Thanks!
Best,
Wei
Hey @wxh180, thanks for reporting this. Can you send us the error report for this job? For more information on how to do this, see Guide: Download Error Reports - CryoSPARC Guide
@stephan this is another great feature. Here are the job logs. Let me know if any additional information is needed. Thanks!
Hi Wei and Stephan,
Just piggy-backing to say that I’m getting the same errors in the job log (also using CentOS 7). I haven’t tried running a job yet, but I got this error when I ran ‘cryosparcm test workers’ after installation. Here’s the test output:
[wucci@wucci-014 FF14_TitanF4_Test]$ cryosparcm test workers P15
Using project P15
Running worker tests…
2022-10-04 10:00:56,717 WORKER_TEST log CRITICAL | Worker test results
2022-10-04 10:00:56,717 WORKER_TEST log CRITICAL | wucci-014.wucon.wustl.edu
2022-10-04 10:00:56,718 WORKER_TEST log CRITICAL | ✕ LAUNCH
2022-10-04 10:00:56,718 WORKER_TEST log CRITICAL | Error:
2022-10-04 10:00:56,718 WORKER_TEST log CRITICAL | See P15 J72 for more information
2022-10-04 10:00:56,718 WORKER_TEST log CRITICAL | SSD
2022-10-04 10:00:56,718 WORKER_TEST log CRITICAL | Did not run: Launch test failed
2022-10-04 10:00:56,718 WORKER_TEST log CRITICAL | GPU
2022-10-04 10:00:56,718 WORKER_TEST log CRITICAL | Did not run: Launch test failed
Hope this helps with troubleshooting.
Hi @wxh180,
Thanks for sending that over. Can you ensure the CRYOSPARC_LICENSE_ID field inside cryosparc_worker/config.sh matches the CRYOSPARC_LICENSE_ID inside cryosparc_master/config.sh?
Hi @kbasore,
Can you send the error report for this job? Also, can you ensure the CRYOSPARC_LICENSE_ID field inside cryosparc_worker/config.sh matches the CRYOSPARC_LICENSE_ID inside cryosparc_master/config.sh?
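A quick way to compare the two (a sketch; substitute your actual install paths):

grep CRYOSPARC_LICENSE_ID /path/to/cryosparc_master/config.sh /path/to/cryosparc_worker/config.sh

The two printed lines should show identical license IDs.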
Hi @stephan,
The license ID matches in each config.sh file for worker and master. Here’s the error report. Hope this helps. Thanks!
================= CRYOSPARCW ======= 2022-10-04 10:00:46.197276 =========
Project P15 Job J72
Master wucci-014.wucon.wustl.edu Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 261037
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "cryosparc_worker/cryosparc_compute/run.py", line 162, in cryosparc_compute.run.run
File "/home/wucci/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 97, in connect
assert cli.test_connection(), "Job could not connect to master instance at %s:%s" % (master_hostname, str(master_command_core_port))
File "/home/wucci/cryosparc/cryosparc_worker/cryosparc_compute/client.py", line 65, in func
assert 'error' not in res, f"Encountered error for method \"{key}\" with params {params}:\n{res['error']['message'] if 'message' in res['error'] else res['error']}"
AssertionError: Encountered error for method "test_connection" with params ():
ServerError: Authentication failed
Process Process-1:
Traceback (most recent call last):
File "/home/wucci/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/wucci/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "cryosparc_worker/cryosparc_compute/run.py", line 31, in cryosparc_compute.run.main
File "/home/wucci/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 97, in connect
assert cli.test_connection(), "Job could not connect to master instance at %s:%s" % (master_hostname, str(master_command_core_port))
File "/home/wucci/cryosparc/cryosparc_worker/cryosparc_compute/client.py", line 65, in func
assert 'error' not in res, f"Encountered error for method \"{key}\" with params {params}:\n{res['error']['message'] if 'message' in res['error'] else res['error']}"
AssertionError: Encountered error for method "test_connection" with params ():
ServerError: Authentication failed
Hi @kbasore,
Can you confirm your worker updated correctly? What is inside the file cryosparc_worker/version?
If it’s not v4.0.0, can you do the following:
1. Inside cryosparc_master, you’ll find an archive called cryosparc_worker.tar.gz.
2. Copy it into the cryosparc_worker directory.
3. Once that’s done, run ./bin/cryosparcw update from inside the cryosparc_worker directory.
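As a concrete sketch, assuming both packages live under /home/wucci/cryosparc as your log paths suggest:

cat /home/wucci/cryosparc/cryosparc_worker/version    # check the current worker version
cp /home/wucci/cryosparc/cryosparc_master/cryosparc_worker.tar.gz /home/wucci/cryosparc/cryosparc_worker/
cd /home/wucci/cryosparc/cryosparc_worker
./bin/cryosparcw update                               # unpacks and installs the copied archive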
Hi @stephan,
Looks like I had v3.3.2. I copied cryosparc_worker.tar.gz over and ran cryosparcw update. The launch test now passes, but I’m getting a GPU error:
[wucci@wucci-014 FF14_TitanF4_Test]$ cryosparcm test workers P15
Using project P15
Running worker tests...
2022-10-04 10:33:50,291 WORKER_TEST log CRITICAL | Worker test results
2022-10-04 10:33:50,292 WORKER_TEST log CRITICAL | wucci-014.wucon.wustl.edu
2022-10-04 10:33:50,292 WORKER_TEST log CRITICAL | ✓ LAUNCH
2022-10-04 10:33:50,292 WORKER_TEST log CRITICAL | ✓ SSD
2022-10-04 10:33:50,292 WORKER_TEST log CRITICAL | ✕ GPU
2022-10-04 10:33:50,292 WORKER_TEST log CRITICAL | Error: Tensorflow detected 0 of 4 GPUs.
2022-10-04 10:33:50,292 WORKER_TEST log CRITICAL | See P15 J75 for more information
P15/J75/job.log:
================= CRYOSPARCW ======= 2022-10-04 10:33:00.428998 =========
Project P15 Job J75
Master wucci-014.wucon.wustl.edu Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 267316
MAIN PID 267316
instance_testing.run cryosparc_compute.jobs.jobregister
========= monitor process now waiting for main process
2022-10-04 10:33:07.883113: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/wucci/cryosparc/cryosparc_worker/cryosparc_compute/blobio:/home/wucci/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib:/home/wucci/cryosparc/cryosparc_worker/deps/external/cudnn/lib:/usr/local/cuda/lib64:/home/wucci/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib:/home/wucci/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib
2022-10-04 10:33:07.883231: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-10-04 10:33:09.185174: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-10-04 10:33:09.185301: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-10-04 10:33:09.186850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:18:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-10-04 10:33:09.188052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 1 with properties:
pciBusID: 0000:3b:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-10-04 10:33:09.189212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 2 with properties:
pciBusID: 0000:86:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-10-04 10:33:09.190381: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 3 with properties:
pciBusID: 0000:af:00.0 name: NVIDIA GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2022-10-04 10:33:09.190627: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/wucci/cryosparc/cryosparc_worker/cryosparc_compute/blobio:/home/wucci/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib:/home/wucci/cryosparc/cryosparc_worker/deps/external/cudnn/lib:/usr/local/cuda/lib64:/home/wucci/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib:/home/wucci/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib
2022-10-04 10:33:09.190780: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/wucci/cryosparc/cryosparc_worker/cryosparc_compute/blobio:/home/wucci/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib:/home/wucci/cryosparc/cryosparc_worker/deps/external/cudnn/lib:/usr/local/cuda/lib64:/home/wucci/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib:/home/wucci/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib
2022-10-04 10:33:09.190915: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/wucci/cryosparc/cryosparc_worker/cryosparc_compute/blobio:/home/wucci/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib:/home/wucci/cryosparc/cryosparc_worker/deps/external/cudnn/lib:/usr/local/cuda/lib64:/home/wucci/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib:/home/wucci/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib
2022-10-04 10:33:09.190959: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-10-04 10:33:09.191007: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-10-04 10:33:09.232231: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2022-10-04 10:33:09.232696: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/wucci/cryosparc/cryosparc_worker/cryosparc_compute/blobio:/home/wucci/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib:/home/wucci/cryosparc/cryosparc_worker/deps/external/cudnn/lib:/usr/local/cuda/lib64:/home/wucci/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib:/home/wucci/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib
2022-10-04 10:33:09.234134: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2022-10-04 10:33:09.234208: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
***************************************************************
**** handle exception rc
set status to failed
========= main process now complete.
========= monitor process now complete.
I’m running CUDA 10.2, and I know that version 11 is needed for the Deep Picker. Maybe that’s why I’m getting this error? Let me try running a job (not Deep Picker) that uses GPUs and I’ll get back to you in a minute. Thank you so much for your help!
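P.S. In case it helps, this is how I checked which CUDA toolkit the worker is configured to use (paths are from my install; CRYOSPARC_CUDA_PATH is the relevant setting in the worker’s config.sh):

grep CRYOSPARC_CUDA_PATH /home/wucci/cryosparc/cryosparc_worker/config.sh    # toolkit path the worker uses
/usr/local/cuda/bin/nvcc --version                                           # version of the toolkit at that path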
Hi @kbasore,
Glad to see it’s running properly. Did you mean the Deep Picker jobs are running fine? It’s odd that Tensorflow failed in the worker test but not in the Deep Picker job itself.
Okay, that makes sense then. Yes, based on this error, you’ll most likely need to install CUDA 11 to get the Deep Picker jobs to work.
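Once a CUDA 11 toolkit is installed, you should be able to point the worker at it with cryosparcw newcuda (a sketch; substitute the actual location of your CUDA 11 installation):

cd /home/wucci/cryosparc/cryosparc_worker
./bin/cryosparcw newcuda /usr/local/cuda-11.2    # recompiles the worker’s GPU code against the new toolkit

After that, re-running cryosparcm test workers P15 should show whether Tensorflow can now see the GPUs.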
I also just had an issue with a CentOS 7 upgrade: the LICENSE_ID differed between master and worker, and once I made them match, everything worked. Is this some sort of bug?
Hi @donaldb,
This isn’t related to CentOS 7. It’s also not a bug: the master and worker packages are expected to have the same license ID. This is now enforced in v4, as the authentication layer relies on it.
You can read more about it here: CryoSPARC Architecture and System Requirements - CryoSPARC Guide
Thanks for that. Could a warning be added when they don’t match, for easier troubleshooting?