Cannot create a new job (or clone, etc.); web GUI is not responsive

Hi,
after upgrading to v3.1 I started having problems with the web interface: clicking the Clone button or trying to create a new job stopped working. The respective buttons can still be clicked, but nothing happens.
As I thought it might be a bug in v3.1, I downgraded back to v3.0.1, but that did not solve the problem.
Unfortunately, I cannot find any error messages that would point me toward a solution…

I am running cryoSPARC in a cluster configuration with a dedicated non-GPU master and a number of workers. All systems run CentOS 7.

cryosparcm status

CryoSPARC System master node installed at
/opt/cryosparc_cluster/cryosparc2_master
Current cryoSPARC version: v3.0.1

cryosparcm process status:

app RUNNING pid 25702, uptime 0:06:06
app_dev STOPPED Not started
command_core RUNNING pid 25497, uptime 0:06:22
command_rtp RUNNING pid 25570, uptime 0:06:16
command_vis RUNNING pid 25556, uptime 0:06:18
database RUNNING pid 25390, uptime 0:06:24
liveapp RUNNING pid 25744, uptime 0:06:05
liveapp_dev STOPPED Not started
watchdog_dev STOPPED Not started
webapp RUNNING pid 25679, uptime 0:06:08
webapp_dev STOPPED Not started


global config variables:
export CRYOSPARC_FORCE_USER=true
export CRYOSPARC_LICENSE_ID="{redacted}"
export CRYOSPARC_MASTER_HOSTNAME="be-cryosparc"
export CRYOSPARC_DB_PATH="/cs04/cryosparc_master"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true

Hi @lkat,

When the web application becomes unresponsive after clicking on actions (such as creating or cloning a job), it is typically due to an issue with the cryoSPARC master instance. When you encounter this issue again, please reply with the output of the following command:

cryosparcm log command_core
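
In the meantime, it can also help to check whether the command_core service itself still answers API requests. Here is a minimal sketch, assuming the default base port of 39000 (command_core listens on base port + 2, i.e. 39002 here); system.describe is the same call the cryoSPARC client library issues internally:

# Probe the command_core JSON-RPC endpoint directly; if the service is hung,
# this call will block until the timeout instead of returning JSON.
curl --max-time 10 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc": "2.0", "method": "system.describe", "params": [], "id": 1}' \
  http://localhost:39002/api

If that times out while cryosparcm status still reports command_core as RUNNING, the process is alive but blocked.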

- Suhail

Hi Suhail,

we have done some further investigation. When starting cryoSPARC (cryosparcm start completes normally, with no errors or warnings), the GUI is responsive for about one minute. During this time it is possible to start jobs, etc. After this one minute, it is no longer possible to start new jobs, but one can still browse old projects, check resources, etc. If a job builder is open when this happens, it is no longer possible to browse the file system (e.g. when browsing for a file to import).
If a job is started before this ~1 min window is up, it continues running on the respective node. Its results remain visible in the web GUI the entire time, but the job dies during the final step (exporting the job and creating .csg files). Once cryoSPARC stops being responsive, there is also no further output in the command_core log.

If I do a fresh start of cryoSPARC, log command_core, do nothing in the web GUI and wait for a minute, I get the following:

[cryosparc@be-cryosparc ~]$ cryosparcm start
Starting cryoSPARC System master process..
CryoSPARC is not already running.
database: started
command_core: started
command_core connection succeeded

command_vis: started
command_rtp: started
command_rtp connection succeeded

webapp: started
app: started
liveapp: started

-----------------------------------------------------

CryoSPARC master started.
 From this machine, access cryoSPARC at
    http://localhost:39000
 and access cryoSPARC Live at
    http://localhost:39006
 please note the legacy cryoSPARC Live application is running at
    http://localhost:39007

 From other machines on the network, access cryoSPARC at
    http://be-cryosparc:39000
 and access cryoSPARC Live at
    http://be-cryosparc:39006


Startup can take several minutes. Point your browser to the address
and refresh until you see the cryoSPARC web interface.

And for the command_core log:

 cryosparcm log command_core
 COMMAND CORE STARTED ===  2021-02-09 14:18:17.186763  ==========================
 *** BG WORKER START
  * Serving Flask app "command_core" (lazy loading)
  * Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
  * Debug mode: off
 COMMAND CORE STARTED ===  2021-02-09 15:32:23.458067  ==========================
 *** BG WORKER START
  * Serving Flask app "command_core" (lazy loading)
  * Environment: production
    WARNING: This is a development server. Do not use it in a production deployment.
    Use a production WSGI server instead.
  * Debug mode: off

Also, after this one minute I cannot use cryosparcm cli, e.g. to remove a node. The command below does not complete, and it prints an error when killed with Ctrl+C…

[cryosparc@be-cryosparc cryosparc_master]$ cryosparcm cli "remove_scheduler_target_node('gpu08')"
^C*** client.py: command (http://be-cryosparc:39002/api) did not reply within timeout of 300 seconds, attempt 1 of 3
^C*** client.py: command (http://be-cryosparc:39002/api) did not reply within timeout of 300 seconds, attempt 2 of 3
^C*** client.py: command (http://be-cryosparc:39002/api) did not reply within timeout of 300 seconds, attempt 3 of 3
Traceback (most recent call last):
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/cryosparc/cryosparc_master/cryosparc_compute/client.py", line 85, in <module>
    cli = CommandClient(host, int(port))
  File "/opt/cryosparc/cryosparc_master/cryosparc_compute/client.py", line 35, in __init__
    self._reload()
  File "/opt/cryosparc/cryosparc_master/cryosparc_compute/client.py", line 63, in _reload
    system = self._get_callable('system.describe')()
  File "/opt/cryosparc/cryosparc_master/cryosparc_compute/client.py", line 51, in func
    r = requests.post(self.url, data = json.dumps(data, cls=NumpyEncoder), headers = header, timeout=self.timeout)
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/requests/api.py", line 119, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 426, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/urllib3/connectionpool.py", line 421, in _make_request
    httplib_response = conn.getresponse()
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/http/client.py", line 1354, in getresponse
    response.begin()
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)

We also tried a clean install on the same system, and that seems to work fine. However, if we then import the database dumped from the previous installation, the same problem occurs.
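
For reference, the dump and import were done with the standard MongoDB tools, roughly along these lines (a sketch from memory; the exact paths differed, and the port assumes our base port of 39000, so the database listens on 39001):

# Dump the old database while the old instance is running (port = base port + 1)
mongodump --host localhost --port 39001 --out /tmp/cryosparc_db_dump

# Restore into the freshly installed instance (with its database running)
mongorestore --host localhost --port 39001 --drop /tmp/cryosparc_db_dump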

Thanks for any help; we can provide more logs, etc. if needed!

Lukas

Hi @lkat,

Thanks for the additional details; this helps a lot! It seems that cryoSPARC is doing some processing in the background that is causing the web application to become unresponsive. Are there any cryoSPARC Live sessions in the database of your previous installation?

In any case, can you please report the result of the following command:

cryosparcm log command_rtp
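
If the hang happens again, it may also be worth capturing a stack dump of the command_core process to see exactly where it is blocked. A sketch, assuming you can install py-spy (a third-party Python tool, not part of cryoSPARC) on the master node:

# Get the command_core PID (also shown by cryosparcm status)
cryosparcm status | grep command_core

# Dump the current Python stacks of that process; run this as the same
# user that owns the process (or as root)
py-spy dump --pid <command_core PID>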

Thanks,
Suhail

Hi Suhail,

No, we do not use cryoSPARC Live at the moment. Here is the output of cryosparcm log command_rtp:

2021-02-09 14:18:26,802     RTP                   INFO      === STARTED ===
2021-02-09 14:18:26,803     RTP.BACKGROUND_WORKER INFO      === STARTED ===
 * Serving Flask app "rtp_manager" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
2021-02-09 15:32:33,131     RTP                   INFO      === STARTED ===
2021-02-09 15:32:33,132     RTP.BACKGROUND_WORKER INFO      === STARTED ===
 * Serving Flask app "rtp_manager" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
2021-02-09 17:15:25,958     RTP                   INFO      === STARTED ===
2021-02-09 17:15:25,958     RTP.BACKGROUND_WORKER INFO      === STARTED ===
 * Serving Flask app "rtp_manager" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off

I also found the following output in the app log; maybe it is relevant?

cryosparcm log app
(node:30331) Warning: Accessing non-existent property 'findOne' of module exports inside circular dependency
(node:30331) Warning: Accessing non-existent property 'remove' of module exports inside circular dependency
(node:30331) Warning: Accessing non-existent property 'updateOne' of module exports inside circular dependency
(node:31850) Warning: Accessing non-existent property 'count' of module exports inside circular dependency
(Use `node --trace-warnings ...` to show where the warning was created)
(node:31850) Warning: Accessing non-existent property 'findOne' of module exports inside circular dependency
(node:31850) Warning: Accessing non-existent property 'remove' of module exports inside circular dependency
(node:31850) Warning: Accessing non-existent property 'updateOne' of module exports inside circular dependency

This is just a short selection of such entries.

Cheers
Lukas

Hi @lkat,

This is very peculiar. Could you please report the output of the following command:

cryosparcm log database
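
If the full log is very long, note that the raw log files live under the run directory of the master installation, so you could also just extract the error lines, e.g. (a sketch; the path here assumes the install location from your first post):

# Pull out anything that looks like an error from the raw database log
grep -iE "error|exception|assert" /opt/cryosparc_cluster/cryosparc2_master/run/database.log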

Thanks

Hi,
here is the database log. It should cover the time immediately after the clean re-install (where everything seems to work fine) and the period after we imported the old database, at which point everything stops working as described.

https://syncandshare.lrz.de/dl/fi4hkX5tf3Rq7HzNGAZURH1T/database.log

Maybe this helps to navigate the log file: the new, empty database is
/cryoscratch05/cryosparc_database/
and the old one, where the problems come up, is
/cryoscratch04/cryosparc_master/

Cheers
Lukas

Hi,

any ideas what could be causing this? Or do you have any tips on how we could at least salvage parts of our DB to set up a new instance? One idea we had is sketched below.
It would be really frustrating if we lost all recent jobs and had to start from scratch.
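
In case a full restore is simply not viable, we wondered about a selective dump of only the relevant collections, roughly like this (a sketch only; the database and collection names are our guess and would need to be checked against the actual database first):

# Dump only selected collections from the old database (port = base port + 1)
mongodump --host localhost --port 39001 --db meteor --collection projects --out /tmp/partial_dump
mongodump --host localhost --port 39001 --db meteor --collection jobs --out /tmp/partial_dump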

Any help is greatly appreciated!

Cheers
Lukas

Hi @lkat,

This is quite odd: your database logs don't seem to indicate any error either. What version of cryoSPARC are you running now? Was the previous installation (from which you are trying to migrate the database) also running the same version?