The clone, create, and submit tasks do not respond [Resolved: SSH connection to worker node]

    ----------------------------------------------------------------------------
    CryoSPARC System master node installed at
    /Share/app/cryosparc/cryosparc2_master
    Current cryoSPARC version: v3.2.0+211012
    ----------------------------------------------------------------------------

    CryoSPARC process status:

    app                              RUNNING   pid 303010, uptime 0:03:46
    app_dev                          STOPPED   Not started
    command_core                     RUNNING   pid 302824, uptime 0:04:18
    command_rtp                      RUNNING   pid 302934, uptime 0:04:02
    command_vis                      RUNNING   pid 302927, uptime 0:04:03
    database                         RUNNING   pid 302702, uptime 0:04:25
    liveapp                          RUNNING   pid 303035, uptime 0:03:44
    liveapp_dev                      STOPPED   Not started
    webapp                           RUNNING   pid 303000, uptime 0:03:47
    webapp_dev                       STOPPED   Not started

Creating, cloning, or submitting a job gets no response. The problem persists after the service is updated and restarted. The webapp log recorded while these operations failed:

    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [projects] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [workspace] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [workspace] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [workspace] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    [PUB] job.events.checkpoints: { project_uid: 'P401', job_uid: 'J223', type: 'checkpoint' }
    [PUB] job.events: { project_uid: 'P401', job_uid: 'J223' } 100 0
    [PUB] events.countAfterCheckpoint
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [projects] project query user  5e1bdfd6d56b698c28137af8 kemeng false
    ==== [workspace] project query user  5e1bdfd6d56b698c28137af8 kemeng false
    ==== [jobs] project query user  5e1bdfd6d56b698c28137af8 kemeng false
    ==== [workspace] project query user  5e1bdfd6d56b698c28137af8 kemeng false
    ==== [jobs] project query user  5e1bdfd6d56b698c28137af8 kemeng false
    create_new_job
    {"job_type":"import_micrographs","project_uid":"P377","workspace_uid":"W2","created_by_user_id":"5e1bdfd6d56b698c28137af8"}
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    ==== [jobs] project query user  60a7311390d1192eb80ee49e shijunhui false
    [PUB] job.events.checkpoints: { project_uid: 'P401', job_uid: 'J222', type: 'checkpoint' }
    set_user_viewed_job
    ["60a7311390d1192eb80ee49e","P401","W3","J222"]
    [PUB] job.events: { project_uid: 'P401', job_uid: 'J222' } 100 0
    [PUB] events.countAfterCheckpoint
    enqueue_job
    {"project_uid":"P401","job_uid":"J231","lane":"V100s"}
    ==== [projects] project query user  5e3e6106d56b698c28146692 shenhuaizong false
    ==== [workspace] project query user  5e3e6106d56b698c28146692 shenhuaizong false
    ==== [jobs] project query user  5e3e6106d56b698c28146692 shenhuaizong false
    ==== [jobs] project query user  5e3e6106d56b698c28146692 shenhuaizong false
    ==== [projects] project query user  5e3e6106d56b698c28146692 shenhuaizong false
    layout_tree
    {"project_uid":"P483","workspace_uid":"W1","layout_name":"P483W1"}
    ==== [workspace] project query user  5e3e6106d56b698c28146692 shenhuaizong false
    ==== [jobs] project query user  5e3e6106d56b698c28146692 shenhuaizong false
    ==== [jobs] project query user  5da82a1cd56b6950435770f8 admin true
    [PUB] job.events.checkpoints: { project_uid: 'P38', job_uid: 'J10', type: 'checkpoint' }
    set_user_viewed_job
    ["5da82a1cd56b6950435770f8","P38","W2","J10"]
    [PUB] job.events: { project_uid: 'P38', job_uid: 'J10' } 100 0
    [PUB] events.countAfterCheckpoint
    [PUB] job.events: { project_uid: 'P38',
      job_uid: 'J10',
      created_at: { '$gt': 2020-03-09T07:08:10.047Z } } 100 0
    [PUB] events.countAfterCheckpoint
    [PUB] job.events: { project_uid: 'P38',
      job_uid: 'J10',
      created_at: { '$gt': 2020-03-09T07:08:15.488Z } } 100 0
    [PUB] events.countAfterCheckpoint
    [PUB] job.events: { project_uid: 'P38',
      job_uid: 'J10',
      created_at: { '$gt': 2020-03-09T07:08:56.448Z } } 100 0
    [PUB] events.countAfterCheckpoint
    clone_job
    {"project_uid":"P38","workspace_uid":"W2","job_uid":"J10","created_by_user_id":"5da82a1cd56b6950435770f8"}

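Most of the log above is routine `project query` and `[PUB]` traffic, which makes the actual requests (`create_new_job`, `enqueue_job`, `clone_job`) hard to spot. A minimal Python sketch for pulling those requests out of a saved copy of the log might look like this. It assumes each action name sits on its own line and is followed by a one-line JSON payload, as in the excerpt; the `ACTIONS` set and the function name `extract_requests` are hypothetical and not part of cryoSPARC.

```python
import json

# Action names seen in the log excerpt above. This set is an assumption
# for illustration, not an exhaustive list of webapp actions.
ACTIONS = {"create_new_job", "clone_job", "enqueue_job"}


def extract_requests(lines):
    """Pair each action line with the JSON payload on the line after it.

    Returns a list of (action, payload) tuples. The payload is None when
    the following line is not valid JSON. An action on the very last
    line (no payload line after it) is dropped.
    """
    requests = []
    pending = None  # action name waiting for its payload line
    for raw in lines:
        line = raw.strip()
        if pending is not None:
            try:
                requests.append((pending, json.loads(line)))
            except json.JSONDecodeError:
                requests.append((pending, None))
            pending = None
            continue
        if line in ACTIONS:
            pending = line
    return requests


# Example use on a saved log file (the file name is hypothetical):
# with open("webapp.log") as fh:
#     for action, payload in extract_requests(fh):
#         print(action, payload)
```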
Hi @zhenyuanliu,

The web application logs you’ve attached don’t indicate any error. Typically we’ve seen users encountering an unresponsive web application due to either high CPU or memory usage on the machine running cryoSPARC or slow network/file system responses. Could that be the case here?

Could you please output the result of the following command: cryosparcm log command_core

- Suhail

We found that some worker nodes in the cluster had hung and could not be logged in to over SSH. After restarting those worker nodes and the cryoSPARC service, everything recovered. Thank you for your support.

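Since the cause was worker nodes that no longer answered SSH, a quick reachability check across the workers can help catch this earlier. The following is only a sketch: it opens a TCP connection to each host and waits for the SSH identification string. It assumes the server sends a banner starting with `SSH-` as its first data, it does not attempt a login, and the function names and example hostnames are hypothetical. A node that still sends a banner but hangs later during login would not be detected.

```python
import socket


def ssh_banner_ok(host, port=22, timeout=5.0):
    """Return True if host:port accepts a TCP connection and sends an
    SSH identification string within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # The timeout set by create_connection also applies to recv.
            banner = sock.recv(255)
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
    return banner.startswith(b"SSH-")


def unreachable_workers(hosts, port=22, timeout=5.0):
    """Return the subset of hosts that fail the banner check."""
    return [h for h in hosts if not ssh_banner_ok(h, port, timeout)]


# Example use (hostnames are placeholders, not from the thread):
# print(unreachable_workers(["worker01", "worker02"]))
```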
Glad you were able to get it working!