Jobs not starting on worker node(s)


I have set up the cryoSPARC master on a single node and installed cryoSPARC workers on two other nodes that have GPUs and SSDs for caching. However, no jobs are actually running on the worker nodes. I can see that a 2D classification job is scheduled to run on one of the worker nodes, but it never starts. The worker nodes show up correctly in the resource manager on the master node, but as far as I can tell the workers produce no logs to speak of, so I would like to know how I should begin troubleshooting the problem.

Also, the cryosparcw script needs some serious TLC: it has no help output, so I have to open the script and read through it to see if there is something useful.

Turns out this was an issue with where the log file was being created. The user running the webapp has its home directory in /var/lib, which is not a shared file system (since it is a service account). When the user created the project or job, the destination for at least the log file was set to /var/lib/username/P1/J9/job.log, and that directory structure did not exist on the worker node. The job threw an error, but no messages were logged by the master. I think this sort of error should probably be logged to aid in troubleshooting.
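
In case it helps anyone else hitting this, here is a minimal sketch of the check I would run on each worker node before queuing jobs. It is not part of cryoSPARC; the function name `check_writable_dir` is something I made up for illustration, and it only assumes that the job directory path the master hands out (e.g. /var/lib/username/P1/J9 in my case) has to exist and be writable on the worker.

```python
import os
import sys
import tempfile


def check_writable_dir(path):
    """Return a list of problems with `path`; an empty list means it looks OK.

    Hypothetical helper, not a cryoSPARC API. It checks that `path` is an
    existing directory and that the current user can create a file in it.
    """
    path = os.path.abspath(path)
    problems = []

    if not os.path.isdir(path):
        # Walk upwards to report the deepest ancestor that does exist,
        # which shows where the directory tree stops on this node.
        ancestor = path
        while not os.path.exists(ancestor):
            parent = os.path.dirname(ancestor)
            if parent == ancestor:
                break
            ancestor = parent
        problems.append(
            f"missing: {path} (deepest existing ancestor: {ancestor})"
        )
        return problems

    try:
        # Creating a temporary file is a more reliable write test than
        # os.access() on network file systems; the file is removed on close.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        problems.append(f"not writable: {path} ({exc})")

    return problems


if __name__ == "__main__":
    # Pass one or more job/project directories as command-line arguments.
    for candidate in sys.argv[1:]:
        issues = check_writable_dir(candidate)
        print(candidate, "OK" if not issues else "; ".join(issues))
```

Run it on each worker as the same user the worker processes run as, passing the job directory shown in the web app. In my setup it would have reported the path as missing on the worker, with /var/lib (or wherever the tree stops on that node) as the deepest existing ancestor.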

As a side note, is there a way to set the default directory that the web app's file browser starts in to something other than $HOME?