Installed CryoSPARC to submit to a cluster, but the import job fails

Hi CryoSPARC,

Can you please help me?

I have previously installed CryoSPARC on a standalone machine, but this time I have installed it on a cluster so that jobs are submitted to SLURM.

I have followed the installation instructions for the cluster setup.
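For context, I registered the cluster lane the standard way: a cluster_info.json and cluster_script.sh in one directory, then cryosparcm cluster connect from that directory. The sketch below is representative rather than my exact files; the lane name, paths, and partition are placeholders.

cluster_info.json (placeholder values):

{
    "name": "default",
    "worker_bin_path": "/path/to/cryosparc_worker/bin/cryosparcw",
    "cache_path": "/path/to/local/ssd/cache",
    "send_cmd_tpl": "{{ command }}",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "qinfo_cmd_tpl": "sinfo"
}

cluster_script.sh (trimmed to the relevant lines; the {{ ... }} variables are filled in by CryoSPARC at submission time):

#!/usr/bin/env bash
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --ntasks={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --partition=gpu
#SBATCH --mem={{ (ram_gb*1000)|int }}MB
#SBATCH --output={{ job_dir_abs }}/output.txt
#SBATCH --error={{ job_dir_abs }}/error.txt

{{ run_cmd }}

Registered with:

cd /path/to/that/directory
cryosparcm cluster connect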

To test, I have tried to run the ‘Extensive Workflow for T20S’.

The workflow fails at the import job.
I reviewed my installation but was unable to find an issue.

I have also tried to manually follow the tutorial located at Cryo-EM Data Processing in cryoSPARC: Introductory Tutorial - CryoSPARC Guide, but I receive the same error.

Here is the output from the import job.

License is valid.

Running job on master node

[CPU: 78.6 MB]   Project P1 Job J4 Started

[CPU: 78.6 MB]   Master running v3.3.1, worker running v3.3.1

[CPU: 78.8 MB]   Working in directory: /mnt/nfs01/jvanschy/P1/J4

[CPU: 78.8 MB]   Running on lane default

[CPU: 78.8 MB]   Resources allocated: 

[CPU: 78.8 MB]     Worker:  mlerp-login1

[CPU: 78.8 MB]   --------------------------------------------------------------

[CPU: 78.8 MB]   Importing job module for job type import_movies...

[CPU: 238.7 MB]  Job ready to run

[CPU: 238.7 MB]  ***************************************************************

[CPU: 238.7 MB]  Importing movies from /mnt/nfs01/jvanschy/cryosparc/empiar_10025_subset/*.tif

[CPU: 238.7 MB]  Importing 20 files

[CPU: 238.9 MB]  Import paths were unique at level -1

[CPU: 238.9 MB]  Importing 21 files

[CPU: 238.9 MB]  Reading header for each exposure...

[CPU: 239.2 MB]  Spawning worker processes to read headers in parallel...

[CPU: 239.2 MB]  Processed 20 headers...

[CPU: 240.0 MB]  Processing results...

[CPU: 240.0 MB]  Reading headers of gain reference file /mnt/nfs01/jvanschy/cryosparc/empiar_10025_subset/norm-amibox05-0.mrc

[CPU: 240.1 MB]  Done importing.

[CPU: 240.1 MB]  --------------------------------------------------------------

[CPU: 240.1 MB]  ===========================================================

[CPU: 240.1 MB]  Loaded 20 movies.

[CPU: 240.1 MB]    Common fields: 

[CPU: 240.1 MB]                 mscope_params/accel_kv :  {300.0}

[CPU: 240.1 MB]                    mscope_params/cs_mm :  {2.7}

[CPU: 240.1 MB]      mscope_params/total_dose_e_per_A2 :  {53.0}

[CPU: 240.1 MB]             mscope_params/exp_group_id :  {1}

[CPU: 240.1 MB]              mscope_params/phase_plate :  {0}

[CPU: 240.1 MB]                mscope_params/neg_stain :  {0}

[CPU: 240.1 MB]                     movie_blob/psize_A :  {0.6575}

[CPU: 240.1 MB]                       movie_blob/shape :  [  38 7676 7420]

[CPU: 240.1 MB]           movie_blob/is_gain_corrected :  {0}

[CPU: 240.1 MB]  ===========================================================

[CPU: 240.1 MB]  Making example plots. Exposures will be displayed without defect correction.

[CPU: 240.1 MB]  Reading file...

[CPU: 54.3 MB]   ====== Job process terminated abnormally.

I’ve looked at the log files in the job folder, but haven’t found the cause.
If you are able to assist I would be very grateful.

Thanks,
Jay.

@ozej8y Please can you post the “joblog” output for the failed job, presumably Job 4 in Project 1, after running:
cryosparcm joblog P1 J4

@wtempel

Thanks for the reply.
Here is the output.

================= CRYOSPARCW =======  2022-02-04 05:57:14.697795  =========
Project P1 Job J4
Master mlerp-login1 Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 3888542
========= monitor process now waiting for main process
MAIN PID 3888542
imports.run cryosparc_compute.jobs.jobregister
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= main process now complete.
========= monitor process now complete.
Waiting for data... (interrupt to abort)

@wtempel I suspect I have found the issue. The VM I’m using only has 2 CPUs, while the system requirements call for 4+ CPUs. Would that be a likely cause?
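For anyone checking the same thing on their own setup, assuming a standard Linux host, the core count visible to cryosparc_master can be confirmed with:

nproc

or, for more detail:

lscpu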

@ozej8y Did you succeed in running any jobs? If not, please describe in some detail:

  • how the cryoSPARC master and workers are integrated in the SLURM environment
    • which of the master and workers are controlled by SLURM
  • the role that VMs play in that environment

Sorry for the delayed reply.
I have relocated the CryoSPARC server to a machine with more than 2 CPUs to run cryosparc_master.

This has fixed the issue.
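For anyone who hits the same problem: after moving cryosparc_master to a host with enough CPUs, a quick sanity check (a standard cryosparcm command, nothing specific to my setup) is:

cryosparcm status

to confirm all master processes are running, and then re-run the import job.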