Dear Ali,
Thank you for your help. I am pasting the entire page:
Launching job on lane merlin5 target merlin5 …
License is valid.
Launching job on cluster merlin5
====================== Cluster submission script: ========================
#!/usr/bin/env bash
## cryoSPARC cluster submission script template for SLURM
## Available variables:
## /gpfs/data/marino_j/cryosparc/v2/cryosparc/cryosparc2_worker/bin/cryosparcw run --project P4 --job J100 --master_hostname merlin-l-02.psi.ch --master_command_core_port 39002 > /gpfs/data/marino_j/rhonofab/J100/job.log 2>&1 - the complete command string to run the job
## 2 - the number of CPUs needed
## 1 - the number of GPUs needed.
## Note: the code will use this many GPUs starting from dev id 0
## the cluster scheduler or this script have the responsibility
## of setting CUDA_VISIBLE_DEVICES so that the job code ends up
## using the correct cluster-allocated GPUs.
## 16.0 - the amount of RAM needed in GB
## /gpfs/data/marino_j/rhonofab/J100 - absolute path to the job directory
## /gpfs/data/marino_j/rhonofab - absolute path to the project dir
## /gpfs/data/marino_j/rhonofab/J100/job.log - absolute path to the log file for the job
## /gpfs/data/marino_j/cryosparc/v2/cryosparc/cryosparc2_worker/bin/cryosparcw - absolute path to the cryosparc worker command
## --project P4 --job J100 --master_hostname merlin-l-02.psi.ch --master_command_core_port 39002 - arguments to be passed to cryosparcw run
## P4 - uid of the project
## J100 - uid of the job
## What follows is a simple SLURM script:
#SBATCH --job-name cryosparc_P4_J100
##SBATCH -n 2
##SBATCH --gres=gpu:1
#SBATCH -p gpu
#SBATCH --mem=16000MB
#SBATCH -o /gpfs/data/marino_j/rhonofab/J100/job.out
#SBATCH -e /gpfs/data/marino_j/rhonofab/J100/job.err
#SBATCH --nodes=1
#SBATCH --exclusive
##SBATCH -w merlin-g-02
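# Build a list of GPUs that currently have no compute processes running
# (checked via nvidia-smi) and expose only those to the job through CUDA_VISIBLE_DEVICES.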
available_devs=""
for devidx in $(seq 0 15);
do
if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then
if [[ -z "$available_devs" ]] ; then
available_devs=$devidx
else
available_devs=$available_devs,$devidx
fi
fi
done
export CUDA_VISIBLE_DEVICES=$available_devs
hostname
echo "CUDA_VISIBLE_DEVICES : ${CUDA_VISIBLE_DEVICES}"
echo "/gpfs/data/marino_j/cryosparc/v2/cryosparc/cryosparc2_worker/bin/cryosparcw run --project P4 --job J100 --master_hostname merlin-l-02.psi.ch --master_command_core_port 39002 > /gpfs/data/marino_j/rhonofab/J100/job.log 2>&1 "
/gpfs/data/marino_j/cryosparc/v2/cryosparc/cryosparc2_worker/bin/cryosparcw run --project P4 --job J100 --master_hostname merlin-l-02.psi.ch --master_command_core_port 39002 > /gpfs/data/marino_j/rhonofab/J100/job.log 2>&1
==========================================================================
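As a side note, the header above is the rendered form of our cluster_script.sh template; before cryoSPARC fills in the values it looks roughly like this (a sketch from memory using the placeholder names from the cryoSPARC cluster documentation, not a verbatim copy of our file):

#!/usr/bin/env bash
## Available variables:
## {{ run_cmd }}            - the complete command string to run the job
## {{ num_cpu }}            - the number of CPUs needed
## {{ num_gpu }}            - the number of GPUs needed
## {{ ram_gb }}             - the amount of RAM needed in GB
## {{ job_dir_abs }}        - absolute path to the job directory
## {{ project_uid }} / {{ job_uid }} - uid of the project / job
#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH -p gpu
#SBATCH --mem={{ (ram_gb*1000)|int }}MB
#SBATCH -o {{ job_dir_abs }}/job.out
#SBATCH -e {{ job_dir_abs }}/job.err
#SBATCH --nodes=1
#SBATCH --exclusive
# ... same GPU-selection loop as in the rendered script above ...
echo "{{ run_cmd }}"
{{ run_cmd }}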
-------- Submission command:
sbatch /gpfs/data/marino_j/rhonofab/J100/queue_sub_script.sh
-------- Cluster Job ID:
770971
-------- Queued at 2018-09-03 11:32:55.399259
-------- Job status at 2018-09-03 11:32:55.420849
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
770971 gpu cryospar marino_j PD 0:00 1 (None)
Project P4 Job J100 Started
Master running v2.0.27, worker running v2.0.27
Running on lane merlin5
Resources allocated:
Worker: merlin5
CPU : [0, 1]
GPU : [0]
RAM : [0, 1]
SSD : True
Importing job module for job type class_2D…
Job ready to run
Using random seed of 1019575907
Loading a ParticleStack with 161982 items…
SSD cache : cache successfully synced in_use
SSD cache : cache successfully synced, found 0.00MB of files on SSD.
SSD cache : cache successfully requested to check 449 files.
SSD cache : cache requires 40495.94MB more on the SSD for files to be downloaded.
SSD cache : cache has enough available space.
Transferring J95/localmotioncorrected/FoilHole_144661_Data_128122_128123_20180201_2104_Fractions_particles_local_aligned.mrc (32MB)
Complete : 40464MB
Total : 40496MB
Speed : 197.29MB/s
SSD cache : complete, all requested files are available on SSD.
Done.
Windowing particles
Done.
Using 50 classes.
Computing 2D class averages:
Volume Size: 128 (voxel size 2.22A)
Zeropadded Volume Size: 256
Data Size: 256 (pixel size 1.11A)
Using Resolution: 6.00A (47.0 radius)
Windowing only corners of 2D classes at each iteration.
Using random seed for initialization of 825386084
Done in 1.117s.
Start of Iteration 0
Traceback (most recent call last):
File "cryosparc2_compute/jobs/runcommon.py", line 705, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 92, in cryosparc2_compute.engine.cuda_core.GPUThread.run
File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 93, in cryosparc2_compute.engine.cuda_core.GPUThread.run
File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 832, in cryosparc2_compute.engine.engine.process.work
File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 226, in cryosparc2_compute.engine.engine.EngineThread.compute_resid_pow
File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 233, in cryosparc2_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 101, in cryosparc2_compute.engine.cuda_core.allocate_cpu
MemoryError: cuMemHostAlloc failed: out of memory
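
Reading the traceback, the failure is in allocate_cpu, and as far as I understand cuMemHostAlloc allocates pinned host (CPU) RAM rather than GPU memory. In case it is useful, I can collect something like the following from the compute node after the next failed run (just a sketch of the checks I would run; 770971 is the job ID from above):

# Host memory and limits on the node (guessing that either the SLURM
# memory request or a locked-memory limit could matter here):
ulimit -l            # max locked memory
ulimit -v            # max virtual memory
free -g              # free host RAM in GB

# What SLURM recorded for the job:
sacct -j 770971 --format=JobID,ReqMem,MaxRSS,MaxVMSize,State

# GPU memory on the node, for completeness:
nvidia-smi --query-gpu=index,memory.total,memory.used --format=csv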