CryoSPARC Live session stuck at queued

Hello,

We started a Live session after the raw data collection had already finished, but the movies somehow don't start processing. It's over 21K movies; could it be a problem with that many images?
It took a while before they appeared as queued, and then nothing happens anymore. The workers just report no new exposures. Pausing and restarting the session didn't help either.

Edit 1: The same movies can be processed normally without any problem, so I would say it's not data corruption. Creating a new Live session also leads to the same result.

Here are some errors I found in the command_rtp log.

cheers
Kilian

2023-07-10 10:26:17,757 RTP.LIVE_SESSION     start_session        INFO     | === START === P45 S21
2023-07-10 10:38:11,289 RTP.FILE_ENGINE      find_new_files       INFO     | Found 21150 new files in 713.02s for XXXXX/eer in P45 S21
2023-07-10 10:38:12,336 RTP.PHASE1           handle_rtp_worker    INFO     | P45 S21 Phase One Worker J1811 -- Current Status: running, Previous Status: launched
2023-07-10 10:38:12,343 RTP.PHASE1           handle_rtp_worker    INFO     | P45 S21 Phase One Worker J1812 -- Current Status: running, Previous Status: launched
2023-07-10 10:38:12,349 RTP.PHASE1           handle_rtp_worker    INFO     | P45 S21 Phase One Worker J1813 -- Current Status: running, Previous Status: launched
2023-07-10 10:38:12,370 RTP.PHASE1           handle_rtp_worker    INFO     | P45 S21 Phase One Worker J1814 -- Current Status: running, Previous Status: queued
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    | File Engine Scheduler Failed
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    | Traceback (most recent call last):
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 109, in background_worker
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |     run_file_engine(file_engine['project_uid'], file_engine['session_uid'])
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/commandcommon.py", line 191, in wrapper
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |     return func(*args, **kwargs)
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 637, in run_file_engine
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |     find_new_files( project_uid = project_uid,
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 668, in find_new_files
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |     entities = _list_directory(
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 737, in _list_directory
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |     last_modified_time = os.path.getmtime(filepath_abs)
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/genericpath.py", line 55, in getmtime
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    |     return os.stat(filename).st_mtime
2023-07-10 11:01:04,610 RTP.BG_WORKER        background_worker    ERROR    | OSError: [Errno 24] Too many open files: 'XXXXX/eer/FoilHole_5866707_Data_5775604_5775606_20230707_233952_EER.eer'
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    | File Engine Scheduler Failed
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    | Traceback (most recent call last):
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 109, in background_worker
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |     run_file_engine(file_engine['project_uid'], file_engine['session_uid'])
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/commandcommon.py", line 191, in wrapper
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |     return func(*args, **kwargs)
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 637, in run_file_engine
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |     find_new_files( project_uid = project_uid,
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 668, in find_new_files
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |     entities = _list_directory(
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 737, in _list_directory
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |     last_modified_time = os.path.getmtime(filepath_abs)
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/genericpath.py", line 55, in getmtime
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    |     return os.stat(filename).st_mtime
2023-07-10 11:33:22,478 RTP.BG_WORKER        background_worker    ERROR    | OSError: [Errno 24] Too many open files: 'XXXXX/eer/FoilHole_5853792_Data_5778044_5778046_20230707_070847_EER.eer'
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    | File Engine Scheduler Failed
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    | Traceback (most recent call last):
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 109, in background_worker
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |     run_file_engine(file_engine['project_uid'], file_engine['session_uid'])
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/commandcommon.py", line 191, in wrapper
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |     return func(*args, **kwargs)
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 637, in run_file_engine
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |     find_new_files( project_uid = project_uid,
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 711, in find_new_files
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |     update_session(project_uid, session_uid, {'file_engine_last_run' : time.time()})
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/commandcommon.py", line 191, in wrapper
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |     return func(*args, **kwargs)
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 589, in update_session
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |     cli.dump_workspaces(project_uid)
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_tools/cryosparc/command.py", line 112, in func
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |     assert "error" not in res, f'Error for "{key}" with params {params}:\n' + format_server_error(res["error"])
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    | AssertionError: Error for "dump_workspaces" with params ('P45',):
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    | ServerError: [Errno 9] Bad file descriptor
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    | Traceback (most recent call last):
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/commandcommon.py", line 200, in wrapper
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |     res = func(*args, **kwargs)
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/commandcommon.py", line 266, in wrapper
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |     return func(*args, **kwargs)
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_core/__init__.py", line 4765, in dump_workspaces
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |     json.dump(all_workspaces, openfile, indent=4, default=json_util.default)
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    | OSError: [Errno 9] Bad file descriptor
2023-07-10 11:33:30,308 RTP.BG_WORKER        background_worker    ERROR    |

The last patch for CryoSPARC apparently had a fix for this. Check that you're running CS 4.2.1+230621; if not, update to 4.2.1 and then apply the patch with cryosparcm patch.
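
For reference, a quick way to confirm the version and patch level (a minimal sketch, assuming a standard cryosparcm installation on the master node):

# Show the running version; an applied patch appears as a suffix (e.g. +230621).
cryosparcm status | grep -i version

# Download and apply the latest patch for the installed version.
cryosparcm patch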

Too many files seems strange. I’ve had projects with 30,000 exposures (although they are in different “GridSquare” directories, so maybe that is why it hasn’t happened on our systems?).

It's all in one folder, since the files are just links to the actual raw data.
But yes, ls on that folder takes a while, though Python functions should be much faster and shouldn't really have trouble with it.
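
As a rough cross-check (a sketch only; /XXXXX/eer stands in for the real path), the listing itself can be timed directly, once bare and once with a per-file stat, since the file engine also reads each exposure's mtime:

# Bare, unsorted directory listing.
time ls -f /XXXXX/eer | wc -l

# Listing that additionally stats each file for its mtime (GNU find),
# which is closer to what the file engine does per exposure.
time find /XXXXX/eer -maxdepth 1 -name '*.eer' -printf '%T@ %p\n' | wc -l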

The master runs as an LXD VM on a server that mounts the storage (BeeGFS) and then passes that mount through to the VM. But so far I have never had problems with that setup.

I updated to 4.2.1+230621.
I suppose that's even a patch that isn't listed under the release notes yet. :D

It works now, although finding the images takes really long.
Also, I had to restart CryoSPARC after applying the patch; it didn't work without that. I did everything from the instructions, including reloading command etc.
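
In case it helps others, the sequence that worked here was roughly (a sketch; cryosparcm subcommands as named in the docs):

cryosparcm patch      # apply the patch
cryosparcm restart    # a full restart was needed; reloading alone was not enough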

Edit: OK, it seems to work, at least it's still running, but I still have some "too many open files" errors in the log, though now for particle blob files.

2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    | JSONRPC ERROR at get_individual_data_picks
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    | Traceback (most recent call last):
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/commandcommon.py", line 200, in wrapper
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |     res = func(*args, **kwargs)
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_command/command_rtp/__init__.py", line 3460, in get_individual_data_picks
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |     picks = rtp_com.import_particles_database_to_dataset(session_doc, exposure_doc, picker_type, proj_dir_abs, picks_only=True)
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 264, in import_particles_database_to_dataset
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |     dset = dataset.Dataset.load(inpath_abs)
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_tools/cryosparc/dataset.py", line 493, in load
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |     with bopen(file, "rb") as f:
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |   File "/home/cryosparcuser/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/contextlib.py", line 113, in __enter__
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |     return next(self.gen)
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |   File "/home/cryosparcuser/cryosparc_master/cryosparc_tools/cryosparc/util.py", line 351, in bopen
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    |     with open(file, mode) as f:
2023-07-11 11:57:20,657 COMMAND.COMMON       wrapper              ERROR    | OSError: [Errno 24] Too many open files: 'XXXXXXX/SX/pick/FoilHole_9906204_Data_9901108_9901110_20230711_022516_EER_patch_aligned_doseweighted_particle_blob.cs'

@KiSchnelle Can you please post the outputs of the following commands under the Linux account that owns the CryoSPARC processes:

cat /etc/security/limits.conf
ulimit -Sn
ulimit -Hn
cat /proc/$(ps -eopid,cmd | grep command_rtp | grep -v grep | tail -1 | awk '{print $1}')/limits

Curious why Python should be faster than ls… it's still filesystem (and hardware) dependent. The fastest parser in the world won't be quicker if the hardware and filesystem are the bottleneck.
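
One caveat when benchmarking this (a sketch; requires root): repeat listings are served from cached dentries and inodes, so a fair cold-cache timing needs the caches dropped first:

# Flush the page cache, dentries and inodes so the next listing hits the filesystem cold.
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches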

@wtempel

cryosparcuser@cryosparc-master:~$ cat /etc/security/limits.conf
# /etc/security/limits.conf
#
#Each line describes a limit for a user in the form:
#
#<domain>        <type>  <item>  <value>
#
#Where:
#<domain> can be:
#        - a user name
#        - a group name, with @group syntax
#        - the wildcard *, for default entry
#        - the wildcard %, can be also used with %group syntax,
#                 for maxlogin limit
#        - NOTE: group and wildcard limits are not applied to root.
#          To apply a limit to the root user, <domain> must be
#          the literal username root.
#
#<type> can have the two values:
#        - "soft" for enforcing the soft limits
#        - "hard" for enforcing hard limits
#
#<item> can be one of the following:
#        - core - limits the core file size (KB)
#        - data - max data size (KB)
#        - fsize - maximum filesize (KB)
#        - memlock - max locked-in-memory address space (KB)
#        - nofile - max number of open file descriptors
#        - rss - max resident set size (KB)
#        - stack - max stack size (KB)
#        - cpu - max CPU time (MIN)
#        - nproc - max number of processes
#        - as - address space limit (KB)
#        - maxlogins - max number of logins for this user
#        - maxsyslogins - max number of logins on the system
#        - priority - the priority to run user process with
#        - locks - max number of file locks the user can hold
#        - sigpending - max number of pending signals
#        - msgqueue - max memory used by POSIX message queues (bytes)
#        - nice - max nice priority allowed to raise to values: [-20, 19]
#        - rtprio - max realtime priority
#        - chroot - change root to directory (Debian-specific)
#
#<domain>      <type>  <item>         <value>
#

#*               soft    core            0
#root            hard    core            100000
#*               hard    rss             10000
#@student        hard    nproc           20
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#ftp             -       chroot          /ftp
#@student        -       maxlogins       4

# End of file

cryosparcuser@cryosparc-master:~$ ulimit -Sn
1024

cryosparcuser@cryosparc-master:~$ ulimit -Hn
1048576

cryosparcuser@cryosparc-master:~$ cat /proc/$(ps -eopid,cmd | grep command_rtp | grep -v grep | tail -1 | awk '{print $1}')/limits
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             127822               127822               processes 
Max open files            1024                 1048576              files     
Max locked memory         4203442176           4203442176           bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       127822               127822               signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us

@rbs_sci
True, I was misled by what appears to be the terminal rendering time into thinking the listing was the really slow step. But I ran the code below on the cryosparc_master instance on exactly that folder with the 20K+ movies, so the listing shouldn't be the limiting factor then. :)

from pathlib import Path
import os
import subprocess
import time

test_folder = "/XXXXX/eer"


def with_pathlib(test_folder):
    # List *.eer files via pathlib's glob.
    for file in Path(test_folder).glob("*.eer"):
        print(file)


def with_ls(test_folder):
    # List files by shelling out to ls and capturing its output.
    result = subprocess.run(['ls', test_folder], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    for line in lines:
        print(test_folder + "/" + line)


def with_os(test_folder):
    # List files via os.listdir.
    for file in os.listdir(test_folder):
        print(os.path.join(test_folder, file))

result_dict = {}
for function in [with_pathlib, with_ls, with_os]:
    # Measure wall-clock and CPU time around each listing function.
    st = time.time()
    cpu_st = time.process_time()
    function(test_folder)
    et = time.time()
    cpu_et = time.process_time()

    elapsed_time = et - st
    elapsed_cpu_time = cpu_et - cpu_st
    result_dict.update({function.__name__: [elapsed_time, elapsed_cpu_time]})


for func, etime in result_dict.items():
    print(f"The wall time of fucntion: {func} is {etime[0]} seconds and cpu time is {etime[1]} seconds.")

The wall time of function: with_pathlib is 1.0268385410308838 seconds and cpu time is 0.050097356 seconds.
The wall time of function: with_ls is 1.0077898502349854 seconds and cpu time is 0.282393117 seconds.
The wall time of function: with_os is 0.8679301738739014 seconds and cpu time is 0.42075248 seconds.

@KiSchnelle Thanks for posting these details. The soft limit of 1024 open files (against a hard limit of 1048576) lines up with the Errno 24 errors.
Can you try including in the ~/.bashrc file of the Linux account that runs CryoSPARC processes
ulimit -n 8192
and see whether this setting resolves the issue?
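
After adding that line, the new limit can be verified from a fresh login shell, and CryoSPARC restarted from that shell so its processes inherit the raised limit (a sketch, reusing the PID lookup from earlier in the thread):

# The soft limit in the new shell should now report 8192.
ulimit -Sn

# Restart CryoSPARC from this shell so its processes inherit the limit.
cryosparcm restart

# Confirm the running command_rtp process picked it up.
cat /proc/$(ps -eopid,cmd | grep command_rtp | grep -v grep | tail -1 | awk '{print $1}')/limits | grep 'open files'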