CryoSPARC incorrectly thinks it cannot write to directory

Hello,

When CryoSPARC checks whether the user running the application can write to a directory on an NFSv4-mounted filesystem with ACLs, Python returns the incorrect answer.

This is occurring on multiple master nodes after upgrading the hosts from RHEL 7 to RHEL 8.

Notes

  • This is being done as the user that runs the CryoSPARC application.
  • The user has read and write permissions to the directory (I’ll demonstrate this).
  • I have obfuscated some paths, usernames, UIDs, GIDs, and hostnames.

CryoSPARC instance information

  • Type: master with cluster
  • Software version from cryosparcm status
    $ cryosparcm status | grep version
    Current cryoSPARC version: v4.4.1+240110
    
  • Output of uname -a && free -g on master node
    $ uname -a && free -g
    Linux hostname.fqdn 4.18.0-513.18.1.el8_9.x86_64 #1 SMP Thu Feb 1 03:51:05 EST 2024 x86_64 x86_64 x86_64 GNU/Linux
                  total        used        free      shared  buff/cache   available
    Mem:            376           3         250           0         121         370
    Swap:             3           0           3
    
  • Operating System:
    $ cat /etc/system-release
    Red Hat Enterprise Linux release 8.9 (Ootpa)
    

Problem and troubleshooting steps

This sequence was done to gather information about the issue, but the problem occurs at more than just this step.
When trying to create a new project called ‘test’ via the web interface within the directory /nfs/mount/data/,
I initially see the message “Missing write permissions for project container directory /nfs/mount/data”.
So I modify the file /usr/local/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py as shown in the
output of diff -c below:

*** __init__.py	2024-02-29 17:57:23.646083466 -0500
--- __init__.py.org	2024-02-29 17:50:23.087832163 -0500
***************
*** 3794,3802 ****
          valid = False
          message = f"Missing read permissions for project container directory {expanded_project_container_dir}"
      elif not os.access(expanded_project_container_dir, os.W_OK):
!         #valid = False
!         #message = f"Missing write permissions for project container directory {expanded_project_container_dir}"
!         pass
  
      return {
          "slug" : title_slug,
--- 3794,3801 ----
          valid = False
          message = f"Missing read permissions for project container directory {expanded_project_container_dir}"
      elif not os.access(expanded_project_container_dir, os.W_OK):
!         valid = False
!         message = f"Missing write permissions for project container directory {expanded_project_container_dir}"
  
      return {
          "slug" : title_slug,

I then restart the CryoSPARC application and try to create a new project called ‘test’ in the directory /nfs/mount/data/.
I receive this error in the browser:
“Unable to create project: ServerError Error: new project directory not writable /nfs/mount/data/CS-test”
In the command_core log is the following:

2024-02-29 16:32:12,662 wrapper              ERROR    | JSONRPC ERROR at create_empty_project
2024-02-29 16:32:12,662 wrapper              ERROR    | Traceback (most recent call last):
2024-02-29 16:32:12,662 wrapper              ERROR    |   File "/usr/local/cryosparc/cryosparc_master/cryosparc_command/commandcommon.py", line 195, in wrapper
2024-02-29 16:32:12,662 wrapper              ERROR    |     res = func(*args, **kwargs)
2024-02-29 16:32:12,662 wrapper              ERROR    |   File "/usr/local/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 3852, in create_empty_project
2024-02-29 16:32:12,662 wrapper              ERROR    |     project_dir = create_and_return_new_project_dir(project_container_dir, project_dir_name)
2024-02-29 16:32:12,662 wrapper              ERROR    |   File "/usr/local/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 3744, in create_and_return_new_project_dir
2024-02-29 16:32:12,662 wrapper              ERROR    |     assert False, "Error: new project directory not writable %s" % expanded_new_project_dir
2024-02-29 16:32:12,662 wrapper              ERROR    | AssertionError: Error: new project directory not writable /nfs/mount/data/CS-test

But the directory is created:
$ stat /nfs/mount/data/CS-test
  File: /nfs/mount/data/CS-test
  Size: 0         	Blocks: 64         IO Block: 1048576 directory
Device: 40h/64d	Inode: 7301205695  Links: 2
Access: (2770/drwxrws---)  Uid: (#######/cryosparc_user)   Gid: (#######/some_group)
Context: system_u:object_r:nfs_t:s0
Access: 2024-02-29 16:32:12.660183000 -0500
Modify: 2024-02-29 16:32:12.660183000 -0500
Change: 2024-02-29 16:32:12.660183000 -0500
 Birth: -

And the directory is writable:

$ touch /nfs/mount/data/CS-test/write-test
$ stat /nfs/mount/data/CS-test/write-test
  File: /nfs/mount/data/CS-test/write-test
  Size: 0         	Blocks: 48         IO Block: 1048576 regular empty file
Device: 40h/64d	Inode: 7290082035  Links: 1
Access: (0770/-rwxrwx---)  Uid: (#######/cryosparc_user)   Gid: (#######/some_group)
Context: system_u:object_r:nfs_t:s0
Access: 2024-02-29 16:38:37.428924000 -0500
Modify: 2024-02-29 16:38:37.428924000 -0500
Change: 2024-02-29 16:38:37.429056000 -0500
 Birth: -
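
The same discrepancy can be shown directly from Python. This is my own small sketch (not part of CryoSPARC), and the path below is just our affected mount used for illustration:

```python
import os
import tempfile

def probe(path):
    """Compare os.access()'s answer with an actual write attempt.
    On local filesystems the two agree; on our NFSv4 mount with
    ACLs, os.access() reports False even though the write succeeds."""
    access_ok = os.access(path, os.W_OK)
    try:
        # Actually try to create (and then remove) a file in the directory
        fd, name = tempfile.mkstemp(dir=path)
        os.close(fd)
        os.unlink(name)
        write_ok = True
    except OSError:
        write_ok = False
    return access_ok, write_ok
```

On the affected mount, probe("/nfs/mount/data/CS-test") comes back as (False, True); on a local directory both values are True.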

I make some further changes to the file /usr/local/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py
as shown in the output of diff -c below:

$ diff -c __init__.py __init__.py.org 
*** __init__.py	2024-02-29 18:25:12.083844409 -0500
--- __init__.py.org	2024-02-29 17:50:23.087832163 -0500
***************
*** 3714,3721 ****
          assert False, f"Error: could not create project container directory {full_project_container_dir} due to {exc}"
      # now we know path exists and is directory, now check writable:
      if not os.access(full_project_container_dir, os.W_OK):
!         #assert False, "Error: project container directory not writable."
!         pass
      return full_project_container_dir
  
  def create_and_return_new_project_dir(project_container_dir, project_name):
--- 3714,3720 ----
          assert False, f"Error: could not create project container directory {full_project_container_dir} due to {exc}"
      # now we know path exists and is directory, now check writable:
      if not os.access(full_project_container_dir, os.W_OK):
!         assert False, "Error: project container directory not writable."
      return full_project_container_dir
  
  def create_and_return_new_project_dir(project_container_dir, project_name):
***************
*** 3741,3748 ****
          assert False, f"Error: could not create project directory {expanded_new_project_dir} due to {exc}"
      # now we know path exists and is directory, now check writable:
      if not os.access(expanded_new_project_dir, os.W_OK):
!         #assert False, "Error: new project directory not writable %s" % expanded_new_project_dir
!         pass
      return new_project_dir # this one has shell vars
  
  def check_project_dir(project_dir: str):
--- 3740,3746 ----
          assert False, f"Error: could not create project directory {expanded_new_project_dir} due to {exc}"
      # now we know path exists and is directory, now check writable:
      if not os.access(expanded_new_project_dir, os.W_OK):
!         assert False, "Error: new project directory not writable %s" % expanded_new_project_dir
      return new_project_dir # this one has shell vars
  
  def check_project_dir(project_dir: str):
***************
*** 3751,3757 ****
      full_project_dir = os.path.expandvars(project_dir)
      assert os.path.isdir(full_project_dir), "Project directory does not exist"
      assert os.access(full_project_dir, os.R_OK), "Project directory is not readable."
!     #assert os.access(full_project_dir, os.W_OK), "Project directory is not writable."
      return project_dir
  
  @extern
--- 3749,3755 ----
      full_project_dir = os.path.expandvars(project_dir)
      assert os.path.isdir(full_project_dir), "Project directory does not exist"
      assert os.access(full_project_dir, os.R_OK), "Project directory is not readable."
!     assert os.access(full_project_dir, os.W_OK), "Project directory is not writable."
      return project_dir
  
  @extern
***************
*** 3796,3804 ****
          valid = False
          message = f"Missing read permissions for project container directory {expanded_project_container_dir}"
      elif not os.access(expanded_project_container_dir, os.W_OK):
!         #valid = False
!         #message = f"Missing write permissions for project container directory {expanded_project_container_dir}"
!         pass
  
      return {
          "slug" : title_slug,
--- 3794,3801 ----
          valid = False
          message = f"Missing read permissions for project container directory {expanded_project_container_dir}"
      elif not os.access(expanded_project_container_dir, os.W_OK):
!         valid = False
!         message = f"Missing write permissions for project container directory {expanded_project_container_dir}"
  
      return {
          "slug" : title_slug,
***************
*** 4433,4441 ****
          com.error_notification(mongo.db, notification_id, "Unable to import project: directory %s does not exist"%(abs_path_export_project_dir))
          assert False, "[IMPORT_PROJECT] : Directory %s does not exist"%(abs_path_export_project_dir)
      if not os.access(abs_path_export_project_dir, os.W_OK):
!         #com.error_notification(mongo.db, notification_id, "Unable to import project: directory %s not writable."%(abs_path_export_project_dir))
!         #assert False, "[IMPORT_PROJECT] Error: project container directory not writable."
!         pass
      
      all_projects = list(mongo.db['projects'].find({}, {'uid' : 1, 'project_dir' : 1, 'deleted' : 1, 'detached' : 1}))
  
--- 4430,4437 ----
          com.error_notification(mongo.db, notification_id, "Unable to import project: directory %s does not exist"%(abs_path_export_project_dir))
          assert False, "[IMPORT_PROJECT] : Directory %s does not exist"%(abs_path_export_project_dir)
      if not os.access(abs_path_export_project_dir, os.W_OK):
!         com.error_notification(mongo.db, notification_id, "Unable to import project: directory %s not writable."%(abs_path_export_project_dir))
!         assert False, "[IMPORT_PROJECT] Error: project container directory not writable."
      
      all_projects = list(mongo.db['projects'].find({}, {'uid' : 1, 'project_dir' : 1, 'deleted' : 1, 'detached' : 1}))
  
***************
*** 5976,5983 ****
          assert False, f"Error: could not create job directory {full_new_job_dir} due to {exc}"
      # now we know path exists and is directory, now check writable:
      if not os.access(full_new_job_dir, os.W_OK):
!         #assert False, "Error: new job directory not writable %s" % full_new_job_dir
!         pass
      return job_dir_rel
  
  @extern
--- 5972,5978 ----
          assert False, f"Error: could not create job directory {full_new_job_dir} due to {exc}"
      # now we know path exists and is directory, now check writable:
      if not os.access(full_new_job_dir, os.W_OK):
!         assert False, "Error: new job directory not writable %s" % full_new_job_dir
      return job_dir_rel
  
  @extern

It now works as expected.

Conclusion

Basically, every place where CryoSPARC performs an os.access(path, os.W_OK) check was causing issues.
The underlying problem seems to be that CryoSPARC relies on Python’s os.access check without accounting for setups that use NFS with ACLs to grant access.
From Python’s os.access documentation:
“Note: I/O operations may fail even when access() indicates that they would succeed, particularly for operations on
network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model.”

Would it be possible to not do the os.access checks and instead use the EAFP style to avoid these odd issues?
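
As a sketch of what I mean (my own code, not CryoSPARC’s), an EAFP-style writability check would attempt the write instead of asking os.access first:

```python
import os
import tempfile

def dir_is_writable(path):
    """EAFP-style check: try to create a temporary file in `path`
    instead of trusting os.access(), whose answer can be wrong on
    network filesystems with richer-than-POSIX permission models."""
    try:
        fd, probe = tempfile.mkstemp(dir=path)
    except OSError:
        return False
    os.close(fd)
    os.unlink(probe)
    return True
```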

I tried to include everything but if something is unclear or confusing please let me know.

If you run

$ chmod -R 777 /nfs/mount/data/CS-test/write-test

can CryoSPARC work, e.g. run validation?

I have a similar setup with a ZFS file system and NFS mounts (much faster than SMB) and had a similar experience.

Hello @Mark-A-Nakasone,

Thank you for taking the time to respond! I’m glad I’m not alone in this issue.

Strangely, when I execute chmod -R 777 /nfs/mount/data/CS-test, Python and consequently CryoSPARC no longer consider the directory to be unwritable by cryosparc_user.

Although this offers a temporary fix and serves as a helpful diagnostic test, it’s not a viable long-term solution. CryoSPARC is installed for use by an entire lab, and multiple users cannot be expected to run chmod every time CryoSPARC creates a new directory.

What was your long-term solution? Were you using ACLs as well?


Hi @clil16 - It is hard managing file system access on some HPCs.

I am running OpenBSD data servers and use ACLs on the server side to work with a GPU node and several workstations.

A lot of people may have GPFS/IBM Spectrum Scale with RHEL, so a slightly different approach.

I should have said: chmod 777 is absurd to run and not secure, but it is OK for a single test directory. It will tell you whether you have permission issues or whether it is just something with CryoSPARC.

What is your system like? How many instances of CryoSPARC? You have NFS and RHEL, fine. You may need to configure another login node, or have only the “cryosparcuser” account have access. Could this also be changed in how NFS is mounted?

Hi @Mark-A-Nakasone ,

Sorry for the delayed response, I’ve had a lot on my plate recently.

Yeah, I never recommend people run that chmod command, but for testing it is appropriate.
The issue isn’t permissions, since the CryoSPARC user can write to the storage just fine; rather, Python’s os.access(path, os.W_OK) check returns an incorrect answer.

Our Setup

Linux machines are running RHEL 8.

In our setup we have 4 CryoSPARC master nodes (one for each lab) that submit to a SLURM cluster, and each CryoSPARC instance is run by a different CryoSPARC service account that is a member of the appropriate lab group. The cluster GPU nodes and the master nodes mount the same NFS storage.

The storage is provided by a Dell EMC Isilon cluster running OneFS version 9.4.0.17. The shares are multi-protocol, so the same storage space can be accessed via NFS or SAMBA/CIFS. Because of this, the Isilon combines the SAMBA/CIFS and NFS permissions into a unified permissions model and presents the appropriate permissions, along with ACLs, to each client.

When I have a chance I plan on looking at the Python source code to see if I can figure anything else out.


RHEL 8 should be fine. I know 9 has been out a while, but I know people running CryoSPARC fine on 7.

Does each instance of Cryosparc have its own user (UID) associated with it from the different nodes ?
I am trying to understand if this is from the ACLs on your setup or something specific to the several cryosparc instances that all have their own container.

Yes, NFS is the way to go and much faster than SMB.

If the permissions are on the Dell EMC side, then likely something in Isilon / PowerScale OneFS is the culprit. I am sure your ACLs are correct for users, but are they set up for each “cryosparcuser” for each of your instances?

I am hosting many of my projects on a few PB of iXsystems TrueNAS file server with OpenBSD (TrueNAS Core v13), and it took some changes to get the permissions correct.

I’ll elaborate a bit on our ecosystem.
We use Active Directory (AD) for managing users and groups and each user gets a real UID in AD and each group gets a real GID as well. All the Linux machines are bound to AD using SSSD and get user/group data from there so we don’t have to maintain local passwd or shadow files.

Yes, each instance of cryoSPARC is run by a unique user with its own UID. Each of the machines that run cryoSPARC are able to submit to our SLURM cluster and each of the users that run the cryoSPARC instances can submit to the cluster as well.

These cryoSPARC instances were running on the same machines with the same permissions/ACLs but using RHEL 7. With RHEL 7 reaching end of maintenance on June 30, 2024, we upgraded the OS to RHEL 8 and started encountering these issues.

After some testing, I believe the issue is related to the changes in glibc. RHEL 7 used glibc version 2.17 and RHEL 8 uses glibc version 2.28. I’m not sure if I’ll have time to really dig in and figure out what change caused it.
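
To help narrow it down, here is a quick Linux-specific probe of my own (not CryoSPARC code) that calls glibc’s access(2) directly via ctypes. If its answer matches Python’s os.access, then Python is just faithfully reporting what glibc and the kernel’s NFS client return, and the regression is below Python:

```python
import ctypes
import os

# Load glibc directly (Linux-specific; "libc.so.6" is assumed here).
_libc = ctypes.CDLL("libc.so.6", use_errno=True)

def libc_w_ok(path):
    """True if glibc's access(path, W_OK) reports writable.
    access() returns 0 on success and -1 on failure."""
    return _libc.access(os.fsencode(path), os.W_OK) == 0
```

Comparing libc_w_ok("/nfs/mount/data/CS-test") with os.access("/nfs/mount/data/CS-test", os.W_OK) on both RHEL 7 and RHEL 8 hosts would show whether the two glibc versions actually differ here.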

I really appreciate your willingness to help! Maybe one day I’ll be able to return the favor.


Hi @clil16, I have heard about issues with different versions of RHEL, and your hunch about glibc seems valid.

I am not sure I would be much help in this case, sorry about that.

Is each instance of CryoSPARC in the user group that is using it?

glibc-compat could be worth checking out, but it is not a great solution.