"Error: Could not list directory" even after setting rights correctly

Hi,

This is the second time I have experienced this. I’m a systems administrator and I manage CryoSPARC instances for our users.

When they try to access a directory for which they don’t have rights, they get an access error. So far, so good. But when I set their access rights so they can get to it, the message “Error: Could not list directory” still comes up.

I tried restarting CryoSPARC, but it didn’t change anything. Then, a few days later (3-4), access to the directory became possible.

This strongly suggests that CryoSPARC caches some information about access rights and doesn’t check it again until the cache expires. Here, the files are served over NFS. Needless to say, I tried to access the files as the same user that CryoSPARC runs as, and I was able to access them directly from the same machine. The problem only shows up when listing the directory from within CryoSPARC.

Is there some way to get around this problem?

Thanks,
J.C.H

Please can you describe the context of the attempted access (while running a job? while performing a specific action in the web app?) and where (specific log files or GUI elements) the error messages appear.

What method did you use to modify access rights?
Do the affected files have extended attributes, for example related to selinux?

Hi,

The error appeared while browsing when creating a new project :

The rights were set by adding the corresponding user to the owner group in our Active Directory. I then tested by su’ing to the user that CryoSPARC runs as, and was able to ls the directory.

Thank you for your interest,
J.C.H

As the Linux user that runs the CryoSPARC webapp, please can you run these commands on the CryoSPARC master computer and post their outputs:

whoami
ls -ald /
ls -ald /shared
ls -ald /shared/mendel
ls -ald /shared/mendel/projects
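
If it is easier to collect everything in one go, the same checks can be sketched in Python. This is only an illustration: `describe_ancestors` is a hypothetical helper name, not part of CryoSPARC, and it reports what the kernel answers for the process that runs it, so it should be run as the same Linux user as the commands above.

```python
import os
import stat
from pathlib import Path


def describe_ancestors(target):
    """Return one record per path component, from the root down to target.

    Hypothetical helper for illustration; roughly the Python equivalent of
    running `ls -ald` on each parent directory in turn.
    """
    # abspath() normalises the path without resolving symlinks
    target = Path(os.path.abspath(target))
    records = []
    # Path.parents runs from the nearest parent up to the root, so reverse it
    for component in [*reversed(target.parents), target]:
        record = {"path": str(component)}
        try:
            st = os.stat(component)
        except OSError as exc:
            # Stop at the first component that cannot be inspected
            record["error"] = exc.strerror
            records.append(record)
            break
        record.update({
            "mode": stat.filemode(st.st_mode),
            "uid": st.st_uid,
            "gid": st.st_gid,
            # os.access() asks the kernel, using this process's real uid/gid
            "can_traverse": os.access(component, os.X_OK),
            "can_list": os.access(component, os.R_OK | os.X_OK),
        })
        records.append(record)
    return records
```

For example, `for r in describe_ancestors("/shared/mendel/projects"): print(r)` prints one line per directory level, which makes it easier to spot the first level where traversal or listing is refused.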

Hi, here are the results:

$ whoami
cs-cs-<REDACTED>
$ ls -ald /
drwxr-xr-x 23 root root 4096 15 mai   15:22 /
$ ls -ald /shared/
drwxr-xr-x 8 root root 4096 21 févr. 17:26 /shared/
$ ls -ald /shared/mendel/
drwxr-xr-x 5 root root 1536 17 nov.   2022 /shared/mendel/
$ ls -ald /shared/mendel/projects/
drwxr-xr-x 199 root root 100864 18 juin  08:03 /shared/mendel/projects/

As a side note, when the error shows up as in the picture posted above, the “Up one directory” button stops working. The user has to edit the location bar to get back to the previous directory.

Thanks @haessigj. What about

ls -ald /shared/mendel/projects/Reverse_Gyrase
$ ls -ald /shared/mendel/projects/Reverse_Gyrase/
drwxrwxrwx 4 root nogroup 1536 17 juin  10:38 /shared/mendel/projects/Reverse_Gyrase/

As I said, asking for the listing from the shell works.

… and

stat -f /shared/mendel/projects/Reverse_Gyrase/
$ LANG=C stat -f /shared/mendel/projects/Reverse_Gyrase/
  File: "/shared/mendel/projects/Reverse_Gyrase/"
    ID: 0        Namelen: 255     Type: nfs
Block size: 1048576    Fundamental block size: 1048576
Blocks: Total: 4768372    Free: 4044007    Available: 4044007
Inodes: Total: 1220703125 Free: 1035265785

Hi, if it helps, I ran the same test against a Ceph filesystem and the results were different. I tried to read a directory without the correct rights and got the error, but I could access it successfully right after fixing the rights.
The difference there is that the rights are managed by regular Linux ACLs, whereas the NFS service uses rights from the AD (it is not possible to get a detailed view of those rights the Unix way, but they are enforced). Still, is there any differential treatment that depends on the reported filesystem type?

Thanks,
J.C.H

I did one further test on another NFS filesystem and played with permissions. I had no problems. The issue boils down to the fact that on the problematic filesystem, the rights are not listable but are enforced. Moreover, the filesystem can respond “access denied” while the rights appear to be granted on a directory.

Please update this forum topic if you are able to resolve this issue, and describe how, perhaps by adjusting access controls “at the source” (in Active Directory or on the file server).

Hi,
The issue is still unsolved as of now. The last instance of it (not the one described here) took several days before it “disappeared” and access to the directory was possible.

There is nothing more that can be done on the file server, since it is a proprietary system. Accessing the directories/files as the user that runs CryoSPARC always works. The thing is that this NFS service uses Windows ACLs, which do not map well to Linux permissions, so they are not presented at all. However, the rights are correctly enforced depending on the user that tries to access. The only slightly sketchy thing is that the basic directory rights appear to be 0777, which isn’t true, but in the end that information doesn’t prevent the real rights from being applied properly.
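
To make that mismatch visible, here is a minimal sketch that puts the advertised mode bits next to what the kernel answers and what an actual listing attempt does. `probe_directory` is a hypothetical helper name, not something from CryoSPARC, and this makes no claim about how the CryoSPARC file browser works internally.

```python
import os
import stat


def probe_directory(path):
    """Compare what the mode bits suggest with what actually happens.

    Hypothetical helper for illustration only.
    """
    result = {"path": path, "mode": None, "access_says": None,
              "listing_ok": False, "error": None}
    try:
        # Mode bits as advertised by the filesystem (e.g. drwxrwxrwx)
        result["mode"] = stat.filemode(os.stat(path).st_mode)
        # os.access() asks the kernel instead of interpreting the mode bits
        result["access_says"] = os.access(path, os.R_OK | os.X_OK)
    except OSError as exc:
        result["error"] = f"stat: {exc.strerror} (errno {exc.errno})"
        return result
    try:
        with os.scandir(path) as entries:
            # Pull one entry so the directory is really read
            next(entries, None)
        result["listing_ok"] = True
    except OSError as exc:
        result["error"] = f"scandir: {exc.strerror} (errno {exc.errno})"
    return result
```

On a directory like the one above, a result with `mode` showing `drwxrwxrwx` but `listing_ok` being `False` would confirm that the displayed rights and the enforced rights disagree for the process that ran the check.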

The issue only shows up from CryoSPARC, and the fact that the individual issues resolve themselves after some time (minutes up to several days) implies that there is some kind of caching of accessibility metadata…

Thanks,
J.C.H

Cryo Devs,

I am seeing very similar issue(s) to @haessigj’s here. I work on a High-Performance Computing cluster where CryoSPARC mounts our shared-storage array over NFS v4.1.

Is there any progress regarding a solution?

Here are the steps I’ve taken (below). Currently we’re being forced to bounce services (SSSD and the CryoSPARC application) on our side as a temporary remedy for this issue.

  1. Initial inability to list directories from CryoSPARC webUI

  2. The webUI can list group1/group2/group3 (expected, as it’s a member of these groups in AD). When I try to list group4 (which it’s also a member of in AD), I get an error:

    “Error: Could not list directory.” and the [Up one directory] button ceases to be functional (as described here, on this board).

  3. Verified AD groups

  4. Verified that the id command showed all relevant memberships

  5. Verified SELinux wasn’t interfering

  6. Stopped/Started the CryoSPARC application from master when no jobs were running

  7. Verified writing data on disk with the same account that’s unable to list the desired directories from the webUI

  8. Tried making a symlink to the desired directory inside of a known, functional directory, which did not work

  9. At this point, the commonalities among various forum reports made me question how CryoSPARC (mainly its Python web backend) caches or refreshes group memberships. My guess is a process-level caching issue within CryoSPARC, likely in the filesystem browser backend, where:

    • It uses os.listdir() or os.scandir() under the hood,
    • But the underlying C library or Python runtime doesn’t refresh getgrouplist() correctly for that process tree.
  10. This is seemingly a shortcoming in the way CryoSPARC is written (there are known issues related to CryoSPARC’s file browser not re-evaluating group memberships during runtime as well as potentially performing nonstandard functions with group resolution).

  11. We have a mechanism (more of a bandaid) that forces a refresh of the pertinent cache(s)

  12. Replicated the issue on our identical development/test server

  13. Added account-in-question to a new AD group (group5) to increase data points

  14. Bouncing/Restarting the SSSD service on the CryoSPARC master, together with the application, forced a refresh of the cache; both desired groups were then accessible from the CryoSPARC webUI
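
One way to test the group-caching hypothesis from steps 9 and 10 is to compare the supplementary groups attached to a running process with what the name service (here SSSD) reports at that moment. The sketch below is an assumption-laden illustration, not CryoSPARC code: `group_drift` is a hypothetical helper, and the comparison is only meaningful if it runs in a process started the same way as the CryoSPARC services, since a process normally inherits its groups from its parent.

```python
import os
import pwd


def group_drift(username=None):
    """Compare this process's groups with the name service's current answer.

    Hypothetical helper for illustration only.
    """
    if username is None:
        username = pwd.getpwuid(os.getuid()).pw_name
    primary_gid = pwd.getpwnam(username).pw_gid
    # Groups attached to this process when it (or an ancestor) was started
    in_process = set(os.getgroups()) | {os.getgid()}
    # Groups reported right now by NSS (SSSD, LDAP, /etc/group, ...)
    in_directory = set(os.getgrouplist(username, primary_gid))
    return {
        # Memberships the directory knows about but this process lacks
        "missing_from_process": sorted(in_directory - in_process),
        # Memberships this process still carries but the directory dropped
        "stale_in_process": sorted(in_process - in_directory),
    }
```

A non-empty `missing_from_process` list would suggest the process was started before the membership change reached this host. An empty list on both sides would point away from process-level group caching and toward the NFS client or the file server instead.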

However, our CryoSPARC instance is expected to stay up for extended periods, during which communication with the shared-storage array over NFS is essential, so restarting services/applications is not always a viable solution.

Please advise, thank you!!