Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloudpathlib does not automatically resolve paths that contain .. #494

Open
lejmr opened this issue Dec 27, 2024 · 2 comments
Open

Cloudpathlib does not automatically resolve paths that contain .. #494

lejmr opened this issue Dec 27, 2024 · 2 comments

Comments

@lejmr
Copy link

lejmr commented Dec 27, 2024

When .. are in the path, the AzureBlobPath fails to list directory with

azure.core.exceptions.ResourceNotFoundError: The specified path does not exist.

Interestingly, the metadata to identify whether a path exists, is dir, etc., works just fine.

In my opinion, this should work since pathlib.Path works this way.

from pathlib import Path
# Create directroy
Path("root/test/test2").mkdir(parents=True, exist_ok=True)
p = Path("root/test/test2")
assert (p.exists(), p.is_dir()) == (True, True)
assert list(p.iterdir()) == []
p = Path("root/test/../test/test2")
assert (p.exists(), p.is_dir()) == (True, True)
assert list(p.iterdir()) == []
@pjbull pjbull changed the title Azure client fail when path contains .. Cloudpathlib does not automatically resolve paths that contain .. Dec 27, 2024
@pjbull
Copy link
Member

pjbull commented Dec 27, 2024

Thanks for raising, since the behavior here could be surprising. This affects all providers, not just Azure.

Unfortunately, I don't think we will implement automatically resolving literal .. in paths. The reason is that for most providers, it is a valid object to have a .. in the key, e.g. on blob storage you can have a blob literally named (az://container/folder/../my_file.txt). Doing anything on our end to remove .. by resolving the path automatically will make it so that you cannot access those blobs at all. For that reason, I don't think that we will automatically resolve paths.

We have a few options.

  • Implement CloudPath.resolve, which currently just returns self and recommend calling that method to users that may have relative paths. This should be doable with something like:
import posixpath

...

class CloudPath:
    ...

    def resolve(self):
        return self._new_cloudpath(posixpath.normpath(self._path))
  • Document this behavior and recommend a workaround. For example, joining strings with relative paths works as you might expect:
In [1]: from cloudpathlib import CloudPath

In [2]: CloudPath("s3://bucket/folder1/") / "../hello.txt"
Out[2]: S3Path('s3://bucket/hello.txt')

With this in mind, I am going to close #495 since it wouldn't be the right fix for the issue you mention.

@lejmr
Copy link
Author

lejmr commented Dec 28, 2024

Thank you very much for all the information! Since Azure has quite a lot of strange things, I totally forget to validate it against what I know about S3 which is basically a key-value store hence having .. in key is fine. (I just tested it on command line).

After I posted this issue I played with the code a bit more, and I eventually created Azure/azure-sdk-for-python#38992. Funnily, even the REST API on the Azure side is not consistent ;-)

I think it is good to stress out that .. is valid part of the name, so implementing the resolve method is just a waste of time.

Wrapping all together, I think this is a bug:

In [1]: from cloudpathlib import CloudPath
In [2]: CloudPath("s3://bucket/folder1/") / "../hello.txt"
Out[2]: S3Path('s3://bucket/hello.txt')

Or documentation should mention that / implements the convenience everyone is used to from regular filesystem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants