maybe_string dealing with non-unicode strings #1329
There are other solutions, these work for me:
I think I prefer number 3; could you try it? Also, please add a unit test; it can simply clone this DeblenderVAE repo. Thanks!
I implemented the surrogate solution, and a unit test using the testrepo. However, I had to create the new branch with subprocess (I hope the git CLI is available for CI), because the surrogate policy is not applied when encoding either. I can look into it, but maybe you can tell me whether what I did so far is acceptable. Also, I just created a cloned folder next to the testrepo folder; is it cleaned up automatically, or should I clean it up manually?
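For readers unfamiliar with the surrogate approach, here is a minimal sketch of how Python's `surrogateescape` error handler behaves on decoding and encoding. It assumes the branch name bytes are `b"\xc3master"` and that "surrogate solution" refers to this handler; neither is confirmed by the thread, and this is not the code from the PR.

```python
# Assumed raw ref name: 0xC3 followed by ASCII "master" is not valid UTF-8.
raw = b"\xc3master"

# Strict decoding rejects the lone 0xC3 byte.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate (0xC3 -> U+DCC3).
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding back with the default strict handler fails ("surrogates not allowed").
try:
    name.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# The original bytes are only recovered if the same handler is used when encoding.
round_trip = name.encode("utf-8", errors="surrogateescape")
```

The point of the sketch is that the handler has to be applied on both sides: a string decoded this way cannot be passed to code that encodes it strictly.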
Just saw that black auto-formatted all the quotes; I can revert it if it's a problem.
Yes please, don't make format changes.
Everything should be cleaned up automatically; this is handled by pytest. For example when I tried your changes locally the cloned repo is at
The test fails on Windows (AppVeyor), https://ci.appveyor.com/project/jdavid/pygit2/builds/51066441/job/ueawtoexy0bqussu
Better to create the
… fails because of surrogates not allowed
The commit title is self-explanatory. Here is the pytest output:
I tried to track it down, but this goes into the C code.
Alternatively, it works with subprocess using
The remaining issue is that non-Unicode bytestrings cannot be passed when creating branches, but at least my case is solved: cloning a repository with an already existing non-Unicode branch.
Windows and macOS tests fail:
What I meant before was to add a new test repo, for example
Alternatively, what I can do is add such a repo to https://github.com/pygit2/
Just did that, see https://github.com/pygit2/test_branch_notutf
Then:
Could you change the unit test to clone https://github.com/pygit2/test_branch_notutf.git?
Thanks! The tests for macOS and Windows still fail though. I've created a branch just to try, but regardless of the decoding method the errors are the same, see https://github.com/libgit2/pygit2/actions/workflows/tests.yml and https://ci.appveyor.com/project/jdavid/pygit2/history
There may be an issue with libgit2 clone on Windows and macOS; this needs some research.
Looking at the error, it seems that a lockfile is written with the name of the ref/branch, and the name is passed as a string rather than a bytestring, which apparently makes it fail because the surrogates are not recognized. I guess the files could be written with non-UTF-8 filenames if the name were passed as bytes, but one would have to track down where this happens. I'll have a look in a few days (unless you do before).
Hi (and happy new year). Could you rebase? I've upgraded to libgit2 1.9, it would be nice to run the CI again, maybe we're lucky and the bug got fixed.
Hi, sorry for the delay... I thought I would find time before
Still not working, but I fixed the linting and had a closer look. I did not find anything specific, just figured there might be something in
Happy new year to you too :)
NB: I saw in the macOS logs that the declared ARCH is arm64, but the workflow has x86_64 in its name. Is it a typo, or am I missing something?
Good catch, I've clarified it, https://github.com/libgit2/pygit2/actions/runs/12561450033
Hi,
I am cloning and analyzing a big bunch of repositories, and therefore stumble on rare edge cases.
In this repo: https://github.com/LSSTDESC/DeblenderVAE
One of the refs turned out to be non-Unicode (a branch name, '0xc3master'), and that causes even the clone to fail.
I tracked the function being wrapped:
<function _update_tips_cb at 0x75b8ebb17420>
I found this, about refs names not being necessarily unicode:
https://stackoverflow.com/questions/69174955/what-character-encoding-is-used-in-git-symbolic-refs-especially-on-windows
There is no error if I catch the exception in .utils like this:
But of course there are encodings other than utf8 and latin1. What could be a generic solution? Does the output of maybe_string need to be a text string rather than a byte string? If not, I suggest:
which also clones without error. But it creates a case where the output is a byte string only in rare cases...
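The snippets from this description did not survive, so the following is only a sketch of what the two ideas could look like, not the original code. The helper names are hypothetical and the raw name `b"\xc3master"` is an assumption; neither helper is pygit2's actual `maybe_string`.

```python
from typing import Union


def decode_with_latin1_fallback(data: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, then fall back to Latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never fails,
        # but the text is wrong if the name used some other encoding.
        return data.decode("latin-1")


def decode_or_keep_bytes(data: bytes) -> Union[str, bytes]:
    """Hypothetical helper: return str when possible, raw bytes otherwise."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Callers now have to handle both str and bytes.
        return data


raw = b"\xc3master"  # assumed raw ref name, not valid UTF-8
as_latin1 = decode_with_latin1_fallback(raw)
as_mixed = decode_or_keep_bytes(raw)
```

The first variant always returns text but guesses the encoding; the second preserves the bytes exactly at the cost of a return type that differs only in rare cases, which is the trade-off raised above.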