Answers to "Which characters are illegal within a git remote name?" (2)

  1. 2017-01-04 12:01

    I didn't find anything in the documentation, either. So let's take a look at the source.

    When you try to add a remote with an invalid name or rename a remote to an invalid name, you'll get an error message like

    fatal: 'foo@{bar' is not a valid remote name

    So let's search the Git source for that.

    We see that Git goes about this a bit backwards: both when adding a remote and when renaming one (mv), it tests whether refs/heads/test:refs/remotes/<the remote name>/test is a valid fetch refspec, as determined by valid_fetch_refspec(<the refspec>), which in turn calls parse_refspec_internal(...).

    The latter performs many checks that will almost always pass in our case, because most of the refspec is fixed, but it also calls check_refname_format(...) on the right-hand side (i.e. the refs/remotes/<the remote name>/test part, provided the split at : succeeded).

    I guess this means that the characters and character sequences disallowed for branches and tags are also forbidden for remote short names.
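    To make that conclusion concrete, here is a minimal sketch in Python that embeds a candidate remote name in a test ref and applies the rules listed in the git check-ref-format documentation. It only approximates those documented rules and is not a port of Git's C code; the function names are made up for this illustration.

```python
# Characters documented as forbidden anywhere in a ref name:
# space, tilde, caret, colon, question mark, asterisk, open bracket, backslash.
_FORBIDDEN_CHARS = set(' ~^:?*[\\')


def is_valid_refname(refname: str) -> bool:
    """Approximate the documented check-ref-format rules (an assumption, not Git's code)."""
    if refname == "@" or "@{" in refname or ".." in refname:
        return False
    if refname.startswith("/") or refname.endswith("/") or "//" in refname:
        return False
    if refname.endswith("."):
        return False
    for ch in refname:
        # ASCII control characters (below 0x20) and DEL (0x7f) are rejected too.
        if ch in _FORBIDDEN_CHARS or ord(ch) < 0x20 or ord(ch) == 0x7F:
            return False
    for component in refname.split("/"):
        # No slash-separated component may start with a dot or end with ".lock".
        if component.startswith(".") or component.endswith(".lock"):
            return False
    return True


def is_plausible_remote_name(name: str) -> bool:
    """Mirror the trick described above: validate refs/remotes/<name>/test."""
    return is_valid_refname(f"refs/remotes/{name}/test")


print(is_plausible_remote_name("origin"))    # True
print(is_plausible_remote_name("foo@{bar"))  # False
```

    The authoritative answer for a given Git version is still whatever git remote add accepts; the sketch is only meant to show why the branch and tag restrictions carry over.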

  2. 2017-01-04 23:01

    As you and das-g have noted, it's not documented anywhere. However, remote names normally become embedded within remote-tracking branch names, so all of the constraints enforced by git check-ref-format will generally apply.

    There is something relatively minor that is missing from the check-ref-format code, though. A branch name that consists solely of letters from the set [a-f], possibly mixed with digits, sometimes becomes ambiguous! For instance, the word fade is a perfectly good English-language verb. However, it's also a valid hexadecimal number ... and all Git objects have a 160-bit SHA-1 hash, usually expressed as a 40-character hexadecimal number like e05806da9ec4aff8adfed142ab2a2b3b02e33c8c.

    There is also an "abbreviation length", set via core.abbrev, which defaults to 7. This is documented in git config, but what remains undocumented is that there is also a minimum abbreviation length that is currently compiled-in at 4 and is not configurable (it's not clear why the source uses a variable when the value is not settable). You may use an abbreviated form of a hash, as long as it unambiguously selects one particular Git object, so if master is that big ugly 40-character hash above, you may express it as just e05806d, and Git often will:

    $ git reflog
    e05806d HEAD@{0}: checkout: moving from ...

    Let's note that the word cab, meaning taxi or taxicab (or an ancient Hebrew measure), also consists solely of letters from the "hexadecimal number" space (so it could be the number 3243, expressed in hexadecimal, just as faded might mean 1027565). However, since cab is below the minimum abbreviation length of 4, it will not be treated as a hexadecimal number, so it makes a good branch name, or remote name. But faded exceeds the minimum abbreviation length.

    Should Git happen to create an object whose full 160-bit hash begins with the sequence faded, the name faded will become ambiguous!
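    The arithmetic and the length rule above can be checked with a short Python sketch. The constants restate the values given in this answer (minimum abbreviation length 4, full SHA-1 length 40); the helper name is hypothetical, and treating upper-case hex digits as hex is an assumption of the sketch.

```python
HEX_DIGITS = set("0123456789abcdef")
MINIMUM_ABBREV = 4      # compiled-in minimum mentioned above
FULL_SHA1_LENGTH = 40   # a full SHA-1 in hexadecimal


def could_be_abbreviated_hash(name: str) -> bool:
    """True if the name is all hex digits and long enough to be read as an object ID."""
    lowered = name.lower()
    return (MINIMUM_ABBREV <= len(lowered) <= FULL_SHA1_LENGTH
            and all(ch in HEX_DIGITS for ch in lowered))


print(int("cab", 16))                        # 3243
print(int("faded", 16))                      # 1027565
print(could_be_abbreviated_hash("cab"))      # False: shorter than the minimum
print(could_be_abbreviated_hash("faded"))    # True: could match an object some day
print(could_be_abbreviated_hash("master"))   # False: contains non-hex letters
```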

    Git actually starts trying to interpret object IDs at the minimum value (4), but as it happens, there are two e058 commits [1] in the Git repository for Git now:

    $ git show e058
    error: short SHA1 e058 is ambiguous.
    error: short SHA1 e058 is ambiguous.
    fatal: ambiguous argument 'e058': unknown revision or path not in the working tree.
    Use '--' to separate paths from revisions, like this:
    'git <command> [<revision>...] -- [<file>...]'
    $ git rev-list --all | grep '^e058'
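    The prefix lookup behind that error can be sketched in a few lines of Python. The list below contains only the full object IDs quoted in this answer, not the complete set in the Git repository, and the function names are hypothetical; Git's real lookup works on its object database, not on a Python list.

```python
from typing import Iterable, List

# Only the e058... IDs that appear in this answer; the real repository has more objects.
QUOTED_OBJECT_IDS = [
    "e05806da9ec4aff8adfed142ab2a2b3b02e33c8c",
    "e058d184d1c072bd3078fe17ad41f1026f093201",
    "e05827cba5488fb0c45e7055194071e1fda0df13",
    "e058ad2324a89ad5e10a80acf947253eac6c41e1",
]


def matching_objects(prefix: str, object_ids: Iterable[str]) -> List[str]:
    """Return every object ID that starts with the given abbreviation."""
    prefix = prefix.lower()
    return sorted(oid for oid in object_ids if oid.startswith(prefix))


def resolve(prefix: str, object_ids: Iterable[str]) -> str:
    """Return the single matching ID, or raise if there is none or more than one."""
    matches = matching_objects(prefix, object_ids)
    if not matches:
        raise LookupError(f"no object starts with {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"short SHA1 {prefix} is ambiguous")
    return matches[0]


print(resolve("e0580", QUOTED_OBJECT_IDS))  # the one ID beginning with e0580
try:
    resolve("e058", QUOTED_OBJECT_IDS)
except LookupError as err:
    print(err)                              # short SHA1 e058 is ambiguous
```

    One extra character is enough to make the abbreviation unique here, which is why the test further down uses e0580.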

    What this means is that a perfectly good word like fade may be a valid branch or remote name for some time (because there are no commits whose ID begins with fade...). Once there is such a commit, while the name is still a valid branch name in format, some Git commands could perhaps treat it as an object ID. We can test this by creating a branch whose name is the same as this commit-specifier e0580:

    $ git branch e0580 e0580^
    $ git show e0580
    warning: refname 'e0580' is ambiguous.
    commit af09003b2897db76cefdb08ab363ed68f2bb295b
    Merge: 58fcd54 b22d748
    $ git branch -d e0580
    Deleted branch e0580 (was af09003).

    (af09003 is the commit just before e0580, abbreviated to the core.abbrev length.) Testing shows that branch names are normally preferred here, which is good, since these names don't become ambiguous until there's at least one matching commit.

    But this brings us to the one missing check in git check-ref-format. It probably should warn if the name could become ambiguous, and it definitely should warn or fail if the name (a) could become ambiguous and (b) is exactly 40 characters long. The reason is that if the name is exactly 40 characters long and can be interpreted as a SHA-1 hash value, it is interpreted as a SHA-1 hash rather than as a reference name.

    In sha1_name.c, around line 580, Git contains a message and a bit of code that prints it out if it encounters one of these 40-character-long reference names. Curiously, there is no corresponding test in refs.c. It seems like there should be.
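    As a rough illustration of the check proposed above, the following Python sketch sorts a short name into three outcomes. This is the suggestion from this answer written out, not something git check-ref-format does; the function name and the "ok" / "warn" / "reject" labels are invented for the example.

```python
HEX_CHARS = set("0123456789abcdefABCDEF")
MINIMUM_ABBREV = 4
FULL_SHA1_LENGTH = 40


def classify_ref_name(name: str) -> str:
    """Hypothetical policy: 'ok', 'warn' (may become ambiguous) or 'reject' (reads as a full hash)."""
    hex_only = bool(name) and all(ch in HEX_CHARS for ch in name)
    if not hex_only or len(name) < MINIMUM_ABBREV:
        return "ok"          # cannot be mistaken for an object ID
    if len(name) == FULL_SHA1_LENGTH:
        return "reject"      # would be interpreted as a SHA-1 hash
    if len(name) < FULL_SHA1_LENGTH:
        return "warn"        # harmless until an object with this prefix exists
    return "ok"              # longer than any SHA-1, so not a hash


for candidate in ("cab", "fade", "master",
                  "e05806da9ec4aff8adfed142ab2a2b3b02e33c8c"):
    print(candidate, classify_ref_name(candidate))
```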

    [1] In fact, there are five e058 objects, but two of them are blobs and one is a tree:

    $ git rev-list --all --objects | grep '^e058'
    e058d184d1c072bd3078fe17ad41f1026f093201 t/
    e05827cba5488fb0c45e7055194071e1fda0df13 Documentation
    $ git cat-file -t e058ad2324a89ad5e10a80acf947253eac6c41e1

    Git's revision parsing code receives "disambiguation information" from the caller, so that if a command, such as git log, tends to prefer commits, Git can pick out only those objects that are commits, skipping over potentially ambiguous non-commit objects.
