Normally, git-annex repositories consist of symlinks that are checked into git, and in turn point at the content of large files that is stored in .git/annex/objects/. Direct mode gets rid of the symlinks.
The advantage of direct mode is that you can access files directly, including modifying them. The disadvantage is that most regular git commands cannot safely be used, and only a subset of git-annex commands can be used.
Normally, git-annex repositories start off in indirect mode, with some exceptions:
- Repositories created by the assistant use direct mode by default.
- Repositories on FAT and other less than stellar filesystems that don't support things like symlinks will be automatically put into direct mode.
- Windows always uses direct mode.
enabling (and disabling) direct mode
Any repository can be converted to use direct mode at any time, and if you decide not to use it, you can convert back to indirect mode just as easily. Also, you can have one clone of a repository using direct mode, and another using indirect mode; direct mode interoperates.
To start using direct mode:
git annex direct
To stop using direct mode:
git annex indirect
safety of using direct mode
With direct mode, you're operating without large swathes of git-annex's carefully constructed safety net, which ensures that past versions of files are preserved and can be accessed. With direct mode, any file can be edited directly, or deleted at any time, and there's no guarantee that the old version is backed up somewhere else.
So if you care about preserving the history of files, you're strongly encouraged to tell git-annex that your direct mode repository cannot be trusted to retain the content of a file. To do so:
git annex untrust .
On the other hand, if you only care about the current versions of files, and are using git-annex with direct mode to keep files synchronised between computers, and manage your files, this should not be a concern for you.
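
The two commands above can be combined into one step. The following is a minimal sketch, not part of git-annex itself: the function names run and enable_direct_untrusted and the DRY_RUN variable are assumptions made for this example. With DRY_RUN=1 the commands are only printed, which is a cautious way to check what would happen first.

```shell
#!/bin/sh
# Sketch: switch the current repository to direct mode and mark it untrusted.
# run, enable_direct_untrusted and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Enable direct mode, then tell git-annex not to rely on this repository
# to retain old versions of file content. Stops at the first failure.
enable_direct_untrusted() {
    run git annex direct &&
    run git annex untrust .
}
```

Run enable_direct_untrusted from inside the repository. Setting DRY_RUN=1 beforehand prints the two git annex commands instead of running them.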
use a direct mode repository
You can use most git-annex commands as usual in a direct mode repository. A very few commands don't work in direct mode, and will refuse to do anything.
Direct mode also works well with the git-annex assistant.
The most important command to use in a direct mode repository is git annex sync. This will commit any files you have run git annex add on, as well as files that were added earlier and have been modified. It will push the changes to other repositories for git annex sync there to pick up, and will pull and merge any changes made on other repositories into the local repository.
While you generally will just use git annex sync, if you want to, you can use git commit --staged, or plain git commit. But not git commit -a, or git commit <file> .. that'd commit whole large files into git!
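
The add-then-sync routine described here might be wrapped as follows. This is a sketch under stated assumptions: run, record_and_sync and DRY_RUN are hypothetical names introduced for illustration, and only git annex add and git annex sync come from the text above.

```shell
#!/bin/sh
# Sketch: record new files in a direct mode repository and synchronise.
# run, record_and_sync and DRY_RUN are hypothetical names.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: record_and_sync [FILE...]
record_and_sync() {
    # Register new files with git-annex. Plain "git add" is avoided,
    # because it would put the file content into git itself.
    for f in "$@"; do
        run git annex add "$f" || return 1
    done
    # Commit added and modified files, push them, then pull and merge
    # changes from other repositories.
    run git annex sync
}
```

Files that were added earlier and have only been modified need no arguments at all, since git annex sync commits those itself; calling record_and_sync with no files just runs the sync.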
what doesn't work in direct mode
git annex status shows incomplete information. A few other commands, like git annex unlock, don't make sense in direct mode and will refuse to run.
As for git commands, you can probably use some git working tree manipulation commands, like git checkout and git revert in useful ways... But beware, these commands can replace files that are present in your repository with broken symlinks. If that file was the only copy you had of something, it'll be lost.
This is one more reason it's wise to make git-annex untrust your direct mode repositories. Still, you can lose data using these sorts of git commands, so use extreme caution.
- All git commands that do not change files in the work tree (and do not stage files from the work tree) are safe. I don't have a complete list; it includes git log, git show, git diff, git commit (but not -a or with a file as a parameter), git branch, git fetch, git push, git grep, git status, git tag, git mv (this one is somewhat surprising, but I've tested it and it's ok)
- git commands that change files in the work tree will replace your data with dangling symlinks. This includes things like git revert, git checkout, git merge, git pull, git reset
- git commands that stage files from the work tree will commit your data to git directly. This includes git add, git commit -a, and git commit file
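
The three groups above could be encoded in a small lookup helper. This is only a sketch: classify_git_command and its result labels are hypothetical, the lists are the incomplete ones given above, and the handling of git commit arguments is deliberately crude (anything other than a bare git commit or an explicit -a is reported as needing a manual check).

```shell
#!/bin/sh
# Sketch: classify a git subcommand by its effect in a direct mode
# repository. classify_git_command and its labels are hypothetical.

# Usage: classify_git_command SUBCOMMAND [ARGS...]
classify_git_command() {
    case "$1" in
        log|show|diff|branch|fetch|push|grep|status|tag|mv)
            # Neither changes nor stages files from the work tree.
            echo "safe" ;;
        commit)
            shift
            for arg in "$@"; do
                case "$arg" in
                    # -a stages work tree files into git directly.
                    -a|--all) echo "stages-files"; return ;;
                esac
            done
            if [ "$#" -eq 0 ]; then
                echo "safe"
            else
                # Could be a message, or a file parameter: check by hand.
                echo "check-arguments"
            fi ;;
        revert|checkout|merge|pull|reset)
            # May replace file content with dangling symlinks.
            echo "replaces-files" ;;
        add)
            # Would commit file content to git directly.
            echo "stages-files" ;;
        *)
            echo "unknown" ;;
    esac
}
```

For example, classify_git_command checkout reports replaces-files, while a subcommand missing from the lists, such as rebase, is reported as unknown instead of being guessed at.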
So, if I edit a "content file" (change a music file's metadata, say), what's the workflow to record that fact and then synchronise it to other repositories?
I can't do a git add, so I don't understand what has to happen as a first step. (Thanks for your quick reply above, BTW.)
What happens to the object database (.git/annex/objects) when going to direct mode? Are the objects deleted, moved to another location, kept?
If the objects are kept, does it mean that the file on the repository in direct mode is duplicated in the object database? If so, would it be relevant to use cp --reflink=auto to populate the working directory to enable copy on write on filesystems that support it?
.git/annex/objects does not typically contain any file contents in direct mode. The file contents are stored directly in the working tree.
Would it be safe to add largefiles to gitignore in direct mode?
Can git-annex still track large files ignored by git?
Thanks. :-)
asbraithwaite: No, as far as I know it cannot.
I'd like to have an indirect mode repo on my laptop cloned on a cifs mount point (mounted off an SMB NAS) thus in direct mode. But all I can see on the clone after merge/pull is text files of length 207 chars containing the symlink in plain text.
I guess this is what git manages internally for the symlinks... so I'm afraid git annex doesn't work in such a case.
Can you confirm that indirect and direct modes can coexist on clones of the same repo?
Re-reading @joey's response above, I see that merge/pull don't seem to be safe and will create dangling symlinks. That corresponds to those files I can see on cifs, I guess.
But then, how can a direct repo sync with changes made in other remotes, if there is no pull/fetch available?
Can it then only be the source of changes which will propagate to indirect remotes?
I too have issues with mixing direct and indirect mode repositories.
I have a regular, existing repository with ebooks, shared between various clones on proper :) filesystems; now I would need a copy of some of them on an ereader which only offers a FAT filesystem, so it has to be direct mode.
I get a directory full of small files, the way git manages links on FAT.
This detects the fact that it is working on a crippled filesystem, enables direct mode and disables ssh connection caching; up to now everything seems to be fine, but then
seems to work, downloads the file somewhere, but when I try to open $SOME_BOOK it is still the fake link, and the file has been downloaded in its destination, as if the repo wasn't in direct mode.
I use version 4.20130723 on debian jessie
There should be no obstacles to using direct mode on one clone of a git repository, and indirect mode on another clone. The data stored in git for either mode is identical, and I do this myself for some repositories.
@valhalla, you probably need to run git annex fsck, and if that does not solve your problem, you need to file a bug report.
@obergix asked:
The answer is simple: By running git annex sync, which handles all that.
Thanks for these details @joeyh. But AFAIU, one needs to proceed to the git annex copy before doing the git annex sync, otherwise, symlinks (or files containing the symlink path on SMB) will be created, instead of the plain "direct" files that are expected.
I'm still not sure whether the git annex sync needs to be issued on either of the indirect or direct remotes first, or both, then in which sequence. I think a "walkthrough" script would help.