Git

Setting the stage

Git repos are conceptually made up of three areas:

Your working directory, also called the tree
The stage, also called the index or cache
The most recent commit, called HEAD

Using Git is essentially the art of crafting beautiful stages.

Once you make a stage which captures something useful, you publish it or save it within the Git repo. When you do this, the published or saved stage is called a commit. Commits are just published stages; they are frozen or immutable stages.

The primary command that you use to manipulate the stage is the poorly named git add. You can also use git stage, which means the same thing and makes more sense, but is not considered a conventional name. The verb "add" means "add to stage", though you can use git add to modify the stage in other ways.
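To see that git stage really is git add under another name, and that a commit is just the stage frozen in place, here is a minimal sketch. The scratch repo, file names, and Example identity are all made up for illustration.

```shell
# Scratch repository; file names and identity are placeholders.
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo one > a.txt
echo two > b.txt
git add a.txt        # conventional spelling
git stage b.txt      # same operation, clearer name

staged=$(git ls-files)          # paths currently in the stage
stage_tree=$(git write-tree)    # the stage written out as a tree object

git commit -q -m "publish the stage"
commit_tree=$(git rev-parse 'HEAD^{tree}')   # tree frozen into the commit

echo "$stage_tree"
echo "$commit_tree"
```

Both IDs should be identical, since the commit stores exactly the tree that the stage described.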

Stage commands

add and rm

Tree to stage: git add filename
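Remove from stage: git rm --cached filename (before the first commit, when there is no HEAD yet) or git reset filename
NOTE: There really ought to be a git command that calls the appropriate command. Since Git 2.23, git restore --staged filename comes closest, but it appears to need an existing HEAD to restore from.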
Add fake entry to stage: git add -N filename
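A rough sketch of those commands in a repo that has no commits yet, so the git rm --cached form applies. The file name notes.txt is hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
echo draft > notes.txt

git add notes.txt                 # tree to stage
after_add=$(git ls-files)

git rm -q --cached notes.txt      # remove from stage; the file stays on disk
after_rm=$(git ls-files)

git add -N notes.txt              # fake entry: the path is recorded, the content is not
after_intent=$(git ls-files)

echo "after add: $after_add / after rm: $after_rm / after -N: $after_intent"
```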

Doing git add filename doesn't add a filename to the stage: it adds the whole of filename (its path, metadata, and contents) to the stage. It's like doing a standard cp of the file to some invisible magic directory, which is what the stage basically is. (Note: Git does this by actually adding the file to its database of everything it knows about, and then linking to it from the stage database.) If the data never ends up in a commit and the stage stops pointing at it, Git can garbage collect it: git gc will prune such unreferenced data once it is more than two weeks old by default.
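The cp analogy can be checked with a small sketch: stage a file, edit it afterwards, and read the staged copy back out of the database. The file name and contents are invented for the example.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "version 1" > file.txt
git add file.txt                 # contents are copied into Git's database now
echo "version 2" > file.txt      # later edits to the tree don't touch that copy

# Second field of `git ls-files -s` is the object ID the stage links to.
blob=$(git ls-files -s file.txt | awk '{print $2}')
staged_content=$(git cat-file -p "$blob")
tree_content=$(cat file.txt)

echo "stage: $staged_content"
echo "tree:  $tree_content"
```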

git add . (where . can be any directory) is recursive.

git rm --cached . is not recursive and must be given an extra -r flag to be made so.
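A quick sketch of that asymmetry, using a made-up directory layout:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
mkdir -p dir/sub
echo a > dir/a.txt
echo b > dir/sub/b.txt

git add .                        # recursive: dir/sub/b.txt is picked up too
staged=$(git ls-files)

# Without -r, git rm refuses to descend into "." and exits non-zero.
git rm --cached . >/dev/null 2>&1 || not_recursive=yes

git rm -q -r --cached .          # with -r the whole stage is emptied
remaining=$(git ls-files)

echo "$staged"
```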

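Before Git 2.0, doing git add . would only sync added and modified files to the stage. Since Git 2.0 it also syncs deletions under the given path, and the old behavior is spelled git add --ignore-removal . (or --no-all). To sync modified and deleted files, use git add -u. To sync added, modified, and deleted files, use git add -A. There's more about this on Stack Overflow. (Check: Where is the stage change detected? Between HEAD and tree? As far as I can tell, it is between stage and tree.) If you don't pass any path argument to -u or -A, the whole working tree is used (before Git 2.0 it was .); a bare git add with no flag and no path adds nothing.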
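Here is a sketch comparing the three modes on Git 2.0 or later, where --ignore-removal is the flag that leaves deletions alone. The file names and the staged helper are hypothetical; --no-renames is only there so that a deletion plus an addition is never reported as a rename.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo 1 > mod.txt
echo 1 > del.txt
git add . && git commit -q -m base

echo 2 > mod.txt       # modified
rm del.txt             # deleted
echo new > new.txt     # added (untracked)

# Helper: HEAD-to-stage changes on one line, e.g. "M mod.txt,"
staged() { git diff --cached --no-renames --name-status | tr '\t\n' ' ,'; }

git add -u;                 with_u=$(staged);      git reset -q   # {mod, del}
git add --ignore-removal .; with_ignore=$(staged); git reset -q   # {add, mod}
git add -A;                 with_A=$(staged);      git reset -q   # {add, mod, del}

echo "-u: $with_u"
echo "--ignore-removal: $with_ignore"
echo "-A: $with_A"
```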

With --ignore-removal covering {add, mod} (a plain git add . did this before Git 2.0), -u covering {mod, del}, and -A covering {add, mod, del}, there is no flag for {add}, {mod}, {del}, or {add, del} on their own. There's a formula for doing {mod}:

git diff --name-only --diff-filter=M -z | xargs -0 git add --

And another for doing {del}:

git diff --name-only --diff-filter=D -z | xargs -0 git rm --

There is a filter character, A, for added files, but not one for untracked files. In other words, A describes a file that has already been added to the cache: added refers to the change from HEAD to stage, so it shows up under git diff --cached. In fact, diff does not display untracked files at all. There is an unknown (X) filter option, but the documentation describes it as a change type Git does not recognize (most probably a bug), not as a way to list untracked files.
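A small sketch of what the A filter does and does not see. The file names are invented; git status --short is used only to confirm that the second file really is untracked.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo base > tracked.txt
git add tracked.txt && git commit -q -m base

echo x > staged_new.txt && git add staged_new.txt   # new file, already in the stage
echo y > untracked.txt                              # new file, never added

added=$(git diff --cached --name-only --diff-filter=A)   # HEAD-to-stage additions
from_diff=$(git diff --name-only)                        # stage-to-tree: nothing to report
status=$(git status --short)                             # status does list the untracked file

echo "$status"
```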

You can, though, apparently use git ls-files to achieve {add}:

git add $(git ls-files -o --exclude-standard)

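It's not obvious which of these {add}, {mod}, and {del} solutions are recursive. As far as I can tell, git ls-files lists everything under the current directory recursively, with paths relative to it, while git diff --name-only covers the whole repo and prints paths relative to the repo root (unless --relative is given), so the diff formulas are only safe to run from the root. The $(...) forms also split on whitespace, so they break on file names containing spaces.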

The -d flag to git ls-files can be used for {del}, but -m shows both modified and deleted files, i.e. it counts deleted files as having been modified. In summary:

git add -- $(git ls-files -o --exclude-standard) # {add}
git add -- $(git diff --name-only --diff-filter=M) # {mod}
git rm -- $(git ls-files -d --exclude-standard) # {del}

And since you can use -A along with path names as well as directories:

git add -A -- $(git ls-files -od --exclude-standard) # {add, del}
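To check the {add, del} formula, here is a sketch with one modified, one deleted, and one new file. All names are made up, and none contains whitespace, since the $(...) form would split on it.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo 1 > mod.txt
echo 1 > del.txt
git add . && git commit -q -m base

echo 2 > mod.txt       # {mod}
rm del.txt             # {del}
echo new > new.txt     # {add}

# {add, del}: stage untracked and deleted paths, leave modifications alone
git add -A -- $(git ls-files -od --exclude-standard)

staged=$(git diff --cached --no-renames --name-status | tr '\t\n' ' ,')
unstaged=$(git diff --name-only)

echo "staged: $staged"
echo "still unstaged: $unstaged"
```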

You can also use -a during a commit, git commit -a, to do the equivalent of git add -u in the root of the repo (cf. explanation) before a commit. In other words, this syncs the states of any modified or deleted files to the stage, regardless of what subdirectory you might be in. (Note: It can look as though it does git add -A when making the first commit, but according to the documentation -a never touches new files that Git has not been told about.)
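A sketch of the subdirectory behavior: the commit is made from inside sub/, yet a modified file at the top level is included, while an untracked file is left alone. The paths are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
mkdir sub
echo 1 > top.txt
echo 1 > sub/inner.txt
git add . && git commit -q -m base

echo 2 > top.txt              # modified, outside the directory we commit from
echo 1 > sub/untracked.txt    # new file, never added

cd sub
git commit -q -a -m "commit -a from sub/"

committed=$(git diff --name-only HEAD~1 HEAD)    # paths changed by that commit
leftover=$(git ls-files -o --exclude-standard)   # untracked files under sub/

echo "committed: $committed"
echo "left alone: $leftover"
```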

git reset completely erases everything in the stage. Or, more accurately, it populates it with HEAD, so that the diff which is computed between stage and HEAD is zero, meaning that there won't be anything to commit. So really, the default stage is a mutable copy of HEAD, and the next commit will be a snapshot of the stage, with the diff between stage and HEAD as the changes it introduces. (Note: Strictly speaking, this isn't true either. In fact, entries will be contained in the index but stat information won't be. Then again, that's probably because stat information isn't contained in the commit either. Note that mode information is populated before any files are synced into the stage.)
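A minimal sketch of the reset behavior, with a made-up file: after git reset the stage describes the same tree as HEAD, and the working directory is untouched.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo 1 > f.txt
git add f.txt && git commit -q -m base

echo 2 > f.txt && git add f.txt
before=$(git diff --cached --name-only)   # stage differs from HEAD

git reset -q                              # repopulate the stage from HEAD
after=$(git diff --cached --name-only)    # nothing left to commit

stage_tree=$(git write-tree)              # the stage as a tree object
head_tree=$(git rev-parse 'HEAD^{tree}')
tree_content=$(cat f.txt)                 # the edit survives in the tree

echo "before: $before / after: $after / tree: $tree_content"
```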

ls

Doing git status will show you what is in the stage currently, though it might not always reflect exactly what will be added to a commit. Doing git add --interactive and then selecting "1" for [s]tatus will show you more information, namely how many lines were added and removed per file, both staged and unstaged. Outside the interactive mode, git diff --cached --numstat and git diff --numstat give the same counts.
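The per-file counts can also be read without the interactive mode. A sketch, assuming a file with one staged line and two further unstaged lines (the file name is invented):

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
printf 'a\nb\n' > f.txt
git add f.txt && git commit -q -m base

printf 'a\nb\nc\n' > f.txt && git add f.txt   # one line staged
printf 'a\nb\nc\nd\ne\n' > f.txt              # two more lines, not staged

staged=$(git diff --cached --numstat)   # added, removed, path for HEAD to stage
unstaged=$(git diff --numstat)          # the same for stage to tree
short=$(git status --short)             # two status columns: stage, then tree

echo "$staged"
echo "$unstaged"
echo "$short"
```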

It would be nice to have a diff that shows you what changes were made in the stage compared to HEAD, and what changes were made in the local tree.

The stage is actually stored on disk as a custom database in .git/index. You can view which files are included in it by using git ls-files, and using git ls-files -s --debug will give you more detailed information. You can use the external gin script to get all of the information from the stage database.
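A small sketch of what git ls-files -s reports for a single staged file: mode, object ID, stage number, and path. The file name is hypothetical, and the mode assumes an ordinary non-executable file.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
echo hello > greet.txt
git add greet.txt                        # this also creates .git/index

entry=$(git ls-files -s)                 # "<mode> <object> <stage><TAB><path>"
expected_id=$(git hash-object greet.txt) # ID Git computes for the same contents

echo "$entry"
```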

diff

Diff stage to tree: git diff
Diff HEAD to stage: git diff --cached

If you want to see HEAD-to-stage and stage-to-tree changes side by side, there's a tool called diffuse which can do that. Nothing built in does it side by side, apparently, though git status -v -v prints both diffs one after the other.

There is a synonym of --cached called --staged.
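To make the two directions concrete, here is a sketch where HEAD, stage, and tree each hold a different value for the same made-up file. The grep only keeps the changed lines of each diff.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo 1 > f.txt
git add f.txt && git commit -q -m base   # HEAD holds 1

echo 2 > f.txt && git add f.txt          # stage holds 2
echo 3 > f.txt                           # tree holds 3

stage_to_tree=$(git diff | grep '^[-+][0-9]')
head_to_stage=$(git diff --cached | grep '^[-+][0-9]')
staged_alias=$(git diff --staged | grep '^[-+][0-9]')   # same as --cached

echo "$stage_to_tree"
echo "$head_to_stage"
```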

Committing to the system

When making commits, sometimes you bundle a lot of changes together and generally don't make nice stages. The best way to fix this is to put more thought into it in the first place, but you know how it is sometimes. So another way to fix it is to change the commit history, breaking up large commits into smaller, atomic commits. Unfortunately this rewrites history, since every rewritten commit gets a new hash, and that means bad-news-bears for anybody who relies on your commit history to be consistent.

Another possible approach is to make a branch. Simply branch off before the mega-commit, separate the mega-commit into atomic chunks and apply them one by one in the new branch. Then merge the branch back into the history that has already been committed. This approach probably won't work, since the atomic commits get new hashes of their own and the merge adds yet another commit, so the mega-commit is still there.
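The branch idea can be sketched as follows, stopping before the merge. The branch name split, the file names, and the commit messages are all invented. The new branch ends up with the same tree as the mega-commit but a different commit hash, which is the snag.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo base > base.txt
git add . && git commit -q -m base
echo a > a.txt
echo b > b.txt
git add . && git commit -q -m "mega-commit"
mega=$(git rev-parse HEAD)

git checkout -q -b split HEAD~1       # branch off before the mega-commit
git checkout "$mega" -- a.txt         # pull one chunk into stage and tree
git commit -q -m "add a"
git checkout "$mega" -- b.txt         # then the next chunk
git commit -q -m "add b"

same_tree=no
[ "$(git rev-parse 'split^{tree}')" = "$(git rev-parse "${mega}^{tree}")" ] && same_tree=yes
same_commit=no
[ "$(git rev-parse split)" = "$mega" ] && same_commit=yes

echo "same tree: $same_tree, same commit: $same_commit"
```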