Git

Internals

Repo

A git repository is an on-disk data structure which stores the contents and history of a set of files and directories, along with metadata about them.

It lives in your project’s .git/ folder. Every time you commit data to git, it gets stored here. As a result, .git/ contains every single commit.

Its basic structure looks like this:

.git/
    HEAD
    objects/
    refs/

Objects

git is fundamentally a key-value store. When you add data to git, it builds an object and uses the SHA-1 hash of the object’s contents (prefixed with a short header giving the object’s type and size) as a key.

Therefore, any content in git can be looked up by its hash (any unambiguous prefix of the hash is enough):

$ git cat-file -p 4bb6f98
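The key derivation and lookup can be sketched in Python. This is a toy, in-memory model rather than git's implementation: the `ObjectStore` class and its `put`/`get` methods are hypothetical. It assumes git's object format, where the hashed bytes are a `<type> <size>` header, a NUL byte, then the content, so a blob's key should match what `git hash-object` reports for the same file.

```python
import hashlib


class ObjectStore:
    """Hypothetical in-memory stand-in for .git/objects/: a content-addressed key-value store."""

    def __init__(self) -> None:
        self._objects: dict[str, tuple[str, bytes]] = {}

    def put(self, obj_type: str, data: bytes) -> str:
        # The key is the SHA-1 of "<type> <size>\0" followed by the raw content.
        header = f"{obj_type} {len(data)}\0".encode()
        key = hashlib.sha1(header + data).hexdigest()
        self._objects[key] = (obj_type, data)  # identical content always maps to the same key
        return key

    def get(self, prefix: str) -> tuple[str, bytes]:
        # Like `git cat-file -p 4bb6f98`, accept any prefix that matches exactly one key.
        matches = [key for key in self._objects if key.startswith(prefix)]
        if len(matches) != 1:
            raise KeyError(f"{prefix!r} matches {len(matches)} objects")
        return self._objects[matches[0]]


store = ObjectStore()
key = store.put("blob", b"hello world\n")
print(key)                 # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(store.get(key[:7]))  # ('blob', b'hello world\n')
```

Because the key is derived from the content, storing the same bytes twice never creates a second entry.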

There are four types of object:

  • blob
  • tree
  • commit
  • tag

HEAD ref

HEAD is a special ref. It always points to the commit you currently have checked out.

You can see where it’s currently pointing by checking the .git/HEAD file.

Normally, HEAD points to another ref:

$ cat .git/HEAD
ref: refs/heads/mainline

But it can also point directly to an object:

$ cat .git/HEAD
4bb6f98a223abc9345a0cef9200562333

This is what’s known as a “detached HEAD”: HEAD is not attached to (pointing at) any ref, but instead points directly to a commit.
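The two cases can be told apart by whether the file starts with `ref: `. The sketch below assumes a hypothetical `git_dir` dict standing in for the files under .git/ and uses made-up, abbreviated hashes; it ignores refs stored in .git/packed-refs and chains of symbolic refs.

```python
def resolve_head(git_dir: dict[str, str]) -> tuple[str, bool]:
    """Return (commit hash, detached?) given .git/ file contents keyed by relative path."""
    head = git_dir["HEAD"].strip()
    if head.startswith("ref: "):
        # Symbolic HEAD: read the ref file it names, e.g. refs/heads/mainline.
        return git_dir[head[len("ref: "):]].strip(), False
    # Detached HEAD: the file already holds an object hash.
    return head, True


# Hypothetical, abbreviated hashes for illustration.
attached = {"HEAD": "ref: refs/heads/mainline\n", "refs/heads/mainline": "4bb6f98a\n"}
detached = {"HEAD": "4bb6f98a\n"}
print(resolve_head(attached))  # ('4bb6f98a', False)
print(resolve_head(detached))  # ('4bb6f98a', True)
```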

Refs

A ref is essentially a pointer. It’s a name that points to an object. For example,

"master" --> 1a410e...

Branch refs are stored as plain text files in .git/refs/heads/, each containing a hash:

$ cat .git/refs/heads/mainline
4bb6f98a223abc9345a0cef9200562333

The refs under .git/refs/heads/ are what are commonly called branches. However, you’ll note that in git there is no separate branch object, only a ref.

Now, it’s possible to navigate git purely by jumping around to different objects directly by their hashes. But this would be terribly inconvenient. A ref gives you a convenient name to refer to objects by. It’s much easier to ask git to go to a specific place by name rather than by hash.
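A rough sketch of turning a name into a hash, assuming a hypothetical `refs` dict of ref path to hash. It tries only a few of the candidate paths that the gitrevisions documentation lists (git also checks remote-tracking refs), then falls back to treating the input as an abbreviated hash.

```python
import re


def resolve_name(refs: dict[str, str], name: str) -> str:
    """Turn a user-supplied name into an object hash (simplified subset of git's rules)."""
    # Candidate ref paths, tried in order; a tag shadows a branch with the same name.
    for candidate in (name, f"refs/{name}", f"refs/tags/{name}", f"refs/heads/{name}"):
        if candidate in refs:
            return refs[candidate]
    # No ref matched: fall back to treating the name as a (possibly abbreviated) hash.
    if re.fullmatch(r"[0-9a-f]{4,40}", name):
        return name
    raise KeyError(f"unknown revision {name!r}")


refs = {"refs/heads/master": "1a410e", "refs/heads/mainline": "4bb6f98a"}  # hypothetical
print(resolve_name(refs, "master"))   # 1a410e
print(resolve_name(refs, "4bb6f98"))  # 4bb6f98
```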

Commit Object

A commit is probably the object type most familiar to git users, as it’s what they create with the git commit command.

However, the commit does not directly contain any changed files or data. Rather, it contains mostly metadata and pointers to other objects which contain the actual contents of the commit.

A commit contains a few things:

  • hash of a tree
  • hashes of zero or more parent commits (none for the first commit, two or more for a merge)
  • author name/email, committer name/email (each with a timestamp)
  • commit message

You can see the contents of any commit like this:

$ git cat-file commit 5bac93
tree 04d1daef...
parent b7850ef5...
author Geddy Lee <glee@rush.com>
committer Neil Peart <npeart@rush.com>

First commit!
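The text format is simple enough to parse by hand: header lines of `<field> <value>`, a blank line, then the message. The sketch below (the function name `parse_commit` is hypothetical) collects `parent` lines into a list; it does not handle multi-line header values such as signature continuation lines.

```python
def parse_commit(text: str) -> dict:
    """Split raw commit text into header fields, a list of parents, and the message."""
    headers, _, message = text.partition("\n\n")  # the first blank line ends the headers
    commit: dict = {"parents": [], "message": message}
    for line in headers.splitlines():
        field, _, value = line.partition(" ")
        if field == "parent":
            commit["parents"].append(value)  # root commits have none, merges have several
        else:
            commit[field] = value
    return commit


# Real output also carries a Unix timestamp and UTC offset; these values are made up.
raw = (
    "tree 04d1daef\n"
    "parent b7850ef5\n"
    "author Geddy Lee <glee@rush.com> 1500000000 +0000\n"
    "committer Neil Peart <npeart@rush.com> 1500000000 +0000\n"
    "\n"
    "First commit!\n"
)
print(parse_commit(raw)["parents"])  # ['b7850ef5']
```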

Tree

A very important note is that the tree object, together with the trees and blobs it points to, stores EVERY file in your project, and it stores whole files, not diffs. This means that each commit contains a snapshot of the entire project*.

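*Technically, unchanged files are not stored again: a new commit’s tree simply reuses the existing blob hashes for them, so only changed files produce new objects. This is an implementation detail for efficiency. From a design perspective, a commit should be considered as containing a complete copy of the project.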
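A small sketch of why full snapshots stay cheap: content-addressing means an unchanged file hashes to a blob that already exists. Here a flat path-to-hash dict stands in for nested trees (a hypothetical simplification), and the file contents are made up.

```python
import hashlib


def blob_key(data: bytes) -> str:
    # Same key scheme git uses for blobs: SHA-1 of "blob <size>\0" + content.
    return hashlib.sha1(f"blob {len(data)}\0".encode() + data).hexdigest()


def snapshot(store: dict[str, bytes], files: dict[str, bytes]) -> dict[str, str]:
    """Record a full snapshot (path -> blob hash), storing only blobs not already present."""
    tree = {}
    for path, data in files.items():
        key = blob_key(data)
        store.setdefault(key, data)  # unchanged content already has a blob: nothing new is written
        tree[path] = key
    return tree


store: dict[str, bytes] = {}
v1 = snapshot(store, {"Makefile": b"all:\n", "Readme.txt": b"Welcome!\n"})
v2 = snapshot(store, {"Makefile": b"all:\n", "Readme.txt": b"Welcome to my project!\n"})
print(len(v2), len(store))  # 2 3: v2 lists every file, but only one new blob was stored
```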

Parent

The parent line contains a hash of another commit object, and can be thought of as a “parent pointer” that points to the “previous commit”. This implicitly forms a graph of commits known as the commit graph. Specifically, it’s a directed acyclic graph (or DAG).
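Following parent pointers from one commit reaches all of its history. This breadth-first walk is a sketch over a hypothetical `parents` mapping; it visits each commit once even when a merge makes an ancestor reachable twice, but it does not reproduce git log's default ordering (which sorts by commit date).

```python
from collections import deque


def ancestors(parents: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk of the commit graph from `start`, visiting each commit once."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        order.append(commit)
        for parent in parents.get(commit, []):
            if parent not in seen:  # merges can reach the same ancestor by two paths
                seen.add(parent)
                queue.append(parent)
    return order


# Hypothetical history: A <- B <- D and A <- C <- D, where D is a merge commit.
graph = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}
print(ancestors(graph, "D"))  # ['D', 'B', 'C', 'A']
```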

Tree Object

A tree basically represents a folder in a traditional filesystem: nested containers for files or other folders.

A tree contains a list of entries, each recording a mode, a name and the hash of:

  • 0 or more blob objects (files)
  • 0 or more tree objects (subfolders)
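On disk, a tree is a binary list of entries. The sketch below assumes git's tree format: each entry is `<mode> <name>`, a NUL byte, then the 20 raw hash bytes, with subtrees stored under mode `40000` (`git cat-file -p` pads it to `040000` when printing). The function names are hypothetical. Hashing an empty entry list should give git's well-known empty-tree hash.

```python
import hashlib


def tree_bytes(entries: list[tuple[str, str, str]]) -> bytes:
    """Serialize (mode, name, hex hash) entries in git's binary tree format."""

    def sort_key(entry: tuple[str, str, str]) -> bytes:
        mode, name, _ = entry
        # Entries are ordered by name; a subtree sorts as if its name ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b""
    for mode, name, sha in sorted(entries, key=sort_key):
        # "<mode> <name>\0" followed by the 20 raw bytes of the hash (not hex text).
        body += f"{mode} {name}\0".encode() + bytes.fromhex(sha)
    return body


def tree_hash(entries: list[tuple[str, str, str]]) -> str:
    body = tree_bytes(entries)
    return hashlib.sha1(f"tree {len(body)}\0".encode() + body).hexdigest()


print(tree_hash([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```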

Just as you can use ls or dir to list the contents of a folder, you can list the contents of a tree object.

$ git cat-file -p 07b1a631
100644 blob b91bba1b   .gitignore
100644 blob cc0956f1   Makefile
040000 tree 92e1ca7e   src
...

You can look up the files in a commit by first finding the hash of the tree in the commit, and then looking at that tree:

$ git cat-file commit 4bb6f93a
tree 07b1a631
parent ...
author ...
committer ...

$ git cat-file -p 07b1a631
100644 blob b91bba1b   .gitignore
100644 blob cc0956f1   Makefile
040000 tree 92e1ca7e   src
...
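The same lookup can be sketched as a loop: start at the commit's root tree and descend one path component at a time. The `commits` and `trees` dicts are a hypothetical in-memory model that reuses the abbreviated hashes from the listings in this section; the `src/Foo.java` entry is an invented example.

```python
# (mode, type, hash, name), mirroring the columns `git cat-file -p` prints for a tree.
Entry = tuple[str, str, str, str]


def find_path(commits: dict[str, str], trees: dict[str, list[Entry]], commit: str, path: str) -> str:
    """Follow commit -> root tree -> subtrees to the hash stored for `path`."""
    node = commits[commit]  # the commit's "tree" line names the root tree
    for part in path.split("/"):
        if node not in trees:
            raise NotADirectoryError(path)  # tried to descend into a blob
        matches = [sha for _mode, _type, sha, name in trees[node] if name == part]
        if not matches:
            raise FileNotFoundError(path)
        node = matches[0]
    return node


# Hypothetical data modeled on the listings in this section; Foo.java is an invented name.
commits = {"4bb6f93a": "07b1a631"}
trees = {
    "07b1a631": [
        ("100644", "blob", "b91bba1b", ".gitignore"),
        ("100644", "blob", "cc0956f1", "Makefile"),
        ("040000", "tree", "92e1ca7e", "src"),
    ],
    "92e1ca7e": [("100644", "blob", "d429810", "Foo.java")],
}
print(find_path(commits, trees, "4bb6f93a", "src/Foo.java"))  # d429810
```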

Blob Object

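A blob contains arbitrary binary file contents. Commonly, it will be raw text such as source code or a blog article. But it could just as easily be the bytes of a PNG file or anything else. A blob stores only the contents; the file’s name and mode are recorded in the tree that points to it.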

If you have the hash of a blob, you can look at its contents.

$ git cat-file -p d429810
package com.example.project

class Foo {
 ...
}
...

For example, you can browse a tree as above, and then look at one of the blobs in it.

$ git cat-file -p 07b1a631
100644 blob b91bba1b   .gitignore
100644 blob cc0956f1   Makefile
040000 tree 92e1ca7e   src
100644 blob cae391ff   Readme.txt

$ git cat-file -p cae391ff
Welcome to my project! This is the readme file
...
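For a "loose" object (one written individually rather than packed), the file behind a hash like cae391ff is the zlib-compressed header and content, stored at objects/<first two hex digits>/<remaining 38>. The sketch below assumes that layout and uses hypothetical function names; objects inside packfiles (.git/objects/pack/) use a different format and are not covered.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose(objects_dir: Path, obj_type: str, data: bytes) -> str:
    raw = f"{obj_type} {len(data)}\0".encode() + data
    key = hashlib.sha1(raw).hexdigest()
    path = objects_dir / key[:2] / key[2:]  # e.g. objects/3b/18e512...
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(raw))  # the hash is taken before compression
    return key


def read_loose(objects_dir: Path, key: str) -> tuple[str, bytes]:
    raw = zlib.decompress((objects_dir / key[:2] / key[2:]).read_bytes())
    header, _, data = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(data):
        raise ValueError("size in header does not match content length")
    return obj_type, data


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    key = write_loose(objects, "blob", b"Welcome to my project!\n")
    print(read_loose(objects, key))  # ('blob', b'Welcome to my project!\n')
```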

Creating new Commits

The git commit command does a few things:

  1. Creates blobs and trees representing your project’s files (as staged with git add), stored in .git/objects
  2. Creates a new commit object with your author information, commit message, the root tree from step 1, and the current commit as its parent, also stored in .git/objects
  3. Moves the ref that HEAD points to (e.g. .git/refs/heads/mainline) to the hash of the newly-created commit; if HEAD is detached, .git/HEAD itself is updated instead

This results in a new snapshot of your project being added to git, connected to the previous state through its parent pointer.
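The three steps can be sketched end to end with a toy repository. `ToyRepo` is a hypothetical in-memory model, not git's code: it writes one flat tree (no subfolders), skips the index, and expects the author string to already include a timestamp.

```python
import hashlib


class ToyRepo:
    """Hypothetical in-memory model of .git/: an object store, branch refs, and HEAD."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}
        self.refs: dict[str, str] = {}
        self.head = "ref: refs/heads/mainline"  # a new repo's HEAD names a branch with no commits yet

    def _store(self, obj_type: str, body: bytes) -> str:
        raw = f"{obj_type} {len(body)}\0".encode() + body
        key = hashlib.sha1(raw).hexdigest()
        self.objects[key] = raw
        return key

    def commit(self, files: dict[str, bytes], message: str, author: str) -> str:
        # 1. Store a blob per file and a single flat tree listing them (no subfolders).
        entries = sorted((name, self._store("blob", data)) for name, data in files.items())
        tree = self._store("tree", b"".join(
            f"100644 {name}\0".encode() + bytes.fromhex(sha) for name, sha in entries))
        # 2. Build the commit; its parent is whatever HEAD currently resolves to, if anything.
        branch = self.head[len("ref: "):] if self.head.startswith("ref: ") else None
        parent = self.refs.get(branch) if branch else self.head
        lines = [f"tree {tree}"] + ([f"parent {parent}"] if parent else [])
        lines += [f"author {author}", f"committer {author}", "", message]
        new = self._store("commit", ("\n".join(lines) + "\n").encode())
        # 3. Advance the branch HEAD points to; a detached HEAD is updated directly.
        if branch:
            self.refs[branch] = new
        else:
            self.head = new
        return new


repo = ToyRepo()
who = "Geddy Lee <glee@rush.com> 1500000000 +0000"  # hypothetical author and timestamp
first = repo.commit({"Readme.txt": b"Welcome!\n"}, "First commit!", who)
second = repo.commit({"Readme.txt": b"Welcome to my project!\n"}, "Update readme", who)
print(repo.refs)  # mainline now points at `second`
```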

Moving HEAD

When you run git checkout on a commit (specified by hash or ref), you’re telling git to make your working directory look like it did when that snapshot was taken. git does two things:

  1. Update the files in the working directory to match the tree inside the commit
  2. Update HEAD to point to the specified ref, or directly to the commit (a detached HEAD) if you gave a hash
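Step 2 can be sketched as computing the new contents of .git/HEAD. The function below is hypothetical and assumes that any target that is not a branch name is already a full commit hash (real git resolves it first); updating the working directory (step 1) is not modeled.

```python
def checkout(refs: dict[str, str], target: str) -> str:
    """Return what `git checkout <target>` would write to .git/HEAD (file updates not modeled)."""
    branch = f"refs/heads/{target}"
    if branch in refs:
        return f"ref: {branch}\n"  # a branch name: HEAD stays attached to that ref
    return f"{target}\n"  # assumed to be a commit hash: HEAD becomes detached


refs = {"refs/heads/MyBranch": "1f324a"}  # hypothetical, abbreviated hash
print(checkout(refs, "MyBranch"), end="")  # ref: refs/heads/MyBranch
print(checkout(refs, "b8dc53"), end="")    # b8dc53
```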

Moving refs around

Running git reset --hard moves the current branch ref (the ref HEAD points to) to the specified hash/ref, and resets the index and working directory to match it.

Moving MyBranch to b8dc53:

$ git checkout MyBranch      # moves HEAD to MyBranch
$ git reset --hard b8dc53    # makes MyBranch point to b8dc53     
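The difference from checkout is which pointer moves: reset rewrites the branch ref and leaves HEAD naming the same branch. A hypothetical sketch, with resetting the index and working directory left out:

```python
def reset_hard(head: str, refs: dict[str, str], target: str) -> str:
    """Move whatever HEAD points at to `target`; returns the new .git/HEAD contents.

    Resetting the index and working directory is not modeled.
    """
    if head.startswith("ref: "):
        refs[head[len("ref: "):].strip()] = target  # the branch moves, HEAD still names it
        return head
    return f"{target}\n"  # detached HEAD: HEAD itself is the only pointer that moves


refs = {"refs/heads/MyBranch": "1f324a"}  # hypothetical hashes
head = reset_hard("ref: refs/heads/MyBranch\n", refs, "b8dc53")
print(refs["refs/heads/MyBranch"])  # b8dc53
```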

Creating new Refs

Running git checkout -b <refname> creates a new ref that points to the current commit, and then points HEAD at that new ref.

$ cat .git/HEAD
1f324a

$ git checkout -b TestBranch

$ cat .git/refs/heads/TestBranch
1f324a
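In file terms, this amounts to writing one new ref file and then rewriting HEAD. The sketch below is hypothetical, with refs held in a dict; it handles both an attached and a detached starting HEAD and refuses to overwrite an existing branch.

```python
def checkout_new_branch(head: str, refs: dict[str, str], name: str) -> str:
    """Create refs/heads/<name> at the current commit; return the new .git/HEAD contents."""
    if head.startswith("ref: "):
        current = refs[head[len("ref: "):].strip()]
    else:
        current = head.strip()  # detached HEAD, as in the cat .git/HEAD example
    branch = f"refs/heads/{name}"
    if branch in refs:
        raise ValueError(f"a branch named {name!r} already exists")
    refs[branch] = current  # the new ref points at the commit HEAD was on
    return f"ref: {branch}\n"  # and HEAD now follows the new ref


refs: dict[str, str] = {}
new_head = checkout_new_branch("1f324a\n", refs, "TestBranch")
print(refs)  # {'refs/heads/TestBranch': '1f324a'}
```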

This modified text is an extract of the original Stack Overflow Documentation created by the contributors and released under CC BY-SA 3.0. This website is not affiliated with Stack Overflow.