Git Internals: Understanding How Git Actually Works

Deep dive into Git's internal architecture - learn how Git stores data, manages commits, and builds your project's history using blobs, trees, and commits.

December 1, 2024
[Figure: Git Folder Internals, a visual representation of Git's internal structure]

Why Learn Git Internals?

We use Git daily to commit, branch, and push code, but few of us know how Git actually stores data inside the hidden .git folder.

Git isn't just saving "versions of files." It builds a mini database of your project — made of small building blocks (objects) linked together by cryptographic hashes.

By exploring these building blocks, you'll understand exactly what happens when you run git add, git commit, or git checkout.


Creating a Git Repository

Start with a clean folder and initialize Git:

mkdir git-internal
cd git-internal
git init

You’ll see something like this (Git prints the absolute path of your own folder):

Initialized empty Git repository in /path/to/git-internal/.git/

If you open .git:

ls .git

Output (the exact entries vary slightly between Git versions):

HEAD  config  description  hooks  info  objects  refs

What this means:

  • .git/ → the brain of your project. Everything Git does lives here.
  • objects/ → actual data storage (commits, files, etc.).
  • refs/ → pointers (branches, tags).
  • HEAD → tells Git which branch or commit you’re currently on.

At this point, your project has no commits — it’s just an empty database ready to store snapshots.
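To see the "empty database" for yourself, here is a minimal sketch. It assumes a throwaway directory created with mktemp, and it makes no assumption about the default branch name, which depends on your Git version and configuration.

```bash
# Work in a throwaway directory so no real project is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# HEAD is a plain text file holding a symbolic reference to a branch.
# The branch itself does not exist yet, because there are no commits.
head_content=$(cat .git/HEAD)
echo "$head_content"    # e.g. "ref: refs/heads/main" or "ref: refs/heads/master"

# Count the object files written so far.
object_count=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "object files: $object_count"
```

The objects folder already contains the info and pack subdirectories, but no object files until you stage or commit something.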


Adding Your First File

Create a file and add it:

echo "hello" > a.txt
git add a.txt

Now Git has staged your file, but not committed it yet.

Let’s look inside the .git/objects folder:

ls .git/objects

You’ll find something like:

ce  info  pack

Inside ce/:

013625030ba8dba906f756967f9e9ca394464a

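This long name is not random. Combined with the two-character directory name, it forms ce013625030ba8dba906f756967f9e9ca394464a, the SHA-1 checksum of your file's content preceded by a small header (the object type and the content length). Git uses this hash as the object's name, so identical content is always stored under the same name.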

Note:

Objects are stored in subdirectories named by the first two characters of their SHA-1 hash. This prevents any single directory from having too many files.
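You can reproduce the hash without Git. The sketch below assumes the sha1sum tool from GNU coreutils is available (on macOS, shasum plays the same role) and that the repository uses the default SHA-1 object format.

```bash
# Git hashes a header, "blob <size in bytes>" plus a NUL byte, followed by the content.
# "hello\n" is 6 bytes long, so the hashed input is "blob 6\0hello\n".
hash=$(printf 'blob 6\000hello\n' | sha1sum | cut -d ' ' -f 1)
echo "$hash"

# The first two hex characters name the directory, the remaining 38 the file.
dir=$(printf '%s' "$hash" | cut -c 1-2)
file=$(printf '%s' "$hash" | cut -c 3-)
echo ".git/objects/$dir/$file"

# Git's own plumbing command computes the same value.
echo "hello" | git hash-object --stdin
```

Both commands print ce013625030ba8dba906f756967f9e9ca394464a, the same hash that appeared in .git/objects after git add.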


What's Stored Inside Objects?

Let’s peek at it:

git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a

Output:

hello

That’s the exact content of your file — but Git calls this a blob (Binary Large OBject).

What is a Blob?

A blob stores only file data. It has:

  • ✅ The file's content
  • ❌ No filename
  • ❌ No directory information
  • ❌ No history

Git stores every file version like this — content only. The structure (file name, directory) comes later. All objects are compressed using zlib to save space.
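Because a blob has no filename, two files with identical content should end up as a single object. Here is a small sketch to check that, using a throwaway repository and a hypothetical second file name (copy-of-a.txt).

```bash
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Two files with different names but identical content.
echo "hello" > a.txt
echo "hello" > copy-of-a.txt
git add a.txt copy-of-a.txt

# Both staged entries point at the same blob, so only one object file exists.
blob_count=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "object files: $blob_count"

# Ask Git for the object's type and size (in bytes) instead of its content.
blob=$(git hash-object a.txt)
blob_type=$(git cat-file -t "$blob")
blob_size=$(git cat-file -s "$blob")
echo "$blob_type, $blob_size bytes"
```

The names a.txt and copy-of-a.txt are recorded elsewhere (in the staging area now, in a tree later), never in the blob.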


Taking Your First Snapshot (Tree)

Next, tell Git to record what’s in the staging area:

git write-tree

Note:

git write-tree is a low-level (plumbing) command. In normal workflows you use git commit, which creates both the tree and the commit objects automatically.

Output:

2e81171448eb9f2ee3821e3d447aa6b2fe3ddba1

This created a tree object — think of it as a folder structure that knows:

  • each file name,
  • the blob it points to,
  • and its permissions.

Inspect it:

git cat-file -p 2e81171448eb9f2ee3821e3d447aa6b2fe3ddba1

Output:

100644 blob ce013625030ba8dba906f756967f9e9ca394464a    a.txt

Understanding the Tree Entry

  • 100644 → File mode (a regular, non-executable file; an executable file would be 100755)
  • blob → Object type
  • ce0136... → Hash of the blob (file content)
  • a.txt → Filename

The tree connects file names to their blobs — this is how Git rebuilds your working directory.
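Trees can also point to other trees, which is how subdirectories are represented. The following sketch illustrates this in a throwaway repository; the docs folder is a hypothetical name chosen only for the example.

```bash
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo "hello" > a.txt
mkdir docs
echo "new file" > docs/b.txt
git add a.txt docs/b.txt

# write-tree turns the staging area into tree objects and prints the root tree's hash.
root_tree=$(git write-tree)

# The root tree lists a.txt as a blob and docs as a nested tree (mode 040000).
root_listing=$(git ls-tree "$root_tree")
echo "$root_listing"

# The nested tree, in turn, maps b.txt to its blob.
docs_tree=$(git rev-parse "$root_tree:docs")
docs_listing=$(git ls-tree "$docs_tree")
echo "$docs_listing"
```

Each level of the directory hierarchy gets its own tree object, and each tree only knows about its direct children.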


Creating Your First Commit

Now we’ll permanently save this tree as a snapshot:

git commit -m "first commit"

View the internal commit data:

git cat-file -p HEAD

Output:

tree 2e81171448eb9f2ee3821e3d447aa6b2fe3ddba1
author Suraj Vishwakarma <dev.surajv@gmail.com> 1733961600 +0530
committer Suraj Vishwakarma <dev.surajv@gmail.com> 1733961600 +0530

first commit

Anatomy of a Commit Object

A commit contains:

  1. tree → Points to the tree object (your folder state)
  2. author → Who created the changes (with timestamp)
  3. committer → Who committed the changes (with timestamp)
  4. message → Commit description

The commit does not store file content directly. It references only the tree, and the tree in turn references the blobs. Because unchanged files are shared between snapshots, this keeps Git's storage compact.
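To see that a commit is just a tree plus metadata, you can build one by hand with the plumbing command git commit-tree. This is only a sketch: the author identity below is a hypothetical placeholder, passed through environment variables so the example does not depend on your Git configuration.

```bash
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical identity used only for this example.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

echo "hello" > a.txt
git add a.txt
tree=$(git write-tree)

# commit-tree wraps the tree in a commit object and prints the new commit's hash.
# It does not move any branch; git commit does that as an extra step.
commit=$(echo "first commit" | git commit-tree "$tree")

commit_type=$(git cat-file -t "$commit")
first_line=$(git cat-file -p "$commit" | head -n 1)
echo "$commit_type"
echo "$first_line"
```

The first line of the commit object is the tree hash that git write-tree printed, followed by the author, committer, and message.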


Adding Another File (New Snapshot)

Add another file:

echo "new file" > b.txt
git add b.txt
git write-tree

Output:

1424e6f9aa2ead19d4238516d37f5d40692cb0ce

Check what’s inside:

git cat-file -p 1424e6f9aa2ead19d4238516d37f5d40692cb0ce

Output:

100644 blob ce013625030ba8dba906f756967f9e9ca394464a    a.txt
100644 blob fa49b077972391ad58037050f2a75f74e3671e92    b.txt

Notice that a.txt still points to the same blob (ce0136...) because its content didn't change. Git reuses objects efficiently!
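A quick way to confirm the reuse is to compare the a.txt entry in both trees and count the object files. This is a minimal sketch in a throwaway repository, with no commits involved.

```bash
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

echo "hello" > a.txt
git add a.txt
tree_before=$(git write-tree)

echo "new file" > b.txt
git add b.txt
tree_after=$(git write-tree)

# The two root trees are different objects...
echo "$tree_before"
echo "$tree_after"

# ...but their a.txt entries are identical, because the blob was reused.
entry_before=$(git ls-tree "$tree_before" a.txt)
entry_after=$(git ls-tree "$tree_after" a.txt)
echo "$entry_before"
echo "$entry_after"

# Two blobs and two trees: four object files in total, not five.
object_count=$(find .git/objects -type f | wc -l | tr -d ' ')
echo "object files: $object_count"
```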

Now commit again:

git commit -m "add b.txt"

View the new commit:

git cat-file -p HEAD

Output:

tree 1424e6f9aa2ead19d4238516d37f5d40692cb0ce
parent 832f7f14575e2241d0d77c6bc80631f2f1f11cf5
author Suraj Vishwakarma <dev.surajv@gmail.com> 1733961660 +0530
committer Suraj Vishwakarma <dev.surajv@gmail.com> 1733961660 +0530

add b.txt

Note:

This commit includes a parent field that links it back to the previous commit. That's how Git builds your history: each commit points backward to its parent(s), and together they form the commit graph.
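Walking history is nothing more than following parent fields. The sketch below does it by hand with git cat-file, as a rough picture of what git log does for you. It uses a throwaway repository and a hypothetical identity.

```bash
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical identity used only for this example.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

echo "hello" > a.txt
git add a.txt
git commit -q -m "first commit"
echo "new file" > b.txt
git add b.txt
git commit -q -m "add b.txt"

# Follow the parent field from HEAD back to the initial commit.
commit=$(git rev-parse HEAD)
visited=0
while [ -n "$commit" ]; do
  visited=$((visited + 1))
  echo "commit $commit"
  # Read only the header (everything before the first blank line) and take
  # the first parent; a merge commit would list more than one.
  commit=$(git cat-file -p "$commit" | sed -n -e '/^$/q' -e 's/^parent //p' | head -n 1)
done
echo "visited $visited commits"
```

The loop stops at the initial commit because it has no parent line.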


Visualizing the Object Graph

HEAD
 ↓
refs/heads/main
 ↓
Commit 2662847 ("add b.txt")
 ├── parent → 832f7f1 ("first commit")
 │            ├── parent → (none, initial commit)
 │            └── tree 2e8117...
 │                 └── blob ce0136... (a.txt)
 └── tree 1424e6...
      ├── blob ce0136... (a.txt) ← reused!
      └── blob fa49b0... (b.txt) ← new!

The Logic

  • Blobs = Raw file contents (immutable)
  • Trees = Directory structure (maps names → blobs/subtrees)
  • Commits = Snapshots (point to a tree + optional parent(s))
  • HEAD = Points to the current branch (and therefore the latest commit)

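Git never modifies an existing object. It only adds new ones and links them. Objects that are no longer reachable from any reference can eventually be removed by garbage collection (git gc).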
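The chain from HEAD to a commit can be followed with two plumbing commands. This is a minimal sketch in a throwaway repository with a hypothetical identity; the branch name it prints depends on your Git configuration.

```bash
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Hypothetical identity used only for this example.
export GIT_AUTHOR_NAME="Example Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example Author" GIT_COMMITTER_EMAIL="author@example.com"

echo "hello" > a.txt
git add a.txt
git commit -q -m "first commit"

# HEAD names the current branch...
branch_ref=$(git symbolic-ref HEAD)    # e.g. refs/heads/main
# ...and the branch resolves to a commit hash.
branch_commit=$(git rev-parse "$branch_ref")
head_commit=$(git rev-parse HEAD)
echo "HEAD -> $branch_ref -> $branch_commit"

# In a small, fresh repository the branch is usually a plain file holding that hash.
if [ -f ".git/$branch_ref" ]; then
  cat ".git/$branch_ref"
fi
```

Creating a new commit only rewrites that small branch file; the objects themselves stay untouched.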


Understanding Git Objects: Complete Reference

| Object Type | Purpose | Contains | Example Hash |
|---|---|---|---|
| Blob | Stores file content | Raw file data (zlib compressed) | ce0136... |
| Tree | Stores directory structure | List of blobs/trees + names | 1424e6... |
| Commit | Snapshot with metadata | Tree + parent + author + msg | 2662847... |
| HEAD | Pointer to current branch/commit | Branch ref or commit hash | refs/heads/main |