Documentation — Brancher Mini

Overview

How it works

brancher-mini reads your staged Git diff, feeds it to a local language model, and writes a Conventional Commit message for you — all without sending anything to the internet.

1. Stage your changes as usual with git add, or pass --all to let brancher-mini stage everything for you.

2. brancher-mini feeds the diff to a local LLM and generates a semantic commit message following the Conventional Commits specification.

3. You review the message and choose: commit, edit, regenerate, or exit.

4. The commit is made, or you exit and your staging area is restored to exactly how you left it.

Lock files and generated/build artifacts (source maps, minified assets, dist/ output) are detected automatically and committed with a static message — no LLM call needed for them.
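A minimal sketch of how such detection could work, assuming simple path-based rules. The names classifyStaticFile and staticMessageFor, the exact patterns, and the static message wording are hypothetical; the documentation only names the file categories.

```typescript
// Hypothetical path-based classifier; the real rules in brancher-mini may differ.
type StaticKind = "lockfile" | "generated" | null;

const LOCK_FILES = new Set(["package-lock.json", "yarn.lock"]);

function classifyStaticFile(path: string): StaticKind {
  const normalized = path.replace(/\\/g, "/");
  const base = normalized.split("/").pop() ?? "";
  if (LOCK_FILES.has(base)) return "lockfile";
  // Source maps, minified assets, and anything under a dist/ directory.
  if (base.endsWith(".map")) return "generated";
  if (/\.min\.(js|css)$/.test(base)) return "generated";
  if (normalized.startsWith("dist/") || normalized.includes("/dist/")) return "generated";
  return null; // everything else goes through the LLM
}

// Assumed static messages, for illustration only.
function staticMessageFor(kind: "lockfile" | "generated"): string {
  return kind === "lockfile" ? "chore: update lock file" : "chore: update generated files";
}
```

Files that classify as null would continue to the normal per-file analysis.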

Core concept

Two generation modes

Both modes analyse each file individually first, then diverge in how they group and present the results.

Compact

brancher-mini

Analyses each file individually, then squashes all per-file summaries into a single commit message covering all staged changes.

Best for: focused, single-purpose changes — a bug fix, a feature addition, a refactor.

Extended

brancher-mini --mode extended

Analyses each file individually, groups related files by semantic similarity, then generates one commit message per group and commits them separately.

Best for: large changesets that touch multiple concerns at once — e.g. a feature plus a docs update plus a dependency bump.
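The difference between the two modes can be sketched as a grouping step over per-file summaries. This is an illustration under loud assumptions: the documentation does not say how semantic similarity is computed, so the embedding field, the cosine measure, the greedy strategy, and the 0.8 threshold are all hypothetical.

```typescript
// Hypothetical shapes: the source does not say how similarity is computed.
interface FileSummary {
  path: string;
  summary: string;
  embedding: number[]; // assumed vector representation of the summary
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Compact mode: every file lands in one group (one commit message).
// Extended mode: greedy grouping, where a file joins the first group
// whose first member is similar enough, else starts a new group.
function groupFiles(
  files: FileSummary[],
  mode: "compact" | "extended",
  threshold = 0.8,
): FileSummary[][] {
  if (mode === "compact") return files.length > 0 ? [files] : [];
  const groups: FileSummary[][] = [];
  for (const file of files) {
    const home = groups.find((g) => cosine(g[0].embedding, file.embedding) >= threshold);
    if (home) home.push(file);
    else groups.push([file]);
  }
  return groups;
}
```

Each resulting group would then get its own generated message and its own commit.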

How large diffs are handled

Each file's diff goes through a three-tier pipeline before reaching the LLM:

Fits the context budget — one LLM call (the normal path).
Exceeds budget, multiple hunks — split on @@ boundaries into batches, each summarised separately, then merged.
Single oversized hunk (minified line, binary blob) — no LLM call; falls back to chore: update <filename>.
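The three tiers above can be sketched as a small planning function. This is a sketch, not the tool's implementation: planDiff and splitHunks are hypothetical names, the budget is measured in characters as a stand-in for tokens, and falling back whenever any single hunk exceeds the budget is an assumption.

```typescript
type Plan =
  | { kind: "single"; text: string }
  | { kind: "batched"; batches: string[] }
  | { kind: "fallback"; message: string };

// Split a unified diff into hunks; each hunk starts at a line beginning with "@@".
function splitHunks(diff: string): string[] {
  const hunks: string[][] = [];
  for (const line of diff.split("\n")) {
    if (line.startsWith("@@")) hunks.push([line]);
    else if (hunks.length > 0) hunks[hunks.length - 1].push(line);
    // Lines before the first "@@" (the file header) are ignored here.
  }
  return hunks.map((h) => h.join("\n"));
}

function planDiff(filename: string, diff: string, budget: number): Plan {
  // Tier 1: the whole diff fits, so one LLM call is enough.
  if (diff.length <= budget) return { kind: "single", text: diff };
  const hunks = splitHunks(diff);
  // Tier 3: nothing to split on, or a hunk that cannot fit in any batch.
  if (hunks.length <= 1 || hunks.some((h) => h.length > budget)) {
    return { kind: "fallback", message: `chore: update ${filename}` };
  }
  // Tier 2: pack consecutive hunks into batches that stay within the budget.
  const batches: string[] = [];
  let current = "";
  for (const hunk of hunks) {
    const candidate = current === "" ? hunk : current + "\n" + hunk;
    if (candidate.length > budget) {
      batches.push(current);
      current = hunk;
    } else {
      current = candidate;
    }
  }
  if (current !== "") batches.push(current);
  return { kind: "batched", batches };
}
```

Each batch would be summarised separately and the summaries merged into the per-file result.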

Privacy & performance

Local-first design

Every part of the inference pipeline runs on your machine. Your code, diffs, and commit messages are never sent anywhere.

Resumable model downloads

Models are downloaded from Hugging Face on first use and cached in ~/.brancher-mini/models/. An interrupted download picks up where it left off. GGUF magic-byte verification catches corrupt downloads before they reach the model loader.
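As a sketch of the two ideas in this paragraph: GGUF files begin with the four ASCII bytes "GGUF", so a magic-byte check only needs the first four bytes of the file, and a resumed download can ask the server for the remaining bytes with an HTTP Range header. Whether brancher-mini resumes via Range requests is an assumption, and both function names are hypothetical.

```typescript
// The GGUF format begins with the four ASCII bytes "GGUF".
const GGUF_MAGIC = [0x47, 0x47, 0x55, 0x46];

function hasGgufMagic(head: Uint8Array): boolean {
  return head.length >= 4 && GGUF_MAGIC.every((byte, i) => head[i] === byte);
}

// Hypothetical helper: request headers for resuming from the bytes already on disk.
function resumeHeaders(bytesOnDisk: number): Record<string, string> {
  return bytesOnDisk > 0 ? { Range: `bytes=${bytesOnDisk}-` } : {};
}
```

A check like this rejects files whose header is wrong (for example, an error page saved in place of the model); it does not validate the rest of the file.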

Load-on-exec mode

Loads the model fresh on each run, generates, then unloads. Ideal if you commit infrequently or want zero background processes.

Daemon mode

Keeps the model loaded in a background process so subsequent commits skip the model load entirely. The daemon starts automatically when executionMode is set to always-loaded and is managed via brancher-mini daemon.

GPU with CPU fallback

Inference runs with all model layers offloaded to the GPU by default. If the inference process exits within 8 seconds of starting, which is treated as an out-of-memory failure, brancher-mini automatically restarts it in CPU-only mode.
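The fallback rule can be sketched as a pure decision function. The names below are hypothetical, and the flags brancher-mini actually passes are not documented here; --n-gpu-layers is llama.cpp's option for layer offload, and 999 is an assumed stand-in for "all layers".

```typescript
const EARLY_EXIT_WINDOW_MS = 8_000; // the 8-second window described above

interface ServerExit {
  startedAtMs: number;
  exitedAtMs: number;
}

// An exit inside the window is treated as a likely out-of-memory failure.
function shouldRetryOnCpu(exit: ServerExit, alreadyOnCpu: boolean): boolean {
  if (alreadyOnCpu) return false; // avoid looping if CPU mode also fails
  return exit.exitedAtMs - exit.startedAtMs <= EARLY_EXIT_WINDOW_MS;
}

// 0 layers on the GPU means CPU-only inference.
function gpuLayerArgs(cpuOnly: boolean): string[] {
  return ["--n-gpu-layers", cpuOnly ? "0" : "999"];
}
```

Guarding on alreadyOnCpu keeps a failing CPU run from triggering an endless restart cycle.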

No API keys required. After the first model download, brancher-mini works entirely offline. The only network activity is the optional anonymous telemetry sync (once per day, preview builds only).

Capabilities

Features

Conventional Commits: always generates typed messages (feat:, fix:, chore:, refactor:, and more).
Two generation modes: compact (a single message) and extended (one commit per group for multi-concern changesets).
Interactive workflow: accept, edit inline, regenerate, or exit without committing. Edit starts from the generated message.
Staging area safety: your staging area is always restored to its original state on cancel, ignore, or crash.
Lock file & generated file detection: package-lock.json, yarn.lock, source maps, minified assets, and dist/ output get static messages, so no LLM call is spent on them.
Large diff handling: oversized diffs are chunked by hunk and summarised in batches; a file with a single oversized hunk falls back to a static chore: update <filename> message.
Background daemon mode: keeps the model loaded between commits so subsequent runs skip the model load.
GPU inference with automatic CPU fallback: offloads all model layers to the GPU by default and restarts in CPU-only mode if VRAM is insufficient.
Debug logging: optionally captures every LLM prompt and output to dated log files in ~/.brancher-mini/logs/. Crash logs are always written.
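To make the message format concrete, here is a minimal sketch of parsing a Conventional Commits header of the form type(scope)!: description. parseHeader is a hypothetical helper for illustration, not part of brancher-mini, and it checks only the header line, not the body or footers.

```typescript
interface CommitHeader {
  type: string;
  scope: string | null;
  breaking: boolean;
  description: string;
}

// Header shape from the Conventional Commits spec: type(scope)!: description
const HEADER = /^([a-zA-Z]+)(?:\(([^)]+)\))?(!)?: (.+)$/;

function parseHeader(line: string): CommitHeader | null {
  const m = HEADER.exec(line);
  if (!m) return null; // not a Conventional Commits header
  return {
    type: m[1],
    scope: m[2] ?? null,
    breaking: m[3] === "!",
    description: m[4],
  };
}
```

For example, "feat(parser): add array support" parses to type feat with scope parser, while a bare "update stuff" is rejected.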

Reference

Usage

Generate a commit message

bash

brancher-mini                    # analyse staged changes
brancher-mini --all              # stage everything, then generate
brancher-mini --mode extended    # one message per logical change group
brancher-mini --message-only     # print message to stdout, skip prompts
brancher-mini --single-line      # title only, no body

Configuration

Run brancher-mini config to open the interactive setup wizard. Settings are saved to ~/.brancher-minirc.json.

Option	Values	Default
`model`	`gemma-4:e4b` (Gemma 3 4B) · `qwen2.5-coder:3b` (Qwen2.5-Coder 3B)	Set at first run
`gitBehavior`	`staged-only` · `stage-all`	Set at first run
`executionMode`	`load-on-exec` · `always-loaded`	Set at first run
`telemetry`	`true` · `false`	`true` (stable releases only)
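The options in the table can be modelled as a typed object with a small validator. This is a sketch: the option names and values come from the table, but the flat JSON layout of ~/.brancher-minirc.json and the helper names are assumptions.

```typescript
// Keys and values mirror the table above; the flat JSON layout is an assumption.
interface BrancherConfig {
  model: "gemma-4:e4b" | "qwen2.5-coder:3b";
  gitBehavior: "staged-only" | "stage-all";
  executionMode: "load-on-exec" | "always-loaded";
  telemetry: boolean;
}

const ALLOWED: Record<string, readonly string[]> = {
  model: ["gemma-4:e4b", "qwen2.5-coder:3b"],
  gitBehavior: ["staged-only", "stage-all"],
  executionMode: ["load-on-exec", "always-loaded"],
};

// Returns a list of problems; an empty list means the object looks valid.
function validateConfig(raw: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [key, allowed] of Object.entries(ALLOWED)) {
    if (!allowed.includes(raw[key] as string)) {
      problems.push(`${key} must be one of: ${allowed.join(", ")}`);
    }
  }
  if (typeof raw["telemetry"] !== "boolean") problems.push("telemetry must be true or false");
  return problems;
}

function parseConfig(json: string): BrancherConfig {
  const raw = JSON.parse(json) as Record<string, unknown>;
  const problems = validateConfig(raw);
  if (problems.length > 0) throw new Error(problems.join("; "));
  return raw as unknown as BrancherConfig;
}
```

Running brancher-mini config remains the supported way to change settings; a validator like this only illustrates which values are accepted.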

Daemon

Daemon mode keeps the model loaded between commits for instant generation. It starts automatically when executionMode is set to always-loaded.

bash

brancher-mini daemon status    # check if daemon is running
brancher-mini daemon stop      # stop the daemon

Debug logging

When enabled, every commit run appends a structured log to ~/.brancher-mini/logs/YYYY-MM-DD.log containing the full LLM prompt, output, and any crash details. Crash logs are always written regardless of the debug flag.
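A short sketch of the two rules in this paragraph: the dated log path, and the fact that crash entries bypass the debug flag. The function names and the LogKind type are hypothetical, and using the local calendar date (rather than UTC) is an assumption.

```typescript
type LogKind = "prompt" | "output" | "crash";

// Builds <home>/.brancher-mini/logs/YYYY-MM-DD.log for a given date.
// Using the local calendar date is an assumption; the tool might use UTC.
function logPathFor(date: Date, homeDir: string): string {
  const yyyy = date.getFullYear();
  const mm = String(date.getMonth() + 1).padStart(2, "0");
  const dd = String(date.getDate()).padStart(2, "0");
  return `${homeDir}/.brancher-mini/logs/${yyyy}-${mm}-${dd}.log`;
}

// Crash entries are always written; everything else needs the debug flag.
function shouldWrite(kind: LogKind, debugEnabled: boolean): boolean {
  return kind === "crash" || debugEnabled;
}
```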

bash

brancher-mini debug enable     # turn on verbose logging
brancher-mini debug disable    # turn off verbose logging
brancher-mini debug status     # show log file count and size
brancher-mini debug clear      # delete all log files

Under the hood

Tech stack

brancher-mini is a single self-contained binary with no runtime dependencies for end users.

Runtime

Bun + TypeScript — compiled to a self-contained native binary (~60–80 MB). No Node or Bun required on the end-user's machine.

LLM inference

llama-server (llama.cpp's built-in HTTP server) — exposes an OpenAI-compatible API on localhost. Bundled in the release tarball.
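Because the API is OpenAI-compatible, a request to the local server can be sketched as an ordinary chat-completions call. The prompt wording, the temperature, and the buildCommitRequest helper are assumptions for illustration; port 8080 is llama-server's default, and brancher-mini may use a different one.

```typescript
interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds (but does not send) a chat-completions request for a local llama-server.
function buildCommitRequest(diff: string, port = 8080): ChatRequest {
  const body = {
    messages: [
      // Assumed prompt; the real prompt used by brancher-mini is not documented here.
      { role: "system", content: "Write a Conventional Commit message for this diff." },
      { role: "user", content: diff },
    ],
    temperature: 0.2, // assumed value, for illustration
  };
  return {
    url: `http://127.0.0.1:${port}/v1/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

The returned url and init can be passed straight to fetch; in the OpenAI-style response, the generated text is found under choices[0].message.content.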

Models

GGUF Q4_K_M quantised models from Hugging Face, downloaded on first use and cached locally. Each file is verified with a GGUF magic-byte check after download.

CLI framework

Commander.js for subcommands and flags. Inquirer (@inquirer/select, @inquirer/input) for interactive prompts.

The binary and llama-server are packaged together in a platform-specific .tar.gz and installed by install.sh (Linux/macOS) or install.ps1 (Windows). No package manager, no runtime, no PATH gymnastics beyond a single symlink.