If you've ever needed to check whether a set of files has changed — to invalidate a cache, skip redundant builds, or trigger incremental CI — fast-fs-hash is for you.
It hashes hundreds of files in milliseconds using xxHash3-128 via a native C++ addon with SIMD acceleration.
xxHash3 is a non-cryptographic hash function — it is not suitable for security purposes, but it is more than enough for cache invalidation, deduplication, and change detection, which is what this library is designed for.
Zero external dependencies. Requires Node.js >= 22.
```sh
npm install fast-fs-hash
```
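For a quick first taste, hash a single file (using digestFileToHex, one of the digest helpers documented below):

```ts
import { digestFileToHex } from "fast-fs-hash";

// 32-char hex xxHash3-128 digest of the file's contents
console.log(await digestFileToHex("package.json"));
```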
The native addon is prebuilt for common platforms via platform-specific optional
dependencies. When you run npm install, npm automatically installs only the
package matching your current OS and architecture.
Supported platforms: macOS, Linux (glibc & musl), Windows, FreeBSD — both x64 and arm64.
On x64, optimized variants for AVX2 and AVX-512 are included and selected automatically at load time via native CPUID detection. Set FAST_FS_HASH_ISA=avx2|avx512|baseline to override.
CI note: Some CI configurations disable optional dependencies by default
(e.g. npm install --no-optional or --omit=optional). To get the native addon
in CI, either allow optional dependencies or install the platform package explicitly:
```sh
npm install @fast-fs-hash/fast-fs-hash-node-linux-x64-gnu
```
FileHashCache reads, validates, and writes a compact binary cache file that tracks per-file
stat metadata (inode, mtime, ctime, size) and content hashes (xxHash3-128).
On the next run it re-stats every tracked file and compares — files whose stat matches are skipped entirely (no re-read), giving near-instant validation for large file sets.
Build systems, code generators, and CI pipelines often produce output that depends on many input files. Recomputing that output on every run is expensive — even when nothing changed.
FileHashCache solves this by persisting a fingerprint of all input files between runs.
On the next invocation, it checks whether any input changed in sub-millisecond time
(stat-only, no re-reading). If nothing changed, you skip the expensive step entirely.
The cache file also supports user data — opaque binary payloads stored alongside the
file hashes. This lets you embed build output manifests, dependency graphs, or configuration
snapshots directly in the cache, so a single open() tells you both "did anything change?"
and "what was the previous result?" — no separate metadata files needed.
Hashing is fast, but reading thousands of files from disk is not. FileHashCache avoids
re-reading files that haven't changed by comparing stat() metadata first. Only files with
changed stat are re-hashed. This makes cache validation O(n × stat) instead of
O(n × read + hash) — typically 10-100× faster for warm caches.
Native (C++ addon):
| Scenario | Mean | Hz | Files/s | Throughput |
|---|---|---|---|---|
| no change | 0.6 ms (624.5 µs) | 1 601 op/s | 1 128 991 files/s | — |
| 1 file changed | 0.9 ms (922.7 µs) | 1 084 op/s | 764 054 files/s | — |
| many files changed | 2.5 ms (2 517.0 µs) | 397 op/s | 280 094 files/s | 9.8 GB/s |
| no existing cache | 8.0 ms (8 015.1 µs) | 125 op/s | 87 959 files/s | 3.1 GB/s |
| overwrite | 7.9 ms (7 915.8 µs) | 126 op/s | 89 062 files/s | 3.1 GB/s |
Node.js v22.22.2, Vitest 4.x — Apple M4 Max, macOS 25.4.0 (arm64), with anti-virus enabled.
Results vary by hardware, file sizes, and OS cache state.
A long-lived cache that tracks file content hashes with exclusive OS-level locking.
Create the instance once, then call open() on each build cycle. Configuration
(files, version, fingerprint) is set via the constructor, setters, or configure().
Typical flow: the file list is only known after a build step. Open without files
(this reuses the list from the previous cache on disk), then set the new file list before writing.
Use compressedPayloads (LZ4-compressed inside the cache body) or
uncompressedPayloads (stored raw, readable without decompression) to store
arbitrary build metadata alongside the cache.
```ts
import { FileHashCache } from "fast-fs-hash";

const cache = new FileHashCache({
  cachePath: ".cache/build.fsh",
  rootPath: ".",
  version: 1,
});

export async function build() {
  using session = await cache.open();
  if (session.status === "upToDate" && session.compressedPayloads.length > 0) {
    return JSON.parse(session.compressedPayloads[0].toString()); // cached result
  }
  const result = await runBuild();
  cache.configure({ files: result.getSourceFiles().map((f) => f.fileName) });
  await session.write({
    compressedPayloads: [Buffer.from(JSON.stringify(result.output))],
  });
  return result.output;
}
```
When the file list is known upfront, pass it to the constructor:
```ts
import { FileHashCache } from "fast-fs-hash";
import { globSync } from "node:fs";

const cache = new FileHashCache({
  cachePath: ".cache/build.fsh",
  rootPath: ".",
  files: globSync("src/**/*.ts"),
  version: 1,
});

using session = await cache.open();
if (session.status === "upToDate") {
  console.log("Build cache is fresh — skipping.");
} else {
  console.log("Files changed — rebuilding...");
  await runBuild();
  await session.write();
}
```
Constructor: new FileHashCache({ cachePath, files?, rootPath?, version?, fingerprint?, lockTimeoutMs? })
Cache configuration (mutable between opens):
- configure(opts) — set multiple config fields at once: files, rootPath, version, fingerprint, lockTimeoutMs
- cache.files, cache.rootPath, cache.version, cache.fingerprint, cache.lockTimeoutMs — individual setters
- needsOpen — true when config changed since the last open, or the cache was never opened (see the sketch below)
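For example, a long-lived process can reconfigure between cycles and consult needsOpen (a sketch; fileList is illustrative):

```ts
cache.configure({ files: fileList, version: 2 });
if (cache.needsOpen) {
  using session = await cache.open();
  if (session.needsWrite) await session.write();
}
```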
Cache methods:

- open(signal?) — acquires an exclusive lock, reads from disk, validates version/fingerprint, stat-matches entries. Returns a FileHashCacheSession.
- overwrite(options?) — writes a brand-new cache without reading the old one. Options: payloadValue0..3, compressedPayloads, uncompressedPayloads, signal, lockTimeoutMs.
- invalidate(paths) / invalidateAll() — mark files as dirty for the next open (watch mode; see the sketch below).
- isLocked() / waitUnlocked(timeout?, signal?) — check or wait for the lock.
- checkCacheFile() — sync stat check whether the cache file on disk changed since the last open.
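In watch mode, watcher events can be forwarded to invalidate() so the next open() treats those files as dirty. A sketch (watcher and rebuild are whatever your tooling provides):

```ts
watcher.on("change", (path: string) => cache.invalidate([path]));

// Later, on the next build cycle:
using session = await cache.open();
if (session.status !== "upToDate") {
  await rebuild(); // hypothetical
  await session.write();
}
```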
Session properties (read-only, from disk):

- status — 'upToDate' | 'changed' | 'stale' | 'missing' | 'statsDirty' | 'lockFailed'
- needsWrite — true if the session holds the lock and the status indicates changes
- configChanged — true if the cache config was modified since this session was opened
- wouldNeedWrite — true if either files changed on disk or config changed
- busy / disposed — async operation state
- files, fileCount, version, rootPath
- payloadValue0..3 — four f64 numeric values read from disk
- compressedPayloads — array of LZ4-compressed binary Buffer payloads read from disk
- uncompressedPayloads — array of raw binary Buffer payloads readable without LZ4 decompression
Session methods:

- write(options?) — hashes unresolved entries, compresses, writes to disk, releases the lock. Can only be called once. Options: payloadValue0..3, compressedPayloads, uncompressedPayloads, signal.
- resolve(signal?) — completes stat + hash for ALL files, returns FileHashCacheEntries. Can be called before write(). See below.
- close() — releases the lock. Also called automatically by using.
Static methods:

- FileHashCache.isLocked(cachePath) — check if the cache is locked by another process
- FileHashCache.waitUnlocked(cachePath, lockTimeoutMs?, signal?) — wait for unlock

Lock behavior:

- Locking uses flock(2) (POSIX) / LockFileEx (Windows).
- worker_threads in the same process serialize correctly against each other.
- lockTimeoutMs: -1 = block forever (default), 0 = non-blocking, >0 = timeout in ms.
- If the lock cannot be acquired, the session opens with status === 'lockFailed'; calling write() then falls back to overwrite() (see the sketch below).
- Waiting is cancellable via an AbortSignal on open(), overwrite(), and waitUnlocked().
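A sketch of non-blocking lock handling built from the pieces above (cache path and timeout are illustrative):

```ts
cache.configure({ lockTimeoutMs: 0 }); // don't wait for a contended lock
using session = await cache.open();
if (session.status === "lockFailed") {
  // Another process holds the lock; wait up to 5 s for it to finish.
  await FileHashCache.waitUnlocked(".cache/build.fsh", 5_000);
}
```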
resolve()

After open(), the session knows the aggregate status but not which specific files
changed. Call resolve() to complete stat + hash for every file and get per-file metadata.
Note: resolve() stats and hashes every unresolved file on the thread pool. This has
a cost proportional to the number of changed files. Use it only when you need per-file
information — for simple "changed → rebuild all" workflows, just check session.status.
```ts
using session = await cache.open();
if (session.status !== "upToDate") {
  const entries = await session.resolve();
  for (const entry of entries) {
    if (entry.changed) {
      console.log(
        `Changed: ${entry.path} (${entry.size} bytes, hash: ${entry.contentHashHex})`,
      );
    }
  }
  await session.write();
}
```
Each FileHashCacheEntry provides:
- path — absolute file path
- size — file size in bytes
- mtimeMs / ctimeMs — modification / change time in milliseconds
- changed — true if the content differs from the cached version (or the file is new)
- contentHash — 16-byte xxHash3-128 as a Buffer (zero-copy view)
- contentHashHex — 32-char hex string (lazy, computed on first access)

FileHashCacheEntries supports get(index), find(path), and iteration.
The result is cached — subsequent calls to resolve() return the same snapshot.
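find() is handy when only a few inputs matter. A sketch continuing the session above (the path form accepted by find() is assumed here):

```ts
const entries = await session.resolve(); // returns the cached snapshot if already resolved
const entry = entries.find("src/index.ts");
if (entry?.changed) {
  console.log(`Rebuild needed: ${entry.path} → ${entry.contentHashHex}`);
}
```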
When you don't need a persistent cache file — or you want raw xxHash3-128 digests to
compare yourself — use the digest functions directly. FileHashCache uses them under the
hood, but they are fully usable on their own.
large file (~197.3 KB):
| Scenario | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|
| native | 0.04 ms (40.8 µs) | 24 489 op/s | 4.8 GB/s | 6.8× faster |
| Node.js crypto (md5) | 0.3 ms (276.7 µs) | 3 614 op/s | 713 MB/s | baseline |
medium file (~49.9 KB):
| Scenario | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|
| native | 0.03 ms (25.9 µs) | 38 653 op/s | 1.9 GB/s | 4.3× faster |
| Node.js crypto (md5) | 0.1 ms (111.4 µs) | 8 977 op/s | 448 MB/s | baseline |
small file (~1.0 KB):
| Scenario | Mean | Hz | Relative |
|---|---|---|---|
| Node.js crypto (md5) | 0.06 ms (56.0 µs) | 17 871 op/s | 266.1× faster |
| native | 14.9 ms (14 891.2 µs) | 67 op/s | baseline |
many files:
| Scenario | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|
| native | 7.9 ms (7 948.8 µs) | 126 op/s | 3.1 GB/s | 4.4× faster |
| Node.js crypto (md5) | 34.7 ms (34 657.6 µs) | 29 op/s | 713 MB/s | baseline |
64 KB buffer:
| Scenario | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|
| native XXH3-128 | 0.001 ms (1.4 µs) | 704 869 op/s | 46.2 GB/s | 49.3× faster |
| Node.js crypto md5 | 0.07 ms (70.0 µs) | 14 288 op/s | 936 MB/s | baseline |
1 MB buffer:
| Scenario | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|
| native XXH3-128 | 0.02 ms (22.0 µs) | 45 515 op/s | 47.7 GB/s | 49.1× faster |
| Node.js crypto md5 | 1.1 ms (1 078.2 µs) | 927 op/s | 973 MB/s | baseline |
```ts
import { digestFilesParallel, hashToHex } from "fast-fs-hash";

const digest = await digestFilesParallel([
  "package.json",
  "src/index.ts",
  "src/utils.ts",
]);
console.log("Aggregate:", hashToHex(digest));
```
Sequential variant (feeds files into a single running hash):
```ts
import { digestFilesSequential, hashToHex } from "fast-fs-hash";

const digest = await digestFilesSequential(["package.json", "src/index.ts"]);
console.log(hashToHex(digest));
```
Single file:

```ts
import { digestFile, hashToHex } from "fast-fs-hash";

const digest = await digestFile("package.json");
console.log(hashToHex(digest));
```
Hex convenience wrappers:

```ts
import { digestFileToHex, digestFilesToHexArray } from "fast-fs-hash";

// Single file → 32-char hex string
const hex = await digestFileToHex("package.json");

// Multiple files in parallel → per-file hex strings
const hexes = await digestFilesToHexArray(["src/a.ts", "src/b.ts"], 8);
```
| Function | Description |
|---|---|
| digestFileToHex(path, throwOnError?) | Hash a file → 32-char hex string. Wrapper around digestFile + hashToHex |
| digestFilesToHexArray(paths, concurrency?, throwOnError?) | Hash files in parallel → per-file hex strings. Default concurrency 8 |
```ts
import { digestBuffer, digestString } from "fast-fs-hash";

const myBuffer = Buffer.from("some bytes"); // any Buffer works
const d1 = digestBuffer(myBuffer);
const d2 = digestString("hello world");
console.log(d2.toString("hex"));
```
For combining file hashes with extra data (config, environment, etc.):
```ts
import { XxHash128Stream } from "fast-fs-hash";

const h = new XxHash128Stream();
h.addString("my-config-v2");
await h.addFiles(["src/index.ts", "src/utils.ts"]);
console.log(h.digest().toString("hex"));
```
Busy guard: Async methods (addFile, addFiles, addFilesParallel) mark the instance
as busy while the native worker thread is processing. During this time, calling any
synchronous method or starting another async operation will throw an error. Always await
each async call before invoking another method. Use the busy getter to check:
```ts
const h = new XxHash128Stream();
const promise = h.addFile("large.bin");
console.log(h.busy); // true — async operation in flight
// h.addString("oops"); // would throw!
await promise;
console.log(h.busy); // false — safe to use again
```
fast-fs-hash exposes the LZ4 block compression API used internally for the cache file format. Both synchronous and asynchronous (pool-thread) variants are available.
LZ4 block format does not embed the uncompressed size — the caller must store it alongside the compressed data and pass it to the decompression function.
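A common pattern is a small length-prefixed frame: store the uncompressed size in front of the block and read it back before decompressing. A sketch (this framing is illustrative, not the library's cache format):

```ts
import { lz4CompressBlock, lz4DecompressBlock } from "fast-fs-hash";

function pack(input: Buffer): Buffer {
  const header = Buffer.alloc(4);
  header.writeUInt32LE(input.length, 0); // remember the uncompressed size
  return Buffer.concat([header, lz4CompressBlock(input)]);
}

function unpack(frame: Buffer): Buffer {
  const uncompressedSize = frame.readUInt32LE(0);
  return lz4DecompressBlock(frame, uncompressedSize, 4); // skip the 4-byte header
}
```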
compress 64 KB:
| Scenario | Ratio | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|---|
| native LZ4 | 0.7% | 0.004 ms (3.5 µs) | 285 548 op/s | 18.7 GB/s | 7.5× faster |
| Node.js deflate level=1 | 1.0% | 0.03 ms (26.3 µs) | 37 993 op/s | 2.5 GB/s | baseline |
decompress 64 KB:
| Scenario | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|
| native LZ4 | 0.002 ms (2.4 µs) | 417 766 op/s | 27.4 GB/s | 3.8× faster |
| Node.js deflate | 0.009 ms (9.2 µs) | 108 604 op/s | 7.1 GB/s | baseline |
compress 1 MB:
| Scenario | Ratio | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|---|
| native LZ4 | 0.4% | 0.04 ms (35.2 µs) | 28 424 op/s | 29.8 GB/s | 10.5× faster |
| Node.js deflate level=1 | 0.7% | 0.4 ms (370.6 µs) | 2 698 op/s | 2.8 GB/s | baseline |
decompress 1 MB:
| Scenario | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|
| native LZ4 | 0.03 ms (31.3 µs) | 31 923 op/s | 33.5 GB/s | 2.6× faster |
| Node.js deflate | 0.08 ms (80.4 µs) | 12 432 op/s | 13.0 GB/s | baseline |
```ts
import {
  lz4CompressBlock,
  lz4DecompressBlock,
  lz4CompressBound,
} from "fast-fs-hash";

const input = Buffer.from("Hello, LZ4!");
const compressed = lz4CompressBlock(input);
const decompressed = lz4DecompressBlock(compressed, input.length);
console.log(decompressed.toString()); // "Hello, LZ4!"
```
| Function | Description |
|---|---|
| lz4CompressBlock(input, offset?, length?) | Sync compress → new Buffer |
| lz4CompressBlockTo(input, output, outputOffset?, inputOffset?, inputLength?) | Sync compress into a pre-allocated buffer → bytes written |
| lz4CompressBlockAsync(input, offset?, length?) | Async compress on a pool thread → Promise<Buffer> |
| lz4DecompressBlock(input, uncompressedSize, offset?, length?) | Sync decompress → new Buffer |
| lz4DecompressBlockTo(input, uncompressedSize, output, outputOffset?, inputOffset?, inputLength?) | Sync decompress into a pre-allocated buffer → bytes written |
| lz4DecompressBlockAsync(input, uncompressedSize, offset?, length?) | Async decompress on a pool thread → Promise<Buffer> |
| lz4CompressBound(inputSize) | Max compressed size for pre-allocation |
| lz4ReadAndCompress(path) | Read a file and LZ4-compress it on a pool thread → Promise<{data, uncompressedSize}> |
| lz4DecompressAndWrite(compressedData, uncompressedSize, path) | Decompress and write to a file on a pool thread (creates dirs) → Promise<boolean> |
Note: LZ4 block compression supports inputs up to ~1.9 GiB (LZ4_MAX_INPUT_SIZE = 0x7E000000). lz4ReadAndCompress and lz4DecompressAndWrite support files up to 512 MiB.
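When compressing many blocks, a scratch buffer sized with lz4CompressBound can be reused via the *To variants to avoid per-call allocations. A sketch:

```ts
import { lz4CompressBound, lz4CompressBlockTo } from "fast-fs-hash";

const input = Buffer.from("data to compress");
const scratch = Buffer.alloc(lz4CompressBound(input.length)); // worst-case output size
const written = lz4CompressBlockTo(input, scratch); // returns bytes written
const compressed = scratch.subarray(0, written); // zero-copy view of the result
```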
lz4ReadAndCompress reads a file and LZ4-block-compresses it in a single pool-thread operation —
no JS-thread I/O, no intermediate Buffer allocation visible to the event loop.
```ts
import {
  lz4ReadAndCompress,
  lz4DecompressAndWrite,
  lz4DecompressBlock,
} from "fast-fs-hash";

const { data, uncompressedSize } = await lz4ReadAndCompress("large-file.bin");
console.log(`Compressed ${uncompressedSize} → ${data.length} bytes`);

// Decompress back to a file (creates parent directories if needed)
await lz4DecompressAndWrite(data, uncompressedSize, "restored-file.bin");

// Or decompress to a buffer in memory
const original = lz4DecompressBlock(data, uncompressedSize);
```
Compare two files for byte-equality asynchronously on a native pool thread. Opens both files,
compares sizes via fstat, then reads in lockstep chunks with memcmp. Returns false if
either file cannot be opened/read or if sizes differ — never throws.
equal files (~49.9 KB):
| Scenario | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|
| native | 0.04 ms (40.5 µs) | 24 691 op/s | 1.2 GB/s | 2.5× faster |
| Node.js (fs.open + read + compare) | 0.1 ms (101.0 µs) | 9 901 op/s | 494 MB/s | baseline |
equal files (~197.3 KB):
| Scenario | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|
| native | 0.05 ms (50.4 µs) | 19 832 op/s | 3.9 GB/s | 2.9× faster |
| Node.js (fs.open + read + compare) | 0.1 ms (148.4 µs) | 6 737 op/s | 1.3 GB/s | baseline |
different content, same size (~49.9 KB):
| Scenario | Mean | Hz | Throughput | Relative |
|---|---|---|---|---|
| Node.js (fs.open + read + compare) | 0.1 ms (100.7 µs) | 9 935 op/s | 496 MB/s | 15.6× faster |
| native | 1.6 ms (1 574.5 µs) | 635 op/s | 32 MB/s | baseline |
different sizes (early exit):
| Scenario | Mean | Hz | Relative |
|---|---|---|---|
| native | 0.04 ms (35.7 µs) | 28 038 op/s | 2.5× faster |
| Node.js (fs.open + read + compare) | 0.09 ms (88.8 µs) | 11 261 op/s | baseline |
```ts
import { filesEqual } from "fast-fs-hash";

if (await filesEqual("output.bin", "expected.bin")) {
  console.log("Files are identical");
} else {
  console.log("Files differ (or one doesn't exist)");
}
```
| Function | Description |
|---|---|
| filesEqual(pathA, pathB) | Async byte-equality check on a pool thread → Promise<boolean> |
Walk the parent chain from a start path and locate project markers in a single pass: .git,
package.json, tsconfig.json, and node_modules/. Reports nearest* (first hit walking up)
and root* (last hit, bounded by the enclosing .git) for each marker, plus gitRoot and
gitSuperRoot for submodule/worktree awareness.
The walk stops at the filesystem root, the user's home directory (or any ancestor of it), an
optional stopPath, and a depth cap of 128 (symlink-loop defense). Tolerant of missing paths —
if startPath doesn't exist, the walk begins from its longest existing ancestor and missing
markers are returned as null rather than thrown.
shallow (3 levels deep):
| Scenario | Mean | Hz | Relative |
|---|---|---|---|
| native (sync) | 0.04 ms (42.5 µs) | 23 522 op/s | 6.5× faster |
| native (async) | 0.05 ms (54.9 µs) | 18 226 op/s | 5.0× faster |
| Node.js (sync, fs.statSync) | 0.1 ms (115.2 µs) | 8 679 op/s | 2.4× faster |
| Node.js (async, fs.stat) | 0.3 ms (274.2 µs) | 3 647 op/s | baseline |
deep (12 levels deep):
| Scenario | Mean | Hz | Relative |
|---|---|---|---|
| native (sync) | 0.09 ms (91.7 µs) | 10 911 op/s | 7.8× faster |
| native (async) | 0.1 ms (101.0 µs) | 9 906 op/s | 7.1× faster |
| Node.js (sync, fs.statSync) | 0.3 ms (331.5 µs) | 3 017 op/s | 2.2× faster |
| Node.js (async, fs.stat) | 0.7 ms (719.4 µs) | 1 390 op/s | baseline |
missing start path (tolerant fallback):
| Scenario | Mean | Hz | Relative |
|---|---|---|---|
| native (sync) | 0.05 ms (50.3 µs) | 19 879 op/s | 2.7× faster |
| Node.js (sync, fs.statSync) | 0.1 ms (133.5 µs) | 7 489 op/s | baseline |
```ts
import { findProjectRoot, findProjectRootSync } from "fast-fs-hash";

// Sync (recommended for startup-time / build-tool use)
const info = findProjectRootSync(import.meta.dirname);
console.log(info.gitRoot, info.rootPackageJson, info.nearestTsconfigJson);

// Async — runs on the native thread pool, useful on cold or networked filesystems
const info2 = await findProjectRoot("/some/deep/file.ts");

// Optional stopPath: halt when the walker reaches this directory (or any ancestor of it)
const info3 = findProjectRootSync(import.meta.dirname, "/workspace");
```
| Function | Description |
|---|---|
| findProjectRootSync(startPath, stopPath?) | Walk the parent chain for project markers (sync) → ProjectRoot |
| findProjectRoot(startPath, stopPath?) | Walk the parent chain on a pool thread → Promise<ProjectRoot> |
The returned ProjectRoot object has these fields (each string | null):
| Field | Description |
|---|---|
| gitRoot | Innermost .git (dir or file). Matches git rev-parse --show-toplevel. |
| gitSuperRoot | Outermost .git directory. Non-null only in submodules / nested worktrees. |
| nearestPackageJson | First package.json walking up. |
| rootPackageJson | Last package.json walking up, bounded by gitRoot. |
| nearestTsconfigJson | First tsconfig.json walking up. |
| rootTsconfigJson | Last tsconfig.json walking up, bounded by gitRoot. |
| nearestNodeModules | First node_modules/ directory walking up (also detects when started inside one). |
| rootNodeModules | Last node_modules/ walking up, bounded by gitRoot. |
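Since every field can be null, root selection is typically a fallback chain. A sketch:

```ts
import { findProjectRootSync } from "fast-fs-hash";

const info = findProjectRootSync(process.cwd());
const projectRoot = info.gitRoot ?? process.cwd(); // not inside a git repo → fall back
if (info.gitSuperRoot) {
  // Non-null only inside a submodule or nested worktree.
  console.log(`Nested under super-repo: ${info.gitSuperRoot}`);
}
```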
| Function | Description |
|---|---|
| hashToHex(digest) | Convert a 16-byte digest to a 32-char hex string |
| hashesToHexArray(digests) | Convert an array of digests to hex strings |
| findCommonRootPath(files, baseRoot?, allowedRoot?) | Longest common parent directory of file paths |
| normalizeFilePaths(rootPath, files) | Resolve, sort, and deduplicate paths relative to a root |
| toRelativePath(rootPath, filePath) | Single path → clean unix-style relative path (or null) |
| threadPoolTrim() | Wake idle native pool threads so they self-terminate and free memory |
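A sketch combining the path helpers (return shapes are assumed from the descriptions above):

```ts
import { findCommonRootPath, normalizeFilePaths, toRelativePath } from "fast-fs-hash";

const files = ["./src/b.ts", "src/a.ts", "src/a.ts"];
const root = findCommonRootPath(files); // longest common parent directory
const normalized = normalizeFilePaths(root, files); // resolved, sorted, deduplicated
console.log(toRelativePath(root, normalized[0])); // unix-style relative path, or null
```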
| Variable | Default | Description |
|---|---|---|
| FAST_FS_HASH_ISA | auto-detect | Override the SIMD variant: avx512, avx2, or baseline (x64 only) |
| FAST_FS_HASH_POOL_IDLE_TIMEOUT_MS | 15000 | Idle timeout for native pool threads (1–3600000 ms). Threads self-terminate after this duration with no work and respawn automatically when new work arrives. |
The native C++ backend bundles xxHash and LZ4.
See NOTICES.md for full license texts.
| Tool | Version | Install |
|---|---|---|
| Node.js | >= 22 | nodejs.org |
| npm | >= 9 | bundled with Node.js |
| CMake | >= 3.15 | brew install cmake / apt install cmake / cmake.org |
| C++20 compiler | Clang 14+ / GCC 12+ / MSVC 2022 | Xcode CLT / build-essential / Visual Studio |
```sh
git clone --recurse-submodules https://github.com/SalvatorePreviti/fast-fs-hash.git
cd fast-fs-hash

npm install
npm run build:all   # compile C++ addon + TypeScript
npm test            # run tests
npm run bench       # run benchmarks
```
Note: git clone --recurse-submodules is required to pull deps/xxHash (the xxHash source used by the native addon).
The deps/xxHash/ directory is a git submodule pointing to xxHash v0.8.3.
If you cloned without --recurse-submodules, initialize the submodule manually:
```sh
git submodule update --init --recursive
```
See package.json for the full list of available build scripts.
- main — development branch. CI runs lint, typecheck, tests, and builds native binaries for all platforms on every push and PR.
- publish — release branch. Pushing to publish triggers the full CI pipeline. After all builds and tests pass, a dry-run publish verifies all packages. An admin must then manually approve the publish job (via the npm-publish GitHub environment) to publish to npm, create a git tag, and deploy docs.

npm packages are published with provenance attestations via GitHub Actions OIDC — no npm tokens are stored in CI.
- publish: require PR reviews, require status checks to pass, and restrict push access to admins only.
- npm-publish: create under Settings → Environments with "Required reviewers" restricted to trusted maintainers.
- Configure each @fast-fs-hash/* package on npmjs.com to trust the npm-publish environment from this repository.

MIT — Copyright (c) 2025-present Salvatore Previti