fast-fs-hash - v0.0.0-rc4

If you've ever needed to check whether a set of files has changed — to invalidate a cache, skip redundant builds, or trigger incremental CI — fast-fs-hash is for you.

    It hashes hundreds of files in milliseconds using xxHash3-128 via a native C++ addon with SIMD acceleration.

    xxHash3 is a non-cryptographic hash function — it is not suitable for security purposes, but it is more than enough for cache invalidation, deduplication, and change detection, which is what this library is designed for.

    Zero external dependencies. Requires Node.js >= 22.

    npm install fast-fs-hash
    


    The native addon is prebuilt for common platforms via platform-specific optional dependencies. When you run npm install, npm automatically installs only the package matching your current OS and architecture.

    Supported platforms: macOS, Linux (glibc & musl), Windows, FreeBSD — both x64 and arm64.

    On x64, optimized variants for AVX2 and AVX-512 are included and selected automatically at load time via native CPUID detection. Set FAST_FS_HASH_ISA=avx2|avx512|baseline to override.

    CI note: Some CI configurations disable optional dependencies by default (e.g. npm install --no-optional or --omit=optional). To get the native addon in CI, either allow optional dependencies or install the platform package explicitly:

    npm install @fast-fs-hash/fast-fs-hash-node-linux-x64-gnu
    

    FileHashCache reads, validates, and writes a compact binary cache file that tracks per-file stat metadata (inode, mtime, ctime, size) and content hashes (xxHash3-128).

    On the next run it re-stats every tracked file and compares — files whose stat matches are skipped entirely (no re-read), giving near-instant validation for large file sets.

    Build systems, code generators, and CI pipelines often produce output that depends on many input files. Recomputing that output on every run is expensive — even when nothing changed.

    FileHashCache solves this by persisting a fingerprint of all input files between runs. On the next invocation, it checks whether any input changed in sub-millisecond time (stat-only, no re-reading). If nothing changed, you skip the expensive step entirely.

    Common use cases:

    • Incremental builds: track source files → skip compilation when inputs are unchanged
    • Generated output caching: store a compiled bundle, generated types, or processed assets alongside the cache — rebuild only when dependencies change
    • CI artifact caching: validate whether a cached artifact is still fresh before uploading or downloading a new one
    • Multi-step pipelines: each stage writes its own cache file, checked independently

    The cache file also supports user data — opaque binary payloads stored alongside the file hashes. This lets you embed build output manifests, dependency graphs, or configuration snapshots directly in the cache, so a single open() tells you both "did anything change?" and "what was the previous result?" — no separate metadata files needed.

Hashing is fast, but reading thousands of files from disk is not. FileHashCache avoids re-reading files that haven't changed by comparing stat() metadata first. Only files whose stat changed are re-hashed. This makes cache validation O(n × stat) instead of O(n × (read + hash)) — typically 10-100× faster for warm caches.
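The stat-first strategy can be sketched in plain Node. This is a simplified illustration, not the library's native implementation; it compares the same fields the cache tracks (size, mtime, ctime, inode):

```typescript
import { statSync } from "node:fs";

// Minimal illustration of stat-based change detection: compare cached
// stat fields first, and only flag files whose metadata differs.
export interface StatFingerprint {
  size: number;
  mtimeMs: number;
  ctimeMs: number;
  ino: number;
}

export function takeFingerprint(path: string): StatFingerprint {
  const s = statSync(path);
  return { size: s.size, mtimeMs: s.mtimeMs, ctimeMs: s.ctimeMs, ino: s.ino };
}

// Returns the paths whose stat no longer matches the cached fingerprint;
// only these would need to be re-read and re-hashed.
export function changedPaths(cached: Map<string, StatFingerprint>): string[] {
  const dirty: string[] = [];
  for (const [path, old] of cached) {
    let now: StatFingerprint;
    try {
      now = takeFingerprint(path);
    } catch {
      dirty.push(path); // deleted or unreadable: treat as changed
      continue;
    }
    if (
      now.size !== old.size ||
      now.mtimeMs !== old.mtimeMs ||
      now.ctimeMs !== old.ctimeMs ||
      now.ino !== old.ino
    ) {
      dirty.push(path);
    }
  }
  return dirty;
}
```

Files that pass this check are never opened, which is what makes warm-cache validation nearly free.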

    Native (C++ addon):

| Scenario | Mean | Hz | Files/s | Throughput |
| --- | --- | --- | --- | --- |
| no change | 0.6 ms (624.5 µs) | 1 601 op/s | 1 128 991 files/s | |
| 1 file changed | 0.9 ms (922.7 µs) | 1 084 op/s | 764 054 files/s | |
| many files changed | 2.5 ms (2 517.0 µs) | 397 op/s | 280 094 files/s | 9.8 GB/s |
| no existing cache | 8.0 ms (8 015.1 µs) | 125 op/s | 87 959 files/s | 3.1 GB/s |
| overwrite | 7.9 ms (7 915.8 µs) | 126 op/s | 89 062 files/s | 3.1 GB/s |

    Node.js v22.22.2, Vitest 4.x — Apple M4 Max, macOS 25.4.0 (arm64), with anti-virus.

    Results vary by hardware, file sizes, and OS cache state.

    A long-lived cache that tracks file content hashes with exclusive OS-level locking. Create the instance once, then call open() on each build cycle. Configuration (files, version, fingerprint) is set via the constructor, setters, or configure().

    The typical usage: the file list is only known after a build step. Open without files (reuses the list from the previous cache on disk), then set the new file list before writing. Use compressedPayloads (LZ4-compressed inside the cache body) or uncompressedPayloads (stored raw, readable without decompression) to store arbitrary build metadata alongside the cache.

```ts
import { FileHashCache } from "fast-fs-hash";

const cache = new FileHashCache({
  cachePath: ".cache/build.fsh",
  rootPath: ".",
  version: 1,
});

export async function build() {
  using session = await cache.open();

  if (session.status === "upToDate" && session.compressedPayloads.length > 0) {
    return JSON.parse(session.compressedPayloads[0].toString()); // cached result
  }

  const result = await runBuild();

  cache.configure({ files: result.getSourceFiles().map((f) => f.fileName) });

  await session.write({
    compressedPayloads: [Buffer.from(JSON.stringify(result.output))],
  });

  return result.output;
}
```

    When the file list is known upfront, pass it to the constructor:

```ts
import { FileHashCache } from "fast-fs-hash";
import { globSync } from "node:fs";

const cache = new FileHashCache({
  cachePath: ".cache/build.fsh",
  rootPath: ".",
  files: globSync("src/**/*.ts"),
  version: 1,
});

using session = await cache.open();

if (session.status === "upToDate") {
  console.log("Build cache is fresh — skipping.");
} else {
  console.log("Files changed — rebuilding...");
  await runBuild();
  await session.write();
}
```

    Constructor: new FileHashCache({ cachePath, files?, rootPath?, version?, fingerprint?, lockTimeoutMs? })

    Cache configuration (mutable between opens):

    • configure(opts) — set multiple config fields at once: files, rootPath, version, fingerprint, lockTimeoutMs
    • Setters: cache.files, cache.rootPath, cache.version, cache.fingerprint, cache.lockTimeoutMs
    • needsOpen — true when config changed since last open, or the cache was never opened

    Cache methods:

    • open(signal?) — acquires an exclusive lock, reads from disk, validates version/fingerprint, stat-matches entries. Returns a FileHashCacheSession.
    • overwrite(options?) — writes a brand-new cache without reading the old one. Options: payloadValue0..3, compressedPayloads, uncompressedPayloads, signal, lockTimeoutMs.
    • invalidate(paths) / invalidateAll() — mark files as dirty for the next open (watch mode).
    • isLocked() / waitUnlocked(timeout?, signal?) — check or wait for lock.
    • checkCacheFile() — sync stat check if the cache file on disk changed since last open.

    Session properties (read-only, from disk):

    • status — 'upToDate' | 'changed' | 'stale' | 'missing' | 'statsDirty' | 'lockFailed'
    • needsWrite — true if the session holds the lock and the status indicates changes
    • configChanged — true if cache config was modified since this session was opened
    • wouldNeedWrite — true if either files changed on disk or config changed
    • busy / disposed — async operation state
    • files, fileCount, version, rootPath
    • payloadValue0..3 — four f64 numeric values read from disk
    • compressedPayloads — array of LZ4-compressed binary Buffer payloads read from disk
    • uncompressedPayloads — array of raw binary Buffer payloads readable without LZ4 decompression

    Session methods:

    • write(options?) — hashes unresolved entries, compresses, writes to disk, releases lock. Can only be called once. Options: payloadValue0..3, compressedPayloads, uncompressedPayloads, signal.
    • resolve(signal?) — completes stat + hash for ALL files, returns FileHashCacheEntries. Can be called before write(). See below.
    • close() — releases the lock. Also called automatically by using.

    Static methods:

    • FileHashCache.isLocked(cachePath) — check if locked by another process
    • FileHashCache.waitUnlocked(cachePath, lockTimeoutMs?, signal?) — wait for unlock

    Lock behavior:

    • Cross-process exclusive lock via flock(2) (POSIX) / LockFileEx (Windows)
    • Cross-thread safe: per-OFD semantics on POSIX and per-handle on Windows make worker_threads in the same process serialize correctly against each other
    • Crash-safe: automatically released when the process dies
    • lockTimeoutMs: -1 = block forever (default), 0 = non-blocking, >0 = timeout ms
    • When lock fails: status === 'lockFailed'. Calling write() falls back to overwrite().
    • Cancellable via AbortSignal on open(), overwrite(), and waitUnlocked()
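The three lockTimeoutMs modes can be mimicked with a plain lock file. This is purely illustrative: fast-fs-hash uses flock(2)/LockFileEx, not a lock file, and acquireLock/releaseLock below are hypothetical helpers:

```typescript
import { openSync, closeSync, unlinkSync } from "node:fs";

// Illustration of the three lockTimeoutMs modes using an advisory lock
// file created with O_EXCL ("wx"). The real library locks the cache file
// itself via flock(2)/LockFileEx; only the timeout semantics match.
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

export async function acquireLock(
  lockPath: string,
  lockTimeoutMs: number, // -1 = block forever, 0 = non-blocking, >0 = timeout
): Promise<boolean> {
  const deadline = lockTimeoutMs > 0 ? Date.now() + lockTimeoutMs : Infinity;
  for (;;) {
    try {
      closeSync(openSync(lockPath, "wx")); // fails with EEXIST if locked
      return true; // lock acquired
    } catch {
      if (lockTimeoutMs === 0) return false; // non-blocking: report failure
      if (Date.now() >= deadline) return false; // timed out
      await sleep(10); // poll until free (or deadline)
    }
  }
}

export function releaseLock(lockPath: string): void {
  unlinkSync(lockPath);
}
```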

    After open(), the session knows the aggregate status but not which specific files changed. Call resolve() to complete stat + hash for every file and get per-file metadata.

    Note: resolve() stats and hashes every unresolved file on the thread pool. This has a cost proportional to the number of changed files. Use it only when you need per-file information — for simple "changed → rebuild all" workflows, just check session.status.

```ts
using session = await cache.open();

if (session.status !== "upToDate") {
  const entries = await session.resolve();

  for (const entry of entries) {
    if (entry.changed) {
      console.log(
        `Changed: ${entry.path} (${entry.size} bytes, hash: ${entry.contentHashHex})`,
      );
    }
  }

  await session.write();
}
```

    Each FileHashCacheEntry provides:

    • path — absolute file path
    • size — file size in bytes
    • mtimeMs / ctimeMs — modification / change time in ms
    • changed — true if content differs from the cached version (or is a new file)
    • contentHash — 16-byte xxHash3-128 as a Buffer (zero-copy view)
    • contentHashHex — 32-char hex string (lazy, computed on first access)

    FileHashCacheEntries supports get(index), find(path), and iteration. The result is cached — subsequent calls to resolve() return the same snapshot.


    When you don't need a persistent cache file — or you want raw xxHash3-128 digests to compare yourself — use the digest functions directly. FileHashCache uses them under the hood, but they are fully usable on their own.

    large file (~197.3 KB):

| Scenario | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- |
| native | 0.04 ms (40.8 µs) | 24 489 op/s | 4.8 GB/s | 6.8× faster |
| Node.js crypto (md5) | 0.3 ms (276.7 µs) | 3 614 op/s | 713 MB/s | baseline |

    medium file (~49.9 KB):

| Scenario | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- |
| native | 0.03 ms (25.9 µs) | 38 653 op/s | 1.9 GB/s | 4.3× faster |
| Node.js crypto (md5) | 0.1 ms (111.4 µs) | 8 977 op/s | 448 MB/s | baseline |

    small file (~1.0 KB):

| Scenario | Mean | Hz | Relative |
| --- | --- | --- | --- |
| Node.js crypto (md5) | 0.06 ms (56.0 µs) | 17 871 op/s | 266.1× faster |
| native | 14.9 ms (14 891.2 µs) | 67 op/s | baseline |

| Scenario | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- |
| native | 7.9 ms (7 948.8 µs) | 126 op/s | 3.1 GB/s | 4.4× faster |
| Node.js crypto (md5) | 34.7 ms (34 657.6 µs) | 29 op/s | 713 MB/s | baseline |

    64 KB buffer:

| Scenario | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- |
| native XXH3-128 | 0.001 ms (1.4 µs) | 704 869 op/s | 46.2 GB/s | 49.3× faster |
| Node.js crypto md5 | 0.07 ms (70.0 µs) | 14 288 op/s | 936 MB/s | baseline |

    1 MB buffer:

| Scenario | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- |
| native XXH3-128 | 0.02 ms (22.0 µs) | 45 515 op/s | 47.7 GB/s | 49.1× faster |
| Node.js crypto md5 | 1.1 ms (1 078.2 µs) | 927 op/s | 973 MB/s | baseline |

```ts
import { digestFilesParallel, hashToHex } from "fast-fs-hash";

const digest = await digestFilesParallel([
  "package.json",
  "src/index.ts",
  "src/utils.ts",
]);
console.log("Aggregate:", hashToHex(digest));
```

    Sequential variant (feeds files into a single running hash):

```ts
import { digestFilesSequential, hashToHex } from "fast-fs-hash";

const digest = await digestFilesSequential(["package.json", "src/index.ts"]);
console.log(hashToHex(digest));
```

```ts
import { digestFile, hashToHex } from "fast-fs-hash";

const digest = await digestFile("package.json");
console.log(hashToHex(digest));
```

```ts
import { digestFileToHex, digestFilesToHexArray } from "fast-fs-hash";

// Single file → 32-char hex string
const hex = await digestFileToHex("package.json");

// Multiple files in parallel → per-file hex strings
const hexes = await digestFilesToHexArray(["src/a.ts", "src/b.ts"], 8);
```

| Function | Description |
| --- | --- |
| `digestFileToHex(path, throwOnError?)` | Hash a file → 32-char hex string. Wrapper around `digestFile` + `hashToHex` |
| `digestFilesToHexArray(paths, concurrency?, throwOnError?)` | Hash files in parallel → per-file hex strings. Default concurrency 8 |

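The parallel pattern behind digestFilesToHexArray can be approximated in plain Node. Here node:crypto md5 stands in for the native xxHash3-128 digest, and hashFilesToHexArray is a hypothetical helper rather than library API:

```typescript
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";

// Hash many files with a bounded number of in-flight reads. The real
// digestFilesToHexArray uses native xxHash3-128 workers; node:crypto md5
// stands in here so the pattern is runnable anywhere.
export async function hashFilesToHexArray(
  paths: string[],
  concurrency = 8,
): Promise<string[]> {
  const out = new Array<string>(paths.length);
  let next = 0;
  async function worker(): Promise<void> {
    for (;;) {
      const i = next++; // claim the next unprocessed index
      if (i >= paths.length) return;
      const data = await readFile(paths[i]);
      out[i] = createHash("md5").update(data).digest("hex");
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(concurrency, paths.length) }, worker),
  );
  return out;
}
```

The worker-loop structure keeps at most `concurrency` reads in flight while preserving input order in the result.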
```ts
import { digestBuffer, digestString } from "fast-fs-hash";

const d1 = digestBuffer(myBuffer);
const d2 = digestString("hello world");
console.log(d2.toString("hex"));
```

    For combining file hashes with extra data (config, environment, etc.):

```ts
import { XxHash128Stream } from "fast-fs-hash";

const h = new XxHash128Stream();
h.addString("my-config-v2");
await h.addFiles(["src/index.ts", "src/utils.ts"]);
console.log(h.digest().toString("hex"));
```

    Busy guard: Async methods (addFile, addFiles, addFilesParallel) mark the instance as busy while the native worker thread is processing. During this time, calling any synchronous method or starting another async operation will throw an error. Always await each async call before invoking another method. Use the busy getter to check:

```ts
const h = new XxHash128Stream();
const promise = h.addFile("large.bin");
console.log(h.busy); // true — async operation in flight
// h.addString("oops"); // would throw!
await promise;
console.log(h.busy); // false — safe to use again
```

    fast-fs-hash exposes the LZ4 block compression API used internally for the cache file format. Both synchronous and asynchronous (pool-thread) variants are available.

    LZ4 block format does not embed the uncompressed size — the caller must store it alongside the compressed data and pass it to the decompression function.
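A common convention is to prepend the size yourself, for example as a 4-byte little-endian prefix. The framing below is illustrative and is not the layout fast-fs-hash uses inside its cache file:

```typescript
// LZ4 block data does not record its uncompressed size, so callers
// typically frame it themselves. One simple convention: a 4-byte
// little-endian length prefix ahead of the compressed bytes.
export function frame(compressed: Buffer, uncompressedSize: number): Buffer {
  const header = Buffer.allocUnsafe(4);
  header.writeUInt32LE(uncompressedSize, 0);
  return Buffer.concat([header, compressed]);
}

export function unframe(framed: Buffer): {
  compressed: Buffer;
  uncompressedSize: number;
} {
  return {
    uncompressedSize: framed.readUInt32LE(0),
    compressed: framed.subarray(4), // zero-copy view past the header
  };
}
```

Pairing this framing with lz4CompressBlock / lz4DecompressBlock yields a self-describing blob you can store or transmit.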

    compress 64 KB:

| Scenario | Ratio | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- | --- |
| native LZ4 | 0.7% | 0.004 ms (3.5 µs) | 285 548 op/s | 18.7 GB/s | 7.5× faster |
| Node.js deflate level=1 | 1.0% | 0.03 ms (26.3 µs) | 37 993 op/s | 2.5 GB/s | baseline |

    decompress 64 KB:

| Scenario | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- |
| native LZ4 | 0.002 ms (2.4 µs) | 417 766 op/s | 27.4 GB/s | 3.8× faster |
| Node.js deflate | 0.009 ms (9.2 µs) | 108 604 op/s | 7.1 GB/s | baseline |

    compress 1 MB:

| Scenario | Ratio | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- | --- |
| native LZ4 | 0.4% | 0.04 ms (35.2 µs) | 28 424 op/s | 29.8 GB/s | 10.5× faster |
| Node.js deflate level=1 | 0.7% | 0.4 ms (370.6 µs) | 2 698 op/s | 2.8 GB/s | baseline |

    decompress 1 MB:

| Scenario | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- |
| native LZ4 | 0.03 ms (31.3 µs) | 31 923 op/s | 33.5 GB/s | 2.6× faster |
| Node.js deflate | 0.08 ms (80.4 µs) | 12 432 op/s | 13.0 GB/s | baseline |

```ts
import {
  lz4CompressBlock,
  lz4DecompressBlock,
  lz4CompressBound,
} from "fast-fs-hash";

const input = Buffer.from("Hello, LZ4!");
const compressed = lz4CompressBlock(input);
const decompressed = lz4DecompressBlock(compressed, input.length);
console.log(decompressed.toString()); // "Hello, LZ4!"
```

| Function | Description |
| --- | --- |
| `lz4CompressBlock(input, offset?, length?)` | Sync compress → new Buffer |
| `lz4CompressBlockTo(input, output, outputOffset?, inputOffset?, inputLength?)` | Sync compress into pre-allocated buffer → bytes written |
| `lz4CompressBlockAsync(input, offset?, length?)` | Async compress on pool thread → `Promise<Buffer>` |
| `lz4DecompressBlock(input, uncompressedSize, offset?, length?)` | Sync decompress → new Buffer |
| `lz4DecompressBlockTo(input, uncompressedSize, output, outputOffset?, inputOffset?, inputLength?)` | Sync decompress into pre-allocated buffer → bytes written |
| `lz4DecompressBlockAsync(input, uncompressedSize, offset?, length?)` | Async decompress on pool thread → `Promise<Buffer>` |
| `lz4CompressBound(inputSize)` | Max compressed size for pre-allocation |
| `lz4ReadAndCompress(path)` | Read a file and LZ4-compress it on pool thread → `Promise<{data, uncompressedSize}>` |
| `lz4DecompressAndWrite(compressedData, uncompressedSize, path)` | Decompress and write to file on pool thread (creates dirs) → `Promise<boolean>` |

    Note: LZ4 block compression supports inputs up to ~1.9 GiB (LZ4_MAX_INPUT_SIZE = 0x7E000000). lz4ReadAndCompress and lz4DecompressAndWrite support files up to 512 MiB.

    lz4ReadAndCompress reads a file and LZ4-block-compresses it in a single pool-thread operation — no JS-thread I/O, no intermediate Buffer allocation visible to the event loop.

```ts
import {
  lz4ReadAndCompress,
  lz4DecompressAndWrite,
  lz4DecompressBlock,
} from "fast-fs-hash";

const { data, uncompressedSize } = await lz4ReadAndCompress("large-file.bin");
console.log(`Compressed ${uncompressedSize} → ${data.length} bytes`);

// Decompress back to a file (creates parent directories if needed)
await lz4DecompressAndWrite(data, uncompressedSize, "restored-file.bin");

// Or decompress to a buffer in memory
const original = lz4DecompressBlock(data, uncompressedSize);
```

    Compare two files for byte-equality asynchronously on a native pool thread. Opens both files, compares sizes via fstat, then reads in lockstep chunks with memcmp. Returns false if either file cannot be opened/read or if sizes differ — never throws.
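For comparison, the same strategy can be written in plain Node. filesEqualSync below is a hypothetical synchronous sketch; the real filesEqual runs asynchronously on a native pool thread:

```typescript
import { openSync, closeSync, fstatSync, readSync } from "node:fs";

// Same strategy as filesEqual, in plain Node: compare sizes via fstat
// first, then read both files in lockstep 64 KB chunks and compare each
// pair (Buffer.compare as a memcmp stand-in). Returns false on any error
// instead of throwing.
export function filesEqualSync(pathA: string, pathB: string): boolean {
  let fdA = -1;
  let fdB = -1;
  try {
    fdA = openSync(pathA, "r");
    fdB = openSync(pathB, "r");
    const sizeA = fstatSync(fdA).size;
    if (sizeA !== fstatSync(fdB).size) return false; // early exit on size
    const bufA = Buffer.allocUnsafe(64 * 1024);
    const bufB = Buffer.allocUnsafe(64 * 1024);
    let remaining = sizeA;
    while (remaining > 0) {
      const n = readSync(fdA, bufA, 0, Math.min(bufA.length, remaining), null);
      if (n === 0) return false; // unexpected EOF
      const m = readSync(fdB, bufB, 0, n, null);
      if (m !== n) return false;
      if (Buffer.compare(bufA.subarray(0, n), bufB.subarray(0, n)) !== 0)
        return false;
      remaining -= n;
    }
    return true;
  } catch {
    return false;
  } finally {
    if (fdA >= 0) closeSync(fdA);
    if (fdB >= 0) closeSync(fdB);
  }
}
```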

    equal files (~49.9 KB):

| Scenario | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- |
| native | 0.04 ms (40.5 µs) | 24 691 op/s | 1.2 GB/s | 2.5× faster |
| Node.js (fs.open + read + compare) | 0.1 ms (101.0 µs) | 9 901 op/s | 494 MB/s | baseline |

    equal files (~197.3 KB):

| Scenario | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- |
| native | 0.05 ms (50.4 µs) | 19 832 op/s | 3.9 GB/s | 2.9× faster |
| Node.js (fs.open + read + compare) | 0.1 ms (148.4 µs) | 6 737 op/s | 1.3 GB/s | baseline |

    different content, same size (~49.9 KB):

| Scenario | Mean | Hz | Throughput | Relative |
| --- | --- | --- | --- | --- |
| Node.js (fs.open + read + compare) | 0.1 ms (100.7 µs) | 9 935 op/s | 496 MB/s | 15.6× faster |
| native | 1.6 ms (1 574.5 µs) | 635 op/s | 32 MB/s | baseline |

    different sizes (early exit):

| Scenario | Mean | Hz | Relative |
| --- | --- | --- | --- |
| native | 0.04 ms (35.7 µs) | 28 038 op/s | 2.5× faster |
| Node.js (fs.open + read + compare) | 0.09 ms (88.8 µs) | 11 261 op/s | baseline |

```ts
import { filesEqual } from "fast-fs-hash";

if (await filesEqual("output.bin", "expected.bin")) {
  console.log("Files are identical");
} else {
  console.log("Files differ (or one doesn't exist)");
}
```

| Function | Description |
| --- | --- |
| `filesEqual(pathA, pathB)` | Async byte-equality check on pool thread → `Promise<boolean>` |

    Walk the parent chain from a start path and locate project markers in a single pass: .git, package.json, tsconfig.json, and node_modules/. Reports nearest* (first hit walking up) and root* (last hit, bounded by the enclosing .git) for each marker, plus gitRoot and gitSuperRoot for submodule/worktree awareness.

    The walk stops at the filesystem root, the user's home directory (or any ancestor of it), an optional stopPath, and a depth cap of 128 (symlink-loop defense). Tolerant of missing paths — if startPath doesn't exist, the walk begins from its longest existing ancestor and missing markers are returned as null rather than thrown.
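A simplified single-marker version of the walk (package.json only, applying the same stopping rules) might look like this; findNearestPackageJson is a hypothetical helper, while the real walker checks all markers in one pass:

```typescript
import { existsSync } from "node:fs";
import { dirname, join, resolve } from "node:path";
import { homedir } from "node:os";

// Simplified parent-chain walk for a single marker. Mirrors the stopping
// rules described above: filesystem root, the home directory, an optional
// stopPath, and a 128-level depth cap as symlink-loop defense.
export function findNearestPackageJson(
  startPath: string,
  stopPath?: string,
): string | null {
  const home = homedir();
  const stop = stopPath ? resolve(stopPath) : null;
  let dir = resolve(startPath);
  for (let depth = 0; depth < 128; depth++) {
    const candidate = join(dir, "package.json");
    if (existsSync(candidate)) return candidate; // nearest hit wins
    if (dir === home || dir === stop) return null; // boundary reached
    const parent = dirname(dir);
    if (parent === dir) return null; // filesystem root
    dir = parent;
  }
  return null; // depth cap hit
}
```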

    shallow (3 levels deep):

| Scenario | Mean | Hz | Relative |
| --- | --- | --- | --- |
| native (sync) | 0.04 ms (42.5 µs) | 23 522 op/s | 6.5× faster |
| native (async) | 0.05 ms (54.9 µs) | 18 226 op/s | 5.0× faster |
| Node.js (sync, fs.statSync) | 0.1 ms (115.2 µs) | 8 679 op/s | 2.4× faster |
| Node.js (async, fs.stat) | 0.3 ms (274.2 µs) | 3 647 op/s | baseline |

    deep (12 levels deep):

| Scenario | Mean | Hz | Relative |
| --- | --- | --- | --- |
| native (sync) | 0.09 ms (91.7 µs) | 10 911 op/s | 7.8× faster |
| native (async) | 0.1 ms (101.0 µs) | 9 906 op/s | 7.1× faster |
| Node.js (sync, fs.statSync) | 0.3 ms (331.5 µs) | 3 017 op/s | 2.2× faster |
| Node.js (async, fs.stat) | 0.7 ms (719.4 µs) | 1 390 op/s | baseline |

    missing start path (tolerant fallback):

| Scenario | Mean | Hz | Relative |
| --- | --- | --- | --- |
| native (sync) | 0.05 ms (50.3 µs) | 19 879 op/s | 2.7× faster |
| Node.js (sync, fs.statSync) | 0.1 ms (133.5 µs) | 7 489 op/s | baseline |

```ts
import { findProjectRoot, findProjectRootSync } from "fast-fs-hash";

// Sync (recommended for startup-time / build-tool use)
const info = findProjectRootSync(import.meta.dirname);
console.log(info.gitRoot, info.rootPackageJson, info.nearestTsconfigJson);

// Async — runs on the native thread pool, useful on cold or networked filesystems
const info2 = await findProjectRoot("/some/deep/file.ts");

// Optional stopPath: halt when the walker reaches this directory (or any ancestor of it)
const info3 = findProjectRootSync(start, "/workspace");
```

| Function | Description |
| --- | --- |
| `findProjectRootSync(startPath, stopPath)` | Walk parent chain for project markers (sync) → `ProjectRoot` |
| `findProjectRoot(startPath, stopPath)` | Walk parent chain on pool thread → `Promise<ProjectRoot>` |

    The returned ProjectRoot object has these fields (each string | null):

| Field | Description |
| --- | --- |
| `gitRoot` | Innermost `.git` (dir or file). Matches `git rev-parse --show-toplevel`. |
| `gitSuperRoot` | Outermost `.git` directory. Non-null only in submodules / nested worktrees. |
| `nearestPackageJson` | First `package.json` walking up. |
| `rootPackageJson` | Last `package.json` walking up, bounded by `gitRoot`. |
| `nearestTsconfigJson` | First `tsconfig.json` walking up. |
| `rootTsconfigJson` | Last `tsconfig.json` walking up, bounded by `gitRoot`. |
| `nearestNodeModules` | First `node_modules/` directory walking up (also detects when started inside). |
| `rootNodeModules` | Last `node_modules/` walking up, bounded by `gitRoot`. |

| Function | Description |
| --- | --- |
| `hashToHex(digest)` | Convert a 16-byte digest to a 32-char hex string |
| `hashesToHexArray(digests)` | Convert an array of digests to hex strings |
| `findCommonRootPath(files, baseRoot?, allowedRoot?)` | Longest common parent directory of file paths |
| `normalizeFilePaths(rootPath, files)` | Resolve, sort, deduplicate paths relative to root |
| `toRelativePath(rootPath, filePath)` | Single path → clean unix-style relative path (or `null`) |
| `threadPoolTrim()` | Wake idle native pool threads so they self-terminate and free memory |
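The idea behind findCommonRootPath can be sketched as segment intersection over POSIX-style absolute paths. commonRootPath below is a hypothetical stand-in, not the library function:

```typescript
import { posix } from "node:path";

// Illustrative "longest common parent directory": take each file's
// directory and intersect path segments until they diverge. Assumes
// absolute POSIX-style paths; the library function also handles
// baseRoot/allowedRoot bounds, which are omitted here.
export function commonRootPath(files: string[]): string | null {
  if (files.length === 0) return null;
  let common = posix.dirname(files[0]).split("/");
  for (const file of files.slice(1)) {
    const parts = posix.dirname(file).split("/");
    let i = 0;
    while (i < common.length && i < parts.length && common[i] === parts[i]) i++;
    common = common.slice(0, i); // keep only the shared prefix
  }
  return common.join("/") || "/"; // disjoint paths collapse to "/"
}
```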

| Variable | Default | Description |
| --- | --- | --- |
| `FAST_FS_HASH_ISA` | auto-detect | Override SIMD variant: `avx512`, `avx2`, or `baseline` (x64 only) |
| `FAST_FS_HASH_POOL_IDLE_TIMEOUT_MS` | 15000 | Idle timeout for native pool threads (1–3600000 ms). Threads self-terminate after this duration with no work and respawn automatically when new work arrives. |

    The native C++ backend uses xxHash and LZ4. See NOTICES.md for full license texts.

| Tool | Version | Install |
| --- | --- | --- |
| Node.js | >= 22 | nodejs.org |
| npm | >= 9 | bundled with Node.js |
| CMake | >= 3.15 | `brew install cmake` / `apt install cmake` / cmake.org |
| C++20 compiler | Clang 14+ / GCC 12+ / MSVC 2022 | Xcode CLT / build-essential / Visual Studio |
```sh
git clone --recurse-submodules https://github.com/SalvatorePreviti/fast-fs-hash.git
cd fast-fs-hash
npm install
npm run build:all  # compile C++ addon + TypeScript
npm test           # run tests
npm run bench      # run benchmarks
```

    Note: git clone --recurse-submodules is required to pull deps/xxHash (the xxHash source used by the native addon).

    The deps/xxHash/ directory is a git submodule pointing to xxHash v0.8.3.

    If you cloned without --recurse-submodules, initialize the submodule manually:

    git submodule update --init --recursive
    

    See package.json for the full list of available build scripts.

    • main — development branch. CI runs lint, typecheck, tests, and builds native binaries for all platforms on every push and PR.
    • publish — release branch. Pushing to publish triggers the full CI pipeline. After all builds and tests pass, a dry-run publish verifies all packages. An admin must then manually approve the publish job (via the npm-publish GitHub environment) to publish to npm, create a git tag, and deploy docs.

    npm packages are published with provenance attestations via GitHub Actions OIDC — no npm tokens are stored in CI.

    • Branch protection on publish: require PR reviews, require status checks to pass, restrict push access to admins only.
    • Environment npm-publish: create under Settings → Environments with "Required reviewers" restricted to trusted maintainers.
    • npm trusted publishing: configure each @fast-fs-hash/* package on npmjs.com to trust the npm-publish environment from this repository.

    MIT — Copyright (c) 2025-present Salvatore Previti