Quickstart

This page is a working reference for the CLI: what to run, what knobs matter, and where outputs land.

If you are new: start with one slide, then scale up.

# 1. validate settings on one slide
python run_single_slide.py --slide_path ./wsis/example.svs --job_dir ./out \
    --patch_encoder uni_v1 --mag 20 --patch_size 256

# 2. run the full batch when happy
python run_batch_of_slides.py --task all --wsi_dir ./wsis --job_dir ./out \
    --patch_encoder uni_v1 --mag 20 --patch_size 256 --skip_errors

End-to-end pipeline

--task all runs the three stages in order. You can also run them individually on the same --job_dir; TRIDENT will pick up where it left off.

python run_batch_of_slides.py --task all --wsi_dir ./wsis --job_dir ./out \
    --patch_encoder uni_v1 --mag 20 --patch_size 256

This produces:

  1. Tissue segmentation (contours/, contours_geojson/, thumbnails/).

  2. Patch coordinates (<mag>x_<patch>px_<overlap>px_overlap/patches/<slide>_patches.h5).

  3. Patch features (<mag>x_<patch>px_<overlap>px_overlap/features_<encoder>/<slide>.h5).
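Downstream scripts often need to locate these files. The helper below is a minimal sketch that rebuilds the paths from the patterns listed above. The function names are hypothetical (not part of TRIDENT), and writing 20 as "20x" and 1.25 as "1.25x" is an assumption to confirm against a real --job_dir.

```python
from pathlib import Path


def config_dir(job_dir, mag, patch_size, overlap=0):
    """<mag>x_<patch>px_<overlap>px_overlap under job_dir (pattern from the list above)."""
    mag = float(mag)
    # Assumption: whole-number magnifications are written without a decimal point.
    mag_str = str(int(mag)) if mag.is_integer() else str(mag)
    return Path(job_dir) / f"{mag_str}x_{patch_size}px_{overlap}px_overlap"


def coords_path(job_dir, slide, mag, patch_size, overlap=0):
    """Stage 2 output: .../patches/<slide>_patches.h5"""
    return config_dir(job_dir, mag, patch_size, overlap) / "patches" / f"{slide}_patches.h5"


def features_path(job_dir, slide, encoder, mag, patch_size, overlap=0):
    """Stage 3 output: .../features_<encoder>/<slide>.h5"""
    return config_dir(job_dir, mag, patch_size, overlap) / f"features_{encoder}" / f"{slide}.h5"


# Where the quickstart run (uni_v1, 20x, 256 px, no overlap) would put features for example.svs.
print(features_path("./out", "example", "uni_v1", 20, 256))
```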

Equivalent unified CLI:

trident batch  -- --task all --wsi_dir ./wsis --job_dir ./out --patch_encoder uni_v1 --mag 20 --patch_size 256
trident single -- --slide_path ./wsis/example.svs --job_dir ./out --patch_encoder uni_v1 --mag 20 --patch_size 256
trident doctor -- --profile base

Outputs and run tracking

In your --job_dir, TRIDENT writes:

  • summary.md: appended once per run; counts (completed / skipped / errored), per-encoder breakdown, and a short error list.

  • runs/<run_id>.json: one manifest per CLI invocation (args, timestamps, status).

  • wsi_states/<slide>__<hash>.json: per-slide machine-readable state (tasks, attempts, outputs, last error, resume info).

  • contours/ + contours_geojson/: tissue masks (open .geojson in QuPath to QC/edit).

  • <mag>x_<patch>px_<overlap>px_overlap/: per-config coords and feature dirs.
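To inspect these files programmatically, a sketch like the following loads the newest run manifest and all per-slide state files. It relies only on what is listed above (JSON files under runs/ and wsi_states/); the helper names are hypothetical, and no particular keys inside the JSON are assumed.

```python
import json
from pathlib import Path


def latest_run_manifest(job_dir):
    """Return (path, data) for the most recently modified runs/*.json, or None if there is none."""
    runs_dir = Path(job_dir) / "runs"
    if not runs_dir.is_dir():
        return None
    # "Newest" is judged by file modification time, not by parsing run IDs.
    manifests = sorted(runs_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)
    if not manifests:
        return None
    newest = manifests[-1]
    return newest, json.loads(newest.read_text())


def load_slide_states(job_dir):
    """Map each wsi_states/<slide>__<hash>.json file name to its parsed JSON."""
    states_dir = Path(job_dir) / "wsi_states"
    if not states_dir.is_dir():
        return {}
    return {p.name: json.loads(p.read_text()) for p in sorted(states_dir.glob("*.json"))}
```

Because no schema is assumed, you can print the returned dictionaries first to see which fields your TRIDENT version records.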

Resume and skip behavior

Re-running on the same --job_dir is the recommended way to retry / extend a job:

  • If the expected output for a (slide, task) already exists and is not locked, TRIDENT marks the task skipped. No recomputation.

  • .lock files mark tasks that are currently being written. If a worker crashes mid-task, the lock can become stale (an “orphan”). Clean those safely with:

    python run_batch_of_slides.py --clear_dead_locks --dead_lock_max_age_hours 24 \
        --task all --wsi_dir ./wsis --job_dir ./out ...
    

    This removes only locks where (a) the target output already exists, or (b) the writer PID is dead on this host, or (c) the lock is unreadable / legacy and older than --dead_lock_max_age_hours (default 24). Active locks from running jobs are never removed.
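The three removal rules can be read as a single predicate. This is a hypothetical sketch of that decision, not TRIDENT's code: the inputs (whether the lock parses, which host and PID wrote it, whether that PID is still alive) are assumed to be collected elsewhere, and the real lock format may differ.

```python
def should_clear_lock(target_exists, lock_readable, same_host, pid_alive,
                      age_hours, max_age_hours=24.0):
    """Return True if a .lock file counts as stale under rules (a), (b), (c) above."""
    # (a) The output the lock protects already exists, so the lock is leftover.
    if target_exists:
        return True
    # (b) The lock is readable, was written on this host, and its writer PID is gone.
    if lock_readable and same_host and not pid_alive:
        return True
    # (c) Unreadable / legacy lock (no usable PID): fall back to the age threshold.
    if not lock_readable and age_hours > max_age_hours:
        return True
    # Anything else may belong to a running job (or another host), so leave it alone.
    return False
```

Note that a readable lock from another host is never cleared by rule (b): its PID cannot be checked locally, which matches the "on this host" wording.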

Multi-GPU and multi-worker

Use --gpus to shard pending slides across devices:

# Two GPUs (production)
python run_batch_of_slides.py --task all --wsi_dir ./wsis --job_dir ./out \
    --patch_encoder uni_v1 --mag 20 --patch_size 256 --gpus 0 1

# Two CPU workers (no GPU)
python run_batch_of_slides.py --task seg --wsi_dir ./wsis --job_dir ./out \
    --segmenter otsu --gpus -1 -1

Notes:

  • Pending slides are sharded round-robin across the listed GPU IDs.

  • Duplicate GPU IDs (0 and above) are deduplicated, since running two workers on the same CUDA device wastes memory. Duplicate -1 entries are kept (each is an independent CPU worker).

  • --gpu (singular) is the legacy form and still works, but prefer --gpus.
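The notes above amount to a dedup step followed by round-robin assignment. A minimal sketch, assuming pending slides arrive as an ordered list; the function names are hypothetical and TRIDENT's scheduler may differ in detail.

```python
def normalize_gpus(gpus):
    """Drop repeated GPU IDs (>= 0) but keep every -1 entry (one CPU worker each)."""
    seen, workers = set(), []
    for gpu in gpus:
        if gpu >= 0:
            if gpu in seen:
                continue  # a second worker on the same CUDA device would only waste memory
            seen.add(gpu)
        workers.append(gpu)
    return workers


def shard_round_robin(slides, gpus):
    """Give slide i to worker i % n_workers; returns one (device, slides) pair per worker."""
    workers = normalize_gpus(gpus)
    n = len(workers)
    return [(device, slides[i::n]) for i, device in enumerate(workers)]
```

With --gpus 0 1, slides alternate between the two devices, so each GPU receives about half of the pending slides.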

Caching for slow / network storage

If WSIs sit on a slow network drive, copy them in batches to a local SSD via the producer/consumer cache pipeline:

python run_batch_of_slides.py --task all --wsi_dir /mnt/nfs/wsis --job_dir ./out \
    --patch_encoder uni_v1 --mag 20 --patch_size 256 \
    --gpus 0 1 \
    --wsi_cache /local/ssd/cache --cache_batch_size 32

The cache directory is wiped and recreated at the start of each run (this is separate from lock cleanup, which is opt-in via --clear_dead_locks).
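The producer/consumer idea: one thread copies the next batch of slides to local disk while the main thread processes the current one. The sketch below is a hypothetical illustration using only the standard library (threading, queue, shutil), not TRIDENT's implementation; `process` stands in for the real per-slide work. Like the real pipeline, it wipes cache_dir first, so never point it at a folder holding other data.

```python
import queue
import shutil
import threading
from pathlib import Path


def cache_pipeline(src_paths, cache_dir, batch_size, process):
    """Copy slides to cache_dir in batches on a background thread; process them in order.

    Disk usage stays bounded at roughly three batches: one being processed,
    one queued, and one being copied.
    """
    cache_dir = Path(cache_dir)
    shutil.rmtree(cache_dir, ignore_errors=True)  # wipe, as the documented cache does each run
    cache_dir.mkdir(parents=True, exist_ok=True)

    batches = queue.Queue(maxsize=1)  # producer blocks once one batch is waiting
    errors = []

    def producer():
        try:
            for start in range(0, len(src_paths), batch_size):
                local = []
                for src in src_paths[start:start + batch_size]:
                    dst = cache_dir / Path(src).name  # assumes slide file names are unique
                    shutil.copy2(src, dst)  # the slow network read happens here
                    local.append(dst)
                batches.put(local)
        except Exception as exc:  # hand copy failures over to the consumer
            errors.append(exc)
        finally:
            batches.put(None)  # sentinel: no more batches

    thread = threading.Thread(target=producer, daemon=True)
    thread.start()

    results = []
    while (batch := batches.get()) is not None:
        for local_path in batch:
            results.append(process(local_path))
            local_path.unlink()  # free cache space as soon as a slide is done
    thread.join()
    if errors:
        raise errors[0]
    return results
```

The bounded queue is what keeps disk usage in check: copying overlaps with processing, but the producer can never run more than one batch ahead.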

High-signal knobs

  • --segmenter:

    • grandqc: fast, accurate on clean H&E.

    • hest: better on IHC / dirtier slides.

    • otsu: CPU-only fallback, no model weights needed.

  • --mag / --patch_size / --overlap define the patch grid; the same values must be used across coords and feat runs.

  • --min_tissue_proportion (0.0 to 1.0, default 0): minimum fraction of a patch that must lie on tissue for the patch to be kept; 0.3–0.7 removes many weak edge patches.

  • --remove_artifacts / --remove_penmarks: extra artifact-cleaning segmentation pass.

  • --search_nested: discover slides in nested subfolders.

  • --custom_list_of_wsis my.csv: process a CSV subset (column wsi with paths relative to --wsi_dir; optional mpp column).

  • --reader_type {openslide,cucim,image,sdpc,omezarr,czi}: force a backend, mostly for debugging.

  • --max_workers 0: force single-process data loading (use this if your environment has DataLoader multiprocessing issues).
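To see how --patch_size, --overlap, and --min_tissue_proportion interact, here is a toy version of patch selection on a binary tissue mask. It is a sketch under stated assumptions: coordinates are in pixels at the target magnification, the step between patches is patch_size - overlap, partial patches at the border are dropped, and a patch needs at least some tissue to be kept. TRIDENT itself works from segmentation contours, so its exact rules may differ; the function names are hypothetical.

```python
def patch_grid(width, height, patch_size, overlap=0):
    """Top-left (x, y) corners of a regular grid; step = patch_size - overlap (assumption)."""
    step = patch_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than patch_size")
    return [(x, y)
            for y in range(0, height - patch_size + 1, step)
            for x in range(0, width - patch_size + 1, step)]


def keep_patch(mask, x, y, patch_size, min_tissue_proportion=0.0):
    """mask[row][col] is truthy on tissue, at the same resolution as the grid."""
    tissue = sum(1 for row in mask[y:y + patch_size] for v in row[x:x + patch_size] if v)
    fraction = tissue / (patch_size * patch_size)
    # Assumption: even with the default of 0, a patch must touch some tissue.
    return tissue > 0 and fraction >= min_tissue_proportion
```

Raising min_tissue_proportion only removes patches from the grid; it never adds or shifts them, which is why it trims weak edge patches without changing the grid itself.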

Stage-only examples

Segmentation only

python run_batch_of_slides.py --task seg --wsi_dir ./wsis --job_dir ./out --segmenter grandqc

Patching only (with patch images for QC)

python run_batch_of_slides.py --task coords --wsi_dir ./wsis --job_dir ./out \
    --mag 20 --patch_size 256 \
    --dump_patches --dump_patches_max 100 --dump_patches_format jpg --dump_patches_jpeg_quality 90

Feature extraction only (reusing existing coords)

python run_batch_of_slides.py --task feat --wsi_dir ./wsis --job_dir ./out \
    --patch_encoder uni_v1 --mag 20 --patch_size 256

Slide-level embeddings

python run_batch_of_slides.py --task feat --wsi_dir ./wsis --job_dir ./out \
    --slide_encoder titan --mag 20 --patch_size 512

If patch features for the required underlying encoder don’t exist, TRIDENT extracts them automatically.

Convert awkward formats to pyramidal TIFF

trident convert --input_dir ./wsis --mpp_csv ./wsis/to_process.csv --job_dir ./pyramidal_tiff --downscale_by 1 --num_workers 1

Common failure modes

  • “Patch features not found” during slide embeddings: each slide encoder requires a specific patch encoder (the mapping is in trident.slide_encoder_models.load.slide_to_patch_encoder_name). Extract patch features with that encoder, or let TRIDENT auto-extract them by passing --slide_encoder.

  • OOM during feature extraction: lower --feat_batch_size (or --batch_size), or pick a smaller patch encoder / patch size.

  • No slides discovered: add --search_nested for nested layouts; or check that your CSV uses the column name wsi and relative paths under --wsi_dir.

  • Pipeline looks stuck: check for stale .lock files. After confirming no TRIDENT process is running, re-run with --clear_dead_locks.

  • Offline / no internet: put weights into trident/*/local_ckpts.json or pass --patch_encoder_ckpt_path. HF_TOKEN is only needed when downloading weights that require Hugging Face authentication (e.g. gated models).
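For the "No slides discovered" case, a quick pre-flight check of the CSV catches the usual mistakes. This hypothetical helper (standard-library csv only, not part of TRIDENT) verifies that a wsi column exists and that each entry is a relative path to an existing file under --wsi_dir.

```python
import csv
from pathlib import Path


def check_wsi_list(csv_path, wsi_dir):
    """Return human-readable problems with a --custom_list_of_wsis CSV (empty list = looks fine)."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        if "wsi" not in (reader.fieldnames or []):
            return [f"missing 'wsi' column (found {reader.fieldnames})"]
        problems = []
        for row_no, row in enumerate(reader, start=1):
            rel = (row["wsi"] or "").strip()  # short rows yield None for missing fields
            if not rel:
                problems.append(f"row {row_no}: empty wsi entry")
            elif Path(rel).is_absolute():
                problems.append(f"row {row_no}: {rel} is absolute; use a path relative to --wsi_dir")
            elif not (Path(wsi_dir) / rel).is_file():
                problems.append(f"row {row_no}: {rel} not found under {wsi_dir}")
        return problems
```

Run it as check_wsi_list("my.csv", "./wsis") before launching a batch; an empty list means every row points at a file TRIDENT should be able to find.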

Argument cheat sheet

The list below is not exhaustive; for full defaults and choices, scroll to “Raw parser help”.

  • --task {seg,coords,feat,all}: pipeline stage. all runs seg → coords → feat.

  • --gpus 0 1 / --gpus -1 -1: multi-GPU sharding (GPU IDs 0 and above) or multi-CPU workers (-1 entries).

  • --max_workers: DataLoader workers. 0 forces single-process loading.

  • --clear_dead_locks, --dead_lock_max_age_hours: safe cleanup of stale .lock files; active locks are never touched.

  • --skip_errors: continue when a slide fails. Errors are recorded in summary.md and wsi_states/.

  • --segmenter / --seg_conf_thresh: tissue segmenter and its threshold.

  • --remove_holes / --remove_artifacts / --remove_penmarks: mask post-processing.

  • --mag / --patch_size / --overlap: patch grid definition (must match between coords and feat).

  • --min_tissue_proportion: 0..1 floor on tissue overlap to keep a patch.

  • --coords_dir: custom coords directory (e.g. to feed legacy CLAM coordinates into --task feat).

  • --dump_patches / --dump_patches_max / --dump_patches_format / --dump_patches_jpeg_quality: save patch images to disk during coords (debug / QC).

  • --patch_encoder / --patch_encoder_ckpt_path / --slide_encoder: encoders. See the API page for the full list.

  • --batch_size / --seg_batch_size / --feat_batch_size: stage-specific batch overrides.

  • --wsi_dir / --wsi_ext / --search_nested / --custom_list_of_wsis / --custom_mpp_keys / --reader_type: slide discovery and reader controls.

  • --wsi_cache / --cache_batch_size: local cache pipeline for slow source storage.
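According to the raw parser help, --seg_batch_size and --feat_batch_size default to None and fall back to --batch_size (default 64). A minimal sketch of that precedence rule; the function name is hypothetical.

```python
def resolve_batch_sizes(batch_size=64, seg_batch_size=None, feat_batch_size=None):
    """Stage-specific sizes win; None falls back to the shared --batch_size."""
    return {
        "seg": seg_batch_size if seg_batch_size is not None else batch_size,
        "feat": feat_batch_size if feat_batch_size is not None else batch_size,
    }
```

So to fix an OOM only during feature extraction, lowering --feat_batch_size leaves segmentation at the shared batch size.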

Raw parser help

For exact defaults, choices, and the complete flag list:

usage: cli_generate.py [-h] [--gpu GPU] [--gpus GPUS [GPUS ...]]
                       [--task {seg,coords,feat,all}] --job_dir JOB_DIR
                       [--skip_errors] [--clear_dead_locks]
                       [--dead_lock_max_age_hours DEAD_LOCK_MAX_AGE_HOURS]
                       [--max_workers MAX_WORKERS] [--batch_size BATCH_SIZE]
                       [--wsi_cache WSI_CACHE]
                       [--cache_batch_size CACHE_BATCH_SIZE] --wsi_dir WSI_DIR
                       [--wsi_ext WSI_EXT [WSI_EXT ...]]
                       [--custom_mpp_keys CUSTOM_MPP_KEYS [CUSTOM_MPP_KEYS ...]]
                       [--custom_list_of_wsis CUSTOM_LIST_OF_WSIS]
                       [--reader_type {openslide,image,cucim,sdpc,omezarr,czi}]
                       [--search_nested] [--segmenter {hest,grandqc,otsu}]
                       [--seg_conf_thresh SEG_CONF_THRESH] [--remove_holes]
                       [--remove_artifacts] [--remove_penmarks]
                       [--seg_batch_size SEG_BATCH_SIZE] [--mag MAG]
                       [--patch_size PATCH_SIZE] [--overlap OVERLAP]
                       [--min_tissue_proportion MIN_TISSUE_PROPORTION]
                       [--coords_dir COORDS_DIR] [--dump_patches]
                       [--dump_patches_max DUMP_PATCHES_MAX]
                       [--dump_patches_format {png,jpg}]
                       [--dump_patches_jpeg_quality DUMP_PATCHES_JPEG_QUALITY]
                       [--patch_encoder {conch_v1,conch_v15,uni_v1,uni_v2,ctranspath,phikon,phikon_v2,resnet50,keep,gigapath,virchow,virchow2,hoptimus0,hoptimus1,h0-mini,musk,openmidnight,gpfm,hibou_l,kaiko-vitb8,kaiko-vitb16,kaiko-vits8,kaiko-vits16,kaiko-vitl14,lunit-vits8,midnight12k,genbio-pathfm,gemma4-e4b,gemma4-26b}]
                       [--patch_encoder_ckpt_path PATCH_ENCODER_CKPT_PATH]
                       [--slide_encoder {threads,titan,prism,chief,gigapath,madeleine,feather,feather_uni_v2,abmil,mean-conch_v1,mean-conch_v15,mean-uni_v1,mean-uni_v2,mean-ctranspath,mean-phikon,mean-resnet50,mean-gigapath,mean-virchow,mean-virchow2,mean-hoptimus0,mean-phikon_v2,mean-musk,mean-hibou_l,mean-kaiko-vit8s,mean-kaiko-vit16s,mean-kaiko-vit8b,mean-kaiko-vit16b,mean-kaiko-vit14l}]
                       [--feat_batch_size FEAT_BATCH_SIZE]

Run Trident

options:
  -h, --help            show this help message and exit
  --gpu GPU             [DEPRECATED] Single GPU index. Use `--gpus <id>`
                        instead.
  --gpus GPUS [GPUS ...]
                        Optional space-separated list of GPU indices to enable
                        multi-GPU execution.
  --task {seg,coords,feat,all}
                        Task to run: seg (segmentation), coords (save tissue
                        coordinates), img (save tissue images), feat (extract
                        features).
  --job_dir JOB_DIR     Directory to store outputs.
  --skip_errors         Skip errored slides and continue processing.
  --clear_dead_locks    If set, remove stale `.lock` files under `--job_dir`
                        (safe heuristics) before running.
  --dead_lock_max_age_hours DEAD_LOCK_MAX_AGE_HOURS
                        Max age (hours) before a `.lock` file is considered
                        stale (when its target output is missing). Defaults to
                        24.
  --max_workers MAX_WORKERS
                        Maximum number of workers. Set to 0 to use main
                        process.
  --batch_size BATCH_SIZE
                        Batch size used for segmentation and feature
                        extraction. Overridden by `seg_batch_size` and
                        `feat_batch_size` if you want to use different ones.
                        Defaults to 64.
  --wsi_cache WSI_CACHE
                        Path to a local cache (e.g., SSD) used to speed up
                        access to WSIs stored on slower drives (e.g., HDD).
  --cache_batch_size CACHE_BATCH_SIZE
                        Maximum number of slides to cache locally at once.
                        Helps control disk usage.
  --wsi_dir WSI_DIR     Directory containing WSI files (no nesting allowed).
  --wsi_ext WSI_EXT [WSI_EXT ...]
                        List of allowed file extensions for WSI files.
  --custom_mpp_keys CUSTOM_MPP_KEYS [CUSTOM_MPP_KEYS ...]
                        Custom keys used to store the resolution as MPP
                        (microns per pixel) in your list of whole-slide images.
  --custom_list_of_wsis CUSTOM_LIST_OF_WSIS
                        Custom list of WSIs specified in a csv file.
  --reader_type {openslide,image,cucim,sdpc,omezarr,czi}
                        Force the use of a specific WSI image reader. Options
                        are ["openslide", "image", "cucim", "sdpc", "omezarr",
                        "czi"]. Defaults to None (auto-determine which reader
                        to use).
  --search_nested       If set, recursively search for whole-slide images
                        (WSIs) within all subdirectories of `wsi_source`. Uses
                        `os.walk` to include slides from nested folders. This
                        allows processing of datasets organized in
                        hierarchical structures. Defaults to False (only top-
                        level slides are included).
  --segmenter {hest,grandqc,otsu}
                        Type of tissue vs background segmenter. Options are
                        HEST, GrandQC, or Otsu.
  --seg_conf_thresh SEG_CONF_THRESH
                        Confidence threshold to apply to binarize segmentation
                        predictions. Lower this threshold to retain more
                        tissue. Defaults to 0.5. Try 0.4 as 2nd option.
  --remove_holes        Do you want to remove holes?
  --remove_artifacts    Do you want to run an additional model to remove
                        artifacts (including penmarks, blurs, stains, etc.)?
  --remove_penmarks     Do you want to run an additional model to remove
                        penmarks?
  --seg_batch_size SEG_BATCH_SIZE
                        Batch size for segmentation. Defaults to None (use
                        `batch_size` argument instead).
  --mag MAG             Magnification for coords/features extraction. Supports
                        fractional values (e.g., 1.25x, 2.5x, 5x, etc.).
  --patch_size PATCH_SIZE
                        Patch size for coords/image extraction.
  --overlap OVERLAP     Absolute overlap for patching in pixels. Defaults to
                        0.
  --min_tissue_proportion MIN_TISSUE_PROPORTION
                        Minimum proportion of the patch under tissue to be
                        kept. Between 0. and 1.0. Defaults to 0.
  --coords_dir COORDS_DIR
                        Directory to save/restore tissue coordinates.
  --dump_patches        During the coords task, also dump patch images (PNGs)
                        to disk.
  --dump_patches_max DUMP_PATCHES_MAX
                        Max number of patch images to dump per slide (0 = no
                        limit).
  --dump_patches_format {png,jpg}
                        Patch image format to dump (png or jpg). Defaults to
                        png.
  --dump_patches_jpeg_quality DUMP_PATCHES_JPEG_QUALITY
                        JPEG quality (1-100) when --dump_patches_format=jpg.
                        Defaults to 90.
  --patch_encoder {conch_v1,conch_v15,uni_v1,uni_v2,ctranspath,phikon,phikon_v2,resnet50,keep,gigapath,virchow,virchow2,hoptimus0,hoptimus1,h0-mini,musk,openmidnight,gpfm,hibou_l,kaiko-vitb8,kaiko-vitb16,kaiko-vits8,kaiko-vits16,kaiko-vitl14,lunit-vits8,midnight12k,genbio-pathfm,gemma4-e4b,gemma4-26b}
                        Patch encoder to use
  --patch_encoder_ckpt_path PATCH_ENCODER_CKPT_PATH
                        Optional local path to a patch encoder checkpoint
                        (.pt, .pth, .bin, or .safetensors). This is only
                        needed in offline environments (e.g., compute clusters
                        without internet). If not provided, models are
                        downloaded automatically from Hugging Face. You can
                        also specify local paths via the model registry at
                        `./trident/patch_encoder_models/local_ckpts.json`.
  --slide_encoder {threads,titan,prism,chief,gigapath,madeleine,feather,feather_uni_v2,abmil,mean-conch_v1,mean-conch_v15,mean-uni_v1,mean-uni_v2,mean-ctranspath,mean-phikon,mean-resnet50,mean-gigapath,mean-virchow,mean-virchow2,mean-hoptimus0,mean-phikon_v2,mean-musk,mean-hibou_l,mean-kaiko-vit8s,mean-kaiko-vit16s,mean-kaiko-vit8b,mean-kaiko-vit16b,mean-kaiko-vit14l}
                        Slide encoder to use
  --feat_batch_size FEAT_BATCH_SIZE
                        Batch size for feature extraction. Defaults to None
                        (use `batch_size` argument instead).
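The --search_nested help text says discovery uses os.walk. The sketch below shows one way to implement top-level versus nested discovery with an extension filter; the function name, case-insensitive matching, and sorted output are assumptions, not documented TRIDENT behavior. Pass extensions with a leading dot, e.g. [".svs", ".ome.tiff"].

```python
import os
from pathlib import Path


def discover_slides(wsi_dir, exts, search_nested=False):
    """Return slide paths relative to wsi_dir whose names end with one of exts."""
    wanted = tuple(e.lower() for e in exts)  # assumption: case-insensitive matching
    found = []
    for root, dirs, files in os.walk(wsi_dir):
        dirs.sort()  # walk subfolders in a deterministic order
        for name in sorted(files):
            if name.lower().endswith(wanted):  # endswith also handles ".ome.tiff"
                rel = os.path.relpath(os.path.join(root, name), wsi_dir)
                found.append(Path(rel).as_posix())
        if not search_nested:
            break  # the first os.walk step covers the top level only
    return found
```

Returning paths relative to wsi_dir keeps the output directly usable as the wsi column of a --custom_list_of_wsis CSV.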