Quickstart
This page is a working reference for the CLI: what to run, what knobs matter, and where outputs land.
If you are new: start with one slide, then scale up.
# 1. validate settings on one slide
python run_single_slide.py --slide_path ./wsis/example.svs --job_dir ./out \
--patch_encoder uni_v1 --mag 20 --patch_size 256
# 2. run the full batch when happy
python run_batch_of_slides.py --task all --wsi_dir ./wsis --job_dir ./out \
--patch_encoder uni_v1 --mag 20 --patch_size 256 --skip_errors
End-to-end pipeline
--task all runs the three stages in order. You can also run them individually
on the same --job_dir; TRIDENT will pick up where it left off.
python run_batch_of_slides.py --task all --wsi_dir ./wsis --job_dir ./out \
--patch_encoder uni_v1 --mag 20 --patch_size 256
This produces:
- Tissue segmentation (contours/, contours_geojson/, thumbnails/).
- Patch coordinates (<mag>x_<patch>px_<overlap>px_overlap/patches/<slide>_patches.h5).
- Patch features (<mag>x_<patch>px_<overlap>px_overlap/features_<encoder>/<slide>.h5).
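To check whether a slide already has its coords and features, the layout above can be turned into a small path helper. This is a minimal sketch and not part of TRIDENT: the function name expected_outputs is hypothetical, and rendering <mag> as a plain integer (e.g. 20x_256px_0px_overlap) is an assumption, so compare against your own --job_dir before relying on it.

```python
from pathlib import Path


def expected_outputs(job_dir, slide, mag, patch_size, overlap, encoder):
    """Return the coords and feature paths implied by the documented layout.

    Hypothetical helper; the exact formatting of <mag> in the directory
    name is an assumption and may differ in a real job directory.
    """
    config_dir = Path(job_dir) / f"{mag}x_{patch_size}px_{overlap}px_overlap"
    return {
        "coords": config_dir / "patches" / f"{slide}_patches.h5",
        "features": config_dir / f"features_{encoder}" / f"{slide}.h5",
    }


# Report which of the two files exist for one slide.
for kind, path in expected_outputs("./out", "example", 20, 256, 0, "uni_v1").items():
    print(kind, path, "exists" if path.exists() else "missing")
```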
Equivalent unified CLI:
trident batch -- --task all --wsi_dir ./wsis --job_dir ./out --patch_encoder uni_v1 --mag 20 --patch_size 256
trident single -- --slide_path ./wsis/example.svs --job_dir ./out --patch_encoder uni_v1 --mag 20 --patch_size 256
trident doctor -- --profile base
Outputs and run tracking
In your --job_dir, TRIDENT writes:
- summary.md: appended once per run; counts (completed / skipped / errored), per-encoder breakdown, and a short error list.
- runs/<run_id>.json: one manifest per CLI invocation (args, timestamps, status).
- wsi_states/<slide>__<hash>.json: per-slide machine-readable state (tasks, attempts, outputs, last error, resume info).
- contours/ + contours_geojson/: tissue masks (open .geojson in QuPath to QC/edit).
- <mag>x_<patch>px_<overlap>px_overlap/: per-config coords and feature dirs.
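The JSON manifests and state files can be inspected with a few lines of Python. The sketch below only assumes that runs/ and wsi_states/ contain JSON files, as listed above; it does not assume any field names and prints just the top-level keys. The helper name load_json_files is hypothetical.

```python
import json
from pathlib import Path


def load_json_files(directory):
    """Load every *.json file directly inside a directory, keyed by file name."""
    loaded = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open() as handle:
            loaded[path.name] = json.load(handle)
    return loaded


job_dir = Path("./out")
for sub in ("runs", "wsi_states"):
    for name, payload in load_json_files(job_dir / sub).items():
        # Only top-level keys are shown; the exact schema is not assumed here.
        keys = sorted(payload) if isinstance(payload, dict) else type(payload).__name__
        print(f"{sub}/{name}: {keys}")
```

A missing directory simply yields no files, so this is safe to run against a fresh --job_dir.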
Resume and skip behavior
Re-running on the same --job_dir is the recommended way to retry / extend a job:
- If the expected output for a (slide, task) already exists and is not locked, TRIDENT marks the task skipped. No recomputation.
- .lock files mark tasks that are currently being written. If a worker crashes mid-task, the lock can become stale (an "orphan"). Clean those safely with:
  python run_batch_of_slides.py --clear_dead_locks --dead_lock_max_age_hours 24 \
  --task all --wsi_dir ./wsis --job_dir ./out ...
  This removes only locks where (a) the target output already exists, or (b) the writer PID is dead on this host, or (c) the lock is unreadable / legacy and older than --dead_lock_max_age_hours (default 24). Active locks from running jobs are never removed.
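The three removal rules can be restated as a single predicate. This is an illustrative sketch of the rules as worded above, not TRIDENT's actual implementation; the function name and its arguments are hypothetical, and "older than" is read as strictly greater than the threshold.

```python
def lock_is_removable(output_exists, writer_pid_alive, age_hours, max_age_hours=24.0):
    """Restate rules (a)-(c) as a predicate.

    writer_pid_alive is True/False when the lock records a readable PID on
    this host, or None when the lock is unreadable / legacy.
    """
    if output_exists:                      # (a) target output already exists
        return True
    if writer_pid_alive is False:          # (b) writer PID is dead on this host
        return True
    if writer_pid_alive is None:           # (c) unreadable / legacy lock ...
        return age_hours > max_age_hours   # ... and older than the threshold
    return False                           # live writer, output missing: keep it


print(lock_is_removable(False, True, 100.0))  # False: active lock, however old
print(lock_is_removable(False, None, 30.0))   # True: legacy lock past 24 hours
```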
Multi-GPU and multi-worker
Use --gpus to shard pending slides across devices:
# Two GPUs (production)
python run_batch_of_slides.py --task all --wsi_dir ./wsis --job_dir ./out \
--patch_encoder uni_v1 --mag 20 --patch_size 256 --gpus 0 1
# Two CPU workers (no GPU)
python run_batch_of_slides.py --task seg --wsi_dir ./wsis --job_dir ./out \
--segmenter otsu --gpus -1 -1
Notes:
- Pending slides are sharded round-robin across the listed GPU IDs.
- Duplicate positive GPU IDs are deduplicated (running two workers on the same CUDA device wastes memory). Duplicate -1 entries are kept (each is an independent CPU worker).
- --gpu (singular) is the legacy form and still works, but prefer --gpus.
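The sharding notes above can be sketched in a few lines. This is an illustration of round-robin assignment with the described deduplication, not TRIDENT's code; both function names are hypothetical, and any ID of 0 or higher is treated as a GPU ID.

```python
def normalize_devices(gpus):
    """Drop repeated GPU IDs but keep every -1 (CPU worker) entry."""
    seen = set()
    devices = []
    for gpu in gpus:
        if gpu >= 0 and gpu in seen:
            continue  # a second worker on the same CUDA device is dropped
        seen.add(gpu)
        devices.append(gpu)
    return devices


def shard_round_robin(slides, gpus):
    """Return (device, slides) pairs, dealing slides out one at a time."""
    devices = normalize_devices(gpus)
    if not devices:
        raise ValueError("at least one device is required")
    shards = [[] for _ in devices]
    for index, slide in enumerate(slides):
        shards[index % len(devices)].append(slide)
    return list(zip(devices, shards))


slides = ["a.svs", "b.svs", "c.svs", "d.svs", "e.svs"]
print(shard_round_robin(slides, [0, 1, 1]))  # duplicate GPU 1 is dropped
print(shard_round_robin(slides, [-1, -1]))   # two independent CPU workers
```

Pairs are returned instead of a dict because -1 may legitimately appear more than once.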
Caching for slow / network storage
If WSIs sit on a slow network drive, copy them in batches to a local SSD via the producer/consumer cache pipeline:
python run_batch_of_slides.py --task all --wsi_dir /mnt/nfs/wsis --job_dir ./out \
--patch_encoder uni_v1 --mag 20 --patch_size 256 \
--gpus 0 1 \
--wsi_cache /local/ssd/cache --cache_batch_size 32
The cache directory is wiped and recreated at the start of each run (this is separate
from lock cleanup, which is opt-in via --clear_dead_locks).
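To estimate how much local disk a run needs, it helps to see how --cache_batch_size groups slides. The sketch below only shows the grouping step; it is not the producer/consumer pipeline itself, and the function name cache_batches is hypothetical.

```python
def cache_batches(slide_paths, cache_batch_size):
    """Yield consecutive groups of at most cache_batch_size slides."""
    if cache_batch_size < 1:
        raise ValueError("cache_batch_size must be at least 1")
    for start in range(0, len(slide_paths), cache_batch_size):
        yield slide_paths[start:start + cache_batch_size]


slides = [f"slide_{i:03d}.svs" for i in range(70)]
print([len(batch) for batch in cache_batches(slides, 32)])  # [32, 32, 6]
```

With 70 slides and a batch size of 32, at most 32 slides need to fit in the cache directory at any one time.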
High-signal knobs
- --segmenter:
  - grandqc: fast, accurate on clean H&E.
  - hest: better on IHC / dirtier slides.
  - otsu: CPU-only fallback, no model weights needed.
- --mag / --patch_size / --overlap define the patch grid; the same values must be used across coords and feat runs.
- --min_tissue_proportion (0.0 to 1.0) raises the bar for keeping a patch; 0.3 to 0.7 removes many weak edge patches.
- --remove_artifacts / --remove_penmarks: extra artifact-cleaning segmentation pass.
- --search_nested: discover slides in nested subfolders.
- --custom_list_of_wsis my.csv: process a CSV subset (column wsi with paths relative to --wsi_dir; optional mpp column).
- --reader_type {openslide,cucim,image,sdpc,omezarr,czi}: force a backend, mostly for debugging.
- --max_workers 0: force single-process data loading (use this if your environment has DataLoader multiprocessing issues).
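A CSV for --custom_list_of_wsis can be generated with the standard library. This is a minimal sketch based only on the description above (a wsi column with paths relative to --wsi_dir, plus an optional mpp column); the helper write_wsi_list and its parameters are hypothetical, and it lists top-level files only.

```python
import csv
from pathlib import Path


def write_wsi_list(wsi_dir, csv_path, extensions=(".svs",), mpp_by_name=None):
    """Write a CSV with a 'wsi' column (file names relative to wsi_dir).

    An 'mpp' column is added only when mpp_by_name is given. Slides without
    an entry get an empty cell; how TRIDENT treats an empty mpp value is not
    covered here, so prefer supplying a value for every slide.
    """
    wsi_dir = Path(wsi_dir)
    mpp_by_name = mpp_by_name or {}
    rows = []
    for path in sorted(wsi_dir.iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            rows.append({"wsi": path.name, "mpp": mpp_by_name.get(path.name, "")})
    fields = ["wsi", "mpp"] if mpp_by_name else ["wsi"]
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if Path("./wsis").is_dir():
    count = write_wsi_list("./wsis", "my.csv")
    print(f"wrote {count} slides to my.csv")
```

The resulting file is then passed as --custom_list_of_wsis my.csv together with the same --wsi_dir.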
Stage-only examples
Segmentation only
python run_batch_of_slides.py --task seg --wsi_dir ./wsis --job_dir ./out --segmenter grandqc
Patching only (with patch images for QC)
python run_batch_of_slides.py --task coords --wsi_dir ./wsis --job_dir ./out \
--mag 20 --patch_size 256 \
--dump_patches --dump_patches_max 100 --dump_patches_format jpg --dump_patches_jpeg_quality 90
Feature extraction only (reusing existing coords)
python run_batch_of_slides.py --task feat --wsi_dir ./wsis --job_dir ./out \
--patch_encoder uni_v1 --mag 20 --patch_size 256
Slide-level embeddings
python run_batch_of_slides.py --task feat --wsi_dir ./wsis --job_dir ./out \
--slide_encoder titan --mag 20 --patch_size 512
If patch features for the required underlying encoder don't exist, TRIDENT extracts them automatically.
Convert awkward formats to pyramidal TIFF
trident convert --input_dir ./wsis --mpp_csv ./wsis/to_process.csv --job_dir ./pyramidal_tiff --downscale_by 1 --num_workers 1
Common failure modes
- "Patch features not found" during slide embeddings: each slide encoder requires a specific patch encoder (mapping in trident.slide_encoder_models.load.slide_to_patch_encoder_name). Run patch features with the right encoder, or let TRIDENT auto-extract them by passing --slide_encoder.
- OOM during feature extraction: lower --feat_batch_size (or --batch_size), or pick a smaller patch encoder / patch size.
- No slides discovered: add --search_nested for nested layouts; or check that your CSV uses the column name wsi and relative paths under --wsi_dir.
- Pipeline looks stuck: check for stale .lock files. After confirming no TRIDENT process is running, re-run with --clear_dead_locks.
- Offline / no internet: set HF_TOKEN only when needed; otherwise put weights into trident/*/local_ckpts.json or pass --patch_encoder_ckpt_path.
Argument cheat sheet
The list below is not exhaustive; for full defaults and choices, scroll to "Raw parser help".
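| Flag | Use |
|---|---|
| --task | Pipeline stage. |
| --gpus | Multi-GPU sharding (positive IDs) or multi-CPU workers (-1 entries). |
| --max_workers | DataLoader workers. |
| --clear_dead_locks, --dead_lock_max_age_hours | Safe cleanup of stale .lock files. |
| --skip_errors | Continue when a slide fails. Errors are recorded in summary.md. |
| --segmenter, --seg_conf_thresh | Tissue segmenter and its threshold. |
| --remove_holes, --remove_artifacts, --remove_penmarks | Mask post-processing. |
| --mag, --patch_size, --overlap | Patch grid definition (must match between coords and feat). |
| --min_tissue_proportion | 0..1 floor on tissue overlap to keep a patch. |
| --coords_dir | Custom coords directory (e.g. to feed legacy CLAM coordinates into feat). |
| --dump_patches, --dump_patches_max, --dump_patches_format, --dump_patches_jpeg_quality | Save patch images to disk during coords. |
| --patch_encoder, --slide_encoder | Encoders. See API page for full list. |
| --seg_batch_size, --feat_batch_size | Stage-specific batch overrides. |
| --wsi_ext, --custom_list_of_wsis, --search_nested, --reader_type | Slide discovery and reader controls. |
| --wsi_cache, --cache_batch_size | Local cache pipeline for slow source storage. |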
Raw parser help
For exact defaults, choices, and the complete flag list:
usage: cli_generate.py [-h] [--gpu GPU] [--gpus GPUS [GPUS ...]]
[--task {seg,coords,feat,all}] --job_dir JOB_DIR
[--skip_errors] [--clear_dead_locks]
[--dead_lock_max_age_hours DEAD_LOCK_MAX_AGE_HOURS]
[--max_workers MAX_WORKERS] [--batch_size BATCH_SIZE]
[--wsi_cache WSI_CACHE]
[--cache_batch_size CACHE_BATCH_SIZE] --wsi_dir WSI_DIR
[--wsi_ext WSI_EXT [WSI_EXT ...]]
[--custom_mpp_keys CUSTOM_MPP_KEYS [CUSTOM_MPP_KEYS ...]]
[--custom_list_of_wsis CUSTOM_LIST_OF_WSIS]
[--reader_type {openslide,image,cucim,sdpc,omezarr,czi}]
[--search_nested] [--segmenter {hest,grandqc,otsu}]
[--seg_conf_thresh SEG_CONF_THRESH] [--remove_holes]
[--remove_artifacts] [--remove_penmarks]
[--seg_batch_size SEG_BATCH_SIZE] [--mag MAG]
[--patch_size PATCH_SIZE] [--overlap OVERLAP]
[--min_tissue_proportion MIN_TISSUE_PROPORTION]
[--coords_dir COORDS_DIR] [--dump_patches]
[--dump_patches_max DUMP_PATCHES_MAX]
[--dump_patches_format {png,jpg}]
[--dump_patches_jpeg_quality DUMP_PATCHES_JPEG_QUALITY]
[--patch_encoder {conch_v1,conch_v15,uni_v1,uni_v2,ctranspath,phikon,phikon_v2,resnet50,keep,gigapath,virchow,virchow2,hoptimus0,hoptimus1,h0-mini,musk,openmidnight,gpfm,hibou_l,kaiko-vitb8,kaiko-vitb16,kaiko-vits8,kaiko-vits16,kaiko-vitl14,lunit-vits8,midnight12k,genbio-pathfm,gemma4-e4b,gemma4-26b}]
[--patch_encoder_ckpt_path PATCH_ENCODER_CKPT_PATH]
[--slide_encoder {threads,titan,prism,chief,gigapath,madeleine,feather,feather_uni_v2,abmil,mean-conch_v1,mean-conch_v15,mean-uni_v1,mean-uni_v2,mean-ctranspath,mean-phikon,mean-resnet50,mean-gigapath,mean-virchow,mean-virchow2,mean-hoptimus0,mean-phikon_v2,mean-musk,mean-hibou_l,mean-kaiko-vit8s,mean-kaiko-vit16s,mean-kaiko-vit8b,mean-kaiko-vit16b,mean-kaiko-vit14l}]
[--feat_batch_size FEAT_BATCH_SIZE]
Run Trident
options:
-h, --help show this help message and exit
--gpu GPU [DEPRECATED] Single GPU index. Use `--gpus <id>`
instead.
--gpus GPUS [GPUS ...]
Optional space-separated list of GPU indices to enable
multi-GPU execution.
--task {seg,coords,feat,all}
Task to run: seg (segmentation), coords (save tissue
coordinates), img (save tissue images), feat (extract
features).
--job_dir JOB_DIR Directory to store outputs.
--skip_errors Skip errored slides and continue processing.
--clear_dead_locks If set, remove stale `.lock` files under `--job_dir`
(safe heuristics) before running.
--dead_lock_max_age_hours DEAD_LOCK_MAX_AGE_HOURS
Max age (hours) before a `.lock` file is considered
stale (when its target output is missing). Defaults to
24.
--max_workers MAX_WORKERS
Maximum number of workers. Set to 0 to use main
process.
--batch_size BATCH_SIZE
Batch size used for segmentation and feature
extraction. Will be override by`seg_batch_size` and
`feat_batch_size` if you want to use different ones.
Defaults to 64.
--wsi_cache WSI_CACHE
Path to a local cache (e.g., SSD) used to speed up
access to WSIs stored on slower drives (e.g., HDD).
--cache_batch_size CACHE_BATCH_SIZE
Maximum number of slides to cache locally at once.
Helps control disk usage.
--wsi_dir WSI_DIR Directory containing WSI files (no nesting allowed).
--wsi_ext WSI_EXT [WSI_EXT ...]
List of allowed file extensions for WSI files.
--custom_mpp_keys CUSTOM_MPP_KEYS [CUSTOM_MPP_KEYS ...]
Custom keys used to store the resolution as MPP
(micron per pixel) in your list of whole-slide image.
--custom_list_of_wsis CUSTOM_LIST_OF_WSIS
Custom list of WSIs specified in a csv file.
--reader_type {openslide,image,cucim,sdpc,omezarr,czi}
Force the use of a specific WSI image reader. Options
are ["openslide", "image", "cucim", "sdpc", "omezarr",
"czi"]. Defaults to None (auto-determine which reader
to use).
--search_nested If set, recursively search for whole-slide images
(WSIs) within all subdirectories of `wsi_source`. Uses
`os.walk` to include slides from nested folders. This
allows processing of datasets organized in
hierarchical structures. Defaults to False (only top-
level slides are included).
--segmenter {hest,grandqc,otsu}
Type of tissue vs background segmenter. Options are
HEST, GrandQC, or Otsu.
--seg_conf_thresh SEG_CONF_THRESH
Confidence threshold to apply to binarize segmentation
predictions. Lower this threhsold to retain more
tissue. Defaults to 0.5. Try 0.4 as 2nd option.
--remove_holes Do you want to remove holes?
--remove_artifacts Do you want to run an additional model to remove
artifacts (including penmarks, blurs, stains, etc.)?
--remove_penmarks Do you want to run an additional model to remove
penmarks?
--seg_batch_size SEG_BATCH_SIZE
Batch size for segmentation. Defaults to None (use
`batch_size` argument instead).
--mag MAG Magnification for coords/features extraction. Supports
fractional values (e.g., 1.25x, 2.5x, 5x, etc.).
--patch_size PATCH_SIZE
Patch size for coords/image extraction.
--overlap OVERLAP Absolute overlap for patching in pixels. Defaults to
0.
--min_tissue_proportion MIN_TISSUE_PROPORTION
Minimum proportion of the patch under tissue to be
kept. Between 0. and 1.0. Defaults to 0.
--coords_dir COORDS_DIR
Directory to save/restore tissue coordinates.
--dump_patches During the coords task, also dump patch images (PNGs)
to disk.
--dump_patches_max DUMP_PATCHES_MAX
Max number of patch images to dump per slide (0 = no
limit).
--dump_patches_format {png,jpg}
Patch image format to dump (png or jpg). Defaults to
png.
--dump_patches_jpeg_quality DUMP_PATCHES_JPEG_QUALITY
JPEG quality (1-100) when --dump_patches_format=jpg.
Defaults to 90.
--patch_encoder {conch_v1,conch_v15,uni_v1,uni_v2,ctranspath,phikon,phikon_v2,resnet50,keep,gigapath,virchow,virchow2,hoptimus0,hoptimus1,h0-mini,musk,openmidnight,gpfm,hibou_l,kaiko-vitb8,kaiko-vitb16,kaiko-vits8,kaiko-vits16,kaiko-vitl14,lunit-vits8,midnight12k,genbio-pathfm,gemma4-e4b,gemma4-26b}
Patch encoder to use
--patch_encoder_ckpt_path PATCH_ENCODER_CKPT_PATH
Optional local path to a patch encoder checkpoint
(.pt, .pth, .bin, or .safetensors). This is only
needed in offline environments (e.g., compute clusters
without internet). If not provided, models are
downloaded automatically from Hugging Face. You can
also specify local paths via the model registry at
`./trident/patch_encoder_models/local_ckpts.json`.
--slide_encoder {threads,titan,prism,chief,gigapath,madeleine,feather,feather_uni_v2,abmil,mean-conch_v1,mean-conch_v15,mean-uni_v1,mean-uni_v2,mean-ctranspath,mean-phikon,mean-resnet50,mean-gigapath,mean-virchow,mean-virchow2,mean-hoptimus0,mean-phikon_v2,mean-musk,mean-hibou_l,mean-kaiko-vit8s,mean-kaiko-vit16s,mean-kaiko-vit8b,mean-kaiko-vit16b,mean-kaiko-vit14l}
Slide encoder to use
--feat_batch_size FEAT_BATCH_SIZE
Batch size for feature extraction. Defaults to None
(use `batch_size` argument instead).