Frequently Asked Questions
This page groups common questions by theme.
Troubleshooting by symptom
Found 0 slides / missing files → why no slides
Pipeline looks stuck (locks) → locks
Re-run / resume on same job_dir → resume
Job is slow → performance
Slide embeddings complain about missing patch features → missing patch features
Offline cluster / no internet → offline
One slide keeps failing → debug one slide
Where are outputs / what happened → where results are
Getting started and discovery
How do I extract embeddings from legacy CLAM coordinates?
Use the --coords_dir flag to pass CLAM-style patch coordinates:
python run_batch_of_slides.py --task feat --wsi_dir wsis --job_dir legacy_dir --coords_dir extracted_coords --patch_encoder uni_v1
TRIDENT says "Found 0 valid slides". Why?
Common causes:
Your folder is nested: add --search_nested.
Your extension filter is too strict: remove --wsi_ext or include the right extensions.
You used --custom_list_of_wsis but the CSV is wrong:
  - CSV must contain a wsi column
  - values must be relative paths under --wsi_dir (e.g., patientA/slide.svs)
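The CSV rules above can be checked before launching a job. The following is a minimal sketch, not part of TRIDENT: the helper name check_wsi_list is hypothetical, and it only encodes the two rules stated above (a wsi column, and paths relative to --wsi_dir).

```python
import csv
from pathlib import Path


def check_wsi_list(csv_path, wsi_dir):
    """Return a list of problems found in a --custom_list_of_wsis CSV.

    Hypothetical helper for illustration; it is not a TRIDENT function.
    """
    problems = []
    wsi_dir = Path(wsi_dir)
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if "wsi" not in (reader.fieldnames or []):
            return ["CSV has no 'wsi' column"]
        # Line 1 is the header, so data rows start at line 2.
        for line_no, row in enumerate(reader, start=2):
            value = (row["wsi"] or "").strip()
            if not value:
                problems.append(f"line {line_no}: empty 'wsi' value")
            elif Path(value).is_absolute():
                problems.append(f"line {line_no}: '{value}' is absolute, expected a relative path")
            elif not (wsi_dir / value).is_file():
                problems.append(f"line {line_no}: '{value}' not found under {wsi_dir}")
    return problems
```

An empty return value means the CSV satisfies both rules; any other result lists the offending lines.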
My WSIs are in multiple subfolders. How can I process them all?
By default, only the top-level directory is scanned. Use --search_nested to recursively search for WSIs in all nested folders and include them in processing.
What does `--reader_type` do? Which one should I use?
TRIDENT can force a reader backend. Use this mostly for debugging:
openslide: default for many WSI formats (.svs, .tif/.tiff, .ndpi, .mrxs, …; also .dcm if your OpenSlide build supports it)
cucim: GPU-friendly WSI reading (when available)
image: standard images via PIL (.png, .jpg/.jpeg)
sdpc: SDPC files
omezarr: OME-Zarr / NGFF Zarr
czi: Zeiss CZI (requires the optional CZI dependency)
Metadata and patching semantics
My WSIs have no micron-per-pixel (MPP) or magnification metadata. What should I do?
PNGs and JPEGs do not store MPP metadata in the file itself. If you're working with such formats, passing a CSV via --custom_list_of_wsis is required. This CSV should include at least two columns: wsi and mpp.
Example:
wsi,mpp
TCGA-AJ-A8CV-01Z-00-DX1_1.png,0.25
TCGA-AJ-A8CV-01Z-00-DX1_2.png,0.25
TCGA-AJ-A8CV-01Z-00-DX1_3.png,0.25
If you're using OpenSlide-readable formats (e.g., .svs, .tiff), this CSV is optional, but you can still use it to:
Restrict processing to a specific subset of slides
Override incorrect MPP metadata
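Writing such a CSV by hand gets tedious for large folders. The sketch below generates one with the Python standard library. The helper name write_wsi_list is hypothetical, and it assumes every image in the folder shares the same MPP value and sits at the top level of the folder, which you need to confirm for your own data.

```python
import csv
from pathlib import Path


def write_wsi_list(wsi_dir, csv_path, mpp, extensions=(".png", ".jpg", ".jpeg")):
    """Write a wsi,mpp CSV listing every matching image at the top level of wsi_dir.

    Hypothetical helper; assumes one shared MPP value for all listed images.
    """
    wsi_dir = Path(wsi_dir)
    names = sorted(
        p.name for p in wsi_dir.iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["wsi", "mpp"])
        for name in names:
            writer.writerow([name, mpp])
    return names
```

The resulting file can then be passed via --custom_list_of_wsis together with the same --wsi_dir.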
I want to skip patches on holes.
By default, TRIDENT includes all tissue patches (including holes). Use --remove_holes to exclude them. This is not recommended, as "holes" often help define the tissue microenvironment.
Models and compute (GPU/CPU)
Which tissue vs. background segmenter should I use?
TRIDENT supports three segmenters:
hest: preferred for IHC and dirtier slides.
grandqc: often preferred for clean H&E workflows (fast and reliable).
otsu: image-processing-only fallback (no segmentation weights required), runs at 1.25x on CPU.
Which tasks need GPU and which are fine on CPU?
Segmentation:
  - hest and grandqc use GPU.
  - Optional artifact cleanup (--remove_artifacts / --remove_penmarks) adds additional segmentation cost.
  - otsu runs on CPU.
Patching:
  - CPU-only; usually fast, but can be CPU-intensive on very large slides or heavy overlap settings.
  - Use --min_tissue_proportion to require more tissue overlap and reduce weak/edge patches.
  - For debugging, you can also dump patch images during the coords task using: --dump_patches --dump_patches_format {png,jpg} --dump_patches_jpeg_quality 90 --dump_patches_max 100.
Feature extraction:
  - Patch-level and slide-level feature extraction require GPU in practice.
How do I scale across multiple GPUs (or multiple CPU workers)?
Pass the list of GPU IDs to --gpus. Pending slides are sharded round-robin across
the listed devices and one worker process is spawned per shard.
# 4 GPUs
python run_batch_of_slides.py --task all --wsi_dir ./wsis --job_dir ./out \
--patch_encoder uni_v1 --mag 20 --patch_size 256 --gpus 0 1 2 3
# No GPU: two CPU workers
python run_batch_of_slides.py --task seg --wsi_dir ./wsis --job_dir ./out \
--segmenter otsu --gpus -1 -1
Notes:
Duplicate positive GPU IDs are deduplicated (running two workers on the same CUDA device wastes memory). Duplicate -1 entries are kept: each is an independent CPU worker.
The legacy --gpu N (singular) flag still works but --gpus is preferred. If both are passed, --gpus wins and a warning is printed.
If you hit DataLoader multiprocessing pickling issues, set --max_workers 0 to force main-process data loading.
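To make the deduplication and round-robin behaviour described above concrete, here is a small illustrative sketch. It is not TRIDENT's implementation: the function names plan_workers and shard_round_robin are hypothetical, and the exact order in which TRIDENT assigns slides to shards is an assumption.

```python
def plan_workers(gpus):
    """Drop repeated GPU IDs (>= 0) but keep every -1 entry (one CPU worker each)."""
    seen = set()
    workers = []
    for gpu in gpus:
        if gpu >= 0:
            if gpu in seen:
                continue  # a second worker on the same CUDA device is skipped
            seen.add(gpu)
        workers.append(gpu)
    return workers


def shard_round_robin(slides, workers):
    """Assign slide i to worker i % len(workers); returns one list per worker."""
    shards = [[] for _ in workers]
    for index, slide in enumerate(slides):
        shards[index % len(workers)].append(slide)
    return shards


workers = plan_workers([0, 1, 1, 2])            # the duplicate 1 is dropped
shards = shard_round_robin(list("abcde"), workers)
```

With three workers and five slides, the shards differ in size by at most one slide, which is the main appeal of round-robin assignment when per-slide cost is roughly uniform.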
Performance
My job is slow. What are the usual bottlenecks?
I/O bound (common on network drives): enable --wsi_cache on a local SSD.
GPU bound (feature extraction): reduce --feat_batch_size / --batch_size if you see OOM.
Too many patches: increase --min_tissue_proportion or decrease overlap.
I don't have enough local SSD storage and my WSIs are on a slow remote disk. How can I accelerate processing?
When WSIs are stored on slow network or external drives, processing can be very slow. Use --wsi_cache ./cache --cache_batch_size 32 to enable local caching. WSIs will be copied in batches to a local SSD, processed in parallel, and automatically cleaned up after use. This significantly reduces I/O bottlenecks.
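The copy, process, clean-up cycle described above can be pictured with the simplified sketch below. It is only an illustration of the idea, not TRIDENT's caching code: process_with_cache and its process callback are hypothetical, the loop is sequential rather than parallel, and it assumes slide file names are unique within a batch.

```python
import shutil
from pathlib import Path


def process_with_cache(slide_paths, cache_dir, batch_size, process):
    """Copy slides to cache_dir in batches, run process() on each local copy, then delete it."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for start in range(0, len(slide_paths), batch_size):
        batch = slide_paths[start:start + batch_size]
        # shutil.copy2 returns the path of the newly created file.
        local_copies = [Path(shutil.copy2(src, cache_dir)) for src in batch]
        try:
            results.extend(process(local) for local in local_copies)
        finally:
            # Free the local disk even if processing raised an error.
            for local in local_copies:
                local.unlink(missing_ok=True)
    return results
```

Because only one batch lives on the local disk at a time, the cache needs room for roughly batch_size slides rather than the whole cohort.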
Why does `trident convert` exist if TRIDENT already reads many formats?
The converter is mainly for uncommon formats that OpenSlide does not handle well. It uses BioFormats-backed readers when possible, then writes pyramidal TIFF outputs for downstream workflows.
Reliability, resume, and debugging
Where can I see what TRIDENT has done (and what failed)?
In your --job_dir:
summary.md: appended once per run; compact counts and per-model breakdown, plus a short error list.
runs/<run_id>.json: per-run JSON manifest (args, timestamps, status).
wsi_states/<slide>__<hash>.json: per-slide state (task attempts, outputs, and resume info).
How do I safely re-run or resume a job on an existing `--job_dir`?
TRIDENT is designed so that re-running on the same --job_dir is usually safe:
If an output already exists and is not locked, the corresponding task is marked skipped.
State is persisted under wsi_states/; you can inspect it to see what has already run.
When in doubt, start by re-running only one stage (e.g., --task feat) instead of --task all.
How should I read the `wsi_states/*.json` files? What do the fields mean?
At a high level:
slide: identity (name, extension, absolute path, reader type).
meta: one-shot WSI metadata snapshot (e.g., dimensions, mpp, level_count) when available.
tasks: one entry per logical task (segmentation, coords, patch_features:<encoder>, slide_features:<encoder>) with:
  - status (not_started, running, completed, skipped, error),
  - reason (why it was skipped/errored, if known),
  - attempts (merged start/finish records with timestamps and durations),
  - outputs (paths + existence/bytes).
summary: a compact view of task statuses, grouped by patch/slide encoder.
resume: last task + status + last error, useful when debugging failed runs.
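To find failures across many slides, the state files can be scanned with a few lines of Python. This is a minimal sketch under one explicit assumption: that tasks is a JSON object mapping each task name to a record containing status and reason. The exact schema is not spelled out here, so adapt the access pattern to what your files actually contain. The function name failed_tasks is hypothetical.

```python
import json
from pathlib import Path


def failed_tasks(job_dir):
    """Yield (state_file_name, task_name, reason) for every task whose status is 'error'."""
    for state_file in sorted(Path(job_dir, "wsi_states").glob("*.json")):
        state = json.loads(state_file.read_text())
        # Assumption: "tasks" maps task name -> record with "status" and "reason".
        for task_name, record in state.get("tasks", {}).items():
            if record.get("status") == "error":
                yield state_file.name, task_name, record.get("reason")
```

For example, `for name, task, reason in failed_tasks("./out"): print(name, task, reason)` prints one line per failed task.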
How do locks (`.lock` files) work and when is it safe to remove them?
TRIDENT uses simple filesystem locks (<output>.lock) to avoid two workers writing the same file:
A task creates a .lock file when it starts, and removes it on success or handled error. The lock file contains JSON metadata (PID, hostname, timestamp) so stale locks can be detected safely.
If you see a stale .lock file but no corresponding running process, it usually means a crash or interruption.
The safest way to clean up stale locks is to pass --clear_dead_locks on the next run:
python run_batch_of_slides.py --clear_dead_locks --dead_lock_max_age_hours 24 ...
This removes only locks where (a) the target output already exists, (b) the writer PID is not running on this host, or (c) the lock is a legacy/unreadable lock older than --dead_lock_max_age_hours (default 24). Active locks from running jobs are never touched.
You can also delete .lock files manually, but only after confirming no TRIDENT process is still running for that job.
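Before deleting a lock by hand, the three conditions above can be checked programmatically. The sketch below is an approximation, not TRIDENT's own logic: the JSON key names pid and hostname are assumptions, the liveness check is POSIX-only, and a lock written on a different host is conservatively treated as still active.

```python
import json
import os
import socket
import time
from pathlib import Path


def pid_is_running(pid):
    """POSIX-only check: signal 0 tests for existence without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def is_dead_lock(lock_path, max_age_hours=24.0):
    """Return True if the lock matches one of the three removal conditions."""
    lock_path = Path(lock_path)
    output_path = lock_path.with_suffix("")  # "<output>.lock" -> "<output>"
    if output_path.exists():
        return True  # (a) the target output already exists
    try:
        info = json.loads(lock_path.read_text())
        pid, host = int(info["pid"]), info["hostname"]  # assumed key names
    except (ValueError, KeyError, TypeError):
        # (c) legacy/unreadable lock: fall back to the file's age
        age_hours = (time.time() - lock_path.stat().st_mtime) / 3600
        return age_hours > max_age_hours
    # (b) the writer PID is not running on this host
    return host == socket.gethostname() and not pid_is_running(pid)
```

The function only reports; it deletes nothing. Running with --clear_dead_locks remains the safer route for actual clean-up.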
I'm running slide embeddings and it says "Patch features not found".
Slide encoders require a specific patch encoder (internal mapping).
Fix:
run patch features for the required encoder under the same coords_dir, or
run slide features and let TRIDENT auto-extract missing patch features.
One slide keeps failing while `--skip_errors` is on. How do I debug it?
Check summary.md and the slide's entry in wsi_states/ to see which task and error are reported.
Re-run a small test focusing only on that slide:
  - via API: use load_wsi + a minimal pipeline around the failing step, or
  - via CLI: create a CSV for that slide only (--custom_list_of_wsis) and re-run the relevant task.
Once you understand/fix the cause, you can safely re-run the full batch with --skip_errors again.
Environment and advanced usage
Which Python versions are supported? What about 3.12+?
TRIDENT is tested and packaged for Python 3.10 and 3.11 (see pyproject.toml).
Python 3.12+ may work at the pure-Python level, but binary dependencies (PyTorch, OpenSlide, etc.) and some pinned versions are not guaranteed to be compatible.
For production use, stick to 3.10/3.11 until explicit 3.12+ support is advertised.
How can I control where TRIDENT stores downloaded weights and caches?
TRIDENT follows a simple hierarchy:
If TRIDENT_HOME is set, weights and related files go under that directory.
Else, it falls back to $XDG_CACHE_HOME/trident (defaulting to ~/.cache/trident if unset).
On clusters with small home directories, point TRIDENT_HOME or XDG_CACHE_HOME to a larger scratch or project disk.
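The lookup order can be expressed as a short function, which is handy for checking where files will land before a large download. This is a sketch of the hierarchy as described above, not TRIDENT's code: resolve_trident_cache is a hypothetical name, and treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path


def resolve_trident_cache(env=None):
    """Mirror the documented order: TRIDENT_HOME, then XDG_CACHE_HOME/trident, then ~/.cache/trident."""
    env = os.environ if env is None else env
    if env.get("TRIDENT_HOME"):
        return Path(env["TRIDENT_HOME"])
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "trident"
    return Path.home() / ".cache" / "trident"
```

Calling `resolve_trident_cache()` with no argument inspects the current environment; passing a dict makes it easy to test alternative settings.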
Can I plug in my own custom patch or slide encoder?
Yes. The recommended approach is:
Wrap your patch encoder in CustomInferenceEncoder (see trident/patch_encoder_models/load.py).
Wrap your slide encoder in CustomSlideEncoder (see trident/slide_encoder_models/load.py).
Use the API (Processor + your custom encoder) rather than the CLI for these advanced cases.
This way, you still benefit from TRIDENT's I/O and patching pipeline while controlling the model.
I work on a cluster without Internet access. How can I use models offline?
You can use local checkpoint files by editing the model registry files in Trident. This allows you to cache or pre-download all necessary models for both segmentation and patch encoding.
1. Segmentation Models
Update the segmentation model registry at: trident/segmentation_models/local_ckpts.json
Example:
{
"hest": "./ckpts/trident/deeplabv3_seg_v4.ckpt",
"grandqc": "./ckpts/trident/Tissue_Detection_MPP10.pth",
"grandqc_artifact": "./ckpts/trident/GrandQC_MPP1_state_dict.pth"
}
2. Patch Encoder Models
Update the patch encoder model registry at: trident/patch_encoder_models/local_ckpts.json
Example:
{
"conch_v1": "./ckpts/conch_patch_encoder/pytorch_model.bin",
"uni_v1": "./ckpts/uni_patch_encoder/pytorch_model.bin",
"uni_v2": "./ckpts/uni2_patch_encoder/pytorch_model.bin",
"ctranspath": "./ckpts/ctranspath_patch_encoder/CHIEF_CTransPath.pth",
"phikon": "./ckpts/phikon_patch_encoder/pytorch_model.bin",
"resnet50": "./ckpts/resnet_patch_encoder/pytorch_model.bin",
"gigapath": "./ckpts/gigapath_patch_encoder/pytorch_model.bin",
"virchow": "./ckpts/virchow_patch_encoder/pytorch_model.bin",
"virchow2": "./ckpts/virchow2_patch_encoder/pytorch_model.bin",
"hoptimus0": "./ckpts/hoptimus0_patch_encoder/pytorch_model.bin",
"hoptimus1": "./ckpts/hoptimus1_patch_encoder/pytorch_model.bin",
"phikon_v2": "./ckpts/phikon-v2_patch_encoder/model.safetensors",
"kaiko-vitb8": "./ckpts/kaiko_vitb8_patch_encoder/model.safetensors",
"kaiko-vitb16": "./ckpts/kaiko_vitb16_patch_encoder/model.safetensors",
"kaiko-vits8": "./ckpts/kaiko_vits8_patch_encoder/model.safetensors",
"kaiko-vits16": "./ckpts/kaiko_vits16_patch_encoder/model.safetensors",
"kaiko-vitl14": "./ckpts/kaiko_vitl14_patch_encoder/model.safetensors",
"lunit-vits8": "./ckpts/lunit_patch_encoder/model.safetensors",
"conch_v15": "./ckpts/conchv1_5_patch_encoder/pytorch_model_vision.bin"
}
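A typo in one of these paths typically only surfaces when the corresponding model is first loaded, so it can help to verify the registry up front. The following sketch is not part of TRIDENT and rests on two assumptions: relative paths are resolved against base_dir (the current directory by default), and empty values mean that no local checkpoint is configured. The helper name missing_checkpoints is hypothetical.

```python
import json
from pathlib import Path


def missing_checkpoints(registry_path, base_dir="."):
    """Return {model_name: path} for registry entries whose file does not exist."""
    registry = json.loads(Path(registry_path).read_text())
    missing = {}
    for name, ckpt in registry.items():
        # Assumption: an empty value means "no local checkpoint", so skip it.
        if ckpt and not (Path(base_dir) / ckpt).is_file():
            missing[name] = ckpt
    return missing
```

For instance, `missing_checkpoints("trident/patch_encoder_models/local_ckpts.json")` returns an empty dict when every configured path resolves to a file.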
3. Alternative Option
You can also directly pass a local checkpoint path at runtime using the --patch_encoder_ckpt_path argument in run_batch_of_slides.py.
4. Optional: Pre-download All Models in Advance
Full credit to @haydenych. If you'd like to automatically download all model weights in advance (e.g., from a connected machine), use the following:
XDG_CACHE_HOME="<YOUR_CACHE_DIR>" HF_TOKEN="<YOUR_HUGGINGFACE_TOKEN>" python run_predownload_weights.py
This will fetch all segmentation, patch encoder, and slide encoder weights supported in Trident.
To run downstream tasks using the cached models:
XDG_CACHE_HOME="<YOUR_CACHE_DIR>" python run_single_slide.py ...
XDG_CACHE_HOME="<YOUR_CACHE_DIR>" python run_batch_of_slides.py ...
Example run_predownload_weights.py script (can be adapted based on needs):
from trident.segmentation_models import segmentation_model_factory
from trident.patch_encoder_models.load import encoder_factory as patch_encoder_model_factory
from trident.slide_encoder_models.load import encoder_factory as slide_encoder_model_factory

# Instantiating each model triggers the download of its weights into the cache.
segmentation_models = ["hest", "grandqc", "grandqc_artifact", "otsu"]
for model in segmentation_models:
    try:
        segmentation_model_factory(model)
    except Exception as e:
        print(f"Failed to download weights for {model}: {e}")

patch_encoder_models = [
    "conch_v1", "uni_v1", "uni_v2", "ctranspath", "phikon", "resnet50", "gigapath",
    "virchow", "virchow2", "hoptimus0", "hoptimus1", "phikon_v2", "conch_v15",
    "musk", "hibou_l", "kaiko-vits8", "kaiko-vits16", "kaiko-vitb8", "kaiko-vitb16",
    "kaiko-vitl14", "lunit-vits8"
]
for model in patch_encoder_models:
    try:
        patch_encoder_model_factory(model)
    except Exception as e:
        print(f"Failed to download weights for {model}: {e}")

slide_encoder_models = [
    "threads", "titan", "prism", "gigapath", "chief", "madeleine", "mean-virchow",
    "mean-virchow2", "mean-conch_v1", "mean-conch_v15", "mean-ctranspath", "mean-gigapath",
    "mean-resnet50", "mean-hoptimus0", "mean-phikon", "mean-phikon_v2", "mean-musk",
    "mean-uni_v1", "mean-uni_v2"
]
for model in slide_encoder_models:
    try:
        slide_encoder_model_factory(model)
    except Exception as e:
        print(f"Failed to download weights for {model}: {e}")