Battery Particle Segmentation
Details
Adapting foundation segmentation models to dense, non-describable particle imagery using semi-automatic labeling and parameter-efficient fine-tuning.
Problem
Segmenting particles in scanning electron microscopy (SEM) images is challenging due to dense object packing, irregular particle shapes, and the absence of clear semantic boundaries.
While the Segment Anything Model (SAM) generalizes well to describable objects, it struggles on non-describable, texture-dominated domains such as particle microstructures.
Manual instance-level annotation for such data is prohibitively time-consuming, motivating a workflow that combines automatic mask generation, human-in-the-loop correction, and efficient model adaptation.
Overview
This project followed a three-stage pipeline:
- Generate initial instance masks using the SAM Mask Generator
- Refine masks through manual removal and correction
- Fine-tune SAM using LoRA to adapt it to particle imagery
The goal was to improve both mask quality and instance separation without full model fine-tuning.
Data and Label Generation
Initial annotations were produced using the SAM Mask Generator, which proposes instance masks by sampling a dense grid of points across the image.
Key steps:
- Default SAM mask generator parameters were first applied
- Hyperparameters (IoU threshold, stability score, minimum area) were tuned to reduce over-segmentation
- Generated masks were indexed and visually inspected
- Incorrect, merged, or spurious masks were manually removed
- Missing particle regions were added where necessary
This semi-automatic process produced high-quality image–mask pairs suitable for supervised fine-tuning.
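A minimal sketch of this labeling stage using the official segment_anything package. The checkpoint path and threshold values are illustrative, not the exact settings used; the indexing loop mirrors the manual inspection step, where low-quality proposals were pruned by hand.

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a pretrained SAM checkpoint (ViT-H here; path is illustrative).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

# Tuned generator: raising the IoU and stability thresholds and setting a
# minimum region area suppresses the spurious fragments the defaults
# produce on densely packed particles. Values shown are illustrative.
mask_generator = SamAutomaticMaskGenerator(
    sam,
    points_per_side=32,
    pred_iou_thresh=0.92,
    stability_score_thresh=0.95,
    min_mask_region_area=100,  # drop tiny speckle masks (needs opencv)
)

image = cv2.cvtColor(cv2.imread("sem_image.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts, one per instance

# Index proposals by predicted quality so they can be reviewed, and
# incorrect or merged masks removed, before training.
masks = sorted(masks, key=lambda m: m["predicted_iou"], reverse=True)
for idx, m in enumerate(masks):
    print(idx, m["area"], round(m["predicted_iou"], 3))
```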
Model Adaptation with LoRA
Rather than fully fine-tuning SAM, Low-Rank Adaptation (LoRA) was used to achieve parameter-efficient domain adaptation.
Design choices:
- The image encoder was frozen to preserve general visual representations
- LoRA layers were inserted into transformer blocks of the encoder
- Only the low-rank adapter parameters were updated during training
- Prompt and mask decoders remained unchanged
This strategy enables near full fine-tuning performance with significantly fewer trainable parameters and no added inference latency.
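A minimal PyTorch sketch of this adapter design, not the project's exact implementation. The attribute names (`image_encoder.blocks`, `attn.qkv`) follow the segment_anything `ImageEncoderViT`; the rank value matches the ablation reported below.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = Wx + (alpha / r) * B(A(x))."""
    def __init__(self, base: nn.Linear, rank: int, alpha: float | None = None):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # adapter starts as a no-op
        self.scale = (alpha if alpha is not None else rank) / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

# Freeze the whole model (including prompt and mask decoders), then wrap
# the qkv projection of each encoder block so only adapters train.
for p in sam.parameters():
    p.requires_grad = False
for block in sam.image_encoder.blocks:
    block.attn.qkv = LoRALinear(block.attn.qkv, rank=512)
```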
Training Setup
- Dataset size: 243 SEM images
- Augmented samples: 5,820 image–mask pairs
- Input resolution: 512 × 512
- Train / validation / test split: 70% / 20% / 10%
- Loss function: Dice + Cross-Entropy
- Optimizer: Adam with weight decay (0.01)
- Learning rate: 1e-4
- Batch size: 8
- Training epochs: 40
- Experiment tracking: Weights & Biases
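The setup above translates into a straightforward supervised loop. The sketch below is illustrative: `train_loader` and `forward_masks` (the encoder-plus-decoder forward pass producing per-pixel logits) are hypothetical helpers, and the W&B project name is an assumption.

```python
import torch
import torch.nn.functional as F
import wandb

def dice_ce_loss(logits, targets, eps=1e-6):
    """Combined Dice + cross-entropy loss over binary particle masks."""
    ce = F.binary_cross_entropy_with_logits(logits, targets)
    probs = torch.sigmoid(logits)
    inter = (probs * targets).sum(dim=(-2, -1))
    union = probs.sum(dim=(-2, -1)) + targets.sum(dim=(-2, -1))
    dice = 1 - (2 * inter + eps) / (union + eps)
    return ce + dice.mean()

wandb.init(project="sam-lora-particles")  # hypothetical project name
trainable = [p for p in sam.parameters() if p.requires_grad]  # LoRA only
optimizer = torch.optim.Adam(trainable, lr=1e-4, weight_decay=0.01)

for epoch in range(40):
    for images, targets in train_loader:  # 512x512 pairs, batch size 8
        logits = forward_masks(sam, images)  # hypothetical forward helper
        loss = dice_ce_loss(logits, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        wandb.log({"train/loss": loss.item()})
```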
Results
Quantitative Performance
- Average test Dice score: 96.14%
- Significant improvement over the base SAM mask generator
- Stable convergence across training and validation splits
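The reported score is the standard Dice coefficient, computed per image and averaged over the test split (the averaging scheme is an assumption). A minimal reference implementation:

```python
import torch

def dice_score(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice coefficient between a binarized prediction and ground truth."""
    pred = (pred > 0.5).float()
    gt = (gt > 0.5).float()
    inter = (pred * gt).sum()
    return float((2 * inter + eps) / (pred.sum() + gt.sum() + eps))
```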
Figure: Training and validation curves showing stable optimization and consistent gains in accuracy and mIoU.
Qualitative Improvements
- Improved instance separation in densely packed regions
- Reduced under- and over-segmentation
- More stable mask boundaries across scales
- Clear improvement in SAM predictor outputs after adaptation
Across multiple test cases, LoRA-adapted SAM improved Dice scores from the high-80s to mid-90s, with the largest gains observed in dense particle clusters.
Ablation Insights
- Increasing LoRA rank improves performance up to a point
- Rank 512 achieved the best trade-off between accuracy and parameter count
- Higher ranks provided diminishing returns while increasing training cost
- Optimal learning rate was 1e-4; both higher and lower rates degraded performance
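A hypothetical sweep of the kind behind these findings, logged to W&B. `build_lora_sam` and `train_and_eval` are illustrative helpers (the adapter wrapping and training loop sketched above), and all grid values other than rank 512 and lr 1e-4 are assumptions.

```python
import wandb

for rank in (64, 128, 256, 512, 1024):
    for lr in (1e-3, 1e-4, 1e-5):
        run = wandb.init(project="sam-lora-particles",
                         config={"rank": rank, "lr": lr}, reinit=True)
        model = build_lora_sam(rank)           # hypothetical: wraps qkv as above
        val_dice = train_and_eval(model, lr)   # hypothetical training helper
        wandb.log({"val/dice": val_dice})
        run.finish()
```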
Notes and Lessons Learned
- Foundation models require explicit domain adaptation for non-describable imagery
- SAM Mask Generator is effective for bootstrapping labels but insufficient alone
- Human-in-the-loop correction dramatically improves downstream performance
- LoRA provides a practical balance between adaptability and efficiency
- Encoder-side adaptation is more impactful than decoder-side tuning for dense texture domains
This project informed later work on memory-augmented segmentation, guided adaptation, and efficient fine-tuning of large vision models.