This self-supervised contrastive learning pipeline offers a unified framework for
learning powerful image representations without labels. It supports five leading
model families, each leveraging a distinct strategy to build invariance into the
learned feature space:
SimCLR: Learns by pulling together the embeddings of different random augmentations
of the same image and pushing apart those of other images, using a temperature-scaled
contrastive loss to shape the embedding geometry.
DINO: A teacher–student approach that distills knowledge from
a momentum-updated network into a student across multi-crop views. Particularly
effective with Vision Transformers, it yields semantically rich feature maps.
SimSiam: Eliminates the need for negative pairs by predicting
one view’s representation from another through a siamese network with a stop-gradient.
BYOL: Bootstrap Your Own Latent uses two networks (online and target)
and an asymmetric predictor head to avoid collapse, learning by minimizing the distance
between online predictions and target projections.
MoCo: Maintains a dynamic memory bank (queue) of embeddings
and a momentum encoder, enabling large-scale contrastive learning with consistent
negatives even on small batches.
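The temperature-scaled contrastive objective behind SimCLR is the NT-Xent loss. The sketch below is a minimal reference implementation of that loss for two batches of paired views, not the pipeline's actual code:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (N, D) embeddings of two augmented views of the same N images.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D) unit vectors
    sim = z @ z.t() / temperature                       # pairwise cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # mask self-similarity
    # The positive for row i is the other view of the same image, at index (i + N) mod 2N.
    targets = (torch.arange(2 * n, device=z.device) + n) % (2 * n)
    return F.cross_entropy(sim, targets)
```

Lowering the temperature sharpens the similarity distribution, penalizing hard negatives more strongly.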
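Both MoCo's momentum encoder and BYOL's target network are maintained as an exponential moving average (EMA) of the online network's weights rather than by gradient descent. A minimal sketch of that update (the function name and momentum coefficient are illustrative):

```python
import copy

import torch

@torch.no_grad()
def momentum_update(online: torch.nn.Module, target: torch.nn.Module, m: float = 0.99) -> None:
    """EMA update of the target/teacher weights: target <- m * target + (1 - m) * online."""
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(m).add_(p_o, alpha=1.0 - m)

# The target network starts as a frozen copy of the online network.
online = torch.nn.Linear(4, 4)
target = copy.deepcopy(online)
for p in target.parameters():
    p.requires_grad_(False)
momentum_update(online, target, m=0.99)
```

With m close to 1, the target evolves slowly, which gives MoCo consistent negatives and keeps BYOL's regression target stable.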
Pipeline Capabilities
Configurable Augmentations: Swap in SimCLR’s heavy random crops,
SimSiam’s minimal pipeline, DINO’s multi-crop strategy, or MoCo’s queue-based negatives
via simple YAML toggles.
Scalable Training: PyTorch Lightning handles mixed-precision training,
multi-GPU distribution, checkpointing, and logging to TensorBoard or Weights & Biases.
Flexible Backbones: Use ResNets or Vision Transformers; easily extend
to custom architectures.
Embedding Extraction: Export high-dimensional feature vectors for
downstream classification, retrieval, or clustering tasks.
Interactive Visualization: Generate 2D UMAP projections, nearest-neighbor
galleries, and hexbin maps of both observable and hidden properties.
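To give a sense of the YAML toggles, a hypothetical configuration fragment might look like the following; the key names are illustrative and will differ from the pipeline's actual schema:

```yaml
# Hypothetical config sketch; actual keys depend on the pipeline's schema.
model: simclr            # simclr | dino | simsiam | byol | moco
augmentations:
  random_resized_crop: true
  color_jitter: true
  gaussian_blur: true
  multi_crop:            # DINO-style multi-crop (ignored by other models)
    global_crops: 2
    local_crops: 6
moco:
  queue_size: 65536      # number of negative embeddings kept in the queue
  momentum: 0.999        # EMA coefficient for the momentum encoder
```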
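Embedding extraction amounts to a forward pass through the frozen backbone followed by flattening. The sketch below assumes the backbone ends in global pooling; the function name and toy backbone are illustrative, not the pipeline's API:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def extract_embeddings(backbone: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Run a frozen backbone over a batch of images and return flat feature vectors."""
    backbone.eval()
    feats = backbone(images)   # e.g. (N, C, 1, 1) after global average pooling
    return feats.flatten(1)    # (N, D) embeddings for downstream tasks

# Toy stand-in for a ResNet/ViT backbone.
toy_backbone = nn.Sequential(nn.Conv2d(3, 16, 3), nn.AdaptiveAvgPool2d(1))
emb = extract_embeddings(toy_backbone, torch.randn(2, 3, 32, 32))  # shape (2, 16)
```

The resulting vectors can be fed directly to a linear classifier, a k-NN retrieval index, or a clustering algorithm.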
Examples of Learned Representations
Below we showcase how the contrastive models capture meaningful structure in image data.
Each example includes a UMAP projection and nearest-neighbor retrievals (or hexbin histograms)
to illustrate clustering by morphology or physical properties.
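The nearest-neighbor retrievals can be computed directly from the embeddings with cosine similarity. A minimal sketch, assuming embeddings are stacked row-wise in a NumPy array (the pipeline's own retrieval utilities may differ):

```python
import numpy as np

def top_k_neighbors(embeddings: np.ndarray, query_idx: int, k: int = 10) -> np.ndarray:
    """Return indices of the k embeddings most cosine-similar to the query, excluding itself."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = z @ z[query_idx]        # cosine similarity of every row to the query
    sims[query_idx] = -np.inf      # exclude the query from its own neighbors
    return np.argsort(sims)[::-1][:k]
```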
🪼 Jellyfish Galaxies
Data Source:
Zooniverse – Cosmological Jellyfish
Using galaxy cutouts from this project, we trained a SimCLR model to learn morphology-aware embeddings.
UMAP Projection
In this projection, galaxies with similar tail-like morphology cluster tightly, revealing the model’s ability to distinguish visual features purely from contrastive signals.
Nearest Neighbors Visualization
For each query image, the top-10 nearest neighbors are shown with model-inferred “jellyfish” probability scores. High visual similarity and consistent probability scores confirm robust clustering in the embedding space.
🌌 X-ray Galaxy Clusters (TNG-Cluster)
Data Source:
TNG-Cluster Simulations
We applied DINO to raw X-ray maps across multiple simulation snapshots to uncover morphological groupings.
UMAP Projection
Distinct regions correspond to different cluster morphologies—relaxed, merging, or cool-core systems—demonstrating DINO’s capacity to encode high-level astrophysical features.
Nearest Neighbors Visualization
Query cluster (left) and its top-9 nearest neighbors reveal strong morphological consistency, supporting the embedding’s semantic organization.