- ⸻ 2026-05-12
Re-engineering the PerturBench benchmarking tasks with data lineage
PerturBench (Wu, Wershof, Schmon, Nassar, Osinski, Eksi, Yan, et al., NeurIPS 2025) is a framework for benchmarking machine learning models that predict cellular transcriptional response to perturbations. Its core contributions are benchmarking tasks in the form of curated datasets and metric definitions, which are available from GitHub and Hugging Face, albeit without data lineage. To show exactly how each dataset was produced and to let readers assess model performance in that context, we re-ran all curation workflows with lineage tracking. We walk through model training and evaluation as an example, and show that the re-curated datasets are equivalent to the originally deposited ones.
- ⸻ 2026-04-15
Managing spatial omics datasets with SpatialData & LaminDB
Spatial omics technologies — Xenium, Visium, MERFISH, seqFISH, and others — are generating datasets that combine molecular profiling with spatial coordinates.
The SpatialData framework[1] provides a unified format for these heterogeneous datasets: images, segmentation masks, point clouds, shapes, and count tables, all stored in a single .zarr store.
But as spatial datasets accumulate across experiments and technologies, it becomes a major challenge to manage them, query them, and train models on them.
To address this, we have built native SpatialData support into LaminDB, enabling cross-dataset queries, dataset validation, and lineage tracking.
- ⸻ 2026-03-02
Interactive visualization of multimodal and spatial data with Vitessce
The open-source tool Vitessce and Lamin now work together to manage & visualize multimodal and spatial single-cell data. It’s simple: define a Vitessce config in code, save it as an artifact, and share the interactive visualization along with your datasets on LaminHub.
- ⸻ 2026-02-27
Symbolic memory for biological R&D
What should the shared memory layer for agents and humans look like? Will it live in embeddings or in records? A high-level note.
- ⸻ 2024-04-03
MappedCollection: Weighted random sampling from large collections of scRNA-seq datasets
A few labs and companies now train models on large-scale scRNA-seq count matrices and related data modalities. But unlike for many other data types, there isn’t yet a playbook for datasets that are too large to fit into memory.
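To make the idea of weighted random sampling across a collection concrete, here is a minimal sketch in plain Python. It is not MappedCollection's implementation: the function names (`build_offsets`, `locate`, `balanced_weights`, `sample_indices`) are hypothetical, and the choice of weights (every dataset gets the same total probability, regardless of its size) is just one assumed weighting scheme. A real out-of-memory loader would read each selected row from disk rather than return index pairs.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def build_offsets(sizes):
    """Cumulative end offsets, used to map a global row index to a dataset."""
    return list(accumulate(sizes))


def locate(global_index, offsets):
    """Translate a global row index into (dataset index, local row index)."""
    dataset = bisect_right(offsets, global_index)
    start = offsets[dataset - 1] if dataset > 0 else 0
    return dataset, global_index - start


def balanced_weights(sizes):
    """Per-row weights so that each dataset has the same total probability.

    Assumes every size is a positive integer.
    """
    n_datasets = len(sizes)
    return [1.0 / (n_datasets * n) for n in sizes for _ in range(n)]


def sample_indices(sizes, k, seed=0):
    """Draw k rows with replacement; small datasets are upweighted."""
    rng = random.Random(seed)
    offsets = build_offsets(sizes)
    weights = balanced_weights(sizes)
    picks = rng.choices(range(sum(sizes)), weights=weights, k=k)
    return [locate(i, offsets) for i in picks]


if __name__ == "__main__":
    # Hypothetical collection: one large and one small dataset.
    draws = sample_indices([1000, 10], k=2000)
    share_small = sum(1 for dataset, _ in draws if dataset == 1) / len(draws)
    print(f"share of draws from the small dataset: {share_small:.2f}")
```

Only the index arithmetic and the weight vector need to live in memory here, which is why this pattern is a natural fit for collections whose rows stay on disk.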
- ⸻ 2022-08-29
nbproject: Manage Jupyter notebooks
nbproject is an open-source Python tool to help manage Jupyter notebooks with metadata, dependency, and integrity tracking. A draft-to-publish workflow creates more reproducible notebooks with context.
- ⸻ 2022-08-27
readfcs: Read FCS files
readfcs is a lightweight open-source Python package that loads data and metadata from Flow Cytometry Standard (FCS) files into DataFrame and AnnData objects, allowing users to flexibly use downstream analytical tools.
- ⸻ 2022-07-31
Key problems of data-heavy R&D
The complexity of modern R&D data often prevents the scientific progress it promises from being realized.
- ⸻ 2022-05-04
Hello world!
We just launched lamin.ai as a place for sharing prototypes with our beta customers and collaborators. Over time, we’ll add public releases and use this blog to explain our work.