⸻ 2026-06-09

Simpler queries for the 2.5B transcriptional profiles of the Arc Virtual Cell Atlas

Sunny Sun, Sergei Rybakov, Frederic Enard, Chaichontat Sriworarat, Alex Wolf

With 2.5B expression profiles that map to about 600M cells, the Arc Virtual Cell Atlas offers the world’s largest collection of uniformly processed scRNA-seq datasets. Arc Institute distributes the atlas as 460k Parquet and h5ad files totaling 41 TB on Google Cloud Storage. We present a database mirror that offers entity-based queries, a graphical user interface, and zero-copy, lineage-aware sharing of datasets.

⸻ 2026-03-02

Interactive visualization of multimodal and spatial data with Vitessce

Mark Keller, Altana Namsaraeva, Alex Wolf, Chaichontat Sriworarat, Sunny Sun

The open-source tool Vitessce and Lamin now work together to manage and visualize multimodal and spatial single-cell data. It’s simple: define a Vitessce config in code, save it as an artifact, and share the interactive visualization along with your datasets on LaminHub.

⸻ 2024-04-03

MappedCollection: Weighted random sampling from large collections of scRNA-seq datasets

Sergei Rybakov, Felix Fischer, Maciek Wiatrak, Ilan Gold, Yanay Rosen, Sunny Sun, Chaichontat Sriworarat, Fabian Theis, Jeremie Kalfon, Alex Wolf

A few labs and companies now train models on large-scale scRNA-seq count matrices and related data modalities. Unlike for many other data types, however, there isn’t yet a playbook for working with data that doesn’t fit into memory.
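
To make the idea in the title concrete, here is a minimal sketch of weighted random sampling over a virtual concatenation of several datasets. It is not the MappedCollection API: the helper names `build_offsets`, `locate`, and `sample_indices` are hypothetical, and the inverse-frequency weighting is just one assumed choice of weights. The sketch uses only the Python standard library and keeps nothing but dataset sizes and per-row labels in memory.

```python
import bisect
import itertools
import random


def build_offsets(sizes):
    """Cumulative end offsets of each dataset, e.g. sizes [3, 2] -> [3, 5]."""
    return list(itertools.accumulate(sizes))


def locate(global_index, offsets):
    """Map an index into the virtual concatenation to (dataset, local row)."""
    dataset = bisect.bisect_right(offsets, global_index)
    start = offsets[dataset - 1] if dataset > 0 else 0
    return dataset, global_index - start


def sample_indices(labels, k, seed=0):
    """Draw k global indices with replacement.

    Each row is weighted by the inverse frequency of its label, so every
    label is drawn about equally often regardless of how many rows carry it.
    """
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    weights = [1.0 / counts[label] for label in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Hypothetical example: two datasets with 3 and 2 rows and one label per row.
sizes = [3, 2]
labels = ["T cell", "T cell", "T cell", "B cell", "B cell"]
offsets = build_offsets(sizes)
batch = [locate(i, offsets) for i in sample_indices(labels, k=4)]
```

Each entry of `batch` is a `(dataset, row)` pair, so a loader would only need to read the selected rows from the corresponding file on disk rather than loading whole datasets.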