Contributed by SpiralDB, Vortex is an extensible, next-generation columnar storage format designed for building high-performance, future-proof data systems
SAN FRANCISCO, August 6, 2025 — The LF AI & Data Foundation, the premier organization supporting open source innovation in artificial intelligence and data under the Linux Foundation, today announced the launch of the Vortex Project: an open, extensible columnar format that bridges the gap between cloud storage and heterogeneous compute, handling data seamlessly across memory, disk (file format), and network (IPC format) while maintaining compression throughout.
Contributed by SpiralDB as a new Incubation-stage project, Vortex joins LF AI & Data with contributions and support from Microsoft, Snowflake, Palantir, NVIDIA, and other industry leaders, signaling broad industry alignment around the need for next-generation storage infrastructure.
Vortex is purpose-built as the foundational storage format for modern data systems backed by object storage and is based on the latest compression research. Recent public validation includes the Technical University of Munich’s (TUM) database group calling Vortex the "cutting edge," and Microsoft demonstrating 30% runtime reductions when running traditional Spark workloads with Vortex in Apache Iceberg™. Unlike Apache Parquet™ and other formats that were built only for structured analytics performed on CPUs, Vortex is optimized to also support multimodal data, wide schemas, GPU-based training workloads, and high-performance reads from cloud object stores such as S3 and GCS.
“Storage and compute have always been fungible, but data processing is no longer only about moving data from a disk into the CPU. Modern GPUs can consume terabits per second, but legacy storage formats are a huge bottleneck – they effectively require CPUs to sit in the middle, decompressing data before passing it on. We created Vortex to support this next generation of workloads, while dramatically improving performance for traditional data systems at the same time,” said Will Manning, co-founder and CEO at SpiralDB. “By contributing Vortex to LF AI & Data, we’re excited to foster a broader community. What excites me most is that Vortex gives the entire community a platform to innovate on storage – researchers can contribute new compression techniques, companies can optimize it for their workloads, and everyone can benefit from shared advances.”
Designed for speed, simplicity and composability, Vortex provides:
“Vortex tackles one of the most overlooked performance problems in AI infrastructure: how slow and cumbersome it is to access training data from the cloud,” said Mark Collier, general manager of AI & Infrastructure at the Linux Foundation. “This project represents a huge step forward for scalable, AI-native data pipelines – and we’re thrilled to welcome it into the LF AI & Data community.”
Vortex has been initiated with contributions from leading researchers and engineers across academia and industry, and welcomes broad participation from the global open source community.
To learn more or get involved, visit https://vortex.dev.
Additional Industry Support:
“Vortex combines cutting edge research in file formats with industrial-scale software engineering experience—just what we need for the composable data systems era.” – Wes McKinney, co-creator of Apache Arrow and creator of Python pandas, Vortex TSC member
“AI is driving the need for a modern data infrastructure, and Vortex is a critical step forward in addressing the storage bottlenecks that have traditionally held back innovation. We are happy to support this project as we continue to help customers get more value out of their data with an open, extensible format that can handle next-generation workloads. Vortex is advancing open source data processing and ensuring seamless data integration for the future of AI.” – Christian Kleinerman, EVP of Product, Snowflake
“Vortex brings together practical extensibility and cutting-edge research in encoding, compression, and indexing. Designed for the demands of modern analytical and AI workloads, Vortex has already shown impressive performance gains and cost savings in our benchmarks, especially handling wide tables, vector embeddings, and large objects. We are excited about its potential and pleased to see Vortex join the open source ecosystem.” – Raghu Ramakrishnan, CTO for Data, Microsoft, & Carlo Curino, Director of Research, Microsoft GSL
“Activating AI in mission critical environments requires going beyond traditional data formats and compute modalities. We are excited to support Vortex within Palantir’s Multimodal Data Plane, and we are looking forward to partnering with the community to advance next-generation, interoperable data systems.” – Akshay Krishnaswamy, Chief Architect, Palantir
“Vortex is a modern file format for modern data and AI systems. Joining the Linux Foundation is a crucial step toward prudent governance of the project, ensuring long-term and widespread adoption on the streets.” – Andy Pavlo, Associate Professor of Databaseology at Carnegie Mellon University, Vortex TSC member