TL;DR:
- TimescaleDB, known for its time-series database expertise, is entering the vector database market.
- Vector databases are essential for GenAI applications, serving as the memory for large language models.
- TimescaleDB combines open-source vector technology with an Approximate Nearest Neighbor algorithm for superior performance.
- The company claims significantly faster search speeds compared to competitors.
- TimescaleDB’s Postgres compatibility provides a streamlined solution for organizations already using Postgres.
- The move diversifies TimescaleDB’s offerings and strengthens its position in the database market.
Main AI News:
TimescaleDB, best known for its open-source time-series database, is diversifying. The New York City-based company, which built its reputation by extending Postgres with time-series capabilities, is now moving into the fast-growing vector database market. The move is driven by surging interest in generative AI applications powered by large language models.
Vector databases, often described as the long-term memory for Large Language Models (LLMs), have gained substantial relevance in the age of models like OpenAI’s GPT-4 and Meta’s Llama. They store vector embeddings, which are numerical representations of pieces of text or other data. At runtime, a vector database quickly matches a user’s input against the most relevant stored pieces, which can then be supplied to the LLM as context, improving the performance and accuracy of GenAI applications.
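To make the matching step concrete, here is a minimal sketch of similarity search over embeddings. It assumes cosine similarity as the metric and uses hypothetical three-dimensional vectors and document titles; it is an exact brute-force scan, not the approximate index Timescale describes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, store, k=2):
    """Exact (brute-force) search: score every stored vector, keep the best k."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in store.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]

# Hypothetical embeddings; real models emit hundreds or thousands of dimensions.
STORE = {
    "How to reset a password": [0.9, 0.1, 0.0],
    "Quarterly revenue report": [0.0, 0.2, 0.9],
    "Recovering a locked account": [0.8, 0.3, 0.1],
}

# The query vector points in roughly the same direction as the two account-related entries.
results = top_k([0.85, 0.2, 0.05], STORE, k=2)
```

A brute-force scan like this compares the query with every stored vector, which is why production systems typically turn to approximate indexes once the collection grows large.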
TimescaleDB’s move into vector territory builds on pgvector, the open-source vector extension for Postgres. On top of it, the company has added its own Approximate Nearest Neighbor (ANN) index, which it claims outperforms both vanilla pgvector and dedicated vector databases.
Michael Freedman, the CTO and co-founder of Timescale, stated, “We’ve built additional support for these types of vector lookups that could enable people to build LLM models on top of it to answer questions in a way that is much more performant, faster, and has better accuracy than other stuff that’s in the market.”
In a recent blog post, the company published internal benchmark figures for its ANN index. It claimed search speeds 243% faster than Weaviate’s vector database at 99% recall. TimescaleDB also reported search speeds 39% faster than pgvector’s Hierarchical Navigable Small World (HNSW) index and 363% faster than pg_embedding.
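Two quantities in these figures are easy to misread, so a small worked sketch may help. It assumes “N% faster” means throughput increased by N% over the baseline (the article does not define it), and it uses hypothetical result IDs to illustrate how recall is usually computed for an approximate index.

```python
def speedup_factor(percent_faster):
    """Convert 'N% faster' into a throughput multiplier (100% faster -> 2.0x)."""
    return 1 + percent_faster / 100

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true nearest neighbors that the approximate search returned."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

# Percentages reported in the article, keyed by the system compared against.
reported = {"Weaviate": 243, "pgvector HNSW": 39, "pg_embedding": 363}
multipliers = {name: speedup_factor(pct) for name, pct in reported.items()}

# Hypothetical result IDs: the approximate index missed one of four true neighbors.
recall = recall_at_k(approx_ids=[1, 2, 3, 9], exact_ids=[1, 2, 3, 4])
```

Under that reading, 243% faster corresponds to roughly 3.4 times the baseline throughput, and a 99% recall rate means the approximate index returns 99 of every 100 true nearest neighbors on average.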
Timescale Vector, as the product is called, optimizes time-based vector searches by taking advantage of Timescale’s automatic time-based partitioning and indexing. This makes it efficient to retrieve recent embeddings, to constrain vector searches by time range or document age, and to store and retrieve LLM responses and chat histories.
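The idea of constraining a vector search by document age can be sketched in a few lines. This is an in-memory illustration only, not Timescale’s implementation: the row layout, the `search_recent` helper, and the sample data are all hypothetical, and a database would apply the time filter through partitioning rather than a list scan.

```python
from datetime import datetime, timedelta

def squared_distance(a, b):
    """Squared Euclidean distance; smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_recent(query, rows, now, max_age, k=2):
    """Keep rows newer than `now - max_age`, then rank the survivors by distance."""
    cutoff = now - max_age
    recent = [row for row in rows if row["created"] >= cutoff]
    recent.sort(key=lambda row: squared_distance(query, row["embedding"]))
    return [row["text"] for row in recent[:k]]

NOW = datetime(2023, 10, 1)
ROWS = [
    {"text": "old but close", "created": NOW - timedelta(days=90), "embedding": [1.0, 0.0]},
    {"text": "recent and close", "created": NOW - timedelta(days=2), "embedding": [0.9, 0.1]},
    {"text": "recent but far", "created": NOW - timedelta(days=1), "embedding": [0.0, 1.0]},
]

# Only rows from the last seven days are eligible, however close older rows may be.
hits = search_recent([1.0, 0.0], ROWS, now=NOW, max_age=timedelta(days=7))
```

Filtering by time first shrinks the set of vectors that must be compared, which is the intuition behind pairing time-based partitioning with vector search.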
In an interview with Datanami, Freedman highlighted Pinecone, a dedicated vector database developer, as a new competitor. He pointed out the challenge with dedicated vector databases, emphasizing that they solely store vector embeddings. In contrast, TimescaleDB’s approach allows users to consolidate their relational data seamlessly, offering a more operationally straightforward stack.
While TimescaleDB initially gained prominence as a time-series database, it has since evolved into a comprehensive database provider. Beyond catering to time-series and event data needs for IoT and gaming applications, Timescale can now accommodate any relational data, courtesy of its Postgres core. Freedman described their approach as “Postgres ++,” emphasizing their compatibility with Postgres, the world’s most popular database.
This compatibility positions Timescale as an attractive choice for organizations already utilizing Postgres, a substantial market segment. The open-source offering has amassed tens of millions of users, and Timescale’s managed database service in the cloud boasts approximately 1,000 paying customers.
Freedman shared insights, saying, “They’re like, ‘Oh, I already use Postgres. I should just be using you for all of [my workloads].’ As long as they want a relational database like Postgres, we can become a great go-to for Postgres.”
Timescale introduced its vector support to cloud customers a few months ago and is now officially launching the preview program. Early adopters, including PolyPerception and Blueway Software, have already embraced Timescale Vector’s integrated approach, recognizing its potential in expediting AI product development and seamlessly combining PostgreSQL’s classic database features with vector embeddings storage for Retrieval Augmented Generation (RAG).
PolyPerception CEO Nicolas Bream enthused, “Choosing TimescaleDB was one of the best technical decisions we made, and we are excited to use Timescale Vector.”
Conclusion:
TimescaleDB’s expansion into vector databases demonstrates its adaptability and commitment to addressing the evolving needs of the AI market. By seamlessly integrating vector capabilities with its existing Postgres core, the company offers a compelling solution for organizations seeking efficient and high-performance options for GenAI applications. This strategic move positions TimescaleDB as a formidable player in the broader database market, catering to a wide range of data storage and retrieval needs.