From b2f6b2dfd8df6017f393f473c2602df58369ac84 Mon Sep 17 00:00:00 2001
From: ryjiang
Date: Mon, 20 Sep 2021 19:49:52 +0800
Subject: [PATCH] [skip ci]format system overview markdown (#8259)

Signed-off-by: ruiyi.jiang
---
 .../chap01_system_overview.md | 21 ++-----------------
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/docs/developer_guides/chap01_system_overview.md b/docs/developer_guides/chap01_system_overview.md
index bbc18eb436..7c42c9f2c4 100644
--- a/docs/developer_guides/chap01_system_overview.md
+++ b/docs/developer_guides/chap01_system_overview.md
@@ -1,18 +1,14 @@
-
-
 ## 1. System Overview
 
 In this section, we sketch the system design of Milvus, including the data model, data organization, architecture, and state synchronization.
 
-
-
 #### 1.1 Data Model
 
 Milvus exposes the following set of data features to applications:
 
-* a data model based on schematized relational tables, in that rows must have primary keys,
+- a data model based on schematized relational tables, in that rows must have primary keys,
 
-* a query language specifies data definition, data manipulation, and data query, where data definition includes create, drop, and data manipulation includes insert, upsert, delete, and data query falls into three types, primary key search, approximate nearest neighbor search (ANNS), ANNS with predicates.
+- a query language specifies data definition, data manipulation, and data query, where data definition includes create, drop, and data manipulation includes insert, upsert, delete, and data query falls into three types, primary key search, approximate nearest neighbor search (ANNS), ANNS with predicates.
 
 The requests' execution order is strictly in accordance with their issue-time order. We take proxy's issue time as a request's issue time. For a batch request, all its sub-requests share the same issue time. In cases there are multiple proxies, issue time from different proxies are regarded as coming from a central clock.
 
@@ -20,12 +16,8 @@ Transaction is currently not supported by Milvus.
 
 A batch insert/delete is guaranteed to become visible atomically.
 
-
-
 #### 1.2 Data Organization
 
-
-
 In Milvus, 'collection' refers to the concept of table. A collection can be optionally divided into several 'partitions'. Both collection and partition are the basic execution scopes of queries. When using partition, users should know how a collection should be partitioned. In most cases, partition leads to more flexible data management and more efficient querying. For a partitioned collection, queries can be executed both on the collection or a set of specified partitions.
@@ -34,12 +26,8 @@ Each collection or partition contains a set of 'segment groups'. Segment group i
 
 'Segment' is the finest unit of data organization. It is where the data and indexes are actually kept. Each segment contains a set of rows. In order to reduce the memory footprint during query execution and to fully utilize SIMD, the physical data layout within segments is organized in a column-based manner.
 
-
-
 #### 1.3 Architecture Overview
 
-
-
 The main components, proxy, WAL, query node, and write node can scale to multiple instances. These components scale separately for better tradeoff between availability and cost.
@@ -54,8 +42,6 @@ The write nodes are stateless. They simply transform the newly arrived WALs to b
 
 Note that not all the components are necessarily replicated. The system provides failure tolerance by maintaining multiple copies of WAL and binlog. When there is no in-memory index replica and there occurs a query node failure, other query nodes will take over its indexes by loading the dumped index files, or rebuilding them from binlog and WALs. The links from query nodes to the hash ring will also be adjusted such that the failure node's input WAL stream can be properly handled by its neighbors.
 
-
-
 #### 1.4 State Synchronization
 
@@ -66,9 +52,6 @@ Each of the WAL is attached with a timestamp, which is the time when the log is
 
 For better throughput, Milvus allows asynchronous state synchronization between WAL and index/binlog/table. Whenever the data is not fresh enough to satisfy a query, the query will be suspended until the data is up-to-date, or timeout will be returned.
 
-
-
 #### 1.5 Stream and Time
 
 In order to boost throughput, we model Milvus as a stream-driven system.
-