From d9cd783650378ae2d483555682ac032a3c16daf2 Mon Sep 17 00:00:00 2001 From: ryjiang Date: Fri, 22 Oct 2021 14:06:01 +0800 Subject: [PATCH] [skip ci]Format markdown for milvus_create_index_en.md (#10409) Signed-off-by: ruiyi.jiang --- docs/design_docs/milvus_create_index_en.md | 61 +++++++++++++--------- 1 file changed, 37 insertions(+), 24 deletions(-) diff --git a/docs/design_docs/milvus_create_index_en.md b/docs/design_docs/milvus_create_index_en.md index 757036e1b6..719a224546 100644 --- a/docs/design_docs/milvus_create_index_en.md +++ b/docs/design_docs/milvus_create_index_en.md @@ -1,4 +1,5 @@ # Create Index + `Index system` is the core part of `Milvus`, which is used to speed up the searches, this document indroduces which components are involved in `Create Index`,and what these components do. The execution flow of `Create Index` is shown in the following figure: @@ -6,6 +7,7 @@ The execution flow of `Create Index` is shown in the following figure: ![create_index](./graphs/milvus_create_index.png) 1. Firstly, `SDK` starts a `CreateIndex` request to `Proxy` via `Grpc`, the `proto` is defined as follows: + ```proto service MilvusService { ... @@ -25,7 +27,7 @@ message CreateIndexRequest { ``` 2. When received the `CreateIndex` request, the `Proxy` would wrap this request into `CreateIndexTask`, and pushs this task into `DdTaskQueue` queue. After that, `Proxy` would call method of `WatiToFinish` to wait until the task finished. - + ```go type task interface { TraceCtx() context.Context @@ -54,20 +56,24 @@ type createIndexTask struct { ``` 3. There is a background service in `Proxy`, this service would get the `CreateIndexTask` from `DdTaskQueue`, and executes it in three phases. - - `PreExecute`, do some static checking at this phase, such as check if the index param is legal, etc. - - `Execute`, at this phase, `Proxy` would send `CreateIndex` request to `RootCoord` via `Grpc`,and wait the reponse, the `proto` is defined as follow: - ```proto - service RootCoord { - ... - rpc CreateIndex(milvus.CreateIndexRequest) returns (common.Status) {} + - `PreExecute`, do some static checking at this phase, such as check if the index param is legal, etc. + - `Execute`, at this phase, `Proxy` would send `CreateIndex` request to `RootCoord` via `Grpc`,and wait the reponse, the `proto` is defined as follow: - ... - } - ``` - - `PostExecute`, `CreateIndexTask` does nothing at this phase, and return directly. + ```proto + service RootCoord { + ... + + rpc CreateIndex(milvus.CreateIndexRequest) returns (common.Status) {} + + ... + } + ``` + + - `PostExecute`, `CreateIndexTask` does nothing at this phase, and return directly. 4. `RootCoord` would wraps the `CreateIndex` request into `CreateIndexReqTask`, and then call function `executeTask`. `executeTask` would return until the `context` is done or `CreateIndexReqTask.Execute` returned. + ```go type reqTask interface { Ctx() context.Context @@ -82,15 +88,16 @@ type CreateIndexReqTask struct { } ``` -5. According to the index type and index parameters, `RootCoord` lists all the `Segments` that need to be indexed on this `Collection`. `RootCoord` would only check those `Segments` which have been flushed at this stage. We will describe how to deal with those newly add segments and growing segments later. +5. According to the index type and index parameters, `RootCoord` lists all the `Segments` that need to be indexed on this `Collection`. `RootCoord` would only check those `Segments` which have been flushed at this stage. We will describe how to deal with those newly add segments and growing segments later. 6. For each `Segment`, `RootCoord` would start a `Grpc` request to `DataCoord` to get `Binlog` paths of that `Segment`, the `proto` is defined as following + ```proto service DataCoord { ... - + rpc GetInsertBinlogPaths(GetInsertBinlogPathsRequest) returns (GetInsertBinlogPathsResponse) {} - + ... } @@ -107,7 +114,9 @@ message GetInsertBinlogPathsResponse { } ``` -7. After getting the `Segment`'s `Binlog` paths, `RootCoord` would send a `Grpc` request to `IndexCoord`, ask `IndexCoord` to build index on this `Segment`, the `proto` is defined as follow: + +7. After getting the `Segment`'s `Binlog` paths, `RootCoord` would send a `Grpc` request to `IndexCoord`, ask `IndexCoord` to build index on this `Segment`, the `proto` is defined as follow: + ```proto service IndexCoord { ... @@ -137,13 +146,14 @@ message BuildIndexResponse { ![index_coord](./graphs/milvus_create_index_index_coord.png) -9. `IndexCoord` would wrap the `BuildIndex` request into `IndexAddTask`, then alloc a global unique ID as `IndexBuildID`, and write this `Segment`'s `index mate` into `IndexCoord`'s `metaTable`. When finish these operation, `IndexCoord` would send response to `RootCoord`, the response includes the `IndexBuildID`. +9. `IndexCoord` would wrap the `BuildIndex` request into `IndexAddTask`, then alloc a global unique ID as `IndexBuildID`, and write this `Segment`'s `index mate` into `IndexCoord`'s `metaTable`. When finish these operation, `IndexCoord` would send response to `RootCoord`, the response includes the `IndexBuildID`. -10. When `RootCoood` receives the `BuildIndexResponse`, it would extract the `IndexBuildID` from the response, update `RootCoord`'s `metaTable`, then send responses to `Proxy`. +10. When `RootCoood` receives the `BuildIndexResponse`, it would extract the `IndexBuildID` from the response, update `RootCoord`'s `metaTable`, then send responses to `Proxy`. -11. There is a backgroud service, `assignTaskLoop`, in `IndexCoord`. `assignTaskLoop` would call `GetUnassignedTask` periodically, the default interval is 3s. `GetUnassignedTask` would list these segments whos `index meta` has been updated, but index has not been created yet. +11. There is a backgroud service, `assignTaskLoop`, in `IndexCoord`. `assignTaskLoop` would call `GetUnassignedTask` periodically, the default interval is 3s. `GetUnassignedTask` would list these segments whos `index meta` has been updated, but index has not been created yet. + +12. The previous step has listed the segments whose index has not been created, for each those segments, `IndexCoord` would call `PeekClient` to get an available `IndexNode`, and send `CreateIndex` request to this `IndexNode`. The `proto` is defined as follow. -12. The previous step has listed the segments whose index has not been created, for each those segments, `IndexCoord` would call `PeekClient` to get an available `IndexNode`, and send `CreateIndex` request to this `IndexNode`. The `proto` is defined as follow. ```proto service IndexNode { ... @@ -165,17 +175,18 @@ message CreateIndexRequest { } ``` -13. When receive `CreateIndex` request, `IndexNode` would wrap this request into `IndexBuildTask`, and push this task into `IndexBuildQueue`, then send response to `IndexCoord` +13. When receive `CreateIndex` request, `IndexNode` would wrap this request into `IndexBuildTask`, and push this task into `IndexBuildQueue`, then send response to `IndexCoord` 14. There is a background service, `indexBuildLoop`, in the `IndexNode`. `indexBuildLoop` would call `scheduleIndexBuildTask` to get a `IndexBuildTask` from `IndexBuildQueue`, and then start another `goroutine` to build index and update meta. -*Node*: `InexNode` will not notify the `QueryCoord` to load the index file, if an user wants to speed up seach by these index files, he should call `ReleaseCollection` firstly, then call `LoadCollection` to load these index files. +_Node_: `InexNode` will not notify the `QueryCoord` to load the index file, if an user wants to speed up seach by these index files, he should call `ReleaseCollection` firstly, then call `LoadCollection` to load these index files. -15. As mentioned earlier, `RootCoord` would only search on these flushed segments on `CreateIndex` request, the following figure show how to deal with the newly add segments. +15. As mentioned earlier, `RootCoord` would only search on these flushed segments on `CreateIndex` request, the following figure show how to deal with the newly add segments. ![data_coord_flushed](./graphs/milvus_create_index_data_coord_flushed.png) -16. When a segment has been flushed, `DataCoord` would notify `RootCoord` via `SegmentFlushCompleted`, the `proto` is defined as follow: +16. When a segment has been flushed, `DataCoord` would notify `RootCoord` via `SegmentFlushCompleted`, the `proto` is defined as follow: + ```proto service RootCoord { ... @@ -205,7 +216,8 @@ message SegmentInfo { } ``` -17. If an user has called `CreateIndex` on this `Collection`, then when `RootCoord` receives `SegmentFlushCompleted` request, it would extract the `SegmentID` from the request, and send a `GetInsertBinlogPaths` request to `DataCoord` to get the `Binlog` paths, finally `RootCoord` would send a `BuildIndex` request to `IndexCoord` to notify `IndexCoord` to build index on this segment. + +17. If an user has called `CreateIndex` on this `Collection`, then when `RootCoord` receives `SegmentFlushCompleted` request, it would extract the `SegmentID` from the request, and send a `GetInsertBinlogPaths` request to `DataCoord` to get the `Binlog` paths, finally `RootCoord` would send a `BuildIndex` request to `IndexCoord` to notify `IndexCoord` to build index on this segment. 18. The `Grpc` call of `SegmentFlushCompleted` might failed due to network problem or some others, so how to create index if the `Grpc` failed ? The follwing figure show the solution. @@ -214,6 +226,7 @@ message SegmentInfo { 19. There is a backgroud service, `checkFlushedSegmentLoop`, in `RootCoord`. `checkFlushedSegmentLoop` would periodically check whether there is a segment that needs to be created index but has not been created, the default interval is `10 minutes`, and call `DataCoord` and `IndexCoord`'s service to create index on these segments. 20. In `Milvus 2.0`, `Create Index` is an asynchronous operation, so the `SDK` needs to send `GetIndexStates` request to `IndexCoord` periodically to check if the index has been created, the `proto` is defined as follow. + ```proto service IndexCoord { ...