Benchmark Milvus with https://github.com/qdrant/vector-db-benchmark, using the dataset 'deep-image-96-angular'. Profiling with perf during the 'upload + index' stage of vector-db-benchmark shows the following hot spots:

    39.59%--github.com/milvus-io/milvus/internal/storage.MergeInsertData
            |
            |--21.43%--github.com/milvus-io/milvus/internal/storage.MergeFieldData
            |          |
            |          |--17.22%--runtime.memmove
            |          |
            |          |--1.53%--asm_exc_page_fault
            |          ......
            |
            |--18.16%--runtime.memmove
            |
            |--1.66%--asm_exc_page_fault
            ......

The hot code path is storage.MergeInsertData(), which updates buffer.buffer by creating a new 'InsertData' instance and merging both the old buffer.buffer and addedBuffer into it. The hot spots appear when it calls the Go runtime.memmove to move buffer.buffer, which is large (>1M).

To avoid this overhead, update storage.MergeInsertData() to append addedBuffer to buffer.buffer instead of moving both buffer.buffer and addedBuffer into a new 'InsertData'.

This change removes the 'runtime.memmove' hot spots from the perf profiling output. In addition, the 'upload + index' time, which is one of the performance metrics of vector-db-benchmark, drops by around 60%.

Signed-off-by: Cathy Zhang <cathy.zhang@intel.com>
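The difference between the two strategies described above can be illustrated with a minimal Go sketch. It is not the actual storage.MergeInsertData() code: the fieldData type and the mergeIntoNew and appendInPlace functions are hypothetical names chosen for illustration, and a single float32 slice stands in for the real field buffers.

```go
package main

import "fmt"

// fieldData is a hypothetical stand-in for one column of an insert buffer.
type fieldData struct {
	rows []float32
}

// mergeIntoNew mimics the old strategy: allocate a fresh buffer and copy
// both the existing data and the added data into it on every merge.
func mergeIntoNew(buf, added *fieldData) *fieldData {
	out := &fieldData{rows: make([]float32, 0, len(buf.rows)+len(added.rows))}
	out.rows = append(out.rows, buf.rows...)   // copies the whole (large) buffer
	out.rows = append(out.rows, added.rows...) // copies the (small) addition
	return out
}

// appendInPlace mimics the new strategy: grow the existing buffer.
// The existing data is copied only when the slice capacity is exceeded.
func appendInPlace(buf, added *fieldData) {
	buf.rows = append(buf.rows, added.rows...)
}

func main() {
	buf := &fieldData{rows: []float32{1, 2, 3}}
	added := &fieldData{rows: []float32{4, 5}}

	merged := mergeIntoNew(buf, added)
	appendInPlace(buf, added)

	fmt.Println(len(merged.rows), len(buf.rows))
}
```

Both paths end with the same contents. The in-place version still reallocates occasionally when capacity runs out, but Go's slice growth amortizes that cost, whereas the merge-into-new version copies the full buffer on every call.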
Data Node
DataNode is the component that writes insert and delete messages into persistent blob storage, for example MinIO or S3.
Dependency
- KV store: persists messages into blob storage.
- Message stream: receives messages and publishes information.
- Root Coordinator: provides the latest unique IDs.
- Data Coordinator: provides the flush information and tells the DataNode which message stream to subscribe to.
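As a rough illustration of how these dependencies fit together, the sketch below wires a toy node to an in-memory KV store and a sequential ID allocator. Every name in it (kvStore, idAllocator, memKV, seqIDs, dataNode, persist, and the channel name) is hypothetical and invented for this example; none of them are Milvus interfaces, and the message stream and Data Coordinator are reduced to a fixed channel string.

```go
package main

import "fmt"

// kvStore is a hypothetical abstraction over blob storage such as MinIO or S3.
type kvStore interface {
	Save(key string, value []byte) error
}

// idAllocator is a hypothetical stand-in for obtaining unique IDs
// from the Root Coordinator.
type idAllocator interface {
	AllocID() (int64, error)
}

// memKV is an in-memory kvStore used only for this sketch.
type memKV map[string][]byte

func (m memKV) Save(key string, value []byte) error {
	m[key] = value
	return nil
}

// seqIDs hands out increasing IDs starting at 1.
type seqIDs struct{ next int64 }

func (s *seqIDs) AllocID() (int64, error) {
	s.next++
	return s.next, nil
}

// dataNode holds the dependencies; channel stands in for the message
// stream that the Data Coordinator would assign.
type dataNode struct {
	kv      kvStore
	ids     idAllocator
	channel string
}

// persist allocates a unique ID and writes the payload under channel/ID.
func (d *dataNode) persist(payload []byte) (string, error) {
	id, err := d.ids.AllocID()
	if err != nil {
		return "", err
	}
	key := fmt.Sprintf("%s/%d", d.channel, id)
	return key, d.kv.Save(key, payload)
}

func main() {
	store := memKV{}
	node := &dataNode{kv: store, ids: &seqIDs{}, channel: "insert-channel"}
	key, err := node.persist([]byte("row"))
	fmt.Println(key, err, len(store))
}
```

Keeping each dependency behind a small interface makes it possible to swap the in-memory fakes for real clients without touching the write path.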