# ShardClient Package

The `shardclient` package provides client-side connection management and load balancing for communicating with QueryNode shards in the Milvus distributed architecture. It manages QueryNode client connections, caches shard leader information, and implements intelligent request routing strategies.

## Overview

In Milvus, collections are divided into shards (channels), and each shard has multiple replicas distributed across different QueryNodes for high availability and load balancing. The `shardclient` package is responsible for:

1. **Connection Management**: Maintaining a pool of gRPC connections to QueryNodes with automatic lifecycle management
2. **Shard Leader Cache**: Caching the mapping of shards to their leader QueryNodes to reduce coordination overhead
3. **Load Balancing**: Distributing requests across available QueryNode replicas using configurable policies
4. **Fault Tolerance**: Automatic retry and failover when QueryNodes become unavailable

## Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                           Proxy Layer                            │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                       ShardClientMgr                       │  │
│  │  • Shard leader cache (database → collection → shards)     │  │
│  │  • QueryNode client pool management                        │  │
│  │  • Client lifecycle (init, purge, close)                   │  │
│  └─────────────────────────────┬──────────────────────────────┘  │
│                                │                                 │
│  ┌─────────────────────────────▼──────────────────────────────┐  │
│  │                          LBPolicy                          │  │
│  │  • Execute workload on collection/channels                 │  │
│  │  • Retry logic with replica failover                       │  │
│  │  • Node selection via balancer                             │  │
│  └─────────────────────────────┬──────────────────────────────┘  │
│                                │                                 │
│                ┌───────────────┴───────────────┐                 │
│                │                               │                 │
│         ┌──────▼────────┐            ┌─────────▼──────────┐      │
│         │  RoundRobin   │            │ LookAsideBalancer  │      │
│         │   Balancer    │            │  • Cost-based      │      │
│         │               │            │  • Health check    │      │
│         └───────────────┘            └────────────────────┘      │
│                                │                                 │
│  ┌─────────────────────────────▼──────────────────────────────┐  │
│  │                 shardClient (per QueryNode)                │  │
│  │  • Connection pool (configurable size)                     │  │
│  │  • Round-robin client selection                            │  │
│  │  • Lazy initialization and expiration                      │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────┬───────────────────────────────────────────┘
                       │ gRPC
        ┌──────────────┴───────────────┐
        │                              │
  ┌─────▼─────┐                 ┌──────▼──────┐
  │ QueryNode │                 │  QueryNode  │
  │    (1)    │                 │     (2)     │
  └───────────┘                 └─────────────┘
```
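The shard leader cache at the top of the diagram is a nested map keyed by database and collection (its operations are described under "Cache Management" below). The following is a rough sketch of its shape; the type and field names are illustrative stand-ins, not the package's actual definitions, which live in `manager.go` and use the internal `nodeInfo` type:

```go
// Illustrative only: approximate shape of the shard leader cache
// maintained by ShardClientMgr. The field set is an assumption.
type nodeInfoSketch struct {
	nodeID  int64  // QueryNode server ID
	address string // QueryNode gRPC address
}

type shardLeadersSketch struct {
	collectionID int64
	// channel (shard) name -> replica QueryNodes serving that channel
	shardLeaders map[string][]nodeInfoSketch
}

// database -> collection name -> cached shard leaders
type leaderCacheSketch map[string]map[string]shardLeadersSketch
```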
## Core Components

### 1. ShardClientMgr

The central manager for QueryNode client connections and shard leader information.

**File**: `manager.go`

**Key Responsibilities**:

- Cache shard leader mappings from QueryCoord (`database → collectionName → channel → []nodeInfo`)
- Manage `shardClient` instances for each QueryNode
- Automatically purge expired clients (default: 60 minutes of inactivity)
- Invalidate the cache when shard leaders change

**Interface**:

```go
type ShardClientMgr interface {
	GetShard(ctx context.Context, withCache bool, database, collectionName string, collectionID int64, channel string) ([]nodeInfo, error)
	GetShardLeaderList(ctx context.Context, database, collectionName string, collectionID int64, withCache bool) ([]string, error)
	DeprecateShardCache(database, collectionName string)
	InvalidateShardLeaderCache(collections []int64)
	GetClient(ctx context.Context, nodeInfo nodeInfo) (types.QueryNodeClient, error)
	Start()
	Close()
}
```

**Configuration**:

- `purgeInterval`: Interval for checking expired clients (default: 600s)
- `expiredDuration`: Time after which inactive clients are purged (default: 60min)

### 2. shardClient

Manages a connection pool to a single QueryNode.

**File**: `shard_client.go`

**Features**:

- **Lazy initialization**: Connections are created on first use
- **Connection pooling**: Configurable pool size (`ProxyCfg.QueryNodePoolingSize`, default: 1)
- **Round-robin selection**: Distributes requests across pool connections
- **Expiration tracking**: Tracks last active time for automatic cleanup
- **Thread-safe**: Safe for concurrent access

**Lifecycle**:

1. Created when a request first needs that QueryNode
2. Initializes the connection pool on the first `getClient()` call
3. Tracks `lastActiveTs` on each use
4. Closed by the manager if expired or during shutdown

### 3. LBPolicy

Executes workloads on collections/channels with retry and failover logic.

**File**: `lb_policy.go`

**Key Methods**:

- **`Execute(ctx, CollectionWorkLoad)`**: Execute workload in parallel across all shards
- **`ExecuteOneChannel(ctx, CollectionWorkLoad)`**: Execute workload on any single shard (for lightweight operations)
- **`ExecuteWithRetry(ctx, ChannelWorkload)`**: Execute on a specific channel with retry on different replicas

**Retry Strategy** (see the sketch at the end of this subsection):

- Retry up to `max(retryOnReplica, len(shardLeaders))` times
- Maintain an `excludeNodes` set to avoid retrying failed nodes
- Refresh the shard leader cache if the initial attempt fails
- Clear `excludeNodes` if all replicas are exhausted

**Workload Types**:

```go
type ChannelWorkload struct {
	Db             string
	CollectionName string
	CollectionID   int64
	Channel        string
	Nq             int64       // Number of queries
	Exec           ExecuteFunc // Actual work to execute
}

type ExecuteFunc func(context.Context, UniqueID, types.QueryNodeClient, string) error
```
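The retry strategy above boils down to a loop over replicas with an exclude set. Here is a minimal sketch of that flow under the stated rules; it is not the package's `ExecuteWithRetry`, and `selectNode`, `executeOnNode`, and `refreshLeaders` are hypothetical stand-ins for the balancer, the workload's `Exec` function, and the shard leader cache refresh:

```go
import "context"

// executeWithRetrySketch outlines the replica-failover loop described above.
func executeWithRetrySketch(
	ctx context.Context,
	retryOnReplica, numShardLeaders int,
	selectNode func(exclude map[int64]struct{}) (int64, error),
	executeOnNode func(ctx context.Context, nodeID int64) error,
	refreshLeaders func(ctx context.Context) error,
) error {
	// Retry up to max(retryOnReplica, len(shardLeaders)) times.
	retries := retryOnReplica
	if numShardLeaders > retries {
		retries = numShardLeaders
	}
	exclude := make(map[int64]struct{}) // replicas that already failed
	var lastErr error
	for attempt := 0; attempt < retries; attempt++ {
		if err := ctx.Err(); err != nil {
			return err // respect context cancellation
		}
		nodeID, err := selectNode(exclude)
		if err != nil {
			// No selectable replica left: refresh the shard leader cache and
			// clear the exclude set so all replicas become candidates again.
			if rerr := refreshLeaders(ctx); rerr != nil {
				return rerr
			}
			exclude = make(map[int64]struct{})
			lastErr = err
			continue
		}
		if err := executeOnNode(ctx, nodeID); err != nil {
			exclude[nodeID] = struct{}{} // don't retry this replica
			lastErr = err
			continue
		}
		return nil
	}
	return lastErr
}
```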
### 4. Load Balancers

Two strategies for selecting QueryNode replicas:

#### RoundRobinBalancer

**File**: `roundrobin_balancer.go`

Simple round-robin selection across available nodes. No state tracking, minimal overhead.

**Use case**: Uniform workload distribution when all nodes have similar capacity

#### LookAsideBalancer

**File**: `look_aside_balancer.go`

Cost-aware load balancer that considers QueryNode workload and health.

**Features**:

- **Cost metrics tracking**: Caches `CostAggregation` (response time, service time, total NQ) from QueryNodes
- **Workload score calculation**: Uses a power-of-3 formula to prefer lightly loaded nodes (see the Go sketch after this subsection):

  ```
  score = executeSpeed + (1 + totalNQ + executingNQ)³ × serviceTime
  ```

- **Periodic health checks**: Monitors QueryNode health via `GetComponentStates` RPC
- **Unavailable node handling**: Marks nodes unreachable after consecutive health check failures
- **Adaptive behavior**: Falls back to round-robin when the workload difference is small

**Configuration Parameters**:

- `ProxyCfg.CostMetricsExpireTime`: How long to trust cached cost metrics (default: varies)
- `ProxyCfg.CheckWorkloadRequestNum`: Check workload every N requests (default: varies)
- `ProxyCfg.WorkloadToleranceFactor`: Tolerance for workload difference before preferring the lighter node
- `ProxyCfg.CheckQueryNodeHealthInterval`: Interval for health checks
- `ProxyCfg.HealthCheckTimeout`: Timeout for the health check RPC
- `ProxyCfg.RetryTimesOnHealthCheck`: Failures before marking a node unreachable

**Selection Strategy**:

```
if (requestCount % CheckWorkloadRequestNum == 0) {
    // Cost-aware selection
    select node with minimum workload score
    if (maxScore - minScore) / minScore <= WorkloadToleranceFactor {
        fall back to round-robin
    }
} else {
    // Fast path: round-robin
    select next available node
}
```
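To make the selection math concrete, here is a small Go rendering of the documented score formula and tolerance check. It is an illustration of the formula above, not the balancer's actual code; the parameter names simply mirror the cached cost metrics:

```go
// workloadScore applies the documented power-of-3 formula:
// score = executeSpeed + (1 + totalNQ + executingNQ)^3 × serviceTime
func workloadScore(executeSpeed float64, totalNQ, executingNQ int64, serviceTime float64) float64 {
	pressure := float64(1 + totalNQ + executingNQ)
	return executeSpeed + pressure*pressure*pressure*serviceTime
}

// withinTolerance reports whether the score spread is small enough that the
// balancer falls back to round-robin instead of cost-aware selection.
func withinTolerance(minScore, maxScore, toleranceFactor float64) bool {
	if minScore <= 0 {
		return true // avoid division by zero; treat nodes as equally loaded
	}
	return (maxScore-minScore)/minScore <= toleranceFactor
}
```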
## Configuration

Key configuration parameters from `paramtable`:

| Parameter | Path | Description | Default |
|-----------|------|-------------|---------|
| QueryNodePoolingSize | `ProxyCfg.QueryNodePoolingSize` | Size of connection pool per QueryNode | 1 |
| RetryTimesOnReplica | `ProxyCfg.RetryTimesOnReplica` | Max retry times on replica failures | varies |
| ReplicaSelectionPolicy | `ProxyCfg.ReplicaSelectionPolicy` | Load balancing policy: `round_robin` or `look_aside` | `look_aside` |
| CostMetricsExpireTime | `ProxyCfg.CostMetricsExpireTime` | Expiration time for cost metrics cache | varies |
| CheckWorkloadRequestNum | `ProxyCfg.CheckWorkloadRequestNum` | Frequency of workload-aware selection | varies |
| WorkloadToleranceFactor | `ProxyCfg.WorkloadToleranceFactor` | Tolerance for workload differences | varies |
| CheckQueryNodeHealthInterval | `ProxyCfg.CheckQueryNodeHealthInterval` | Health check interval | varies |
| HealthCheckTimeout | `ProxyCfg.HealthCheckTimeout` | Health check RPC timeout | varies |

## Usage Example

```go
import (
	"context"

	"github.com/milvus-io/milvus/internal/proxy/shardclient"
	"github.com/milvus-io/milvus/internal/types"
	// querypb (for SearchRequest) comes from the Milvus proto module;
	// its import path is omitted here for brevity.
)

ctx := context.Background()

// 1. Create ShardClientMgr with the MixCoord client
mgr := shardclient.NewShardClientMgr(mixCoordClient)
mgr.Start() // Start background purge goroutine
defer mgr.Close()

// 2. Create LBPolicy
policy := shardclient.NewLBPolicyImpl(mgr)
policy.Start(ctx) // Start load balancer (health checks, etc.)
defer policy.Close()

// 3. Execute collection workload (e.g., search/query)
workload := shardclient.CollectionWorkLoad{
	Db:             "default",
	CollectionName: "my_collection",
	CollectionID:   12345,
	Nq:             100, // Number of queries
	Exec: func(ctx context.Context, nodeID int64, client types.QueryNodeClient, channel string) error {
		// Perform actual work (search, query, etc.)
		req := &querypb.SearchRequest{ /* ... */ }
		_, err := client.Search(ctx, req)
		return err
	},
}

// Execute on all channels in parallel
if err := policy.Execute(ctx, workload); err != nil {
	// handle the error
}

// Or execute on any single channel (for lightweight ops)
if err := policy.ExecuteOneChannel(ctx, workload); err != nil {
	// handle the error
}
```

## Cache Management

### Shard Leader Cache

The shard leader cache stores the mapping of shards to their leader QueryNodes:

```
database → collectionName → shardLeaders {
    collectionID: int64
    shardLeaders: map[channel][]nodeInfo
}
```

**Cache Operations**:

- **Hit**: When cached shard leaders are used (tracked via `ProxyCacheStatsCounter`)
- **Miss**: When the cache lookup fails, an RPC to QueryCoord via `GetShardLeaders` is triggered
- **Invalidation**:
  - `DeprecateShardCache(db, collection)`: Remove a specific collection
  - `InvalidateShardLeaderCache(collectionIDs)`: Remove collections by ID (called on shard leader changes)
  - `RemoveDatabase(db)`: Remove an entire database

### Client Purging

The `ShardClientMgr` periodically purges unused clients (sketched below):

1. Every `purgeInterval` (default: 600s), iterate over all cached clients
2. Check whether the client's QueryNode still hosts a shard leader (via `ListShardLocation()`)
3. If it is no longer a leader and has been idle longer than `expiredDuration` (based on `lastActiveTs`), close and remove it
4. This prevents connection leaks when QueryNodes are removed or shards rebalance
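The purge loop has roughly the following shape. This is an illustrative sketch, not the code in `manager.go`: the `clients` map, the `isShardLeader` callback (standing in for the `ListShardLocation()` check), and the `pooledClient` type are stand-ins for the manager's internal state:

```go
import (
	"context"
	"time"
)

// pooledClient is a stand-in for the package's shardClient.
type pooledClient struct {
	lastActiveTs time.Time
}

func (c *pooledClient) Close() {}

// purgeLoopSketch closes clients whose QueryNode no longer hosts a shard
// leader and that have been idle longer than expiredDuration.
func purgeLoopSketch(
	ctx context.Context,
	purgeInterval, expiredDuration time.Duration,
	clients map[int64]*pooledClient,
	isShardLeader func(nodeID int64) bool,
) {
	ticker := time.NewTicker(purgeInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			now := time.Now()
			for nodeID, c := range clients {
				if isShardLeader(nodeID) {
					continue // still referenced by the shard leader cache
				}
				if now.Sub(c.lastActiveTs) > expiredDuration {
					c.Close()
					delete(clients, nodeID)
				}
			}
		}
	}
}
```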
## Error Handling

### Common Errors

- **`errClosed`**: Client is closed (returned when accessing a closed `shardClient`)
- **`merr.ErrChannelNotAvailable`**: No available shard leaders for the channel
- **`merr.ErrNodeNotAvailable`**: Selected node is not available
- **`merr.ErrCollectionNotLoaded`**: Collection is not loaded in QueryNodes
- **`merr.ErrServiceUnavailable`**: All available nodes are unreachable

### Retry Logic

Retry is handled at multiple levels:

1. **LBPolicy level**:
   - Retries on different replicas when a request fails
   - Refreshes the shard leader cache on failure
   - Respects context cancellation
2. **Balancer level**:
   - Tracks failed nodes and excludes them from selection
   - Health checks recover nodes when they come back online
3. **gRPC level**:
   - Connection-level retries handled by the gRPC layer

## Metrics

The package exports several metrics:

- `ProxyCacheStatsCounter`: Shard leader cache hit/miss statistics
  - Labels: `nodeID`, `method` (GetShard/GetShardLeaderList), `status` (hit/miss)
- `ProxyUpdateCacheLatency`: Latency of updating the shard leader cache
  - Labels: `nodeID`, `method`

## Testing

The package includes extensive test coverage:

- `shard_client_test.go`: Tests for connection pool management
- `manager_test.go`: Tests for cache management and client lifecycle
- `lb_policy_test.go`: Tests for retry logic and workload execution
- `roundrobin_balancer_test.go`: Tests for round-robin selection
- `look_aside_balancer_test.go`: Tests for cost-aware selection and health checks

**Mock interfaces** (via mockery):

- `mock_shardclient_manager.go`: Mock `ShardClientMgr`
- `mock_lb_policy.go`: Mock `LBPolicy`
- `mock_lb_balancer.go`: Mock `LBBalancer`

## Thread Safety

All components are designed for concurrent access:

- `shardClientMgrImpl`: Uses `sync.RWMutex` for the cache, `typeutil.ConcurrentMap` for clients
- `shardClient`: Uses `sync.RWMutex` and atomic operations
- `LookAsideBalancer`: Uses `typeutil.ConcurrentMap` for all mutable state
- `RoundRobinBalancer`: Uses `atomic.Int64` for the index

## Related Components

- **Proxy** (`internal/proxy/`): Uses `shardclient` to route search/query requests to QueryNodes
- **QueryCoord** (`internal/querycoordv2/`): Provides shard leader information via the `GetShardLeaders` RPC
- **QueryNode** (`internal/querynodev2/`): Receives and processes requests routed by `shardclient`
- **Registry** (`internal/registry/`): Provides client creation functions for gRPC connections

## Future Improvements

Potential areas for enhancement:

1. **Adaptive pooling**: Dynamically adjust connection pool size based on load
2. **Circuit breaker**: Add a circuit breaker pattern for consistently failing nodes
3. **Advanced metrics**: Export more detailed metrics (per-node latency, error rates, etc.)
4. **Smart caching**: Use TTL-based cache expiration instead of invalidation-only
5. **Connection warming**: Pre-establish connections to known QueryNodes