aoiasd ee216877bb
enhance: support compaction with file resource in ref mode (#46399)
Add support for DataNode compaction using file resources in ref mode.
SortCompaction and StatsJobs build text indexes, which may use file
resources.
relate: https://github.com/milvus-io/milvus/issues/43687

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
- Core invariant: file resources (analyzer binaries/metadata) are only
fetched, downloaded and used when the node is configured in Ref mode
(fileresource.IsRefMode via CommonCfg.QNFileResourceMode /
DNFileResourceMode); Sync now carries a version and managers track
per-resource versions/resource IDs so newer resource sets win and older
entries are pruned (RefManager/SynchManager resource maps).
- Logic removed / simplified: component-specific FileResourceMode flags
and an indirection through a long-lived BinlogIO wrapper were
consolidated — file-resource mode moved to CommonCfg, Sync/Download APIs
became version- and context-aware, and compaction/index tasks accept a
ChunkManager directly (binlog IO wrapper creation inlined). This
eliminates duplicated config checks and wrapper indirection while
preserving the same chunk/IO semantics.
- Why no data loss or behavior regression: all file-resource code paths
are gated by the configured mode (default remains "sync"); when not in
ref-mode or when no resources exist, compaction and stats flows follow
existing code paths unchanged. Versioned Sync + resourceID maps ensure
newly synced sets replace older ones and RefManager prunes stale files;
GetFileResources returns an error if requested IDs are missing (prevents
silent use of wrong resources). Analyzer naming/parameter changes add
analyzer_extra_info but default-callers pass "" so existing analyzers
and index contents remain unchanged.
- New capability: DataNode compaction and StatsJobs can now build text
indexes using external file resources in Ref mode — DataCoord exposes
GetFileResources and populates CompactionPlan.file_resources;
SortCompaction/StatsTask download resources via fileresource.Manager,
produce an analyzer_extra_info JSON (storage + resource->id map) via
analyzer.BuildExtraResourceInfo, and propagate analyzer_extra_info into
BuildIndexInfo so the tantivy bindings can load custom analyzers during
text index creation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2026-01-06 16:31:31 +08:00

91 lines
2.9 KiB
Go

package server

import (
	"context"
	"fmt"

	"google.golang.org/grpc"

	"github.com/milvus-io/milvus/internal/streamingnode/client/handler/registry"
	"github.com/milvus-io/milvus/internal/streamingnode/server/resource"
	"github.com/milvus-io/milvus/internal/streamingnode/server/service"
	"github.com/milvus-io/milvus/internal/streamingnode/server/walmanager"
	"github.com/milvus-io/milvus/internal/util/fileresource"
	"github.com/milvus-io/milvus/internal/util/initcore"
	"github.com/milvus-io/milvus/internal/util/sessionutil"
	"github.com/milvus-io/milvus/pkg/v2/log"
	"github.com/milvus-io/milvus/pkg/v2/proto/streamingpb"
	_ "github.com/milvus-io/milvus/pkg/v2/streaming/walimpls/impls/kafka"
	_ "github.com/milvus-io/milvus/pkg/v2/streaming/walimpls/impls/pulsar"
	_ "github.com/milvus-io/milvus/pkg/v2/streaming/walimpls/impls/rmq"
	"github.com/milvus-io/milvus/pkg/v2/util/paramtable"
)

// Server is the streamingnode server.
type Server struct {
	// session of current server.
	session    *sessionutil.Session
	grpcServer *grpc.Server

	// service level instances.
	handlerService service.HandlerService
	managerService service.ManagerService

	// basic component instances.
	walManager walmanager.Manager
}

// init initializes the streamingnode server.
func (s *Server) init() {
	log.Info("init streamingnode server...")
	// init all basic components.
	s.initBasicComponent()

	// init all services.
	s.initService()

	// init the file resource manager with the configured file resource mode.
	fileresource.InitManager(resource.Resource().ChunkManager(), fileresource.ParseMode(paramtable.Get().CommonCfg.QNFileResourceMode.GetValue()))

	log.Info("init query segcore...")
	if err := initcore.InitQueryNode(context.TODO()); err != nil {
		panic(fmt.Sprintf("init query node segcore failed, %+v", err))
	}

	log.Info("streamingnode server initialized")
}

// Stop stops the streamingnode server.
func (s *Server) Stop() {
	log.Info("stopping streamingnode server...")
	log.Info("close wal manager...")
	s.walManager.Close()
	log.Info("release streamingnode resources...")
	resource.Release()
	log.Info("streamingnode server stopped")
}

// initBasicComponent initializes all underlying dependencies of the streamingnode.
func (s *Server) initBasicComponent() {
	var err error
	s.walManager, err = walmanager.OpenManager()
	if err != nil {
		panic(fmt.Sprintf("open wal manager failed, %+v", err))
	}
	// Register the wal manager to the local registry.
	registry.RegisterLocalWALManager(s.walManager)
}

// initService initializes the grpc service.
func (s *Server) initService() {
	s.handlerService = service.NewHandlerService(s.walManager)
	s.managerService = service.NewManagerService(s.walManager)
	s.registerGRPCService(s.grpcServer)
}

// registerGRPCService registers all grpc services to the grpc server.
func (s *Server) registerGRPCService(grpcServer *grpc.Server) {
	streamingpb.RegisterStreamingNodeHandlerServiceServer(grpcServer, s.handlerService)
	streamingpb.RegisterStreamingNodeManagerServiceServer(grpcServer, s.managerService)
}