shiyu22 a7d57d7a82
add Tanimoto ground truth (#1138)
* add milvus ground truth

* add milvus groundtruth

* [skip ci] add milvus ground truth

* [skip ci]add tanimoto ground truth
2020-02-14 12:35:50 +08:00


# Quick Start
- For calculating the ground truth (the top-k most similar vectors) of feature vectors using L2, IP, or Tanimoto distance.
- In the table below, the last five parameters usually do not need to be changed.
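The ground truth for a query is the exact top-k result, found by comparing the query against every base vector. The sketch below shows that brute-force search for the three metrics accepted by `-m`. It is an illustration, not code from `milvus_ground_truth.py`, and all function names are hypothetical. It assumes plain Python lists, treats Tanimoto as applying to binary (0/1) vectors, and uses 1 minus the Tanimoto coefficient as the distance. Milvus may apply a different transform to the coefficient, but any decreasing transform produces the same ranking.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product (IP): larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def tanimoto_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed form for 0/1 vectors: 1 - |a AND b| / |a OR b| (smaller is more similar)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero vectors are treated as identical.
    return 1.0 - both / either if either else 0.0


# metric name -> (scoring function, whether a larger score means more similar)
METRICS: Dict[str, Tuple[Callable[[Sequence, Sequence], float], bool]] = {
    "L2": (l2_distance, False),
    "IP": (inner_product, True),
    "Tan": (tanimoto_distance, False),
}


def brute_force_top_k(query: Sequence, base: Sequence[Sequence], k: int, metric: str) -> List[int]:
    """Return the indices of the k base vectors most similar to query."""
    score_fn, larger_is_better = METRICS[metric]
    scored = [(score_fn(query, vec), idx) for idx, vec in enumerate(base)]
    # Best score first; ties are broken by the lower index.
    scored.sort(key=lambda s: (-s[0] if larger_is_better else s[0], s[1]))
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    base = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
    print(brute_force_top_k([1.0, 1.0], base, 2, "L2"))  # [1, 0]
    print(brute_force_top_k([1.0, 1.0], base, 2, "IP"))  # [2, 1]
    binary_base = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]
    print(brute_force_top_k([1, 1, 0, 0], binary_base, 2, "Tan"))  # [0, 1]
```

A real run over large `.npy` datasets would vectorize this and spread queries across `PROCESS_NUM` processes. The sketch keeps the loop explicit so the ranking rule for each metric is easy to see.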
### Parameter Description
| parameter | description | default setting |
| ------------------ | --------------------------------------------- | ----------------------- |
| PROCESS_NUM | number of processes | 12 |
| GET_VEC | whether to save feature vectors | False |
| CSV | whether the query vector file format is csv | False |
| UINT8 | whether the query vector data format is uint8 | False |
| BASE_FOLDER_NAME | path to the source vector dataset | '/data/milvus/base' |
| NQ_FOLDER_NAME | path to the query vector dataset | '/data/milvus/query' |
| GT_ALL_FOLDER_NAME | name of the folder for intermediate results | 'ground_truth_all' |
| GT_FOLDER_NAME | path where the ground truth results are saved | 'ground_truth' |
| LOC_FILE_NAME | file storing the ground truth's location info | 'ground_truth.txt' |
| FLOC_FILE_NAME | file storing the ground truth's filename info | 'file_ground_truth.txt' |
| VEC_FILE_NAME | file storing the ground truth's feature vectors | 'vectors.npy' |
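For reference, the defaults above can be written out as Python settings, together with the result files they produce inside `GT_FOLDER_NAME`. This is a sketch under assumptions. Defining these as module-level constants in `milvus_ground_truth.py` is a guess based on their names, `output_paths` is a hypothetical helper, and the condition that `VEC_FILE_NAME` is written only when `GET_VEC` is enabled is inferred from the table rather than confirmed by the script.

```python
import os
from typing import Dict

# Defaults copied from the parameter table. Treating them as module-level
# constants is an assumption; check milvus_ground_truth.py for the real definitions.
PROCESS_NUM = 12
GET_VEC = False
CSV = False
UINT8 = False
BASE_FOLDER_NAME = '/data/milvus/base'
NQ_FOLDER_NAME = '/data/milvus/query'
GT_ALL_FOLDER_NAME = 'ground_truth_all'
GT_FOLDER_NAME = 'ground_truth'
LOC_FILE_NAME = 'ground_truth.txt'
FLOC_FILE_NAME = 'file_ground_truth.txt'
VEC_FILE_NAME = 'vectors.npy'


def output_paths(gt_folder: str = GT_FOLDER_NAME, get_vec: bool = GET_VEC) -> Dict[str, str]:
    """Hypothetical helper: the result files expected inside the ground truth folder."""
    paths = {
        "locations": os.path.join(gt_folder, LOC_FILE_NAME),
        "filenames": os.path.join(gt_folder, FLOC_FILE_NAME),
    }
    # Assumed: the feature vector file is written only when GET_VEC is enabled.
    if get_vec:
        paths["vectors"] = os.path.join(gt_folder, VEC_FILE_NAME)
    return paths


if __name__ == "__main__":
    for name, path in output_paths(get_vec=True).items():
        print(f"{name}: {path}")
```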
### Usage
```bash
$ python3 milvus_ground_truth.py [-q <nq_num>] -k <topk_num> -m <metric_type> -l
# -q or --nq: number of vectors taken from the query vector set. Optional; if omitted, all vectors in the query set are used.
# -k or --topk: number of most similar vectors (top k) to compute for each query.
# -m or --metric: metric used to compare vector distances in Milvus, such as IP, L2, or Tan (Tanimoto).
# -l: generate the ground truth results and save them in GT_FOLDER_NAME. In this folder:
#   - LOC_FILE_NAME stores the result locations, e.g. "8002005210": the leading 8 has no meaning,
#     digits 2-4 give the position of the result file in the folder, and digits 5-10 give the
#     position of the result vector within that file.
#   - FLOC_FILE_NAME stores the result filename and vector position, e.g. "binary_128d_00000.npy 81759".
#   - VEC_FILE_NAME stores the result feature vectors.
```
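The location formats described for `LOC_FILE_NAME` and `FLOC_FILE_NAME` can be decoded with simple string slicing. The following is a minimal sketch. `decode_location` and `parse_file_location` are hypothetical helper names, the sketch assumes every location ID has exactly the 10-digit form shown in the example, and the README does not say whether the positions are 0-based or 1-based, so the numbers are returned as-is.

```python
from typing import Tuple, Union


def decode_location(loc: Union[str, int]) -> Tuple[int, int]:
    """Split a LOC_FILE_NAME entry such as "8002005210" into (file position, vector position).

    Layout per the README: digit 1 has no meaning, digits 2-4 are the
    result file's position in the folder, and digits 5-10 are the vector's
    position within that file.
    """
    text = str(loc).strip()
    if len(text) != 10 or not text.isdigit():
        raise ValueError(f"expected a 10-digit location ID, got {text!r}")
    return int(text[1:4]), int(text[4:10])


def parse_file_location(line: str) -> Tuple[str, int]:
    """Split a FLOC_FILE_NAME entry such as "binary_128d_00000.npy 81759"."""
    filename, position = line.split()
    return filename, int(position)


if __name__ == "__main__":
    print(decode_location("8002005210"))  # (2, 5210)
    print(parse_file_location("binary_128d_00000.npy 81759"))  # ('binary_128d_00000.npy', 81759)
```

Accepting an `int` as well as a `str` lets the helper work whether the IDs are read back as text or as numbers. Because the first digit is nonzero in the example, converting the ID to an integer does not drop any leading digits.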