Friday, 21 October 2022

Can I 'inner-search' the most similar vectors within a FAISS index?

I have a FAISS index populated with 8M embedding vectors. I don't have the embedding vectors anymore, only the index, and it is expensive to recompute the embeddings.

Can I search the index for the top-k most similar vectors to each of the index's vectors?

To be more concrete, say this is how my index was populated:

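import faiss
import numpy as np

d = 1024
N = 100
# FAISS expects float32 vectors and int64 ids
embeddings = np.random.rand(N, d).astype('float32')
ids = np.arange(N, dtype='int64')
index = faiss.index_factory(
    d, 'IDMap,Flat', faiss.METRIC_INNER_PRODUCT
)
index.add_with_ids(embeddings, ids)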

I would like to get D, I such that:

D, I = index.search(embeddings, k)  # k = number of neighbours per vector

but I no longer have access to embeddings; I only have the index.
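To pin down what D and I should contain, here is a brute-force NumPy sketch of the same self-search under the inner-product metric. The function name brute_force_inner_search and the toy sizes are illustrative assumptions, not part of FAISS.

```python
import numpy as np


def brute_force_inner_search(vectors, k):
    """Return (D, I): the top-k inner products of each row against all rows."""
    scores = vectors @ vectors.T                     # (N, N) pairwise inner products
    order = np.argsort(-scores, axis=1)[:, :k]       # positions of the k largest scores per row
    top = np.take_along_axis(scores, order, axis=1)  # the matching scores, in descending order
    return top, order


# Toy data only; the real index holds 8M vectors of dimension 1024.
rng = np.random.default_rng(0)
vectors = rng.random((100, 16), dtype=np.float32)
D, I = brute_force_inner_search(vectors, k=5)
```

Note that with inner product on unnormalised vectors, a vector is not guaranteed to be its own top hit, so the first column of I need not be the query's own id.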

I tried using index.reconstruct() to get back my (approximated?) embeddings, but I ran into:

RuntimeError: Error in virtual void faiss::Index::reconstruct(faiss::Index::idx_t, float*) const at /root/miniconda3/conda-bld/faiss-pkg_1613228717761/work/faiss/Index.cpp:57: reconstruct not implemented for this type of index

