Modify just two lines of code, and the efficiency of RAG vector retrieval jumps by 30%!
The method applies not only to tasks such as "text-to-text search", "image-to-image search", "text-to-image search", and "recommendation system recall"; it also scales well, suiting large-scale applications with billions or even tens of billions of vectors.
A team led by Gao Yunjun and Ke Xiangyu at Zhejiang University, in collaboration with Fu Cong, a leading figure in the field of vector retrieval, has open-sourced a new method, PSP (Proximity graph with Spherical Pathway), which breaks through two major problems of RAG.
To put it simply, mainstream vector retrieval methods are designed around Euclidean distance and mainly ask "who is closest to you"; yet what AI often needs to compare is "semantic relevance", that is, to find the vector with the maximum inner product and see who is most similar.
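To make the difference concrete, here is a minimal NumPy sketch (our illustration, not from the paper) in which the Euclidean nearest neighbor and the maximum inner product neighbor are different vectors:

```python
# Minimal illustration: the Euclidean nearest neighbor and the
# maximum inner product neighbor need not be the same vector.
import numpy as np

query = np.array([1.0, 1.0])
base = np.array([
    [1.1, 0.9],   # very close to the query in Euclidean distance
    [3.0, 3.0],   # farther away, but with a much larger inner product
])

print(np.argmin(np.linalg.norm(base - query, axis=1)))  # 0: Euclidean winner
print(np.argmax(base @ query))                          # 1: inner product winner
```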
The problem is that, unlike Euclidean distance, the inner product does not satisfy the triangle inequality, so many classical retrieval methods break down on it.
PSP's insight is that with only minor modifications, the old graph structures can also find the globally optimal solution for the maximum inner product.
PSP also adds an early-stopping strategy that judges whether a search can terminate ahead of time, avoiding wasted computation and making retrieval even faster.
The Core Technology Behind AI Products
Vector retrieval is the core technical component behind today's star AI products. It greatly broadens the boundaries of traditional retrieval (keyword search) and works seamlessly with large models.
The key to unleashing the true potential of this technology, and to making the combination of vector models and vector databases truly effective, lies in choosing the right "metric space".
Graph-based vector retrieval algorithms such as HNSW and NSG are favored for their excellent retrieval speed, but what is often overlooked is that they were all designed for Euclidean space.
This "metric mismatch" is devastating in many scenarios. When vector data that calls for "maximum inner product" retrieval is paired with Euclidean algorithms, the retrieved results are often semantically irrelevant to the query.
Looking back at the field of maximum inner product retrieval, no phenomenon-level algorithm like HNSW or NSG has yet emerged: many earlier works performed well only on certain datasets, and their performance degraded significantly once the dataset changed.
The Key to Breaking the Deadlock: Reaching the Globally Optimal Inner Product Solution by Modifying Only Two Lines of Code
Through theoretical exploration, the research team found that work on maximum inner product retrieval falls clearly into two paradigms:
One is to convert the maximum inner product into a minimum Euclidean distance, after which HNSW or NSG can be applied directly. However, this conversion typically introduces information loss or a non-linear distortion of the topological space, both of which hurt search quality to varying degrees.
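As a concrete illustration of this first paradigm (a well-known generic reduction, not PSP's method), maximum inner product can be turned into minimum Euclidean distance by appending one extra coordinate to every base vector; the norm-dependent augmentation below is exactly the kind of space transformation the paragraph describes:

```python
# Sketch of the classic MIPS-to-Euclidean reduction, shown only to
# illustrate the first paradigm; it is not part of PSP.
import numpy as np

def augment_base(X):
    """Append sqrt(M^2 - ||x||^2) to each base vector x, M = max norm."""
    norms = np.linalg.norm(X, axis=1)
    M = norms.max()
    extra = np.sqrt(M**2 - norms**2)
    return np.hstack([X, extra[:, None]])

def augment_query(q):
    """Append a zero so that ||q' - x'||^2 = ||q||^2 + M^2 - 2<q, x>."""
    return np.append(q, 0.0)

X = np.random.randn(1000, 64)
q = np.random.randn(64)

# After augmentation, the Euclidean nearest neighbor of q' among X'
# is exactly the maximum inner product neighbor of q among X.
X_aug, q_aug = augment_base(X), augment_query(q)
assert np.argmin(np.linalg.norm(X_aug - q_aug, axis=1)) == np.argmax(X @ q)
```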
The other is to search directly in the inner product space, without any space conversion. This avoids information loss and spatial distortion, but the corresponding pain point is the lack of effective means to prune the invalid search space, which makes competitive retrieval speed hard to achieve.
Why is it so difficult to perform retrieval directly in the inner product space?
The core reason is that the inner product space is not a "metric space" in the strict sense. Mathematically, a space must satisfy several axioms to qualify as a metric space; the familiar Euclidean space is the canonical example. The inner product space is an "incomplete space" in this respect, and the most important property it lacks is the triangle inequality.
According to the theoretical part of the NSG paper, the reason why state-of-the-art vector retrieval algorithms such as HNSW, NSG, and SSG are so efficient is that they all use the triangle inequality to efficiently prune the index structure (graph structure).
When the inner product is used as the distance measure, the triangles it induces no longer obey the familiar rule that "the sum of any two sides of a triangle is greater than the third side, and the difference of any two sides is less than the third side". It is precisely the lack of this property that has hindered the development of maximum inner product retrieval algorithms.
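A tiny numeric check (our example) makes the failure visible: treat the negated inner product as a "distance" and the triangle inequality collapses immediately:

```python
# Illustrative only: the pseudo-distance induced by the inner product
# violates the triangle inequality (and other metric axioms).
import numpy as np

a, b, c = np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])
d = lambda x, y: -float(x @ y)  # "distance" = negated inner product

print(d(a, b))            # 0.0
print(d(a, c) + d(c, b))  # -2.0 -> d(a,b) > d(a,c) + d(c,b): inequality violated
# It also fails d(x, x) == 0: here d(c, c) = -2, breaking another metric axiom.
```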
The PSP research team studied this problem in depth and proved one thing theoretically: for any search request, i.e., any query point q, a simple greedy algorithm on a graph index built for Euclidean distance can find the globally optimal maximum inner product solution.
Graph-based vector retrieval algorithms all search with a greedy walk: starting from some point on the graph, algorithms like NSG examine the neighbors of the current point, jump to the one "closest" to the target, and keep hopping from neighbor to neighbor until they reach the globally optimal solution.
The implicit theoretical assumption behind this greedy walk was that if the graph is constructed using Euclidean distance to define "near" and "far", then the walk must also use Euclidean distance.
The significance of the PSP team's result is that even if the graph is built with Euclidean distance, the walk can use the inner product to define near and far, and its final destination will still be the globally optimal maximum inner product solution!
Therefore, an existing Euclidean algorithm can be adapted to maximum inner product by modifying only two lines of code in the retrieval (greedy-walk) routine:
△ The two-line modification to the greedy search routine
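The paper's figure and the open-source repo show the exact diff; the sketch below is our reconstruction, with a hypothetical greedy_search helper, marking where such a two-line swap would plausibly sit in a generic greedy walk. The graph is still built with Euclidean distance; only the walk's notion of "near" changes:

```python
import numpy as np

def greedy_search(graph, vectors, query, entry):
    """Greedy walk on a proximity graph built with Euclidean distance.
    graph: adjacency dict {node_id: [neighbor ids]}; vectors: (N, d) array."""
    # score = lambda i: np.linalg.norm(vectors[i] - query)  # original: Euclidean "near"
    score = lambda i: -(vectors[i] @ query)                 # changed: negated inner product
    current = entry
    while True:
        best = min(graph[current], key=score)   # most promising neighbor
        if score(best) >= score(current):       # no neighbor improves: local optimum,
            return current                      # which PSP proves is the global MIPS answer
        current = best
```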
Optimization: Guiding Search Behavior to Avoid Redundant Computation
The PSP research team also found a large amount of redundant computation in the maximum inner product retrieval process, and that this redundancy can be avoided by properly guiding the search behavior.
Search behavior under maximum inner product differs greatly from that in Euclidean space, as the following figure shows:
In the left figure, the Euclidean nearest neighbor of the green box (the query) is the red triangle, but its maximum inner product neighbor is the orange square. When searching for the query's Euclidean nearest neighbor, the walk quickly stops near the triangle; when searching for its maximum inner product neighbor, it keeps moving outward toward the orange square on the "periphery".
From a more macroscopic perspective, the team found that the solutions of maximum inner product retrieval tend to lie on the "periphery" of the dataset (unlike Euclidean nearest neighbors, which can lie anywhere in the data space). The search therefore typically proceeds "from the inside outward, then expands along the periphery" (as shown in the right figure above).
Based on this characteristic, PSP designs targeted strategies to place the starting points of the graph search in regions as close to the "answer" as possible.
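PSP's actual entry-point construction (the "spherical pathway") is detailed in the paper; one plausible simplification of the idea, given that answers concentrate on the periphery, is to bias entry candidates toward large-norm vectors:

```python
# Illustrative heuristic only, not PSP's exact construction: since MIPS
# answers tend to lie on the dataset's periphery, favor large-norm entries.
import numpy as np

def pick_entry_points(vectors, k=8):
    """Return indices of the k largest-norm vectors as entry candidates."""
    norms = np.linalg.norm(vectors, axis=1)
    return np.argsort(-norms)[:k]
```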
Redundancy, moreover, is not confined to the early stage of the search; it is also highly concentrated in the later stage.
As shown in the figure above, the team found that the "minimum number of steps" needed to reach the exact solution on the graph index varies from query to query and follows a clear long-tail distribution (Figure a). Through extensive experiments, they also mined four types of "features" that help decide when a search should stop (Figure b). These features can be computed and recorded at very low cost during the search, enabling an adaptive "early stopping" strategy.
Specifically, a set of points is randomly sampled from the database to serve as queries. Searching for them yields data from before and after the optimal stopping step, which forms a labeled training set. This set is then used to train a decision tree that helps the search process judge the stopping condition:
As shown in the figure above, by pruning the decision tree the team keeps it at a relatively small height. A decision tree is chosen as the classifier because it fits a small sample effectively and translates directly into if-else statements that can be embedded in the search code, realizing an efficient "stop judgment".
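A hedged sketch of that pipeline (the feature names are hypothetical stand-ins for the paper's four feature types, and the labels here are synthetic):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical per-step search features, e.g. steps taken, steps since the
# result set last improved, current best score, candidate-queue size.
rng = np.random.default_rng(0)
X_train = rng.random((2000, 4))               # stand-in for logged search states
y_train = (X_train[:, 1] > 0.7).astype(int)   # synthetic label: 1 = safe to stop

tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print(export_text(tree))  # read off the splits, then hard-code them:

def should_stop(steps, stale_steps, best_score, queue_size):
    # Hand-translated from the printed splits (threshold illustrative only);
    # this if-else form is what gets embedded in the search loop.
    return stale_steps > 0.7
```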
Performance Test: Stability, Efficiency, and Scalability
To fully test the effectiveness of the PSP algorithm, the team ran comprehensive tests on 8 large-scale, high-dimensional datasets. In dimensionality, DBpedia100K and DBpedia1M reach 1536 and 3072 dimensions respectively, extracted with OpenAI's text-embedding-3-large model; in scale, the largest dataset, Commerce100M, contains 100 million base points.
When comparing vector retrieval algorithms, the usual focus is retrieval speed at the same recall rate, measured in queries per second (QPS). As the figure above shows, PSP delivers a stable and clear improvement over existing state-of-the-art methods; on MNIST it even beats the runner-up by more than 4x.
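For reference, recall-at-k and QPS are typically measured along these lines (a generic benchmarking sketch, not the paper's harness; search_fn is a placeholder for any retrieval routine):

```python
import time
import numpy as np

def evaluate(search_fn, queries, ground_truth, k=10):
    """search_fn(q, k) -> ids of k results; ground_truth: (n, k) id array."""
    hits, start = 0, time.perf_counter()
    for q, gt in zip(queries, ground_truth):
        hits += len(set(search_fn(q, k)) & set(gt[:k]))  # overlap with true top-k
    elapsed = time.perf_counter() - start
    return hits / (len(queries) * k), len(queries) / elapsed  # recall@k, QPS
```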
It is worth noting that some baseline methods "disappear" from the plots: their performance is so much worse that they cannot be drawn on the same axes as the others. For example, ip-HNSW is absent on MNIST, and ScaNN is absent on Laion10M and Commerce100M. This highlights the stability of PSP's performance.
In addition, the datasets span tasks and modalities such as "text-to-text search", "image-to-image search", "text-to-image search", and "recommendation system recall", reflecting PSP's strong generalization ability.
Beyond raw retrieval performance, another important dimension for evaluating the practical value of a vector retrieval algorithm is scalability: good retrieval requires query time that grows far more slowly than linearly in the dataset size.
As the figure above shows, PSP's query time grows at a rate of log(N) for Top-1 retrieval and close to log(N) for Top-K retrieval. This reflects PSP's excellent scalability, i.e., its potential for efficient retrieval on datasets with billions or even tens of billions of points.
Paper link: https://arxiv.org/pdf/2503.06882
Github link: https://github.com/ZJU-DAILY/PSP
This article is from the WeChat official account "QbitAI", author: PSP team. Republished by 36Kr with permission.