Generative Query Expansion
Introduction
In most query expansion (QE) studies, evaluation rests solely on search results. This project proposes a second way[^1] of measuring QE performance: counting how many of a model's expansion terms overlap with a set of ideal expansion terms, where the ideal terms are trained with a high-performing supervised QE model. After expansion, we used Okapi BM25 for ranking (searching).
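The proposed overlap measure can be sketched as a simple set intersection; the function and term names below are illustrative, not the project's actual code:

```python
def expansion_overlap(candidate_terms, ideal_terms):
    """Fraction of candidate expansion terms that appear in the ideal set."""
    candidates = set(candidate_terms)
    ideal = set(ideal_terms)
    if not candidates:
        return 0.0
    return len(candidates & ideal) / len(candidates)

# toy example: 2 of the 3 candidates are in the ideal set
ideal = {"bank", "loan", "interest", "mortgage"}
candidates = ["loan", "interest", "river"]
print(round(expansion_overlap(candidates, ideal), 3))  # 0.667
```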
Approaches
(i) Find expansion candidates with a GQE model that searches for terms maximizing the probability of co-occurrence.
(ii) Examine whether maximizing the probability of co-occurring terms is the right objective.
(iii) Measure the cohesiveness of co-occurring terms.
Finding Ideal Expansion Terms
We first need ideal expansion terms as a criterion for evaluating the model's output. A supervised learner can find expansion terms that perform well in search, yielding high P@10 and MAP. The ideal expansion terms trained on AP88/89 perform as follows:
Expansion | p@5 | p@10 | MAP@10 | MAP@100 | MAP@1k | Recall@1k | Recall@10k |
---|---|---|---|---|---|---|---|
+0 terms | 0.41 | 0.371 | 0.28 | < 0.25 | < 0.25 | 0.662 | 0.861 |
+400 | 0.906 | 0.827 | 0.817 | 0.556 | 0.546 | 0.8578 | 0.9642 |
+2000 | 0.98 | 0.959 | 0.9832 | 0.8734 | 0.7935 | 0.8825 | 0.9495 |
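The metrics in the table can be computed per query and then averaged; a minimal sketch of p@k and AP@k (MAP@k is AP@k averaged over queries), with illustrative doc IDs:

```python
def precision_at_k(ranked_docs, relevant, k):
    """p@k: fraction of the top-k ranked docs that are relevant."""
    return sum(1 for d in ranked_docs[:k] if d in relevant) / k

def average_precision_at_k(ranked_docs, relevant, k):
    """AP@k for one query; MAP@k averages this over all queries."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, d in enumerate(ranked_docs[:k], start=1):
        if d in relevant:
            hits += 1
            score += hits / i  # precision at each relevant hit
    return score / min(len(relevant), k)

ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d5"}
print(precision_at_k(ranked, relevant, 5))  # 0.6
```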
In early work, the goal was to find 400 expansion terms. Later, we found that 400 was not the upper bound: 2000+ terms could be used for expansion.
Early Work
The first approach assumes that 400 expansion terms are a sufficient quantity for QE. The top-100 terms that maximize the co-occurrence probability account for most of the MAP gain, so finding these top-100 terms is important. GQE finds the top 100 as below:
The above graph is a query-level measurement showing how many of the expansion terms found are among the top-100 ideal expansion terms.
The larger model finds more ideal expansion terms, but it also produces more false-positive expansion candidates.
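One simple way to score candidates by co-occurrence with the query is to estimate P(candidate | query term) from document-level counts. This is a sketch under that assumption, not the GQE model itself; all names and the toy corpus are illustrative:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count document-level term occurrences and term-pair co-occurrences."""
    pair_counts, term_counts = Counter(), Counter()
    for doc in docs:
        terms = set(doc)
        term_counts.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_counts[(a, b)] += 1
    return pair_counts, term_counts

def cooccurrence_score(candidate, query_terms, pair_counts, term_counts):
    """Sum of estimated P(candidate | q) over the query terms q."""
    score = 0.0
    for q in query_terms:
        pair = tuple(sorted((candidate, q)))
        if term_counts[q]:
            score += pair_counts[pair] / term_counts[q]
    return score

docs = [["loan", "bank", "interest"],
        ["bank", "river"],
        ["loan", "interest", "mortgage"]]
pairs, terms = cooccurrence_counts(docs)
# rank candidate expansion terms for the query {"loan"}
cands = ["interest", "river", "mortgage"]
ranked = sorted(cands, key=lambda c: -cooccurrence_score(c, ["loan"], pairs, terms))
print(ranked)  # ['interest', 'mortgage', 'river']
```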
Quality vs Quantity
The early work tries to find terms that maximize the co-occurrence probability, focusing on the top 100 terms. During the process, we found that the number of expandable terms can exceed 1000, which gives a much better result, as we see below:
Expansion | p@5 | p@10 | MAP@10 | MAP@100 | MAP@1k |
---|---|---|---|---|---|
+400 | 0.906 | 0.827 | 0.817 | 0.556 | 0.546 |
+2000 | 0.98 | 0.959 | 0.9832 | 0.8734 | 0.7935 |
Let's use a Monte Carlo method to check whether quantity matters more than quality: we randomly pick n terms from the ideal expansion term pool obtained from supervised learning, and see how it performs below:
Metric | Max Cooc 400 | Monte Carlo @400 | Monte Carlo @500 | Monte Carlo @600 | Monte Carlo @700 | Monte Carlo @800 | Monte Carlo @900 | Monte Carlo @1000
---|---|---|---|---|---|---|---|---
MAP@100 | 0.556 | 0.568 | 0.614 | 0.653 | 0.682 | 0.704 | 0.725 | 0.742
Max Cooc uses 400 expansion terms chosen by maximizing term co-occurrence probability; it is no better than QE with 400 terms picked at random from the ideal expansion pool (Monte Carlo @400). Since random picks beat the co-occurrence probability maximization method, it is better to focus on finding terms quantitatively.
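The Monte Carlo sampling step can be sketched as below. In the experiment each sample would be evaluated with BM25 and the MAP scores averaged; this sketch only shows the random draws, and the pool contents are illustrative:

```python
import random

def monte_carlo_expansion(ideal_pool, n, trials=100, seed=0):
    """Draw n terms uniformly at random from the ideal pool, over several trials."""
    rng = random.Random(seed)  # fixed seed so trials are reproducible
    return [rng.sample(ideal_pool, n) for _ in range(trials)]

pool = [f"term{i}" for i in range(2000)]  # stand-in for the ideal expansion pool
samples = monte_carlo_expansion(pool, 400, trials=10)
print(len(samples), len(samples[0]))  # 10 400
```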
Recent Work
Finding the top-100 ideal expansion terms was not the right direction: 400 ideal expansion terms turned out to be only a local maximum in performance. The quantity of expansion terms mattered more than their quality, where quality means how strongly the ideal expansion terms maximize the co-occurrence probability of the expansion terms.
GQE seems to capture a lot of ideal expansion terms:
GQE first n terms | 0-50 | 0-100 | 0-200 | 0-300 | 0-400 | 0-Max
---|---|---|---|---|---|---
Avg ideal terms found | 41 | 82 | 161 | 236 | 304 | 1156
Precision | 0.82 | 0.82 | 0.805 | 0.7867 | 0.76 | n/a
GQE finds these ideal expansion terms with fairly high precision, but its error rate is not yet low enough. With appropriate terms injected, however, GQE's expansion becomes useful: the first feedback loop yields enough true-positive terms to lower GQE's error rate. With one round of feedback plus GQE expansion, the search results improve as below:
Metric | No GQE | GQE+50 | GQE+100 | GQE+150 | GQE+200 | GQE+250 | GQE+300 | GQE+350
---|---|---|---|---|---|---|---|---
p@5 | 0.41 | 0.496 | 0.506 | 0.516 | 0.522 | 0.526 | 0.534 | 0.514
p@10 | 0.371 | 0.391 | 0.404 | 0.411 | 0.417 | 0.428 | 0.42 | 0.415
GQE+n denotes queries expanded with n GQE terms plus one round of feedback. Injecting true-positive terms from feedback reduces GQE's error rate and improves search performance.
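A minimal sketch of how such an expanded query could be assembled, assuming the feedback round supplies a list of true-positive terms; the function and all term lists are hypothetical:

```python
def expand_with_feedback(query_terms, gqe_candidates, feedback_terms, n):
    """Build an expanded query: original terms, then feedback true positives,
    then up to n GQE candidate terms not already present."""
    expanded = list(query_terms)
    seen = set(expanded)
    # inject true-positive terms found in the first feedback round
    for t in feedback_terms:
        if t not in seen:
            expanded.append(t)
            seen.add(t)
    # add up to n GQE expansion terms, skipping duplicates
    added = 0
    for t in gqe_candidates:
        if added == n:
            break
        if t not in seen:
            expanded.append(t)
            seen.add(t)
            added += 1
    return expanded

q = ["oil", "price"]
feedback = ["opec", "barrel"]
gqe = ["crude", "opec", "petroleum", "export"]
print(expand_with_feedback(q, gqe, feedback, 2))
# ['oil', 'price', 'opec', 'barrel', 'crude', 'petroleum']
```

The expanded term list would then be handed to BM25 for ranking, as in the rest of the project.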
Conclusion
So far, we've found that GQE captures a large portion of the ideal expansion terms and is useful when combined with a first round of feedback. If we can increase the precision of GQE's candidate selection (i.e., filter out false-positive terms), query expansion will improve even more, as GQE's candidate terms move closer to the ideal expansion terms.
Acknowledgement
This project is advised by Prof. ChengXiang Zhai.
[^1]: This project is currently in progress.