Generative Query Expansion

Introduction

In many query expansion (QE) studies, evaluation depends primarily on the resulting search performance. This project tries to provide a second way¹ of measuring QE performance: counting how many query expansion terms overlap with ideal expansion terms, where the ideal expansion terms are trained from a high-performing supervised QE model. After expansion, we use Okapi BM25 for ranking (searching).
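As a side note on the retrieval step, Okapi BM25 scoring can be sketched as below. This is a minimal, self-contained toy implementation for illustration only, not the project's actual retrieval code:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document (a list of tokens) against query_terms
    with Okapi BM25. Toy implementation for illustration."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            num = tf[t] * (k1 + 1)
            den = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * num / den
        scores.append(s)
    return scores

docs = [["query", "expansion", "improves", "recall"],
        ["bm25", "ranks", "documents"],
        ["expansion", "terms", "help", "bm25", "ranking"]]
print(bm25_scores(["expansion", "bm25"], docs))
```

The expanded query terms are simply appended to the original query terms before scoring.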

Approaches

(i) Find expansion candidates with a GQE model that looks for the terms maximizing the probability of co-occurring terms.
(ii) Test whether maximizing the probability of co-occurring terms is the right objective.
(iii) Measure the cohesiveness of co-occurring terms.

Finding Ideal Expansion Terms

We first need ideal expansion terms as a criterion for comparing the model's output. Supervised learning can find expansion terms that perform well in search, yielding high P@10 and MAP. The ideal expansion trained on AP88/89 performs as follows:

| Expansion | p@5 | p@10 | MAP@10 | MAP@100 | MAP@1k | Recall@1k | Recall@10k |
|---|---|---|---|---|---|---|---|
| +0 terms | 0.41 | 0.371 | 0.28 | < 0.25 | < 0.25 | 0.662 | 0.861 |
| +400 | 0.906 | 0.827 | 0.817 | 0.556 | 0.546 | 0.8578 | 0.9642 |
| +2000 | 0.98 | 0.959 | 0.9832 | 0.8734 | 0.7935 | 0.8825 | 0.9495 |

In the early work, the goal was to find 400 expansion terms. Later, we found that 400 was not the upper bound: 2000+ terms could be used for expansion.

Early Work

The first approach assumes that 400 expansion terms are a sufficient quantity for QE. The top-100 terms that maximize the co-occurrence probability account for most of the MAP, so finding these top-100 terms is important: the top 100 contributes most to the maximization. GQE finds the top 100 as below:

[Graph omitted: per-query count of generated terms found among the top-100 ideal expansion terms, for different model sizes]

The above graph is a query-level measurement showing how many expansion terms were found among the top-100 ideal expansion terms.

The larger model finds more ideal expansion terms, but it also produces more false-positive expansion candidates.
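The overlap measurement itself is simple. A sketch of the per-query metric (the function name and sample terms are illustrative, not the project's actual data):

```python
def expansion_overlap(generated, ideal_top, k=100):
    """Per-query overlap metric: how many generated expansion terms
    appear in the top-k ideal expansion terms, and with what
    precision. Illustrative sketch only."""
    ideal = set(ideal_top[:k])
    hits = [t for t in generated if t in ideal]
    precision = len(hits) / len(generated) if generated else 0.0
    return len(hits), precision

# Hypothetical example terms
ideal_top = ["court", "judge", "ruling", "appeal", "verdict"]
generated = ["judge", "ruling", "weather", "appeal"]
hits, prec = expansion_overlap(generated, ideal_top, k=5)
print(hits, prec)  # → 3 0.75
```

Terms generated by GQE but missing from the ideal set are the false positives discussed above.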

Quality vs Quantity

The early work tries to find terms that maximize the co-occurrence probability, focusing on the top 100 terms. During the process, we found that the number of expandable terms can exceed 1000, which gives a much better result, as we see below:

| Expansion | p@5 | p@10 | MAP@10 | MAP@100 | MAP@1k |
|---|---|---|---|---|---|
| +400 | 0.906 | 0.827 | 0.817 | 0.556 | 0.546 |
| +2000 | 0.98 | 0.959 | 0.9832 | 0.8734 | 0.7935 |

Let's use a Monte Carlo method to check that quantity matters more than quality: we randomly pick n terms from the ideal expansion term pool obtained from supervised learning, and see how the expansion performs:

| Metric | Max Cooc 400 | Monte Carlo @400 | Monte Carlo @500 | Monte Carlo @600 | Monte Carlo @700 | Monte Carlo @800 | Monte Carlo @900 | Monte Carlo @1000 |
|---|---|---|---|---|---|---|---|---|
| MAP@100 | 0.556 | 0.568 | 0.614 | 0.653 | 0.682 | 0.704 | 0.725 | 0.742 |

Max Cooc uses 400 expansion terms chosen by maximizing term co-occurrence probability; it is no better than QE with 400 terms randomly picked from the ideal expansion pool, as shown by the Monte Carlo @400 result. Since random picks beat the co-occurrence probability maximization method, it is better to focus on finding terms quantitatively.
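A minimal sketch of the Monte Carlo setup (names are hypothetical; in the real experiment each sampled term set would be fed to BM25 and MAP@100 averaged over the trials):

```python
import random

def monte_carlo_expansion(ideal_pool, n, trials=30, seed=0):
    """Draw n random expansion terms from the ideal pool, repeated
    over several trials. Returns the sampled term sets; the
    evaluation (BM25 search + MAP@100) is assumed to happen
    downstream."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    samples = []
    for _ in range(trials):
        samples.append(rng.sample(ideal_pool, min(n, len(ideal_pool))))
    return samples

pool = [f"term{i}" for i in range(2000)]  # stand-in for the ideal pool
for n in (400, 700, 1000):
    sets = monte_carlo_expansion(pool, n)
    print(n, len(sets), len(sets[0]))
```

Sampling without replacement (`random.sample`) mirrors picking distinct terms from the pool.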

Recent Work

Finding the top-100 ideal expansion terms was not the right direction: 400 ideal expansion terms turned out to be only a local maximum in performance. The quantity of expansion terms matters more than their quality, where quality means how strongly the ideal expansion terms maximize the co-occurrence probability of the expansion terms.

GQE seems to capture a lot of ideal expansion terms:

| GQE first n terms | 0-50 | 0-100 | 0-200 | 0-300 | 0-400 | 0-Max |
|---|---|---|---|---|---|---|
| Avg ideal terms found | 41 | 82 | 161 | 236 | 304 | 1156 |
| Precision | 0.82 | 0.82 | 0.805 | 0.7867 | 0.76 | n/a |

GQE's precision in finding these ideal expansion terms is quite high, but its error rate is not yet low enough. Still, with appropriate terms injected, GQE's expansion can be useful. From the first feedback loop, we get enough positive terms to lower GQE's error rate. With one round of feedback plus GQE expansion, the search results improve as below:

| Metric | No GQE | GQE+50 | GQE+100 | GQE+150 | GQE+200 | GQE+250 | GQE+300 | GQE+350 |
|---|---|---|---|---|---|---|---|---|
| p@5 | 0.41 | 0.496 | 0.506 | 0.516 | 0.522 | 0.526 | 0.534 | 0.514 |
| p@10 | 0.371 | 0.391 | 0.404 | 0.411 | 0.417 | 0.428 | 0.42 | 0.415 |

GQE+n denotes queries expanded with n GQE terms plus one round of feedback. Injecting true-positive terms from feedback helps reduce GQE's error rate and improves search performance.
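One way the feedback injection could be sketched (the exact merging strategy is an assumption: feedback terms are treated as true positives and placed first, then the expansion is padded with ranked GQE candidates up to n terms):

```python
def inject_feedback(gqe_candidates, feedback_positive, n):
    """Combine one round of relevance feedback with GQE output.
    Feedback terms (assumed true positives) come first; the list is
    then padded with top GQE candidates, skipping duplicates, until
    n expansion terms are collected. Illustrative sketch only."""
    expansion = list(feedback_positive)
    seen = set(expansion)
    for term in gqe_candidates:      # assumed ranked by GQE score
        if len(expansion) >= n:
            break
        if term not in seen:
            expansion.append(term)
            seen.add(term)
    return expansion

# Hypothetical terms
fb = ["court", "judge"]
gqe = ["ruling", "court", "appeal", "verdict"]
print(inject_feedback(gqe, fb, 4))  # → ['court', 'judge', 'ruling', 'appeal']
```

Putting the guaranteed-relevant feedback terms first keeps them even when n is small.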

Conclusion

So far, we've found that GQE captures a large portion of the ideal expansion terms and is useful when combined with first-round feedback. If we find a way to increase the precision of GQE's candidates (i.e., filter out false-positive terms), query expansion will improve even further as GQE's candidate terms move closer to the ideal expansion terms.

Acknowledgement

This project is advised by Prof. Cheng Xiang Zhai


  1. This project is currently in progress.

Byung Il Choi

My research interests include Deep Learning, Information Retrieval, and Computer Graphics.
