publications
2025
- Proceedings · Clustering with bandit feedback: breaking down the computation/information gap · Victor Thuot, Alexandra Carpentier, Christophe Giraud, and Nicolas Verzelen · 24–27 Feb 2025
We investigate the Clustering with Bandit feedback Problem (CBP). A learner interacts with an N-armed stochastic bandit with d-dimensional subGaussian feedback. There exists a hidden partition of the arms into K groups, such that arms within the same group share the same mean vector. The learner's task is to uncover this hidden partition with the smallest budget, i.e. the smallest number of observations, and with a probability of error smaller than a prescribed constant δ. In this paper, (i) we derive a non-asymptotic lower bound for the budget, and (ii) we introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes. In particular, ACB improves on the performance of a uniform sampling strategy. Importantly, contrary to the batch setting, we establish that there is no computation-information gap in the bandit setting.
@inproceedings{pmlr-v272-thuot25a,
  title = {Clustering with bandit feedback: breaking down the computation/information gap},
  author = {Thuot, Victor and Carpentier, Alexandra and Giraud, Christophe and Verzelen, Nicolas},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  pages = {1221--1284},
  year = {2025},
  editor = {Kamath, Gautam and Loh, Po-Ling},
  volume = {272},
  series = {Proceedings of Machine Learning Research},
  month = {24--27 Feb},
  publisher = {PMLR},
  url = {https://proceedings.mlr.press/v272/thuot25a.html}
}
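The abstract uses a uniform sampling strategy as the point of comparison for ACB. To make the CBP setting concrete, here is a minimal sketch of such a baseline only. It is not ACB, whose adaptive allocation rule is not described here. The sketch makes several assumptions: Gaussian noise (one instance of subGaussian feedback), a fixed number of pulls per arm rather than a budget calibrated to δ, and k-means (farthest-point initialisation followed by Lloyd iterations) on the empirical means as the clustering step. The function name `uniform_sampling_clustering` is hypothetical.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def uniform_sampling_clustering(n_arms: int, k: int, pulls_per_arm: int,
                                sample: Callable[[int], Vector],
                                n_iter: int = 20) -> List[int]:
    """Hypothetical uniform-sampling baseline for clustering with bandit feedback.

    Every arm is pulled pulls_per_arm times (total budget n_arms * pulls_per_arm),
    then the empirical mean vectors are clustered into k groups with k-means.
    Returns one label in {0, ..., k-1} per arm.
    """
    # 1. Uniform allocation: average the d-dimensional observations of each arm.
    means: List[Vector] = []
    for a in range(n_arms):
        obs = [sample(a) for _ in range(pulls_per_arm)]
        d = len(obs[0])
        means.append([sum(o[j] for o in obs) / pulls_per_arm for j in range(d)])

    def dist2(u: Sequence[float], v: Sequence[float]) -> float:
        # Squared Euclidean distance between two mean vectors.
        return sum((x - y) ** 2 for x, y in zip(u, v))

    # 2. Farthest-point initialisation: start from arm 0, then repeatedly add
    #    the empirical mean farthest from all centers chosen so far.
    centers = [means[0]]
    while len(centers) < k:
        far = max(means, key=lambda m: min(dist2(m, c) for c in centers))
        centers.append(far)

    # 3. Lloyd iterations: assign each arm to its nearest center, then recenter.
    labels = [0] * n_arms
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: dist2(m, centers[c])) for m in means]
        for c in range(k):
            members = [means[a] for a in range(n_arms) if labels[a] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy instance (assumed): N = 6 arms, K = 2 hidden groups, d = 3, unit-variance noise.
rng = random.Random(0)
group_means = [[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]]
truth = [0, 0, 0, 1, 1, 1]
labels = uniform_sampling_clustering(
    n_arms=6, k=2, pulls_per_arm=100,
    sample=lambda a: [rng.gauss(mu, 1.0) for mu in group_means[truth[a]]])
print(labels)
```

Labels are only defined up to a permutation of the groups. The weakness of this baseline is that every arm receives the same number of pulls, including arms whose group is already clear. Adaptive strategies such as ACB avoid spending budget that way.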
- Preprint · Clustering Items through Bandit Feedback: Finding the Right Feature out of Many · Maximilian Graf, Victor Thuot, and Nicolas Verzelen · Mar 2025 · working paper or preprint
We study the problem of clustering a set of items based on bandit feedback. Each of the n items is characterized by a feature vector of possibly large dimension d. The items are partitioned into two unknown groups, such that items within the same group share the same feature vector. We consider a sequential and adaptive setting in which, at each round, the learner selects one item and one feature, then observes a noisy evaluation of that item's feature. The learner's objective is to recover the correct partition of the items while keeping the number of observations as small as possible. We provide an algorithm that relies on finding a relevant feature for the clustering task, leveraging the Sequential Halving algorithm. With probability at least 1 − δ, the algorithm accurately recovers the partition, and we derive an upper bound on the required budget. Furthermore, we derive an instance-dependent lower bound, which is tight in some relevant cases.
@unpublished{graf:hal-04990410,
  title = {{Clustering Items through Bandit Feedback: Finding the Right Feature out of Many}},
  author = {Graf, Maximilian and Thuot, Victor and Verzelen, Nicolas},
  url = {https://hal.science/hal-04990410},
  note = {working paper or preprint},
  year = {2025},
  month = mar,
  hal_id = {hal-04990410},
  hal_version = {v2}
}
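The abstract says the algorithm leverages Sequential Halving to find a relevant feature. For readers unfamiliar with that routine, here is a minimal sketch of the standard best-arm identification version (Karnin, Koren and Somekh, ICML 2013). It is not the paper's algorithm. In the paper's setting the "arms" would be features, but how item/feature evaluations are turned into a per-feature score is not described in the abstract, so the sketch takes a generic `sample(arm)` callable and is demonstrated on a toy Gaussian sampler. The function name, the even budget split across rounds with floor division, and the minimum of one pull per arm are assumptions.

```python
import math
import random
from typing import Callable, List


def sequential_halving(n_arms: int, budget: int,
                       sample: Callable[[int], float]) -> int:
    """Return the arm kept by standard Sequential Halving.

    The budget is split evenly across ceil(log2 n_arms) rounds. In each round,
    every surviving arm is pulled equally often, and the better half (by
    empirical mean, rounded up) survives to the next round.
    """
    survivors: List[int] = list(range(n_arms))
    n_rounds = max(1, math.ceil(math.log2(n_arms)))
    for _ in range(n_rounds):
        if len(survivors) == 1:
            break
        # Per-arm pulls in this round (assumed: at least one pull per arm).
        pulls = max(1, budget // (len(survivors) * n_rounds))
        means = {a: sum(sample(a) for _ in range(pulls)) / pulls
                 for a in survivors}
        # Keep the ceil(|S| / 2) arms with the largest empirical means.
        survivors.sort(key=lambda a: means[a], reverse=True)
        survivors = survivors[:math.ceil(len(survivors) / 2)]
    return survivors[0]


# Toy demo (assumed values): five arms, arm 3 has the largest mean.
rng = random.Random(0)
true_means = [0.1, 0.2, 0.3, 0.9, 0.4]
best = sequential_halving(len(true_means), budget=3000,
                          sample=lambda a: rng.gauss(true_means[a], 0.1))
print(best)
```

Halving the candidate set each round concentrates the remaining budget on the few arms that are hardest to distinguish. This is why the routine is a natural building block when only one good feature out of many needs to be found.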