publications
2026
- HAL: Nonparametric Kernel Clustering with Bandit Feedback. Victor Thuot, Sebastian Vogt, Debarghya Ghoshdastidar, and Nicolas Verzelen. HAL, Jan 2026
Clustering with bandit feedback refers to the problem of partitioning a set of items, where the clustering algorithm can sequentially query the items to receive noisy observations. The problem is formally posed as the task of partitioning the arms of an N-armed stochastic bandit according to their underlying distributions, grouping two arms together if and only if they share the same distribution, using samples collected sequentially and adaptively. This setting has gained attention in recent years due to its applicability in recommendation systems and crowdsourcing. Existing works on clustering with bandit feedback rely on a strong assumption that the underlying distributions are sub-Gaussian. As a consequence, the existing methods mainly cover settings with linearly-separable clusters, which has little practical relevance. We introduce a framework of “nonparametric clustering with bandit feedback”, where the underlying arm distributions are not constrained to any parametric family, and hence it is applicable to active clustering of real-world datasets. We adopt a kernel-based approach, which allows us to reformulate the nonparametric problem as the task of clustering the arms according to their kernel mean embeddings in a reproducing kernel Hilbert space (RKHS). Building on this formulation, we introduce the KABC algorithm with theoretical correctness guarantees and analyze its sampling budget. We introduce a notion of signal-to-noise ratio for this problem that depends on the maximum mean discrepancy (MMD) between the arm distributions and on their variance in the RKHS. Our algorithm is adaptive to this unknown quantity: it does not require it as an input yet achieves instance-dependent guarantees.
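As a rough illustration of the statistic underlying this kernel-based formulation (not of the KABC algorithm itself), the sketch below estimates the squared MMD between the empirical distributions of two arms with a Gaussian kernel; the kernel choice, the bandwidth, and the function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two batches of samples."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between the empirical distributions
    of two arms, i.e. the squared RKHS distance between their kernel mean
    embeddings."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Two arms with the same mean but different distributions: a criterion based on
# mean separation cannot tell them apart, while the MMD can.
rng = np.random.default_rng(0)
arm_a = rng.normal(0.0, 1.0, size=(500, 1))       # N(0, 1)
arm_b = rng.choice([-2.0, 2.0], size=(500, 1))    # mean 0, but bimodal
print(mmd_squared(arm_a, arm_b))                                  # clearly > 0
print(mmd_squared(arm_a, rng.normal(0.0, 1.0, size=(500, 1))))    # close to 0
```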
@inproceedings{thuot2026kernel,
  title  = {{Nonparametric Kernel Clustering with Bandit Feedback}},
  author = {Thuot, Victor and Vogt, Sebastian and Ghoshdastidar, Debarghya and Verzelen, Nicolas},
  url    = {https://hal.science/hal-05451062},
  year   = {2026},
  month  = jan,
}
2025
- ALT: Clustering with bandit feedback: breaking down the computation/information gap. Victor Thuot, Alexandra Carpentier, Christophe Giraud, and Nicolas Verzelen. In Proceedings of The 36th International Conference on Algorithmic Learning Theory, Feb 2025
We investigate the Clustering with Bandit feedback Problem (CBP). A learner interacts with an N-armed stochastic bandit with d-dimensional sub-Gaussian feedback. There exists a hidden partition of the arms into K groups, such that arms within the same group share the same mean vector. The learner’s task is to uncover this hidden partition with the smallest budget, i.e., the least number of observations, and with a probability of error smaller than a prescribed constant δ. In this paper, (i) we derive a non-asymptotic lower bound for the budget, and (ii) we introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes. We improve on the performance of a uniform sampling strategy. Importantly, contrary to the batch setting, we establish that there is no computation-information gap in the bandit setting.
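For context, here is a minimal sketch of the uniform-sampling baseline that the paper improves upon, not of the ACB algorithm: every arm is pulled the same number of times, and arms whose empirical mean vectors are close are linked into one group. The interface `pull(i)`, the linking threshold, and the toy instance are assumptions made for illustration.

```python
import numpy as np

def cluster_by_uniform_sampling(pull, n_arms, n_pulls, threshold):
    """Naive baseline: pull every arm the same number of times, then group
    arms whose empirical mean vectors lie within `threshold` of each other
    (single-linkage over pairwise distances). `pull(i)` returns one
    d-dimensional noisy observation of arm i."""
    means = np.array([
        np.mean([pull(i) for _ in range(n_pulls)], axis=0)
        for i in range(n_arms)
    ])
    labels = list(range(n_arms))   # union-find style linking
    def find(a):
        while labels[a] != a:
            labels[a] = labels[labels[a]]
            a = labels[a]
        return a
    for i in range(n_arms):
        for j in range(i + 1, n_arms):
            if np.linalg.norm(means[i] - means[j]) <= threshold:
                labels[find(i)] = find(j)
    return [find(i) for i in range(n_arms)]

# Toy instance: 6 arms, 2 hidden groups, Gaussian noise in dimension 3.
rng = np.random.default_rng(1)
centers = {0: np.zeros(3), 1: np.ones(3)}
membership = [0, 0, 0, 1, 1, 1]
pull = lambda i: centers[membership[i]] + rng.normal(0, 0.5, size=3)
print(cluster_by_uniform_sampling(pull, 6, 200, threshold=0.5))
```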
@inproceedings{pmlr-v272-thuot25a,
  title     = {Clustering with bandit feedback: breaking down the computation/information gap},
  author    = {Thuot, Victor and Carpentier, Alexandra and Giraud, Christophe and Verzelen, Nicolas},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  pages     = {1221--1284},
  year      = {2025},
  editor    = {Kamath, Gautam and Loh, Po-Ling},
  volume    = {272},
  series    = {Proceedings of Machine Learning Research},
  month     = feb,
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v272/thuot25a.html},
}
- ICML: Clustering Items through Bandit Feedback: Finding the Right Feature out of Many. Maximilian Graf, Victor Thuot, and Nicolas Verzelen. In Proceedings of The forty-second International Conference on Machine Learning, Jul 2025
We study the problem of clustering a set of items based on bandit feedback. Each of the n items is characterized by a feature vector, with a possibly large dimension d. The items are partitioned into two unknown groups, such that items within the same group share the same feature vector. We consider a sequential and adaptive setting in which, at each round, the learner selects one item and one feature, then observes a noisy evaluation of the item’s feature. The learner’s objective is to recover the correct partition of the items, while keeping the number of observations as small as possible. We provide an algorithm which relies on finding a relevant feature for the clustering task, leveraging the Sequential Halving algorithm. With probability at least 1 − δ, we obtain an accurate recovery of the partition and derive an upper bound on the budget required. Furthermore, we obtain an instance-dependent lower bound, which is tight in some relevant cases.
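The sketch below shows a generic Sequential Halving routine of the kind the abstract refers to, applied to a toy search for the most informative feature. It assumes access to a noisy score for each feature and is not the paper's full procedure, which must recover the partition without such an oracle; the names and the toy scoring rule are illustrative.

```python
import numpy as np

def sequential_halving(sample, n_arms, budget):
    """Sequential Halving for best-arm identification: split the budget across
    ceil(log2(n_arms)) rounds, sample every surviving arm equally, and keep the
    better half after each round. `sample(i)` returns one noisy reward of arm i;
    for a large enough budget, the returned arm is the one with the largest mean
    with high probability."""
    arms = list(range(n_arms))
    n_rounds = int(np.ceil(np.log2(n_arms)))
    for _ in range(n_rounds):
        pulls = max(1, budget // (len(arms) * n_rounds))
        means = {a: np.mean([sample(a) for _ in range(pulls)]) for a in arms}
        arms = sorted(arms, key=lambda a: means[a], reverse=True)
        arms = arms[: max(1, len(arms) // 2)]
    return arms[0]

# Toy use: among d features, find the one on which the two hidden groups of
# items differ the most, using a noisy squared-difference score per feature.
rng = np.random.default_rng(2)
d = 16
gaps = np.zeros(d)
gaps[7] = 2.0    # only feature 7 separates the groups
sample_gap = lambda j: (gaps[j] + rng.normal(0, 1.0)) ** 2
print(sequential_halving(sample_gap, d, budget=4000))   # likely prints 7
```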
@inproceedings{graf2025clustering,
  title     = {{Clustering Items through Bandit Feedback: Finding the Right Feature out of Many}},
  author    = {Graf, Maximilian and Thuot, Victor and Verzelen, Nicolas},
  url       = {https://openreview.net/forum?id=99zsyZpUqp},
  year      = {2025},
  month     = jul,
  booktitle = {Proceedings of The forty-second International Conference on Machine Learning},
}