publications
2026
- HAL: Nonparametric Kernel Clustering with Bandit Feedback. Victor Thuot, Sebastian Vogt, Debarghya Ghoshdastidar, and Nicolas Verzelen. HAL, Jan 2026
Clustering with bandit feedback refers to the problem of partitioning a set of items, where the clustering algorithm can sequentially query the items to receive noisy observations. The problem is formally posed as the task of partitioning the arms of an N-armed stochastic bandit according to their underlying distributions, grouping two arms together if and only if they share the same distribution, using samples collected sequentially and adaptively. This setting has gained attention in recent years due to its applicability in recommendation systems and crowdsourcing. Existing works on clustering with bandit feedback rely on a strong assumption that the underlying distributions are sub-Gaussian. As a consequence, the existing methods mainly cover settings with linearly-separable clusters, which has little practical relevance. We introduce a framework of “nonparametric clustering with bandit feedback”, where the underlying arm distributions are not constrained to any parametric family, and hence it is applicable to active clustering of real-world datasets. We adopt a kernel-based approach, which allows us to reformulate the nonparametric problem as the task of clustering the arms according to their kernel mean embeddings in a reproducing kernel Hilbert space (RKHS). Building on this formulation, we introduce the KABC algorithm with theoretical correctness guarantees and analyze its sampling budget. We introduce a notion of signal-to-noise ratio for this problem that depends on the maximum mean discrepancy (MMD) between the arm distributions and on their variance in the RKHS. Our algorithm is adaptive to this unknown quantity: it does not require it as an input yet achieves instance-dependent guarantees.
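As a rough illustration of the statistic underlying this kernel-based formulation (not of the KABC algorithm itself), the sketch below estimates the squared MMD between the empirical distributions of two arms with a Gaussian kernel; the kernel choice, the bandwidth, and the function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two batches of samples."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD between the empirical distributions
    of two arms, i.e. the squared RKHS distance between their kernel mean
    embeddings."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Two arms with the same mean but different distributions: a criterion based on
# mean separation cannot tell them apart, while the MMD can.
rng = np.random.default_rng(0)
arm_a = rng.normal(0.0, 1.0, size=(500, 1))       # N(0, 1)
arm_b = rng.choice([-2.0, 2.0], size=(500, 1))    # mean 0, but bimodal
print(mmd_squared(arm_a, arm_b))                                  # clearly > 0
print(mmd_squared(arm_a, rng.normal(0.0, 1.0, size=(500, 1))))    # close to 0
```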
@inproceedings{thuot2026kernel,
  title  = {{Nonparametric Kernel Clustering with Bandit Feedback}},
  author = {Thuot, Victor and Vogt, Sebastian and Ghoshdastidar, Debarghya and Verzelen, Nicolas},
  url    = {https://hal.science/hal-05451062},
  year   = {2026},
  month  = jan,
}
2025
- ALT: Clustering with bandit feedback: breaking down the computation/information gap. Victor Thuot, Alexandra Carpentier, Christophe Giraud, and Nicolas Verzelen. In Proceedings of The 36th International Conference on Algorithmic Learning Theory, Feb 2025
We investigate the Clustering with Bandit feedback Problem (CBP). A learner interacts with an N-armed stochastic bandit with d-dimensional sub-Gaussian feedback. There exists a hidden partition of the arms into K groups, such that arms within the same group share the same mean vector. The learner’s task is to uncover this hidden partition with the smallest budget, i.e., the least number of observations, and with a probability of error smaller than a prescribed constant δ. In this paper, (i) we derive a non-asymptotic lower bound for the budget, and (ii) we introduce the computationally efficient ACB algorithm, whose budget matches the lower bound in most regimes. We improve on the performance of a uniform sampling strategy. Importantly, contrary to the batch setting, we establish that there is no computation-information gap in the bandit setting.
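For context, here is a minimal sketch of the uniform-sampling baseline that the paper improves upon, not of the ACB algorithm: every arm is pulled the same number of times, and arms whose empirical mean vectors are close are linked into one group. The interface `pull(i)`, the linking threshold, and the toy instance are assumptions made for illustration.

```python
import numpy as np

def cluster_by_uniform_sampling(pull, n_arms, n_pulls, threshold):
    """Naive baseline: pull every arm the same number of times, then group
    arms whose empirical mean vectors lie within `threshold` of each other
    (single-linkage over pairwise distances). `pull(i)` returns one
    d-dimensional noisy observation of arm i."""
    means = np.array([
        np.mean([pull(i) for _ in range(n_pulls)], axis=0)
        for i in range(n_arms)
    ])
    labels = list(range(n_arms))   # union-find style linking
    def find(a):
        while labels[a] != a:
            labels[a] = labels[labels[a]]
            a = labels[a]
        return a
    for i in range(n_arms):
        for j in range(i + 1, n_arms):
            if np.linalg.norm(means[i] - means[j]) <= threshold:
                labels[find(i)] = find(j)
    return [find(i) for i in range(n_arms)]

# Toy instance: 6 arms, 2 hidden groups, Gaussian noise in dimension 3.
rng = np.random.default_rng(1)
centers = {0: np.zeros(3), 1: np.ones(3)}
membership = [0, 0, 0, 1, 1, 1]
pull = lambda i: centers[membership[i]] + rng.normal(0, 0.5, size=3)
print(cluster_by_uniform_sampling(pull, 6, 200, threshold=0.5))
```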
@inproceedings{pmlr-v272-thuot25a,
  title     = {Clustering with bandit feedback: breaking down the computation/information gap},
  author    = {Thuot, Victor and Carpentier, Alexandra and Giraud, Christophe and Verzelen, Nicolas},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  pages     = {1221--1284},
  year      = {2025},
  editor    = {Kamath, Gautam and Loh, Po-Ling},
  volume    = {272},
  series    = {Proceedings of Machine Learning Research},
  month     = feb,
  publisher = {PMLR},
  url       = {https://proceedings.mlr.press/v272/thuot25a.html},
}
- ICML: Clustering Items through Bandit Feedback: Finding the Right Feature out of Many. Maximilian Graf, Victor Thuot, and Nicolas Verzelen. In Proceedings of The forty-second International Conference on Machine Learning, Jul 2025
We study the problem of clustering a set of items based on bandit feedback. Each of the n items is characterized by a feature vector, with a possibly large dimension d. The items are partitioned into two unknown groups, such that items within the same group share the same feature vector. We consider a sequential and adaptive setting in which, at each round, the learner selects one item and one feature, then observes a noisy evaluation of the item’s feature. The learner’s objective is to recover the correct partition of the items, while keeping the number of observations as small as possible. We provide an algorithm which relies on finding a relevant feature for the clustering task, leveraging the Sequential Halving algorithm. With probability at least 1 − δ, we obtain an accurate recovery of the partition and derive an upper bound on the budget required. Furthermore, we obtain an instance-dependent lower bound, which is tight in some relevant cases.
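The sketch below shows a generic Sequential Halving routine of the kind the abstract refers to, applied to a toy search for the most informative feature. It assumes access to a noisy score for each feature and is not the paper's full procedure, which must recover the partition without such an oracle; the names and the toy scoring rule are illustrative.

```python
import numpy as np

def sequential_halving(sample, n_arms, budget):
    """Sequential Halving for best-arm identification: split the budget across
    ceil(log2(n_arms)) rounds, sample every surviving arm equally, and keep the
    better half after each round. `sample(i)` returns one noisy reward of arm i;
    for a large enough budget, the returned arm is the one with the largest mean
    with high probability."""
    arms = list(range(n_arms))
    n_rounds = int(np.ceil(np.log2(n_arms)))
    for _ in range(n_rounds):
        pulls = max(1, budget // (len(arms) * n_rounds))
        means = {a: np.mean([sample(a) for _ in range(pulls)]) for a in arms}
        arms = sorted(arms, key=lambda a: means[a], reverse=True)
        arms = arms[: max(1, len(arms) // 2)]
    return arms[0]

# Toy use: among d features, find the one on which the two hidden groups of
# items differ the most, using a noisy squared-difference score per feature.
rng = np.random.default_rng(2)
d = 16
gaps = np.zeros(d)
gaps[7] = 2.0    # only feature 7 separates the groups
sample_gap = lambda j: (gaps[j] + rng.normal(0, 1.0)) ** 2
print(sequential_halving(sample_gap, d, budget=4000))   # likely prints 7
```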
@inproceedings{graf2025clustering,
  title     = {{Clustering Items through Bandit Feedback: Finding the Right Feature out of Many}},
  author    = {Graf, Maximilian and Thuot, Victor and Verzelen, Nicolas},
  url       = {https://openreview.net/forum?id=99zsyZpUqp},
  year      = {2025},
  month     = jul,
  booktitle = {Proceedings of The forty-second International Conference on Machine Learning},
}