
Copeland Dueling Bandits

Publication

Dec 7, 2015

Abstract

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both of which substantially improve on existing results. Such existing results either offer bounds of the form O(K log T) but require restrictive assumptions, or offer bounds of the form O(K^2 log T) without requiring such assumptions. Our results offer the best of both worlds: O(K log T) bounds without restrictive assumptions.
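To illustrate why the Copeland winner always exists while a Condorcet winner may not, here is a minimal sketch that computes both from a known preference matrix. It assumes `P[i][j]` is the probability that arm i beats arm j in a duel (with `P[i][j] + P[j][i] = 1`), and uses the standard definitions: an arm's Copeland score is the number of other arms it beats with probability above 0.5, a Copeland winner is any arm with the maximal score, and a Condorcet winner is an arm that beats every other arm. The function names and the example matrix are hypothetical; this only computes the target that CCB and SCB aim for and is not an implementation of either algorithm, which must estimate `P` from noisy duels.

```python
from typing import List, Optional

Matrix = List[List[float]]


def copeland_scores(P: Matrix) -> List[int]:
    """Copeland score of arm i: number of other arms j with P[i][j] > 0.5."""
    K = len(P)
    return [sum(1 for j in range(K) if j != i and P[i][j] > 0.5) for i in range(K)]


def copeland_winners(P: Matrix) -> List[int]:
    """All arms attaining the maximal Copeland score (never empty for K >= 1)."""
    scores = copeland_scores(P)
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


def condorcet_winner(P: Matrix) -> Optional[int]:
    """Arm that beats all K - 1 other arms, or None if no such arm exists."""
    K = len(P)
    for i, s in enumerate(copeland_scores(P)):
        if s == K - 1:
            return i
    return None


# Hypothetical example: arms 0, 1, 2 form a cycle (0 beats 1, 1 beats 2,
# 2 beats 0) and all three beat arm 3, so no Condorcet winner exists.
P = [
    [0.5, 0.6, 0.4, 0.7],
    [0.4, 0.5, 0.6, 0.7],
    [0.6, 0.4, 0.5, 0.7],
    [0.3, 0.3, 0.3, 0.5],
]

print(copeland_scores(P))   # [2, 2, 2, 0]
print(copeland_winners(P))  # [0, 1, 2]
print(condorcet_winner(P))  # None
```

Because the maximum of a finite list of scores always exists, `copeland_winners` returns at least one arm for any preference matrix, whereas `condorcet_winner` returns `None` whenever preferences contain a cycle like the one above.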


Venue:

Neural Information Processing Systems (NIPS 2015)

Type:

Conference/Workshop Paper

Authors:

Masrour Zoghi
Shimon Whiteson
Maarten De Rijke

BibTeX

@inproceedings{
  author = {Masrour Zoghi and Shimon Whiteson and Maarten De Rijke},
  title = {Copeland Dueling Bandits},
  booktitle = {Proceedings of Neural Information Processing Systems},
  year = {2015}
}
