Multi-heuristic Stochastic Sampling Search: Extreme Value Theory and the Max K-Armed Bandit

Vincent A. Cicirello

Multi-heuristic Stochastic Sampling Search: Extreme Value Theory and the Max K-Armed Bandit

Vincent A. Cicirello

Technical Report AI-09-003, Cicirello.org, March 2009.

Show BibTeX

@techreport{AI-09-003,
  title = {Multi-heuristic Stochastic Sampling Search: Extreme Value Theory and the Max $K$-Armed Bandit},
  author = {Vincent A. Cicirello},
  year = {2009},
  month = {March},
  number = {AI-09-003},
  institution = {Cicirello.org},
  url = {https://reports.cicirello.org/09/003/VAC-TR-09-003.pdf}
}

Download BibTeX file

Abstract

In this paper, we develop theoretical foundations for integrating multiple heuristics within a stochastic sampling search procedure. Such a capability is important in problem domains where several heuristics are available but none dominate across all problem instances. In order to utilize multiple heuristics, a search algorithm needs a mechanism for selecting from among the alternatives. We can view this selection problem as an online learning problem where the search framework builds models of the distributions of the quality of solutions given by the heuristics and uses those models as guidance for selecting heuristics throughout the search process. One key observation is that the distribution of solution values obtained across multiple runs of a stochastic sampling search algorithm using a strong domain heuristic is heavy-tailed. Modeling these solution quality distributions requires turning to extremal value theory. In particular, we motivate the use of a distribution known as the Generalized Extreme Value (GEV) distribution for our model of the performance of a search heuristic. We show specifically that it can be quite advantageous to abandon the more typical normality assumptions in favor of this GEV model. Secondly, we develop an exploration policy for allocating trials of a stochastic sampling algorithm amongst the set of component heuristics. This policy follows from an analysis of a new variation of the well known multiarmed bandit problem that we call the Max K-Armed Bandit. The analysis indicates that under our GEV model, a double exponentially increasing rate of trials of the stochastic sampling algorithm should be allocated to the observed best heuristic relative to the number of trials allocated to the alternative heuristics. We validate our search framework, which we call Quality Distribution Based sEArch CONtrol (QD-BEACON), empirically for two NP-Hard scheduling problems: (1) weighted tardiness scheduling and (2) resource constrained project scheduling with time windows (RCPSP/max). Our approach leads to an algorithm that is competitive with the current best heuristic algorithms for each of these problems, including finding new best solutions to some difficult instances of RCPSP/max.