Sanjoy Dasgupta
Random Projection Trees and Low-Dimensional Manifolds
The curse of dimensionality has traditionally been the bane of nonparametric statistics (histograms, kernel density estimation, nearest neighbor search, and so on), as reflected in running times and convergence rates that are exponentially bad in the dimension. This problem is all the more pressing as data sets get increasingly high dimensional. Recently the field has been rejuvenated in several ways, of which perhaps the most promising is the realization that a lot of real-world data which appears high-dimensional in fact has low "intrinsic" dimension in the sense of lying close to a low-dimensional manifold. In the past few years, there has been a huge interest in learning such manifolds from data, and then using the learned structure to transform the data into a lower-dimensional space where standard statistical methods generically work better. I'll exhibit a way to benefit from intrinsic low dimensionality without having to go to the trouble of explicitly learning its fine structure. Specifically, I'll present a simple variant of the ubiquitous k-d tree (a spatial data structure widely used in machine learning and statistics) that is provably adaptive to low dimensional structure. We call this a "random projection tree" (RP tree). Along the way, I'll discuss different notions of intrinsic dimension -- motivated by manifolds, by local statistics, and by analysis on metric spaces -- and relate them. I'll then prove that RP trees require resources that depend only on these dimensions rather than the dimension of the space in which the data happens to be situated.
This is work with Yoav Freund (UC San Diego).
Mehryar Mohri
Learning to Rank with QuickSort
This talk discusses several theoretical aspects of the learning problem of ranking, including its connections with AUC and the standard classification problem. It analyzes previous results in the score-based and preference-based settings and presents a series of novel results for both the bipartite ranking and the general ranking problems.
This presentation is based on recent work with Nir Ailon (Google Research) and previous work with Corinna Cortes (Google Research).
Peter Auer
Analysis of optimistic algorithms for the exploration/exploitation trade-off
We consider decision problems where repeatedly decisions, associated with some gain, need to be made. Making a decision, one may rely on the information collected so far (exploitation), or one may try to collect further information (exploration). The risk of an exploitative decision is that a better decision - with higher gain - is not recognized because of insufficient information. The risk of an explorative decision is its non-optimality, typically. Optimistic algorithms deal with this exploration/exploitation trade-off implicitly, by assuming the most favourable gain process which is consistent with the information collected so far. Decisions are made based on this optimistic assumption. In my talk I will show how such optimistic algorithms can be analysed, by examples for the bandit problem and for the reinforcement learning problem. While the generic part of these analyses is very similar, the more technical part needs to bound the distance between the optimistically assumed gain process and the actual gain process.
Alexander Rakhlin
Online Learning with Limited Feedback
One's ability to learn and make decisions rests heavily on the availability of feedback. In sequential decision-making problems such feedback is often limited. A gambler, for example, can observe entirely the outcome of a horse race regardless of where he placed his bet; however, when the same gambler chooses his route to travel to the race track, perhaps at a busy hour, he will likely never learn the outcome of possible alternatives. The latter limited-feedback problem is the focus of this talk. The problem can be phrased as an Online Linear Optimization game with "bandit" feedback. The existence of a low-regret algorithm has been an open question since the work of Awerbuch and Kleinberg in 2004. A recent reduction to the multi-armed bandit problem allowed Dani, Hayes, and Kakade to achieve a low-regret guarantee, unfortunately at the expense of computational efficiency. We present the first known efficient low-regret algorithm for bandit Online Linear Optimization over arbitrary convex decision sets. We show how the difficulties encountered by previous approaches are overcome by employing regularization. Our solution reveals surprising connections between online learning and Interior Point methods in Optimization. We also discuss an emerging phenomenon: regret guarantees for stochastic and adversarial settings turn out to be the same. In particular, our method solves the Online Shortest Path problem: at each round, a path from source to sink is chosen and only the total length (delay) of this path is revealed. A low-regret algorithm for this problem has applications in network routing, resource allocation, dynamic treatment of patients, and more. The worst-case guarantees enjoyed by our method imply robustness with respect to noise and malicious adversary.
Joint work with Jacob Abernethy and Elad Hazan
Ingo Steinwart
Estimating Conditional Quantiles
We first recall a recently proposed support vector machine (SVM) formulation for the problem of estimating conditional quantiles. Since this SVM is based on the so-called pinball loss we then investigate this loss and its quantitative relation to conditional quantiles in detail. Finally, we apply these findings to describe the learning performance of the corresponding SVM.
Adam Klivans
Efficient Algorithms for Agnostic Learning
The Agnostic framework of learning is a notoriously difficult generalization of the PAC learning model where a learner must succeed in the presence of adversarial noise. Over the last decade in computational learning theory, almost every result regarding agnostic learning has been negative. In recent work, we have obtained the first positive results for agnostically learning well-studied concept classes (such as halfspaces and decision trees) with respect to natural distributions. Most of the talk will be devoted to presenting a polynomial-time query algorithm for agnostically learning decision trees with respect to the uniform distribution over {0,1}^n.
Joint work with Parkshit Gopalan and Adam Kalai.
Andrea Caponnetto
Properties of regularization operators in learning theory
We consider the properties of a large class of learning algorithms defined in terms of classical regularization operators for ill-posed problems. This class includes regularized least-squares, Landweber method, $\nu$-methods and truncated singular value decomposition on hypotyesis spaces of vector-valued functions defined in terms of suitable reproducing kernels. In particular universal consistency, minimax rates and statistical adaptation of the methods we will be discussed.
Ding-Xuan Zhou
Analysis of Some Learning Algorithms with Gaussians
Gaussian kernels with flexible variances provide a rich family of Mercer kernels for learning algorithms. They are used to learn function or data components with different frequencies when the variances change. We show that the union of the unit balls of reproducing kernel Hilbert spaces generated by Gaussian kernels with flexible variances is a uniform Glivenko-Cantelli class. This confirms the learnability of Gaussian kernels and verifies the uniform convergence of many multi-scale learning algorithms involving Gaussians with changing variances. In particular, regularization schemes for regression with the least square loss, classification with a general convex loss, and graph Laplacians will be discussed.
Massi Pontil
Spectral Regularization for Transfer Learning
We consider the problem of transfer learning, in which tasks encountered in the past are used to choose a data representation which is expected to work well on future tasks. Each task is assumed to be binary classification or regression in a Hilbert space. We first review previous work on this problem within the machine learning as well the statistics communities. A common idea which emerges is that of learning a low dimensional representation shared by the tasks. We present a novel method which implements this idea via regularization with spectral functions of matrices. The representation is chosen to minimize an empirical error criterion plus a regularization term, which encourages the tasks to share a low dimensional linear representation. We show theoretically and experimentally that this type of regularization can be solved efficiently by an alternating minimization algorithm. Next, we discuss extensions of the above method in which the observed tasks are divided into groups and a distinct linear representation is learned for each group. To learn a future task, one learns a linear function with the representation which leads to the minimal regularized empirical error criterion. The expected error of this method when applied to a future task is shown to be uniformly bounded, over the set of candidate representations, by the empirical error criterion. The bound is independent of the dimension of the Hilbert space. The advantages of transfer learning over single task learning and the advantages of task grouping over no grouping are discussed.
This is joint work with Andreas Argyriou, Theodoros Evgeniou, Andreas Maurer and Charles Micchelli.
Mihail Belkin
Spectral and geometric methods in learning
In recent years a variety of spectral and geometry-based methods have become popular for various tasks of machine learning, such as dimensionality reduction, clustering and semi-supervised learning. In the talk I will discuss some of these methods and discuss recent theoretical results on their convergence. I will also talk about how spectral methods can be used to learn mixtures of Gaussian distributions.
Sham Kakade
On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
We provide sharp bounds for Rademacher and Gaussian complexities of (constrained) linear classes. These bounds make short work of providing a number of corollaries including: risk bounds for linear prediction (including settings where the weight vectors are constrained by either L_2 or L_1 constraints), margin bounds (including both L_2 and L_1 margins, along with more general notions based on relative entropy), a proof of the PAC-Bayes theorem, and L_2 covering numbers (with L_p norm constraints and relative entropy constraints). In addition to providing a unified analysis, the results herein provide some of the sharpest risk and margin bounds (improving upon a number of previous results). Interestingly, our results show that the uniform convergence rates of empirical risk minimization algorithms tightly match the regret bounds of online learning algorithms for linear prediction (up to a constant factor of 2).
With Karthik Sridharan and Ambuj Tewari
Claudio Gentile
Linear Algorithms for On-Line Multitask/Multiview Learning
In this talk, we present recent advances in the design and the analysis of interacting online linear algorithms for learning multitask and multiview classification problems. The algorithms perform better than independent learners whenever the tasks are related in a certain sense. We formalize task relatedness in different ways, and derive formal guarantees on the performance advantage provided by interaction. We believe our online analysis gives new insights into previously known co-regularization techniques, such as multitask kernels, and multiview spectral co-regularization. In particular, we present a natural matrix extension of the so-called quasi-additive algorithm for classification and show bounds depending on certain unitarily invariant norms of the matrix of task coefficients.
Joint work with G. Cavallanti and N. Cesa-Bianchi.