IOL Lab @ TUB/ZIB

ZIB-AISST Research Seminar and Lecture Series

List of talks for the AISST (Artificial Intelligence in Science, Society, and Technology) seminar and lecture series at Zuse Institute Berlin. This seminar serves two purposes:

  1. we host researchers to present their recent work, and
  2. we organize tutorial lectures on useful topics that typically do not appear in graduate coursework.

Presentations typically occur on Wednesday afternoon in ZIB’s Seminar Room (Room 2006), and announcements are sent by e-mail. For more information, please contact Mathieu Besançon, Kartikey Sharma, or Zev Woodstock.

Dr Vu Nguyen (Amazon Research Australia) [homepage]
Coordinates: @ ZIB Lecture Hall (Room 2005)

Title. Bayesian Optimization with Categorical and Continuous Variables

Abstract.

Bayesian optimization (BO) has demonstrated impressive success in optimizing black-box functions. However, challenges remain in dealing with black boxes that include both continuous and categorical inputs. I will present our recent work on optimizing the mixed space of categorical and continuous variables using Bayesian optimization [B. Ru, A. Alvi, V. Nguyen, M. Osborne, and S. Roberts. “Bayesian optimisation over multiple continuous and categorical inputs.” ICML 2020], on scaling it up to higher dimensions [X. Wan, V. Nguyen, H. Ha, B. Ru, C. Lu, and M. Osborne. “Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces.” ICML 2021], and on the population-based AutoRL setting [J. Parker-Holder, V. Nguyen, S. Desai, and S. Roberts. “Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL”. NeurIPS 2021].
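As a rough illustration of the problem setting (a minimal generic baseline, not the methods presented in the talk): one common approach is to one-hot encode the categorical input and run standard GP-based BO with expected improvement. The toy objective and all names below are hypothetical.

# Minimal mixed-variable Bayesian optimization sketch (illustrative baseline only):
# one-hot encode the categorical input and run GP-based BO with expected improvement.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
CATEGORIES = ["relu", "tanh", "sigmoid"]          # categorical choices (toy example)

def black_box(x_cont, cat):                       # hypothetical objective to minimize
    offset = {"relu": 0.0, "tanh": 0.3, "sigmoid": 0.6}[cat]
    return (x_cont - 0.5) ** 2 + offset + 0.01 * rng.standard_normal()

def encode(x_cont, cat):                          # continuous value + one-hot category
    onehot = [1.0 if c == cat else 0.0 for c in CATEGORIES]
    return np.array([x_cont] + onehot)

X, y = [], []                                     # initial design
for _ in range(5):
    xc, c = rng.uniform(0, 1), rng.choice(CATEGORIES)
    X.append(encode(xc, c)); y.append(black_box(xc, c))

for _ in range(20):                               # BO loop
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(np.array(X), np.array(y))
    # Expected improvement over a random candidate set (cheap surrogate optimizer)
    cands = [(rng.uniform(0, 1), rng.choice(CATEGORIES)) for _ in range(200)]
    Z = np.array([encode(xc, c) for xc, c in cands])
    mu, sd = gp.predict(Z, return_std=True)
    imp = min(y) - mu
    ei = imp * norm.cdf(imp / (sd + 1e-9)) + sd * norm.pdf(imp / (sd + 1e-9))
    xc, c = cands[int(np.argmax(ei))]
    X.append(encode(xc, c)); y.append(black_box(xc, c))

print("best value found:", min(y))

The methods discussed in the talk go well beyond this naive encoding, for example by treating the categorical dimensions with dedicated kernels or bandit-style mechanisms.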

Jonathan Eckstein (Rutgers Business School) [homepage]
Coordinates: @ ZIB Conference Room (Room 3028)

Title. Solving Stochastic Programming Problems by Operator Splitting

Abstract.

This talk describes the solution of convex optimization problems that include uncertainty modeled by a finite but potentially very large multi-stage scenario tree.

In 1991, Rockafellar and Wets proposed the progressive hedging (PH) algorithm to solve such problems. This method has some advantages over other standard methods such as Benders decomposition, especially for problems with large numbers of decision stages. The talk will open by showing that PH is an application of the Alternating Direction Method of Multipliers (ADMM). The equivalence of PH to the ADMM has long been known but not explicitly published.
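For reference, a standard statement of the PH iteration in simplified two-stage form (notation mine, not taken from the talk): with scenario probabilities p_s, scenario objectives f_s, penalty ρ > 0, and nonanticipativity enforced as x_s = x̄ for all scenarios s,

\begin{aligned}
x_s^{k+1} &\in \operatorname*{arg\,min}_{x_s}\; f_s(x_s) + \langle w_s^k, x_s\rangle + \tfrac{\rho}{2}\,\lVert x_s - \bar{x}^k\rVert^2 && \text{(one subproblem per scenario $s$)}\\
\bar{x}^{k+1} &= \textstyle\sum_s p_s\, x_s^{k+1} && \text{(probability-weighted average)}\\
w_s^{k+1} &= w_s^k + \rho\,\bigl(x_s^{k+1} - \bar{x}^{k+1}\bigr) && \text{(dual update)}
\end{aligned}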

The ADMM is an example of an “operator splitting” method, and in particular of a principle called “Douglas–Rachford splitting”. I will briefly explain what is meant by an “operator splitting method”.

Next, the talk will apply a different, more recent operator splitting method called “projective splitting” to the same problem. The resulting method is called “asynchronous projective hedging” (APH). Unlike most decomposition methods, it does not need to solve every subproblem at every iteration; instead, each iteration may solve just a single subproblem or a small subset of the available subproblems.

Finally, the talk will describe work integrating the APH algorithm into mpi-sppy, a Python package for modeling and distributed parallel solution of stochastic programming problems. mpi-sppy uses the Pyomo Python-based optimization modeling system. Our experience includes using up to 2,400 processor cores to solve 2-stage and 4-stage test problem instances with as many as 1,000,000 scenarios.
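To give a flavour of the modeling layer, a single scenario subproblem of a toy two-stage problem might be written in Pyomo as below. This is a generic sketch, not mpi-sppy's actual scenario-creation API; all names and numbers are hypothetical.

# Toy two-stage scenario subproblem in Pyomo (illustrative only; mpi-sppy wraps
# models like this into a scenario tree and handles decomposition and parallelism).
import pyomo.environ as pyo

def build_scenario(demand):
    m = pyo.ConcreteModel()
    m.build = pyo.Var(within=pyo.NonNegativeReals)      # first-stage capacity decision
    m.produce = pyo.Var(within=pyo.NonNegativeReals)    # second-stage recourse
    m.cap = pyo.Constraint(expr=m.produce <= m.build)
    m.meet = pyo.Constraint(expr=m.produce >= demand)
    m.obj = pyo.Objective(expr=10 * m.build + 2 * m.produce, sense=pyo.minimize)
    return m

scenario = build_scenario(demand=40.0)
# pyo.SolverFactory("glpk").solve(scenario)   # solve one subproblem with any LP solver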

Portions of the work described in this talk are joint with Patrick Combettes (North Carolina State University), Jean-Paul Watson (Lawrence Livermore National Laboratory, USA), and David Woodruff (University of California, Davis).

David Steurer (ETH Zürich) [homepage]
Coordinates:

Title. Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures

Abstract.

We consider mixtures of k≥2 Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated, i.e., distinct components have statistical overlap at most k^(-C) for a large enough constant C≥1.
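In symbols (notation mine; the precise overlap measure is the one defined in the underlying paper), the input distribution is

\mathcal{D} \;=\; \sum_{i=1}^{k} w_i\, \mathcal{N}(\mu_i, \Sigma), \qquad \sum_{i=1}^{k} w_i = 1,

with shared covariance Σ, and separation means that any two distinct components N(μ_i, Σ), N(μ_j, Σ) overlap by at most k^(-C).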

Previous statistical-query lower bounds [Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart, Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures (extended abstract), 58th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2017, pp. 73–84] give formal evidence that, even for the special case of collinear means, distinguishing such mixtures from (pure) Gaussians may be exponentially hard (in k).

We show that, surprisingly, this kind of hardness can only appear if mixing weights are allowed to be exponentially small. For polynomially lower bounded mixing weights, we show how to achieve non-trivial statistical guarantees in quasi-polynomial time.

Concretely, we develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight. The algorithm can reliably distinguish between a mixture of k≥2 well-separated Gaussian components and a (pure) Gaussian distribution. As a certificate, the algorithm computes a bipartition of the input sample that separates some pairs of mixture components, i.e., both sides of the bipartition contain most of the sample points of at least one component.

For the special case of collinear means, our algorithm outputs a k-clustering of the input sample that is approximately consistent with all components of the underlying mixture. We obtain similar clustering guarantees for the case that the overlap between any two mixture components is lower bounded quasi-polynomially in k (in addition to being upper bounded polynomially in k).

A significant challenge for our results is that they appear to be inherently sensitive to small fractions of adversarial outliers, unlike most previous algorithmic results for Gaussian mixtures. The reason is that such outliers can simulate exponentially small mixing weights even for mixtures with polynomially lower bounded mixing weights.

A key technical ingredient of our algorithms is a characterization of separating directions for well-separated Gaussian components in terms of ratios of polynomials that correspond to moments of two carefully chosen orders logarithmic in the minimum mixing weight.

Erling Andersen (Mosek)
Coordinates:

Title. The value of conic optimization for analytics practitioners (Tutorial Lecture)

Abstract.

Linear optimization, also known as linear programming, is a modelling framework widely used by analytics practitioners. The reason is that many optimization problems can easily be described in this framework. Moreover, huge linear optimization problems can be solved using readily available software and computers. However, a linear model is not always a good way to describe an optimization problem, since the problem may contain nonlinearities. Nevertheless, such nonlinearities are often ignored or linearized because a nonlinear model is considered cumbersome. Also, there are issues with local versus global optima, and in general it is much harder to work with nonlinear functions than with linear ones.

Over the last 15 years, a new paradigm for formulating certain nonlinear optimization problems, called conic optimization, has appeared. The advantage of conic optimization is that it allows the formulation of a wide variety of nonlinearities while almost keeping the simplicity and efficiency of linear optimization.
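As a small illustration of the kind of reformulation meant here (a generic sketch using cvxpy, not Mosek's own Fusion API): a least-squares objective with a Euclidean-norm bound on the solution is naturally a second-order cone program.

# Least squares with a norm bound, posed as an SOCP.
# Illustrative cvxpy sketch; MOSEK can be selected as the underlying conic
# solver if it is installed, but any conic solver will do for this toy problem.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)

x = cp.Variable(10)
objective = cp.Minimize(cp.norm(A @ x - b, 2))     # second-order cone objective
constraints = [cp.norm(x, 2) <= 1.0]               # second-order cone constraint
prob = cp.Problem(objective, constraints)
prob.solve()                                       # e.g. prob.solve(solver=cp.MOSEK)
print("optimal value:", prob.value)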

Therefore, in this presentation we will discuss what conic optimization is and why it is relevant to analytics practitioners. In particular, we will discuss what can be formulated using conic optimization, illustrated by examples. We will also provide some computational results documenting that large conic optimization problems can be solved efficiently in practice. To summarize, this presentation should be interesting for everyone interested in an important recent development in nonlinear optimization.

Zev Woodstock (TU/ZIB)
Coordinates:

Title. Proximity operators and nonsmooth optimization (Tutorial Lecture)

Abstract.

Proximity operators are tools which use first-order information to solve optimization problems. However, unlike gradient-based methods, algorithms involving proximity operators are guaranteed to work in nonsmooth settings. This expository talk will discuss the mathematical and numerical properties of proximity operators, how to compute them, algorithms involving them, and advice on implementation.
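As a concrete example of the kind of tool discussed (a standard proximal-gradient/ISTA sketch for l1-regularized least squares, not material from the lecture itself):

# Proximal gradient (ISTA) for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
# The nonsmooth l1 term is handled by its proximity operator (soft-thresholding).
import numpy as np

def soft_threshold(v, t):
    # prox of t*||.||_1: shrink each coordinate toward zero by t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iter=500):
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1/L with L = ||A||_2^2 (Lipschitz constant)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                  # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
print(ista(A, b, lam=0.1)[:5])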

Talks by visitors preceding the seminar

Gonzalo Muñoz (Universidad de O’Higgins, Chile) [homepage]

Coordinates:

Title. A Sample Size-Efficient Polyhedral Encoding and Algorithm for Deep Neural Network Training Problems

Abstract.

Deep learning has received much attention lately; however, results regarding the computational complexity of training deep neural networks have only recently been obtained. Moreover, all current training algorithms for DNNs with optimality guarantees possess a poor dependency on the sample size, which is typically the largest parameter. In this work, we show that the training problems of large classes of deep neural networks with various architectures admit a polyhedral representation whose size is linear in the sample size. This provides the first theoretical training algorithm with provable worst-case optimality guarantees whose running time is polynomial in the size of the sample.

Ariel Liebmann (Monash University, Australia) [homepage]

Coordinates:

Title. Optimisation, Machine Learning and AI for Rapid Grid Decarbonisation

Abstract.

The national and transcontinental electricity grids of today are based on devices such as coal furnaces, steam turbines, copper and steel wires, electric transformers, and electromechanical power switches that have remained unchanged for 100 years. However, imperceptibly, the components and operational management of this great machine, the grid, have begun to change irreversibly. This is fortunate, as climate science tells us we must reduce CO2 emissions from the energy sector to zero by 2050, and to 50% of current levels by 2030, if we are to prevent dangerous climate change in a future world that is over 1.5 degrees hotter than today.

Utility-scale wind and solar PV farms as large as coal, gas and nuclear generators are now being deployed more cheaply than generators using older technologies can be built and operated; in some cases, these new technologies are cheaper than even the mere operating costs of the older ones. In addition, low-cost rooftop solar PV has enabled consumers to become self-suppliers and contributors to the supply of energy for their neighbours. Moreover, the “dumb” grid of the past is becoming “smarter”, enabled by a combination of ubiquitous low-cost telecommunication and programmable devices at the edge of the grid such as smart meters, smart PV inverters, smart air conditioners and home energy management systems. The final component is the electrification of private transport, which will finally eliminate the need for fossil fuels. The implication is that it is now necessary to rapidly replan and reinvest in the energy system at rates and in ways that are unprecedented in industrial civilisation’s history. While the majority of the hardware technology already exists, the missing pieces of the puzzle are new computer science technologies, in particular the Optimisation, Machine Learning, Forecasting and Data Analytics methods needed to plan and operate this rapidly transforming system.

In this talk I will describe a range of ways in which we and others are enhancing existing computer science tools in Optimisation, AI, ML and other areas in order to better operate and plan the existing power system. I will focus on identifying emerging research opportunities in areas that are needed to complete the transformation to a cheaper, smarter and zero-carbon energy system.

Masashi Sugiyama (RIKEN-AIP/University of Tokyo, Japan) [homepage]

Coordinates:

Title. Introduction of RIKEN Center for Advanced Intelligence Project

Abstract.

RIKEN is one of Japan’s largest fundamental-research institutions. The RIKEN Center for Advanced Intelligence Project (AIP) was created in 2016 to propose and investigate new machine learning methodologies and algorithms, and to apply them to societal problems. AIP covers a wide range of topics: generic AI research (machine learning, optimization, applied math, etc.), goal-oriented AI research (material, disaster, cancer, etc.), and AI-in-society research (ethics, data circulation, laws, etc.). In the first half of my talk, I will briefly introduce our center’s activities and collaboration strategies.

Then, in the latter half, I will talk about the research activities of my team on machine learning from imperfect information. Machine learning has been successfully used to solve various real-world problems. However, we still face many technical challenges; for example, we want to train machine learning systems without large amounts of labeled data, and we want to reliably deploy machine learning systems in noisy environments. I will give an overview of our recent advances in tackling these problems.