ZIB-AISST Research Seminar and Lecture Series

List of talks for the AISST (Artificial Intelligence in Science, Society, and Technology) seminar and lecture series at Zuse Institute Berlin. This seminar serves two purposes:

  1. we host researchers for presenting their recent work, and
  2. we organize tutorial lectures on useful topics which typically do not appear in graduate coursework.

Presentations typically occur on Wednesday afternoon in ZIB’s Seminar Room (Room 2006), and announcements are sent by e-mail. For more information, please contact Mathieu Besançon, Kartikey Sharma, or Zev Woodstock.

Photo Dr. Mathias Staudigl Dr. Mathias Staudigl (Maastricht University) [homepage]

Title. On the iteration complexity of first and second-order Hessian barrier algorithms for non-convex non-smooth conic optimization


A key problem in mathematical imaging, signal processing and computational statistics is the minimization of non-convex objective functions over conic domains, which are continuous but potentially non-smooth at the boundary of the feasible set. For such problems, we propose a new family of first and second-order interior-point methods for non-convex and non-smooth conic constrained optimization problems, combining the Hessian barrier method with quadratic and cubic regularization techniques. Our approach is based on a potential-reduction mechanism and attains a suitably defined class of approximate first- or second-order KKT points with worst-case iteration complexity O(ϵ−2) and O(ϵ−3/2), respectively. Based on these findings, we develop a new double loop path-following scheme attaining the same complexity, modulo adjusting constants. These complexity bounds are known to be optimal in the unconstrained case, and our work shows that they are upper bounds in the case with complicated constraints as well. A key feature of our methodology is the use of self-concordant barriers to construct strictly feasible iterates via a disciplined decomposition approach and without sacrificing on the iteration complexity of the method. To the best of our knowledge, this work is the first which achieves these worst-case complexity bounds under such weak conditions for general conic constrained optimization problems. This is joint work with Pavel Dvurechensky (WIAS Berlin) and based on the paper: arXiv:2111.00100 [math.OC].

Dr. Daniel Blankenburg (University of Bonn)
Coordinates: @ ZIB Lecture Hall (Room 2005)

Title. Resource Sharing Revisited


We revisit the (block-angular) min-max resource sharing problem, which is a well-known generalization of fractional packing and the maximum concurrent flow problem. It consists of finding an ℓ-minimal element in a Minkowski sum X=∑c∈CXc of non-empty closed convex sets Xc⊆ℝR≥0, where C and R are finite sets. We assume that an oracle for approximate linear minimization over Xc is given.

We improve on the currently fastest known FPTAS in various ways. A major novelty of our analysis is the concept of local weak duality, which illustrates that the algorithm optimizes (close to) independent parts of the instance separately. Interestingly, this implies that the computed solution is not only approximately ℓ-minimal, but among such solutions, also its second-highest entry is approximately minimal.

Based on a result by Klein and Young, we provide a lower bound of 𝛺(((|C|+|R|) log |R|)/𝛿²) required oracle calls for a natural class of algorithms. Our FPTAS is optimal within this class — its running time matches the lower bound precisely, and thus improves on the previously best-known running time for the primal as well as the dual problem.

Photo Hiroshi Kera Hiroshi Kera (Chiba University)

Title. Approximate computation of vanishing ideals


The vanishing ideal of points is the set of all polynomials that vanish over the points. Any vanishing ideal can be generated by a finite set of vanishing polynomials or generators, and the computation of approximate generators has been developed at the intersection of computer algebra and machine learning in the last decade under the name of approximate computation of vanishing ideals. In computer algebra, the developed algorithms are supported by theories more deeply, whereas, in machine learning, the algorithms have been developed toward applications at a cost of some theoretical properties. In this talk, I will present a review on the development of approximate computation of vanishing ideals in two fields, particularly from the perspective of the spurious vanishing problem and normalization, which are recently suggested as a new direction of development.

Vladimir Kolmogorov (IST Austria)
Coordinates: @ ZIB Lecture Hall (Room 2005)

Title. On discrete optimization problems and Gomory-Hu tree construction algorithms


I will consider the problem of minimizing functions of discrete variables represented as a sum of “tractable” subproblems. First, I will briefly review recent theoretical results characterizing complexity classification of discrete optimization in the framework of “Valued Constraint Satisfaction Problems” (VCSPs). Then I will talk about algorithms for solving Lagrangian relaxations of such problems. I will describe an approach based on the Frank-Wolfe algorithm that achieves the best-known convergence rate. I will also talk about practical implementation, and in particular about in-face Frank-Wolfe directions for certain combinatorial subproblems. Implementing such directions for perfect matching subproblems boils down to computing a Gomory-Hu (GH) tree of a given graph. Time permitting, I will describe a new approach for computing GH tree that appears to lead to a state-of-the-art implementation.

Photo Karen Aardal Karen Aardal (TU Delft)
Coordinates: @ ZIB Lecture Hall (Room 2005)

Title. Integer optimization: Branching based on lattice information


There has been enormous progress in the branch-and-bound methods in the past couple of decades. In particular, much effort has been put into the so-called variable selection problem, i.e. the problem of choosing which variable to branch on in the current search node. Recently, many researchers have investigated the potential of using machine learning to find good solutions to this problem by for instance trying to mimic what good, but computationally costly, heuristics do. The main part of this research has been focused on branching on so-called elementary disjunctions, that is, branching on a single variable. Theory, such as the results by H.W. Lenstra, Jr. and by Lovász & Scarf, tells us that we in general need to consider branching on general disjunctions, but due in part to the computational challenges to implement such methods, much less work in this direction has been done. Some heuristic results in this direction have been presented.

In this talk we discuss both theoretical and heuristic results when it comes to branching on general disjunctions with an emphasis on lattice based methods. A modest computational study is also presented. In the last part of the talk we also give a short description of results from applying machine learning to the variable selection problem. The talk is based on joint work with Laurence Wolsey.

Photo Vijay Vazirani Vijay Vazirani (University of California, Irvine)

Title. Online Bipartite Matching and Adwords


Over the last three decades, the online bipartite matching (OBM) problem has emerged as a central problem in the area of Online Algorithms. Perhaps even more important is its role in the area of Matching-Based Market Design. The resurgence of this area, with the revolutions of the Internet and mobile computing, has opened up novel, path-breaking applications, and OBM has emerged as its paradigmatic algorithmic problem. In a 1990 joint paper with Richard Karp and Umesh Vazirani, we gave an optimal algorithm, called RANKING, for OBM, achieving a competitive ratio of (1 – 1/e); however, its analysis was difficult to comprehend. Over the years, several researchers simplified the analysis.

We will start by presenting a “textbook quality” proof of RANKING. Its simplicity raises the possibility of extending RANKING all the way to a generalization of OBM called the adwords problem. This problem is both notoriously difficult and very significant, the latter because of its role in the AdWords marketplace of Google. We will show how far this endeavor has gone and what remains. We will also provide a broad overview of the area of Matching-Based Market Design and pinpoint the role of OBM.

Photo Sébastien Designolle Sébastien Designolle (University of Geneva) [homepage]
Coordinates: @ ZIB Lecture Hall (Room 2005)



In quantum mechanics, performing a measurement is an invasive process which generally disturbs the system. Due to this phenomenon, there exist incompatible quantum measurements, i.e., measurements that cannot be simultaneously performed on a single copy of the system.

In this talk we will explain the robustness-based approach generally used to quantify this incompatibility and how it can be cast, for finite-dimensional systems, as a semidefinite programming problem. With this formulation at hand we analytically investigate the incompatibility properties of some high-dimensional measurements and we tackle, for an arbitrary fixed dimension, the question of the most incompatible pairs of quantum measurements, showing in particular optimality of Fourier-conjugated bases.

Photo Lorenz T. (Larry) Biegler Lorenz T. (Larry) Biegler (Carnegie Mellon University) [homepage]

Title. Nonlinear Optimization with Data-Driven and Surrogate Models


Optimization models for engineering design and operation, are frequently described by complex models of black-box simulations. The integration, solution, and optimization of this ensemble of large-scale models is often difficult and computationally expensive. As a result, model reduction in the form of simplified or data-driven surrogate models is widely applied in optimization studies. While the application to machine learning and AI approaches has lead to widespread optimization studies with surrogate models, less attention has been paid to validating these results on the optimality of high-fidelity, i.e., ‘truth’ models. This talk describes a surrogate-based optimization approach based on a trust-region filter (TRF) strategy. The TRF method substitutes surrogates for high-fidelity models, thus leading to simpler optimization subproblems with sampling information from truth models. Adaptation of the subproblems is guided by a trust region method, which is globally convergent to the local optimum of the original high-fidelity problem. The approach is suitable for broad choices of surrogate models, ranging from neural networks to physics-based shortcut models. The TRF approach has been implemented on numerous optimization examples in process and energy systems, with complex high fidelity models. Three case studies will be presented for Real-Time Optimization (RTO) for oil refineries, chemical processes and dynamic adsorption models for CO2 capture, which demonstrate the effectiveness of this approach.

Photo Dr. Haoxiang Yang Dr. Haoxiang Yang (CUHK-Shenzhen) [homepage]

Title. Robust Optimization with Continuous Decision-Dependent Uncertainty


In this talk, we consider a robust optimization problem with continuous decision-dependent uncertainty (RO-CDDU), which has two new features: an uncertainty set linearly dependent on continuous decision variables and a convex piecewise-linear objective function. We prove that RO-CDDU is NP-hard in general and reformulate it into an equivalent mixed-integer nonlinear program (MINLP) with a decomposable structure to address the computational challenges. Such an MINLP model can be further transformed into a mixed-integer linear program (MILP) given the uncertainty set’s extreme points. We propose an alternating direction algorithm and a column generation algorithm for RO-CDDU. We model a robust demand response (DR) management problem in electricity markets as RO-CDDU, where electricity demand reduction from users is uncertain and depends on the DR planning decision. Extensive computational results demonstrate the promising performance of the proposed algorithms in both speed and solution quality. The results also shed light on how different magnitudes of decision-dependent uncertainty affect the demand response decision.

Photo Dr Vu Nguyen Dr Vu Nguyen (Amazon Research Australia) [homepage]
Coordinates: @ ZIB Lecture Hall (Room 2005)

Title. Bayesian Optimization with Categorical and Continuous Variables


Bayesian optimization (BO) has demonstrated impressive success in optimizing black-box functions. However, there are still challenges in dealing with black-boxes that include both continuous and categorical inputs. I am going to present our recent works in optimizing the mixed space of categorical and continuous variables using Bayesian optimization [B. Ru, A. Alvi, V. Nguyen, M. Osborne, and S. Roberts. “Bayesian optimisation over multiple continuous and categorical inputs.” ICML 2020] and how to scale it up to higher dimensions [X. Wan, V. Nguyen, H. Ha, B. Ru, C. Lu, and M. Osborne. “Think Global and Act Local: Bayesian Optimisation over High-Dimensional Categorical and Mixed Search Spaces.” ICML 2021] and population-based AutoRL setting [J. Parker-Holder, V. Nguyen, S. Desai, and S. Roberts. “Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL”. NeurIPS 2021].

Photo Jonathan Eckstein Jonathan Eckstein (Rutgers Business School) [homepage]
Coordinates: @ ZIB Conference Room (Room 3028)

Title. Solving Stochastic Programming Problems by Operator Splitting


This talk describes the solution of convex optimization problems that include uncertainty modeled by a finite but potentially very large multi-stage scenario tree.

In 1991, Rockafellar and Wets proposed the progressive hedging (PH) algorithm to solve such problems. This method has some advantages over other standard methods such as Benders decomposition, especially for problems with large numbers of decision stages. The talk will open by showing that PH is an application of the Alternating Direction Method of Multipliers (ADMM). The equivalence of PH to the ADMM has long been known but not explicitly published.

The ADMM is an example of an “operator splitting” method, and in particular of a principle called “Douglas–Rachford splitting”. I will briefly explain what is meant by an “operator splitting method”.

Next, the talk will apply a different, more recent operator splitting method called “projective splitting” to the same problem. The resulting method is called “asynchronous projective hedging” (APH). Unlike most decomposition methods, it does not need to solve every subproblem at every iteration; instead, each iteration may solve just a single subproblem or a small subset of the available subproblems.

Finally, the talk will describe work integrating the APH algorithm into mpi-sppy, a Python package for modeling and distributed parallel solution of stochastic programming problems. mpi-sppy uses the Pyomo Python-based optimization modeling sytem. Our experience includes using up to 2,400 processor cores to solve 2-stage and 4-stage test problem instances with as many as 1,000,000 scenarios.

Portions of the work described in this talk are joint with Patrick Combettes (North Carolina State University), Jean-Paul Watson (Lawrence Livermore National Laboratory, USA), and David Woodruff (University of California, Davis).

Photo David Steurer David Steurer (ETH Zürich) [homepage]

Title. Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures


We consider mixtures of k≥2 Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated, i.e., distinct components have statistical overlap at most k-C for a large enough constant C≥1.

Previous statistical-query lower bounds [Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart, Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures (extended abstract), 58th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2017, pp. 73–84] give formal evidence that, even for the special case of colinear means, distinguishing such mixtures from (pure) Gaussians may be exponentially hard (in k).

We show that, surprisingly, this kind of hardness can only appear if mixing weights are allowed to be exponentially small. For polynomially lower bounded mixing weights, we show how to achieve non-trivial statistical guarantees in quasi-polynomial time.

Concretely, we develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight. The algorithm can reliably distinguish between a mixture of k≥2 well-separated Gaussian components and a (pure) Gaussian distribution. As a certificate, the algorithm computes a bipartition of the input sample that separates some pairs of mixture components, i.e., both sides of the bipartition contain most of the sample points of at least one component.

For the special case of colinear means, our algorithm outputs a k-clustering of the input sample that is approximately consistent with all components of the underlying mixture. We obtain similar clustering guarantees also for the case that the overlap between any two mixture components is lower bounded quasi-polynomially in k (in addition to being upper bounded polynomially in k).

A significant challenge for our results is that they appear to be inherently sensitive to small fractions of adversarial outliers unlike most previous algorithmic results for Gaussian mixtures. The reason is that such outliers can simulate exponentially small mixing weights even for mixtures with polynomially lower bounded mixing weights.

A key technical ingredient of our algorithms is a characterization of separating directions for well-separated Gaussian components in terms of ratios of polynomials that correspond to moments of two carefully chosen orders logarithmic in the minimum mixing weight.

Erling Andersen (Mosek)

Title. The value of conic optimization for analytics practitioners (Tutorial Lecture)


Linear optimization, also known as linear programming, is a modelling framework widely used by analytics practitioners. The reason is that many optimization problems can easily be described in this framework. Moreover, huge linear optimization problems can be solved using readily available software and computers. However, a linear model is not always a good way to describe an optimization problem since the problem may contain nonlinearities. Nevertheless such nonlinearities are often ignored or linearized because a nonlinear model is considered cumbersome. Also there are issues with local versus global optima and in general it is just much harder to work with nonlinear functions than linear functions.

Over the last 15 years a new paradigm for formulating certain nonlinear optimization problems called conic optimization has appeared. The advantage of conic optimization is that it allows the formulation of a wide variety of nonlinearities while almost keeping the simplicity and efficiency of linear optimization.

Therefore, in this presentation we will discuss what conic optimization is and why it is relevant to analytics practitioners. In particular we will discuss what can be formulated using conic optimization, illustrated by examples. We will also provide some computational results documenting that large conic optimization problems can be solved efficiently in practice. To summarize, this presentation should be interesting for everyone interested in an important recent development in nonlinear optimization.

Photo Zev Woodstock Zev Woodstock (TU/ZIB)

Title. Proximity operators and nonsmooth optimization (Tutorial Lecture)


Proximity operators are tools which use first-order information to solve optimization problems. However, unlike gradient-based methods, algorithms involving proximity operators are guaranteed to work in nonsmooth settings. This expository talk will discuss the mathematical and numerical properties of proximity operators, how to compute them, algorithms involving them, and advice on implementation.

Talks by visitors preceding the seminar

Photo Gonzalo Muñoz Gonzalo Muñoz (Universidad de O’Higgins, Chile) [homepage]


Title. A Sample Size-Efficient Polyhedral Encoding and Algorithm for Deep Neural Network Training Problems


Deep Learning has received much attention lately, however, results regarding the computational complexity of training deep neural networks have only recently been obtained. Moreover, all current training algorithms for DNNs with optimality guarantees possess a poor dependency on the sample size, which is typically the largest parameter. In this work, we show that the training problems of large classes of deep neural networks with various architectures admit a polyhedral representation whose size is linear in the sample size. This provides the first theoretical training algorithm with provable worst-case optimality guarantees whose running time is polynomial in the size of the sample.

Photo Ariel Liebmann Ariel Liebmann (Monash University, Australia) [homepage]


Title. Optimisation, Machine Learning and AI for Rapid Grid Decarbonisation


The national and transcontinental electricity grids of today are based on devices such as coal furnaces, steam turbines, copper and steel wires, electric transformers, and electromechanical power switches that have remained unchanged for 100 years. However imperceptibly, the components and operational management of this great machine, the grid, has began to change irreversibly. This is fortunate, as climate science tells us we must reduce CO2 emissions from the energy sector to zero by 2050 and to 50% of current levels by 2030 if we are to prevent dangerous climate changes in future world that is over 1.5 degree hotter that today. Now utility scale wind and solar PV farms as large as coal, gas and nuclear generators are being deployed more cheaply than it is possible to build and operate generators using older technologies. In some cases, even these new technologies can be cheaper that even merely the operating costs of older technologies. In addition, low cost rooftop solar PV has also enabled consumers to become self-suppliers and also contributors to the supply of energy for their neighbours. Moreover, the “dumb” grid of the past, is becoming “smarter”. This is enabled through a combination of ubiquitous low-cost telecommunication and programmable devices at the edge of the grid such as smart meters, smart PV inverters, smart air conditioners and home energy management systems. The final component is the electrification of the private transport system that will finally eliminate the need for fossil fuels. The implications of this are that it is now necessary to rapidly replan and reinvest in the energy system at rates and in ways that are unprecedented in industrial civilisations history. While the majority of hardware technology already exist, the missing piece of the puzzle are new computers science technologies, and particularly Optimisation, Machine Learning, Forecasting and Data analytics methods needed to plan and operate this rapidly transforming system.

In this talk I will describe a range of ways existing computer science tools in the Optimisation, AI, ML and other areas we and others are enhancing in order to better operate and plan the existing power system. I will focus on identifying emerging research opportunities in areas that are needed to complete the transformation to a cheaper, smarter and zero carbon energy system.

Photo Masashi Sugiyama Masashi Sugiyama (RIKEN-AIP/University of Tokyo, Japan) [homepage]


Title. Introduction of RIKEN Center for Advanced Intelligence Project


RIKEN is one of Japan’s largest fundamental-research institutions. The RIKEN Center for Advanced Intelligence Project (AIP) was created in 2016 to propose and investigate new machine learning methodologies and algorithms, and apply them to societal problems. AIP covers a wide range of topics from generic AI research (machine learning, optimization, applied math., etc.), goal-oriented AI research (material, disaster, cancer, etc.), and AI-in-society research (ethics, data circulation, laws, etc.). In the first half of my talk, I will briefly introduce our center’s activities and collaboration strategies.

Then, in the latter half, I will talk about the research activities in my team, i.e., machine learning from imperfect information. Machine learning has been successfully used to solve various real-world problems. However, we are still facing many technical challenges, for example, we want to train machine learning systems without big labeled data and we want to reliably deploy machine learning systems under noisy environments. I will overview our recent advances in tackling these problems.