We study a dynamic traffic assignment model, where agents base their instantaneous routing decisions on real-time delay predictions. We formulate a mathematically concise model and derive properties of the predictors that ensure a dynamic prediction equilibrium exists. We demonstrate the versatility of our framework by showing that it subsumes the well-known full information and instantaneous information models, in addition to admitting further realistic predictors as special cases. We complement our theoretical analysis by an experimental study, in which we systematically compare the induced average travel times of different predictors, including a machine-learning model trained on data gained from previously computed equilibrium flows, both on a synthetic and a real road network.

Methods and applications of Machine Learning have made impressive
progress in recent years. One approach is neural network, a
brain-inspired learning model that can be trained to perform
complex and diverse tasks. In order to deliver good results, the
training of such a network is crucial.

Given a neural network, training data, and a threshold, finding
weights for the neural network such that the total error is below
the threshold is known to be NP-hard. In this talk, we determine
the algorithmic complexity of this fundamental problem precisely,
by showing that it is complete for the complexity class
existential theory of the reals (ER).

This means that, up to polynomial time reductions, the problem is
equivalent to deciding whether a system of polynomial equations
and inequalities with integer coefficients and real unknowns has
a solution. If, as widely expected, ER is strictly larger than
NP, our work implies that the problem of training neural networks
is not even in NP.

Neural networks are usually trained using some variation of
backpropagation. The result of this paper offers an explanation
why techniques commonly used to solve big instances of
NP-complete problems seem not to be of use for this
task. Examples of such techniques are SAT solvers, IP solvers,
local search, dynamic programming, to name a few general ones.

The talk is based on joint work with
Mikkel Abrahamsen and Tillman Miltzow.

The mathematical approaches for modeling dynamic traffic can
roughly be divided into two categories: discrete packet routing
models and continuous flow over time models. Despite very vital
research activities on models in both categories, the connection
between these approaches was poorly understood so far. We build
this connection by specifying a (competitive) packet routing
model, which is discrete in terms of flow and time, and by
proving its convergence to flows over time with deterministic
queuing. More precisely, we prove that the limit of the
convergence process, when decreasing the packet size and time
step length in the packet routing model, constitutes a flow over
time with multiple commodities. The convergence result implies
the existence of approximate equilibria in the competitive
version of the packet routing model. This is of significant
interest as exact pure Nash equilibria, similar to almost all
other competitive models, cannot be guaranteed in the
multi-commodity setting.

Moreover, the introduced packet routing model with deterministic
queuing is very application-oriented as it is based on the
network loading module of the agent-based transport simulation
MATSim. As the present work is the first mathematical
formalization of this simulation, it provides a theoretical
foundation and an environment for provable mathematical
statements for MATSim.

This is joint work with Leon Sering and Theresa Ziemke.

The lecture offers a concise overview of a master’s thesis on the penalised linear regression problem with squared lasso penalty. We derive its dual and optimality conditions and connect it to the well-known lasso problem. Based on the optimality conditions we study the solution path and briefly discuss an exact and an approximate homotopy algorithm.

An elimination tree for a connected graph G is a rooted tree on the vertices of G obtained by choosing a root x and recursing on the connected components of G−x to produce the subtrees of x. Elimination trees appear in many guises in computer science and discrete mathematics, and they are closely related to centered colorings and tree-depth. They also encode many interesting combinatorial objects, such as bitstrings, permutations and binary trees. We apply the recent Hartung-Hoang-Mütze-Williams combinatorial generation framework to elimination trees, and prove that all elimination trees for a chordal graph G can be generated by tree rotations using a simple greedy algorithm. This yields a short proof for the existence of Hamilton paths on graph associahedra of chordal graphs. Graph associahedra are a general class of high-dimensional polytopes introduced by Carr, Devadoss, and Postnikov, whose vertices correspond to elimination trees and whose edges correspond to tree rotations. As special cases of our results, we recover several classical Gray codes for bitstrings, permutations and binary trees, and we obtain a new Gray code for partial permutations. Our algorithm for generating all elimination trees for a chordal graph G can be implemented in time O(m+n) per generated elimination tree, where m and n are the number of edges and vertices of G, respectively. If G is a tree, we improve this to a loopless algorithm running in time O(1) per generated elimination tree. We also prove that our algorithm produces a Hamilton cycle on the graph associahedron of G, rather than just Hamilton path, if the graph G is chordal and 2-connected. Moreover, our algorithm characterizes chordality, i.e., it computes a Hamilton path on the graph associahedron of G if and only if G is chordal. This is joint work with Jean Cardinal (ULB) and Torsten Mütze (Warwick).

An embedding of a graph in a book, called book embedding,
consists of a linear ordering of its vertices along the spine of
the book and an assignment of its edges to the pages of the book,
so that no two edges on the same page cross. The book thickness
of a graph is the minimum number of pages over all its book
embeddings. For planar graphs, a fundamental result is due to
Yannakakis, who proposed an algorithm to compute embeddings of
planar graphs in books with four pages. Our main contribution is
a technique that generalizes this result to a much wider family
of nonplanar graphs, which is characterized by a biconnected
skeleton of crossing-free edges whose faces have bounded degree.
Notably, this family includes all 1-planar and all optimal
2-planar graphs as subgraphs. We prove that this family of graphs
has bounded book thickness, and as a corollary, we obtain the
first constant upper bound for the book thickness of optimal
2-planar graphs.

This is joint work with Michael A. Bekos, Giordano Da Lozzo, Martin
Gronemann, Fabrizio Montecchiani, and Chrysanthi Raftopoulou.

We consider the problem of maximizing a fractionally subadditive function under a knapsack constraint that grows over time. An incremental solution to this problem is given by an order in which to include the elements of the ground set, and the competitive ratio of an incremental solution is defined by the worst ratio over all capacities relative to an optimum solution of the corresponding capacity. We present an algorithm that finds an incremental solution of competitive ratio at most max{3.293*sqrt{M},2M}, under the assumption that the values of singleton sets are in the range [1,M], and we give a lower bound of max{2.449,M} on the attainable competitive ratio. In addition, we establish that our framework captures potential-based flows between two vertices, and we give a tight bound of 2 for the incremental maximization of classical flows with unit capacities.

We consider a selection problem with stochastic probing. There
is a set of items whose values are drawn from independent
distributions. The distributions are known in advance. Each item
can be tested repeatedly. Each test reduces the uncertainty
about the realization of its value. We study a testing model,
where the first test reveals if the realized value is smaller or
larger than the median of the underlying distribution.
Subsequent tests allow to further narrow down the interval in
which the realization is located. There is a limited number of
possible tests, and our goal is to design near-optimal testing
strategies that allow to maximize the expected value of the
chosen item. We study both identical and non-identical
distributions and develop polynomial-time algorithms with
constant approximation factors in both scenarios.

This is joint work with Martin Hoefer and Daniel Schmand.

Deciding how to allocate the seats of a house of representatives
is one of the most fundamental problems in the political
organization of societies, and has been widely studied over
already two centuries. The idea of proportionality is at the core
of most approaches to tackle this problem, and this notion is
captured by the divisor methods, such as the Jefferson/D’Hondt
method. In a seminal work, Balinski and Demange extended the
single-dimensional idea of divisor methods to the setting in
which the seat allocation is simultaneously determined by two
dimensions, and proposed the so-called biproportional
apportionment method. The method, currently used in several
electoral systems, is however limited to two dimensions and the
question of extending it is considered to be an important problem
both theoretically and in practice.

In this work we initiate the study of multidimensional
proportional apportionment. We first formalize a notion of
multidimensional proportionality that naturally extends that of
Balinski and Demange. By means of analyzing an appropriate
integer linear program we are able to prove that, in contrast to
the two-dimensional case, the existence of multidimensional
proportional apportionments is not guaranteed and deciding its
existence is NP-complete. Interestingly, our main result asserts
that it is possible to find approximate multidimensional
proportional apportionments that deviate from the marginals by a
small amount. The proof arises through the lens of discrepancy
theory, mainly inspired by the celebrated Beck-Fiala Theorem. We
finally evaluate our approach by using the data from the recent
2021 Chilean Constitutional Convention election.

This is joint work together with José Correa (UChile) and Victor
Verdugo (UOH).

Moritz Buchem
(Maastricht University)
Coordinates:

We formalize the concept of additive approximation schemes and apply it to load balancing problems on identical machines. Additive approximation schemes compute a solution with an absolute error in the objective of at most εh for some suitable parameter h and any given ε > 0. We consider the problem of assigning jobs to identical machines with respect to common load balancing objectives like makespan minimization, the Santa Claus problem (on identical machines), and the envy-minimizing Santa Claus problem. For these settings we present additive approximation schemes for h = p_{max}, the maximum processing time of the jobs. Our technical contribution is two-fold. First, we introduce a new relaxation based on integrally assigning slots to machines and fractionally assigning jobs to the slots. We refer to this relaxation as the slot-MILP. While it has a linear number of integral variables, we identify structural properties of (near-)optimal solutions, which allow us to compute those in polynomial time. The second technical contribution is a local-search algorithm which rounds any given solution to the slot-MILP, introducing an additive error on the machine loads of at most εpmax. This is joint work together with Lars Rohwedder (EPFL), Tjark Vredeveld (UM) and Andreas Wiese (UChile).

Propagation of linear constraints has become a crucial
sub-routine in modern Mixed-Integer Programming (MIP) solvers. In
practice, iterative algorithms with tolerance-based stopping
criteria are used to avoid problems with slow or infinite
convergence. However, these heuristic stopping criteria can pose
difficulties for fairly comparing the efficiency of different
implementations of iterative propagation algorithms in a
real-world setting. Most significantly, the presence of unbounded
variable domains in the problem formulation makes it difficult to
quantify the relative size of reductions performed on them. In
this work, we develop a method to measure—independently of the
algorithmic design—the progress that a given iterative
propagation procedure has made at a given point in time during
its execution. Our measure makes it possible to study and better
compare the behavior of bounds propagation algorithms for linear
constraints. We apply the new measure to answer two questions of
practical relevance:

We investigate to what extent
heuristic stopping criteria can lead to premature termination on
real-world MIP instances.

We compare a GPU-parallel propagation algorithm against a
sequential state-of-the-art implementation and show that the
parallel version is even more competitive in a real-world setting
than originally reported.

The long-standing polynomial Hirsch conjecture asks if the
combinatorial diameter of any polyhedron can be bounded by a
polynomial of the dimension and number of facets. Inspired by this
question, we study the combinatorial diameter of two classes of
random polyhedra.

We prove nearly-matching upper and lower bounds, assuming that the
number of facets is very large compared to the dimension.

The Quickest Transshipment Problem is to route flow as quickly as possible from sources with supplies to sinks with demands in a network with capacities and transit times on the arcs. It is of fundamental importance for numerous applications in areas such as logistics, production, traffic, evacuation, and finance. More than 25 years ago, Hoppe and Tardos presented the first (strongly) polynomial-time algorithm for this problem. Their approach, as well as subsequently derived algorithms with strongly polynomial running time, are hardly practical as they rely on parametric submodular function minimization via Megiddo‘s method of parametric search. In this talk we present a considerably faster algorithm for the Quickest Transshipment Problem that instead employs a subtle extension of the Discrete Newton Method. This improves the previously best known running time of Õ(m^{4}k^{14}}) to Õ(m^{2}k^{5}+m^{3}k^{3}+m^{3}n), where n is the number of nodes, m the number of arcs, and k the number of sources and sinks.

In this talk we will see how to solve a network flow problem by a coarsening technique, and speculate if this technique can be carried over to other optimization problems. Then we will witness the struggles of experimentally evaluating an algorithm with many parameters.

We consider the stochastic scheduling problem of minimizing the
expected makespan on m parallel identical machines. While the
(adaptive) list scheduling policy achieves an approximation ratio
of 2, any (non-adaptive) fixed assignment policy has performance
guarantee 𝛺(log m/loglog m). Although the performance of the
latter class of policies are worse, there are applications,
e.g. in surgery scheduling, in which rather non-adaptive policies
are desired.

We introduce two classes of policies, namely 𝛿-delay and 𝜏-shift
policies, whose degree of adaptivity can be controlled by a
parameter. Using the fixed assignment policy induced by LEPT as
a basis, we present a policy - belonging to both classes - for
which we show that it is an O(loglog m)-approximation for
reasonably bounded parameters. In other words, an improvement on
the performance of any fixed assignment policy can be achieved
when allowing a small degree of adaptivity. Moreover, we provide
a matching lower bound for any 𝛿-delay and 𝜏-shift policy when
both parameters, respectively, are in the order of the expected
makespan of an optimal non-anticipatory policy.

The question whether we can convert a bunch of weak learners into
a strong learner arose in the late 80s. It was confirmed by
Schapire and led to the most famous approach: AdaBoost. This
algorithm is well known for it’s good performance in practice and
there are a lot of variants. In this talk, we will look at the
application of linear and mixed-integer optimization and we will
derive corresponding LP and MIP formulations. In particular, we
will look at:

Properties of AdaBoost including possible drawbacks and
approaches to overcome them

Column generation as an efficient way to solve LP formulations

Day to day we frequently encounter different kinds of
combinatorial objects, e.g., permutations, binary strings, binary
trees, spanning trees of a graph, bases of a matroid. Usually,
there are three fundamental tasks for these objects: counting,
random sampling, and exhaustive generation. Even though for the
first two there are powerful well-understood general approaches
like generating functions or Markov chains; this has been lacking
for exhaustive generation.

In this talk, we will discuss greedy strategies for exhaustive
generation. This is a very natural attempt to provide a general
approach for exhaustive generation as greedy strategies are a
widely used algorithmic paradigm. In particular, we will discuss:

Various classical examples of Greedy based approaches in
exhaustive generation.

An introduction to the zig-zag framework by Hartung, Hoang,
Mütze, and Williams, which provides a common lense
for many greedy strategies.

Some greedy strategies that appear to not fit into the zig-zag
framework, including a new greedy approach for
generating spanning trees of any given connected graph.

In 1932, Borsuk proved that every 2-dimensional convex body can be
divided in three parts with strictly smaller diameter than the
original. He also asked if the same would hold for any dimension
d: is it true that every d-dimensional convex body can be divided
in d+1 pieces of strictly smaller diameter?

The answer to this question is positive in 3 dimensions (Perkal
1947), but in 1993, Kahn and Kalai constructed polytopal
counterexamples in dimension 1325. Nowadays, we know that the
answer is “no” for all dimensions d≥64 (Bondarenko and Jenrich
2013). It remains open for every dimension between 4 and 63.

In this talk we introduce the problem, and we show the proofs for
dimensions 2 and 3 plus computational ideas on how to extend these
solutions to dimension 4.

I will talk about how to translate two open problems in the
context of ReLU neural network (NN) expressivity into the polytope
world:

Can constant-depth NNs compute any continuous piecewise linear
(CPWL) function? Smallest open special case: Can 3-layer NNs
compute the maximum of 5 numbers?

Can polynomial-size NNs precisely compute the objective value
of the Minimum Spanning Tree problem (or other combinatorial
optimization problems) for arbitrary real-valued edge weights?

To achieve these translations, I will introduce an isomorphism
between the two semigroups (convex CPWL functions, pointwise
addition) and (polytopes, Minkowski sum), known as “support
functions” (in convex geometry) and “Newton polytopes” (in
tropical geometry). Extending this concept to arbitrary (not
necessarily convex) CPWL functions leads to the (in my opinion
beautiful) construction of “virtual” polytopes, that is, the group
of formal Minkowski differences of two polytopes.

The Moore–Hodgson Algorithm minimizes the number of late jobs on a single machine. That is, it finds an optimal schedule for the classical problem where jobs with given processing times and due dates have to be scheduled on a single machine so as to minimize the number of jobs completing after their due date. Several proofs of the correctness of this algorithm have been published. We present a new short proof.

We consider the problem of maximizing the expected revenue from selling k identical items to n unit-demand buyers who arrive sequentially with independent and identically distributed valuations. We show bounds on the gap between the revenue achievable with dynamic prices and that achievable with a static price. These bounds are tight for all values of k and n and vary depending on a regularity property of the underlying distribution. The bound for regular distributions increases in n and tends to 1/(1-k^{k}/(e^{k} k!)). Our upper bounds are obtained via a classic relaxation of the pricing problem known as the ex-ante relaxation, and a novel use of Bernstein polynomials to approximate the revenue from static pricing. This technique allows for a uniform and lossless analysis across all parameter values, both in the finite regime and in the limit. (Joint work with Paul Dütting and Felix Fischer)

I present an extension of the Python-embedded optimization modeling language PICOS that makes key results from the fields of robust and distributionally robust optimization available to application developers. While the Python ecosystem already offers multiple frameworks concerned with mathematical optimization, library support for optimization under uncertainty has been sparse in comparison. My talk opens with a brief introduction to the modeling interface provided by PICOS, then leads into a discussion of the recent additions that allow uncertainty in the data to be modelled explicitly and accounted for, assuming worst-case outcomes, during the solution process. In particular, I present an illustrative application in linear signal estimation where distributionally robust optimization can be seen to perform quite well.

From the global strong convexity assumptions to local Łojasiewicz
gradient inequalities, functional structures are ubiquitous in
Machine Learning. They successfully describe convergence
properties of optimization methods, regret bounds of online
methods, or generalization bounds. However, much less attention
has been dedicated to similar structures in the optimization
constraint sets or online learner’s action sets.

Here, we first motivate such an issue from the perspective of
projection-free methods (e.g., Conditional Gradients in offline
optimization). We then describe how global and local set uniform
convexity, i.e., a natural quantitative notion for upper
curvature, can help obtain faster convergence rates or regret
bounds for online or bandit algorithms. By doing so, we explore
some useful characterization of these sets leveraging tools
developed to study Banach Spaces. We finally provide some
connection between uniform convexity and upper-bound on some
Rademacher constants.

Since its invention in 2006, differential privacy (DP) has become the de-facto standard for releasing aggregate information about data in a privacy-preserving way. It has not only found its way into companies such as Google, Microsoft and Apple, but is also used by government agencies such as the U.S. Census Bureau. In this talk, I will give an introduction into the concept of DP. I will further describe the most important building blocks that can be used to construct differentially private machine learning models that are safe against, e.g., training data extraction attacks.

In pandemic times, prohibiting close proximity between people is crucial to control the spread of the virus. This is particularly interesting in test and vaccination centers where each person should keep sufficient distance to other people during their transits to and from their designated rooms to prevent possible infections. To model this as a machine scheduling problem, each job has a transit time before and after its processing time. The task is to find a conflict-free schedule of minimum makespan in which the transit times of no two jobs on machines in close proximity (captured by an undirected graph) intersect. In this talk, we discuss hardness as well as approximation results for the case of identical jobs.

In this short talk, we will discuss some ongoing work on the Min
Weighed Sum Bin Packing problem. The goal is to assign n items
(with weight w_{j} and size s_{j}) to bins,
without exceeding the bin capacities, when assigning item j to
the i’th bin incurs a cost of i * w_{j}. This can also be
understood as a (batch)-scheduling problem, in which all jobs
have unit processing time, the machine can process batches of
jobs satisfying a knapsack constraint, and the goal is to
minimize the weighted sum of completion time.

A PTAS is known for this problem, but its astronomic running time
makes it useless in the practice. Apart from that, the First-Fit
algorithm that considers items in non-increasing order of
w_{j}/s_{j} is known to have a tight performance
guarantee of 2, and there is a greedy algorithm with a
performance guarantee of 4 for a slight generalization of this
problem, in which the feasible batches are given explicitly by a
family of subsets of items. We will show that in the considered
case (batches defined implicitly by a knapsack constraint), the
performance guarantee of the greedy algorithm can be improved to
2.

Differentiable optimization aims at bringing constrained convex optimization to differentiable programming, allowing the computation of solution derivatives with respect to arbitrary input parameters. After an overview of some of the advances in that direction, we will see how to integrate differentiable optimization as a layer in differentiable models and its interplay with automatic differentiation, and conclude with some comparisons with other approaches.

We consider the problem of scheduling jobs on parallel machines so as to minimize the sum of weighted job completion times. We assume that each job must be completely processed on one machine and that the time required may depend arbitrarily on the job and the machine. Furthermore, the jobs arrive online over time. In the first part of the talk, we consider preemptive scheduling of deterministic jobs. I will use dual fitting to prove that a simple greedy algorithm for this problem is 4-competitive. The second part is about non-preemptive scheduling of stochastic jobs. Here a similar greedy algorithm for the assignment of jobs to machines, combined with known policies for scheduling the jobs assigned to each machine (Schulz, 2008), leads to stochastic online policies with performance guarantees depending on an upper bound 𝛥 for the squared coefficients of variation of the random variables. These improve upon the previously best known guarantees (Gupta, Moseley, Uetz, & Xie) when 𝛥 is large enough (for the simplest variant considered when 𝛥 ≥ 0.815). This includes the important case of exponentially distributed processing times.

Minimum cost flows have numerous applications, e.g. electrical or
gas flows or equilibrium flows in traffic networks. In the
classical minimum cost flow problem we are given some
flow-dependent cost functions on the edges and demand values on
the vertices. In this setting we want to find a flow with minimal
costs satisfying the demands. However, in practice, demand and
supply values are not constant but may vary or be uncertain.
Therefore, we want to analyze the minimum cost flows as a function
of the demand values. More specifically, we consider a setting
where the demands in the network are given as a function depending
on a one-dimensional parameter 𝜆.

We analyze these functions theoretically and use the insights to
develop an output-polynomial algorithm that computes the minimum
cost flow function explicitly for piecewise-quadratic edge costs
and demands that are piecewise-linear functions of 𝜆. Further, we
can generalize this approach to compute approximate solutions for
arbitrary convex cost functions. The algorithm is tested on
real-world gas and traffic networks.

This is joint work with Max Klimm and Per Joachims.

We review classical and modern results in approximation theory of neural networks. First, the density of neural networks within different function spaces under various assumptions on the activation function is considered. Next, lower and upper bounds on the order of approximation with neural networks are given based on the input dimension, the number of neurons and a parameter quantifying the smoothness of the target function. Lastly, a family of compositional target functions for which the curse of dimensionality can be overcome using deep neural networks is examined.

Robust Optimization is a popular paradigm for optimisation under
uncertainty. It assumes that the uncertainty is specified by a set
and aims to optimise the worst case objective over all elements in
the set.

However in many problems the uncertainty changes over time as more
information comes in. For this project our goal is to develop an
adaptive learning algorithm that can simultaneously learn the
uncertainty set and solve the optimisation problem. We achieve
this by statistical updates and leveraging online learning
algorithms.

This is joint work with Kevin-Martin Aigner, Kristin Braun, Frauke
Liers and Sebastian Pokutta.

Since the beginning of Topology, one of the most used approach to study an object was to triangulate it. Many invariants have been introduced in the years, the two most important surely being homotopy and homology. However, the direct computation of these invariants is infeasible for triangulations with a large number of faces. This is why methods to reduce the number of faces of a complex without changing its homology and homotopy type are particularly important. In this talk, we will focus on various ways to see if a complex is contractible, starting from collapsibility and then trying to generalize the concept.

Generic rectangulations are tilings of the unit square by
rectangles such that no 4 rectangles share a corner. Generic
rectangulations are a fundamental object in computer science: not
only appearing in practice (e.g. in VLSI circuit design) and
theory (e.g. being closely related to binary search trees); but
also within many other fundamental combinatorial objects such as
permutations. In this work, we present a versatile algorithmic
framework for exhaustively generating a large variety of different
classes of generic rectangulations. Our algorithms work under very
mild assumptions and apply to a considerable number of interesting
rectangulation classes known from the literature, such as generic
rectangulations, diagonal rectangulations, S-equivalent
rectangulations, 1-sided/area-universal rectangulations, and their
guillotine variants. The resulting generation algorithms are
efficient, in most interesting cases even loopless or constant
amortized time, i.e., each new rectangulation is generated in
constant time in the worst case or on average, respectively.

In this talk we will show:

A simple introduction to rectangulations and some of its
interesting classes,

Our algorithmic generation framework
for rectangulation classes and

Correctness proof (sketch) of the framework by relating
rectangulations to permutations.

Fast domain propagation of linear constraints has become a crucial component of today’s best algorithms and solvers for mixed integer programming and pseudo-boolean optimization to achieve peak solving performance. Irregularities in the form of dynamic algorithmic behaviour, dependency structures, and sparsity patterns in the input data make efficient implementations of domain propagation on GPUs and, more generally, on parallel architectures challenging. This is one of the main reasons why domain propagation in state-of-the-art solvers is single thread only. In this paper, we present a new algorithm for domain propagation which (a) avoids these problems and allows for an efficient implementation on GPUs, and is (b) capable of running propagation rounds entirely on the GPU, without any need for synchronization or communication with the CPU. We present extensive computational results which demonstrate the effectiveness of our approach and show that ample speedups are possible on practically relevant problems: on state-of-the- art GPUs, our geometric mean speed-up for reasonably-large instances is around 10x to 20x and can be as high as 195x on favorably-large instances.

Conditional gradients (CG) constitute a class of projection-free
first-order algorithms for smooth convex optimization. As such,
they are frequently used in solving smooth convex optimization
problems over polytopes, for which the computational cost of
projections is prohibitive. We will discuss two topics in this
talk. First of all, CG algorithms do not enjoy the optimal
convergence rates achieved by projection-based first-order
accelerated methods; achieving such globally-accelerated rates is
information-theoretically impossible. To address this issue, we
present Locally Accelerated Conditional Gradients – an
algorithmic framework that couples accelerated steps with
conditional gradient steps to achieve local acceleration on
smooth strongly convex problems. Our approach does not require
projections onto the feasible set, but only on (typically
low-dimensional) simplices, thus keeping the computational cost
of projections at bay. Further, it achieves optimal accelerated
local convergence.

The second topic in this talk deals with the setting where first
and second-order information about the function is costly to
compute, and so we would like to make as much primal progress as
possible per oracle call. Constrained second-order convex
optimization algorithms are the method of choice when a high
accuracy solution to a problem is needed, due to their local
quadratic convergence. These algorithms require the solution of a
constrained quadratic subproblem at every iteration. In this
setting, we present the Second-Order Conditional Gradient Sliding
(SOCGS) algorithm, which uses a projection-free algorithm to
solve the constrained quadratic subproblems inexactly and uses
inexact Hessian oracles (subject to an accuracy
requirement). When the feasible region is a polytope, and a lower
bound on the primal gap is known, the algorithm converges
quadratically in primal gap after a finite number of linearly
convergent iterations. Once in the quadratic regime the SOCGS
algorithm requires O(log log (1/ε))
first-order and inexact Hessian oracle calls and
O(log (1 / ε) ⋅ log log (1/ε))
linear minimization oracle calls to achieve an
ε-optimal solution.

We consider a general class of binary packing problems with multiple convex quadratic knapsack constraints. We present three constant-factor approximation algorithms based upon three different algorithmic techniques: (1) a rounding technique tailored to a convex relaxation in conjunction with a non-convex relaxation; (2) a greedy strategy; and (3) a randomized rounding method. The practical performance is tested by a computational study of the empirical approximation of these algorithms for problem instances arising in the context of real-world gas transport networks.

The learning of relations, dynamics, and governing equations from (sampled) data is a prevalent problem in various scientific fields. In the absence of noise, this task can often be solved using the Buchberger-Möller Algorithm, see Möller and Buchberger (1982). In general, however, when we do not have direct access to aforementioned equations, we have access to them indirectly only via noisy sampling. Known methods which extract relations from noisy data are the Approximate Vanishing Ideal Algorithm (AVI) and Vanishing Component Analysis (VCA), see Heldt et al. (2009) and Livni et al. (2013), respectively. These algorithms are often computationally inefficient and only allow for restricted recovery of relations from data. We present a new AVI algorithm and two variants utilizing Conditional Gradient Descent or the Frank-Wolfe Algorithm, see, e.g., Frank and Wolfe (1956) or Levitin and Polyak (1966). Our method is more general, easily regularizable, and can also extract non-homogeneous relations. We then present the efficacy and versatility of our approach for three learning problems: reconstruction of governing equations from data, classification, and regression.

This talk is a basic overview over my PhD projects. I focus on a project on making neural networks interpretable. If a neural network outputs a classification result for a high-dimensional input, we would like to know whether a small set of the input parameters was already decisive for the classification. This problem is formalised by introducing a probabilistic variant of prime implicants. We show that this problem is hard for the class NP^{PP} and remains NP-hard for any non-trivial approximation. We present a heuristic solution strategy for real-world applications that is build on Assumed Density Filtering: A technique to propagate multivariate normal distributions through neural networks. To justify the use of this heuristic we show that in fact no sensible family of distributions can be invariant under ReLu-neural network layers.

In Computer Vision, algorithms are now capable of “intelligent”
automatic processing of the images. And these possibilities have
been recently recycled in many creative processes by
A.I. artists, which outputs sometimes reached hammer prizes in
art auctions. This emergence of A.I. art looks like an
artification, i.e., a societal process by which a
practice becomes commonly acknowledged as an art form. Art
history abounds with such examples, e.g., the transitioning from
craftmanship to solo painters in the Renaissance.

Hence, rather than assessing the artist’s narratives, this point
of view allows understanding A.I. art as a figurehead of the
broader ecosystem of the creative industry. The latter indeed
highly benefited from recent advances in Computer Vision.

We discuss here the possible intertwinements between A.I. art,
the use of Computer Vision in the next generation of creative
software, the creative industry, and broader societal
consequences.

This is ongoing work with Louis Thiry and Léa Saint-Raymond at
Ecole Normale Supérieure in Paris.

During the COGA Retreat 2019 we started research on the Symmetric
Rendezvous-on-the-Line-Problem with Unknown Initial Distance: Two
robots are placed on the real line with unknown initial
distance. They do not know the location of the other robot and do
not necessarily share the same sense of “left” and “right” on the
line. They move with constant speed over the line and try to
achieve rendez-vous using the same (randomized) search strategy in
as little time as possible.

In a March 2020 talk (given in the COGA Research Seminar) we
presented a family of search strategies that performed well in
simulations without being able to determine their exact
competitive ratio. During this talk we will present a novel family
of search strategies whose precise competitive ratio can be
computed using generating functions.

This is WIP by Guillaume, Khai Van, Martin, and Max.

Christoph Hertrich
(TU Berlin)
[homepage]
Coordinates:

This talk deals with an open problem concerning the expressivity of feedforward neural networks (NNs) with rectified linear unit (ReLU) activations. While it has been shown that every continuous, piecewise linear (CPWL) function can be exactly represented by such an NN with logarithmic depth (with respect to the input dimension), it is still unknown whether there exists any CPWL function that cannot be represented with three layers. Based on a yet unproven theoretical assumption, we show a computer-aided proof of non-existence of a 3-layer ReLU NN that computes the maximum of five scalar input numbers. We achieve this partial result by converting the existence question into a linear programming feasibility problem. This is joint work in progress with Amitabh Basu, Marco Di Summa, and Martin Skutella.

The Frank-Wolfe algorithm has become a popular constrained optimization algorithm for it is simple and projection-free, and it has been successfully applied to a variety of real-world problems. Its main drawback however lies in its convergence rate, which can be excessively slow due to naive descent directions. In this talk, we provide a brief review and propose two methods significantly improving its performance: one for minimizing a convex objective and one for minimizing a (non)convex finite-sum objective. The first method speeds up the Frank-Wolfe algorithm by better aligning the descent direction with that of the negative gradient via a matching pursuit-style subroutine. Although the idea is reasonably natural, its advantage over the state-of-the-art Frank-Wolfe algorithms is quite considerable. The second method blends stochastic Frank-Wolfe algorithms with adaptive gradients, which set entry-wise step-sizes that automatically adjust to the geometry of the problem. Computational experiments on convex and nonconvex objectives demonstrate the advantage of our approach.

Talks by earlier visitors

Gonzalo Muñoz
(Universidad de O’Higgins, Chile)
[homepage]

Deep Learning has received much attention lately, however, results
regarding the computational complexity of training deep neural
networks have only recently been obtained. Moreover, all current
training algorithms for DNNs with optimality guarantees possess a
poor dependency on the sample size, which is typically the largest
parameter. In this work, we show that the training problems of
large classes of deep neural networks with various architectures
admit a polyhedral representation whose size is linear in the
sample size. This provides the first theoretical training
algorithm with provable worst-case optimality guarantees whose
running time is polynomial in the size of the sample.

The national and transcontinental electricity grids of today are
based on devices such as coal furnaces, steam turbines, copper and
steel wires, electric transformers, and electromechanical power
switches that have remained unchanged for 100 years. However
imperceptibly, the components and operational management of this
great machine, the grid, has began to change irreversibly. This is
fortunate, as climate science tells us we must reduce
CO_{2} emissions
from the energy sector to zero by 2050 and
to 50% of current levels by 2030 if we are to prevent dangerous
climate changes in future world that is over 1.5 degree hotter
that today. Now utility scale wind and solar PV farms as large as
coal, gas and nuclear generators are being deployed more cheaply
than it is possible to build and operate generators using older
technologies. In some cases, even these new technologies can be
cheaper that even merely the operating costs of older
technologies. In addition, low cost rooftop solar PV has also
enabled consumers to become self-suppliers and also contributors
to the supply of energy for their neighbours. Moreover, the “dumb”
grid of the past, is becoming “smarter”. This is enabled through a
combination of ubiquitous low-cost telecommunication and
programmable devices at the edge of the grid such as smart meters,
smart PV inverters, smart air conditioners and home energy
management systems. The final component is the electrification of
the private transport system that will finally eliminate the need
for fossil fuels. The implications of this are that it is now
necessary to rapidly replan and reinvest in the energy system at
rates and in ways that are unprecedented in industrial
civilisations history. While the majority of hardware technology
already exist, the missing piece of the puzzle are new computers
science technologies, and particularly Optimisation, Machine
Learning, Forecasting and Data analytics methods needed to plan
and operate this rapidly transforming system.

In this talk I
will describe a range of ways existing computer science tools in
the Optimisation, AI, ML and other areas we and others are
enhancing in order to better operate and plan the existing power
system. I will focus on identifying emerging research
opportunities in areas that are needed to complete the
transformation to a cheaper, smarter and zero carbon energy
system.

Masashi Sugiyama
(RIKEN-AIP/University of Tokyo, Japan)
[homepage]

RIKEN is one of Japan’s largest fundamental-research
institutions. The RIKEN Center for Advanced Intelligence Project
(AIP) was created in 2016 to propose and
investigate new machine learning methodologies and algorithms, and
apply them to societal problems. AIP covers a wide range of topics
from generic AI research (machine learning, optimization, applied
math., etc.), goal-oriented AI research (material, disaster,
cancer, etc.), and AI-in-society research (ethics, data
circulation, laws, etc.). In the first half of my talk, I will
briefly introduce our center’s activities and collaboration
strategies.

Then, in the latter half, I will talk about the
research activities in my team, i.e., machine learning from
imperfect information. Machine learning has been successfully used
to solve various real-world problems. However, we are still facing
many technical challenges, for example, we want to train machine
learning systems without big labeled data and we want to reliably
deploy machine learning systems under noisy environments. I will
overview our recent advances in tackling these problems.