Differentiable optimization aims to bring constrained convex optimization to differentiable programming, allowing the computation of solution derivatives with respect to arbitrary input parameters. After an overview of some of the advances in that direction, we will see how to integrate differentiable optimization as a layer in differentiable models, examine its interplay with automatic differentiation, and conclude with some comparisons with other approaches.
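As a concrete illustration of the underlying mechanism, the following sketch differentiates the solution of a small equality-constrained quadratic program through its KKT system via the implicit function theorem. It is a minimal numpy sketch under simplifying assumptions (equality constraints only, positive definite Q), not the API of any particular differentiable-optimization library; the problem data Q, A, b, c are made up for illustration.

```python
# Minimal sketch: differentiate the solution of  min_x 0.5 x^T Q x + c^T x  s.t. Ax = b
# with respect to c, via the implicit function theorem applied to the KKT system.
import numpy as np

def solve_qp(Q, c, A, b):
    """Solve the KKT system [[Q, A^T], [A, 0]] [x; nu] = [-c; b]."""
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-c, b]))
    return sol[:n], K

def dx_dc(Q, c, A, b):
    """Jacobian of the optimal x with respect to c: top n rows of K^{-1} [-I; 0]."""
    n, m = Q.shape[0], A.shape[0]
    _, K = solve_qp(Q, c, A, b)
    rhs = np.vstack([-np.eye(n), np.zeros((m, n))])
    return np.linalg.solve(K, rhs)[:n, :]

# Toy check against finite differences (all data made up).
rng = np.random.default_rng(0)
n, m = 4, 2
M = rng.standard_normal((n, n)); Q = M @ M.T + np.eye(n)   # positive definite
A = rng.standard_normal((m, n)); b = rng.standard_normal(m); c = rng.standard_normal(n)

J = dx_dc(Q, c, A, b)
eps = 1e-6
J_fd = np.column_stack([
    (solve_qp(Q, c + eps * e, A, b)[0] - solve_qp(Q, c - eps * e, A, b)[0]) / (2 * eps)
    for e in np.eye(n)
])
print(np.allclose(J, J_fd, atol=1e-5))   # True
```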
We consider the problem of scheduling jobs on parallel machines so as to minimize the sum of weighted job completion times. We assume that each job must be completely processed on one machine and that the time required may depend arbitrarily on the job and the machine. Furthermore, the jobs arrive online over time. In the first part of the talk, we consider preemptive scheduling of deterministic jobs. I will use dual fitting to prove that a simple greedy algorithm for this problem is 4-competitive. The second part is about non-preemptive scheduling of stochastic jobs. Here a similar greedy algorithm for the assignment of jobs to machines, combined with known policies for scheduling the jobs assigned to each machine (Schulz, 2008), leads to stochastic online policies with performance guarantees depending on an upper bound 𝛥 on the squared coefficients of variation of the processing times. These improve upon the previously best known guarantees (Gupta, Moseley, Uetz, & Xie) when 𝛥 is large enough (for the simplest variant considered, when 𝛥 ≥ 0.815). This includes the important case of exponentially distributed processing times.
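To make the greedy flavor concrete, here is a deliberately simplified toy sketch: each job is assigned to the machine on which it increases the total weighted completion time the least, with each machine running its jobs in WSPT order. This ignores release dates and preemption and is not the algorithm analyzed in the talk; all names and data are illustrative.

```python
# Toy greedy assignment for minimizing the sum of weighted completion times
# on unrelated machines (illustrative only; release dates/preemption ignored).
def weighted_completion(jobs):
    """Sum of weighted completion times when jobs (p, w) run in WSPT order."""
    t, total = 0.0, 0.0
    for p, w in sorted(jobs, key=lambda job: job[0] / job[1]):  # smallest p/w first
        t += p
        total += w * t
    return total

def greedy_assign(arrivals, num_machines):
    """arrivals: list of (weight, [p_on_machine_0, ..., p_on_machine_{m-1}])."""
    machines = [[] for _ in range(num_machines)]
    for w, proc_times in arrivals:
        best = min(
            range(num_machines),
            key=lambda i: weighted_completion(machines[i] + [(proc_times[i], w)])
            - weighted_completion(machines[i]),
        )
        machines[best].append((proc_times[best], w))
    return machines

print(greedy_assign([(2, [3, 5]), (1, [4, 2]), (5, [1, 6])], num_machines=2))
```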
Minimum cost flows have numerous applications, e.g., electrical or gas flows and equilibrium flows in traffic networks. In the classical minimum cost flow problem we are given flow-dependent cost functions on the edges and demand values on the vertices, and we want to find a flow with minimal cost satisfying the demands. In practice, however, demand and supply values are not constant but may vary or be uncertain. Therefore, we want to analyze minimum cost flows as a function of the demand values. More specifically, we consider a setting where the demands in the network are given as a function of a one-dimensional parameter 𝜆.
We analyze these functions theoretically and use the insights to develop an output-polynomial algorithm that computes the minimum cost flow function explicitly for piecewise-quadratic edge costs and demands that are piecewise-linear functions of 𝜆. Further, we can generalize this approach to compute approximate solutions for arbitrary convex cost functions. The algorithm is tested on real-world gas and traffic networks.
This is joint work with Max Klimm and Per Joachims.
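A small illustration of why such parametric analysis is tractable: for fixed strictly convex quadratic edge costs and demands that are affine in 𝜆, the optimal flow is itself affine in 𝜆, since the KKT system is linear. The sketch below (plain numpy, made-up network data, conservation constraints only, no capacities) verifies this; the piecewise structure studied in the talk arises from piecewise-quadratic costs and piecewise-linear demands.

```python
# Toy parametric min-cost flow with quadratic edge costs 0.5*q_e*f_e^2 + c_e*f_e
# and demands d(lambda) = d0 + lambda*d1; the optimal flow is affine in lambda.
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # directed edges (tail, head)
n, m = 4, len(edges)
A = np.zeros((n, m))                               # node-arc incidence matrix
for e, (u, v) in enumerate(edges):
    A[u, e], A[v, e] = 1.0, -1.0
q = np.array([2.0, 1.0, 1.5, 1.0, 2.0])            # quadratic cost coefficients (> 0)
c = np.zeros(m)
d0 = np.array([1.0, 0.0, 0.0, -1.0])               # base demands (net outflow per node)
d1 = np.array([1.0, 0.0, 0.0, -1.0])               # demand direction, scaled by lambda

def min_cost_flow(lam):
    # Drop one redundant conservation constraint to keep the KKT matrix nonsingular.
    Ar, dr = A[:-1], (d0 + lam * d1)[:-1]
    K = np.block([[np.diag(q), Ar.T], [Ar, np.zeros((n - 1, n - 1))]])
    return np.linalg.solve(K, np.concatenate([-c, dr]))[:m]

f0, f1, f2 = min_cost_flow(0.0), min_cost_flow(1.0), min_cost_flow(2.0)
print(np.allclose(f2 - f1, f1 - f0))               # True: the flow is affine in lambda
```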
We review classical and modern results in approximation theory of neural networks. First, the density of neural networks within different function spaces under various assumptions on the activation function is considered. Next, lower and upper bounds on the order of approximation with neural networks are given based on the input dimension, the number of neurons and a parameter quantifying the smoothness of the target function. Lastly, a family of compositional target functions for which the curse of dimensionality can be overcome using deep neural networks is examined.
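To fix ideas, the bounds in the second part typically take the following shape (stated here loosely, up to constants and logarithmic factors, and not as the precise theorems covered in the talk):

```latex
% Typical shape of approximation bounds (loose statement, up to constants and
% logarithmic factors): for target functions f of smoothness s on [0,1]^d and
% networks \Phi with at most N weights,
\[
  \inf_{\Phi \in \mathcal{N}_N} \, \| f - \Phi \|_{L^\infty([0,1]^d)}
  \;\asymp\; N^{-s/d},
\]
% so reaching accuracy \varepsilon requires on the order of \varepsilon^{-d/s}
% parameters. This is the curse of dimensionality that the compositional target
% classes in the last part are designed to escape.
```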
Robust Optimisation is a popular paradigm for optimisation under uncertainty. It assumes that the uncertainty is specified by a set and aims to optimise the worst-case objective over all elements in the set.
However, in many problems the uncertainty changes over time as more information comes in. In this project, our goal is to develop an adaptive learning algorithm that can simultaneously learn the uncertainty set and solve the optimisation problem. We achieve this via statistical updates and by leveraging online learning algorithms.
This is joint work with Kevin-Martin Aigner, Kristin Braun, Frauke Liers and Sebastian Pokutta.
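A minimal sketch of the kind of loop this suggests, assuming (purely for illustration) box uncertainty in the cost vector of a linear program and nonnegative decision variables, so that the robust counterpart is again an LP; this is not the project's actual algorithm, and all data below are made up.

```python
# Toy loop: update a box uncertainty set for an unknown cost vector from streaming
# samples, then solve the resulting robust LP (illustrative data and set shape).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
c_true = np.array([1.0, 2.0, 1.5])
A_ub = -np.eye(3)                                # enforce x >= 0.2 componentwise
b_ub = -0.2 * np.ones(3)
A_eq = np.ones((1, 3)); b_eq = np.array([1.0])   # x must lie on the simplex

samples = []
for t in range(1, 201):
    samples.append(c_true + 0.3 * rng.standard_normal(3))   # noisy cost observation
    c_hi = np.quantile(np.array(samples), 0.95, axis=0)     # upper edge of learned box
    # For x >= 0, the worst case over the box set is attained at its upper edge c_hi,
    # so the robust counterpart is an ordinary LP with cost vector c_hi.
    res = linprog(c_hi, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")
    if t % 100 == 0:
        print(t, res.x, res.fun)
```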
Since the beginnings of Topology, one of the most widely used approaches to studying an object has been to triangulate it. Many invariants have been introduced over the years, the two most important surely being homotopy and homology. However, the direct computation of these invariants is infeasible for triangulations with a large number of faces. This is why methods to reduce the number of faces of a complex without changing its homology and homotopy type are particularly important. In this talk, we will focus on various ways to check whether a complex is contractible, starting from collapsibility and then trying to generalize the concept.
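As a small illustration of the collapsibility end of this spectrum, the following sketch implements greedy elementary collapses: repeatedly remove a free face together with its unique coface. If the complex shrinks to a single vertex it is collapsible; the greedy search can get stuck on some collapsible complexes, so this is only a heuristic certificate, not a decision procedure.

```python
# Greedy elementary collapses on a simplicial complex given by its maximal faces.
from itertools import combinations

def closure(maximal_faces):
    """All nonempty faces of the complex spanned by the given maximal faces."""
    faces = set()
    for f in maximal_faces:
        for k in range(1, len(f) + 1):
            faces.update(frozenset(s) for s in combinations(f, k))
    return faces

def greedy_collapse(faces):
    faces = set(faces)
    changed = True
    while changed:
        changed = False
        for sigma in sorted(faces, key=len):
            cofaces = [tau for tau in faces if sigma < tau]
            if len(cofaces) == 1:                    # sigma is a free face
                faces -= {sigma, cofaces[0]}         # elementary collapse
                changed = True
                break
    return faces

# A solid triangle collapses to a single vertex; its boundary (a circle) does not.
print(len(greedy_collapse(closure([(0, 1, 2)]))))               # 1
print(len(greedy_collapse(closure([(0, 1), (1, 2), (0, 2)]))))  # 6 (no free faces)
```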
Generic rectangulations are tilings of the unit square by rectangles such that no four rectangles share a corner. Generic rectangulations are a fundamental object in computer science: they appear in practice (e.g., in VLSI circuit design) and in theory (e.g., they are closely related to binary search trees), and they arise within many other fundamental combinatorial objects such as permutations. In this work, we present a versatile algorithmic framework for exhaustively generating a large variety of different classes of generic rectangulations. Our algorithms work under very mild assumptions and apply to a considerable number of interesting rectangulation classes known from the literature, such as generic rectangulations, diagonal rectangulations, S-equivalent rectangulations, 1-sided/area-universal rectangulations, and their guillotine variants. The resulting generation algorithms are efficient, in most interesting cases even loopless or constant amortized time, i.e., each new rectangulation is generated in constant time in the worst case or on average, respectively.
In this talk we will present a brief introduction to rectangulations and some of their interesting classes, our algorithmic generation framework for rectangulation classes, and a sketch of the correctness proof of the framework, which relates rectangulations to permutations.
Fast domain propagation of linear constraints has become a crucial component of today’s best algorithms and solvers for mixed integer programming and pseudo-Boolean optimization to achieve peak solving performance. Irregularities in the form of dynamic algorithmic behaviour, dependency structures, and sparsity patterns in the input data make efficient implementations of domain propagation on GPUs and, more generally, on parallel architectures challenging. This is one of the main reasons why domain propagation in state-of-the-art solvers is single-threaded only. In this paper, we present a new algorithm for domain propagation which (a) avoids these problems and allows for an efficient implementation on GPUs, and (b) is capable of running propagation rounds entirely on the GPU, without any need for synchronization or communication with the CPU. We present extensive computational results which demonstrate the effectiveness of our approach and show that ample speedups are possible on practically relevant problems: on state-of-the-art GPUs, our geometric mean speed-up for reasonably large instances is around 10x to 20x and can be as high as 195x on favorably large instances.
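For readers unfamiliar with domain propagation, the sequential bound-tightening rule that the paper parallelizes looks roughly as follows for a single constraint a^T x <= b; this is a hedged, simplified sketch (one pass, no activity updates), not the GPU algorithm itself.

```python
# Activity-based bound tightening for one linear constraint a^T x <= b, l <= x <= u.
import numpy as np

def propagate_leq(a, b, l, u):
    l, u = l.copy(), u.copy()
    # Minimum activity of the constraint over the current bounds.
    min_act = np.sum(np.where(a > 0, a * l, a * u))
    for j in range(len(a)):
        if a[j] == 0:
            continue
        # Minimum activity of all terms except j.
        rest = min_act - (a[j] * l[j] if a[j] > 0 else a[j] * u[j])
        if a[j] > 0:
            u[j] = min(u[j], (b - rest) / a[j])   # tighten upper bound of x_j
        else:
            l[j] = max(l[j], (b - rest) / a[j])   # tighten lower bound of x_j
    return l, u

# Example: 2*x0 + 3*x1 - x2 <= 4 with x in [0, 10]^3 tightens x0 <= 7, x1 <= 14/3.
print(propagate_leq(np.array([2.0, 3.0, -1.0]), 4.0,
                    np.zeros(3), 10.0 * np.ones(3)))
```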
Conditional gradients (CG) constitute a class of projection-free first-order algorithms for smooth convex optimization. As such, they are frequently used for solving smooth convex optimization problems over polytopes, for which the computational cost of projections is prohibitive. We will discuss two topics in this talk. First, CG algorithms do not enjoy the optimal convergence rates achieved by projection-based accelerated first-order methods; achieving such globally accelerated rates is information-theoretically impossible. To address this issue, we present Locally Accelerated Conditional Gradients, an algorithmic framework that couples accelerated steps with conditional gradient steps to achieve local acceleration on smooth strongly convex problems. Our approach does not require projections onto the feasible set, but only onto (typically low-dimensional) simplices, thus keeping the computational cost of projections at bay. Further, it achieves optimal accelerated local convergence.
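For reference, the projection-free building block that LaCG augments with accelerated steps is the vanilla conditional gradient (Frank-Wolfe) iteration; here is a minimal sketch over the probability simplex, where the linear minimization oracle is just a coordinate minimum. This is not the LaCG algorithm itself.

```python
# Vanilla Frank-Wolfe over the probability simplex (illustrative sketch).
import numpy as np

def frank_wolfe_simplex(grad_f, x0, steps=200):
    x = x0.copy()
    for t in range(steps):
        g = grad_f(x)
        v = np.zeros_like(x)
        v[np.argmin(g)] = 1.0            # linear minimization oracle over the simplex
        gamma = 2.0 / (t + 2.0)          # standard open-loop step size
        x = (1 - gamma) * x + gamma * v  # convex combination stays feasible
    return x

# Minimize ||x - y||^2 over the simplex, i.e., approximately project y onto it.
y = np.array([0.9, 0.4, -0.3, 0.1])
x = frank_wolfe_simplex(lambda x: 2 * (x - y), np.ones(4) / 4)
print(x, x.sum())
```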
The second topic of this talk deals with the setting where first- and second-order information about the function is costly to compute, so that we would like to make as much primal progress as possible per oracle call. Constrained second-order convex optimization algorithms are the method of choice when a high-accuracy solution to a problem is needed, due to their local quadratic convergence. These algorithms require the solution of a constrained quadratic subproblem at every iteration. In this setting, we present the Second-Order Conditional Gradient Sliding (SOCGS) algorithm, which uses a projection-free algorithm to solve the constrained quadratic subproblems inexactly and uses inexact Hessian oracles (subject to an accuracy requirement). When the feasible region is a polytope and a lower bound on the primal gap is known, the algorithm converges quadratically in primal gap after a finite number of linearly convergent iterations. Once in the quadratic regime, the SOCGS algorithm requires O(log log (1/ε)) first-order and inexact Hessian oracle calls and O(log (1/ε) ⋅ log log (1/ε)) linear minimization oracle calls to achieve an ε-optimal solution.
We consider a general class of binary packing problems with multiple convex quadratic knapsack constraints. We present three constant-factor approximation algorithms based upon three different algorithmic techniques: (1) a rounding technique tailored to a convex relaxation in conjunction with a non-convex relaxation; (2) a greedy strategy; and (3) a randomized rounding method. The practical performance of these algorithms is evaluated in a computational study of their empirical approximation ratios on problem instances arising in the context of real-world gas transport networks.
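As a toy illustration of the greedy flavor (not the paper's exact greedy strategy or its analysis), one can add items in order of decreasing profit and keep an item only if every convex quadratic knapsack constraint remains satisfied; the data below are made up.

```python
# Toy greedy for binary packing with convex quadratic knapsack constraints
# x^T Q_k x + a_k^T x <= b_k (illustrative only).
import numpy as np

def greedy_pack(profits, Qs, a_s, bs):
    n = len(profits)
    x = np.zeros(n)
    feasible = lambda x: all(x @ Q @ x + a @ x <= b for Q, a, b in zip(Qs, a_s, bs))
    for j in np.argsort(-profits):       # most profitable items first
        x[j] = 1.0
        if not feasible(x):
            x[j] = 0.0                   # revert if any constraint is violated
    return x

profits = np.array([5.0, 4.0, 3.0, 2.0])
Q = 0.5 * np.eye(4)                      # one convex quadratic knapsack constraint
a = np.ones(4)
print(greedy_pack(profits, [Q], [a], [3.0]))   # [1, 1, 0, 0]
```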
The learning of relations, dynamics, and governing equations from (sampled) data is a prevalent problem in various scientific fields. In the absence of noise, this task can often be solved using the Buchberger-Möller Algorithm, see Möller and Buchberger (1982). In practice, however, we usually do not have direct access to the aforementioned equations, but only indirect access via noisy samples. Known methods which extract relations from noisy data are the Approximate Vanishing Ideal Algorithm (AVI) and Vanishing Component Analysis (VCA), see Heldt et al. (2009) and Livni et al. (2013), respectively. These algorithms are often computationally inefficient and only allow for restricted recovery of relations from data. We present a new AVI algorithm and two variants utilizing Conditional Gradient Descent, also known as the Frank-Wolfe Algorithm, see, e.g., Frank and Wolfe (1956) or Levitin and Polyak (1966). Our method is more general, easily regularizable, and can also extract non-homogeneous relations. We then demonstrate the efficacy and versatility of our approach on three learning problems: reconstruction of governing equations from data, classification, and regression.
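To illustrate what "approximately vanishing" means, here is a toy example in the SVD-based spirit of the original AVI algorithm of Heldt et al. (2009), not the CG/Frank-Wolfe-based variants proposed in the talk: the smallest right singular vector of a monomial evaluation matrix recovers the circle equation from noisy samples.

```python
# Recover an approximately vanishing polynomial from noisy samples of the unit circle.
import numpy as np

rng = np.random.default_rng(2)
t = rng.uniform(0, 2 * np.pi, 100)
x = np.cos(t) + 0.01 * rng.standard_normal(100)
y = np.sin(t) + 0.01 * rng.standard_normal(100)

# Monomials up to degree 2: 1, x, y, x^2, x*y, y^2 evaluated at all sample points.
M = np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])
_, S, Vt = np.linalg.svd(M, full_matrices=False)
coeffs = Vt[-1]                      # coefficient vector minimizing ||M c|| on the sphere
print(S[-1])                         # small singular value => approximate vanishing
print(coeffs / np.abs(coeffs[0]))    # close to +-(1, 0, 0, -1, 0, -1), i.e. x^2 + y^2 - 1
```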
This talk is a basic overview of my PhD projects. I focus on a project on making neural networks interpretable. If a neural network outputs a classification result for a high-dimensional input, we would like to know whether a small set of the input parameters was already decisive for the classification. This problem is formalised by introducing a probabilistic variant of prime implicants. We show that this problem is hard for the class NP^PP and remains NP-hard for any non-trivial approximation. We present a heuristic solution strategy for real-world applications that is built on Assumed Density Filtering: a technique to propagate multivariate normal distributions through neural networks. To justify the use of this heuristic we show that in fact no sensible family of distributions can be invariant under ReLU neural network layers.
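The Assumed Density Filtering ingredient can be sketched as follows: since the first two moments of ReLU(z) for Gaussian z are available in closed form, a Gaussian can be moment-matched after every layer. The snippet below treats only the diagonal (independent-coordinates) case for brevity; the setting in the talk uses full multivariate normals.

```python
# Moment matching of a Gaussian through an affine layer followed by ReLU.
import numpy as np
from scipy.stats import norm

def relu_moments(mu, var):
    """Mean and variance of max(0, z) for z ~ N(mu, var), elementwise."""
    sigma = np.sqrt(var)
    alpha = mu / sigma
    mean = mu * norm.cdf(alpha) + sigma * norm.pdf(alpha)
    second = (mu**2 + var) * norm.cdf(alpha) + mu * sigma * norm.pdf(alpha)
    return mean, second - mean**2

def adf_layer(mu, var, W, b):
    mu_lin = W @ mu + b
    var_lin = (W**2) @ var     # ignores input correlations (diagonal ADF only)
    return relu_moments(mu_lin, var_lin)

mu, var = np.array([0.0, 1.0]), np.array([1.0, 0.5])
W, b = np.array([[1.0, -2.0], [0.5, 0.3]]), np.array([0.1, -0.2])
print(adf_layer(mu, var, W, b))
```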
In Computer Vision, algorithms are now capable of “intelligent” automatic processing of images. These possibilities have recently been recycled in many creative processes by A.I. artists, whose outputs have sometimes fetched hammer prices in art auctions. This emergence of A.I. art looks like an artification, i.e., a societal process by which a practice becomes commonly acknowledged as an art form. Art history abounds with such examples, e.g., the transition from craftsmanship to solo painters in the Renaissance.
Hence, rather than assessing the artist’s narratives, this point of view allows us to understand A.I. art as a figurehead of the broader ecosystem of the creative industry, which has indeed benefited greatly from recent advances in Computer Vision.
We discuss here the possible intertwinements between A.I. art, the use of Computer Vision in the next generation of creative software, the creative industry, and broader societal consequences.
This is ongoing work with Louis Thiry and Léa Saint-Raymond at Ecole Normale Supérieure in Paris.
During the COGA Retreat 2019 we started research on the Symmetric Rendezvous-on-the-Line Problem with Unknown Initial Distance: Two robots are placed on the real line at an unknown initial distance from each other. Neither robot knows the location of the other, and they do not necessarily share the same sense of “left” and “right” on the line. They move with constant speed along the line and try to achieve rendezvous using the same (randomized) search strategy in as little time as possible.
In a March 2020 talk (given in the COGA Research Seminar) we presented a family of search strategies that performed well in simulations, but we were not able to determine their exact competitive ratio. In this talk we will present a novel family of search strategies whose precise competitive ratio can be computed using generating functions.
This is work in progress by Guillaume, Khai Van, Martin, and Max.
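As a toy illustration of the setting (not of the strategies from the talk, whose competitive ratios are obtained via generating functions), one can simulate two robots running the same simple randomized doubling strategy and estimate the ratio of meeting time to initial distance empirically; all parameters below are made up.

```python
# Toy simulation: both robots walk legs of doubling length in freshly chosen random
# directions at unit speed; we estimate the meeting-time-to-distance ratio.
import numpy as np

rng = np.random.default_rng(3)

def trajectory(horizon, rng):
    """Breakpoints (times, positions) of the piecewise-linear random doubling walk."""
    times, xs = [0.0], [0.0]
    t, x, leg = 0.0, 0.0, 1.0
    while t < horizon:
        x += rng.choice([-1.0, 1.0]) * leg
        t += leg
        times.append(t)
        xs.append(x)
        leg *= 2
    return np.array(times), np.array(xs)

ratios = []
grid = np.linspace(0.0, 500.0, 200001)
for _ in range(300):
    d = rng.uniform(1.0, 10.0)                        # unknown initial distance
    t1, x1 = trajectory(500.0, rng)
    t2, x2 = trajectory(500.0, rng)
    gap = d + np.interp(grid, t2, x2) - np.interp(grid, t1, x1)
    crossing = np.nonzero(np.diff(np.sign(gap)) != 0)[0]
    if crossing.size:                                 # robots met within the horizon
        ratios.append(grid[crossing[0]] / d)
print(np.median(ratios), len(ratios))                 # empirical meeting-time ratios
```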
Christoph Hertrich (TU Berlin)
This talk deals with an open problem concerning the expressivity of feedforward neural networks (NNs) with rectified linear unit (ReLU) activations. While it has been shown that every continuous, piecewise linear (CPWL) function can be exactly represented by such an NN with logarithmic depth (with respect to the input dimension), it is still unknown whether there exists any CPWL function that cannot be represented with three layers. Based on a yet unproven theoretical assumption, we give a computer-aided proof of the non-existence of a 3-layer ReLU NN that computes the maximum of five scalar input numbers. We achieve this partial result by converting the existence question into a linear programming feasibility problem. This is joint work in progress with Amitabh Basu, Marco Di Summa, and Martin Skutella.
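For context, the flavor of these expressivity questions can be seen in the standard construction showing that the maximum of two numbers needs only a single hidden ReLU layer, via max(x, y) = ReLU(x − y) + ReLU(y) − ReLU(−y); the talk asks the analogous question for the maximum of five numbers and three layers. The check below is a small numpy verification of that known identity, not part of the talk's computer-aided proof.

```python
# One-hidden-layer ReLU network computing max(x, y) exactly.
import numpy as np

W1 = np.array([[1.0, -1.0], [0.0, 1.0], [0.0, -1.0]])   # hidden layer weights
w2 = np.array([1.0, 1.0, -1.0])                          # output layer weights

def two_layer_max(x, y):
    return w2 @ np.maximum(W1 @ np.array([x, y]), 0.0)

for x, y in [(3.0, -1.0), (-2.5, 4.0), (0.0, 0.0), (-1.0, -7.0)]:
    assert np.isclose(two_layer_max(x, y), max(x, y))
print("max(x, y) realized exactly by a one-hidden-layer ReLU network")
```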
The Frank-Wolfe algorithm has become a popular constrained optimization algorithm, as it is simple and projection-free, and it has been successfully applied to a variety of real-world problems. Its main drawback, however, lies in its convergence rate, which can be excessively slow due to naive descent directions. In this talk, we provide a brief review and propose two methods that significantly improve its performance: one for minimizing a convex objective and one for minimizing a (non)convex finite-sum objective. The first method speeds up the Frank-Wolfe algorithm by better aligning the descent direction with the negative gradient via a matching pursuit-style subroutine. Although the idea is reasonably natural, its advantage over state-of-the-art Frank-Wolfe algorithms is quite considerable. The second method blends stochastic Frank-Wolfe algorithms with adaptive gradients, which set entry-wise step-sizes that automatically adjust to the geometry of the problem. Computational experiments on convex and nonconvex objectives demonstrate the advantage of our approach.
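For the second method, the adaptive-gradient ingredient in isolation is the familiar AdaGrad-style entry-wise step size; a minimal unconstrained sketch is below. This is not the combined method from the talk, which blends such step sizes with projection-free stochastic Frank-Wolfe updates.

```python
# AdaGrad-style entry-wise step sizes on a badly scaled quadratic (illustrative only).
import numpy as np

def adagrad_steps(grad_f, x0, eta=0.5, eps=1e-8, steps=500):
    x, accum = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        g = grad_f(x)
        accum += g**2                          # running sum of squared gradients
        x -= eta * g / (np.sqrt(accum) + eps)  # per-coordinate steps adapt to geometry
    return x

A = np.diag([100.0, 1.0])                      # coordinates see very different curvature
print(adagrad_steps(lambda x: A @ x, np.array([1.0, 1.0])))
```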