Registered Data

[02342] On dataset sparsification and data reconstruction in deep learning

  • Session Time & Room : 3D (Aug.23, 15:30-17:10) @E817 A715
  • Type : Proposal of Minisymposium
  • Abstract : Recent successes in deep learning are partly driven by the ability to train overparametrized models on ever larger datasets. However, obtaining similar performance from smaller datasets is clearly computationally advantageous. Furthermore, when large models are trained on large datasets, it has been shown that portions of the data can be reconstructed from the model parameters, which poses a privacy risk. It turns out that dataset reconstruction and dataset distillation are closely related. This minisymposium will bring together researchers working on the latest advances in both dataset reconstruction and distillation.
  • Organizer(s) : Anastasia Borovykh
  • Classification : 68T07, 90C31
  • Minisymposium Program :
    • 02342 (1/1) : 3D @E817 A715 [Chair: Anastasia Borovykh]
      • [03191] Data sampling for surrogate modeling and optimization
        • Format : Talk at Waseda University
        • Author(s) :
          • Tyler H Chang (Argonne National Laboratory)
        • Abstract : In surrogate modeling for nonconvex blackbox optimization, global convergence is driven by the global approximation error of an interpolatory model. For large, complex problems, achieving global model accuracy can be prohibitively expensive, so model-based optimization techniques such as Bayesian optimization rely on adaptive sampling to trade off exploration and exploitation. However, these techniques are known to scale poorly with dimension. Therefore, we analyze an alternative approach based on response surface methodology coupled with a static design of experiments.
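        • Sketch : As a rough Python illustration of the static design-of-experiments approach described in the abstract above, the snippet below fits a quadratic response surface on a static Latin-hypercube design and then minimizes the fitted surrogate on a grid; the toy objective, design size, and plain quadratic model are assumptions made for illustration, not the speaker's implementation.

          # Minimal sketch: fit a quadratic response surface on a static
          # Latin-hypercube design and minimize the surrogate. The blackbox
          # objective `f`, the design size, and the quadratic model are
          # illustrative assumptions.
          import numpy as np

          rng = np.random.default_rng(0)

          def f(x):
              # Toy blackbox objective (assumption): shifted quadratic plus a nonconvex bump.
              return np.sum((x - 0.3) ** 2) + 0.1 * np.sin(10 * x[0])

          def latin_hypercube(n, d):
              # One stratified sample per interval in each coordinate: a basic static design of experiments.
              cols = [(rng.permutation(n).reshape(-1, 1) + rng.random((n, 1))) / n
                      for _ in range(d)]
              return np.hstack(cols)

          def quadratic_features(X):
              # Full quadratic basis: 1, x_i, and x_i * x_j for i <= j.
              n, d = X.shape
              feats = [np.ones((n, 1)), X]
              for i in range(d):
                  for j in range(i, d):
                      feats.append((X[:, i] * X[:, j]).reshape(-1, 1))
              return np.hstack(feats)

          d, n = 2, 40
          X = latin_hypercube(n, d)                     # static design, no adaptive sampling
          y = np.array([f(x) for x in X])
          beta, *_ = np.linalg.lstsq(quadratic_features(X), y, rcond=None)

          # Once the response surface is fit, optimizing it is cheap; here a dense grid suffices.
          grid = np.stack(np.meshgrid(*[np.linspace(0, 1, 101)] * d), axis=-1).reshape(-1, d)
          x_star = grid[np.argmin(quadratic_features(grid) @ beta)]
          print("surrogate minimizer:", x_star, "blackbox value there:", f(x_star))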
      • [03317] Bayesian inference via dataset sparsification
        • Format : Online Talk on Zoom
        • Author(s) :
          • Trevor Campbell (UBC)
        • Abstract : A Bayesian coreset is a sparsified dataset that can be used to reduce the cost of inference. Constructing high-quality coresets remains a challenge. In this work we introduce a new method for coreset construction that involves subsampling the data and then optimizing a variational flow parametrized by the coreset weights. Theoretical results demonstrate that our method achieves exponential data compression in a representative model. Experiments show accurate inference with reduced runtime compared with standard inference methods.
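        • Sketch : As a rough Python illustration of the coreset idea in the abstract above, the snippet below keeps a small, uniformly subsampled, weighted subset of the data and tunes its weights so that the weighted posterior matches the full-data posterior in a conjugate Gaussian-mean model; the model, the plain gradient descent on a KL objective, and all sizes are illustrative assumptions, and the variational-flow construction in the talk is more general.

          # Minimal coreset sketch (illustrative assumptions throughout):
          # subsample the data, then optimize the coreset weights so that the
          # weighted posterior approximates the full-data posterior.
          import numpy as np

          rng = np.random.default_rng(1)
          n, m = 1000, 10                    # full data size, coreset size
          sigma2, tau2 = 1.0, 4.0            # likelihood and prior variances
          x = rng.normal(0.7, np.sqrt(sigma2), size=n)

          def posterior(weights, points):
              # Weighted conjugate posterior N(mu, v) for the unknown mean.
              prec = 1.0 / tau2 + np.sum(weights) / sigma2
              mu = (np.dot(weights, points) / sigma2) / prec
              return mu, 1.0 / prec

          def kl_gauss(mu_q, v_q, mu_p, v_p):
              # KL divergence between two univariate Gaussians.
              return 0.5 * (np.log(v_p / v_q) + (v_q + (mu_q - mu_p) ** 2) / v_p - 1.0)

          idx = rng.choice(n, size=m, replace=False)    # uniform subsample
          pts = x[idx]
          mu_full, v_full = posterior(np.ones(n), x)

          def loss(w):
              mu, v = posterior(w, pts)
              return kl_gauss(mu, v, mu_full, v_full)

          w = np.full(m, n / m)              # start from unbiased subsampling weights
          eps, lr = 1e-5, 20.0
          for _ in range(1000):
              # Finite-difference gradient of the KL with respect to the coreset weights.
              grad = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
                               for e in np.eye(m)])
              w = np.clip(w - lr * grad, 0.0, None)     # keep weights nonnegative

          mu_c, v_c = posterior(w, pts)
          print(f"full posterior  : mean={mu_full:.4f}, var={v_full:.6f}")
          print(f"coreset (m={m}) : mean={mu_c:.4f}, var={v_c:.6f}, KL={loss(w):.2e}")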
      • [03934] Foundations of Information Leakage in Machine Learning
        • Format : Talk at Waseda University
        • Author(s) :
          • Reza Shokri (National University of Singapore)
        • Abstract : This talk will explore the foundations of data privacy in machine learning, with a specific focus on membership inference attacks.
      • [04492] Understanding Reconstruction Attacks with Dataset Distillation
        • Format : Talk at Waseda University
        • Author(s) :
          • Noel Loo (Massachusetts Institute of Technology)
        • Abstract : Dataset reconstruction attacks aim to recover portions of the training data from a trained neural network, given access only to the model parameters. In this talk, we study the efficacy of these attacks in both the infinite- and finite-width regimes and show that they are closely related to dataset distillation. In doing so, we study the properties of the recovered images, namely what makes images easy to reconstruct and how they affect training.
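        • Sketch : As a rough Python illustration of the reconstruction-distillation link in the abstract above, the snippet below trains a linear model by gradient descent from a zero initialization on a single secret example and then, given only the final parameters, searches for a synthetic example that retrains to those same parameters; the linear model, the single-example setting, and the grid search over scales are illustrative assumptions, not the speaker's finite- or infinite-width analysis.

          # Minimal reconstruction-as-distillation sketch (illustrative
          # assumptions throughout): find a synthetic example such that
          # training on it reproduces the observed parameters.
          import numpy as np

          rng = np.random.default_rng(3)
          d = 16                                   # e.g. a flattened 4x4 "image"
          x_secret = rng.normal(size=d)
          y_secret = 1.0

          def train(x, y, steps=500, lr=0.01):
              # Gradient descent from zero init on squared loss for a linear model.
              # From zero init the iterates stay in span{x}, which the attack exploits.
              theta = np.zeros(d)
              for _ in range(steps):
                  theta -= lr * (theta @ x - y) * x
              return theta

          theta_star = train(x_secret, y_secret)   # all the attacker observes

          # Attack: the candidate direction comes from the span property (theta_star is
          # proportional to x_secret here); the scale is recovered by requiring that
          # retraining on the candidate reproduces theta_star (a distillation criterion).
          direction = theta_star / np.linalg.norm(theta_star)
          scales = np.linspace(0.1, 10.0, 200)
          errors = [np.linalg.norm(train(c * direction, 1.0) - theta_star) for c in scales]
          x_hat = scales[int(np.argmin(errors))] * direction

          cos = x_hat @ x_secret / (np.linalg.norm(x_hat) * np.linalg.norm(x_secret))
          print(f"cosine similarity (reconstruction vs. secret): {cos:.4f}")
          print(f"relative scale error: {np.linalg.norm(x_hat - x_secret) / np.linalg.norm(x_secret):.3f}")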