minyoung huh (jacob)

email  /  google scholar  /  github


I am a PhD student working on artificial intelligence, machine learning, and computer vision at MIT CSAIL, working with Phillip Isola and Pulkit Agrawal. I received my Bachelor's from UC Berkeley, where I was advised by Alexei (Alyosha) Efros at Berkeley AI Research (BAIR). Prior to joining BAIR, I worked with Maysam Chamanzar and Michel Maharbiz at Swarm Lab.

I work on developing and understanding artificially intelligent systems. My research centers on the science of deep learning and scale: understanding the behavior and optimization of deep learning models in order to develop more intelligent and efficient learning algorithms.

I spent some time at Google Research, Facebook (Meta) Research, Adobe Research, and Snap Research.

📖 I organize Algorithms That Learn And Scale (ATLAS): a deep learning discussion + seminar group on the latest trends in large-scale AI. MIT folks are welcome to participate!

Papers


Scalable Optimization in the Modular Norm
Tim Large*, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein*
arXiv 2024

We introduce the "modular norm", which we use to normalize weights and updates in neural networks. The modular norm perspective allows learning rates to be transferable across different network widths and depths, simplifying model pre-training by eliminating the need for optimizer-specific scaling. We provide a Python package, modula, an architecture-aware optimization framework.

The Platonic Representation Hypothesis
Minyoung Huh*, Brian Cheung*, Tongzhou Wang*, Phillip Isola*
ICML 2024 (Oral)

We argue that there is a trend towards convergence in AI model representations, highlighting how different neural networks are increasingly aligning in their data representation across multiple domains and modalities. We introduce the concept of Platonic representation, a unifying statistical model of reality, and explore the driving forces, implications, and the challenges to this phenomenon.

Training Neural Networks from Scratch with Parallel Low-Rank Adapters
Minyoung Huh, Brian Cheung, Jeremy Bernstein, Phillip Isola, Pulkit Agrawal
arXiv 2024

We introduce LoRA-the-Explorer (LTE), a novel bi-level optimization algorithm that trains large neural networks using only low-rank adapters. Our method enables low-bandwidth communication with infrequent synchronization, providing a pre-training framework for efficiently training large models.

Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks
Minyoung Huh, Brian Cheung, Pulkit Agrawal, Phillip Isola
ICML 2023

We explore the challenges in training neural networks with vector quantization and straight-through estimation, pinpointing the primary issue as the mismatch between model embeddings and code-vector distribution. To tackle this, we propose affine re-parameterization, alternating optimization, and synchronized commitment loss. We show improved performance on a variety of tasks including image classification and generative modeling.

The Low-Rank Simplicity Bias in Deep Networks
Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, Phillip Isola
TMLR 2023 (arXiv 2021)

We conjecture that deep networks are implicitly biased to find lower-rank solutions and that these are the solutions that generalize well. We further demonstrate that linear over-parameterization can be used as an implicit regularizer to improve generalization without changing the effective model capacity.

Totems: Physical Objects for Verifying Visual Integrity
Jingwei Ma, Lucy Chai, Minyoung Huh, Tongzhou Wang, Phillip Isola, Antonio Torralba
ECCV 2022

We introduce a new approach to image forensics: placing physical refractive objects, which we call totems, into a scene so as to protect any photograph taken of that scene. Totems bend and redirect light rays, thus providing multiple, albeit distorted, views of the scene within a single image, which we unscramble to reconstruct the underlying 3D scene.

Learning to Ground Multi-Agent Communication with Autoencoders
Toru Lin, Minyoung Huh, Phillip Isola
NeurIPS 2021

We demonstrate a simple way to ground language in learned representations, which facilitates decentralized multi-agent communication and coordination. We find that a standard representation learning algorithm -- autoencoding -- is sufficient for arriving at a grounded common language.

Transforming and Projecting Images into Class-conditional Generative Networks
Minyoung Huh, Richard Zhang, Jun-Yan Zhu, Sylvain Paris, Aaron Hertzmann
ECCV 2020 (Oral)

We propose a method for projecting images into the space of generative neural networks. We optimize for a transformation to counteract the model biases in generative neural networks. We further propose the use of hybrid (gradient + CMA-ES) optimization to improve model inversion.

Feedback Adversarial Learning: Spatial Feedback for Improving Generative Adversarial Networks
Minyoung Huh*, Shao-Hua Sun*, Ning Zhang
CVPR 2019

We propose a feedback adversarial learning (FAL) framework that can improve existing generative adversarial networks by feeding spatial feedback from the discriminator back into the generative process.

Fighting Fake News: Detecting Malicious Image Manipulations via Learned Self-Consistency
Minyoung Huh*, Andrew Liu*, Alexei A. Efros, Andrew Owens
ECCV 2018

We propose a learning algorithm for detecting visual image manipulations that is trained using only a large dataset of real photographs. The algorithm uses the automatically recorded photo EXIF metadata as a supervisory signal for training a model to determine whether an image is self-consistent.

Multi-view to Novel view: Synthesizing Views via Self-Learned Confidence
Shao-Hua Sun, Minyoung Huh, Yuan-Hong Liao, Ning Zhang, Joseph J. Lim
ECCV 2018

We propose an end-to-end trainable framework that learns to exploit multiple viewpoints to synthesize a novel view without any 3D supervision. Our model consists of a flow prediction module and a pixel generation module to directly leverage information presented in source views as well as hallucinate missing pixels from statistical priors.

What makes ImageNet good for Transfer Learning?
Minyoung Huh, Pulkit Agrawal, Alexei A. Efros
NeurIPS workshop 2018

This work provides an empirical investigation into the various facets of this question, such as the importance of the number of examples, the number of classes, the balance between images-per-class and classes, and the role of fine- and coarse-grained recognition. We pre-train CNN features on various subsets of the ImageNet dataset and evaluate transfer performance on a variety of standard vision tasks.

Ultrasonic sculpting of virtual, steerable optical waveguides in tissue
Maysam Chamanzar, Matteo Giuseppe Scopelliti, Julien Bloch, Ninh Do, Minyoung Huh, Dongjin Seo, Jillian Iafrati, Vikaas S. Sohal, Mohammad-Reza Alam, Michel M. Maharbiz
Nature Communications 2019

We demonstrate that ultrasound can be used to define and steer the trajectory of light within scattering media by exploiting local pressure differences created by acoustic waves that result in refractive index contrasts.

Virtual Acousto-optic Beam Paths for Steerable Deep-tissue Optical Stimulation and Imaging
Maysam Chamanzar, Minyoung Huh, Ninh Do, Mohammad-Reza Alam, Michel M. Maharbiz
CLEO 2016 & SfN 2016

We present the first non-invasive methodology for delivering and steering light deep inside the brain: ultrasonic waves create reconfigurable light paths by modulating the refractive and diffractive properties of the medium.



Teaching



Pointers

Trying something new: a few pointers for people interested in ML/AI research that have been fruitful for me during my PhD. I will update this as I scavenge my chat history and find interesting reads.

Food for thought
Ethics and Integrity
How to do research
Lectures


Miscellaneous