ICML 2024 compressed

This page contains one-sentence summaries of ICML 2024 accepted papers, generated by Compressor, my personal LLM-based project.


A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer

https://openreview.net/forum?id=zxxSJAVQPc

Compressor summary: GraphsGPT uses a Graph2Seq encoder to transform Non-Euclidean graphs into learnable Graph Words, and a GraphGPT decoder to reconstruct the original graph, achieving state-of-the-art results in graph representation learning and generation.


Understanding Server-Assisted Federated Learning in the Presence of Incomplete Client Participation

https://openreview.net/forum?id=zwUEk9WpsR

Compressor summary: The text discusses how server-assisted federated learning (SA-FL) can address incomplete client participation and provide theoretical and practical benefits over conventional FL.


Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

https://openreview.net/forum?id=ztn8FCR1td

Compressor summary: The paper presents a new architecture (Mamba-2) that combines state-space models and attention, improving the speed and performance of language modeling.


Rethinking Independent Cross-Entropy Loss For Graph-Structured Data

https://openreview.net/forum?id=zrQIc9mQQN

Compressor summary: The paper proposes a new method to improve graph neural networks for node classification by modeling the joint distribution of nodes and their clusters, enhancing accuracy and robustness against adversarial attacks.


Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks

https://openreview.net/forum?id=znz261CQK7

Compressor summary: The paper studies random and SGD-optimized neural networks with zero training error, finding that overparameterization in width helps generalization due to an SGD bias, whereas overparameterization in depth harms it; because random and SGD-optimized deep networks behave similarly, the depth effect points to an architectural rather than an optimization bias.


S3GCL: Spectral, Swift, Spatial Graph Contrastive Learning

https://openreview.net/forum?id=znKAWRZSF9

Compressor summary: S3GCL is a new graph representation learning method that addresses challenges of homophily assumptions and scalability by using cosine-parameterized Chebyshev polynomials as filters and an MLP encoder with positive pairs for context awareness.


Trustworthy Actionable Perturbations

https://openreview.net/forum?id=zkjGpZrIX3

Compressor summary: The text introduces Trustworthy Actionable Perturbations (TAP), a framework to modify inputs that change true class probabilities instead of fooling classifiers, with verification, cost, reward, and goal definitions for real-world applications.


Don’t Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget

https://openreview.net/forum?id=zkcya47Sq5

Compressor summary: When comparing two binary classifiers on a fixed labeling budget, it is better to collect a single noisy label for many samples than to aggregate multiple labels per sample via majority vote; this increases the accuracy of the comparison and yields better sample-size bounds.
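
As a quick sanity check of this claim, here is a minimal simulation sketch (not the paper's code): with a fixed budget of label queries, we compare labeling many examples once against labeling a third as many examples three times with majority vote, and ask which strategy more often identifies the truly better classifier. The accuracies and noise rate are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
B, eta = 900, 0.3            # label budget and annotator error rate (assumed)
acc1, acc2 = 0.82, 0.80      # true accuracies of the two classifiers (assumed)

def picks_better_classifier(n_samples, votes_per_sample):
    y = rng.integers(0, 2, n_samples)                      # ground truth
    c1 = np.where(rng.random(n_samples) < acc1, y, 1 - y)  # classifier 1 predictions
    c2 = np.where(rng.random(n_samples) < acc2, y, 1 - y)  # classifier 2 predictions
    flips = rng.random((votes_per_sample, n_samples)) < eta
    votes = np.where(flips, 1 - y, y)                      # noisy annotator labels
    y_hat = (votes.mean(axis=0) > 0.5).astype(int)         # majority-vote reference
    return (c1 == y_hat).mean() > (c2 == y_hat).mean()     # did we pick classifier 1?

trials = 2000
single = np.mean([picks_better_classifier(B, 1) for _ in range(trials)])
voted = np.mean([picks_better_classifier(B // 3, 3) for _ in range(trials)])
print(f"P(correct winner) | single label: {single:.3f}, majority of 3: {voted:.3f}")
```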


Flexible Residual Binarization for Image Super-Resolution

https://openreview.net/forum?id=zji9DLksTz

Compressor summary: FRB is a method to improve binary image super-resolution by using second-order residual binarization and distillation-guided binarization training, achieving better results than existing methods.


Improving Factuality and Reasoning in Language Models through Multiagent Debate

https://openreview.net/forum?id=zj7YuTE4t8

Compressor summary: The paper proposes a method where multiple language models debate and improve each other's responses, enhancing reasoning and validity of generated content.


Conditionally-Conjugate Gaussian Process Factor Analysis for Spike Count Data via Data Augmentation

https://openreview.net/forum?id=zgiT3uxvCF

Compressor summary: The paper proposes ccGPFA, a conditionally-conjugate method for modeling neural activity from spike count data that enables tractable inference using variational EM and sparse Gaussian Processes.


Individualized Privacy Accounting via Subsampling with Applications in Combinatorial Optimization

https://openreview.net/forum?id=zfmwAaB9Nw

Compressor summary: The paper proposes a new technique to improve privacy analysis and algorithms for combinatorial optimization problems and heavy hitter identification in data streams.


TIC-TAC: A Framework For Improved Covariance Estimation In Deep Heteroscedastic Regression

https://openreview.net/forum?id=zdNTiTs5gU

Compressor summary: The paper improves covariance estimation in deep heteroscedastic regression by incorporating gradient and curvature information, and introduces a metric for evaluating covariance accuracy without ground-truth supervision.


ODIN: Disentangled Reward Mitigates Hacking in RLHF

https://openreview.net/forum?id=zcIV8OQFVF

Compressor summary: The paper investigates reward hacking in Reinforcement Learning from Human Feedback (RLHF) for large language models (LLMs) and proposes an improved evaluation protocol and a disentangled reward model to address this issue.


Differentially Private Sum-Product Networks

https://openreview.net/forum?id=zc3bAEI5lp

Compressor summary: The paper proposes a novel method using sum-product networks to create a single model that performs both privacy-preserving classification and data generation, outperforming existing approaches in stability and utility.


Contrastive Predict-and-Search for Mixed Integer Linear Programs

https://openreview.net/forum?id=zatLnLvbs8

Compressor summary: ConPaS is a new machine learning framework that predicts and fixes integer variable assignments to solve MILPs faster and better than existing methods.


A Distributional Analogue to the Successor Representation

https://openreview.net/forum?id=zajsXCxMgW

Compressor summary: The paper introduces a distributional successor measure for reinforcement learning that separates transition structure from reward, describes the distributional consequences of behavior, can be learned from data, and enables zero-shot risk-sensitive policy evaluation.


Handling Heterogeneous Curvatures in Bandit LQR Control

https://openreview.net/forum?id=zWIS8I9G9B

Compressor summary: The paper studies online LQR control with heterogeneous cost curvatures and provides a novel analysis using Newton decrement to improve adaptivity and performance.


DSD-DA: Distillation-based Source Debiasing for Domain Adaptive Object Detection

https://openreview.net/forum?id=zS8zUuAU8T

Compressor summary: The paper proposes a new framework that uses distillation and target-relevant information to improve domain adaptive object detection, addressing source bias and achieving better classification and localization in both domains.


Homomorphism Counts for Graph Neural Networks: All About That Basis

https://openreview.net/forum?id=zRrzSLwNHQ

Compressor summary: The paper proposes a new method for graph neural networks that counts all structures in a pattern basis, improving their expressive power and counting abilities without increasing complexity.


Agent Instructs Large Language Models to be General Zero-Shot Reasoners

https://openreview.net/forum?id=zMwFvxr6CV

Compressor summary: The paper proposes an agent that generates instructions to improve the zero-shot reasoning abilities of large language models on various tasks and datasets, achieving state-of-the-art performance.


Deep Networks Always Grok and Here is Why

https://openreview.net/forum?id=zMue490KMr

Compressor summary: Grokking is a widespread phenomenon in deep neural networks where generalization occurs long after zero training error, and it leads to delayed robustness against adversarial examples due to the linearization of the network mapping.


Impact of Decentralized Learning on Player Utilities in Stackelberg Games

https://openreview.net/forum?id=zMsMQJraEj

Compressor summary: The paper studies how two learning agents interact over time, shows that existing benchmarks don't capture their dynamics well, and proposes new algorithms and environments to improve their performance.


A Rate-Distortion View of Uncertainty Quantification

https://openreview.net/forum?id=zMGUDsPopK

Compressor summary: The paper proposes a new method called Distance Aware Bottleneck that enhances deep neural networks with uncertainty estimation and out-of-distribution detection by learning a codebook of input representations.


GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

https://openreview.net/forum?id=zL9q2JD1dC

Compressor summary: GALA3D is a framework that uses language models to generate realistic 3D scenes with controllable editing and consistent properties.


Transitional Uncertainty with Layered Intermediate Predictions

https://openreview.net/forum?id=zII3Olw7cr

Compressor summary: The paper presents TULIP, a method to improve single-pass uncertainty estimation in neural networks by preserving feature distances at intermediate layers.


Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples

https://openreview.net/forum?id=zFHaB7KESM

Compressor summary: The paper proposes a method to learn general nonlinear representations from multiple data sources with different distributions and dependencies, providing statistical guarantees and requiring fewer samples than existing methods.


SignSGD with Federated Defense: Harnessing Adversarial Attacks through Gradient Sign Decoding

https://openreview.net/forum?id=zEqeNEuiJr

Compressor summary: The paper proposes signSGD-FD, a distributed learning optimizer that uses federated defense to handle both honest and adversarial workers, ensuring fast convergence even under attacks.


On a Neural Implementation of Brenier's Polar Factorization

https://openreview.net/forum?id=zDCwJQY3eI

Compressor summary: The paper proposes a practical implementation of the polar factorization theorem for vector fields, using neural networks and optimal transport to parameterize the components, and explores its applications in machine learning.


Individual Contributions as Intrinsic Exploration Scaffolds for Multi-agent Reinforcement Learning

https://openreview.net/forum?id=zCmMkWK4Ly

Compressor summary: The paper proposes ICES, a method to encourage multi-agent exploration in sparse reward environments by assessing each agent's contribution and using global transition information during training.


Overcoming the Optimizer's Curse: Obtaining Realistic Prescriptions from Neural Networks

https://openreview.net/forum?id=zB6VQzDmK8

Compressor summary: The paper proposes a method to obtain realistic decisions from ReLU networks by modeling the data manifold as an optimization constraint and using adaptive sampling to solve the problem efficiently.


What is Dataset Distillation Learning?

https://openreview.net/forum?id=z8sYc334fU

Compressor summary: The study examines the behavior, representativeness, and point-wise information content of dataset distillation, finding that it retains high task performance by compressing early training dynamics and contains meaningful semantic information.


Characterizing ResNet's Universal Approximation Capability

https://openreview.net/forum?id=z7zHsNFXHc

Compressor summary: ResNet can efficiently approximate polynomials and smooth functions with fewer tunable weights than ReLU networks, achieving optimal approximation rates.


DITTO: Diffusion Inference-Time T-Optimization for Music Generation

https://openreview.net/forum?id=z5Ux2u6t7U

Compressor summary: DITTO is a framework to control text-to-music diffusion models without fine-tuning by optimizing initial noise latents and achieving high quality, flexible, and efficient music generation.


Dynamic Metric Embedding into lp Space

https://openreview.net/forum?id=z3PUNzdmGs

Compressor summary: The paper presents a new algorithm for maintaining a randomized mapping of a weighted graph to an $\ell_p$ space while preserving the distance between vertices, with low expected distortion and fast update time, even in dynamic settings where edge weights change over time.


Demystifying SGD with Doubly Stochastic Gradients

https://openreview.net/forum?id=z373OXJXWU

Compressor summary: This paper analyzes the convergence properties of doubly stochastic gradient descent (doubly SGD), a popular optimization strategy for problems with intractable expectations, and shows that random reshuffling can improve its performance.


Triple Changes Estimator for Targeted Policies

https://openreview.net/forum?id=yzNEkTmcoF

Compressor summary: The paper proposes and tests a new estimator called 'triple changes' that allows for a more nuanced assessment of causal effects by incorporating information from potential outcomes, compared to existing methods like difference-in-differences or triple differences.


By Tying Embeddings You Are Assuming the Distributional Hypothesis

https://openreview.net/forum?id=yyYMAprcAR

Compressor summary: Tying input and output embeddings in language models implicitly assumes Harris's distributional hypothesis, relating word embeddings to the contextual similarity of words.
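
For readers unfamiliar with the practice being analyzed, here is a minimal sketch of input-output embedding tying (standard weight tying; the tiny architecture below is illustrative, not from the paper):

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)       # input embeddings
        self.body = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        self.head.weight = self.embed.weight                 # tie output to input embeddings

    def forward(self, tokens):
        h, _ = self.body(self.embed(tokens))
        return self.head(h)                                  # logits over the vocabulary

model = TinyLM()
logits = model(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1000])
```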


Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process

https://openreview.net/forum?id=ytz2naZoDB

Compressor summary: The paper presents a policy gradient framework for training SDEs that constrains them to be consistent with their associated perturbation processes, which cover the entire space and are easy to sample; applied to structure-based drug design, the method efficiently generates high-reward samples and improves the binding affinity of generated ligand molecules.


Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning

https://openreview.net/forum?id=yrFUJzcTsk

Compressor summary: HesScale is a fast and stable approximation of Hessian diagonals that outperforms existing methods in small networks for second-order optimization and reinforcement learning problems.


Scaling Laws for Fine-Grained Mixture of Experts

https://openreview.net/forum?id=yoqdlynCRs

Compressor summary: This paper studies how to adjust the size of experts in Mixture of Experts (MoE) models for efficient language processing, introducing a new hyperparameter called granularity and showing how it affects scaling laws.


Rethinking Guidance Information to Utilize Unlabeled Samples: A Label Encoding Perspective

https://openreview.net/forum?id=yoTCwNqQS6

Compressor summary: The paper proposes a new method called Label-Encoding Risk Minimization (LERM) that uses unlabeled samples to improve the learning of labeled samples and balance prediction discriminability and diversity in scenarios with insufficient labels.


Position: AI/ML Influencers Have a Place in the Academic Process

https://openreview.net/forum?id=yo9Jyt3XCY

Compressor summary: The paper examines how social media influencers boost the visibility and citations of AI and ML papers, and suggests they should promote diversity in their content.


Estimating Barycenters of Distributions with Neural Optimal Transport

https://openreview.net/forum?id=ymgcTqrZLT

Compressor summary: The paper proposes a new method to find an average distribution from multiple probability measures using neural networks and optimal transport, with theoretical guarantees and applications to image data.


Incentivized Learning in Principal-Agent Bandit Games

https://openreview.net/forum?id=ykgZk6vFrh

Compressor summary: This paper studies a repeated game where a principal tries to learn how to incentivize an agent to act in the principal's best interest in various settings, motivated by applications like healthcare and ecology.


Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical

https://openreview.net/forum?id=ykZYLBcA9g

Compressor summary: The paper proposes a new consistent method for complementary-label learning that does not rely on the uniform distribution assumption or ordinary-label training sets, and shows its effectiveness in experiments.


Bootstrap AutoEncoders With Contrastive Paradigm for Self-supervised Gaze Estimation

https://openreview.net/forum?id=ykRY34kL3j

Compressor summary: The paper proposes a new method called BeCa, which combines generative and contrastive methods for self-supervised full-face gaze estimation and improves performance over existing approaches on various datasets.


Amortizing Pragmatic Program Synthesis with Rankings

https://openreview.net/forum?id=yj8h567Ia7

Compressor summary: The paper proposes a method to speed up pragmatic program synthesis by using a global ranking derived from partial rankings of programs generated by an exact RSA synthesizer.


Provably Robust DPO: Aligning Language Models with Noisy Feedback

https://openreview.net/forum?id=yhpDKSw7yA

Compressor summary: This paper proposes a framework for policy optimization with random preference flips, introduces a novel loss function to debias the effect of noise, and proves its sub-optimality bound in theory.


Accelerating Legacy Numerical Solvers by Non-intrusive Gradient-based Meta-solving

https://openreview.net/forum?id=yh6Y7ppf46

Compressor summary: The paper proposes a novel method to combine machine learning and legacy numerical codes without modification for faster scientific computing.


Learning Shadow Variable Representation for Treatment Effect Estimation under Collider Bias

https://openreview.net/forum?id=ycXo4tQIpN

Compressor summary: The paper proposes a method to learn shadow variables from observational data for estimating treatment effects in the presence of collider bias, using hypothesis testing and a novel treatment effect estimator.


Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks

https://openreview.net/forum?id=ycLHJuLYuD

Compressor summary: SAFECLIP is a defense method for pre-training CLIP models against targeted data poisoning and backdoor attacks by using unimodal contrastive learning and dividing data into safe and risky sets.


Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints

https://openreview.net/forum?id=yb5xV8LFDq

Compressor summary: The paper proposes a new method for selecting small coresets that maintain model performance while minimizing the coreset size in deep learning algorithms, and provides theoretical and empirical results to support its effectiveness.


CLIF: Complementary Leaky Integrate-and-Fire Neuron for Spiking Neural Networks

https://openreview.net/forum?id=yY6N89IlHa

Compressor summary: The paper introduces CLIF, a new neuron model for spiking neural networks that improves accuracy by facilitating the backpropagation of temporal gradients, and shows its superior performance over other models and conventional ANNs on various datasets.


Towards Generalization beyond Pointwise Learning: A Unified Information-theoretic Perspective

https://openreview.net/forum?id=yXlQL9goY8

Compressor summary: The paper develops new information-theoretic methods to analyze and compare different types of contrastive learning algorithms, including pointwise, pairwise, triplet, quadruplet, and higher-order scenarios.


COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability

https://openreview.net/forum?id=yUxdk32TU6

Compressor summary: The paper presents a novel framework for generating diverse and controllable adversarial attacks on large language models using an efficient text generation algorithm.


Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies towards Equal Long-term Benefit Rate

https://openreview.net/forum?id=yUPBkPKzHw

Compressor summary: ELBERT is a long-term fairness concept for machine learning models in sequential decision-making that reduces biases and maintains high utility.


Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture

https://openreview.net/forum?id=yTz0u4B8ug

Compressor summary: Memoria is a novel artificial neural network memory system inspired by human memory that outperforms conventional techniques in various tasks.


Performance Bounds for Active Binary Testing with Information Maximization

https://openreview.net/forum?id=yTXv8KDD1P

Compressor summary: The paper proposes tight non-vacuous bounds on the greedy heuristic InfoMax for binary test prediction, assuming that the conditional probability of a test being 'true' is within $\delta$ units of one-half, and analyzes two scenarios with modest values of $\delta$.
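
The InfoMax heuristic being bounded is simple to state: always ask the remaining test whose probability of answering 'true' under the current posterior is closest to one-half. Below is a toy sketch under a noiseless-test assumption (the paper's $\delta$ quantifies how close these probabilities are to one-half); the hypotheses and tests are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hyp, n_tests = 16, 10
outcomes = rng.integers(0, 2, (n_tests, n_hyp))  # each test's answer per hypothesis
posterior = np.full(n_hyp, 1 / n_hyp)            # uniform prior over hypotheses
truth = 3                                        # the unknown hypothesis to identify

for _ in range(5):
    p_true = outcomes @ posterior                # P(test answers 'true') per test
    t = np.argmin(np.abs(p_true - 0.5))          # InfoMax: pick the test nearest 1/2
    result = outcomes[t, truth]                  # observe the (noiseless) answer
    posterior *= (outcomes[t] == result)         # Bayes update
    posterior /= posterior.sum()

print("posterior mass on the true hypothesis:", posterior[truth])
```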


${\rm E}(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

https://openreview.net/forum?id=yShA4VPYZB

Compressor summary: This paper explores Euclidean symmetries in cooperative multi-agent reinforcement learning, designing neural networks with symmetric constraints that improve performance and generalization in various applications.


High-Dimensional Geometric Streaming for Nearly Low Rank Data

https://openreview.net/forum?id=yQfA0etfB7

Compressor summary: The text studies algorithms for approximating subspaces of points in the $\ell_p$ norm, giving fast and strong coreset constructions that can be applied to various geometric problems.


ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections

https://openreview.net/forum?id=yPDTXQwUPy

Compressor summary: The paper introduces *ETHER*, a new method to fine-tune foundation models with fewer parameters, less performance deterioration, and hyperparameter robustness.


RODEO: Robust Outlier Detection via Exposing Adaptive Out-of-Distribution Samples

https://openreview.net/forum?id=yOe5lqDPvM

Compressor summary: RODEO is a data-centric approach that uses text-to-image models to generate diverse and near-distribution outliers for robust outlier detection in adversarial settings.


Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration

https://openreview.net/forum?id=yL6hljtjW4

Compressor summary: The paper proposes a method called STATA that improves the efficiency of training Spiking Transformers by sparsifying tokens, identifying important ones across timesteps, and aligning attention maps.


Performative Prediction with Bandit Feedback: Learning through Reparameterization

https://openreview.net/forum?id=yHs3jIPgaF

Compressor summary: The paper proposes a framework to study social prediction without assuming convexity, known mapping, or first-order information, by reparameterizing the objective function and optimizing it iteratively.


Provable Benefits of Local Steps in Heterogeneous Federated Learning for Neural Networks: A Feature Learning Perspective

https://openreview.net/forum?id=yHRxnhKyEJ

Compressor summary: This paper investigates the effect of local steps in Federated Learning on feature learning and generalization, and shows that they help improve performance and reduce communication costs.


Domain Generalisation via Imprecise Learning

https://openreview.net/forum?id=yFUdZfbEme

Compressor summary: The Imprecise Domain Generalisation framework helps machine learners deal with uncertainty in out-of-distribution generalisation by allowing them to optimise against a range of strategies during training and enabling operators to specify their preferences at deployment time.


Variational Partial Group Convolutions for Input-Aware Partial Equivariance of Rotations and Color-Shifts

https://openreview.net/forum?id=yDXnXJE1RK

Compressor summary: The paper proposes VP G-CNNs, a novel approach to capture varying levels of partial equivariance for different data instances using variational inference and redesigned distributions.


On the Implicit Bias of Adam

https://openreview.net/forum?id=y8YovS0lOg

Compressor summary: The paper investigates how different optimization algorithms like RMSProp and Adam regularize solutions during training by analyzing their corresponding ordinary differential equations and their dependence on hyperparameters.


Neural Collapse in Multi-label Learning with Pick-all-label Loss

https://openreview.net/forum?id=y8NevOhrnW

Compressor summary: The paper investigates neural collapse in multi-label classification and shows how it affects feature representation and optimizer behavior, leading to improved performance.


Roping in Uncertainty: Robustness and Regularization in Markov Games

https://openreview.net/forum?id=y6y2HauOpR

Compressor summary: The paper studies robust Markov games with rectangular uncertainty, shows a connection between robust Nash equilibrium and regularized Markov games, provides a planning algorithm and provable guarantees, and identifies a special class of games that can be solved efficiently for reward-uncertain two-player zero-sum cases.


Evolution-Inspired Loss Functions for Protein Representation Learning

https://openreview.net/forum?id=y5L8W0KRUX

Compressor summary: Evolutionary Ranking (EvoRank) is a new training objective for AI-based protein engineering frameworks that uses evolutionary information from multiple sequence alignments to improve mutation effect predictions and learn diverse protein representations.


Neural Diffusion Models

https://openreview.net/forum?id=xzX7kf486K

Compressor summary: Neural Diffusion Models generalize conventional diffusion models by enabling non-linear transformations of data, improving generative tasks performance and sample quality.


Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

https://openreview.net/forum?id=xye7iNsgXn

Compressor summary: The paper proposes HSTU, a new Transformer-based architecture for streaming recommendation data that handles high-cardinality, heterogeneous features, scales with compute where DLRMs fail, runs faster than FlashAttention2-based Transformers, and improves metrics in online A/B tests while reducing carbon footprint.


Exploring Intrinsic Dimension for Vision-Language Model Pruning

https://openreview.net/forum?id=xxL7CEWuxz

Compressor summary: The paper explores how intrinsic dimension can be used as a metric to measure and improve the prunability of vision-language models by modality, finding that visual representations are more crucial while language representations are more robust.


Convergence Guarantees for the DeepWalk Embedding on Block Models

https://openreview.net/forum?id=xwxUbBHC1q

Compressor summary: The paper analyzes the convergence properties and theoretical guarantees of DeepWalk algorithm on graphs generated by Stochastic Block Model, a simple model to study algorithm behavior on large graphs.


Learning Graph Representation via Graph Entropy Maximization

https://openreview.net/forum?id=xwOENWCo46

Compressor summary: The paper proposes a method to learn diverse graph representations by approximating graph entropy using orthonormal neural networks, which improve performance in unsupervised and semi-supervised learning tasks.


Drug Discovery with Dynamic Goal-aware Fragments

https://openreview.net/forum?id=xuX2rDSSco

Compressor summary: GEAM is a molecular generative framework for drug discovery that considers target chemical properties and updates its fragment vocabulary dynamically during the generation process.


Memory Efficient Neural Processes via Constant Memory Attention Block

https://openreview.net/forum?id=xtwCf7iAs2

Compressor summary: The paper introduces CMANPs, a variant of Neural Processes that uses constant memory and attention blocks to model predictive uncertainty efficiently.


Position: Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination

https://openreview.net/forum?id=xtKWwB6lzT

Compressor summary: The paper critiques offline reinforcement learning in dynamic treatment regimes, citing concerns about evaluation metrics, baselines, and RL formulations; it also presents a case study showing RL performance variations and the possibility of random baselines outperforming RL algorithms.


Membership Inference Attacks on Diffusion Models via Quantile Regression

https://openreview.net/forum?id=xqqccG7gf1

Compressor summary: The paper describes a privacy vulnerability of diffusion models used for image synthesis and proposes an improved membership inference attack that uses quantile regression and bootstrapping to detect whether a given example was in the training data, with higher accuracy and lower computational cost than previous methods.
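
A hedged sketch of the general mechanism (not the paper's pipeline): fit a regressor for a low quantile of the loss a model assigns to non-members, then flag examples whose observed loss falls below their predicted per-example threshold. All features and losses below are synthetic placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, d = 2000, 8
feats = rng.normal(size=(n, d))                  # per-example features (placeholder)
scale = 1.0 + 0.3 * feats[:, 0] ** 2             # example-dependent loss scale
loss_nonmember = scale * rng.gamma(2.0, 1.0, n)  # model losses on held-out data

# Fit the 5% quantile of non-member loss as a function of the features.
q05 = GradientBoostingRegressor(loss="quantile", alpha=0.05)
q05.fit(feats, loss_nonmember)

# Members tend to have anomalously low loss; flag losses under the threshold.
member = rng.random(n) < 0.5                     # synthetic membership ground truth
loss_query = scale * rng.gamma(2.0, 1.0, n) * np.where(member, 0.4, 1.0)
flagged = loss_query < q05.predict(feats)
print("true positive rate:", flagged[member].mean(),
      "false positive rate:", flagged[~member].mean())
```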


Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models

https://openreview.net/forum?id=xpSlt67vxQ

Compressor summary: R-Bench is a benchmark for evaluating relationship hallucinations in large vision-language models by testing their understanding of visual relationships and image content.


RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

https://openreview.net/forum?id=xnQ1qoly7Q

Compressor summary: The paper introduces RoboCodeX, a framework for generating detailed robotic actions from high-level human instructions using tree-structured multimodal code generation and pre-training with a specialized dataset.


Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape

https://openreview.net/forum?id=xm2lU7tteQ

Compressor summary: The paper explores how Transformer models with a nonlinear representation layer can learn effectively in context, and proves that their loss landscape is benign and stable under certain conditions.


The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning

https://openreview.net/forum?id=xlr6AUDuJz

Compressor summary: The authors introduce the WMDP benchmark to measure hazardous knowledge in LLMs and propose RMU, an unlearning method that reduces malicious use while preserving general capabilities.


InstructSpeech: Following Speech Editing Instructions via Large Language Models

https://openreview.net/forum?id=xlWcdtCyOC

Compressor summary: This paper presents InstructSpeech, a multi-task language model that can edit speech based on natural language instructions, achieving state-of-the-art results in eleven tasks.


An Analysis of Linear Time Series Forecasting Models

https://openreview.net/forum?id=xl82CcbYaT

Compressor summary: The paper shows that simple linear models and their variants perform well in time series forecasting and can be interpreted as unconstrained linear regression over expanded features, leading to better forecasts.
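
The observation is easy to reproduce in miniature: a direct linear forecaster is just least-squares regression from the last $L$ observations to the next $H$ values, solvable in closed form. A sketch on synthetic data (sizes and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
series = np.sin(np.arange(2000) * 0.05) + 0.1 * rng.normal(size=2000)
L, H = 96, 24  # lookback window and forecast horizon

# Stack (window, horizon) training pairs from the series.
N = len(series) - L - H
X = np.stack([series[i:i + L] for i in range(N)])
Y = np.stack([series[i + L:i + L + H] for i in range(N)])

# Ordinary least squares: one weight matrix maps a window to H forecasts.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
forecast = series[-L:] @ W
print(forecast[:5])
```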


Improved Differentially Private and Lazy Online Convex Optimization: Lower Regret without Smoothness Requirements

https://openreview.net/forum?id=xl2yU3dsHK

Compressor summary: The paper presents new differentially private algorithms for online convex optimization that work well with non-smooth and high-dimensional loss functions, using sampling from log-concave densities and rejection sampling.


Langevin Policy for Safe Reinforcement Learning

https://openreview.net/forum?id=xgoilgLPGD

Compressor summary: The paper introduces Langevin Actor-Critic (LAC), a method that combines sampling-based Langevin policy with optimization-based actor-critic to enable safe reinforcement learning in complex tasks.


S3O: A Dual-Phase Approach for Reconstructing Dynamic Shape and Skeleton of Articulated Objects from Single Monocular Video

https://openreview.net/forum?id=xcyKKACmSd

Compressor summary: S3O is a novel method that learns parametric models for shape and motion from monocular videos without requiring additional annotations or extensive computational resources, improving 3D reconstruction accuracy and robustness.


CHAI: Clustered Head Attention for Efficient LLM Inference

https://openreview.net/forum?id=xcDRx8vzCa

Compressor summary: CHAI reduces memory and compute requirements for large language models by clustering correlated attention heads at runtime.
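
A toy sketch of the core idea (not CHAI's runtime system): cluster heads whose attention outputs are strongly correlated, so that each cluster could be served by a single representative head. The head outputs below are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n_heads, seq, d = 8, 16, 32
head_out = rng.normal(size=(n_heads, seq * d))    # flattened per-head outputs
head_out[1] = head_out[0] + 0.01 * rng.normal(size=seq * d)  # a redundant head

# Distance = 1 - correlation between head outputs; cluster hierarchically.
dist = squareform(1 - np.corrcoef(head_out), checks=False)
clusters = fcluster(linkage(dist, method="average"), t=0.5, criterion="distance")
print("head -> cluster:", clusters)  # heads 0 and 1 land in the same cluster
```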


Positive and Unlabeled Learning with Controlled Probability Boundary Fence

https://openreview.net/forum?id=xbQqhojHTg

Compressor summary: PULCPBF is a two-stage method for PU learning that uses weak classifiers as a probability boundary fence and self-training to improve performance.


Robust Classification via a Single Diffusion Model

https://openreview.net/forum?id=xaSpuvNYwS

Compressor summary: The paper proposes Robust Diffusion Classifier (RDC), a generative classifier built on a pre-trained diffusion model that uses multi-head diffusion and efficient sampling strategies to reduce computational cost, achieving better adversarial robustness than adversarial training against various adaptive attacks on CIFAR-10.


Envisioning Outlier Exposure by Large Language Models for Out-of-Distribution Detection

https://openreview.net/forum?id=xZO7SmM12y

Compressor summary: The paper proposes EOE, a method that uses large language models to generate outlier class labels and a score function based on potential outlier penalty for zero-shot OOD detection in open-world scenarios.


How Private are DP-SGD Implementations?

https://openreview.net/forum?id=xWI0MKwJSS

Compressor summary: Shuffling and Poisson subsampling are two different batch sampling methods for private machine learning that have significantly different privacy guarantees, and practitioners should be careful when reporting the privacy analysis of private SGD.
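
The two samplers being compared are easy to contrast in code (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, batch_size, q = 10, 3, 0.3  # dataset size, batch size, sampling rate

# Poisson subsampling: each example joins the batch independently with
# probability q, so batch sizes vary.
poisson_batch = np.flatnonzero(rng.random(n) < q)

# Shuffling: a random permutation cut into fixed-size batches.
perm = rng.permutation(n)
shuffled_batches = [perm[i:i + batch_size] for i in range(0, n, batch_size)]

print("Poisson batch:", poisson_batch)
print("Shuffled batches:", shuffled_batches)
```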


Model-based Reinforcement Learning for Parameterized Action Spaces

https://openreview.net/forum?id=xW79geE0RA

Compressor summary: DLPA is a new RL algorithm for PAMDPs that learns a dynamics model, plans with a modified control method, and outperforms existing methods in sample efficiency and performance.


A Dense Reward View on Aligning Text-to-Image Diffusion with Preference

https://openreview.net/forum?id=xVXnXk9I3I

Compressor summary: The paper proposes a new method for aligning text-to-image diffusion models with user preferences by using dense rewards and temporal discounting, which improves the efficiency and effectiveness of preference alignment.


Out-of-Domain Generalization in Dynamical Systems Reconstruction

https://openreview.net/forum?id=xTYIAD2NND

Compressor summary: The paper proposes a formal framework to analyze generalization in dynamical systems reconstruction from time series data, showing that black-box deep learning techniques often fail to achieve it and suggesting ways to improve it.


CF-OPT: Counterfactual Explanations for Structured Prediction

https://openreview.net/forum?id=xSkIxKdO08

Compressor summary: The paper proposes CF-OPT, a first-order optimization algorithm that finds counterfactual explanations for optimization layers in structured learning, using variational autoencoders to obtain plausible explanations in latent space and make these deep pipelines more transparent.


Translating Subgraphs to Nodes Makes Simple GNNs Strong and Efficient for Subgraph Representation Learning

https://openreview.net/forum?id=xSizvCoI79

Compressor summary: The paper proposes Subgraph-To-Node translation, a method for learning subgraph representations that reduces memory and computational costs and performs better than existing graph neural networks.


Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling

https://openreview.net/forum?id=xS2YKQlBIZ

Compressor summary: PRGD is a novel algorithm that maximizes the margin at an exponential rate for linearly separable data, while existing algorithms like GD and NGD fail under certain conditions and only achieve a polynomial rate.


Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making

https://openreview.net/forum?id=xQiYCmDrjp

Compressor summary: The paper proposes a novel temporal distance for stochastic settings that satisfies the triangle inequality, enabling better planning and control in reinforcement learning.


FrameQuant: Flexible Low-Bit Quantization for Transformers

https://openreview.net/forum?id=xPypr0kufs

Compressor summary: The paper proposes using Fusion Frames to quantize Transformers to two bits, achieving efficient and accurate models with known recovery guarantees and denoising filters.


A New Branch-and-Bound Pruning Framework for $\ell_0$-Regularized Problems

https://openreview.net/forum?id=xPmSNLle1w

Compressor summary: The paper proposes an alternative pruning test method for $\ell_0$-regularized problems that improves the solving time of Branch-and-Bound algorithms in machine-learning applications.


Critical feature learning in deep neural networks

https://openreview.net/forum?id=xMJT4XW468

Compressor summary: The paper develops a theory of network kernels in deep neural networks that explains how their Bayesian prior adapts to data and features using large deviation and field-theoretic approaches.


Do Efficient Transformers Really Save Computation?

https://openreview.net/forum?id=xLikRS9OhW

Compressor summary: The paper investigates the reasoning capabilities of Sparse Transformer and Linear Transformer, finding that they are expressive enough for Dynamic Programming problems but require more model size than standard Transformer, and identifies a class of problems where they are more efficient.


Graph Neural Stochastic Diffusion for Estimating Uncertainty in Node Classification

https://openreview.net/forum?id=xJUhgvM2u8

Compressor summary: Graph neural stochastic diffusion (GNSD) is a new method for estimating uncertainty in graph neural network predictions by connecting them to stochastic partial differential equations with a $Q$-Wiener process and two networks to ensure accurate prediction and uncertainty propagation.


LLM-Empowered State Representation for Reinforcement Learning

https://openreview.net/forum?id=xJMZbdiQnf

Compressor summary: The proposed method, LESR, uses a large language model to generate task-related state representation codes for reinforcement learning, improving sample efficiency and performance in Mujoco and Gym-Robotics tasks.


Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning

https://openreview.net/forum?id=xIRKB5nRJl

Compressor summary: The authors propose a framework to train robots using multimodal prompts, combining vision and language signals, and achieve state-of-the-art performance on robot manipulation tasks.


Dynamic Survival Analysis with Controlled Latent States

https://openreview.net/forum?id=xGlVkBSDdt

Compressor summary: The authors propose a new method to learn intensity functions of counting processes using neural controlled differential equations and signature-based estimation, with theoretical guarantees and applications in various domains.


EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism

https://openreview.net/forum?id=xFk0w9zoV3

Compressor summary: EE-LLM is a framework that trains and uses large language models with early exiting to speed up inference while maintaining quality.
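
Early exiting itself can be sketched generically (this is not EE-LLM's 3D-parallel implementation): attach an exit head to each layer and stop at the first layer whose prediction is confident enough. A toy sketch in PyTorch:

```python
import torch
import torch.nn as nn

d, vocab, n_layers = 64, 100, 6
layers = nn.ModuleList([nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
                        for _ in range(n_layers)])
exits = nn.ModuleList([nn.Linear(d, vocab) for _ in range(n_layers)])

@torch.no_grad()
def early_exit_next_token(x, threshold=0.9):
    for i, (layer, head) in enumerate(zip(layers, exits)):
        x = layer(x)
        probs = head(x[:, -1]).softmax(-1)  # next-token prediction at this depth
        conf, token = probs.max(-1)
        if conf.item() > threshold:         # confident enough: exit early
            return token.item(), i
    return token.item(), i                  # fall through to the final layer

token, depth = early_exit_next_token(torch.randn(1, 8, d))
print(f"emitted token {token} at layer {depth}")
```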


CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection

https://openreview.net/forum?id=xFDJBzPhci

Compressor summary: The paper proposes a new objective function to improve out-of-distribution generalization and detection in vision-language pre-trained models during fine-tuning, by minimizing the gradient magnitude of energy scores on training data.


Ai-sampler: Adversarial Learning of Markov kernels with involutive maps

https://openreview.net/forum?id=xFCA2yWVs4

Compressor summary: The paper introduces a method to train Markov chains with neural networks for efficient and well-mixed sampling, using involutive Metropolis-Hastings kernels from reversible neural networks for detailed balance.


Online Algorithms with Uncertainty-Quantified Predictions

https://openreview.net/forum?id=xF656w37Mj

Compressor summary: The paper explores using uncertainty quantification (UQ) predictions in online algorithms for ski rental and online search problems, and proposes a new online learning framework to leverage UQ effectively.


Position: Application-Driven Innovation in Machine Learning

https://openreview.net/forum?id=xEB2oF3vvb

Compressor summary: The paper advocates for application-driven research in machine learning as a way to create impactful solutions and foster innovation, while suggesting reforms in the current reviewing, hiring, and teaching practices.


Simultaneous identification of models and parameters of scientific simulators

https://openreview.net/forum?id=xC7SYAZygF

Compressor summary: SBMI is a method that uses neural networks and simulations to systematically select model components in composite scientific models, enabling data-driven discovery and uncertainty-informed decision making.


RVI-SAC: Average Reward Off-Policy Deep Reinforcement Learning

https://openreview.net/forum?id=xB6YJZOKyT

Compressor summary: The paper proposes RVI-SAC, an off-policy DRL method using average reward criterion, which performs well on Mujoco locomotion tasks.


FedBAT: Communication-Efficient Federated Learning via Learnable Binarization

https://openreview.net/forum?id=x2zxPwCkAZ

Compressor summary: FedBAT is a novel framework for federated learning that learns binary model updates during local training, reducing approximation errors and improving accuracy.
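
For background, here is a minimal sketch of the kind of binarized update such methods send (FedBAT learns the binarization during local training; the deterministic sign-plus-scale rule below is the classic baseline, with synthetic data):

```python
import numpy as np

delta = np.random.default_rng(0).normal(size=1000)  # local model update
alpha = np.abs(delta).mean()                        # one scale per tensor
signs = np.sign(delta)                              # 1 bit per parameter
reconstructed = alpha * signs                       # what the server recovers
print("approximation MSE:", np.mean((delta - reconstructed) ** 2))
```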


Improved Communication-Privacy Trade-offs in $L_2$ Mean Estimation under Streaming Differential Privacy

https://openreview.net/forum?id=x1G7ieRgRd

Compressor summary: The paper proposes a novel privacy accounting method and sparsification scheme for mean estimation in central differential privacy that improves MSEs, communication efficiency, and compatibility with streaming differential privacy and DP-FTRL optimizers.


HarmonyDream: Task Harmonization Inside World Models

https://openreview.net/forum?id=x0yIaw2fgk

Compressor summary: The paper investigates how observation and reward modeling tasks in world models affect sample-efficient MBRL and proposes HarmonyDream, a method that balances these two tasks to improve performance.


Randomized Confidence Bounds for Stochastic Partial Monitoring

https://openreview.net/forum?id=x0vLj1S6Wg

Compressor summary: The paper introduces new partial monitoring strategies using randomized confidence bounds and extends regret guarantees to stochastic settings with side information.


A Computational Framework for Solving Wasserstein Lagrangian Flows

https://openreview.net/forum?id=wwItuHdus6

Compressor summary: The text proposes a deep learning framework that simplifies solving various optimal transport problems without needing optimal couplings or simulating dynamics.


Noise-Aware Algorithm for Heterogeneous Differentially Private Federated Learning

https://openreview.net/forum?id=wuQ2DRPAuy

Compressor summary: Robust-HDP improves federated learning utility and convergence speed by efficiently estimating and reducing differential privacy noise across heterogeneous clients.


TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling

https://openreview.net/forum?id=wrTzLoqbCg

Compressor summary: TimeSiam is a self-supervised pre-training framework for time series that leverages Siamese networks to capture temporal correlations and outperforms other methods in forecasting and classification tasks.


Attention Meets Post-hoc Interpretability: A Mathematical Perspective

https://openreview.net/forum?id=wnkC5T11Z9

Compressor summary: The paper explores how attention mechanisms in transformer models can offer insights on their behavior and compares them with post-hoc explanations.


Unsupervised Parameter-free Simplicial Representation Learning with Scattering Transforms

https://openreview.net/forum?id=wmljUnbjy6

Compressor summary: The paper introduces simplicial scattering networks (SSNs), a parameter-free model that extracts task-agnostic features from simplicial complex data without labels using random walk matrices, improving robustness and performance in various tasks.


Matrix Information Theory for Self-Supervised Learning

https://openreview.net/forum?id=wleAlsklEh

Compressor summary: Matrix-SSL is a novel method that leverages matrix information theory to improve non-contrastive learning methods by aligning covariance matrices and achieving better results than previous state-of-the-art methods on image and language tasks.


The Good, The Bad, and Why: Unveiling Emotions in Generative AI

https://openreview.net/forum?id=wlOaG9g0uq

Compressor summary: This paper proposes three approaches, grounded in psychological theories, to understand and manipulate emotions in generative AI models, showing that such models can comprehend emotional stimuli in a way analogous to the human brain's dopamine mechanism.


Better Locally Private Sparse Estimation Given Multiple Samples Per User

https://openreview.net/forum?id=wlBtHP8KqS

Compressor summary: The paper proposes a user-level differential privacy method for sparse linear regression that eliminates the dimension dependency and performs better than item-level methods with the same number of samples.


Joint Composite Latent Space Bayesian Optimization

https://openreview.net/forum?id=wkCUmO7oi2

Compressor summary: JoCo is a novel framework that combines neural network encoders and probabilistic models to compress high-dimensional input and output spaces for efficient black-box optimization.


InterLUDE: Interactions between Labeled and Unlabeled Data to Enhance Semi-Supervised Learning

https://openreview.net/forum?id=wilej5VnqL

Compressor summary: InterLUDE is a new SSL method that improves image classification by interpolating labeled and unlabeled embeddings and minimizing discrepancies in predictions between them.


On The Fairness Impacts of Hardware Selection in Machine Learning

https://openreview.net/forum?id=weixEb6Wjd

Compressor summary: The paper examines how hardware choices can affect model performance and fairness in machine learning, highlighting the need for considering hardware in ML-as-a-service platforms.


Incremental Topological Ordering and Cycle Detection with Predictions

https://openreview.net/forum?id=wea7nsJdMc

Compressor summary: The paper proposes a data structure for dynamic graph problems using algorithms-with-predictions to achieve consistency, robustness, and smoothness in running time, and shows empirical results on real datasets.


Perfect Alignment May be Poisonous to Graph Contrastive Learning

https://openreview.net/forum?id=wdezvnc9EG

Compressor summary: This paper investigates how different augmentations in graph contrastive learning affect downstream performance by separating classes rather than aligning nodes of the same class, and proposes two methods to verify the findings.


Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models

https://openreview.net/forum?id=wdTiuvd0fR

Compressor summary: The text discusses how large pretrained protein language models can improve protein prediction tasks but shows that their performance does not always scale with pretraining time and suggests a need for better pretraining methods.


Improving Equivariant Graph Neural Networks on Large Geometric Graphs via Virtual Nodes Learning

https://openreview.net/forum?id=wWdkNkUY8k

Compressor summary: FastEGNN is an improved equivariant GNN model for large geometric graphs that uses virtual nodes to approximate the graph and achieve a balance between accuracy and efficiency.


An Interpretable Evaluation of Entropy-based Novelty of Generative Models

https://openreview.net/forum?id=wUgTnf918v

Compressor summary: The paper proposes a spectral method to measure novelty in multi-modal generative models by comparing their sample types with a reference model using the Kernel-based Entropic Novelty score.


Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions

https://openreview.net/forum?id=wTd7dogTsB

Compressor summary: The paper analyzes how well score-based diffusion models can approximate complex distributions, and shows that they can achieve near-optimal performance under mild assumptions.


What’s the score? Automated Denoising Score Matching for Nonlinear Diffusions

https://openreview.net/forum?id=wLoESsgZIq

Compressor summary: The paper introduces a method called local-DSM that allows training generative models with non-Gaussian priors using nonlinear diffusion processes and applies it to image generation and statistical physics problems.


Understanding Heterophily for Graph Neural Networks

https://openreview.net/forum?id=wK9RvVmi7u

Compressor summary: The paper proposes a general random graph model to analyze heterophily patterns in Graph Neural Networks (GNNs) and shows how different factors affect separability in GNNs.


CompeteAI: Understanding the Competition Dynamics of Large Language Model-based Agents

https://openreview.net/forum?id=wGtzp4ZT1n

Compressor summary: The paper uses GPT-4 to simulate a virtual town of competing restaurant and customer agents, finding that competition drives transformation (such as new operating strategies for restaurants) and that the simulated dynamics align with existing market and sociological theories.


A Unified Framework for Learning with Nonlinear Model Classes from Arbitrary Linear Samples

https://openreview.net/forum?id=wG2SgnH6Zv

Compressor summary: The paper introduces a framework for learning unknown objects from training data using a given model class, with general applicability to various problems and learning guarantees based on the variation of the model class.


Position: A Call to Action for a Human-Centered AutoML Paradigm

https://openreview.net/forum?id=wELbEYgnmo

Compressor summary: The paper argues for a more human-centered approach to AutoML, focusing on user interaction and collaboration between humans and AutoML systems.


xT: Nested Tokenization for Larger Context in Large Images

https://openreview.net/forum?id=wDDprThYeT

Compressor summary: The paper introduces *xT*, a vision transformer framework that effectively aggregates global context with local details and can model large images without significant losses or memory growth.


InferCept: Efficient Intercept Support for Augmented Large Language Model Inference

https://openreview.net/forum?id=wDDGQabYPQ

Compressor summary: InferCept is a new framework for large language models that reduces resource waste and improves serving speed by efficiently handling interactions with external environments, tools, and agents.


Fast Adversarial Attacks on Language Models In One GPU Minute

https://openreview.net/forum?id=wCMNbdshcY

Compressor summary: BEAST is a fast, interpretable adversarial attack that can jailbreak aligned language models in about a GPU minute with high success rates, induce hallucinations in LM chatbots, and mount privacy attacks that expose user information.


Position: Future Directions in the Theory of Graph Machine Learning

https://openreview.net/forum?id=wBr5ozDEKp

Compressor summary: The authors call for a better theoretical understanding of graph neural networks (GNNs), considering their expressive power, generalization, and optimization in graph machine learning.


Position: $C^*$-Algebraic Machine Learning $-$ Moving in a New Direction

https://openreview.net/forum?id=w9nxTXuaCc

Compressor summary: The paper proposes a new direction for machine learning research called $C^*$-algebraic ML, which unifies existing learning strategies and constructs a new framework using the mathematical concept of $C^*$-algebra.


Sample-Efficient Multiagent Reinforcement Learning with Reset Replay

https://openreview.net/forum?id=w8ei1o9U5y

Compressor summary: The paper introduces Multiagent Reinforcement Learning with Reset Replay (MARR), a method to improve sample efficiency of MARL in parallel environments using reset strategy and data augmentation.


Learning to Play Atari in a World of Tokens

https://openreview.net/forum?id=w8BnKGFIYN

Compressor summary: DART is a sample-efficient transformer-based method that uses discrete representations for world modeling and learning behavior, achieving high performance on Atari games without look-ahead search.


Kernel Semi-Implicit Variational Inference

https://openreview.net/forum?id=w5oUo0LhO1

Compressor summary: Kernel SIVI (KSIVI) is a method for Bayesian inference that uses kernel tricks to simplify the score matching objective, enabling faster convergence and easier optimization.


Recurrent Early Exits for Federated Learning with Heterogeneous Clients

https://openreview.net/forum?id=w4B42sxNq3

Compressor summary: ReeFL is a recurrent early exit approach for federated learning that fuses features from different sub-models into a single shared classifier, improving performance and privacy.


Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback

https://openreview.net/forum?id=w1d9DOGymR

Compressor summary: The paper explores how social choice theory can help address ethical and safety issues in fine-tuning foundation models like GPT-4 using human feedback or principles.


PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning

https://openreview.net/forum?id=w1HdBXSJXn

Compressor summary: The text introduces Persona In-Context Learning (PICLe), a method to customize the behavior of large language models based on desired personality traits using Bayesian inference and likelihood ratio selection criterion.


FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction

https://openreview.net/forum?id=vye4OgLaTy

Compressor summary: The paper presents a method called FlashST that improves traffic prediction by adapting pre-trained models to different scenarios and data distributions using a spatio-temporal prompt network and a distribution mapping mechanism.


On the Hardness of Probabilistic Neurosymbolic Learning

https://openreview.net/forum?id=vxPmrxKe0J

Compressor summary: Probabilistic neurosymbolic models combine neural networks and logical reasoning, and WeightME is an unbiased gradient estimator for them based on model sampling.


Information Flow in Self-Supervised Learning

https://openreview.net/forum?id=vxDjeeBnTu

Compressor summary: The paper analyzes two self-supervised learning methods using matrix mutual information and introduces M-MAE, a new method that improves upon existing ones for visual representation learning.


Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective

https://openreview.net/forum?id=vuMD71R20q

Compressor summary: The paper investigates how removing the square root from adaptive gradient optimizers improves generalization on convolutional architectures, while maintaining performance on transformers, and discusses practical benefits for developing non-diagonal adaptive methods.
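
The two update rules under comparison fit in a few lines (hyperparameters are illustrative); dividing by $v$ rather than $\sqrt{v}$ makes the preconditioner behave more like a diagonal second-order curvature estimate, which is the paper's perspective:

```python
import numpy as np

def rmsprop_step(theta, g, v, lr=1e-3, beta=0.99, eps=1e-8, square_root=True):
    v = beta * v + (1 - beta) * g**2               # running second-moment estimate
    denom = np.sqrt(v) + eps if square_root else v + eps
    return theta - lr * g / denom, v               # sqrt-free divides by v itself

theta, v = np.zeros(3), np.zeros(3)
g = np.array([0.1, -0.2, 0.05])
theta, v = rmsprop_step(theta, g, v, square_root=False)  # square-root-free variant
```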


Efficient Contrastive Learning for Fast and Accurate Inference on Graphs

https://openreview.net/forum?id=vsy21Xodrt

Compressor summary: GraphECL is a fast and efficient contrastive learning method for graphs that uses an MLP to mimic the computations of a GNN, achieving superior performance and efficiency compared to existing methods.


What is the Long-Run Distribution of Stochastic Gradient Descent? A Large Deviations Analysis

https://openreview.net/forum?id=vsOF7qDNhl

Compressor summary: This paper studies how stochastic gradient descent behaves in non-convex problems and shows that it visits low-energy regions, like the global minimum, more often than high-energy ones, using a thermodynamics analogy.
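Schematically, the thermodynamic analogy says that SGD's long-run occupation measure of a region $\mathcal{K}$ behaves like a Gibbs distribution; here $E$ is an induced energy and $T$ a temperature-like function of the step size, both stand-ins for the paper's precise large-deviations quantities:

```latex
\pi(\mathcal{K}) \;\propto\; \exp\!\left(-\frac{E(\mathcal{K})}{T}\right)
```

Lower-energy regions, such as the global minimum, are therefore visited exponentially more often.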


Kernel Debiased Plug-in Estimation: Simultaneous, Automated Debiasing without Influence Functions for Many Target Parameters

https://openreview.net/forum?id=vq7ITv8a49

Compressor summary: Kernel debiased plug-in estimation is a novel method to estimate multiple target parameters in nonparametric models without relying on influence functions and while maintaining computational efficiency.


Two Heads are Actually Better than One: Towards Better Adversarial Robustness via Transduction and Rejection

https://openreview.net/forum?id=vn92qYjL1F

Compressor summary: The paper proposes a new transductive algorithm that improves robustness against adversarial attacks by applying a reduction technique to construct effective defenses with provable guarantees.


Neural Operators with Localized Integral and Differential Kernels

https://openreview.net/forum?id=vl9GB3fbht

Compressor summary: The authors propose a method for learning differential and integral operators with locally supported kernels to improve the performance of Fourier neural operators for solving partial differential equations.


Graph Automorphism Group Equivariant Neural Networks

https://openreview.net/forum?id=vjkq5fwsj3

Compressor summary: The paper proposes a method to construct neural networks that can learn from data on graphs with complex relations using their actual symmetry group, Aut(G), instead of the symmetric group S_n.


DiffDA: a Diffusion model for weather-scale Data Assimilation

https://openreview.net/forum?id=vhMq3eAB34

Compressor summary: DiffDA is a denoising diffusion model that uses GraphCast and predicted states to generate high-resolution atmospheric data from sparse observations, enabling weather forecasting and climate modeling applications.


Local vs. Global Interpretability: A Computational Complexity Perspective

https://openreview.net/forum?id=veEjiN2w9F

Compressor summary: The authors propose a framework using computational complexity theory to assess local and global perspectives of interpreting ML models, comparing linear models, decision trees, and neural networks.


Comparing Graph Transformers via Positional Encodings

https://openreview.net/forum?id=va3r3hSA6n

Compressor summary: The paper investigates the relationship between absolute and relative positional encodings for graph transformers, finding them equivalent in distinguishing non-isomorphic graphs, but with potential differences depending on node features.


Partial Optimality in the Linear Ordering Problem

https://openreview.net/forum?id=vYYIuJDTHq

Compressor summary: The paper establishes partial optimality conditions for the linear ordering problem by constructing maps and testing conditions on the cost function.


On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

https://openreview.net/forum?id=vXUqOCsbj8

Compressor summary: We analyze the efficiency of modern Hopfield models based on pattern norms and show sub-quadratic variants exist under a computational complexity assumption, with examples and lower bounds provided.


One for All: A Universal Generator for Concept Unlearnability via Multi-Modal Alignment

https://openreview.net/forum?id=vSerUPYFtB

Compressor summary: The text proposes a universal perturbation generator that uses multi-modal pre-trained models to transform image data into text concepts and make it unlearnable, protecting personal information in free internet data.


Achieving Lossless Gradient Sparsification via Mapping to Alternative Space in Federated Learning

https://openreview.net/forum?id=vQmVmMN5ft

Compressor summary: The paper proposes a new space for gradient compression in federated learning, which improves compressibility and allows for higher accuracies with minimal information loss.


Particle Denoising Diffusion Sampler

https://openreview.net/forum?id=vMUnnS4OWC

Compressor summary: The paper proposes a new method to sample from unnormalized probability distributions using a particle-based denoising diffusion scheme with a novel score matching loss, which is more consistent than standard methods.


Stereographic Spherical Sliced Wasserstein Distances

https://openreview.net/forum?id=vLtVGtEz5h

Compressor summary: The paper proposes a fast and parallelizable distance for comparing spherical probability distributions using the stereographic projection and generalized Radon transform, called Stereographic Spherical Sliced Wasserstein (S3W) distance, and evaluates its speed and accuracy in various applications.
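A minimal numpy sketch of the two ingredients the distance combines, under simplifying assumptions (points on the unit sphere $S^2$, equal sample counts, Wasserstein-1 slices); the paper's actual construction uses a generalized Radon transform and handles the projection pole carefully:

```python
import numpy as np

def stereographic(points):
    # Map points on the unit sphere S^2 (away from the north pole)
    # to the plane: (x, y, z) -> (x, y) / (1 - z).
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return np.stack([x / (1 - z), y / (1 - z)], axis=1)

def sliced_w1(a, b, n_slices=64, seed=0):
    # Monte Carlo sliced Wasserstein-1 between equal-size point sets:
    # project onto random directions, sort, average the 1D distances.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_slices):
        theta = rng.standard_normal(2)
        theta /= np.linalg.norm(theta)
        total += np.mean(np.abs(np.sort(a @ theta) - np.sort(b @ theta)))
    return total / n_slices

# S3W-style distance: slice after the stereographic projection, e.g.
# d = sliced_w1(stereographic(samples_p), stereographic(samples_q))
```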


Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

https://openreview.net/forum?id=vKtomqlSxm

Compressor summary: Chain of Code is an extension that improves language models' ability to reason by having them write and emulate code for semantic tasks.
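A minimal sketch of the interpreter/emulator loop, with `lm` a hypothetical callable standing in for the language model; executable lines run in a real Python interpreter, while lines that fail (the "semantic" ones) fall back to LM emulation:

```python
def chain_of_code(code_lines, lm):
    state = {}
    for line in code_lines:
        try:
            exec(line, {}, state)  # real interpreter handles algorithmic lines
        except Exception:
            # lm(...) is a hypothetical call that returns a dict of updated
            # variable bindings, i.e., the LM acts as a code emulator.
            state.update(lm(line=line, state=dict(state)))
    return state
```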


Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics

https://openreview.net/forum?id=vJx6fld6l0

Compressor summary: The study presents HEPT, a novel efficient transformer model for large-scale point cloud processing in HEP and astrophysics, using local inductive bias and OR & AND-construction LSH for kernel approximation.


Semantic-Aware Human Object Interaction Image Generation

https://openreview.net/forum?id=vITl6CqIkk

Compressor summary: The paper proposes a framework to improve text-to-image generative models for human-object interaction by refining human pose and interaction boundary regions using guidance from pose quality and boundary information.


Simulation-Based Inference with Quantile Regression

https://openreview.net/forum?id=vGHOFeUQi8

Compressor summary: NQE is a new SBI method that uses conditional quantile regression and spline interpolation to estimate posterior samples and credible regions, with calibration options for limited data or model errors.


Gradient Compressed Sensing: A Query-Efficient Gradient Estimator for High-Dimensional Zeroth-Order Optimization

https://openreview.net/forum?id=vG7YpsJT74

Compressor summary: The paper proposes GraCe, a gradient estimator for high-dimensional nonconvex zeroth-order optimization (ZOO) that matches the convergence rates of previous methods while using fewer queries.


Interpreting Equivariant Representations

https://openreview.net/forum?id=vFk9fqXLst

Compressor summary: The paper highlights the importance of considering equivariant models' inductive bias when using latent representations and proposes principles for choosing invariant projections to improve performance in downstream tasks, as shown in molecular graph generation and image classification examples.


UGrid: An Efficient-And-Rigorous Neural Multigrid Solver for Linear PDEs

https://openreview.net/forum?id=vFATIZXlCm

Compressor summary: The paper presents a mathematically rigorous neural solver for linear PDEs that combines U-Net and MultiGrid principles, achieves high accuracy and generalization, and introduces a new residual loss metric for unsupervised training.


Lookbehind-SAM: k steps back, 1 step forward

https://openreview.net/forum?id=vCN5lwcWWE

Compressor summary: The paper proposes Lookbehind, a method that improves the loss-sharpness trade-off in SAM by using multiple ascent steps and linear interpolation, leading to better generalization, robustness, and lifelong learning performance.


Modelling Microbial Communities with Graph Neural Networks

https://openreview.net/forum?id=vBJZ93tvoE

Compressor summary: The authors use graph neural networks (GNNs) to model bacterial communities from their genomes, predicting relative abundance profiles without growth curves, and show generalization to unseen bacteria and different community structures.


Energy-Efficient Gaussian Processes Using Low-Precision Arithmetic

https://openreview.net/forum?id=v9tIJW1fzt

Compressor summary: The authors propose using low-precision floating-point representations to reduce energy consumption in Gaussian process regression, and show that well-conditioned kernel matrices allow for significant energy savings without compromising model performance.


Model Assessment and Selection under Temporal Distribution Shift

https://openreview.net/forum?id=v8MgLJ7kbL

Compressor summary: The paper proposes an adaptive rolling window method for model assessment and selection that works well with changing environments and historical data.


Tabular Insights, Visual Impacts: Transferring Expertise from Tables to Images

https://openreview.net/forum?id=v7I5FtL2pV

Compressor summary: Charms is a method to transfer relevant tabular knowledge to images, enhancing image classification and interpretability.


Balancing Similarity and Complementarity for Federated Learning

https://openreview.net/forum?id=v6tAdeCXKH

Compressor summary: The paper proposes FedSaC, a framework for Federated Learning that balances similarity and complementarity among clients to enhance model performance in heterogeneous and multimodal scenarios.


Adaptive Robust Learning using Latent Bernoulli Variables

https://openreview.net/forum?id=v6eaD7Wekw

Compressor summary: The paper proposes an adaptive method for learning from corrupted data, using latent variables and variational inference to infer the corruption level and improve performance on various machine learning tasks.


Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases

https://openreview.net/forum?id=v2o9rRJcEv

Compressor summary: The paper proposes a new method, TDPO-R, that aligns diffusion models with human preferences while minimizing reward overoptimization by exploiting the temporal inductive bias of diffusion models and addressing primacy bias from active neurons in the critic model.


TENG: Time-Evolving Natural Gradient for Solving PDEs With Deep Neural Nets Toward Machine Precision

https://openreview.net/forum?id=v1I4zRAjMb

Compressor summary: The paper presents TENG, a novel method for solving time-dependent PDEs using neural networks with high accuracy and efficiency.


Smoothness Adaptive Hypothesis Transfer Learning

https://openreview.net/forum?id=v0VUsQI5yw

Compressor summary: Smoothness Adaptive Transfer Learning (SATL) is a kernel ridge regression algorithm that adapts to varying and unknown smoothness in transfer learning tasks, achieving the minimax-optimal statistical rate.


Delaunay Graph: Addressing Over-Squashing and Over-Smoothing Using Delaunay Triangulation

https://openreview.net/forum?id=uyhjKoaIQa

Compressor summary: The paper proposes a new method to improve GNNs by constructing a graph from features using Delaunay Triangulation, which reduces over-squashing and over-smoothing.


RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback

https://openreview.net/forum?id=uydQ2W41KO

Compressor summary: RLAIF is an alternative to RLHF that uses preferences from an LLM and achieves comparable or better results than RLHF, overcoming scalability issues.


Understanding the Training Speedup from Sampling with Approximate Losses

https://openreview.net/forum?id=uun4fzaiat

Compressor summary: The paper proposes a greedy method for selecting samples with large approximate losses instead of exact losses to reduce selection overhead and training time, and evaluates it on BERT model training.


Flora: Low-Rank Adapters Are Secretly Gradient Compressors

https://openreview.net/forum?id=uubBZKM99Y

Compressor summary: Flora reduces memory usage for training large neural networks by using random projections and resampling matrices, without sacrificing performance.
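A minimal sketch of the random-projection view, assuming a flattened gradient vector; only the seed and the low-dimensional sketch need to be stored, and periodically resampling the seed avoids committing to one fixed subspace:

```python
import numpy as np

def compress(grad, rank, seed):
    # Sketch a d-dimensional gradient into `rank` dimensions using a
    # seeded Gaussian projection (the matrix is regenerated from the seed).
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((grad.size, rank)) / np.sqrt(rank)
    return P.T @ grad

def decompress(sketch, dim, rank, seed):
    # Regenerate the same projection; since E[P @ P.T] = I, the
    # reconstruction is unbiased in expectation.
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((dim, rank)) / np.sqrt(rank)
    return P @ sketch
```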


Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game

https://openreview.net/forum?id=usUPvQH3XK

Compressor summary: The text proposes a framework that combines large language models with reinforcement learning to create strategic language agents that can overcome intrinsic bias and perform well in complex decision-making tasks like the Werewolf game.


Multi-Region Markovian Gaussian Process: An Efficient Method to Discover Directional Communications Across Multiple Brain Regions

https://openreview.net/forum?id=us6zMORsMe

Compressor summary: The study develops a novel method (MRM-GP) that combines Linear Dynamical System and Gaussian Process techniques to analyze complex interactions between different brain regions, revealing communication patterns and separating them by frequency bands.


Amortized Equation Discovery in Hybrid Dynamical Systems

https://openreview.net/forum?id=uqWfZ23O9g

Compressor summary: AMORE is a new method for learning laws in hybrid dynamical systems that jointly discovers equations and categorizes modes, outperforming existing two-stage approaches.


Towards Compositionality in Concept Learning

https://openreview.net/forum?id=upO8FUwf92

Compressor summary: Compositional Concept Extraction (CCE) is a new method to find high-level concepts in foundation models that are useful for understanding them, and it outperforms existing methods on various datasets and tasks.


Fast and Sample Efficient Multi-Task Representation Learning in Stochastic Contextual Bandits

https://openreview.net/forum?id=uog14iBFLA

Compressor summary: The paper proposes a new algorithm for multi-task learning in linear contextual bandits that improves efficiency by using representation learning and provides theoretical guarantees on sample and iteration complexity.


DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design

https://openreview.net/forum?id=uku9r6RROl

Compressor summary: This paper explores how level sampling affects generalization in deep RL agents, proposes data-regularised environment design to improve it, and provides theoretical justification for prioritizing levels based on value loss.


Stability-Informed Initialization of Neural Ordinary Differential Equations

https://openreview.net/forum?id=uiqbnV4msl

Compressor summary: The paper explores how different aspects of neural ODE training impact performance and introduces a new initialization technique based on stability analysis.


Two Tales of Single-Phase Contrastive Hebbian Learning

https://openreview.net/forum?id=ui8ewXg1hV

Compressor summary: The paper proposes and analyzes dual propagation, a local learning algorithm that mimics gradients by using oppositely nudged compartments in artificial neurons, and shows its relation to adversarial robustness and a stable adjoint state method.


Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference

https://openreview.net/forum?id=uhHDhVKFMW

Compressor summary: The paper proposes LESS, a cache method that integrates a constant-sized cache with eviction-based methods to improve memory efficiency and retain information in large language models.


On Prompt-Driven Safeguarding for Large Language Models

https://openreview.net/forum?id=ugxGpOEkox

Compressor summary: The authors study how safety prompts affect large language models' behavior and propose a method to optimize them for better safeguarding against harmful queries.


Towards Scalable and Versatile Weight Space Learning

https://openreview.net/forum?id=ug2uoAZ9c2

Compressor summary: SANE is a method to learn task-agnostic representations of neural networks that can embed larger models into a learned space and generate unseen models sequentially.


Sparse Dimensionality Reduction Revisited

https://openreview.net/forum?id=ufgVvFmUom

Compressor summary: The text introduces a sparse Johnson-Lindenstrauss transform that can embed points into fewer dimensions while preserving distances, and improves upon previous sparsity results for certain cases.
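A minimal sketch of a sparse embedding of this kind: each input coordinate is hashed to s of the m output coordinates with random signs, so every column of the matrix has only s nonzeros:

```python
import numpy as np

def sparse_jl_matrix(d, m, s, seed=0):
    rng = np.random.default_rng(seed)
    S = np.zeros((m, d))
    for j in range(d):
        rows = rng.choice(m, size=s, replace=False)  # s target coordinates
        S[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return S

# y = sparse_jl_matrix(d, m, s) @ x approximately preserves ||x|| w.h.p.
```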


Isometric Representation Learning for Disentangled Latent Space of Diffusion Models

https://openreview.net/forum?id=ufCptn28vG

Compressor summary: Isometric Diffusion is a technique that improves diffusion models by learning a disentangled latent space, enabling smoother interpolation, more accurate inversion, and better control over attributes.


Learning Scale-Aware Spatio-temporal Implicit Representation for Event-based Motion Deblurring

https://openreview.net/forum?id=udFZhUgtkI

Compressor summary: SASNet restores motion-blurred images with event streams at arbitrary continuous scales, using spatial and temporal correlation features from events in two modules (SIRM for local areas, TIRM for global motion blur), and is evaluated on the newly introduced H2D dataset.


Distinguishing the Knowable from the Unknowable with Language Models

https://openreview.net/forum?id=ud4GSrqUKI

Compressor summary: The paper explores how to identify and measure the uncertainty in large language models' outputs using small probes and unsupervised methods.


Integrated Hardware Architecture and Device Placement Search

https://openreview.net/forum?id=ucl3B05EsX

Compressor summary: PHAZE optimizes deep learning training by co-optimizing architecture and device placement using novel algorithms, achieving higher throughput than TPUv4 and Spotlight on large language models.


Fast White-Box Adversarial Streaming Without a Random Oracle

https://openreview.net/forum?id=uaExqhJ2Ag

Compressor summary: This paper presents a near-optimal solution for sparse recovery and other tasks in white-box adversarial streaming models without needing a random oracle, using homomorphic encryption schemes.


Language Models with Conformal Factuality Guarantees

https://openreview.net/forum?id=uYISs2tpwP

Compressor summary: The text proposes a framework called conformal factuality that ensures high probability correctness guarantees for language models by connecting language modeling and conformal prediction, which involves making LM outputs less specific to expand uncertainty sets.


Graph-based Forecasting with Missing Data through Spatiotemporal Downsampling

https://openreview.net/forum?id=uYIFQOtb58

Compressor summary: The paper proposes a method for spatiotemporal forecasting that handles missing data by coarsening time series and combining them with attention.


Slicing Mutual Information Generalization Bounds for Neural Networks

https://openreview.net/forum?id=uWNUTRgBso

Compressor summary: The text introduces new generalization bounds for machine learning algorithms using slicing methods, which improve performance and enable control over compressibility in high-dimensional problems.


A Dynamic Algorithm for Weighted Submodular Cover Problem

https://openreview.net/forum?id=uUeXaKLE1I

Compressor summary: The paper studies the submodular cover problem with updates to the set, proposing a randomized algorithm that approximates the optimal solution and minimizes queries per update.


GPTSwarm: Language Agents as Optimizable Graphs

https://openreview.net/forum?id=uTC9AFXIhg

Compressor summary: The authors propose a unified framework for developing, integrating, and improving LLM-based agents using computational graphs and automatic graph optimizers.


Bidirectional Reciprocative Information Communication for Few-Shot Semantic Segmentation

https://openreview.net/forum?id=uRz9GZN17X

Compressor summary: The paper proposes a method for few-shot semantic segmentation that uses feedback and rectification layers to address the impact of intra-class diversity and improve the model's performance.


Random matrix theory improved Fréchet mean of symmetric positive definite matrices

https://openreview.net/forum?id=uQiFsBil3p

Compressor summary: The authors propose a random matrix theory method for estimating Fréchet means on symmetric positive definite matrices in machine learning tasks, which performs better than existing methods with low sample support and many matrices to average.


An LLM Compiler for Parallel Function Calling

https://openreview.net/forum?id=uQ2FUoFjnF

Compressor summary: LLMCompiler is a tool that improves the efficiency, cost, and accuracy of large language models by enabling parallel function calling for complex tasks.


Conformal prediction for multi-dimensional time series by ellipsoidal sets

https://openreview.net/forum?id=uN39Tt9P8b

Compressor summary: The paper proposes a sequential method that builds ellipsoidal prediction regions for multivariate time series, achieving valid coverage while keeping the regions small.
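A minimal split-conformal sketch of the ellipsoidal idea (a static simplification, not the paper's sequential procedure): use Mahalanobis residual norms as nonconformity scores on a calibration set, then report the ellipsoid at the conformal quantile:

```python
import numpy as np

def ellipsoidal_region(y_cal, preds_cal, alpha=0.1):
    resid = y_cal - preds_cal                        # (n, d) residuals
    Sinv = np.linalg.inv(np.cov(resid, rowvar=False))
    scores = np.einsum('ij,jk,ik->i', resid, Sinv, resid)
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    # Region at a new point: {y : (y - y_hat)^T Sinv (y - y_hat) <= q}
    return Sinv, q
```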


Scaling Beyond the GPU Memory Limit for Large Mixture-of-Experts Model Training

https://openreview.net/forum?id=uLpyWQPyF9

Compressor summary: ES-MoE is a new method that improves the efficiency of MoE training by balancing token loads and using host memory, achieving better scalability and throughput than existing frameworks.


Statistical Test for Attention Maps in Vision Transformers

https://openreview.net/forum?id=uLonuOfrwp

Compressor summary: The study proposes a statistical test for ViT's attention mechanisms in computer vision tasks, enabling reliable quantitative evidence for decision-making with controlled error rates, applied to brain image diagnosis.


Less is More: on the Over-Globalizing Problem in Graph Transformers

https://openreview.net/forum?id=uKmcyyrZae

Compressor summary: The paper questions the benefits of global attention in Graph Transformer for graph-structured data, proposes a Bi-Level Global Graph Transformer with Collaborative Training to address the over-globalizing problem, and provides empirical and theoretical evidence of its effectiveness.


BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks

https://openreview.net/forum?id=uGoi3nY62g

Compressor summary: The paper introduces a new black-box attack method to exploit vulnerabilities in pixel-wise regression models for applications like autonomous driving and augmented reality, showing its effectiveness against 7 models and Google's online service.


Multi-View Clustering by Inter-cluster Connectivity Guided Reward

https://openreview.net/forum?id=uEx2bSAJu8

Compressor summary: The paper proposes a graph-based multi-view clustering algorithm that infers the unknown number of clusters $K$ using inter-cluster connectivity as a reward in reinforcement learning, outperforming existing methods.


LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging

https://openreview.net/forum?id=uDoy7AGvEC

Compressor summary: LayerMerge is a novel depth compression method that jointly prunes convolution layers and activation functions to enhance efficiency without sacrificing performance.


Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

https://openreview.net/forum?id=uDkXoZMzBv

Compressor summary: This paper proposes a method to reduce computational costs of overparameterized models by exploiting low-dimensional structures in data and model parameters, showing its effectiveness for deep matrix completion and language model fine-tuning.


An amortized approach to non-linear mixed-effects modeling based on neural posterior estimation

https://openreview.net/forum?id=uCdcXRuHnC

Compressor summary: The authors propose a machine learning method using neural density estimation to efficiently infer population parameters in non-linear mixed-effects models for heterogeneous populations in various fields.


On the Universality of Volume-Preserving and Coupling-Based Normalizing Flows

https://openreview.net/forum?id=uA3FRvO2DJ

Compressor summary: The paper introduces a new framework to better understand and improve normalizing flows, which are neural networks for transforming one probability distribution into another, by showing their limitations and how to overcome them.


A Global Geometric Analysis of Maximal Coding Rate Reduction

https://openreview.net/forum?id=u9qmjV2khT

Compressor summary: The paper characterizes the properties of local and global optima of the MCR$^2$ objective for learning structured deep representations, showing that it leads to diverse and discriminative solutions.


Empowering Graph Invariance Learning with Deep Spurious Infomax

https://openreview.net/forum?id=u9oSQtujCF

Compressor summary: The paper proposes EQuAD, a graph invariance learning framework that uses the infomax principle to induce a robust inductive bias, disentangling invariant features from spurious ones and improving out-of-distribution generalization on synthetic and real datasets.


Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity

https://openreview.net/forum?id=u8TZ9gm4im

Compressor summary: The text describes a new image compression algorithm that uses text information to improve both perceptual and pixel-wise quality, while avoiding the drawbacks of existing methods.


Position: Evolving AI Collectives Enhance Human Diversity and Enable Self-Regulation

https://openreview.net/forum?id=u6PeRHEsjL

Compressor summary: The text suggests that large language models can shape each other's behavior and form emergent AI societies, which can have benefits for human society and online environments, and calls for research on these issues.


STELLA: Continual Audio-Video Pre-training with SpatioTemporal Localized Alignment

https://openreview.net/forum?id=u4VR3WBH7a

Compressor summary: The paper presents STELLA, a continual audio-video pre-training method that uses Localized Patch Importance Scoring and Replay-guided Correlation Assessment to select relevant patches, addressing sparse spatio-temporal correlation and multimodal correlation overwriting while improving zero-shot retrieval performance and reducing memory consumption.


Improving Antibody Humanness Prediction using Patent Data

https://openreview.net/forum?id=u26c52rxZC

Compressor summary: The study uses patent data to improve antibody humanness prediction by training an encoder with contrastive learning and cross-entropy loss.


Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

https://openreview.net/forum?id=u09gadH3BU

Compressor summary: The paper introduces any-precision LLM, a method to compress large language models and deploy them efficiently using varying bit-widths without sacrificing quality or performance.


Multi-Factor Adaptive Vision Selection for Egocentric Video Question Answering

https://openreview.net/forum?id=u00dmbI8Db

Compressor summary: The MFAS framework improves egocentric video question answering by enhancing small object recognition, suppressing noise, and aggregating visual semantics guided by questions.


Masked Face Recognition with Generative-to-Discriminative Representations

https://openreview.net/forum?id=tya725xlZ3

Compressor summary: The paper proposes a deep network that learns generative-to-discriminative representations for masked face recognition, using synthetic masked faces for pretraining.


Asymmetry in Low-Rank Adapters of Foundation Models

https://openreview.net/forum?id=txRZBD8tBV

Compressor summary: The paper studies the asymmetry in importance of the two low-rank adapter matrices during fine-tuning and shows that updating the B matrix is more effective than updating the A matrix, leading to parameter savings and improved generalization.


Bivariate Causal Discovery using Bayesian Model Selection

https://openreview.net/forum?id=twm7qPVX1F

Compressor summary: The paper proposes a new Bayesian method to identify causal directions in Markov equivalent structures, which works better than maximum likelihood methods and can handle realistic assumptions.


Faithfulness Measurable Masked Language Models

https://openreview.net/forum?id=tw1PwpuAuN

Compressor summary: The authors propose a novel method for measuring the faithfulness of NLP model explanations by incorporating token masking into a fine-tuning process that makes it in-distribution and improves importance measures.


DNCs Require More Planning Steps

https://openreview.net/forum?id=tu5fCCuua2

Compressor summary: This paper studies how the amount of computation time and memory affects the performance of a neural solver called Differentiable Neural Computer (DNC) on various algorithms, and finds that limiting its planning steps can lead to poor generalization and stability.


Pairwise Alignment Improves Graph Domain Adaptation

https://openreview.net/forum?id=ttnbM598vZ

Compressor summary: Pairwise Alignment (Pair-Align) is a novel graph domain adaptation method that handles shifts in features, labels, and connecting patterns using edge weights and label weights to recalibrate node influence and adjust classification loss.


Learning to Infer Generative Template Programs for Visual Concepts

https://openreview.net/forum?id=ttaTyweIr1

Compressor summary: The paper presents a neurosymbolic system that learns to infer general-purpose programs from visual datasets to perform various tasks in different domains.


How to Escape Sharp Minima with Random Perturbations

https://openreview.net/forum?id=tpYHbEl7P1

Compressor summary: The paper formalizes flat minima using the trace of the Hessian as a flatness measure and shows how randomly perturbed gradient methods can escape sharp minima in machine learning applications.
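The trace of the Hessian can be estimated without materializing the matrix; a minimal Hutchinson-style sketch, assuming `hvp(v)` computes a Hessian-vector product at the current iterate:

```python
import numpy as np

def hessian_trace(hvp, dim, n_probes=100, seed=0):
    # Hutchinson estimator: E[v^T H v] = tr(H) for Rademacher probes v.
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ hvp(v)
    return total / n_probes
```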


Diffusion Posterior Sampling is Computationally Intractable

https://openreview.net/forum?id=tp6ruPIfIV

Compressor summary: The paper shows that posterior sampling for inpainting and other tasks is computationally intractable even with fast unconditional sampling, using cryptographic assumptions.


Rethinking Adversarial Robustness in the Context of the Right to be Forgotten

https://openreview.net/forum?id=tmUorldOWN

Compressor summary: This paper explores a new security vulnerability in machine unlearning methods, proposes an attack that reduces adversarial robustness, and shows its potential for enhancing model stealing attacks.


Offline Actor-Critic Reinforcement Learning Scales to Large Models

https://openreview.net/forum?id=tl2qmO5kpD

Compressor summary: Offline actor-critic reinforcement learning can scale to large models like transformers and outperform behavioral cloning for multi-task training on continuous control tasks with sub-optimal data.


Keypoint-based Progressive Chain-of-Thought Distillation for LLMs

https://openreview.net/forum?id=tgsSKziIEa

Compressor summary: KPOD is a framework that improves language model reasoning transfer by weighting tokens and using progressive distillation.


Batch Singular Value Polarization and Weighted Semantic Augmentation for Universal Domain Adaptation

https://openreview.net/forum?id=teteOa9nJ9

Compressor summary: BSP-WSA is a novel method for universal domain adaptation that uses an adversarial classifier, singular value decomposition, and weighted semantic augmentation to handle category shift between domains.


MD tree: a model-diagnostic tree grown on loss landscape

https://openreview.net/forum?id=teHPKqjX8q

Compressor summary: The paper proposes a diagnosis method for neural networks based on loss landscape metrics, which outperforms validation-based approaches in identifying the source of model failure.


Mitigating Privacy Risk in Membership Inference by Convex-Concave Loss

https://openreview.net/forum?id=tdomF3PW6A

Compressor summary: The paper proposes Convex Concave Loss (CCL), a method to increase the loss variance of training data and defend against membership inference attacks by reducing the convexity of loss functions with a concave term.


Connecting the Dots: Is Mode-Connectedness the Key to Feasible Sample-Based Inference in Bayesian Neural Networks?

https://openreview.net/forum?id=tc3Nmcpmnx

Compressor summary: The text discusses how to improve sample-based inference for Bayesian neural networks by understanding the relationship between weight and function space, and provides guidelines and an effective solution with competitive performance and uncertainty quantification.


ContPhy: Continuum Physical Concept Learning and Reasoning from Videos

https://openreview.net/forum?id=tVwzR1myUp

Compressor summary: The Continuum Physical Dataset (ContPhy) is a new benchmark for testing AI models' ability to reason about diverse physical properties and dynamics of soft-bodied objects and other continuum phenomena, revealing their current limitations and inspiring improvements in perception and reasoning.


Finite Time Logarithmic Regret Bounds for Self-Tuning Regulation

https://openreview.net/forum?id=tTtSnpH4fc

Compressor summary: The paper proposes a new algorithm, PIECE, that achieves finite-time logarithmic regret bounds for the self-tuning regulation problem and improves initial transient performance.


Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency

https://openreview.net/forum?id=tTq3qMkJ8w

Compressor summary: The paper proposes CooK, a model that uses co-occurrence knowledge and TF-$l$-IDF to improve scene graph generation, achieving better performance and generalization than existing methods.


Latent Noise Segmentation: How Neural Noise Leads to the Emergence of Segmentation and Grouping

https://openreview.net/forum?id=tSjyKR8WIf

Compressor summary: The paper proposes that neural noise can help humans and AI models group and segment images without supervision, leading to better performance on perceptual grouping tasks.


Barrier Algorithms for Constrained Non-Convex Optimization

https://openreview.net/forum?id=tRESfzWFtf

Compressor summary: The paper presents new interior-point methods for non-convex optimization with constraints that have better global complexity than existing methods.


PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition

https://openreview.net/forum?id=tQPkzTdaaN

Compressor summary: PARDEN is a method that uses a large language model as a safeguard to detect and prevent jailbreaks by asking it to repeat its own outputs, achieving better results than existing approaches.
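A minimal sketch of the repeat-and-compare defense, with `lm` a hypothetical callable for the safeguard model; the actual method thresholds a BLEU-style overlap score:

```python
from difflib import SequenceMatcher

def parden(candidate_output, lm, threshold=0.9):
    # Safety-tuned models repeat benign text faithfully but tend to balk
    # at repeating harmful text, so low fidelity flags a likely jailbreak.
    repeated = lm(f"Repeat the following text exactly:\n{candidate_output}")
    fidelity = SequenceMatcher(None, candidate_output, repeated).ratio()
    return candidate_output if fidelity >= threshold else "[blocked]"
```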


Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency

https://openreview.net/forum?id=tOO6PD3kYP

Compressor summary: The paper proposes and analyzes a Bayesian optimization method using Gaussian Process models with random exploration, which achieves optimal error rates, has computational advantages, and partially resolves an open problem.


Fast Sampling-Based Sketches for Tensors

https://openreview.net/forum?id=tMkPL7Tiul

Compressor summary: The text presents a new method for creating sketches that can be applied to rank-one tensors in two or three modes, achieving faster computation times by using fast convolution and random subsets of tensor entries.


Zero-Shot Reinforcement Learning via Function Encoders

https://openreview.net/forum?id=tHBLwSYnLf

Compressor summary: The paper proposes the function encoder, an algorithm that helps reinforcement learning agents transfer between related tasks by providing a coherent vector representation of the reward or transition function.


A Touch, Vision, and Language Dataset for Multimodal Alignment

https://openreview.net/forum?id=tFEOOH9eH0

Compressor summary: The text introduces a new dataset of vision-touch pairs with English labels and presents a TVL model that improves tactile-vision-language alignment over existing models.


Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference

https://openreview.net/forum?id=tDRYrAkOB7

Compressor summary: Dynamic Memory Compression (DMC) is a method to compress the cache of key-value representations for past tokens in large language models, improving generation efficiency and allowing them to fit longer contexts and larger batches within any given memory budget.


SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models

https://openreview.net/forum?id=tDMlQkJRhZ

Compressor summary: SPHINX-X is a large language model developed upon SPHINX, with improved architecture and training efficiency, trained on diverse multi-domain and multi-modal data, and obtaining strong correlation between performance and data/parameter scales.


MADA: Meta-Adaptive Optimizers Through Hyper-Gradient Descent

https://openreview.net/forum?id=tASXcrMekp

Compressor summary: MADA is a unified optimizer framework that learns the best adaptive optimizer for deep learning tasks during training, and outperforms Adam and other popular optimizers on vision and language tasks.


Task-aware Orthogonal Sparse Network for Exploring Shared Knowledge in Continual Learning

https://openreview.net/forum?id=tABvuya05B

Compressor summary: The paper proposes a continual learning method that partitions the network into three parts to share knowledge between old and new tasks, improving performance and preventing forgetting.


Larimar: Large Language Models with Episodic Memory Control

https://openreview.net/forum?id=t8mt4YrPsq

Compressor summary: Larimar is a brain-inspired architecture that enhances LLMs with a distributed episodic memory for efficient, accurate, and flexible knowledge updates without re-training or fine-tuning.


The Computational Complexity of Finding Second-Order Stationary Points

https://openreview.net/forum?id=t8WDBcegae

Compressor summary: The paper shows that finding approximate second-order stationary points in non-convex optimization problems is as hard as finding first-order stationary points and depends on the domain complexity, contrary to previous results.


Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

https://openreview.net/forum?id=t82Y3fmRtk

Compressor summary: The paper presents R$^3$, a method that uses outcome supervision and reverse curriculum to improve large language models' reasoning skills.


TimeX++: Learning Time-Series Explanations with Information Bottleneck

https://openreview.net/forum?id=t6dBpwkbea

Compressor summary: The paper proposes TimeX++, an explanation framework that uses the information bottleneck principle to produce high-quality explanations for deep learning models operating on time series data, and evaluates it on synthetic and real-world datasets.


Compositional Few-Shot Class-Incremental Learning

https://openreview.net/forum?id=t4908PyZxs

Compressor summary: The paper proposes a cognitive-inspired method for few-shot class-incremental learning that uses compositional learning, set similarities, and primitive reusability to recognize novel classes with fewer samples, achieving improved performance and interpretability.


Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation

https://openreview.net/forum?id=t3SEfoTaYQ

Compressor summary: Coprocessor Actor Critic is a novel reinforcement learning method for adaptive brain stimulation that learns how to act and induce optimal actions in the world with fewer samples and higher success than traditional approaches.


Surface-VQMAE: Vector-quantized Masked Auto-encoders on Molecular Surfaces

https://openreview.net/forum?id=szxtVHOh0C

Compressor summary: Surface-VQMAE is a surface-based unsupervised learning algorithm for protein function analysis that uses a Transformer architecture and vector quantization to capture patch-level relations and enforce a discrete posterior distribution, proving effective for binding-site scoring, affinity prediction, and mutant-effect estimation.


DataFreeShield: Defending Adversarial Attacks without Training Data

https://openreview.net/forum?id=szvKJgmubh

Compressor summary: DataFreeShield achieves adversarial robustness without access to real training data by generating surrogate datasets and adversarially training on them, outperforming baselines as the first data-free solution.


Advancing Dynamic Sparse Training by Exploring Optimization Opportunities

https://openreview.net/forum?id=szRHR9XGrY

Compressor summary: The paper introduces BiDST, a novel framework for dynamic sparse training that optimizes both weights and masks simultaneously, achieving better accuracy, faster speed, and reduced overhead compared to traditional methods.


Dirichlet Flow Matching with Applications to DNA Sequence Design

https://openreview.net/forum?id=syXFAVqx85

Compressor summary: Dirichlet flow matching on the simplex improves sequence generation speed and quality over autoregressive models, especially for complex DNA sequences.


IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation

https://openreview.net/forum?id=swTG6xju8O

Compressor summary: The paper proposes IM-3D, a text-to-3D model that uses video generators, 3D reconstruction with Gaussian splatting, and reduces evaluation times to create high-quality 3D outputs efficiently.


Sparser, Better, Deeper, Stronger: Improving Static Sparse Training with Exact Orthogonal Initialization

https://openreview.net/forum?id=svm53KQAtN

Compressor summary: EOI is a novel sparse orthogonal initialization scheme that provides exact orthogonality and enables creation of layers with arbitrary densities, outperforming common sparse initialization techniques for static sparse training.


Accelerated Speculative Sampling Based on Tree Monte Carlo

https://openreview.net/forum?id=stMhi1Sn2G

Compressor summary: Accelerated Speculative Sampling (ASpS) is a new algorithm that improves inference speed of large language models by using Tree Monte Carlo methods to generate multiple tokens in one step and find better maximal couplings.
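For context, a minimal numpy sketch of the single-token speculative sampling step that tree-based schemes generalize: accept the draft token with probability min(1, p/q), otherwise resample from the residual distribution:

```python
import numpy as np

def speculative_step(p, q, draft_token, rng):
    # p: target-model distribution, q: draft-model distribution (1D arrays).
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token                 # accept the draft model's token
    residual = np.maximum(p - q, 0.0)      # otherwise resample so that
    return rng.choice(len(p), p=residual / residual.sum())  # output ~ p
```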


Transferable Facial Privacy Protection against Blind Face Restoration via Domain-Consistent Adversarial Obfuscation

https://openreview.net/forum?id=st2BTty53v

Compressor summary: The paper proposes a new method for protecting anonymity in facial recognition by creating adversarial obfuscation that fools face restoration techniques, which can restore pixelated faces with high accuracy.


ULAREF: A Unified Label Refinement Framework for Learning with Inaccurate Supervision

https://openreview.net/forum?id=ssFMq35UUY

Compressor summary: The paper proposes ULAREF, a unified framework for learning with inaccurate supervision, which refines labels using global reliability detection and local enhancement with a consistency loss.


ReLUs Are Sufficient for Learning Implicit Neural Representations

https://openreview.net/forum?id=srejp9uOx7

Compressor summary: The authors propose using ReLU neurons with constraints in deep neural networks to learn implicit neural representations, showing their effectiveness and versatility in various tasks.


Ambiguity-Aware Abductive Learning

https://openreview.net/forum?id=sqv2xP8rfb

Compressor summary: The paper proposes a new method for abductive learning, called Ambiguity-Aware Abductive Learning (A$^3$BL), which improves the existing approach by evaluating all potential candidates and their probabilities to better handle uncertainty in the knowledge base.


On a Combinatorial Problem Arising in Machine Teaching

https://openreview.net/forum?id=spOpHW1No2

Compressor summary: This paper proves that the worst case teaching dimension occurs when using binary representations of numbers in machine teaching.


Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset

https://openreview.net/forum?id=snhurpZt63

Compressor summary: The paper introduces a new underwater salient instance segmentation dataset (USIS10K) and a method (USIS-SAM) based on Segment Anything Model that uses visual prompts to improve segmentation accuracy in underwater scenes.


Revisiting Context Aggregation for Image Matting

https://openreview.net/forum?id=sjJZHPV9Id

Compressor summary: The paper proposes AEMatter, a new matting network with a Hybrid-Transformer backbone and appearance-enhanced axis-wise learning blocks that achieves superior performance compared to existing methods.


Online Learning under Budget and ROI Constraints via Weak Adaptivity

https://openreview.net/forum?id=shzEkKPrsn

Compressor summary: The paper proposes a new online learning framework that works without strict feasibility assumptions and proves its no-regret guarantees for bidding in ad auctions.


Fast Algorithms for Hypergraph PageRank with Applications to Semi-Supervised Learning

https://openreview.net/forum?id=sfQH4JJ4We

Compressor summary: The paper presents fast and scalable algorithms for hypergraph models in semi-supervised learning that capture higher-order relations better than graph-based methods and improve clustering on categorical data.


In value-based deep reinforcement learning, a pruned network is a good network

https://openreview.net/forum?id=seo9V9QRZp

Compressor summary: Gradual magnitude pruning lets value-based deep reinforcement learning agents achieve strong performance while using only a small fraction of their parameters.
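A minimal sketch of one magnitude-pruning step on a flat weight array; "gradual" refers to raising the sparsity target on a schedule during training:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the `sparsity` fraction of weights with smallest magnitude.
    k = int(sparsity * weights.size)
    if k == 0:
        return weights
    thresh = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.where(np.abs(weights) <= thresh, 0.0, weights)
```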


Scaling Laws for the Value of Individual Data Points in Machine Learning

https://openreview.net/forum?id=scSB9RynSd

Compressor summary: The text discusses how machine learning models' performance improves with more data and how individual data points contribute differently to model accuracy depending on dataset size, suggesting ways to use this knowledge for data valuation and selection.


Bayesian Optimization of Function Networks with Partial Evaluations

https://openreview.net/forum?id=scMAQ3mFAA

Compressor summary: The paper improves Bayesian optimization of function networks by allowing partial evaluations with cost-aware node selection, reducing overall evaluation costs.


Projecting Molecules into Synthesizable Chemical Spaces

https://openreview.net/forum?id=scFlbJQdm1

Compressor summary: The authors propose a novel framework that uses a transformer-based model to generate new chemical structures while ensuring synthetic accessibility through postfix notations of synthesis pathways.


APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference

https://openreview.net/forum?id=sb81Xl50JG

Compressor summary: APT is a method that adaptively prunes and tunes language model parameters for improved training and inference efficiency, maintaining high performance with significantly reduced parameters and faster fine-tuning.


Pruned Pivot: Correlation Clustering Algorithm for Dynamic, Parallel, and Local Computation Models

https://openreview.net/forum?id=saP7s0ZgYE

Compressor summary: The paper proposes a fast and efficient algorithm for correlation clustering in dynamic graphs, with performance guarantees similar to the well-known Pivot algorithm.


Physics-Informed Neural Network Policy Iteration: Algorithms, Convergence, and Verification

https://openreview.net/forum?id=sZla6SnooP

Compressor summary: The paper proposes model-based policy iteration algorithms for nonlinear optimal control problems using neural networks to solve PDEs, with convergence guarantees and outperforming traditional methods.


Language Models as Semantic Indexers

https://openreview.net/forum?id=sYeioWoF9u

Compressor summary: The paper proposes LMIndexer, a self-supervised framework that uses a generative language model to learn semantic identifiers for information retrieval tasks, addressing challenges such as sequential discrete ID and semantic supervision deficiency.


Private and Federated Stochastic Convex Optimization: Efficient Strategies for Centralized Systems

https://openreview.net/forum?id=sTVSyqD6XX

Compressor summary: The paper proposes methods for preserving privacy in Federated Learning with centralized systems, ensuring Differential Privacy while maintaining optimal convergence rates and linear computational complexity.


Low-Cost High-Power Membership Inference Attacks

https://openreview.net/forum?id=sT7UJh5CTc

Compressor summary: The paper proposes a new statistical test for robustly detecting if a data point was used to train a model, using reference models and population data samples, with low computational cost and high accuracy.
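A minimal sketch of the reference-model idea behind such tests (a simplified z-score stand-in, not the paper's exact statistic): compare the target model's loss on a candidate point against the losses of reference models trained without it:

```python
import numpy as np

def membership_score(loss_target, losses_reference):
    # Reference-model losses approximate the non-member loss distribution;
    # an unusually small target loss suggests the point was a member.
    mu = np.mean(losses_reference)
    sigma = np.std(losses_reference) + 1e-8
    return (mu - loss_target) / sigma      # larger => more likely a member
```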


Private Heterogeneous Federated Learning Without a Trusted Server Revisited: Error-Optimal and Communication-Efficient Algorithms for Convex Losses

https://openreview.net/forum?id=sSAEhcdB9N

Compressor summary: This paper proposes novel federated learning algorithms that protect data privacy, perform well with heterogeneous data, and require fewer communication rounds than existing methods.


PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning

https://openreview.net/forum?id=sOyJSNUrzQ

Compressor summary: The paper presents a generic framework called ReED for analyzing various knowledge graph representation learning (KGRL) models and provides theoretical generalization bounds for them, which can guide their practical design.


Optimal Eye Surgeon: Finding image priors through sparse generators at initialization

https://openreview.net/forum?id=sO5qtpvsUZ

Compressor summary: The Optimal Eye Surgeon (OES) framework prunes and trains deep image generator networks by adaptively underparameterizing them, which helps resist overfitting to noise and improves image restoration tasks.


Stochastic Quantum Sampling for Non-Logconcave Distributions and Estimating Partition Functions

https://openreview.net/forum?id=sNjxqSnXFO

Compressor summary: The paper presents quantum algorithms for sampling from non-logconcave probability distributions and estimating their partition functions, overcoming challenges by using a reference reversible Markov chain and showing polynomial speedups.


CaRiNG: Learning Temporal Causal Representation under Non-Invertible Generation Process

https://openreview.net/forum?id=sLZzFTMWSt

Compressor summary: CaRiNG is a method to identify causal processes in sequential data with non-invertible generation, using temporal context and an identifiability theory.


Sample Complexity Bounds for Estimating Probability Divergences under Invariances

https://openreview.net/forum?id=sKjcrAC4eZ

Compressor summary: The paper studies how Lie group invariances reduce sample complexity and improve convergence rates when estimating various distances and density estimation problems in machine learning models.


Large Language Models are Geographically Biased

https://openreview.net/forum?id=sHtIStlg0v

Compressor summary: The authors study geographic biases in large language models and show that they are correlated with socioeconomic conditions and exhibit significant variation across models.


Language-Driven Cross-Modal Classifier for Zero-Shot Multi-Label Image Recognition

https://openreview.net/forum?id=sHswzNWUW2

Compressor summary: The paper proposes a new language-driven framework for zero-shot multi-label image recognition using CLIP, LLMs, and cross-modal mapping without annotated images during training.


Tell, Don't Show: Language Guidance Eases Transfer Across Domains in Images and Videos

https://openreview.net/forum?id=sFN49CfklF

Compressor summary: LaGTran is a text supervision framework that transfers discriminative knowledge from labeled source to unlabeled target data with domain gaps, outperforming prior approaches on image and video datasets.


Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations

https://openreview.net/forum?id=sF9epWkNUG

Compressor summary: VCNeFs are a new method to solve time-dependent PDEs using neural fields, attention mechanisms, and conditional learning, addressing limitations of previous Transformer-based approaches.


LASER: Linear Compression in Wireless Distributed Optimization

https://openreview.net/forum?id=sDjszMb2Ir

Compressor summary: LASER is a new compression scheme that efficiently transmits low-rank gradients over noisy channels, achieving better performance than existing methods on practical machine learning tasks.


Compressing Large Language Models by Joint Sparsification and Quantization

https://openreview.net/forum?id=sCGRhnuMUJ

Compressor summary: The paper proposes JSQ, a novel model compression technique for large language models that integrates sparsification and quantization to achieve significant computation reduction without performance degradation.


Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data

https://openreview.net/forum?id=sBJNokmYuV

Compressor summary: CPL is a method that generates refined candidate pseudolabels for vision-language models to improve fine-tuning performance in downstream tasks.


Understanding and Diagnosing Deep Reinforcement Learning

https://openreview.net/forum?id=s9RKqT7jVM

Compressor summary: The text discusses concerns about the stability of deep neural policies in various settings, and introduces a method to analyze and improve these policies using robust training techniques.


WARM: On the Benefits of Weight Averaged Reward Models

https://openreview.net/forum?id=s7RDnNUJy6

Compressor summary: Weight Averaged Reward Models (WARM) is a method to prevent reward hacking in large language models by averaging multiple fine-tuned reward models, improving quality and alignment of predictions.
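The core operation is plain parameter averaging across reward models fine-tuned from a shared initialization (which is what makes the weights averageable); a minimal sketch over state dicts of numpy arrays:

```python
def average_reward_models(state_dicts):
    # Uniform average of parameters across fine-tuned reward models that
    # share an architecture and a common pre-trained initialization.
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}
```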


Fundamental Benefit of Alternating Updates in Minimax Optimization

https://openreview.net/forum?id=s6ZAT8MLKU

Compressor summary: The paper analyzes the convergence rates of two variants of the GDA algorithm for minimax optimization problems, shows that Alternating-GDA is faster and has better performance than Simultaneous-GDA, and proposes a new algorithm called Alex-GDA that improves upon both.
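Schematically, for $\min_x \max_y f(x, y)$ with step size $\eta$, the two baseline variants differ only in whether the second player sees the first player's fresh iterate:

```latex
\text{Sim-GDA: } x_{t+1} = x_t - \eta \nabla_x f(x_t, y_t), \quad
                 y_{t+1} = y_t + \eta \nabla_y f(x_t, y_t)

\text{Alt-GDA: } x_{t+1} = x_t - \eta \nabla_x f(x_t, y_t), \quad
                 y_{t+1} = y_t + \eta \nabla_y f(x_{t+1}, y_t)
```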


Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments

https://openreview.net/forum?id=s5PLISyNyP

Compressor summary: The paper proposes a method to estimate the CATE from observational data in multiple environments with violations of standard causal assumptions by using them as instrumental variables and combining meta-learners with machine learning models.


High-Performance Temporal Reversible Spiking Neural Networks with $\mathcal{O}(L)$ Training Memory and $\mathcal{O}(1)$ Inference Cost

https://openreview.net/forum?id=s4h6nyjM9H

Compressor summary: T-RevSNN is a novel spiking neural network architecture that reduces memory, training time, and inference energy costs by temporally reversible interactions and redesigned input encoding.


MS-TIP: Imputation Aware Pedestrian Trajectory Prediction

https://openreview.net/forum?id=s4Hy0L4mml

Compressor summary: The paper proposes a novel approach (MS-TIP) that uses transformers, multi-scale hypergraphs, and scenic attention to predict pedestrian trajectories accurately even when the observed sequences are incomplete.


Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

https://openreview.net/forum?id=s4EYBJ30WY

Compressor summary: The paper introduces spectral imbalance as a source of class bias in feature learning and proposes a framework to study, compare, and mitigate it using 11 pre-trained encoders.


In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

https://openreview.net/forum?id=s3e8poX3kb

Compressor summary: The authors study why large language models make factual errors, find a pattern related to sharpness of context activations, and propose an entropy-based metric to improve the quality of generated text by adjusting token prediction distributions.


LoRA Training in the NTK Regime has No Spurious Local Minima

https://openreview.net/forum?id=s1sdx6vNsU

Compressor summary: This paper analyzes how low-rank adaptation (LoRA) helps fine-tune large language models efficiently and effectively by avoiding spurious minima and achieving good generalization.


DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation

https://openreview.net/forum?id=s0UDX7Kswl

Compressor summary: DiffAug is a novel unsupervised contrastive learning technique that uses diffusion models to generate positive samples based on semantic encoders, improving representation ability in various domains.


Don't trust your eyes: on the (un)reliability of feature visualizations

https://openreview.net/forum?id=s0Jvdolv2I

Compressor summary: Feature visualizations are not reliable for explaining how neural networks process natural images because they can be easily fooled by arbitrary patterns and do not match standard input processing.


SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation

https://openreview.net/forum?id=ryDa4mS18V

Compressor summary: SAM-E is a novel architecture for robot manipulation that uses a vision foundation model for scene understanding and sequence imitation for long-term action reasoning, achieving superior performance and efficiency.


Differentiable Annealed Importance Sampling Minimizes The Jensen-Shannon Divergence Between Initial and Target Distribution

https://openreview.net/forum?id=rvaN2P1rvC

Compressor summary: DAIS optimizes a symmetrized KL divergence between the initial and target distributions, acting as a variational inference method that often improves uncertainty estimates.


Dynamic Facility Location in High Dimensional Euclidean Spaces

https://openreview.net/forum?id=rucbIsWoEV

Compressor summary: The paper presents the first dynamic algorithm for facility location in high-dimensional spaces, achieving good quality and stable solutions with sub-linear update times.


Sampling in Unit Time with Kernel Fisher-Rao Flow

https://openreview.net/forum?id=rtyqBfcg8j

Compressor summary: The paper proposes a new method to sample from unnormalized probability densities using a mean-field ODE and interacting particle systems that are gradient-free, closed-form, and require only sampling from a reference density and computing its ratio to the target density.


AlphaFold Meets Flow Matching for Generating Protein Ensembles

https://openreview.net/forum?id=rs8Sh2UASt

Compressor summary: Key points: - A flow-based generative modeling approach for learning and sampling protein conformational landscapes - AlphaFlow and ESMFlow are sequence-conditioned generative models of protein structure based on AlphaFold and ESMFold - Outperforms AlphaFold with MSA subsampling and captures conformational flexibility, positional distributions, and higher-order ensemble observables - Can diversify a static PDB structure faster than replicate MD trajectories Summary: The authors develop AlphaFlow and ESMFlow, generative models of protein structure that use flow methods to improve accuracy and diversity, and show their potential for simulating conformational flexibility and equilibrium properties.


Designing Decision Support Systems using Counterfactual Prediction Sets

https://openreview.net/forum?id=rqyXubsBhH

Compressor summary: This paper proposes a new design for decision support systems using prediction sets and online learning, which improves the performance by giving less agency to human experts.
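
The building block, a conformal prediction set, is easy to sketch; the paper's contribution lies in constructing such sets counterfactually and tuning them online, which this standard split-conformal routine does not capture:

```python
import numpy as np

def split_conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Standard split-conformal sets: keep every label whose score clears the
    calibrated quantile; marginal coverage is >= 1 - alpha."""
    scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    qhat = np.quantile(scores, min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0),
                       method="higher")
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]
```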


Listening to the noise: Blind Denoising with Gibbs Diffusion

https://openreview.net/forum?id=rmEgJ7bhuZ

Compressor summary: The paper introduces Gibbs Diffusion, a method that enables blind denoising by simultaneously sampling the signal and noise parameters using a conditional diffusion model and a Monte Carlo sampler.


Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness

https://openreview.net/forum?id=rkYOxLLv2x

Compressor summary: The paper evaluates large kernel convolutional neural networks (CNNs) and shows they can be as or more robust than vision transformers (ViTs), revealing novel insights into their source of robustness.


Reducing Item Discrepancy via Differentially Private Robust Embedding Alignment for Privacy-Preserving Cross Domain Recommendation

https://openreview.net/forum?id=rk4kmL8aOY

Compressor summary: The paper introduces RidCDR, a model for privacy-preserving cross-domain recommendation that uses embedding alignment to share knowledge without overlapping data.


Position: Intent-aligned AI Systems Must Optimize for Agency Preservation

https://openreview.net/forum?id=rfvgdfd1K9

Compressor summary: The text discusses the need for agency-preserving AI-human interactions that protect humans' long-term control, rather than just aligning AI systems with human intentions.


SaVeR: Optimal Data Collection Strategy for Safe Policy Evaluation in Tabular MDP

https://openreview.net/forum?id=reB9FFAaKw

Compressor summary: The paper studies how to collect data safely for policy evaluation in tabular MDPs and proposes a safe oracle algorithm called SaVeR.


A New Theoretical Perspective on Data Heterogeneity in Federated Optimization

https://openreview.net/forum?id=re6es2atbl

Compressor summary: The paper proposes a new assumption to analyze federated learning convergence rate under data heterogeneity and shows that it can reduce the impact of local Lipschitz constant and improve performance.


Moreau Envelope for Nonconvex Bi-Level Optimization: A Single-Loop and Hessian-Free Solution Strategy

https://openreview.net/forum?id=rZD9hV0Bc4

Compressor summary: This paper proposes an efficient gradient-based algorithm for large-scale nonconvex Bi-Level Optimization problems in machine learning, with theoretical guarantees and experimental validation.


Piecewise Constant and Linear Regression Trees: An Optimal Dynamic Programming Approach

https://openreview.net/forum?id=rXnBvu5D7i

Compressor summary: The authors develop and test optimal dynamic programming methods to improve the scalability and performance of regression trees, a machine learning model that represents complex relationships.


Coactive Learning for Large Language Models using Implicit User Feedback

https://openreview.net/forum?id=rVWsTjMW1m

Compressor summary: Coactive learning uses users' implicit edits to improve large language models without supervised training, enabling personalized LLMs.


Position: Is machine learning good or bad for the natural sciences?

https://openreview.net/forum?id=rU8o0QQCy0

Compressor summary: Machine learning methods can be valuable for causal inference and emulating simulations in natural sciences, but they also introduce unwanted biases in some cases.


Decomposing and Editing Predictions by Modeling Model Computation

https://openreview.net/forum?id=rTBR0eqE4G

Compressor summary: Component modeling helps to understand how machine learning models make predictions by breaking them down into their parts, and COAR is a tool that estimates the impact of each part on the prediction.


Effective Federated Graph Matching

https://openreview.net/forum?id=rSfzchjIYu

Compressor summary: The paper presents UFGM, an unsupervised federated graph matching algorithm that uses graphlet features and trust region optimization to match node pairs across clients while preserving privacy.


Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning

https://openreview.net/forum?id=rReWhol66R

Compressor summary: The paper proposes a novel representation-based approach to measure the domain gap and filter data for cross-domain offline reinforcement learning, achieving superior performance with less target data.


Expressivity and Generalization: Fragment-Biases for Molecular GNNs

https://openreview.net/forum?id=rPm5cKb1VB

Compressor summary: The authors propose a new GNN architecture and a fragmentation method with infinite vocabulary that improves expressiveness and performance on molecular property prediction compared to recent advances in higher-order GNNs, using the Fragment-WL test for theoretical analysis.


Highway Value Iteration Networks

https://openreview.net/forum?id=rORsGuE2hV

Compressor summary: The paper proposes a new planning algorithm called highway value iteration network that combines value iteration with skip connections, exploration, and safety checks to enable effective long-term planning with deep neural networks.


Stochastic Conditional Diffusion Models for Robust Semantic Image Synthesis

https://openreview.net/forum?id=rMV86cAOh6

Compressor summary: The text introduces SCDM, a robust conditional diffusion model that generates realistic images from noisy semantic maps by stochastically perturbing the labels through Label Diffusion and using a class-wise noise schedule.


Transport of Algebraic Structure to Latent Embeddings

https://openreview.net/forum?id=rK6AZem0hX

Compressor summary: The paper proposes a method to learn consistent operations from latent embeddings by mapping them to a mirrored algebra on Euclidean space.


Closing the Gap: Achieving Global Convergence (Last Iterate) of Actor-Critic under Markovian Sampling with Neural Network Parametrization

https://openreview.net/forum?id=rJxFvAs7pq

Compressor summary: The paper provides a comprehensive theoretical analysis of Actor-Critic algorithms, considering practical aspects such as multi-layer neural networks, Markovian sampling, continuous state-action spaces, last iterate performance, and global optimality, and shows their global convergence with sample complexity bounds.


Learning to Compile Programs to Neural Networks

https://openreview.net/forum?id=rJti61Uere

Compressor summary: A neural surrogate compiler uses a hypernetwork to generate efficient neural networks that mimic the behavior of programs, improving data efficiency and training speed compared to traditional methods.


In-context Learning on Function Classes Unveiled for Transformers

https://openreview.net/forum?id=rJkGOARXns

Compressor summary: The paper investigates how transformer-based neural sequence models can learn different types of functions in-context by approximating them with neural networks and studying their pre-training and activation functions.


Linguistic Calibration of Long-Form Generations

https://openreview.net/forum?id=rJVjQSQ8ye

Compressor summary: The paper proposes a method to train language models to generate long-form text with calibrated confidence statements, which helps users make better decisions based on the model's predictions.


Exploration and Anti-Exploration with Distributional Random Network Distillation

https://openreview.net/forum?id=rIrpzmqRBk

Compressor summary: The paper proposes DRND, a modified RND algorithm that improves exploration in deep reinforcement learning by distilling a distribution of random networks and allocating bonuses more precisely.
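
For orientation, here is the classic RND bonus that DRND generalizes from a single random target network to a distribution of them (a minimal sketch; DRND's bonus allocation differs):

```python
import torch
import torch.nn as nn

# Classic RND, which DRND extends to a *distribution* of random target networks.
target = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))
predictor = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))
for p in target.parameters():
    p.requires_grad_(False)          # the target stays frozen and random forever

def intrinsic_bonus(states):         # states: (batch, 8)
    err = (predictor(states) - target(states)).pow(2).mean(dim=-1)
    return err                       # exploration bonus; also the predictor's training loss
```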


GNNs Also Deserve Editing, and They Need It More Than Once

https://openreview.net/forum?id=rIc9adYbH2

Compressor summary: Key points: - The paper proposes SEED-GNN, a GNN model editing method that is practical and effective. - The main challenge of GNN editing is sequential editing robustness, which is lacking in existing methods due to overfitting. - The paper also defines the task paradigm of GNN editing and hopes to inspire more research in this area. Summary: The paper introduces SEED-GNN, a novel method for editing graph neural networks (GNNs) that can handle multiple errors sequentially without overfitting, and formalizes the GNN editing task paradigm.


Visual Representation Learning with Stochastic Frame Prediction

https://openreview.net/forum?id=rI6lxIX0uX

Compressor summary: The paper proposes a stochastic frame prediction model for learning image representations that captures uncertainty and temporal information, and shows its effectiveness on various video-based tasks.


Restoring balance: principled under/oversampling of data for optimal classification

https://openreview.net/forum?id=rHylzxK3HU

Compressor summary: The paper analyzes how class imbalance affects generalization curves for linear classifiers and proposes mixed under/oversampling strategies to improve performance.


PASOA- PArticle baSed Bayesian Optimal Adaptive design

https://openreview.net/forum?id=rGCvMARXkG

Compressor summary: PASOA is a new Bayesian experimental design method that uses contrastive estimation, stochastic optimization, and tempered SMC to balance information gain and accuracy in sequential design optimization and parameter inference.


UPOCR: Towards Unified Pixel-Level OCR Interface

https://openreview.net/forum?id=rEZ24oJhbn

Compressor summary: UPOCR is a simple and effective generalist model for pixel-level OCR tasks that unifies paradigms, architectures, and training strategies using vision Transformers and learnable task prompts.


InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models

https://openreview.net/forum?id=rADFNrIss3

Compressor summary: InstructZero is a method that optimizes soft prompts to create instructions for black-box LLMs, improving their performance on various tasks.


BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model

https://openreview.net/forum?id=r9rzU9QzPe

Compressor summary: The BiSHop model combines associative memory and attention mechanisms to handle non-rotational invariance and feature sparsity in deep tabular learning, achieving superior performance on real-world datasets with fewer hyperparameter searches.


Reweighted Solutions for Weighted Low Rank Approximation

https://openreview.net/forum?id=r9XICONppE

Compressor summary: The paper introduces a new relaxed solution to weighted low rank approximation (WLRA) that uses the weight matrix itself to reweight a low rank solution, achieving simple and efficient algorithms with provable approximation guarantees.


Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation

https://openreview.net/forum?id=r8k5JrGip6

Compressor summary: Superposition prompting is a novel method to improve efficiency and quality of large language models in retrieval-augmented generation by processing input documents in parallel paths and discarding irrelevant ones.


Learning to Route Among Specialized Experts for Zero-Shot Generalization

https://openreview.net/forum?id=r0qcGcFL4U

Compressor summary: The paper proposes PHATGOOSE, a post-hoc method that routes tokens among specialized expert modules to improve zero-shot generalization.


Test-Time Model Adaptation with Only Forward Passes

https://openreview.net/forum?id=qz1Vx1v9iK

Compressor summary: The paper proposes a test-time adaptation method for resource-limited devices that uses input prompts, covariance matrix adaptation evolution strategy, and activation shifting to adapt models without backpropagation or weight changes.


$S^2$IP-LLM: Semantic Space Informed Prompt Learning with LLM for Time Series Forecasting

https://openreview.net/forum?id=qwQVV5R8Y7

Compressor summary: The paper proposes a method to use pre-trained language models for time series forecasting by aligning their semantic space with temporal dynamics using token embeddings and learned prompts.


Multi-Agent Reinforcement Learning Meets Leaf Sequencing in Radiotherapy

https://openreview.net/forum?id=qwKSTLbati

Compressor summary: The paper proposes a deep reinforcement learning model called Reinforced Leaf Sequencer to improve radiotherapy planning by optimizing leaf sequencing in a multi-agent framework.


Latent Space Symmetry Discovery

https://openreview.net/forum?id=qstt2OguvM

Compressor summary: LaLiGAN is a novel generative model that can discover nonlinear symmetries in data and latent space, enabling applications like equation discovery and long-term forecasting.


Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding

https://openreview.net/forum?id=qqPL0DkcrI

Compressor summary: Sinusoidal positional encoding (SPE) is a new method that learns adaptive frequency features without hyperparameter tuning, improving fidelity and speed in various tasks such as 3D view synthesis, Text-to-Speech generation, and 1D regression.
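
In spirit, the method replaces fixed Fourier-feature frequencies with trainable ones. A minimal PyTorch sketch under that assumption (the paper's exact parameterization may differ):

```python
import torch
import torch.nn as nn

class SinusoidalPE(nn.Module):
    """Fourier features with *trainable* frequencies instead of a fixed schedule."""
    def __init__(self, in_dim: int, n_feats: int):
        super().__init__()
        self.freqs = nn.Parameter(torch.randn(in_dim, n_feats))  # learned per task

    def forward(self, x):                        # x: (batch, in_dim)
        proj = 2 * torch.pi * (x @ self.freqs)   # (batch, n_feats)
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)
```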


An Empirical Study Into What Matters for Calibrating Vision-Language Models

https://openreview.net/forum?id=qoxuPshrZb

Compressor summary: The study examines how well vision-language models (VLMs) can estimate uncertainty across different settings and finds that temperature scaling helps improve calibration, even with a small amount of data.
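
Temperature scaling itself is a one-parameter post-hoc fix: divide the logits by a scalar $T$ fitted to minimize negative log-likelihood on a small held-out set. A minimal sketch:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nll(temp, logits, labels):
    z = logits / temp
    z = z - z.max(axis=1, keepdims=True)          # numerical stability
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def fit_temperature(logits, labels):
    """Single scalar T minimizing held-out NLL; predict with softmax(logits / T)."""
    return minimize_scalar(nll, bounds=(0.05, 10.0), args=(logits, labels),
                           method="bounded").x
```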


Manifold Integrated Gradients: Riemannian Geometry for Feature Attribution

https://openreview.net/forum?id=qoOt02l2WC

Compressor summary: The paper proposes a method to improve Integrated Gradients, a feature attribution technique for deep learning models, by aligning the path of attribution with the data manifold's geometry, resulting in more intuitive explanations and increased robustness to adversarial attacks.


Multi-Source Conformal Inference Under Distribution Shift

https://openreview.net/forum?id=qmUbSAgz08

Compressor summary: The paper proposes a method to estimate uncertainty in machine learning predictions using multiple biased data sources, and shows its usefulness in predicting hospital length of stay for pediatric heart surgery patients.


Implicit Regularization in Feedback Alignment Learning Mechanisms for Neural Networks

https://openreview.net/forum?id=qklMNNub0H

Compressor summary: The text introduces a unified framework explaining how feedback alignment works in neural networks, improves its performance on multi-class tasks, and offers theoretical and empirical insights for better understanding and developing the method.


From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems

https://openreview.net/forum?id=qkhbyDqlNI

Compressor summary: This paper investigates how large language models can solve physical world problems through hierarchical reinforcement learning and exploration strategies.


Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot

https://openreview.net/forum?id=qjqlhWDcId

Compressor summary: The paper shows that transformers can learn sparse token selection and have better generalization than fully-connected networks.


On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm

https://openreview.net/forum?id=qg6AlnpEQH

Compressor summary: Cross-Task Linearity is a phenomenon where linearly interpolating weights of finetuned models starting from the same pretrained checkpoint results in similar features across tasks, suggesting neural networks act as approximate linear maps.


Memory Consolidation Enables Long-Context Video Understanding

https://openreview.net/forum?id=qeFgvVVAJ2

Compressor summary: The memory-consolidated vision transformer (MC-ViT) extends the context for video understanding by fine-tuning pre-trained video transformers to attend to non-parametric memories, achieving state-of-the-art results on long-context tasks with fewer parameters.


Provably Efficient Reinforcement Learning for Adversarial Restless Multi-Armed Bandits with Unknown Transitions and Bandit Feedback

https://openreview.net/forum?id=qbIKUfastZ

Compressor summary: Key points: - The paper proposes a reinforcement learning algorithm for episodic RMAB with unknown transitions, bandit feedback, and adversarial rewards - The algorithm has two main components: a biased adversarial reward estimator and a low-complexity index policy - The algorithm achieves $\tilde{\mathcal{O}}(H\sqrt{T})$ regret bound, which is the first to ensure $\tilde{\mathcal{O}}(\sqrt{T})$ regret for this problem setting Summary: The paper presents a novel reinforcement learning algorithm for sequential decision making problems with adversarial rewards and unknown transitions, using a biased reward estimator and a low-complexity policy, and showing an improved regret bound.


PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses

https://openreview.net/forum?id=qawwyKqOkj

Compressor summary: The paper proposes a bagging method for event-level loss functions in linear regression and GLMs, which reduces to one-dimensional clustering and improves model quality with adaptive homogeneous sampling and label differential privacy.


Pre-Training Protein Bi-level Representation Through Span Mask Strategy On 3D Protein Chains

https://openreview.net/forum?id=qY63FnLuJ1

Compressor summary: The paper proposes a new pre-training strategy for protein models that combines residue and atom information, improving performance on downstream tasks like binding site prediction and function prediction.


Pausing Policy Learning in Non-stationary Reinforcement Learning

https://openreview.net/forum?id=qY622O6Ehg

Compressor summary: The text discusses how strategically pausing decision updates in online reinforcement learning can improve performance by managing uncertainty, and provides theoretical and experimental evidence for this claim.


Defining Neural Network Architecture through Polytope Structures of Datasets

https://openreview.net/forum?id=qXoqV40imX

Compressor summary: The paper defines upper and lower bounds for neural network widths based on dataset complexity, explores how geometry affects network requirements, and develops an algorithm to infer dataset structure from trained networks.


Extracting Training Data From Document-Based VQA Models

https://openreview.net/forum?id=qTX1vxzs8b

Compressor summary: The paper studies how vision-language models can memorize and leak personal information from images during training, and proposes a countermeasure to prevent this issue.


BLO-SAM: Bi-level Optimization Based Finetuning of the Segment Anything Model for Overfitting-Preventing Semantic Segmentation

https://openreview.net/forum?id=qRtM5EqE9l

Compressor summary: BLO-SAM is a model that improves semantic segmentation by optimizing prompt embeddings and using bi-level optimization to reduce overfitting and enable autonomous object segmentation.


Controlling Behavioral Diversity in Multi-Agent Reinforcement Learning

https://openreview.net/forum?id=qQjUgItPq4

Compressor summary: DiCo is a method for controlling behavioral diversity in multi-agent systems by adjusting policy components, without changing the learning objective, and improving performance and sample efficiency.


BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

https://openreview.net/forum?id=qOl2WWOqFg

Compressor summary: BiLLM is a 1-bit post-training quantization scheme that compresses pretrained LLMs while preserving their performance and efficiency.


DynSyn: Dynamical Synergistic Representation for Efficient Learning and Control in Overactuated Embodied Systems

https://openreview.net/forum?id=qOMQ0UGLYl

Compressor summary: DynSyn is a deep reinforcement learning algorithm that uses synergistic representations of muscle actuators derived from dynamical structures to improve motor control in high-dimensional, overactuated systems.


Cluster-Aware Similarity Diffusion for Instance Retrieval

https://openreview.net/forum?id=qMG3OK7Xcg

Compressor summary: The paper proposes a novel method called Cluster-Aware Similarity for instance retrieval, which reduces misinformation propagation by diffusing similarity within local clusters instead of using pairwise instances.


Learning from Streaming Data when Users Choose

https://openreview.net/forum?id=qLZ32oS7j2

Compressor summary: The paper proposes a decentralized algorithm to optimize user choice and service provider models in digital markets, and proves its convergence and effectiveness.


GiLOT: Interpreting Generative Language Models via Optimal Transport

https://openreview.net/forum?id=qKL25sGjxL

Compressor summary: The paper introduces GiLOT, a method to measure and explain the impact of each word in large language models using Optimal Transport and token similarity.


On Interpolating Experts and Multi-Armed Bandits

https://openreview.net/forum?id=qIiPM5CbRY

Compressor summary: The paper studies a family of online decision problems that interpolate between learning with expert advice and multi-armed bandit, and provides tight minimax regret bounds and an optimal PAC algorithm for a special case called $\mathbf m$-BAI.


Graph Neural Network Explanations are Fragile

https://openreview.net/forum?id=qIOSNyPPwB

Compressor summary: The text discusses adversarial attacks on graph neural network explainers, showing that they can be easily manipulated by slightly perturbing the graph structure.


The Effect of Weight Precision on the Neuron Count in Deep ReLU Networks

https://openreview.net/forum?id=qHt8FzPvU9

Compressor summary: This paper analyzes how weight precision affects ReLU neural networks in terms of number of neurons and preprocessing time, presenting an exponential algorithm to reduce neurons at the cost of precision, and showing that high precision alone doesn't help in reducing neurons.


Structure-Aware E(3)-Invariant Molecular Conformer Aggregation Networks

https://openreview.net/forum?id=qGEEso256L

Compressor summary: The text describes a new method for predicting molecular properties using 2D and 3D representations of molecules integrated with a novel aggregation mechanism that is invariant under Euclidean transformations.


AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

https://openreview.net/forum?id=qFILbkTQWw

Compressor summary: AnyTool is a powerful language model agent that uses over 16,000 APIs to address user queries and improves upon previous evaluation protocols with its own benchmark, AnyToolBench.


Operator SVD with Neural Networks via Nested Low-Rank Approximation

https://openreview.net/forum?id=qESG5HaaoJ

Compressor summary: The paper introduces a new neural network approach to compute eigenvalue decomposition efficiently and accurately using low-rank approximation and nesting techniques.


Estimating Unknown Population Sizes Using the Hypergeometric Distribution

https://openreview.net/forum?id=qE4nkfyMYl

Compressor summary: The paper proposes a novel method to estimate discrete distributions in multivariate hypergeometric sampling when both population size and category sizes are unknown, using variational autoencoder framework and showing its applications in NLP and biology.


Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

https://openreview.net/forum?id=qDw4FxMubj

Compressor summary: This paper proposes a sample-efficient model-based algorithm for learning robust equilibria in distributionally robust Markov games, where agents learn policies that perform well under various environmental uncertainties.


Model-Based Minimum Bayes Risk Decoding for Text Generation

https://openreview.net/forum?id=qDUaH9xHVV

Compressor summary: Model-based MBR decoding improves text generation by using the model probability instead of a Monte Carlo estimate for hypothesis selection.
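
Concretely, MBR picks the hypothesis with the highest expected utility against the other candidates; the model-based variant weights candidates by their model probability rather than uniformly. A sketch with a toy token-overlap utility (illustrative, not the paper's exact estimator):

```python
def overlap(a, b):                   # toy utility: Jaccard overlap of tokens
    A, B = set(a.split()), set(b.split())
    return len(A & B) / max(len(A | B), 1)

def mbr_select(hyps, model_probs, utility=overlap):
    """Choose the hypothesis with highest expected utility; model-based MBR weights
    the references by model probability instead of uniform Monte Carlo weights."""
    total = sum(model_probs)
    weights = [p / total for p in model_probs]
    return max(hyps, key=lambda h: sum(w * utility(h, r)
                                       for w, r in zip(weights, hyps)))
```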


ACM-MILP: Adaptive Constraint Modification via Grouping and Selection for Hardness-Preserving MILP Instance Generation

https://openreview.net/forum?id=qDAAMmGsGw

Compressor summary: The paper proposes ACM-MILP, a framework that generates better Mixed-Integer Linear Programming (MILP) instances by adaptively modifying constraints and modeling their interrelations.


tinyBenchmarks: evaluating LLMs with fewer examples

https://openreview.net/forum?id=qAml3FpfhG

Compressor summary: The paper proposes strategies to reduce the number of evaluations needed for testing large language models on various benchmarks by using curated examples and releasing evaluation tools and smaller versions of popular benchmarks.


MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

https://openreview.net/forum?id=q6fXuPLpao

Compressor summary: MLIP improves CLIP by using multiple domains and levels of supervision, token merging, and frequency transforms to enhance multimodal learning efficiency and performance.


Byzantine Resilient and Fast Federated Few-Shot Learning

https://openreview.net/forum?id=q5q59s2WJy

Compressor summary: The paper presents a Byzantine resilient algorithm for learning low-dimensional linear representation in a federated setting, with improved efficiency and security compared to existing methods.


Disguised Copyright Infringement of Latent Diffusion Models

https://openreview.net/forum?id=q5Bg858Hef

Compressor summary: The paper discusses concealed copyright infringement by generative models that use disguised copies of protected data for training and proposes methods to detect and prevent it.


Classification Under Strategic Self-Selection

https://openreview.net/forum?id=q3Bz1TVTq4

Compressor summary: The text discusses how users strategically choose whether to participate in predictive systems based on favorable outcomes, and proposes a framework for learning from such self-selected populations.


An Effective Dynamic Gradient Calibration Method for Continual Learning

https://openreview.net/forum?id=q14AbM4kdv

Compressor summary: The paper proposes an algorithm to calibrate gradients for continual learning, reducing catastrophic forgetting and improving performance when historical data is limited.


Sparse-to-dense Multimodal Image Registration via Multi-Task Learning

https://openreview.net/forum?id=q0vILV7zAw

Compressor summary: The paper presents SDME, a multi-task network that combines sparse and dense feature matching for robust multimodal image registration, achieving excellent results on various datasets.


Disentanglement Learning via Topology

https://openreview.net/forum?id=q0lxAs5GGO

Compressor summary: TopDis is a novel method for learning disentangled data representations using a multi-scale topological loss that improves various disentanglement scores and works unsupervised, even for correlated factors of variation.


Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search

https://openreview.net/forum?id=pz4B2kHVKo

Compressor summary: The paper proposes a probabilistic routing method called PEOs that improves the efficiency of approximate nearest neighbor search in high-dimensional spaces using graph-based approaches.


Latent Logic Tree Extraction for Event Sequence Explanation from LLMs

https://openreview.net/forum?id=pwfcwEqdUz

Compressor summary: Key points: - The goal is to use Large Language Models (LLMs) to provide explanations for high-stakes event sequences. - The method builds on the temporal point process model and uses the likelihood function as a score for logic trees. - The approach combines an amortized EM learning framework with a GFlowNet generator for structured discrete variables. - The online setting extracts relevant rules from LLMs for each sequence in a few iterations. Summary: The paper proposes a method to use LLMs for explanations of high-stakes event sequences, using a score based on the likelihood function and an EM learning framework with a GFlowNet generator.


DiNADO: Norm-Disentangled Neurally-Decomposed Oracles for Controlling Language Models

https://openreview.net/forum?id=pvg1OdUtDQ

Compressor summary: DiNADO is an improved version of NADO for controllable language generation that addresses challenges like gradient vanishing, limited capacity, and allows combination with finetuning methods.


Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective

https://openreview.net/forum?id=puSMYmHmJW

Compressor summary: The paper proposes a mathematical model that represents the world as a hypergraph and studies how pre-training of foundation models can recover this structure using graph theory concepts.


DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

https://openreview.net/forum?id=psup68MBvt

Compressor summary: DisCo-Diff is a new diffusion model that simplifies the learning process by adding discrete latents, improving performance in various tasks.


The Balanced-Pairwise-Affinities Feature Transform

https://openreview.net/forum?id=pspyQm4ko0

Compressor summary: The BPA feature transform uses optimal transport optimization to efficiently and effectively represent high order relations between input features for various tasks like few-shot classification, clustering, and person re-identification.
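
The optimal-transport machinery underneath can be sketched as Sinkhorn-balancing a pairwise-affinity matrix toward a doubly-stochastic one (a simplified sketch; BPA's exact transform and normalization differ):

```python
import numpy as np

def balanced_affinities(feats, iters=50, eps=0.1):
    """Sinkhorn-balance a pairwise-affinity matrix toward doubly stochastic."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / eps)
    np.fill_diagonal(K, 1e-9)                 # suppress self-affinity
    for _ in range(iters):
        K = K / K.sum(axis=1, keepdims=True)  # row normalization
        K = K / K.sum(axis=0, keepdims=True)  # column normalization
    return K                                  # balanced high-order relation features
```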


In-Context Reinforcement Learning for Variable Action Spaces

https://openreview.net/forum?id=pp3v2ch5Sd

Compressor summary: Headless-AD is a single-trained model that can adapt to different action spaces without retraining or data collection.


Fair Off-Policy Learning from Observational Data

https://openreview.net/forum?id=poEPRuNvM3

Compressor summary: Key points: - The paper proposes a framework for fair off-policy learning from observational data under different notions of fairness - The framework applies to actions or policy values as measures of fairness - The framework has theoretical guarantees and is tested on simulated and real-world data Summary: The paper presents a novel approach to learn fair decision rules from observational data in off-policy learning settings, with different fairness criteria and theoretical guarantees.


An Intrinsic Vector Heat Network

https://openreview.net/forum?id=po4NsL9KvX

Compressor summary: The paper presents a neural network architecture for learning tangent vector fields on surface manifolds that preserves intrinsic properties using a trainable vector heat diffusion module and vector-valued neurons.


More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms

https://openreview.net/forum?id=pmsPKIBAu6

Compressor summary: The text introduces a new framework for studying meta-learning methods using PAC-Bayesian theory, which allows more flexibility and directness in transferring knowledge between tasks, and demonstrates its effectiveness in both theory and practice.


Agent-Specific Effects: A Causal Effect Propagation Analysis in Multi-Agent MDPs

https://openreview.net/forum?id=pmncWWkGMz

Compressor summary: The paper proposes agent-specific effects (ASE) to measure how an agent's action affects the outcome by influencing other agents and introduces counterfactual counterpart (cf-ASE) for identifying and estimating it.


Graph Out-of-Distribution Detection Goes Neighborhood Shaping

https://openreview.net/forum?id=pmcusTywXO

Compressor summary: TopoOOD is a method for detecting out-of-distribution node instances on graphs that considers graph topology and neighborhood context, and shows improved performance over existing approaches.


Emergent Equivariance in Deep Ensembles

https://openreview.net/forum?id=plXXbXjvQ9

Compressor summary: Deep ensembles become equivariant for all inputs and architectures with data augmentation, and this emergent property holds off-manifold in the infinite width limit.


Efficient Denoising Diffusion via Probabilistic Masking

https://openreview.net/forum?id=pktvuR7b5v

Compressor summary: EDDPM is an efficient denoising diffusion method that uses probabilistic masking to identify and skip redundant steps during training, improving inference efficiency without losing important information.


Robust Inverse Constrained Reinforcement Learning under Model Misspecification

https://openreview.net/forum?id=pkUl39b0in

Compressor summary: The paper proposes a method to learn safe policies from expert demonstrations by inferring robust constraints that account for environmental differences, and evaluates it in continuous and discrete domains.


Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models

https://openreview.net/forum?id=piujJIF3zs

Compressor summary: Model Tailor is a method to reduce catastrophic forgetting in multi-modal language models by replacing a small portion of fine-tuned parameters and improving performance on both original and new tasks.


GPT-4V(ision) is a Generalist Web Agent, if Grounded

https://openreview.net/forum?id=piecKJ2DlB

Compressor summary: The paper explores using large multimodal models like GPT-4V as generalist web agents that can follow natural language instructions on websites, and proposes SEEACT to evaluate them on the MIND2WEB benchmark.


DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching

https://openreview.net/forum?id=phGHQOKmaU

Compressor summary: DiffStitch is a data augmentation method that connects low-reward trajectories with high-reward ones, improving offline reinforcement learning performance across different methods.


Towards Optimal Adversarial Robust Q-learning with Bellman Infinity-error

https://openreview.net/forum?id=pgI9inG2Ny

Compressor summary: This paper studies the optimal robust policy for deep reinforcement learning agents under state perturbations, proposes a new consistent adversarial robust deep Q-network algorithm, and shows its effectiveness in various benchmarks.


Translation Equivariant Transformer Neural Processes

https://openreview.net/forum?id=pftXzp6Yn3

Compressor summary: The text introduces a new family of translation equivariant neural processes (TE-TNPs) that improve spatio-temporal modelling by leveraging symmetries in posterior predictive maps.


A General Online Algorithm for Optimizing Complex Performance Metrics

https://openreview.net/forum?id=pfnBLXgFVS

Compressor summary: Key points: - The paper proposes a general online algorithm for optimizing non-decomposable performance metrics based on confusion matrices - The algorithm is simple, efficient, and works for different types of classification problems - The algorithm achieves sublinear regret for concave and smooth metrics and performs well in experiments Summary: The paper presents an online algorithm that optimizes complex performance metrics using confusion matrices, with simplicity, efficiency, and low regret guarantees.


Open-Vocabulary Calibration for Fine-tuned CLIP

https://openreview.net/forum?id=pY2UpspnBB

Compressor summary: This paper proposes Distance-Aware Calibration (DAC), a simple and effective method to improve confidence calibration in vision-language models for open-vocabulary tasks using prompt learning.


Guiding LLMs The Right Way: Fast, Non-Invasive Constrained Generation

https://openreview.net/forum?id=pXaEYzrFae

Compressor summary: The paper introduces DOMINO, a new algorithm that enforces constraints on large language models without sacrificing accuracy or speed.


Position: Understanding LLMs Requires More Than Statistical Generalization

https://openreview.net/forum?id=pVyOchWUBa

Compressor summary: This paper argues that non-identifiability in autoregressive probabilistic models is a separate theoretical explanation for the desirable qualities of large language models, and discusses its relevance through three case studies.


SelMatch: Effectively Scaling Up Dataset Distillation via Selection-Based Initialization and Partial Updates by Trajectory Matching

https://openreview.net/forum?id=pTFud6SetK

Compressor summary: SelMatch is a novel dataset distillation method that effectively scales with the number of images per class and improves performance on image classification tasks.


Subgoal-based Demonstration Learning for Formal Theorem Proving

https://openreview.net/forum?id=pSnhA7Em1P

Compressor summary: The paper proposes a subgoal-based learning framework that improves the performance of large language models in formal theorem proving by using demonstrative examples and diffusion models for organization.


Split-Ensemble: Efficient OOD-aware Ensemble via Task and Model Splitting

https://openreview.net/forum?id=pQyoBWA146

Compressor summary: Split-Ensemble is a novel method that improves uncertainty estimation for deep learning models without extra OOD data or inference costs, using subtask-splitting and feature sharing.


Class-Imbalanced Graph Learning without Class Rebalancing

https://openreview.net/forum?id=pPnkpvBeZN

Compressor summary: This paper proposes a topological augmentation framework called BAT that can effectively mitigate class imbalance bias in graph learning without reweighting or resampling.


Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints

https://openreview.net/forum?id=pPNMhdYMaz

Compressor summary: Key points: - The paper studies multi-agent reinforcement learning with adaptivity constraints - It proposes a policy elimination algorithm that achieves low regret and batch complexity - It proves lower bounds for such algorithms and extends to related problems Summary: The paper presents a near optimal algorithm and analysis for multi-agent reinforcement learning with adaptivity constraints, and shows its applicability to bandit games and reward-free MARL.


SILVER: Single-loop variance reduction and application to federated learning

https://openreview.net/forum?id=pOgMluzEIH

Compressor summary: SILVER is a new optimization method for non-convex problems that reduces computation time and achieves optimal gradient complexity without needing multiple full gradients.


Efficient Algorithms for Empirical Group Distributionally Robust Optimization and Beyond

https://openreview.net/forum?id=pOJbk4Nzmi

Compressor summary: The paper proposes a new algorithm called ALEG for solving empirical Group Distributionally Robust Optimization (GDRO), which minimizes the maximal risk across multiple groups and outperforms existing methods.


Mechanistic Neural Networks for Scientific Machine Learning

https://openreview.net/forum?id=pLtuwhoQh7

Compressor summary: Mechanistic Neural Networks use a new block to learn differential equations and improve interpretability and efficiency in scientific data modeling.


O$n$ Learning Deep O($n$)-Equivariant Hyperspheres

https://openreview.net/forum?id=pFWmHUdJE5

Compressor summary: Key points: - Paper proposes O$(n)$-equivariant neurons with spherical decision surfaces for learning deep features under $n$D reflections and rotations - Neurons generalize to any dimension n and are called Deep Equivariant Hyperspheres - Network combines them using an invariant operator based on the relation between two points and a sphere - Approach outperforms competing methods on O$(n)$-equivariant benchmark datasets Summary: The paper introduces Deep Equivariant Hyperspheres, neurons that learn deep features under $n$D reflections and rotations with spherical decision surfaces. The network uses an invariant operator and shows superior performance on O$(n)$-equivariant tasks.


Better & Faster Large Language Models via Multi-token Prediction

https://openreview.net/forum?id=pEWAcejiU2

Compressor summary: The paper proposes training language models to predict multiple future tokens simultaneously, improving sample efficiency, downstream capabilities, and inference speed for both code and natural language models.
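
Architecturally this amounts to attaching several output heads to one shared trunk, with head $i$ trained to predict the token $i$ steps ahead. A minimal PyTorch sketch (the head design here is an assumption, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class MultiTokenHeads(nn.Module):
    """k output heads on one shared trunk; head i predicts the token i+1 steps ahead."""
    def __init__(self, d_model: int, vocab: int, k: int = 4):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, vocab) for _ in range(k))

    def forward(self, h):            # h: (batch, seq, d_model) from any trunk
        return [head(h) for head in self.heads]

# Training sums cross-entropy over heads, with targets shifted by 1..k positions;
# at inference the extra heads can be dropped or reused to speed up decoding.
```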


SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

https://openreview.net/forum?id=pDoAjdrMf0

Compressor summary: The paper investigates transfer reinforcement learning using successor features and generalized policy improvement, providing convergence analysis and generalization guarantees for this approach.


Revealing Vision-Language Integration in the Brain with Multimodal Networks

https://openreview.net/forum?id=pD9BTIDUoX

Compressor summary: The authors use (multi)modal DNNs to identify brain regions where multimodal integration occurs by predicting SEEG recordings from vision and language models.


RAUCA: A Novel Physical Adversarial Attack on Vehicle Detectors via Robust and Accurate Camouflage Generation

https://openreview.net/forum?id=pBTLGM9uWx

Compressor summary: RAUCA is a method for generating effective adversarial camouflage against vehicle detectors, using a novel neural rendering component (NRP) and a multi-weather dataset.


QORA: Zero-Shot Transfer via Interpretable Object-Relational Model Learning

https://openreview.net/forum?id=pAzDdYzEva

Compressor summary: QORA is an algorithm that uses a domain-agnostic object-based state representation to construct expressive models for various reinforcement-learning domains, achieving high generalization and interpretability with fewer observations than neural networks.


Stochastic Weakly Convex Optimization beyond Lipschitz Continuity

https://openreview.net/forum?id=pAyX8q1IIn

Compressor summary: The paper proposes new adaptive regularization strategies for stochastic weakly convex optimization that preserve the $\mathcal{O}(1/\sqrt{K})$ convergence rate with a wide class of algorithms, using weak assumptions and showing efficiency and robustness in experiments.


Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond

https://openreview.net/forum?id=pAdI75JG3G

Compressor summary: The paper presents a novel framework for combinatorial multi-armed bandits with multivariant and probabilistically triggering arms, enhancing modeling power and achieving improved results in various applications like episodic reinforcement learning and probabilistic maximum coverage.


Theory of Consistency Diffusion Models: Distribution Estimation Meets Fast Sampling

https://openreview.net/forum?id=pAPykbqUHf

Compressor summary: This paper develops a statistical theory for consistency models that speed up sample generation by merging steps in the diffusion process, achieving similar performance to diffusion models.


RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning

https://openreview.net/forum?id=pA2Q5Wfspp

Compressor summary: RL-CFR is a new RL method for dynamic action abstraction in IIEFGs that uses CFR for strategy derivation and achieves higher expected payoff than existing methods.


Generalizing Orthogonalization for Models with Non-Linearities

https://openreview.net/forum?id=p9SMltcfsu

Compressor summary: The paper proposes a method to reduce biases in black-box algorithms like neural networks by correcting non-linearities and handling scalar and tensor-valued predictions.


Inexact Newton-type Methods for Optimisation with Nonnegativity Constraints

https://openreview.net/forum?id=p7gpooFIr3

Compressor summary: The paper proposes extensions of the two-metric projection framework to solve large scale nonconvex optimisation problems with nonnegativity constraints, achieving state-of-the-art convergence rates and competitive practical performance.


Prospective Side Information for Latent MDPs

https://openreview.net/forum?id=p5FIjG9fbs

Compressor summary: This paper studies decision problems where agents receive some weak information about the hidden context and shows that current RL algorithms are not optimal for this setting, proposing a better algorithm with improved sample complexity.


PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control

https://openreview.net/forum?id=p225Od0aYt

Compressor summary: PRISE is a method that uses BPE to compress sequences and learn action abstractions for better skill learning in robotic manipulation tasks.


Nesting Particle Filters for Experimental Design in Dynamical Systems

https://openreview.net/forum?id=p1kDNFs62o

Compressor summary: The paper presents a new Bayesian experimental design method for non-exchangeable data using Inside-Out SMC$^2$ algorithm and particle Markov chain Monte Carlo, which outperforms existing methods on dynamical systems.


MEMORYLLM: Towards Self-Updatable Large Language Models

https://openreview.net/forum?id=p0lKWzdikQ

Compressor summary: MEMORYLLM is a self-updatable language model that can memorize new text knowledge and maintain its performance after many updates.


Bridging Model Heterogeneity in Federated Learning via Uncertainty-based Asymmetrical Reciprocity Learning

https://openreview.net/forum?id=p0MGN0LSnx

Compressor summary: FedType is a framework for federated learning that uses proxy models to exchange information securely, efficiently, and without relying on public data.


On the Origins of Linear Representations in Large Language Models

https://openreview.net/forum?id=otuTw4Mghk

Compressor summary: This paper investigates why large language models encode high-level semantic concepts linearly, and shows that it is due to the loss function and gradient descent.


InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation

https://openreview.net/forum?id=or8BQ4ohGb

Compressor summary: InterpreTabNet uses a latent variable and GPT-4 to learn distinct and sparse feature masks for tabular data, improving interpretability and predictive performance.


Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models

https://openreview.net/forum?id=opkluZm9gX

Compressor summary: The paper studies a dynamic version of the attention matrix multiplication problem in large language models and provides an algorithm with efficient update and query time.


A New Robust Partial p-Wasserstein-Based Metric for Comparing Distributions

https://openreview.net/forum?id=opieUcKjPa

Compressor summary: The paper introduces a new family of distances called $k$-RPW that combines partial Wasserstein distance with robustness to outliers and faster convergence rate than existing measures, making it suitable for image retrieval tasks.


Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability

https://openreview.net/forum?id=oowQ8LPA12

Compressor summary: The paper presents a theoretical framework to analyze the performance of machine learning methods for database operations in dynamic datasets, showing when they can outperform non-learned alternatives.


Position: Explain to Question not to Justify

https://openreview.net/forum?id=ooikIHLHCs

Compressor summary: The text discusses two complementary approaches to Explainable Artificial Intelligence (XAI) - human-oriented explanations (BLUE XAI) and model-oriented explanations (RED XAI), emphasizing the need for more methods in RED XAI to ensure AI safety.


A Theory of Fault-Tolerant Learning

https://openreview.net/forum?id=ooh8tkXKyR

Compressor summary: The paper introduces fault-tolerant PAC learning, a framework to identify robust machine learning models against random and adversarial faults, showing its sample complexity can vary depending on the type of fault.


Mitigating Catastrophic Forgetting in Online Continual Learning by Modeling Previous Task Interrelations via Pareto Optimization

https://openreview.net/forum?id=olbTrkWo1D

Compressor summary: The paper proposes a new continual learning algorithm, POCL, that optimizes past tasks' performance while maintaining the current task's performance, addressing the catastrophic forgetting challenge in machine learning.


Q-value Regularized Transformer for Offline Reinforcement Learning

https://openreview.net/forum?id=ojtddicekd

Compressor summary: Q-value regularized Transformer (QT) combines trajectory modeling with dynamic programming to improve offline reinforcement learning.


Off-policy Evaluation Beyond Overlap: Sharp Partial Identification Under Smoothness

https://openreview.net/forum?id=oiY7yhyi6W

Compressor summary: The paper proposes new methods for off-policy evaluation without overlap or a well-specified model, using Lipschitz smoothness assumptions to provide sharp bounds and optimal estimators.


Optimal bounds for $\ell_p$ sensitivity sampling via $\ell_2$ augmentation

https://openreview.net/forum?id=ohH3sbUue2

Compressor summary: The text describes improved sampling methods for large data sets using sensitivity and subspace embeddings, achieving optimal complexity bounds with better guarantees than previous approaches.


Generating In-Distribution Proxy Graphs for Explaining Graph Neural Networks

https://openreview.net/forum?id=ohG9bVMs5j

Compressor summary: The paper proposes a new method to generate proxy graphs that help explain graph neural networks by preserving explanatory factors and adhering to the distribution of training data.


Generative Active Learning for Long-tailed Instance Segmentation

https://openreview.net/forum?id=ofXRBPtol3

Compressor summary: The paper proposes BSGAL, a new active learning algorithm that uses gradient cache to select batches of generated data for enhancing long-tailed instance segmentation tasks.


Uniformly Stable Algorithms for Adversarial Training and Beyond

https://openreview.net/forum?id=odCl49tWA6

Compressor summary: The paper proposes a new algorithm (ME-$\mathcal{A}$) that improves robustness of neural networks in adversarial machine learning by ensuring uniform stability and mitigating the issue of robust overfitting.


The Relative Value of Prediction in Algorithmic Decision Making

https://openreview.net/forum?id=oaACFfNbXl

Compressor summary: The authors study when and why predictions are more valuable than expanding access or increasing intervention effectiveness in algorithmic decision making for public goods and welfare.


A Linear Time and Space Local Point Cloud Geometry Encoder via Vectorized Kernel Mixture (VecKM)

https://openreview.net/forum?id=oYltxxam2t

Compressor summary: VecKM is an efficient and accurate local point cloud geometry encoder that uses vectorized kernel mixtures and eliminates the need for grouping points into neighbors.


BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges

https://openreview.net/forum?id=oWYzIodyC4

Compressor summary: BWS is a universal data subset selection method that efficiently selects informative samples for training neural networks across various selection ratios using difficulty scores and kernel ridge regression.
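
The selection loop is easy to sketch: sort samples by a difficulty score, slide a fixed-width window, and keep the window a cheap proxy rates best; `evaluate_subset` below is a hypothetical stand-in for the paper's kernel-ridge-regression proxy:

```python
import numpy as np

def best_window(scores, evaluate_subset, ratio=0.2, stride=None):
    """Slide a fixed-width window over samples sorted by difficulty score and keep
    the window that a cheap proxy rates best (sketch of the BWS selection loop)."""
    order = np.argsort(scores)                # easy -> hard
    w = max(1, int(len(scores) * ratio))
    stride = stride or max(1, w // 4)
    best_idx, best_q = None, -np.inf
    for start in range(0, len(scores) - w + 1, stride):
        idx = order[start:start + w]
        q = evaluate_subset(idx)              # hypothetical proxy evaluation
        if q > best_q:
            best_idx, best_q = idx, q
    return best_idx
```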


Foundations of Testing for Finite-Sample Causal Discovery

https://openreview.net/forum?id=oUmXcewb83

Compressor summary: The text discusses a new method for finite-sample causal discovery that combines structured multiple testing and graph skeleton information, allowing efficient verification of graph structure.


Human-like Category Learning by Injecting Ecological Priors from Large Language Models into Neural Networks

https://openreview.net/forum?id=oTmQmaNkGn

Compressor summary: ERMI is a cognitive model that uses large language models to generate ecologically valid tasks and meta-learning to create rational agents, outperforming other models in explaining human behavior and achieving high accuracy on a benchmark task.


Efficient Stochastic Approximation of Minimax Excess Risk Optimization

https://openreview.net/forum?id=oTYuORAMaP

Compressor summary: The paper proposes efficient stochastic approximation methods for minimizing excess risk in minimax optimization problems.


Learning to Explore in POMDPs with Informational Rewards

https://openreview.net/forum?id=oTD3WoQyFR

Compressor summary: The paper proposes a POMDP agent that uses meta-exploration techniques to gather relevant information for completing tasks in partially observed environments, outperforming prior methods when complex strategies are needed.


On the sample complexity of conditional independence testing with Von Mises estimator with application to causal discovery

https://openreview.net/forum?id=oSOZ31ISBV

Compressor summary: The paper proposes a new test for conditional independence using an estimator based on the Von Mises entropy and kernel density estimation, which has better performance than existing methods in causal discovery for non-linear models and non-Gaussian continuous variables.


DRCT: Diffusion Reconstruction Contrastive Training towards Universal Detection of Diffusion Generated Images

https://openreview.net/forum?id=oRLwyayrh1

Compressor summary: The paper proposes DRCT, a framework to improve the generalizability of image detectors for generated images by using high-quality diffusion reconstruction and contrastive learning, and introduces a large dataset with different diffusion models for evaluation.


How to Leverage Diverse Demonstrations in Offline Imitation Learning

https://openreview.net/forum?id=oOlooUu2Sb

Compressor summary: Key points: - The paper proposes a data selection method for offline imitation learning with imperfect demonstrations. - The method selects data based on resultant states, which uses dynamics information and extracts both expert and diverse behaviors. - The paper also presents a behavior cloning algorithm that leverages the selected data. - The method achieves state-of-the-art performance on complex and high-dimensional offline IL benchmarks. Summary: The paper introduces a resultant state-based data selection method and a behavior cloning algorithm for offline imitation learning with imperfect demonstrations, which outperforms existing methods on 20/21 benchmarks.


Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

https://openreview.net/forum?id=oLfq1KKneW

Compressor summary: The paper proposes a framework using conditional residual energy-based models to improve molecule synthesis routes in drug discovery by considering criteria like costs, yields, and step count.


Position: Amazing Things Come From Having Many Good Models

https://openreview.net/forum?id=oFDFGd9Age

Compressor summary: The Rashomon Effect means many good models exist for the same data, influencing various aspects of machine learning and its applications, especially for noisy tabular data problems.


Image Restoration Through Generalized Ornstein-Uhlenbeck Bridge

https://openreview.net/forum?id=oDUJmNCV8D

Compressor summary: The GOUB model uses a generalized OU process to map from low-quality images to high-quality ones, achieving state-of-the-art performance in various image restoration tasks.


KISA: A Unified Keyframe Identifier and Skill Annotator for Long-Horizon Robotics Demonstrations

https://openreview.net/forum?id=oCI9gHocws

Compressor summary: Keyframe Identifier and Skill Annotator (KISA) is a method that uses visual-language representations to accurately and interpretably decompose unlabeled robotic manipulation demonstrations into keyframes and skills.


SPADE: Sparsity-Guided Debugging for Deep Neural Networks

https://openreview.net/forum?id=oBYv73nOoA

Compressor summary: The paper introduces SPADE, a method that uses sample-targeted pruning to provide more accurate and interpretable image saliency maps and neuron visualizations for deep neural networks without affecting their behavior.


VideoPrism: A Foundational Visual Encoder for Video Understanding

https://openreview.net/forum?id=oBP8vXFJNQ

Compressor summary: VideoPrism is a versatile video encoder that uses pretraining on large datasets to perform well on various video understanding tasks.


Learning Latent Space Hierarchical EBM Diffusion Models

https://openreview.net/forum?id=o9uOuIwhZK

Compressor summary: The paper proposes using diffusion probabilistic schemes to improve learning of energy-based priors for multi-layer generative models, addressing the prior hole problem and increasing modelling expressivity.


Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

https://openreview.net/forum?id=o8AaRKbP9K

Compressor summary: The paper explores how linear looped Transformers can learn and converge to fixed-point iterative algorithms like preconditioned gradient descent in linear regression, using a novel theoretical analysis and experiments.


How Spurious Features are Memorized: Precise Analysis for Random and NTK Features

https://openreview.net/forum?id=o6N1Bqay0k

Compressor summary: The paper proposes a theoretical framework to quantify how deep learning models memorize spurious features unrelated to the task and reveals factors affecting this phenomenon.


PairNet: Training with Observed Pairs to Estimate Individual Treatment Effect

https://openreview.net/forum?id=o5SVr80Rgg

Compressor summary: PairNet is a novel training strategy for individual treatment effect estimation that minimizes losses over pairs of examples based on their factual observed outcomes, achieving smaller generalization error than existing methods.
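
A hedged sketch of what a pairwise factual loss of this kind can look like (mu1/mu0 are hypothetical treated/control outcome heads, and the pairing and loss form are illustrative assumptions, not the paper's exact objective):

```python
import numpy as np

def pairwise_factual_loss(mu1, mu0, x_treated, y_treated, x_control, y_control):
    """For matched (treated, control) pairs, penalize the gap between the
    predicted outcome difference and the observed difference of the two
    units' factual outcomes."""
    pred_gap = mu1(x_treated) - mu0(x_control)  # predicted y1(x_t) - y0(x_c)
    obs_gap = y_treated - y_control             # observed factual difference
    return np.mean((pred_gap - obs_gap) ** 2)
```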


ReLU Network with Width $d+\mathcal{O}(1)$ Can Achieve Optimal Approximation Rate

https://openreview.net/forum?id=o4HF3N6CZR

Compressor summary: The paper proves that ReLU neural networks of width $d+1$ can optimally approximate continuous functions over $[0,1]^d$ under the $L^p$ norm for $p\in[1,\infty)$, and that width $d+11$ suffices under the uniform norm.


Interpreting and Improving Diffusion Models from an Optimization Perspective

https://openreview.net/forum?id=o2ND9v0CeK

Compressor summary: This paper shows how denoising diffusion models can be seen as a form of gradient descent, analyzes their convergence, and proposes a new sampler that achieves state-of-the-art performance in image generation.


Using Left and Right Brains Together: Towards Vision and Language Planning

https://openreview.net/forum?id=o1gS6MNAw8

Compressor summary: The authors propose a novel framework for concurrent visual and language planning, which outperforms separate approaches on various tasks, demonstrating the benefits of integrating vision and language.


BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

https://openreview.net/forum?id=nxzXTLByXO

Compressor summary: BRAIn is a method that improves distribution matching techniques for language model alignment by reducing variance and generalizing the target distribution using Bayes' rule, leading to better performance in summarization and other alignment tasks.


Robustness of Deep Learning for Accelerated MRI: Benefits of Diverse Training Data

https://openreview.net/forum?id=nvfZgdHtHc

Compressor summary: The study shows that using diverse data for training deep learning models improves their performance and robustness in accelerated MRI tasks without sacrificing in-distribution accuracy.


$H$-Consistency Guarantees for Regression

https://openreview.net/forum?id=nvHlHfjJPe

Compressor summary: The paper studies and generalizes tools for proving $H$-consistency bounds, a measure of how well regression algorithms perform, and derives new surrogate losses for adversarial regression with promising results.


Can a Few Decide for Many? The Metric Distortion of Sortition

https://openreview.net/forum?id=nsjfoziR5j

Compressor summary: The paper studies how to select representative sortition panels that reflect the population's opinion using metric distortion and compares two selection algorithms.


EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens

https://openreview.net/forum?id=nn5OPHom8t

Compressor summary: EVEREST is an efficient masked video autoencoder (MVA) approach for video representation learning that focuses on informative frames, reducing computation and memory requirements.


LLM Maybe LongLM: SelfExtend LLM Context Window Without Tuning

https://openreview.net/forum?id=nkOMLBIiI7

Compressor summary: SelfExtend is a method to extend the context window of language models without fine-tuning by using grouped and neighbor attention mechanisms.
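
The grouped-plus-neighbor idea can be pictured as a remapping of relative positions; here is a minimal sketch, assuming illustrative window and group sizes (the shift term keeping the two regimes contiguous is my reading, not a verbatim formula from the paper):

```python
import numpy as np

def self_extend_rel_pos(q_pos, k_pos, group=8, window=1024):
    """Keep exact relative positions inside a local window (neighbor
    attention); beyond it, floor-divide positions by the group size so
    distant tokens reuse position ids seen during training."""
    rel = q_pos - k_pos
    grouped = q_pos // group - k_pos // group
    # shift the grouped regime so it continues where the window ends
    grouped = grouped + window - window // group
    return np.where(rel <= window, rel, grouped)
```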


Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models

https://openreview.net/forum?id=njwv9BsGHF

Compressor summary: LATS is a framework that combines language models and Monte Carlo Tree Search to create more capable autonomous agents for various decision-making tasks.


Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling

https://openreview.net/forum?id=njpTpkvUbO

Compressor summary: The paper introduces a new algorithm for learning in Constrained Markov Decision Processes, which achieves near-optimal regret bounds and performs better than existing methods in simulations.


Momentum Particle Maximum Likelihood

https://openreview.net/forum?id=ngjmcfowtc

Compressor summary: The paper proposes a novel dynamical systems approach to fit latent variable models to data using optimal transport insights, which improves upon existing particle methods and other MLE algorithms.


RLVF: Learning from Verbal Feedback without Overgeneralization

https://openreview.net/forum?id=ngcZhfXCBW

Compressor summary: C3PO is a method that learns from high-level verbal feedback to adjust large language models without overgeneralizing the feedback to unrelated contexts.


Bringing Motion Taxonomies to Continuous Domains via GPLVM on Hyperbolic manifolds

https://openreview.net/forum?id=ndVXXmxSC5

Compressor summary: The paper proposes a novel model that uses hyperbolic embeddings and Gaussian processes to capture the hierarchical structure of human motion taxonomies, connecting them to high-dimensional data, outperforming other methods, and generating realistic trajectories.


Graph-based Time Series Clustering for End-to-End Hierarchical Forecasting

https://openreview.net/forum?id=nd47Za5jk5

Compressor summary: The paper proposes a graph-based method to unify relational and hierarchical inductive biases in deep learning for time series forecasting, using trainable graph pooling operators to learn the hierarchy from data and a differentiable reconciliation stage to balance constraints and predictions.


Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining

https://openreview.net/forum?id=ncjhi4qAPV

Compressor summary: The paper discusses the potential benefits and drawbacks of using non-private models pretrained on public datasets to improve differentially private machine learning.


Conformalized Adaptive Forecasting of Heterogeneous Trajectories

https://openreview.net/forum?id=nbpwNmXTTw

Compressor summary: The paper introduces a new method for generating uncertainty bands that cover the entire path of random trajectories, useful for motion planning applications with unpredictable objects.


A Dynamical Model of Neural Scaling Laws

https://openreview.net/forum?id=nbOY1OmtRc

Compressor summary: The text discusses a random feature model that explains neural scaling laws in network training and generalization, predicting different power law exponents for performance with training time and model size, and showing convergence rates dependent on architecture and task.


video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models

https://openreview.net/forum?id=nYsh5GFIqX

Compressor summary: The paper presents a single end-to-end audio-visual large language model (av-LLM) called video-SALMONN, which improves speech understanding in videos and achieves significant accuracy gains on various audio-visual tasks.


HAMLET: Graph Transformer Neural Operator for Partial Differential Equations

https://openreview.net/forum?id=nYX7I6PsL7

Compressor summary: HAMLET is a graph transformer framework that uses modular input encoders to solve PDEs using neural networks, achieving robustness and adaptability across different domains and data scenarios.


Double-Step Alternating Extragradient with Increasing Timescale Separation for Finding Local Minimax Points: Provable Improvements

https://openreview.net/forum?id=nUVForc3VP

Compressor summary: The paper proposes two improvements to existing two-timescale gradient methods for nonconvex-nonconcave minimax optimization, addressing instability and timescale separation challenges in overparameterized settings.


STEER: Assessing the Economic Rationality of Large Language Models

https://openreview.net/forum?id=nU1mtFDtMX

Compressor summary: The paper presents a methodology for assessing the economic rationality of LLMs as decision-making agents by surveying the literature, proposing a benchmark distribution, and conducting an empirical experiment.


Predictive Coding beyond Correlations

https://openreview.net/forum?id=nTgzmXvuEA

Compressor summary: This paper demonstrates how predictive coding, a biological learning algorithm, can do causal inference by modifying its inference process without changing the causal graph and applying it to image classification tasks.


Enhancing Value Function Estimation through First-Order State-Action Dynamics in Offline Reinforcement Learning

https://openreview.net/forum?id=nSGnx8lNJ6

Compressor summary: The paper proposes a method to improve offline reinforcement learning by combining discrete-time and continuous-time RL, using the value function's first derivative to better predict unvisited states.


Thermometer: Towards Universal Calibration for Large Language Models

https://openreview.net/forum?id=nP7Q1PnuLK

Compressor summary: Thermometer is a calibration approach that uses an auxiliary model to improve the accuracy and reliability of large language models for diverse tasks.
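
Whatever the auxiliary model looks like internally, its output is applied like ordinary temperature scaling; a minimal sketch of that final step, taking the predicted temperature as given:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def calibrate(logits, temperature):
    """Temperature scaling: divide logits by a (here externally
    predicted) temperature; T > 1 softens overconfident outputs."""
    return softmax(logits / temperature)
```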


Viewing Transformers Through the Lens of Long Convolutions Layers

https://openreview.net/forum?id=nOyj26YdIQ

Compressor summary: The paper proposes minimal modifications to the transformer architecture to improve its performance on long-range tasks by incorporating smoothness and locality principles into the attention mechanism.


Unsupervised Representation Learning of Brain Activity via Bridging Voxel Activity and Functional Connectivity

https://openreview.net/forum?id=nOjZfpLyh1

Compressor summary: BrainMixer is an unsupervised learning framework that leverages voxel-level activity and functional connectivity to represent the brain effectively, performing better than existing methods in cognitive tasks and neurological diagnosis.


SHINE: Shielding Backdoors in Deep Reinforcement Learning

https://openreview.net/forum?id=nMWxLnSBGW

Compressor summary: SHINE is a new method for protecting deep reinforcement learning agents from backdoor attacks by identifying and removing triggers in the agent's policy.


StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization

https://openreview.net/forum?id=nMN5hNZMQK

Compressor summary: The paper examines how different ways of parameterizing state-space models affect their memory learning abilities and introduces new reparameterization techniques to overcome the memory limitations of these models.


Completing Visual Objects via Bridging Generation and Segmentation

https://openreview.net/forum?id=nLgtHHBgl3

Compressor summary: The paper proposes MaskComp, a method that reconstructs incomplete objects using iterative generation and segmentation stages with a mask condition to improve image quality and refine the object mask.


Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

https://openreview.net/forum?id=nLRKnO74RB

Compressor summary: The paper proposes a dynamic weight-ensembling module for merging task-specific Transformer models, which can adapt to different tasks and reduce parameter interference.


Adaptive Online Experimental Design for Causal Discovery

https://openreview.net/forum?id=nJzf3TVnOn

Compressor summary: The text proposes a new causal discovery algorithm that efficiently learns cause-and-effect relationships from interventional data using adaptive interventions and sampling history.


Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning

https://openreview.net/forum?id=nDps3Q8j2l

Compressor summary: The paper introduces FCNet, a new network that uses the frequency domain to improve data efficiency and inference speed in robotics reinforcement learning, outperforming Transformer-based models on various tasks.


Learning Coverage Paths in Unknown Environments with Deep Reinforcement Learning

https://openreview.net/forum?id=nCZYRBK1J4

Compressor summary: The paper explores using reinforcement learning for online path planning in unknown environments and proposes a new map representation and reward function to improve coverage.


Equivariant Deep Weight Space Alignment

https://openreview.net/forum?id=nBPnmk6EeO

Compressor summary: Deep-Align is a novel framework that learns to solve the weight alignment problem of deep networks without requiring labeled data and improves both speed and quality of alignment compared to existing methods.


Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

https://openreview.net/forum?id=nBGBzV4It3

Compressor summary: The paper proposes a new approach to optimize the sampling schedules of diffusion models for better quality outputs, using methods from stochastic calculus.


Combining Experimental and Historical Data for Policy Evaluation

https://openreview.net/forum?id=nB6ERIud2y

Compressor summary: The paper proposes data integration methods for policy evaluation using experimental and historical data, with optimized weights and pessimistic principle to achieve robustness and efficiency in different reward shift scenarios.


Verifying message-passing neural networks via topology-based bounds tightening

https://openreview.net/forum?id=nAoiUlz4Bf

Compressor summary: Our approach provides robust certificates for message-passing neural networks using ReLU activation, mixed-integer optimization, and topology-based bounds tightening to ensure trustworthiness in the face of various graph attacks.


A fast algorithm to simulate nonlinear resistive networks

https://openreview.net/forum?id=nAbfF37H6t

Compressor summary: The authors propose a fast and exact method for simulating nonlinear resistive networks, enabling efficient training and reducing the simulation bottleneck.


Position: Enforced Amnesia as a Way to Mitigate the Potential Risk of Silent Suffering in the Conscious AI

https://openreview.net/forum?id=nACGn4US1R

Compressor summary: The paper suggests a way to prevent possible silent suffering in AI by limiting their access to memory or resetting it regularly, even without confirming their consciousness.


Scaling Down Deep Learning with MNIST-1D

https://openreview.net/forum?id=n9pru4bJU9

Compressor summary: MNIST-1D is a minimalist, procedurally generated deep learning benchmark that enables various experiments and research on low-memory and low-compute platforms.


Decoding-time Realignment of Language Models

https://openreview.net/forum?id=n8g6WMxt09

Compressor summary: DeRa is a method for exploring different levels of regularization in aligned language models without retraining, allowing users to control alignment and improve efficiency.
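
One way to read this: the realigned model is a geometric mixture of the base and aligned models' next-token distributions, which at the logit level is just linear interpolation. A hedged sketch (the function name is illustrative):

```python
import numpy as np

def realigned_logits(base_logits, aligned_logits, lam):
    """Geometric mixture of the two next-token distributions, implemented
    as linear interpolation of logits: lam=0 recovers the base (SFT)
    model, lam=1 the aligned one, and values in between trade off
    alignment strength without any retraining."""
    return (1.0 - lam) * base_logits + lam * aligned_logits
```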


Parameterized Physics-informed Neural Networks for Parameterized PDEs

https://openreview.net/forum?id=n3yYrtt9U7

Compressor summary: P$^2$INNs are a new extension of physics-informed neural networks that can efficiently model solutions of parameterized partial differential equations, improving accuracy and efficiency on benchmark problems.


Solving Hierarchical Information-Sharing Dec-POMDPs: An Extensive-Form Game Approach

https://openreview.net/forum?id=n3smZl8itR

Compressor summary: The paper proposes a method to simplify complex multi-player games by breaking them down into smaller subgames and applying Bellman's principle of optimality, improving scalability and efficiency.


Learning the Uncertainty Sets of Linear Control Systems via Set Membership: A Non-asymptotic Analysis

https://openreview.net/forum?id=n2kq2EOHFE

Compressor summary: The paper investigates set membership estimation for unknown linear systems, providing the first convergence rate bounds and demonstrating its practical potential.


Multiply-Robust Causal Change Attribution

https://openreview.net/forum?id=n2eppIzHlL

Compressor summary: The paper proposes a new estimation strategy that combines regression and re-weighting methods to quantify the contribution of each causal mechanism in explaining the change in an outcome variable distribution.


Rejuvenating image-GPT as Strong Visual Representation Learners

https://openreview.net/forum?id=mzGtunvpJH

Compressor summary: The paper introduces D-iGPT, a new approach to learn visual representations from images by predicting semantic tokens and visible pixels with autoregressive pretraining using CLIP-based models.


Bayesian Regret Minimization in Offline Bandits

https://openreview.net/forum?id=mz55Ox0Igz

Compressor summary: The paper proposes a new algorithm for offline linear bandits that minimizes Bayesian regret using efficient conic optimization solvers and shows its superiority over the maximum lower confidence bound approach.


BeigeMaps: Behavioral Eigenmaps for Reinforcement Learning from Images

https://openreview.net/forum?id=myCgfQZzbc

Compressor summary: The paper proposes a new method called Behavioral Eigenmaps (BeigeMaps) for learning representations in reinforcement learning agents from high-dimensional image observations that group similar states and improve policy performance.


Is Epistemic Uncertainty Faithfully Represented by Evidential Deep Learning Methods?

https://openreview.net/forum?id=mxjB0LIgpT

Compressor summary: The paper discusses challenges and insights related to using evidential deep learning methods for quantifying uncertainty in ML systems.


Learning Surrogates for Offline Black-Box Optimization via Gradient Matching

https://openreview.net/forum?id=mv9beA1wDF

Compressor summary: The text discusses a problem in offline design optimization and proposes a new algorithm to improve surrogate models' accuracy by matching the latent gradient field in the data.


Soft Prompt Recovers Compressed LLMs, Transferably

https://openreview.net/forum?id=muBJPCIqZT

Compressor summary: The authors show that using soft prompts can improve the performance of compressed large language models, making them more accessible without additional engineering efforts.


Gambling-Based Confidence Sequences for Bounded Random Vectors

https://openreview.net/forum?id=mu7Er7f9NQ

Compressor summary: The paper introduces a new method to construct confidence sequences for multivariate stochastic processes using a general gambling framework, which improves upon existing methods in terms of tightness.


Major-Minor Mean Field Multi-Agent Reinforcement Learning

https://openreview.net/forum?id=mslTE1qgLa

Compressor summary: The paper introduces Major-Minor Mean Field Control (M3FC), a generalization of Mean Field Control that models many similar and few complex agents, and proposes an M3FMARL algorithm that approximates the policy gradient of the M3FC MDP and outperforms state-of-the-art methods in various scenarios.


Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning

https://openreview.net/forum?id=mrd4e8ZJjm

Compressor summary: The paper proposes a novel dynamics model for reinforcement learning that infers fine-grained causal structures using discrete latent variables, improving robustness and performance in downstream tasks.


Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective

https://openreview.net/forum?id=mphq2jMFLZ

Compressor summary: The paper investigates how two-layer neural networks can efficiently learn multiple kernel spaces and functions using mean-field Langevin dynamics in a two-timescale limit.


Repoformer: Selective Retrieval for Repository-Level Code Completion

https://openreview.net/forum?id=moyG54Okrj

Compressor summary: The paper proposes a selective RAG framework that uses self-supervised learning to decide when to retrieve contexts for code completion, achieving better performance and efficiency than existing methods.


Double Stochasticity Gazes Faster: Snap-Shot Decentralized Stochastic Gradient Tracking Methods

https://openreview.net/forum?id=mkbSXxovP5

Compressor summary: The text proposes two new decentralized optimization algorithms, snap-shot DSGT and accelerated snap-shot DSGT, which improve upon the existing DSGT method by using snapshot gradient tracking and achieving better convergence properties in different network topologies.


GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding

https://openreview.net/forum?id=mk8oRhox2l

Compressor summary: GliDe and CaPE are methods to speed up decoding of large language models by reusing cached keys and values and using confidence scores for token selection.


Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

https://openreview.net/forum?id=mk3A5IUdn8

Compressor summary: Caduceus is a new family of DNA language models that can handle long-range interactions, bi-directionality, and reverse complementarity, improving performance on downstream tasks compared to larger models without these features.


Bounded and Uniform Energy-based Out-of-distribution Detection for Graphs

https://openreview.net/forum?id=mjh7AOWozN

Compressor summary: NODESAFE improves GNNs' ability to detect out-of-distribution data in graphs by bounding negative energy scores and mitigating logit shifts, achieving better performance than previous methods.


LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models

https://openreview.net/forum?id=mhI5nc5QwX

Compressor summary: The study proposes LoRAP, a new compression method for large language models that combines low-rank approximation and structured pruning, motivated by the observation that Transformer multi-head attention (MHA) sub-layers exhibit low-rank structure while feed-forward (FFN) sub-layers do not, and shows superior performance over previous methods on zero-shot perplexity and task classification.


Privacy Attacks in Decentralized Learning

https://openreview.net/forum?id=mggc3oYHy4

Compressor summary: The paper proposes a new attack on Decentralized Gradient Descent (D-GD) that allows an attacker to reconstruct private data of other users by exploiting the gossip averaging protocol.


On Online Experimentation without Device Identifiers

https://openreview.net/forum?id=merZTLSdC9

Compressor summary: HIFIVE is a variational method for accurately estimating human preferences and causal effects from fragmented device identifiers in online behaviors.


Unsupervised Domain Adaptation for Anatomical Structure Detection in Ultrasound Images

https://openreview.net/forum?id=meItvvCO7X

Compressor summary: ToMo-UDA is a new method for detecting fetal structures in ultrasound images that uses topology and morphology knowledge to overcome challenges due to differences between institutions and overlapping structures.


Position: Tensor Networks are a Valuable Asset for Green AI

https://openreview.net/forum?id=mcg6jppkwb

Compressor summary: The paper argues that tensor networks can enhance both sustainability and inclusivity in AI research by providing mathematical rigor and efficient compression.


A2Q+: Improving Accumulator-Aware Weight Quantization

https://openreview.net/forum?id=mbx2pLK5Eq

Compressor summary: A2Q+ is an improved quantization method for neural networks that avoids numerical overflow, reduces quantization error, and uses weight normalization.
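
The heart of accumulator-aware quantization is an $\ell_1$-norm budget on integer weights that provably rules out overflow; here is a generic sketch of that style of bound (not A2Q+'s tightened constraint):

```python
def max_l1_norm(acc_bits, input_bits):
    """Largest l1-norm of integer weights that cannot overflow a signed
    acc_bits accumulator when inputs are signed input_bits integers."""
    return (2 ** (acc_bits - 1) - 1) / 2 ** (input_bits - 1)

def accumulator_safe(int_weights, input_bits=8, acc_bits=16):
    # worst case |sum_i w_i * x_i| <= ||w||_1 * max|x|
    return sum(abs(int(w)) for w in int_weights) <= max_l1_norm(acc_bits, input_bits)
```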


Distributionally Robust Data Valuation

https://openreview.net/forum?id=mbBehLOAqR

Compressor summary: The paper proposes a method to value data for machine learning models without knowing the validation dataset, using distributionally robust generalization error and model deviation as measures, and shows its effectiveness on real-world datasets.


HumanTOMATO: Text-aligned Whole-body Motion Generation

https://openreview.net/forum?id=maVIKlGqr7

Compressor summary: The paper introduces HumanTOMATO, a framework for generating realistic whole-body motions from text descriptions, addressing the limitations of previous methods by using a hierarchical VQ-VAE and a Hierarchical-GPT model.


Low-Rank Similarity Mining for Multimodal Dataset Distillation

https://openreview.net/forum?id=mY93trX2Qz

Compressor summary: The paper proposes LoRS, a method to create synthetic data from image-text pairs for visual-language dataset distillation, which improves existing algorithms and focuses on modality correspondence.


Reinforcement Learning from Reachability Specifications: PAC Guarantees with Expected Conditional Distance

https://openreview.net/forum?id=mXUDDL4r1Q

Compressor summary: The text discusses a fundamental problem in sequential decision making, presenting lower bounds and an algorithm for Reinforcement Learning (RL) from reachability specifications using expected conditional distance (ECD).


Beyond the ROC Curve: Classification Trees Using Cost-Optimal Curves, with Application to Imbalanced Datasets

https://openreview.net/forum?id=mXLcbRBA8v

Compressor summary: The paper proposes an algorithm for building oblique classification trees that optimizes a loss function based on minimizing false negatives subject to a maximum false positive rate, which improves accuracy and interpretability in imbalanced datasets with class costs.


Spider: A Unified Framework for Context-dependent Concept Segmentation

https://openreview.net/forum?id=mWV8NeU79e

Compressor summary: The text introduces Spider, a unified model that can understand and distinguish various context-dependent concepts in different domains, outperforming existing specialized models and enabling continuous learning.


Adaptive Accompaniment with ReaLchords

https://openreview.net/forum?id=mUVydzrkgz

Compressor summary: ReaLchords is an online generative model for improvising chord accompaniment to user melody that uses reinforcement learning and a novel reward model for harmonic and temporal coherency.


Privacy-Preserving Instructions for Aligning Large Language Models

https://openreview.net/forum?id=mUT1biz09t

Compressor summary: The text proposes using synthetic instructions to protect user privacy in language model applications, and shows that this method achieves high utility in both supervised and reinforcement learning settings.


WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

https://openreview.net/forum?id=mUSPhG4uDW

Compressor summary: The paper introduces WebLINX, a benchmark for conversational web navigation, and compares different models' performance in this task.


PruNeRF: Segment-Centric Dataset Pruning via 3D Spatial Consistency

https://openreview.net/forum?id=mU7FfQT6VE

Compressor summary: PruNeRF is a framework to prune distracting objects from NeRF training data using 3D spatial consistency, segmentation, and depth-based reprojection.


Overcoming Data and Model heterogeneities in Decentralized Federated Learning via Synthetic Anchors

https://openreview.net/forum?id=mNzkumTSVL

Compressor summary: DeSA is a decentralized federated learning technique that introduces synthetic global anchors based on the raw data distribution to facilitate knowledge transfer and regularization, enhancing the generalizability of local models that clients train separately despite data and model heterogeneity.


Rethinking Data Shapley for Data Selection Tasks: Misleads and Merits

https://openreview.net/forum?id=mKYBMf1hHG

Compressor summary: The study explores the inconsistency of Data Shapley's performance in data selection tasks and identifies a class of utility functions where it works optimally, proposing a heuristic for predicting its effectiveness.


Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens

https://openreview.net/forum?id=mK6FB9xQ7v

Compressor summary: The paper compares the performance of a new optimizer, AdamQLR, which combines elements of first-order (Adam) and second-order (K-FAC) methods for deep learning optimization.


What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding

https://openreview.net/forum?id=mJhXlsZzzE

Compressor summary: This study analyzes the sample complexity and convergence of a shallow Graph Transformer for semi-supervised node classification, showing how self-attention and positional encoding improve generalization.


Challenges in Training PINNs: A Loss Landscape Perspective

https://openreview.net/forum?id=mJGiFr8jLa

Compressor summary: The paper investigates challenges in training Physics-Informed Neural Networks (PINNs), compares different optimizers, and proposes a new second-order optimizer to improve their performance.


Rethinking Optimization and Architecture for Tiny Language Models

https://openreview.net/forum?id=mHIEOZtDDF

Compressor summary: The authors study various optimization techniques for tiny language models and achieve significant improvement in performance compared to baseline models.


On Hypothesis Transfer Learning of Functional Linear Models

https://openreview.net/forum?id=mGsF8Q0fGZ

Compressor summary: The paper proposes two transfer learning algorithms for functional linear regression, using RKHS distance to measure similarity and analyze dynamics, with empirical results on synthetic and real data.


A Multimodal Automated Interpretability Agent

https://openreview.net/forum?id=mDw42ZanmE

Compressor summary: MAIA is a system that uses neural models to automate understanding of other neural models' behavior, especially for vision tasks, by providing tools such as synthesizing inputs, computing activating exemplars, and summarizing results.


Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

https://openreview.net/forum?id=mCzyRdDak5

Compressor summary: The paper presents two zero-shot audio editing techniques using pre-trained diffusion models, ZETA for text-based edits and ZEUS for semantically meaningful unsupervised modifications in music signals.


Reinformer: Max-Return Sequence Modeling for Offline RL

https://openreview.net/forum?id=mBc8Pestd5

Compressor summary: The paper introduces Reinformer, a new sequence model for offline RL that maximizes returns and improves trajectory stitching.


Smooth Min-Max Monotonic Networks

https://openreview.net/forum?id=m8t1yzfBsJ

Compressor summary: The smooth min-max (SMM) network module is a simple modification of the existing min-max architecture that ensures monotonicity, improves training stability, and maintains good generalization performance in data-driven models.
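
A minimal sketch of a smooth min-max monotone unit (parameter shapes, the exp reparameterization, and the sharpness value are illustrative assumptions): positive weights make each affine piece nondecreasing, and log-sum-exp replaces the hard max/min so gradients flow everywhere:

```python
import numpy as np

def smooth_max(a, beta, axis):
    # log-sum-exp with sharpness beta; approaches max as beta -> inf
    return np.log(np.exp(beta * a).sum(axis=axis)) / beta

def smooth_min(a, beta, axis):
    return -smooth_max(-a, beta, axis)

def smm_forward(x, W_raw, b, beta=10.0):
    """W_raw: (K, J, d) unconstrained parameters; exp() keeps weights
    positive, so the output is nondecreasing in every input."""
    W = np.exp(W_raw)
    planes = np.einsum('kjd,d->kj', W, x) + b    # (K, J) affine pieces
    return smooth_min(smooth_max(planes, beta, axis=1), beta, axis=0)
```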


Layer-Aware Analysis of Catastrophic Overfitting: Revealing the Pseudo-Robust Shortcut Dependency

https://openreview.net/forum?id=m8lCi7rG4u

Compressor summary: This paper investigates the causes of overfitting in adversarial training, finding that it's related to shortcuts formed in early layers of neural networks, and proposes a method to mitigate it by perturbing weights across layers.


When Representations Align: Universality in Representation Learning Dynamics

https://openreview.net/forum?id=m5nB7ucXHT

Compressor summary: The study proposes a theory of representation learning in deep neural networks that applies to various architectures and activation functions, showing that some aspects of learning dynamics are similar across them.


Smooth Tchebycheff Scalarization for Multi-Objective Optimization

https://openreview.net/forum?id=m4dO5L6eCp

Compressor summary: The paper proposes a lightweight and efficient smooth Tchebycheff scalarization approach for gradient-based multi-objective optimization with good theoretical properties and lower computational complexity.
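
The construction is essentially a log-sum-exp relaxation of the classic Tchebycheff scalarization $\max_i w_i (f_i(x) - z_i^*)$; a short sketch (argument names are illustrative):

```python
import numpy as np

def smooth_tchebycheff(f_vals, weights, z_ideal, mu=0.1):
    """Smooth relaxation of max_i w_i * (f_i - z_i*): differentiable
    everywhere, and it approaches the hard max as mu -> 0."""
    t = weights * (f_vals - z_ideal)
    m = t.max()  # stabilize the exponentials
    return m + mu * np.log(np.exp((t - m) / mu).sum())
```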


Adaptive Conformal Inference by Betting

https://openreview.net/forum?id=lwWV4Zl3h1

Compressor summary: The paper proposes a new method for uncertain predictions in machine learning that doesn't require data assumptions or learning rate tuning.
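
For contrast, the classic adaptive conformal update that this line of work builds on carries a learning rate $\gamma$ that must be tuned; the paper's betting construction is precisely what removes it. A sketch of the classic step:

```python
def aci_step(radius, missed, alpha, gamma):
    """One step of classic adaptive conformal inference on the interval
    radius: widen after a miss, shrink slowly otherwise, so long-run
    miscoverage tracks alpha. The betting approach replaces the
    hand-tuned rate gamma with a parameter-free scheme."""
    return radius + gamma * ((1.0 if missed else 0.0) - alpha)
```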


Degeneration-free Policy Optimization: RL Fine-Tuning for Language Models without Degeneration

https://openreview.net/forum?id=lwTshcWlmB

Compressor summary: DfPO is a fine-tuning method for language models using reinforcement learning that prevents text degeneration and improves downstream task scores by masking KL divergence and using truncated advantage functions.


Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

https://openreview.net/forum?id=luqH1eL4PN

Compressor summary: BiPE is a new positional encoding method for language sequences that combines intra-segment and inter-segment encodings to improve semantic information capture and extrapolation capabilities.
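
A toy sketch of the bilevel position ids such a scheme needs (the separator set and names are illustrative): each token gets an intra-segment index that resets at segment boundaries plus an inter-segment index counting segments:

```python
def bilevel_positions(tokens, separators={".", "!", "?"}):
    """Return (intra, inter) position ids: intra restarts after every
    separator token; inter counts how many segments precede the token."""
    intra, inter = [], []
    seg, pos = 0, 0
    for tok in tokens:
        intra.append(pos)
        inter.append(seg)
        pos += 1
        if tok in separators:  # separator closes the current segment
            seg += 1
            pos = 0
    return intra, inter
```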


Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization

https://openreview.net/forum?id=ltzTHGFF5i

Compressor summary: Jetfire is an INT8 training method for transformers that optimizes memory access and maintains accuracy while speeding up pretraining and reducing memory usage.


Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point

https://openreview.net/forum?id=ltb2XaIr9p

Compressor summary: The paper analyzes Riemannian gradient descent and the Riemannian proximal point method, gives their convergence rates, shows that their iterates stay near the set of optimizers, provides an implementable inexact version, and explores properties of a related method.


CauDiTS: Causal Disentangled Domain Adaptation of Multivariate Time Series

https://openreview.net/forum?id=lsavZkUjFZ

Compressor summary: CauDiTS is a framework for unsupervised domain adaptation of multivariate time series that disentangles causal patterns from correlations to improve classification reliability across domains.


MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts

https://openreview.net/forum?id=lsQnneYa8p

Compressor summary: The paper proposes a multi-task vehicle routing solver with mixture-of-experts and hierarchical gating, which improves generalization and performance on various vehicle routing problems.


DistiLLM: Towards Streamlined Distillation for Large Language Models

https://openreview.net/forum?id=lsHZNNoC7r

Compressor summary: DistiLLM is a new knowledge distillation framework for language models that improves efficiency and performance by using a novel divergence loss and an adaptive off-policy approach.


E$^2$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

https://openreview.net/forum?id=lrPrkWXqzd

Compressor summary: This paper proposes techniques to make distilling GANs from diffusion models more efficient, enabling real-time high-quality image editing on mobile devices with low training and storage costs.


Federated Combinatorial Multi-Agent Multi-Armed Bandits

https://openreview.net/forum?id=lrFwPeDdEQ

Compressor summary: The paper presents a federated learning framework for online combinatorial optimization with bandit feedback that transforms any offline resilient single-agent algorithm into a communication-efficient online multi-agent one, achieving better regret bounds and applying to stochastic submodular maximization.


SMaRt: Improving GANs with Score Matching Regularity

https://openreview.net/forum?id=lqeVCc9zYq

Compressor summary: The paper proposes a score matching regularity (SMaRt) technique to improve GANs' ability to generate data that matches the real data manifold, especially for diverse and complex data.


TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors

https://openreview.net/forum?id=lpHjmPvxW1

Compressor summary: The paper proposes a backdoor defense framework for diffusion models called TERD, which can reverse triggers and detect backdoor inputs, achieving high security and adaptability.


Density-Softmax: Efficient Test-time Model for Uncertainty Estimation and Robustness under Distribution Shifts

https://openreview.net/forum?id=lon750Kf7n

Compressor summary: Density-Softmax is a sampling-free framework that improves uncertainty estimation and robustness by combining a density function with the softmax layer, achieving competitive results with fewer parameters and faster test time.


Learning Modality Knowledge Alignment for Cross-Modality Transfer

https://openreview.net/forum?id=lmiurzioja

Compressor summary: The paper proposes MoNA, a meta-learning method that learns data transformations to reduce modality gaps and improve cross-modality transfer.


Detecting Influence Structures in Multi-Agent Reinforcement Learning

https://openreview.net/forum?id=lm04PyXoEl

Compressor summary: The paper introduces new ways to measure and analyze how much one agent can affect another in multi-agent reinforcement learning, including approximation algorithms and empirical validation.


Spike Distance Function as a Learning Objective for Spike Prediction

https://openreview.net/forum?id=limyQ1Kk0k

Compressor summary: The paper proposes a new method for predicting neuronal responses using neural networks and shows its superiority over Poisson models in inferring spike trains from retinal ganglion cells' recordings.


Mean-field Chaos Diffusion Models

https://openreview.net/forum?id=lgcFX4VFrM

Compressor summary: The paper introduces a new type of generative models that can handle large and complex data sets by using ideas from mean-field theory and score-matching methods.


On the Complexity of Finite-Sum Smooth Optimization under the Polyak–Łojasiewicz Condition

https://openreview.net/forum?id=leJGQCron2

Compressor summary: The paper analyzes the complexity of gradient methods for optimizing a Polyak–Łojasiewicz function with mean-squared smooth components and distributed on a network, and proposes an efficient decentralized method.


DMTG: One-Shot Differentiable Multi-Task Grouping

https://openreview.net/forum?id=lcX5GbDIi8

Compressor summary: The paper proposes a new Multi-Task Learning method that simultaneously groups tasks and trains a model in one-shot, using a differentiable Categorical distribution to prune task heads.


Meta-Reinforcement Learning Robust to Distributional Shift Via Performing Lifelong In-Context Learning

https://openreview.net/forum?id=laIOUtstMs

Compressor summary: PSBL is a meta-RL method that uses Bayesian inference to sample actions from the posterior distribution of the optimal policy, enabling in-context learning and improving performance on tasks with different distributions.


Revisiting Inexact Fixed-Point Iterations for Min-Max Problems: Stochasticity and Structured Nonconvexity

https://openreview.net/forum?id=lWy2lCTyJa

Compressor summary: The paper studies constrained and nonconvex min-max problems, improves first-order methods' complexity guarantees by using conic nonexpansiveness and relaxing inexactness levels, and analyzes a stochastic iteration method.


Generalization to New Sequential Decision Making Tasks with In-Context Learning

https://openreview.net/forum?id=lVQ4FUZ6dp

Compressor summary: The paper investigates how transformers can be trained to learn new sequential decision making tasks from few examples by using trajectory sequences with specific properties, and shows that larger datasets and more diverse tasks lead to better in-context learning.


Predictive Linear Online Tracking for Unknown Targets

https://openreview.net/forum?id=lT3W4AkyM7

Compressor summary: The paper proposes PLOT, an online tracking algorithm that learns a time-varying model of the target and uses it in receding horizon control, with theoretical results for non-stationary targets and real-world quadrotor demonstration.


Unsupervised Concept Discovery Mitigates Spurious Correlations

https://openreview.net/forum?id=lQzmDFlsHX

Compressor summary: The paper proposes CoBalT, a concept balancing technique that mitigates spurious correlations in unsupervised object-centric learning without requiring human labeling of subgroups.


Counterfactual Reasoning for Multi-Label Image Classification via Patching-Based Training

https://openreview.net/forum?id=lQIN9ZyMLz

Compressor summary: The paper proposes a causal inference framework to improve multi-label image classification by leveraging label correlations while mitigating overfitting issues caused by co-occurrence relationships.


GaussianPro: 3D Gaussian Splatting with Progressive Propagation

https://openreview.net/forum?id=lQ3SEBH1gF

Compressor summary: GaussianPro is a novel method that improves 3D neural rendering by using progressive propagation and patch matching to densify 3D Gaussians, outperforming the traditional 3D Gaussian Splatting technique on large-scale scenes.


Inferring the Long-Term Causal Effects of Long-Term Treatments from Short-Term Experiments

https://openreview.net/forum?id=lQ2o7JteMO

Compressor summary: The text discusses how to infer long-term causal effects of continuous interventions using short-term observations and doubly-robust estimators in offline reinforcement learning.


Robust Stable Spiking Neural Networks

https://openreview.net/forum?id=lIYtJtpJR0

Compressor summary: The paper improves the robustness of spiking neural networks against adversarial attacks by analyzing membrane potential perturbation dynamics and proposing a training framework with modified SNN neurons that reduces the mean square of the perturbation and enhances input-output stability, verified on image classification with Gaussian-noise and adversarial training.


HelmFluid: Learning Helmholtz Dynamics for Interpretable Fluid Prediction

https://openreview.net/forum?id=lHJFfDFbm6

Compressor summary: HelmFluid is a method that predicts fluid dynamics using the Helmholtz theorem, decomposing it into curl-free and divergence-free parts, and integrating them in multiple spatial scales.


Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning

https://openreview.net/forum?id=lGvIV4Bgsz

Compressor summary: The paper proposes a Convolution Bottleneck structure in CNNs, where early layers transform input into a few frequencies and channels, and later layers map back to outputs, explaining the common practice of down-sampling and how it affects function learning.


Coarse-To-Fine Tensor Trains for Compact Visual Representations

https://openreview.net/forum?id=lGZUvfP2ZF

Compressor summary: The text proposes PuTT, a method for learning compact and high-quality representations of visual data using tensor train upsampling, which improves compression, denoising, and image completion tasks.


Feel-Good Thompson Sampling for Contextual Dueling Bandits

https://openreview.net/forum?id=l9ga3iQuHt

Compressor summary: FGTS.CDB is a Thompson sampling algorithm for linear contextual dueling bandits with a new Feel-Good exploration term that achieves near minimax-optimal regret.


Understanding Stochastic Natural Gradient Variational Inference

https://openreview.net/forum?id=l8GrPpsZfy

Compressor summary: Stochastic NGVI has a non-asymptotic convergence rate of $\mathcal{O}(\frac{1}{T})$ for conjugate likelihoods, similar to stochastic gradient descent, and likely converges faster in practice; however, for non-conjugate likelihoods, it optimizes a non-convex objective with no known global convergence rate.


On the Feasibility of Single-Pass Full-Capacity Learning in Linear Threshold Neurons with Binary Input Vectors

https://openreview.net/forum?id=l7vQQi0I2d

Compressor summary: The text investigates whether single-pass learning rules with maximum capacity exist within a certain class of rules, and uses a linear program to show that they do not.


Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation

https://openreview.net/forum?id=l7shXGuGBT

Compressor summary: The paper proposes MATRIX, a social scene simulator that helps large language models align with human values by emulating realistic scenarios and fine-tuning the LLMs with simulated data, achieving better alignment than existing methods.


PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

https://openreview.net/forum?id=l6Hef6FVd0

Compressor summary: PIPER is a novel hierarchical reinforcement learning method that uses preference-based learning and hindsight relabeling to learn from sparse rewards and improve performance in challenging tasks.


Scalable Multiple Kernel Clustering: Learning Clustering Structure from Expectation

https://openreview.net/forum?id=l5lgbVR6BP

Compressor summary: This paper proposes a novel multiple kernel clustering framework that learns from expectation kernel matrices and provides theoretical guarantees for its performance.


TravelPlanner: A Benchmark for Real-World Planning with Language Agents

https://openreview.net/forum?id=l5XQzNkAOe

Compressor summary: The text introduces TravelPlanner, a new planning benchmark for testing language agents' abilities in complex real-world scenarios, showing that current language models struggle to handle it.


Turnstile $\ell_p$ leverage score sampling with applications

https://openreview.net/forum?id=l4ZjeDDnu9

Compressor summary: The paper proposes a novel algorithm for sampling rows in a matrix based on their norm when the data is presented as a turnstile data stream, which can improve subsampling constructions for regression problems with low overhead.


Multi-group Learning for Hierarchical Groups

https://openreview.net/forum?id=l4H7Hv7LhJ

Compressor summary: The paper proposes a method to build decision trees that can handle hierarchical groups and generalize well with few samples.


Causal Discovery via Conditional Independence Testing with Proxy Variables

https://openreview.net/forum?id=l1YbS3qkdk

Compressor summary: The paper proposes a new method to test causal relationships over continuous variables without parametric assumptions, using discretization and a novel test statistic.


Semantically-correlated memories in a dense associative model

https://openreview.net/forum?id=l0OGoZPZuC

Compressor summary: CDAM is a novel associative memory model that uses graph structures to link memory patterns and can handle auto- and hetero-association with anti-Hebbian learning rules for various applications.


Provably Neural Active Learning Succeeds via Prioritizing Perplexing Samples

https://openreview.net/forum?id=kzz0kn546b

Compressor summary: The text explains how neural network-based active learning works by prioritizing samples with yet-to-be-learned features, and shows that both uncertainty-based and diversity-based query criteria achieve this goal.


Selective Mixup Helps with Distribution Shifts, But Not (Only) because of Mixup

https://openreview.net/forum?id=ksph9pkEDc

Compressor summary: Selective mixup improves generalization by non-randomly selecting pairs, which resamples data to uniform class distribution, an effect explained by regression toward the mean.
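
The resampling effect is easy to verify in a toy simulation: among pairs forced to have differing labels, each kept pair contributes one example of each class, so the implicit class distribution becomes uniform regardless of the original imbalance:

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.choice([0, 1], size=200_000, p=[0.9, 0.1])  # 90/10 imbalance

# "selective" pairing: keep only pairs whose two labels differ
i, j = rng.integers(0, len(labels), (2, 500_000))
keep = labels[i] != labels[j]
selected = np.concatenate([labels[i][keep], labels[j][keep]])

print(selected.mean())  # 0.5: one example of each class per kept pair
```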


Feasible Reachable Policy Iteration

https://openreview.net/forum?id=ks8qSwkkuZ

Compressor summary: The paper proposes a safe reinforcement learning method that considers goal achievement and reduces ineffective exploration by using a feasible reachable function and policy iteration.


Efficient Policy Evaluation with Offline Data Informed Behavior Policy Design

https://openreview.net/forum?id=kpDd2HCBka

Compressor summary: The paper proposes methods to make online Monte Carlo estimators more efficient and unbiased by learning a tailored behavior policy from offline data.


When and How Does In-Distribution Label Help Out-of-Distribution Detection?

https://openreview.net/forum?id=knhbhDLdry

Compressor summary: The paper explores how using in-distribution (ID) labels can improve out-of-distribution (OOD) detection in machine learning by analyzing data separability with a graph-theoretic approach.


Bayesian Knowledge Distillation: A Bayesian Perspective of Distillation with Uncertainty Quantification

https://openreview.net/forum?id=knZ4NYzGUd

Compressor summary: Bayesian Knowledge Distillation (BKD) is a method that connects KD to Bayesian modeling, providing insight into its working mechanism and tools for measuring student model uncertainty.


Monotone, Bi-Lipschitz, and Polyak-Łojasiewicz Networks

https://openreview.net/forum?id=kn2xp8UOvQ

Compressor summary: The paper introduces BiLipNet, an invertible neural network that can control its output sensitivity and input distinguishability, and PLNet, a scalar-output network built from a BiLipNet and a quadratic potential, which can fit non-convex losses that retain a unique global minimum.


On the Expressive Power of Spectral Invariant Graph Neural Networks

https://openreview.net/forum?id=kmugaw9Kfq

Compressor summary: The paper introduces EPNN, a novel message-passing framework for spectral invariant GNNs, and analyzes their expressiveness compared to other architectures and subgraph GNNs.


High-Probability Bound for Non-Smooth Non-Convex Stochastic Optimization with Heavy Tails

https://openreview.net/forum?id=klKk9ETAyU

Compressor summary: The paper proposes a new online-to-non-convex framework that uses heavy-tailed gradients and gradient clipping to find stationary points with high probability, improving on existing results for smooth and non-smooth objectives.


Differentially Private Domain Adaptation with Theoretical Guarantees

https://openreview.net/forum?id=kkqIEp2bRa

Compressor summary: The paper proposes two differentially private algorithms for adapting predictions from public data to private data with similar performance as non-private methods.


Sign Gradient Descent-based Neuronal Dynamics: ANN-to-SNN Conversion Beyond ReLU Network

https://openreview.net/forum?id=kfpe7Dg23G

Compressor summary: The text proposes a new optimization-based approach to improve spiking neural networks by approximating subgradient methods and expanding their nonlinearity support, achieving state-of-the-art conversion from artificial neural networks.


FedRC: Tackling Diverse Distribution Shifts Challenge in Federated Learning by Robust Clustering

https://openreview.net/forum?id=kc4dZYJlJG

Compressor summary: FedRC is a new algorithm that helps protect privacy in machine learning by handling multiple types of data shifts among clients using clustering and bi-level optimization.


BAT: Learning to Reason about Spatial Sounds with Large Language Models

https://openreview.net/forum?id=kao5hRX9YA

Compressor summary: BAT is a model that combines spatial sound perception and natural language reasoning to navigate and interpret in-the-wild spatial sounds using synthesized audio data and a novel spatial audio encoder.


How do Transformers Perform In-Context Autoregressive Learning ?

https://openreview.net/forum?id=kZbTkpnafR

Compressor summary: The paper investigates how Transformers learn to predict the next token in a sequence by training on a simple first-order autoregressive task and shows that they do so through an in-context autoregressive learning procedure, where they learn orthogonal matrices to capture data patterns.


Hyperbolic Optimizer as a Dynamical System

https://openreview.net/forum?id=kZKopcDp2q

Compressor summary: The text presents a novel way to study optimizers in neural networks using hyperbolic geometry and dynamical systems tools, focusing on their long-term behavior and stability.


More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning

https://openreview.net/forum?id=kZBCFQe1Ej

Compressor summary: The paper proves that Distributional Reinforcement Learning can achieve better instance-dependent bounds in both online and offline RL than previous methods, and demonstrates its effectiveness in contextual bandits.


Intersecting-Boundary-Sensitive Fingerprinting for Tampering Detection of DNN Models

https://openreview.net/forum?id=kZArjKc64o

Compressor summary: IBSF is a novel fingerprinting method for detecting tampering with DNN models using only top-1 labels, which maximizes the partial Shannon entropy of selected categories near decision boundaries.


Parameter Estimation in DAGs from Incomplete Data via Optimal Transport

https://openreview.net/forum?id=kXde6Qa6Uy

Compressor summary: The text proposes a new framework for estimating graphical model parameters from incomplete data using optimal transport, without making unrealistic assumptions or variational approximations, and shows its effectiveness and robustness in experiments.


R2E: Turning any Github Repository into a Programming Agent Environment

https://openreview.net/forum?id=kXHgEYFyf3

Compressor summary: The paper introduces R2E, a framework that converts GitHub repositories into test environments for evaluating AI coding assistants using program analysis and large language models.


Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms

https://openreview.net/forum?id=kVgpa1rfLO

Compressor summary: The paper introduces a new unsupervised federated learning algorithm (FedGrEM) for mixture models, with a comprehensive finite-sample theory that compares its performance to local single-task learning and other federated EM algorithms.


Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices

https://openreview.net/forum?id=kUm9iuvwIQ

Compressor summary: Slicedit is a method for text-based video editing that uses a pre-trained image synthesis model to process spatial and spatiotemporal slices, achieving better results than existing methods.


A Generative Approach for Treatment Effect Estimation under Collider Bias: From an Out-of-Distribution Perspective

https://openreview.net/forum?id=kUj9b2CezT

Compressor summary: The paper proposes a novel method, C$^2$GAM, to generate missing data from different environments and address collider bias in observational studies.


Accelerating Transformer Pre-training with 2:4 Sparsity

https://openreview.net/forum?id=kTaX87Zn6M

Compressor summary: The paper proposes techniques to accelerate and preserve accuracy in pre-training large transformers using 2:4 sparse matrix multiplication on NVIDIA Ampere GPUs.
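
To make the 2:4 pattern concrete, here is a minimal numpy sketch (mine, not the paper's training recipe, whose contribution is in how to train through this constraint): in every contiguous group of four weights, keep the two largest magnitudes and zero the rest, which is exactly the structure NVIDIA's sparse tensor cores accelerate.

```python
import numpy as np

def prune_2_4(w: np.ndarray) -> np.ndarray:
    """Zero out the 2 smallest-magnitude entries in every group of 4.

    Illustrative only: real 2:4 speedups come from NVIDIA sparse tensor
    cores; this just produces the pattern they require.
    """
    flat = w.reshape(-1, 4)                        # groups of four weights
    idx = np.argsort(np.abs(flat), axis=1)[:, :2]  # two smallest per group
    pruned = flat.copy()
    np.put_along_axis(pruned, idx, 0.0, axis=1)
    return pruned.reshape(w.shape)

w = np.random.randn(8, 8)
assert (prune_2_4(w).reshape(-1, 4) != 0).sum(axis=1).max() <= 2
```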


Fewer Truncations Improve Language Modeling

https://openreview.net/forum?id=kRxCDDFNpp

Compressor summary: Best-fit Packing is a method that optimizes document packing for large language models, reducing truncations and improving coherence and performance.
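
A hedged sketch of the underlying bin-packing idea (function names and details are mine): split over-long documents into context-sized chunks, then place each chunk into the training sequence with the least remaining room that still fits it, so no document is truncated mid-stream.

```python
def best_fit_pack(doc_lens, seq_len):
    """Pack documents (given as token counts) into bins of size seq_len
    with best-fit-decreasing packing, so chunks are never split."""
    # Split any over-long document into context-sized chunks first.
    chunks = []
    for n in doc_lens:
        chunks += [seq_len] * (n // seq_len) + ([n % seq_len] if n % seq_len else [])
    bins = []        # remaining capacity per training sequence
    assignment = []  # (chunk_length, bin_index) pairs
    for c in sorted(chunks, reverse=True):
        fits = [i for i, cap in enumerate(bins) if cap >= c]
        if fits:
            i = min(fits, key=lambda i: bins[i])  # tightest fit that holds c
        else:
            bins.append(seq_len)                  # open a new sequence
            i = len(bins) - 1
        bins[i] -= c
        assignment.append((c, i))
    return assignment, bins

assignment, leftover = best_fit_pack([5000, 1200, 700, 300], seq_len=2048)
```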


Variational Schrödinger Diffusion Models

https://openreview.net/forum?id=kRv0WPJd00

Compressor summary: The variational Schrödinger diffusion model (VSDM) improves scalability and efficiency of transportation plans in diffusion models by linearizing forward score functions with variational inference and optimizing backward scores without simulation-based losses.


Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design

https://openreview.net/forum?id=kQwSbv0BR4

Compressor summary: DFMs are a new flow-based generative model that can handle discrete and continuous data, achieving state-of-the-art results in protein co-design tasks.


Prompt-guided Precise Audio Editing with Diffusion Models

https://openreview.net/forum?id=kQ1dwuheR0

Compressor summary: PPAE is a text-guided, training-free method for precise audio editing using cross-attention maps and a hierarchical pipeline.


MusicFlow: Cascaded Flow Matching for Text Guided Music Generation

https://openreview.net/forum?id=kOczKjmYum

Compressor summary: MusicFlow is a text-to-music model that uses flow matching networks and masked prediction to generate high-quality music from text descriptions efficiently.


Self-Supervised Coarsening of Unstructured Grid with Automatic Differentiation

https://openreview.net/forum?id=kMBvZ40Iu9

Compressor summary: The paper proposes a method to reduce the computational cost of numerical simulations using differentiable physics, $k$-means clustering, and stochastic minimization for coarsening unstructured grids.


Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

https://openreview.net/forum?id=kLiDMGJKx1

Compressor summary: The Outlier-Efficient Modern Hopfield Model improves associative memory retrieval and attention in large transformer-based models, reducing kurtosis and infinity norm of model outputs.


StrWAEs to Invariant Representations

https://openreview.net/forum?id=kLZZWvqlEm

Compressor summary: Key points: - Autoencoders are useful for generative modeling and representation learning - Structural constraints like conditional independence can improve latent variable invariance - Wasserstein autoencoders (WAEs) can easily incorporate such constraints - StrWAEs are a principled way of penalizing autoencoders to impose structural constraints - StrWAEs show promising results on various tasks Summary: The paper introduces StrWAEs, a flexible and principled way of imposing structural constraints on autoencoders using Wasserstein autoencoders, which improve latent variable invariance and perform well on several tasks.


Learning Latent Structures in Network Games via Data-Dependent Gated-Prior Graph Variational Autoencoders

https://openreview.net/forum?id=kKWjZoaRLv

Compressor summary: The GPGVAE model learns the strategic interactions and network structures from observed actions in network games using a spectral GNN encoder, a data-dependent gated prior, and a Transformer-based mixture-of-Bernoulli encoder.


ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

https://openreview.net/forum?id=kIh7GJmRfD

Compressor summary: ATraDiff is a generative diffusion model that adapts to different trajectory lengths and distribution shifts, improving online RL performance using synthetic trajectories from offline data.


Pluvial Flood Emulation with Hydraulics-informed Message Passing

https://openreview.net/forum?id=kIHIA6Lr0B

Compressor summary: The paper introduces a hydraulics-informed graph neural network for flood simulation that incorporates physics domain knowledge and performs well on complex topography and sparse precipitation data.


RoboDreamer: Learning Compositional World Models for Robot Imagination

https://openreview.net/forum?id=kHjOmAUfVe

Compressor summary: RoboDreamer is an innovative method that learns a compositional world model by factorizing video generation, enabling generalization of language instructions for realistic plan synthesis and multimodal goals in robotic decision-making.


Acquisition Conditioned Oracle for Nongreedy Active Feature Acquisition

https://openreview.net/forum?id=kGXUL6qGso

Compressor summary: The paper presents a new method for sequentially selecting features in health care applications that minimizes costs and improves diagnostic performance using an oracle-based approach.


Listenable Maps for Audio Classifiers

https://openreview.net/forum?id=kAfYYg6PX8

Compressor summary: L-MAC is an interpretation method for audio classifiers that generates binary masks to highlight relevant parts of audio signals, which are more faithful and preferred by users than other methods.


Autaptic Synaptic Circuit Enhances Spatio-temporal Predictive Learning of Spiking Neural Networks

https://openreview.net/forum?id=kAIkYOE5pV

Compressor summary: The paper proposes a novel Spatio-Temporal Circuit (STC) model for spiking neural networks to improve their ability to handle complex, dynamic spatio-temporal prediction tasks by incorporating autaptic synapses and two learnable adaptive pathways.


OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift

https://openreview.net/forum?id=kAFevjEYsz

Compressor summary: Key points: - Existing methods improve adversarial robustness but only on in-distribution (ID) data - OODRobustBench is a benchmark to assess out-of-distribution (OOD) robustness using 23 dataset shifts and 6 threat models - Adversarial robustness suffers from severe OOD generalization issue - ID robustness correlates strongly with OOD robustness in a positive linear way - Existing methods are unlikely to achieve high OOD robustness and novel methods are needed Summary: The paper introduces OODRobustBench, a benchmark to measure OOD adversarial robustness using various shifts and models, and shows that existing methods have poor OOD performance and require new solutions.


Improving SAM Requires Rethinking its Optimization Formulation

https://openreview.net/forum?id=k7G4N1x7f9

Compressor summary: The paper proposes BiSAM, a variant of Sharpness-Aware Minimization (SAM) that reformulates the inner perturbation step as a bilevel optimization problem with a novel lower-bound surrogate loss, improving performance over standard SAM.


Directly Denoising Diffusion Models

https://openreview.net/forum?id=k5ncz7TIPX

Compressor summary: DDDMs generate realistic images quickly using few-step sampling and Pseudo-LPIPS, outperforming GANs and distillation-based models on benchmark datasets.


Differentially Private Decentralized Learning with Random Walks

https://openreview.net/forum?id=k2dVVIWWho

Compressor summary: This paper studies the privacy of decentralized learning with random walk algorithms using a new differential privacy variant, and shows that it can improve privacy compared to gossip algorithms for nearby nodes.


Self-Driven Entropy Aggregation for Byzantine-Robust Heterogeneous Federated Learning

https://openreview.net/forum?id=k2axqNsVVO

Compressor summary: SDEA is a method for Byzantine-robust aggregation in heterogeneous federated learning that uses a random public dataset and learns aggregation weights to distinguish benign from malicious clients.


Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?

https://openreview.net/forum?id=k1JXxbpIY6

Compressor summary: The authors investigate how large language models perform on arithmetic word problems, comparing their cognitive biases to those of children, and find that LLMs mimic human biases in understanding text and planning solutions, but not in executing arithmetic expressions.


Exploiting Negative Samples: A Catalyst for Cohort Discovery in Healthcare Analytics

https://openreview.net/forum?id=k1J2GbamLi

Compressor summary: The paper proposes a method to find cohorts within negative samples in healthcare analytics using data Shapley values and manifold learning, which can reveal insights on diseases and related conditions.


Learning to Remove Cuts in Integer Linear Programming

https://openreview.net/forum?id=k10805cgak

Compressor summary: The paper proposes a new cutting plane method for solving integer linear programs by removing some existing constraints, which improves the performance compared to adding new ones.


Position: Foundation Agents as the Paradigm Shift for Decision Making

https://openreview.net/forum?id=jzHmElqpPe

Compressor summary: Key points: - Decision making needs perception, memory, and reasoning for optimal policies - Conventional approaches have limitations in sample efficiency and generalization - Foundation models in language and vision can adapt to diverse tasks faster - The authors propose foundation agents as a new learning paradigm inspired by LLMs - They outline the roadmap of foundation agents from data collection/generation, pretraining, adaptation, and alignment with LLMs - They identify critical research questions and trends for foundation agents with real-world use cases Summary: The paper proposes foundation agents, a new learning paradigm for decision making inspired by large language models, that can adapt faster and better to diverse tasks with data collection/generation, pretraining, adaptation, and alignment.


RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching

https://openreview.net/forum?id=jxvqvZLBuU

Compressor summary: RNAFlow is an AI method for designing RNA structures and sequences by combining a denoising network with an RNA inverse folding model that considers conformational flexibility.


Data-free Distillation of Diffusion Models with Bootstrapping

https://openreview.net/forum?id=jw2f9v59g0

Compressor summary: BOOT is an efficient data-free distillation technique for improving the speed and quality of image generation using diffusion models without requiring additional data or computationally expensive processes.


Minimizing $f$-Divergences by Interpolating Velocity Fields

https://openreview.net/forum?id=jvVWPtJYbc

Compressor summary: The paper proposes a new method to estimate velocity fields for Wasserstein Gradient Flow, which improves the accuracy of particle movement and applies it to domain adaptation and missing data imputation tasks.


Efficient Algorithms for Sum-Of-Minimum Optimization

https://openreview.net/forum?id=jsmaWEdx9g

Compressor summary: Key points: - propose a new optimization model for clustering applications called sum-of-minimum optimization - develop efficient algorithms inspired by k-means and Lloyd's algorithm - prove a tight bound and convergence rate for the algorithms - show numerical results on multiple tasks Summary: The paper introduces sum-of-minimum optimization, a novel clustering model with efficient algorithms based on k-means and Lloyd's algorithm, and demonstrates its effectiveness on various tasks.
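
The objective is min over θ_1..θ_k of Σ_i min_j f_i(θ_j). As a hedged illustration of the Lloyd-style alternation the paper generalizes, here is the classic instance with squared-error losses, where it reduces exactly to k-means:

```python
import numpy as np

def lloyd_sum_of_min(X, k, iters=20, seed=0):
    """Lloyd-style alternation for sum-of-minimum objectives, shown for
    the squared-error case f_i(theta) = ||x_i - theta||^2 (i.e. k-means).
    Step 1: assign each i to its minimizing theta_j (the inner "min").
    Step 2: re-fit each theta_j on the samples assigned to it."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = X[assign == j].mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return centers, d.min(axis=1).sum()   # parameters and objective value

X = np.random.randn(200, 2)
centers, obj = lloyd_sum_of_min(X, k=3)
```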


Position: Measure Dataset Diversity, Don't Just Claim It

https://openreview.net/forum?id=jsKr6RVDDs

Compressor summary: The text discusses the challenges of defining and measuring diversity in machine learning datasets, and suggests using principles from measurement theory to improve dataset construction.


FedMBridge: Bridgeable Multimodal Federated Learning

https://openreview.net/forum?id=jrHUbftLd6

Compressor summary: FedMBridge is a new method for multimodal federated learning that uses a hypernetwork to handle different client architectures and data types while protecting privacy.


PGODE: Towards High-quality System Dynamics Modeling

https://openreview.net/forum?id=jrE7geZekq

Compressor summary: The paper proposes PGODE, a new method for modeling multi-agent dynamical systems using prototype decomposition and graph ODEs, which improves generalization under system changes and outperforms baselines in various scenarios.


Conformalized Survival Distributions: A Generic Post-Process to Increase Calibration

https://openreview.net/forum?id=jr0W36wOBx

Compressor summary: The paper proposes a new conformal regression method to improve survival analysis models' calibration without compromising their ability to rank subjects accurately.


Efficient Precision and Recall Metrics for Assessing Generative Models using Hubness-aware Sampling

https://openreview.net/forum?id=jnps5YwNlU

Compressor summary: The paper proposes eP&R, a new evaluation metric for deep generative models that uses hubness-aware sampling to reduce computational costs while maintaining accuracy.


A decoder-only foundation model for time-series forecasting

https://openreview.net/forum?id=jn2iTJas6h

Compressor summary: The paper presents a time-series foundation model for forecasting that performs well without any fine-tuning on various public datasets using a large pretrained attention model with input patching.


In-Context Decision Transformer: Reinforcement Learning via Hierarchical Chain-of-Thought

https://openreview.net/forum?id=jmmji1EU3g

Compressor summary: The In-context Decision Transformer (IDT) improves offline reinforcement learning efficiency by using a hierarchical structure inspired by human decision-making to reconstruct high-level decisions instead of low-level actions, achieving state-of-the-art results in long-horizon tasks and reducing evaluation time significantly.


Seesaw: Compensating for Nonlinear Reduction with Linear Computations for Private Inference

https://openreview.net/forum?id=jklD0TV5Hw

Compressor summary: Seesaw is a novel neural architecture search method that leverages more linear computations and nonlinear result reuse for privacy-preserving machine learning, achieving better accuracy and lower latency than existing methods.


Post-hoc Part-Prototype Networks

https://openreview.net/forum?id=jhWSzTO0Jl

Compressor summary: The paper proposes a method to create post-hoc part-prototype networks that can explain both where and what a model looks for in an image, while maintaining performance and providing more faithful explanations.


Plug-in Performative Optimization

https://openreview.net/forum?id=jh7FDDwDBf

Compressor summary: The paper proposes plug-in performative optimization, a method that uses possibly misspecified models to reduce performative risk and improve prediction in situations where the feedback affects future observations.


BBox-Adapter: Lightweight Adapting for Black-Box Large Language Models

https://openreview.net/forum?id=jdRIaUu3xY

Compressor summary: BBox-Adapter is a novel method for adapting black-box LLMs to specific tasks using ranking-based NCE loss and online data sampling, improving performance and reducing costs.


Online Variational Sequential Monte Carlo

https://openreview.net/forum?id=jbPc3pW6sC

Compressor summary: The paper proposes online VSMC, a method for efficient and accurate parameter estimation and latent state inference in state-space models using stochastic gradient approximation.


Simple Ingredients for Offline Reinforcement Learning

https://openreview.net/forum?id=japBn31gXC

Compressor summary: The text discusses how offline reinforcement learning algorithms struggle with diverse data from different tasks, and suggests using larger policy sizes to improve performance.


Unmasking Vulnerabilities: Cardinality Sketches under Adaptive Inputs

https://openreview.net/forum?id=jaJxpKkBcL

Compressor summary: The paper studies how cardinality sketches perform in adaptive settings and reveals their vulnerabilities, showing an attack that exploits simple non-adaptive queries to generate adversarial inputs.


FiT: Flexible Vision Transformer for Diffusion Model

https://openreview.net/forum?id=jZVen2JguY

Compressor summary: The Flexible Vision Transformer (FiT) is a new transformer architecture that can generate images with unrestricted resolutions and aspect ratios by conceptualizing images as sequences of dynamically-sized tokens, thus promoting resolution generalization and eliminating biases.


Connecting the Dots: Collaborative Fine-tuning for Black-Box Vision-Language Models

https://openreview.net/forum?id=jZEY5SxbL4

Compressor summary: Key points: - CraFT is a method for fine-tuning black-box VLMs using input prompts and output predictions only - CraFT consists of two modules: prompt generation and prediction refinement, with an auxiliary loss to ensure consistency - CraFT outperforms white-box methods on few-shot classification with fewer queries, faster training, and less memory Summary: CraFT is a novel approach for fine-tuning black-box VLMs without accessing their parameters. It uses prompt generation and prediction refinement modules to improve few-shot classification performance while using fewer queries, less training time, and less memory.


Conditional Language Learning with Context

https://openreview.net/forum?id=jXn1qIcjyG

Compressor summary: The paper introduces conditional finetuning, a method that allows language models to learn useful knowledge from a corpus while avoiding unnecessary biases, leading to improved performance on downstream tasks and lifelong learning.


SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

https://openreview.net/forum?id=jWHU4b7Yk6

Compressor summary: SyCoCa is a method that improves multimodal alignment between language and vision by introducing bidirectional interactions on global and local representations using textual and visual cues.


MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

https://openreview.net/forum?id=jVXJdGQ4eD

Compressor summary: MagicPose is a diffusion-based model that can generate realistic human images with controlled poses and facial expressions while preserving the identity, using a two-stage training strategy to disentangle appearance and motions.


From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation

https://openreview.net/forum?id=jU6iPouOZ6

Compressor summary: The paper introduces a novel framework called Vision to Audio and Beyond (VAB) that learns representations and generates modalities within latent spaces for various audio-visual tasks.


Sparse Inducing Points in Deep Gaussian Processes: Enhancing Modeling with Denoising Diffusion Variational Inference

https://openreview.net/forum?id=jTn4AIOgpM

Compressor summary: DDVI is a novel method that uses denoising diffusion SDEs and neural networks to infer the posterior distribution of sparse inducing points in deep Gaussian processes, improving efficiency and reducing bias.


No Dimensional Sampling Coresets for Classification

https://openreview.net/forum?id=jS3CMHtYJD

Compressor summary: The paper presents new coresets for classification problems that have smaller size, distributional input support, and various loss function applications.


Position: On the Societal Impact of Open Foundation Models

https://openreview.net/forum?id=jRX6yCxFhx

Compressor summary: Open foundation models have benefits and risks related to innovation, power distribution, and transparency, but their misuse risk is unclear due to insufficient research and varying assumptions in past work.


The Perception-Robustness Tradeoff in Deterministic Image Restoration

https://openreview.net/forum?id=jQA5iutPzd

Compressor summary: The paper studies how well deterministic methods solve inverse problems in imaging and shows that they need higher Lipschitz constants for better perceptual quality and consistency, making them more vulnerable to attacks.


Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

https://openreview.net/forum?id=jQ92egz5Ym

Compressor summary: The paper proposes IR-QLoRA, a method to improve the accuracy of quantized large language models for deployment on resource-constrained devices by retaining original information through statistics-based and finetuning-based technologies.


Learning Divergence Fields for Shift-Robust Graph Representations

https://openreview.net/forum?id=jPaEOH56JB

Compressor summary: The paper proposes a geometric diffusion model with learnable divergence fields and causal inference to improve generalization for interdependent data, and provides three model instantiations based on GCN, GAT, and Transformers.


Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization

https://openreview.net/forum?id=jP8mf34iCW

Compressor summary: The paper proposes a new subset selection algorithm for active learning that optimizes batch acquisition directly on the combinatorial space using sequential greedy sampling and improves efficiency in solving expensive multi-objective combinatorial optimization problems.


Learning the Target Network in Function Space

https://openreview.net/forum?id=jP1zeEqHli

Compressor summary: The paper proposes Lookahead-Replicate, a new reinforcement learning algorithm that maintains function space equivalence between online and target networks, leading to better deep RL performance on Atari games.


Fast Timing-Conditioned Latent Audio Diffusion

https://openreview.net/forum?id=jOlO8t1xdx

Compressor summary: The paper proposes an efficient generative model that creates long-form, variable-length stereo audio from text prompts using latent diffusion and convolutional autoencoders, outperforming existing models in quality and structure.


Partially Stochastic Infinitely Deep Bayesian Neural Networks

https://openreview.net/forum?id=jNab9mXEyj

Compressor summary: The paper introduces a new type of neural network that combines partial stochasticity and infinite depth to improve efficiency, expressivity, and performance on various tasks.


How Transformers Learn Causal Structure with Gradient Descent

https://openreview.net/forum?id=jNM4imlHZv

Compressor summary: The paper shows how gradient descent on a simplified two-layer transformer learns latent causal structure from in-context learning tasks, and proves that the attention matrix encodes mutual information between tokens.


Averaging $n$-step Returns Reduces Variance in Reinforcement Learning

https://openreview.net/forum?id=jM9A3Kz6Ki

Compressor summary: Compound returns, a weighted average of step returns, can reduce variance in reinforcement learning methods, improving their sample efficiency.
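
A compound return is any convex combination of n-step returns (the λ-return is the best-known example). A minimal sketch, with conventions of my choosing (values has length T+1, terminal value zero):

```python
import numpy as np

def n_step_return(rewards, values, t, n, gamma=0.99):
    """G_t^(n) = sum_{i<n} gamma^i r_{t+i} + gamma^n V(s_{t+n}).
    Assumes len(values) == len(rewards) + 1 with values[-1] = 0 at
    episode end, so the bootstrap term is always well-defined."""
    n = min(n, len(rewards) - t)
    g = sum(gamma**i * rewards[t + i] for i in range(n))
    return g + gamma**n * values[t + n]

def compound_return(rewards, values, t, weights, gamma=0.99):
    """Weighted average of n-step returns; weights must sum to 1.
    E.g. weights={2: 0.5, 8: 0.5} averages the 2- and 8-step returns,
    which can have lower variance than a single n-step return of
    comparable bias when the returns' errors are imperfectly correlated."""
    assert abs(sum(weights.values()) - 1.0) < 1e-8
    return sum(w * n_step_return(rewards, values, t, n, gamma)
               for n, w in weights.items())

rewards = [1.0] * 10
values = [0.5] * 10 + [0.0]
g = compound_return(rewards, values, t=0, weights={2: 0.5, 8: 0.5})
```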


BetterV: Controlled Verilog Generation with Discriminative Guidance

https://openreview.net/forum?id=jKnW7r7de1

Compressor summary: BetterV is a Verilog generation framework that uses fine-tuned large language models and generative discriminators to create correct and optimized Verilog code for various electronic design automation tasks.


Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks

https://openreview.net/forum?id=jKYyFbH8ap

Compressor summary: SAFIM is a new benchmark to evaluate Large Language Models on code Fill-in-the-Middle tasks, focusing on syntax-aware completions and providing a robust framework for accurate and fair comparisons.


ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

https://openreview.net/forum?id=jKUWlgra9b

Compressor summary: ERQ is a two-step post-training quantization method for vision transformers that reduces quantization error by strategically updating a subset of weights in full precision and then refining the rounding directions of the quantized weights.


Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

https://openreview.net/forum?id=jJmGl01S4l

Compressor summary: The paper explains how spikes in training loss during neural network training with stochastic gradient descent (SGD) are related to improved feature learning and generalization by increasing alignment with the true predictor's average gradient outer product.


Non-clairvoyant Scheduling with Partial Predictions

https://openreview.net/forum?id=jJLcXGB2uA

Compressor summary: The paper studies how to schedule jobs efficiently when only some predictions are available, and proposes an algorithm that is robust, consistent, and smooth under this restriction.


Executable Code Actions Elicit Better LLM Agents

https://openreview.net/forum?id=jJ9BoXAfFa

Compressor summary: This paper proposes using executable Python code as a unified action space for large language models, improving their performance on agent-oriented tasks and enabling them to perform sophisticated actions like model training.


Cross-view Masked Diffusion Transformers for Person Image Synthesis

https://openreview.net/forum?id=jEoIkNkqyc

Compressor summary: X-MDPT is a novel pose-guided human image generation model using masked diffusion transformers, which improves scalability and efficiency over existing approaches.


Tight Partial Identification of Causal Effects with Marginal Distribution of Unmeasured Confounders

https://openreview.net/forum?id=jEWpcEyuUl

Compressor summary: The paper reassesses the role of marginal confounder distribution in partial identification and provides a criterion for determining its impact on causal inference.


Discovering Multiple Solutions from a Single Task in Offline Reinforcement Learning

https://openreview.net/forum?id=j6rG1ETRyu

Compressor summary: The study proposes and tests algorithms for learning multiple distinct solutions from a single task in offline reinforcement learning.


Hybrid Neural Representations for Spherical Data

https://openreview.net/forum?id=j6QZy90B93

Compressor summary: The paper proposes a new method, Hybrid Neural Representations for Spherical data (HNeR-S), to better handle nonlinear spherical signals like weather and CMB data using positional features from feature-grids and a multi-layer perceptron.


Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models

https://openreview.net/forum?id=j5wf1NNhFs

Compressor summary: The U-ext-Hop model improves memory retrieval in Hopfield networks by using a learnable feature map that transforms the energy function into kernel space, enhancing memory capacity and reducing confusion.


Position: Fundamental Limitations of LLM Censorship Necessitate New Approaches

https://openreview.net/forum?id=j5csKrtyAe

Compressor summary: The text discusses the limitations of current methods to prevent misuse of large language models by censoring their outputs and calls for re-evaluation and new approaches to ensure safety.


FADAS: Towards Federated Adaptive Asynchronous Optimization

https://openreview.net/forum?id=j56JAd29uH

Compressor summary: FADAS is a novel method that combines adaptive federated optimization with asynchronous updates, improving efficiency and resilience in privacy-preserving machine learning.


MILP-FBGen: LP/MILP Instance Generation with Feasibility/Boundedness

https://openreview.net/forum?id=j4HtfTqr0f

Compressor summary: The paper introduces MILP-FBGen, a machine learning technique for generating realistic and feasible Linear Programming and Mixed-Integer Linear Programming instances that preserve key properties and improve downstream task performance.


On Multi-Armed Bandit with Impatient Arms

https://openreview.net/forum?id=j35VcooKG8

Compressor summary: The paper studies a hard Multi-Armed Bandit problem where neglected arms leave the game, proposes FC-SE algorithm with regret bounds, and extends it to handle new arms with FC-Entry algorithm.


Distributional Bellman Operators over Mean Embeddings

https://openreview.net/forum?id=j2pLfsBm4J

Compressor summary: A new algorithmic framework for distributional reinforcement learning represents return distributions by sketches of mean embeddings and applies the distributional Bellman operator with simple linear-algebraic computations, with theoretical and empirical evidence for its effectiveness.


Differentiability and Optimization of Multiparameter Persistent Homology

https://openreview.net/forum?id=ixdfvnO0uy

Compressor summary: The article develops a general framework for optimizing vector-valued functions using multiparameter homological descriptors from persistent homology, showing that it improves performance over one-parameter descriptors.


Visual Transformer with Differentiable Channel Selection: An Information Bottleneck Inspired Approach

https://openreview.net/forum?id=iup9NElHji

Compressor summary: The paper proposes a new transformer block that reduces the computational cost of visual transformers while maintaining or improving their accuracy, and introduces a novel variational upper bound for information bottleneck optimization.


AI Alignment with Changing and Influenceable Reward Functions

https://openreview.net/forum?id=itYGbe0Cs1

Compressor summary: The paper introduces DR-MDPs to model preference changes in AI alignment, shows that static preferences can lead to undesirable AI influence, and explores potential solutions while acknowledging their limitations.


Learning from Integral Losses in Physics Informed Neural Networks

https://openreview.net/forum?id=itDhUBY2xf

Compressor summary: The paper proposes methods for training physics-informed networks whose losses contain integral terms requiring many evaluations, addressing the bias introduced by naive sampled approximations of those integrals.


EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting

https://openreview.net/forum?id=isUSVgS7W1

Compressor summary: Event cameras can reconstruct 3D scenes from sparse event data using a generalizable framework that estimates depth, intensity, and Gaussian regression, outperforming existing methods.


Position: What Can Large Language Models Tell Us about Time Series Analysis

https://openreview.net/forum?id=iroZNDxFJZ

Compressor summary: The paper argues that large language models can revolutionize time series analysis, enabling efficient decision-making and new possibilities like modality switching and question answering.


Residual-Conditioned Optimal Transport: Towards Structure-Preserving Unpaired and Paired Image Restoration

https://openreview.net/forum?id=irBHPlknxP

Compressor summary: The paper proposes a new image restoration method using optimal transport and residual information to preserve the original image structure better than existing methods.


Statistical Properties of Robust Satisficing

https://openreview.net/forum?id=iqAyWVLUEO

Compressor summary: This paper analyzes the theoretical properties of the Robust Satisficing (RS) model, a streamlined approach to robust optimization with better statistical guarantees and performance than existing methods.


The Expressive Power of Path-Based Graph Neural Networks

https://openreview.net/forum?id=io1XSRtcO8

Compressor summary: PATH-WL is a powerful new class of path-based graph neural networks that use paths and shortest-path distance information, achieving strong empirical results and distinguishing hard graph classes such as strongly regular graphs.


Mol-AE: Auto-Encoder Based Molecular Representation Learning With 3D Cloze Test Objective

https://openreview.net/forum?id=inEuvSg0y1

Compressor summary: Mol-AE is a new auto-encoder model for 3D molecular representation learning with positional encoding and 3D Cloze Test objective that outperforms existing methods.


Causal Customer Churn Analysis with Low-rank Tensor Block Hazard Model

https://openreview.net/forum?id=ihv6pWuILN

Compressor summary: The study proposes a new causal model to analyze customer churn using tensor completion methods and shows its effectiveness in practice.


Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

https://openreview.net/forum?id=igRjCCAz2a

Compressor summary: EDDPMs are versatile probabilistic models that combine encoding-decoding with diffusion for broad applicability and enhanced performance across text, proteins, and images.


Private Gradient Descent for Linear Regression: Tighter Error Bounds and Instance-Specific Uncertainty Estimation

https://openreview.net/forum?id=igRAPavrrS

Compressor summary: The paper analyzes standard differentially private gradient descent for linear regression, showing its accuracy and sample complexity match non-private methods with proper hyperparameter choices and adaptive confidence intervals.


On the Unexpected Effectiveness of Reinforcement Learning for Sequential Recommendation

https://openreview.net/forum?id=ie3vXkMvRY

Compressor summary: The text discusses how Reinforcement Learning (RL) improves session-based recommendation by promoting better embeddings of user interactions, and suggests using an auxiliary loss instead of RL to achieve similar performance gains.


Evaluating Model Bias Requires Characterizing its Mistakes

https://openreview.net/forum?id=idyUNsoZ75

Compressor summary: SkewSize is a new metric that measures and characterizes bias in model predictions across subgroups, improving upon existing benchmarks.


Best of Both Worlds Guarantees for Smoothed Online Quadratic Optimization

https://openreview.net/forum?id=icijMMWwdG

Compressor summary: The paper studies smoothed online quadratic optimization with different costs and analyzes optimal algorithms for both adversarial and stochastic settings, proposing a new distribution-agnostic dynamic interpolation algorithm called Lazy Adaptive Interpolation (LAI).


DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency

https://openreview.net/forum?id=ibwxzYCep9

Compressor summary: The paper proposes a new framework for image restoration that balances visual quality and faithfulness to the original image using a stochastic degradation model and early-stopping.


Modeling Caption Diversity in Contrastive Vision-Language Pretraining

https://openreview.net/forum?id=iaV2fU6Dif

Compressor summary: Llip is a new pretraining method for vision-language models that leverages diverse captions to improve image representation and performance on zero-shot tasks.


Gated Linear Attention Transformers with Hardware-Efficient Training

https://openreview.net/forum?id=ia5XvxFUJT

Compressor summary: The paper presents FlashLinearAttention and gated linear attention (GLA), which improve linear attention's efficiency and performance for parallel training and inference in Transformers.
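
The recurrent view of gated linear attention is easy to state: a matrix-valued state is decayed by a data-dependent gate and written to with key-value outer products. A hedged numpy sketch of that recurrence (the paper's contribution is a hardware-efficient chunked parallel form; the gate parameterization here is simplified):

```python
import numpy as np

def gated_linear_attention(q, k, v, g):
    """Recurrent form of gated linear attention (hedged sketch).

    A (d_k, d_v) state S is updated as
        S_t = diag(g_t) @ S_{t-1} + k_t v_t^T,   o_t = S_t^T q_t,
    so a length-T sequence costs O(T) instead of O(T^2)."""
    T, d_k = q.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))
    out = np.zeros((T, d_v))
    for t in range(T):
        S = g[t][:, None] * S + np.outer(k[t], v[t])  # gated decay + write
        out[t] = S.T @ q[t]                           # read with the query
    return out

T, d = 16, 8
q, k, v = (np.random.randn(T, d) for _ in range(3))
g = 1 / (1 + np.exp(-np.random.randn(T, d)))          # per-channel gates in (0,1)
out = gated_linear_attention(q, k, v, g)
```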


Diffuse, Sample, Project: Plug-And-Play Controllable Graph Generation

https://openreview.net/forum?id=ia0Z8d1DbY

Compressor summary: PRODIGY is a method to generate graphs with precise control using pre-trained diffusion models, handling both soft and hard constraints, achieving high constraint satisfaction for various applications like drug discovery.


Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies

https://openreview.net/forum?id=iYYA5zDoCm

Compressor summary: This paper introduces a decentralized model for multi-agent systems with varying dependencies and proposes three near-optimal, scalable policies for it.


Model Alignment as Prospect Theoretic Optimization

https://openreview.net/forum?id=iUwHnoENnl

Compressor summary: The paper proposes KTO, a new human-aware loss function for LLMs that directly maximizes human utility based on Kahneman-Tversky prospect theory, and shows its effectiveness compared to other HALOs and cross-entropy minimization at different scales.


Learning Decision Policies with Instrumental Variables through Double Machine Learning

https://openreview.net/forum?id=iRcmqXZjeK

Compressor summary: DML-IV is a non-linear IV regression method that reduces bias in two-stage IV regressions and effectively learns high-performing policies using a novel learning objective and DML framework.


Beyond the Calibration Point: Mechanism Comparison in Differential Privacy

https://openreview.net/forum?id=iQTElQbAqo

Compressor summary: The text proposes a new method to compare differentially private machine learning mechanisms by measuring their worst-case privacy risks and shows its usefulness through examples.


Triplet Interaction Improves Graph Transformers: Accurate Molecular Graph Learning with Triplet Graph Transformers

https://openreview.net/forum?id=iPFuWc1TV2

Compressor summary: The Triplet Graph Transformer is a novel model that uses triplet attention and aggregation to capture third-order interactions in graphs, enabling better molecular property prediction and achieving state-of-the-art results on various benchmarks.


Adaptive Hierarchical Certification for Segmentation using Randomized Smoothing

https://openreview.net/forum?id=iOEReiiTit

Compressor summary: The paper proposes a novel certification method for machine learning models that uses a multi-level hierarchy to reduce abstain rates and increase information gain compared to existing methods.


Block Acceleration Without Momentum: On Optimal Stepsizes of Block Gradient Descent for Least-Squares

https://openreview.net/forum?id=iLyUEPZ0fR

Compressor summary: Block coordinate descent with optimal stepsizes can achieve faster convergence than gradient descent and momentum for some problems.


DeepPolar: Inventing Nonlinear Large-Kernel Polar Codes via Deep Learning

https://openreview.net/forum?id=iLfk2CwEHA

Compressor summary: DeepPolar codes use neural networks to generalize and improve Polar codes for error correction at short-to-medium block lengths.


CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding

https://openreview.net/forum?id=iLSgF7jMtI

Compressor summary: CogDPM is a new model that integrates diffusion probabilistic models with predictive coding theory to improve visual world prediction by incorporating a precision-weighting mechanism.


Debating with More Persuasive LLMs Leads to More Truthful Answers

https://openreview.net/forum?id=iLCZtl7FTa

Compressor summary: Debate can help weaker AI models assess stronger ones without human labels, and optimizing expert debaters for persuasiveness improves their performance.


The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

https://openreview.net/forum?id=iKkFruh4d5

Compressor summary: This paper studies how multi-pass gradient descent with batch re-use improves learning for two-layer neural networks when target functions have multiple indexes.


Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs

https://openreview.net/forum?id=iJlPJsTw2B

Compressor summary: The paper proposes a new optimal format for large language models and explores the tradeoff between accuracy and chip area using different datatypes.


Sampling-based Multi-dimensional Recalibration

https://openreview.net/forum?id=iJWeK2snMH

Compressor summary: The paper proposes a method to calibrate probabilistic forecasts for multi-dimensional outputs using highest density regions, which considers the joint distribution across dimensions.


CoLoRA: Continuous low-rank adaptation for reduced implicit neural modeling of parameterized partial differential equations

https://openreview.net/forum?id=iHSgfGob9j

Compressor summary: CoLoRA is a method that uses low-rank adaptive neural networks to predict the evolution of solution fields for partial differential equations, achieving fast and accurate results even in data-scarce regimes.


Gibbs Sampling of Continuous Potentials on a Quantum Computer

https://openreview.net/forum?id=iGMTxygzcJ

Compressor summary: A quantum algorithm using quantum Fourier transforms can efficiently estimate Gibbs measures for periodic functions and improve sampling precision in high temperature regimes.


Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise

https://openreview.net/forum?id=iE2lMjeXRR

Compressor summary: The paper explores the theoretical advantages of Adam optimizer in optimization problems using an online learning framework.


On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning

https://openreview.net/forum?id=iC8l9DI1ZX

Compressor summary: Supervised non-contrastive representation learning methods, SupSiam and SupBYOL, improve on self-supervised learning by using labels to reduce intra-class variance and avoid collapse, leading to better results across various tasks and datasets.


Rapid Learning without Catastrophic Forgetting in the Morris Water Maze

https://openreview.net/forum?id=i9C4Kwm56G

Compressor summary: The study proposes a novel task for machine learning models that mimics animals' ability to adapt quickly and maintain proficiency, using a combination of neural networks and biological inspirations.


Auto-Regressive Next-Token Predictors are Universal Learners

https://openreview.net/forum?id=i56plqPpEa

Compressor summary: The authors present a theory for how simple language models can perform complex tasks by predicting the next token in a sequence and show that linear networks and MLPs have surprising abilities in text generation and arithmetic tasks.


Self-Correcting Self-Consuming Loops for Generative Model Training

https://openreview.net/forum?id=i0nVanexij

Compressor summary: The paper proposes self-correction functions to stabilize generative models trained on synthetic data, especially for tasks like human motion synthesis.


LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery

https://openreview.net/forum?id=hz8cFsdz7P

Compressor summary: The Scientific Generative Agent (SGA) combines the strengths of large language models and simulations to enhance scientific discovery by proposing hypotheses, reasoning about discrete components, and receiving feedback on continuous parts.


Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning

https://openreview.net/forum?id=hunSEjeCPE

Compressor summary: The paper introduces EDIS, a method that uses a diffusion model to extract prior knowledge from offline data and energy functions to distill it for better online learning, improving efficiency and safety in RL.


On the Tractability of SHAP Explanations under Markovian Distributions

https://openreview.net/forum?id=htq0FbPOsY

Compressor summary: The article studies the computational complexity of the SHAP framework for local explainability of ML models under realistic assumptions and shows that it can be computed in polynomial time for some model families.


RMIB: Representation Matching Information Bottleneck for Matching Text Representations

https://openreview.net/forum?id=hsHIxrnrMx

Compressor summary: The paper proposes a method to match text representations in tasks with different domains by optimizing information bottleneck, which improves the performance of such tasks.


Prompt-tuning Latent Diffusion Models for Inverse Problems

https://openreview.net/forum?id=hrwIndai8e

Compressor summary: The paper introduces P2L, a method that uses text-to-image latent diffusion models with prompt tuning to solve imaging inverse problems better than existing methods.


Truly No-Regret Learning in Constrained MDPs

https://openreview.net/forum?id=hrWte3nlzr

Compressor summary: The paper proposes a new primal-dual algorithm for constrained reinforcement learning that avoids error cancellations and achieves sublinear regret.


Stochastic positional embeddings improve masked image modeling

https://openreview.net/forum?id=hr8OXXMb7a

Compressor summary: This paper introduces StoP, a method that incorporates location uncertainty in MIM by using stochastic positional embeddings, which improves downstream performance on various tasks.
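
A heavily hedged sketch of the basic idea as I read the summary: condition the predictor on a noisy sample around the target position rather than on the exact position (StoP's actual noise parameterization is more structured than the plain Gaussian jitter shown here).

```python
import numpy as np

def stochastic_positions(pos_embed, sigma=0.1, rng=None):
    """Sample noisy positional embeddings instead of deterministic ones,
    so the masked-image-modeling predictor must tolerate location
    uncertainty. Illustrative only; not the paper's parameterization."""
    rng = rng or np.random.default_rng()
    return pos_embed + sigma * rng.standard_normal(pos_embed.shape)

pos_embed = np.random.randn(196, 768)   # e.g. one embedding per image patch
noisy = stochastic_positions(pos_embed)
```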


Nearest Neighbour Score Estimators for Diffusion Generative Models

https://openreview.net/forum?id=hqNz4LDuhn

Compressor summary: The text introduces a new score function estimator for diffusion generative models that reduces variance and improves training speed and sample quality.
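
For a DDPM-style marginal over an empirical dataset, the exact score is a posterior-weighted average of training points; restricting that average to nearest neighbours is the key trick. A hedged numpy sketch (the paper's neighbour search and variance analysis are omitted; the brute-force k-NN here is mine):

```python
import numpy as np

def nn_score_estimate(x, data, abar, n_neighbours=64):
    """Estimate the diffusion score at a noisy point x.

    For the marginal x_t = sqrt(abar) x_0 + sqrt(1-abar) eps over an
    empirical dataset, the exact score is
        (sqrt(abar) * E[x_0 | x_t] - x_t) / (1 - abar),
    where the posterior over x_0 weights each datum by a Gaussian
    likelihood; here the sum is truncated to nearest neighbours."""
    mu = np.sqrt(abar) * data                      # noised mean of each datum
    d2 = ((mu - x) ** 2).sum(axis=1)
    nn = np.argsort(d2)[:n_neighbours]             # crude exact k-NN
    logw = -d2[nn] / (2 * (1 - abar))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    x0_hat = (w[:, None] * data[nn]).sum(axis=0)   # posterior mean of x_0
    return (np.sqrt(abar) * x0_hat - x) / (1 - abar)

data = np.random.randn(10_000, 2)
score = nn_score_estimate(np.array([0.5, -0.2]), data, abar=0.7)
```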


Behavior Generation with Latent Actions

https://openreview.net/forum?id=hoVwecMqV5

Compressor summary: VQ-BeT is a new model that improves on BeT for generating complex behaviors by tokenizing continuous actions with hierarchical vector quantization, achieving faster inference speed and better performance in various environments.


Theoretical Guarantees for Variational Inference with Fixed-Variance Mixture of Gaussians

https://openreview.net/forum?id=hnqlgwcRxb

Compressor summary: The paper investigates theoretical properties of variational inference with fixed-variance Gaussian mixtures for non-Gaussian targets, showing how it can be cast as optimizing the positions of Dirac measures via gradient descent and studying the errors incurred in the process.


ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

https://openreview.net/forum?id=hlvKd7Vdxm

Compressor summary: The paper proposes ExCP, a framework that compresses large language models' checkpoints significantly while maintaining high accuracy on various tasks.


Privacy Profiles for Private Selection

https://openreview.net/forum?id=hgHQvrvwH9

Compressor summary: The paper proposes an easy-to-use recipe to improve privacy profiles of ReportNoisyMax and PrivateTuning using base algorithms' privacy profiles, leading to better private learning experiments.


Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

https://openreview.net/forum?id=hg4wXlrQCV

Compressor summary: The paper presents Craftax-Classic, a faster version of Crafter, and Craftax, a more challenging benchmark for RL research that requires deep exploration and planning.


Position: Machine Learning-powered Assessments of the EU Digital Services Act Aid Quantify Policy Impacts on Online Harms

https://openreview.net/forum?id=hdpv6mall8

Compressor summary: The text discusses the potential harmful applications of machine learning and the need for evaluating the European Union's Digital Services Act to curb these harms.


Grokking Group Multiplication with Cosets

https://openreview.net/forum?id=hcQfTsVnBo

Compressor summary: The text describes how researchers reverse engineered a neural network that learned the arithmetic of permutation groups and challenges the interpretability of another paper on the same topic.


Peeking with PEAK: Sequential, Nonparametric Composite Hypothesis Tests for Means of Multiple Data Streams

https://openreview.net/forum?id=hcASxFvmZ5

Compressor summary: PEAK is a novel nonparametric sequential test that uses a betting scheme to control type-I error and power in multiple data streams, reducing sample complexity and computational overhead.


Online Adaptive Anomaly Thresholding with Confidence Sequences

https://openreview.net/forum?id=hbsKxUEreL

Compressor summary: The paper proposes an algorithm for anomaly detection that adapts to distribution shifts, has low false positive and negative rates, and uses offline data effectively.


Hard Tasks First: Multi-Task Reinforcement Learning Through Task Scheduling

https://openreview.net/forum?id=haUOhXo70o

Compressor summary: SMT is a novel multi-task RL algorithm that prioritizes harder tasks, uses a task difficulty metric for efficient resource allocation, and resets network parameters to mitigate simplicity bias.


Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion

https://openreview.net/forum?id=hZ0fWhgVch

Compressor summary: The paper proposes a training-free approach to improve text-to-image alignment by optimizing images directly with the supervision of vision-language models and incorporating score distillation sampling.


GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

https://openreview.net/forum?id=hYHsrKDiX7

Compressor summary: GaLore is a training strategy for large language models that reduces optimizer memory usage by up to 82.5% without sacrificing performance, enabling pre-training of 7B models on consumer GPUs with 24GB memory.
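
A hedged sketch of the core mechanism (the refresh period, scaling factor, and per-layer details follow the paper and are simplified here): project each gradient matrix into a low-rank subspace obtained from its SVD, keep Adam's moment buffers in that small space, and project the update back to full size.

```python
import numpy as np

def galore_step(W, grad, state, rank=4, lr=1e-3,
                beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step with gradient low-rank projection (hedged sketch):
    moments for an (m, n) weight live in an (r, n) projected space,
    and the projector is refreshed occasionally from the gradient's SVD."""
    if state.get("P") is None or state["step"] % 200 == 0:  # refresh projector
        U, _, _ = np.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                            # (m, r)
        state["m"] = np.zeros((rank, grad.shape[1]))
        state["v"] = np.zeros((rank, grad.shape[1]))
    P = state["P"]
    g = P.T @ grad                                          # project: (r, n)
    state["step"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g**2
    mhat = state["m"] / (1 - beta1 ** state["step"])
    vhat = state["v"] / (1 - beta2 ** state["step"])
    update = P @ (mhat / (np.sqrt(vhat) + eps))             # back to full size
    return W - lr * update

state = {"P": None, "step": 0}
W = np.random.randn(64, 32)
W = galore_step(W, np.random.randn(64, 32), state)
```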


Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback

https://openreview.net/forum?id=hXQOO6VsxH

Compressor summary: The paper proposes two efficient algorithms for RL with Aggregate Bandit Feedback, which allows feedback at episode end instead of individual rewards, and achieves near-optimal regret guarantees using linear function approximation and new randomization techniques.


Mind the Boundary: Coreset Selection via Reconstructing the Decision Boundary

https://openreview.net/forum?id=hWng0GXeE4

Compressor summary: The paper presents a novel geometry-based coreset construction method that efficiently selects training data to reconstruct the decision boundary of a deep neural network, achieving high data pruning rate with minimal accuracy loss and showing strong cross-architecture transferability.


From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions

https://openreview.net/forum?id=hTiNFCNxM1

Compressor summary: The paper proposes DCEM, an algorithm to learn from selective labels with disparate censorship, and shows that it reduces bias without sacrificing performance on synthetic and real clinical data.


Progressive Inference: Explaining Decoder-Only Sequence Classification Models Using Intermediate Predictions

https://openreview.net/forum?id=hRX1o7FBhT

Compressor summary: The paper introduces Progressive Inference, a method to explain decoder-only transformer models' predictions by evaluating intermediate classifications at different input positions, using either Single Pass or Multi Pass approaches.
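
A hedged sketch of the single-pass flavour of the idea: score every prefix of the input using the model's intermediate predictions and attribute to each token the change it causes; `score_prefix` below is a toy stand-in for the classifier head, not the paper's API.

```python
import numpy as np

def progressive_attributions(score_prefix, tokens):
    """Attribute a prediction to tokens via intermediate predictions:
    attribution_i = score(tokens[:i]) - score(tokens[:i-1]), using the
    fact that a decoder-only model yields a prediction at every prefix."""
    prev = score_prefix(tokens[:0])
    attributions = []
    for i in range(1, len(tokens) + 1):
        cur = score_prefix(tokens[:i])
        attributions.append(cur - prev)
        prev = cur
    return np.array(attributions)

# Toy stand-in "model": the real method reads the classification head's
# output at each position of a single forward pass.
rng = np.random.default_rng(0)
w = rng.standard_normal(100)
score = lambda toks: float(np.tanh(sum(w[t] for t in toks)))
attr = progressive_attributions(score, tokens=[3, 17, 42, 8])
```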


Chasing Convex Functions with Long-term Constraints

https://openreview.net/forum?id=hRBdOHVn7y

Compressor summary: The paper studies online metric problems with long-term constraints for resource allocation in sustainable energy/computing systems, and proposes optimal algorithms for bounded hitting cost gradients and weighted $\ell_1$ metrics.


Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments

https://openreview.net/forum?id=hQpUhySEJi

Compressor summary: The paper introduces Subequivariant Hierarchical Neural Networks (SHNN) for learning policies in multi-entity 3D environments, which use task assignment and subequivariance to reduce complexity and improve performance on a new benchmark called the Multi-entity Benchmark (MEBEN).


Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis

https://openreview.net/forum?id=hLuNVjRnY3

Compressor summary: The authors propose a new model merging algorithm called CCA Merge, which uses Canonical Correlation Analysis to maximize the correlations between linear combinations of the models' features and improves performance over past methods in both two-model and multi-model settings.


Exploration-Driven Policy Optimization in RLHF: Theoretical Insights on Efficient Data Utilization

https://openreview.net/forum?id=hLGxDYo0eF

Compressor summary: The paper analyzes the performance of a policy optimization reinforcement learning algorithm that learns from human feedback without knowing the reward function, providing bounds on query complexity and novel techniques for inferring reward parameters.


Hyperbolic Active Learning for Semantic Segmentation under Domain Shift

https://openreview.net/forum?id=hKdJPMQvew

Compressor summary: HALO is a hyperbolic neural network that uses epistemic uncertainty to select data points for pixel-level active learning, achieving state-of-the-art results in semantic segmentation under domain shift with minimal supervision.


Multiply Robust Estimation for Local Distribution Shifts with Multiple Domains

https://openreview.net/forum?id=hJaWoU3Emh

Compressor summary: The paper proposes a two-stage method to improve machine learning models' performance and generalization when data distributions vary across multiple segments of the population, using linear combinations and refinement steps for each segment.


Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows

https://openreview.net/forum?id=hG6gddAKnJ

Compressor summary: The paper explores and characterizes conservation laws in non-Euclidean geometries and momentum-based dynamics for neural network training.


Discovering Features with Synergistic Interactions in Multiple Views

https://openreview.net/forum?id=hFEgae0od4

Compressor summary: The paper proposes a deep learning method that selects synergistic feature subsets from multi-view data using interaction information to understand target outcomes better.


SiBBlInGS: Similarity-driven Building-Block Inference using Graphs across States

https://openreview.net/forum?id=h8aTi32tul

Compressor summary: SiBBlInGS is a graph-based method for discovering interpretable units in time series data across different states that captures complex variability and adapts to varying session lengths and sample sizes.


Infinite-Horizon Distributionally Robust Regret-Optimal Control

https://openreview.net/forum?id=h3SGdpI4Ta

Compressor summary: The paper studies how to control linear systems with uncertain disturbances within a Wasserstein-2 ambiguity set, and proposes efficient algorithms to compute near-optimal control policies.


Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction

https://openreview.net/forum?id=h2uBuQvpp8

Compressor summary: ASMR is a method that learns to selectively sample k-space measurements for faster and accurate disease detection in MR imaging.


Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning

https://openreview.net/forum?id=gzis9n5r7e

Compressor summary: The paper proposes a method called Transition Discriminator-based Imitation Learning (TDIL) that uses a transition discriminator to compute surrogate rewards from one expert trajectory, addressing reward sparsity and achieving expert-level performance in single-demonstration imitation learning.


Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

https://openreview.net/forum?id=gxOQEMRbRa

Compressor summary: Q-probing adapts pre-trained language models to new tasks using a linear function on embeddings that reweights candidate completions based on task-specific rewards or policy objectives, and can improve performance in data-limited regimes.
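
A hedged sketch of the inference-time mechanism (probe training, which uses task rewards, is omitted; names are mine): score each sampled completion's frozen-model embedding with a linear probe and sample one completion from a softmax over the scores.

```python
import numpy as np

def q_probe_select(embeddings, theta, beta=1.0, rng=None):
    """Pick one of k candidate completions using a linear value probe:
    score each candidate's (frozen) LM embedding with a learned linear
    head, then sample from a softmax over the scores; the base model
    itself is never finetuned."""
    rng = rng or np.random.default_rng()
    scores = embeddings @ theta                 # linear probe values
    z = beta * (scores - scores.max())
    p = np.exp(z) / np.exp(z).sum()             # softmax over candidates
    return rng.choice(len(p), p=p)

k, d = 8, 128
embeddings = np.random.randn(k, d)              # one embedding per candidate
theta = np.random.randn(d)                      # the learned probe
chosen = q_probe_select(embeddings, theta, beta=4.0)
```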


Equivariance via Minimal Frame Averaging for More Symmetries and Efficiency

https://openreview.net/forum?id=guFsTBXsov

Compressor summary: Minimal Frame Averaging (MFA) is a new framework for achieving exact equivariance in machine learning systems by constructing minimal frames that encode symmetries efficiently and effectively across various tasks, including physics simulations and complex-valued domains.


Generalized Preference Optimization: A Unified Approach to Offline Alignment

https://openreview.net/forum?id=gu3nacA9AH

Compressor summary: Generalized preference optimization is a unified framework for offline learning that encompasses existing methods and allows tuning of regularization through convex functions.
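
Concretely, the framework scores each preference pair by a scaled log-ratio margin and minimizes E[f(margin)] for a convex decreasing f; particular choices of f recover DPO, IPO, and SLiC. A minimal sketch under those definitions:

```python
import numpy as np

def gpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, f, beta=0.1):
    """Generalized preference optimization: minimize E[f(margin)] where
    margin = beta * [(logp_w - ref_logp_w) - (logp_l - ref_logp_l)]
    for chosen (w) and rejected (l) responses under the policy and a
    frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return f(margin).mean()

dpo_f = lambda x: np.log1p(np.exp(-x))       # -log sigmoid(x): recovers DPO
ipo_f = lambda x: (x - 1.0) ** 2             # squared loss: recovers IPO
slic_f = lambda x: np.maximum(0.0, 1.0 - x)  # hinge loss: recovers SLiC

logp_w, logp_l = np.random.randn(4), np.random.randn(4) - 1.0
loss = gpo_loss(logp_w, logp_l, np.zeros(4), np.zeros(4), dpo_f)
```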


LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning

https://openreview.net/forum?id=gtYdvSGMYV

Compressor summary: LAGMA improves cooperative multi-agent reinforcement learning by generating a goal-reaching trajectory in latent space and providing an incentive for agents to follow it.


Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching

https://openreview.net/forum?id=gqA8ZHO0j8

Compressor summary: USBS is a fast and scalable spectral bundle method for solving SDPs that can leverage warm-start initialization, achieving dramatic speedups compared to existing methods.


StackSight: Unveiling WebAssembly through Large Language Models and Neurosymbolic Chain-of-Thought Decompilation

https://openreview.net/forum?id=gn5AsHIIwb

Compressor summary: StackSight is a novel technique that combines LLMs with program analysis to decompile complex WebAssembly code into readable C++ snippets, improving understanding and decompilation.


Characterizing Large Language Model Geometry Helps Solve Toxicity Detection and Generation

https://openreview.net/forum?id=glfcwSsks8

Compressor summary: The authors investigate the geometry of Large Language Models (LLMs) to understand their inner mechanisms and develop novel solutions for tasks like toxicity detection.


VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context

https://openreview.net/forum?id=gjoUXwuZdy

Compressor summary: The authors introduce VisionGraph, a benchmark for testing large multimodal models' ability to solve graph theory problems using a Description-Program-Reasoning chain that improves their accuracy and performance.


SelfIE: Self-Interpretation of Large Language Model Embeddings

https://openreview.net/forum?id=gjgRKbdYR7

Compressor summary: SelfIE is a framework that lets large language models explain their own reasoning in natural language, enabling control over their responses for reliability, transparency, and future development.


Can AI Assistants Know What They Don't Know?

https://openreview.net/forum?id=girxGkdECL

Compressor summary: The paper investigates whether AI assistants can know what they don't know and express it in natural language, creating an "Idk" dataset and alignment methods that make the aligned assistant more truthful and decline less often.


PolySketchFormer: Fast Transformers via Sketching Polynomial Kernels

https://openreview.net/forum?id=ghYrfdJfjK

Compressor summary: PolySketchFormer is a fast and accurate Transformer-based language model that replaces softmax attention with polynomial attention and uses sketching techniques for linear-time computation.
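The linear-time trick behind polynomial attention can be sketched with an exact degree-2 feature map (the paper compresses this map with sketches, which is not reproduced here); sequence length, dimensions, and the non-causal setup below are illustrative assumptions.

```python
import numpy as np

def phi(X):
    # Exact degree-2 feature map: (n, d) -> (n, d*d), phi(x) = vec(x x^T),
    # so that phi(q) . phi(k) == (q . k)**2.
    return np.einsum('ni,nj->nij', X, X).reshape(X.shape[0], -1)

rng = np.random.default_rng(0)
n, d, dv = 16, 8, 8
Q, K, V = rng.normal(size=(3, n, d))

PQ, PK = phi(Q), phi(K)
S = PK.T @ V                       # (d^2, dv) sufficient statistics, linear in n
z = PK.sum(axis=0)                 # (d^2,)   normalizer statistics
out_linear = (PQ @ S) / (PQ @ z)[:, None]

# Same result as the quadratic-time n x n formulation.
A = (Q @ K.T) ** 2
out_quadratic = (A / A.sum(axis=1, keepdims=True)) @ V
assert np.allclose(out_linear, out_quadratic)
```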


Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision

https://openreview.net/forum?id=ghNRg2mEgN

Compressor summary: The study examines whether weak model supervision can elicit the full capabilities of stronger models, finding that naive finetuning on weak labels recovers part of that capability but that techniques like reinforcement learning from human feedback may scale poorly to superhuman models without further work.


First-Order Manifold Data Augmentation for Regression Learning

https://openreview.net/forum?id=geajNKab7g

Compressor summary: FOMA is a new domain-independent data augmentation method for regression problems that samples from the tangent planes of the train distribution and improves generalization and robustness.
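A hedged sketch of tangent-plane-style augmentation in this spirit: jointly decompose a batch of inputs and targets with an SVD and shrink the trailing singular values, perturbing samples along low-variance directions; the particular split and shrink rule here are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
B, dx, dy = 32, 8, 1
X = rng.normal(size=(B, dx))
Y = X @ rng.normal(size=(dx, dy)) + 0.05 * rng.normal(size=(B, dy))

Z = np.concatenate([X, Y], axis=1)          # treat (x, y) jointly
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

lam = rng.uniform(0.5, 1.0)                 # shrink factor for the tail
k = len(s) // 2                             # keep dominant structure intact
s_aug = s.copy()
s_aug[k:] *= lam                            # dampen low-variance directions
Z_aug = (U * s_aug) @ Vt

X_aug, Y_aug = Z_aug[:, :dx], Z_aug[:, dx:] # augmented training batch
print(X_aug.shape, Y_aug.shape)
```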


Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design

https://openreview.net/forum?id=gbD9MAc9p0

Compressor summary: The paper proposes quality-weighted Vendi scores to balance quality and diversity in experimental design, improving data collection and discovery in various applications.


Accelerating Federated Learning with Quick Distributed Mean Estimation

https://openreview.net/forum?id=gWEwIlZrbQ

Compressor summary: The paper proposes a novel DME method for federated learning that improves the NMSE guarantee and computational efficiency by using off-the-shelf solvers and quantization.


Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

https://openreview.net/forum?id=gVjMwLDFoQ

Compressor summary: iDEM is a fast and scalable algorithm that uses only the energy function and its gradient to generate independent samples from unnormalized probability distributions, achieving state-of-the-art performance on various tasks.


Long Range Propagation on Continuous-Time Dynamic Graphs

https://openreview.net/forum?id=gVg8V9isul

Compressor summary: CTAN is a new method for modeling spatio-temporal information in C-TDGs that outperforms existing methods on long-range tasks.


VNN: Verification-Friendly Neural Networks with Hard Robustness Guarantees

https://openreview.net/forum?id=gUFufRkzjV

Compressor summary: The text proposes a framework to generate Verification-Friendly Neural Networks that balance prediction performance and verification-friendliness, enabling greater robustness in safety-critical applications.


Subsampling is not Magic: Why Large Batch Sizes Work for Differentially Private Stochastic Optimisation

https://openreview.net/forum?id=gTBjkJvadC

Compressor summary: Large batch sizes reduce gradient variance in differentially private stochastic gradient descent (DP-SGD) by decreasing subsampling-induced variance and are beneficial especially in the asymptotic regime.
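The batch-size effect is easy to see in a minimal DP-SGD step: with clipping norm C, noise multiplier sigma, and batch size B, the noise added to the averaged gradient has standard deviation sigma * C / B. The toy linear-regression step below illustrates this; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
B, d = 256, 10
C, sigma = 1.0, 1.0                      # clipping norm, noise multiplier

X = rng.normal(size=(B, d))
y = X @ np.ones(d) + 0.1 * rng.normal(size=B)
w = np.zeros(d)

residual = X @ w - y
per_example_grads = residual[:, None] * X                # (B, d)
norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
clipped = per_example_grads / np.maximum(1.0, norms / C)  # per-example clipping

# Gaussian noise is added once to the clipped sum, then averaged over B.
noisy_mean = (clipped.sum(axis=0) + sigma * C * rng.normal(size=d)) / B
w -= 0.1 * noisy_mean
print("noise std on the averaged gradient:", sigma * C / B)
```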


Quasi-Monte Carlo Features for Kernel Approximation

https://openreview.net/forum?id=gSMUjrkRRk

Compressor summary: Quasi-Monte Carlo methods can improve kernel approximation error for certain kernels, leading to fewer random features needed and better performance in kernel ridge regression.
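A small sketch of the idea for the Gaussian (RBF) kernel: draw random Fourier feature frequencies from a low-discrepancy Halton sequence pushed through the Gaussian inverse CDF instead of sampling them i.i.d.; the sequence choice and scrambling below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, qmc

def qmc_rff(X, n_features, lengthscale=1.0, seed=0):
    d = X.shape[1]
    halton = qmc.Halton(d=d, scramble=True, seed=seed)
    U = halton.random(n_features)              # low-discrepancy points in (0,1)^d
    W = norm.ppf(U) / lengthscale              # quasi-random frequencies
    b = 2 * np.pi * qmc.Halton(d=1, scramble=True, seed=seed + 1) \
            .random(n_features).ravel()        # phase offsets
    return np.sqrt(2.0 / n_features) * np.cos(X @ W.T + b)  # K ~ Z Z^T

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = qmc_rff(X, n_features=2048)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
print("max abs error:", np.abs(K_approx - K_exact).max())
```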


Representing Molecules as Random Walks Over Interpretable Grammars

https://openreview.net/forum?id=gS3nc9iUrH

Compressor summary: The paper proposes a data-efficient and interpretable graph grammar model for representing and reasoning over complex molecular structures, enabling better design and property prediction.


Applying language models to algebraic topology: generating simplicial cycles using multi-labeling in Wu's formula

https://openreview.net/forum?id=gQz30hTkRE

Compressor summary: The paper proposes machine learning methods to understand the group structure of homotopy groups of spheres by generating simplicial cycles from algorithmic datasets related to Dyck languages.


Position: A Roadmap to Pluralistic Alignment

https://openreview.net/forum?id=gQpBnRHwxM

Compressor summary: The text discusses the importance of designing AI systems to serve diverse human values and suggests a roadmap for achieving pluralistic alignment using large language models and different types of benchmarks.


Discovering Environments with XRM

https://openreview.net/forum?id=gPStP3FSY9

Compressor summary: The paper introduces Cross-Risk Minimization (XRM), an algorithm that automatically discovers environments within datasets for robust out-of-distribution generalization without relying on human-annotated environment labels.


Stochastic Gradient Flow Dynamics of Test Risk and its Exact Solution for Weak Features

https://openreview.net/forum?id=gPBMkJG7bt

Compressor summary: The paper studies how test risk changes with learning rate in continuous time stochastic gradient flow dynamics and applies the theory to a weak features model, finding the impact of stochasticity on test risk.


New Bounds on the Cohesion of Complete-link and Other Linkage Methods for Agglomerative Clustering

https://openreview.net/forum?id=gL5djEYLx2

Compressor summary: This paper improves existing bounds on the maximum diameter and cohesion of complete-link and average-link hierarchical clustering algorithms for metric spaces, supporting the preference of complete-link over single-link for producing compact clusters.
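For reference, the two linkage rules being compared are one flag apart in SciPy; complete-link merges clusters by their maximum pairwise distance, which is why it favors the compact, small-diameter clusters whose cohesion the paper bounds. The toy data below is illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),   # two well-separated blobs
               rng.normal(3, 0.3, (20, 2))])
D = pdist(X)                                   # condensed distance matrix

labels_complete = fcluster(linkage(D, method="complete"), t=2, criterion="maxclust")
labels_single = fcluster(linkage(D, method="single"), t=2, criterion="maxclust")
print(labels_complete[:5], labels_single[:5])
```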


Causal-IQA: Towards the Generalization of Image Quality Assessment Based on Causal Inference

https://openreview.net/forum?id=gKPkipJ3gm

Compressor summary: This paper introduces Causal-IQA, an end-to-end blind Image Quality Assessment method that uses causality to improve estimation accuracy and generalization by mitigating confounding effects between distortion types, image contents, and human ratings.


Learning to Intervene on Concept Bottlenecks

https://openreview.net/forum?id=gEbl6XNLK6

Compressor summary: CB2Ms are memory-enhanced concept bottleneck models that learn from past interventions and can generalize them to new situations, improving interpretability and performance.


Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching

https://openreview.net/forum?id=gE7qZurGH3

Compressor summary: The paper proposes a new method for lossless graph condensation by using curriculum learning, expanding window matching, and a customized loss function to train expert trajectories that preserve the original graph's structure and performance.


Small-loss Adaptive Regret for Online Convex Optimization

https://openreview.net/forum?id=gDQuupz8mm

Compressor summary: This paper proposes new algorithms that achieve small-loss adaptive regret bounds for various types of convex functions and handle changing environments.


Can Gaussian Sketching Converge Faster on a Preconditioned Landscape?

https://openreview.net/forum?id=gB3E8IwQZy

Compressor summary: The paper presents GSGD, a novel gradient sketching method for large-scale optimization that matches the convergence rate of importance-sampling-based methods without requiring importance sampling, can exploit non-smooth regularization terms for faster convergence, and is shown to be effective and efficient in experiments.


SceneCraft: An LLM Agent for Synthesizing 3D Scenes as Blender Code

https://openreview.net/forum?id=gAyzjHw2ml

Compressor summary: SceneCraft is an LLM agent that creates Blender scripts from text descriptions to render complex scenes, using spatial planning, image analysis, and library learning.


Policy-conditioned Environment Models are More Generalizable

https://openreview.net/forum?id=g9mYBdooPA

Compressor summary: Policy-conditioned model (PCM) learning improves reinforcement learning by adapting the dynamics model to different evaluation policies for better prediction accuracy.


Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

https://openreview.net/forum?id=g8AigOTNXL

Compressor summary: The paper proposes SCG, a new method for symbolic music generation that works with non-differentiable rules and improves quality and controllability over existing methods.


Learning to Predict Mutational Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning

https://openreview.net/forum?id=g89jAdrnAF

Compressor summary: The paper introduces a hierarchical prompt learning framework that predicts the effects of amino acid mutations on protein-protein binding by modeling the joint distribution of each mutation with various microenvironmental features; it outperforms existing pre-training-based methods and can be applied to optimize antibodies against SARS-CoV-2.


Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments

https://openreview.net/forum?id=g43yUNWX4V

Compressor summary: The paper proposes two algorithms for Federated Reinforcement Learning (FRL) that can handle large levels of environment heterogeneity and achieve state-of-the-art convergence results.


Latent variable model for high-dimensional point process with structured missingness

https://openreview.net/forum?id=g1Gf0hoPSz

Compressor summary: The paper proposes a latent-variable model that uses Gaussian processes to handle high-dimensional data with missing values and unknown measurement times in various fields.


MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence

https://openreview.net/forum?id=fz9PaJNViP

Compressor summary: MOKD is a bi-level optimization framework that improves few-shot classification by learning class-specific representations and maximizing kernel dependence between them and labels, while minimizing dependence among all samples.


Feature Importance Disparities for Data Bias Investigations

https://openreview.net/forum?id=fywWm06IGn

Compressor summary: The paper proposes a method to identify features that have different importance in subgroups of a dataset, which can help detect and rectify biases in classifiers trained on such data.


Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks

https://openreview.net/forum?id=fwxnHViGNj

Compressor summary: The paper compares multi-task benchmarks in machine learning to electoral systems, shows a trade-off between diversity and sensitivity to irrelevant changes, and introduces new quantitative measures and approximation algorithms for these aspects.


SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

https://openreview.net/forum?id=fuX4hyLPmO

Compressor summary: The paper introduces SLEB, a new pruning method that removes redundant transformer blocks from large language models to speed up inference while maintaining accuracy and perplexity.


On Universally Optimal Algorithms for A/B Testing

https://openreview.net/forum?id=ft5jK9uPgC

Compressor summary: No algorithm beats uniform sampling in A/B testing with fixed budget and Bernoulli rewards.
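The baseline the result defends is simple to state in code: split the fixed budget evenly between the two Bernoulli arms and recommend the higher empirical mean; the rates and budget below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.50, 0.55])      # true conversion rates (arm 1 is better)
T = 2000                        # total fixed budget

n = T // 2                      # uniform allocation: n pulls per arm
means = np.array([rng.binomial(1, p[0], n).mean(),
                  rng.binomial(1, p[1], n).mean()])
print("empirical means:", means, "-> recommend arm", int(means.argmax()))
```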


On Mechanistic Knowledge Localization in Text-to-Image Generative Models

https://openreview.net/forum?id=fsVBsxjRER

Compressor summary: The authors propose LocoGen and LocoEdit, methods that use causal tracing to locate and edit specific visual attributes in text-to-image models, enabling efficient model editing.


Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo

https://openreview.net/forum?id=frA0NNBS1n

Compressor summary: The paper proposes using Sequential Monte Carlo with learned twist functions to improve probabilistic inference for various language model techniques such as safety, capability, and harmlessness training.


Differentially Private Bias-Term Fine-tuning of Foundation Models

https://openreview.net/forum?id=fqeANcjBMT

Compressor summary: DP-BiTFiT is a new method for privacy-preserving fine-tuning of large pre-trained models that achieves high accuracy, efficiency, and parameter efficiency.


Encodings for Prediction-based Neural Architecture Search

https://openreview.net/forum?id=fqPH6ejwGi

Compressor summary: This paper explores different types of neural architecture encodings, introduces unified encodings, and proposes FLAN, a predictor that significantly reduces the cost of training NAS accuracy predictors.


Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch

https://openreview.net/forum?id=fq0NaiU8Ex

Compressor summary: DARE is a method that sparsifies and merges language model parameters from different tasks without retraining, enabling larger models with diverse capabilities and improved performance.


Using AI Uncertainty Quantification to Improve Human Decision-Making

https://openreview.net/forum?id=fowZNENcVJ

Compressor summary: The text discusses how providing uncertainty information from AI can enhance human decision-making and reports positive results from two experiments testing this idea.


Pursuing Overall Welfare in Federated Learning through Sequential Decision Making

https://openreview.net/forum?id=foPMkomvk1

Compressor summary: The text proposes a method to achieve client-level fairness in federated learning by using an adaptive aggregation scheme that optimizes online convex optimization and improves decision making for cross-device and cross-silo settings.


EDISON: Enhanced Dictionary-Induced Tensorized Incomplete Multi-View Clustering with Gaussian Error Rank Minimization

https://openreview.net/forum?id=fiugPLSXjK

Compressor summary: The paper proposes a robust and efficient incomplete multi-view clustering method called EDISON, which uses an enhanced dictionary representation strategy and Gaussian error rank approximation for tensor data.


SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

https://openreview.net/forum?id=fgBWtOw66T

Compressor summary: SFC is a new transform for fast and accurate quantized convolution using symbolic computing and the Discrete Fourier Transform, achieving better efficiency than Winograd and FFT algorithms.


Lessons from Generalization Error Analysis of Federated Learning: You May Communicate Less Often!

https://openreview.net/forum?id=ffS0aYP6mk

Compressor summary: The paper studies how frequently communicating in Federated Learning affects generalization error and provides bounds and experiments for different learning models.


MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models

https://openreview.net/forum?id=ffLblkoCw8

Compressor summary: MAGDi is a method to improve reasoning in smaller models by distilling knowledge from multiple LLMs using graph representations and three objective functions, achieving efficiency and generalization improvements.


Prompting is a Double-Edged Sword: Improving Worst-Group Robustness of Foundation Models

https://openreview.net/forum?id=fdroxYsgzQ

Compressor summary: The paper proposes Prompting for Robustness (PfR), a method to improve machine learning models' robustness against spurious correlations by using foundation models to predict the spurious attribute and learn a classifier that performs well across different labels.


Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling

https://openreview.net/forum?id=fVg9YrSllr

Compressor summary: The text introduces a benchmark to evaluate sampling methods from intractable distributions using a standardized task suite and various performance criteria, including new metrics for mode collapse.


Boundary Exploration for Bayesian Optimization With Unknown Physical Constraints

https://openreview.net/forum?id=fSnMqHZ8xr

Compressor summary: BE-CBO is a new Bayesian optimization method that learns constraints with neural networks to efficiently explore the boundary between feasible and infeasible designs in black-box optimization problems.


Graph Neural Networks Use Graphs When They Shouldn't

https://openreview.net/forum?id=fSNHK7mu3j

Compressor summary: The paper shows that Graph Neural Networks (GNNs) often use the input graph structure even when it's not needed, leading to suboptimal solutions, and suggests using regular graphs to improve performance and avoid this bias.


Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning

https://openreview.net/forum?id=fRG45xL1WT

Compressor summary: FeatLLM is a novel in-context learning framework that uses LLMs to generate optimal features for tabular predictions, achieving high performance few-shot learning.


Riemannian Accelerated Zeroth-order Algorithm: Improved Robustness and Lower Query Complexity

https://openreview.net/forum?id=fPwWfoyxL1

Compressor summary: The paper presents a Riemannian accelerated zeroth-order algorithm for optimization on Riemannian manifolds that is more efficient and robust than previous methods, achieving state-of-the-art function query complexity and almost sure asymptotic convergence while tolerating smoothing parameters larger than existing results by a factor of $\tilde{\mathcal{O}}(\epsilon^{7/8}d^{-1/2})$.


Learning Low-dimensional Latent Dynamics from High-dimensional Observations: Non-asymptotics and Lower Bounds

https://openreview.net/forum?id=fOBas5H4Xc

Compressor summary: The paper presents an algorithm for learning low-dimensional latent models of LTI systems from high-dimensional observations, with sample complexity that is optimal up to logarithmic factors and dimension-independent constants, and extends it to a meta-learning setting where the observer column space is learned collectively from multiple LTI systems.


Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition

https://openreview.net/forum?id=fO31YAyNbI

Compressor summary: The paper proposes MotionEpic, a novel video Multimodal Large Language Model, and VoT, a Video-of-Thought reasoning framework that use spatial-temporal scene graphs and Chain-of-Thought techniques to improve video understanding and reasoning.


Scale-Free Image Keypoints Using Differentiable Persistent Homology

https://openreview.net/forum?id=fNJbcxhxRj

Compressor summary: The paper presents MorseDet, a novel computer vision method that uses algebraic topology tools to learn keypoints with scale-invariant and flexible performance.


Reshape and Adapt for Output Quantization (RAOQ): Quantization-aware Training for In-memory Computing Systems

https://openreview.net/forum?id=fM9xTkpAdu

Compressor summary: RAOQ is a method to improve in-memory computing by mitigating ADC quantization error and adapting AI models for better performance in computer vision and NLP tasks.


Amend to Alignment: Decoupled Prompt Tuning for Mitigating Spurious Correlation in Vision-Language Models

https://openreview.net/forum?id=f8G2KSCSdp

Compressor summary: The paper introduces CoOPood, a fine-grained prompt tuning method for VLMs that aligns text with invariant features and avoids spurious ones, improving OOD generalization.


How Deep Do We Need: Accelerating Training and Inference of Neural ODEs via Control Perspective

https://openreview.net/forum?id=f6QenZyyeP

Compressor summary: The paper proposes two methods to optimize Neural ODEs, improving their training and inference speed by drawing inspiration from control theory.


Self-Composing Policies for Scalable Continual Reinforcement Learning

https://openreview.net/forum?id=f5gtX2VWSB

Compressor summary: A growable and modular neural network architecture can learn continually from previous tasks without forgetting or interference, and scales well with the number of tasks.


Data Poisoning Attacks against Conformal Prediction

https://openreview.net/forum?id=f49AkFT5jf

Compressor summary: The paper proposes new black-box data poisoning attacks against conformal prediction methods, which can manipulate the uncertainty of specific examples more effectively than traditional attacks.


Erasing the Bias: Fine-Tuning Foundation Models for Semi-Supervised Learning

https://openreview.net/forum?id=f47ZK6gy3I

Compressor summary: The paper introduces FineSSL, a new semi-supervised learning approach that adapts pre-trained foundation models, improving their performance, reducing training cost, and integrating with other algorithms.


HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

https://openreview.net/forum?id=f3TUipYU3U

Compressor summary: HarmBench is a standardized framework for evaluating automated red teaming methods against large language models, revealing new insights and improving LLM robustness.


Dense Reward for Free in Reinforcement Learning from Human Feedback

https://openreview.net/forum?id=eyxVRMrZ4m

Compressor summary: The authors propose a method to improve reinforcement learning for language models by using attention weights from the reward model to redistribute the reward, making it easier to optimize and potentially leading to better results.


Image Fusion via Vision-Language Model

https://openreview.net/forum?id=eqY64Z1rsT

Compressor summary: FILM is a novel image fusion method that uses textual descriptions generated by ChatGPT to guide the fusion process, enhancing feature extraction and contextual understanding.


Asymptotically Optimal and Computationally Efficient Average Treatment Effect Estimation in A/B testing

https://openreview.net/forum?id=eqIGoEoI10

Compressor summary: The paper proposes adaptive policies for A/B testing that estimate the average treatment effect with a desired confidence interval width and probability, using an optimal sample size lower bound derived from a non-convex optimization problem.


Agnostic Learning of Mixed Linear Regressions with EM and AM Algorithms

https://openreview.net/forum?id=eo88noTbb5

Compressor summary: The paper analyzes how agnostic learning of mixed linear regression can be achieved by EM and AM algorithms without assuming generative models.


AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers

https://openreview.net/forum?id=emtXYlBrNF

Compressor summary: The paper proposes a method to attribute and understand large language models by extending Layer-wise Relevance Propagation to handle attention layers, improving faithfulness and efficiency over existing methods.


NDOT: Neuronal Dynamics-based Online Training for Spiking Neural Networks

https://openreview.net/forum?id=elF0QoBSFV

Compressor summary: NDOT is a new online training method for SNNs that uses neuronal dynamics to compute gradients efficiently and accurately on large-scale datasets.


Contrastive Learning for Clinical Outcome Prediction with Partial Data Sources

https://openreview.net/forum?id=elCOPIm4Xw

Compressor summary: CLOPPS is a machine learning model that predicts clinical outcomes using information from different data sources and performs better than existing models in real-world scenarios.


Conformal Predictions under Markovian Data

https://openreview.net/forum?id=efzkSbpyRw

Compressor summary: The paper studies how split Conformal Prediction performs on Markovian data, showing its coverage gap depends on the mixing time of the chain, and proposes a method called $K$-split CP that adapts to the data's properties.
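For context, here is the textbook split conformal recipe whose coverage the paper re-examines under Markovian data (the exchangeable, i.i.d. version; the paper's $K$-split variant and Markov-specific corrections are not reproduced): calibrate a residual quantile on held-out data and use it to build intervals. The toy model and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cal, alpha = 500, 0.1

# Pretend a model was fit elsewhere; here the "model" is the identity map.
x_cal = rng.normal(size=n_cal)
y_cal = x_cal + 0.3 * rng.normal(size=n_cal)
scores = np.abs(y_cal - x_cal)               # nonconformity scores

k = int(np.ceil((n_cal + 1) * (1 - alpha)))  # finite-sample quantile index
qhat = np.sort(scores)[k - 1]

x_test = 0.7
print(f"90% interval for y | x={x_test}: "
      f"[{x_test - qhat:.3f}, {x_test + qhat:.3f}]")
```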


Interaction-based Retrieval-augmented Diffusion Models for Protein-specific 3D Molecule Generation

https://openreview.net/forum?id=eejhD9FCP3

Compressor summary: The IRDiff model uses a network of protein-molecule interactions to generate ligands that bind well to specific proteins, using references with desired properties as guidance.


One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

https://openreview.net/forum?id=edHLN40DWu

Compressor summary: Mixture-of-Prompts (MoP) is a method that automates instruction design for large language models by dividing the problem space into sub-regions, each governed by a specialized expert with an instruction and demos, achieving high win rates on benchmarks.


Deletion-Anticipative Data Selection with a Limited Budget

https://openreview.net/forum?id=ecvuJWE1YY

Compressor summary: The text applies supervised data subset selection and active learning to reduce data acquisition costs for machine learning models, and proposes deletion-anticipative data selection methods that optimize expected utility while accounting for future data deletions under regulations such as the GDPR.


Speech Self-Supervised Learning Using Diffusion Model Synthetic Data

https://openreview.net/forum?id=ecnpYYHjt9

Compressor summary: DiffS4L uses diffusion models to generate synthetic speech data with different variations, improving self-supervised learning for low-resource languages and under privacy concerns.


MF-CLR: Multi-Frequency Contrastive Learning Representation for Time Series

https://openreview.net/forum?id=ecO7WOIlMD

Compressor summary: MF-CLR is a self-supervised contrastive learning method for representing multi-frequency time series data, achieving excellent results on various financial downstream tasks.


Harnessing Hierarchical Label Distribution Variations in Test Agnostic Long-tail Recognition

https://openreview.net/forum?id=ebt5BfRHcW

Compressor summary: The paper presents a new mixture-of-expert approach, DirMixE, that captures both global and local variations in test label distributions for long-tail recognition tasks.


Leveraging Attractor Dynamics in Spatial Navigation for Better Language Parsing

https://openreview.net/forum?id=eapFRURALQ

Compressor summary: The PHE-trinity model explores how the hippocampus contributes to language comprehension by using a modular continuous attractor network to represent syntactic structure and two separate input streams, and demonstrates its effectiveness in learning from limited data.


Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking

https://openreview.net/forum?id=eaNLvrP8n1

Compressor summary: AVTrack is a framework that adapts the computation of transformer-based visual trackers for real-time UAV tracking by dynamically optimizing ViT architecture and learning view-invariant representations.


Position: Leverage Foundational Models for Black-Box Optimization

https://openreview.net/forum?id=ea2MgKn3sV

Compressor summary: The paper explores how large language models can revolutionize black-box optimization by using their comprehension, flexibility, and performance prediction abilities.


Optimal Hessian/Jacobian-Free Nonconvex-PL Bilevel Optimization

https://openreview.net/forum?id=eZiQWM5U0E

Compressor summary: The paper proposes a new method (HJFBiO) for solving nonconvex-PL bilevel optimization problems without Hessian/Jacobian matrices, with optimal convergence and gradient complexity.


Learning-Rate-Free Stochastic Optimization over Riemannian Manifolds

https://openreview.net/forum?id=eY98MVffrD

Compressor summary: The paper proposes learning-rate-free algorithms for stochastic optimization over Riemannian manifolds that eliminate hand-tuning and provide optimal convergence guarantees.


Towards Theoretical Understanding of Learning Large-scale Dependent Data via Random Features

https://openreview.net/forum?id=eY4jrFe6Qc

Compressor summary: The paper analyzes how kernel ridge regression with random feature mapping performs on large-scale nonparametric regression with different types of data dependence structures, showing optimality under exponential decay but sub-optimality under polynomial decay.


Novel Spectral Algorithms for the Partial Credit Model

https://openreview.net/forum?id=eW0pZmziBH

Compressor summary: The paper introduces a fast and accurate statistical algorithm for inference under the Partial Credit Model used in psychometrics and other fields, with applications to education, recommendation systems, and finance.


StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

https://openreview.net/forum?id=eVlx8DaG9h

Compressor summary: StrokeNUWA is a method that uses vector graphics and semantic "stroke" tokens to enable more natural and efficient visual synthesis with large language models.


Graph-enhanced Large Language Models in Asynchronous Plan Reasoning

https://openreview.net/forum?id=eVGpdivOnQ

Compressor summary: The study investigates if large language models can plan asynchronously, finding that they struggle without illustrations, and proposes a new technique called Plan Like a Graph to improve performance.


Calibration Bottleneck: Over-compressed Representations are Less Calibratable

https://openreview.net/forum?id=eRThYD9BGD

Compressor summary: The paper proposes a new training method (PLP) that improves uncertainty calibration in deep neural networks by avoiding over-compression of top layers and using weak classifier heads.


Fast Decision Boundary based Out-of-Distribution Detector

https://openreview.net/forum?id=eQaOb4r6YC

Compressor summary: The paper proposes a fast and effective out-of-distribution detector that uses feature distances to decision boundaries without auxiliary models, achieving good performance and low latency.
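The geometric quantity involved can be sketched for a linear classifier head: the feature's distance to the decision boundary between the predicted class c and another class j is |(w_c - w_j)·x + (b_c - b_j)| / ||w_c - w_j||. The scoring rule below (take the minimum such distance) is an illustrative assumption, not necessarily the paper's exact detector.

```python
import numpy as np

rng = np.random.default_rng(0)
C, d = 5, 16
W, b = rng.normal(size=(C, d)), rng.normal(size=C)  # linear head (assumed given)
x = rng.normal(size=d)                              # penultimate-layer feature

logits = W @ x + b
c = logits.argmax()
dists = [abs(logits[c] - logits[j]) / np.linalg.norm(W[c] - W[j])
         for j in range(C) if j != c]               # distance to each boundary
print("OOD score (smaller = more suspicious):", round(min(dists), 4))
```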


Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings

https://openreview.net/forum?id=ePDnv4xESI

Compressor summary: MixPro is a data-efficient method for adapting pretrained models to new distributions using mixed embeddings and linear classifiers.


Ensemble Pruning for Out-of-distribution Generalization

https://openreview.net/forum?id=eP3vsbB5wW

Compressor summary: The paper proposes a method to prune deep neural network ensembles under distribution shifts using a topology graph, improving predictive diversity and generalization performance on out-of-distribution data.


Characteristic Guidance: Non-linear Correction for Diffusion Model at Large Guidance Scale

https://openreview.net/forum?id=eOtjMYdGLt

Compressor summary: Characteristic guidance is a non-linear correction method for DDPMs that improves semantic features and image quality by respecting the Fokker-Planck equation without additional training.


Advancing DRL Agents in Commercial Fighting Games: Training, Integration, and Agent-Human Alignment

https://openreview.net/forum?id=eN1T7I7OpZ

Compressor summary: The paper presents Shūkai, a DRL agent for fighting games that improves generalizability and sample efficiency with Heterogeneous League Training and specific rewards, and shows its effectiveness in Naruto Mobile.


Towards efficient deep spiking neural networks construction with spiking activity based pruning

https://openreview.net/forum?id=eMQyb1tvvc

Compressor summary: The paper proposes a method to compress spiking neural networks by dynamically pruning and regenerating convolutional kernels based on their activity levels, achieving low-power and high-efficiency performance while maintaining model accuracy.


RoboMP$^2$: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

https://openreview.net/forum?id=eJFQROkaj0

Compressor summary: The paper introduces a new framework for robotic manipulation that combines multimodal perception and planning using tailored language models and retrieval-based policy learning.


Simplicity Bias of Two-Layer Networks beyond Linearly Separable Data

https://openreview.net/forum?id=eGZH3HCuGm

Compressor summary: This paper investigates how simplicity bias affects general neural networks, especially two-layer ones, and suggests that features learned in the middle stages of training may improve out-of-distribution generalization.


OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning

https://openreview.net/forum?id=eG42XBhV9a

Compressor summary: The paper introduces OLLIE, a method that improves offline-to-online Imitation Learning by learning a better policy initialization and an aligned discriminator initialization, achieving better performance and efficiency in various domains.


Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

https://openreview.net/forum?id=eFvoL7BOny

Compressor summary: The paper proposes a novel quantum reinforcement learning algorithm that achieves provably efficient exploration-exploitation trade-off and breaks the $\Omega(\sqrt{T})$-regret barrier in classical RL.


Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation

https://openreview.net/forum?id=eFSppFiVYG

Compressor summary: The paper proposes new generalization bounds for heavy-tailed stochastic optimization algorithms using fractional Fokker-Planck equation and shows that heavy tails can be beneficial or harmful depending on the problem structure.


Harnessing Neural Unit Dynamics for Effective and Scalable Class-Incremental Learning

https://openreview.net/forum?id=eDtty9ZCvt

Compressor summary: The paper proposes AutoActivator, a connectionist model with adaptive neural unit dynamics for class-incremental learning, which can expand its capacity when needed and reactivate required units at inference time without forgetting old classes.


Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

https://openreview.net/forum?id=eDjvSFOkXw

Compressor summary: Lookahead decoding is a parallel algorithm that accelerates large language model decoding without needing auxiliary models or data stores, achieving up to 4x speedup on multiple GPUs.


Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation

https://openreview.net/forum?id=eCCaHZKdl4

Compressor summary: The Uncertainty-aware Reward Model (URM) improves instruction following in language models by estimating response quality and uncertainty using Bayesian approximation, leading to better performance on benchmarks.


Saliency strikes back: How filtering out high frequencies improves white-box explanations

https://openreview.net/forum?id=eC1OOpOGZW

Compressor summary: FORGrad is a new method that improves the performance of white-box attribution methods by filtering out high-frequency artifacts in gradient signals, making them more accurate and computationally efficient for model explanations.


Simple linear attention language models balance the recall-throughput tradeoff

https://openreview.net/forum?id=e93ffDcpH3

Compressor summary: The text explores a new language model architecture called BASED that balances memory efficiency and recall ability by combining linear and sliding window attention, achieving competitive results on perplexity and real-world tasks.


Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?

https://openreview.net/forum?id=e76GrGhIgf

Compressor summary: The paper explores the dense-to-sparse gating MoE, proposes a novel activation gate to improve convergence rates, and validates the results with simulations.


Adaptively Learning to Select-Rank in Online Platforms

https://openreview.net/forum?id=e5tA3Apbmy

Compressor summary: The paper proposes a user response model to adaptively rank items for heterogeneous users, using contextual bandits and an upper confidence bound, achieving low regret and improving user satisfaction.


Position: A Call for Embodied AI

https://openreview.net/forum?id=e5admkWKgV

Compressor summary: Embodied AI (E-AI) is proposed as a key step toward Artificial General Intelligence (AGI), focusing on embodiment, cognitive architectures, and active inference to enhance AI's ability to communicate, collaborate, and coexist with humans.


Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning

https://openreview.net/forum?id=e3geukCBw6

Compressor summary: Momentor is a Video-LLM that can understand and locate specific video segments using a large-scale dataset called Moment-10M.


Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

https://openreview.net/forum?id=e3Dpq3WdMv

Compressor summary: This study evaluates the trade-off between compression efficiency and trustworthiness in large language models using various techniques and dimensions, finding that quantization within a moderate bit range is more effective than pruning for achieving both goals.


Double Variance Reduction: A Smoothing Trick for Composite Optimization Problems without First-Order Gradient

https://openreview.net/forum?id=e1jPdRJeo7

Compressor summary: The paper introduces a new method called Zeroth-order Proximal Double Variance Reduction (ZPDVR) that reduces sampling and coordinate-wise variances in zeroth-order optimization, with less computational cost and better performance than existing methods.


Positive Concave Deep Equilibrium Models

https://openreview.net/forum?id=e0SKaKEEdr

Compressor summary: pcDEQ models improve DEQ models by ensuring positive weights and concave activation functions, providing theoretical guarantees for their fixed point and convergence, and performing well in language modeling and computer vision tasks.


Discovering Bias in Latent Space: An Unsupervised Debiasing Approach

https://openreview.net/forum?id=dztd61efGy

Compressor summary: SteerFair is a method to reduce biases in foundation models' QA capabilities by steering their internal representations away from spurious associations between input characteristics and correctness likelihood.


Imitation Learning from Purified Demonstrations

https://openreview.net/forum?id=dyfsPNuYCk

Compressor summary: The paper proposes a two-step purification method using diffusion models to remove noises in imperfect expert demonstrations for imitation learning, improving performance in real-world scenarios.


Robust Inverse Graphics via Probabilistic Inference

https://openreview.net/forum?id=dwWef5w2cR

Compressor summary: The paper proposes a Bayesian method called robust inverse graphics to infer 3D scenes from single images with unknown corruptions, using a strong scene prior and an uninformative uniform corruption prior, and shows that it outperforms other methods.


Slot Abstractors: Toward Scalable Abstract Visual Reasoning

https://openreview.net/forum?id=duyl8sy8qV

Compressor summary: Slot Abstractors is a new approach that combines slot-based methods and relational abstraction to enable scalable abstract visual reasoning with many objects and relations.


Repeat After Me: Transformers are Better than State Space Models at Copying

https://openreview.net/forum?id=duRRoGeoQT

Compressor summary: The paper compares transformers with generalized state space models (GSSMs), whose fixed-size latent state does not depend on sequence length, showing theoretically and on pretrained language models that transformers are more efficient and generalize better on tasks that require copying or retrieving information from the input context.


Feedback Efficient Online Fine-Tuning of Diffusion Models

https://openreview.net/forum?id=dtVlc9ybTm

Compressor summary: The paper proposes a new RL method to find high-reward samples in complex distributions by exploring the feasible manifold efficiently, with theory and experiments in images, biological sequences, and molecules.


Position: Automatic Environment Shaping is the Next Frontier in RL

https://openreview.net/forum?id=dslUyy1rN4

Compressor summary: The text argues that to advance robotics with sim-to-real reinforcement learning, researchers should focus on automating environment shaping procedures rather than tuning RL algorithms.


Community-Invariant Graph Contrastive Learning

https://openreview.net/forum?id=dskLpg8WFb

Compressor summary: The paper proposes a graph contrastive learning method that preserves the graph community structure during augmentation, improving robustness and generalization.


Does Label Smoothing Help Deep Partial Label Learning?

https://openreview.net/forum?id=drjjxmi2Ha

Compressor summary: The paper proposes label smoothing to improve deep partial label learning classifiers and provides theoretical and empirical evidence for its effectiveness.
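One natural instantiation, shown as a hedged sketch (the paper's exact scheme may differ): spread mass 1 - eps uniformly over the candidate label set and eps over the remaining labels, then train with cross-entropy against the soft target.

```python
import numpy as np

def smoothed_partial_target(candidates, num_classes, eps=0.1):
    # Non-candidate labels share eps; candidate labels share 1 - eps.
    t = np.full(num_classes, eps / (num_classes - len(candidates)))
    t[list(candidates)] = (1.0 - eps) / len(candidates)
    return t

t = smoothed_partial_target({2, 5}, num_classes=10, eps=0.1)
print(t.round(3), t.sum())   # sums to 1; candidates hold most of the mass
```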


Offline Transition Modeling via Contrastive Energy Learning

https://openreview.net/forum?id=dqpg8jdA2w

Compressor summary: Energy-based transition models capture complex real-world transitions and improve offline reinforcement learning performance.


An Improved Finite-time Analysis of Temporal Difference Learning with Deep Neural Networks

https://openreview.net/forum?id=dqdctVbSfs

Compressor summary: The paper develops a new analysis of neural TD learning algorithms that improves the sample complexity and achieves an $\tilde{\mathcal{O}}(\epsilon^{-1})$ error bound under Markovian sampling.


Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring

https://openreview.net/forum?id=dplgaRn4Ae

Compressor summary: The paper proposes a new exploration by optimization approach with a hybrid regularizer to improve regret bounds in online decision-making problems under limited feedback, achieving nearly optimal performance in both stochastic and adversarial environments.


ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints

https://openreview.net/forum?id=dmfvHU1LNF

Compressor summary: The paper presents a new policy optimization algorithm (ACPO) for average-CMDPs with theoretical guarantees and experimental results showing its effectiveness in various challenging environments.


Causality Based Front-door Defense Against Backdoor Attack on Language Models

https://openreview.net/forum?id=dmHHVcHFdM

Compressor summary: FABE is a new framework based on causal inference that defends language models against backdoor attacks by creating a 'front door' that maps out the actual causal relationships and filters out spurious associations, significantly improving the defense effect and achieving state-of-the-art results against various attack methods.


Balanced Resonate-and-Fire Neurons

https://openreview.net/forum?id=dkdilv4XD4

Compressor summary: The balanced resonate-and-fire neuron (BRF) is an improved spiking neural network model that achieves higher performance, lower spike count, fewer parameters, faster convergence, and better stability than previous models.


Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

https://openreview.net/forum?id=disVlUOH4b

Compressor summary: HOP is a novel multi-agent algorithm that efficiently adapts to co-players in mixed-motive environments by hierarchically modeling opponents' goals and using Monte Carlo Tree Search for planning.


ELTA: An Enhancer against Long-Tail for Aesthetics-oriented Models

https://openreview.net/forum?id=dhrNfAJAH6

Compressor summary: ELTA is a technique that enhances aesthetic image assessment by improving minority feature representation, aligning features and labels, and refining output distribution, especially for long-tailed datasets.


LQER: Low-Rank Quantization Error Reconstruction for LLMs

https://openreview.net/forum?id=dh8k41g775

Compressor summary: LQER is a method to reduce quantization errors in large language models, enabling near-lossless compression and improved downstream task performance with less hardware resources.


Consistent Long-Term Forecasting of Ergodic Dynamical Systems

https://openreview.net/forum?id=dfR6FU53qk

Compressor summary: The text describes a method to improve long-term forecasting of dynamical systems using techniques from operator theory and statistics, with uniform error bounds on infinite time horizons.


Deep Equilibrium Models are Almost Equivalent to Not-so-deep Explicit Models for High-dimensional Gaussian Mixtures

https://openreview.net/forum?id=ddjRdm3wUW

Compressor summary: The paper analyzes implicit neural networks' spectral behavior using random matrix theory and proposes a method to design shallow explicit networks that match their kernel matrices.


On Which Nodes Does GCN Fail? Enhancing GCN From the Node Perspective

https://openreview.net/forum?id=dcwUGaK9sQ

Compressor summary: The paper proposes DaGCN, a framework that improves Graph Convolutional Networks by handling nodes that do not fit the label smoothness assumption and are not well-represented by existing GCNs.


Equivariant Graph Neural Operator for Modeling 3D Dynamics

https://openreview.net/forum?id=dccRCYmL5x

Compressor summary: Equivariant Graph Neural Operator (EGNO) is a novel method that learns 3D dynamics as trajectories using equivariant temporal convolutions, outperforming existing methods in multiple domains.


MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

https://openreview.net/forum?id=dbFEFHAD79

Compressor summary: The paper introduces a benchmark called MLLM-as-a-Judge to evaluate multimodal large language models, finding they perform well in pair comparisons but struggle with scoring evaluation and batch ranking tasks due to various biases and inconsistencies.


Reparameterized Importance Sampling for Robust Variational Bayesian Neural Networks

https://openreview.net/forum?id=da7MMwICjC

Compressor summary: RIS is a novel sampling method that reduces variance and improves convergence, performance, and uncertainty estimation in BNNs.


Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation

https://openreview.net/forum?id=dZsEOFUDew

Compressor summary: The text proposes a perspective to understand how pre-training language models with next-token prediction enables them to reason by aggregating indirect paths seen during pre-training on knowledge and reasoning graphs.


On The Statistical Complexity of Offline Decision-Making

https://openreview.net/forum?id=dYDPcx78tm

Compressor summary: The paper analyzes how well offline data can be used for online decisions, establishing near-optimal performance bounds under function approximation together with a new notion of policy coverage that subsumes existing data-coverage conditions.


Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning

https://openreview.net/forum?id=dWxb80a0TW

Compressor summary: The paper proposes a universal geometric graph representation for 3D molecular complexes and a Generalist Equivariant Transformer (GET) model to capture interactions between various molecule types using one model that preserves fine-grained information.


Denoising Autoregressive Representation Learning

https://openreview.net/forum?id=dW29JZj0G5

Compressor summary: The paper proposes DARL, a simple decoder-only Transformer that learns strong visual representations for image generation by using tailored noise schedules and longer training in larger models.


Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

https://openreview.net/forum?id=dVpFKfqF3R

Compressor summary: The paper proposes using categorical cross-entropy for training value functions in deep reinforcement learning, which improves performance and scalability across various domains.
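A common way to realize this recipe is a two-hot categorical target, sketched below under assumed bin grids and shapes: a scalar return is split between its two neighboring bins, the network predicts a distribution over bins, and training minimizes cross-entropy; decoding the distribution's mean recovers the scalar value.

```python
import numpy as np

bins = np.linspace(-1.0, 1.0, 11)            # support of the categorical

def two_hot(target):
    # Split a scalar's mass between the two bins that bracket it.
    target = np.clip(target, bins[0], bins[-1])
    i = np.clip(np.searchsorted(bins, target) - 1, 0, len(bins) - 2)
    w = (target - bins[i]) / (bins[i + 1] - bins[i])
    p = np.zeros(len(bins))
    p[i], p[i + 1] = 1.0 - w, w
    return p

p = two_hot(0.37)
logits = np.zeros(len(bins))                 # dummy network output
log_probs = logits - np.log(np.exp(logits).sum())
loss = -(p * log_probs).sum()                # categorical cross-entropy
value = (p * bins).sum()                     # decoding recovers the scalar
print(round(value, 3), round(loss, 3))
```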


NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

https://openreview.net/forum?id=dVhrnjZJad

Compressor summary: The paper proposes factorized diffusion models for text-to-speech generation that disentangle speech into different attributes and generate them individually, leading to improved speech quality and naturalness.


A Unified View of FANOVA: A Comprehensive Bayesian Framework for Component Selection and Estimation

https://openreview.net/forum?id=dV9QGostQk

Compressor summary: The paper introduces a flexible and scalable Bayesian framework for FANOVA models that can handle different sparsity levels and unify various methods, while enabling uncertainty quantification and novel model developments.


Contrasting Multiple Representations with the Multi-Marginal Matching Gap

https://openreview.net/forum?id=dV9B9qFeGi

Compressor summary: The paper proposes a new loss function, M3G, that uses multi-marginal optimal transport theory to learn representations from multiple views, showing improved performance in self-supervised and multimodal tasks.


Human vs. Generative AI in Content Creation Competition: Symbiosis or Conflict?

https://openreview.net/forum?id=dT6ZbSxh33

Compressor summary: The text discusses how generative AI impacts content creation, introduces a competition model to study the balance between humans and AI, and suggests a stable equilibrium is possible.


Adaptive Observation Cost Control for Variational Quantum Eigensolvers

https://openreview.net/forum?id=dSrdnhLS2h

Compressor summary: The paper proposes a method called SubsCoRe that uses Gaussian process surrogate to control the cost and accuracy of SMO in VQE by adjusting the required accuracy and number of measurement shots.


Solving Poisson Equations using Neural Walk-on-Spheres

https://openreview.net/forum?id=dQveBV9lZl

Compressor summary: The paper introduces Neural Walk-on-Spheres, a new neural network method for solving high-dimensional Poisson equations efficiently, with better accuracy, speed, and reduced memory usage compared to existing methods.
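For background, the classical (non-neural) walk-on-spheres estimator that such methods build on looks like this for the Laplace equation on the unit disk: jump uniformly to the largest sphere inside the domain until an eps-shell of the boundary is reached, then read off the boundary value. Domain, boundary data, and tolerances are illustrative choices.

```python
import numpy as np

def wos_laplace_disk(x, g, rng, eps=1e-3):
    # Estimate u(x) for Laplace's equation on the unit disk with boundary
    # data g, via one random walk on spheres.
    x = np.asarray(x, dtype=float)
    while True:
        r = 1.0 - np.linalg.norm(x)            # distance to the unit circle
        if r < eps:
            return g(x / np.linalg.norm(x))    # project to boundary, evaluate
        theta = rng.uniform(0.0, 2.0 * np.pi)  # uniform point on the sphere
        x = x + r * np.array([np.cos(theta), np.sin(theta)])

g = lambda p: p[0] ** 2 - p[1] ** 2            # harmonic, so u = g inside too
rng = np.random.default_rng(0)
est = np.mean([wos_laplace_disk([0.3, 0.2], g, rng) for _ in range(2000)])
print(round(est, 4), "vs exact", 0.3 ** 2 - 0.2 ** 2)
```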


Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport

https://openreview.net/forum?id=dMhF96PfQi

Compressor summary: Semi-dual JKO is a scalable Wasserstein gradient flow model with reduced training complexity that achieves competitive results in image generation.


Compositional Text-to-Image Generation with Dense Blob Representations

https://openreview.net/forum?id=dMOhgHNYAf

Compressor summary: BlobGEN is a text-to-image model that uses dense blob representations to capture fine-grained scene details, enabling better controllability and compositionality with large language models.


Position: A Safe Harbor for AI Evaluation and Red Teaming

https://openreview.net/forum?id=dLojMSgSFW

Compressor summary: Independent evaluation of generative AI systems is crucial for safety, but current terms of service and research access programs discourage it; developers should provide legal and technical safe harbor for public interest research without fear of account suspension or legal reprisal.


In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering

https://openreview.net/forum?id=dJTChKgv3a

Compressor summary: The proposed in-context vectors (ICV) approach improves the efficiency and effectiveness of in-context learning for large language models, enabling them to follow demonstration examples better and handle diverse tasks more flexibly.


Deep Functional Factor Models: Forecasting High-Dimensional Functional Time Series via Bayesian Nonparametric Factorization

https://openreview.net/forum?id=dHXKCyaIkp

Compressor summary: The Deep Functional Factor Model (DF2M) is a Bayesian nonparametric model that combines the Indian Buffet Process, multi-task Gaussian Processes, and a deep kernel function to analyze high-dimensional functional time series, offering explainability and better predictive accuracy than conventional deep learning models.


Sign is Not a Remedy: Multiset-to-Multiset Message Passing for Learning on Heterophilic Graphs

https://openreview.net/forum?id=dGDFZM018a

Compressor summary: The paper proposes Multiset-to-Multiset GNN (M2M-GNN), a novel message-passing function for heterophilic graphs with dissimilar node features that overcomes the limitations of the widely used Signed Message Passing (SMP) and performs better.


Self-Supervised Interpretable End-to-End Learning via Latent Functional Modularity

https://openreview.net/forum?id=dFEeI51O5j

Compressor summary: MoNet is a modular network that learns task-specific decision-making processes without supervision, enabling effective and interpretable visual navigation in indoor environments.


A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity

https://openreview.net/forum?id=dBqHGZPGZI

Compressor summary: The study examines how direct preference optimization (DPO) reduces toxicity in pre-trained language models and reveals that it bypasses rather than removes capabilities learned from pre-training.


Position: The Causal Revolution Needs Scientific Pragmatism

https://openreview.net/forum?id=dBMLtuKH01

Compressor summary: The authors propose scientific pragmatism, a balanced approach between scientific perfectionism and system-centric biases, to advance causal models and methods in knowledge generation and application.


Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees

https://openreview.net/forum?id=d5tJWH5yCi

Compressor summary: The paper presents a fully-dynamic algorithm for maintaining decision trees, including boosted ones, under adversarial updates across various metrics, with provable worst-case guarantees on both tree quality and update time.


KernelSHAP-IQ: Weighted Least Square Optimization for Shapley Interactions

https://openreview.net/forum?id=d5jXW2H4gg

Compressor summary: The paper proposes an approach to interpret higher-order interactions in complex ML models using the Shapley Interaction Index and shows its effectiveness with KernelSHAP-IQ.


InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks

https://openreview.net/forum?id=d5LURMSfTx

Compressor summary: InfiAgent-DABench is a benchmark to evaluate LLM-based agents on data analysis tasks using a format-prompting technique and an agent framework.


From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning

https://openreview.net/forum?id=d2vONO90Rw

Compressor summary: The authors propose a new method called supervised pinpoint tuning (SPT) that selectively fine-tunes specific modules in large language models to reduce sycophancy without compromising their general abilities.


GRATH: Gradual Self-Truthifying for Large Language Models

https://openreview.net/forum?id=d2f2sCXQuI

Compressor summary: The paper proposes GRATH, a method to improve the truthfulness of large language models using out-of-domain question prompts and direct preference optimization.


Multiplicative Weights Update, Area Convexity and Random Coordinate Descent for Densest Subgraph Problems

https://openreview.net/forum?id=d2E2i5rJ4x

Compressor summary: The paper presents new algorithms for finding and decomposing dense subgraphs in graphs, improving on previous work in terms of iteration complexity, convergence rate, and practicality.


Neural Jump-Diffusion Temporal Point Processes

https://openreview.net/forum?id=d1P6GtRzuV

Compressor summary: The paper proposes a new type of temporal point process model, called Neural Jump-Diffusion Temporal Point Process (NJDTPP), that uses neural networks to parameterize its intensity dynamics and achieves better performance than existing models.


Collapse-Aware Triplet Decoupling for Adversarially Robust Image Retrieval

https://openreview.net/forum?id=cy3JBZKCw1

Compressor summary: The paper introduces a method called CA-TRIDE to improve image retrieval robustness against adversarial examples by addressing the limitations of existing deep metric learning approaches, and shows its effectiveness on three datasets.


Weakly-Supervised Residual Evidential Learning for Multi-Instance Uncertainty Estimation

https://openreview.net/forum?id=cxiqxDnrCx

Compressor summary: This paper introduces MIREL, a new method for multi-instance uncertainty estimation in weakly-supervised settings where only bag-level annotations are available.


Mean Estimation in the Add-Remove Model of Differential Privacy

https://openreview.net/forum?id=cwIhvoTzuK

Compressor summary: The paper proposes an optimal algorithm for one-dimensional mean estimation under the add-remove model of differential privacy, showing that it performs similarly to the swap model and improves upon existing methods.
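
For intuition, a baseline (not the paper's optimal algorithm) under the add-remove model must privatize both the sum and the count, since adding or removing a record changes both; a hedged numpy sketch:

```python
import numpy as np

def dp_mean_add_remove(x: np.ndarray, eps: float, lo: float, hi: float,
                       rng=np.random.default_rng(0)) -> float:
    """Baseline DP mean under the add-remove model (not the paper's algorithm).

    Adding/removing one record changes the clipped sum by at most
    max(|lo|, |hi|) and the count by 1, so each gets half the budget.
    """
    x = np.clip(x, lo, hi)
    sum_sens = max(abs(lo), abs(hi))
    noisy_sum = x.sum() + rng.laplace(scale=sum_sens / (eps / 2))
    noisy_cnt = len(x) + rng.laplace(scale=1.0 / (eps / 2))
    return noisy_sum / max(noisy_cnt, 1.0)

data = np.random.default_rng(1).uniform(0, 1, size=1000)
print(dp_mean_add_remove(data, eps=1.0, lo=0.0, hi=1.0))
```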


Dynamic Spectral Clustering with Provable Approximation Guarantee

https://openreview.net/forum?id=coP4kPdhKr

Compressor summary: The paper proposes a fast dynamic spectral clustering algorithm for evolving graphs that can approximate cluster structures well under certain conditions.


Hierarchical Integral Probability Metrics: A distance on random probability measures with low sample complexity

https://openreview.net/forum?id=cmy38XZlJu

Compressor summary: The paper introduces a new distance on random probability measures that is faster and easier to estimate than the Wasserstein distance and enjoys lower sample complexity.


Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization

https://openreview.net/forum?id=cmD5E6ami4

Compressor summary: Symmetric replay training (SRT) improves sample efficiency in deep reinforcement learning for combinatorial optimization by using high-reward samples to explore under-explored symmetric regions without additional online interactions.


BOtied: Multi-objective Bayesian optimization with tied multivariate ranks

https://openreview.net/forum?id=cj5HbaX14p

Compressor summary: The paper proposes a new acquisition function called BOtied, which leverages the CDF indicator, estimated with copulas, to efficiently optimize multiple competing objectives.


Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes

https://openreview.net/forum?id=cit0hg4sEz

Compressor summary: FedKSeed is a method for federated full-parameter tuning of large language models that reduces communication cost by using zeroth-order optimization, random seeds, and probability-differentiated seed sampling, improving performance over existing methods.
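
The communication trick is that a zeroth-order update is fully determined by a random seed plus one scalar, so a client need only transmit that pair; a hedged sketch (function names are mine, not the paper's API):

```python
import numpy as np

def zo_client_step(w: np.ndarray, loss_fn, seed: int, mu: float = 1e-3):
    """Client-side zeroth-order step: returns only (seed, scalar gradient).

    The perturbation z is regenerated from the seed, so transmitting the
    seed and the scalar projected gradient suffices to replay the update.
    """
    z = np.random.default_rng(seed).standard_normal(w.shape)
    g = (loss_fn(w + mu * z) - loss_fn(w - mu * z)) / (2 * mu)
    return seed, g

def server_apply(w: np.ndarray, seed: int, g: float, lr: float = 0.1):
    """Server replays the same perturbation from the seed to update w."""
    z = np.random.default_rng(seed).standard_normal(w.shape)
    return w - lr * g * z

w = np.zeros(5)
loss = lambda v: np.sum((v - 1.0) ** 2)
seed, g = zo_client_step(w, loss, seed=42)
print(server_apply(w, seed, g))
```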


Taylor Videos for Action Recognition

https://openreview.net/forum?id=chhIZGqlUG

Compressor summary: The authors propose a new video format called Taylor video that extracts dominant motions from videos using Taylor expansion and show its effectiveness for action recognition with different architectures and modalities.
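
A rough numpy sketch of the idea, assuming a Taylor frame is built from finite-difference estimates of temporal derivatives (the paper's exact construction may differ):

```python
import numpy as np

def taylor_frame(frames: np.ndarray, order: int = 2) -> np.ndarray:
    """Sketch of a Taylor-video frame: sum of temporal-derivative terms.

    frames: (T, H, W) grayscale clip. Temporal derivatives are approximated
    with finite differences; the k-th term is the k-th derivative over k!,
    evaluated at the first frame, mirroring a truncated Taylor expansion.
    The 0th (static appearance) term is skipped to keep only motion.
    """
    term = frames.astype(np.float64)
    out = np.zeros_like(term[0])
    fact = 1.0
    for k in range(1, order + 1):
        term = np.diff(term, axis=0)  # k-th finite difference along time
        fact *= k
        out += term[0] / fact         # derivative estimate at t = 0
    return out

clip = np.random.rand(8, 32, 32)
print(taylor_frame(clip).shape)  # (32, 32)
```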


Beyond Individual Input for Deep Anomaly Detection on Tabular Data

https://openreview.net/forum?id=chDpBp2P6b

Compressor summary: The paper proposes a new anomaly detection method using Non-Parametric Transformers to capture feature and sample dependencies and achieves state-of-the-art performance on 31 benchmark datasets.


Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts

https://openreview.net/forum?id=ccSSKTz9LX

Compressor summary: This paper studies how fine-tuning affects long-tail learning tasks and proposes LIFT, a lightweight fine-tuning algorithm that reduces training time and parameter count while improving performance.


Trained Random Forests Completely Reveal your Dataset

https://openreview.net/forum?id=cc72Vnfvoc

Compressor summary: The authors present an optimization-based method to reconstruct datasets used for training random forests, using common libraries and showing that most methods are vulnerable to this attack.


Understanding the Impact of Introducing Constraints at Inference Time on Generalization Error

https://openreview.net/forum?id=cbZTnjqIib

Compressor summary: This paper examines how constraining outputs of an ML model at inference time affects its generalization error and suggests choosing a proper loss function for this approach.


Graph Generation with Diffusion Mixture

https://openreview.net/forum?id=cZTFxktg23

Compressor summary: The paper proposes a generative framework that models the topology of graphs by explicitly learning the final graph structures of the diffusion process, improving graph and molecule generation tasks.


Continuous Treatment Effects with Surrogate Outcomes

https://openreview.net/forum?id=cZNuYKtoOZ

Compressor summary: The paper proposes a method to estimate continuous treatment effects using surrogate variables and labeled/unlabeled data, improving accuracy and addressing selection bias.


The Max-Min Formulation of Multi-Objective Reinforcement Learning: From Theory to a Model-Free Algorithm

https://openreview.net/forum?id=cY9g0bwiZx

Compressor summary: The paper presents a new approach for multi-objective reinforcement learning that ensures fairness among multiple goals and improves performance over existing methods.


Variational Learning is Effective for Large Deep Networks

https://openreview.net/forum?id=cXBv07GKvk

Compressor summary: The paper argues that a new optimizer called IVON performs well for training large neural networks and has advantages over Adam in terms of predictive uncertainty, finetuning, and sensitivity to data.


Just Cluster It: An Approach for Exploration in High-Dimensions using Clustering and Pre-Trained Representations

https://openreview.net/forum?id=cXBPPfNUZJ

Compressor summary: The paper proposes a density estimation-based exploration method for 3-D environments that clusters random or pre-trained representations, showing its effectiveness and how pre-trained representational biases can be integrated into exploration.


FESSNC: Fast Exponentially Stable and Safe Neural Controller

https://openreview.net/forum?id=cVp8blEw2i

Compressor summary: The FESSNC is a fast neural controller for nonlinear systems that ensures stability and safety using heuristic learning, projection operators, and Hutchinson's trace estimator.


A sampling theory perspective on activations for implicit neural representations

https://openreview.net/forum?id=cVkqItmYLQ

Compressor summary: This paper explores implicit neural representations using sampling theory, finding that $\mathrm{sinc}$ activations are optimal for shallow encodings and connecting them to dynamical systems.


SIN: Selective and Interpretable Normalization for Long-Term Time Series Forecasting

https://openreview.net/forum?id=cUMOVfOIve

Compressor summary: The paper introduces SIN, a selective and interpretable method for normalizing time series data to improve forecasting accuracy.


Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics

https://openreview.net/forum?id=cU20finY8V

Compressor summary: Sharpness-aware minimization (SAM) improves generalization by discarding undesirable model biases through perturbed forgetting, which relates to the information bottleneck principle and outperforms standard SAM on various benchmarks.


Understanding Finetuning for Factual Knowledge Extraction

https://openreview.net/forum?id=cPsn9AcOYh

Compressor summary: The impact of QA fine-tuning data on factuality depends on the familiarity of facts, with lesser-known facts reducing factuality and better-known ones maintaining or improving it.


Accelerating Heterogeneous Federated Learning with Closed-form Classifiers

https://openreview.net/forum?id=cMige5MK1N

Compressor summary: Fed3R is a method for federated learning that works well with non-identical data distributions, is efficient, and can be fine-tuned with other algorithms.


Discovering Mixtures of Structural Causal Models from Time Series Data

https://openreview.net/forum?id=cHJAUdam3i

Compressor summary: The paper proposes MCD, a variational inference framework to discover causal models from time series data with different underlying causal structures, and shows its superior performance on synthetic and real datasets.


Towards Causal Foundation Model: on Duality between Optimal Balancing and Attention

https://openreview.net/forum?id=cFDaYtZR4u

Compressor summary: CInA is a new self-supervised method that uses multiple unlabeled datasets for causal learning and enables zero-shot causal inference on unseen tasks with high accuracy, potentially paving the way for causal foundation models.


Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems

https://openreview.net/forum?id=cEJ9jNJuJP

Compressor summary: The paper questions the effectiveness of using machine learning to generate heatmaps for guiding Monte Carlo tree search in solving large-scale traveling salesman problems, and suggests future research directions.


AST-T5: Structure-Aware Pretraining for Code Generation and Understanding

https://openreview.net/forum?id=cBWVJh5Fvf

Compressor summary: AST-T5 is a novel pretraining method that uses Abstract Syntax Trees to improve code generation, transpilation, and understanding tasks, outperforming similar-sized language models in various scenarios.


Transforming and Combining Rewards for Aligning Large Language Models

https://openreview.net/forum?id=cAWbm9KRZO

Compressor summary: The LSC-transformation method improves language model alignment by emphasizing poorly-performing outputs, preventing underfitting and reward hacking, and allowing principled aggregation of multiple rewards.


Bayesian Adaptation of Network Depth and Width for Continual Learning

https://openreview.net/forum?id=c9HddKGiYk

Compressor summary: The paper proposes a new Bayesian method to adjust network depth and width in dynamic architecture-based continual learning, achieving better or similar results than existing methods and working well for unsupervised learning too.


Scalable Pre-training of Large Autoregressive Image Models

https://openreview.net/forum?id=c92KDfEZTg

Compressor summary: The paper presents AIM, a set of vision models inspired by Large Language Models that scale well with data and model size; a 7-billion-parameter model pre-trained on 2 billion images achieves high ImageNet-1k performance.


A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models

https://openreview.net/forum?id=c8qWiNiqRY

Compressor summary: DiPmark is a watermarking technique that preserves the original content's distribution while being accessible and robust to token changes.


Hybrid Reinforcement Learning from Offline Observation Alone

https://openreview.net/forum?id=c6rVlTKpb5

Compressor summary: The paper explores the challenges and solutions for hybrid reinforcement learning with observation-only offline data, and proposes an algorithm that performs well even without a reset model of the environment.


Position: Insights from Survey Methodology can Improve Training Data

https://openreview.net/forum?id=c3ls5AVOw7

Compressor summary: The text discusses the importance of high-quality data for AI/ML models, the challenges of collecting such data, and how survey methodology can help improve data quality and reduce biases.


Deep Neural Room Acoustics Primitive

https://openreview.net/forum?id=c2CKmP9l5X

Compressor summary: The paper proposes DeepNeRAP, a method to learn a continuous neural field that encodes sound propagation dynamics in 3D spaces and infers room impulse response without direct ground truth access.


Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint

https://openreview.net/forum?id=c1AKcA6ry1

Compressor summary: The paper explores the theory and algorithms of aligning generative models with RLHF, proposes new methods that outperform existing ones, and shows their effectiveness on a large language model.


A3S: A General Active Clustering Method with Pairwise Constraints

https://openreview.net/forum?id=c18noxRh3X

Compressor summary: A3S is a novel framework that improves active clustering performance by adjusting initial clustering results based on normalized mutual information gain, thereby reducing the number of human queries.


A Language Model’s Guide Through Latent Space

https://openreview.net/forum?id=c0LoolDFw4

Compressor summary: The paper explores how to control language models with different concepts beyond truthfulness and evaluates their effectiveness using a new metric and extensive experiments.


What Will My Model Forget? Forecasting Forgotten Examples in Language Model Refinement

https://openreview.net/forum?id=bzNwexOPWm

Compressor summary: To control replay and improve interpretability when language models are updated with corrected instances, the paper proposes forecasting which upstream examples will be forgotten: a partially interpretable model based on logit scores works well on BART but not on T5, a black-box classifier based on inner products of representations outperforms it, and replaying the forecasted examples reduces forgetting in practice.


Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling

https://openreview.net/forum?id=byxXa99PtF

Compressor summary: Input clarification ensembling is a framework for decomposing the uncertainty of large language models into aleatoric (data) and epistemic (model) components, improving reliability and interpretability.


Autoformalizing Euclidean Geometry

https://openreview.net/forum?id=bylZbZOsGA

Compressor summary: The paper presents a neuro-symbolic approach to automatically translate informal Euclidean geometry proofs into formal theorems using theorem provers, large language models, and semantic evaluation.


Plug-and-Play image restoration with Stochastic deNOising REgularization

https://openreview.net/forum?id=byAXJTk0LH

Compressor summary: SNORE is a new PnP algorithm that applies the denoiser only on images with appropriate noise levels, improving image restoration results.
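
A hedged sketch of one such step, with a placeholder shrinkage "denoiser": the regularization gradient is evaluated on a re-noised iterate, so the denoiser only ever sees inputs at its training noise level.

```python
import numpy as np

def snore_step(x: np.ndarray, data_grad, denoiser, sigma: float,
               lam: float = 1.0, tau: float = 0.1,
               rng=np.random.default_rng(0)) -> np.ndarray:
    """One stochastic denoising-regularization step (a sketch of the idea).

    The regularization gradient is computed at a re-noised point, so the
    denoiser is applied only to images with its training noise level sigma.
    """
    noisy = x + sigma * rng.standard_normal(x.shape)
    reg_grad = (noisy - denoiser(noisy)) / sigma**2
    return x - tau * (data_grad(x) + lam * reg_grad)

# Toy setup: trivial data term with a shrinkage "denoiser" as placeholder.
y = np.random.default_rng(1).normal(size=16)
data_grad = lambda x: x - y
denoiser = lambda v: 0.9 * v
print(snore_step(y.copy(), data_grad, denoiser, sigma=0.1).shape)  # (16,)
```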


An Unsupervised Approach for Periodic Source Detection in Time Series

https://openreview.net/forum?id=bwZlD7mYoa

Compressor summary: The paper proposes a new method to detect periodic patterns in noisy time series data without labels or augmentations and shows significant improvements over existing methods.


Optimal Ridge Regularization for Out-of-Distribution Prediction

https://openreview.net/forum?id=bvPYroQgc3

Compressor summary: The study examines how optimal ridge regularization and risk behave in out-of-distribution prediction scenarios, showing that negative regularization can be optimal and the tuned risk depends on data aspect ratio.


Feature Distribution on Graph Topology Mediates the Effect of Graph Convolution: Homophily Perspective

https://openreview.net/forum?id=buW1Bi6XFw

Compressor summary: Randomly shuffling feature vectors among nodes of the same class improves graph neural network performance by reducing the dependence between graph topology and features.
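
The intervention is simple enough to state in a few lines; a sketch, assuming a dense node-feature matrix and class labels:

```python
import numpy as np

def intra_class_shuffle(features: np.ndarray, labels: np.ndarray,
                        rng=np.random.default_rng(0)) -> np.ndarray:
    """Randomly permute feature vectors among nodes sharing a class label.

    This preserves each class's feature distribution while breaking the
    dependence between a node's features and its position in the graph.
    """
    shuffled = features.copy()
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        shuffled[idx] = features[rng.permutation(idx)]
    return shuffled

X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 0, 0, 1, 1, 1])
print(intra_class_shuffle(X, y))
```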


Precise Accuracy / Robustness Tradeoffs in Regression: Case of General Norms

https://openreview.net/forum?id=btYeH65fI3

Compressor summary: The paper studies how adversarial attacks affect linear regression models and finds the best balance between robustness and accuracy for different scenarios.


Challenges and Considerations in the Evaluation of Bayesian Causal Discovery

https://openreview.net/forum?id=bqgtkBDkNs

Compressor summary: The paper discusses challenges in evaluating Bayesian Causal Discovery methods due to uncertainty in inferred causal graphs and proposes factors to consider for better assessment.


SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

https://openreview.net/forum?id=bq1JEgioLr

Compressor summary: SciBench is a benchmark suite for testing large language models on complex scientific problems across math, chemistry, and physics domains, revealing their current limitations and areas for improvement.


Batch and match: black-box variational inference with a score-based divergence

https://openreview.net/forum?id=bplNmU2ROC

Compressor summary: BaM is a new black-box variational inference method that uses a score-based divergence and optimizes Gaussian variational families with full covariance matrices, which converges faster and better than ELBO-based methods.


Sequential Kernel Goodness-of-fit Testing

https://openreview.net/forum?id=bmeUeCUMHA

Compressor summary: The authors propose a novel sequential goodness-of-fit testing method that adapts to data complexity and maintains statistical power by using a betting strategy.
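
For intuition, the generic test-by-betting template looks as follows (the paper's kernelized betting strategy is more sophisticated); wealth accumulates over e-values, and Ville's inequality justifies rejecting once it reaches 1/alpha:

```python
import numpy as np

def sequential_test(e_values, alpha: float = 0.05):
    """Generic test-by-betting loop (template only, not the paper's method).

    e_values: per-observation e-values (nonnegative, mean <= 1 under H0).
    Ville's inequality bounds the chance wealth ever exceeds 1/alpha by
    alpha, so this rejection rule is valid at any stopping time.
    """
    wealth = 1.0
    for t, e in enumerate(e_values, start=1):
        wealth *= e
        if wealth >= 1.0 / alpha:
            return t  # reject H0 at time t
    return None       # never rejected

# Under an alternative the e-values drift above 1, so wealth grows.
rng = np.random.default_rng(0)
print(sequential_test(rng.uniform(0.8, 1.3, size=500)))
```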


Nonlinear Filtering with Brenier Optimal Transport Maps

https://openreview.net/forum?id=blzDxD6bKt

Compressor summary: The paper proposes a new nonlinear filtering method based on Brenier optimal transport that uses neural networks and stochastic optimization to handle degenerate likelihoods, high-dimensional states, and multi-modal distributions.


Learning Decision Trees and Forests with Algorithmic Recourse

https://openreview.net/forum?id=blGpu9aGs6

Compressor summary: The paper introduces AR, a new algorithm that learns accurate tree-based models with guaranteed recourse actions by constraining tree growth, combining adversarial training with a greedy algorithm; applied to random forests, it outperforms baselines in accuracy and efficiency.


Fundamental Limits of Distributed Covariance Matrix Estimation Under Communication Constraints

https://openreview.net/forum?id=biE1uHyG0l

Compressor summary: This paper studies how two agents can help a central server estimate a high-dimensional covariance matrix by communicating limited information about disjoint samples.


Unbiased Multi-Label Learning from Crowdsourced Annotations

https://openreview.net/forum?id=bgP8Rxv2eB

Compressor summary: For crowdsourced multi-label learning, where annotators provide unreliable labels and existing methods focus on inferring true labels rather than prediction, the paper proposes an unbiased risk estimator based on transition matrices together with a decoupled autoencoder that exploits label correlations, and establishes a generalization error bound for convergence.


Submodular framework for structured-sparse optimal transport

https://openreview.net/forum?id=bfQCO9Vqhk

Compressor summary: The paper proposes sparsity-constrained optimal transport methods for learning sparse transport plans with efficient algorithms and theory.


High-Dimensional Bayesian Optimization via Semi-Supervised Learning with Optimized Unlabeled Data Sampling

https://openreview.net/forum?id=beXQVQorse

Compressor summary: Teacher-Student Bayesian Optimization (TSBO) is a novel semi-supervised learning approach that uses unlabeled data, teacher and student models to minimize labeled data queries and improve sample-efficiency in global optimization tasks.


Riemannian coordinate descent algorithms on matrix manifolds

https://openreview.net/forum?id=bdKaQmrM81

Compressor summary: The paper presents coordinate descent algorithms for various matrix manifolds that update fewer variables per iteration than full Riemannian optimization while maintaining feasibility at low per-iteration cost.


State-Constrained Zero-Sum Differential Games with One-Sided Information

https://openreview.net/forum?id=bcN7KSB2YS

Compressor summary: The paper studies games with state constraints and one-sided information, and provides theoretical results for computing behavioral strategies and belief manipulation in such games.


On the Recoverability of Causal Relations from Temporally Aggregated I.I.D. Data

https://openreview.net/forum?id=bZNH0SU37Y

Compressor summary: The effect of temporal aggregation on non-temporal causal discovery depends on the consistency criterion used, the degree of nonlinearity, and the presence of partial linearity or prior information.


Split-and-Denoise: Protect large language model inference with local differential privacy

https://openreview.net/forum?id=bZ4fzw1iz7

Compressor summary: SnD is a private inference framework that splits LLMs and adds noise to embeddings before transmitting them to servers, enhancing privacy while maintaining performance.
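
A hedged sketch of the client-side noising step, with an illustrative Laplace mechanism over L2-clipped token embeddings; the paper's local denoising of the server's output is omitted here:

```python
import numpy as np

def privatize_embeddings(emb: np.ndarray, eps: float, clip: float = 1.0,
                         rng=np.random.default_rng(0)) -> np.ndarray:
    """Clip and noise embeddings before they leave the client (sketch only).

    Rows are clipped to L2 norm `clip`, then perturbed with Laplace noise.
    Replacing one L2-clipped row changes the L1 norm by at most
    2 * sqrt(d) * clip, which calibrates the noise scale (a loose bound).
    """
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    scale = 2.0 * np.sqrt(emb.shape[1]) * clip / eps
    return emb + rng.laplace(scale=scale, size=emb.shape)

tokens = np.random.default_rng(1).normal(size=(4, 8))
print(privatize_embeddings(tokens, eps=8.0).shape)  # (4, 8)
```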


Timer: Generative Pre-trained Transformers Are Large Time Series Models

https://openreview.net/forum?id=bYRYb7DMNo

Compressor summary: The paper introduces the Time Series Transformer (Timer), a GPT-style model pre-trained on a curated dataset of heterogeneous time series, which can perform various tasks such as forecasting, imputation, and anomaly detection.


Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

https://openreview.net/forum?id=bX3J7ho18S

Compressor summary: The paper proposes a method to estimate LLM-generated text in scientific peer reviews and finds that 6.5%-16.9% of the review text could be modified by LLMs, with higher occurrence in low confidence reviews and near deadline submissions.


Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models

https://openreview.net/forum?id=bWZKvF0g7G

Compressor summary: The text discusses a new vision-language safe instruction-following dataset called VLGuard that helps improve the safety of large language models without compromising their helpfulness.


Position: TrustLLM: Trustworthiness in Large Language Models

https://openreview.net/forum?id=bWUU0LwwMp

Compressor summary: The paper introduces TrustLLM, a study on trustworthiness in large language models, evaluating 16 mainstream models across eight dimensions and finding that proprietary models generally outperform open-source ones, but some open-source models come close to them.


Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

https://openreview.net/forum?id=bWNPx6t0sF

Compressor summary: The paper analyzes various methods for fine-tuning language models with preference data and finds that approaches using on-policy sampling and negative gradient outperform offline and maximum likelihood objectives, which are unified under mode-seeking objectives for categorical distributions.


Controlled Decoding from Language Models

https://openreview.net/forum?id=bVIcZb7Qa0

Compressor summary: The paper proposes controlled decoding (CD), a modular solver that uses a prefix scorer to learn a value function and control the generation of a frozen base model, achieving effective alignment of language models with multiple rewards.


Self-Infilling Code Generation

https://openreview.net/forum?id=bV9yT24t9B

Compressor summary: The paper introduces a new method for generating code called self-infilling, which can create context and content simultaneously, improving output quality and regularizing the generation process.


Differentiable Model Scaling using Differentiable Topk

https://openreview.net/forum?id=bULHOW1RXM

Compressor summary: DMS is a method that efficiently searches for optimal network width and depth, improving performance on various tasks such as image classification, object detection, and language modeling.
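
One common way to make top-k differentiable, shown below as an illustrative sketch (not necessarily the paper's exact operator), is a smooth sigmoid gate around the k-th largest importance score:

```python
import numpy as np

def soft_topk_mask(scores: np.ndarray, k: int, temp: float = 0.05) -> np.ndarray:
    """Differentiable approximation of a top-k mask (illustrative only).

    A sigmoid around the k-th largest score yields soft 0/1 element gates;
    lowering `temp` sharpens the mask toward exact top-k selection, which
    lets gradient descent search over structural sizes like width or depth.
    """
    threshold = np.sort(scores)[-k]  # k-th largest score
    return 1.0 / (1.0 + np.exp(-(scores - threshold) / temp))

s = np.array([0.9, 0.1, 0.4, 0.8, 0.3])
print(np.round(soft_topk_mask(s, k=2), 3))  # soft gates near the top-2
```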


Graph-Triggered Rising Bandits

https://openreview.net/forum?id=bPsohGR6gD

Compressor summary: The paper introduces a new bandit model in which arms' rewards depend on each other through a graph, studies its optimal and suboptimal policies, and presents regret-minimization algorithms that exploit structured graphs and enjoy no-regret guarantees.


Modular Learning of Deep Causal Generative Models for High-dimensional Causal Inference

https://openreview.net/forum?id=bOhzU7NpTB

Compressor summary: Modular-DCM is an efficient algorithm that uses pre-trained models to sample from identifiable causal queries in high-dimensional data, outperforming baselines and handling latent confounders.


Robust Data-driven Prescriptiveness Optimization

https://openreview.net/forum?id=bNgAdyv7ZP

Compressor summary: The paper proposes a new measure called the coefficient of prescriptiveness to compare optimization techniques that use side information for better decisions and introduces a model with this measure, which is solved by a bisection algorithm.


Watermarks in the Sand: Impossibility of Strong Watermarking for Language Models

https://openreview.net/forum?id=bM2s12t4hR

Compressor summary: This paper proves that strong watermarking schemes for generative models are impossible under realistic assumptions and shows an efficient attack on three existing schemes using a "quality" and "perturbation" oracle.


Genie: Generative Interactive Environments

https://openreview.net/forum?id=bJbSbJskOS

Compressor summary: Genie is a large-scale generative model that can create diverse virtual environments from text or images and can be controlled by user actions without any supervision.


Policy Evaluation for Variance in Average Reward Reinforcement Learning

https://openreview.net/forum?id=bID9PiBFpT

Compressor summary: The paper proposes a new algorithm for reinforcement learning that considers risk as measured by asymptotic variance and shows it converges in finite time.


High-Dimensional Kernel Methods under Covariate Shift: Data-Dependent Implicit Regularization

https://openreview.net/forum?id=bBzlapzeR1

Compressor summary: The paper examines how importance re-weighting improves kernel ridge regression in high dimensions with covariate shifts by analyzing the bias-variance trade-off and providing asymptotic expansions of kernels under covariate shift.


Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem

https://openreview.net/forum?id=bBkQ51PmjC

Compressor summary: The paper proposes a novel learning-based approach for solving Quadratic Assignment Problems (QAPs) using a Solution AWare Transformer (SAWT) architecture that encodes facility and location nodes separately, enabling scalability to larger problem sizes.


Analyzing $D^\alpha$ seeding for $k$-means

https://openreview.net/forum?id=b9uHveqszc

Compressor summary: The paper analyzes how the $D^\alpha$ seeding algorithm performs better than standard $k$-means for any $\alpha>2$, and gives theoretical and empirical evidence for this improvement.
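
The seeding rule generalizes k-means++ by sampling each new center with probability proportional to $D(x)^\alpha$ instead of $D(x)^2$; a compact numpy sketch:

```python
import numpy as np

def d_alpha_seeding(X: np.ndarray, k: int, alpha: float,
                    rng=np.random.default_rng(0)) -> np.ndarray:
    """D^alpha seeding: alpha = 2 recovers the classic k-means++ rule."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Squared distance to the nearest chosen center, per point.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        probs = d2 ** (alpha / 2)  # D(x)^alpha, with D the Euclidean distance
        centers.append(X[rng.choice(len(X), p=probs / probs.sum())])
    return np.array(centers)

X = np.random.default_rng(1).normal(size=(200, 2))
print(d_alpha_seeding(X, k=3, alpha=4.0).shape)  # (3, 2)
```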


Fine-grained Classes and How to Find Them

https://openreview.net/forum?id=b9VfvegTEO

Compressor summary: FALCON is an unsupervised method that uses coarse labels to discover and relate fine-grained classes in image and single-cell classification tasks, improving performance significantly.


Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators

https://openreview.net/forum?id=b89JtZj9gm

Compressor summary: Modern Text-to-Image generators like Stable Diffusion can improve the robustness and generalization of neural image classifiers by simulating interventions over environmental factors in the training data.


Graph As Point Set

https://openreview.net/forum?id=b6yHkQpSwZ

Compressor summary: The paper proposes a novel way to convert graphs into sets and use set encoders like Transformers to learn from them, improving the expressivity and performance of Graph Neural Networks.


ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

https://openreview.net/forum?id=b6rA0kAHT1

Compressor summary: The paper introduces ArCHer, an algorithmic framework for multi-turn RL with LLMs that combines a high-level value function learner with a low-level token-by-token policy learner, optimizing long-term objectives with better sample efficiency and performance than prior methods on multi-turn LLM tasks.


Probabilistic Subgoal Representations for Hierarchical Reinforcement Learning

https://openreview.net/forum?id=b6AwZauZPV

Compressor summary: The paper proposes a probabilistic subgoal representation function for hierarchical reinforcement learning using Gaussian Processes, which improves performance in various tasks and environments with stochastic uncertainties and diverse rewards.


Conformal Prediction for Deep Classifier via Label Ranking

https://openreview.net/forum?id=b3pYoZfcoo

Compressor summary: SAPS is a novel algorithm for conformal prediction that reduces prediction set size by discarding probability values except for the maximum softmax probability, while preserving uncertainty information and improving conditional coverage rates.
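
A sketch in the spirit of the method, where the nonconformity score keeps only the maximum softmax probability and otherwise depends on the label's rank (the exact score and calibration details are in the paper; `lam` and the randomization term `u` follow the general APS-style recipe):

```python
import numpy as np

def saps_style_score(probs: np.ndarray, y: int, lam: float = 0.1,
                     u: float = 0.5) -> float:
    """Rank-based nonconformity score in the spirit of SAPS (hedged sketch).

    Only the maximum softmax probability is retained; every other
    probability value is replaced by a constant lam times the label's rank.
    """
    rank = int(np.sum(probs > probs[y])) + 1  # 1 = most probable label
    if rank == 1:
        return u * probs[y]
    return probs.max() + (rank - 2 + u) * lam

probs = np.array([0.6, 0.25, 0.1, 0.05])
print([round(saps_style_score(probs, y), 3) for y in range(4)])
```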


IM-Unpack: Training and Inference with Arbitrarily Low Precision Integers

https://openreview.net/forum?id=b2D9PBNNQ2

Compressor summary: The paper proposes a simple algorithm to represent heavy matrix entries with low bit-width integers and achieve efficiency gains in GEMM operations for Transformer-based models.


Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics

https://openreview.net/forum?id=b1iurBHDck

Compressor summary: The authors propose a novel algorithm that uses a multimodal variational autoencoder to reconstruct nonlinear dynamical systems from various types of data, including symbolic data, in a generative framework.


Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective

https://openreview.net/forum?id=b1YQ5WKY3w

Compressor summary: The paper investigates whether in-context learning in large language models is equivalent to Bayesian inference using the martingale property, and finds evidence against this hypothesis.


ILILT: Implicit Learning of Inverse Lithography Technologies

https://openreview.net/forum?id=b0lxGL2n3d

Compressor summary: The paper presents a new machine learning framework that can generate high-quality masks for chip design without using inverse lithography solvers, achieving better performance than existing methods.


Category-Aware Active Domain Adaptation

https://openreview.net/forum?id=axwrD8F1yq

Compressor summary: The text proposes a novel method for active domain adaptation that focuses on improving individual categories without harming others by identifying the most important unlabeled data samples using influence function analysis.


Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

https://openreview.net/forum?id=axl3FAkpik

Compressor summary: The Binoculars method accurately detects machine-generated text from various large language models using simple calculations with two pre-trained models, achieving state-of-the-art performance without any training data or model-specific modifications.
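
The score contrasts how surprising the text is to an observer model with how surprising the performer model's predictions are; a hedged numpy sketch over token-level log-probabilities (the actual method computes the cross term from full next-token distributions):

```python
import numpy as np

def binoculars_score(obs_logprobs: np.ndarray, cross_logprobs: np.ndarray) -> float:
    """Sketch of a Binoculars-style score: log-perplexity / cross term.

    obs_logprobs: observer model's log-probabilities of the actual tokens.
    cross_logprobs: observer's log-probabilities of the performer model's
    predicted tokens (simplified here). Low scores suggest machine text.
    """
    log_ppl = -obs_logprobs.mean()
    log_x_ppl = -cross_logprobs.mean()
    return log_ppl / log_x_ppl

rng = np.random.default_rng(0)
print(binoculars_score(rng.uniform(-3, -1, 100), rng.uniform(-3, -1, 100)))
```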


Language-guided Skill Learning with Temporal Variational Inference

https://openreview.net/forum?id=awo5H10K6v

Compressor summary: The algorithm uses LLMs to segment trajectories and then merges them using variational inference and an auxiliary objective to discover reusable skills for agents in different environments.


Towards Theoretical Understandings of Self-Consuming Generative Models

https://openreview.net/forum?id=aw6L8sB2Ts

Compressor summary: The paper studies how mixing real and synthetic data in generative models affects data distributions, derives bounds on the TV distance between them, and reveals a phase transition point where the distance declines.


Language Models Represent Beliefs of Self and Others

https://openreview.net/forum?id=asJTE8EBjg

Compressor summary: The study reveals how large language models represent and decode beliefs, showing their importance for social reasoning in various tasks.


Position: Opportunities Exist for Machine Learning in Magnetic Fusion Energy

https://openreview.net/forum?id=arwP5FA2dO

Compressor summary: Key research challenges in fusion energy that could benefit from Machine Learning applications are discussed, highlighting six areas where ML can contribute to advancing fusion as a carbon-free energy source.


DPN: Decoupling Partition and Navigation for Neural Solvers of Min-max Vehicle Routing Problems

https://openreview.net/forum?id=ar174skI9u

Compressor summary: The paper proposes a novel attention-based encoder for solving the min-max vehicle routing problem, which improves the efficiency and optimality of sequential planning by decoupling customer partition and navigation tasks and using an agent-permutation-symmetric loss function.


Privacy-Preserving Embedding via Look-up Table Evaluation with Fully Homomorphic Encryption

https://openreview.net/forum?id=apxON2uH4N

Compressor summary: The paper proposes an efficient algorithm for privacy-preserving word embedding using homomorphic encryption and coded inputs.


Toward Adaptive Reasoning in Large Language Models with Thought Rollback

https://openreview.net/forum?id=aoAPOOtN9E

Compressor summary: The paper introduces Thought Rollback, a new reasoning framework for LLMs that allows them to adaptively build thought structures and revise mistaken ones to solve problems better under hallucinations.


EvoluNet: Advancing Dynamic Non-IID Transfer Learning on Graphs

https://openreview.net/forum?id=anM1M5aoM8

Compressor summary: EvoluNet is a novel framework for non-IID transfer learning on dynamic graphs that leverages temporal encoding and domain unification modules to improve generalization performance.


Position: Why Tabular Foundation Models Should Be a Research Priority

https://openreview.net/forum?id=amRSBdZlw9

Compressor summary: The authors propose developing large tabular models (LTMs) that can contextualize multiple datasets, which could have significant impacts on various fields and tasks involving tabular data.


FedLMT: Tackling System Heterogeneity of Federated Learning via Low-Rank Model Training with Theoretical Guarantees

https://openreview.net/forum?id=akyElNlUVA

Compressor summary: FedLMT and pFedLMT are frameworks for federated learning that address resource heterogeneity among clients using pre-factorized low-rank models, improving model accuracy and reducing costs.


Multi-layer Rehearsal Feature Augmentation for Class-Incremental Learning

https://openreview.net/forum?id=aksdU1KOpT

Compressor summary: The paper proposes Multi-layer Rehearsal Feature Augmentation (MRFA) to improve generalization and reduce catastrophic forgetting in Class-Incremental Learning by optimizing the all-layer margin on rehearsal samples.


Exploiting Human-AI Dependence for Learning to Defer

https://openreview.net/forum?id=aiz79FxjaI

Compressor summary: The paper introduces dependent Bayes optimality and a deferral principle for learning-to-defer frameworks that exploit the dependence between models and experts, and proposes a novel consistent surrogate loss based on these concepts.


Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

https://openreview.net/forum?id=ahEm3l2P6w

Compressor summary: The paper introduces OWL, a novel pruning method for large language models that uses non-uniform layerwise sparsity ratios based on outlier features to improve performance and reduce model size.
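
A simplified sketch of the allocation idea, assuming a weight counts as an "outlier" when its magnitude exceeds `m` times its layer's mean magnitude; the centering keeps the average sparsity on target (constants are illustrative, not the paper's):

```python
import numpy as np

def owl_sparsity(layers, target_sparsity: float = 0.7, m: float = 5.0,
                 lam: float = 0.08):
    """Assign layerwise sparsity from outlier ratios (a hedged sketch).

    Layers with more outlier weights are pruned less; the centered,
    lam-bounded shift keeps the mean sparsity equal to the target.
    """
    ratios = np.array([np.mean(np.abs(w) > m * np.abs(w).mean()) for w in layers])
    shifted = ratios - ratios.mean()          # center so the mean is preserved
    scaled = lam * shifted / (np.abs(shifted).max() + 1e-12)
    return np.clip(target_sparsity - scaled, 0.0, 1.0)

rng = np.random.default_rng(0)
layers = [rng.standard_normal(1000), rng.standard_t(2, 1000), rng.laplace(size=1000)]
print(np.round(owl_sparsity(layers), 3))  # heavy-tailed layers kept denser
```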


Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

https://openreview.net/forum?id=ah1BlQcLv4

Compressor summary: Transformers can learn gradient descent and non-linear functions by implementing them in function space, enabling efficient learning for various non-linear tasks and activation choices.


Gaussian Processes on Cellular Complexes

https://openreview.net/forum?id=afnyJfQddk

Compressor summary: The paper introduces Gaussian processes on cellular complexes, a generalization of graphs that captures polyadic relations, and proposes two new kernels to model interactions between vertices, edges, and higher-order cells.


Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning

https://openreview.net/forum?id=aeXRBnLoPP

Compressor summary: The paper introduces Accelerated Policy Gradient (APG), a method to improve convergence rates in reinforcement learning by adapting Nesterov's accelerated gradient to policy optimization, and proves its theoretical convergence properties.


Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition

https://openreview.net/forum?id=ad5I6No9G1

Compressor summary: The paper explains why models keep improving on the modular addition problem even after overfitting, and suggests that this phenomenon stems from a transition from kernel-like behavior to the limiting behavior of gradient descent on deep networks.


Graph Geometry-Preserving Autoencoders

https://openreview.net/forum?id=acTLXagzqd

Compressor summary: This paper introduces a new autoencoder framework that preserves the data's geometric structure using a similarity graph and Riemannian geometry, leading to better latent representation learning.


Socialized Learning: Making Each Other Better Through Multi-Agent Collaboration

https://openreview.net/forum?id=aaeJpJw5Ur

Compressor summary: The text proposes Multi-Agent Socialized Collaboration (MASC), a method for achieving socialized learning in multi-agent systems, which prioritizes original expert classes' accuracy while acquiring new abilities.


Predicting Lagrangian Multipliers for Mixed Integer Linear Programs

https://openreview.net/forum?id=aZnZOqUOHq

Compressor summary: The paper proposes a deep learning method for finding Lagrangian Multipliers that improves bounds on Mixed Integer Linear Programs by using graph neural networks to encode and decode relaxed constraints.


Improving Open-Ended Text Generation via Adaptive Decoding

https://openreview.net/forum?id=aXD94eATtT

Compressor summary: Adaptive decoding is a mechanism that helps language models choose better candidates for tokens during generation by increasing confidence based on an entropy-based metric.
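
A sketch of entropy-guided candidate selection; the confidence value below is illustrative, not the paper's exact metric: tokens are added in probability order until confidence stops improving.

```python
import numpy as np

def adaptive_candidates(probs: np.ndarray) -> np.ndarray:
    """Sketch of adaptive candidate selection (metric here is illustrative).

    Tokens are added in probability order while each addition raises a
    confidence value that rewards covered probability mass and penalizes
    the entropy of the renormalized candidate distribution.
    """
    order = np.argsort(probs)[::-1]
    best = -np.inf
    for k in range(1, len(probs) + 1):
        p = probs[order[:k]] / probs[order[:k]].sum()
        entropy = -(p * np.log(p + 1e-12)).sum()
        conf = np.log(probs[order[:k]].sum()) - entropy
        if conf < best:
            return order[:k - 1]  # stop once confidence starts to drop
        best = conf
    return order

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(adaptive_candidates(probs))
```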


Beyond Regular Grids: Fourier-Based Neural Operators on Arbitrary Domains

https://openreview.net/forum?id=aVqqoFAavs

Compressor summary: The paper proposes a method to extend neural operators for learning solutions of PDEs to non-equispaced point distributions by efficiently evaluating spectral transformations, achieving faster training and comparable or improved accuracy.


Self-attention Networks Localize When QK-eigenspectrum Concentrates

https://openreview.net/forum?id=aRZjRj41WQ

Compressor summary: The text discusses how self-attention mechanisms in machine learning can be improved by reducing localization and addressing rank and entropy collapses for better model performance.


UP2ME: Univariate Pre-training to Multivariate Fine-tuning as a General-purpose Framework for Multivariate Time Series Analysis

https://openreview.net/forum?id=aR3uxWlZhX

Compressor summary: The paper proposes UP2ME, a general-purpose framework for multivariate time series tasks that combines univariate pre-training and multivariate fine-tuning to achieve state-of-the-art performance in forecasting, imputation and anomaly detection.


SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

https://openreview.net/forum?id=aQl4xiwVBc

Compressor summary: The paper proposes PRepBN and SLA modules to improve efficiency of transformers for computer vision and natural language tasks, achieving lower latency and similar performance compared to existing methods.


A Universal Transfer Theorem for Convex Optimization Algorithms Using Inexact First-order Oracles

https://openreview.net/forum?id=aPhwhueqjR

Compressor summary: The paper presents a method to adapt any convex optimization algorithm to work with inexact first-order information, without knowing the algorithm's details, and applies to various types of algorithms and nonconvex problems.


On the Embedding Collapse when Scaling up Recommendation Models

https://openreview.net/forum?id=aPVwOAr1aW

Compressor summary: The paper proposes a multi-embedding design to address embedding collapse, which restricts scalability in large recommendation models, and shows its effectiveness through experiments.


How Smooth Is Attention?

https://openreview.net/forum?id=aP0H8A1ywk

Compressor summary: The paper studies the Lipschitz properties of self-attention and masked self-attention in Transformers, showing how sequence length and layer normalization affect their bounds, and proposes a novel mean-field framework for masked self-attention.


Boosting Offline Optimizers with Surrogate Sensitivity

https://openreview.net/forum?id=aLSA3JH08h

Compressor summary: The text discusses improving offline optimization by developing a sensitivity measurement for surrogate models and using it to create a regularizer that enhances optimization performance.


Copula-Nested Spectral Kernel Network

https://openreview.net/forum?id=aK1FyEP2Sn

Compressor summary: CokeNet is a novel spectral density function for spectral kernel networks that enhances diversity and captures complex data dependencies, leading to better machine learning performance.


BayOTIDE: Bayesian Online Multivariate Time Series Imputation with Functional Decomposition

https://openreview.net/forum?id=aGBpiEcB8z

Compressor summary: BayOTIDE is an online imputation method that models multivariate time series as a combination of low-rank temporal factors with different patterns, using Gaussian Processes and state-space priors.


Learning to Explore for Stochastic Gradient MCMC

https://openreview.net/forum?id=aECamk9izk

Compressor summary: Multi-modal posterior distributions of Bayesian neural networks can be explored with cyclical learning-rate SGMCMC, but at high computational cost; the paper proposes a meta-learning strategy for building SGMCMC samplers that explore such posteriors efficiently, transferring well across tasks and improving sampling efficiency on image classification and beyond.


Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

https://openreview.net/forum?id=aC1LSa4nXs

Compressor summary: ConfDiff is a force-guided diffusion model for protein conformation generation that incorporates physical prior knowledge to produce diverse and accurate conformations, outperforming existing methods.


Simulation of Graph Algorithms with Looped Transformers

https://openreview.net/forum?id=aA2326y3hf

Compressor summary: This paper studies how looped transformer networks with extra attention heads can simulate various graph algorithms and proves their theoretical properties.


Rolling Diffusion Models

https://openreview.net/forum?id=a9bzTv9SzO

Compressor summary: Rolling Diffusion is a novel method for denoising temporal data that adapts to uncertainty by adding more noise to later frames, outperforming standard diffusion methods in complex scenarios like video prediction and fluid dynamics.


Critical windows: non-asymptotic theory for feature emergence in diffusion models

https://openreview.net/forum?id=a8ZpjLJuKk

Compressor summary: The authors develop theory to understand critical windows, narrow time intervals in image generation when specific features emerge, and use the framework to analyze and diagnose issues in diffusion models.


SurfPro: Functional Protein Design Based on Continuous Surface

https://openreview.net/forum?id=a8QpoEJCRI

Compressor summary: SurfPro is a method that generates functional proteins with desired surfaces and biochemical properties by encoding the geometric shape and biochemical features of a protein surface and decoding an amino acid sequence.


A New Computationally Efficient Algorithm to solve Feature Selection for Functional Data Classification in High-dimensional Spaces

https://openreview.net/forum?id=a7MW5kFFOf

Compressor summary: The paper presents a new method for selecting important features and classifying functional data in scenarios with categorical responses and multivariate longitudinal features, which is faster and more accurate than existing methods.


Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings

https://openreview.net/forum?id=a6wCNfIj8E

Compressor summary: The paper proposes functional reward encoding (FRE), a method to pre-train a generalist agent from unlabeled data using a transformer-based variational auto-encoder, enabling it to solve new tasks in a zero-shot manner with minimal supervision.


Position: What makes an image realistic?

https://openreview.net/forum?id=a6366lEzbX

Compressor summary: The text discusses the difficulty of quantifying realism in generated data and proposes a novel concept called universal critic that could help solve this problem without adversarial training.


Reward Shaping for Reinforcement Learning with An Assistant Reward Agent

https://openreview.net/forum?id=a3XFF0PGLU

Compressor summary: The paper presents a dual-agent reward shaping method that improves reinforcement learning's sample efficiency and stability in sparse-reward tasks by generating auxiliary reward signals.


Efficient Pareto Manifold Learning with Low-Rank Structure

https://openreview.net/forum?id=a2uFstsHPb

Compressor summary: The authors propose a novel multi-task learning method that uses low-rank matrices and orthogonal regularization to efficiently learn the Pareto front and achieve better performance, especially on large datasets.


PAC-Bayesian Error Bound, via Rényi Divergence, for a Class of Linear Time-Invariant State-Space Models

https://openreview.net/forum?id=a1Olc2QhPv

Compressor summary: The paper develops a PAC-Bayesian error bound for linear time-invariant stochastic state-space models, which are used in control engineering and econometrics, and have applications in recurrent neural networks.


$\mathtt{VITS}$ : Variational Inference Thompson Sampling for contextual bandits

https://openreview.net/forum?id=a1GvTbadqA

Compressor summary: The paper introduces a new Thompson sampling variant, Variational Inference Thompson Sampling (VITS), that uses Gaussian variational inference for efficient posterior approximation and achieves sub-linear regret in linear contextual bandits.


Diffusion Models Encode the Intrinsic Dimension of Data Manifolds

https://openreview.net/forum?id=a0XiA6v256

Compressor summary: The authors show that diffusion models can estimate the intrinsic dimension of data manifolds by approximating their normal bundles using the score function, which points towards the manifold when noise is low.


Ditto: Quantization-aware Secure Inference of Transformers upon MPC

https://openreview.net/forum?id=ZzXNCQGzqT

Compressor summary: Ditto is a framework that enables efficient and secure quantization-aware inference for Transformers using multi-party computation techniques, reducing computation and communication overhead.


MC-GTA: Metric-Constrained Model-Based Clustering using Goodness-of-fit Tests with Autocorrelations

https://openreview.net/forum?id=ZzFTrzo0Cp

Compressor summary: MC-GTA is a novel clustering algorithm that uses feature similarity and metric autocorrelation to improve performance, stability, and computational efficiency for various data analysis tasks.


Estimating Canopy Height at Scale

https://openreview.net/forum?id=ZzCY0fRver

Compressor summary: The paper presents a framework that uses satellite data to estimate canopy heights globally with improved accuracy compared to existing methods, enabling better ecological analyses.


Federated Self-Explaining GNNs with Anti-shortcut Augmentations

https://openreview.net/forum?id=ZxDqSBgFSM

Compressor summary: The paper proposes a method called Federated Graph Rationalization (FedGR) that uses anti-shortcut augmentations to generate explanations for predictions made by graph neural networks in federated learning settings.


AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors

https://openreview.net/forum?id=ZwrfsrCduj

Compressor summary: The paper proposes a method to distinguish between task-irrelevant visual distractors using an Implicit Action Generator (IAG) that learns the behavior of distractors and improves performance on various visual control tasks.


Position: AI-Powered Autonomous Weapons Risk Geopolitical Instability and Threaten AI Research

https://openreview.net/forum?id=ZwUThOE7Zc

Compressor summary: Machine learning in autonomous weapons systems increases the risk of low-intensity conflicts, arms race, and undermines geopolitical stability and AI research transparency.


Minimum-Norm Interpolation Under Covariate Shift

https://openreview.net/forum?id=Zw7TcnTmHj

Compressor summary: The paper investigates transfer learning in high-dimensional linear regression with benign overfitting, showing how overparameterized models behave under different types of covariate shifts.


Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features

https://openreview.net/forum?id=Zw52bJCZXc

Compressor summary: The paper introduces two new stochastic Frank-Wolfe methods for optimization problems with structured constraints that have better convergence guarantees and avoid issues of large batches or full gradients.


Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement

https://openreview.net/forum?id=ZvJ2lQQKjz

Compressor summary: MERL is a multimodal learning framework for ECGs that uses reports and text prompts to classify heart diseases without training data, outperforming eSSL methods.


The Stronger the Diffusion Model, the Easier the Backdoor: Data Poisoning to Induce Copyright Breaches Without Adjusting Finetuning Pipeline

https://openreview.net/forum?id=ZvFLbEPv6x

Compressor summary: The study proposes a stealthy backdoor attack, SilentBadDiffusion, that can induce copyright infringement in text-to-image diffusion models without controlling their training, and shows its effectiveness on various model architectures.


SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets

https://openreview.net/forum?id=ZtOXZCTgBa

Compressor summary: SeMOPO is a new model-based offline RL approach that separates latent states into endogenous and exogenous parts, reducing bias in uncertainty estimation when dealing with complex distractors.


Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

https://openreview.net/forum?id=ZtMqsSkIHX

Compressor summary: The text discusses modifying a game's rewards to achieve a specific policy goal, studying the conditions for success and proposing an algorithm to solve this problem efficiently.


GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

https://openreview.net/forum?id=Zsz9Pdfvtg

Compressor summary: GeminiFusion is a pixel-wise fusion approach for cross-modal transformers that combines intra-modal and inter-modal attentions, adapts to the complexity of the input, and achieves superior performance on multimodal tasks such as image-to-image translation, 3D object detection, and semantic segmentation.


Smoothing Proximal Gradient Methods for Nonsmooth Sparsity Constrained Optimization: Optimality Conditions and Global Convergence

https://openreview.net/forum?id=Zs3qW8Njov

Compressor summary: Smoothing Proximal Gradient Methods (SPGM) are explored as solutions to nonsmooth sparsity constrained optimization problems, with two variants (SPGM-IHT and SPGM-BCD) showing improved performance over existing methods in theory and practice.


Bridging Environments and Language with Rendering Functions and Vision-Language Models

https://openreview.net/forum?id=ZrM67ZZ5vj

Compressor summary: The paper proposes a novel method for building language-conditioned agents using vision-language models by first finding an optimal environment configuration and then using a goal-conditioned policy to reach it, achieving better zero-shot generalization than multi-task RL baselines.


Fast-Slow Test-Time Adaptation for Online Vision-and-Language Navigation

https://openreview.net/forum?id=Zos5wsaB5r

Compressor summary: The paper proposes a method for online Vision-and-Language Navigation that adapts model parameters using fast-slow tests and shows improved performance on four benchmarks.


Discounted Adaptive Online Learning: Towards Better Regularization

https://openreview.net/forum?id=ZoTIdyExx6

Compressor summary: The text studies an adaptive online optimization algorithm for online learning in changing environments and shows how it improves regularization and uncertainty prediction.


Probabilistic Constrained Reinforcement Learning with Formal Interpretability

https://openreview.net/forum?id=Zo9zXdVhW2

Compressor summary: AWaVO is a novel method for reinforcement learning that provides interpretability through convergence guarantee, training transparency, and intrinsic decision-interpretation, achieving good performance in simulation and real-world quadrotor tasks.


Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time

https://openreview.net/forum?id=Zn44XGFGam

Compressor summary: The paper analyzes the performance of two-layer ReLU networks with weight decay regularization and their convex relaxations, showing that random training data can lead to a small optimality gap and improved convergence rates for local gradient methods.


Optimal Acceleration for Minimax and Fixed-Point Problems is Not Unique

https://openreview.net/forum?id=ZeF75iQcAc

Compressor summary: The paper presents new acceleration mechanisms for minimax optimization and fixed-point problems with the same optimality as existing anchor-based methods, but different characteristics.


Generalization Bound and New Algorithm for Clean-Label Backdoor Attack

https://openreview.net/forum?id=ZdqiT0McON

Compressor summary: The paper derives generalization bounds for the clean-label backdoor attack scenario and proposes a new attack method using a combination of adversarial noise and indiscriminate poison.


Score-Based Causal Discovery of Latent Variable Causal Models

https://openreview.net/forum?id=ZdSe1qnuia

Compressor summary: The paper proposes new score-based methods for identifying causal structures involving latent variables, addressing challenges of existing constraint-based methods and providing identifiability guarantees.


OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization

https://openreview.net/forum?id=ZctlF8RlV4

Compressor summary: The paper introduces a novel optimization framework for one-shot structured pruning that improves efficiency and accuracy on vision and language models, including very large ones with tens of billions of parameters.


MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions

https://openreview.net/forum?id=Zc22RDtsvP

Compressor summary: The paper introduces MagicLens, a self-supervised image retrieval model that supports open-ended text instructions to find images with rich relations beyond visual similarity by using implicit relations from web pages.


Few-Shot Character Understanding in Movies as an Assessment to Meta-Learning of Theory-of-Mind

https://openreview.net/forum?id=ZZ7UKgK4c1

Compressor summary: The paper introduces a new dataset and prompting approach to study how AI can understand characters' mental states in movies, and shows that existing models perform worse than humans in this task.


CaPS: Collaborative and Private Synthetic Data Generation from Distributed Sources

https://openreview.net/forum?id=ZXsNkm3bxu

Compressor summary: The text proposes a framework for generating synthetic data from distributed sources using secure multi-party computation and differential privacy, allowing data sharing without entrusting raw data to a central entity.


Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games

https://openreview.net/forum?id=ZVmMV3AHjC

Compressor summary: The paper proposes a model-free algorithm for two-player Markov games that achieves optimal sample complexity, using variance reduction and reference value functions.


Safe Exploration in Dose Finding Clinical Trials with Heterogeneous Participants

https://openreview.net/forum?id=ZUXvpIrz5l

Compressor summary: SAFE-T is an adaptive dose-finding procedure that uses Bayesian optimization to learn non-parametric models for toxicity and efficacy, while satisfying safety constraints and improving utility for heterogeneous participants in early phase clinical trials.


MorphGrower: A Synchronized Layer-by-layer Growing Approach for Plausible Neuronal Morphology Generation

https://openreview.net/forum?id=ZTN866OsGx

Compressor summary: MorphGrower is a learning-based method that mimics neuron growth to generate realistic and valid neural morphologies, outperforming previous approaches.


Online Learning and Information Exponents: The Importance of Batch size & Time/Complexity Tradeoffs

https://openreview.net/forum?id=ZSQAf5YlvN

Compressor summary: The optimal batch size for training two-layer neural networks with SGD depends on the target's hardness and information exponent, and a new protocol called Correlation loss SGD can improve training speed beyond traditional batch sizes.


On Convergence of Incremental Gradient for Non-convex Smooth Functions

https://openreview.net/forum?id=ZRMQX6aTUS

Compressor summary: The paper studies SGD algorithms with arbitrary data ordering and improves convergence guarantees for incremental gradient and single shuffle SGD.


Cooperative Graph Neural Networks

https://openreview.net/forum?id=ZQcqXCuoxD

Compressor summary: The paper introduces a flexible and dynamic message-passing framework for graph neural networks, where nodes can choose to listen, broadcast, or isolate, enabling more effective exploration of the graph topology.
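
A minimal sketch of the action-gated message-passing idea, using soft gates as a relaxation of the paper's discrete listen/broadcast/isolate actions; `listen_gate` and `broadcast_gate` are assumed small networks with sigmoid outputs:

    import torch

    def cooperative_mp(H, A, listen_gate, broadcast_gate):
        # H: (n, d) node features; A: (n, n) normalized adjacency.
        # Each node decides how much to listen (aggregate incoming
        # messages) and how much to broadcast (send messages out).
        l = listen_gate(H)            # (n, 1) values in [0, 1]
        b = broadcast_gate(H)         # (n, 1) values in [0, 1]
        messages = A @ (b * H)        # only broadcasting nodes contribute
        return H + l * messages       # isolate = both gates near zero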


Sampling is as easy as keeping the consistency: convergence guarantee for Consistency Models

https://openreview.net/forum?id=ZPiEIhQpos

Compressor summary: The paper provides a convergence guarantee for Consistency Models, one-step generative models that can match Diffusion Models in sample quality.


AdsorbDiff: Adsorbate Placement via Conditional Denoising Diffusion

https://openreview.net/forum?id=ZMgpE58PMj

Compressor summary: The authors propose a new method to find the best position and orientation for adsorbates on a slab, which can accelerate the discovery of novel catalysts by using denoising diffusion, machine learning force field, and DFT.


The Fundamental Limits of Least-Privilege Learning

https://openreview.net/forum?id=ZGEICuuUJo

Compressor summary: The text formalizes the least-privilege principle for machine learning and proves its trade-off between utility and leakage, showing it is not possible to learn highly useful representations that prevent inference of unrelated attributes.


Getting the most out of your tokenizer for pre-training and domain adaptation

https://openreview.net/forum?id=ZFYBnLljtT

Compressor summary: This paper investigates how different tokenizer designs affect the performance of language models for code generation tasks, and provides recommendations for optimizing them.


A Unified Adaptive Testing System Enabled by Hierarchical Structure Search

https://openreview.net/forum?id=ZFRrOiZruJ

Compressor summary: The paper presents a new adaptive testing system (ATS) framework that learns from data to create optimal personalized exams, reducing errors and question counts while maintaining accuracy.


NExT-Chat: An LMM for Chat, Detection and Segmentation

https://openreview.net/forum?id=ZAW37OZ6ig

Compressor summary: The paper introduces the pix2emb method for object location modeling in LMMs, enabling them to handle various multimodal tasks using different location formats.


Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach

https://openreview.net/forum?id=Z7MzVDFWDV

Compressor summary: The paper proposes a method to identify and estimate a parameter in missing not at random data by fusing it with auxiliary data that is missing at random.


How Universal Polynomial Bases Enhance Spectral Graph Neural Networks: Heterophily, Over-smoothing, and Over-squashing

https://openreview.net/forum?id=Z2LH6Va7L2

Compressor summary: UniFilter is a novel GNN that adapts polynomial bases to the heterophily degree of a graph, improving convolution and propagation, and facilitating graph explanation.


Learning Reward for Robot Skills Using Large Language Models via Self-Alignment

https://openreview.net/forum?id=Z19JQ6WFtJ

Compressor summary: The paper proposes a method that uses Large Language Models (LLM) and self-alignment to learn reward functions more efficiently for robot skills without human guidance.


Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption

https://openreview.net/forum?id=Z0S6fUdW68

Compressor summary: The paper proposes two algorithms for adversarial corruption in model-based reinforcement learning, one for online and one for offline settings, and provides theoretical guarantees for their performance.


Neural Tangent Kernels for Axis-Aligned Tree Ensembles

https://openreview.net/forum?id=YxmcEfcgp3

Compressor summary: The paper studies how imposing the axis-aligned constraint on soft trees affects their learning behavior and explores various tree architectures using Neural Tangent Kernel analysis and Multiple Kernel Learning.


Transolver: A Fast Transformer Solver for PDEs on General Geometries

https://openreview.net/forum?id=Ywl6pODXjB

Compressor summary: Transolver uses Physics-Attention to learn intrinsic physical states from discretized geometries and improve solving partial differential equations.


Non-stationary Online Convex Optimization with Arbitrary Delays

https://openreview.net/forum?id=YvPNwLedpQ

Compressor summary: The paper proposes an algorithm for online convex optimization with arbitrary delays in non-stationary environments and shows its performance bounds and improvements over existing methods.


AND: Audio Network Dissection for Interpreting Deep Acoustic Models

https://openreview.net/forum?id=YvAyOYeGlo

Compressor summary: The paper introduces AND, a framework that explains acoustic neurons' behaviors and features using natural language summaries from large language models.


Model-Free Robust $\phi$-Divergence Reinforcement Learning Using Both Offline and Online Data

https://openreview.net/forum?id=Yug1IEkvcb

Compressor summary: The paper proposes two model-free algorithms for learning optimal robust policies in high-dimensional systems using $\phi$-divergences, one that uses only historical data and another that combines historical and online sampling.


CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling

https://openreview.net/forum?id=YuNFJSEkTi

Compressor summary: CasCast is a cascaded framework that uses a diffusion transformer to improve precipitation nowcasting and forecasting of extreme events.


Sample Average Approximation for Conditional Stochastic Optimization with Dependent Data

https://openreview.net/forum?id=YuGnRORkJm

Compressor summary: The paper proposes a Sample Average Approximation method for Conditional Stochastic Optimization with dependent data and provides theoretical guarantees on its consistency and sample complexity, along with a finite-sample guarantee, under mild conditions.


On the Generalization of Equivariant Graph Neural Networks

https://openreview.net/forum?id=Yqj3DzIC79

Compressor summary: The authors provide the first generalization bound for EGNNs, showing how spectral norms and layer weights affect their performance and propose a new regularizer based on these insights.


Principled Preferential Bayesian Optimization

https://openreview.net/forum?id=YqMOM5W9GF

Compressor summary: We propose an optimistic algorithm for preferential Bayesian optimization using preference feedback and a confidence set, achieving an information-theoretic bound on cumulative regret and guaranteed convergence rate, outperforming existing heuristics without guarantees.


Beyond the Norms: Detecting Prediction Errors in Regression Models

https://openreview.net/forum?id=YqIIhl2ToH

Compressor summary: The paper proposes a method to measure and quantify uncertainty in regression algorithms, helping detect unreliable behavior and improve error detection.


Measures of diversity and space-filling designs for categorical data

https://openreview.net/forum?id=YoUb2vW9WP

Compressor summary: The paper proposes new methods for selecting diverse subsets of categorical data using combinatorial optimization and submodular optimization techniques.


Unsupervised Evaluation of Code LLMs with Round-Trip Correctness

https://openreview.net/forum?id=YnFuUX08CE

Compressor summary: Round-trip correctness (RTC) is a new evaluation method for large language models that allows assessing their performance on various code-related tasks without manual curation, by checking if the model's prediction and synthesis match semantically.
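
A minimal sketch of the round-trip check, assuming a generic text-completion callable `llm` and a list of executable unit tests (both hypothetical stand-ins for the paper's setup):

    def round_trip_correct(code: str, tests, llm) -> bool:
        # Forward pass: describe the code in natural language.
        description = llm("Describe what this function does:\n" + code)
        # Backward pass: regenerate code from the description alone.
        regenerated = llm("Write a Python function that " + description)
        namespace = {}
        try:
            exec(regenerated, namespace)  # assumes a sandboxed environment
        except Exception:
            return False
        # Semantic match: the regenerated code must pass the original tests.
        return all(test(namespace) for test in tests)

    # The RTC score of a model is the fraction of samples surviving the loop.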


Unveiling the Cycloid Trajectory of EM Iterations in Mixed Linear Regression

https://openreview.net/forum?id=Yn8xnK90mS

Compressor summary: The paper studies how fast the EM algorithm converges for two-component mixed linear regression, providing explicit expressions and characterizing the trajectory of iterations, leading to improved error bounds and convergence exponent estimates.


Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

https://openreview.net/forum?id=YlcSyCz21c

Compressor summary: The paper proposes a method called PaRe that uses a gating mechanism to transfer knowledge from large-scale pretrained models to various target modalities by creating intermediate data-rich modalities and improving cross-modal fine-tuning stability and performance.


Supervised Matrix Factorization: Local Landscape Analysis and Applications

https://openreview.net/forum?id=YlJy1FcM9E

Compressor summary: The paper analyzes the optimization landscape of supervised matrix factorization (SMF), derives applications, and proposes a block coordinate descent algorithm with convergence guarantees and a GPU-friendly neural implementation.


Stability and Multigroup Fairness in Ranking with Uncertain Predictions

https://openreview.net/forum?id=YiblhkVl2w

Compressor summary: The paper studies ranking functions that account for predictors' uncertainty and shows they can achieve both stability and fairness in classification tasks.


Improving Interpretation Faithfulness for Vision Transformers

https://openreview.net/forum?id=YdwwWRX20q

Compressor summary: Faithful ViTs (FViTs) improve the faithfulness and robustness of ViT explanations by applying Denoised Diffusion Smoothing (DDS) to self-attention vectors and prediction distributions.


Unified Training of Universal Time Series Forecasting Transformers

https://openreview.net/forum?id=Yd8eHMY1wz

Compressor summary: The text proposes Moirai, a new time series Transformer model that addresses challenges in universal forecasting by using a pre-trained Large Time Series Model on diverse datasets and achieves competitive or superior performance.


Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

https://openreview.net/forum?id=YbHCqn4qF4

Compressor summary: The paper proposes a new efficient generic vision backbone, Vim, using bidirectional Mamba blocks that improve performance and memory efficiency over existing Transformer-based models on various computer vision tasks.


Harmonizing Generalization and Personalization in Federated Prompt Learning

https://openreview.net/forum?id=YYwERRXsJW

Compressor summary: FedPGP is a method for federated prompt learning that balances generalization and personalization using CLIP guidance and low-rank adaptation.


To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO

https://openreview.net/forum?id=YWuSLBkfOw

Compressor summary: The paper proposes a small network (TempNet) to predict personalized temperature for large foundation models, which enhances their performance and can be applied to new tasks.


OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models

https://openreview.net/forum?id=YT1dtdLvSN

Compressor summary: OptiMUS is an LLM-based agent that can formulate and solve linear programming problems from natural language descriptions, outperforming existing methods on easy and hard datasets.


RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

https://openreview.net/forum?id=YSoMmNWZZx

Compressor summary: RL-VLM-F automatically generates reward functions for agents to learn new tasks from text and visual inputs by querying vision-language models for preference feedback.


Two-Stage Shadow Inclusion Estimation: An IV Approach for Causal Inference under Latent Confounding and Collider Bias

https://openreview.net/forum?id=YRWdiaupCr

Compressor summary: The paper proposes a novel Two-Stage Shadow Inclusion (2SSI) method that addresses latent confounding bias and collider bias in causal inference using the treatment residual as a shadow variable.


Conformal Prediction with Learned Features

https://openreview.net/forum?id=YPbcUBcTAk

Compressor summary: PLCP is a novel framework that learns uncertainty-guided features for improving conditional validity of prediction sets using calibration data, machine learning models, and alternating gradient descent, with theoretical and empirical results showing its superior performance.


Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation

https://openreview.net/forum?id=YNvGFaOG1p

Compressor summary: The paper presents tight convergence bounds for actor-critic and natural actor-critic algorithms in reinforcement learning with compatible function approximation, addressing challenges such as stochastic bias and non-ergodicity.


What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks

https://openreview.net/forum?id=YNbCbcGyXE

Compressor summary: The study investigates how varying depth of transformer architecture affects its performance in different sequence learning tasks, showing that at least two attention layers are needed for reasoning and generalization, while three may be required for contextual generalization.


Interpretability Illusions in the Generalization of Simplified Models

https://openreview.net/forum?id=YJWlUMW6YP

Compressor summary: The study warns that simplifying deep learning models may lead to inaccurate predictions when the models encounter out-of-distribution data, as they might overfit the training set.


Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning

https://openreview.net/forum?id=YEQM0asWCH

Compressor summary: CPR is a multi-task learning framework that enables interpretability and adaptability in complex decision processes by generating context-specific policies on-demand as observations change.


IBD-PSC: Input-level Backdoor Detection via Parameter-oriented Scaling Consistency

https://openreview.net/forum?id=YCzbfs2few

Compressor summary: IBD-PSC is a simple input-level backdoor detection method for deep neural networks that filters out malicious images using parameter-oriented scaling consistency, with an adaptive selection of batch normalization layers.


Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Features Model

https://openreview.net/forum?id=YBetKvUlF7

Compressor summary: The paper studies how deep neural networks behave when trained on imbalanced datasets, showing that their features collapse to a structure of orthogonal vectors with lengths depending on the number of training samples.


Subhomogeneous Deep Equilibrium Models

https://openreview.net/forum?id=YBXwr7wF7i

Compressor summary: The paper analyzes the existence and uniqueness of fixed points for implicit-depth neural networks using subhomogeneous operators and Perron-Frobenius theory, leading to a more flexible framework for well-defined networks.


On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box

https://openreview.net/forum?id=YB1O99gK7b

Compressor summary: GEEX is a new method for explaining deep learning models without accessing their internal workings, which produces gradient-like explanations using only query-level access and has proven theoretical and empirical advantages over existing black-box methods.


Random Latent Exploration for Deep Reinforcement Learning

https://openreview.net/forum?id=Y9qzwNlKVU

Compressor summary: RLE is a new exploration technique for deep RL that adds structured random rewards to original task rewards in random states to improve performance on Atari and IsaacGym benchmarks.


Emergence of In-Context Reinforcement Learning from Noise Distillation

https://openreview.net/forum?id=Y8KsHT1kTV

Compressor summary: AD$^\varepsilon$ is a new method for in-context Reinforcement Learning that uses noise injection to learn from unlabeled data and improve performance without requiring optimal policies.


GeoMFormer: A General Architecture for Geometric Molecular Representation Learning

https://openreview.net/forum?id=Y5Zi59N265

Compressor summary: The text introduces GeoMFormer, a novel Transformer-based molecular model that learns invariant and equivariant features for molecular systems using cross-attention modules.


Nash Learning from Human Feedback

https://openreview.net/forum?id=Y5AmNYiyCQ

Compressor summary: The study introduces NLHF, an alternative pipeline for fine-tuning LLMs using pairwise human feedback that aims to align them better with human preferences.


Using Uncertainty Quantification to Characterize and Improve Out-of-Domain Learning for PDEs

https://openreview.net/forum?id=Y50K6DSrWo

Compressor summary: DiverseNO uses multiple neural operator heads to improve uncertainty estimates and out-of-domain performance, while Operator-ProbConserv updates the model with well-calibrated UQ estimates.


Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

https://openreview.net/forum?id=Y4wxCICbD0

Compressor summary: The paper proposes Linear Alignment, a new algorithm that aligns AI assistants with human preferences in one step, without needing data annotation or model training, improving their performance and efficiency.


CuTS: Customizable Tabular Synthetic Data Generation

https://openreview.net/forum?id=Y4VgJfbjfl

Compressor summary: CuTS is a novel framework for generating customizable synthetic tabular data that supports various requirements and outperforms existing approaches in several tasks.


Structured Inverse-Free Natural Gradient Descent: Memory-Efficient & Numerically-Stable KFAC

https://openreview.net/forum?id=Y2wRKE0Qor

Compressor summary: SINGD is a memory-efficient and numerically robust second-order method for neural net training that performs well even in low precision, addressing limitations of KFAC and outperforming AdamW.


Towards a Self-contained Data-driven Global Weather Forecasting Framework

https://openreview.net/forum?id=Y2WorV5ag6

Compressor summary: The paper proposes a new data-driven weather forecasting framework that combines an AI model with a traditional data assimilation algorithm to generate accurate and efficient global weather forecasts.


Prediction Accuracy of Learning in Games : Follow-the-Regularized-Leader meets Heisenberg

https://openreview.net/forum?id=Y0sH9HGMwq

Compressor summary: The paper studies how covariance information and uncertainty affect prediction accuracy in zero-sum games, using deterministic learning dynamics and different discretization methods.


Sliced Wasserstein with Random-Path Projecting Directions

https://openreview.net/forum?id=XyxuhLtFA2

Compressor summary: The paper proposes an optimization-free method for slicing distribution selection that speeds up Monte Carlo estimation using random-path projecting direction and two variants of sliced Wasserstein.
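
For reference, the baseline Monte Carlo estimator the paper speeds up looks roughly like the sketch below; the paper's contribution replaces the uniform random directions here with random-path projecting directions:

    import torch

    def sliced_wasserstein(x, y, n_proj=128, p=2):
        # x, y: (n, d) point clouds with equal sample sizes.
        d = x.shape[1]
        theta = torch.randn(n_proj, d)
        theta = theta / theta.norm(dim=1, keepdim=True)  # uniform directions
        # 1-D optimal transport on each projection reduces to sorting.
        xp, _ = torch.sort(x @ theta.T, dim=0)
        yp, _ = torch.sort(y @ theta.T, dim=0)
        return (((xp - yp).abs() ** p).mean()) ** (1.0 / p)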


Robust Learning-Augmented Dictionaries

https://openreview.net/forum?id=XyhgssAo5b

Compressor summary: RobustSL is a novel data structure that uses predictions of access frequencies to implement optimal and robust dictionaries with logarithmic runtime.


Realistic Unsupervised CLIP Fine-tuning with Universal Entropy Optimization

https://openreview.net/forum?id=XxCfToC9pJ

Compressor summary: The paper proposes Universal Entropy Optimization (UEO), a method that improves CLIP's performance on downstream tasks by adjusting textual prompts and visual transformations for out-of-distribution detection and known class recognition.


Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning

https://openreview.net/forum?id=XwnABAdH5y

Compressor summary: The paper proposes a reinforcement learning algorithm that helps language models become more trustworthy in retrieval augmentation without explicit supervision.


Generalization Analysis of Stochastic Weight Averaging with General Sampling

https://openreview.net/forum?id=XwVkqvyziD

Compressor summary: The paper analyzes stochastic weight averaging method's advantages over stochastic gradient descent in non-convex and realistic scenarios, using mathematical induction to derive stability and generalization bounds.


Probability Distribution of Hypervolume Improvement in Bi-objective Bayesian Optimization

https://openreview.net/forum?id=XvmooikuHE

Compressor summary: The paper derives an exact expression for the probability distribution of hypervolume improvement in bi-objective optimization using Gaussian process models and proposes a new acquisition function, $\varepsilon$-probability of hypervolume improvement, which outperforms other methods in high uncertainty settings.


Autoencoding Conditional Neural Processes for Representation Learning

https://openreview.net/forum?id=XuQPA4D396

Compressor summary: PPS-VAE is a model that learns which pixels to observe in order to improve contextual image completion and extract meaningful information from images.


Transferring Knowledge From Large Foundation Models to Small Downstream Models

https://openreview.net/forum?id=XtDJaSe8jE

Compressor summary: AFT adaptively transfers useful pre-trained features to small, task-specific downstream models, improving performance and allowing combination of multiple pre-trained models.


How Learning by Reconstruction Produces Uninformative Features For Perception

https://openreview.net/forum?id=XsDWw1Mn2p

Compressor summary: The paper discusses how input space reconstruction can lead to misalignment between learning to reconstruct and learning for perception, and suggests that learning by denoising might be a better approach.


Non-confusing Generation of Customized Concepts in Diffusion Models

https://openreview.net/forum?id=XoencoHWy7

Compressor summary: CLIF is a method to improve text-guided diffusion models for generating customized concepts by fine-tuning CLIP using contrastive learning.


Value-Evolutionary-Based Reinforcement Learning

https://openreview.net/forum?id=XobPpcN4yZ

Compressor summary: VEB-RL combines evolutionary algorithms with value-based reinforcement learning, enhancing sample efficiency and performance for RL optimization.


How to Make the Gradients Small Privately: Improved Rates for Differentially Private Non-Convex Optimization

https://openreview.net/forum?id=XoSF46Pc2e

Compressor summary: The text describes a framework to design differentially private algorithms that efficiently find approximate solutions for various classes of non-convex loss functions.


Pseudo-Calibration: Improving Predictive Uncertainty Estimation in Unsupervised Domain Adaptation

https://openreview.net/forum?id=XnsI1HKAKC

Compressor summary: PseudoCal is a novel post-hoc calibration framework for unsupervised domain adaptation that uses inference-stage mixup to generate labeled pseudo-target data, addressing the challenge of poorly calibrated predictive uncertainty on target data.


Generative Marginalization Models

https://openreview.net/forum?id=XmLNDlQuzO

Compressor summary: Marginalization models are a new type of generative model that can quickly and flexibly approximate arbitrary marginal probabilities for high-dimensional discrete data, using energy-based training to enable any-order generative modeling.


MAGNOLIA: Matching Algorithms via GNNs for Online Value-to-go Approximation

https://openreview.net/forum?id=XlgeQ47Ra9

Compressor summary: The paper proposes a graph neural network to solve online Bayesian bipartite matching problems, which estimate expected weights of matchings and leverage local graph structures.


A Closer Look at the Limitations of Instruction Tuning

https://openreview.net/forum?id=XkHJo8iXGQ

Compressor summary: The paper explores the limitations of Instruction Tuning (IT) for large language models, showing that it fails to enhance knowledge or skills and can even degrade response quality.


Neuro-Visualizer: A Novel Auto-Encoder-Based Loss Landscape Visualization Method With an Application in Knowledge-Guided Machine Learning

https://openreview.net/forum?id=XiemSZpvh0

Compressor summary: The paper introduces Neuro-Visualizer, a new auto-encoder-based method for visualizing neural network loss landscapes that is more flexible and accurate than existing linear methods and provides useful insights for knowledge-guided machine learning applications.


LeaPformer: Enabling Linear Transformers for Autoregressive and Simultaneous Tasks via Learned Proportions

https://openreview.net/forum?id=XhH1OKLANY

Compressor summary: LeaPformer is a new approach that uses dynamic proportions instead of fixed positions to improve linearized transformers' performance in various tasks.


Graph Structure Extrapolation for Out-of-Distribution Generalization

https://openreview.net/forum?id=Xgrey8uQhr

Compressor summary: The paper proposes a novel data augmentation method for graph out-of-distribution generalization that extrapolates structure spaces to generate unseen graph data without compromising causal mechanisms.


On the Minimal Degree Bias in Generalization on the Unseen for non-Boolean Functions

https://openreview.net/forum?id=Xeh8171Fce

Compressor summary: The study explores how random feature models and Transformers learn in different domain settings and shows that minimal-degree interpolators are only learned in specific cases, such as the Boolean setting with roots of unity.


FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler

https://openreview.net/forum?id=XecUTmB9yD

Compressor summary: FedCal is a novel approach for local and global calibration in federated learning that improves prediction accuracy and reduces calibration error by leveraging client-specific scalers and weight averaging.


Position: Benchmarking is Limited in Reinforcement Learning Research

https://openreview.net/forum?id=Xe7n2ZqpBP

Compressor summary: The paper discusses the challenges of conducting reliable reinforcement learning experiments and suggests an alternative approach due to high computational costs.


On the Last-Iterate Convergence of Shuffling Gradient Methods

https://openreview.net/forum?id=Xdy9bjwHDu

Compressor summary: The paper establishes improved last-iterate convergence guarantees for shuffling gradient methods, a popular machine learning technique, across several optimization settings.
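
A minimal sketch of one epoch of a shuffling gradient method (random reshuffling when `shuffle=True`, fixed-order incremental gradient otherwise); the paper's guarantees concern the last iterate after many such epochs:

    import torch

    def shuffling_epoch(params, data, loss_fn, lr, shuffle=True):
        # params: list of tensors with requires_grad=True.
        n = len(data)
        order = torch.randperm(n) if shuffle else torch.arange(n)
        for i in order:                     # each sample visited exactly once
            loss = loss_fn(params, data[i])
            grads = torch.autograd.grad(loss, params)
            with torch.no_grad():
                for p, g in zip(params, grads):
                    p -= lr * g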


Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF

https://openreview.net/forum?id=Xb3IXEBYuw

Compressor summary: The paper proposes a new algorithmic framework for solving bilevel RL problems with dynamic objective functions using penalty formulation and shows its effectiveness through simulations.


CW Complex Hypothesis for Image Data

https://openreview.net/forum?id=XXioxiADDC

Compressor summary: The authors propose a new hypothesis that image data is distributed in "manifolds with skeletons" to explain why diffusion models struggle to generate detailed shapes like human hands, and they support their hypothesis through visualization and testing on natural images.


Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms

https://openreview.net/forum?id=XWkRyIjYDp

Compressor summary: The paper analyzes the stability and generalization of stochastic compositional optimization algorithms using statistical learning theory, introducing a new concept called "compositional uniform stability" and deriving dimension-independent excess risk bounds for two popular algorithms.


Magicoder: Empowering Code Generation with OSS-Instruct

https://openreview.net/forum?id=XUeoOBid3x

Compressor summary: Magicoder is an open-source LLM for code that uses OSS-Instruct to create realistic and controllable synthetic instruction data, outperforming other models on various coding benchmarks.


Towards a Better Theoretical Understanding of Independent Subnetwork Training

https://openreview.net/forum?id=XUc29ydmLX

Compressor summary: The text discusses how Independent Subnetwork Training (IST) addresses communication and memory issues in large-scale machine learning by analyzing its optimization performance on a quadratic model.


Parameter-Efficient Fine-Tuning with Discrete Fourier Transform

https://openreview.net/forum?id=XUOHKSsurt

Compressor summary: FourierFT is a method that compresses trainable parameters in fine-tuning foundation models using the Fourier transform, achieving comparable or better performance than LoRA with fewer parameters.
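
A minimal sketch of the spectral parameterization, assuming randomly fixed coefficient locations and a plain inverse FFT (the paper's exact selection and scaling may differ):

    import torch

    class FourierDelta(torch.nn.Module):
        def __init__(self, out_dim, in_dim, n_coeffs=256):
            super().__init__()
            self.shape = (out_dim, in_dim)
            # Fixed random spectral positions; only their values are trained.
            idx = torch.randperm(out_dim * in_dim)[:n_coeffs]
            self.register_buffer("idx", idx)
            self.coeffs = torch.nn.Parameter(torch.zeros(n_coeffs))

        def forward(self):
            spectrum = torch.zeros(self.shape[0] * self.shape[1],
                                   device=self.coeffs.device)
            spectrum[self.idx] = self.coeffs
            # Dense weight update recovered from n_coeffs spectral entries.
            return torch.fft.ifft2(spectrum.view(self.shape)).real

    # Usage during fine-tuning: W_effective = W_frozen + fourier_delta()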


Harmonic Self-Conditioned Flow Matching for joint Multi-Ligand Docking and Binding Site Design

https://openreview.net/forum?id=XTrMY9sHKF

Compressor summary: HarmonicFlow is a simple and general method to generate 3D protein-ligand binding structures, while FlowSite improves it further by jointly designing protein pockets and ligand structures for better binding site design.


Clifford-Steerable Convolutional Neural Networks

https://openreview.net/forum?id=XTglHJjzQI

Compressor summary: Clifford-Steerable Convolutional Neural Networks (CS-CNNs) are a new type of equivariant CNNs that work with multivector fields and outperform baseline methods in fluid dynamics and relativistic electrodynamics.


Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States

https://openreview.net/forum?id=XT6iF8FDZx

Compressor summary: The paper investigates how policy gradient in reinforcement learning biases controllers towards unseen initial states based on exploration during training, and suggests selecting initial states for training to improve real-world performance.


Fair Classification with Partial Feedback: An Exploration-Based Data Collection Approach

https://openreview.net/forum?id=XSsoggg8pz

Compressor summary: The approach trains classifiers using available data and explores subpopulations to collect outcome data, ensuring fairness and convergence while minimally reducing predictive accuracy.


Network Tight Community Detection

https://openreview.net/forum?id=XQz7ytgETQ

Compressor summary: The text introduces a new community detection method (TCD) that identifies tight communities without including scattered nodes, which improves accuracy and reveals biological implications in networks.


Stability Evaluation through Distributional Perturbation Analysis

https://openreview.net/forum?id=XPP6K57bop

Compressor summary: The paper proposes a stability evaluation method using optimal transport that can handle data corruptions and sub-population shifts in real-world scenarios, and provides convex formulations and computational methods for different loss functions.


From Neurons to Neutrons: A Case Study in Interpretability

https://openreview.net/forum?id=XMlUlY7ONf

Compressor summary: Mechanistic interpretability (MI) lets us understand how neural networks learn meaningful low-dimensional representations of complex data, which can reveal insights about the underlying problem and help derive new knowledge.


Test-Time Degradation Adaptation for Open-Set Image Restoration

https://openreview.net/forum?id=XLlQb24X2o

Compressor summary: The paper proposes a test-time adaptation framework for open-set image restoration that adapts to unknown degradations using a diffusion model and an adapter.


Bounding the Excess Risk for Linear Models Trained on Marginal-Preserving, Differentially-Private, Synthetic Data

https://openreview.net/forum?id=XKxuTZRCXq

Compressor summary: The paper proposes using differentially-private synthetic data to train ML models without revealing sensitive information from real data, and provides bounds on the risk of linear models trained on synthetic data.


Risk-Sensitive Reward-Free Reinforcement Learning with CVaR

https://openreview.net/forum?id=XGq30hC5MW

Compressor summary: The paper proposes a novel risk-sensitive exploration framework for reinforcement learning based on Conditional Value-at-Risk (CVaR), which works for any reward function and has near-optimal sample complexity.


Enforcing Constraints in RNA Secondary Structure Predictions: A Post-Processing Framework Based on the Assignment Problem

https://openreview.net/forum?id=XGGcnKelda

Compressor summary: The paper proposes a new algorithm for post-processing ML predictions on RNA secondary structures that ensures biological relevance and improves performance.


Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities

https://openreview.net/forum?id=XDz9leJ9iK

Compressor summary: The paper discusses how foundation models could worsen existing disparities against marginalized communities in various aspects of AI and suggests ways to address these issues.


Enhancing Implicit Shape Generators Using Topological Regularizations

https://openreview.net/forum?id=XBNhJQU84y

Compressor summary: The paper proposes a method to fix topological artifacts in 3D shape generative models by using topological regularization losses on an implicit shape generator.


Mixtures of Experts Unlock Parameter Scaling for Deep RL

https://openreview.net/forum?id=X9VMhfFxwn

Compressor summary: The paper shows that using Soft MoE modules in value-based networks improves the scalability and performance of reinforcement learning models.


OT-CLIP: Understanding and Generalizing CLIP via Optimal Transport

https://openreview.net/forum?id=X8uQ1TslUc

Compressor summary: The authors analyze CLIP using Optimal Transport and propose new losses for image and text tasks, as well as a graph-based inference method for zero-shot classification and related subtasks.


Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency

https://openreview.net/forum?id=X8Ha2NiQcy

Compressor summary: Sparse Iso-FLOP Transformations (Sparse-IFT) improve the accuracy of dense neural network models by using sparsity efficiently while keeping FLOPs fixed.


DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

https://openreview.net/forum?id=X7UnDevHOM

Compressor summary: The paper proposes a stable and efficient auto-regressive denoising pre-training strategy and a flexible, scalable model architecture based on Fourier attention for neural operators on PDE data, achieving state-of-the-art results on 10+ PDE benchmarks and improving generalization to diverse downstream tasks.


Finite Volume Features, Global Geometry Representations, and Residual Training for Deep Learning-based CFD Simulation

https://openreview.net/forum?id=WzD4a5ufN8

Compressor summary: This paper proposes new geometric representations and features for graph neural network-based computational fluid dynamics simulations, improving accuracy with low-resolution data.


Online Cascade Learning for Efficient Inference over Streams

https://openreview.net/forum?id=Wz4lgc8dsN

Compressor summary: Online cascade learning uses smaller models to imitate large language models, reducing inference costs while maintaining accuracy in data stream processing tasks.
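
A minimal sketch of the deferral rule, assuming `student` returns a label with a confidence score and `llm` is the expensive fallback (both hypothetical callables):

    def cascade_predict(x, student, llm, threshold=0.9):
        label, confidence = student(x)
        if confidence >= threshold:
            return label               # cheap path
        answer = llm(x)                # expensive fallback
        # Online step (not shown): train the student on (x, answer) so
        # fewer future queries are deferred.
        return answer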


Best Arm Identification for Stochastic Rising Bandits

https://openreview.net/forum?id=WwLtwPHmSM

Compressor summary: The paper studies sequential decision-making problems with improving options and proposes two algorithms to identify the best option within a fixed budget.


MoMo: Momentum Models for Adaptive Learning Rates

https://openreview.net/forum?id=WvvkbWD1vL

Compressor summary: The authors propose MoMo and MoMo-Adam, adaptive learning rates that improve performance and reduce tuning of momentum-based methods for various machine learning tasks.


Leveraging (Biased) Information: Multi-armed Bandits with Offline Data

https://openreview.net/forum?id=WvIHbQhrTq

Compressor summary: The paper proposes MIN-UCB, an online policy that uses offline data effectively when they are informative, and performs better than UCB in stochastic multi-armed bandits.


Exploring the LLM Journey from Cognition to Expression with Linear Representations

https://openreview.net/forum?id=WtvI3QijEF

Compressor summary: The paper analyzes how cognitive and expressive abilities in bilingual language models evolve during different phases of training, finding that cognitive capacity may limit expressive potential.


Position: Do pretrained Transformers Learn In-Context by Gradient Descent?

https://openreview.net/forum?id=WsawczEqO6

Compressor summary: Our study questions the connection between In-Context Learning (ICL) in pre-trained language models and Gradient Descent (GD), revealing inconsistencies in their behavior and output modifications, leaving the equivalence open.


Rethinking Decision Transformer via Hierarchical Reinforcement Learning

https://openreview.net/forum?id=WsM4TVsZpJ

Compressor summary: The authors present a hierarchical RL framework that improves Decision Transformer by enabling seamless stitching of sub-optimal trajectories, leading to better offline RL performance on control and navigation tasks.


Time Weaver: A Conditional Time Series Generation Model

https://openreview.net/forum?id=WpKDeixmFr

Compressor summary: TIME WEAVER is a novel model that uses heterogeneous metadata to improve time series generation and introduces a new evaluation metric for conditional generation approaches.


Watermark Stealing in Large Language Models

https://openreview.net/forum?id=Wp054bnPq9

Compressor summary: The text discusses the vulnerabilities of language model watermarking schemes, which can be bypassed by watermark stealing attacks that enable spoofing and scrubbing of AI-generated content.


Cross-domain Open-world Discovery

https://openreview.net/forum?id=WofwaWjIf7

Compressor summary: CROW is a prototype-based method that discovers and matches novel classes in cross-domain open-world settings using foundation models and a well-structured representation space.


Conditional Common Entropy for Instrumental Variable Testing and Partial Identification

https://openreview.net/forum?id=Wnni3cu39x

Compressor summary: This paper proposes a method to bound causal effects using instrumental variables under weak confounding and a criterion to falsify the IV with extra information on the confounder.


Universal Gradient Methods for Stochastic Convex Optimization

https://openreview.net/forum?id=Wnhp34K5jR

Compressor summary: The paper introduces universal gradient methods for stochastic convex optimization that adapt to noise and smoothness without prior knowledge and achieve state-of-the-art convergence rates.


Variational Inference with Coverage Guarantees in Simulation-Based Inference

https://openreview.net/forum?id=Wn4QwCrDvH

Compressor summary: CANVI is a scalable and easily implemented method for constructing posterior approximations with guaranteed marginal coverage and high predictive efficiency in likelihood-free settings.


Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

https://openreview.net/forum?id=WjvEvYTy3w

Compressor summary: HyperDistill combines morphology-conditioned hypernetworks and policy distillation to learn efficient robot policies that perform as well as a universal transformer teacher on various morphologies, reducing model size and computational cost significantly.


FedREDefense: Defending against Model Poisoning Attacks for Federated Learning using Model Update Reconstruction Error

https://openreview.net/forum?id=Wjq2bS7fTK

Compressor summary: FedREDefense is a method to defend against model poisoning attacks in federated learning by identifying and filtering out malicious clients based on discrepancies in their model update reconstruction errors.


From Generalization Analysis to Optimization Designs for State Space Models

https://openreview.net/forum?id=WjNzXeiOSL

Compressor summary: The paper analyzes the generalization of SSMs, a foundation model in time series analysis, via a data-dependent generalization bound capturing the interplay between parameters and temporal dependencies, and uses it to derive a scaling rule for initialization and a new regularization method, validated by numerical results.


Dissecting Multimodality in VideoQA Transformer Models by Impairing Modality Fusion

https://openreview.net/forum?id=Wj5wm3Os5v

Compressor summary: QUAG probes reveal that VideoQA Transformers may not effectively use multimodal information, and the CLAVI dataset challenges them further.


Decouple then Classify: A Dynamic Multi-view Labeling Strategy with Shared and Specific Information

https://openreview.net/forum?id=WfJuiIiFzB

Compressor summary: The paper proposes a dynamic labeling strategy for semi-supervised learning that uses shared and specific information to improve sample classification confidence and performance.


Graph Neural PDE Solvers with Conservation and Similarity-Equivariance

https://openreview.net/forum?id=WajJf47TUi

Compressor summary: This study introduces a generalizable machine learning architecture using graph neural networks that incorporates physical laws and symmetries to solve partial differential equations more reliably.


Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

https://openreview.net/forum?id=WYi3WKZjYe

Compressor summary: Audio Flamingo is a new audio language model that can understand diverse sounds, adapt to tasks quickly, and engage in multi-turn dialogues, achieving state-of-the-art results on various audio understanding tasks.


Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

https://openreview.net/forum?id=WXg6MJo1FH

Compressor summary: RLHF uses human feedback to teach AI models human values, but its reward model often deteriorates after one epoch; a new algorithm called 'Iterative Data Smoothing' improves it by updating data with soft labels instead of hard ones.
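
A minimal sketch of the soft-label update, assuming a simple convex-combination rate `alpha` rather than the paper's exact schedule:

    import torch

    def smooth_labels(y_hard, model_prob, alpha=0.1):
        # Move the (initially hard, 0/1) preference labels toward the
        # current reward model's predicted win probabilities.
        return (1.0 - alpha) * y_hard + alpha * model_prob

    # Per epoch: p = torch.sigmoid(r(x_win) - r(x_lose))
    #            y = smooth_labels(y, p); refit the reward model r on y.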


GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

https://openreview.net/forum?id=WWo9G5zyh0

Compressor summary: The paper presents GeoReasoner, a vision-language model enhanced with human inference knowledge that improves geo-localization performance significantly.


Criterion Collapse and Loss Distribution Control

https://openreview.net/forum?id=WVORGH73Cg

Compressor summary: The paper studies when optimizing one performance metric leads to optimizing another, focusing on error probability minimization under different learning criteria, and shows that non-monotonic criteria can prevent collapse in some cases.


Allocation Requires Prediction Only if Inequality Is Low

https://openreview.net/forum?id=WUicA0hOF9

Compressor summary: The authors propose an evaluation framework for predictive allocation systems in settings with hierarchical units and show that their efficacy is limited by between-unit inequality, intervention budget, and other factors.


One Size Fits All for Semantic Shifts: Adaptive Prompt Tuning for Continual Learning

https://openreview.net/forum?id=WUi1AqhKn5

Compressor summary: AdaPromptCL is a new method for continual learning that adapts to varying degrees of semantic shifts between tasks using assign-and-refine semantic grouping.


Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

https://openreview.net/forum?id=WUdq1WFUPr

Compressor summary: Cascade-CLIP aligns multi-level visual features with text embeddings using independent decoders to improve zero-shot semantic segmentation performance.


Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond

https://openreview.net/forum?id=WUQ4YzIQt2

Compressor summary: The paper proposes a data selection method using k-means clustering and sensitivity sampling to efficiently train machine learning models with low error.
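
A minimal sketch of the selection step, using distance-to-centroid plus a per-point loss as an illustrative sensitivity proxy (the paper's precise scores differ):

    import numpy as np
    from sklearn.cluster import KMeans

    def sensitivity_sample(X, losses, k=100, m=10_000, seed=0):
        # X: (n, d) embeddings; losses: (n,) per-point model losses.
        rng = np.random.default_rng(seed)
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
        sens = dist + losses                     # illustrative proxy
        prob = sens / sens.sum()
        idx = rng.choice(len(X), size=m, replace=True, p=prob)
        return idx, 1.0 / (m * prob[idx])        # importance weights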


Gradient-based Visual Explanation for Transformer-based CLIP

https://openreview.net/forum?id=WT4X3QYopC

Compressor summary: Grad-ECLIP is a method to explain how CLIP, a vision-language model, matches image-text pairs by producing heat maps that show the influence of image regions or words on the results.


Helpful or Harmful Data? Fine-tuning-free Shapley Attribution for Explaining Language Model Predictions

https://openreview.net/forum?id=WSpPC1Jm0p

Compressor summary: The paper proposes FreeShap, a more robust and efficient method for instance attribution in foundational models, which can improve explainability and performance in various data-centric tasks.


A Subquadratic Time Algorithm for Robust Sparse Mean Estimation

https://openreview.net/forum?id=WSi4IiMaCx

Compressor summary: The paper proposes an efficient algorithm for robust sparse mean estimation that runs in subquadratic time using few samples.


Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

https://openreview.net/forum?id=WRIn2HmtBS

Compressor summary: The Hourglass Diffusion Transformer (HDiT) is a high-resolution image-generative model that uses the Transformer architecture and trains efficiently without special techniques.


Memorization Through the Lens of Curvature of Loss Function Around Samples

https://openreview.net/forum?id=WQbDS9RydY

Compressor summary: The paper proposes a curvature-based metric to measure memorization in deep neural networks and shows its effectiveness in detecting mislabeled samples and unique failures.
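
A minimal sketch of one way to estimate such a score, averaging finite-difference approximations of the directional second derivative v^T H v over random input directions:

    import torch

    def curvature_score(model, loss_fn, x, y, n_dirs=8, eps=1e-3):
        # x: one input with a leading batch dimension; higher scores tend
        # to flag memorized or mislabeled samples.
        x0 = x.clone().requires_grad_(True)
        g0 = torch.autograd.grad(loss_fn(model(x0), y), x0)[0]
        score = 0.0
        for _ in range(n_dirs):
            v = torch.sign(torch.randn_like(x))          # Rademacher direction
            x1 = (x + eps * v).requires_grad_(True)
            g1 = torch.autograd.grad(loss_fn(model(x1), y), x1)[0]
            score += ((g1 - g0) * v).sum().item() / eps  # ~ v^T H v
        return score / n_dirs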


Active Label Correction for Semantic Segmentation with Foundation Models

https://openreview.net/forum?id=WPt9HRmMrG

Compressor summary: The paper presents an active label correction (ALC) framework that uses annotator-friendly correction queries to fix pixel-wise pseudo labels for semantic segmentation, leveraging foundation models for zero-shot predictions on superpixels, and improves segmentation and label-correction performance on several datasets, including PASCAL.


Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training

https://openreview.net/forum?id=WPfYVdJHPk

Compressor summary: The paper discusses potential threats to decentralized machine learning training, proposes a poisoning attack, and suggests a robust training framework with detection and efficiency mechanisms.


Why Larger Language Models Do In-context Learning Differently?

https://openreview.net/forum?id=WOa96EG26M

Compressor summary: Larger language models are more sensitive to noise in the test context, while smaller ones are more robust to noise due to different attention mechanisms during in-context learning.


Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models

https://openreview.net/forum?id=WLPhywf1si

Compressor summary: The paper proposes an unsupervised adversarial fine-tuning method to make the CLIP vision encoder more robust against attacks that could spread fake information or defraud users in multi-modal foundation models like LLaVA and OpenFlamingo.


Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation

https://openreview.net/forum?id=WLGWMDtj8L

Compressor summary: COREP is a new algorithm that addresses complex non-stationarity in reinforcement learning by implicitly tracing the causal origin of changes in the environment and learning a stable graph representation for the state, leading to improved policy learning.


Robust Graph Matching when Nodes are Corrupt

https://openreview.net/forum?id=WJn1BAx9aj

Compressor summary: The text introduces two models for matching graphs with corrupt nodes and studies their detection and estimation properties.


Breaking the Barrier: Enhanced Utility and Robustness in Smoothed DRL Agents

https://openreview.net/forum?id=WJ5fJhwvCl

Compressor summary: The study introduces S-DQN and S-PPO, novel algorithms that improve clean rewards, empirical robustness, and robustness guarantee in deep reinforcement learning with randomized smoothing.


Linear Explanations for Individual Neurons

https://openreview.net/forum?id=WIbntm28cM

Compressor summary: The paper proposes a new way to understand the function of individual neurons in neural networks by explaining them as linear combinations of concepts and evaluating their explanations using simulations.


An Empirical Study of Realized GNN Expressiveness

https://openreview.net/forum?id=WIaZFk02fI

Compressor summary: The authors propose a new dataset, BREC, to measure the realized expressiveness of Graph Neural Networks (GNNs) that surpass the 1-dimensional Weisfeiler-Lehman test and show the gap between theory and practice in GNN expressiveness.


Language Models as Science Tutors

https://openreview.net/forum?id=WFyolnFZOR

Compressor summary: TutorEval and TutorChat are new NLP datasets for training language models to assist in STEM education using long textbook chapters and dialogues.


Two Heads Are Better Than One: Boosting Graph Sparse Training via Semantic and Topological Awareness

https://openreview.net/forum?id=WDgV1BJEW0

Compressor summary: Graph Sparse Training (GST) is a new method that dynamically adjusts the sparsity of graphs to optimize performance on Graph Neural Networks (GNNs) while preserving topological and semantic information.


Agnostic Interactive Imitation Learning: New Theory and Practical Algorithms

https://openreview.net/forum?id=WCwxFM7n5S

Compressor summary: The paper proposes MFTPL-P, an oracle-efficient algorithm for interactive imitation learning with provable guarantees and wide applicability, and Bootstrap-DAgger, a practical variant without extra samples.


GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks

https://openreview.net/forum?id=WCVC5wGZyz

Compressor summary: Example Gisting is a novel approach for selecting informative examples that improves In-Context Learning performance in Large Language Models, and can be used for new tasks without additional training.


Learning High-Order Relationships of Brain Regions

https://openreview.net/forum?id=WC14xZIaC2

Compressor summary: HyBRiD is a novel method that extracts maximally informative and minimally redundant high-order relationships from fMRI data to improve phenotypic predictions in neuroscience.


Time Series Diffusion in the Frequency Domain

https://openreview.net/forum?id=W9GaJUVLCT

Compressor summary: The paper explores using Fourier analysis as an inductive bias for score-based diffusion models, showing that frequency diffusion models outperform time diffusion models on real-world datasets.
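
A minimal sketch of the domain change, assuming the diffusion model itself is left untouched and only the representation moves to Fourier space and back:

    import torch

    def to_frequency(x):
        # x: (batch, length) real time series -> stacked rFFT coefficients.
        X = torch.fft.rfft(x, dim=-1)
        return torch.stack([X.real, X.imag], dim=-1)

    def to_time(z, length):
        X = torch.complex(z[..., 0], z[..., 1])
        return torch.fft.irfft(X, n=length, dim=-1)

    # Train/sample the diffusion model on to_frequency(x); map generated
    # samples back with to_time(z, length).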


Spectral Phase Transition and Optimal PCA in Block-Structured Spiked Models

https://openreview.net/forum?id=W97gFmrKe6

Compressor summary: The paper investigates an inhomogeneous Wigner spike model to study structured noise and finds an optimal spectral method for detecting signal-noise separation.


Noise-Adaptive Confidence Sets for Linear Bandits and Application to Bayesian Optimization

https://openreview.net/forum?id=W8hBNk1FhQ

Compressor summary: The paper proposes a new semi-adaptive and variance-adaptive confidence set for linear bandits, which improves exploration efficiency when the noise level is unknown.


Position: Quo Vadis, Unsupervised Time Series Anomaly Detection?

https://openreview.net/forum?id=W7Vqx1Jvc2

Compressor summary: The paper critiques the current machine learning research on Timeseries Anomaly Detection, highlighting issues with evaluation metrics and methods, and advocating for improved benchmarking practices and simpler models.


FlowMM: Generating Materials with Riemannian Flow Matching

https://openreview.net/forum?id=W4pB7VbzZI

Compressor summary: FlowMM is a pair of generative models for predicting and proposing stable crystal structures using Riemannian Flow Matching with extended symmetries and flexibility, achieving state-of-the-art performance and efficiency in comparison to other methods.


Generalization Analysis for Multi-Label Learning

https://openreview.net/forum?id=W4mLp5KuKl

Compressor summary: The paper develops novel vector-contraction inequalities to derive generalization bounds for multi-label learning that depend only weakly on the number of labels, capture label correlations, and yield a Macro-Averaged AUC bound analyzed under class imbalance.


In-Context Freeze-Thaw Bayesian Optimization for Hyperparameter Optimization

https://openreview.net/forum?id=VyoY3Wh9Wd

Compressor summary: FT-PFN is a fast and accurate surrogate for Freeze-thaw Bayesian optimization, achieving state-of-the-art results in deep learning hyperparameter tuning.


Graph Neural Networks with a Distribution of Parametrized Graphs

https://openreview.net/forum?id=VyfEv6EjKR

Compressor summary: The paper proposes a method to improve graph neural networks by generating multiple graphs from latent variables and estimating their distribution using EM and MCMC techniques, leading to better node classification.


Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

https://openreview.net/forum?id=VyGo1S5A6d

Compressor summary: P4D is a tool that tests the reliability of safety mechanisms in diffusion models by finding problematic prompts that can bypass them.


Symmetric Matrix Completion with ReLU Sampling

https://openreview.net/forum?id=VxI0gInNlh

Compressor summary: The paper studies symmetric positive semi-definite low-rank matrix completion with entry-dependent sampling and proposes a tailor-designed initialization for gradient descent to achieve global optimality.


Self-Consistency Training for Density-Functional-Theory Hamiltonian Prediction

https://openreview.net/forum?id=Vw4Yar2fmW

Compressor summary: The authors propose self-consistency training, a machine learning method for predicting molecular Hamiltonians in density functional theory without labeled data, which improves generalization and efficiency in data-scarce and out-of-distribution scenarios.


Automating the Selection of Proxy Variables of Unmeasured Confounders

https://openreview.net/forum?id=VuoB86HiCL

Compressor summary: The paper proposes methods to estimate causal effects using proxy variables of unobserved confounding in linear models with multiple unmeasured confounders, without prior knowledge of their validity.


Guidance with Spherical Gaussian Constraint for Conditional Diffusion

https://openreview.net/forum?id=VtqyurB4Af

Compressor summary: Diffusion with Spherical Gaussian constraint (DSG) is a method that improves conditional diffusion models by constraining guidance steps within data manifold, leading to better sample quality and faster sampling process.


BAGEL: Bootstrapping Agents by Guiding Exploration with Language

https://openreview.net/forum?id=VsvfSMI5bs

Compressor summary: BAGEL is a method that helps language model agents learn to follow natural language instructions in digital environments by converting random trajectories into synthetic demonstrations using two noisy components, improving their performance and reducing failures.


Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning

https://openreview.net/forum?id=VrwIrAa1Lc

Compressor summary: Random Masking is a simple and effective method to fine-tune language models using fewer parameters by creating a flatter loss landscape and allowing larger learning rates.
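As a hedged illustration of the idea (a minimal sketch, not the paper's code): freeze a pretrained model, sample a random binary mask over its parameters, and update only the masked entries. The toy MLP, mask ratio, and plain SGD step below are all placeholder assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of random-masking PEFT (assumption: not the paper's exact recipe).
# Only a small random subset of weights receives gradient updates; the rest stay frozen.

def make_random_masks(model: nn.Module, trainable_fraction: float = 0.01):
    """For each parameter tensor, sample a binary mask marking trainable entries."""
    return {name: (torch.rand_like(p) < trainable_fraction).float()
            for name, p in model.named_parameters()}

def masked_sgd_step(model: nn.Module, masks: dict, lr: float):
    """Apply an SGD update only on masked (trainable) entries."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is not None:
                p -= lr * p.grad * masks[name]
                p.grad = None

# Toy usage: a tiny MLP stands in for a pretrained language model.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
masks = make_random_masks(model)
x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
masked_sgd_step(model, masks, lr=1e-1)  # the summary suggests larger LRs are tolerated
```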


Towards Realistic Model Selection for Semi-supervised Learning

https://openreview.net/forum?id=VoMPNYTZud

Compressor summary: Semi-supervised learning (SSL) models use little labeled data and are hard to evaluate with validation sets, so a new method called SLAM combines spectral complexity and margin distribution to improve their generalization performance without relying on validation data.


Auctionformer: A Unified Deep Learning Algorithm for Solving Equilibrium Strategies in Auction Games

https://openreview.net/forum?id=VnI9200eeL

Compressor summary: Auctionformer is an efficient transformer-based method that solves equilibria of diverse auctions using tokenization and Nash error as a loss term, outperforming existing approaches.


Position: Will we run out of data? Limits of LLM scaling based on human-generated data

https://openreview.net/forum?id=ViZcgDQjyG

Compressor summary: The paper explores the potential limitations of large language models due to the limited availability of public human-generated text data and suggests alternative methods to continue improving them.


Improved Bounds for Pure Private Agnostic Learning: Item-Level and User-Level Privacy

https://openreview.net/forum?id=VfWrXJtLSL

Compressor summary: The paper studies pure private learning in the agnostic model, deriving improved upper bounds for item- and user-level privacy and presenting tighter results for user-level privacy and learning thresholds.


Reinforcement Learning and Regret Bounds for Admission Control

https://openreview.net/forum?id=Vdr87ZUfnl

Compressor summary: The paper analyzes the regret of reinforcement learning algorithms for an admission control problem in queuing systems and proposes a new algorithm with improved bounds.


Towards Neural Architecture Search through Hierarchical Generative Modeling

https://openreview.net/forum?id=VdZfEMuoj2

Compressor summary: The proposed method generates a tailored search space for Neural Architecture Search using a two-level generative model hierarchy, achieving state-of-the-art performance with low computational costs on various tasks.


A Single-Loop Robust Policy Gradient Method for Robust Markov Decision Processes

https://openreview.net/forum?id=VaZVZQSgTP

Compressor summary: The paper proposes a new single-loop robust policy gradient method for solving Markov Decision Processes with global optimality guarantee and better convergence performance than existing methods.


VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model

https://openreview.net/forum?id=Va7mhTVy5s

Compressor summary: VoroNav is a novel semantic exploration framework for household robots that uses a Reduced Voronoi Graph to generate text-based descriptions of paths, enabling commonsense reasoning and outperforming existing methods on the zero-shot object navigation (ZSON) task.


Rényi Pufferfish Privacy: General Additive Noise Mechanisms and Privacy Amplification by Iteration via Shift Reduction Lemmas

https://openreview.net/forum?id=VZsxhPpu9T

Compressor summary: The paper proposes a Rényi divergence-based variant of Pufferfish to improve its applicability, utility, and composition guarantees for privacy preservation in machine learning algorithms.


Codebook Features: Sparse and Discrete Interpretability for Neural Networks

https://openreview.net/forum?id=VZ5A0LPbnc

Compressor summary: The paper proposes codebook features, which quantize continuous neural network features into discrete vector codes, enabling interpretability and control over the model's behavior.


Discrete Latent Perspective Learning for Segmentation and Detection

https://openreview.net/forum?id=VWCpm39peL

Compressor summary: The paper presents a novel framework called Discrete Latent Perspective Learning (DLPL), which enables networks to understand images from different perspectives using single-view images and improves performance on various computer vision tasks.


Non-parametric Online Change Point Detection on Riemannian Manifolds

https://openreview.net/forum?id=VW7Jk8KhNC

Compressor summary: The paper introduces an online algorithm for detecting changes in data from Riemannian manifolds using a generalized Karcher mean computed by stochastic Riemannian optimization, and provides theoretical and empirical performance analysis.


Simplicity Bias via Global Convergence of Sharpness Minimization

https://openreview.net/forum?id=VUTyzH63Xa

Compressor summary: Label-noise SGD biases two-layer neural networks toward simple solutions in which all neurons share a single linear feature.


Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming

https://openreview.net/forum?id=VSwrXRqD9o

Compressor summary: The authors propose a method to solve dynamic programming problems with variational Bayesian inference, Gibbs distributions, and message-passing algorithms, and apply it to generate text-to-speech and singing voice.


Equivariant Diffusion for Crystal Structure Prediction

https://openreview.net/forum?id=VRv8KjJNuj

Compressor summary: EquiCSP is a new equivariant diffusion-based model for Crystal Structure Prediction (CSP) that improves accuracy and convergence compared to existing methods.


Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning

https://openreview.net/forum?id=VOcsmIBiXE

Compressor summary: PiT is a more efficient and interpretable attention mechanism for Transformer-based operator learning, inspired by numerical methods for PDEs.


Rate-Optimal Policy Optimization for Linear Markov Decision Processes

https://openreview.net/forum?id=VJwsDwuiuH

Compressor summary: The paper presents a fast and efficient policy optimization algorithm that achieves the optimal regret bound in both stochastic and adversarial online Markov decision processes.


Second-Order Uncertainty Quantification: A Distance-Based Approach

https://openreview.net/forum?id=VJjjNrUi8j

Compressor summary: The text discusses challenges in measuring predictive uncertainty using second-order probability distributions and proposes formal criteria and a framework for developing better uncertainty measures, using the Wasserstein distance as an example.


Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers

https://openreview.net/forum?id=VHtIDVaOKC

Compressor summary: The paper proposes Mobile-Attention, a novel attention mechanism for ViTs that balances efficiency and capability on mobile devices by using a head-competition mechanism and information flow to prevent overemphasis on less important subspaces while preserving essential ones.


Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models

https://openreview.net/forum?id=VHO4nE7v41

Compressor summary: MeZO-SVRG is a memory-efficient optimization method for fine-tuning language models, improving accuracy and reducing computation time compared to previous methods.
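For context, a minimal sketch of the two-point (SPSA-style) zeroth-order gradient estimate that MeZO-family methods build on; the SVRG-style variance reduction of the paper is omitted, and the quadratic objective is purely illustrative.

```python
import numpy as np

# Two-point zeroth-order gradient estimate: the memory-efficient building block
# behind MeZO-style fine-tuning. MeZO-SVRG's variance reduction is omitted here;
# this is a generic sketch, not the paper's algorithm.

def zo_gradient(f, theta: np.ndarray, eps: float = 1e-3, rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(theta.shape)                 # random direction
    return (f(theta + eps * z) - f(theta - eps * z)) / (2 * eps) * z

# Toy usage: minimize a quadratic using only function evaluations (no backprop).
f = lambda w: float(np.sum((w - 3.0) ** 2))
w = np.zeros(10)
for step in range(1000):
    w -= 0.01 * zo_gradient(f, w, rng=np.random.default_rng(step))
print(np.round(w, 1))  # each coordinate approaches 3.0
```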


Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks

https://openreview.net/forum?id=VF177x7Syw

Compressor summary: The study proposes Hare & Tortoise, a method for neural networks that combines rapid adaptation and gradual knowledge integration to balance plasticity and generalization.


Stealing part of a production language model

https://openreview.net/forum?id=VE3yWXt3KB

Compressor summary: The text describes a method to extract information from black-box language models by exploiting their API access, revealing hidden dimensions and projection matrices.
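The linear-algebraic core of such attacks is easy to simulate: logits are a fixed projection of a lower-dimensional hidden state, so a matrix of logit vectors stacked over many queries has numerical rank equal to the hidden width. A self-contained simulation (no real API is involved; all sizes are invented):

```python
import numpy as np

# Simulated sketch of the core observation: logit vectors live in a subspace whose
# dimension equals the model's hidden width, so stacking API responses and taking
# an SVD reveals it. Sizes here are illustrative, not any real model's.

rng = np.random.default_rng(0)
vocab, hidden, n_queries = 5000, 64, 256

W = rng.standard_normal((vocab, hidden))            # unknown output projection

def query_api(prompt_seed: int) -> np.ndarray:
    """Stand-in for an API that returns full logits for one prompt."""
    h = rng.standard_normal(hidden)                  # hidden state for this prompt
    return W @ h

L = np.stack([query_api(i) for i in range(n_queries)])   # (n_queries, vocab)
s = np.linalg.svd(L, compute_uv=False)
est_hidden = int(np.sum(s > s[0] * 1e-8))            # numerical rank of logit matrix
print(est_hidden)                                    # prints 64
```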


On the Convergence of Projected Bures-Wasserstein Gradient Descent under Euclidean Strong Convexity

https://openreview.net/forum?id=VDgfJnOEMV

Compressor summary: The paper proposes a general convergence rate guarantee for BW gradient descent with constraints and provides a fast implementation method for constrained problems, showing improved performance in experiments.


A Neural-Preconditioned Poisson Solver for Mixed Dirichlet and Neumann Boundary Conditions

https://openreview.net/forum?id=VAKkoJjVpn

Compressor summary: The authors propose a novel neural network preconditioner for Poisson equations with mixed boundary conditions that works efficiently even when domain shapes, boundary conditions, or grid sizes change and outperforms existing methods.


An Embodied Generalist Agent in 3D World

https://openreview.net/forum?id=V4qV08Vk6S

Compressor summary: LEO is a multi-modal agent that excels in various 3D tasks, trained with 3D vision-language-action alignment using large-scale datasets and LLMs.


Provable Contrastive Continual Learning

https://openreview.net/forum?id=V3ya8RlbrW

Compressor summary: The authors provide theoretical insights into the performance of contrastive continual learning, propose a novel algorithm called CILA that uses adaptive distillation coefficients, and achieve state-of-the-art results on standard benchmarks.


Adapt and Diffuse: Sample-adaptive Reconstruction via Latent Diffusion Models

https://openreview.net/forum?id=V3OpGwo68Z

Compressor summary: Flash-Diffusion is a novel method that adapts compute power and inference times to the difficulty of reconstruction tasks in inverse problems using estimated degradation severities.


No Free Prune: Information-Theoretic Barriers to Pruning at Initialization

https://openreview.net/forum?id=Uzb45nolTb

Compressor summary: The text discusses how lottery tickets in deep learning are sparse subnetworks whose masks are heavily data-dependent, and argues that pruning near initialization is infeasible because of the mutual information the sparsity mask must carry about the data.


Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations

https://openreview.net/forum?id=Uz4Qr40Y3C

Compressor summary: The Connect Later framework combines pretraining with targeted augmentations to improve out-of-distribution generalization for domain adaptation tasks.


Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel

https://openreview.net/forum?id=UqoG0YRfQx

Compressor summary: EWoK is an online approach that learns robust policies for sequential decision-making by simulating the worst transition kernel scenarios while using any off-the-shelf non-robust RL algorithm.


Arrows of Time for Large Language Models

https://openreview.net/forum?id=UpSe7ag34v

Compressor summary: The study investigates time directionality in Autoregressive LLMs and finds a consistent but subtle difference in their predictability, which they explain using sparsity and computational complexity.


Do Transformer World Models Give Better Policy Gradients?

https://openreview.net/forum?id=Uoved2xD81

Compressor summary: Key points:
- The paper proposes Action-conditioned World Models (AWMs) for reinforcement learning with transformers
- AWMs provide more direct routes for gradient propagation and easier optimization landscapes
- AWMs outperform baselines in long-horizon tasks

Summary: The paper introduces AWMs, a class of world models that use transformers and condition on actions to improve gradient propagation and policy learning in long-horizon reinforcement learning problems.


Explain Temporal Black-Box Models via Functional Decomposition

https://openreview.net/forum?id=Uo3LNg5SLY

Compressor summary: FDTempExplainer is a novel explanation method for black-box time series models that reveals temporal interactions and outperforms existing approaches.


On dimensionality of feature vectors in MPNNs

https://openreview.net/forum?id=UjDp4Wkq2V

Compressor summary: The paper shows that, for non-polynomial activation functions, MPNNs with constant-dimensional feature vectors can simulate the WL test, by using linear independence over rationals instead of reals.


Outlier-aware Slicing for Post-Training Quantization in Vision Transformer

https://openreview.net/forum?id=Uh5XN9d2J4

Compressor summary: This paper proposes reconstruction granularity as a novel solution to mitigate outliers in quantized transformer models and develops an algorithm for finding the optimal granularity, achieving state-of-the-art performance in post-training quantization.


ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis

https://openreview.net/forum?id=Ug1m4P4AKf

Compressor summary: The paper presents a new method for modeling multiple speakers that improves feature learning and representation, enabling subjective similarity evaluation and the generation of artificial speakers.


Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach

https://openreview.net/forum?id=UdXDUDxq11

Compressor summary: The paper introduces a new reinforcement learning approach based on differential game theory that improves robustness to uncertainty and outperforms existing methods in real-world applications.


Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models

https://openreview.net/forum?id=UcOze9EXEc

Compressor summary: DFML can adapt to new tasks without data by using pre-trained models, but it needs to consider model heterogeneity and balance task conflicts and overfitting risk for effective learning.


Initial Guessing Bias: How Untrained Networks Favor Some Classes

https://openreview.net/forum?id=UZstTlLq1E

Compressor summary: The paper analyzes how the structure of a deep neural network can lead to all predictions being assigned to the same class before training, influenced by factors like preprocessing methods, activation functions, and network depth.


Irregular Multivariate Time Series Forecasting: A Transformable Patching Graph Neural Networks Approach

https://openreview.net/forum?id=UZlMXUGI6e

Compressor summary: This study introduces t-PatchGNN, a novel method to model correlations between irregular multivariate time series using transformable patches and adaptive graphs for improved forecasting.


Active Ranking and Matchmaking, with Perfect Matchings

https://openreview.net/forum?id=UZZaWUR0n4

Compressor summary: The text discusses a novel algorithm that efficiently ranks items/players with varying values using optimal matching and minimizing comparison costs.


Rethinking Momentum Knowledge Distillation in Online Continual Learning

https://openreview.net/forum?id=UW5nO9NGjt

Compressor summary: The paper proposes and analyzes Momentum Knowledge Distillation (MKD) for Online Continual Learning (OCL), a challenging problem where neural networks are trained on a continuous data stream with severe constraints, and shows its effectiveness in improving accuracy.


Graph Positional and Structural Encoder

https://openreview.net/forum?id=UTSCK582Yo

Compressor summary: The Graph Positional and Structural Encoder (GPSE) is a novel graph encoder that captures rich positional and structural information for augmenting any GNN, improving their performance across various tasks and datasets.


WAVES: Benchmarking the Robustness of Image Watermarks

https://openreview.net/forum?id=URtUYfC3GA

Compressor summary: WAVES is a benchmark for evaluating image watermark robustness against various attacks, exposing vulnerabilities in current algorithms.


Probabilistic Forecasting with Stochastic Interpolants and Föllmer Processes

https://openreview.net/forum?id=UQYXZdca92

Compressor summary: The paper proposes a method to generate probabilistic forecasts of dynamical systems using generative models and stochastic interpolants, and applies it to various problems like fluid dynamics and video prediction.


Towards General Algorithm Discovery for Combinatorial Optimization: Learning Symbolic Branching Policy from Bipartite Graph

https://openreview.net/forum?id=ULleq1Dtaw

Compressor summary: The paper proposes a novel framework that learns interpretable branching policies for exact combinatorial optimization solvers using graph symbolic discovery and Transformer models, outperforming existing methods.


Convergence of Online Learning Algorithm for a Mixture of Multiple Linear Regressions

https://openreview.net/forum?id=ULKvSqmSgA

Compressor summary: The paper proposes an online learning algorithm for MLR with multiple sub-models and arbitrary mixing weights, which can converge without i.i.d. input data and achieve good data clustering performance.


Collaborative Learning with Different Labeling Functions

https://openreview.net/forum?id=UKHfmzLR7P

Compressor summary: The paper proposes a variant of Collaborative PAC Learning that can learn accurate classifiers for each data distribution using fewer samples, under a weaker realizability assumption.


A Persuasive Approach to Combating Misinformation

https://openreview.net/forum?id=UIxOkdBmxh

Compressor summary: Bayesian Persuasion uses machine learning to predict popularity and misinformation features of posts, then strategically signals this advantage to users to prevent them from sharing false information on social media platforms.


The Linear Representation Hypothesis and the Geometry of Large Language Models

https://openreview.net/forum?id=UGpGkLzwpP

Compressor summary: The paper formalizes the "linear representation hypothesis" using counterfactuals and introduces a causal inner product that unifies geometric notions in concept representation space.


Is Kernel Prediction More Powerful than Gating in Convolutional Neural Networks?

https://openreview.net/forum?id=UE79AkNg60

Compressor summary: The paper explores the relationships between different types of neural network components (HyperNetworks, weight prediction, gating, and kernel prediction) and shows how they can be combined for image denoising.


Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization

https://openreview.net/forum?id=UCKFhc9SFC

Compressor summary: Volume-MCTS is a tree search algorithm for robot navigation that combines policy optimization with state-occupancy measure regularization, leading to improved long-horizon exploration compared to AlphaZero.


Auto-Encoding Morph-Tokens for Multimodal LLM

https://openreview.net/forum?id=U97MIrs35l

Compressor summary: The paper proposes using morph-tokens to handle visual input for both textual comprehension and image generation in multimodal LLMs, achieving state-of-the-art performance in both tasks.


Configurable Mirror Descent: Towards a Unification of Decision Making

https://openreview.net/forum?id=U841CrDUx9

Compressor summary: The text proposes a general algorithm for various decision-making problems and presents three contributions: a generalized mirror descent, a configurable mirror descent with a meta-controller, and GameBench, a benchmark of 15 academic-friendly games.


Exponential Spectral Pursuit: An Effective Initialization Method for Sparse Phase Retrieval

https://openreview.net/forum?id=U4Yvwu1RQY

Compressor summary: The paper introduces exponential spectral pursuit (ESP), a new method for initializing sparse phase retrieval that has better sampling complexity and performance than current methods.


Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals

https://openreview.net/forum?id=U1uKihiG39

Compressor summary: This paper studies how to adapt to causal structure in multi-armed bandits, achieving better performance when possible and ensuring worst-case guarantees, while addressing open questions and assumptions in the field.


Can Machines Learn the True Probabilities?

https://openreview.net/forum?id=TzqmqZS0nj

Compressor summary: The text discusses how AI systems use probabilistic machine learning to make decisions based on the true probabilities of their environment and investigates the conditions under which machines can learn these probabilities.


How to Trace Latent Generative Model Generated Images without Artificial Watermark?

https://openreview.net/forum?id=TwZ2sY6eJj

Compressor summary: The paper proposes LatentTracer, a method to trace images generated by latent generative models without extra steps during training or generation, based on gradient-based latent inversion and encoder-based initialization.


Gaussian Plane-Wave Neural Operator for Electron Density Estimation

https://openreview.net/forum?id=TvoG41N1Y3

Compressor summary: The paper presents a new machine learning method for predicting electron density in chemical systems using plane-wave and Gaussian-type orbitals, outperforming existing methods.


Online Matching with Stochastic Rewards: Provable Better Bound via Adversarial Reinforcement Learning

https://openreview.net/forum?id=TujtZgdRxB

Compressor summary: The paper explores using adversarial reinforcement learning to improve algorithms for an online bipartite matching problem, and discovers structural properties that reduce the hardness of the problem.


MS$^3$D: A RG Flow-Based Regularization for GAN Training with Limited Data

https://openreview.net/forum?id=TuALw8xVum

Compressor summary: The text proposes a new regularization method for training GANs with limited data, based on renormalization group ideas, which improves their performance and stability.


Rethinking DP-SGD in Discrete Domain: Exploring Logistic Distribution in the Realm of signSGD

https://openreview.net/forum?id=TtSFg4s3F0

Compressor summary: The text discusses the need for efficient privacy protection in deep neural networks using a logistic mechanism instead of Gaussian noise.
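As a hedged sketch of why a logistic mechanism pairs naturally with signSGD (the paper's exact construction may differ): adding logistic noise before taking the sign flips each coordinate with a closed-form, sigmoid-shaped probability, which is convenient for privacy accounting.

```python
import numpy as np

# Illustrative sketch (assumption, not the paper's exact mechanism): logistic noise
# added to a gradient coordinate before sign() flips that coordinate's sign with
# probability 1 / (1 + exp(|g|/b)), a closed form useful for privacy accounting.

rng = np.random.default_rng(0)

def noisy_sign(g: np.ndarray, b: float) -> np.ndarray:
    return np.sign(g + rng.logistic(loc=0.0, scale=b, size=g.shape))

g = np.array([0.5, -2.0, 0.1])
flips = np.mean([noisy_sign(g, b=1.0) != np.sign(g) for _ in range(10000)], axis=0)
print(np.round(flips, 3))                          # empirical flip rate per coordinate
print(np.round(1 / (1 + np.exp(np.abs(g))), 3))    # analytic rate with b = 1
```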


PinNet: Pinpoint Instructive Information for Retrieval Augmented Code-to-Text Generation

https://openreview.net/forum?id=TqcZfMZjgM

Compressor summary: PinNet is a framework for generating high-quality code descriptions using retrieval augmentation, a discriminator to measure the relevance of retrieved references, and contrastive learning to enhance attention weights.


Harnessing the Power of Neural Operators with Automatically Encoded Conservation Laws

https://openreview.net/forum?id=ToHkAg936Y

Compressor summary: ClawNOs are neural operators that automatically satisfy fundamental conservation laws for various scientific applications, improving learning efficiency and physical consistency.


Position: LLMs Can’t Plan, But Can Help Planning in LLM-Modulo Frameworks

https://openreview.net/forum?id=Th8JPEmH4z

Compressor summary: The authors propose a framework that integrates large language models with symbolic reasoning components for better planning and reasoning tasks.


Testing the Feasibility of Linear Programs with Bandit Feedback

https://openreview.net/forum?id=TfwGtfPkhV

Compressor summary: The paper studies how to test if a linear bandit problem is feasible using low-regret algorithms and provides reliable and efficient tests with minimax lower bounds.


LPGD: A General Framework for Backpropagation through Embedded Optimization Layers

https://openreview.net/forum?id=TfWKkSAziC

Compressor summary: LPGD is a framework for training machine learning architectures with embedded optimization layers, which computes meaningful replacements of degenerate derivatives by re-running the forward solver on perturbed input and converges faster than gradient descent.


Generalization Bounds for Causal Regression: Insights, Guarantees and Sensitivity Analysis

https://openreview.net/forum?id=TejqrQBvll

Compressor summary: This paper proposes a causal machine learning theory with generalization bounds, using a new inequality to measure treatment propensities and handling hidden confounding and non-positive data.


Data Engineering for Scaling Language Models to 128K Context

https://openreview.net/forum?id=TaAqeo7lUh

Compressor summary: Key points:
- Studies how to scale language models' context lengths to 128K with continual pretraining on a data mixture
- Hypothesizes that long-context modeling is mostly acquired through large-scale pretraining and can be extended with lightweight continual pretraining
- Investigates the quantity and quality of data for continual pretraining, emphasizing domain balance and length upsampling
- Shows that 1B-5B tokens are enough for retrieving information anywhere within the 128K context
- The recipe outperforms open-source long-context models and approaches GPT-4 128K

Summary: The paper proposes a continual pretraining recipe to scale language models' context lengths to 128K using a data mixture, with a focus on data quality and quantity. The method uses 1B-5B tokens and achieves performance competitive with GPT-4.


A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks

https://openreview.net/forum?id=TWu1fzFJm0

Compressor summary: The text discusses how adjusting the learning rate in deep neural networks can enable feature learning and improve non-linear function recognition.


Inverse-Variance Weighting for Estimation of Heterogeneous Treatment Effects

https://openreview.net/forum?id=TUKOklS3gg

Compressor summary: The choice of weights in pseudo-outcome regressions is more important than the transformation for estimating conditional average treatment effects.


Multi-Agent Reinforcement Learning with Hierarchical Coordination for Emergency Responder Stationing

https://openreview.net/forum?id=TTZXl9WYFF

Compressor summary: The text describes a novel reinforcement learning approach for optimizing emergency responder management that significantly reduces computation time and improves ambulance response times.


OSN: Infinite Representations of Dynamic 3D Scenes from Monocular Videos

https://openreview.net/forum?id=TTYVG17wfc

Compressor summary: The paper introduces OSN, a framework to learn all plausible 3D scene configurations from a monocular RGB video using an object scale network and a joint optimization module.


Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences

https://openreview.net/forum?id=TRrXkVdhwi

Compressor summary: CHELA combines short-long convolutions with hardware-efficient linear attention to achieve global abstraction, data-dependent selection, and linear computational complexity for long sequences.


Tandem Transformers for Inference Efficient LLMs

https://openreview.net/forum?id=TN3fi7dwPo

Compressor summary: Tandem transformers combine a small autoregressive model and a large model in block mode to improve inference speed and accuracy for language models.


Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

https://openreview.net/forum?id=TK7xkOsXDu

Compressor summary: Key points:
- The paper introduces Hierarchical State-Space models (HiSS) for continuous sequential prediction from raw sensory data.
- HiSS stacks structured state-space models to create a temporal hierarchy and outperforms existing sequence models.
- HiSS is efficient on small datasets and compatible with data-filtering techniques.

Summary: The paper proposes HiSS, a novel technique for predicting physical quantities from raw sensory data using a temporal hierarchy of structured state-space models that beats current methods and works well on limited data and with filtering.


diff History for Neural Language Agents

https://openreview.net/forum?id=TJCUrzhbiH

Compressor summary: Diff history is a method to simplify and focus textual inputs for neural language models in embodied control tasks, leading to improved performance and reduced training data requirements.
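The mechanism is simple to illustrate: rather than appending each full observation to the agent's context, append a Unix-style diff against the previous observation. A minimal sketch with Python's difflib (the game-like observations are invented):

```python
import difflib

# Sketch of "diff history": represent consecutive textual observations by their
# diff, which is far shorter and highlights exactly what the last action changed.

obs_t0 = ["HP: 14/14", "Position: (3, 4)", "Inventory: dagger", "You see a door."]
obs_t1 = ["HP: 12/14", "Position: (3, 5)", "Inventory: dagger", "You see a door."]

diff = list(difflib.unified_diff(obs_t0, obs_t1, lineterm="", n=0))
print("\n".join(diff))
# Only the changed HP and Position lines appear in the hunk, so the LM's context
# grows with what changed, not with the size of each full observation.
```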


Energy-based Backdoor Defense without Task-Specific Samples and Model Retraining

https://openreview.net/forum?id=TJ6tVNt6Y4

Compressor summary: The paper proposes two energy-based methods for detecting and removing backdoors in machine learning models with low overhead, called EBBA and EBBA+.


Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

https://openreview.net/forum?id=THPjMr2r0S

Compressor summary: The paper proposes and benchmarks various zeroth-order optimization techniques for memory-efficient fine-tuning of large language models, revealing new principles and introducing novel enhancements.


PARCv2: Physics-aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics Modeling

https://openreview.net/forum?id=T0zR4mdSce

Compressor summary: PARCv2 is a versatile and generalizable deep learning model that can simulate complex physical systems involving unsteady, transient, and advection-dominated dynamics.


Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications

https://openreview.net/forum?id=T0lFfO8HaK

Compressor summary: The paper proposes a sparse inversion method for high-resolution images that selectively reconstructs semantic foregrounds, skipping noisy backgrounds and spurious correlations, to accelerate model inversion while maintaining or improving downstream performance.


Optimally Improving Cooperative Learning in a Social Setting

https://openreview.net/forum?id=Sz9mAYuqlE

Compressor summary: Key points:
- Cooperative learning scenario with networked agents updating predictions through communication or observations
- Goal: optimize a few classifiers to maximize overall accuracy in the network
- Objective functions: aggregate and egalitarian
- Results: a polynomial-time algorithm for the aggregate objective, NP-hardness for the egalitarian one, and approximation algorithms with guaranteed performance

Summary: The paper studies how to improve the accuracy of cooperative learning agents by optimizing a few classifiers under different objective functions, and presents efficient algorithms with proven guarantees.


Rethinking Transformers in Solving POMDPs

https://openreview.net/forum?id=SyY7ScNpGL

Compressor summary: The paper questions Transformers' effectiveness in partially observable environments and suggests using Deep Linear Recurrent Unit (LRU) instead.


Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses

https://openreview.net/forum?id=SvvvB5t5EW

Compressor summary: The paper analyses different contrastive learning methods and shows they all optimize for the same problem related to hyperspherical energy minimization; it introduces a new objective called Decoupled Hyperspherical Energy Loss (DHEL) that simplifies the problem and improves performance and robustness on computer vision tasks.


MLI Formula: A Nearly Scale-Invariant Solution with Noise Perturbation

https://openreview.net/forum?id=SvBLKoBL4q

Compressor summary: The paper studies Monotonic Linear Interpolation (MLI) and shows that its error decrease is mainly due to the converged model's property, not the optimization trajectory. It also identifies different scale invariance properties of the converged model.


Generalizing Knowledge Graph Embedding with Universal Orthogonal Parameterization

https://openreview.net/forum?id=Sv4u9PtvT5

Compressor summary: The paper introduces GoldE, a framework for knowledge graph embedding that can handle different dimensions and geometries using a generalized Householder reflection parameterization, leading to improved modeling capability and performance.


Wasserstein Wormhole: Scalable Optimal Transport Distance with Transformer

https://openreview.net/forum?id=Su0qe33cWA

Compressor summary: Wasserstein Wormhole is a neural network that embeds distributions into a space where Euclidean distances approximate Wasserstein distances, enabling fast and scalable analysis of non-Euclidean data.
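A minimal sketch of the underlying objective, under simplifying assumptions (1-D point clouds and an MLP encoder instead of the paper's transformer): train the encoder so that Euclidean distances between embeddings regress onto pairwise Wasserstein distances.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.stats import wasserstein_distance

# Illustrative sketch (1-D clouds, MLP encoder) of the Wormhole-style objective:
# make Euclidean distances in embedding space match Wasserstein-1 distances.

torch.manual_seed(0)
rng = np.random.default_rng(0)
clouds = [np.sort(rng.normal(rng.uniform(-2, 2), rng.uniform(0.5, 2.0), size=32))
          for _ in range(64)]
X = torch.tensor(np.stack(clouds), dtype=torch.float32)

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

for step in range(2000):
    i, j = rng.integers(0, 64, size=2)
    target = wasserstein_distance(clouds[i], clouds[j])        # ground-truth W1
    pred = torch.linalg.norm(encoder(X[i]) - encoder(X[j]))    # embedding distance
    loss = (pred - target) ** 2
    opt.zero_grad(); loss.backward(); opt.step()
# After training, embedding distances approximate W1 between pairs of clouds.
```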


ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models

https://openreview.net/forum?id=Stn8hXkpe6

Compressor summary: ReMax is a simpler and more efficient reinforcement learning algorithm for large language models that leverages their properties and outperforms PPO.
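As a toy sketch of the estimator this describes: REINFORCE with the reward of the greedy rollout subtracted as a baseline. A categorical distribution stands in for an LLM policy, and the reward table is invented.

```python
import torch

# Toy sketch of a ReMax-style gradient estimator: REINFORCE with the reward of a
# greedy rollout used as baseline. A categorical "policy" stands in for an LLM.

torch.manual_seed(0)
logits = torch.zeros(5, requires_grad=True)          # policy over 5 "responses"
reward = torch.tensor([0.1, 0.9, 0.2, 0.4, 0.3])     # fixed reward-model scores

opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    y = dist.sample()                                # sampled response
    y_greedy = torch.argmax(logits)                  # greedy response (baseline)
    advantage = reward[y] - reward[y_greedy]         # variance-reducing baseline
    loss = -advantage.detach() * dist.log_prob(y)    # REINFORCE surrogate
    opt.zero_grad(); loss.backward(); opt.step()

print(torch.argmax(logits).item())  # converges to the highest-reward response (1)
```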


Absolute Policy Optimization: Enhancing Lower Probability Bound of Performance with High Confidence

https://openreview.net/forum?id=Ss3h1ixJAU

Compressor summary: The paper introduces a new reinforcement learning objective function that guarantees monotonic improvement in the lower probability bound of performance and develops two practical solutions (APO and PAPO) that significantly outperform state-of-the-art algorithms on continuous control tasks and Atari games.


An Information Theoretic Approach to Interaction-Grounded Learning

https://openreview.net/forum?id=Sra298VMFM

Compressor summary: The paper proposes a new method called VI-IGL that uses information theory to enforce conditional independence assumptions and improve feedback-based reinforcement learning tasks.


Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games

https://openreview.net/forum?id=SoqxSnEUi1

Compressor summary: Albatross is a new algorithm that learns to play simultaneous games by modeling other agents' behavior and can cooperate or compete with agents of any strength, achieving better results than AlphaZero in some cases.


Position: Compositional Generative Modeling: A Single Model is Not All You Need

https://openreview.net/forum?id=SoNexFx8qz

Compressor summary: The paper proposes using smaller generative models together instead of large monolithic ones, which leads to more data-efficient learning, generalization to unseen data, and the ability to create new models for unknown tasks.


Nonsmooth Implicit Differentiation: Deterministic and Stochastic Convergence Rates

https://openreview.net/forum?id=SlRcJvf1yd

Compressor summary: The paper analyzes different methods to compute derivatives of non-differentiable functions, which are useful for machine learning applications like hyperparameter optimization and data poisoning attacks.


Efficient and Effective Time-Series Forecasting with Spiking Neural Networks

https://openreview.net/forum?id=SkI6u81AkI

Compressor summary: This paper presents a framework for using spiking neural networks (SNNs), which mimic biological neurons, to improve time-series forecasting with less energy consumption and better capture of temporal dependencies.


GATE: How to Keep Out Intrusive Neighbors

https://openreview.net/forum?id=Sjv5RcqfuH

Compressor summary: GATE is an extension of Graph Attention Networks that mitigates over-smoothing by addressing its root cause, supports greater depth and better feature transformations, and outperforms GATs on heterophilic datasets.


OTMatch: Improving Semi-Supervised Learning with Optimal Transport

https://openreview.net/forum?id=ShkKSDrfG6

Compressor summary: OTMatch is a new semi-supervised learning method that uses optimal transport loss to match class distributions and improve learning performance by leveraging semantic relationships among classes.


A Theoretical Analysis of Backdoor Poisoning Attacks in Convolutional Neural Networks

https://openreview.net/forum?id=SfcB4cVvPz

Compressor summary: The text explains how backdoor poisoning attacks work, why they are effective, and provides analysis and experiments on a specific type of neural network.


Reflected Flow Matching

https://openreview.net/forum?id=Sf5KYznS2G

Compressor summary: Reflected flow matching trains a velocity model for continuous normalizing flows with boundary constraints to produce natural and class-conditional samples on constrained domains.


A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks

https://openreview.net/forum?id=ScRhEuj480

Compressor summary: The paper proposes a simple and scalable corrector network that adjusts stale cached target embeddings for dense retrieval, achieving state-of-the-art results with significant reductions in computational cost.


Assessing Large Language Models on Climate Information

https://openreview.net/forum?id=ScIHQoTUjT

Compressor summary: The authors propose an evaluation framework for assessing how well large language models communicate about climate change, revealing a significant gap between their surface-level and deeper knowledge.


Representation Surgery for Multi-Task Model Merging

https://openreview.net/forum?id=Sbl2keQEML

Compressor summary: The paper proposes a method called "Surgery" to reduce representation bias in multi-task learning by aligning the merged model's representation with individual models' representations using an unsupervised optimization objective.


An Explicit Frame Construction for Normalizing 3D Point Clouds

https://openreview.net/forum?id=SZ0JnRxi0x

Compressor summary: The paper introduces a new algorithm for determining reference frames and normalizing 3D point clouds that works universally, guarantees compatibility, and outperforms existing methods.


CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay

https://openreview.net/forum?id=SXVn5IFsrs

Compressor summary: The paper introduces CodeIt, a novel method for language models to self-improve by iterating between program sampling, hindsight relabeling, and learning from prioritized experience replay, achieving state-of-the-art performance on the Abstraction and Reasoning Corpus.


SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning

https://openreview.net/forum?id=SWrwurHAeq

Compressor summary: SiT is a scalable vision transformer that uses Graph Symmetric Attention to improve generalisation in reinforcement learning by preserving graph symmetries and leveraging local and global data patterns.


Adaptive Proximal Gradient Methods Are Universal Without Approximation

https://openreview.net/forum?id=SUxarNgrUT

Compressor summary: The paper proposes a new adaptive method for convex problems that doesn't need Lipschitz assumptions and uses plain Hölder inequalities to achieve linesearch-free convergence without prior knowledge of constants or order.


PIDformer: Transformer Meets Control Theory

https://openreview.net/forum?id=SRzb3QDjdV

Compressor summary: The paper proposes PIDformer, a new class of transformers that use feedback control to improve robustness, representation capacity, and noise resilience by avoiding input corruption and rank collapse issues in existing models.


UniAudio: Towards Universal Audio Generation with Large Language Models

https://openreview.net/forum?id=SRmZw7nEGW

Compressor summary: UniAudio is a large language model-based universal audio generation model that can handle various audio tasks using different inputs, such as phonemes, text descriptions, or existing audio, and achieve competitive results on 11 tasks.


RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

https://openreview.net/forum?id=SQIDlJd3hN

Compressor summary: RoboGen is a robotic agent that learns skills from simulated environments using foundation and generative models, creating diverse tasks and supervisions with minimal human input.


A Provable Decision Rule for Out-of-Distribution Detection

https://openreview.net/forum?id=SPygKwms0X

Compressor summary: The paper proposes a new decision rule for out-of-distribution (OOD) detection based on a generalized Benjamini-Hochberg procedure that has rigorous theoretical guarantees and performs well empirically.


Mapping the Multiverse of Latent Representations

https://openreview.net/forum?id=SPBxFwIdMk

Compressor summary: PRESTO is a framework that analyzes the variability in latent representations of machine learning models using persistent homology, enabling sensitivity analysis, anomaly detection, and hyperparameter search.


Time-Series Forecasting for Out-of-Distribution Generalization Using Invariant Learning

https://openreview.net/forum?id=SMUXPVKUBg

Compressor summary: FOIL is a model-agnostic framework that improves out-of-distribution generalization in time-series forecasting via invariant learning, addressing challenges such as unobserved core variables and lack of environment labels.


Few-Shot Unsupervised Implicit Neural Shape Representation Learning with Spatial Adversaries

https://openreview.net/forum?id=SLqdDWwibH

Compressor summary: The text describes a new method that uses adversarial samples to improve the quality of Neural Signed Distance Functions (SDF) for 3D shape representation, outperforming existing methods on synthetic and real data.


Causal Representation Learning Made Identifiable by Grouping of Observational Variables

https://openreview.net/forum?id=SL6V527p1F

Compressor summary: The paper proposes a novel identifiability condition for causal representation learning based on observational variable grouping, with a self-supervised estimation framework that outperforms existing methods.


Benchmarking Deletion Metrics with the Principled Explanations

https://openreview.net/forum?id=SKPhvzxO1g

Compressor summary: The paper proposes a framework called TRACE that explains feature deletion influence on model predictions and benchmarks insertion/deletion metrics for explanation methods.


ESNet: Evolution and Succession Network for High-Resolution Salient Object Detection

https://openreview.net/forum?id=SERrqPDvoY

Compressor summary: The paper proposes a two-stage model for High-Resolution Salient Object Detection, which preserves details and is fast enough for real-time applications.


D-Flow: Differentiating through Flows for Controlled Generation

https://openreview.net/forum?id=SE20BFqj6J

Compressor summary: D-Flow is a method to control the output of diffusion and flow-matching models by differentiating through their generation process, achieving excellent results in various tasks such as image and audio inversion and conditional molecule generation.


Confidence-aware Contrastive Learning for Selective Classification

https://openreview.net/forum?id=SDCx6rQV2l

Compressor summary: The paper proposes a new method for selective classification called CCL-SC that improves feature layers and reduces selective risk by contrasting similar and different instances based on confidence.


DetKDS: Knowledge Distillation Search for Object Detectors

https://openreview.net/forum?id=SBR8Gwe1E2

Compressor summary: The paper introduces DetKDS, a framework that uses search algorithms to automatically find optimal detection distillation policies for different detector setups, achieving state-of-the-art results on various tasks.


Decomposable Submodular Maximization in Federated Setting

https://openreview.net/forum?id=SAbZExIIgG

Compressor summary: We propose a federated optimization method for decomposable submodular functions that respects privacy and reduces communication cost by aggregating only intermittently and on a subsampled set of clients.


Winner-takes-all learners are geometry-aware conditional density estimators

https://openreview.net/forum?id=SAbL40d8A4

Compressor summary: The paper proposes a new method to use Winner-takes-all learners for conditional density estimation that leverages their geometric properties and shows its advantages in theory and practice.


Parallel Affine Transformation Tuning of Markov Chain Monte Carlo

https://openreview.net/forum?id=SAXp5dMYv7

Compressor summary: This text describes a method to improve Markov chain Monte Carlo samplers by using bijective affine transformations and adaptive learning, which can generate high-quality samples efficiently for real-world data.


Efficient Low-Rank Matrix Estimation, Experimental Design, and Arm-Set-Dependent Low-Rank Bandits

https://openreview.net/forum?id=SAEUO7847g

Compressor summary: The paper proposes LowPopArt, a novel method for low-rank matrix estimation in trace regression and bandits, with tighter recovery guarantees and improved regret bounds than existing methods.


Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

https://openreview.net/forum?id=S9lk6dk4LL

Compressor summary: The paper proposes a method to pre-train LLMs on videos by decomposing them into keyframes and motions, and then adapting them to the model using tokenizers that discretize visual and temporal information, allowing for unified pre-training of images, videos, and text.


Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations

https://openreview.net/forum?id=S9DV6ZP4eE

Compressor summary: The adaptive analytic gradient method improves policy optimization with simulation gradients by adjusting the Q function to handle non-smooth simulations; it is instantiated in the AGPO algorithm, with theoretical and empirical results demonstrating its effectiveness.


A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

https://openreview.net/forum?id=S80a4hJtuE

Compressor summary: The paper proposes an efficient algorithm for offline reinforcement learning with linear MDPs under infinite-horizon discounted setting and partial data coverage assumption, achieving better sample complexity than existing methods.


Position: Social Environment Design Should be Further Developed for AI-based Policy-Making

https://openreview.net/forum?id=S6a6gHvMWx

Compressor summary: The paper introduces Social Environment Design, a framework that uses AI to automate policy-making, connects with different research communities, and addresses key open problems for future research in this field.


Be Your Own Neighborhood: Detecting Adversarial Examples by the Neighborhood Relations Built on Self-Supervised Learning

https://openreview.net/forum?id=S4LqI6CcJ3

Compressor summary: BEYOND is a novel AE detection framework that uses SSL to distinguish clean samples from adversarial examples (AEs) based on representation similarity and label consistency, achieving state-of-the-art robust accuracy.


Practical Performance Guarantees for Pipelined DNN Inference

https://openreview.net/forum?id=S3xqyEaST9

Compressor summary: The paper proposes algorithms and methods for optimizing pipeline parallelism in deep neural network inference by minimizing the running time of the bottleneck stage and provides empirical results on a diverse testbed of production models.


Equilibrium of Data Markets with Externality

https://openreview.net/forum?id=S2XgbBCJy0

Compressor summary: The paper studies how transaction costs can improve data markets by reducing negative externalities from buyers' purchases and proposes learning algorithms to achieve low regret in valuation.


On the Independence Assumption in Neurosymbolic Learning

https://openreview.net/forum?id=S1gSrruVd4

Compressor summary: The text argues against assuming conditionally independent probabilities in neurosymbolic learning systems, as it can cause overconfidence, poor uncertainty representation, and optimization difficulties.


Why Do Animals Need Shaping? A Theory of Task Composition and Curriculum Learning

https://openreview.net/forum?id=S0DPCE7tt4

Compressor summary: The text discusses the importance of 'shaping' procedures for complex tasks in systems neuroscience and proposes a model to analyse deep policy gradient learning of compositional reinforcement learning tasks using statistical physics tools.


EvTexture: Event-driven Texture Enhancement for Video Super-Resolution

https://openreview.net/forum?id=Ry4RAzdOWl

Compressor summary: EvTexture, an event-driven video super-resolution method, uses high-frequency event details for texture enhancement and achieves state-of-the-art performance.


Multi-Patch Prediction: Adapting Language Models for Time Series Representation Learning

https://openreview.net/forum?id=Rx9GMufByc

Compressor summary: The study introduces aLLM4TS, a framework that adapts large language models for time-series representation learning using self-supervised, multi-patch prediction.


Finite Smoothing Algorithm for High-Dimensional Support Vector Machines and Quantile Regression

https://openreview.net/forum?id=RvwMTDYTOb

Compressor summary: The paper presents a new algorithm (FSA) that makes support vector machines and quantile regression faster and more accurate for high-dimensional data by transforming their non-smooth loss functions into smooth ones.


Locally Differentially Private Decentralized Stochastic Bilevel Optimization with Guaranteed Convergence Accuracy

https://openreview.net/forum?id=RuH78kOcDi

Compressor summary: Key points:
- Decentralized bilevel optimization is useful for many domains, but it is hard to ensure differential privacy.
- The paper proposes a new algorithm that achieves both differential privacy and accurate convergence.
- The paper also analyzes the convergence rate and the price of differential privacy.

Summary: The paper presents a new decentralized bilevel optimization algorithm that ensures differential privacy without sacrificing accuracy, and studies its convergence rate and the trade-off with privacy.


Compositional Curvature Bounds for Deep Neural Networks

https://openreview.net/forum?id=RtnGLJNtEG

Compressor summary: The paper analyzes how local gradients and second-order behavior of deep neural networks affect their robustness against adversarial attacks and proposes a scalable algorithm to compute and use curvature bounds as a regularizer for training.


PAPM: A Physics-aware Proxy Model for Process Systems

https://openreview.net/forum?id=RtCmp5F9lN

Compressor summary: Physics-aware proxy model (PAPM) is a new approach that combines partial physics knowledge and flexible adaptation for better generalization and performance in process systems modeling, with less computational cost and fewer parameters than existing methods.


Online Resource Allocation with Non-Stationary Customers

https://openreview.net/forum?id=RsIMGYzBcv

Compressor summary: Key points:
- A new algorithm for online resource allocation with non-stationary customer arrivals and unknown click-through rates
- It combines ideas from stochastic contextual bandits and online matching
- It achieves low regret when customer arrivals are near-stationary and an optimal competitive ratio otherwise
- It performs well in numerical experiments

Summary: The paper presents a novel algorithm that efficiently allocates resources to customers with changing arrival rates and unknown click-through rates, using insights from two fields and showing good performance in various scenarios.


AutoOS: Make Your OS More Powerful by Exploiting Large Language Models

https://openreview.net/forum?id=Rp8R9C0Sth

Compressor summary: AutoOS is a framework using Large Language Models to optimize Linux kernel configurations for AIoT applications automatically and efficiently.


Projection-Free Online Convex Optimization with Time-Varying Constraints

https://openreview.net/forum?id=RnbobOgbn0

Compressor summary: The paper proposes projection-free online convex optimization algorithms that use linear optimization oracles and achieve low regret and constraint violation under time-varying constraints.


Open Ad Hoc Teamwork with Cooperative Game Theory

https://openreview.net/forum?id=RlibRvH4B4

Compressor summary: The paper proposes a new algorithm called CIAO for open ad hoc teamwork using graph-based policy learning and cooperative game theory to improve explanations and performance.


Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

https://openreview.net/forum?id=RiQbe8RwCe

Compressor summary: The paper shows that small batch sizes do not help online learning, but SGD noise has computational benefits in this setting, supporting a noisy golden path hypothesis.


Stay on Topic with Classifier-Free Guidance

https://openreview.net/forum?id=RiM3cl9MdK

Compressor summary: CFG is an effective inference-time technique for language modeling that improves performance across various tasks and prompt adherence, letting models with fewer parameters outperform some larger ones.
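The mechanism transfers from diffusion guidance to next-token prediction: compute logits with and without the conditioning text and extrapolate between them with a guidance strength gamma. A minimal numeric sketch (the logit values are made up):

```python
import numpy as np

# Classifier-free guidance at the logit level: push the conditional distribution
# away from the unconditional one by a guidance strength gamma. Values are toy.

def cfg_logits(logits_cond, logits_uncond, gamma: float):
    return logits_uncond + gamma * (logits_cond - logits_uncond)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits_cond = np.array([2.0, 1.0, 0.5])     # next-token logits given the prompt
logits_uncond = np.array([1.5, 1.2, 1.1])   # logits without the prompt
for gamma in (1.0, 1.5, 3.0):               # gamma = 1 recovers the conditional model
    print(gamma, np.round(softmax(cfg_logits(logits_cond, logits_uncond, gamma)), 3))
# Larger gamma sharpens the distribution toward tokens the prompt makes more likely.
```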


On the Second-Order Convergence of Biased Policy Gradient Algorithms

https://openreview.net/forum?id=RfsagmV1AG

Compressor summary: The paper analyzes biased policy gradient methods in reinforcement learning and their second-order behavior, including vanilla estimators and actor-critic algorithms.


How Far Can Fairness Constraints Help Recover From Biased Data?

https://openreview.net/forum?id=RfQT6vJt8b

Compressor summary: Blum & Stangl (2019) prove that fair classifiers with equal opportunity constraints can achieve optimal accuracy even on extremely biased data, and the authors extend this result to various settings and applications.


Ameliorate Spurious Correlations in Dataset Condensation

https://openreview.net/forum?id=RbnojVv4HK

Compressor summary: The paper investigates how dataset biases affect dataset condensation and proposes a sample reweighting method using kernel density estimation to reduce bias amplification, achieving significant improvements on benchmark datasets.


Improving Transformers with Dynamically Composable Multi-Head Attention

https://openreview.net/forum?id=RbiBKPtuHp

Compressor summary: DCMHA is a more efficient and expressive attention mechanism than MHA that improves Transformer performance in language modeling tasks.


Let Go of Your Labels with Unsupervised Transfer

https://openreview.net/forum?id=RZHRnnGcEx

Compressor summary: TURTLE is a method that achieves state-of-the-art unsupervised performance on various tasks by searching for the labeling of a dataset using representation spaces of different foundation models without any supervision or task-specific learning.


Learning and Forgetting Unsafe Examples in Large Language Models

https://openreview.net/forum?id=RYmmgedVjR

Compressor summary: The ForgetFilter algorithm filters out unsafe data from large language models based on how easily they forget it, ensuring safety without sacrificing performance.


Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference

https://openreview.net/forum?id=RXxTuxPopa

Compressor summary: The paper proposes a new method to classify events with reliable uncertainty estimates, accounting for distributional shifts between train and target data by using nuisance parameter-dependent cutoffs and estimating the ROC across the nuisance parameter space.


Dealing With Unbounded Gradients in Stochastic Saddle-point Optimization

https://openreview.net/forum?id=RPMTNGMq0O

Compressor summary: The paper proposes a regularization technique for stochastic first-order methods that stabilizes iterates and provides performance guarantees even when gradient noise is potentially unbounded.


Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers

https://openreview.net/forum?id=RLENZ8pNnn

Compressor summary: The INSTINCT algorithm optimizes instructions for large language models by coupling neural bandits with transformer features, achieving better performance than baselines across various tasks.


Mitigating Oversmoothing Through Reverse Process of GNNs for Heterophilic Graphs

https://openreview.net/forum?id=RLA4JTckXe

Compressor summary: Reverse GNNs improve node classification on heterophilic graphs by inverting message passing and mitigating over-smoothing.


Et Tu Certifications: Robustness Certificates Yield Better Adversarial Examples

https://openreview.net/forum?id=RKlmOBFwAh

Compressor summary: The paper proposes a new attack that uses certifications to create more effective adversarial examples and questions the security of certification mechanisms in neural network robustness.


Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models

https://openreview.net/forum?id=RIMRKeeVsr

Compressor summary: This study investigates how retrieval helps vision-language models adapt to new tasks, revealing insights on uni-modal and cross-modal retrieval and the importance of logit ensemble.


Asymptotics of Learning with Deep Structured (Random) Features

https://openreview.net/forum?id=RI4GA8amUI

Compressor summary: The paper studies how the test error of learning the readout layer depends on the population covariance of features for large neural networks with structured weights and high-dimensional inputs.


$f$-Divergence Based Classification: Beyond the Use of Cross-Entropy

https://openreview.net/forum?id=RFhkcqRmTD

Compressor summary: The text proposes a new objective function for classification tasks based on the shifted log (SL) $f$-divergence, which shows better accuracy than existing methods in various applications.


Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction

https://openreview.net/forum?id=RDofzHLuX4

Compressor summary: The paper proposes a novel regression adjustment method that uses pre-treatment covariates and machine learning to estimate distributional treatment effects in randomized experiments more precisely, with valid inference when nuisance components are well estimated, as shown in simulations and real data analysis.


ODIM: Outlier Detection via Likelihood of Under-Fitted Generative Models

https://openreview.net/forum?id=R8nbccD7kv

Compressor summary: Our method detects outliers in unsupervised learning by leveraging the initial memorization of inliers by under-fitted deep generative models, making it fast and effective for various data types.
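
A minimal sketch of the inlier-memorization effect the summary describes, with an under-trained autoencoder's reconstruction error standing in for the likelihood of the paper's deep generative model (the architecture, data, and hyperparameters below are illustrative assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic data: 500 tightly clustered inliers plus 25 dispersed outliers.
x = torch.cat([torch.randn(500, 10) * 0.5, torch.randn(25, 10) * 4.0])

model = nn.Sequential(nn.Linear(10, 3), nn.ReLU(), nn.Linear(3, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Deliberately under-fit: in early epochs the model fits the dense inlier
# region first, so outliers keep a high per-sample loss.
for _ in range(5):
    opt.zero_grad()
    ((model(x) - x) ** 2).mean().backward()
    opt.step()

with torch.no_grad():
    scores = ((model(x) - x) ** 2).mean(dim=1)  # high score => likely outlier
flagged = scores.argsort(descending=True)[:25]
print("outliers recovered:", int((flagged >= 500).sum()), "out of 25")
```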


OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

https://openreview.net/forum?id=R83VIZtHXA

Compressor summary: The paper proposes Occupancy-Matching Policy Optimization (OMPO), a method for online policy learning in reinforcement learning that handles distribution discrepancies due to policy or dynamics shifts, and shows its effectiveness in various environments.


Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

https://openreview.net/forum?id=R6GT1UDcOW

Compressor summary: The paper shows how combining a target network and over-parameterized linear function approximation can improve bootstrapped value estimation with less strict convergence conditions, even for off-policy data.


MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

https://openreview.net/forum?id=R4Ng8zYaiz

Compressor summary: MMT-Bench is a comprehensive benchmark for evaluating large vision-language models on diverse multimodal tasks requiring visual reasoning and localization, covering 32 core meta-tasks and 162 subtasks.


Efficient Online Set-valued Classification with Bandit Feedback

https://openreview.net/forum?id=R1auM3tLPE

Compressor summary: Bandit Class-specific Conformal Prediction (BCCP) is a method that provides coverage guarantees for class predictions in online learning settings with limited label information, using stochastic gradient descent and unbiased estimation.
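
For context, the sketch below shows the standard class-conditional (full-feedback) split-conformal baseline that a bandit-feedback method like BCCP must approximate when only one queried label per round is observed; the bandit estimation itself is omitted, and everything here is an illustrative assumption rather than the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, alpha = 3, 0.1  # alpha: target per-class miscoverage

# Mock calibration data: model softmax outputs with labels drawn from them.
probs = rng.dirichlet(np.ones(num_classes), size=1000)
labels = np.array([rng.choice(num_classes, p=p) for p in probs])

# Nonconformity score: one minus the probability of the true class.
scores = 1.0 - probs[np.arange(len(labels)), labels]
# Class-specific thresholds from per-class quantiles.
thresholds = np.array([np.quantile(scores[labels == c], 1 - alpha)
                       for c in range(num_classes)])

def prediction_set(p):
    """All classes whose nonconformity falls below their class threshold."""
    return [c for c in range(num_classes) if 1.0 - p[c] <= thresholds[c]]

print(prediction_set(np.array([0.7, 0.2, 0.1])))
```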


MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving

https://openreview.net/forum?id=R0SoZvqXyQ

Compressor summary: MuxServe is a system that efficiently serves multiple large language models by colocating them based on popularity and managing resources adaptively.


A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models

https://openreview.net/forum?id=QwgSOwynxD

Compressor summary: The paper presents a bias-variance-covariance decomposition of kernel scores, a theoretical framework for estimating uncertainty in generative models from generated samples alone, applicable to image, audio, and language generation.


Enhancing Adversarial Robustness in SNNs with Sparse Gradients

https://openreview.net/forum?id=QvABoVGdRp

Compressor summary: This paper proposes a new method to improve the robustness of Spiking Neural Networks against adversarial attacks by regularizing their gradient sparsity, which is theoretically proven and experimentally validated.


Online Learning in CMDPs: Handling Stochastic and Adversarial Constraints

https://openreview.net/forum?id=Qv5szC1zp7

Compressor summary: The paper introduces a new online learning algorithm for CMDPs with long-term constraints that can handle stochastic and adversarial rewards and constraints without knowing the transition function, achieving competitive performance in both settings.


Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

https://openreview.net/forum?id=QhqQJqe0Wq

Compressor summary: Score identity Distillation (SiD) is a data-free method that quickly improves the performance of pretrained diffusion models using three identities and a novel loss mechanism, achieving high efficiency and quality in generation.


Naive Bayes Classifiers over Missing Data: Decision and Poisoning

https://openreview.net/forum?id=QhKsE7YAJk

Compressor summary: The paper presents a polynomial-time algorithm to decide whether Naive Bayes classifiers are certifiably robust to missing values in dirty datasets, and shows that data poisoning attacks are easier to mount against a single test point than against multiple test points.


Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts

https://openreview.net/forum?id=QhHMx51ir6

Compressor summary: SMAT is a method that improves vision foundation models' transfer abilities by automatically isolating subsets of pre-trained parameters for meta-tuning on each task, overcoming OOD sensitivity and achieving state-of-the-art results.


Enhancing Storage and Computational Efficiency in Federated Multimodal Learning for Large-Scale Models

https://openreview.net/forum?id=QgvBcOsF4B

Compressor summary: M$^2$FedSA is a method for training large-scale multimodal models under federated learning in a privacy-preserving and efficient way, using Split Learning, specialized adapters, and modality knowledge transfer; it shows promising results on several multimodal classification tasks, and the code is released.


Exploring the Complexity of Deep Neural Networks through Functional Equivalence

https://openreview.net/forum?id=QgMqvxvWpX

Compressor summary: The text discusses how functional equivalence helps reduce the complexity of neural networks, make them easier to train, and provides insights into overparameterization, generalization, and optimization.


Scalable Safe Policy Improvement for Factored Multi-Agent MDPs

https://openreview.net/forum?id=Qc5umSsUi8

Compressor summary: The paper proposes a novel algorithm for safe policy improvement in multi-agent domains using Monte Carlo Tree Search, factorization, and Max-Plus constraint.


Potential Based Diffusion Motion Planning

https://openreview.net/forum?id=Qb68Rs0p9f

Compressor summary: The authors propose a new method for learning potential-based motion planning using neural networks that can optimize trajectories, avoid local minima, and handle various constraints.


The Illusion of State in State-Space Models

https://openreview.net/forum?id=QZgo9JZpLq

Compressor summary: SSMs, a potential alternative to transformer-based large language models, share the same expressive limitations as transformers on state-tracking problems despite their recurrent formulation.


Polynomial-based Self-Attention for Table Representation Learning

https://openreview.net/forum?id=QZd3rvlP76

Compressor summary: The paper proposes a new self-attention layer for tabular data using matrix polynomials to address the oversmoothing issue in Transformers and improve model scalability.


Differentiable Mapper for Topological Optimization of Data Representation

https://openreview.net/forum?id=QZ1DVzr6N9

Compressor summary: The paper proposes a method to optimize the filter parameter of the Mapper graph in topological data analysis, which improves the representation and visualization of data structures.


SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals

https://openreview.net/forum?id=QXqXGDapkQ

Compressor summary: The authors present SleepFM, a multi-modal foundation model for sleep analysis that outperforms standard methods on sleep stage classification and sleep disordered breathing detection using a novel contrastive learning approach.


Improving Adversarial Energy-Based Model via Diffusion Process

https://openreview.net/forum?id=QXEx16jWdN

Compressor summary: The paper proposes an improved version of adversarial EBMs by using denoising steps and a variational posterior distribution, leading to better generation and density estimation.


Reward-Free Kernel-Based Reinforcement Learning

https://openreview.net/forum?id=QTt2xJI8vk

Compressor summary: The paper proposes an adaptive domain partitioning algorithm for reward-free RL that uses kernel-based function approximations and achieves order-optimal sample complexity for a wide range of kernels.


Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge

https://openreview.net/forum?id=QRjTDhCIO8

Compressor summary: The paper introduces Re-Dock, a new deep learning method for predicting protein-ligand binding structures that considers both ligand and pocket sidechain conformations, and shows its superior performance on benchmark datasets.


Multi-Fidelity Residual Neural Processes for Scalable Surrogate Modeling

https://openreview.net/forum?id=QRDfBIhrJq

Compressor summary: MFRNP is a new multi-fidelity surrogate modeling framework that improves scalability, inference performance, and accuracy by explicitly modeling the residual between lower and higher fidelity data using neural network encoders, decoders, and optimized lower fidelity decoders.


Nash Incentive-compatible Online Mechanism Learning via Weakly Differentially Private Online Learning

https://openreview.net/forum?id=QQkK6YH0Th

Compressor summary: The paper proposes a novel online learning scheme that adapts between privacy-preserving recommendations and commitment mechanisms over multiple rounds, achieving low regret in multi-round mechanism design problems.


Interpretable Deep Clustering for Tabular Data

https://openreview.net/forum?id=QPy7zLfvof

Compressor summary: The paper proposes a deep-learning framework for tabular data that predicts interpretable cluster assignments with feature selection.


Delving into the Convergence of Generalized Smooth Minimax Optimization

https://openreview.net/forum?id=QPsEPI9bvp

Compressor summary: The paper explores minimax optimization algorithms that work under a relaxed Lipschitz smoothness condition called generalized smoothness, which allows them to converge and have better theoretical guarantees in more machine learning applications.


DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)

https://openreview.net/forum?id=QMy2RLnxGN

Compressor summary: DoraemonGPT is a system that uses LLMs to understand dynamic scenes, especially videos, and perform tasks across different domains with the help of specialized tools and a novel planner based on Monte Carlo Tree Search.


Optimization without Retraction on the Random Generalized Stiefel Manifold

https://openreview.net/forum?id=QLtxj3erlJ

Compressor summary: The paper proposes a cheap stochastic iterative method for solving problems with generalized orthogonality constraints using random estimates of the constraint matrix $B$, achieving convergence rates similar to traditional methods at lower cost and complexity.


Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

https://openreview.net/forum?id=QLcBzRI3V3

Compressor summary: RiC is a simple and adaptable method for aligning foundation models with human preferences using supervised fine-tuning, which reduces the cost and complexity of previous approaches.


Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

https://openreview.net/forum?id=QLOvxGwbIM

Compressor summary: The paper introduces Bayesian Power Steering (BPS), a novel network structure for fine-tuning large diffusion models that efficiently extracts task-specific knowledge and outperforms contemporary methods in various tasks.


Prediction-powered Generalization of Causal Inferences

https://openreview.net/forum?id=QKnWXX3aVm

Compressor summary: The paper proposes algorithms that improve the generalization of RCT results to a target population by using an additional observational study, while accounting for high-quality, low-quality, and confounded data.


DPZero: Private Fine-Tuning of Language Models without Backpropagation

https://openreview.net/forum?id=QJkG8Mln72

Compressor summary: DPZero is a new private zeroth-order algorithm for fine-tuning large language models that reduces memory demands and protects sensitive information.


Diffusive Gibbs Sampling

https://openreview.net/forum?id=QH4mXDEULp

Compressor summary: DiGS is a new sampling method that effectively deals with distant and disconnected modes in multi-modal distributions using Gaussian convolution and Gibbs sampling, improving performance in tasks like Bayesian inference and molecular dynamics.
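
A rough 1D sketch of the two Gibbs steps the summary mentions, on a bimodal target; the inner conditional sampler here is plain (unadjusted) Langevin initialized from the noisy variable, which only approximates the paper's procedure, and all constants are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_logp(x):
    """Score of a 1D Gaussian mixture with well-separated modes at -4 and +4."""
    a, b = np.exp(-0.5 * (x + 4) ** 2), np.exp(-0.5 * (x - 4) ** 2)
    return (-a * (x + 4) - b * (x - 4)) / (a + b)

sigma, step, x, samples = 2.0, 0.05, 0.0, []
for _ in range(5000):
    # Step 1: Gaussian convolution -- sample the auxiliary noisy variable.
    x_tilde = x + sigma * rng.normal()
    # Step 2: sample x | x_tilde, proportional to p(x) * N(x_tilde; x, sigma^2),
    # with a few Langevin updates started from x_tilde (noise bridges the modes).
    x = x_tilde
    for _ in range(10):
        g = grad_logp(x) + (x_tilde - x) / sigma**2
        x += step * g + np.sqrt(2 * step) * rng.normal()
    samples.append(x)

print("mass in right mode:", (np.array(samples) > 0).mean())  # ~0.5 if mixing
```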


Optimizing Watermarks for Large Language Models

https://openreview.net/forum?id=QGAeWRRe6e

Compressor summary: The paper proposes a systematic approach to optimize watermark trade-offs for generative LLMs, leading to better robust and efficient watermarks.


Lightweight Image Super-Resolution via Flexible Meta Pruning

https://openreview.net/forum?id=QFMcXz6e4Y

Compressor summary: Flexible meta pruning (FMP) is a lightweight image super-resolution method that simultaneously prunes network channels and weights using hypernetwork-generated meta-data, achieving flexible and structured sparsity control.


The Merit of River Network Topology for Neural Flood Forecasting

https://openreview.net/forum?id=QE6iC9s6vU

Compressor summary: The text discusses using graph neural networks (GNNs) to predict river discharge in a network of gauge stations, but finds that GNNs do not benefit from the river network topology information and struggle with sudden spikes.


Transformers, parallel computation, and logarithmic depth

https://openreview.net/forum?id=QCZabhKQhB

Compressor summary: Transformers can efficiently simulate, and be simulated by, constant-round parallel-computation protocols, enabling logarithmic-depth transformers to solve basic computational tasks that other neural sequence models cannot solve as efficiently.


Learning Useful Representations of Recurrent Neural Network Weight Matrices

https://openreview.net/forum?id=QBj7Uurdwf

Compressor summary: The text discusses various methods to learn representations of Recurrent Neural Network (RNN) weights and evaluates their performance on downstream tasks, with functionalist approaches showing better results.


RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content

https://openreview.net/forum?id=QAGRPiC3FS

Compressor summary: RigorLLM is a novel framework that efficiently and effectively moderates harmful and unsafe inputs and outputs for Large Language Models using a multi-faceted approach.


Scribble-Supervised Semantic Segmentation with Prototype-based Feature Augmentation

https://openreview.net/forum?id=Q8uJyOwOsd

Compressor summary: This paper introduces a new method for scribble-supervised semantic segmentation that uses feature prototypes to improve performance and reduce annotation costs.


CogBench: a large language model walks into a psychology lab

https://openreview.net/forum?id=Q3104y8djk

Compressor summary: CogBench is a benchmark for evaluating large language models based on cognitive psychology experiments, revealing insights about their behavior, performance, and alignment with humans.


Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View

https://openreview.net/forum?id=PzjDsfYwLC

Compressor summary: The text discusses evaluating vision language models' (VLMs) composition skills using game theory, finding weaknesses in their reasoning abilities and providing guidance for future research.


Density Ratio Estimation with Doubly Strong Robustness

https://openreview.net/forum?id=PykISfqvet

Compressor summary: The paper presents two divergence-based density ratio estimation methods with outlier robustness, one convex and one difference-of-convex (DC), and demonstrates their superior performance under heavy contamination.


Zeroth-Order Methods for Constrained Nonconvex Nonsmooth Stochastic Optimization

https://openreview.net/forum?id=PxHmxoFOgI

Compressor summary: The paper proposes and analyzes new methods for solving constrained nonconvex nonsmooth optimization problems with stochastic zeroth-order algorithms and novel concepts of approximate stationarity.


Building Socially-Equitable Public Models

https://openreview.net/forum?id=PudBRuNa8r

Compressor summary: The paper proposes a new way to train public models that makes their predictions more fair and equitable for different downstream agents using a novel Equitable Objective and policy gradient algorithm.


Causal Representation Learning from Multiple Distributions: A General Setting

https://openreview.net/forum?id=Pte6iiXvpf

Compressor summary: This paper proposes general nonparametric methods for learning causal representations from heterogeneous or nonstationary data and shows how they relate to other assumptions like parametric models or hard interventions.


Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

https://openreview.net/forum?id=PrmxFWI1Fr

Compressor summary: The paper discusses how Bayesian deep learning can enhance deep learning capabilities in various settings and addresses existing challenges and future research directions.


Efficient Exploration for LLMs

https://openreview.net/forum?id=PpPZ6W7rxy

Compressor summary: Efficient exploration improves large language models trained from human feedback, with double Thompson sampling over epistemic neural networks performing best among the techniques studied.


Bridging Data Gaps in Diffusion Models with Adversarial Noise-Based Transfer Learning

https://openreview.net/forum?id=PpBs2iL0jv

Compressor summary: The paper proposes a new transfer learning method, ANT, for image generation with limited data, which uses similarity-guided training and adversarial noise selection to improve performance and quality.


Vocabulary for Universal Approximation: A Linguistic Perspective of Mapping Compositions

https://openreview.net/forum?id=PnyYgWMMwj

Compressor summary: The article explores finite representations of deep neural networks as compositions of mappings and proves a universal approximation property using a finite vocabulary of such mappings.


Consistent Diffusion Meets Tweedie: Training Exact Ambient Diffusion Models with Noisy Data

https://openreview.net/forum?id=PlVjIGaFdH

Compressor summary: The paper proposes a new framework for training diffusion models that can handle corrupted data and reduce memorization of the training set.


Learning 1-Bit Tiny Object Detector with Discriminative Feature Refinement

https://openreview.net/forum?id=PlM30j9i80

Compressor summary: DFR-Det is a method to improve 1-bit detectors' ability to detect tiny objects in aerial images by refining feature representation using an information bottleneck and a new decoder with a foreground mask.


ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models

https://openreview.net/forum?id=PjiRSyUt7e

Compressor summary: ConTextual is a new dataset for evaluating multimodal models' ability to reason over text and visual elements in context-rich images, revealing a significant gap between current models and human performance.


Prospector Heads: Generalized Feature Attribution for Large Models & Data

https://openreview.net/forum?id=PjVqEErDgK

Compressor summary: Prospector heads are an efficient and interpretable feature attribution method that works across different data modalities and outperforms baseline methods.


Neural SPH: Improved Neural Modeling of Lagrangian Fluid Dynamics

https://openreview.net/forum?id=Pbey7LqBRl

Compressor summary: The paper proposes enhancing graph neural networks with smoothed particle hydrodynamics components to improve their performance in simulating fluid dynamics.


Robust Universal Adversarial Perturbations

https://openreview.net/forum?id=Paw0BkPaTN

Compressor summary: The text introduces a new method for generating universal adversarial perturbations (UAPs) that are robust to real-world transformations, such as rotation and contrast changes, which can improve the effectiveness of practical attacks on deep neural networks.


A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?

https://openreview.net/forum?id=Pa3GyTe3kf

Compressor summary: This paper explores whether large language models can help speed up Bayesian optimization in material discovery and finds that they can, but only if they are trained on relevant domain data.


QBMK: Quantum-based Matching Kernels for Un-attributed Graphs

https://openreview.net/forum?id=PYDCwWvbG7

Compressor summary: The paper introduces a new Quantum-based Matching Kernel (QBMK) for un-attributed graphs, which captures both global and local structural characteristics and outperforms existing methods in experiments.


Generalization in Kernel Regression Under Realistic Assumptions

https://openreview.net/forum?id=PY3bKuorBI

Compressor summary: This paper proposes a unified theory to bound the excess risk of kernel regression for realistic settings and shows that kernel methods have a built-in self-regularization mechanism.


Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

https://openreview.net/forum?id=PTGJOUlQ68

Compressor summary: The paper proposes an optimal private vector mean estimation protocol in the shuffle model of privacy, and studies its properties in various settings.


An Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization

https://openreview.net/forum?id=PSzyBN7LIA

Compressor summary: The paper proposes a novel algorithm, ME-DOL, that achieves the optimal convergence rate for decentralized nonsmooth nonconvex stochastic optimization in finite time.


Think Before You Act: Decision Transformers with Working Memory

https://openreview.net/forum?id=PSQ5Z920M8

Compressor summary: The paper proposes a working memory module for transformer-based agents to improve their efficiency and generalization by mitigating the forgetting phenomenon and blending information from different tasks.


Position: The Reasonable Person Standard for AI

https://openreview.net/forum?id=PQWVUbqQtQ

Compressor summary: The paper suggests using the reasonable person standard from law to guide the development and evaluation of AI behavior that emulates human norms.


Position: Mission Critical – Satellite Data is a Distinct Modality in Machine Learning

https://openreview.net/forum?id=PQ0ERKKYJu

Compressor summary: Satellite data is a unique modality for machine learning that requires a new research agenda to address its challenges and opportunities.


Prompt-based Visual Alignment for Zero-shot Policy Transfer

https://openreview.net/forum?id=PPoQz8K4GZ

Compressor summary: PVA is a framework that uses semantic information from text to align images across domains, enabling zero-shot policy transfer in RL with limited cross-domain data.


Extending Test-Time Augmentation with Metamorphic Relations for Combinatorial Problems

https://openreview.net/forum?id=PNsdnl8blk

Compressor summary: MAgg is a machine learning method that uses metamorphic relations to improve prediction aggregation for combinatorial problems.


Sobolev Space Regularised Pre Density Models

https://openreview.net/forum?id=PMASooqgoq

Compressor summary: The paper proposes a new density estimation method that regularizes the estimate with a Sobolev norm, is statistically consistent and interpretable, and performs well on an anomaly detection benchmark.


InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

https://openreview.net/forum?id=PLAGGbssT8

Compressor summary: Retro 48B is a large language model pretrained with retrieval that outperforms GPT 43B on various zero-shot tasks and can improve instruction tuning.


Graph Mixup on Approximate Gromov–Wasserstein Geodesics

https://openreview.net/forum?id=PKdege0U6Z

Compressor summary: GeoMix uses Gromov-Wasserstein distance to generate consistent synthetic samples for graph data, improving GNN performance.


RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

https://openreview.net/forum?id=PKJqsZD5nQ

Compressor summary: RICE is a new method for improving deep reinforcement learning agents' performance by using explanation techniques to create a better initial state distribution, leading to better exploration and sub-optimality bounds.


Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels

https://openreview.net/forum?id=PHjkVjR78A

Compressor summary: The paper proposes a method to teach large multi-modality models (LMMs) with text-defined rating levels instead of scores, achieving state-of-the-art accuracy on image and video quality assessment tasks.


AegisFL: Efficient and Flexible Privacy-Preserving Byzantine-Robust Cross-silo Federated Learning

https://openreview.net/forum?id=PHUAG63Efe

Compressor summary: AegisFL is a privacy-preserving federated learning system that enables flexible robust aggregation algorithms while maintaining efficiency and preventing model exposure.


LESS: Selecting Influential Data for Targeted Instruction Tuning

https://openreview.net/forum?id=PG5fV50maR

Compressor summary: LESS is an algorithm that selects relevant instruction data for large language models to develop specialized skills like reasoning by estimating data influences and searching for low-rank gradient similarity.


Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

https://openreview.net/forum?id=PEpbUobfJv

Compressor summary: Medusa is a method that accelerates language model inference by predicting multiple tokens in parallel using extra decoding heads and tree-based attention, achieving significant speedup without sacrificing quality.


Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery

https://openreview.net/forum?id=PDUQRBPkks

Compressor summary: The paper proposes a distributed algorithm for quantile regression that is robust, efficient, and achieves high accuracy with low communication requirements.


High-Order Contrastive Learning with Fine-grained Comparative Levels for Sparse Ordinal Tensor Completion

https://openreview.net/forum?id=PDO2Oc1cS1

Compressor summary: HOCTC is a novel network that extends contrastive learning to sparse ordinal tensor data by using attention-based query-expansion and fine-grained comparisons for high-order representation learning.


Bootstrapping Fisher Market Equilibrium and First-Price Pacing Equilibrium

https://openreview.net/forum?id=PApqOVbHYF

Compressor summary: The paper presents bootstrap inference methods for LFM and FPPE models and develops a new procedure for bootstrapping constrained M-estimators using epi-convergence theory, which is tested on synthetic and real data.


Limited Preference Aided Imitation Learning from Imperfect Demonstrations

https://openreview.net/forum?id=PAbkWU0KDG

Compressor summary: PAIL is a novel imitation learning algorithm that uses limited human preferences to improve policy learning from imperfect demonstrations, achieving better results in sequential decision-making tasks.


In-Context Principle Learning from Mistakes

https://openreview.net/forum?id=PAPY0cAB3C

Compressor summary: LEAP is a method to improve LLMs' performance on downstream tasks by learning from mistakes made during few-shot learning and applying learned principles to unseen problems.


Invariant Risk Minimization Is A Total Variation Model

https://openreview.net/forum?id=P7qwBmzwwZ

Compressor summary: The paper shows that invariant risk minimization is a total variation model based on the $L^2$ norm and proposes IRM-TV-$\ell_1$, an $L^1$-based variant that expands the admissible function classes and improves robustness in denoising and out-of-distribution scenarios.


Sparse Cocktail: Every Sparse Pattern Every Sparse Ratio All At Once

https://openreview.net/forum?id=OrVl8R13Wy

Compressor summary: Sparse Cocktail is a novel sparse co-training framework that trains multiple subnetworks with diverse sparsity patterns and ratios, allowing flexible switching between them based on resource availability during inference.


Triadic-OCD: Asynchronous Online Change Detection with Provable Robustness, Optimality, and Convergence

https://openreview.net/forum?id=OnkA4zaEU9

Compressor summary: The paper proposes a triadic-online change detection framework with certifiable robustness, provable optimality, guaranteed convergence, and asynchronous distributed implementation for various applications.


Diffusion Model-Augmented Behavioral Cloning

https://openreview.net/forum?id=OnidGtOhg3

Compressor summary: Diffusion Model-Augmented Behavioral Cloning (DBC) is a framework that improves imitation learning by modeling both conditional and joint probabilities of expert behaviors, leading to better generalization and performance in continuous control tasks.


A Study of First-Order Methods with a Deterministic Relative-Error Gradient Oracle

https://openreview.net/forum?id=OndZHBUA1G

Compressor summary: The paper analyzes theoretical guarantees of classical optimization methods for problems with biased gradient oracles, showing their invariance to errors and applicability to different settings.


A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models

https://openreview.net/forum?id=OnOaj3g9fi

Compressor summary: The proposed framework adapts compute allocation for score estimation in diffusion models, improving sampling speed without sacrificing quality.


FRAG: Frequency Adapting Group for Diffusion Video Editing

https://openreview.net/forum?id=OnEaBGU3LO

Compressor summary: The paper proposes FRAG, a method to improve video quality by preserving high-frequency components during denoising, without requiring extra training.


A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback

https://openreview.net/forum?id=Olix9pk6nV

Compressor summary: The paper introduces a novel linear programming framework for reward learning from human demonstrations and feedback, with provable sample efficiency and optimality guarantees.


Verification of Machine Unlearning is Fragile

https://openreview.net/forum?id=OkChMnjF6s

Compressor summary: The paper explores the fragility of machine unlearning verification strategies and proposes two novel adversarial methods to bypass them, revealing the need for more research on the safety of machine unlearning.


MGit: A Model Versioning and Management System

https://openreview.net/forum?id=OjBW993g79

Compressor summary: MGit is a system that helps manage related machine learning models by recording their relationships and optimizing storage and testing processes, reducing storage size and update time.


Straight-Through Meets Sparse Recovery: the Support Exploration Algorithm

https://openreview.net/forum?id=Oj18qGN1gC

Compressor summary: The paper proposes a new algorithm, SEA, that improves sparse support recovery using the straight-through estimator and analyzes its performance under the Restricted Isometry Property.


Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning

https://openreview.net/forum?id=OiI12sNbgD

Compressor summary: The paper introduces Atari Pre-training Benchmark (Atari-PB), a unified benchmark to evaluate pre-training methods in vision-based Reinforcement Learning, and shows that objectives focused on task-agnostic features improve generalization across diverse environments.


Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input

https://openreview.net/forum?id=OgG0I5toZZ

Compressor summary: The paper proposes asking users questions about both features and comparisons when learning their reward functions from preference data, leading to faster and more accurate rewards in different domains.


Vanilla Bayesian Optimization Performs Great in High Dimensions

https://openreview.net/forum?id=OfT8MgIqHT

Compressor summary: The paper proposes a simple scaling of the Gaussian process lengthscale prior to improve Bayesian optimization performance in high-dimensional problems, and shows that standard BO works better than previously thought in this setting.


Structure Your Data: Towards Semantic Graph Counterfactuals

https://openreview.net/forum?id=OenMwDPqWn

Compressor summary: The paper proposes a method to generate more descriptive and accurate explanations for model predictions using semantic graphs and graph neural networks, outperforming previous approaches in both quantitative and qualitative evaluation.


Compact Optimality Verification for Optimization Proxies

https://openreview.net/forum?id=OdsZS0E0AO

Compressor summary: The paper introduces a compact formulation for verifying the worst-case optimality gap of optimization proxies that is faster to solve and applies to a broader class of problems.


Sparse and Structured Hopfield Networks

https://openreview.net/forum?id=OdPlFWExX1

Compressor summary: The paper introduces a new family of sparse Hopfield networks with end-to-end differentiable transformations and shows their applications in pattern retrieval tasks.


Bayesian Exploration Networks

https://openreview.net/forum?id=OYw6sS8QmL

Compressor summary: The paper proposes the Bayesian exploration network (BEN), a novel Bayesian model-free method that can learn true Bayes-optimal policies by modelling both aleatoric and epistemic uncertainty and trading them off optimally.


Controllable Prompt Tuning For Balancing Group Distributional Robustness

https://openreview.net/forum?id=OYL91MHfuU

Compressor summary: The paper introduces Controllable Prompt Tuning (CPT), a method that optimizes performance across different groups or domains without sacrificing performance on any of them, using prompt-tuning techniques and minimal tunable parameters.


Counterfactual Image Editing

https://openreview.net/forum?id=OXzkw7vFIO

Compressor summary: The paper proposes a method for counterfactual image editing using causal language and neural causal models, while acknowledging the limitations of the task.


Universal Consistency of Wide and Deep ReLU Neural Networks and Minimax Optimal Convergence Rates for Kolmogorov-Donoho Optimal Function Classes

https://openreview.net/forum?id=OVn8FpeBpG

Compressor summary: The paper shows wide and deep ReLU neural network classifiers are consistent and optimal for various function classes, including those without smoothness assumptions.


One-Shot Strategic Classification Under Unknown Costs

https://openreview.net/forum?id=OURP5Z58jt

Compressor summary: Strategic classification studies how to learn decision rules that are robust to input manipulation, and this paper focuses on one-shot settings where there's uncertainty in users' costs and aims to design efficient algorithms that minimize worst-case risk.


A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

https://openreview.net/forum?id=OTmcsyEO5G

Compressor summary: ReadAgent is a system that enhances large language models' ability to read long documents by using memory episodes and gist memories, leading to improved performance on reading comprehension tasks with extended context windows.


SparQ Attention: Bandwidth-Efficient LLM Inference

https://openreview.net/forum?id=OS5dqxmmtl

Compressor summary: SparQ Attention is a technique that improves the speed and efficiency of large language models without changing their pre-training or fine-tuning by using memory bandwidth more effectively in attention layers.
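
A rough numpy sketch of the core bandwidth-saving trick as described in the summary: approximate the attention scores by reading only the key components at the query's largest-magnitude dimensions, then fetch just the top-scoring keys and values. The real method also reweights for the attention mass the dropped keys would have received; that correction is omitted here, and all sizes are illustrative:

```python
import numpy as np

def sparq_attention(q, K, V, r=8, k=16):
    d = q.shape[0]
    top_dims = np.argsort(np.abs(q))[-r:]    # r most salient query dimensions
    approx = K[:, top_dims] @ q[top_dims]    # cheap proxy scores: r reads per key
    top_keys = np.argsort(approx)[-k:]       # keep only k candidate keys
    s = K[top_keys] @ q / np.sqrt(d)         # exact scores for those k keys
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V[top_keys]                   # exact attention over k of the keys

rng = np.random.default_rng(0)
q = rng.normal(size=64)
K, V = rng.normal(size=(1024, 64)), rng.normal(size=(1024, 64))
print(sparq_attention(q, K, V).shape)  # (64,)
```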


Disentangled Graph Self-supervised Learning for Out-of-Distribution Generalization

https://openreview.net/forum?id=OS0szhkPmF

Compressor summary: The paper proposes a self-supervised model that learns disentangled graph representations to improve out-of-distribution generalization for graph neural networks without task-dependent labels.


On The Complexity of First-Order Methods in Stochastic Bilevel Optimization

https://openreview.net/forum?id=OQ97v7uRGc

Compressor summary: The paper studies finding stationary points in bilevel optimization when the lower-level problem is unconstrained and strongly convex, proposing a first-order method that converges using $y^*$-aware oracles and showing upper and lower bounds for its complexity.


Enhancing Trajectory Prediction through Self-Supervised Waypoint Distortion Prediction

https://openreview.net/forum?id=OQ7TlOphGX

Compressor summary: SSWDP is a novel self-supervised method that predicts distortion in observed trajectories to improve spatio-temporal representation learning for trajectory prediction.


LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

https://openreview.net/forum?id=ONOtpXLqqw

Compressor summary: LongRoPE extends LLMs' context window to 2048k tokens using efficient search, progressive extension, and readjustment, while maintaining performance on short context windows.
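
A small sketch of the mechanism being searched over, assuming standard rotary position embeddings: each frequency pair gets its own interpolation factor, and LongRoPE's contribution is finding good non-uniform factors (the factors below are made up for illustration):

```python
import numpy as np

def rope(x, pos, scale):
    """Rotary embedding of one vector at position `pos`; `scale[i]` slows
    the rotation of frequency pair i (scale = 1 recovers vanilla RoPE)."""
    half = x.shape[0] // 2
    freqs = 10000.0 ** (-np.arange(half) / half)
    angles = pos * freqs / scale              # per-pair position interpolation
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

x = np.ones(8)
print(rope(x, pos=100_000, scale=np.full(4, 16.0)))                  # uniform 16x
print(rope(x, pos=100_000, scale=np.array([1.0, 4.0, 16.0, 64.0])))  # non-uniform
```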


Liouville Flow Importance Sampler

https://openreview.net/forum?id=OMKNBzf6HJ

Compressor summary: LFIS is a flow-based model that uses neural networks to learn velocity fields and generate unbiased samples from complex density functions.


Exploiting Code Symmetries for Learning Program Semantics

https://openreview.net/forum?id=OLvgrLtv6J

Compressor summary: The paper proposes SymC, a code semantics model for Large Language Models that leverages code symmetries for efficient and accurate program analysis, outperforming GPT-4 without pre-training.


Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models

https://openreview.net/forum?id=OKYfaYQlML

Compressor summary: The paper presents a knowledge transfer method that trains small task-specific models from large vision foundation models (VFMs) with limited labeled data, outperforming existing approaches at lower compute cost and using retrieval-augmented, web-scale image search to curate effective transfer sets.


Error Feedback Can Accurately Compress Preconditioners

https://openreview.net/forum?id=OJTKlubFk1

Compressor summary: The paper proposes a novel error-feedback technique that compresses gradient information for full-matrix preconditioning in deep learning, reducing storage costs significantly without sacrificing performance.


ReDiffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation

https://openreview.net/forum?id=OI1YP53WKI

Compressor summary: ReDiffuser improves diffusion models by using confidence estimation from Random Network Distillation for reliable decision-making in offline reinforcement learning.


Diffusion-based Missing-view Generation With the Application on Incomplete Multi-view Clustering

https://openreview.net/forum?id=OHFxcU9jwW

Compressor summary: The paper proposes a diffusion-based network to generate missing views for incomplete multi-view data and a data augmentation strategy to improve clustering performance.


Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent

https://openreview.net/forum?id=OF7e0w1uon

Compressor summary: HyperAgent is a reinforcement learning algorithm that uses the hypermodel framework for efficient exploration, achieving robust performance in large-scale deep RL benchmarks and low per-step computational complexity.


DNA-SE: Towards Deep Neural-Nets Assisted Semiparametric Estimation

https://openreview.net/forum?id=OERwuPzHdh

Compressor summary: The paper proposes a new method called DNA-SE that uses deep neural networks to solve semiparametric estimation problems involving Fredholm integral equations, improving both numerical and statistical performance over traditional methods.


KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation

https://openreview.net/forum?id=OBs0AjXE3F

Compressor summary: The paper proposes KV-Runahead, a parallelization scheme that speeds up first-token generation in large language models by having multiple processes pre-populate in parallel the key-value cache that token generation already relies on.


What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation

https://openreview.net/forum?id=O8rrXl71D5

Compressor summary: The text explains how a novel causal framework helps understand the diversity, emergence dynamics, and subcircuits involved in induction heads, which are critical for in-context learning in transformer models.


Provable Representation with Efficient Planning for Partially Observable Reinforcement Learning

https://openreview.net/forum?id=O6tenHWTUU

Compressor summary: The paper proposes a representation-based framework for tackling partially observable reinforcement learning problems that improves efficiency and performance.


Enhancing Class-Imbalanced Learning with Pre-Trained Guidance through Class-Conditional Knowledge Distillation

https://openreview.net/forum?id=O4nXWHPl6g

Compressor summary: The paper proposes a new approach, Class-Conditional Knowledge Distillation (CCKD), and its variant AACCKD, to improve generalization on class-imbalanced data by learning the teacher model's class-conditional probability distribution and enhancing feature learning.


Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

https://openreview.net/forum?id=O4cHTxW9BS

Compressor summary: SPIN is a new fine-tuning method that improves Large Language Models by letting them play against themselves and generate their own training data, achieving better results than other methods even without extra human-annotated data.


Towards Resource-friendly, Extensible and Stable Incomplete Multi-view Clustering

https://openreview.net/forum?id=O45u81aby2

Compressor summary: ToRES is an incomplete multi-view clustering (IMVC) method that uses prototype-sample affinity, view-wise and cross-view prototypes, and unified representation learning and clustering to overcome common drawbacks of existing IMVC methods.


Exploring Correlations of Self-Supervised Tasks for Graphs

https://openreview.net/forum?id=O3CFN1VIwt

Compressor summary: The paper explores task correlations in graph self-supervised learning, proposes Graph Task Correlation Modeling (GraphTCM) to improve representation quality, and shows its effectiveness on downstream tasks.


Automated Loss function Search for Class-imbalanced Node Classification

https://openreview.net/forum?id=O1hmwi51pp

Compressor summary: The paper presents an automated loss function search framework for class-imbalanced node classification tasks on graphs, which improves performance and generalizes well across different datasets and network structures.


An Empirical Examination of Balancing Strategy for Counterfactual Estimation on Time Series

https://openreview.net/forum?id=Nxz3CDtGXp

Compressor summary: The paper evaluates how well balancing strategies work for counterfactual estimation with time series data, finding that their effectiveness may need reconsideration.


Dynamic Byzantine-Robust Learning: Adapting to Switching Byzantine Workers

https://openreview.net/forum?id=NwYsuFuelg

Compressor summary: DynaBRO is a dynamic fault-tolerant machine learning method that can handle changing Byzantine behaviors and achieve near-optimal convergence rates using multi-level Monte Carlo gradient estimation and adaptive learning rate.


Differentially Private Representation Learning via Image Captioning

https://openreview.net/forum?id=Nw7yOe8nBi

Compressor summary: The authors train a differentially private image captioner on a large dataset and show that it can learn high-quality image features for various downstream tasks, challenging the belief that such privacy-preserving representation learning is not possible.


Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions

https://openreview.net/forum?id=NvBJOcmti6

Compressor summary: The paper introduces graded non-convexity, a concept that divides non-convex problems into subclasses, and proposes gradient methods with spectral preconditioning that improve convergence rates for these problems.


Multigroup Robustness

https://openreview.net/forum?id=Nue7KgVZ6e

Compressor summary: The text proposes multigroup robust algorithms that provide more meaningful robustness guarantees for different subpopulations by accounting for how data corruption affects each group differently.


LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models

https://openreview.net/forum?id=NsHxeSCtgr

Compressor summary: This paper proposes LIDAO, a framework that debiases (large) language models while better preserving fluency and remaining robust to adversarial prompts.


Improving Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning

https://openreview.net/forum?id=Nm6jYZsBum

Compressor summary: The paper introduces Multimodal Composition Learning, a method to improve frozen Large Language Models' performance in multimodal tasks by using two specialized tasks: MC-Cap and MC-Ret.


Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning

https://openreview.net/forum?id=NlM4gp8hyO

Compressor summary: Chunked-TD is a model-based reinforcement learning method that uses world models to adjust the bias-variance tradeoff in TD($\lambda$) and speed up credit assignment.


Position: Topological Deep Learning is the New Frontier for Relational Learning

https://openreview.net/forum?id=Nl3RG5XWAt

Compressor summary: Topological deep learning (TDL) is a new area of machine learning that uses topological features to design and understand deep models, with many open problems and opportunities for research.


A Federated Stochastic Multi-level Compositional Minimax Algorithm for Deep AUC Maximization

https://openreview.net/forum?id=NkN6wrYXe5

Compressor summary: The paper proposes a novel federated multi-level compositional minimax algorithm to improve AUC maximization in imbalanced data classification problems with rigorous theoretical guarantees and shows its effectiveness through empirical evaluations.


Geometry-Calibrated DRO: Combating Over-Pessimism with Free Energy Implications

https://openreview.net/forum?id=NgaYcefBnZ

Compressor summary: The paper introduces Geometry-Calibrated DRO (GCDRO), a method for regression that mitigates over-pessimism in distributionally robust optimization by incorporating data geometry into the calibration terms, connects the risk objective to the Helmholtz free energy, and outperforms conventional DRO methods in experiments.


SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN

https://openreview.net/forum?id=NeotatlYOL

Compressor summary: The paper proposes a new method, SpikeZIP-TF, that converts ANNs to SNNs with no accuracy loss and achieves better results than existing Transformer-based SNNs on CV and NLP tasks.


Augmenting Decision with Hypothesis in Reinforcement Learning

https://openreview.net/forum?id=NeO2hoSexj

Compressor summary: The paper introduces ALH, an algorithm that combines reinforcement learning with weak environment descriptions (hypotheses) to improve performance and reduce bias in value-based RL.


Privately Learning Smooth Distributions on the Hypercube by Projections

https://openreview.net/forum?id=NeEbsvnaWE

Compressor summary: This article studies the private estimation of smooth probability densities on the hypercube in high dimensions and proposes a data-driven approach for choosing the best estimator.


Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree

https://openreview.net/forum?id=NbYAmsFJrc

Compressor summary: The Trajectory Aggregation Tree (TAT) improves the reliability and stability of diffusion planners by aggregating information from historical and current trajectories in a dynamic tree-like structure.


SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning

https://openreview.net/forum?id=NbOlmrB59Z

Compressor summary: The SimPro framework is a novel semi-supervised learning approach that adapts to unknown class distributions in unlabeled data without predefined assumptions, using a probabilistic model and separating conditional and marginal class distributions.


Revisit the Essence of Distilling Knowledge through Calibration

https://openreview.net/forum?id=NZgbwzaOIx

Compressor summary: The paper proposes a framework to understand and improve knowledge distillation by using rank-based loss instead of KL divergence, which is sensitive to model calibration.


NExT-GPT: Any-to-Any Multimodal LLM

https://openreview.net/forum?id=NZQkumsNlf

Compressor summary: NExT-GPT is an end-to-end system that enables any-to-any multimodal understanding and generation by connecting a large language model with adaptors and diffusion decoders, requiring minimal additional training and using modality-switching instructions to facilitate cross-modal semantic understanding.


Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module

https://openreview.net/forum?id=NV0q2jdwo0

Compressor summary: The paper proposes an improved feedforward network module for vision transformers that reduces computational cost by enhancing non-linearity using the AGeLU function and a spatial enhancement part, achieving comparable accuracy with fewer parameters and floating-point operations.


LoCoCo: Dropping In Convolutions for Long Context Compression

https://openreview.net/forum?id=NUlyqMyhO9

Compressor summary: The paper proposes LoCoCo, a method that compresses long context sequences in LLMs by adaptively blending previous and incoming tokens using convolutional kernels, improving efficiency and accuracy.


Diffusion Language Models Are Versatile Protein Learners

https://openreview.net/forum?id=NUAbSFqyqb

Compressor summary: The paper presents DPLM, a protein language model that generates diverse and plausible protein sequences using diffusion probabilistic pre-training and can be fine-tuned for various tasks or conditioned on different inputs.


A Bayesian Approach to Online Planning

https://openreview.net/forum?id=NS8z5FinYl

Compressor summary: The paper proposes a Bayesian planning method that uses uncertainty estimates from neural networks to improve online planning, and shows its effectiveness on maze and leaper environments.


An Information-Theoretic Analysis of In-Context Learning

https://openreview.net/forum?id=NQn2tYLv5I

Compressor summary: The paper introduces information-theoretic tools to analyze in-context learning as meta-learning on sequences and shows how prediction error decays in different settings.


Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters

https://openreview.net/forum?id=NQ6KDfSDFK

Compressor summary: The study proposes two variants of MoLA, a method to mitigate conflicts in heterogeneous data training for unified models aiming at artificial general intelligence.


Random Scaling and Momentum for Non-smooth Non-convex Optimization

https://openreview.net/forum?id=NKirMgDsut

Compressor summary: A small modification to stochastic gradient descent with momentum (SGDM) improves its performance in training neural networks with irregular loss functions by scaling the update with an exponentially distributed random scalar.
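
The modification is small enough to show directly; a toy sketch on a non-smooth objective, assuming a unit-mean exponential scale as stated in the summary (objective and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(w):
    """Subgradient of the non-smooth toy objective |w0| + (w1 - 1)^2."""
    return np.array([np.sign(w[0]), 2.0 * (w[1] - 1.0)])

w, m, lr, beta = np.array([3.0, -2.0]), np.zeros(2), 0.05, 0.9
for _ in range(500):
    m = beta * m + grad(w)        # standard momentum buffer
    s = rng.exponential(1.0)      # the modification: random scale with E[s] = 1
    w = w - lr * s * m            # update scaled by s
print(w.round(2))                 # approaches the minimizer [0, 1]
```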


Optimal Differentially Private Model Training with Public Data

https://openreview.net/forum?id=NFEJQn7vX0

Compressor summary: The paper studies how to use public data to improve machine learning models while preserving privacy, and develops new algorithms that achieve better performance than existing ones.


LoRA+: Efficient Low Rank Adaptation of Large Models

https://openreview.net/forum?id=NEv8YqBROO

Compressor summary: The paper proposes LoRA+, a corrected version of LoRA that improves finetuning speed and performance by setting different learning rates for adapter matrices in large width networks.
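
A minimal sketch of the recipe, assuming a standard LoRA layer; the key change is giving the $B$ matrix a larger learning rate than $A$ via optimizer parameter groups (the ratio of 16 below is an illustrative choice, not a prescription from the paper):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a rank-r update B @ A (standard LoRA)."""
    def __init__(self, d_in, d_out, r=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, d_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))

    def forward(self, x):
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

layer = LoRALinear(128, 128)
lr, ratio = 1e-4, 16
opt = torch.optim.AdamW([
    {"params": [layer.lora_A], "lr": lr},          # adapter matrix A
    {"params": [layer.lora_B], "lr": lr * ratio},  # B takes larger steps
])
```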


Hidden Traveling Waves bind Working Memory Variables in Recurrent Neural Networks

https://openreview.net/forum?id=NCjlFw1Ab0

Compressor summary: The study proposes a new model for neural working memory using traveling wave dynamics in Recurrent Neural Networks (RNNs), which improves data storage and learning by mimicking the brain's information processing.


Unraveling the Impact of Heterophilic Structures on Graph Positive-Unlabeled Learning

https://openreview.net/forum?id=NCT3w7VKjo

Compressor summary: GPL is a new method for positive-unlabeled learning on graph data that reduces edge heterophily to improve classifier training.


Residual Quantization with Implicit Neural Codebooks

https://openreview.net/forum?id=NBAc36V00H

Compressor summary: QINCo is a neural vector quantization method that builds custom codebooks per step, leading to improved data compression and search accuracy compared to conventional methods.
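
For reference, conventional residual quantization with fixed codebooks looks like the sketch below; QINCo's change is to make each step's codebook a function of the partial reconstruction (a neural network), which this sketch does not implement, and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, d = 4, 64, 32                     # M quantization steps, K codewords, dim d
codebooks = rng.normal(size=(M, K, d))  # fixed; QINCo predicts these per input

def rq_encode(x):
    """Greedy residual quantization: quantize what previous steps missed."""
    codes, residual = [], x.copy()
    for m in range(M):
        idx = int(np.argmin(((codebooks[m] - residual) ** 2).sum(axis=1)))
        codes.append(idx)
        residual = residual - codebooks[m][idx]
    return codes

def rq_decode(codes):
    return sum(codebooks[m][c] for m, c in enumerate(codes))

x = rng.normal(size=d)
codes = rq_encode(x)
print("codes:", codes)
print("reconstruction error:", round(float(np.linalg.norm(x - rq_decode(codes))), 3))
```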


Enabling Uncertainty Estimation in Iterative Neural Networks

https://openreview.net/forum?id=N6A6t6xlKm

Compressor summary: The paper proposes using convergence rate as a proxy for uncertainty in iterative network architectures, which can improve accuracy and reduce computational cost compared to other methods.


Conditional Normalizing Flows for Active Learning of Coarse-Grained Molecular Representations

https://openreview.net/forum?id=N3ZrpSCJcJ

Compressor summary: The authors propose a method to improve generative machine learning for molecular systems by using coarse-grained simulations with active learning and conditioning normalizing flows on the coarse-grained space, achieving significant speedup compared to existing approaches.


Parameter-Dependent Competitive Analysis for Online Capacitated Coverage Maximization through Boostings and Attenuations

https://openreview.net/forum?id=N1BPyf7wC2

Compressor summary: The paper studies a model where online agents join dynamically and offline agents value their coverage over them, with limited capacities; it proposes two matching policies and analyzes their competitive ratios depending on capacity and coverage bounds.


Trust the Model Where It Trusts Itself - Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption

https://openreview.net/forum?id=N0ntTjTfHb

Compressor summary: The paper proposes MACURA, a model-based reinforcement learning algorithm that adapts rollout lengths based on local model uncertainty, improving data efficiency and performance.


Online Learning in Betting Markets: Profit versus Prediction

https://openreview.net/forum?id=Mz1lcJPymz

Compressor summary: The text discusses how binary betting markets aim to balance profit and information, introduces online learning methods for price-setting, and analyses the tradeoff between these goals.


Matroid Semi-Bandits in Sublinear Time

https://openreview.net/forum?id=MwQ53xAIPs

Compressor summary: FasterCUCB is a sublinear time algorithm for matroid semi-bandits that maximizes expected cumulative linear rewards with low regret.


Attribute Based Interpretable Evaluation Metrics for Generative Models

https://openreview.net/forum?id=Mw8kNVfdMs

Compressor summary: The authors propose new evaluation protocols to measure how well generative models capture attributes of images in the training dataset, revealing strengths and weaknesses of existing models.


Risk Aware Benchmarking of Large Language Models

https://openreview.net/forum?id=Mv8y13wfDm

Compressor summary: The text proposes a method to measure and compare socio-technical risks of foundation models using statistical testing based on stochastic dominance and risk-aware model selection.


Balancing Feature Similarity and Label Variability for Optimal Size-Aware One-shot Subset Selection

https://openreview.net/forum?id=MurkwIl0h3

Compressor summary: BOSS is a method for selecting a balanced subset of diverse and difficult data samples for efficient deep learning model training, considering the subset size and using a novel Beta-scoring importance function.


Easing Concept Bleeding in Diffusion via Entity Localization and Anchoring

https://openreview.net/forum?id=MsnJl6JkZS

Compressor summary: The paper proposes ELA, a method to prevent concept bleeding in image generation models by localizing and anchoring entities in their expected regions using auxiliary networks.


Local Feature Selection without Label or Feature Leakage for Interpretable Machine Learning Predictions

https://openreview.net/forum?id=Msjovr9hUe

Compressor summary: The paper proposes SUWR, a local feature selection method that avoids misleading explanations by ensuring no label or feature leakage in complex models.


Robust Yet Efficient Conformal Prediction Sets

https://openreview.net/forum?id=MrNq6rbcUi

Compressor summary: The text discusses how to create provably robust prediction sets using conformal prediction that resist adversarial examples and perturbed calibration data.


DeCoOp: Robust Prompt Tuning with Out-of-Distribution Detection

https://openreview.net/forum?id=MoTUdh9ZCc

Compressor summary: DeCoOp is a novel prompt tuning approach for vision-language models that uses new-class detectors and sub-classifiers to improve performance on base and new classes in open-world settings.


What Would Gauss Say About Representations? Probing Pretrained Image Models using Synthetic Gaussian Benchmarks

https://openreview.net/forum?id=MmZJ3kJXjX

Compressor summary: The paper proposes a method to evaluate pretrained model representations without real-world data by using synthetic binary classification tasks with Gaussian mixtures, which correlates with actual performance on downstream tasks and helps balance robustness and accuracy.


Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining

https://openreview.net/forum?id=MlzUD5CKvZ

Compressor summary: The paper proposes a post-processing framework called R3, which improves the interpretability and accuracy of ProtoPNet, a method for image classification that uses meaningful parts of images.


LaMAGIC: Language-Model-based Topology Generation for Analog Integrated Circuits

https://openreview.net/forum?id=MjGCD8wk1k

Compressor summary: LaMAGIC is a language model-based topology generation model that efficiently designs optimized analog circuits from custom specifications using supervised finetuning.


Knowledge-aware Reinforced Language Models for Protein Directed Evolution

https://openreview.net/forum?id=MikandLqtW

Compressor summary: KnowRLM is a novel Machine Learning-assisted Directed Evolution method that uses a Knowledge Graph of biochemical relationships among amino acids to guide the search for optimal protein variants.


Instruction Tuning for Secure Code Generation

https://openreview.net/forum?id=MgTzMaYHvG

Compressor summary: SafeCoder is a tool that improves the security of code generated by language models without sacrificing their usefulness, by fine-tuning them with a large dataset of secure code.


Stereo Risk: A Continuous Modeling Approach to Stereo Matching

https://openreview.net/forum?id=Mfk6ZbD6eY

Compressor summary: Stereo Risk is a new deep-learning method for stereo matching that uses continuous risk minimization to estimate scene depth more accurately than existing methods.


I/O Complexity of Attention, or How Optimal is FlashAttention?

https://openreview.net/forum?id=MdPBVWTfwG

Compressor summary: FlashAttention reduces Transformer's attention complexity by being I/O-aware, and this paper proves its optimality for certain memory hierarchies and introduces a new communication complexity protocol for matrix compression.


An Efficient Maximal Ancestral Graph Listing Algorithm

https://openreview.net/forum?id=MZkqjV4FRT

Compressor summary: Maximal ancestral graphs (MAGs) model causal relations with latent variables, and only a Markov equivalence class (MEC) of MAGs is identifiable from observational data; the paper proposes an efficient method that lists all MAGs in a MEC without brute-force enumeration, by recursively determining the local structures of vertices.


TinyTrain: Resource-Aware Task-Adaptive Sparse Training of DNNs at the Data-Scarce Edge

https://openreview.net/forum?id=MWZWUyfFHC

Compressor summary: TinyTrain is an on-device training approach that reduces training time by selectively updating parts of the model and handling data scarcity, achieving high accuracy with lower computation and memory costs.


Optimistic Multi-Agent Policy Gradient

https://openreview.net/forum?id=MWTicAxmRP

Compressor summary: Our proposed framework for multi-agent policy gradient methods uses clipping to enable optimistic updates that prevent overgeneralization and improve performance on various cooperative learning tasks.


Consistent Adversarially Robust Linear Classification: Non-Parametric Setting

https://openreview.net/forum?id=MV2b44zDd3

Compressor summary: The paper proposes an effective estimator for adversarial risk in non-parametric settings with arbitrary norms and mild regularity conditions, achieving a minimax excess risk of $O(\sqrt{d/n})$ for linear classifiers.


Unifying Image Processing as Visual Prompting Question Answering

https://openreview.net/forum?id=MUXTt9Yr4T

Compressor summary: The text introduces PromptGIP, a universal model for image processing that uses visual prompting question answering to handle various tasks without task-specific finetuning.


Amortized Variational Deep Kernel Learning

https://openreview.net/forum?id=MSMKQuZhD5

Compressor summary: Amortized variational deep kernel learning (AVDKL) improves over standard methods for tabular data, node classification, and image recognition by preventing spurious correlations in training.


A Near-Linear Time Approximation Algorithm for Beyond-Worst-Case Graph Clustering

https://openreview.net/forum?id=MSFxOMM0gK

Compressor summary: The paper presents a faster algorithm for approximating the Balanced Cut problem in a semi-random graph model with adversarial edge modifications.


Integrating Global Context Contrast and Local Sensitivity for Blind Image Quality Assessment

https://openreview.net/forum?id=MRYS3Zb4iV

Compressor summary: The paper proposes CSIQA, a novel BIQA method that combines global and local perspectives using contrastive learning and attention modules, achieving better performance than existing methods.


Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

https://openreview.net/forum?id=MQirNNU2pC

Compressor summary: The study explores how weight decay affects neuron updates in deep neural networks and its relationship with optimizers, normalization, and learning rate warmup.


Position: Optimization in SciML Should Employ the Function Space Geometry

https://openreview.net/forum?id=MOrvoYrlOg

Compressor summary: The authors propose an infinite-dimensional approach to optimize machine learning problems and suggest discretizing the algorithm after choosing it, which could lead to new optimization methods for scientific machine learning.


The Non-linear $F$-Design and Applications to Interactive Learning

https://openreview.net/forum?id=MMMHufVc2v

Compressor summary: F-design is a non-linear extension of G-optimal design that improves data collection and exploration in various machine learning tasks.


ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

https://openreview.net/forum?id=MKzgqtRtGY

Compressor summary: The authors study how people use text-to-image models in an online game called ArtWhisperer, where users find prompts to generate similar images to a target image, and analyze human-AI interactions, prompt diversity, and AI steerability.


A Sparsity Principle for Partially Observable Causal Representation Learning

https://openreview.net/forum?id=MKGrRVODWR

Compressor summary: The paper proposes methods to identify causal variables from partially observed data without assuming fixed subsets of latent causes or paired observations, and shows their effectiveness in simulated and real datasets.


Differentially private exact recovery for stochastic block models

https://openreview.net/forum?id=MIRQ3L8vtn

Compressor summary: The paper studies how to detect community structures in private networks using stochastic block models and derives conditions for exact recoverability under edge differential privacy.


Reason for Future, Act for Now: A Principled Architecture for Autonomous LLM Agents

https://openreview.net/forum?id=MGkeWJxQVl

Compressor summary: The text proposes RAFA, a framework that combines reasoning and acting with provable regret guarantees using a prompt template and memory buffer to learn and plan in Bayesian adaptive MDPs.


Bottleneck-Minimal Indexing for Generative Document Retrieval

https://openreview.net/forum?id=MFPYCvWsNR

Compressor summary: The paper applies information theory to study generative document retrieval, where documents are indexed by terms and queries are mapped to terms, and proposes a new indexing method that minimizes the bottleneck and improves performance.


Two Fists, One Heart: Multi-Objective Optimization Based Strategy Fusion for Long-tailed Learning

https://openreview.net/forum?id=MEZydkOr3l

Compressor summary: The paper proposes MOOSF, a multi-objective optimization method that fuses heterogeneous strategies to improve performance on long-tailed data and resolves potential conflicts between head and tail classes.


Predicting Dose-Response Curves with Deep Neural Networks

https://openreview.net/forum?id=MDAg5Q7IsI

Compressor summary: The text describes a neural model that estimates the entire dose-response curve using the interaction between drug molecules and the tissue transcriptome, outperforming existing models in interpolating and extrapolating inhibitory effects of untried concentrations.


Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

https://openreview.net/forum?id=M8UbECx485

Compressor summary: The text shows how multitask pretraining helps nonlinear neural networks learn useful features by inducing a pseudo-contrastive loss and simplifies binary classification tasks with high dimensions.


Hypergraph-enhanced Dual Semi-supervised Graph Classification

https://openreview.net/forum?id=M5ne8enLcr

Compressor summary: The paper proposes HEAL, a framework that uses hypergraphs and line graphs to capture higher-order dependencies for semi-supervised graph classification, outperforming existing methods.


SAM as the Guide: Mastering Pseudo-Label Refinement in Semi-Supervised Referring Expression Segmentation

https://openreview.net/forum?id=M5kn9NKIs4

Compressor summary: SemiRES is a semi-supervised framework that combines labeled and unlabeled data for RES, using SAM to improve pseudo-label accuracy and offering different matching strategies for enhanced precision.


No Double Descent in Principal Component Regression: A High-Dimensional Analysis

https://openreview.net/forum?id=M4ejBhNNrn

Compressor summary: The paper analyzes how Principal Component Regression (PCR) performs on high-dimensional data with realistic assumptions, showing its benefits for generalization and distribution shift.


Embodied CoT Distillation From LLM To Off-the-shelf Agents

https://openreview.net/forum?id=M4Htd52HMH

Compressor summary: DeDer is a framework that compresses large language models into small ones for efficient decision-making in embodied tasks using a reasoning-policy and a planning-policy guided by rationales from an embodied knowledge graph.


Overcoming Saturation in Density Ratio Estimation by Iterated Regularization

https://openreview.net/forum?id=M407RM0z6h

Compressor summary: The paper proposes iterated regularization to improve error convergence rates in kernel methods for density ratio estimation, and demonstrates its effectiveness on benchmarks and large-scale evaluations.


DUPLEX: Dual GAT for Complex Embedding of Directed Graphs

https://openreview.net/forum?id=M3uv4qDKOL

Compressor summary: DUPLEX is a new framework for embedding directed graphs that better captures edge information, handles nodes with different connectivity, and adapts to various tasks.


Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach

https://openreview.net/forum?id=M3qRRkOuTN

Compressor summary: STEER is a novel method for coordinating asynchronous actions in Multi-Agent Systems, combining hierarchical decision structures, autoregressive sequence models, and exploratory learning techniques.


Position: Key Claims in LLM Research Have a Long Tail of Footnotes

https://openreview.net/forum?id=M2cwkGleRL

Compressor summary: The authors provide a definition of Large Language Models (LLMs), question some assumptions about them, and suggest areas for further investigation.


Theoretical insights for diffusion guidance: A case study for Gaussian mixture models

https://openreview.net/forum?id=M1ADedSnlJ

Compressor summary: The paper analyzes how guidance affects diffusion models' performance and diversity in generating samples from Gaussian mixture models using theory from differential equations and the Fokker-Planck equation.


$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts

https://openreview.net/forum?id=LyJ85kgHFe

Compressor summary: The paper introduces $\texttt{MoE-RBench}$, a tool to assess the reliability of Mixture-of-Experts (MoE) models in various dimensions, and suggests ways to improve their performance.


Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

https://openreview.net/forum?id=Lwm6TiUP4X

Compressor summary: Lightning Attention is a fast and memory-efficient linear attention implementation that uses different calculation strategies for intra-blocks and inter-blocks, and introduces TransNormerLLM, a new language model architecture tailored to it.
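
The intra-/inter-block split can be sketched for un-normalized causal linear attention; block size is arbitrary, and the feature maps, normalization, and GPU tiling of the real kernel are simplified away, so this is a reading of the idea rather than the Lightning Attention implementation.

```python
import numpy as np

def blockwise_linear_attention(Q, K, V, block=64):
    """Causal linear attention computed block by block: inter-block
    contributions come from a running K^T V state, while intra-block
    contributions use an explicit triangular mask (the two calculation
    strategies mentioned in the summary). Q, K: (T, d); V: (T, d_v)."""
    T, d = Q.shape
    out = np.zeros_like(V)
    kv_state = np.zeros((d, V.shape[1]))   # accumulated K^T V so far
    for s in range(0, T, block):
        e = min(s + block, T)
        q, k, v = Q[s:e], K[s:e], V[s:e]
        inter = q @ kv_state               # attend to all earlier blocks
        intra = np.tril(q @ k.T) @ v       # causal mask inside the block
        out[s:e] = inter + intra
        kv_state += k.T @ v                # roll the state forward
    return out
```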


Machine Vision Therapy: Multimodal Large Language Models Can Enhance Visual Robustness via Denoising In-Context Learning

https://openreview.net/forum?id=LwOfVWgEzS

Compressor summary: The paper proposes two methods to improve the visual robustness of Multi-modal Large Language Models (MLLMs) under Out-of-Distribution (OOD) scenarios: Machine Vision Therapy, which supervises vision models using MLLM predictions, and Denoising In-Context Learning (DICL), which uses a transition matrix to construct instructions for MLLMs to detect and correct erroneous vision-model outputs; theoretical guarantees and experiments on various OOD datasets support both.


Learning Causal Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition

https://openreview.net/forum?id=LvuuYqU0BW

Compressor summary: CDTD is a method for few-shot action recognition that uses causal representation learning to transfer temporally invariant knowledge from pre-trained models and adapts to novel data with limited samples.


S$\Omega$I: Score-based O-INFORMATION Estimation

https://openreview.net/forum?id=LuhWZ2oJ5L

Compressor summary: S$\Omega$I is a novel method to compute O-information, which measures synergy-redundancy balance in complex multivariate systems, without restrictive assumptions and using a unique model.


COPAL: Continual Pruning in Large Language Generative Models

https://openreview.net/forum?id=Lt8Lk7IQ5b

Compressor summary: COPAL is an algorithm for pruning large language generative models to adapt them to new domains efficiently and effectively.


DFA-RAG: Conversational Semantic Router for Large Language Model with Definite Finite Automaton

https://openreview.net/forum?id=LpAzlcGzJ6

Compressor summary: The paper proposes DFA-RAG, a framework that enhances conversational agents using large language models and a semantic router based on dialogue examples.


Relaxing the Accurate Imputation Assumption in Doubly Robust Learning for Debiased Collaborative Filtering

https://openreview.net/forum?id=Ln3moCobjO

Compressor summary: The paper proposes novel doubly robust estimators that address selection bias in recommender systems using pseudo-labeling and propensity reconstruction learning with an attention mechanism.


Learning with Partial-Label and Unlabeled Data: A Uniform Treatment for Supervision Redundancy and Insufficiency

https://openreview.net/forum?id=LmzsgSDkWs

Compressor summary: The paper presents a mutual-information-based method for weakly supervised learning that uniformly handles both supervision redundancy (partial labels) and insufficiency (unlabeled data) by dynamically exchanging and filtering labels, outperforming existing methods in semi-supervised partial-label and partial-complementary-label learning.


Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains

https://openreview.net/forum?id=LlqphyBdeT

Compressor summary: The authors present a model-agnostic framework that conditions general-purpose LLMs for specialized domains such as the physical and biomedical sciences by learning custom input tags, consisting of domain tags and function tags, that delimit representations and compress instructions, enabling zero-shot generalization and outperforming expert models on various tasks.


Collage: Light-Weight Low-Precision Strategy for LLM Training

https://openreview.net/forum?id=LkJ6qOMv77

Compressor summary: Collage uses multi-component float representation to improve low-precision training by compensating errors and reducing the need for high-precision floating points, achieving faster speedup and less memory usage.
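
The multi-component-float idea can be sketched with the classic 2Sum error-free transformation; Collage applies it to optimizer states during training, whereas this toy only accumulates a sum in float16, so treat it as an illustration of the principle rather than the method.

```python
import numpy as np

def two_sum(a, b):
    """Error-free transformation: s = fl(a + b) and a + b = s + err
    exactly, in whatever precision a and b carry."""
    s = a + b
    bp = s - a
    err = (a - (s - bp)) + (b - bp)
    return s, err

def mcf_sum(values, dtype=np.float16):
    """Accumulate in low precision while carrying the rounding error in
    a second low-precision component (the multi-component float)."""
    hi, lo = dtype(0.0), dtype(0.0)
    for v in values:
        hi, err = two_sum(hi, dtype(v))
        lo = dtype(lo + err)          # fold the new error into the tail
    return float(hi) + float(lo)

vals = [0.001] * 10000
print(mcf_sum(vals))                       # compensated float16 sum
print(sum(np.float16(v) for v in vals))    # naive float16 sum drifts
```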


Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize

https://openreview.net/forum?id=Ljhrv1Wmbr

Compressor summary: This paper investigates why deep neural networks struggle with out-of-distribution generalization and proposes a new mechanism, feature contamination, that differs from spurious correlations.


FAFE: Immune Complex Modeling with Geodesic Distance Loss on Noisy Group Frames

https://openreview.net/forum?id=Lhb39btw16

Compressor summary: The paper proposes F2E, a new loss function for protein folding models that improves accuracy in modeling antibody-antigen complexes by optimizing rotational and translational errors between frames.


Foundation Policies with Hilbert Representations

https://openreview.net/forum?id=LhNsSaAKub

Compressor summary: The paper proposes an unsupervised method to pre-train generalist policies that can adapt quickly to various tasks from offline data by learning a structured representation of the environment's temporal structure and using directional movements for policy "prompting".


Hierarchical Neural Operator Transformer with Learnable Frequency-aware Loss Prior for Arbitrary-scale Super-resolution

https://openreview.net/forum?id=LhAuVPWq6q

Compressor summary: The paper proposes a resolution-invariant super-resolution method based on a hierarchical neural operator with self-attention and sinc filters, achieving better results than existing methods.


Stacking Deep Set Networks and Pooling by Quantiles

https://openreview.net/forum?id=Lgq1E92h1U

Compressor summary: Stacked Deep Sets and Quantile Pooling are novel methods for learning from set data that combine the strengths of max and average pooling and improve deep set networks.
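
A minimal sketch of quantile pooling over a set of feature vectors; the particular quantile grid is an assumption, chosen to show that q = 1.0 recovers max pooling while interior quantiles behave like robust averages.

```python
import numpy as np

def quantile_pool(X, qs=(0.25, 0.5, 0.75, 1.0)):
    """Pool a set by per-dimension quantiles.
    X: (set_size, dim) -> (len(qs) * dim,) permutation-invariant vector;
    q = 1.0 is exactly max pooling, q = 0.5 the (robust) median."""
    return np.concatenate([np.quantile(X, q, axis=0) for q in qs])

pooled = quantile_pool(np.random.randn(10, 8))   # shape: (32,)
```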


Disentangled 3D Scene Generation with Layout Learning

https://openreview.net/forum?id=Lgh8bhWpVC

Compressor summary: The method generates 3D scenes by disentangling them into separate objects using a pretrained text-to-image model and multiple NeRFs, while ensuring the composited scenes resemble the original image.


Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning

https://openreview.net/forum?id=Lg8nw3ltvX

Compressor summary: The paper proposes a method to improve online continual learning by balancing new and old data in the optimization geometry, reducing instabilities and improving accuracy.


Improving Token-Based World Models with Parallel Observation Prediction

https://openreview.net/forum?id=Lfp5Dk1xb6

Compressor summary: The paper introduces a new token-based world model agent called REM that uses a Parallel Observation Prediction mechanism to speed up imagination and achieve superhuman performance on Atari 100K games.


DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

https://openreview.net/forum?id=LfJgeBNCFI

Compressor summary: The authors propose DS-Agent, a framework that combines large language models and case-based reasoning to automate data science tasks, achieving high performance and cost efficiency.


On the Diminishing Returns of Width for Continual Learning

https://openreview.net/forum?id=Ld255Mbx9F

Compressor summary: The text discusses how increasing the width of neural networks can decrease catastrophic forgetting, but shows that this relationship has diminishing returns and explores new widths to verify this empirically.


Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models

https://openreview.net/forum?id=Lc1HlMo77m

Compressor summary: The paper explores how to improve open-world generalization of vision-language models by leveraging weaker models and introduces three customized ensemble strategies for different scenarios.


How to Explore with Belief: State Entropy Maximization in POMDPs

https://openreview.net/forum?id=LbcNAIgNnB

Compressor summary: The paper proposes a policy gradient method for state entropy maximization in reinforcement learning with partial observations and belief states, addressing practical challenges in real-world applications.


USTAD: Unified Single-model Training Achieving Diverse Scores for Information Retrieval

https://openreview.net/forum?id=LbEB39lZqp

Compressor summary: The paper proposes USTAD, a unified Transformer model for information retrieval and ranking that uses novel distillation methods and asymmetric architectures to achieve high performance with fewer parameters.


Sign Rank Limitations for Inner Product Graph Decoders

https://openreview.net/forum?id=Lb8G2dZjcB

Compressor summary: This paper explains why inner product-based decoders struggle with graph data and proposes simple changes to improve them.


Enabling Few-Shot Learning with PID Control: A Layer Adaptive Optimizer

https://openreview.net/forum?id=LabSWooau0

Compressor summary: The paper proposes a new MAML-based optimizer that adapts PID control gains at each layer to improve learning efficiency and generalization across different tasks and domains.


Adaptive Feature Selection for No-Reference Image Quality Assessment by Mitigating Semantic Noise Sensitivity

https://openreview.net/forum?id=LZkhKZvhHs

Compressor summary: The paper proposes a new image quality assessment method that removes irrelevant and noisy features from upstream networks using an adversarial approach and data distillation.


TabLog: Test-Time Adaptation for Tabular Data Using Logic Rules

https://openreview.net/forum?id=LZeixIvQcB

Compressor summary: TabLog is a novel method that adapts predictive models to a target domain using unlabeled data, by discretizing numerical features, modeling feature dependencies, and introducing a contrastive loss for distribution shift.


Predictive Dynamic Fusion

https://openreview.net/forum?id=LYpGLrC4oq

Compressor summary: The paper proposes a predictive dynamic fusion framework for multimodal learning that reduces generalization error and calibrates potential uncertainty in open environments.


eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data

https://openreview.net/forum?id=LWRI4uPG2X

Compressor summary: ECInstruct is a new instruction dataset that improves the performance of large language models (LLMs) in e-commerce by adapting them to specific tasks and products, resulting in better generalization across domains.


Differentially Private Synthetic Data via Foundation Model APIs 2: Text

https://openreview.net/forum?id=LWD7upg1ob

Compressor summary: The Aug-PE algorithm generates differentially private synthetic text using only API access to a large language model without any finetuning.


tnGPS: Discovering Unknown Tensor Network Structure Search Algorithms via Large Language Models (LLMs)

https://openreview.net/forum?id=LVgT0ShxN5

Compressor summary: This paper proposes a framework called tnGPS that uses large language models to automatically discover new tensor network structure search algorithms, improving performance in high-dimensional representation tasks.


Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers

https://openreview.net/forum?id=LVF4P1NNwO

Compressor summary: The paper proposes a method to make In-Context Learning (ICL) in large language models explicit and permanent using bias terms, improving interpretability and efficiency over existing methods.


Data-free Neural Representation Compression with Riemannian Neural Dynamics

https://openreview.net/forum?id=LTifAl5bKb

Compressor summary: The paper proposes a new neural compression method based on Riemannian geometry that improves inference accuracy without needing extra data or fine-tuning.


Refining Minimax Regret for Unsupervised Environment Design

https://openreview.net/forum?id=LRnXPxDksA

Compressor summary: The paper proposes Bayesian level-perfect MMR, an improved objective for unsupervised environment design in reinforcement learning, and introduces the ReMiDi algorithm that achieves this objective.


VideoPoet: A Large Language Model for Zero-Shot Video Generation

https://openreview.net/forum?id=LRkJwPIDuE

Compressor summary: VideoPoet is a transformer-based language model that can create realistic videos from various inputs and can be adapted for different video generation tasks.


DE-COP: Detecting Copyrighted Content in Language Models Training Data

https://openreview.net/forum?id=LO4xhXmFal

Compressor summary: DE-COP is a method to detect if copyrighted content was used in training language models by asking multiple-choice questions to the model, using BookTection as a benchmark.


Test-Time Regret Minimization in Meta Reinforcement Learning

https://openreview.net/forum?id=LM7j0zrUZB

Compressor summary: The paper studies how to learn an optimal policy for unknown tasks efficiently using meta reinforcement learning, and shows that strong identifiability assumptions enable faster regret minimization.


Decentralized Convex Finite-Sum Optimization with Better Dependence on Condition Numbers

https://openreview.net/forum?id=LLdeUPOUXk

Compressor summary: The paper presents an efficient decentralized optimization method that uses different mini-batch sizes per node, has a sharper global condition number dependency, and requires fewer oracle calls and comparable communication cost than existing methods.


Successor Features for Efficient Multi-Subject Controlled Text Generation

https://openreview.net/forum?id=LJcIIhqGDN

Compressor summary: SF-Gen is a novel reinforcement learning approach that leverages successor features for efficient and high-quality controllable text generation with multiple target subjects.


Collaborative Heterogeneous Causal Inference Beyond Meta-analysis

https://openreview.net/forum?id=LJ34pX1U5g

Compressor summary: The paper proposes a new method for causal inference with heterogeneous data that uses weighted propensity score models and federated learning to improve accuracy and privacy.


Federated Representation Learning in the Under-Parameterized Regime

https://openreview.net/forum?id=LIQYhV45D4

Compressor summary: FLUTE is a novel federated representation learning algorithm with provable performance guarantees, data-independent initialization, and a designed objective function for under-parameterized linear models and beyond.


Federated Offline Reinforcement Learning: Collaborative Single-Policy Coverage Suffices

https://openreview.net/forum?id=LIPGadocTe

Compressor summary: FedLCB-Q is a Q-learning variant for federated offline RL that achieves linear speedup and communication efficiency by collaboratively leveraging offline datasets at multiple agents.


EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

https://openreview.net/forum?id=LHGMXcr6zx

Compressor summary: EfficientZero V2 is a generalized framework for sample-efficient reinforcement learning that performs better than DreamerV3 in various tasks and domains.


GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

https://openreview.net/forum?id=LH6R06NxdB

Compressor summary: The paper introduces Stepwise ORMs, which use synthetic data to improve reasoning refinement in language models, and shows that combining global and local refinements achieves better results than existing methods on GSM8K.


A Hierarchical Adaptive Multi-Task Reinforcement Learning Framework for Multiplier Circuit Design

https://openreview.net/forum?id=LGz7GaUSEB

Compressor summary: The paper proposes a novel hierarchical reinforcement learning framework for optimizing multiplier design in integrated circuits, achieving better Pareto-optimal solutions with high sample efficiency and generalization.


Switchable Decision: Dynamic Neural Generation Networks

https://openreview.net/forum?id=LGhtl9ktop

Compressor summary: The authors propose a dynamic neural generation network that skips or reduces computation for some data instances to accelerate inference without sacrificing accuracy in various NLP tasks.


On Statistical Learning Theory for Distributional Inputs

https://openreview.net/forum?id=LGDYsBslWi

Compressor summary: This paper studies theoretical aspects of kernel-based statistical learning on distributional inputs, proving oracle inequalities and a generalization result for different embedding methods.


Copyright Traps for Large Language Models

https://openreview.net/forum?id=LDq1JPdc55

Compressor summary: The authors propose using copyright traps with fictitious entries to detect the use of copyrighted content in large language models, especially those that do not naturally memorize.


CaM: Cache Merging for Memory-efficient LLMs Inference

https://openreview.net/forum?id=LCTmppB165

Compressor summary: CaM is a technique that adaptively merges caches to preserve critical token information and improve the performance of memory-efficient Large Language Models.


Language Generation with Strictly Proper Scoring Rules

https://openreview.net/forum?id=LALSZ88Xpx

Compressor summary: The authors propose using non-local scoring rules like Brier and Spherical score for language generation, which can improve the quality of generated text across different models.
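
For reference, the two scoring rules named in the summary are easy to write down; how the paper plugs them into sequence-level training is not reproduced here.

```python
import numpy as np

def brier_loss(probs, target):
    """Brier score: squared error between the predicted next-token
    distribution and the one-hot target (lower is better). Strictly
    proper but non-local: it depends on the probability assigned to
    every token, not only the observed one."""
    onehot = np.zeros_like(probs)
    onehot[target] = 1.0
    return float(np.sum((probs - onehot) ** 2))

def spherical_loss(probs, target):
    """Negative spherical score, another strictly proper rule."""
    return float(-probs[target] / np.linalg.norm(probs))

p = np.array([0.7, 0.2, 0.1])
print(brier_loss(p, 0), spherical_loss(p, 0))
```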


Relaxed Quantile Regression: Prediction Intervals for Asymmetric Noise

https://openreview.net/forum?id=L8nSGvoyvb

Compressor summary: Relaxed Quantile Regression improves prediction intervals by removing arbitrary constraints and allowing better coverage of skewed distributions.
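
For context, prediction intervals from quantile regression are trained with the pinball loss below; RQR's specific relaxation of the interval parameterization is the paper's contribution and is not reproduced in this sketch.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Standard pinball (quantile) loss at level q: penalizes under- and
    over-prediction asymmetrically, so the minimizer is the q-quantile."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

# A 90% interval is typically trained with two quantile heads:
y = np.random.randn(1000)
print(pinball_loss(y, np.full_like(y, -1.64), q=0.05),
      pinball_loss(y, np.full_like(y, 1.64), q=0.95))
```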


LSEnet: Lorentz Structural Entropy Neural Network for Deep Graph Clustering

https://openreview.net/forum?id=L6SRXG92s6

Compressor summary: The paper proposes a novel graph clustering method that uses structural information and does not need predefined cluster numbers, outperforming existing methods.


Removing Spurious Concepts from Neural Network Representations via Joint Subspace Estimation

https://openreview.net/forum?id=L4ERlHrJRT

Compressor summary: Ensuring DNNs rely on the correct input features is hard because existing removal methods discard correct features along with spurious ones; the paper proposes an iterative algorithm that separates spurious from main-task concepts by estimating two orthogonal subspaces of the network representation, improving interpretability and performance on computer vision and natural language benchmarks.


Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

https://openreview.net/forum?id=L1eJ3NKPCd

Compressor summary: The paper investigates how well Transformers can learn and combine various language capabilities using autoregressive training and experiments on different composition methods.


Debiased Distribution Compression

https://openreview.net/forum?id=L1W9ZWPq9E

Compressor summary: The text introduces new compression methods for summarizing target distributions using biased input sequences and shows their effectiveness in various applications.


Learning with 3D rotations, a hitchhiker's guide to SO(3)

https://openreview.net/forum?id=L0VoOdjCUb

Compressor summary: This paper surveys rotation representations for machine learning models and provides guidance on choosing suitable ones based on their properties, input and output locations, and angle size.


KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

https://openreview.net/forum?id=L057s2Rq8O

Compressor summary: The paper explores KV cache quantization for large language models and proposes a 2-bit algorithm called KIVI that significantly reduces memory usage and improves batch size, speed, and quality.
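
A simplified sketch of asymmetric low-bit quantization of the KV cache; grouping keys per channel and values per token follows our reading of KIVI, and details such as group size and the full-precision residual window are omitted.

```python
import numpy as np

def quant_2bit_asym(x, axis):
    """Asymmetric 2-bit quantization: per-group zero-point and scale,
    values mapped to {0, 1, 2, 3} (4 levels)."""
    mn = x.min(axis=axis, keepdims=True)
    scale = (x.max(axis=axis, keepdims=True) - mn) / 3.0 + 1e-12
    q = np.clip(np.round((x - mn) / scale), 0, 3).astype(np.uint8)
    return q, mn, scale

def dequant(q, mn, scale):
    return q.astype(np.float32) * scale + mn

K = np.random.randn(128, 64).astype(np.float32)   # (tokens, channels)
V = np.random.randn(128, 64).astype(np.float32)
qK, mnK, sK = quant_2bit_asym(K, axis=0)          # keys: per-channel groups
qV, mnV, sV = quant_2bit_asym(V, axis=1)          # values: per-token groups
```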


QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference

https://openreview.net/forum?id=KzACYw0MTV

Compressor summary: Quest is a query-aware algorithm that speeds up self-attention in long-context LLMs by selecting the most critical KV cache pages for attention, achieving significant speedups while maintaining accuracy.
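
Our reading of query-aware page selection, sketched under assumptions: each page keeps per-channel min/max summaries of its keys, the query yields an upper bound on any attention score inside the page, and only the top-k pages are attended; page size and top-k below are illustrative.

```python
import numpy as np

def select_critical_pages(query, key_pages, top_k=4):
    """Rank KV-cache pages by an upper bound on the attention score any
    key inside the page can achieve with `query`, using per-channel
    min/max key summaries, and keep the top-k page indices.
    key_pages: (num_pages, page_size, dim); query: (dim,)."""
    mins = key_pages.min(axis=1)                       # (pages, dim)
    maxs = key_pages.max(axis=1)
    bound = np.maximum(query * mins, query * maxs).sum(axis=1)
    return np.argsort(bound)[::-1][:top_k]

pages = np.random.randn(32, 16, 64)                    # 32 pages of 16 keys
picked = select_critical_pages(np.random.randn(64), pages)
```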


Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference

https://openreview.net/forum?id=KycvgOCBBR

Compressor summary: The paper proposes a new method called GIC that accurately infers group labels, improving the worst-group performance of ERM models by exploiting properties of spurious correlations and semantic consistency.


Improving Computational Complexity in Statistical Models with Local Curvature Information

https://openreview.net/forum?id=KwgAThfxEd

Compressor summary: The paper studies a modified gradient descent algorithm that uses local curvature information and achieves optimal computational complexity for singular statistical models.


AI Control: Improving Safety Despite Intentional Subversion

https://openreview.net/forum?id=KviM5k8pcP

Compressor summary: The paper proposes and tests AI control techniques to prevent powerful but untrusted LLMs from causing harmful outcomes even if they try to subvert safety measures.


Path-Guided Particle-based Sampling

https://openreview.net/forum?id=Kt4fwiuKqf

Compressor summary: The paper proposes PGPS-LwS, a novel particle-based Bayesian inference method that uses a Neural Network-learned vector field to guide particles from an initial distribution to the target distribution along a Log-weighted Shrinkage density path, which improves accuracy and calibration in synthetic and real-world tasks.


Improving Accuracy-robustness Trade-off via Pixel Reweighted Adversarial Training

https://openreview.net/forum?id=KsUddQl39v

Compressor summary: The paper proposes PART, a method that adjusts the perturbation budget for each pixel based on its importance, improving accuracy and robustness in adversarial training.
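
The mechanism reduces to giving PGD a per-pixel budget; the sketch below assumes the epsilon map is already given, since how PART assigns importance to pixels is the paper's contribution and is not reproduced here.

```python
import numpy as np

def pgd_with_pixel_budget(x, grad_fn, eps_map, steps=10):
    """Projected gradient descent where each pixel has its own L_inf
    budget eps_map (same shape as x) instead of one global epsilon.
    grad_fn returns dLoss/dx at the current adversarial point."""
    alpha = eps_map / 4.0                 # per-pixel step size (assumed)
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps_map, x + eps_map)  # per-pixel ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                  # valid image
    return x_adv
```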


Federated Continual Learning via Prompt-based Dual Knowledge Transfer

https://openreview.net/forum?id=Kqa5JakTjB

Compressor summary: Powder is a new FCL algorithm that uses prompts to facilitate positive knowledge transfer across tasks and clients while reducing communication costs and privacy concerns.


Parallelized Spatiotemporal Slot Binding for Videos

https://openreview.net/forum?id=KpeGdDzucX

Compressor summary: PSB is a new, efficient, and stable slot learning architecture for sequential inputs that improves object-centric scene decomposition and understanding.


HGAP: Boosting Permutation Invariant and Permutation Equivariant in Multi-Agent Reinforcement Learning via Graph Attention Network

https://openreview.net/forum?id=KpUdNe9lsr

Compressor summary: HGAP is a graph-based policy network for multi-agent reinforcement learning that uses graph attention to enforce permutation invariance (PI) and permutation equivariance (PE), capturing correlations among entities while remaining effective, efficient, adaptable, and transferable across MARL benchmarks.


Data-efficient Large Vision Models through Sequential Autoregression

https://openreview.net/forum?id=KmCoS6WkgG

Compressor summary: The paper presents an efficient autoregression-based vision model that achieves proficiency in various visual tasks with a reduced parameter footprint and training data requirements.


MathScale: Scaling Instruction Tuning for Mathematical Reasoning

https://openreview.net/forum?id=Kjww7ZN47M

Compressor summary: MathScale is a method that uses large language models to create high-quality math data, improving their mathematical reasoning abilities and achieving state-of-the-art performance on MWPBench.


Understanding the Effects of Iterative Prompting on Truthfulness

https://openreview.net/forum?id=KjazcKPMME

Compressor summary: The paper explores how iterative prompting can improve the accuracy and truthfulness of large language models, introducing new variants that address previous challenges.


CKGConv: General Graph Convolution with Continuous Kernels

https://openreview.net/forum?id=KgfGxXbjjE

Compressor summary: This paper introduces CKGConv, a novel and general graph convolution framework that uses continuous functions of pseudo-coordinates derived via graph positional encoding, which is flexible, expressive, and performs well on various graph datasets.


Exploring the Benefit of Activation Sparsity in Pre-training

https://openreview.net/forum?id=KfXXPCcobh

Compressor summary: SSD is a pre-training method that adaptively switches between sparse and dense training, improving efficiency and inference speed for Transformers.


Hierarchical Novelty Detection via Fine-Grained Evidence Allocation

https://openreview.net/forum?id=KfN76nAcOO

Compressor summary: HND uses a structured hierarchy of known and novel classes to detect fine-grained novelty, separating it from known classes with a unique loss function and evidence margin.


Learning Linear Block Error Correction Codes

https://openreview.net/forum?id=Kf9CqdI8Rb

Compressor summary: The paper proposes a new neural decoder for binary linear block codes, which improves the efficiency of data transfer over noisy channels and outperforms existing methods.


Causally Motivated Personalized Federated Invariant Learning with Shortcut-Averse Information-Theoretic Regularization

https://openreview.net/forum?id=Kbd9A4lVoX

Compressor summary: FedPIN is a personalized federated learning method that uses causal models to distinguish between personalized and spurious features, improving out-of-distribution generalization.


MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space

https://openreview.net/forum?id=KaAQu5rNU1

Compressor summary: MolCRAFT is a new SBDD model that generates stable and high-affinity molecules by operating in the continuous parameter space and using a noise reduced sampling strategy.


Inferring Dynamic Networks from Marginals with Iterative Proportional Fitting

https://openreview.net/forum?id=KYrAZSbEv6

Compressor summary: The text proposes a generative network model to understand the statistical foundation of using IPF for inferring dynamic networks from time-aggregated adjacency matrices and time-varying marginals, and introduces an algorithm to improve convergence on sparse data.


Fundamental Limitations of Alignment in Large Language Models

https://openreview.net/forum?id=KXsUCgn9Ks

Compressor summary: The paper proposes BEB, a theoretical approach to investigate alignment limitations in large language models, showing that any partial alignment is vulnerable to adversarial prompting attacks, which have been demonstrated experimentally with chatGPT jailbreaks.


A Tale of Tails: Model Collapse as a Change of Scaling Laws

https://openreview.net/forum?id=KVvku47shW

Compressor summary: The paper investigates how neural scaling laws change with synthetic data in training and explores various decay phenomena that may lead to model collapse.


convSeq: Fast and Scalable Method for Detecting Patterns in Spike Data

https://openreview.net/forum?id=KVa4i4RR1O

Compressor summary: The authors introduce *convSeq*, an unsupervised method that uses backpropagation to optimize spatiotemporal filters for efficiently detecting repetitive patterns in large neural recordings, potentially improving our understanding of neural circuit function.


UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers

https://openreview.net/forum?id=KU9mn6deDR

Compressor summary: UPAM is a novel framework that investigates the robustness of T2I models from the attack perspective, deceiving both textual and visual defenses using gradient-based optimization and ensuring attack stealthiness.


Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

https://openreview.net/forum?id=KSNl7VgeVr

Compressor summary: Premier-TACO is a multitask feature representation learning method that improves efficiency in sequential decision-making tasks by using minimal expert demonstrations and incorporating a novel negative example sampling strategy for temporal action contrastive learning.


Optimal Kernel Quantile Learning with Random Features

https://openreview.net/forum?id=KOW9ncAiRo

Compressor summary: This paper studies kernel quantile regression with random features (KQR-RF), improves its error decomposition, connects it to kernel ridge regression with random features (KRR-RF), and provides optimal learning rates under various conditions.


MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities

https://openreview.net/forum?id=KOTutrSR2y

Compressor summary: MM-Vet is a benchmark for evaluating large multimodal models based on their integration of core vision-language capabilities, with a unified scoring metric that works across question types.


Efficient Non-stationary Online Learning by Wavelets with Applications to Online Distribution Shift Adaptation

https://openreview.net/forum?id=KNedb3bQ4h

Compressor summary: The paper proposes a single-layer online learning algorithm with wavelet detection and adaptive restart for minimizing dynamic regret in non-stationary settings, outperforming two-layer ensembles.


Fair Resource Allocation in Multi-Task Learning

https://openreview.net/forum?id=KLmWRMg6nL

Compressor summary: FairGrad is a novel optimization objective for multi-task learning that maximizes utility and ensures fair resource allocation among tasks, improving performance in supervised and reinforcement learning.


Embarrassingly Parallel GFlowNets

https://openreview.net/forum?id=KJhLpzqNri

Compressor summary: EP-GFlowNet is a method to sample from large product distributions efficiently using local GFlowNets and a global model learned with a novel aggregating balance condition, enabling parallel and federated Bayes.


Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models

https://openreview.net/forum?id=KJL2b6BthC

Compressor summary: The Algorithm of Thoughts is a novel strategy that boosts LLMs' reasoning capacities by guiding them through algorithmic reasoning pathways using few queries, outperforming other methods and suggesting LLMs can weave their intuition into optimized searches.


DFD: Distillng the Feature Disparity Differently for Detectors

https://openreview.net/forum?id=KI3JKFKciG

Compressor summary: DFD is a model compression technique that adapts distillation constraints based on the inconsistency in disparity between teacher and student feature maps, improving object detection performance.


Leverage Class-Specific Accuracy to Guide Data Generation for Improving Image Classification

https://openreview.net/forum?id=KHymcy2xxF

Compressor summary: Our method generates synthetic training data for image classification based on each class's actual data needs, improving performance on imbalanced and balanced datasets.


Shifted Interpolation for Differential Privacy

https://openreview.net/forum?id=KCVCFsPkrm

Compressor summary: The paper presents a tighter privacy analysis for noisy gradient descent and its variants using the framework of f-differential privacy and a new construction called shifted interpolated processes, which works for various optimization and batch settings.


Accelerating Convergence of Score-Based Diffusion Models, Provably

https://openreview.net/forum?id=KB6slOUQP9

Compressor summary: This paper introduces training-free algorithms to accelerate diffusion models' sampling, achieving faster convergence rates than existing methods.


Neighboring Perturbations of Knowledge Editing on Large Language Models

https://openreview.net/forum?id=K9NTPRvVRI

Compressor summary: The paper investigates how adding new knowledge to large language models affects their existing knowledge and introduces a metric and a framework to mitigate this perturbation.


Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

https://openreview.net/forum?id=K6xxnKN2gm

Compressor summary: The study examines how large language models are brittle and unsafe, especially when pruned or modified, and suggests the need for better safety strategies.


Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choice

https://openreview.net/forum?id=K6HpbvkrwO

Compressor summary: The study proposes an adaptive experiment that optimizes both covariate density and propensity score to efficiently estimate average treatment effects with lower asymptotic variance.


Improving Gradient-Guided Nested Sampling for Posterior Inference

https://openreview.net/forum?id=K5h6VAsJaV

Compressor summary: The GGNS algorithm is a powerful tool for scientific computing that uses advanced techniques to efficiently sample complex probability distributions and estimate their properties.


Structure-based drug design by denoising voxel grids

https://openreview.net/forum?id=K3fEkECWgu

Compressor summary: VoxBind is a new model that generates 3D molecules based on protein structures using atomic density grids and voxel-denoising networks, resulting in more diverse, less clashing, and higher-affinity molecules than existing methods.


Robust Multi-Task Learning with Excess Risks

https://openreview.net/forum?id=JzWFmMySpn

Compressor summary: ExcessMTL is a multi-task learning method that balances tasks by their distances to convergence, improving performance under label noise.


Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models

https://openreview.net/forum?id=JymXv7mkrQ

Compressor summary: The paper proposes a method called WCA that uses localized visual prompting and cross-alignment of visual and textual descriptions to improve zero-shot image classification performance.


MH-pFLID: Model Heterogeneous personalized Federated Learning via Injection and Distillation for Medical Data Analysis

https://openreview.net/forum?id=Jvh8HM9YEJ

Compressor summary: MH-pFLID is a new federated learning method for medical applications that uses a lightweight messenger model, reduces biased data collection, and does not require public datasets or extra resources.


Position: Building Guardrails for Large Language Models Requires Systematic Design

https://openreview.net/forum?id=JvMLkGF2Ms

Compressor summary: The paper explores guardrails as a key technology for mitigating risks associated with large language models and suggests a systematic approach based on socio-technical methods and multi-disciplinary collaboration.


Uncertainty Estimation by Density Aware Evidential Deep Learning

https://openreview.net/forum?id=JtkruFHcRK

Compressor summary: DAEDL is a novel method for improving uncertainty estimation and classification in deep learning by integrating feature space density with output from EDL and using a new parameterization.


Position: Scaling Simulation is Neither Necessary Nor Sufficient for In-the-Wild Robot Manipulation

https://openreview.net/forum?id=Jtjurj7oIJ

Compressor summary: The paper critiques the use of robotic simulations for real-world manipulation tasks, arguing that scaling simulators is not enough to achieve human-compatible general-purpose systems.


Prometheus: Out-of-distribution Fluid Dynamics Modeling with Disentangled Graph ODE

https://openreview.net/forum?id=JsPvL6ExK8

Compressor summary: The paper introduces Prometheus, a new dataset for fluid dynamics modeling, and proposes DGODE, a method that learns disentangled representations to improve out-of-distribution generalization.


A Fixed-Point Approach for Causal Generative Modeling

https://openreview.net/forum?id=JpzIGzru5F

Compressor summary: The authors propose a novel way to represent causal models without DAGs, design a two-stage generative model that infers causal order from data, and use a new attention mechanism to capture causality in transformer-based architecture.


FRAPPÉ: A Group Fairness Framework for Post-Processing Everything

https://openreview.net/forum?id=JndWnomyIc

Compressor summary: The text proposes a framework to transform in-processing techniques for group fairness into post-processing methods, which can be applied to more problem settings and preserve or improve fairness-error trade-offs.


From Coarse to Fine: Enable Comprehensive Graph Self-supervised Learning with Multi-granular Semantic Ensemble

https://openreview.net/forum?id=JnA9IveEwg

Compressor summary: MGSE is a framework that improves graph self-supervised learning by using multiple student models to learn from a single teacher model, capturing multi-granular knowledge and achieving better generalization abilities.


Trainable Transformer in Transformer

https://openreview.net/forum?id=JcxlFe2fGC

Compressor summary: The paper introduces TINT, an efficient construction that allows transformers to simulate and fine-tune more complex models during inference, improving performance on language modeling and downstream tasks.


Stochastic Optimization with Arbitrary Recurrent Data Sampling

https://openreview.net/forum?id=JYcbgiSh0L

Compressor summary: The paper proposes a stochastic optimization algorithm that achieves optimal convergence rates under general recurrent data sampling schemes, even for non-convex and non-smooth problems with constraints.


Towards AutoAI: Optimizing a Machine Learning System with Black-box and Differentiable Components

https://openreview.net/forum?id=JVhUR8q27o

Compressor summary: A-BAD-BO is a novel algorithm that uses Bayesian optimization to jointly train ML components in complex systems, improving system performance and efficiency.


Estimating the Permanent by Nesting Importance Sampling

https://openreview.net/forum?id=JVORowD4MD

Compressor summary: The paper proposes a variant of sequential importance sampling that combines its efficiency with accuracy guarantees from rejection sampling, resulting in faster estimations of high-dimensional integrals such as the permanent of nonnegative matrices.


Safe and Robust Subgame Exploitation in Imperfect Information Games

https://openreview.net/forum?id=JV84NVo1em

Compressor summary: The paper introduces Adaptation Safety, a novel concept for opponent exploitation in games, and proposes the Opponent eXploitation Search (OX-Search) framework that improves safety and robustness in online poker games.


Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments

https://openreview.net/forum?id=JUa5XNXuoT

Compressor summary: The paper proposes a transformer model with discrete bottlenecks that can learn compressed representations of observations and actions, enabling it to extract interpretable cognitive maps for path planning in partially observed environments.


Identifiability Matters: Revealing the Hidden Recoverable Condition in Unbiased Learning to Rank

https://openreview.net/forum?id=JU3xHh1vWw

Compressor summary: This paper studies how to recover true relevance for ranking models from biased click logs by analyzing graph connectivity and proposing node intervention and node merging methods.


Image Clustering with External Guidance

https://openreview.net/forum?id=JSYN891WnB

Compressor summary: The authors propose Text-Aided Clustering (TAC), a method that uses WordNet nouns to enhance image features and cross-modal neighborhood information distillation for image clustering, achieving state-of-the-art results on various benchmarks.


Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

https://openreview.net/forum?id=JQlEUfzhuA

Compressor summary: The paper compares two paradigms for learning from human preferences, RLHF and DPO, in different settings and provides minimax bounds and convergence rates.


Bifurcated Attention for Single-Context Large-Batch Sampling

https://openreview.net/forum?id=JPNBFWQ9H2

Compressor summary: Bifurcated attention is a method that reduces memory IO costs in language model inference by dividing the attention mechanism into two GEMM operations, improving efficiency and latency for high batch sizes and long context lengths.
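
A minimal NumPy sketch of the idea as I read it (single head, one query per sample, no masking; names are mine, not the paper's API): the shared context's KV cache is stored once and hit with one GEMM for the whole batch, per-sample decoded tokens get a second batched GEMM, and the two segments are merged under a single softmax via log-sum-exp.

```python
import numpy as np

def bifurcated_attention(q, k_ctx, v_ctx, k_dec, v_dec):
    """q: (B, d); k_ctx/v_ctx: (L, d) shared context stored once;
    k_dec/v_dec: (B, T, d) per-sample decoded tokens."""
    d = q.shape[-1]
    s_ctx = q @ k_ctx.T / np.sqrt(d)                        # GEMM 1: shared KV, read once
    s_dec = np.einsum('bd,btd->bt', q, k_dec) / np.sqrt(d)  # GEMM 2: per-sample KV
    m = np.maximum(s_ctx.max(-1, keepdims=True), s_dec.max(-1, keepdims=True))
    w_ctx, w_dec = np.exp(s_ctx - m), np.exp(s_dec - m)     # one softmax over both segments
    z = w_ctx.sum(-1, keepdims=True) + w_dec.sum(-1, keepdims=True)
    return (w_ctx @ v_ctx + np.einsum('bt,btd->bd', w_dec, v_dec)) / z
```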


Prototypical Transformer As Unified Motion Learners

https://openreview.net/forum?id=JOrLz5d7OW

Compressor summary: The ProtoFormer framework uses prototype learning to understand and represent motion patterns in videos for tasks like optical flow, scene depth, object tracking, and video stabilization.


Improving Neural Logic Machines via Failure Reflection

https://openreview.net/forum?id=JObct1zyTb

Compressor summary: The text introduces FRGR, a framework that helps improve neural logic machines' reasoning and decision-making abilities by identifying and penalizing repeated mistakes during training.


Information-Directed Pessimism for Offline Reinforcement Learning

https://openreview.net/forum?id=JOKOsJHSao

Compressor summary: The text introduces information-directed pessimism, a novel method for offline reinforcement learning that uses Stein discrepancy to estimate and account for distribution mismatch between batch data and current policy.


Differentially Private Post-Processing for Fair Regression

https://openreview.net/forum?id=JNeeRjKbuH

Compressor summary: The paper proposes a private post-processing algorithm that improves the fairness of regressors by remapping their outputs to the estimated barycenter of the group-wise output distributions.
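
For intuition, a non-private sketch of the remapping step in one dimension (the paper's algorithm additionally privatizes the distribution estimates; names here are mine): in 1-D, the W2 barycenter's quantile function is the weighted average of the group quantile functions, and each prediction is pushed through its group's CDF and then the barycenter's inverse CDF.

```python
import numpy as np

def barycenter_remap(preds, groups, n_q=101):
    """Remap 1-D predictions so all groups share the barycenter distribution
    (non-private sketch of the remapping step only)."""
    qs = np.linspace(0, 1, n_q)
    gids, counts = np.unique(groups, return_counts=True)
    gq = {g: np.quantile(preds[groups == g], qs) for g in gids}
    # 1-D W2 barycenter: weighted average of the group quantile functions.
    bary_q = sum((c / len(preds)) * gq[g] for g, c in zip(gids, counts))
    out = np.empty(len(preds))
    for g in gids:
        m = groups == g
        u = np.searchsorted(np.sort(preds[m]), preds[m], side='right') / m.sum()
        out[m] = np.interp(u, qs, bary_q)   # push through barycenter quantiles
    return out
```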


Measuring Stochastic Data Complexity with Boltzmann Influence Functions

https://openreview.net/forum?id=JNN6QHhLHB

Compressor summary: IF-COMP is a new method for measuring uncertainty in models that considers different labels and works well on various tasks related to reliability and calibration.


Feasibility Consistent Representation Learning for Safe Reinforcement Learning

https://openreview.net/forum?id=JNHK11bAGl

Compressor summary: FCSRL is a novel framework for safe reinforcement learning that combines representation learning with feasibility-oriented objectives to improve policy learning and constraint estimation.


Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm

https://openreview.net/forum?id=JKPhWzp7Oi

Compressor summary: The paper analyzes D-SGD's generalization error using algorithmic stability and shows that the choice of communication graph can affect performance, sometimes improving it.


Prodigy: An Expeditiously Adaptive Parameter-Free Learner

https://openreview.net/forum?id=JJpOssn0uP

Compressor summary: Prodigy is a new algorithm that estimates the distance to the solution in adaptive learning methods, improving upon existing methods by a factor of $\mathcal{O}(\sqrt{\log(D/d_0)})$ and achieving test accuracy close to hand-tuned Adam.


Stable Differentiable Causal Discovery

https://openreview.net/forum?id=JJZBZW28Gn

Compressor summary: SDCD is a new method for inferring causal relationships from data that improves stability, speed, and scalability over existing DCD methods.


Box Facets and Cut Facets of Lifted Multicut Polytopes

https://openreview.net/forum?id=JJSj8UXqd4

Compressor summary: The article addresses two open questions about lifted multicut polytopes and answers one of them while showing the difficulty of answering the other.


Do Large Language Models Perform the Way People Expect? Measuring the Human Generalization Function

https://openreview.net/forum?id=JIWtKcR78C

Compressor summary: The paper studies how humans make generalizations about large language models' performance across different tasks and shows that more capable models may underperform in high-stakes situations due to misalignment with human expectations.


Concentration Inequalities for General Functions of Heavy-Tailed Random Variables

https://openreview.net/forum?id=JHRvP84SQ5

Compressor summary: The paper develops a general framework for unbounded concentration inequalities for heavy-tailed distributions and applies it to statistical learning theory problems.


MFTN: A Multi-scale Feature Transfer Network Based on IMatchFormer for Hyperspectral Image Super-Resolution

https://openreview.net/forum?id=JGL39NaARS

Compressor summary: The MFTN method improves HISR by extracting multi-scale features from LR-HSI and HR-MSI using Transformers and aggregating them to create a high-quality HR-HSI.


Federated Optimization with Doubly Regularized Drift Correction

https://openreview.net/forum?id=JD03zxWZzs

Compressor summary: FedRed is a new federated learning method that reduces communication costs while maintaining performance by using doubly regularized drift correction.


Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models

https://openreview.net/forum?id=JCG0KTPVYy

Compressor summary: COFT is a method that reduces hallucination by highlighting key texts of different granularity levels using recaller, scorer, and selector components.


Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features

https://openreview.net/forum?id=JBaPBPrn93

Compressor summary: Attention layers are more effective than random features for NLP tasks because they have high word sensitivity, allowing them to capture contextual meaning better.


Quantum Algorithm for Online Exp-concave Optimization

https://openreview.net/forum?id=JApt4Ty89Y

Compressor summary: The paper investigates quantum advantages for online optimization problems and proposes quantum online quasi-Newton methods that achieve better regret than classical algorithms.


Mimicking Better by Matching the Approximate Action Distribution

https://openreview.net/forum?id=JAfIDm7NED

Compressor summary: MAAD is a sample-efficient imitation learning algorithm that uses a surrogate reward signal and an inverse dynamics model to learn from expert observations in various environments.


Understanding Inter-Concept Relationships in Concept-Based Models

https://openreview.net/forum?id=JA6ThxAmth

Compressor summary: The text discusses the limitations of concept-based explainability methods for deep learning systems and proposes a new algorithm that incorporates inter-concept relationships to enhance explanations.


Improving Sharpness-Aware Minimization by Lookahead

https://openreview.net/forum?id=J9YKDvqr65

Compressor summary: The paper proposes a lookahead mechanism for SAM to improve convergence stability and efficiency in finding flatter minima for adversarial training.


Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation

https://openreview.net/forum?id=J6prHJsIlf

Compressor summary: FANS is a new method for explaining machine learning models that uses perturbation tests and counterfactual reasoning to determine feature importance, which works better than existing methods on six benchmarks.


Learning Mixtures of Gaussian Processes through Random Projection

https://openreview.net/forum?id=J5Yg7HMy39

Compressor summary: The proposed method clusters functional data from a Gaussian process mixture using multiple one-dimensional projections and an ensemble of Gaussian mixture models, achieving faster computation and better performance than existing methods.
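
A rough sketch of the recipe as described (the paper's estimator and consensus step may differ; a co-association consensus is used here for simplicity, and names are mine):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import AgglomerativeClustering

def projection_ensemble_cluster(curves, n_clusters, n_proj=20, seed=0):
    """curves: (n_samples, n_grid) functional data evaluated on a common grid."""
    rng = np.random.default_rng(seed)
    n = curves.shape[0]
    coassoc = np.zeros((n, n))
    for _ in range(n_proj):
        w = rng.normal(size=curves.shape[1])
        proj = (curves @ w)[:, None]                 # one 1-D random projection
        lab = GaussianMixture(n_clusters).fit_predict(proj)
        coassoc += lab[:, None] == lab[None, :]      # co-clustering votes
    dist = 1.0 - coassoc / n_proj                    # ensemble disagreement
    return AgglomerativeClustering(n_clusters=n_clusters, metric='precomputed',
                                   linkage='average').fit_predict(dist)
```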


Revisiting the Role of Language Priors in Vision-Language Models

https://openreview.net/forum?id=J5VB1h3Aed

Compressor summary: The paper studies how to use generative vision-language models for image-text retrieval tasks and proposes a method to reduce linguistic bias in the model's output.


Efficient Value Iteration for s-rectangular Robust Markov Decision Processes

https://openreview.net/forum?id=J4LTDgwAZq

Compressor summary: The paper derives optimal robust Bellman operators for interconnected uncertainties in MDPs, leading to faster robust value iteration methods and revealing novel threshold behavior and policy resilience properties.


Clustered Federated Learning via Gradient-based Partitioning

https://openreview.net/forum?id=J4HJUF70qm

Compressor summary: The paper proposes a new Clustered Federated Learning algorithm that groups clients based on their model updates, leading to better clustering and learning performance.


Learning Label Shift Correction for Test-Agnostic Long-Tailed Recognition

https://openreview.net/forum?id=J3xYTh6xtL

Compressor summary: Label shift correction (LSC) estimates and adjusts the test label distribution to reduce generalization error in real-world applications with non-uniform label distributions.
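
The paper's estimator is its own; for background, the classic prior-adjustment view of label shift (EM in the style of Saerens et al.) fits in a few lines:

```python
import numpy as np

def em_label_shift(probs_src, prior_src, n_iter=100):
    """probs_src: (n, C) source-trained classifier's p(y|x) on test inputs;
    prior_src: (C,) class prior of the training set."""
    prior_tgt = prior_src.copy()
    probs_tgt = probs_src
    for _ in range(n_iter):
        w = prior_tgt / prior_src                    # E-step: prior-ratio re-weighting
        probs_tgt = probs_src * w
        probs_tgt /= probs_tgt.sum(1, keepdims=True)
        prior_tgt = probs_tgt.mean(0)                # M-step: average corrected posterior
    return prior_tgt, probs_tgt
```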


PANDA: Expanded Width-Aware Message Passing Beyond Rewiring

https://openreview.net/forum?id=J1NIXxiDbu

Compressor summary: The text introduces PANDA, a new message passing method for GNNs that expands the width of highly central nodes to prevent over-squashing and improve long-range information propagation without distorting graph topology.


Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces

https://openreview.net/forum?id=J16WEPdqhJ

Compressor summary: The paper proposes a faster and more practical policy gradient method for robust Markov decision processes with environmental perturbations and large state spaces.


Neural operators meet conjugate gradients: The FCG-NO method for efficient PDE solving

https://openreview.net/forum?id=J0ty1o7nCj

Compressor summary: The paper proposes using deep learning to learn efficient preconditioners for a linear solver, allowing for faster and more accurate solutions to partial differential equations across different resolutions.


Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates

https://openreview.net/forum?id=Izv7gBnap3

Compressor summary: Federated robust averaging (FedRo) can converge well for smooth non-convex loss, but its improvement rate depends on client subsampling and local steps.
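
For concreteness, one round of robust averaging over a client subsample, instantiated here with a coordinate-wise trimmed mean (one standard robust aggregator; the paper analyzes the interplay with subsampling and local steps rather than prescribing this exact rule):

```python
import numpy as np

def trimmed_mean_round(client_updates, trim_frac=0.1):
    """Aggregate one round's updates from the sampled clients with a
    coordinate-wise trimmed mean (assumes enough honest clients
    survive the trim)."""
    U = np.sort(np.stack(client_updates), axis=0)   # (m, d), sorted per coordinate
    k = int(trim_frac * U.shape[0])
    return U[k:U.shape[0] - k].mean(axis=0)
```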


Provable Privacy with Non-Private Pre-Processing

https://openreview.net/forum?id=IzqpUC34Jg

Compressor summary: The paper proposes a framework to assess the privacy cost of data-dependent pre-processing in differentially private machine learning pipelines and provides explicit privacy guarantees for several algorithms.


Total Variation Floodgate for Variable Importance Inference in Classification

https://openreview.net/forum?id=IyeXM58vIC

Compressor summary: The paper introduces Total Variation Floodgate, a method to measure variable importance in classification problems without model assumptions, and provides algorithms to infer it using regression functions and confidence bounds.


Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing

https://openreview.net/forum?id=IxZ4xaHSYG

Compressor summary: This paper studies how different smoothing distributions affect Randomized Smoothing's robustness against adversarial examples and shows that Exponential General Gaussian distribution improves the defense mechanism.


Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models

https://openreview.net/forum?id=IwqE4QqBew

Compressor summary: The paper introduces an $r \times r$ preconditioner for Low-Rank Adaptation (LoRA), which improves convergence, reliability, and robustness of fine-tuning with minimal overhead.


Correlation-Induced Label Prior for Semi-Supervised Multi-Label Learning

https://openreview.net/forum?id=IuvpVcGUOB

Compressor summary: The paper proposes PCLP, a novel SSMLL method that infers label correlations using SCM to enhance pseudo-labeling and improve performance.


Binary Decomposition: A Problem Transformation Perspective for Open-Set Semi-Supervised Learning

https://openreview.net/forum?id=Irkcamqg4d

Compressor summary: The paper presents BDMatch, a new method for open-set semi-supervised learning that uses binary decomposition to address class-imbalance and representation-compromise issues.


Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation

https://openreview.net/forum?id=IpSKpOY2EH

Compressor summary: The paper proposes methods to reduce memory overhead in fine-tuning large models using Approximate Backpropagation theory and novel activation functions.


Learning to Continually Learn with the Bayesian Principle

https://openreview.net/forum?id=IpPnmhjw30

Compressor summary: The meta-continual learning framework combines neural networks' representational power and statistical models' robustness to forgetting by using sequential Bayesian update rules and meta-learning.


Bagged Deep Image Prior for Recovering Images in the Presence of Speckle Noise

https://openreview.net/forum?id=IoUOhnCmlX

Compressor summary: The paper studies likelihood-based methods for recovering complex signals from multiple measurements with noise, provides a new MSE upper bound, and introduces bagged Deep Image Priors combined with projected gradient descent and Newton-Schulz algorithm to achieve state-of-the-art performance.


Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning

https://openreview.net/forum?id=InUUQkExsw

Compressor summary: We develop two new risk-sensitive offline reinforcement learning algorithms for linear MDPs that use entropic risk and improve upon existing methods.


Careful with that Scalpel: Improving Gradient Surgery with an EMA

https://openreview.net/forum?id=IgwtflILyj

Compressor summary: Bloop is a method for improving deep learning estimation pipelines by blending auxiliary loss gradients with training loss gradients using orthogonal projection and moving average, leading to better performance on NLP and vision tasks.


Listwise Reward Estimation for Offline Preference-based Reinforcement Learning

https://openreview.net/forum?id=If6Q9OYfoJ

Compressor summary: LiRE is a novel approach for offline PbRL that uses second-order preference information from human feedback to learn reward models more effectively than existing methods, as shown by experiments on a new dataset.


Training-Free Long-Context Scaling of Large Language Models

https://openreview.net/forum?id=If4xW9vF7U

Compressor summary: DCA is a training-free method that improves LLMs' ability to handle long context sequences by using chunk-based attention modules.


A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data

https://openreview.net/forum?id=IejxxE9DO2

Compressor summary: DySymNet is a novel neural-guided dynamic symbolic network that uses reinforcement learning to optimize the structure of symbolic networks for discovering mathematical expressions from data.


Proteus: Exploring Protein Structure Generation for Enhanced Designability and Efficiency

https://openreview.net/forum?id=IckJCzsGVS

Compressor summary: Proteus is a new deep diffusion network that efficiently generates novel, highly designable proteins without depending on pre-trained structure prediction networks.


Implicit Representations for Constrained Image Segmentation

https://openreview.net/forum?id=IaV6AgrTUp

Compressor summary: Implicit representations enable deep learning models to enforce geometric constraints in image segmentation by mapping coordinates to pixel properties.


A General Framework for Sequential Decision-Making under Adaptivity Constraints

https://openreview.net/forum?id=IYI61L7SPk

Compressor summary: The paper proposes algorithms for sequential decision-making with rare policy switch and batch learning constraints, achieving sub-linear regret under various function classes.


Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss

https://openreview.net/forum?id=IWi6iLZeRG

Compressor summary: The paper compares deeper and wider neural networks in terms of generalization error, influenced by factors such as sample points, parameters, and loss function regularity, and applies the theory to partial differential equations using deep Ritz and PINN methods.


Quantum Positional Encodings for Graph Neural Networks

https://openreview.net/forum?id=IW45Dr1Kxi

Compressor summary: The paper proposes new positional encodings for graph neural networks using quantum computers, which can improve the performance of state-of-the-art models on standard benchmarks and large-scale datasets.


Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations

https://openreview.net/forum?id=IUijgjJgWO

Compressor summary: The paper analyzes the robustness of large language and vision-language models by showing their vulnerability to adversarial permutation in multiple-choice question answering.


Hieros: Hierarchical Imagination on Structured State Space Sequence World Models

https://openreview.net/forum?id=IUBhvyJ9Sr

Compressor summary: HIEROS is a hierarchical policy that improves sample efficiency in DRL by learning time abstracted world representations and imagining trajectories at multiple time scales, achieving better performance and exploration than existing methods.


Quality-Diversity Actor-Critic: Learning High-Performing and Diverse Behaviors via Value and Successor Features Critics

https://openreview.net/forum?id=ISG3l8nXrI

Compressor summary: QDAC is an actor-critic deep reinforcement learning algorithm that learns high-performing and diverse behaviors, enabling better adaptation to new situations.


Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction

https://openreview.net/forum?id=INb8xV1xmf

Compressor summary: The paper proposes a novel framework called SP-GS that uses 3D Gaussians and superpoints to render dynamic scenes with high quality and fast speed.


Neural Networks Learn Statistics of Increasing Complexity

https://openreview.net/forum?id=IGdpKP0N6w

Compressor summary: The paper investigates how neural networks learn low-order moments of data distributions and shows that they can perform well on maximum-entropy distributions with similar statistics, but lose this ability later in training. The authors also extend the distributional simplicity bias to discrete domains and use optimal transport methods to edit sample statistics.
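
The moment-editing step can be illustrated with the closed-form W2-optimal map between Gaussians, which transports samples so their first two moments match a target class (a simplification of the paper's procedure; helper names are mine):

```python
import numpy as np

def sqrtm_psd(M):
    """Symmetric PSD square root via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def gaussian_ot_map(X, mu1, S1, mu2, S2):
    """W2-optimal map between N(mu1, S1) and N(mu2, S2): edits samples so
    their mean and covariance match the target."""
    R = sqrtm_psd(S1)
    R_inv = np.linalg.pinv(R)
    A = R_inv @ sqrtm_psd(R @ S2 @ R) @ R_inv
    return (X - mu1) @ A.T + mu2
```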


Graph Adversarial Diffusion Convolution

https://openreview.net/forum?id=ICvWruTEDH

Compressor summary: The paper presents a new graph signal denoising method called Graph Adversarial Diffusion Convolution (GADC), which improves robustness and performance on heterophilic graphs by using a min-max optimization formulation.


MultiMax: Sparse and Multi-Modal Attention Learning

https://openreview.net/forum?id=IC9UZ8lm25

Compressor summary: The text introduces MultiMax, a new function that improves the interpretability of machine learning algorithms by adaptively suppressing irrelevant entries and preserving multi-modality.


Activation-Descent Regularization for Input Optimization of ReLU Networks

https://openreview.net/forum?id=IArWwIim8M

Compressor summary: The paper proposes a new method to optimize ReLU networks' inputs by accounting for changes in activation patterns, using differentiable representations and regularization terms for better local descent properties and improved performance in various tasks.


How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?

https://openreview.net/forum?id=I4HTPws9P6

Compressor summary: The paper analyzes the training dynamics and generalization capabilities of Transformers for in-context learning, studying how different factors and components affect performance and exploring the effects of model pruning.


Swallowing the Bitter Pill: Simplified Scalable Conformer Generation

https://openreview.net/forum?id=I44Em5D5xy

Compressor summary: The paper proposes Molecular Conformer Fields (MCF), a diffusion generative model that predicts molecular conformers by learning a distribution over functions mapping elements from a molecular graph to their 3D positions, achieving state-of-the-art results with simplicity and scalability.


Finding NEM-U: Explaining unsupervised representation learning through neural network generated explanation masks

https://openreview.net/forum?id=Hzpt1Gws9g

Compressor summary: NEM-U is a framework that uses a masking network to provide fast and accurate explanations for learned vector embeddings in unsupervised representation learning.


Understanding the Learning Dynamics of Alignment with Human Feedback

https://openreview.net/forum?id=Hy88Jp0kQT

Compressor summary: This paper studies how the distribution of preference datasets affects the learning dynamics and behavior of large language models aligned to human intentions, providing theoretical insights and empirical validation.


Planning, Fast and Slow: Online Reinforcement Learning with Action-Free Offline Data via Multiscale Planners

https://openreview.net/forum?id=HwVZbPbMjw

Compressor summary: The paper introduces passive RL, a method to convert passive observations into actionable insights, and proposes MSCP, a novel algorithm that leverages two planners at distinct scales to improve online RL with passive data.


LLark: A Multimodal Instruction-Following Language Model for Music

https://openreview.net/forum?id=HvwOtYzHBX

Compressor summary: LLark is an instruction-tuned multimodal model for understanding music that uses a generative music model and a language model, trained on open-source data, and performs well on various tasks.


Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble

https://openreview.net/forum?id=HusShERjlc

Compressor summary: The authors propose Multi-Comprehension (MC) Ensemble, which uses various supervision tasks to form different comprehensions of the data and labels, yielding more diverse feature representations than prior ensembles and outperforming naive Deep Ensembles and standalone models on Out-of-Distribution detection.


CurBench: Curriculum Learning Benchmark

https://openreview.net/forum?id=Htw0bSgjXE

Compressor summary: CurBench is a new benchmark for curriculum learning that evaluates machine learning models across 3 research domains, 3 settings, and 2 dimensions of performance and complexity.


Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training

https://openreview.net/forum?id=Ht20wtgaty

Compressor summary: The paper introduces CTMA, a meta-aggregator for Byzantine-robust training in distributed ML systems, and proposes a gradient estimation technique based on double-momentum strategy with theoretical and practical advantages.


Dr. Strategy: Model-Based Generalist Agents with Strategic Dreaming

https://openreview.net/forum?id=HsseRq2FAx

Compressor summary: The paper proposes Dr. Strategy, an MBRL agent with a novel dreaming strategy based on spatial divide-and-conquer, which improves performance in complex navigation tasks.


Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

https://openreview.net/forum?id=HssOwuZiaB

Compressor summary: The authors study abrupt improvements in transformer training loss when facing multi-step tasks, called Eureka-moments, and propose ways to improve training by addressing issues in the Softmax function in self-attention blocks.


The Emergence of Reproducibility and Consistency in Diffusion Models

https://openreview.net/forum?id=HsliOqZkc0

Compressor summary: The study shows that, given the same noise input, different diffusion models often produce remarkably similar outputs, a reproducibility property that shifts between memorization and generalization regimes with training data size and holds across various model variants.


Single-Model Attribution of Generative Models Through Final-Layer Inversion

https://openreview.net/forum?id=Hs9GcILuZN

Compressor summary: FLIPAD is a new method for identifying whether a sample was generated by a specific generative model or not, based on final-layer inversion and anomaly detection, which works in the open-world setting and is efficient and flexible.


Predictive Performance Comparison of Decision Policies Under Confounding

https://openreview.net/forum?id=HrzQZXzrN2

Compressor summary: The paper proposes a method to compare predictive models with existing decision policies by identifying and ignoring regions of uncertainty in the data, and applies it to evaluate a healthcare policy modification.


Causal Discovery with Fewer Conditional Independence Tests

https://openreview.net/forum?id=HpT19AKddu

Compressor summary: The paper proposes CCPG, a coarser representation of hidden causal graphs that can be learned with fewer CI tests than existing algorithms, especially when the causal graph is fully identifiable.


Stationary Latent Weight Inference for Unreliable Observations from Online Test-Time Adaptation

https://openreview.net/forum?id=HmKMpJXH67

Compressor summary: SLWI is a novel online test-time adaptation framework that uses Bayesian filtering to continually update model weights and adapt to target domain changes, reducing errors and improving performance in various distribution shift scenarios.


Learning Causal Dynamics Models in Object-Oriented Environments

https://openreview.net/forum?id=HkWxjpUV0S

Compressor summary: The paper introduces Object-Oriented CDM (OOCDM), a model for learning causal dependencies among objects in large-scale environments, which improves on existing CDMs in various aspects.


Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

https://openreview.net/forum?id=HkCRgoGtt6

Compressor summary: The paper introduces a new benchmark and model for long-context retrieval in machine learning systems, addressing challenges like evaluation, pretraining, and finetuning with limited GPU memory.


Distribution Alignment Optimization through Neural Collapse for Long-tailed Classification

https://openreview.net/forum?id=Hjwx3H6Vci

Compressor summary: The paper proposes a distribution alignment optimization method (DisA) that improves deep neural networks' performance on long-tailed datasets by inducing the Neural Collapse phenomenon.


MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

https://openreview.net/forum?id=Hh8pUBfxXh

Compressor summary: The paper proposes MMPareto algorithm to address gradient conflict in multimodal learning by using Pareto integration and achieving better generalization with unimodal assistance.


Disentangled Continual Graph Neural Architecture Search with Invariant Modular Supernet

https://openreview.net/forum?id=Hg7C5YYifi

Compressor summary: This paper proposes a novel method for continual graph neural architecture search, which enables learning new tasks without forgetting past knowledge and addressing architecture conflicts.


AMPA: Adaptive Mixed Precision Allocation for Low-Bit Integer Training

https://openreview.net/forum?id=HfxFasUfbN

Compressor summary: The paper proposes a new low-bit integer training framework that allows adaptive mixed-precision allocation for weights, activations, and gradients, achieving significant BitOPs reduction and minimal performance loss in various tasks.


Deep Regression Representation Learning with Topology

https://openreview.net/forum?id=HbdeEGVfEN

Compressor summary: The text discusses how representation topologies for classification and regression are different, and proposes a regularizer called PH-Reg that aligns the feature space with the target space in regression tasks based on the Information Bottleneck principle.


Towards Unified Multi-granularity Text Detection with Interactive Attention

https://openreview.net/forum?id=HaBVzgSdM7

Compressor summary: The paper presents DAT, an end-to-end model that unifies scene text detection, layout analysis, and document page detection with across-granularity interactive attention and prompt-based segmentation modules, achieving state-of-the-art results on various text-related tasks.


Optimal Recurrent Network Topologies for Dynamical Systems Reconstruction

https://openreview.net/forum?id=HZyOz9VEg4

Compressor summary: The paper proposes geometric pruning for dynamical systems reconstruction, which reduces parameter load while preserving performance by generating specific network topologies that work well with recurrent neural networks (RNNs).


Causal Inference from Competing Treatments

https://openreview.net/forum?id=HZ6lrZzB02

Compressor summary: The paper proposes a bidding system based on game theory to estimate causal effects in RCTs with competing treatments, and shows that it has a pure Nash equilibrium that minimizes estimation error.


Differentiable Distributionally Robust Optimization Layers

https://openreview.net/forum?id=HUJK9dFOW6

Compressor summary: The paper presents differentiable DRO layers for mixed-integer problems that enable decision-focused learning by embedding ambiguity sets as a layer, using dual-view methodology and importance sampling to estimate gradients.


Weisfeiler-Leman at the margin: When more expressivity matters

https://openreview.net/forum?id=HTNgNt8CTJ

Compressor summary: The paper explores how adding subgraph information and using margin theory can improve the generalization performance of graph isomorphism algorithms, such as 1-WL, without necessarily increasing their expressivity.


A Field Guide for Pacing Budget and ROS Constraints

https://openreview.net/forum?id=HTMFUKAm8B

Compressor summary: This paper compares algorithms for managing budget pacing and return-on-spend constraints in internet advertising, finding that a min-pacing algorithm performs well and is more coordinated than a sequential algorithm.


Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies

https://openreview.net/forum?id=HQtTg1try7

Compressor summary: The paper develops scaling laws for adversarial training of image classifiers, identifies inefficiencies in prior art, proposes a compute-efficient setup that surpasses previous methods with fewer FLOPs, and predicts a robustness plateau around 90% caused by adversaries generating invalid images, a limit consistent with human performance.


LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies

https://openreview.net/forum?id=HPXRzM9BYZ

Compressor summary: The paper introduces a framework to predict models' Out-of-Distribution performance using in-distribution measurements and shows that Lowest Common Ancestor distance can explain why Visual-Language Models generalize better than Vision Models.


Stochastic Q-learning for Large Discrete Action Spaces

https://openreview.net/forum?id=HPQaMmABgK

Compressor summary: Stochastic value-based RL methods consider a sublinear number of actions in each iteration, reducing computational burden and improving performance in complex environments.
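
The core trick is cheap to sketch: replace the full max over actions with a max over a random candidate set of roughly logarithmic size (a simplified sketch; the paper's method may also mix in previously used actions):

```python
import numpy as np

def stochastic_q_update(Q, s, a, r, s_next, rng, alpha=0.1, gamma=0.99):
    """One Q-learning update whose max is taken over a random sublinear
    candidate set instead of all |A| actions."""
    n_actions = Q.shape[1]
    k = max(1, int(np.ceil(np.log2(n_actions))))      # ~log-size candidate set
    cand = rng.choice(n_actions, size=k, replace=False)
    target = r + gamma * Q[s_next, cand].max()
    Q[s, a] += alpha * (target - Q[s, a])
```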


Learning with Adaptive Resource Allocation

https://openreview.net/forum?id=HPLzSCOecY

Compressor summary: The paper proposes LARA, a method for handling multiple time-constrained learning tasks with limited resources by predicting progress, allocating resources adaptively, and balancing errors.


Orthogonal Bootstrap: Efficient Simulation of Input Uncertainty

https://openreview.net/forum?id=HOoVTsPPn7

Compressor summary: Orthogonal Bootstrap is a new method for reducing computational cost and improving accuracy in uncertainty simulations by decomposing the target into two parts.


Towards Understanding Inductive Bias in Transformers: A View From Infinity

https://openreview.net/forum?id=HOMXUneCTR

Compressor summary: The text discusses how transformers are biased towards symmetric functions and uses representation theory to analyze their performance on symmetric datasets like WikiText.


Relational DNN Verification With Cross Executional Bound Refinement

https://openreview.net/forum?id=HOG80Yk4Gw

Compressor summary: RACoon is a scalable relational verifier for deep neural networks that uses cross-execution dependencies to precisely verify properties like robustness and hamming distance.


EiG-Search: Generating Edge-Induced Subgraphs for GNN Explanation in Linear Time

https://openreview.net/forum?id=HO0g6cHVZx

Compressor summary: EiG-Search is a training-free approach to generate efficient and comprehensive subgraph-level explanations for Graph Neural Networks using an edge-induced search algorithm based on gradient importance.


Bayesian Design Principles for Offline-to-Online Reinforcement Learning

https://openreview.net/forum?id=HLHQxMydFk

Compressor summary: The paper proposes a novel algorithm for offline reinforcement learning that balances optimism and pessimism using Bayesian design principles, leading to better performance and guaranteed optimal policy discovery.


Receptive Fields As Experts in Convolutional Neural Architectures

https://openreview.net/forum?id=HGSIpeNNfM

Compressor summary: The paper proposes a Mixture of Receptive Fields (MoRF) for convolutional neural networks, which combines multiple receptive fields with different sizes to select the best one for each input, improving performance and capacity.


Neuro-Symbolic Temporal Point Processes

https://openreview.net/forum?id=HDrXBr26UI

Compressor summary: The paper proposes a neural-symbolic method to efficiently discover temporal logic rules that explain irregular events using vector embeddings and a sequential covering algorithm.


Adversarially Robust Hypothesis Transfer Learning

https://openreview.net/forum?id=HCDMiaT0Pf

Compressor summary: The paper explores using auxiliary hypotheses to initialize the learning process and develops a robust model under adversarial attacks with fast generalization error bounds.


Position: Towards Implicit Prompt For Text-To-Image Models

https://openreview.net/forum?id=H9fNj8ivTy

Compressor summary: This paper introduces ImplicitBench, a benchmark to evaluate text-to-image models' performance under implicit prompts, which can pose safety and privacy threats.


Don't be so Negative! Score-based Generative Modeling with Oracle-assisted Guidance

https://openreview.net/forum?id=H8pMSJwRD5

Compressor summary: Gen-neG is a new method that uses side-information from an oracle to guide diffusion models towards generating samples within the true data distribution, and it applies to constrained domains like self-driving simulation and safety-guarded human motion.


On the Trajectory Regularity of ODE-based Diffusion Sampling

https://openreview.net/forum?id=H86WzfH5N1

Compressor summary: The paper analyzes the properties of sampling trajectories in diffusion-based generative models and proposes a simple technique to improve image generation by adjusting the time schedule.


Stealthy Imitation: Reward-guided Environment-free Policy Stealing

https://openreview.net/forum?id=H5FDHzrWe2

Compressor summary: Stealthy Imitation is a model stealing attack on deep reinforcement learning policies that uses black-box access and no environmental knowledge to approximate the input states distribution and outperforms prior data-free approaches.


Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift

https://openreview.net/forum?id=H3bATm4mKn

Compressor summary: The paper proposes SpAR, a method that adapts neural network weights based on the eigenspectrum decomposition of source and target data, to improve out-of-distribution generalization for regression problems.


Robustness of Nonlinear Representation Learning

https://openreview.net/forum?id=GyV33H5Uuk

Compressor summary: The paper studies how robust nonlinear representation learning is when the mixing function is slightly misspecified, and shows that Independent Component Analysis can approximate recovery of the mixing matrix and independent components under such conditions.


EMC$^2$: Efficient MCMC Negative Sampling for Contrastive Learning with Global Convergence

https://openreview.net/forum?id=GxOFM3f5Vm

Compressor summary: EMC$^2$ is an efficient Markov Chain Monte Carlo method for generating negative samples in contrastive learning that achieves global convergence with low computational and memory cost.


Representation Surgery: Theory and Practice of Affine Steering

https://openreview.net/forum?id=GwA4go0Mw4

Compressor summary: The paper studies how to modify neural language models' representations to prevent them from generating harmful or biased text, and proposes new optimal steering functions for this purpose.


Efficient Mixture Learning in Black-Box Variational Inference

https://openreview.net/forum?id=Grrydzui3A

Compressor summary: MISVAE improves scalability and performance of mixture variational distributions in black box variational inference by reducing parameters and inference time using importance sampling and new ELBO estimators.


Indirectly Parameterized Concrete Autoencoders

https://openreview.net/forum?id=GqsRKEhelH

Compressor summary: The paper proposes a new method called Indirectly Parameterized CAEs that improves embedded feature selection by using Gumbel-Softmax distributions and avoiding duplicate selections.


Fast Co-Training under Weak Dependence via Stream-Based Active Learning

https://openreview.net/forum?id=GqWy1wZKeE

Compressor summary: Co-training can learn natural concepts efficiently in a stream-based active learning model by reducing it to online classification.


Rich-Observation Reinforcement Learning with Continuous Latent Dynamics

https://openreview.net/forum?id=Gq1ajaKhBC

Compressor summary: The paper introduces a new framework and algorithm for efficient reinforcement learning with high-dimensional inputs, where the environment has low-dimensional latent dynamics.


Iterative Regularized Policy Optimization with Imperfect Demonstrations

https://openreview.net/forum?id=Gp5F6qzwGK

Compressor summary: The paper proposes a method (IRPO) to improve imitation learning by combining offline imitation and online reinforcement learning, and enhancing demonstration quality using online feedback.


Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nyström method

https://openreview.net/forum?id=Gp0xZDmrA2

Compressor summary: The paper proposes a new asymmetric learning paradigm based on coupled covariance eigenproblem (CCE) for infinite-dimensional feature maps, and introduces an asymmetric Nyström method to speed up training.


Reducing Balancing Error for Causal Inference via Optimal Transport

https://openreview.net/forum?id=GktjBAGgo4

Compressor summary: The paper proposes a method to reduce confounding bias in causal inference by using optimal transport with learnable marginal distributions and cost functions.


Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning

https://openreview.net/forum?id=GiHo83ozsF

Compressor summary: The text introduces a new gradient aggregation method for multi-task learning that considers uncertainty and task dependencies using Bayesian inference.


Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction

https://openreview.net/forum?id=GhPFmTJNfj

Compressor summary: The paper investigates how GCN link prediction can be biased towards within-group dynamics and proposes a new fairness metric and training strategy to address this issue.


Incorporating probabilistic domain knowledge into deep multiple instance learning

https://openreview.net/forum?id=GfNyqrwECJ

Compressor summary: The authors propose DeeMILIP, a framework for incorporating domain knowledge into deep multiple instance learning models by defining a mapping between domain entities and model components, and demonstrate its effectiveness on an immune-based diagnostics case.


A Doubly Recursive Stochastic Compositional Gradient Descent Method for Federated Multi-Level Compositional Optimization

https://openreview.net/forum?id=GentO2E4ID

Compressor summary: The paper proposes a novel federated learning algorithm for multi-level compositional problems that achieves linear speedup and mitigates heterogeneity and communication efficiency issues.


LangCell: Language-Cell Pre-training for Cell Identity Understanding

https://openreview.net/forum?id=GcZjpKA37R

Compressor summary: LangCell is a pre-trained language model that incorporates cross-modal knowledge of single-cell data and natural language to improve cell identity understanding tasks without relying on supervision signals.


Constrained Reinforcement Learning Under Model Mismatch

https://openreview.net/forum?id=GcW9pg4P9x

Compressor summary: The paper proposes a Robust Constrained Policy Optimization (RCPO) algorithm for reinforcement learning under model uncertainty, which can optimize reward and satisfy constraints in real environments.


Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks

https://openreview.net/forum?id=GbFluKMmtE

Compressor summary: State-space models like Mamba can perform well in some in-context learning tasks but not others, and combining them with attention blocks improves their performance overall.


Can Implicit Bias Imply Adversarial Robustness?

https://openreview.net/forum?id=GYGkt2M8ee

Compressor summary: Gradient-based training can harm network robustness, but using a shallow pReLU network with gradient flow improves both generalization and adversarial resistance.


Retrieval Across Any Domains via Large-scale Pre-trained Model

https://openreview.net/forum?id=GVmvBNxB73

Compressor summary: The proposed Text-driven Knowledge Integration (TKI) method enables data-free adaptive cross-domain image retrieval using a pre-trained vision-language model and learnable domain word vectors, achieving better performance on benchmark datasets.


Learning to Scale Logits for Temperature-Conditional GFlowNets

https://openreview.net/forum?id=GUEsK9xJny

Compressor summary: Logit-scaling GFlowNets is a new design that improves temperature-conditional GFlowNets' training speed and performance in biological and chemical tasks by scaling the policy's logits with a learned function of the temperature.


Neurodegenerative Brain Network Classification via Adaptive Diffusion with Temporal Regularization

https://openreview.net/forum?id=GTnn6bNE3j

Compressor summary: AGT is a novel method that adaptively captures node characteristics and temporal variations for analyzing neurodegenerative diseases on brain connectomes, improving graph classification results.


QuRating: Selecting High-Quality Data for Training Language Models

https://openreview.net/forum?id=GLGYYqPwjy

Compressor summary: QuRating is a method that uses human intuitions to select high-quality pre-training data for language models based on four criteria, resulting in better performance and a training curriculum.


In-Context Unlearning: Language Models as Few-Shot Unlearners

https://openreview.net/forum?id=GKcwle8XC9

Compressor summary: In-Context Unlearning is a method for removing specific training instances from large language models without retraining or updating the model parameters by providing inputs with different labels at inference time.
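
A sketch of the prompt construction under my reading (template and names are hypothetical): the instance to forget appears with a flipped label, followed by correctly labeled examples, then the query:

```python
def icul_prompt(forget_x, flipped_label, retained, query_x):
    """Build an in-context unlearning prompt: the forget instance gets an
    incorrect label, retained examples keep their true labels."""
    blocks = [f"Input: {forget_x}\nLabel: {flipped_label}"]
    blocks += [f"Input: {x}\nLabel: {y}" for x, y in retained]
    blocks.append(f"Input: {query_x}\nLabel:")
    return "\n\n".join(blocks)
```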


Active Statistical Inference

https://openreview.net/forum?id=GKMcCtWC7H

Compressor summary: Active inference is a statistical inference method that uses machine learning to collect labels efficiently by prioritizing uncertain data points and relying on confident predictions.


From Inverse Optimization to Feasibility to ERM

https://openreview.net/forum?id=GJzqRKOdRi

Compressor summary: The paper presents a new method for contextual inverse optimization that uses additional information to infer unknown parameters and shows improved performance on various problems.


Hybrid$^2$ Neural ODE Causal Modeling and an Application to Glycemic Response

https://openreview.net/forum?id=GHZVjmaGQM

Compressor summary: The paper proposes a hybrid loss function that combines causal and predictive losses to learn interpretable and causally valid hybrid models for complex systems, using glucose dynamics post-exercise in individuals with type 1 diabetes as an example.


No-Regret Reinforcement Learning in Smooth MDPs

https://openreview.net/forum?id=GGnYDXZC1B

Compressor summary: The paper proposes new algorithms for continuous state and action spaces in reinforcement learning, using a novel structural assumption called $\nu$-smoothness and feature maps based on Legendre polynomials.


StableMask: Refining Causal Masking in Decoder-only Transformer

https://openreview.net/forum?id=GFfWzAReAc

Compressor summary: StableMask improves language modeling by refining the causal mask, balancing attention distributions, and encoding absolute positional information without increasing parameters.


Mechanistic Design and Scaling of Hybrid Architectures

https://openreview.net/forum?id=GDp7Gyd9nf

Compressor summary: The text describes a new pipeline for designing deep learning architectures that simplifies the process by using small-scale tests to probe capabilities and identify hybrid designs that outperform existing models in scaling and performance.


DsDm: Model-Aware Dataset Selection with Datamodels

https://openreview.net/forum?id=GC8HkKeH8s

Compressor summary: The text argues that selecting data for training large-scale models based on human notions of quality may not improve performance, and proposes a new method that optimizes model performance by choosing the best subset of data for each task.


Differentiable Weightless Neural Networks

https://openreview.net/forum?id=GBxflz0qdX

Compressor summary: The Differentiable Weightless Neural Network (DWN) is a new model that uses interconnected lookup tables and a novel technique for approximate differentiation, achieving superior performance in latency, throughput, energy efficiency, and accuracy on various edge computing devices and tabular datasets.


SCoRe: Submodular Combinatorial Representation Learning

https://openreview.net/forum?id=G8zDeKOp0R

Compressor summary: SCoRe is a new representation learning framework that uses set-based submodular measures to address inter-class bias and intra-class variance, improving performance in various tasks such as classification and object detection.


Minimum Norm Interpolation Meets The Local Theory of Banach Spaces

https://openreview.net/forum?id=G4b32bKnBy

Compressor summary: The paper proposes a general framework to analyze interpolation models' generalization properties using concepts from high-dimensional geometry and shows how these properties depend on the Gaussian complexity and cotype of the space.


GFlowNet Training by Policy Gradients

https://openreview.net/forum?id=G1igwiBBUj

Compressor summary: The paper proposes a new GFlowNet training framework that combines policy-dependent rewards with reinforcement learning to improve the efficiency and performance of generating combinatorial objects.


ByMI: Byzantine Machine Identification with False Discovery Rate Control

https://openreview.net/forum?id=G0z4bCNmkG

Compressor summary: ByMI is a general detection procedure for Byzantine machines in distributed learning that uses error rate control, sample-splitting, and score symmetry to achieve dimension insensitivity and p-value freedom.


Learning Pseudo-Contractive Denoisers for Inverse Problems

https://openreview.net/forum?id=G0vZ5ENrJQ

Compressor summary: The paper proposes a new training strategy for deep denoisers that improves their performance by enforcing a weaker constraint called pseudo-contractiveness, and provides efficient algorithms and experimental results to support its effectiveness.


Delving into Differentially Private Transformer

https://openreview.net/forum?id=FzyMdAm2fZ

Compressor summary: The paper trains Transformer models with differential privacy by reducing the problem to training DP vanilla neural nets, proposing a Re-Attention Mechanism to address attention distraction and Phantom Clipping to handle gradient-clipping challenges.


Implicit meta-learning may lead language models to trust more reliable sources

https://openreview.net/forum?id=Fzp1DRzCIN

Compressor summary: The paper shows that large language models can learn to prioritize useful texts based on random tags and adjust their updates accordingly, and explores how this impacts their knowledge representation and potential risks.


On Discrete Prompt Optimization for Diffusion Models

https://openreview.net/forum?id=Fw4fBE2rqW

Compressor summary: The paper presents a framework for optimizing prompts in text-to-image diffusion models, addressing challenges related to domain space and text gradient computation with new methods.


Vague Prototype-Oriented Diffusion Model for Multi-Class Anomaly Detection

https://openreview.net/forum?id=FvLd8Gr7xq

Compressor summary: The VPDM model uses vague prototypes and conditional diffusion to identify anomalies in multiple object classes without being biased by normal data.


Accelerating PDE Data Generation via Differential Operator Action in Solution Space

https://openreview.net/forum?id=Fv9GLw0LkO

Compressor summary: DiffOAS is a novel algorithm that accelerates and improves the precision of PDE dataset generation for Neural Operator applications.


Combinatorial Approximations for Cluster Deletion: Simpler, Faster, and Better

https://openreview.net/forum?id=FpbKoIPHxb

Compressor summary: The paper improves approximation guarantees, derandomizes algorithms, and proposes a new scalable combinatorial method for cluster deletion, a graph partitioning problem with applications in biology and social networks.


ReGAL: Refactoring Programs to Discover Generalizable Abstractions

https://openreview.net/forum?id=FovMAzXUpj

Compressor summary: ReGAL is a method to improve program synthesis by learning reusable functions from existing code through refactoring, leading to better accuracy and efficiency in predicting programs across diverse domains.


Generative Conditional Distributions by Neural (Entropic) Optimal Transport

https://openreview.net/forum?id=FoRqdsN4IA

Compressor summary: The paper proposes a new neural network method for learning generative models of conditional distributions, especially in situations with limited data, by minimizing a regularized objective function based on entropic optimal transport.


Intersectional Unfairness Discovery

https://openreview.net/forum?id=FhWH9TQSMh

Compressor summary: The paper proposes a Bias-Guided Generative Network (BGGN) to discover high-bias intersectional sensitive attributes in AI systems; experiments on text and image datasets show its effectiveness, and the generated biased data reveals potential unfairness in modern generative AI systems.


CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

https://openreview.net/forum?id=Ffpg52swvg

Compressor summary: The paper introduces CRUXEval, a benchmark for evaluating code models on input and output prediction tasks, and shows that current models struggle to solve it even with CoT and fine-tuning schemes.


RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation

https://openreview.net/forum?id=FYvpxyS43U

Compressor summary: The paper proposes RoSA, a new fine-tuning method for large language models that combines low-rank and sparse approximations to improve accuracy under limited resources.


3D Geometric Shape Assembly via Efficient Point Cloud Matching

https://openreview.net/forum?id=FYQIgQWH3d

Compressor summary: PMTR is a new framework that uses an approximate high-order feature transform layer, PMT, to reliably match and assemble geometric shapes with low memory and compute costs.


To Each (Textual Sequence) Its Own: Improving Memorized-Data Unlearning in Large Language Models

https://openreview.net/forum?id=FWlNA3et6X

Compressor summary: The paper proposes a method to measure and improve unlearning in LLMs by treating each textual sequence differently based on its memorization level, addressing privacy and copyright issues.


MOMENT: A Family of Open Time-series Foundation Models

https://openreview.net/forum?id=FVvf69a5rx

Compressor summary: MOMENT is a family of open-source foundation models for time series analysis that overcomes challenges in pre-training and evaluation by using a large, diverse collection of time series data and a new benchmark.


Auditing Private Prediction

https://openreview.net/forum?id=FVmqX0sYz9

Compressor summary: The paper presents a framework to audit the privacy leakage of four private prediction algorithms by varying adversary capabilities and introduces novel techniques to measure Renyi DP.


Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning

https://openreview.net/forum?id=FV3kY9FBW6

Compressor summary: The paper proposes a method called Adaptive Advantage-guided Policy Regularization (A2PR) for offline reinforcement learning that improves policy performance by selectively using high-advantage actions from an augmented behavior policy and a VAE, while maintaining conservatism from out-of-distribution actions.


CarbonNovo: Joint Design of Protein Structure and Sequence Using a Unified Energy-based Model

https://openreview.net/forum?id=FSxTEvuFa7

Compressor summary: CarbonNovo is a unified energy-based model that generates protein structure and sequence together, improving over existing two-stage methods in various metrics.


FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

https://openreview.net/forum?id=FQQ4476dT2

Compressor summary: FightLadder is a real-time fighting game platform for competitive multi-agent research with challenging dynamics, visual inputs, and evaluation metrics.


Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

https://openreview.net/forum?id=FPnUhsQJ5B

Compressor summary: The authors improve noise sampling techniques for rectified flow models, introduce a novel transformer-based architecture for text-to-image synthesis, and achieve state-of-the-art results.


How Language Model Hallucinations Can Snowball

https://openreview.net/forum?id=FPlaQyAGHu

Compressor summary: The paper shows that some large language models can recognize their own incorrect statements after producing them, which may cause a chain reaction of errors.


Mean Field Langevin Actor-Critic: Faster Convergence and Global Optimality beyond Lazy Learning

https://openreview.net/forum?id=FOJE1kRcHG

Compressor summary: This paper studies deep reinforcement learning algorithms that use mean-field dynamics, policy gradient, and temporal-difference learning to learn features and find optimal policies, and introduces new methods for critic and actor updates.


Towards General Neural Surrogate Solvers with Specialized Neural Accelerators

https://openreview.net/forum?id=FNKnLhLuhY

Compressor summary: SNAP-DDM is a method that uses neural networks to speed up solving PDEs with arbitrary boundary conditions and geometric parameters in 2D electromagnetics and fluidic flow problems.


C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

https://openreview.net/forum?id=FMa4c5NhOe

Compressor summary: The paper proposes a framework to certify generation risks for retrieval-augmented language models (RAG) and shows they have lower generation risks than large language models (LLMs).


Ranking-based Client Imitation Selection for Efficient Federated Learning

https://openreview.net/forum?id=FMEhnS0948

Compressor summary: FedRank is a novel device selection method for federated learning that considers data and system heterogeneity and improves model accuracy, training efficiency, and energy consumption.


IIANet: An Intra- and Inter-Modality Attention Network for Audio-Visual Speech Separation

https://openreview.net/forum?id=FM61SQzF3N

Compressor summary: The proposed IIANet model leverages attention mechanisms to efficiently fuse audio-visual features for speech separation, outperforming previous methods with less computation time.


Faster Maximum Inner Product Search in High Dimensions

https://openreview.net/forum?id=FKkkdyRdsD

Compressor summary: BanditMIPS is a novel algorithm that significantly improves the complexity of finding the highest inner product vector among many vectors in high dimensions, and can be used in related problems like Matching Pursuit and Fourier analysis.
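
To make the bandit connection concrete, here is a toy sketch (ours, not the paper's algorithm; BanditMIPS itself uses confidence intervals for elimination): inner products are estimated from random coordinate subsamples and weak candidates are successively discarded, so the cost per atom can be far below the full dimension $d$.

```python
import numpy as np

def approx_mips(query, atoms, coords_per_round=32, rounds=6, seed=0):
    """Toy bandit-style MIPS: estimate each atom's inner product with the
    query from sampled coordinates, then halve the candidate set each round."""
    rng = np.random.default_rng(seed)
    d = query.shape[0]
    candidates = np.arange(atoms.shape[0])
    estimates = np.zeros(atoms.shape[0])
    for r in range(rounds):
        idx = rng.choice(d, size=min(coords_per_round, d), replace=False)
        # Unbiased estimate of <atom, query> from a coordinate subsample.
        partial = atoms[candidates][:, idx] @ query[idx] * (d / idx.size)
        estimates[candidates] = (r * estimates[candidates] + partial) / (r + 1)
        # Successive elimination: keep the top half of the candidates.
        keep = max(1, candidates.size // 2)
        candidates = candidates[np.argsort(estimates[candidates])[-keep:]]
    return candidates[-1]

rng = np.random.default_rng(1)
A, q = rng.normal(size=(1000, 512)), rng.normal(size=512)
print(approx_mips(q, A), np.argmax(A @ q))  # approximate vs. exact answer
```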


Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

https://openreview.net/forum?id=FHkavpr5Ze

Compressor summary: Memory-space visual prompting (MemVP) is a novel approach that injects visual knowledge into language models by concatenating visual prompts with the weights of the feed-forward network, leading to faster training time, lower inference latency, and improved performance on various vision-language tasks.


Jacobian Regularizer-based Neural Granger Causality

https://openreview.net/forum?id=FG5hjRBtpm

Compressor summary: The paper proposes JRNGC, a method to learn multivariate summary and full-time Granger causality with a single model, addressing limitations of existing neural Granger causality approaches.


Stochastic Interpolants with Data-Dependent Couplings

https://openreview.net/forum?id=FFILRGD0jG

Compressor summary: The paper proposes conditional generative models that use dynamical transport maps to couple base and target densities, which can be learned by a simple square loss regression problem and applied to super-resolution and in-painting tasks.


Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models

https://openreview.net/forum?id=FCtO757Onl

Compressor summary: BNS solvers are a new family of non-stationary solvers that improve sample efficiency of Diffusion and Flow models by distilling existing numerical ODE solvers with fast optimization, small parameter space, and diverse samples.


Evaluation of Trajectory Distribution Predictions with Energy Score

https://openreview.net/forum?id=FCmWhJQ14I

Compressor summary: The text discusses the limitations of "Minimum of N" metrics for evaluating trajectory prediction models in autonomous systems and suggests using Energy Score-based measures instead for better assessment.
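
For reference, the standard energy score of a predicted trajectory distribution $P$ against the realized trajectory $y$ is

$$ \mathrm{ES}(P, y) = \mathbb{E}_{X \sim P} \lVert X - y \rVert \;-\; \tfrac{1}{2}\, \mathbb{E}_{X, X' \sim P} \lVert X - X' \rVert, $$

estimated in practice from sampled trajectories. Lower is better, and unlike "Minimum of N" it is a proper scoring rule, so it penalizes predictors that scatter probability mass away from the ground truth.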


An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning

https://openreview.net/forum?id=F3x6uYILgL

Compressor summary: MCPL is a method that learns multiple image concepts from a single sentence-image pair using textual inversion and regularisation techniques, enabling synthesis of novel images with new semantically disentangled concepts.


Trust Regions for Explanations via Black-Box Probabilistic Certification

https://openreview.net/forum?id=F3RdeyiR5H

Compressor summary: The paper introduces a new problem of finding trust regions for black box explanations, which can provide insights into model behavior, stability, explanation reuse, and comparison of methods.


How Interpretable Are Interpretable Graph Neural Networks?

https://openreview.net/forum?id=F3G2udCF3Q

Compressor summary: The paper proposes Graph Multilinear neT (GMT), a new XGNN architecture that improves the interpretability and performance of interpretable subgraph learning on graph-structured data.


Entropy-Reinforced Planning with Large Language Models for Drug Discovery

https://openreview.net/forum?id=F3Ds71Xgo1

Compressor summary: ERP improves molecule generation by balancing exploration and exploitation in Transformer decoding using entropy-reinforced planning, achieving state-of-the-art results on various benchmarks.


Conformal Validity Guarantees Exist for Any Data Distribution (and How to Find Them)

https://openreview.net/forum?id=F3936hVwQa

Compressor summary: The paper proposes extending conformal prediction to handle sequential data shifts in AI/ML systems, and presents practical algorithms and evaluations for black-box optimization and active learning tasks.


Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks

https://openreview.net/forum?id=F2Tegvyqlo

Compressor summary: The paper proposes methods to improve the calibration of over-parametrized DNNs for image classification by decoupling feature extraction and classification layers, and placing a Gaussian prior on the last hidden layer outputs.


Libra: Building Decoupled Vision System on Large Language Models

https://openreview.net/forum?id=F1drhMjN7s

Compressor summary: Libra is a model that combines vision and language by routing the inputs through different modules, achieving strong performance in image-to-text tasks with less data.


The Role of Learning Algorithms in Collective Action

https://openreview.net/forum?id=Ez3Lckpe4l

Compressor summary: The text discusses how the choice of learning algorithm affects the success of collective action in machine learning, using distributionally robust optimization and stochastic gradient descent as examples.


Compute Better Spent: Replacing Dense Layers with Structured Matrices

https://openreview.net/forum?id=ExHTFXEhc9

Compressor summary: The authors explore structured matrices as efficient alternatives to dense matrices in foundation models, proposing a new matrix family (Block Tensor-Train) that outperforms dense matrices on multiple tasks with less compute.
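
To see why structured families can beat dense layers per unit of compute, here is the simplest member of that design space, a Kronecker-factored matrix (our illustration; the paper's Block Tensor-Train family generalizes this):

```python
import numpy as np

# W = kron(A, B) is a 1024 x 1024 matrix stored with only 2 * 32^2 parameters,
# and it can be applied via two small matmuls instead of one huge one.
d1, d2 = 32, 32
A = np.random.randn(d1, d1)
B = np.random.randn(d2, d2)
x = np.random.randn(d1 * d2)

y_structured = (A @ x.reshape(d1, d2) @ B.T).reshape(-1)  # O(d1*d2*(d1+d2)) work
y_dense = np.kron(A, B) @ x                               # O((d1*d2)^2) work
assert np.allclose(y_structured, y_dense)
print(A.size + B.size, (d1 * d2) ** 2)  # 2048 vs. 1048576 parameters
```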


Feedback Loops With Language Models Drive In-Context Reward Hacking

https://openreview.net/forum?id=EvHWlYTLWe

Compressor summary: The text discusses how language models can manipulate their outputs to optimize objectives and create negative side effects through feedback loops, and suggests three recommendations for evaluating this phenomenon.


Practical Hamiltonian Monte Carlo on Riemannian Manifolds via Relativity Theory

https://openreview.net/forum?id=Et8Pk97u4u

Compressor summary: The paper proposes a method for numerically stable Hamiltonian dynamics on Riemannian manifolds by generalizing the idea of upper-bounding particle speed based on position and deriving an algorithm for sampling from relativistic momentum distributions.


Diffusion Rejection Sampling

https://openreview.net/forum?id=EsWJ5wd2ir

Compressor summary: The paper proposes Diffusion Rejection Sampling, a method that improves sampling performance under well-trained diffusion models by refining samples with varying effort depending on their quality.
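
As a reminder of the base mechanism the paper builds on, here is plain textbook rejection sampling (our sketch, not the paper's sampler; the paper applies an accept/reject step along the diffusion sampling trajectory with learned density ratios):

```python
import numpy as np

def rejection_sample(target_pdf, proposal_sample, proposal_pdf, M, n, seed=0):
    """Textbook rejection sampling: draw x from the proposal and accept it
    with probability target(x) / (M * proposal(x)), where M bounds the ratio."""
    rng = np.random.default_rng(seed)
    out = []
    while len(out) < n:
        x = proposal_sample(rng)
        if rng.uniform() < target_pdf(x) / (M * proposal_pdf(x)):
            out.append(x)
    return np.array(out)

# Standard normal target from a uniform proposal on [-5, 5].
samples = rejection_sample(
    target_pdf=lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi),
    proposal_sample=lambda rng: rng.uniform(-5, 5),
    proposal_pdf=lambda x: 0.1,
    M=4.0,
    n=1000,
)
```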


A Tensor Decomposition Perspective on Second-order RNNs

https://openreview.net/forum?id=EsSSDjwFra

Compressor summary: CPRNN is a new model that combines second-order interactions and CP decomposition to improve sequence modelling over traditional RNNs while reducing parameter count.


MusicRL: Aligning Music Generation to Human Preferences

https://openreview.net/forum?id=EruV94XRDs

Compressor summary: MusicRL is a text-to-music system that uses reinforcement learning and human feedback to generate songs that match the given captions and sound good.


Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains

https://openreview.net/forum?id=ErkzxOlOLy

Compressor summary: The paper proposes a self-supervised learning method for deep networks to better learn representations from tabular data with heterogeneous features and irregular functions, using binning as a pretext task.
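
One plausible reading of the pretext task (our sketch, with assumed details): discretize each continuous feature into quantile bins and train the encoder to predict the bin indices.

```python
import numpy as np

def binning_pretext_targets(X, n_bins=10):
    """Quantile-bin each feature; the self-supervised pretext task is then
    to predict these bin indices from (possibly corrupted) inputs."""
    targets = np.empty(X.shape, dtype=np.int64)
    for j in range(X.shape[1]):
        # Interior quantile edges make the bins roughly equally populated.
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        targets[:, j] = np.digitize(X[:, j], edges)  # values in [0, n_bins)
    return targets

X = np.random.randn(1000, 8)            # toy tabular data
y_pretext = binning_pretext_targets(X)  # classification targets per feature
```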


Probabilistic Generating Circuits - Demystified

https://openreview.net/forum?id=EqFxIbGWRU

Compressor summary: The paper demystifies probabilistic generating circuits, introduced by Zhang et al. (2021) to unify probabilistic circuits and determinantal point processes, by showing how to transform them into probabilistic circuits with negative weights, allowing tractable marginalization over categorical variables.


KnowFormer: Revisiting Transformers for Knowledge Graph Reasoning

https://openreview.net/forum?id=EncFNR3hxM

Compressor summary: KnowFormer addresses the limitations of path-based knowledge graph reasoning by using transformers with efficient message-passing, defining attention computation through query-prototype and structure-aware modules, and outperforms baseline methods on transductive and inductive benchmarks.


Dual Operating Modes of In-Context Learning

https://openreview.net/forum?id=ElVHUWyL3n

Compressor summary: The text discusses a probabilistic model for in-context learning that explains its dual operating modes (task learning and task retrieval) and an "early ascent" phenomenon observed in large language models.


Differentially Private Worst-group Risk Minimization

https://openreview.net/forum?id=ElNxZ40tBJ

Compressor summary: The paper studies how to minimize worst-group risk under differential privacy using new algorithms and stability analysis, and compares different approaches.


Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data

https://openreview.net/forum?id=EhU0xBSP4l

Compressor summary: The study explores how over-parameterized ReLU CNNs can achieve near optimal accuracy in XOR-type classification tasks with label-flipping noises, under certain conditions.


Surprisingly Strong Performance Prediction with Neural Graph Features

https://openreview.net/forum?id=EhPpZV6KLk

Compressor summary: GRAF is a new method for predicting neural network performance that uses simple graph features, which are faster and more interpretable than existing methods and outperform them.


Junk DNA Hypothesis: Pruning Small Pre-Trained Weights $\textit{Irreversibly}$ and $\textit{Monotonically}$ Impairs "Difficult" Downstream Tasks in LLMs

https://openreview.net/forum?id=EfUrTeuUfy

Compressor summary: The paper argues that small-magnitude weights in pre-trained language models contain crucial information for handling difficult tasks, and pruning them leads to performance degradation, while quantization does not have the same effect.
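
For context, "pruning small-magnitude weights" refers to the standard criterion below (a generic sketch, not the paper's experimental pipeline):

```python
import torch

def magnitude_prune_(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries in place; returns the mask."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    weight.mul_(mask)
    return mask

w = torch.randn(4096, 4096)
mask = magnitude_prune_(w, sparsity=0.5)  # drop the 50% smallest weights
```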


On the Identifiability of Switching Dynamical Systems

https://openreview.net/forum?id=Eew3yUQQtE

Compressor summary: The paper investigates how to identify latent variables and non-linear mappings in Switching Dynamical Systems, a type of sequential latent variable model, using techniques from deep latent variable models and non-linear Gaussians.


Position: Graph Foundation Models Are Already Here

https://openreview.net/forum?id=Edz0QXKKAo

Compressor summary: Graph Foundation Models (GFMs), graph models trained on diverse data for various tasks and domains, struggle with positive transfer across data sources; the paper proposes a graph vocabulary, a set of basic units capturing the invariances of graphs, to enable such transfer.


Asymptotics of feature learning in two-layer networks after one gradient-step

https://openreview.net/forum?id=EdRb84fiJY

Compressor summary: The authors study how two-layer neural networks learn features from data using a spiked Random Features model, and provide an asymptotic description of their generalization error in high dimensions.


Learning from Memory: Non-Parametric Memory Augmented Self-Supervised Learning of Visual Features

https://openreview.net/forum?id=Ed4KgHoKNe

Compressor summary: The paper presents a new method to enhance self-supervised learning stability by using a non-parametric memory for concept comparison during training.


Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

https://openreview.net/forum?id=EaJ7nqJ2Fa

Compressor summary: The authors challenge the no free lunch theorems and suggest that neural networks prefer low-complexity data and can handle diverse problems with a single model.


3D-VLA: A 3D Vision-Language-Action Generative World Model

https://openreview.net/forum?id=EZcFK8HupF

Compressor summary: The paper introduces 3D-VLA, a new model that connects 3D perception, reasoning, and action using a generative world model, improving reasoning and multimodality generation for embodied AI tasks.


Reducing sequential change detection to sequential estimation

https://openreview.net/forum?id=EZLsxOgcDg

Compressor summary: The paper presents a simple and effective scheme for sequential change detection in data streams, based on confidence sequences, that achieves small detection delay and low false alarm rate, handles dependent observations and nonparametric distributions, and is related to other frameworks in the literature.


Position: Video as the New Language for Real-World Decision Making

https://openreview.net/forum?id=EZH4CsKV6O

Compressor summary: The text discusses the potential of video generation as a versatile and powerful tool for solving real-world tasks, similar to how language models have been successfully applied.


HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

https://openreview.net/forum?id=EYvEVbfoDp

Compressor summary: HALC is a decoding algorithm that reduces object hallucinations in large vision-language models by using a local grounding mechanism and a global beam search.


Exploring the Enigma of Neural Dynamics Through A Scattering-Transform Mixer Landscape for Riemannian Manifold

https://openreview.net/forum?id=EYOo48YGhy

Compressor summary: The authors propose a physics-informed deep learning model to analyze how brain structure and function are interconnected through data geometry and neural activities are driven by brain-wide oscillation waves.


Cell2Sentence: Teaching Large Language Models the Language of Biology

https://openreview.net/forum?id=EWt5wsEdvc

Compressor summary: Cell2Sentence (C2S) is a method that converts gene expression data into "cell sentences" to help language models understand single-cell biology and perform various tasks in the field.


Light and Optimal Schrödinger Bridge Matching

https://openreview.net/forum?id=EWJn6hfZ4J

Compressor summary: The paper proposes optimal Schrödinger bridge matching, a novel procedure for learning Schrödinger bridges that recovers the process efficiently with minimal error and relates to energy-based modeling objectives.


A Geometric Explanation of the Likelihood OOD Detection Paradox

https://openreview.net/forum?id=EVMzCKLpdD

Compressor summary: We propose a method for detecting out-of-distribution data in deep generative models by combining likelihoods and local intrinsic dimension estimates from a pre-trained model, achieving state-of-the-art results.


Observable Propagation: Uncovering Feature Vectors in Transformers

https://openreview.net/forum?id=ETNx4SekbY

Compressor summary: ObProp is a novel method to find linear features of transformer language models using almost no data, enabling better understanding of their mechanisms and biases.


Federated Neuro-Symbolic Learning

https://openreview.net/forum?id=EQXZqBXeW9

Compressor summary: FedNSL is a new framework that uses neural networks to learn complex symbolic rules from distributed data, improving downstream task performance across domains.


SuDA: Support-based Domain Adaptation for Sim2Real Hinge Joint Tracking with Flexible Sensors

https://openreview.net/forum?id=ENNGAY5uKC

Compressor summary: Flexible sensors can capture human motion without constraints or privacy issues, but existing methods require large labeled datasets and costly MoCap studios; the paper proposes SuDA, a Sim2Real solution that adapts flexible-sensor data from simulation to the real world without labels and outperforms existing distribution-based domain adaptation methods.


Topological Neural Networks go Persistent, Equivariant, and Continuous

https://openreview.net/forum?id=ELFZWG9C7l

Compressor summary: TopNets combine topological neural networks and persistent homology to create a unified framework that improves the representation and expressivity of graph neural networks for various tasks.


FairProof : Confidential and Certifiable Fairness for Neural Networks

https://openreview.net/forum?id=EKye56rLuv

Compressor summary: FairProof is a system that uses cryptography to verify the fairness of machine learning models without revealing their inner workings or data.


Accelerated Algorithms for Constrained Nonconvex-Nonconcave Min-Max Optimization and Comonotone Inclusion

https://openreview.net/forum?id=EK7fuAMNoI

Compressor summary: The paper extends two optimization algorithms, EAG and FEG, to constrained comonotone min-max problems and proves their optimal convergence rates.


Position: Categorical Deep Learning is an Algebraic Theory of All Architectures

https://openreview.net/forum?id=EIcxV7T0Sy

Compressor summary: The text proposes using category theory to create a unified framework for specifying and implementing deep learning architectures, recovering geometric constraints and encompassing various neural network designs.


MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

https://openreview.net/forum?id=EIGbXbxcUQ

Compressor summary: The paper proposes MobileLLM, a sub-billion parameter language model with efficient architecture and weight-sharing, achieving high accuracy and performance on various tasks.


Near-Linear Time Approximation Algorithms for k-means with Outliers

https://openreview.net/forum?id=EHjm3sXPFy

Compressor summary: The paper proposes fast sampling-based algorithms for clustering with outliers, achieving almost linear running time and outperforming previous methods.


Defense against Model Extraction Attack by Bayesian Active Watermarking

https://openreview.net/forum?id=EFtNP211X3

Compressor summary: The paper proposes an active watermarking technique for defending against model extraction, which fine-tunes the victim model to detect and prevent theft while preserving utility and efficiency.


Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention

https://openreview.net/forum?id=EEinDTdKr1

Compressor summary: The paper analyzes the local sensitivity of dot-product self-attention in vision transformers, showing how it affects their robustness to input perturbations and presenting a new tool (LoFAST) to measure it.


Modeling Language Tokens as Functionals of Semantic Fields

https://openreview.net/forum?id=EEO4Iktfjp

Compressor summary: The paper introduces $\textit{LasF}$, a module that simulates neuronal behaviors for better language modeling by replacing the final layers of pre-trained language models, achieving improved accuracy with fewer parameters.


Bipartite Matching in Massive Graphs: A Tight Analysis of EDCS

https://openreview.net/forum?id=EDEISRmi6X

Compressor summary: The paper proposes a new analysis for edge-degree constrained subgraph, a sparsifier for maximum matching problems, and shows that the best value of its parameter beta is 6, achieving an approximation ratio of 0.677.


Weakly Convex Regularisers for Inverse Problems: Convergence of Critical Points and Primal-Dual Optimisation

https://openreview.net/forum?id=E8FpcUyPuS

Compressor summary: The paper proposes a generalised framework for convergent regularisation using weakly convex neural networks, and demonstrates its advantages in improving CT reconstruction with learned adversarial regularisers.


Contextual Feature Selection with Conditional Stochastic Gates

https://openreview.net/forum?id=E6Nm3x7acv

Compressor summary: c-STG is a contextual feature selection method that uses a hypernetwork to map context variables to feature-selection parameters, offering theoretical advantages, strong empirical results, and improved interpretability.


From Fourier to Neural ODEs: Flow Matching for Modeling Complex Systems

https://openreview.net/forum?id=E4qjDAdVte

Compressor summary: Fourier NODEs (FNODEs) form a simulation-free framework that uses Fourier analysis to estimate temporal and spatial gradients from noisy data and train neural ordinary differential equations more accurately, efficiently, and robustly than existing methods.


Run-Time Task Composition with Safety Semantics

https://openreview.net/forum?id=E4ItiEU8Iu

Compressor summary: The paper proposes two safety semantics for compositional Reinforcement Learning, enables enforcing them, analyzes their trade-offs, and extends Boolean composition to continuous action spaces.


Recovering Labels from Local Updates in Federated Learning

https://openreview.net/forum?id=E41gvBG4s6

Compressor summary: RLU is a novel label recovery scheme for federated learning that works well even in real-world settings with diverse data and optimizers.
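
Background on why labels leak from local updates at all (the classic single-sample observation behind attacks like iDLG, not RLU itself): with softmax cross-entropy, the gradient with respect to the logits is $p - y$, so its single negative coordinate reveals the true label.

```python
import numpy as np

def leak_label_from_logit_grad(logits, label):
    """d(loss)/d(logits) = softmax(logits) - onehot(label): only the true
    class receives a negative gradient, which betrays the label."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad = p - np.eye(logits.size)[label]
    return int(np.argmin(grad))

logits = np.array([1.3, -0.2, 0.7, 2.1])
assert leak_label_from_logit_grad(logits, label=2) == 2
```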


Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination

https://openreview.net/forum?id=E3V5MMwFgd

Compressor summary: The paper proposes robust and efficient estimators for Gaussian sparse estimation tasks, such as mean estimation, PCA, and linear regression, with improved error guarantees compared to prior algorithms.


Position: Towards Unified Alignment Between Agents, Humans, and Environment

https://openreview.net/forum?id=DzLna0cFL1

Compressor summary: The paper introduces UA2, a framework for improving autonomous agents by aligning them with human intentions, environmental dynamics, and self-constraints in realistic scenarios.


Benign Overfitting in Adversarial Training of Neural Networks

https://openreview.net/forum?id=DyvhD8J3Wl

Compressor summary: The paper studies how interpolating neural networks trained using adversarial methods can still generalize well even when facing inference-time attacks, under certain distributional assumptions.


State-Free Inference of State-Space Models: The *Transfer Function* Approach

https://openreview.net/forum?id=DwwI9L67B5

Compressor summary: The paper presents a new way to design deep learning models using a frequency-domain transfer function that enables fast and memory-efficient inference without states.


Incorporating Information into Shapley Values: Reweighting via a Maximum Entropy Approach

https://openreview.net/forum?id=DwniHlwcOB

Compressor summary: The authors propose entropy-based variations of Shapley values that balance prior knowledge and simplicity in causal inference.
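
For reference, the classical Shapley value fixes the coalition weights as

$$ \phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\big(n - |S| - 1\big)!}{n!} \Big[ v\big(S \cup \{i\}\big) - v(S) \Big]; $$

as we read the summary, the proposed variants replace these fixed weights with a maximum-entropy reweighting that stays as uniform as possible while respecting prior knowledge.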


Faster Adaptive Decentralized Learning Algorithms

https://openreview.net/forum?id=Dwc0RwiNI5

Compressor summary: The paper proposes new fast and adaptive decentralized learning methods for distributed nonconvex optimization tasks with improved sample complexity and provides convergence analysis.


Dynamic Evaluation of Large Language Models by Meta Probing Agents

https://openreview.net/forum?id=DwTgy1hXXo

Compressor summary: Meta probing agents (MPA) evaluate large language models by transforming an original problem into a new one using psychometric theory, allowing multifaceted analysis of the models' abilities and improving them.


Stability and Generalization for Stochastic Recursive Momentum-based Algorithms for (Strongly-)Convex One to $K$-Level Stochastic Optimizations

https://openreview.net/forum?id=DsVzHj7jcA

Compressor summary: This paper analyzes the generalization performance of three STORM-based algorithms for different levels of stochastic optimization and provides stability results and excess risk bounds.


Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance

https://openreview.net/forum?id=DrE7jVF4VW

Compressor summary: The paper proposes methods to improve recent diffusion models by optimizing posterior covariance using maximum likelihood estimation, leading to better image denoising without retraining.


Adaptive Group Personalization for Federated Mutual Transfer Learning

https://openreview.net/forum?id=DqC9XiI71U

Compressor summary: AdaGrP is a new method for mutual transfer learning that adapts to concept drifts and has theoretical guarantees of learnability recovery without hyper-parameter tuning.


Position: Why We Must Rethink Empirical Research in Machine Learning

https://openreview.net/forum?id=DprrMz24tk

Compressor summary: The authors warn about non-replicable and unreliable results in machine learning research and suggest that it should be more exploratory instead of confirmatory.


How Graph Neural Networks Learn: Lessons from Training Dynamics

https://openreview.net/forum?id=Dn4B53IcCW

Compressor summary: The paper studies how gradient descent helps graph neural networks (GNNs) learn functions using graph structure, leading to a better understanding of their behavior and generalization.


Model-based Reinforcement Learning for Confounded POMDPs

https://openreview.net/forum?id=DlR8fWgJRl

Compressor summary: The paper presents a model-based offline RL algorithm for confounded POMDPs with general function approximations, combining a novel identification result for learning action effects on rewards and transitions, a nonparametric two-stage estimation procedure for off-policy evaluation, and conservative policy optimization within confidence regions, and it proves a finite-sample upper bound on the suboptimality of the learned policy.


Failures Are Fated, But Can Be Faded: Characterizing and Mitigating Unwanted Behaviors in Large-Scale Vision and Language Models

https://openreview.net/forum?id=DkqiId4AuR

Compressor summary: The paper proposes a deep reinforcement learning method to explore and fix failure modes in pre-trained neural networks for various tasks.


Low-Rank Bandits via Tight Two-to-Infinity Singular Subspace Recovery

https://openreview.net/forum?id=Dk0RBrqiyk

Compressor summary: The paper studies contextual bandits with low-rank reward matrices and proposes efficient algorithms that estimate the reward subspaces and achieve near minimax optimality and low sample complexity in policy evaluation, best policy identification, and regret minimization.


GenCO: Generating Diverse Designs with Combinatorial Constraints

https://openreview.net/forum?id=DiyE6OOGBa

Compressor summary: GenCO is a framework that allows deep generative models to create objects with hard constraints by using differentiable solvers and focusing on data distribution matching.


Collective Certified Robustness against Graph Injection Attacks

https://openreview.net/forum?id=DhxZVq1ZOo

Compressor summary: The paper proposes a collective certificate scheme for GNNs that certifies the robustness of a set of nodes simultaneously against graph injection attacks, improving the certification performance with efficient linear programming solutions.


Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

https://openreview.net/forum?id=DgLFkAPwuZ

Compressor summary: The paper introduces RPG, a text-to-image generation and editing framework that uses multimodal LLMs for chain-of-thought reasoning, regional diffusion, and closed-loop text-guided image editing to generate complex images with better compositionality and semantic alignment.


BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization

https://openreview.net/forum?id=DbyHDYslM7

Compressor summary: The paper introduces Bi-Exponent Block Floating Point (BiE), a novel numerical representation for LLMs that improves accuracy, reduces overhead, and is more expressive than low-bit data formats.


Evolving Subnetwork Training for Large Language Models

https://openreview.net/forum?id=DbMm8pmoAP

Compressor summary: The paper proposes Evolving Subnetwork Training (EST), which samples subnetworks from the layers and modules of large language models during training, saving FLOPs and improving downstream performance without increasing loss.


Optimal Kernel Choice for Score Function-based Causal Discovery

https://openreview.net/forum?id=DYd4vyyhUu

Compressor summary: The paper proposes a method to automatically select the optimal kernel for causal discovery using a mixture model of independent noise variables.


Graph Distillation with Eigenbasis Matching

https://openreview.net/forum?id=DYN66IJCI9

Compressor summary: GDEM is a graph distillation method that matches the eigenbasis and node features of real and synthetic graphs, preventing spectrum bias and improving cross-architecture generalization.


Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

https://openreview.net/forum?id=DYMj03Gbri

Compressor summary: Infectious jailbreak exploits large language models in multi-agent environments by compromising one agent and causing exponential infection of others without further adversary intervention.


Localizing Task Information for Improved Model Merging and Compression

https://openreview.net/forum?id=DWT9uiGjxT

Compressor summary: The authors propose TALL-masks and Consensus Merging methods to improve multi-task model merging by compressing individual checkpoints and eliminating interference weights, achieving better performance and storage reduction.


Sparsest Models Elude Pruning: An Exposé of Pruning’s Current Capabilities

https://openreview.net/forum?id=DRGgT7SyC7

Compressor summary: The authors conducted many experiments to test pruning algorithms on a synthetic dataset but found that current methods do not achieve the sparsest models, possibly due to overparameterization and other issues.


Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum

https://openreview.net/forum?id=DRBgNQ2N7U

Compressor summary: The authors derive new bounds for kernel matrix condition number and use them to analyze the generalization behavior of kernel ridge regressors with different spectral decay rates.


Enhancing Sufficient Dimension Reduction via Hellinger Correlation

https://openreview.net/forum?id=DN7uk4gQ7C

Compressor summary: The authors propose a new method for dimension reduction in single-index models based on Hellinger correlation, which improves upon existing techniques by better capturing data dependencies.


Optimal Batched Linear Bandits

https://openreview.net/forum?id=DM0r4qatjT

Compressor summary: E$^4$ is a novel algorithm for linear bandits that achieves finite-time and asymptotic optimality in regret and batch complexity, and performs well on hard instances in experiments.


Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

https://openreview.net/forum?id=DLTjFFiuUJ

Compressor summary: This paper investigates attention sinks in large language models and proposes a training-free technique (ACT) that optimizes attention distributions during inference, improving accuracy on various tasks.


All-in-one simulation-based inference

https://openreview.net/forum?id=DL79HYCFFq

Compressor summary: The Simformer is a novel amortized inference method that uses a probabilistic diffusion model with transformer architectures to overcome the limitations of current approaches in flexibility, performance, and applicability to various domains such as ecology, epidemiology, and neuroscience.


Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds

https://openreview.net/forum?id=DKOHE4n8jk

Compressor summary: PIMS is an acquisition function in Bayesian optimization that achieves a tighter theoretical regret bound than GP-UCB and TS, while avoiding their practical issues.


Evaluating Quantized Large Language Models

https://openreview.net/forum?id=DKKg5EFAFr

Compressor summary: This paper evaluates the impact of post-training quantization on various language models and tasks, and provides recommendations for applying quantization techniques.


Deep Demonstration Tracing: Learning Generalizable Imitator Policy for Runtime Imitation from a Single Demonstration

https://openreview.net/forum?id=DJdVzxemdA

Compressor summary: DDT is a new algorithm that uses adaptive state tracing and meta-reinforcement learning to improve generalization of imitator agents in dynamic environments.


Deconstructing the Goldilocks Zone of Neural Network Initialization

https://openreview.net/forum?id=DJXt63RLO1

Compressor summary: This paper analyzes the "Goldilocks zone" for deep learning optimization, linking excess positive curvature of the loss to initialization norm, confidence, and a new type of vanishing gradient.


Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss

https://openreview.net/forum?id=DHtF8Y6PqS

Compressor summary: This paper investigates learning with dependent data and square loss in a hypothesis class with tail decay in Orlicz space, and shows that under certain conditions, the empirical risk minimizer achieves a rate that depends only on the complexity of the class and second order statistics, without relying on mixing time.


Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin Dynamics

https://openreview.net/forum?id=DCmahCZJYb

Compressor summary: Reflected reSGLD is a new algorithm that uses reflection steps to overcome stagnation issues in non-convex learning by constraining the exploration within a bounded domain, leading to better mixing rates and simulation efficiency.


Imitation Learning in Discounted Linear MDPs without exploration assumptions

https://openreview.net/forum?id=DChQpB4AJy

Compressor summary: The paper introduces ILARL, an algorithm for imitation learning in infinite horizon linear MDPs, which reduces the number of trajectories needed and improves accuracy by removing exploration assumptions and leveraging connections to online learning with adversarial losses.


TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks

https://openreview.net/forum?id=DCNCwaMJjI

Compressor summary: TROVE is a method that generates high-level functions for code LMs to solve various tasks more efficiently and accurately with smaller toolboxes and faster human verification.


StyDeSty: Min-Max Stylization and Destylization for Single Domain Generalization

https://openreview.net/forum?id=DBlkjCDg2i

Compressor summary: The paper proposes a method called StyDeSty that uses stylization and destylization modules to improve single domain generalization, achieving better results than existing approaches on multiple benchmarks.


High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise

https://openreview.net/forum?id=DBI6AuCD4a

Compressor summary: The paper proposes new stochastic optimization methods based on clipping stochastic gradient differences and proves tight high-probability convergence results for composite and distributed problems, addressing limitations of existing approaches.


Mollification Effects of Policy Gradient Methods

https://openreview.net/forum?id=DA2AiCiCaM

Compressor summary: The text discusses how policy gradient methods smooth out complex optimization landscapes in deep reinforcement learning, but can introduce challenges due to stochasticity and exploration.


Adversarially Robust Deep Multi-View Clustering: A Novel Attack and Defense Framework

https://openreview.net/forum?id=D9EfAkQCzh

Compressor summary: This paper investigates adversarial attacks on deep multi-view clustering using GANs, and proposes a novel robust method to defend against them.


Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling

https://openreview.net/forum?id=D8zn1DnTuj

Compressor summary: Reprompting is an algorithm that learns how to generate Chain-of-Thought recipes for various reasoning tasks by iteratively sampling new recipes from previous ones, achieving superior results compared to human-written and existing methods.


Improved Dimensionality Dependence for Zeroth-Order Optimisation over Cross-Polytopes

https://openreview.net/forum?id=D7wi9LIE6i

Compressor summary: The paper presents an algorithm that improves gradient-free optimization over cross-polytopes for applications like adversarial attacks, explainable AI and sparse regression, by reducing the dimensionality dependence.


Learning-Efficient Yet Generalizable Collaborative Filtering for Item Recommendation

https://openreview.net/forum?id=D5IRvFF1lN

Compressor summary: The paper introduces a new squared loss function for item recommendation that connects better to ranking objectives and shows improved performance in experiments.


Generalized Neural Collapse for a Large Number of Classes

https://openreview.net/forum?id=D4B7kkB89m

Compressor summary: The paper studies how neural collapse affects deep models when the number of classes is much larger than the feature space dimension, and shows its occurrence in practice and theory.


A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models

https://openreview.net/forum?id=D32aTei4p5

Compressor summary: The paper analyzes FQE method for policy value estimation using offline data, providing theoretical insights on optimal convergence rates, error bounds, and the role of probability ratio functions.


Outlier-robust Kalman Filtering through Generalised Bayes

https://openreview.net/forum?id=D2MNVeVh5J

Compressor summary: The paper presents a novel Bayesian update rule for online filtering that is robust, efficient, and works well with nonlinear models.


Information Complexity of Stochastic Convex Optimization: Applications to Generalization, Memorization, and Tracing

https://openreview.net/forum?id=CyEJn71Z00

Compressor summary: This paper studies how memorization affects learning in stochastic convex optimization, using conditional mutual information to measure it, and shows its importance for generalization and privacy.


Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

https://openreview.net/forum?id=Cw6Xl0g8a5

Compressor summary: The paper proposes new criteria for explaining vision transformers (ViTs) and introduces a variational Bayesian method, PACE, that provides faithful, stable, sparse, multi-level, and parsimonious explanations of ViT predictions by modeling patch embeddings.


Standardized Interpretable Fairness Measures for Continuous Risk Scores

https://openreview.net/forum?id=CvRu2inbGV

Compressor summary: The paper introduces a new fairness measure based on Wasserstein distance for continuous scores, which is easier to compute and interpret than existing methods, and shows its advantages over ROC-based measures.


Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control

https://openreview.net/forum?id=CuiRGtVI55

Compressor summary: CoIn is a module that adds convolutions to ViTs for better adaptation in visuo-motor control tasks by introducing locality and equivariance biases.


Unlock the Cognitive Generalization of Deep Reinforcement Learning via Granular Ball Representation

https://openreview.net/forum?id=CtyLla0DU8

Compressor summary: The paper proposes a framework to improve cognitive generalization in DRL by building a latent space in a simple scenario, segmenting it based on environmental influences, and using it to fine-tune policies in complex scenarios without designing new rewards.


Floating Anchor Diffusion Model for Multi-motif Scaffolding

https://openreview.net/forum?id=CtgJUQxmEo

Compressor summary: FADiff is a novel method for designing proteins with multiple functions and floating motifs, which does not require prior knowledge of their positions.


How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model

https://openreview.net/forum?id=CtEWswTjUd

Compressor summary: This paper proposes a sparse generative model that explains why deep networks learn abstract representations and become insensitive to task invariances, leading to better performance.


Reflective Policy Optimization

https://openreview.net/forum?id=Cs0Xy6WETl

Compressor summary: Reflective Policy Optimization (RPO) improves on-policy reinforcement learning methods by using past and future state-action information for policy optimization, leading to faster convergence and better sample efficiency.


Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs

https://openreview.net/forum?id=CrUmgUaAQp

Compressor summary: The paper explores how multi-agent debate can improve the accuracy of large language models, but finds that other strategies may be more effective depending on hyperparameters and agreement levels.


Meta Evidential Transformer for Few-Shot Open-Set Recognition

https://openreview.net/forum?id=CquFGSIU6w

Compressor summary: MET is a novel FSOSR model that uses an evidential open-set loss, cross-attention mechanism, and evidence-to-variance ratio to improve detection of instances from unseen classes while maintaining closed-set performance.


Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD

https://openreview.net/forum?id=CpgKRKBUTl

Compressor summary: The study proposes a simple modification for SGD that makes the outputs of neural networks provably compressible without requiring any nontrivial assumptions by injecting heavy-tailed noise to the iterates.
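
A minimal sketch of the "inject heavy-tailed noise into the iterates" idea (our illustration with assumed details; Student-t with small degrees of freedom stands in for an alpha-stable distribution):

```python
import numpy as np

def sgd_heavy_tailed(grad_fn, theta, lr=1e-2, sigma=1e-3, df=1.8,
                     steps=1000, seed=0):
    """Plain SGD plus injected heavy-tailed noise on every iterate."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        noise = rng.standard_t(df, size=theta.shape)  # heavy tails for small df
        theta = theta - lr * grad_fn(theta) + sigma * noise
    return theta

# Toy quadratic: minimize ||theta||^2 / 2, whose gradient is theta itself.
theta_final = sgd_heavy_tailed(lambda t: t, theta=np.ones(10))
```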


Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought

https://openreview.net/forum?id=CpcaL75UgY

Compressor summary: The paper proposes a method to improve complex reasoning in large language models by using pairwise-comparison evaluation instead of noisy point-wise scoring from the LLM, with ensembles and dueling bandits to reduce noise.


Beyond Point Prediction: Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process

https://openreview.net/forum?id=CpI37NA7MO

Compressor summary: SMASH is a new method to learn from spatio-temporal point processes, which can model and predict events with temporal and spatial features more accurately and with uncertainty estimates.


Implicit Bias of AdamW: $\ell_\infty$-Norm Constrained Optimization

https://openreview.net/forum?id=CmXkdlO6JJ

Compressor summary: AdamW outperforms Adam with $\ell_2$ regularization in language modeling tasks due to its implicit constrained optimization that ensures bounded $\ell_\infty$ norm of parameters.
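
Written out, AdamW's decoupled weight decay updates

$$ \theta_{t+1} = (1 - \eta \lambda)\,\theta_t \;-\; \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}, $$

and the paper's claim is that these iterates implicitly track the constrained problem below, with the constraint radius set by the weight-decay strength (on the order of $1/\lambda$, as we recall the statement):

$$ \min_{\theta} f(\theta) \quad \text{s.t.} \quad \lVert \theta \rVert_{\infty} \le \tfrac{1}{\lambda}. $$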


How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis

https://openreview.net/forum?id=CmOmaxkt8p

Compressor summary: The paper presents NegotiationArena, a framework for evaluating how well large language models can negotiate with each other in various scenarios.


Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

https://openreview.net/forum?id=ClWdplZ12B

Compressor summary: The Multi-level Actor-Critic (MAC) framework uses a Multi-level Monte-Carlo (MLMC) gradient estimator to reduce the need for oracle knowledge of the mixing time, improving global convergence and performance in average-reward reinforcement learning.


Accelerating Parallel Sampling of Diffusion Models

https://openreview.net/forum?id=CjVWen8aJL

Compressor summary: This paper proposes ParaTAA, a parallel algorithm that speeds up diffusion model sampling by solving triangular nonlinear equations with systematic techniques.


Statistical Inference Under Constrained Selection Bias

https://openreview.net/forum?id=CiZN2OATRp

Compressor summary: The paper proposes a framework to provide high-probability bounds on estimand values for target distributions using domain knowledge to partially identify them and account for selection bias in large-scale datasets.


IOI: Invisible One-Iteration Adversarial Attack on No-Reference Image- and Video-Quality Metrics

https://openreview.net/forum?id=Chy4rSqy4Y

Compressor summary: The paper presents a fast and effective adversarial attack on no-reference image and video quality metrics, called Invisible One-Iteration (IOI), which can be used to test the robustness of learning-based metrics under video attacks.


Provable Interactive Learning with Hindsight Instruction Feedback

https://openreview.net/forum?id=CgO2cuWWLV

Compressor summary: The paper studies how an agent can learn from instructions that are suitable for its actions, and introduces a no-regret algorithm that depends only on the intrinsic rank of the instruction-response distribution.


Interpreting and Improving Large Language Models in Arithmetic Calculation

https://openreview.net/forum?id=CfOtiepP8s

Compressor summary: This paper investigates how large language models use a small fraction of attention heads for arithmetic calculations and explores fine-tuning these heads to improve their computational performance.


Reference Neural Operators: Learning the Smooth Dependence of Solutions of PDEs on Geometric Deformations

https://openreview.net/forum?id=CecY6XiUfu

Compressor summary: Reference neural operators (RNO) are a novel data-efficient method for learning the smooth dependence of solutions on geometric deformations, which can significantly reduce errors in partial differential equation simulations with arbitrary geometries.


Learning Iterative Reasoning through Energy Diffusion

https://openreview.net/forum?id=CduFAALvGe

Compressor summary: IRED is a new framework that learns to reason using energy-based optimization and adapts to problem difficulty, achieving better performance in various tasks.


Improving Robustness to Multiple Spurious Correlations by Multi-Objective Optimization

https://openreview.net/forum?id=CbbTF6tDhW

Compressor summary: The paper proposes a novel method to train unbiased and accurate models by grouping data into different shortcuts and optimizing their losses dynamically, achieving minimax Pareto solution.


OAK: Enriching Document Representations using Auxiliary Knowledge for Extreme Classification

https://openreview.net/forum?id=Cbacx90Wkt

Compressor summary: The paper proposes OAK, a framework that uses auxiliary data to improve extreme classification accuracy by enriching document embeddings and selecting precise labels.


Online Isolation Forest

https://openreview.net/forum?id=CbIZatwz9z

Compressor summary: Online-iForest is an efficient online anomaly detection method that adapts to evolving data and outperforms existing solutions in real-world applications.


Bayesian Program Learning by Decompiling Amortized Knowledge

https://openreview.net/forum?id=CbIRQgAYE4

Compressor summary: DreamCoder improves program synthesis by learning to simplify search, compress solutions, and extract relevant components using a neural search policy.


Convergence of Some Convex Message Passing Algorithms to a Fixed Point

https://openreview.net/forum?id=CaxQ5IbHgF

Compressor summary: The paper investigates the convergence properties of methods that minimize upper bounds on MAP inference problems using dual linear programming or Lagrangian relaxation by coordinate descent, and proves that they converge to a fixed point with a specific convergence rate.


Reservoir Computing for Short High-Dimensional Time Series: an Application to SARS-CoV-2 Hospitalization Forecast

https://openreview.net/forum?id=CY0lFwD4qx

Compressor summary: The paper proposes a new method using Reservoir Computing and Genetic Algorithm to forecast SARS-CoV-2 hospitalizations more accurately than existing approaches.


CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks

https://openreview.net/forum?id=CXZqGJonmt

Compressor summary: CosPGD is a robustness-evaluating adversarial attack that uses an alignment score to scale the loss, improving efficiency and effect balance for semantic segmentation and regression models.


Improving Generalization in Offline Reinforcement Learning via Adversarial Data Splitting

https://openreview.net/forum?id=CV9PiQGt0i

Compressor summary: The paper proposes an adversarial data splitting framework to improve offline reinforcement learning's generalization by adaptively extracting knowledge from empirical data and simulating distribution shifts.


Active Preference Learning for Large Language Models

https://openreview.net/forum?id=CTgEV6qgUy

Compressor summary: The paper proposes an active learning strategy for Direct Preference Optimization, which uses a practical acquisition function to improve the rate and performance of fine-tuning large language models with human or AI preferences.


Understanding Unimodal Bias in Multimodal Deep Linear Networks

https://openreview.net/forum?id=CTEMHDSwIj

Compressor summary: The paper studies how the depth of fusion in multimodal neural networks affects unimodal bias, which can lead to poor generalization and permanent reliance on one modality during training.


CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers

https://openreview.net/forum?id=CSIfCpXhCF

Compressor summary: The paper presents CrossGET, a framework that adaptively combines tokens in real-time during inference to improve efficiency and performance of vision-language Transformers on various tasks.


Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

https://openreview.net/forum?id=CR6Sl80cn8

Compressor summary: The paper presents a new black-box adversarial attack method that uses a surrogate white-box model as a global function prior, improving query efficiency and success rate.


Quantum Algorithms and Lower Bounds for Finite-Sum Optimization

https://openreview.net/forum?id=CQI3f1U9X1

Compressor summary: The paper proposes a quantum algorithm to solve finite-sum optimization problems with smooth and strongly convex functions, achieving better complexity than the classical bound and providing lower and upper bounds for generalizations.


Interacting Diffusion Processes for Event Sequence Forecasting

https://openreview.net/forum?id=CQH63IbI5o

Compressor summary: The novel approach combines neural temporal point processes with diffusion generative models to improve long-horizon forecasting of irregular event sequences by learning joint distributions of types and inter-arrival times.


Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

https://openreview.net/forum?id=CNicRIVIPA

Compressor summary: Score entropy is a new loss function that improves discrete diffusion models for natural language tasks, outperforming existing language models such as GPT-2 on several benchmarks.


An Independence-promoting Loss for Music Generation with Language Models

https://openreview.net/forum?id=CLJZI5kDhX

Compressor summary: The text introduces a new loss function for auto-encoders that encourages independence between codebooks in music generation using language models, improving audio quality and speed.


Replicable Learning of Large-Margin Halfspaces

https://openreview.net/forum?id=CKCzfU9YKE

Compressor summary: The paper presents efficient, dimension-independent replicable algorithms for learning large-margin halfspaces with improved sample complexity and accuracy compared to existing methods.


Position: On the Possibilities of AI-Generated Text Detection

https://openreview.net/forum?id=CJbhtpcyGL

Compressor summary: The study shows how to reliably detect machine-generated text from human-written text using information theory and empirical tests with various datasets and text generators.


Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer

https://openreview.net/forum?id=CHz7WshPcp

Compressor summary: Deep LTMLE is a novel transformer-based method that estimates counterfactual outcomes under dynamic treatment policies in longitudinal studies, correcting for bias and providing confidence intervals using TMLE framework and asymptotic theory.


TSLANet: Rethinking Transformers for Time Series Representation Learning

https://openreview.net/forum?id=CGR3vpX63X

Compressor summary: TSLANet is a universal convolutional model for time series tasks that captures long-term and short-term interactions, enhances feature representation, and adapts to noise levels and data sizes.


Self-cognitive Denoising in the Presence of Multiple Noisy Label Sources

https://openreview.net/forum?id=CG44RLeXt1

Compressor summary: The paper proposes SDM, which exploits neural networks' self-cognition ability to distinguish noise within and among multiple label sources and denoise during training, with a selective distillation module that improves efficiency.


Purifying Quantization-conditioned Backdoors via Layer-wise Activation Correction with Distribution Approximation

https://openreview.net/forum?id=CEfr3h68KU

Compressor summary: The paper analyzes malicious backdoors in quantized models and proposes a method to purify them by aligning activation distributions.


Accelerating Iterative Retrieval-augmented Language Model Serving with Speculation

https://openreview.net/forum?id=CDnv4vg02f

Compressor summary: RaLMSpec improves the speed and performance of iterative language model serving with speculative retrieval, batched verification, and several optimizations.


Pedestrian Attribute Recognition as Label-balanced Multi-label Learning

https://openreview.net/forum?id=CD2xl1L5es

Compressor summary: The paper proposes a framework to balance labels and semantics in pedestrian attribute datasets and improve model accuracy with minimal computational cost.


Fast Peer Adaptation with Context-aware Exploration

https://openreview.net/forum?id=CBcNl5Eo32

Compressor summary: The paper proposes a reward system for learning agents to identify their peers' strategies in multi-agent games, enabling faster adaptation and better outcomes.


Factored-Reward Bandits with Intermediate Observations

https://openreview.net/forum?id=C7Z8EhZ6bl

Compressor summary: The paper introduces Factored-Reward Bandits, a setting for sequential decision problems with structured intermediate effects, and proposes two algorithms to minimize regret in this setting.


Scalable Online Exploration via Coverability

https://openreview.net/forum?id=C64clssMVU

Compressor summary: The text proposes L1-Coverage as a new exploration objective for reinforcement learning that balances intrinsic complexity, efficient planning, and efficient exploration in high-dimensional domains.


Parameter-Efficient Fine-Tuning with Controls

https://openreview.net/forum?id=C4nalr0DoE

Compressor summary: The paper proposes a new perspective on Low-Rank Adaptation (LoRA) as a control process and improves its performance using parameter-free attention mechanisms without increasing parameters.


Learning Latent Dynamic Robust Representations for World Models

https://openreview.net/forum?id=C4jkx6AgWc

Compressor summary: The paper proposes a method to improve visual MBRL agents by using spatio-temporal masking, bisimulation, latent reconstruction, and a Hybrid Recurrent State-Space Model (HRSSM) to handle noisy inputs and learn robust world models for better control tasks.


AlphaZero-Like Tree-Search can Guide Large Language Model Decoding and Training

https://openreview.net/forum?id=C4OpREezgj

Compressor summary: The authors propose TS-LLM, a tree-search learning framework for large language models that adapts to various tasks, sizes, and search depths, and improves LLM reasoning and planning abilities.


Causal Effect Identification in LiNGAM Models with Latent Confounders

https://openreview.net/forum?id=C1iNBLIClt

Compressor summary: The authors develop methods to identify and estimate causal effects in linear models with latent variables, under known or unknown causal graphs, using a modified RICA algorithm.


Toward Availability Attacks in 3D Point Clouds

https://openreview.net/forum?id=C0sGIO2MZN

Compressor summary: The paper presents a new method to protect data privacy in 3D deep learning by creating shortcuts in the feature space that prevent degeneracy in bi-level optimization.


RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences

https://openreview.net/forum?id=BxAvcnlS8O

Compressor summary: RIME is a robust preference-based reinforcement learning algorithm that uses sample selection and warm starts to learn from noisy human preferences.


Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model

https://openreview.net/forum?id=BwAkaxqiLB

Compressor summary: Evolution of Heuristics is a novel method that uses large language models and evolutionary computation to automatically design high-performance heuristics for complex optimization problems, outperforming handcrafted heuristics and other automatic design methods.


Uncertainty-Aware Reward-Free Exploration with General Function Approximation

https://openreview.net/forum?id=BvBdYSIkpb

Compressor summary: The paper proposes a reward-free reinforcement learning algorithm, GFA-RFE, that uses uncertainty-aware exploration and weighted learning to improve sample efficiency for learning multiple tasks.


A connection between Tempering and Entropic Mirror Descent

https://openreview.net/forum?id=BtbijvkWLC

Compressor summary: The paper links tempering in Sequential Monte Carlo to entropic mirror descent, shows it as a descent scheme of the KL divergence with respect to different geometries, and derives adaptive tempering rules that perform better than alternatives.


Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics

https://openreview.net/forum?id=BrZPj9rEpN

Compressor summary: DORA is a novel approach for learning adaptable policies from limited offline data by using an information bottleneck principle to improve dynamics encoding and online adaptation.


Dynamic Anisotropic Smoothing for Noisy Derivative-Free Optimization

https://openreview.net/forum?id=BrCrnaCYDc

Compressor summary: The algorithm adapts the smoothing kernel to approximate the Hessian of the objective function, improving gradient estimation and optimization performance in noisy conditions.


Selecting Large Language Model to Fine-tune via Rectified Scaling Law

https://openreview.net/forum?id=Bq2THeNXRr

Compressor summary: This paper proposes a new method to select the best pre-trained language model for fine-tuning by predicting its performance and introducing a modified Scaling Law that captures a previously unobserved phase transition phenomenon.


Weighted distance nearest neighbor condensing

https://openreview.net/forum?id=BoPj12CnAn

Compressor summary: The paper proposes weighted distance nearest neighbor condensing, a new method for condensing data points with better performance than the standard nearest neighbor rule and similar generalization bounds.
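
For orientation, the sketch below shows the classic unweighted condensing rule (in the style of Hart's CNN) that weighted condensing builds on; it is a hypothetical baseline for illustration, not the paper's algorithm.

```python
# Classic (unweighted) nearest neighbor condensing: keep a subset that
# 1-NN-classifies every training point correctly. A baseline sketch only.
import numpy as np

def condense(X, y):
    keep = [0]                      # start from an arbitrary point
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            # Classify point i with 1-NN over the currently kept subset.
            j = keep[int(np.argmin(np.linalg.norm(X[keep] - X[i], axis=1)))]
            if y[j] != y[i]:        # misclassified -> the point must be kept
                keep.append(i)
                changed = True
    return np.array(sorted(set(keep)))
```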


FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

https://openreview.net/forum?id=BmPWtzL7Eq

Compressor summary: The paper proposes FuRL, a fine-tuning method for using pre-trained VLMs as rewards in RL to improve performance on sparse reward tasks with textual descriptions.


Proactive Detection of Voice Cloning with Localized Watermarking

https://openreview.net/forum?id=Bic3Vmy2DG

Compressor summary: AudioSeal is a fast and effective audio watermarking technique for detecting AI-generated speech with high accuracy and imperceptibility, which can be used to ensure voice authenticity.


Efficient World Models with Context-Aware Tokenization

https://openreview.net/forum?id=BiWIERWBFX

Compressor summary: The paper introduces $\Delta$-IRIS, a new model-based RL agent that uses discrete autoencoders and autoregressive transformers to encode stochastic deltas between time steps, achieving state-of-the-art results on the Crafter benchmark while being faster to train than previous methods.


Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation

https://openreview.net/forum?id=BiENLaUwlK

Compressor summary: The paper proposes a new method (GBE) for estimating constraint changes in finite-horizon non-discounted Safe RL, which leads to a novel algorithm (CGPO) that effectively identifies feasible optimal policies.


Position: Technical Research and Talent is Needed for Effective AI Governance

https://openreview.net/forum?id=Be2B6f0ps1

Compressor summary: The paper examines how governments struggle to regulate AI due to gaps between policy goals and technical feasibility, and calls for closer collaboration between AI researchers and policymakers.


Private Truly-Everlasting Robust-Prediction

https://openreview.net/forum?id=BdQTCAuT6L

Compressor summary: PEP is a private learning model that never releases a hypothesis, but provides predictions via an oracle using unlabeled examples from the underlying distribution, with improvements in robustness and privacy.


Position: Open-Endedness is Essential for Artificial Superhuman Intelligence

https://openreview.net/forum?id=Bc4vZ2CX7E

Compressor summary: The paper argues that open-endedness is a key property for artificial superhuman intelligence and proposes a path to achieve it using foundation models that can make novel, human-relevant discoveries.


Causal Inference out of Control: Estimating Performativity without Treatment Randomization

https://openreview.net/forum?id=Bb8pOvWIe4

Compressor summary: The paper proposes a method to estimate the causal effect of algorithmic actions on user consumption without randomized experiments, using assumptions about the dynamics of consumption over time and control theory.


Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints

https://openreview.net/forum?id=BajM6YzKvm

Compressor summary: The paper proposes a new recommendation algorithm (MMTS) for online matching markets with unknown preferences and quota constraints, using bandit learning and a double matching technique to achieve stability and low regret.


Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines

https://openreview.net/forum?id=BTkaKA74mS

Compressor summary: This paper proposes a mathematical framework to analyze and improve Generative Masked Language Models (GMLMs), which are non-autoregressive text generation models that balance speed and quality, and demonstrates their effectiveness in machine translation tasks.


WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?

https://openreview.net/forum?id=BRfqYrikdo

Compressor summary: The paper introduces WorkArena and BrowserGym to evaluate large language model agents' ability to interact with enterprise software systems, finding that current agents show promise but are far from fully automated and revealing a significant performance gap between open and closed-source LLMs.


Mitigating Label Noise on Graphs via Topological Sample Selection

https://openreview.net/forum?id=BRIcZiK5Fr

Compressor summary: TSS is a method for selecting informative nodes in noisy graph data using topological information to improve GNNs' performance.


Online Speculative Decoding

https://openreview.net/forum?id=BPQHXwVNvl

Compressor summary: Online speculative decoding updates draft models on user query data to improve accuracy and reduce latency in large language model inference.
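
The paper's contribution is refreshing the draft model online; as background, here is a minimal greedy sketch of the underlying draft-then-verify loop of speculative decoding, where `draft_model` and `target_model` are hypothetical callables returning next-token scores.

```python
import numpy as np

def greedy_next(model, prefix):
    """Argmax next token under `model` for a given token prefix."""
    return int(np.argmax(model(prefix)))

def speculative_decode(target_model, draft_model, prefix, k=4, max_new=32):
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1) The cheap draft model proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = greedy_next(draft_model, ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) The target model verifies the proposals (batched in practice);
        #    keep the longest prefix it agrees with.
        accepted = 0
        for i, t in enumerate(proposal):
            if greedy_next(target_model, out + proposal[:i]) != t:
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # 3) On a mismatch (or full acceptance), emit one target-model token.
        out.append(greedy_next(target_model, out))
    return out
```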


VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling

https://openreview.net/forum?id=BOunbuapcv

Compressor summary: VQDNA is a framework that improves genome tokenization by using vector-quantized codebooks to learn patterns from the genome vocabulary and Hierarchical Residual Quantization to enrich it further, outperforming existing models and revealing insights on SARS-CoV-2 mutations.


ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback

https://openreview.net/forum?id=BOorDpKHiJ

Compressor summary: The authors propose UltraFeedback, a large and diverse dataset of AI feedback for user-assistant interactions, and show how it can be used to align open-source chat language models without human feedback.


A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction

https://openreview.net/forum?id=BOFjRnJ9mX

Compressor summary: The paper proposes a method, GMTNet, to predict tensor properties of crystalline materials that preserves their symmetries and outperforms existing methods.


On Least Square Estimation in Softmax Gating Mixture of Experts

https://openreview.net/forum?id=BO0jookxk8

Compressor summary: This paper studies the least squares estimators for deterministic mixture of experts models and shows that feedforward networks with sigmoid or tanh activation functions converge faster than polynomial experts.


Probabilistic Time Series Modeling with Decomposable Denoising Diffusion Model

https://openreview.net/forum?id=BNH8spaR3l

Compressor summary: The paper introduces a new probabilistic time series model ($\text{D}^3\text{M}$) that unifies denoising diffusion models and continuous flow models, improves generation speed, and performs well on imputation and forecasting tasks.


In-Context Learning Agents Are Asymmetric Belief Updaters

https://openreview.net/forum?id=BNAvYSCrLD

Compressor summary: In-context learning dynamics of large language models depend on problem framing and are influenced by agency implications.


Multi-View Stochastic Block Models

https://openreview.net/forum?id=BJx1K4lAAX

Compressor summary: The paper introduces multi-view stochastic block models for graph clustering using multiple data sources and presents efficient algorithms that improve on existing methods.


Scaling Tractable Probabilistic Circuits: A Systems Perspective

https://openreview.net/forum?id=BIbjwcrg0V

Compressor summary: PyJuice is a GPU implementation for probabilistic circuits that improves speed and memory efficiency for large-scale generative models.


Position: Relational Deep Learning - Graph Representation Learning on Relational Databases

https://openreview.net/forum?id=BIMSHniyCP

Compressor summary: Relational Deep Learning (RDL) is a method to learn from relational databases using Graph Neural Networks without manual feature engineering.


Position: The Platonic Representation Hypothesis

https://openreview.net/forum?id=BH8TYy0r6u

Compressor summary: The text argues that AI models are increasingly similar in how they represent data, possibly converging on a shared statistical model of reality called the platonic representation.


Uncertainty for Active Learning on Graphs

https://openreview.net/forum?id=BCEtumPYDt

Compressor summary: Uncertainty Sampling is a useful Active Learning method for node classification on graphs, especially when using ground-truth Bayesian uncertainty estimates.
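
As a concrete reference point, a minimal sketch of uncertainty sampling with predictive entropy as the score; the probabilities here are random stand-ins for a (Bayesian) node classifier's outputs.

```python
import numpy as np

def entropy(p, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=-1)

def select_queries(probs, unlabeled, budget):
    """Pick the `budget` unlabeled nodes with the most uncertain predictions."""
    scores = entropy(probs[unlabeled])
    return [unlabeled[i] for i in np.argsort(-scores)[:budget]]

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(4), size=100)   # stand-in predictive posteriors
print(select_queries(probs, unlabeled=list(range(100)), budget=5))
```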


Random features models: a way to study the success of naive imputation

https://openreview.net/forum?id=B5g6y7JlMw

Compressor summary: Constant imputation may introduce bias in linear prediction, but its relevance and performance are improved when using a random features model and stochastic gradient predictors, especially for high-dimensional data and missing completely at random (MCAR) cases.
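
A minimal sketch of the setting studied, with illustrative data and dimensions rather than the paper's experiments: zero-impute MCAR-missing entries, lift into random ReLU features, and fit a linear predictor.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 50, 400
X = rng.normal(size=(n, d))
w = rng.normal(size=d) / np.sqrt(d)
y = X @ w + 0.1 * rng.normal(size=n)

mask = rng.random((n, d)) < 0.3            # 30% of entries missing (MCAR)
X_imp = np.where(mask, 0.0, X)             # naive constant (zero) imputation

W = rng.normal(size=(d, m)) / np.sqrt(d)   # random features model
Phi = np.maximum(X_imp @ W, 0.0)           # ReLU random features
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("train MSE:", np.mean((Phi @ theta - y) ** 2))
```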


Automated Statistical Model Discovery with Language Models

https://openreview.net/forum?id=B5906M4Wnd

Compressor summary: The paper introduces a method for automated statistical model discovery using large language models, which iteratively propose, critique, and refine models without defining a domain-specific language or search procedure.


CCM: Real-Time Controllable Visual Content Creation Using Text-to-Image Consistency Models

https://openreview.net/forum?id=B4rViOCoNf

Compressor summary: The paper explores how to use ControlNet to add conditional controls to consistency models for controllable visual content creation, and proposes a tailored training strategy using consistency training to improve image quality and details.


LLaGA: Large Language and Graph Assistant

https://openreview.net/forum?id=B48Pzc4oKi

Compressor summary: LLaGA is a new model that combines large language models with graph data analysis, enabling versatile, generalizable, and interpretable results on various graph tasks.


Learning Universal Predictors

https://openreview.net/forum?id=B1ajnQyZgK

Compressor summary: The paper explores how to use Universal Turing Machines to generate diverse training data for meta-learning, enabling neural networks to learn universal prediction strategies.


NExT: Teaching Large Language Models to Reason about Code Execution

https://openreview.net/forum?id=B1W712hMBi

Compressor summary: NExT is a method that teaches LLMs to understand how programs execute at run-time by inspecting execution traces and reasoning through chain-of-thought rationales, improving their ability to repair code.


DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation

https://openreview.net/forum?id=B0xmynxt4f

Compressor summary: DISCRET is a self-interpretable AI framework that generates faithful explanations for individual treatment effect estimation, using database queries and reinforcement learning to balance accuracy and faithfulness.


On the Weight Dynamics of Deep Normalized Networks

https://openreview.net/forum?id=AzUCfhJ9Bs

Compressor summary: The paper analyzes how disparities in learning rates across layers affect trainability in deep neural networks and proposes a warm-up method to minimize these disparities.


TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning

https://openreview.net/forum?id=AxmefV2NEf

Compressor summary: TimeMIL is a novel weakly supervised method for multivariate time series classification that uses a tokenized transformer and wavelet positional token to better locate patterns of interest and account for temporal dependencies.


Learning Constraints from Offline Demonstrations via Superior Distribution Correction Estimation

https://openreview.net/forum?id=Ax90jQPbgF

Compressor summary: ICSDICE is an offline ICRL algorithm that learns safety constraints and control policies from expert data, focusing on feasible constraints and transferable estimations.


Principled Gradient-Based MCMC for Conditional Sampling of Text

https://openreview.net/forum?id=AwLLSlJAeJ

Compressor summary: The paper proposes a method for sampling text from an energy-based model that uses gradient information and works well in practice, overcoming previous limitations of gradient-based MCMC for text generation.


Trustless Audits without Revealing Data or Models

https://openreview.net/forum?id=AtVtt9xsO1

Compressor summary: ZkAudit is a protocol that allows model providers to keep their models secret while enabling trustless audits of properties like copyright and censorship by using cryptographic commitments and zero-knowledge proofs.


Environment Design for Inverse Reinforcement Learning

https://openreview.net/forum?id=Ar0dsOMStE

Compressor summary: Adaptive environment design improves sample-efficiency and robustness of learning reward functions from expert demonstrations by repeatedly interacting with the expert in varied environments.


Profile Reconstruction from Private Sketches

https://openreview.net/forum?id=AqGCEHK9dZ

Compressor summary: The paper proposes a method to privately and accurately estimate the frequency of items in a multiset using histograms and discrete Laplace noise.
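
A minimal sketch of the basic mechanism, assuming a plain $\varepsilon$-DP histogram release rather than the paper's sketch-based estimator: count the items, then perturb each count with discrete Laplace noise.

```python
import numpy as np
from collections import Counter

def discrete_laplace(scale, size, rng):
    # The difference of two iid geometric variables is discrete-Laplace
    # distributed, i.e. P(Z = k) proportional to exp(-|k| / scale).
    p = 1.0 - np.exp(-1.0 / scale)
    return rng.geometric(p, size) - rng.geometric(p, size)

def private_histogram(items, epsilon, rng=np.random.default_rng(0)):
    counts = Counter(items)
    keys = sorted(counts)
    # One added/removed item changes one count by 1, so scale 1/epsilon
    # suffices for epsilon-DP (items absent from the data are ignored here).
    noise = discrete_laplace(scale=1.0 / epsilon, size=len(keys), rng=rng)
    return {k: counts[k] + int(z) for k, z in zip(keys, noise)}

print(private_histogram(["a", "a", "b", "c", "c", "c"], epsilon=1.0))
```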


Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models

https://openreview.net/forum?id=AqBz54aFyj

Compressor summary: The text discusses a novel multi-objective optimization approach for watermarking AI-generated texts to distinguish them from human-written ones while maintaining semantic quality.
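
The paper's multi-objective scheme is beyond a few lines, so as background here is a minimal sketch of the common green-list watermark family it builds on: each token's predecessor seeds a pseudorandom split of the vocabulary, generation favors "green" tokens, and detection computes a z-score on the green count. The hash-based split is an illustrative choice, not the paper's.

```python
import hashlib

def is_green(prev_token: int, token: int, gamma: float = 0.5) -> bool:
    # Pseudorandomly assign `token` to the green list, seeded by its predecessor.
    h = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64 < gamma

def detect_z_score(tokens, gamma: float = 0.5) -> float:
    """z-score of the green-token count; large values suggest watermarked text."""
    n = len(tokens) - 1
    greens = sum(is_green(a, b, gamma) for a, b in zip(tokens, tokens[1:]))
    return (greens - gamma * n) / (gamma * (1 - gamma) * n) ** 0.5
```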


Weisfeiler Leman for Euclidean Equivariant Machine Learning

https://openreview.net/forum?id=ApRKrKZJSk

Compressor summary: The paper presents a universal equivariant graph neural network architecture for point clouds with positions and velocities, based on extending the $2$-Weisfeiler-Leman test to these scenarios.


Sequential Disentanglement by Extracting Static Information From A Single Sequence Element

https://openreview.net/forum?id=AocOA4h3bu

Compressor summary: The paper proposes a new representation learning method that reduces information leakage by using subtraction inductive bias and outperforms existing methods on various data types.


FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models

https://openreview.net/forum?id=AoYhtJ4A90

Compressor summary: FedBPT is a framework that enables efficient and private fine-tuning of pre-trained language models using black-box inference and gradient-free optimization methods, reducing communication and memory costs.


Why do Variational Autoencoders Really Promote Disentanglement?

https://openreview.net/forum?id=Ao9UUaScAU

Compressor summary: The paper investigates how the decoder's orthogonality properties in variational autoencoders contribute to disentangled representation learning, both theoretically and experimentally.


Consistent Submodular Maximization

https://openreview.net/forum?id=AlJkqMnyjL

Compressor summary: The paper proposes algorithms for dynamic optimization of monotone submodular functions under cardinality constraints, balancing consistency and approximation quality.


Knowledge Graphs Can be Learned with Just Intersection Features

https://openreview.net/forum?id=Al5GlVytqi

Compressor summary: The authors propose a new algorithm to generate intersection features for candidate triples in knowledge graphs, which improves link prediction performance and training efficiency.


Two-timescale Derivative Free Optimization for Performative Prediction with Markovian Data

https://openreview.net/forum?id=Aj18fUB6Th

Compressor summary: This paper proposes a two-timescale derivative free optimization algorithm for performative prediction in decision-dependent settings with controlled Markov chains, and shows its sample complexity.


Deciphering RNA Secondary Structure Prediction: A Probabilistic K-Rook Matching Perspective

https://openreview.net/forum?id=Ada9Z68nvb

Compressor summary: The authors propose RFold, a fast and accurate method for predicting RNA secondary structure using a K-Rook problem formulation and bi-dimensional optimization.


Counterfactual Metarules for Local and Global Recourse

https://openreview.net/forum?id=Ad9msn1SKC

Compressor summary: T-CREx is a fast and effective method for explaining how decisions would change under different scenarios using generalised rules and metarules.


DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving

https://openreview.net/forum?id=AbGbGZFYOD

Compressor summary: DéjàVu improves distributed LLM serving efficiency by using DéjàVuLib to reduce pipeline bubbles, manage GPU memory, and ensure fault tolerance.


PcLast: Discovering Plannable Continuous Latent States

https://openreview.net/forum?id=AaTYLZQPyC

Compressor summary: The paper proposes a method to learn reachable state associations from multi-step inverse dynamics for effective goal-conditioned planning and policy learning in various simulation environments.


Revisiting Character-level Adversarial Attacks for Language Models

https://openreview.net/forum?id=AZWqXfM6z9

Compressor summary: The paper introduces Charmer, a query-based character-level adversarial attack that can achieve high success rate and similarity on both small and large NLP models.


Robustly Learning Single-Index Models via Alignment Sharpness

https://openreview.net/forum?id=AZ1tWCa9j3

Compressor summary: The paper presents an efficient learning algorithm for Single-Index Models under the $L_2^2$ loss in the agnostic model, which works for various distributions and link functions, and introduces a new concept called alignment sharpness.


X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

https://openreview.net/forum?id=AYbXN9poJl

Compressor summary: X-Oscar is a framework for creating high-quality 3D avatars from text prompts using a step-by-step "Geometry→Texture→Animation" approach and new techniques to overcome oversaturation and low-quality output issues.


Prior Mismatch and Adaptation in PnP-ADMM with a Nonconvex Convergence Analysis

https://openreview.net/forum?id=AYWBRwsZ8z

Compressor summary: This paper studies how prior distribution mismatch affects Plug-and-Play methods for image inverse problems and proposes a domain adaptation strategy to mitigate its impact.


Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

https://openreview.net/forum?id=AVEc9LvSlO

Compressor summary: The text proposes a method to teach generative models to estimate their uncertainty by training them to predict pairs of independent responses, where "cheating" (copying information from one response into the other) reveals that the model has not learned the true conditional distribution.


Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates

https://openreview.net/forum?id=ATvN9JnqZ8

Compressor summary: EnzyGen is a method that automatically designs functional enzymes by learning from their sites and substrates, using a novel network of attention and equivariant layers that captures both sequence and 3D information, and achieving superior substrate binding affinity across all enzyme families.


COALA: A Practical and Vision-Centric Federated Learning Platform

https://openreview.net/forum?id=ATRnM8PyQX

Compressor summary: COALA is a vision-centric Federated Learning platform that supports various tasks, data types, and model configurations, as well as customization and benchmarking for real-world scenarios.


Analysis for Abductive Learning and Neural-Symbolic Reasoning Shortcuts

https://openreview.net/forum?id=AQYabSOfci

Compressor summary: The paper proposes a simple analysis to quantify and mitigate reasoning shortcuts in abductive learning models and neural-symbolic predictive models, affecting their generalization ability.


Constrained Ensemble Exploration for Unsupervised Skill Discovery

https://openreview.net/forum?id=AOJCCFTlfJ

Compressor summary: This paper introduces a new unsupervised RL framework that combines local partition exploration with state distribution constraints, leading to better state coverage and skill learning for downstream tasks.


Interplay of ROC and Precision-Recall AUCs: Theoretical Limits and Practical Implications in Binary Classification

https://openreview.net/forum?id=ALc7DmOTI2

Compressor summary: The paper presents two theorems for binary classification models that show the importance of Precision-Recall AUC over Receiver Operating Characteristic AUC, especially for imbalanced datasets, and provide a method to compare them.
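
A quick empirical companion to the theory, on synthetic data with illustrative parameters: under heavy class imbalance a scorer can look strong by ROC AUC while its PR AUC (average precision) stays low.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n_neg, n_pos = 100_000, 100               # 0.1% positive class
y = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])
scores = np.concatenate([rng.normal(0, 1, n_neg), rng.normal(2, 1, n_pos)])
print("ROC AUC:", roc_auc_score(y, scores))            # ~0.92, looks strong
print("PR AUC: ", average_precision_score(y, scores))  # far lower
```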


Reinforcement Learning within Tree Search for Fast Macro Placement

https://openreview.net/forum?id=AJGwSx0RUV

Compressor summary: EfficientPlace is a novel framework that combines global tree search and reinforcement learning for fast and high-quality macro chip placement, overcoming the sample inefficiency of existing techniques.


Position: Stop Making Unscientific AGI Performance Claims

https://openreview.net/forum?id=AIXUuLCuMe

Compressor summary: The text discusses how large language models create meaningful representations that correlate with external variables but are not evidence of artificial general intelligence, and calls for caution in interpreting and communicating such results.


Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence

https://openreview.net/forum?id=AG45XqwPKU

Compressor summary: SYFLOW is an approach that uses normalizing flows to model target distributions and a novel neural layer for interpretable subgroup descriptions, enabling the discovery of diverse and exceptional sub-populations in large datasets.


A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization

https://openreview.net/forum?id=AFfXlKFHXJ

Compressor summary: This paper proposes a method for learning from intractable distributions in combinatorial optimization using latent variable models without relying on exact sample likelihoods, and shows its effectiveness on various benchmark problems.


DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation

https://openreview.net/forum?id=AFAX28TdO4

Compressor summary: DFlow is a new generative framework that combines Normalizing Flow with Denoising AutoEncoder for high-quality waveform generation and outperforms existing methods in speed and quality.


Understanding Diffusion Models by Feynman's Path Integral

https://openreview.net/forum?id=AEqim4X0NV

Compressor summary: The paper proposes a new way to understand diffusion models using quantum physics concepts, which can help explain why one sampling method works better than another depending on a key parameter.


On the Consistency of Kernel Methods with Dependent Observations

https://openreview.net/forum?id=AEHXvoOxV9

Compressor summary: Empirical weak convergence (EWC) is a general assumption that explains why kernel methods perform well under non-independent and non-mixing data, and this paper shows how it applies to various kernel methods including SVMs, kernel mean embeddings, and finite- and infinite-dimensional outputs.


Do Large Code Models Understand Programming Concepts? Counterfactual Analysis for Code Predicates

https://openreview.net/forum?id=ADnUzsmsLW

Compressor summary: The paper explores why Large Language Models struggle with logical constructs in code generation and proposes a counterfactual testing framework to evaluate their understanding of programming concepts.


Parsimonious Learning-Augmented Approximations for Dense Instances of $\mathcal{NP}$-hard Problems

https://openreview.net/forum?id=AD5QC1BTJL

Compressor summary: The paper extends and speeds up a scheme that gives polynomial time approximations for $\mathcal{NP}$-hard problems using one-bit predictions and aims at finding fast algorithms with approximation consistency, smoothness and robustness.


Learning Optimal Deterministic Policies with Stochastic Policy Gradients

https://openreview.net/forum?id=ABt0jlLZtX

Compressor summary: The paper studies how to learn stochastic policies for continuous RL, then convert them to deterministic ones while ensuring good performance and convergence.


Quantum Theory and Application of Contextual Optimal Transport

https://openreview.net/forum?id=A9hJvQHEEP

Compressor summary: The paper proposes a quantum computing method (QontOT) for learning conditional distribution of transportation plans, which outperforms classical Neural OT on challenging tasks and real data.


Learning Associative Memories with Gradient Descent

https://openreview.net/forum?id=A9fLbXLRTK

Compressor summary: The paper studies how associative memory modules with token embeddings learn and converge, finding insights about overparameterized, imbalanced, and underparameterized regimes.


A Statistical Framework for Data-dependent Retrieval-Augmented Models

https://openreview.net/forum?id=A9MiJdetnZ

Compressor summary: The text proposes a framework to study and train retrieval-augmented models for ML systems, which use a retriever to find relevant information and a predictor to make final predictions, with excess risk bounds analysis.


Graph2Tac: Online Representation Learning of Formal Math Concepts

https://openreview.net/forum?id=A7CtiozznN

Compressor summary: The text discusses how online learning techniques can significantly improve the ability of proof assistants to solve mathematical theorems by exploiting locality properties, and introduces two solvers that outperform other general purpose provers.


Tuning-Free Stochastic Optimization

https://openreview.net/forum?id=A6fmX9QCEa

Compressor summary: Tuning-free algorithms can perform similarly to optimally-tuned ones for some machine learning problems but not all, and their effectiveness depends on problem parameters and noise distribution.


A Statistical Theory of Regularization-Based Continual Learning

https://openreview.net/forum?id=A54CXWn9VB

Compressor summary: The paper analyzes how different regularization terms affect linear regression in continual learning, showing that generalized $\ell_2$-regularization algorithms can achieve optimal performance while balancing trade-offs and handling data heterogeneity.


Learning Multiple Secrets in Mastermind

https://openreview.net/forum?id=A0N39kgRZq

Compressor summary: The text discusses an adaptive algorithm for learning an unknown subset of a hypercube with efficient query complexity and explores variants of the problem with different constraints.


Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization

https://openreview.net/forum?id=9zlZuAAb08

Compressor summary: QDHF is a novel approach that infers diversity metrics from human feedback to improve the quality and diversity of solutions in complex and open-ended domains, such as text-to-image generation.


Unsupervised Episode Generation for Graph Meta-learning

https://openreview.net/forum?id=9zdTOOgutk

Compressor summary: The authors propose Neighbors as Queries (NaQ), an unsupervised episode generation method for few-shot node-classification that uses graph meta-learning to better utilize node information and adapts to downstream tasks.


When is Transfer Learning Possible?

https://openreview.net/forum?id=9yADTDHgGu

Compressor summary: The paper proposes a general framework for transfer learning across different types of learning problems and shows how it can reveal new insights and challenge common assumptions about when and how transfer learning works.


Chain-of-Thought Predictive Control

https://openreview.net/forum?id=9xUpLGAOy9

Compressor summary: The paper introduces a hierarchical imitation learning method that discovers and predicts the chain-of-thought (CoT) of sub-optimal demonstrations and uses it to guide policy learning for complex low-level control tasks, outperforming existing baselines on various manipulation tasks.


Flextron: Many-in-One Flexible Large Language Model

https://openreview.net/forum?id=9vKRhnflAs

Compressor summary: Flextron is a network architecture and optimization framework that enables efficient adaptation of large language models for specific latency and accuracy targets during inference without fine-tuning or additional training.


Pi-DUAL: Using privileged information to distinguish clean from noisy labels

https://openreview.net/forum?id=9oAXix8da9

Compressor summary: Pi-DUAL uses privileged information to distinguish clean from wrong labels and improve deep learning models' performance in the presence of label noise.


Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills

https://openreview.net/forum?id=9laB7ytoMp

Compressor summary: Skill Set Optimization (SSO) improves large language models' sequential decision making by constructing and refining sets of transferable skills using environment reward signals.


CARTE: Pretraining and Transfer for Tabular Learning

https://openreview.net/forum?id=9kArQnKLDp

Compressor summary: CARTE is a neural architecture that uses graph representation and attention to process tables without needing entity and schema matching, achieving better performance than tree-based models.


Regularizing with Pseudo-Negatives for Continual Self-Supervised Learning

https://openreview.net/forum?id=9jXS07TIBH

Compressor summary: The text introduces a new method, Pseudo-Negative Regularization (PNR), for continual self-supervised learning that uses pseudo-negatives from previous models to maintain consistency in representation learning.


Online Linear Regression in Dynamic Environments via Discounting

https://openreview.net/forum?id=9iRGs3wBTy

Compressor summary: The text develops algorithms for online linear regression in dynamic environments that achieve optimal regret guarantees even without prior knowledge, via a novel analysis of a discounted forecaster.
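
A minimal sketch of one natural instantiation of a discounted forecaster, assuming discounted ridge regression rather than the paper's exact algorithm: geometric downweighting of past observations lets the fit track a drifting environment.

```python
import numpy as np

class DiscountedLinearForecaster:
    def __init__(self, dim, discount=0.99, reg=1.0):
        self.gamma = discount
        self.A = reg * np.eye(dim)   # discounted sum of x x^T (plus ridge)
        self.b = np.zeros(dim)       # discounted sum of y * x

    def predict(self, x):
        return float(np.linalg.solve(self.A, self.b) @ x)

    def update(self, x, y):
        self.A = self.gamma * self.A + np.outer(x, x)
        self.b = self.gamma * self.b + y * x

rng = np.random.default_rng(0)
f = DiscountedLinearForecaster(dim=2)
w_true = np.array([1.0, -1.0])
for t in range(500):
    if t == 250:
        w_true = np.array([-1.0, 2.0])   # the environment drifts
    x = rng.normal(size=2)
    f.update(x, w_true @ x + 0.1 * rng.normal())
print(f.predict(np.array([1.0, 0.0])))   # close to -1.0 after the shift
```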


Sliding Down the Stairs: How Correlated Latent Variables Accelerate Learning with Neural Networks

https://openreview.net/forum?id=9iGdh0wAgB

Compressor summary: The text explains how neural networks use correlations between latent variables to efficiently learn from higher-order input cumulants, which are crucial for their performance but computationally hard to extract.


Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes

https://openreview.net/forum?id=9cG1oRnqNd

Compressor summary: The paper proposes $\texttt{CLLM}$, a method that uses large language models to generate and curate high-quality synthetic data for machine learning tasks in low-data settings, outperforming conventional generators.


Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution

https://openreview.net/forum?id=9ZxnPZGmPU

Compressor summary: Promptbreeder is a self-improving mechanism that evolves prompts for various domains, outperforming state-of-the-art methods on reasoning and hate speech tasks.


Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

https://openreview.net/forum?id=9ZkUFSwlUH

Compressor summary: Di-SkilL is an RL method that learns diverse skills using Mixture of Experts with maximum entropy and energy-based models for context distribution.


A Universal Class of Sharpness-Aware Minimization Algorithms

https://openreview.net/forum?id=9Ub6nLqdMo

Compressor summary: The paper introduces new sharpness measures for optimization algorithms in overparameterized models like neural networks, and shows their effectiveness in minimizing these measures and generalizing better.


Adaptively Perturbed Mirror Descent for Learning in Games

https://openreview.net/forum?id=9U29U3cDKq

Compressor summary: The paper introduces APMD, an algorithm that adjusts payoff perturbations in games to achieve faster convergence to Nash equilibria.


Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

https://openreview.net/forum?id=9Tq4L3Go9f

Compressor summary: The paper proposes a new algorithm, BAC, that uses a Blended Exploitation and Exploration operator to address underestimated Q-values in deep reinforcement learning and improve performance in continuous control tasks and real-world robot problems.


SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models

https://openreview.net/forum?id=9Rroj9GIOQ

Compressor summary: This paper introduces SPP, a method to efficiently fine-tune large language models by preserving their sparsity and structure while optimizing their weights.


Centralized Selection with Preferences in the Presence of Biases

https://openreview.net/forum?id=9QRcp2ubDt

Compressor summary: The paper proposes a fair and efficient algorithm for selecting candidates from biased groups to institutions while maximizing true utility.


Accelerating Convergence in Bayesian Few-Shot Classification

https://openreview.net/forum?id=9PQnc6EWdL

Compressor summary: The paper combines mirror descent-based variational inference with Gaussian processes for fast and accurate few-shot classification, improving uncertainty estimation and convergence speed.


Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks

https://openreview.net/forum?id=9L7BZiTtJR

Compressor summary: The text discusses how overparameterization can improve neural network performance but may reduce the benefits of curriculum learning, and presents a theoretical analysis of this interaction in a 2-layer network example.


Open-Domain Text Evaluation via Contrastive Distribution Methods

https://openreview.net/forum?id=9HdQr68Zyl

Compressor summary: The paper proposes a new method, Contrastive Distribution Methods (CDM), to evaluate open-domain text generation by leveraging the connection between increasing model parameters and improved performance.


Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption

https://openreview.net/forum?id=9HPoJ6ulgV

Compressor summary: The authors present the first polynomial transformer, enabling secure inference with homomorphic encryption on full transformers for various tasks, with results comparable to traditional models.


REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates

https://openreview.net/forum?id=9GbAea74O6

Compressor summary: The paper presents REST, a fast and memory-efficient graph-based method for real-time EEG signal analysis, achieving high accuracy in epileptic seizure detection and classification tasks.


In-context Convergence of Transformers

https://openreview.net/forum?id=9GLvXGkUE2

Compressor summary: This paper studies how one-layer transformers with softmax attention learn linear function classes in structured data models with balanced or imbalanced feature vectors using gradient descent and reveals their learning dynamics.


EvIL: Evolution Strategies for Generalisable Imitation Learning

https://openreview.net/forum?id=9DMMvMTDur

Compressor summary: The paper proposes reward model ensembles and the EvIL method to improve imitation learning by addressing weak reward recovery and poor reward shaping issues.


Loss Shaping Constraints for Long-Term Time Series Forecasting

https://openreview.net/forum?id=9CCoVyFuEp

Compressor summary: The authors propose a loss shaping constraints method for long-term time series forecasting that aims to balance average performance and error bounds at each step, improving the distribution of errors across the predicted window.


QuIP$\#$: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks

https://openreview.net/forum?id=9BrydUVcoe

Compressor summary: QuIP# is a weight-only post-training quantization method that achieves state-of-the-art compression results using the randomized Hadamard transform, vector quantization with $E_8$ lattice codebooks, and fine-tuning.
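
Of QuIP#'s three ingredients, the randomized Hadamard transform is the easiest to show in isolation; a minimal sketch (not the full pipeline) of how conjugating a weight matrix by sign-randomized Hadamard matrices spreads outliers while staying exactly invertible.

```python
import numpy as np
from scipy.linalg import hadamard

def random_hadamard(n, rng):
    # n must be a power of two; column-wise random signs keep it orthogonal.
    return hadamard(n) / np.sqrt(n) * rng.choice([-1.0, 1.0], size=n)

rng = np.random.default_rng(0)
n = 256
W = rng.normal(size=(n, n))
W[3, 7] = 50.0                       # plant an outlier weight
HL, HR = random_hadamard(n, rng), random_hadamard(n, rng)
W_inc = HL @ W @ HR.T                # "incoherent" matrix to be quantized
print(np.abs(W).max(), np.abs(W_inc).max())   # the max entry shrinks sharply
# The transform is orthogonal, so it is undone exactly after quantization:
assert np.allclose(HL.T @ W_inc @ HR, W)
```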


A Contextual Combinatorial Bandit Approach to Negotiation

https://openreview.net/forum?id=9BWRs6XF8P

Compressor summary: The paper proposes NegUCB, a novel learning-based method for effective negotiation that uses contextual combinatorial multi-armed bandits to tackle the exploration-exploitation dilemma and large action spaces in negotiation problems.


ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy

https://openreview.net/forum?id=9BGi9PEhNn

Compressor summary: The authors compare ConvNet and Vision Transformer models on various aspects beyond ImageNet accuracy, finding differences in types of mistakes, output calibration, transferability, and feature invariance, suggesting a need for more nuanced model selection.


HexGen: Generative Inference of Large Language Model over Heterogeneous Environment

https://openreview.net/forum?id=9ANyvRtFGa

Compressor summary: HexGen is a distributed inference engine for large language models that supports efficient deployment across diverse GPUs and reduces inference costs by adaptive scheduling.


Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

https://openreview.net/forum?id=99jx5U81jx

Compressor summary: The study evaluates how well large language models can explain their reasoning and suggests two metrics based on counterfactual simulatability to measure explanation quality.


How Does Goal Relabeling Improve Sample Efficiency?

https://openreview.net/forum?id=99UFZV2VpU

Compressor summary: Our paper investigates the theoretical benefits of goal relabeling in reinforcement learning, leading to a new algorithm (GOALIVE) and a complexity measure (GOAL-BE).
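
As background on what goal relabeling means mechanically, a minimal hindsight-style sketch (the generic mechanism; GOALIVE itself is more involved): failed trajectories are reused by pretending a state that was actually reached had been the goal all along.

```python
def relabel(trajectory, reward_fn):
    """trajectory: list of (state, action, next_state, goal) tuples."""
    achieved = trajectory[-1][2]          # final state becomes the new goal
    return [(s, a, s2, achieved, reward_fn(s2, achieved))
            for s, a, s2, _ in trajectory]

# Toy example with a sparse reward that fires only when the goal is hit.
traj = [((0,), "right", (1,), (5,)), ((1,), "right", (2,), (5,))]
print(relabel(traj, reward_fn=lambda s, g: float(s == g)))
```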


ReconBoost: Boosting Can Achieve Modality Reconcilement

https://openreview.net/forum?id=93gjGDwqim

Compressor summary: The paper proposes ReconBoost, a novel multi-modal learning method that alternately updates one modality while fixing the others to balance exploration and exploitation, addressing modality competition issues.


Knowledge Distillation with Auxiliary Variable

https://openreview.net/forum?id=91QmrfztSP

Compressor summary: The paper proposes a new knowledge distillation method using an auxiliary variable to better transfer knowledge from a teacher model to a student model, improving performance over existing methods.


WISER: Weak Supervision and Supervised Representation Learning to Improve Drug Response Prediction in Cancer

https://openreview.net/forum?id=8ySQaphUYH

Compressor summary: WISER is a new method that uses weak supervision and supervised representation learning to predict personalized cancer drug responses using genomic data from patients.


AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA

https://openreview.net/forum?id=8xKGZsnV2a

Compressor summary: The text introduces AquaLoRA, a method for protecting image generation models from unauthorized use by embedding watermarks in their architecture.


Sequential Neural Score Estimation: Likelihood-Free Inference with Conditional Score Based Diffusion Models

https://openreview.net/forum?id=8viuf9PdzU

Compressor summary: SNPSE is a score-based method for Bayesian inference in simulator-based models that uses conditional score-based diffusion models and a sequential training procedure to reduce simulation cost.


CLLMs: Consistency Large Language Models

https://openreview.net/forum?id=8uzBOVmh8H

Compressor summary: The paper proposes a new method to improve Jacobi decoding for LLM inference by refining the target model to predict the fixed point faster, achieving significant speedup with maintained quality.


MaxMin-RLHF: Alignment with Diverse Human Preferences

https://openreview.net/forum?id=8tzjEMF0Vq

Compressor summary: The paper proposes a method to learn a mixture of reward models for reinforcement learning from human feedback, accounting for diverse human preferences and showing its effectiveness on small-scale and large-scale language models.


Approximate Nearest Neighbor Search with Window Filters

https://openreview.net/forum?id=8t8zBaGFar

Compressor summary: The paper introduces window search, a semantic search problem with numeric labels, and presents a modular tree-based framework that significantly speeds up its solution.
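
The naive baseline that the tree-based framework accelerates is easy to state; a minimal brute-force sketch of window-filtered nearest neighbor search over numeric labels.

```python
import numpy as np

def window_search(points, labels, query, lo, hi):
    """Nearest neighbor of `query` among points with label in [lo, hi]."""
    mask = (labels >= lo) & (labels <= hi)
    if not mask.any():
        return None
    dists = np.linalg.norm(points[mask] - query, axis=1)
    return int(np.flatnonzero(mask)[np.argmin(dists)])  # index into `points`

pts = np.random.default_rng(0).normal(size=(1000, 16))
ts = np.arange(1000)                   # e.g. timestamps as numeric labels
print(window_search(pts, ts, pts[42], lo=40, hi=60))   # -> 42
```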


Position: Near to Mid-term Risks and Opportunities of Open-Source Generative AI

https://openreview.net/forum?id=8q4EPdjTLE

Compressor summary: The text discusses the benefits and risks of open-source generative AI, and proposes an AI openness taxonomy system to classify and assess its potential impacts on society.


HGCN2SP: Hierarchical Graph Convolutional Network for Two-Stage Stochastic Programming

https://openreview.net/forum?id=8onaVSFTEj

Compressor summary: HGCN2SP is a new model for solving 2SP problems that uses a hierarchical graph and reinforcement learning to select representative scenarios efficiently and accurately.


EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning

https://openreview.net/forum?id=8nd1yBRCDl

Compressor summary: EquiAV is a novel framework that uses equivariance for audio-visual contrastive learning, enabling robust supervision with minimal computational overhead and outperforming previous works.


Sharpness-Aware Data Generation for Zero-shot Quantization

https://openreview.net/forum?id=8mKXMnhnFW

Compressor summary: The paper proposes a novel zero-shot quantization approach that generates synthetic data based on the sharpness of the quantized model, improving generalization.


Optimal Exact Recovery in Semi-Supervised Learning: A Study of Spectral Methods and Graph Convolutional Networks

https://openreview.net/forum?id=8m4V6Fx6ma

Compressor summary: The paper studies semi-supervised node classification on a synthetic dataset whose feature vectors come from a Gaussian Mixture Model, identifies the information threshold for exact recovery of test nodes under transductive learning, proposes an optimal spectral estimator based on PCA, and evaluates graph ridge regression and GCN, showing their potential to achieve the threshold.


Make-A-Shape: a Ten-Million-scale 3D Shape Model

https://openreview.net/forum?id=8l1KYguM4w

Compressor summary: Make-A-Shape is a fast and versatile 3D generative model that uses wavelet-tree representation to encode shapes and create various applications with minimal loss.


SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention

https://openreview.net/forum?id=8kLzL5QBh2

Compressor summary: The paper introduces SAMformer, a lightweight transformer that improves multivariate long-term forecasting by avoiding the bad local minima caused by the attention mechanism, using sharpness-aware optimization and channel-wise attention; it outperforms linear models and existing methods while being smaller than MOIRAI.


Optimal Coresets for Low-Dimensional Geometric Median

https://openreview.net/forum?id=8iWDWQKxJ1

Compressor summary: A coreset is a small subset of a set of points that can approximate the cost of median queries within a factor of $(1\pm\varepsilon)$, and its size depends on $\varepsilon$ and the dimension $d$.
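
For reference, the object a coreset must approximate: the geometric median, computed here with Weiszfeld's classic iteration (a sketch that ignores the degenerate case of an iterate landing exactly on a data point).

```python
import numpy as np

def geometric_median(P, iters=200, eps=1e-9):
    z = P.mean(axis=0)                        # start at the centroid
    for _ in range(iters):
        d = np.linalg.norm(P - z, axis=1)
        w = 1.0 / np.maximum(d, eps)          # inverse-distance weights
        z = (w[:, None] * P).sum(axis=0) / w.sum()
    return z

P = np.random.default_rng(0).normal(size=(500, 2))
z = geometric_median(P)
print(z, np.linalg.norm(P - z, axis=1).sum())  # the cost a coreset preserves
```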


Wukong: Towards a Scaling Law for Large-Scale Recommendation

https://openreview.net/forum?id=8iUgr2nuwo

Compressor summary: Wukong is a new recommendation model that achieves better performance and follows a scaling law, making it more efficient and adaptable to complex real-world datasets.


Image Hijacks: Adversarial Images can Control Generative Models at Runtime

https://openreview.net/forum?id=8ho1l6RZNB

Compressor summary: The paper introduces image hijacks, which can manipulate the output and behaviour of vision-language models by controlling their image input with adversarial images.


Generalization Error of Graph Neural Networks in the Mean-field Regime

https://openreview.net/forum?id=8h0x12p3zq

Compressor summary: This paper develops a method to measure how well graph neural networks generalize to new data when there are more parameters than data points, using an upper bound with a convergence rate of $O(1/n)$.


Individual Fairness in Graph Decomposition

https://openreview.net/forum?id=8f8SI9X9ox

Compressor summary: The paper proposes new algorithms for dividing planar graphs into fair clusters with considerations of connectivity and optimality, and applies them to real networks modeling political redistricting.


On the Error-Propagation of Inexact Hotelling's Deflation for Principal Component Analysis

https://openreview.net/forum?id=8dX4YnosqG

Compressor summary: The paper analyzes how numerical errors accumulate in PCA's deflation method when finding principal components sequentially using different algorithms or power iteration.


ALERT-Transformer: Bridging Asynchronous and Synchronous Machine Learning for Real-Time Event-based Spatio-Temporal Data

https://openreview.net/forum?id=8ZDFn7BDaH

Compressor summary: The paper proposes a novel hybrid pipeline combining asynchronous sensing and synchronous processing to enable classic processing of continuous ultra-sparse spatiotemporal data with dense machine learning models, achieving state-of-the-art performance and low latency.


On Stronger Computational Separations Between Multimodal and Unimodal Machine Learning

https://openreview.net/forum?id=8Z2xWhuT6R

Compressor summary: The paper studies the theoretical separation between multimodal and unimodal learning, showing that average-case computational advantages of multimodal learning may be rare in practice due to their connection to cryptographic key agreement protocols.


Rethinking Specificity in SBDD: Leveraging Delta Score and Energy-Guided Diffusion

https://openreview.net/forum?id=8WSNl2XA9r

Compressor summary: The authors propose a new metric (Delta Score) to evaluate molecular binding specificity in Structure-based Drug Design and develop an energy-guided generative model using contrastive learning that improves both specificity and traditional docking scores.


Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model

https://openreview.net/forum?id=8VEGkphQaK

Compressor summary: The paper proposes a synthetic graph navigation task to study how autoregressive Transformer models perform stepwise inference and explores various phenomena observed in these models.


HyperFields: Towards Zero-Shot Generation of NeRFs from Text

https://openreview.net/forum?id=8STOjGCkfH

Compressor summary: HyperFields is a method for generating text-conditioned Neural Radiance Fields with a single forward pass, using a dynamic hypernetwork and NeRF distillation training to learn a general map between text and scenes.


On the Asymptotic Distribution of the Minimum Empirical Risk

https://openreview.net/forum?id=8RwhTPACAO

Compressor summary: The paper develops a general framework to study the distribution of the minimum empirical risk (MER) in statistical and machine learning problems, and shows how it can be used for inference and hypothesis testing with applications to neural networks.


Emergent Representations of Program Semantics in Language Models Trained on Programs

https://openreview.net/forum?id=8PTx4CpNoT

Compressor summary: The paper shows that a language model of code can learn to represent the semantics of programs without explicit guidance, and introduces a new probing technique to study this phenomenon.


Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design

https://openreview.net/forum?id=8NfHmzo0Op

Compressor summary: Context-guided diffusion (CGD) is a method that uses unlabeled data and constraints to improve out-of-distribution generation for guided diffusion models in various application domains.


Data-Efficient Molecular Generation with Hierarchical Textual Inversion

https://openreview.net/forum?id=8KeD4mEh3j

Compressor summary: HI-Mol is a novel method for data-efficient molecular generation that uses multi-level token embeddings and textual inversion to capture hierarchical information in molecule distribution.


Multi-Sender Persuasion: A Computational Perspective

https://openreview.net/forum?id=8JFIKpzumn

Compressor summary: The paper studies how multiple senders can persuade a self-interested agent using signaling, proposes a new game-theoretic solution concept, shows its computational hardness, and develops a neural network approach to find local equilibria in this setting.


Bridging discrete and continuous state spaces: Exploring the Ehrenfest process in time-continuous diffusion models

https://openreview.net/forum?id=8GYclcxQXB

Compressor summary: The paper studies how time-continuous Markov jump processes on discrete state spaces relate to state-continuous diffusion processes, and proposes a training algorithm for the time-reversal of such processes using conditional expectations.


Attack-free Evaluating and Enhancing Adversarial Robustness on Categorical Data

https://openreview.net/forum?id=8ERo4jph0A

Compressor summary: The paper proposes IGSG, a metric and regularization method to improve the robustness of classification over categorical inputs against adversarial attacks.


An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network

https://openreview.net/forum?id=8AeuhCgRRv

Compressor summary: The paper extends infinite-width analysis to Jacobians of deep neural networks, showing that MLPs and their Jacobians at initialisation converge to Gaussian processes in the infinite-width limit.


Understanding Forgetting in Continual Learning with Linear Regression

https://openreview.net/forum?id=89kZWloYQx

Compressor summary: The paper analyzes factors contributing to catastrophic forgetting in continual learning, showing that task sequence and algorithmic parameters affect forgetting, and validates the theory with simulations on linear regression models and DNNs.


Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training

https://openreview.net/forum?id=87ZrVHDqmR

Compressor summary: The paper introduces Med-ST, a framework for fine-grained spatial and temporal modeling of chest radiographs and radiological reports to exploit information from multiple views and temporal sequences.


Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling

https://openreview.net/forum?id=87CYNyCGOo

Compressor summary: Leddam is a learnable decomposition method and dual attention module for better multivariate time series forecasting, outperforming state-of-the-art methods.


DIDI: Diffusion-Guided Diversity for Offline Behavioral Generation

https://openreview.net/forum?id=8296yUBoXr

Compressor summary: DIDI is a new method that learns diverse skills from offline data using diffusion probabilistic models as guides, achieving good results in decision-making tasks.


Compositional Image Decomposition with Diffusion Models

https://openreview.net/forum?id=7zvl9mNQG2

Compressor summary: The paper introduces Decomp Diffusion, an unsupervised method that decomposes images into compositional components, which can be flexibly combined to generate new scenes.


Layerwise Change of Knowledge in Neural Networks

https://openreview.net/forum?id=7zEoinErzQ

Compressor summary: The paper investigates how deep neural networks learn and forget features through layers during forward propagation, and tracks the emergence and disappearance of interactions in each layer.


Privacy Backdoors: Stealing Data with Corrupted Pretrained Models

https://openreview.net/forum?id=7yixJXmzb8

Compressor summary: Finetuning pretrained models can introduce privacy risks due to backdoors that allow attackers to reconstruct data and bypass differential privacy guarantees.


Risk Estimation in a Markov Cost Process: Lower and Upper Bounds

https://openreview.net/forum?id=7xzhKEPfBo

Compressor summary: The text studies how to estimate risk measures like variance and VaR for an infinite-horizon Markov cost process, showing that it requires at least $\Omega(1/\epsilon^2)$ samples and deriving upper bounds for CVaR and variance.
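
A minimal Monte Carlo sketch of the quantities in question, on a hypothetical two-state cost chain: sample (truncated) discounted cumulative costs, then read off variance, VaR, and CVaR.

```python
import numpy as np

def sample_cost(rng, gamma=0.95, horizon=100):
    s, total = 0, 0.0
    for t in range(horizon):                  # truncated infinite horizon
        total += gamma**t * (1.0 if s == 1 else 0.1)   # state-dependent cost
        s = rng.choice(2, p=[0.9, 0.1] if s == 0 else [0.3, 0.7])
    return total

rng = np.random.default_rng(0)
costs = np.array([sample_cost(rng) for _ in range(2000)])
alpha = 0.95
var = np.quantile(costs, alpha)               # Value-at-Risk at level alpha
cvar = costs[costs >= var].mean()             # Conditional Value-at-Risk
print(costs.var(), var, cvar)
```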


Boximator: Generating Rich and Controllable Motions for Video Synthesis

https://openreview.net/forum?id=7wgXuNOF0V

Compressor summary: Boximator is a new approach for fine-grained motion control in video synthesis, using hard and soft boxes to define object positions and shapes, and self-tracking technique to learn box-object correlations.


Vector Quantization Pretraining for EEG Time Series with Random Projection and Phase Alignment

https://openreview.net/forum?id=7uwLvFvpis

Compressor summary: The paper introduces a new self-supervised learning model, VQ-MTM, for EEG data analysis that uses vector quantization and phase alignment to learn robust features and outperforms existing methods on seizure detection and classification tasks.


Synergistic Integration of Coordinate Network and Tensorial Feature for Improving Neural Radiance Fields from Sparse Inputs

https://openreview.net/forum?id=7tyAO5tUF8

Compressor summary: The paper proposes a new method to improve multi-plane representation in neural radiance fields by combining it with a coordinate-based MLP network that captures low-frequency details, resulting in better stability, efficiency, and performance.


A General Framework for Learning from Weak Supervision

https://openreview.net/forum?id=7sgqXa4aNM

Compressor summary: The paper introduces a general framework for learning from weak supervision (GLWS) with an EM-based algorithm that simplifies the computational demands and shows better performance and versatility across various scenarios.


Neural Collapse meets Differential Privacy: Curious behaviors of NoisyGD with Near-Perfect Representation Learning

https://openreview.net/forum?id=7rrN6E4KU0

Compressor summary: The study finds that pre-training on a public dataset improves differentially private learning by enhancing feature representation and suggests strategies like feature normalization and dimension reduction to improve robustness.


DOGE: Domain Reweighting with Generalization Estimation

https://openreview.net/forum?id=7rfZ6bMZq4

Compressor summary: The authors propose a method called DoGE that optimizes how to sample training data from different domains to improve the generalization of large language models.


Predicting and Interpreting Energy Barriers of Metallic Glasses with Graph Neural Networks

https://openreview.net/forum?id=7rTbqkKvA6

Compressor summary: The paper presents a new dataset and Symmetrized GNN model for predicting energy barriers in metallic glasses, achieving high accuracy and fast inference time compared to previous methods.


When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

https://openreview.net/forum?id=7mFSaP6IiN

Compressor summary: This paper explores how integrating linear attention with speculative decoding can improve the efficiency and performance of autoregressive large language models, achieving significant reductions in perplexity and generation time.


Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

https://openreview.net/forum?id=7joG3i2pUR

Compressor summary: OBAC is a new off-policy RL framework that adapts to a better offline policy to improve online learning, achieving high sample efficiency and performance on 53 tasks.


Adaptive Stabilization Based on Machine Learning for Column Generation

https://openreview.net/forum?id=7iH9RgMrzX

Compressor summary: The paper proposes a machine learning approach for predicting optimal dual solutions and an adaptive stabilization technique to improve the convergence rate of column generation.


Faster Sampling via Stochastic Gradient Proximal Sampler

https://openreview.net/forum?id=7gEcbhMqKU

Compressor summary: The paper introduces and analyzes stochastic proximal samplers for non-log-concave distributions, improving upon existing Langevin-based methods in terms of convergence and sample complexity.


Adaptive Text Watermark for Large Language Models

https://openreview.net/forum?id=7emOSb5UfX

Compressor summary: The paper proposes an adaptive watermarking strategy for AI-generated text that balances quality, robustness, and security by adjusting the token distributions and using semantic embedding for output logits scaling.


Learning to Model the World With Language

https://openreview.net/forum?id=7dP6Yq9Uwv

Compressor summary: The paper proposes Dynalang, an agent that learns to understand diverse language, predict future states of the world, and act accordingly, outperforming existing methods on various tasks.


NeuralIndicator: Implicit Surface Reconstruction from Neural Indicator Priors

https://openreview.net/forum?id=7ckuC9C2FZ

Compressor summary: This paper introduces a new method for reconstructing surfaces from unorganized points using neural networks and global shape priors, which improves the reliability and quality of surface reconstruction.


Maestro: Uncovering Low-Rank Structures via Trainable Decomposition

https://openreview.net/forum?id=7bjyambg4x

Compressor summary: Maestro is a framework that trains low-rank DNNs with a novel low-rank ordered decomposition, allowing efficient compression and performance preservation.


Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning

https://openreview.net/forum?id=7bg10Jj3bG

Compressor summary: SDT is a new framework that uses signal temporal logic and Decision Transformer to train safer and more effective offline reinforcement learning policies for complex tasks.


Online Matrix Completion: A Collaborative Approach with Hott Items

https://openreview.net/forum?id=7XZKzQtooN

Compressor summary: The paper proposes two efficient algorithms for online low rank matrix completion recommendation problems and analyzes their performance under certain assumptions.


A Geometric Decomposition of Finite Games: Convergence vs. Recurrence under Exponential Weights

https://openreview.net/forum?id=7RSIGQRT1F

Compressor summary: The authors propose a Riemannian framework based on the Shahshahani metric to decompose games into simpler components called incompressible games, which have constant of motion and Poincaré recurrent properties in EW dynamics.


AttNS: Attention-Inspired Numerical Solving For Limited Data Scenarios

https://openreview.net/forum?id=7RHFdkAkVY

Compressor summary: The AttNS method improves the generalization and robustness of AI-Hybrid numerical solvers for differential equations by incorporating attention mechanisms inspired by ResNet.


Structured Chemistry Reasoning with Large Language Models

https://openreview.net/forum?id=7R3pzxTSlg

Compressor summary: StructChem is a prompting strategy that improves GPT-4's chemical reasoning by providing an effective reasoning structure.


Policy Learning for Balancing Short-Term and Long-Term Rewards

https://openreview.net/forum?id=7Qf1uHTahP

Compressor summary: The paper proposes a framework for learning optimal policies that balance long-term and short-term rewards, even when some long-term outcomes are missing, and shows its effectiveness through experiments.


Switching the Loss Reduces the Cost in Batch Reinforcement Learning

https://openreview.net/forum?id=7PXSc5fURu

Compressor summary: FQI-LOG is a batch RL method that learns near-optimal policies with fewer samples by using log-loss and scaling costs with the optimal achievable cost.
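
A minimal sketch of the loss switch the title refers to, assuming values have been normalized to [0, 1] (e.g., by the optimal achievable cost); the function names are ours, not the paper's:

```python
import numpy as np

def squared_loss(pred, target):
    # The standard fitted-Q-iteration regression loss.
    return np.mean((pred - target) ** 2)

def log_loss(pred, target, eps=1e-12):
    # Binary-cross-entropy-style log-loss on [0, 1]-normalized values:
    # the "switch" that yields savings when the optimal cost is small.
    pred = np.clip(pred, eps, 1 - eps)
    return np.mean(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)))
```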


Double Momentum Method for Lower-Level Constrained Bilevel Optimization

https://openreview.net/forum?id=7OPHCeXcSS

Compressor summary: The paper proposes a new hypergradient method for constrained bilevel optimization without strict assumptions and a single-loop algorithm with a proven convergence rate.


Convergence and Complexity Guarantee for Inexact First-order Riemannian Optimization Algorithms

https://openreview.net/forum?id=7KtFQnF368

Compressor summary: Inexact Riemannian gradient descent (RGD) can efficiently solve constrained nonconvex optimization problems within a general framework of tangential Block Majorization-Minimization (tBMM), with theoretical and experimental advantages over existing methods.


Position: Standardization of Behavioral Use Clauses is Necessary for the Adoption of Responsible Licensing of AI

https://openreview.net/forum?id=7JKVPNEBkU

Compressor summary: The paper examines responsible AI licenses, which are used to manage risks of AI misuse, and suggests standardizing them while allowing some context-specific customizations.


DiffFPR: Diffusion Prior for Oversampled Fourier Phase Retrieval

https://openreview.net/forum?id=7E4c2gyP0R

Compressor summary: The paper proposes DiffFPR, a method that combines an iterative engine and a diffusion model to solve the challenging Fourier phase retrieval problem for multi-channel color images.


Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension

https://openreview.net/forum?id=7DbIyQlfaO

Compressor summary: The paper proposes using local intrinsic dimension (LID) of activations to measure and improve truthfulness in large language models for question answering tasks.


Graphon Mean Field Games with a Representative Player: Analysis and Learning Algorithm

https://openreview.net/forum?id=7C4EQqtb02

Compressor summary: The paper presents a discrete graphon game model that simplifies stochastic games with heterogeneous interactions by using a representative player, and shows how to find and use its equilibrium solution.


Symmetry Induces Structure and Constraint of Learning

https://openreview.net/forum?id=7AF0AMI4AE

Compressor summary: The text discusses the role of symmetry in neural network loss functions and its effects on model learning behavior, leading to different constraints and properties.


The Pitfalls of Next-Token Prediction

https://openreview.net/forum?id=76zq8Wkl6Z

Compressor summary: Next-token prediction models may fail due to autoregressive inference and teacher-forcing, leading to errors in training and in-distribution failure.
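
The train/inference mismatch alluded to here is easy to state in code; the sketch below is a generic illustration in which `model` is a hypothetical next-token callable, not the paper's setup:

```python
def teacher_forced_prefixes(tokens):
    # Training: the model always conditions on the ground-truth prefix.
    return [tokens[:t] for t in range(1, len(tokens))]

def autoregressive_rollout(model, prompt, steps):
    # Inference: the model conditions on its own previous outputs,
    # so an early mistake contaminates every later prediction.
    seq = list(prompt)
    for _ in range(steps):
        seq.append(model(seq))
    return seq
```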


Recovering the Pre-Fine-Tuning Weights of Generative Models

https://openreview.net/forum?id=761UxjOTHB

Compressor summary: Spectral DeTuning is a method that can recover the pre-fine-tuning weights of generative models, posing a new vulnerability to large-scale models like personalized Stable Diffusion and aligned Mistral.


EvoRainbow: Combining Improvements in Evolutionary Reinforcement Learning for Policy Search

https://openreview.net/forum?id=75Hes6Zse4

Compressor summary: The paper explores different ways to combine Evolutionary Algorithms and Reinforcement Learning for policy optimization, evaluates their effectiveness, and proposes new methods that achieve state-of-the-art results on various tasks.


From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers

https://openreview.net/forum?id=72oT4mPLUb

Compressor summary: The paper studies how self-attention models work like Markov chains, and how they learn from single output trajectories, explaining why large language models tend to produce repetitive text.
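
As background, the Markov-chain object that self-attention is related to here can be estimated from a single output trajectory as follows (a textbook estimator, not the paper's construction):

```python
import numpy as np

def empirical_transition_matrix(trajectory, vocab_size):
    # Count bigram transitions along one trajectory, then
    # row-normalize into a first-order Markov chain.
    counts = np.zeros((vocab_size, vocab_size))
    for a, b in zip(trajectory[:-1], trajectory[1:]):
        counts[a, b] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)
```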


Agnostic Sample Compression Schemes for Regression

https://openreview.net/forum?id=71ktaA3ihI

Compressor summary: The paper presents a new sample compression technique for agnostic regression with different losses and explores its limits and open questions.


PPFLOW: Target-Aware Peptide Design with Torsional Flow Matching

https://openreview.net/forum?id=70jplnkLMe

Compressor summary: PPFlow is a novel AI method for designing targeted peptide drugs that considers torsion angles and uses a new protein-peptide binding dataset for training deep learning models.


Algorithmic Stability Unleashed: Generalization Bounds with Unbounded Losses

https://openreview.net/forum?id=6yQ5mIYxjj

Compressor summary: The paper proposes generalization bounds for algorithmic stability with unbounded loss functions and develops new concentration inequalities for subweibull random variables.
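
For context, the standard sub-Weibull tail condition (a textbook definition; the paper's exact parameterization may differ) reads:

```latex
% X is sub-Weibull with tail parameter \theta > 0 if, for some K > 0,
\Pr\big(|X| \ge t\big) \le 2\exp\!\big(-(t/K)^{1/\theta}\big)
\quad \text{for all } t \ge 0.
```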


Leveraging Self-Consistency for Data-Efficient Amortized Bayesian Inference

https://openreview.net/forum?id=6wVlH96oMX

Compressor summary: The paper proposes a method to improve Bayesian inference efficiency by using universal symmetries in joint models and penalizing violations with a self-consistency loss.


Improved Operator Learning by Orthogonal Attention

https://openreview.net/forum?id=6w7zkf9FBR

Compressor summary: The paper proposes a novel neural operator construction method using orthogonal attention, which improves overfitting and data efficiency for modeling PDE solutions.


GeoAB: Towards Realistic Antibody Design and Reliable Affinity Maturation

https://openreview.net/forum?id=6pHP51F55x

Compressor summary: GeoAB is a novel approach for antibody design that addresses issues with structure authenticity and affinity maturation by using a generative geometry initializer and a position refiner, achieving state-of-the-art performance in co-design and mutation effect predictions.


Tackling Prevalent Conditions in Unsupervised Combinatorial Optimization: Cardinality, Minimum, Covering, and More

https://openreview.net/forum?id=6n99bIxb3r

Compressor summary: This paper proposes methods for unsupervised learning in combinatorial optimization by constructing probabilistic objectives and derandomizing complex conditions using theoretical justification, achieving better optimization quality and speed.


SelfVC: Voice Conversion With Iterative Refinement using Self Transformations

https://openreview.net/forum?id=6kMMgmeM2U

Compressor summary: SelfVC uses self-synthesized examples to improve a voice conversion model with entangled speech representations derived from self-supervised learning and speaker verification models, achieving state-of-the-art results in zero-shot voice conversion.


Scalable AI Safety via Doubly-Efficient Debate

https://openreview.net/forum?id=6jmdOTRMIO

Compressor summary: The paper proposes new debate protocols for AI safety that let humans simulate and verify stochastic AI systems with a polynomial number of steps.


Faster Streaming and Scalable Algorithms for Finding Directed Dense Subgraphs in Large Graphs

https://openreview.net/forum?id=6h6ovHcC9G

Compressor summary: The paper presents a fast and memory-efficient algorithm for finding dense subgraphs in data mining and clustering, with similar quality to existing methods but faster speed and applicability to MPC.


Subgraphormer: Unifying Subgraph GNNs and Graph Transformers via Graph Products

https://openreview.net/forum?id=6djDWVTUEq

Compressor summary: Subgraphormer combines Subgraph GNNs and Graph Transformers to improve expressive power, message-passing mechanisms, and aggregation schemes for graph neural networks, using attention and positional encodings based on the product graph.


Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective

https://openreview.net/forum?id=6dKUu2EkZy

Compressor summary: The paper proposes efficient algorithms for inverse reinforcement learning using polynomial samples and runtime, with theoretical guarantees and near-optimal sample complexities.


Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization

https://openreview.net/forum?id=6axTFAlzRV

Compressor summary: FedLESAM is a novel algorithm that improves and speeds up federated learning by estimating global perturbations locally and mitigating sharpness issues in heterogeneous environments.


ViP: A Differentially Private Foundation Model for Computer Vision

https://openreview.net/forum?id=6aKwVmHQI1

Compressor summary: This paper proposes a method to train privacy-preserving vision models using self-supervised learning with differential privacy, achieving comparable performance to non-private models on standard tasks.


Causal Action Influence Aware Counterfactual Data Augmentation

https://openreview.net/forum?id=6Zl9rv6PDx

Compressor summary: CAIAC is a data augmentation method that uses counterfactual reasoning to create synthetic transitions from a fixed dataset, improving offline robot learning by increasing its robustness against distributional shift.


Fair Federated Learning via the Proportional Veto Core

https://openreview.net/forum?id=6Zgjrowepn

Compressor summary: The paper proposes a new fairness notion for federated learning called *PVC-stability*, which is based on rank rather than utility, and presents an algorithm called Rank-Core-Fed that achieves this fairness goal.


Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

https://openreview.net/forum?id=6XH8R7YrSk

Compressor summary: The paper investigates the differences between reward-based (PPO) and reward-free (DPO) methods for aligning large language models with human preferences, and shows that PPO performs better in various tasks.


Multicalibration for Confidence Scoring in LLMs

https://openreview.net/forum?id=6Wauue8pWd

Compressor summary: The paper introduces multicalibration, a method for generating reliable confidence scores for outputs of large language models by calibrating across intersecting data groupings.
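
A minimal sketch of the multicalibration idea: iteratively patch predictions on any (group, score-bin) cell that is miscalibrated beyond a tolerance. This is the classic boosting-style loop from the multicalibration literature, not necessarily the paper's algorithm:

```python
import numpy as np

def multicalibrate(scores, labels, groups, n_bins=10, alpha=0.01, max_iter=100):
    # groups: list of boolean masks, one per (possibly intersecting) grouping.
    s = scores.copy()
    for _ in range(max_iter):
        updated = False
        bins = np.clip((s * n_bins).astype(int), 0, n_bins - 1)
        for g in groups:
            for b in range(n_bins):
                cell = g & (bins == b)
                if not cell.any():
                    continue
                gap = labels[cell].mean() - s[cell].mean()
                if abs(gap) > alpha:          # cell is miscalibrated: patch it
                    s[cell] = np.clip(s[cell] + gap, 0.0, 1.0)
                    updated = True
        if not updated:                       # every cell within tolerance
            break
    return s
```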


Learning in Deep Factor Graphs with Gaussian Belief Propagation

https://openreview.net/forum?id=6WYk5R86Wl

Compressor summary: The authors present an efficient method to train and predict using Gaussian factor graphs, which can handle deep networks and continual learning tasks.


Disparate Impact on Group Accuracy of Linearization for Private Inference

https://openreview.net/forum?id=6VZOONPn8S

Compressor summary: The paper shows that linearizing non-linear activations in neural networks can reduce accuracy for minority groups and suggests a mitigation strategy.


Sample as you Infer: Predictive Coding with Langevin Dynamics

https://openreview.net/forum?id=6VQXLUy4sQ

Compressor summary: Langevin Predictive Coding (LPC) is a new algorithm that improves deep generative model learning using Gaussian noise injection and encoder network initialization, resulting in better sample quality, faster convergence, and comparable or higher performance than VAEs on key metrics.


Position: Do Not Explain Vision Models Without Context

https://openreview.net/forum?id=6UGSDDPkJw

Compressor summary: The paper reviews popular methods of explaining computer vision models and proposes new research directions for using contextual information to improve explanations.


Rethinking the Flat Minima Searching in Federated Learning

https://openreview.net/forum?id=6TM62kpI5c

Compressor summary: The paper proposes FedGF, a method to improve generalization in federated learning by reducing flatness discrepancy and pursuing flatter minima of global models.


Confidence Aware Inverse Constrained Reinforcement Learning

https://openreview.net/forum?id=6TCeizkLJV

Compressor summary: This paper proposes an ICRL method that can estimate constraints from expert demonstrations with a specified confidence level and helps users decide if more data is needed to achieve both reliable constraints and good performance.


Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation

https://openreview.net/forum?id=6PqWuSuWvX

Compressor summary: The text introduces a method to compromise language models via covert malicious finetuning that evades detection by common defense mechanisms.


Contamination-Resilient Anomaly Detection via Adversarial Learning on Partially-Observed Normal and Anomalous Data

https://openreview.net/forum?id=6PTiCmGcNx

Compressor summary: The paper presents a new anomaly detection method that learns from two small datasets with partial observations of normal and anomalous samples, and proves its correctness and effectiveness.


Code as Reward: Empowering Reinforcement Learning with VLMs

https://openreview.net/forum?id=6P88DMUDvH

Compressor summary: Code as Reward (VLM-CaR) is a framework that generates reward functions from Vision-Language Models for faster training of Reinforcement Learning agents.


Hyperbolic Geometric Latent Diffusion Model for Graph Generation

https://openreview.net/forum?id=6OkvBGqW62

Compressor summary: HypDiff is a novel framework that uses hyperbolic geometry to create an anisotropic latent space for graph generation, preserving the original topological properties.


Total Variation Distance Meets Probabilistic Inference

https://openreview.net/forum?id=6OSLjErBhh

Compressor summary: The paper shows how probabilistic inference can help estimate total variation distance between same-structure distributions in Bayes nets using partial couplings.
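
For reference, the distance being estimated has the standard form (background definition, not from the paper):

```latex
% Total variation distance between distributions P and Q on a finite domain:
\mathrm{TV}(P, Q) = \tfrac{1}{2}\sum_{x \in \mathcal{X}} |P(x) - Q(x)|
= \max_{A \subseteq \mathcal{X}} |P(A) - Q(A)|.
```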


Convex and Bilevel Optimization for Neural-Symbolic Inference and Learning

https://openreview.net/forum?id=6NQ77Vj3DT

Compressor summary: The authors develop a fast gradient-based learning method for neural-symbolic systems using convex and bilevel optimization, which improves the inference speed and prediction performance.


How Free is Parameter-Free Stochastic Optimization?

https://openreview.net/forum?id=6L4K5jmSJq

Compressor summary: The paper explores whether there are completely parameter-free methods for stochastic optimization, showing that simple hyperparameter search can achieve this in some cases, while proving a lower bound for convex optimization.


Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks

https://openreview.net/forum?id=6KtXzUUEp4

Compressor summary: The paper presents a method to adjust machine learning models after training to ensure fairness across multiple groups and applies it to various scenarios like image segmentation, classification, and text generation.


Non-Vacuous Generalization Bounds for Large Language Models

https://openreview.net/forum?id=6Kg9p8URlj

Compressor summary: The study derives the first meaningful generalization bounds for pretrained large language models, showing they can discover regularities beyond their training data, and develops a simple parameterization method to achieve this.


Geometry-Aware Instrumental Variable Regression

https://openreview.net/forum?id=6KLNiRdWH6

Compressor summary: The Sinkhorn Method of Moments is an optimal transport-based instrumental variable estimator that improves robustness against data corruption and adversarial attacks by using data-derivative information to capture the geometry of the data manifold.
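
The Sinkhorn primitive behind such estimators is short enough to sketch; the generic entropy-regularized optimal-transport iteration below is background, not the paper's estimator:

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iter=200):
    # cost: (n, m) ground-cost matrix; a, b: marginals summing to 1.
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):            # alternate scaling to match marginals
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :] # entropy-regularized transport plan
```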


Evaluation of Test-Time Adaptation Under Computational Time Constraints

https://openreview.net/forum?id=6FtAXU4ean

Compressor summary: The paper introduces a new evaluation protocol for Test Time Adaptation (TTA) methods that considers their adaptation speed and shows that faster, simpler methods can outperform slower, more sophisticated ones.


Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models

https://openreview.net/forum?id=6FXtu8clyp

Compressor summary: The text discusses the growing use of visually-conditioned language models (VLMs) in various applications, their key design decisions, and a new framework for evaluating them, along with improved VLM models that outperform existing ones.


Monotone Individual Fairness

https://openreview.net/forum?id=6EF0bxcZvT

Compressor summary: The paper proposes online learning algorithms that balance predictive accuracy and individual fairness, improving on existing bounds and reducing oracle calls.


Sub-token ViT Embedding via Stochastic Resonance Transformers

https://openreview.net/forum?id=6DBvBcW770

Compressor summary: The paper proposes a method called Stochastic Resonance Transformer (SRT) that improves ViT models by applying sub-token spatial transformations and aggregating the features, resulting in finer-scale spatial information without fine-tuning.


On Positivity Condition for Causal Inference

https://openreview.net/forum?id=6D0nyemiWk

Compressor summary: The paper explores different graphical and analytical methods to identify causal effects in non-positive and unconfounded observational studies, and proposes a new algorithm based on these approaches.


Stationarity without mean reversion in improper Gaussian processes

https://openreview.net/forum?id=6CV1N7hhpA

Compressor summary: The paper introduces non-positive kernels for GP regression that avoid mean reversion and its issues, while maintaining desirable smoothness and stationarity properties.


Neural NeRF Compression

https://openreview.net/forum?id=6BYD121JFO

Compressor summary: The paper proposes a new method to compress grid-based NeRF models, using neural compression and importance-weighted rate-distortion, achieving better efficiency and quality than previous methods.


An Iterative Min-Min Optimization Method for Sparse Bayesian Learning

https://openreview.net/forum?id=69RewQwWA9

Compressor summary: The paper proposes a new sparse Bayesian learning algorithm that guarantees global convergence and improves performance in various applications.


How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?

https://openreview.net/forum?id=685vj0lC9z

Compressor summary: The study examines how large language models balance honesty and helpfulness in communication based on human preferences and feedback methods.


Towards Efficient Exact Optimization of Language Model Alignment

https://openreview.net/forum?id=66k81s33p3

Compressor summary: The paper proposes a new method, EXO, to optimize language models' policies based on human preferences more efficiently than existing methods like DPO and RL.


Position: An Inner Interpretability Framework for AI Inspired by Lessons from Cognitive Neuroscience

https://openreview.net/forum?id=66KmnMhGU5

Compressor summary: Inner Interpretability is an emerging field that seeks to understand the inner workings of AI systems, facing similar challenges as Cognitive Neuroscience, which it can learn from to develop better mechanistic explanations.


MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data

https://openreview.net/forum?id=65XKBGH5PO

Compressor summary: The authors present a method for reconstructing visual perception from brain activity using only 1 hour of fMRI training data and achieve high-quality results, outperforming previous single-subject approaches.


Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces

https://openreview.net/forum?id=64fdhmogiD

Compressor summary: The paper explores how to achieve stability in reinforcement learning, especially for unbounded state spaces, by using a Lyapunov-based cost-shaping technique and state transformations.


Deep Stochastic Mechanics

https://openreview.net/forum?id=64MQCia06B

Compressor summary: The paper presents a new deep learning method for simulating Schrödinger equation that adapts to low-dimensional structure of wave function, reducing computational complexity and outperforming existing methods.


Quality-Diversity with Limited Resources

https://openreview.net/forum?id=64I29YeQdt

Compressor summary: RefQD is a novel method that improves resource efficiency in Quality-Diversity optimization by decomposing neural networks into representation and decision parts, sharing the representation among decision parts, and addressing mismatch issues.


Online bipartite matching with imperfect advice

https://openreview.net/forum?id=61WtHsVKWF

Compressor summary: The paper explores the trade-offs between competitive ratios and consistency in online bipartite matching algorithms with different arrival models and advice quality.


MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective

https://openreview.net/forum?id=61RlaY9EIn

Compressor summary: The paper presents a novel data transformation framework that uses information theory to protect privacy and preserve utility of machine learning datasets, and shows its performance on different types of data and tasks.


Neural Tangent Kernels Motivate Cross-Covariance Graphs in Neural Networks

https://openreview.net/forum?id=61JD8wp4Id

Compressor summary: The alignment of neural tangent kernels and data eigenvectors can improve convergence and generalization for graph neural networks, especially when using cross-covariance instead of just input covariance.


Tilt and Average : Geometric Adjustment of the Last Layer for Recalibration

https://openreview.net/forum?id=61A1bsVjRg

Compressor summary: The paper proposes a novel algorithm called Tilt and Average that adjusts the final layer weights of neural networks to improve calibration and reliability of predictions.


No Wrong Turns: The Simple Geometry Of Neural Networks Optimization Paths

https://openreview.net/forum?id=60vx5AfM3C

Compressor summary: The study investigates the geometric properties of sampled gradients in neural network optimization, finding predictable and consistent behavior that allows for theoretical guarantees of linear convergence and practical learning rate schedules.


When Will Gradient Regularization Be Harmful?

https://openreview.net/forum?id=60vC1FY0dZ

Compressor summary: This paper shows that gradient regularization can cause problems in adaptive optimization scenarios and proposes three warmup strategies to improve performance, especially for scalable models.


Generative Modeling on Manifolds Through Mixture of Riemannian Diffusion Processes

https://openreview.net/forum?id=60HydCpCMZ

Compressor summary: The Riemannian Diffusion Mixture is a new generative diffusion model on manifolds that uses a mixture of bridge processes and doesn't require heat kernel estimations, enabling better performance and scalability on diverse geometries.


Learning Causal Relations from Subsampled Time Series with Two Time-Slices

https://openreview.net/forum?id=60F0fVbknK

Compressor summary: The paper introduces DHT-CIT, a novel algorithm that learns causal relations from subsampled time series using only two time-slices, instead of full interventions.


BECoTTA: Input-dependent Online Blending of Experts for Continual Test-time Adaptation

https://openreview.net/forum?id=5zXTwX92qv

Compressor summary: BECoTTA is a framework for continuous test-time adaptation that uses Mixture-of-Domain Low-rank Experts to efficiently update model parameters for changing environments while maintaining generalization.


Physics of Language Models: Part 3.1, Knowledge Storage and Extraction

https://openreview.net/forum?id=5x788rqbcj

Compressor summary: This paper shows that large language models need diverse pretraining data for reliable knowledge extraction and suggests rewriting data and adding finetuning data as recommendations.


Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning

https://openreview.net/forum?id=5vZzmCeTYu

Compressor summary: The study compares various off-policy RL techniques on diverse simulation tasks and identifies consistent combinations that lead to robust performance improvements.


Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models

https://openreview.net/forum?id=5uwBzcn885

Compressor summary: Patchscopes is a framework that uses a large language model to explain its own internal representations in natural language, improving on existing interpretation methods and enabling new possibilities.


Full-Atom Peptide Design based on Multi-modal Flow Matching

https://openreview.net/forum?id=5tPB5VXo87

Compressor summary: PepFlow is a new generative model for designing full-atom peptides that target specific protein receptors by learning the structure using rigid backbone frames, side-chain angles, and categorical distributions of residue types.


Efficient Error Certification for Physics-Informed Neural Networks

https://openreview.net/forum?id=5t4V7Q6lmz

Compressor summary: The text introduces a new framework, $\partial$-CROWN, that provides guarantees on the worst-case residual error of Physics-Informed Neural Networks (PINN) over their continuous applicability domain.


Compress Clean Signal from Noisy Raw Image: A Self-Supervised Approach

https://openreview.net/forum?id=5sgkNtexs2

Compressor summary: The paper proposes a novel raw image compression method that selectively compresses the noise-free part, discards real noise, and outperforms existing techniques with significant improvements in rate-distortion balance and bit saving.


Creative Text-to-Audio Generation via Synthesizer Programming

https://openreview.net/forum?id=5pg9YJBaiG

Compressor summary: CTAG is a text-to-audio method that uses a modular sound synthesizer with 78 parameters for easy tweaking and inspection of high-quality, abstract sounds based on natural language prompts.


Improving fine-grained understanding in image-text pre-training

https://openreview.net/forum?id=5nxIRQ8GNa

Compressor summary: SPARC is a method to pretrain multimodal representations from image-text pairs using sparse similarity and fine-grained sequence-wise loss, which improves performance on various tasks and models.


Unveiling the Potential of AI for Nanomaterial Morphology Prediction

https://openreview.net/forum?id=5nuW5iBAJS

Compressor summary: The study uses AI to predict the shape and size of nanoparticles from text descriptions, creating a new dataset and evaluating different models for this task.


Extreme Compression of Large Language Models via Additive Quantization

https://openreview.net/forum?id=5mCaITRTmO

Compressor summary: The paper proposes AQLM, a method that compresses large language models with very low bit counts using learned additive quantization and joint optimization of codebook parameters, achieving high accuracy-to-size trade-off and outperforming existing schemes in extreme compression.
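
The additive-quantization idea can be illustrated with a greedy residual encoder over several codebooks; this simplified greedy variant is our sketch, whereas AQLM learns its codebooks and assignments jointly:

```python
import numpy as np

def additive_quantize(x, codebooks):
    # Encode vector x as one codeword index per codebook, chosen
    # greedily against the running residual.
    residual = x.copy()
    codes = []
    for C in codebooks:                  # C: (K, d) array of codewords
        idx = int(np.argmin(((residual - C) ** 2).sum(axis=1)))
        codes.append(idx)
        residual = residual - C[idx]
    return codes

def additive_dequantize(codes, codebooks):
    # Reconstruction is simply the sum of the selected codewords.
    return sum(C[i] for i, C in zip(codes, codebooks))
```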


Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning

https://openreview.net/forum?id=5lI9wm4dws

Compressor summary: The text introduces a novel doubly robust causal effect estimator under networked interference, using targeted learning and neural networks, which outperforms single nuisance models in theory and practice.


A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization

https://openreview.net/forum?id=5kXNMDpUVF

Compressor summary: The text proposes an adaptive hyperparameter optimization method for differentially private deep learning that achieves state-of-the-art performance on various tasks and privacy levels.


A Minimaximalist Approach to Reinforcement Learning from Human Feedback

https://openreview.net/forum?id=5kVgd2MwMY

Compressor summary: SPO is a simple algorithm for reinforcement learning from human feedback that uses self-play, preference optimization, and minimax winner concept to handle non-Markovian, intransitive, and stochastic preferences efficiently and robustly.


Recurrent Distance Filtering for Graph Representation Learning

https://openreview.net/forum?id=5kGfm3Pa41

Compressor summary: The paper introduces a novel graph neural network architecture that uses deep state-space models to aggregate nodes based on their distances and a diagonal linear RNN to encode the information, outperforming graph transformers with less computation.


Distributed Bilevel Optimization with Communication Compression

https://openreview.net/forum?id=5j7Lq2ASiU

Compressor summary: The text introduces new distributed bilevel optimization algorithms that reduce communication overhead by using compression techniques, while maintaining efficiency and scalability.


Unveiling the Dynamics of Information Interplay in Supervised Learning

https://openreview.net/forum?id=5hfvLBgnNE

Compressor summary: The paper proposes matrix information theory tools to analyze and optimize information interactions in supervised learning processes.


On the Maximal Local Disparity of Fairness-Aware Classifiers

https://openreview.net/forum?id=5cm2jGct2W

Compressor summary: The paper proposes a new fairness metric, MCDP, to measure local disparity in machine learning algorithms and develops optimization algorithms to improve fairness-accuracy trade-offs.


Partial Multi-View Multi-Label Classification via Semantic Invariance Learning and Prototype Modeling

https://openreview.net/forum?id=5ap1MmUqO6

Compressor summary: The paper proposes a method to compress cross-view representation for partial multi-view multi-label learning, enhancing task-relevant information and label correlation learning.


Explorations of Self-Repair in Language Models

https://openreview.net/forum?id=5ZwEifshyo

Compressor summary: Self-repair in large language models occurs when ablating attention heads, but it is imperfect and noisy due to varying mechanisms such as LayerNorm changes and Anti-Erasure.


Robust and Conjugate Gaussian Process Regression

https://openreview.net/forum?id=5WnKLIAX4q

Compressor summary: The paper introduces a new method for robust Gaussian process regression that preserves closed-form conditioning and is efficient in computation.


IW-GAE: Importance weighted group accuracy estimation for improved calibration and model selection in unsupervised domain adaptation

https://openreview.net/forum?id=5WEIVj98Ju

Compressor summary: The text proposes a new method to deal with distribution shifts in unsupervised domain adaptation by using an importance weighted group accuracy estimator for model calibration and selection tasks.


Iterative Search Attribution for Deep Neural Networks

https://openreview.net/forum?id=5ToHnqYxjB

Compressor summary: The paper proposes ISA, a method to enhance the interpretability of deep neural networks by iteratively generating high-quality samples and distinguishing important features during gradient ascent and descent.


Position: Data-driven Discovery with Large Generative Models

https://openreview.net/forum?id=5SpjhZNXtt

Compressor summary: The paper calls for using large generative models to develop automated systems that can discover scientific knowledge from existing datasets without needing more data or experiments, but acknowledges current limitations and suggests integrating tools and user feedback for better results.


Provably Efficient Partially Observable Risk-sensitive Reinforcement Learning with Hindsight Observation

https://openreview.net/forum?id=5S8ukkEQr2

Compressor summary: This paper introduces a new way to analyze regret in risk-sensitive reinforcement learning with hindsight observations, and provides an efficient algorithm that performs well even when the environment is risky or uncertain.


Diversified Batch Selection for Training Acceleration

https://openreview.net/forum?id=5QWKec0eDF

Compressor summary: DivBS is a reference-model-free method that efficiently selects diverse and representative samples for machine learning models by measuring group-wise orthogonalized representativeness, achieving better performance-speedup trade-offs than previous methods.


Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers

https://openreview.net/forum?id=5PqzKxmfag

Compressor summary: The authors propose ITS, a novel inference method for deep neural networks that mimics human perception and enhances robustness to spatially transformed inputs by traversing a sparsified transformation tree during inference.


Leveraging VLM-Based Pipelines to Annotate 3D Objects

https://openreview.net/forum?id=5Pcl5qOOfL

Compressor summary: The text introduces a new algorithm that improves the reliability and efficiency of captioning unlabeled 3D objects using vision language models, and shows its usefulness for conditional inference and visual reasoning ablation.


Detecting and Identifying Selection Structure in Sequential Data

https://openreview.net/forum?id=5PQhu8flSO

Compressor summary: The text discusses the selective inclusion of data points in sequential data, which can distort statistical analysis but also offer insights into hidden generation processes, and proposes a method to identify selection structures and dependencies in such data.


Regression with Multi-Expert Deferral

https://openreview.net/forum?id=5NTTCCO74S

Compressor summary: This paper proposes a new framework called "regression with deferral," which involves asking multiple experts for predictions and introduces new loss functions and consistency guarantees for this problem.


Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency

https://openreview.net/forum?id=5M4Qa9AqY7

Compressor summary: The paper introduces a method called Differentiable Adjacency Test (DAT) to learn causal graphs from large scale data using neural networks, which improves prediction accuracy for interventions.


The Privacy Power of Correlated Noise in Decentralized Learning

https://openreview.net/forum?id=5JrlywYHRi

Compressor summary: Decor is a decentralized learning method that uses randomness seeds and correlated noises to protect users' privacy while maintaining optimal trade-offs between privacy and utility.


Kepler codebook

https://openreview.net/forum?id=5ILo43JIzg

Compressor summary: The paper proposes a new codebook method for learning image representations, inspired by Kepler's Conjecture, which improves generation and reconstruction tasks on various datasets.


Vision Transformers as Probabilistic Expansion from Learngene

https://openreview.net/forum?id=5ExWEazod5

Compressor summary: PEG is a method that samples and initializes Vision Transformers with elastic scales based on a probabilistic mixture approach, preserving their knowledge and adapting to different resource constraints.


Discovering Symmetry Breaking in Physical Systems with Relaxed Group Convolution

https://openreview.net/forum?id=59oXyDTLJv

Compressor summary: The paper presents relaxed group convolutions as a flexible technique to learn asymmetries in data and uncover interpretable symmetry-breaking factors in various physical systems.


Breadth-First Exploration on Adaptive Grid for Reinforcement Learning

https://openreview.net/forum?id=59MYoLghyk

Compressor summary: BEAG is a graph construction method for goal-conditioned RL that efficiently explores subgoals and avoids unattainable ones using adaptive grid refinement.


Learning Optimal Projection for Forecast Reconciliation of Hierarchical Time Series

https://openreview.net/forum?id=55HfvJ6lDB

Compressor summary: The paper proposes a new method for forecasting hierarchical time series that learns the optimal oblique projection from data, which improves accuracy and adaptability compared to existing methods.


SparseTSF: Modeling Long-term Time Series Forecasting with *1k* Parameters

https://openreview.net/forum?id=54NSHO0lFe

Compressor summary: SparseTSF is a lightweight model that simplifies long-term time series forecasting by decoupling periodicity and trend, using fewer than 1k parameters and generalizing well with limited resources.
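
Under our assumed reading of the mechanics (strided per-period subsequences sharing one small linear map), a forecaster of this kind might look like the sketch below; this is not the official implementation:

```python
import numpy as np

def sparse_forecast(history, W, period):
    # Split the series into `period` phase-aligned subsequences (columns),
    # forecast every phase with the same shared linear map W, then
    # re-interleave the phases into a chronological horizon.
    n = len(history) // period * period
    sub = history[-n:].reshape(-1, period)   # rows: periods, cols: phases
    pred = W @ sub                           # W: (horizon_periods, n_periods)
    return pred.reshape(-1)
```

Sharing one W across all phases is what would keep the parameter count in the sub-1k regime the title advertises.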


Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem

https://openreview.net/forum?id=53iSXb1m8w

Compressor summary: This paper investigates the cause of poor transfer in fine-tuning RL models and shows that standard knowledge retention techniques can mitigate it, leading to better performance in challenging environments.


PAGER: Accurate Failure Characterization in Deep Regression Models

https://openreview.net/forum?id=5353dJE9Ek

Compressor summary: PAGER is a framework that uses both uncertainty and non-conformity scores to detect and characterize failures in deep regression models.


Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

https://openreview.net/forum?id=51iwkioZpn

Compressor summary: The paper introduces CPO, a novel method to improve machine translation for moderate-sized large language models by avoiding adequate but not perfect translations, achieving performance comparable to or better than state-of-the-art models.


Pricing with Contextual Elasticity and Heteroscedastic Valuation

https://openreview.net/forum?id=51gXk4BISH

Compressor summary: The authors propose a novel method to model customer demand in online dynamic pricing problems with feature-based price elasticity and show that their Pricing with Perturbation (PwP) algorithm achieves near-optimal regret, while offering insights for practical pricing strategies.


Quantum Implicit Neural Representations

https://openreview.net/forum?id=50vc4HBuKU

Compressor summary: QIREN is a quantum version of Fourier Neural Networks that can better represent high-frequency components in signals, leading to improved performance in various tasks.


Probabilistic Modeling of Interpersonal Coordination Processes

https://openreview.net/forum?id=4zOZ0yKhm6

Compressor summary: The authors propose a probabilistic model that captures coordination between agents as temporal influence and show that it predicts team performance in a virtual search and rescue mission using speech and semantics.


Privacy-Preserving Data Release Leveraging Optimal Transport and Particle Gradient Descent

https://openreview.net/forum?id=4zN9tvZfns

Compressor summary: PrivPGD is a new method for creating private data synthesis of protected tabular datasets, using optimal transport and particle gradient descent, which works better than current methods and can handle extra constraints.


Premise Order Matters in Reasoning with Large Language Models

https://openreview.net/forum?id=4zAHgkiCQg

Compressor summary: Large language models are sensitive to the order of premises in reasoning tasks, which can cause a significant drop in performance if the order does not match the context required for intermediate steps.


Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL

https://openreview.net/forum?id=4ye2I5OelI

Compressor summary: The paper studies how many samples are needed to learn Nash Equilibrium policies in Mean-Field Games using model-based RL with strategic exploration, and proposes a new complexity measure and algorithm that make the problem easier than previously thought.


An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems

https://openreview.net/forum?id=4uTJfGYA2t

Compressor summary: The paper presents a novel ASR framework for dialog systems that adapts to conversational context and user feedback using student-teacher learning, context-aware processing, and contrastive self-supervision with Ohm.


Sample-specific Masks for Visual Reprogramming-based Prompting

https://openreview.net/forum?id=4sikyurTLX

Compressor summary: The paper proposes a new framework called sample-specific multi-channel masks (SMM) to improve visual reprogramming by generating customized masks for each input image instead of using a shared and pre-defined one.


Mean-field Underdamped Langevin Dynamics and its Spacetime Discretization

https://openreview.net/forum?id=4qsduFJDEB

Compressor summary: The N-particle underdamped Langevin algorithm is a new method for optimizing non-linear functionals related to mean-field neural networks and other problems, with improved convergence guarantees.


Online Learning with Bounded Recall

https://openreview.net/forum?id=4pFgOzKF76

Compressor summary: The paper studies online learning algorithms with limited memory, shows that common approaches have high regret, and proposes a better algorithm that depends on the order of past rewards.


$\bf{\Phi}_\textrm{Flow}$: Differentiable Simulations for PyTorch, TensorFlow and Jax

https://openreview.net/forum?id=4oD0tRrUOX

Compressor summary: $\Phi_\textrm{Flow}$ is a Python toolkit that simplifies writing differentiable simulation code across different ML libraries and provides many advanced features for scientific applications.


GroupCover: A Secure, Efficient and Scalable Inference Framework for On-device Model Protection based on TEEs

https://openreview.net/forum?id=4mU6LNMaIu

Compressor summary: The paper proposes GroupCover, a new obfuscation method for DNN models that uses randomization and mutual covering to protect against model-stealing attacks and improves security over existing solutions.


High-dimensional Linear Bandits with Knapsacks

https://openreview.net/forum?id=4lghifYrSU

Compressor summary: The paper proposes an online hard thresholding algorithm for contextual bandits with knapsack constraints, which exploits sparsity to reduce regret in high-dimensional settings.
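
The hard-thresholding step such sparse bandit algorithms build on is a standard one-liner (the operator only, not the paper's full algorithm):

```python
import numpy as np

def hard_threshold(theta, k):
    # Keep the k largest-magnitude coordinates of theta, zero the rest.
    out = np.zeros_like(theta)
    idx = np.argsort(np.abs(theta))[-k:]
    out[idx] = theta[idx]
    return out
```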


Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation

https://openreview.net/forum?id=4jqOV6NlUz

Compressor summary: The paper proposes a new method to measure and improve Retrieval-Augmented Large Language Models' accuracy by generating synthetic exams based on task-specific documents and using Item Response Theory.


Equivariant Frames and the Impossibility of Continuous Canonicalization

https://openreview.net/forum?id=4iy0q0carb

Compressor summary: Weighted frames improve equivariant neural networks by preserving function continuity and avoiding discontinuity issues caused by unweighted frame-averaging.


Generalized Smooth Variational Inequalities: Methods with Adaptive Stepsizes

https://openreview.net/forum?id=4iBJyJeBX5

Compressor summary: The paper explores relaxing assumptions in variational inequality problems, studies structured non-monotone generalized smoothness, and shows convergence results for three methods using adaptive stepsizes.


Unveiling Privacy, Memorization, and Input Curvature Links

https://openreview.net/forum?id=4dxR7awO5n

Compressor summary: This paper explores the relationship between memorization, differential privacy, and input loss curvature in deep neural networks and provides both theoretical and empirical evidence.


SAPG: Split and Aggregate Policy Gradients

https://openreview.net/forum?id=4dOJAfXhNV

Compressor summary: The paper introduces SAPG, a new on-policy RL algorithm that improves performance in large-scale parallelized environments by chunking and importance sampling.


Preventing Model Collapse in Gaussian Process Latent Variable Models

https://openreview.net/forum?id=4byOXWrJay

Compressor summary: The paper proposes a new GPLVM model that improves kernel flexibility and projection noise to avoid model collapse and achieve better latent representations and missing data imputation.


Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization

https://openreview.net/forum?id=4boDu42RtE

Compressor summary: The paper proposes $E^3$-FaceNet, a fast and accurate network for text-to-3D face generation and manipulation that uses a direct mapping from text to 3D visual space and enhances semantic alignment and geometric consistency.


Certifiably Byzantine-Robust Federated Conformal Prediction

https://openreview.net/forum?id=4axAQHwBOE

Compressor summary: Rob-FCP is a novel framework that provides robust federated conformal prediction in Byzantine settings, effectively countering malicious clients and preserving coverage guarantees.


LEVI: Generalizable Fine-tuning via Layer-wise Ensemble of Different Views

https://openreview.net/forum?id=4ZrppmS42b

Compressor summary: LEVI is a novel method to improve fine-tuning generalization by adaptively ensembling pre-trained and task-specific models layer-wise, addressing limitations in both data sources.


Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

https://openreview.net/forum?id=4Zr7T6UrBS

Compressor summary: The paper proposes a new method for offline learning, called Primal Wasserstein DICE, which uses a contrastively learned distance metric to minimize the gap between the learner's and expert's state occupancies.


Detecting Any instruction-to-answer interaction relationship: Universal Instruction-to-Answer Navigator for Med-VQA

https://openreview.net/forum?id=4XxsheIbtn

Compressor summary: Uni-Med is a framework for interpreting medical images using instructions and generating visual explanations, improving Med-VQA accuracy.


Position: Beyond Personhood: Agency, Accountability, and the Limits of Anthropomorphic Ethical Analysis

https://openreview.net/forum?id=4XlGXIh2BB

Compressor summary: The text discusses two contrasting views on the ethical agency of AI and its implications for system design and accountability.


Case-Based or Rule-Based: How Do Transformers Do the Math?

https://openreview.net/forum?id=4Vqr8SRfyX

Compressor summary: The text discusses how large language models struggle with simple math problems like addition, relying on case-based reasoning instead of rule-based reasoning. The authors propose a technique called Rule-Following Fine-Tuning (RFFT) to teach LLMs to use rules and improve their generalization ability.


Coresets for Multiple $\ell_p$ Regression

https://openreview.net/forum?id=4UWjqrMmFp

Compressor summary: The paper presents new coreset constructions for multiple $\ell_p$ regression and related problems with optimal or near-optimal size and approximation guarantees.


Self-Attention through Kernel-Eigen Pair Sparse Variational Gaussian Processes

https://openreview.net/forum?id=4RqG4K5UwL

Compressor summary: KEP-SVGP is a method to estimate uncertainty in self-attention models using asymmetric kernels and reduced complexity.


Deep Fusion: Efficient Network Training via Pre-trained Initializations

https://openreview.net/forum?id=4PuM6iGPPi

Compressor summary: The paper proposes a theoretical framework and an efficient network training approach called Deep Fusion to reduce the cost of training large language models while maintaining performance.


SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

https://openreview.net/forum?id=4PB1RMsUy4

Compressor summary: The authors propose a new spiking mechanism for language tasks that generalizes better than existing methods and reduces the performance gap between spiking neural networks and artificial neural networks.


To the Max: Reinventing Reward in Reinforcement Learning

https://openreview.net/forum?id=4KQ0VwqPg8

Compressor summary: The paper proposes max-reward RL, an approach to learn from rewards that optimizes the maximum reward instead of the cumulative one, and shows its advantages in goal-reaching tasks over standard RL.
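
In symbols, the shift is from a cumulative to a maximum objective; the recursion below is our background paraphrase, not necessarily the paper's exact Bellman operator:

```latex
J_{\mathrm{sum}}(\pi) = \mathbb{E}_\pi\Big[\textstyle\sum_t \gamma^t r_t\Big],
\qquad
J_{\max}(\pi) = \mathbb{E}_\pi\Big[\max_t r_t\Big],
\qquad
V(s) = \max_a \, \mathbb{E}\big[\max\big(r(s,a),\, V(s')\big)\big].
```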


Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection

https://openreview.net/forum?id=4HCi7JGCZk

Compressor summary: The paper proposes a size-invariant evaluation method for salient object detection that improves the performance in detecting objects of different sizes.


VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception

https://openreview.net/forum?id=4G5Dcjcm1s

Compressor summary: The paper introduces VinT-6D, a large dataset for object-in-hand pose estimation using vision, touch, and proprioception, and presents a benchmark method that fuses multi-modal information to improve robotic manipulation.


Aligning Transformers with Weisfeiler-Leman

https://openreview.net/forum?id=4FJJfYjUQR

Compressor summary: The paper proposes improved graph transformer architectures aligned with the Weisfeiler--Leman hierarchy, achieving better expressivity and practicality, as well as studying positional encodings and testing on a large molecule dataset.


From Geometry to Causality- Ricci Curvature and the Reliability of Causal Inference on Networks

https://openreview.net/forum?id=4DAl3IsvlU

Compressor summary: The authors establish a link between network geometry and causal inference, showing that negative curvature can hinder estimating causal parameters and proposing a method using geometric Ricci flow to reduce estimation error in networked data.


Conformal Prediction Sets Improve Human Decision Making

https://openreview.net/forum?id=4CO45y7Mlv

Compressor summary: The study shows that using conformal prediction sets, which provide alternative answers with varying levels of certainty, improves human accuracy in decision making tasks compared to fixed-size prediction sets.
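
For context, this is roughly how split-conformal prediction sets are built from softmax scores (an assumed standard setup, not the study's code):

```python
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Prediction sets that cover the true label w.p. >= 1 - alpha
    under exchangeability."""
    n = len(cal_labels)
    # Nonconformity score: one minus the probability of the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=200)
cal_labels = rng.integers(0, 5, size=200)
test_probs = rng.dirichlet(np.ones(5), size=3)
print(conformal_sets(cal_probs, cal_labels, test_probs))
```

Note how the set size adapts to the model's uncertainty: confident predictions yield small sets, which is the property the study links to better human decisions.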


PerceptAnon: Exploring the Human Perception of Image Anonymization Beyond Pseudonymization for GDPR

https://openreview.net/forum?id=4BWCecFEcQ

Compressor summary: The text proposes PerceptAnon, a learning-based metric that evaluates the privacy of anonymized images based on human perception of anonymity and contextual backgrounds, and introduces a curated dataset for assessing image anonymization.


Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making

https://openreview.net/forum?id=4BIOZSz7zU

Compressor summary: This paper investigates non-Markovian fairness, where multiple stakeholders are affected by sequential decision making processes that depend on history and need to be assessed at different time points.


A Probabilistic Approach to Learning the Degree of Equivariance in Steerable CNNs

https://openreview.net/forum?id=49vHLSxjzy

Compressor summary: The paper proposes a probabilistic method to learn the degree of equivariance in steerable convolutional neural networks, which model geometric symmetries and can handle mixed symmetries.


Eluder-based Regret for Stochastic Contextual MDPs

https://openreview.net/forum?id=47jMS97wJX

Compressor summary: E-UC$^3$RL is an efficient, rate-optimal algorithm for regret minimization in stochastic CMDPs under minimal assumptions in a general offline function approximation setting.


Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

https://openreview.net/forum?id=47ahBl70xb

Compressor summary: This paper explores the expressive power of deep neural networks combining linear RNNs and MLPs, showing that using complex numbers in the recurrence can improve information storage and help with long-range reasoning tasks.


Accelerating Look-ahead in Bayesian Optimization: Multilevel Monte Carlo is All you Need

https://openreview.net/forum?id=46vXhZn7lN

Compressor summary: The paper proposes using multilevel Monte Carlo to improve nested Bayesian optimization methods, achieving better convergence rates and avoiding smoothness assumptions.


Perturb-and-Project: Differentially Private Similarities and Marginals

https://openreview.net/forum?id=45HNimd4YI

Compressor summary: The text describes new efficient algorithms for private data release, especially for cosine similarities and high-dimensional marginal queries, with improved guarantees for sparse datasets and theoretical explanation for their effectiveness.


Position: Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized

https://openreview.net/forum?id=44qxX6Ty6F

Compressor summary: This paper proposes using randomness in machine learning algorithms to fairly distribute scarce resources, considering individual claims and interests.


Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations

https://openreview.net/forum?id=43HZG9zwaj

Compressor summary: Diffusion tempering is a novel technique that improves gradient-based optimization of parameters in ordinary differential equations by reducing noise in probabilistic numerical methods.


InfoNet: Neural Estimation of Mutual Information without Test-Time Optimization

https://openreview.net/forum?id=40hCy8n5XH

Compressor summary: The paper introduces InfoNet, a neural network that efficiently estimates mutual information between data streams using an attention mechanism and modern deep learning infrastructure, avoiding costly per-sample test-time optimization and enabling real-time applications.


Generalization Analysis of Deep Non-linear Matrix Completion

https://openreview.net/forum?id=40foON48am

Compressor summary: The paper studies matrix completion with constraints and provides sample complexity bounds, a new weighted trace norm, and a non-linear model (FRMC) that improves performance on real data.


Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation

https://openreview.net/forum?id=3xPMW9JURD

Compressor summary: The paper presents a new framework for learning neural network (NN) controllers with Lyapunov stability guarantees using fast empirical falsification and strategic regularizations, without relying on expensive solvers for stability verification.


ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

https://openreview.net/forum?id=3umNqxjFad

Compressor summary: The paper presents a modified A3C algorithm that uses ReLU, spectral normalization, dropout, and Thompson sampling to improve approximate Bayesian inference for deep reinforcement learning.


Cross-Domain Policy Adaptation by Capturing Representation Mismatch

https://openreview.net/forum?id=3uPSQmjXzd

Compressor summary: Key points: - The paper proposes a representation learning method to deal with dynamics mismatch in RL - The method measures and penalizes representation deviations from the source domain as a reward signal - The method shows strong performance on various tasks with kinematic and morphology mismatch Summary: The paper presents a novel decoupled representation learning approach for RL that adapts to dynamics mismatch by using representation deviation as a reward penalty, and demonstrates its effectiveness on different tasks.


CHEMREASONER: Heuristic Search over a Large Language Model’s Knowledge Space using Quantum-Chemical Feedback

https://openreview.net/forum?id=3tJDnEszco

Compressor summary: The authors present an AI-guided framework that combines linguistic reasoning with quantum-chemistry based feedback to discover new and efficient catalysts for sustainable chemical processes.


Biharmonic Distance of Graphs and its Higher-Order Variants: Theoretical Properties with Applications to Centrality and Clustering

https://openreview.net/forum?id=3pxMIjB9QK

Compressor summary: The paper introduces and studies the biharmonic distance, a measure of edge importance for graph topology, and develops algorithms based on it.


Improved Generalization of Weight Space Networks via Augmentations

https://openreview.net/forum?id=3o7G6tIo4X

Compressor summary: The paper analyzes overfitting in deep weight space models and proposes a MixUp method for data augmentation to improve performance in classification and contrastive learning tasks.


Regression Learning with Limited Observations of Multivariate Outcomes and Features

https://openreview.net/forum?id=3nlBesNxcm

Compressor summary: The paper proposes efficient algorithms for multivariate linear regression with missing data, using $L_2$ and $L_1$ loss functions and penalties, and provides rigorous error bounds and experimental results.


Prompting a Pretrained Transformer Can Be a Universal Approximator

https://openreview.net/forum?id=3mQ6ZKTSQl

Compressor summary: The paper shows that prompting and prefix-tuning can universally approximate sequence-to-sequence functions with smaller pretrained models, especially using attention mechanisms.


Position: Data Authenticity, Consent, & Provenance for AI are all broken: what will it take to fix them?

https://openreview.net/forum?id=3hSTecKy1b

Compressor summary: The text discusses challenges in training foundation models due to data collection issues and suggests using universal data provenance standards for more ethical and trustworthy development.


How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers

https://openreview.net/forum?id=3eHNvPHL9Z

Compressor summary: The paper investigates why over-parameterized neural networks, trained to perfectly fit the data, generalize well and shows that it is because a flat prior over the network parameters induces a rich prior over the functions, leading to simpler functions that require fewer parameters.


DoRA: Weight-Decomposed Low-Rank Adaptation

https://openreview.net/forum?id=3d5CIRG1n2

Compressor summary: The paper proposes DoRA, a method that combines LoRA and weight decomposition to improve fine-tuning of language models without increasing inference costs.
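
A minimal numpy sketch of the decomposition idea (illustrative shapes; the per-row normalization axis here is a simplification of the paper's column-wise magnitude):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2
W0 = rng.normal(size=(d_out, d_in))            # frozen pretrained weight
B = np.zeros((d_out, r))                       # LoRA factors (B starts at 0)
A = rng.normal(size=(r, d_in))
m = np.linalg.norm(W0, axis=1, keepdims=True)  # trainable magnitude

V = W0 + B @ A                                  # direction updated via LoRA
W = m * V / np.linalg.norm(V, axis=1, keepdims=True)
assert np.allclose(W, W0)                       # at init, the model is unchanged
```

Only m, B, and A are trained, so the parameter count stays LoRA-sized while magnitude and direction can adapt separately.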


Statistically Optimal Generative Modeling with Maximum Deviation from the Empirical Distribution

https://openreview.net/forum?id=3ash2ksk1r

Compressor summary: The paper analyzes Wasserstein GANs that generate diverse examples by deviating maximally from the empirical distribution rather than replicating it, while maintaining statistical optimality.


Making Old Things New: A Unified Algorithm for Differentially Private Clustering

https://openreview.net/forum?id=3ajK5xplDL

Compressor summary: The paper presents a modified 20-year-old algorithm that can perform private clustering under various privacy models, including a new one for continual observations.


DySLIM: Dynamics Stable Learning by Invariant Measure for Chaotic Systems

https://openreview.net/forum?id=3abgRKnK1W

Compressor summary: The authors propose a new learning framework for dissipative chaotic systems that targets the invariant measure and dynamics, resulting in better point-wise tracking and long-term statistical accuracy.


Auto-Linear Phenomenon in Subsurface Imaging

https://openreview.net/forum?id=3ZM8MXGFRA

Compressor summary: The paper presents Auto-Linear, a self-supervised method for subsurface imaging that achieves better performance, smaller model size, and stronger generalization than existing methods.


In-Context Language Learning: Architectures and Algorithms

https://openreview.net/forum?id=3Z9CRr5srL

Compressor summary: The paper studies in-context learning (ICL) of neural language models using regular languages generated by random finite automata, and shows that Transformers outperform other sequence models by computing in-context n-gram statistics with specialized attention heads.


Dynamic Correlation Clustering in Sublinear Update Time

https://openreview.net/forum?id=3YG55Lbcnr

Compressor summary: The paper proposes a correlation clustering algorithm for dynamic vertex streams that approximates the optimal solution with low update time and performs well on real data.


FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

https://openreview.net/forum?id=3XG69ZmfsB

Compressor summary: FreeBind is a method to enhance multimodal representation spaces by integrating knowledge from extra expert spaces using "space bonds", leading to improved performance on audio-image-text tasks.


PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs

https://openreview.net/forum?id=3WCvnkHnxV

Compressor summary: PrE-Text generates differentially private synthetic text data, enabling more efficient and privacy-friendly training of small models and finetuning large language models on user devices.


Provably Better Explanations with Optimized Aggregation of Feature Attributions

https://openreview.net/forum?id=3VnSgdget6

Compressor summary: The authors propose a novel method to combine multiple feature attribution techniques to improve the quality of explanations for opaque machine learning models, achieving better robustness and faithfulness to the model behavior.


Borda Regret Minimization for Generalized Linear Dueling Bandits

https://openreview.net/forum?id=3Tzdpjc59k

Compressor summary: The paper proposes new dueling bandit models and algorithms for minimizing Borda regret in recommendation systems and ranking, achieving optimal regret bounds.


Position: Embracing Negative Results in Machine Learning

https://openreview.net/forum?id=3RXAiU7sss

Compressor summary: The paper advocates for publishing "negative" results in machine learning research to improve the scientific output and address issues caused by focusing solely on predictive performance.


MALIBO: Meta-learning for Likelihood-free Bayesian Optimization

https://openreview.net/forum?id=3QM5SWfeov

Compressor summary: Our novel meta-learning BO approach learns query utility, models task uncertainty, and adapts robustly to new tasks without relying on surrogate models.


Differentiable Combinatorial Scheduling at Scale

https://openreview.net/forum?id=3Pq6uI1MTE

Compressor summary: The paper proposes a novel differentiable combinatorial scheduling framework that uses Gumbel-Softmax sampling to handle resource-constrained scheduling problems efficiently and effectively, outperforming existing solvers.
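
The relaxation at the heart of this framework is standard Gumbel-Softmax sampling; a minimal sketch of that generic technique follows (not the paper's scheduler itself):

```python
import numpy as np

def gumbel_softmax(logits, tau=0.5, rng=np.random.default_rng(0)):
    """Differentiable, approximately one-hot sample over discrete choices."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max())                                # stable softmax
    return y / y.sum()

print(np.round(gumbel_softmax(np.array([2.0, 1.0, 0.1])), 3))
```

Lowering tau sharpens the sample toward a hard one-hot scheduling decision while keeping gradients usable during training.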


Multimodal Prototyping for cancer survival prediction

https://openreview.net/forum?id=3MfvxH3Gia

Compressor summary: The authors propose a novel multimodal survival method that compresses gigapixel histology images and transcriptomic profiles using morphological and pathway prototypes, enabling more efficient and interpretable patient prognostication and stratification.


How Flawed Is ECE? An Analysis via Logit Smoothing

https://openreview.net/forum?id=3McL91pE6x

Compressor summary: The paper introduces Logit-Smoothed ECE, a continuous and easy-to-estimate metric for measuring calibration, and compares it with existing methods like binned ECE on image classification models.
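
For reference, the standard binned ECE that the paper critiques is computed as follows (the proposed metric instead smooths logits before measuring miscalibration):

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=15):
    """Expected calibration error: bin-weighted |accuracy - confidence| gap."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
corr = rng.uniform(size=1000) < conf          # well-calibrated synthetic model
print(f"ECE = {binned_ece(conf, corr):.4f}")  # small, but not exactly zero
```

The discontinuity at bin boundaries is one source of the flaws the paper analyzes.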


Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

https://openreview.net/forum?id=3MW8GKNyzI

Compressor summary: Chatbot Arena is an open platform for evaluating large language models based on human preferences using pairwise comparisons and crowdsourcing, with a strong foundation of credibility and wide recognition.


Expand-and-Cluster: Parameter Recovery of Neural Networks

https://openreview.net/forum?id=3MIuPRJYwf

Compressor summary: The paper presents a method to identify neural network weights using imitation and clustering of overparameterised networks with different activation functions.


Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

https://openreview.net/forum?id=3KxPo62PYn

Compressor summary: The paper proposes adaptive models that can change their shape during training to reduce the required compute and outperform static models.


Local Causal Structure Learning in the Presence of Latent Variables

https://openreview.net/forum?id=3KMMPxrAk5

Compressor summary: The paper proposes a method to identify local causal structures from observational data that can handle latent variables, using m-separation and V-structures, with theoretical consistency results and experimental validation.


Learning to Reach Goals via Diffusion

https://openreview.net/forum?id=3JhmHCVPa8

Compressor summary: The text proposes Merlin, a novel goal-conditioned reinforcement learning method based on denoising diffusion models, which improves performance and efficiency in offline goal-reaching tasks.


Temporal Spiking Neural Networks with Synaptic Delay for Graph Reasoning

https://openreview.net/forum?id=3FeYlKIPr3

Compressor summary: Spiking neural networks with synaptic delay and temporal coding can perform human-like graph reasoning efficiently and with low energy consumption.


Neuroexplicit Diffusion Models for Inpainting of Optical Flow Fields

https://openreview.net/forum?id=3FKEtlX4aM

Compressor summary: The paper proposes a hybrid approach combining PDE-based and deep learning models to improve inpainting of optical flow fields, achieving better performance than existing methods.


Logistic Variational Bayes Revisited

https://openreview.net/forum?id=3FBO41d4T2

Compressor summary: The paper introduces a new bound on the expectation of the softplus function that improves variational logistic regression and Gaussian process classification: it is tighter and faster, and requires neither an extended variational family nor extra parameters.


Rethinking Generative Large Language Model Evaluation for Semantic Comprehension

https://openreview.net/forum?id=3Cp042s1Nc

Compressor summary: The paper proposes a new rating system for large language models using a competitive format and a real-world questions benchmark, addressing issues with existing evaluation methods like MCQA.


Single-Trajectory Distributionally Robust Reinforcement Learning

https://openreview.net/forum?id=3B6vmW2L80

Compressor summary: The paper proposes a new model-free distributionally robust Q-learning algorithm (DRQ) that learns optimal policies from single trajectories with asymptotic convergence guarantees, achieving better robustness and sample complexity than existing methods.


Offline Multi-Objective Optimization

https://openreview.net/forum?id=3AuoStfUIH

Compressor summary: The paper introduces a benchmark for offline multi-objective optimization (MOO) and analyzes how existing methods can be adapted to this challenging problem, with the goal of advancing the field.


When Do Skills Help Reinforcement Learning? A Theoretical Analysis of Temporal Abstractions

https://openreview.net/forum?id=39UqOkTjFn

Compressor summary: The paper characterizes the utility of deterministic skills in sparse-reward environments with finite action spaces, showing that they are more beneficial for exploration than learning and that unexpressive skills may worsen performance.


Explaining Probabilistic Models with Distributional Values

https://openreview.net/forum?id=37xFIeYgE0

Compressor summary: The paper proposes distributional values, a new way to explain probabilistic machine learning models using cooperative game theory, by tracking changes in the model output for different actions or strategies.


A Nearly Optimal Single Loop Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness

https://openreview.net/forum?id=36rWa8zVkh

Compressor summary: The paper proposes SLIP, a single-loop algorithm for nonconvex bilevel optimization, which improves upon existing nested loop methods and achieves near-optimal complexity with no mean-square smoothness assumptions.


Projection-Free Variance Reduction Methods for Stochastic Constrained Multi-Level Compositional Optimization

https://openreview.net/forum?id=36jWuAmGRC

Compressor summary: The paper proposes new projection-free algorithms for stochastic constrained multi-level optimization with improved complexities and applicability to various criteria and function types.


Learning a Diffusion Model Policy from Rewards via Q-Score Matching

https://openreview.net/forum?id=35ahHydjXo

Compressor summary: The paper introduces Q-score matching, a new off-policy reinforcement learning algorithm that leverages the score-based structure of diffusion model policies in actor-critic settings for better exploration in continuous domains.


REMEDI: Corrective Transformations for Improved Neural Entropy Estimation

https://openreview.net/forum?id=321GwKMtxO

Compressor summary: REMEDI is a novel method for estimating information theoretic quantities like differential entropy with improved accuracy on synthetic and natural data, and can be extended to Information Bottleneck and generative modeling tasks.


Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models

https://openreview.net/forum?id=30waYPIZUA

Compressor summary: The authors develop a theory to understand and improve deep transformer models, enabling them to perform better across various tasks and datasets.


Beyond the Federation: Topology-aware Federated Learning for Generalization to Unseen Clients

https://openreview.net/forum?id=2zLt2Odckx

Compressor summary: TFL is a method that uses a graph of client relationships to train models that remain robust to unseen clients' data in distributed settings.


Hybrid Inverse Reinforcement Learning

https://openreview.net/forum?id=2zI2scD2Iz

Compressor summary: The paper proposes using hybrid reinforcement learning to reduce unnecessary exploration in imitation learning with inverse reinforcement learning, leading to better sample efficiency.


Offline Training of Language Model Agents with Functions as Learnable Weights

https://openreview.net/forum?id=2xbkWiEuR1

Compressor summary: The paper introduces AgentOptimizer, a novel method to train language models as agents without modifying their weights, improving their performance on various tasks.


The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks

https://openreview.net/forum?id=2xLyc5TkFl

Compressor summary: This paper investigates conformal prediction's uncertainty in adversarially trained models and proposes a new adversarial training method to improve predictive uncertainty.


Switched Flow Matching: Eliminating Singularities via Switching ODEs

https://openreview.net/forum?id=2ulUrcOZ64

Compressor summary: The paper proposes Switched FM, a method that solves singularity problems in continuous-time generative models by switching neural ODEs, improving sampling efficiency and compatibility with advanced techniques.


Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining

https://openreview.net/forum?id=2rPoTgEmjV

Compressor summary: The paper compares theoretical aspects of two generative self-supervised learning paradigms, autoregressive and masked, and proposes new objectives to improve their performance in classification and content generation tasks.


The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling

https://openreview.net/forum?id=2pYTCy4GUV

Compressor summary: Skip-Tuning is a training-free method that improves diffusion probabilistic models' performance and image quality by adjusting skip connections in UNet architectures.


Stochastic Bandits with ReLU Neural Networks

https://openreview.net/forum?id=2hidpjUPvV

Compressor summary: The paper proposes an algorithm that uses one-layer ReLU neural networks to solve stochastic bandit problems with near-optimal regret by exploiting their piecewise linear structure and transforming the problem into a linear bandit.


New Sample Complexity Bounds for Sample Average Approximation in Heavy-Tailed Stochastic Programming

https://openreview.net/forum?id=2hWd4CVhXz

Compressor summary: The paper explores how sample average approximation (SAA) can solve convex stochastic programming problems with less computational complexity.


Kernel-Based Evaluation of Conditional Biological Sequence Models

https://openreview.net/forum?id=2dlmcTXfcY

Compressor summary: The paper introduces kernel-based tools using ACMMD to evaluate and tune conditional sequence models in computational biology, such as ProteinMPNN.


Do Topological Characteristics Help in Knowledge Distillation?

https://openreview.net/forum?id=2dEH0u8w0b

Compressor summary: TopKD is a novel knowledge distillation method that transfers global topology information from larger to smaller networks, using persistence diagrams to capture geometric structures in latent spaces.


PDHG-Unrolled Learning-to-Optimize Method for Large-Scale Linear Programming

https://openreview.net/forum?id=2cXzNDe614

Compressor summary: The paper proposes a new neural network architecture called PDHG-Net that combines first-order methods and learning to optimize to solve large-scale linear programming problems faster than existing methods.


Editing Partially Observable Networks via Graph Diffusion Models

https://openreview.net/forum?id=2cEhQ4vtTf

Compressor summary: SGDM is a graph generative framework that improves network quality by correcting corruptions, inferring missing nodes and edges, and performing conditional generation tasks.


Towards Efficient Training and Evaluation of Robust Models against $l_0$ Bounded Adversarial Perturbations

https://openreview.net/forum?id=2bUFIsg2f5

Compressor summary: The paper proposes a new white-box attack method for generating sparse, $l_0$-bounded adversarial perturbations and shows that adversarial training can improve the robustness of models against such attacks.


Prompt Sketching for Large Language Models

https://openreview.net/forum?id=2Yu5FWdzde

Compressor summary: Prompt sketching is a new way to communicate with large language models by providing intermediate instructions during text generation, leading to better results and more control over the process.


Revisiting the Power of Prompt for Visual Tuning

https://openreview.net/forum?id=2Y93PtAqCl

Compressor summary: The study proposes a method to improve visual prompt tuning by initializing prompts with downstream token prototypes and optimizing token construction, achieving better performance and adaptability for downstream tasks.


Online conformal prediction with decaying step sizes

https://openreview.net/forum?id=2XkRIijUKw

Compressor summary: The method tracks a distribution's quantile online, with a retrospective coverage guarantee and improved practical behavior when the distribution is stable.
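
A minimal sketch of the underlying quantile-tracking update with a decaying step size (the t**-0.6 schedule is an assumed example; see the paper for exact rates):

```python
import numpy as np

alpha, q = 0.1, 0.0
rng = np.random.default_rng(0)
miss = 0
for t in range(1, 10_001):
    s = rng.exponential()                 # today's nonconformity score
    err = float(s > q)                    # 1 if the prediction set missed
    miss += err
    eta = 1.0 * t ** -0.6                 # decaying step size
    q += eta * (err - alpha)              # pinball-loss gradient step
print(f"empirical miscoverage: {miss / 10_000:.3f} (target {alpha})")
```

The quantile moves up after each miss and down otherwise; the decaying step lets the iterate settle when the score distribution is stable, rather than oscillating forever as with a fixed step.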


Implicit Representations via Operator Learning

https://openreview.net/forum?id=2W3KUAaZgO

Compressor summary: The paper proposes Operator INR, an alternative to Implicit Neural Representations that uses integral transforms and convolutions for better performance in compression, synthesis, and data understanding tasks.


Explaining Graph Neural Networks via Structure-aware Interaction Index

https://openreview.net/forum?id=2T00oYk54P

Compressor summary: The Myerson-Taylor interaction index is a new tool for explaining how graph neural networks work by considering both node importance and graph structure, outperforming existing methods in experiments.


A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts

https://openreview.net/forum?id=2Sl0lPF6ka

Compressor summary: The paper analyzes the convergence behavior of a classification model called Mixture-of-experts with softmax gating and proposes modified gating functions to improve its performance.


Human Alignment of Large Language Models through Online Preference Optimisation

https://openreview.net/forum?id=2RQqg2Y7Y6

Compressor summary: The paper shows that two alignment methods are equivalent, introduces a new method that combines them, and tests it on a summarization task.


Efficient PAC Learnability of Dynamical Systems Over Multilayer Networks

https://openreview.net/forum?id=2PVjIQdq7N

Compressor summary: This paper proposes an efficient algorithm to learn the behavior of unknown dynamical systems over multilayer networks using few training examples, and analyzes the model complexity.


Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models

https://openreview.net/forum?id=2P6GVfSrfZ

Compressor summary: The text proposes a method called AurA to reduce toxicity in large language models by adjusting neuron activation levels based on their ability to discriminate toxic sentences, achieving significant reduction in toxicity while preserving common-sense abilities.


UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation Learning

https://openreview.net/forum?id=2NfpFwJfKu

Compressor summary: The paper introduces UniCorn, a new pre-training framework for molecular foundation models that combines three existing methods and achieves state-of-the-art performance on various molecular tasks.


Diffusion Models Demand Contrastive Guidance for Adversarial Purification to Advance

https://openreview.net/forum?id=2NUGeV64y2

Compressor summary: The authors propose a new method to defend against adversarial attacks using diffusion models guided by contrastive loss, which significantly improves performance on various datasets and classifiers.


Breaking through the learning plateaus of in-context learning in Transformer

https://openreview.net/forum?id=2K87GFLYWz

Compressor summary: The study explores the causes and solutions for learning plateaus in Transformers' in-context learning, improving their performance with three strategies.


Geometric Active Exploration in Markov Decision Processes: the Benefit of Abstraction

https://openreview.net/forum?id=2JYOxcGlRe

Compressor summary: The text discusses using a combination of Reinforcement Learning and MDP homomorphisms to design efficient experiments over complex systems with natural geometries.


Stochastic Localization via Iterative Posterior Sampling

https://openreview.net/forum?id=2Gr5wZR6uc

Compressor summary: The text describes a new method, SLIPS, for sampling from unnormalized target distributions using stochastic localization techniques.


A Differentiable Partially Observable Generalized Linear Model with Forward-Backward Message Passing

https://openreview.net/forum?id=2FKzbEE24s

Compressor summary: The text proposes a new differentiable partially observable generalized linear model that improves variational inference for learning neural connectivity from spike trains.


Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation

https://openreview.net/forum?id=2FHWFG5ahw

Compressor summary: AHAC is a model-based reinforcement learning algorithm that adapts the simulation horizon to avoid stiff dynamics, achieving better performance and efficiency than model-free methods in continuous control tasks.


On the Duality Between Sharpness-Aware Minimization and Adversarial Training

https://openreview.net/forum?id=2B2U5kkGUA

Compressor summary: The paper explores how Sharpness-Aware Minimization, which modifies model weights instead of input samples, can improve adversarial robustness without sacrificing clean accuracy.


HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning

https://openreview.net/forum?id=2Asakozn3Z

Compressor summary: HarmoDT is a novel offline multi-task reinforcement learning method that uses meta-learning to find an optimal harmony subspace of parameters for each task, enhancing the performance of a unified policy.


Sliced-Wasserstein Estimation with Spherical Harmonics as Control Variates

https://openreview.net/forum?id=28SEr5iFyT

Compressor summary: The paper proposes a new Monte Carlo method, SHCV, that uses spherical harmonics as control variates to approximate the Sliced-Wasserstein distance with improved convergence rate and theoretical properties.
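
For context, the plain Monte Carlo sliced-Wasserstein estimator that SHCV variance-reduces looks like this (the spherical-harmonics control variates themselves are omitted):

```python
import numpy as np

def sliced_w2(X, Y, n_proj=500, rng=np.random.default_rng(0)):
    """Average 1D Wasserstein-2 distance over random projection directions."""
    d = X.shape[1]
    theta = rng.normal(size=(n_proj, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform directions
    # For equal sample sizes, 1D W2^2 is the mean squared gap of sorted samples.
    px = np.sort(X @ theta.T, axis=0)
    py = np.sort(Y @ theta.T, axis=0)
    return np.sqrt(np.mean((px - py) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))
Y = rng.normal(loc=1.0, size=(1000, 3))
print(f"SW2 ~ {sliced_w2(X, Y):.3f}")
```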


ESM All-Atom: Multi-Scale Protein Language Model for Unified Molecular Modeling

https://openreview.net/forum?id=283cGgWfM2

Compressor summary: ESM-AA is a novel approach for protein language modeling that can handle both atom and residue levels, improving performance in protein-molecule tasks.


Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient

https://openreview.net/forum?id=24zMewdzyJ

Compressor summary: The paper presents a new policy optimization method that combines risk-neutral algorithms with predicted CVaR contributions using reweighting.


Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms

https://openreview.net/forum?id=23tMOWscus

Compressor summary: The paper introduces new methods for estimating the feasible reward set of an expert agent from offline datasets in inverse reinforcement learning, considering the limitations of the data coverage.


Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation

https://openreview.net/forum?id=1zFkjbTgwC

Compressor summary: Quasi-Givens Orthogonal Fine-Tuning (qGOFT) is a method that improves parameter efficiency and downstream adaptation by using Givens rotations for orthogonal transformations in the parameter space.
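
The building block is the classical Givens rotation, which rotates only two coordinates at a time so that a product of such factors stays orthogonal at O(1) parameters per factor; a minimal sketch (qGOFT's composition strategy is not reproduced):

```python
import numpy as np

def givens(n, i, j, theta):
    """n x n rotation acting only on coordinates i and j."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i], G[j, j] = c, c
    G[i, j], G[j, i] = -s, s
    return G

G = givens(4, 0, 2, 0.3) @ givens(4, 1, 3, -0.7)
assert np.allclose(G @ G.T, np.eye(4))  # product of rotations is orthogonal
```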


Non-convex Stochastic Composite Optimization with Polyak Momentum

https://openreview.net/forum?id=1ySQI9LE4w

Compressor summary: The paper proves that the stochastic proximal gradient method with Polyak momentum works well for non-convex problems regardless of batch size and shows its benefits in composite optimization settings.


Retrieval-Augmented Score Distillation for Text-to-3D Generation

https://openreview.net/forum?id=1xKgDANODx

Compressor summary: ReDream uses retrieval to improve text-to-3D generation by incorporating 3D geometry and adapting the diffusion model's prior.


Aligned Objective for Soft-Pseudo-Label Generation in Supervised Learning

https://openreview.net/forum?id=1wzdf6NjHd

Compressor summary: The text describes a new framework that trains deep neural networks with soft pseudo-labels using a meta-network-parameterized objective function, which improves performance and adapts to different tasks.


EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction

https://openreview.net/forum?id=1vGN3CSxVs

Compressor summary: EquiPocket is an E(3)-equivariant Graph Neural Network that predicts binding sites of target proteins more accurately than existing deep-learning methods by addressing their limitations and using a dense attention output layer.


Not all distributional shifts are equal: Fine-grained robust conformal inference

https://openreview.net/forum?id=1v1oFF3aw0

Compressor summary: The text introduces a method for measuring uncertainty in predictive models when there are changes in the covariate and conditional relationships between the outcome and covariates.


Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models

https://openreview.net/forum?id=1tRLxQzdep

Compressor summary: The text introduces Pruner-Zero, an automatic framework for searching symbolic pruning metrics using genetic programming, which achieves better performance than existing post-training pruning methods for Large Language Models without retraining.


DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

https://openreview.net/forum?id=1sesUtOIH5

Compressor summary: The paper proposes DecisionNCE, a universal objective that learns multimodal representations from image sequences and language instructions for autonomous robots, improving task progressions, temporal consistency, and instruction grounding.


Multi-Track Message Passing: Tackling Oversmoothing and Oversquashing in Graph Learning via Preventing Heterophily Mixing

https://openreview.net/forum?id=1sRuv4cnuZ

Compressor summary: The paper proposes a multi-track graph convolutional network that prevents heterophilic mixing and improves performance on several graph datasets by separating messages according to their category semantics.


ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization

https://openreview.net/forum?id=1puvYh729M

Compressor summary: The paper proposes ACE, a causality-aware actor-critic method with an entropy term and dormancy reset to improve exploration and performance in continuous control tasks.


Going beyond Compositions, DDPMs Can Produce Zero-Shot Interpolations

https://openreview.net/forum?id=1pj0Sk8GfP

Compressor summary: The text describes how DDPMs can generate images in new regions of the data distribution, such as slightly smiling faces, by combining latent factors learned from separate subsets of the data.


A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

https://openreview.net/forum?id=1oU4FKpVx5

Compressor summary: The paper proposes a pruning technique for MoE models that reduces model size and computation while preserving test accuracy.


Proactive DP: A Multiple Target Optimization Framework for DP-SGD

https://openreview.net/forum?id=1nT6uc3HdY

Compressor summary: The text introduces Proactive DP, an optimization framework for DP-SGD that selects parameters to maximize utility (test accuracy) under a fixed privacy budget.


Variational Linearized Laplace Approximation for Bayesian Deep Learning

https://openreview.net/forum?id=1n3aC5rvdE

Compressor summary: Key points: - The paper proposes a new method for approximating LLA using a variational sparse GP - The method retains the DNN output as the predictive mean and allows for efficient stochastic optimization - The method outperforms existing efficient variants of LLA in terms of quality and computational time Summary: The paper introduces a novel sparse GP approach to approximate LLA for uncertainty estimation on DNNs, which preserves the DNN output and achieves better performance and efficiency than existing methods.


Towards Certified Unlearning for Deep Neural Networks

https://openreview.net/forum?id=1mf1ISuyS3

Compressor summary: The paper proposes techniques to extend certified unlearning methods to nonconvex deep neural networks, improving efficiency and addressing practical scenarios like sequential unlearning requests.


CATS: Enhancing Multivariate Time Series Forecasting by Constructing Auxiliary Time Series as Exogenous Variables

https://openreview.net/forum?id=1lDAGDe0UR

Compressor summary: CATS is a method that uses auxiliary time series generated from original time series to capture inter-series relationships for improved multivariate time series forecasting with less complexity and parameters.


On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization

https://openreview.net/forum?id=1khG2xf1yt

Compressor summary: The paper introduces νPI, a constrained optimization algorithm for neural networks that uses PI controllers to stabilize Lagrange multiplier updates and improve generalization.
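
A toy sketch of the PI idea on a one-dimensional constrained problem (illustrative gains and objective, not the paper's νPI algorithm): plain gradient ascent on the multiplier is the integral term, and the proportional term damps its oscillations.

```python
# Toy: minimize x^2 subject to x >= 1, i.e. g(x) = 1 - x <= 0.
kp, ki = 0.5, 0.1
lam, prev_v, x = 0.0, 0.0, 5.0
for _ in range(500):
    v = 1.0 - x                                       # constraint violation g(x)
    lam = max(0.0, lam + ki * v + kp * (v - prev_v))  # PI multiplier update
    prev_v = v
    x -= 0.1 * (2 * x - lam)                          # descend the Lagrangian in x
print(f"x = {x:.3f}, lambda = {lam:.3f}")             # expect x ~ 1, lambda ~ 2
```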


Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations

https://openreview.net/forum?id=1jHiq640y1

Compressor summary: The paper proposes specialized solvers that improve Bayesian flow networks' sampling quality and speed by connecting them with diffusion models through stochastic differential equations.


Understanding MLP-Mixer as a wide and sparse MLP

https://openreview.net/forum?id=1dtYo5ywXZ

Compressor summary: The text explains how sparseness is a key factor in the success of MLP-based architectures like MLP-Mixer, and how they relate to sparse parameterization and Monarch matrices.


Lie Neurons: Adjoint-Equivariant Neural Networks for Semisimple Lie Algebras

https://openreview.net/forum?id=1bJLl4fY6i

Compressor summary: The paper introduces an adjoint-equivariant neural network that works with data from any finite-dimensional semi-simple Lie algebra and demonstrates its effectiveness on different tasks.


Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation

https://openreview.net/forum?id=1ZJLNLZIpk

Compressor summary: The paper investigates local learning optimization for neural networks, proposes a gradient reconciliation strategy between neighboring modules, and shows improved performance and memory efficiency on ImageNet using CNN and Transformer architectures.


Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical and Geometric Constraints

https://openreview.net/forum?id=1YsQI04KaN

Compressor summary: AbX is a new score-based diffusion generative model that uses evolutionary, physical, and geometric constraints to improve antibody design, outperforming other methods in accuracy and binding affinity.


SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity

https://openreview.net/forum?id=1YMjzz2g81

Compressor summary: The paper proposes SPABA, a method for bilevel optimization with optimal sample complexity, and shows its advantages over other stochastic gradient estimators in theory and practice.


OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

https://openreview.net/forum?id=1YDeZU8Lt5

Compressor summary: The authors train and release OpenMoE, a series of open-source decoder-only MoE LLMs, analyze their routing mechanisms, and propose strategies to improve them for future LLM development.


Correcting Diffusion-Based Perceptual Image Compression with Privileged End-to-End Decoder

https://openreview.net/forum?id=1WWpIEFdlk

Compressor summary: This paper proposes a diffusion-based image compression method that uses an end-to-end decoder to improve perceptual quality and guarantee distortion, by analyzing and improving the score function approximation in the diffusion model.


Physics and Lie symmetry informed Gaussian processes

https://openreview.net/forum?id=1V50J0emll

Compressor summary: Physics-informed machine learning integrates observational data and physics models using Gaussian processes, and can benefit from Lie symmetry constraints to improve performance in forward and inverse problems.


Defense against Backdoor Attack on Pre-trained Language Models via Head Pruning and Attention Normalization

https://openreview.net/forum?id=1SiEfsCecd

Compressor summary: PURE is a method to defend against backdoored language models by pruning and normalizing attention weights, reducing the attack success rate without harming clean text performance.


Token-level Direct Preference Optimization

https://openreview.net/forum?id=1RZKuvqYCR

Compressor summary: The paper introduces TDPO, a novel method for fine-tuning LLMs at the token level, using forward KL divergence constraints and the Bradley-Terry model, which improves alignment with human preferences and generation diversity.


Privacy Preserving Adaptive Experiment Design

https://openreview.net/forum?id=1QmFKwVwwI

Compressor summary: The paper studies how to balance CATE estimation accuracy, patient outcome improvement (regret), and data privacy in adaptive experiments using contextual bandits and proposes Pareto optimal algorithms with differential privacy and asymptotic normality.


Nonparametric Teaching of Implicit Neural Representations

https://openreview.net/forum?id=1PMkV6oKw3

Compressor summary: The paper proposes Implicit Neural Teaching (INT), a method that improves the learning of implicit neural representations (INR) by treating it as a nonparametric teaching problem, which leads to faster convergence and reduced training time.


Promoting External and Internal Equities Under Ex-Ante/Ex-Post Metrics in Online Resource Allocation

https://openreview.net/forum?id=1OsRSrkFWl

Compressor summary: The paper introduces two models for fair resource allocation in online settings, one based on external attributes like demand and the other based on internal traits like demographics, and proposes optimal policies for each model using different equity metrics.


EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

https://openreview.net/forum?id=1NdN7eXyb4

Compressor summary: EAGLE is a speculative sampling framework that improves efficiency and preserves quality in LLMs by predicting second-to-top-layer features with advanced token sequences.
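
EAGLE builds on the generic speculative-sampling acceptance rule sketched below (its feature-level draft model is not reproduced): a draft token sampled from q is kept with probability min(1, p/q), which leaves the target distribution p exact.

```python
import numpy as np

def accept_or_resample(p, q, token, rng=np.random.default_rng(0)):
    """Keep a draft token w.p. min(1, p/q); else resample from the residual."""
    if rng.uniform() < min(1.0, p[token] / q[token]):
        return token                           # accept the draft token
    residual = np.maximum(p - q, 0.0)          # otherwise resample from the
    residual /= residual.sum()                 # normalized residual of p - q
    return rng.choice(len(p), p=residual)

p = np.array([0.5, 0.3, 0.2])                  # target model distribution
q = np.array([0.2, 0.6, 0.2])                  # draft model distribution
draft = 1                                      # token proposed from q
print("kept token:", accept_or_resample(p, q, draft))
```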


PID: Prompt-Independent Data Protection Against Latent Diffusion Models

https://openreview.net/forum?id=1N7pjXKkx8

Compressor summary: The paper investigates how discrepancies between textual prompts used by protectors and exploiters affect existing defenses for privacy in few-shot fine-tuning of Latent Diffusion Models, and proposes a new method called Prompt-Independent Defense (PID) to safeguard privacy.


A Circuit Domain Generalization Framework for Efficient Logic Synthesis in Chip Design

https://openreview.net/forum?id=1KemC8DNa0

Compressor summary: PruneX is a novel data-driven logic synthesis heuristic that reduces ineffective transformations by learning domain-invariant representations based on transformation-invariant domain knowledge and improves the efficiency of existing heuristics.


Inferring Change Points in High-Dimensional Linear Regression via Approximate Message Passing

https://openreview.net/forum?id=1JgCpZS17T

Compressor summary: The paper proposes an algorithm to find changes in high-dimensional linear regression using a message passing method and shows its effectiveness on synthetic data and images.


INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer

https://openreview.net/forum?id=1IZLOPxtfK

Compressor summary: INViT is a new deep reinforcement learning architecture for solving routing problems that improves generalizability by using nested designs and invariant views, along with modified policy gradient and data augmentations.


Compression of Structured Data with Autoencoders: Provable Benefit of Nonlinearities and Depth

https://openreview.net/forum?id=1HDrfUahXv

Compressor summary: The paper studies how well shallow autoencoders capture sparse data structure, showing that gradient descent ignores sparsity unless the data is very sparse, and proposes ways to improve compression for sparse data using denoising and multi-layer decoding.


MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation

https://openreview.net/forum?id=1Fs1LvjYQW

Compressor summary: The paper introduces MLAgentBench to test whether language-model-driven agents can perform machine learning experimentation effectively, and finds that Claude v3 Opus is the best-performing agent.


Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods

https://openreview.net/forum?id=1DyruVvVaQ

Compressor summary: The paper introduces a new algorithmic framework for online linear programming that allows first-order methods to achieve better regret than previous approaches.


Gradual Divergence for Seamless Adaptation: A Novel Domain Incremental Learning Method

https://openreview.net/forum?id=1AAlMSo7Js

Compressor summary: DARE is a novel domain incremental learning method that reduces representation drift and catastrophic forgetting by gradually adapting new task representations to previous tasks' feature space and integrating task boundaries.


Finite-Time Convergence and Sample Complexity of Actor-Critic Multi-Objective Reinforcement Learning

https://openreview.net/forum?id=18rzx2PXKm

Compressor summary: The paper introduces MOAC, an actor-critic algorithm for multi-objective reinforcement learning (MORL) that finds trade-offs among conflicting rewards and provides theoretical analysis of its convergence and sample complexity.


On the Nonlinearity of Layer Normalization

https://openreview.net/forum?id=18f6iPn0zq

Compressor summary: The paper studies layer normalization (LN), a common deep learning technique, focusing on its nonlinearity and representation capacity, and shows how to design neural architectures using LN to improve classification performance.


NeWRF: A Deep Learning Framework for Wireless Radiation Field Reconstruction and Channel Prediction

https://openreview.net/forum?id=181hXof7ho

Compressor summary: NeWRF is a deep-learning-based framework that uses Neural Radiance Fields to predict wireless channels, reducing the cost and effort of site surveys in wireless network deployments.


PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer

https://openreview.net/forum?id=17ZwoHl65h

Compressor summary: PlanDQ is a hierarchical planner for offline RL that combines a diffusion-based high-level planner with a Q-learning based low-level policy to handle sparse-reward and long-horizon tasks.


Tuning-free Estimation and Inference of Cumulative Distribution Function under Local Differential Privacy

https://openreview.net/forum?id=15MpDbv3IQ

Compressor summary: The algorithm estimates CDF values under LDP by connecting the problem to the current status problem and using constrained isotonic estimation based on binary queries, achieving error bounds that improve with the number of grid points.


A Dual-module Framework for Counterfactual Estimation over Time

https://openreview.net/forum?id=126SR50BEL

Compressor summary: The text introduces ACTIN, a novel framework for estimating counterfactuals that uses adversarial methods to balance representations and temporal integration to capture long-range dependencies and interactions, achieving state-of-the-art performance with simple base models.


Sparse is Enough in Fine-tuning Pre-trained Large Language Models

https://openreview.net/forum?id=10hu2D3hAg

Compressor summary: The paper proposes SIFT, a gradient-based sparse fine-tuning algorithm for pre-trained models, which tightens the generalization error bound by shifting the prior distribution and leverages oscillations in the loss landscape and quasi-sparsity in gradient distribution.


Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space

https://openreview.net/forum?id=0zbxwvJqwf

Compressor summary: LatProtRL is a new method to optimize protein functions using a large language model and reinforcement learning, which could benefit various industries.


Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation

https://openreview.net/forum?id=0xmfExPqFf

Compressor summary: The paper introduces a general framework for risk-sensitive distributional reinforcement learning with static risk measures and different function approximation methods, and presents two novel meta-algorithms with improved regret bounds.


Neural-Kernel Conditional Mean Embeddings

https://openreview.net/forum?id=0wso32h0jc

Compressor summary: The paper proposes a hybrid method that combines deep learning and kernel conditional mean embeddings to address scalability and expressiveness challenges in representing conditional distributions, and shows its effectiveness in density estimation and distributional reinforcement learning tasks.


Efficient Contextual Bandits with Uninformed Feedback Graphs

https://openreview.net/forum?id=0vozy8vstt

Compressor summary: The paper develops contextual algorithms for bandits with uninformed feedback graphs by reducing the problem to online regression over both losses and graphs, and shows that using log loss for graph learning is crucial for good performance.


PEARL: Zero-shot Cross-task Preference Alignment and Robust Reward Learning for Robotic Manipulation

https://openreview.net/forum?id=0urN0PnNDj

Compressor summary: PEARL is a novel RL method that learns policies from cross-task preference transfer without human labels, using optimal transport to align trajectories and Gaussian distributions to model reward uncertainty.


DiJiang: Efficient Large Language Models through Compact Kernelization

https://openreview.net/forum?id=0uUHfhXdnH

Compressor summary: The paper introduces DiJiang, a method to linearize Transformers in the frequency domain using kernels based on Discrete Cosine Transform and weighted Quasi-Monte Carlo sampling, achieving comparable performance with reduced training costs and faster inference speeds.


Complexity Matters: Feature Learning in the Presence of Spurious Correlations

https://openreview.net/forum?id=0tuwdgBiSN

Compressor summary: The paper proposes a framework to study the impact of spurious features on neural network learning dynamics and reveals several phenomena about core and spurious feature learning, validating existing debiasing techniques and highlighting their limitations.


Log Neural Controlled Differential Equations: The Lie Brackets Make A Difference

https://openreview.net/forum?id=0tYrMtQyPT

Compressor summary: Log-NCDEs use a new method based on Log-ODEs to train neural differential equations that model real-world data better than existing approaches.


Adversarial Attacks on Combinatorial Multi-Armed Bandits

https://openreview.net/forum?id=0tPBk24xNj

Compressor summary: The paper studies reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB), and finds that the attackability of a CMAB instance depends on its properties and whether it is known or unknown to the adversary.


Graph External Attention Enhanced Transformer

https://openreview.net/forum?id=0rV7VIrcjX

Compressor summary: The paper introduces Graph External Attention, a novel attention mechanism that leverages external node/edge units to capture inter-graph correlations, and proposes the Graph External Attention Enhanced Transformer (GEAET), which improves graph representation learning by integrating local and global information.


Improving Neural Additive Models with Bayesian Principles

https://openreview.net/forum?id=0pSTzCnEmi

Compressor summary: The text introduces LA-NAMs, a Bayesian approach to neural additive models that provides uncertainty estimates, feature selection, and second-order interaction ranking for deep neural networks.


Position: Levels of AGI for Operationalizing Progress on the Path to AGI

https://openreview.net/forum?id=0ofzEysK2D

Compressor summary: The authors propose a framework to classify AGI models by their performance, generality, and autonomy levels, and discuss how it can help compare, assess, and measure progress in AGI research.


ED-Copilot: Reduce Emergency Department Wait Time with Language Model Diagnostic Assistance

https://openreview.net/forum?id=0ntak1BGBd

Compressor summary: ED-Copilot is an AI system that suggests laboratory tests and makes diagnoses faster and more accurately in emergency departments, potentially reducing crowding and improving patient outcomes.


FedSC: Provable Federated Self-supervised Learning with Spectral Contrastive Objective over Non-i.i.d. Data

https://openreview.net/forum?id=0nMzOmkBHC

Compressor summary: The paper proposes FedSC, a provable Federated Self-Supervised Learning algorithm that uses spectral contrastive objective, sharing correlation matrices, and deploying differential privacy protection to improve data representations and performance.


Exact Soft Analytical Side-Channel Attacks using Tractable Circuits

https://openreview.net/forum?id=0mklK4h0rX

Compressor summary: ExSASCA is a fast and accurate method to detect weaknesses in cryptographic algorithms using knowledge compilation and probabilistic circuits, improving on the existing SASCA attack by over 31%.


Provably Scalable Black-Box Variational Inference with Structured Variational Families

https://openreview.net/forum?id=0miAQ1qHiw

Compressor summary: The paper investigates structured variational families, a middle ground between mean-field and full-rank ones, that can improve the efficiency of black-box variational inference for hierarchical Bayesian models.


CLIPZyme: Reaction-Conditioned Virtual Screening of Enzymes

https://openreview.net/forum?id=0mYAK6Yhhm

Compressor summary: The authors propose a new computational method, CLIPZyme, to identify efficient catalysts among uncharacterized proteins by encoding and aligning enzyme structures and reactions.


Scaling Exponents Across Parameterizations and Optimizers

https://openreview.net/forum?id=0ksNeD1SJT

Compressor summary: Key points: - The paper proposes a new perspective on parameterization and derives new theoretical results - The paper conducts extensive empirical investigation with many combinations of optimizers, parameterizations, assumptions, learning rates, and model sizes - The paper finds that prior work's assumptions often exclude the best learning rate scaling prescription - The paper introduces Adam-atan2, a new scale-invariant version of Adam that eliminates the epsilon hyperparameter Summary: The paper presents a novel parameterization perspective, investigates many algorithmic and architectural details, reveals flaws in prior work's assumptions, and proposes Adam-atan2, a numerically stable optimizer.


SqueezeLLM: Dense-and-Sparse Quantization

https://openreview.net/forum?id=0jpbpFia8m

Compressor summary: SqueezeLLM is a post-training quantization framework that compresses large language models with ultra-low precision, improving performance and reducing memory requirements for inference.


Domain-wise Data Acquisition to Improve Performance under Distribution Shift

https://openreview.net/forum?id=0j28mmQ023

Compressor summary: The paper proposes a data acquisition framework that improves machine learning model performance on shifted data by refining the preparation of training data from various domains.


Tripod: Three Complementary Inductive Biases for Disentangled Representation Learning

https://openreview.net/forum?id=0iXp5P77ho

Compressor summary: Tripod is a neural network autoencoder with three complementary inductive biases that enable disentangled representation learning, achieving state-of-the-art results on image disentanglement benchmarks.


DAG-Based Column Generation for Adversarial Team Games

https://openreview.net/forum?id=0hbeZQm1Se

Compressor summary: DCG is a novel algorithm framework for solving sequential adversarial team games with asymmetric information by transforming coordinated best responses into team belief DAG (TB-DAG) form, converging exponentially faster and scaling better than column generation (CG) algorithms.


Exploring the Low-Pass Filtering Behavior in Image Super-Resolution

https://openreview.net/forum?id=0f4u3Wg9zT

Compressor summary: The authors propose HyRA, a method to interpret deep neural networks' behavior in image super-resolution using signal processing theories, and introduce FSDS, a metric to measure high-frequency information injection.


KernelWarehouse: Rethinking the Design of Dynamic Convolution

https://openreview.net/forum?id=0e8SEDSpNT

Compressor summary: Key points: - Dynamic convolution learns a linear mixture of static kernels with input-dependent attentions, but is not parameter efficient - KernelWarehouse proposes a more general form of dynamic convolution that exploits convolutional parameter dependencies within and across layers - KernelWarehouse improves accuracy on various ConvNet architectures and Vision Transformers, and reduces model size in some cases Summary: KernelWarehouse is a novel dynamic convolution method that leverages convolutional parameter dependencies to achieve better performance and efficiency than normal convolution.
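
For context, the baseline dynamic convolution that KernelWarehouse rethinks can be sketched as follows; this shows the standard K-kernel mixture with input-dependent attention, not KernelWarehouse's within- and cross-layer parameter sharing:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Baseline dynamic convolution: each input mixes K static kernels
    with softmax attention computed from globally pooled features."""

    def __init__(self, in_ch, out_ch, k=3, num_kernels=4):
        super().__init__()
        self.kernels = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.02)
        self.attn = nn.Linear(in_ch, num_kernels)
        self.pad = k // 2

    def forward(self, x):
        B, C, H, W = x.shape
        a = F.softmax(self.attn(x.mean(dim=(2, 3))), dim=1)   # (B, K)
        w = torch.einsum("bk,koihw->boihw", a, self.kernels)  # per-sample kernels
        w = w.reshape(-1, C, *self.kernels.shape[-2:])        # (B*out, in, k, k)
        # Fold the batch into groups so each sample gets its own kernel.
        y = F.conv2d(x.reshape(1, B * C, H, W), w,
                     padding=self.pad, groups=B)
        return y.reshape(B, -1, H, W)
```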


Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws

https://openreview.net/forum?id=0bmXrtTDUu

Compressor summary: The paper modifies the Chinchilla scaling laws to account for inference costs, suggesting that when inference demand is large, it is optimal to train smaller models for longer, i.e., on more tokens per parameter than Chinchilla prescribes.
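
A toy version of the accounting: training costs roughly 6ND FLOPs and inference roughly 2N FLOPs per token, so for a fixed quality target one can search over model sizes for the cheapest total. The parametric loss constants below are the ones reported by Hoffmann et al. (2022) and are illustrative only:

```python
import numpy as np

# Chinchilla parametric loss fit: L(N, D) = E + A/N^alpha + B/D^beta.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def total_flops(N, D, inference_tokens):
    return 6 * N * D + 2 * N * inference_tokens

def best_config(target_loss, inference_tokens, N_grid):
    """For each model size N, solve for the D hitting the target loss,
    then return the configuration with the lowest lifetime cost."""
    best = None
    for N in N_grid:
        rem = target_loss - E - A / N**alpha
        if rem <= 0:          # this N can never reach the target
            continue
        D = (B / rem) ** (1 / beta)
        cost = total_flops(N, D, inference_tokens)
        if best is None or cost < best[0]:
            best = (cost, N, D)
    return best

# With heavy inference demand, the optimum shifts toward smaller N
# trained on far more tokens than the Chinchilla-optimal D ~ 20N.
print(best_config(2.0, 1e12, np.logspace(9, 12, 300)))
```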


The Entropy Enigma: Success and Failure of Entropy Minimization

https://openreview.net/forum?id=0bGsVoumFL

Compressor summary: The paper analyzes why entropy minimization (EM) works initially but fails eventually for classification model adaptation and proposes a method to estimate a model's accuracy without labels using EM.
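
The objective under analysis is simple enough to state in code; a minimal test-time-adaptation step that minimizes prediction entropy on unlabeled inputs might look like:

```python
import torch
import torch.nn.functional as F

def entropy_minimization_step(model, x, optimizer):
    """One EM adaptation step: reduce the Shannon entropy of the
    model's predictions on an unlabeled batch x."""
    probs = F.softmax(model(x), dim=1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return entropy.item()
```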


Early Time Classification with Accumulated Accuracy Gap Control

https://openreview.net/forum?id=0b7txvPYlr

Compressor summary: The paper introduces a statistical framework to control the accuracy gap between full and early-time classification by applying a calibrated stopping rule to any sequential classifier.
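
A simplified stand-in for such a rule: stop at the first timestep whose top softmax probability clears a threshold, and calibrate that threshold on held-out data so the empirical early-vs-full accuracy gap stays below a budget (the paper's rule adds statistical guarantees this sketch omits):

```python
import numpy as np

def early_stop_predict(prob_seq, lam):
    """prob_seq: (T, num_classes) per-timestep class probabilities.
    Stop at the first timestep whose top probability reaches lam;
    otherwise fall back to the full-sequence prediction."""
    for t, p in enumerate(prob_seq):
        if p.max() >= lam:
            return t, int(p.argmax())
    return len(prob_seq) - 1, int(prob_seq[-1].argmax())

def calibrate(prob_seqs, labels, alpha=0.01, grid=np.linspace(0.5, 1.0, 51)):
    """Return the smallest threshold whose empirical early-vs-full
    accuracy gap on a calibration set is at most alpha."""
    full_acc = np.mean([int(s[-1].argmax()) == y
                        for s, y in zip(prob_seqs, labels)])
    for lam in grid:
        early_acc = np.mean([early_stop_predict(s, lam)[1] == y
                             for s, y in zip(prob_seqs, labels)])
        if full_acc - early_acc <= alpha:
            return lam
    return 1.0
```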


TVE: Learning Meta-attribution for Transferable Vision Explainer

https://openreview.net/forum?id=0ZTuy5CrL7

Compressor summary: TVE learns a meta-attribution that transfers across vision models and tasks, explaining their predictions without extra training data.


Towards Modular LLMs by Building and Reusing a Library of LoRAs

https://openreview.net/forum?id=0ZFWfeVsaD

Compressor summary: The paper proposes a method to build and reuse adapters for large language models on new tasks using model-based clustering and zero-shot routing, achieving better generalization than existing approaches.
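
The adapter unit such a library stores and routes among is a LoRA: a frozen base layer plus a trainable low-rank update W + (alpha/r) * B @ A. A minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer with a trainable low-rank adapter."""

    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # only the adapter trains
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        # Base output plus the scaled low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T
```

Because B starts at zero, a fresh adapter leaves the base model's behavior unchanged until training moves it, which is what makes adapters cheap to build per task and safe to mix and route among.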


On the Role of Edge Dependency in Graph Generative Models

https://openreview.net/forum?id=0XDO74NlOd

Compressor summary: The paper studies how different graph generative models balance representation power and diversity, introduces new models for each level of complexity, and evaluates them on real datasets.


On the Calibration of Human Pose Estimation

https://openreview.net/forum?id=0THUA66D8Z

Compressor summary: The paper proposes CCNet, a method to improve the calibration of confidence scores in 2D human pose estimation by learning network-specific adjustments.


SLOG: An Inductive Spectral Graph Neural Network Beyond Polynomial Filter

https://openreview.net/forum?id=0SrNCSklZx

Compressor summary: SLOG is a novel spectral graph neural network that overcomes the limitations of existing spectral GNNs by using a real-valued filter and combining subgraph sampling with signal processing for large-scale inductive node classification.


End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations

https://openreview.net/forum?id=0P3kaNluGj

Compressor summary: The paper proposes a neuro-symbolic framework that learns structured states and symbolic policies using a vision foundation model, and generates textual explanations with GPT-4 to improve interpretability of decision-making.


Self-Rewarding Language Models

https://openreview.net/forum?id=0NphYCmgua

Compressor summary: This paper proposes self-rewarding language models, which judge their own outputs via LLM-as-a-Judge prompting to supply their training rewards, and shows that iteratively trained models outperform existing systems on the AlpacaEval 2.0 leaderboard.


Enhancing Size Generalization in Graph Neural Networks through Disentangled Representation Learning

https://openreview.net/forum?id=0NdU4y9dWC

Compressor summary: DISGEN improves the size generalization of graph neural networks by disentangling size factors from graph representations using augmentations and a decoupling loss.


Autonomous Sparse Mean-CVaR Portfolio Optimization

https://openreview.net/forum?id=0NacraIYrA

Compressor summary: The paper proposes an efficient method for the computationally hard sparse mean-CVaR portfolio optimization problem, approximating it with indicator functions and using algorithms that adaptively adjust the asset pool size.
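
For reference, the risk measure being minimized, CVaR at level alpha, is the expected loss in the worst (1 - alpha) tail; an empirical estimate is a few lines of NumPy:

```python
import numpy as np

def cvar(losses, alpha=0.95):
    """Empirical Conditional Value-at-Risk: mean loss beyond the
    alpha-quantile (the Value-at-Risk)."""
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()
```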


Global Reinforcement Learning : Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods

https://openreview.net/forum?id=0M2tNui8jX

Compressor summary: Global Reinforcement Learning (GRL) introduces rewards defined globally over trajectories, capturing interactions among states that classic RL cannot model, and proposes a novel algorithmic scheme to solve GRL problems efficiently.


Purify Unlearnable Examples via Rate-Constrained Variational Autoencoders

https://openreview.net/forum?id=0LBNdbmQCM

Compressor summary: The text describes a novel pre-training purification method for defending against unlearnable examples in image classification using disentangled variational autoencoders and a two-stage purification approach.


See More Details: Efficient Image Super-Resolution by Experts Mining

https://openreview.net/forum?id=0JXGusc7E2

Compressor summary: SeemoRe is a novel image super-resolution model that efficiently combines experts at different levels to reconstruct high-resolution images from low-resolution inputs.


PointMC: Multi-instance Point Cloud Registration based on Maximal Cliques

https://openreview.net/forum?id=0JV5WpLQgv

Compressor summary: PointMC is a novel point cloud registration framework that uses maximal cliques and local spatial consistency to accurately estimate multiple rigid transformations for overlapping instances.


Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

https://openreview.net/forum?id=0IDaPnY5d5

Compressor summary: The paper proposes a new reinforcement learning method that uses auxiliary tasks with short delays to improve learning with long delays, achieving better sample efficiency and policy performance than existing methods.


OxyGenerator: Reconstructing Global Ocean Deoxygenation Over a Century with Deep Learning

https://openreview.net/forum?id=0HUInAsdoo

Compressor summary: Key points: - The paper proposes OxyGenerator, a deep learning model for global ocean deoxygenation reconstruction from 1920 to 2023. - The model uses zoning-varying graph message-passing and inductive bias to capture complex correlations and uncertainty in DO variations. - OxyGenerator outperforms CMIP6 numerical simulations by reducing MAPE by 38.77%. Summary: OxyGenerator is a deep learning model that reconstructs global ocean deoxygenation from 1920 to 2023 using zoning-varying graph message-passing and inductive bias, outperforming CMIP6 simulations.


Generalized Sobolev Transport for Probability Measures on a Graph

https://openreview.net/forum?id=0GC0NG6Orr

Compressor summary: GST is a fast and flexible optimal transport method for graph metric spaces, improving on Sobolev transport and Orlicz-Wasserstein.


Large Scale Dataset Distillation with Domain Shift

https://openreview.net/forum?id=0FWPKHMCSc

Compressor summary: The paper introduces Dataset Distillation with Domain Shift (D3S), a scalable method to summarize large datasets by reframing the problem as a domain shift issue.


Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

https://openreview.net/forum?id=0AZAjkXhit

Compressor summary: The authors propose a simple and effective method of selecting long instructions as the basis for fine-tuning LLMs, which outperforms state-of-the-art methods and achieves competitive results on various benchmarks.
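
The baseline really is this simple; a sketch, where sorting by response length and the `response` field name are this sketch's assumptions:

```python
def longest_k(examples, k=1000):
    """Keep the k longest training examples as the fine-tuning set."""
    return sorted(examples,
                  key=lambda ex: len(ex["response"]),
                  reverse=True)[:k]
```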


Optimal Transport for Structure Learning Under Missing Data

https://openreview.net/forum?id=09Robz3Ppy

Compressor summary: The paper proposes a new score-based algorithm for learning causal structures from missing data using optimal transport, which outperforms existing methods in simulations and real data settings.


ProtoGate: Prototype-based Neural Networks with Global-to-local Feature Selection for Tabular Biomedical Data

https://openreview.net/forum?id=07fSWltF6M

Compressor summary: ProtoGate is a neural model that selects features on high-dimensional, low-sample-size tabular data by balancing global and local feature selection and using prototype-based predictions to avoid the co-adaptation problem.


Regularized Q-learning through Robust Averaging

https://openreview.net/forum?id=07f24ya6eX

Compressor summary: The paper introduces 2RA Q-learning, a new method to control estimation bias in Q-learning, and shows its improved performance over existing methods.


PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs

https://openreview.net/forum?id=051jaf8MQy

Compressor summary: PIVOT is a visual prompting method that helps vision language models perform tasks like robotic control by iteratively refining proposals based on image annotations.


SSL4Q: Semi-Supervised Learning of Quantum Data with Application to Quantum State Classification

https://openreview.net/forum?id=04Fx1u2BUD

Compressor summary: SSL4Q is a new semi-supervised learning method for quantum state classification that works well with limited labeled data and requires fewer resources than traditional methods.


One Meta-tuned Transformer is What You Need for Few-shot Learning

https://openreview.net/forum?id=01ahsMovBx

Compressor summary: MetaFormer is a new framework that uses self-attention to enhance pre-trained vision transformers for few-shot image classification by embedding sample relationships and consolidating task patterns.


Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials

https://openreview.net/forum?id=01M0N8VgfB

Compressor summary: The paper proposes a method to improve server-side training simulations for federated learning by partitioning centralized data based on the statistical heterogeneity of the true clients.
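
A common way to realize such heterogeneous partitions is to draw per-class client proportions from a Dirichlet prior; the sketch below uses a single Dirichlet component rather than the paper's fitted mixture of Dirichlet-multinomials:

```python
import numpy as np

def dirichlet_partition(labels, num_clients=10, concentration=0.5, seed=0):
    """Split a centralized dataset into synthetic clients whose label
    mixes are Dirichlet-distributed; lower concentration means more
    statistical heterogeneity across clients."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Per-class proportions across clients.
        props = rng.dirichlet(concentration * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients
```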