This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-16, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.08313v1
Compressor summary: The authors create a benchmark to test large language models' understanding of symbolic graphics programs, which generate visual content, and find that existing models struggle with this task but finetuning improves their performance.
http://arxiv.org/abs/2408.08310v1
Compressor summary: ScalingFilter is a new method to evaluate text quality for pre-training large language models by comparing perplexities of two models on the same data, without relying on a reference dataset, and it improves zero-shot performance while preserving semantic diversity.
http://arxiv.org/abs/2408.08307v1
Compressor summary: The paper investigates how local geometric properties of generative models' manifolds relate to generation quality and proposes using reward models trained on these properties to control generation outcomes.
http://arxiv.org/abs/2408.08302v1
Compressor summary: The paper evaluates large language models' abilities to solve transportation engineering problems using the new benchmark dataset TransportBench, revealing each model's strengths and limitations.
http://arxiv.org/abs/2408.08295v1
Compressor summary: SLCA++ is a framework that improves continual learning with pre-training by slowing down the learning rate and aligning classification layers, achieving state-of-the-art results on image classification benchmarks.
http://arxiv.org/abs/2408.08291v1
Compressor summary: ShareLM is a collection of human-model conversations, contributed through a Web extension plugin and intended to help improve open-source language models.
http://arxiv.org/abs/2408.08286v1
Compressor summary: The paper shows that the training dynamics of two-layer neural networks are non-integrable, meaning they are complex and hard to predict, which implies the need for numerical methods in optimizing neural networks.
http://arxiv.org/abs/2408.08274v1
Compressor summary: BAM is a method for improving large language models by using experts' attention parameters as well as their feed-forward networks to initialize Mixture of Experts (MoE) layers, achieving better performance than existing methods.
http://arxiv.org/abs/2408.08270v1
Compressor summary: HeightLane is a novel method for 3D lane detection from monocular images that uses a height map to improve spatial accuracy and recognition.
http://arxiv.org/abs/2408.08261v1
Compressor summary: mhGPT is a small but powerful AI model that helps with mental health tasks using social media and PubMed data, and it works well even on low-end devices.
http://arxiv.org/abs/2408.08260v1
Compressor summary: GSVD-NMF is a method to improve non-negative matrix factorization by recovering missing components using generalized singular value decomposition.
http://arxiv.org/abs/2408.08258v1
Compressor summary: Snuffy is a sparse transformer-based MIL-pooling method for WSI classification in digital pathology that requires limited pre-training and achieves superior performance on two datasets.
http://arxiv.org/abs/2408.08252v1
Compressor summary: The proposed method optimizes downstream rewards in diffusion models while preserving naturalness, using soft value functions and iterative sampling, without requiring differentiable proxies or fine-tuning.
http://arxiv.org/abs/2408.08250v1
Compressor summary: The paper reviews model compression techniques for computer vision tasks to enable the use of large deep neural networks in embedded systems, discusses different approaches and their performance variations on various devices, and provides codes and case studies.
http://arxiv.org/abs/2408.08248v1
Compressor summary: The paper proposes using conformal prediction to generate answer sets with probabilistic guarantees for link prediction tasks in knowledge graph embeddings, addressing the issue of distinguishing plausible from implausible answers.
http://arxiv.org/abs/2408.08230v1
Compressor summary: Temporal Reward Decomposition (TRD) is a method to predict and explain the future rewards of reinforcement learning agents, such as their expected reward timing, value, confidence, and input feature importance.
http://arxiv.org/abs/2408.08226v1
Compressor summary: The paper studies predictive multiplicity in link prediction for knowledge graphs and proposes using voting methods to reduce conflicting predictions.
http://arxiv.org/abs/2408.08222v1
Compressor summary: The paper proposes a bilevel optimization framework called LETS to learn the perturbation radius for sharpness-aware minimization algorithms, which improves model generalization by searching for flat minima in the loss landscape.
http://arxiv.org/abs/2408.08217v1
Compressor summary: This study proposes a systems design approach to improve classification performance using large language models as imperfect data annotators in various industries.
http://arxiv.org/abs/2408.08216v1
Compressor summary: The study proposes using Kolmogorov-Arnold Network (KAN) instead of Multi-layer Perceptron (MLP) to improve image-to-image translation in generative AI.
http://arxiv.org/abs/2408.08215v1
Compressor summary: TinyML is explored as a solution for healthcare support in low connectivity areas by using low-spec devices to classify skin diseases from images without internet access.
http://arxiv.org/abs/2408.08212v1
Compressor summary: The study investigates how implicit language affects bias amplification in large language models and finds that biased models generate more cautious responses when aligned with conflicting opinions but are less reliable on socially nuanced topics.
http://arxiv.org/abs/2408.08210v1
Compressor summary: The paper proposes a framework to assess how well large language models can reason using probability of necessity and sufficiency concepts, and demonstrates it with math examples.
http://arxiv.org/abs/2408.08206v1
Compressor summary: The proposed method fuses volumetric rendering with 3D Gaussian Splatting to effectively reconstruct underwater scenes, outperforming state-of-the-art NeRF-based methods in quality and efficiency.
http://arxiv.org/abs/2408.08201v1
Compressor summary: HeLlO is a framework that generates synthetic labels from images, reducing the storage and data requirements for dataset distillation while maintaining performance.
http://arxiv.org/abs/2408.08192v1
Compressor summary: The paper proposes a novel online learning method called SemiSGD for large-population multi-agent systems, which combines value function and population distribution into one parameter, and shows its theoretical advantages over traditional fixed-point iteration methods.
http://arxiv.org/abs/2408.08191v1
Compressor summary: The paper proposes a learning-based method to generate infrared small target labels using a single-point annotation paradigm that improves detection and reduces false alarms.
http://arxiv.org/abs/2408.08189v1
Compressor summary: FancyVideo is a text-to-video generator that improves temporal coherence in video synthesis using a Cross-frame Textual Guidance Module with three components.
http://arxiv.org/abs/2408.08184v1
Compressor summary: The paper proposes a method to measure a text-to-image model's originality based on the number of tokens needed to reconstruct an image, inspired by legal definitions of originality.
http://arxiv.org/abs/2408.08179v1
Compressor summary: The paper proposes a blind modulation detection method for OFDM-based technologies using a ResNet network that can handle realistic environmental imperfections without prior knowledge of the transmitted signal.
http://arxiv.org/abs/2408.08172v1
Compressor summary: The authors introduce an image classification method that uses a database instead of a neural network to store and search embeddings, which offers flexibility, unlearning, and interpretability by letting them add, remove, and intervene on data, and they argue it could improve how knowledge is represented in deep vision models.
http://arxiv.org/abs/2408.08152v1
Compressor summary: DeepSeek-Prover-V1.5 is an open-source language model that improves theorem proving in Lean 4 by optimizing training, inference, and exploration strategies, achieving state-of-the-art results on miniF2F and ProofNet benchmarks.
http://arxiv.org/abs/2408.08150v1
Compressor summary: Answer set programming techniques are applied to solve the arcade game snake, with five implementations compared and visualized using clingraph.
http://arxiv.org/abs/2408.08149v1
Compressor summary: VaT is an unsupervised learning method that bridges restoration and high-level vision networks without retraining them, enhancing image quality and performance in degraded environments.
http://arxiv.org/abs/2408.08146v1
Compressor summary: KOALA improves speculative decoding by optimizing the draft head with multi-layer architecture and adversarial learning, achieving significant latency reduction.
http://arxiv.org/abs/2408.08145v1
Compressor summary: The paper proposes an automatic way to generate PDDL descriptions for planning from integrated system and product models using MBSE.
http://arxiv.org/abs/2408.08144v1
Compressor summary: The paper introduces MIDAS, a novel approach using multi-level knowledge distillation to improve multi-turn Natural Language Understanding (NLU) in conversations.
http://arxiv.org/abs/2408.08142v1
Compressor summary: The study evaluates a custom data preprocessing pipeline that improves the accuracy of machine learning models predicting COVID-19 mortality using OWID data.
http://arxiv.org/abs/2408.08137v1
Compressor summary: The paper discusses the limitations of AOPC for evaluating feature attribution faithfulness in deep neural networks and proposes Normalized AOPC (NAOPC) as a more reliable and interpretable alternative.
http://arxiv.org/abs/2408.08134v1
Compressor summary: CorrAdaptor is a novel architecture for pixel-level correspondences in computer vision and robotics that learns local context through two branches, one explicit (KNN) and one implicit (a learnable matrix), and adds a motion injection module for robustness and adaptability, outperforming previous methods on various tasks.
http://arxiv.org/abs/2408.08133v1
Compressor summary: The paper proposes a neural probabilistic logic system, combining neural networks with probabilistic logic, that optimizes a sampling-based objective instead of the exact likelihood, with an error that vanishes as the number and diversity of samples grow; its EXAL method explains, agrees, and learns from explanations, scaling to larger problems and outperforming previous methods.
http://arxiv.org/abs/2408.08125v1
Compressor summary: CPRFL is a novel approach for long-tailed multi-label image classification that leverages semantic correlations between categories and refines prompts with contextual visual information to improve recognition performance.
http://arxiv.org/abs/2408.08119v1
Compressor summary: Neural networks trained on inverse problems can achieve higher accuracy than classical optimizers even on their training data, challenging the assumption that faster inference sacrifices solution quality.
http://arxiv.org/abs/2408.08109v1
Compressor summary: The study proposes using voice analysis as a non-invasive way to monitor blood glucose levels in people with diabetes, potentially improving their quality of life.
http://arxiv.org/abs/2408.08108v1
Compressor summary: The paper proposes a novel method called PartFormer to learn unsupervised part-specific attention from paired images using geometric and semantic constraints, improving downstream tasks and part discovery performance.
http://arxiv.org/abs/2408.08106v1
Compressor summary: The paper introduces an extension of the uncertainty-penalized Bayesian information criterion (UBIC) for efficiently discovering parametric partial differential equations (PDEs) in noisy situations, using data transformation based on power spectral densities and confidence intervals.
http://arxiv.org/abs/2408.08105v1
Compressor summary: The paper introduces MuCR, a new benchmark to test VLLMs' ability to infer cause-and-effect relationships from visual cues using image synthesis and tailored metrics.
http://arxiv.org/abs/2408.08093v1
Compressor summary: The paper introduces a new cross-modality approach to video compression using multimodal language models that separates videos into spatial content and motion components, optimizing quality for specific decoding requirements.
http://arxiv.org/abs/2408.08092v1
Compressor summary: OC3D is a weakly supervised method for LiDAR-based 3D object detection that uses coarse clicks and achieves state-of-the-art performance with minimal annotation cost.
http://arxiv.org/abs/2408.08091v1
Compressor summary: HAIR is a plug-in-and-play method that generates parameters for image restoration models based on the contents of input images, improving their performance on various tasks.
http://arxiv.org/abs/2408.08089v1
Compressor summary: The paper introduces AgentCourt, a simulation system that uses large language models to train lawyer agents in legal skills through adversarial evolutionary processes and courtroom simulations.
http://arxiv.org/abs/2408.08087v1
Compressor summary: ColorMamba is a novel model that uses Mamba, improved padding tokens, local convolutional enhancement, agent attention, and HSV color guidance to achieve better spectral translation in the visible spectrum.
http://arxiv.org/abs/2408.08086v1
Compressor summary: The paper presents a method to reconstruct 3D scenes with interacting objects and people from a single image, reducing mesh collisions and improving performance.
http://arxiv.org/abs/2408.08084v1
Compressor summary: The paper proposes a new method to overcome catastrophic forgetting in class-incremental learning using pre-trained models, replay, and simple gradient constraints.
http://arxiv.org/abs/2408.08078v1
Compressor summary: The paper proposes a novel framework, CTMA, that incorporates motion cues and spatial features for remote sensing change detection using bi-temporal images.
http://arxiv.org/abs/2408.08075v1
Compressor summary: The paper studies how to improve the efficiency of learning Nash equilibria in Markov Potential Games using policy gradient methods.
http://arxiv.org/abs/2408.08073v1
Compressor summary: The paper explores various methods to improve sentence embeddings for natural language processing tasks, achieving significant improvements for static token-based models and comparable results for BERT-derived representations.
http://arxiv.org/abs/2408.08072v1
Compressor summary: I-SHEEP is a self-alignment method for large language models that continuously improves their performance without human intervention.
http://arxiv.org/abs/2408.08071v1
Compressor summary: Reservoir Computing models can approximate dynamic filters with fading memory, and Simple Cycle Reservoirs, a specialized class with constrained architecture, are universally applicable and suitable for low-complexity hardware implementations.
http://arxiv.org/abs/2408.08070v1
Compressor summary: MambaMIM is a generative self-supervised learning method for selective state space models that improves long-range dependency representation and can be used for pre-training medical image tasks.
http://arxiv.org/abs/2408.08067v1
Compressor summary: RAGChecker is a framework to evaluate Retrieval-Augmented Generation (RAG) systems by measuring their retrieval and generation modules, which reveals patterns and trade-offs in RAG design choices and can help improve them.
http://arxiv.org/abs/2408.08059v1
Compressor summary: The paper proposes a new method to generate reward machines from multiple plans, which leads to higher rewards for learning agents compared to methods based on a single plan.
http://arxiv.org/abs/2408.08058v1
Compressor summary: The study compares different foundation models' performance in few-shot and zero-shot learning for medical image analysis tasks, finding that BiomedCLIP works best with very small training sets while CLIP models perform better with slightly more samples.
http://arxiv.org/abs/2408.08056v1
Compressor summary: DATTA is a new method for test-time adaptation that adapts batch normalization and fine-tuning strategies based on the diversity score of input data, improving accuracy and Quality of Experience.
http://arxiv.org/abs/2408.08055v1
Compressor summary: The paper proposes a new model for event sequences that considers actor dynamics governed by a Gaussian Process and incorporates uncertainty estimation and negative feedback for improved performance.
http://arxiv.org/abs/2408.08054v1
Compressor summary: Text2BIM is a framework that uses LLMs and multi-agent reasoning to generate high-quality 3D building models from natural language instructions, enhancing the design process in the AEC industry.
http://arxiv.org/abs/2408.08050v1
Compressor summary: CamoTeacher is a novel semi-supervised object detection method that uses dual-rotation consistency learning to reduce noise and achieve state-of-the-art results.
http://arxiv.org/abs/2408.08047v1
Compressor summary: The paper proposes a continuous control framework for sequential recommendation using reinforcement learning that improves efficiency and long-term user engagement.
http://arxiv.org/abs/2408.08041v1
Compressor summary: Unsupervised learning models may produce accurate predictions but with hidden biases that can lead to Clever Hans effects; using Explainable AI techniques, researchers found widespread CH effects and suggest ways to improve model robustness.
http://arxiv.org/abs/2408.08035v1
Compressor summary: The paper proposes a novel three-stream hybrid model that combines pixel and skeleton-based features to recognize hand gestures, addressing challenges like dataset limitations and varying lighting conditions.