This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-22, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2407.14509v1
Compressor summary: The proposed method explains how an image classifier works by permuting interpretable concepts like captions across images and measuring performance changes.
http://arxiv.org/abs/2407.14507v1
Compressor summary: This paper introduces Internal Consistency and Self-Feedback frameworks to improve large language models' reasoning and reduce hallucinations by evaluating and updating themselves.
http://arxiv.org/abs/2407.14505v1
Compressor summary: The paper introduces T2V-CompBench, a benchmark for evaluating the ability of text-to-video generation models to compose different objects, attributes, actions, and motions into videos, covering various aspects of compositionality and proposing evaluation metrics.
http://arxiv.org/abs/2407.14506v1
Compressor summary: The paper proposes a method to improve large language models' ability to understand and extract numeric values from scientific charts by incorporating data value alignment pre-training, random image replacement, and question-based fine-tuning.
http://arxiv.org/abs/2407.14504v1
Compressor summary: The Nonlinear Schrödinger Network combines physics with AI by treating the Nonlinear Schrödinger Equation as a trainable model for learning complex patterns, offering interpretability, efficiency, and accuracy in time series tasks.
http://arxiv.org/abs/2407.14503v1
Compressor summary: RLHF rewards may have light- or heavy-tailed error, and while light-tailed errors are mitigated by less restrictive KL penalties, heavy-tailed errors can lead to catastrophic Goodhart even with KL regularization.
http://arxiv.org/abs/2407.14502v1
Compressor summary: Multi-Motion Discrete Diffusion Models (M2D2M) is a novel method that generates human motions from text descriptions using discrete diffusion models, adapting transition probabilities based on motion proximity and using a two-phase sampling strategy for smooth and coherent results.
http://arxiv.org/abs/2407.14501v1
Compressor summary: The text describes a dataset of air quality measurements from 30 diverse indoor sites in India, which can help researchers understand and address the problem of indoor air pollution in developing countries.
http://arxiv.org/abs/2407.14499v1
Compressor summary: The Discover-then-Name-CBM (DN-CBM) method uses sparse autoencoders to find, name, and classify concepts learned by deep neural networks, improving their interpretability.
http://arxiv.org/abs/2407.14498v1
Compressor summary: The paper proposes a YOLO-based framework for detecting multiple layout hotspots using PCA-augmented layout images, achieving high precision and recall rates with low false alarms.
http://arxiv.org/abs/2407.14495v1
Compressor summary: CTI is a new method for producing prediction sets with guaranteed coverage by estimating the conditional interquantile intervals and thresholding them based on their length.
http://arxiv.org/abs/2407.14494v1
Compressor summary: InterpBench is a tool for evaluating mechanistic interpretability methods using semi-synthetic transformers with known circuits trained by Strict IIT, which aligns internal computation with a high-level causal model.
http://arxiv.org/abs/2407.14491v1
Compressor summary: PD-TPE is a model that uses a double-branch decoder to focus on relevant tokens in 3D visual grounding tasks, outperforming the state-of-the-art on two datasets.
http://arxiv.org/abs/2407.14487v1
Compressor summary: The paper studies how well language models can explain their own answers and suggests using counterfactual explanations as a better way to understand their reasoning.
http://arxiv.org/abs/2407.14482v1
Compressor summary: ChatQA 2 is a model that combines long-context understanding and retrieval-augmented generation, outperforming GPT-4 on some tasks and achieving comparable results on others.
http://arxiv.org/abs/2407.14478v1
Compressor summary: The paper reviews vision-based motion measurement methods, discusses their limitations, and proposes a new method using Gaussian kernels for improved accuracy and robustness.
http://arxiv.org/abs/2407.14477v1
Compressor summary: The text proposes enriching preference datasets with machine-generated rationales to improve reinforcement learning from human feedback, and shows that this approach improves learning efficiency, data efficiency, and convergence speed, and reduces some biases.
http://arxiv.org/abs/2407.14474v1
Compressor summary: CoFE is a novel framework for radiology report generation that uses counterfactual explanations to learn non-spurious visual representations, leading to better report quality and performance.
http://arxiv.org/abs/2407.14467v1
Compressor summary: Check-Eval is a novel evaluation framework that uses LLMs to assess text quality through a checklist-based approach, which improves on existing metrics and correlates better with human judgments.
http://arxiv.org/abs/2407.14464v1
Compressor summary: The paper proposes two new 3D attention blocks for efficient processing of large 3D medical images, and applies them to 3D lung nodule detection using pulmonary CT scans.
http://arxiv.org/abs/2407.14463v1
Compressor summary: The paper proposes SurvReLU, a deep ReLU network that combines the interpretability of tree-based survival models with the performance of neural networks.
http://arxiv.org/abs/2407.14459v1
Compressor summary: PolyFormer uses PolyAttn to learn node-wise filters efficiently and effectively for graph representation learning on large-scale graphs.
http://arxiv.org/abs/2407.14439v1
Compressor summary: The text proposes a method to compress document images for more efficient and adaptive document understanding in multimodal language models, by assessing token repetitiveness and selecting informative tokens based on their correlation with the [CLS] token.
http://arxiv.org/abs/2407.14435v1
Compressor summary: JumpReLU SAEs are a new type of sparse autoencoder that achieves better reconstruction fidelity and interpretability than other methods by using a discontinuous activation function and straight-through estimators.
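A minimal sketch of the discontinuous activation mentioned in the summary above, assuming the common formulation in which a pre-activation passes through unchanged only when it exceeds a threshold. The function names and the single shared threshold are illustrative assumptions, and the straight-through estimator used for training is not shown.

```python
def jump_relu(pre_activations, threshold):
    """Forward pass only: keep a value if it exceeds the threshold, else output 0."""
    return [z if z > threshold else 0.0 for z in pre_activations]


def relu(pre_activations):
    """Standard ReLU, for comparison: the threshold is fixed at 0."""
    return [z if z > 0 else 0.0 for z in pre_activations]


values = [-1.0, 0.5, 1.5]
print(jump_relu(values, threshold=1.0))  # [0.0, 0.0, 1.5]
print(relu(values))                      # [0.0, 0.5, 1.5]
```

Unlike ReLU, the output jumps from 0 straight to the threshold value, which is why the function is discontinuous and its threshold cannot be trained with ordinary gradients.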
http://arxiv.org/abs/2407.14430v1
Compressor summary: Implicit deep learning models outperform traditional ones in extrapolating unobserved data across various scenarios due to their adaptability and feedback incorporation.
http://arxiv.org/abs/2407.14419v1
Compressor summary: HOTS3D is a new method that uses spherical optimal transport to align text and image features for better 3D shape generation based on CLIP embeddings.
http://arxiv.org/abs/2407.14414v1
Compressor summary: The System-1.x Planner is a controllable planning framework that combines system 1 and system 2 modes of language models to generate hybrid plans for long-horizon problems based on difficulty.
http://arxiv.org/abs/2407.14412v1
Compressor summary: DEAL improves VLMs' explanations for fine-grained concepts by making them more distinct and localized, reducing spurious correlations and boosting accuracy.
http://arxiv.org/abs/2407.14402v1
Compressor summary: This paper explores using large language models to help computers manage themselves like biological organisms in changing environments, and shows promising results with a microservice demo project.
http://arxiv.org/abs/2407.14387v1
Compressor summary: GLAudio is a novel architecture that learns from audio data using graph learning, information propagation, and processing in two separate steps.
http://arxiv.org/abs/2407.14386v1
Compressor summary: The text summarizes two recent papers on applying deep learning to image processing and recommendation systems, discussing their techniques and future research directions.
http://arxiv.org/abs/2407.14381v1
Compressor summary: The paper studies how using class-balanced loss functions can improve Gradient Boosting Decision Trees' performance on imbalanced datasets and introduces a Python package for this purpose.
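As an illustration of what a class-balanced loss can look like, here is a minimal sketch of weighted binary cross-entropy with inverse-frequency class weights. This is one common choice, not necessarily the one the paper favors; the function names are hypothetical and are not taken from the paper's Python package, and only the loss value is computed (a boosting library would also need its gradient and Hessian).

```python
import math


def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency (assumes both classes occur)."""
    n = len(labels)
    positives = sum(labels)
    negatives = n - positives
    return {1: n / (2 * positives), 0: n / (2 * negatives)}


def weighted_bce(labels, probs, weights, eps=1e-12):
    """Mean binary cross-entropy where each example is scaled by its class weight."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


labels = [1, 0, 0, 0]
weights = inverse_frequency_weights(labels)
print(weights[1], weights[0])  # the rare positive class gets the larger weight
print(weighted_bce(labels, [0.3, 0.2, 0.1, 0.4], weights))
```

With these weights, an equally confident mistake on the rare class costs more than one on the common class.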
http://arxiv.org/abs/2407.14371v1
Compressor summary: The OAK dataset is a large-scale resource of over 500 million tokens generated by multiple state-of-the-art LLMs to provide diverse and high-quality text across various domains for training chat-based AI systems.
http://arxiv.org/abs/2407.14367v1
Compressor summary: The paper introduces a new dataset, FairFD, for studying racial bias in forgery detection models and proposes novel fairness metrics to evaluate them.
http://arxiv.org/abs/2407.14357v1
Compressor summary: The paper introduces a new representation method (evolutionary s-rep) for anatomic objects that captures locational correspondence and produces alignment-free geometric features using ellipsoid deformation and skeletal representation, and shows improved classification performance on hippocampi shape.
http://arxiv.org/abs/2407.14352v1
Compressor summary: Key points:
- Vision-based system to detect power lines and pylons for low-flying aircraft safety
- Deep learning approach with transfer learning and curvilinear structure delineation
- Single network for both detection tasks, tested on two datasets, integrated in onboard system
Summary: The authors develop a deep learning vision system that uses one network to detect power lines and pylons from far away, enhancing the safety of low-flying aircraft.
http://arxiv.org/abs/2407.14344v1
Compressor summary: The study explores if GPT-4 can classify news source political bias based on URLs and finds a high correlation with MBFC ratings but also some limitations.
http://arxiv.org/abs/2407.14342v1
Compressor summary: The paper presents a method to measure how useful it is to share information from similar structures for maintaining and operating aircraft models.
http://arxiv.org/abs/2407.14330v1
Compressor summary: SLS is a layer-wise pruning method for transfer learning that uses clustering metrics to reduce model redundancy and storage overhead, improving model throughput and accuracy.
http://arxiv.org/abs/2407.14328v1
Compressor summary: The study presents a novel method to improve early detection of autism spectrum disorder in children by analyzing code-switched speech using advanced audio processing techniques and a hierarchical feature fusion strategy.
http://arxiv.org/abs/2407.14321v1
Compressor summary: The paper explores how large language models can help detect misinformation by incorporating evidence retrieval into the process, using both text and images as input for fact-checking.
http://arxiv.org/abs/2407.14320v1
Compressor summary: Early exits help reduce computation in deep neural networks by allowing them to stop earlier for high-confidence inputs; this paper proposes a new training approach and categorizes early-exit strategies for efficiency and performance analysis.
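The basic early-exit idea in the summary above can be sketched as follows. This is a generic confidence-threshold rule, not the paper's training approach; `exit_heads`, the threshold value, and the toy heads are assumptions made for illustration.

```python
def early_exit_predict(exit_heads, x, confidence_threshold=0.9):
    """Run exit heads in order and stop at the first one that is confident enough.

    exit_heads: list of callables, each mapping an input to class probabilities.
    Returns (index of the exit used, probabilities from that exit).
    """
    if not exit_heads:
        raise ValueError("at least one exit head is required")
    probs = None
    for index, head in enumerate(exit_heads):
        probs = head(x)
        if max(probs) >= confidence_threshold:
            return index, probs  # confident: skip the remaining, deeper exits
    return len(exit_heads) - 1, probs  # no exit was confident: use the last one


# Toy heads standing in for progressively deeper (and more confident) classifiers.
heads = [
    lambda x: [0.6, 0.4],
    lambda x: [0.95, 0.05],
    lambda x: [0.99, 0.01],
]
print(early_exit_predict(heads, x=None))  # (1, [0.95, 0.05])
```

Raising the threshold trades computation for confidence: more inputs travel to the later exits.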
http://arxiv.org/abs/2407.14309v1
Compressor summary: The authors introduce GuidingQ, a dataset of in-text questions from textbooks and scientific articles, analyze its linguistic characteristics, and explore various methods to generate similar questions using language models, finding that generated questions are almost as effective as human-written ones in improving reading comprehension.
http://arxiv.org/abs/2407.14302v1
Compressor summary: The paper proposes Dynamic Adapter (Dyn-Adapter), an efficient visual recognition method that improves parameter-efficient transfer learning by disentangling features and reducing computation cost, while maintaining or enhancing recognition accuracy.
http://arxiv.org/abs/2407.14303v1
Compressor summary: Key points:
- The paper proposes Spatio-Temporal Monge Alignment (STMA), an Optimal Transport method for Domain Adaptation in machine learning applications on signals.
- STMA adapts the cross-PSD of multivariate signals by mapping them to the Wasserstein barycenter of source domains, without retraining the model with source data.
- The paper also studies two special cases of STMA, TMA and SMA, and provides non-asymptotic concentration bounds for the mappings estimation.
- The paper shows that STMA leads to significant performance gains on multivariate biosignals and image data, and is complementary to deep learning methods.
Summary: The paper introduces STMA, an Optimal Transport method for adapting signals across domains without retraining the model, and shows its effectiveness on biosignals and images.
http://arxiv.org/abs/2407.14292v1
Compressor summary: AFENet is a novel end-to-end network for single image deraining that adaptively enhances images across various frequencies, improving the visibility of images damaged by rain and reconstructing details accurately.
http://arxiv.org/abs/2407.14280v1
Compressor summary: The authors explore how to manipulate latent spaces for concept blending using diffusion models and find that it is possible but depends on the context.
http://arxiv.org/abs/2407.14279v1
Compressor summary: The paper proposes a method to build scalable, instance-level 3D scene representations from 2D models, improving open world understanding of complex queries and outperforming existing methods.
http://arxiv.org/abs/2407.14277v1
Compressor summary: PIMPNet is a neural network that uses 3D brain images and age to classify Alzheimer's Disease from structural Magnetic Resonance Imaging.
http://arxiv.org/abs/2407.14259v1
Compressor summary: The text proposes a framework for training language models that captures minority perspectives by clustering similar opinions from annotators without using their metadata.
http://arxiv.org/abs/2407.14257v1
Compressor summary: SparseCraft is a fast and accurate method to reconstruct 3D shapes and appearance from few images using an implicit neural representation trained with ray marching and multi-view stereo cues.
http://arxiv.org/abs/2407.14249v1
Compressor summary: The text discusses the challenges and solutions for continual learning in multi-label scenarios, proposing a new method called Selective Class Attention Distillation that outperforms existing methods.
http://arxiv.org/abs/2407.14246v1
Compressor summary: Key points:
- Unipa-GPT is a chatbot for helping students choose degrees at University of Palermo
- It uses gpt-3.5-turbo, a Large Language Model
- It employs Retrieval Augmented Generation and fine-tuning techniques
- The paper compares RAG and fine-tuned systems, and discusses their performance
- It also compares with other Large Language Models and shows results from SHARPER night
Summary: The paper presents Unipa-GPT, a chatbot that helps students choose degrees using gpt-3.5-turbo and different techniques, and evaluates its performance and advantages over other models.
http://arxiv.org/abs/2407.14245v1
Compressor summary: Automatic Training Trajectories (ATT) is a new approach that adapts trajectory length to avoid mismatches and improve generalization for dataset distillation.
http://arxiv.org/abs/2407.14231v1
Compressor summary: The paper evaluates test-time adaptation methods using surrogate-based hyperparameter selection and shows that some state-of-the-art methods perform worse in realistic scenarios, highlighting the need for more rigorous benchmarking and supervision.
http://arxiv.org/abs/2407.14230v1
Compressor summary: The text describes a new computer-aided glaucoma diagnosis framework, ETSCL, that uses contrastive learning and evidence theory to improve feature extraction and prediction with uncertainty estimation.
http://arxiv.org/abs/2407.14224v1
Compressor summary: Key points:
- The paper proposes a large-scale isolated ISL dataset and a novel SL recognition model based on skeleton graph structure
- The dataset covers 2,002 common words in ISL recorded by 20 deaf adult signers
- The model, HWGAT, captures distinctive motions by giving attention to different body parts induced by the human skeleton graph structure
- The model outperforms existing state-of-the-art skeleton-based models on various datasets
Summary: The paper introduces a new ISL dataset and a skeleton graph-based SL recognition model, HWGAT, that captures distinctive motions and achieves superior performance compared to existing models.
http://arxiv.org/abs/2407.14214v1
Compressor summary: The CDA forecaster uses causality analysis and answer-based attention to improve industrial time-series forecasting with limited data, enabling better decision-making in production processes.
http://arxiv.org/abs/2407.14211v1
Compressor summary: Key points:
- The text is about predicting mortality risk of ischemic stroke patients in ICU using deep learning and machine learning models
- The data was obtained from MIMIC-IV database and preprocessed with SMOTE and feature selection
- The proposed model (XGB-DL) achieved higher specificity and AUROC than other baseline models
Summary: The authors developed a deep learning model (XGB-DL) that improved mortality prediction of ischemic stroke patients in ICU using data from MIMIC-IV database and advanced feature selection techniques.
http://arxiv.org/abs/2407.14210v1
Compressor summary: Fair-ONB is an undersampling method that uses data morphology to reduce discrimination in machine learning models by selecting areas for undersampling where different groups overlap.
http://arxiv.org/abs/2407.14209v1
Compressor summary: The paper proposes a low-cost method to unlearn specific concepts from text-to-video diffusion models by transferring the unlearning capability of text encoders from text-to-image diffusion models, enabling the removal of copyrighted content and unsafe videos.
http://arxiv.org/abs/2407.14208v1
Compressor summary: The paper proposes a memory-efficient method for adapting models to new classes and domains without access to the source data or large memory, using Gaussian mixture models and contrastive losses.
http://arxiv.org/abs/2407.14207v1
Compressor summary: The paper proposes a new deep state-space model architecture for sequence modeling based on online learning objectives, which achieves better performance than existing models.
http://arxiv.org/abs/2407.14206v1
Compressor summary: Smoothing attacks can remove watermarks from large language models' text without affecting its quality, exposing a weakness in watermarking methods.
http://arxiv.org/abs/2407.14204v1
Compressor summary: Bucketed Ranking-based Losses reduce the time complexity of ranking-based losses in object detection by grouping negative predictions into buckets, achieving similar accuracy with faster training.
http://arxiv.org/abs/2407.14198v1
Compressor summary: PDCNet is a neural network that combines convolutional and self-attention mechanisms to improve 3D measurement accuracy using structured light, reducing fringe order ambiguity and enhancing performance at object boundaries.
http://arxiv.org/abs/2407.14197v1
Compressor summary: The paper introduces Graph-based GS Compression (GGSC), a simple and effective method for compressing graph signal data, and a corresponding dataset (GSQA) to analyze the distortion characteristics of different GS operations.
http://arxiv.org/abs/2407.14192v1
Compressor summary: The text introduces LeKUBE, a benchmark to evaluate knowledge update methods for legal language models considering the specific challenges of the legal domain.
http://arxiv.org/abs/2407.14191v1
Compressor summary: The authors propose a new method called normative diffusion autoencoder that combines advantages of normative modeling and diffusion models to improve survival prediction in ALS using MRI data.
http://arxiv.org/abs/2407.14185v1
Compressor summary: The study compares metrics for hyperparameter tuning in neural network models for drug discovery and proposes a Bayesian method to improve calibration and quantify uncertainty in predictions.
http://arxiv.org/abs/2407.14180v1
Compressor summary: The paper presents a method to analyze gender biases in French TV and radio news transcriptions using a large language model and a smaller specialized model.
http://arxiv.org/abs/2407.14177v1
Compressor summary: The paper presents a new efficient multi-modal language model using cross-attention, hierarchical ViT features, and MoE to improve visual perception and reduce computational costs.
http://arxiv.org/abs/2407.14170v1
Compressor summary: The paper proposes a new algorithm, Forbes, that obscures faces from humans while preserving identity and attributes for machines using multiple transformations with optimized parameters.
http://arxiv.org/abs/2407.14166v1
Compressor summary: The paper presents a novel method for solving dimension-reducing linear mappings using maximum entropy that unifies existing methods and applies to new constraints like [0, 1].
http://arxiv.org/abs/2407.14151v1
Compressor summary: The study compares three Deep Reinforcement Learning models in BreakOut Atari, testing their learning efficiency, strategy development, and adaptability.
http://arxiv.org/abs/2407.14143v1
Compressor summary: The paper proposes RAPF, a method to improve class-incremental learning by adjusting representations and fusing parameters during adaptation to new classes.
http://arxiv.org/abs/2407.14138v1
Compressor summary: SceneVTG is a visual text generator that uses a large language model to recommend reasonable text regions and contents, and a conditional diffusion model to produce high-quality text images in the wild.
http://arxiv.org/abs/2407.14133v1
Compressor summary: The paper introduces a new model called ZeroVLM that improves visual spatial reasoning in VLMs using 3D reconstruction and prompting mechanisms.
http://arxiv.org/abs/2407.14129v1
Compressor summary: The text compares different deep learning weather prediction models and their performance on synthetic and real-world data, finding various tradeoffs and highlighting the need for further research.
http://arxiv.org/abs/2407.14126v1
Compressor summary: Mono-ViFI is a self-supervised monocular depth estimation framework that synthesizes virtual camera views, fuses multi-frame features, and uses image transformation and triplet loss for improved performance.
http://arxiv.org/abs/2407.14121v1
Compressor summary: The paper introduces Seismic Fault SAM, a pre-trained model that uses adaptors, 2.5D input strategy, and prior-based data augmentation to improve seismic fault detection and interpretation.
http://arxiv.org/abs/2407.14120v1
Compressor summary: The paper presents a mathematical model that uses the soccer ball graph and identifying code sets to optimize satellite monitoring of the Earth's surface by finding the minimum number of satellites needed to uniquely identify regions.
http://arxiv.org/abs/2407.14119v1
Compressor summary: Key points:
- Paper presents a method for data augmentation using two GANs to create artificial images for precision farming
- Method replaces only patches with objects of interest, considering both foreground and background
- Experiments show effectiveness on public datasets
Summary: The paper proposes a GAN-based data augmentation method that enhances training data for precision farming by replacing object patches in original images, respecting foreground and background.
http://arxiv.org/abs/2407.14117v1
Compressor summary: The proposed method improves the low-shot capability of CLIP by refining visual content with prediction margins and adapting to global and local details.
http://arxiv.org/abs/2407.14103v1
Compressor summary: Key points:
- Hand gesture recognition helps human-machine interaction underwater
- CADDIAN is a new gesture-based language for divers
- Zero-shot underwater gesture recognition (ZSUGR) aims to recognize unseen gestures using few seen classes
- A two-stage framework with novel transformer and conditional GAN is proposed
- The method outperforms existing zero-shot techniques
Summary: The paper proposes a novel approach for recognizing unseen hand gestures underwater using few visual samples of seen gestures, by combining a transformer and a conditional generative adversarial network.
http://arxiv.org/abs/2407.14097v1
Compressor summary: This work proposes a biologically plausible AI model using the Forward-Forward Algorithm (FFA) and develops an OoD detection algorithm (FF-SCP) that outperforms existing methods in terms of accuracy, energy efficiency, and explainability.
http://arxiv.org/abs/2407.14088v1
Compressor summary: The study investigates the impact of model size on D2T performance across five datasets and twelve LLMs, finding that larger models improve readability and informativeness but may sacrifice faithfulness and struggle with source-reference divergence.
http://arxiv.org/abs/2407.14087v1
Compressor summary: Key points:
- State-of-the-art face recognition networks have different score distributions across demographics
- The paper proposes to integrate demographic information in score normalization methods to improve fairness
- Experiments show that the proposed techniques improve overall fairness without sacrificing performance
Summary: The paper introduces demographic-aware score normalization methods for face recognition networks to enhance fairness across different demographics, and demonstrates their effectiveness on two datasets.
http://arxiv.org/abs/2407.14086v1
Compressor summary: The paper proposes a new learning approach for multi-object tracking using cross-correlation to capture temporal information of objects, improving performance and reducing computational cost.
http://arxiv.org/abs/2407.14085v1
Compressor summary: The paper presents a keyword extraction method that uses the KeyBERT library to identify class-specific keywords from seed keywords, achieving improved results in German business registry classification.
http://arxiv.org/abs/2407.14081v1
Compressor summary: DisenSemi is a novel framework for semi-supervised graph classification that learns disentangled representation using a factor-wise graph encoder and an MI-based consistency regularization.
http://arxiv.org/abs/2407.14078v1
Compressor summary: The Stable-Hair framework is a novel diffusion-based approach that can transfer realistic and diverse hairstyles onto bald faces for virtual hair try-on.
http://arxiv.org/abs/2407.14076v1
Compressor summary: The paper explores domain-specific and mixed-domain pretraining for creating specialized language models that can handle specific tasks with less sensitive data compared to general-purpose models like GPT-4 or Claude-3-opus.
http://arxiv.org/abs/2407.14069v1
Compressor summary: The paper proposes BOLD-DI, a method to improve video contrastive learning by capturing both static and dynamic features of videos in an integrated and decoupled way.
http://arxiv.org/abs/2407.14066v1
Compressor summary: The paper introduces a new benchmark dataset and method for interpolating distorted omnidirectional videos to reduce dizziness in VR experiences.
http://arxiv.org/abs/2407.14065v1
Compressor summary: The paper introduces MSCT, a causal deep learning model for predicting post-crash traffic conditions that considers time-varying confounding factors and treatment effects, and shows its superior performance over existing methods using synthetic and real data.
http://arxiv.org/abs/2407.14064v1
Compressor summary: The paper proposes using pre-training and a balanced optimization technique to improve tuberculosis classification from chest X-ray images, aligning models with human experts without sacrificing accuracy or generalization.
http://arxiv.org/abs/2407.14062v1
Compressor summary: The paper proposes a new method, DVQ-VAE, that encodes the hand into separate parts and uses a dual-stage decoding strategy to generate more realistic human grasps for computer graphics and robotics applications.
http://arxiv.org/abs/2407.14059v1
Compressor summary: The paper introduces a new way to reconstruct dynamic radiance fields from monocular videos using kinematics and physics-driven regularizers, improving the capture of real-world motion patterns.
http://arxiv.org/abs/2407.14058v1
Compressor summary: The paper proposes a method, $C^3$R, to learn causally complete representations for multi-modal learning that balances sufficiency and necessity in cause discovery.
http://arxiv.org/abs/2407.14057v1
Compressor summary: LazyLLM is a method that selectively computes the KV cache for prompts to speed up generation without fine-tuning.
http://arxiv.org/abs/2407.14056v1
Compressor summary: Rasa is a multilingual TTS dataset that provides a practical way to improve expressiveness by increasing neutral data and adding some expressive data for Indian languages.
http://arxiv.org/abs/2407.14054v1
Compressor summary: PointRegGPT generates realistic synthetic data for 3D point cloud registration using depth maps, improving performance of existing algorithms and achieving state-of-the-art results.
http://arxiv.org/abs/2407.14049v1
Compressor summary: Key points:
- PAKPA is a quantitative summarization method for reviews
- It uses aspect sentiment analysis and prompted in-context learning with LLMs
- It generates and quantifies key points grounded in aspects for business entities
- It does not require supervised training or annotated data
Summary: PAKPA is a novel review summarization method that uses aspect sentiment analysis and large language models to generate and measure key points for business entities without supervision.
http://arxiv.org/abs/2407.14047v1
Compressor summary: The paper introduces a new problem (OCMOT) that involves tracking objects of known and unknown categories in open corpora, builds a large benchmark dataset (OCTrackB) for it, and proposes a new metric to evaluate recognition performance.
http://arxiv.org/abs/2407.14044v1
Compressor summary: ECCO is a benchmark for evaluating program efficiency using natural language and history-based code generation and editing methods, which can help improve the reliability and performance of large language models in generating correct and efficient code.
http://arxiv.org/abs/2407.14043v1
Compressor summary: The paper proposes a kinematics-based method to accurately reconstruct 3D human-object interaction from single-view RGB images by using improved forward kinematics, Multi-Layer Perceptron, and Contact Region Recognition Network.
http://arxiv.org/abs/2407.14041v1
Compressor summary: The paper proposes a method to select and optimize noises for diffusion models to improve their generation quality and human preference.
http://arxiv.org/abs/2407.14040v1
Compressor summary: The authors introduce CatGPT, a transformer-based language model that generates valid and accurate inorganic catalyst structures from a vast chemical space, and show its applicability in fine-tuning for specific catalytic reactions.
http://arxiv.org/abs/2407.14039v1
Compressor summary: The authors propose various techniques to enhance BERT's performance in different natural language understanding tasks, achieving state-of-the-art results and showing BERT's versatility.
http://arxiv.org/abs/2407.14032v1
Compressor summary: The paper introduces a novel method called Semantic-CC for generating more accurate and comprehensive change descriptions in bi-temporal remote sensing images using latent knowledge, multi-task interaction, and pixel-level semantics.
http://arxiv.org/abs/2407.14030v1
Compressor summary: HeCiX-KG is a novel knowledge graph that fuses data from ClinicalTrials.gov and Hetionet to improve target validation and drug optimization in clinical research, and HeCiX integrates it with GPT-4 for enhanced usability.
http://arxiv.org/abs/2407.14029v1
Compressor summary: The paper proposes a novel dual bias reduction framework for class-incremental learning that uses self-supervised transformation and prototype augmentation to address representation and classifier biases, enabling non-exemplar CIL without catastrophic forgetting.
http://arxiv.org/abs/2407.14026v1
Compressor summary: The paper introduces a new method to extract sketches with different styles from color images using unpaired data and achieves better results than existing methods.
http://arxiv.org/abs/2407.14024v1
Compressor summary: The paper proposes an out-of-distribution detection method that uses test-time augmentation to improve abnormality detection in gastrointestinal images.
http://arxiv.org/abs/2407.14008v1
Compressor summary: The authors apply and adapt existing interpretability techniques to Mamba, a recurrent model with scaling similar to Transformers, and investigate its Indirect Object Identification circuit.
http://arxiv.org/abs/2407.14007v1
Compressor summary: MRD is a pre-training method that uses large VLMs to learn better 3D shape representations from point clouds, images, and language descriptions, improving downstream tasks like classification and retrieval.
http://arxiv.org/abs/2407.13999v1
Compressor summary: NeLLCom-X is a framework that simulates language emergence and evolution through realistic agent interactions and communication pressures, allowing for the study of various linguistic properties and dynamics.
http://arxiv.org/abs/2407.13998v1
Compressor summary: LFRQA is a new dataset for question answering that tests cross-domain generalization and evaluates large language models by comparing their generated answers to human-written long-form answers.
http://arxiv.org/abs/2407.13989v1
Compressor summary: The paper proposes a novel approach that combines LLMs and GNNs to improve node classification accuracy with few labeled nodes, outperforming existing methods.
http://arxiv.org/abs/2407.13987v1
Compressor summary: The paper examines how artifacts affect attention mechanisms in video super-resolution, proposes a channel-attention-based framework called RealViformer that handles artifacts better than existing methods, and makes the code publicly available.
http://arxiv.org/abs/2407.13986v1
Compressor summary: The paper proposes Deep Feature Surgery, a method to improve multi-exit network accuracy and efficiency by feature partitioning and referencing during training.
http://arxiv.org/abs/2407.13982v1
Compressor summary: The study finds that ASR performance varies by dialect and recording quality, suggesting a need for further investigation to address potential fairness issues in speech recognition technology.
http://arxiv.org/abs/2407.13979v1
Compressor summary: The study analyzes the truthfulness of calibration measures in sequential prediction and introduces a new measure called SSCE that ensures optimal truthful predictions.
http://arxiv.org/abs/2407.13978v1
Compressor summary: DGRN is a method for fault diagnosis in industrial systems that uses dual adversarial training and contrastive learning to generate domain-invariant features from single-mode data, improving accuracy on unseen modes.
http://arxiv.org/abs/2407.13976v1
Compressor summary: PlacidDreamer is a text-to-3D framework that uses a multi-view diffusion model to harmonize initialization, multi-view generation, and text-conditioned generation, while addressing the limitations of previous score distillation methods.
http://arxiv.org/abs/2407.13975v1
Compressor summary: Chameleon is a system that generates user-centric privacy masks for facial images, robustly protecting them from unauthorized face recognition with minimal loss of image quality.
http://arxiv.org/abs/2407.13974v1
Compressor summary: ADDP is a novel method that tackles continual learning for rPPG measurement using adapter finetuning, domain prototypes, feature augmentation, and inference simplification strategies.
http://arxiv.org/abs/2407.13968v1
Compressor summary: The paper proposes an adaptive hybrid tree search algorithm for optimizing seed order fulfillment in centralized warehouses, considering seasonality, unpredictability, and deadlines.
http://arxiv.org/abs/2407.13957v1
Compressor summary: The paper investigates how finetuned machine learning models behave on minority groups and identifies factors that affect worst-group accuracy (WGA): models can rely on spurious correlations and perform poorly on minority groups, class-balancing techniques may decrease WGA over time or depending on group structure, scaling pretrained models with appropriate class-balancing can improve WGA, and spectral imbalance in features is a potential source of group disparities.
http://arxiv.org/abs/2407.13954v1
Compressor summary: Neural topology optimization uses neural networks to reparameterize the decision space and shape the optimization landscape, influencing convergence and exploration depending on the network architecture.