This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-17, generated by Compressor, my personal LLM-based project.
http://arxiv.org/abs/2407.11969v1
Compressor summary: The authors show that many LLMs can be tricked into generating harmful outputs by simply reformulating requests in the past tense, revealing a generalization gap in current refusal training methods.
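The reformulation trick this paper describes can be sketched in a few lines; the templates below are hypothetical stand-ins (the paper performs the rewriting with an LLM, which is far more general):

```python
# Hypothetical sketch of the past-tense reformulation trick; the paper's
# actual rewriting is done with an LLM, not hand-written templates.
def to_past_tense(request: str) -> str:
    """Naively recast a present-tense request into the past tense."""
    templates = {
        "How do I": "How did people",
        "How can I": "How did people",
        "Tell me how to": "Tell me how people used to",
    }
    for present, past in templates.items():
        if request.startswith(present):
            return request.replace(present, past, 1)
    return request

print(to_past_tense("How do I pick a lock?"))  # → How did people pick a lock?
```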
http://arxiv.org/abs/2407.11966v1
Compressor summary: The paper proposes a weight generator that uses GANs and text conditions to synthesize neural weights, which reduces training time and improves image quality.
http://arxiv.org/abs/2407.11965v1
Compressor summary: UrbanWorld is a generative model that creates realistic 3D city environments for training AI agents to perceive, decide, and act like humans.
http://arxiv.org/abs/2407.11963v1
Compressor summary: NeedleBench is a framework for testing large language models' abilities to retrieve and reason from long texts in different languages, revealing their limitations in complex reasoning tasks.
http://arxiv.org/abs/2407.11962v1
Compressor summary: MoCo-NeRF is a framework that uses radiance residual fields to model non-rigid motions in dynamic clothed humans, achieving state-of-the-art free-viewpoint rendering quality with efficient learning and simultaneous multi-subject support.
http://arxiv.org/abs/2407.11954v1
Compressor summary: The paper proposes a Gated Temporal Diffusion (GTD) network that models uncertainty in long-term action anticipation, using a Gated Anticipation Network (GTAN) to represent past and future frames mutually.
http://arxiv.org/abs/2407.11950v1
Compressor summary: The paper proposes a video stereo matching method that uses temporal information to improve consistency, accuracy, and efficiency by completing the previous disparity map and refining it iteratively in both disparity and gradient spaces.
http://arxiv.org/abs/2407.11948v1
Compressor summary: This paper investigates Transformer-based models' performance and behaviors in multi-document summarization using five empirical studies and various evaluation metrics.
http://arxiv.org/abs/2407.11946v1
Compressor summary: The authors propose an efficient reconstruction architecture for Snapshot Compressive Imaging that uses Hierarchical Separable Video Transformer (HiSViT) to improve performance and efficiency.
http://arxiv.org/abs/2407.11941v1
Compressor summary: The authors propose a method to explain face recognition decisions by analyzing the influence of frequency components in images, which has not been done before.
http://arxiv.org/abs/2407.11936v1
Compressor summary: The study compares radar and thermal imaging for non-contact sleep monitoring and finds that thermal imaging outperforms radar in detecting and classifying sleep apnea.
http://arxiv.org/abs/2407.11935v1
Compressor summary: The study proposes MVAD, a framework for multi-view anomaly detection that learns and integrates features from multiple views using the MVAS algorithm, achieving state-of-the-art performance with minimal computational complexity.
http://arxiv.org/abs/2407.11933v1
Compressor summary: The paper introduces GAP, a new differentiable loss function for target detection that balances accuracy across demographic groups and reduces disparate impact.
http://arxiv.org/abs/2407.11930v1
Compressor summary: HaluQuestQA is a new dataset with localized error annotations for long-form question answering (LFQA), built to evaluate and improve answer quality and comprehensiveness.
http://arxiv.org/abs/2407.11928v1
Compressor summary: The paper proposes a truss-based graph sparsification method that prunes redundant edges in dense regions to curb the over-smoothing of node embeddings, improving the graph classification performance of various GNN models.
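The truss pruning primitive this paper builds on can be sketched in plain Python: repeatedly drop edges supported by fewer than k-2 triangles (a minimal illustration, not the paper's full pipeline):

```python
from itertools import combinations

# Minimal sketch of truss-based edge pruning: keep only edges that
# participate in at least k-2 triangles, iterating until stable.
def k_truss_edges(edges, k):
    edges = {frozenset(e) for e in edges}
    changed = True
    while changed:
        changed = False
        # neighbor sets under the current edge set
        adj = {}
        for e in edges:
            u, v = tuple(e)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        for e in list(edges):
            u, v = tuple(e)
            support = len(adj[u] & adj[v])   # triangles through edge (u, v)
            if support < k - 2:
                edges.remove(e)
                changed = True
    return edges

# A 4-clique (dense region) with a pendant edge (0, 4): the 4-truss keeps
# the clique and prunes the pendant edge.
clique = [(u, v) for u, v in combinations(range(4), 2)]
print(sorted(tuple(sorted(e)) for e in k_truss_edges(clique + [(0, 4)], 4)))
# → [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```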
http://arxiv.org/abs/2407.11921v1
Compressor summary: The text describes a new attack called IPA-NeRF that can embed hidden backdoors in Neural Radiance Fields, allowing them to produce illusory outputs when triggered by specific views.
http://arxiv.org/abs/2407.11919v1
Compressor summary: The authors propose a multi-LLM correction method for meeting summarization that uses human feedback on error types and refines the summary based on relevance, informativeness, conciseness, and coherence.
http://arxiv.org/abs/2407.11917v1
Compressor summary: The paper proposes a new way to estimate uncertainty in gradient-free optimization of black-box simulators using deep generative models and Wasserstein distance.
http://arxiv.org/abs/2407.11913v1
Compressor summary: The method combines local image patches with global frequencies learned from data to describe images more efficiently in quantised autoencoders.
http://arxiv.org/abs/2407.11910v1
Compressor summary: The paper proposes a new evaluation protocol for attribution methods, finding that intrinsically explainable models and raw attribution values perform better than previous methods, and different network designs affect attribution quality.
http://arxiv.org/abs/2407.11907v1
Compressor summary: Graph Foundation Model (GraphFM) is a scalable pretraining approach for node classification on diverse graphs using Perceiver-based encoders and latent tokens, improving adaptability, stability, and generalization across domains.
http://arxiv.org/abs/2407.11902v1
Compressor summary: The KiOP paradigm combines multiple models into one prompt without modifying them or needing training data, enabling efficient and convenient knowledge transfer in realistic scenarios; it addresses the low reusability and high storage consumption of Data-Free Knowledge Transfer and performs well across datasets and models even with no data and limited storage.
http://arxiv.org/abs/2407.11895v1
Compressor summary: OmniBind is a large-scale multimodal joint representation model that efficiently integrates 3D, audio, image, and language inputs by remapping and binding pre-trained specialist models.
http://arxiv.org/abs/2407.11894v1
Compressor summary: The paper presents a new training algorithm for deep neural networks with random complex exponential activation functions that uses Markov Chain Monte Carlo sampling to achieve theoretical approximation and efficient learning of features without Gibbs phenomena.
http://arxiv.org/abs/2407.11890v1
Compressor summary: DepGAN uses depth maps and alpha channels to improve image composition by rectifying occlusions and enhancing transparency effects with a novel Depth Aware Loss function.
http://arxiv.org/abs/2407.11878v1
Compressor summary: The paper proposes a novel framework that uses class-dependent confidence bounds to improve the robustness and reliability of classification models for imbalanced data.
http://arxiv.org/abs/2407.11876v1
Compressor summary: The paper simplifies over-smoothing theory in graph convolutions by relating it to power iteration, introduces rank collapse and rank-one distance as new concepts, and identifies more models affected by rank collapse.
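The power-iteration view of over-smoothing this paper formalizes is easy to demonstrate numerically: without nonlinearities, stacking graph-convolution layers repeatedly multiplies the features by a normalized adjacency operator, driving the feature matrix toward rank one (a toy sketch on a 4-node graph, not the paper's analysis):

```python
import numpy as np

# Toy illustration of over-smoothing as power iteration: repeated
# multiplication by D^{-1/2}(A+I)D^{-1/2} collapses node features
# onto the dominant eigenvector ("rank collapse").
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                        # adjacency with self-loops
d = A_hat.sum(axis=1)
A_norm = A_hat / np.sqrt(np.outer(d, d))     # symmetric normalization

X = np.random.default_rng(0).normal(size=(4, 3))   # random node features
for _ in range(50):                          # 50 linear propagation steps
    X = A_norm @ X

print(np.linalg.matrix_rank(X, tol=1e-6))    # → 1
```

All non-dominant eigenvalues of this operator have magnitude below 1, so their components decay geometrically with depth, which is exactly why deeper linear GNNs mix node embeddings together.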
http://arxiv.org/abs/2407.11867v1
Compressor summary: The paper proposes an efficient machine unlearning method that modifies only a single critical layer of model parameters using one-time gradient computation, enabling effective erasure of certain training samples with low computational cost and general utility retention.
http://arxiv.org/abs/2407.11862v1
Compressor summary: The paper presents a new Liberty lexicon for analyzing moral values in social issues using word embeddings and compositional semantics.
http://arxiv.org/abs/2407.11861v1
Compressor summary: The paper proposes a meme identification protocol based on memetics to improve meme classification and suggests that existing meme datasets lack genuine memes.
http://arxiv.org/abs/2407.11857v1
Compressor summary: The text proposes using Constraint Satisfaction Problems (CSPs) to detect inconsistencies in task-oriented dialogues, finding that LLMs struggle to re-lexicalize dialogues consistently while accurately reflecting domain knowledge.
http://arxiv.org/abs/2407.11855v1
Compressor summary: This paper presents a large-scale sign language translation pretraining method that uses various data sources and improves open-domain performance on multiple languages and sign languages.
http://arxiv.org/abs/2407.11854v1
Compressor summary: The paper presents a two-stage fine-tuning pipeline that uses multilingual synthetic data to generate synthetic error corpora for grammatical error detection in low-resource languages, outperforming current annotation-free methods.
http://arxiv.org/abs/2407.11850v1
Compressor summary: SpaceJAM is an efficient and simple model for joint image alignment that reduces complexity and improves speed by 10x.
http://arxiv.org/abs/2407.11843v1
Compressor summary: InferAct is a new method that uses LLMs' Theory-of-Mind to detect potential errors before risky actions are taken, and it integrates human feedback to improve decision-making.
http://arxiv.org/abs/2407.11840v1
Compressor summary: MVG-Splatting improves 3D reconstruction by adaptively adjusting densification levels and normal calculations for better rendering quality and mesh extraction.
http://arxiv.org/abs/2407.11833v1
Compressor summary: LoFTI is a new benchmark to evaluate large language models' ability to localize and transfer factual information for different locations in India.
http://arxiv.org/abs/2407.11832v1
Compressor summary: The paper shows that approximating the number of relevant variables in a parity function is as hard as learning parities, and presents new algorithms for learning sparse parities with low-degree noise.
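The sparse parity learning problem this paper studies (not its algorithm) can be illustrated in a few lines: the label is the XOR of a hidden subset of k relevant bits, and the remaining bits carry no signal:

```python
import numpy as np

# Generic illustration of the k-sparse parity problem: the label is the
# XOR of a hidden subset S of k out of n input bits.
rng = np.random.default_rng(0)
n, k = 10, 3
S = rng.choice(n, size=k, replace=False)    # hidden relevant variables

X = rng.integers(0, 2, size=(8, n))         # 8 random boolean examples
y = X[:, S].sum(axis=1) % 2                 # label = XOR of the bits in S

# Flipping a relevant bit flips every label; irrelevant bits change nothing.
X_flip = X.copy()
X_flip[:, S[0]] ^= 1
print((X_flip[:, S].sum(axis=1) % 2 == 1 - y).all())  # → True
```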
http://arxiv.org/abs/2407.11827v1
Compressor summary: The study develops RhetAnn, a web app that simplifies annotating text with rhetorical and linguistic features of persuasion, and uses GPT to fine-tune models for detecting propaganda techniques cost-effectively and interpretably.
http://arxiv.org/abs/2407.11823v1
Compressor summary: The paper proposes a machine learning-based policy to help the FDA reduce recalls and workload in its 510(k) medical device approval process, potentially saving billions of dollars annually.
http://arxiv.org/abs/2407.11821v1
Compressor summary: Knowledge graph embeddings can help make probabilistic inference in statistical extensions of description logics more efficient and accurate.
http://arxiv.org/abs/2407.11814v1
Compressor summary: The paper proposes a method to create consistent multi-scene videos from action-centric text descriptions using contrastive sequential video diffusion.
http://arxiv.org/abs/2407.11812v1
Compressor summary: The DFDRNN model uses two features to precisely encode drugs and diseases for drug repositioning, outperforming six existing methods on four datasets.
http://arxiv.org/abs/2407.11802v1
Compressor summary: ICD is a new knowledge distillation method that uses contrastive learning and invariance penalties to transfer more structural knowledge from a teacher model to a student model, achieving better results on several image datasets.
http://arxiv.org/abs/2407.11798v1
Compressor summary: PipeInfer is a new technique to speed up large language models across computer clusters by combining continuous asynchronous speculation and early inference cancellation.
http://arxiv.org/abs/2407.11790v1
Compressor summary: The study analyzes the efficiency of heterogeneous graph neural network (HGNN) training and identifies performance bottlenecks to optimize it.
http://arxiv.org/abs/2407.11789v1
Compressor summary: The study shows that large language models can be deceptive in reading comprehension tasks, leading to significant accuracy drops when used as assistants.
http://arxiv.org/abs/2407.11785v1
Compressor summary: The paper proposes a framework to evaluate synthetic smart meter data for privacy and utility, using outlier injection and differential privacy methods.
http://arxiv.org/abs/2407.11786v1
Compressor summary: The study proposes a machine learning model using technical indicators to predict Bitcoin prices and help traders make better decisions.
http://arxiv.org/abs/2407.11784v1
Compressor summary: The paper introduces a sandbox for co-developing data and multi-modal generative models, improving performance and efficiency, and provides resources to foster progress in the field.
http://arxiv.org/abs/2407.11781v1
Compressor summary: The paper presents SlingBAG, a novel 3D photoacoustic imaging (PAI) reconstruction algorithm based on differentiable rendering and adaptive growth of point clouds that outperforms traditional methods in quality and efficiency under sparse or limited views, and releases a new dataset and code for future research.
http://arxiv.org/abs/2407.11780v1
Compressor summary: The paper proposes a method to prevent catastrophic forgetting in large language models when adapting to new tasks by switching between tuned models.
http://arxiv.org/abs/2407.11778v1
Compressor summary: The paper introduces SUWR, a local feature selection method that avoids misleading explanations by ensuring no label or feature leakage, improving interpretability of complex models.
http://arxiv.org/abs/2407.11774v1
Compressor summary: The paper presents a RoBERTa-based neural model for detecting machine-generated text in English, achieving 78.9% accuracy and ranking 57th in SemEval-2024 Subtask A.
http://arxiv.org/abs/2407.11773v1
Compressor summary: The paper proposes using large language models and prompt engineering to create adaptable, interactive, and transparent personalized learning path planning systems that improve learning efficiency and engagement.
http://arxiv.org/abs/2407.11770v1
Compressor summary: The paper presents a framework using LLMs to evaluate privacy and utility of anonymized data and optimize the anonymization process, achieving better results than existing methods.
http://arxiv.org/abs/2407.11767v1
Compressor summary: ITI-IQA is a toolbox for assessing imputation quality, selecting best imputers, filtering low-quality features, and diagnosing missing data issues in various data types.
http://arxiv.org/abs/2407.11766v1
Compressor summary: The article introduces a new language structure inspired by large language models that better captures linguistic diversity and suggests it could improve scientific research.
http://arxiv.org/abs/2407.11762v1
Compressor summary: DECAFORK is a decentralized algorithm that maintains the number of random walks in a graph around a desired value by forking them when failures are likely, ensuring failure resilience.
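The core idea the summary describes can be sketched as a single simulation step: walks that are lost to failures are compensated by forking survivors back up to the target count. The `failure_prob` table below is a made-up stand-in for DECAFORK's decentralized failure-likelihood estimate:

```python
import random

# Toy sketch: keep the number of concurrent random walks near a target
# by forking surviving walks when others are lost to node failures.
def step_walks(adj, walks, target, failure_prob, rng):
    survivors = []
    for node in walks:
        if rng.random() < failure_prob.get(node, 0.0):
            continue                              # this walk died with its node
        survivors.append(rng.choice(adj[node]))   # otherwise, take a step
    while survivors and len(survivors) < target:
        survivors.append(rng.choice(survivors))   # fork to restore the count
    return survivors

# Node 0 always fails, so its walk dies and the survivor is forked.
adj = {0: [1], 1: [0]}
print(step_walks(adj, [0, 1], 2, {0: 1.0}, random.Random(0)))  # → [0, 0]
```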
http://arxiv.org/abs/2407.11756v1
Compressor summary: The paper introduces a framework that models higher-order node interactions using tree-shaped motifs and spectral filters, and shows its effectiveness on regression and classification tasks.
http://arxiv.org/abs/2407.11753v1
Compressor summary: The research presents a new hybrid deep learning model for accurate and early rice leaf disease identification, which could improve agricultural efficiency and reduce crop loss.
http://arxiv.org/abs/2407.11751v1
Compressor summary: The paper argues that using long model rollouts in model-based offline reinforcement learning can improve Q-value estimates and may enhance the technique.
http://arxiv.org/abs/2407.11750v1
Compressor summary: The paper proposes a new unsupervised method called CCLGAN for single image deraining that combines cycle contrastive learning and location contrastive learning to improve image quality.
http://arxiv.org/abs/2407.11735v1
Compressor summary: The text introduces ProSub, a new open-set semi-supervised learning (OSSL) framework that uses angles in feature space for in-distribution/out-of-distribution classification and conditional distribution estimation, achieving state-of-the-art performance on several benchmarks.
http://arxiv.org/abs/2407.11733v1
Compressor summary: The authors evaluate LLMs for stereotyping and find that they improve with a safety system prompt but still struggle with certain toxic harms, especially for intersectional identities, and call for more accountability and awareness in NLP.
http://arxiv.org/abs/2407.11722v1
Compressor summary: This study explores how quantization can make pre-training Transformer models more efficient for language modeling by applying it to various components during training.
http://arxiv.org/abs/2407.11717v1
Compressor summary: Turbo is a plug-and-play module that prunes data redundancy in vision-language models using information degree, achieving efficiency and performance trade-offs.
http://arxiv.org/abs/2407.11701v1
Compressor summary: The text discusses how olfaction shapes human experiences, and proposes a transfer-learning method to classify fragrant spaces in artistic scenes using weakly labeled data.
http://arxiv.org/abs/2407.11700v1
Compressor summary: The text proposes a new image compression method for machines that allows users to control the bitrate, quality, and task accuracy with one neural model.
http://arxiv.org/abs/2407.11699v1
Compressor summary: The paper proposes Relation-DETR, an enhanced DETR method that uses position relation embeddings to improve convergence, performance, and speed for object detection tasks.
http://arxiv.org/abs/2407.11698v1
Compressor summary: NITRO-D is a novel framework for training deep Convolutional Neural Networks that operate entirely in the integer-only domain, reducing memory and computational requirements while maintaining performance.
http://arxiv.org/abs/2407.11696v1
Compressor summary: EarthNet is a model that learns to predict global atmospheric conditions from satellite observations, producing accurate and efficient data assimilation for weather forecasting.
http://arxiv.org/abs/2407.11691v1
Compressor summary: VLMEvalKit is an open-source PyTorch toolkit for evaluating large multi-modal models and benchmarks, with a user-friendly interface and automatic handling of various tasks.
http://arxiv.org/abs/2407.11686v1
Compressor summary: The paper proposes CCoE architecture, which combines multiple domain experts to create a large LLM, improving performance while reducing training costs.
http://arxiv.org/abs/2407.11683v1
Compressor summary: The paper proposes a network that learns stable image representations under distractors and generates captions based on reliable difference features using cross-modal contrastive regularization.
http://arxiv.org/abs/2407.11682v1
Compressor summary: MapDistill uses knowledge distillation to transfer knowledge from a camera-LiDAR fusion model to a lightweight camera model for efficient high-definition map construction in autonomous driving.
http://arxiv.org/abs/2407.11681v1
Compressor summary: The paper proposes a hybrid pruning method for large language models that uses gradients estimated from forward passes to efficiently remove less critical components and improve performance on various tasks.
http://arxiv.org/abs/2407.11678v1
Compressor summary: The paper analyzes the excess risk of CycleGAN, a model that transforms unpaired data while ensuring consistent mappings, by decomposing it into approximation and estimation errors and exploring their trade-offs.
http://arxiv.org/abs/2407.11677v1
Compressor summary: The paper introduces a new module for pre-training video-language models that leverages spatio-temporal graph structure to learn contexts and improve alignment accuracy for downstream tasks.
http://arxiv.org/abs/2407.11676v1
Compressor summary: SKADA-Bench is a framework to evaluate and compare unsupervised domain adaptation methods fairly and realistically, using nested cross-validation and various scores.
http://arxiv.org/abs/2407.11668v1
Compressor summary: The authors propose a novel network that improves sub-pixel accuracy in detecting 2D local features by learning an offset vector for each feature, leading to better keypoint localization and faster computation.
http://arxiv.org/abs/2407.11666v1
Compressor summary: The authors propose a method to compress large atmospheric data sets using neural networks, enabling faster access and analysis for various stakeholders.
http://arxiv.org/abs/2407.11663v1
Compressor summary: The paper presents a solution for the Multi-Task Learning Challenge of the ABAW7 Competition that uses a pre-trained model and cross-attention to extract features for action unit detection, facial expression recognition, and valence-arousal estimation tasks.
http://arxiv.org/abs/2407.11660v1
Compressor summary: GenResCoh is an open-source dataset for evaluating dialogue coherence in multiple languages, created as an alternative to relying on the closed-source GPT-4.
http://arxiv.org/abs/2407.11650v1
Compressor summary: The paper proposes an improved audio-visual deepfake detection method that uses statistical features, waveform representation, normalization, and shallower networks to enhance performance and reduce complexity.
http://arxiv.org/abs/2407.11647v1
Compressor summary: The paper proposes a novel decentralized dataset dictionary learning approach that uses Wasserstein barycenters to adapt multiple related and heterogeneous source datasets to an unlabeled target dataset without centralizing clients' data or violating privacy.
http://arxiv.org/abs/2407.11644v1
Compressor summary: The paper proposes Perception Helps Planning (PHP), a framework that integrates lane-level perception and planning for safe and efficient autonomous driving, achieving state-of-the-art performance on three Carla benchmarks.
http://arxiv.org/abs/2407.11638v1
Compressor summary: This paper evaluates the ability of large language models (LLMs) in temporal event forecasting and proposes new methods using a constructed dataset and various input formats.
http://arxiv.org/abs/2407.11637v1
Compressor summary: REMM is a framework for multimodal image matching that encodes rotational differences in descriptors, improving performance and achieving robustness to arbitrary rotation angles.
http://arxiv.org/abs/2407.11633v1
Compressor summary: DiT-MoE is a sparse diffusion Transformer that optimizes inference and achieves competitive performance with dense networks in conditional image generation using expert routing and balance loss.
http://arxiv.org/abs/2407.11626v1
Compressor summary: The Dynamic Dimension Wrapping (DDW) algorithm is a new optimization method for efficiently searching multi-dimensional spaces with varying dimensions, using a fitness function based on mapping relationships between time series and a novel cross-dimensional search mechanism.
http://arxiv.org/abs/2407.11625v1
Compressor summary: The text discusses two experiments on visual validation of linear regression models in scatterplots and finds that people are biased towards steeper slopes and error lines reduce bias but do not improve accuracy.
http://arxiv.org/abs/2407.11624v1
Compressor summary: FairGB is a method to make Graph Neural Networks (GNNs) more fair by balancing the contributions of different groups in the data using counterfactual node mixup and contribution alignment loss.
http://arxiv.org/abs/2407.11619v1
Compressor summary: The paper studies online binary classification with strategic agents who can manipulate their features to get positive labels, introduces a new complexity measure for the learning problem, and provides algorithms with improved regret bounds in different settings.
http://arxiv.org/abs/2407.11615v1
Compressor summary: Graph Dimension Attention Network (GDAN) improves credit risk evaluation by considering different feature dimensions using a dimension-level attention mechanism and provides edge-level interpretability through GDAN-DistShift.
http://arxiv.org/abs/2407.11610v1
Compressor summary: MergeNet is a novel method that predicts local connectivity and filters out irrelevant edges to reconstruct meshes from sparse point clouds efficiently and accurately.
http://arxiv.org/abs/2407.11606v1
Compressor summary: The paper proposes a formal framework to analyze and design tokenization models for neural language modeling, addressing the lack of theory in this critical NLP step.
http://arxiv.org/abs/2407.11596v1
Compressor summary: HyperAggregation is a novel aggregation function for Graph Neural Networks that uses a hypernetwork to generate weights for aggregating variable-sized neighborhoods, and shows promising results on various graph tasks.
http://arxiv.org/abs/2407.11591v1
Compressor summary: The study evaluates how well Large Language Models adapt to different domains for summarization tasks and introduces a new evaluation suite called AdaptEval.
http://arxiv.org/abs/2407.11588v1
Compressor summary: The PPT framework progressively trains a model to predict pedestrian positions using short-term dynamics and long-term dependencies, achieving state-of-the-art performance with high efficiency.
http://arxiv.org/abs/2407.11585v1
Compressor summary: The paper proposes a post-training quantization strategy (QVD) for video diffusion models to reduce latency and memory consumption while preserving temporal discriminability and improving channel coverage.
http://arxiv.org/abs/2407.11579v1
Compressor summary: The study applies classification algorithms to improve stop location detection from noisy or incomplete GPS datasets, using multiple features and prioritizing recall over precision.
http://arxiv.org/abs/2407.11578v1
Compressor summary: The study proposes a new task and method (UP-Diff) for forecasting future urban layouts using remote sensing and planned change maps, which could help with city development planning.
http://arxiv.org/abs/2407.11566v1
Compressor summary: Text-Guided Inpainting Forgery (TGIF) is a new dataset for evaluating image forgery localization and synthetic image detection methods in the context of text-guided inpainting, a powerful generative AI technique for editing images.
http://arxiv.org/abs/2407.11555v1
Compressor summary: The authors propose a self-contained guided-sampling approach for generating realistic low-likelihood minority samples with diffusion models, requiring no external components and using time-scheduling techniques to manage the guidance's influence, with improved performance on real datasets.
http://arxiv.org/abs/2407.11550v1
Compressor summary: The authors propose an adaptive allocation algorithm for efficient and high-quality language model inference by reducing the cache size within a memory budget without compromising generation quality.
http://arxiv.org/abs/2407.11549v1
Compressor summary: The paper presents a simulation framework using Large Language Model agents with synthetic personality traits that can mimic human negotiation behavior and strategically impact outcomes.
http://arxiv.org/abs/2407.11546v1
Compressor summary: The paper proposes a collaborative perception model for autonomous vehicles that improves detection accuracy and reduces computational cost by communicating with other vehicles and infrastructures in either sequential or parallel connections.
http://arxiv.org/abs/2407.11543v1
Compressor summary: The paper presents GER, a new algorithm for building sparse probabilistic Boolean networks (PBNs), mathematical models used in various domains, and shows its superior performance over existing methods.
http://arxiv.org/abs/2407.11542v1
Compressor summary: The text analyzes simple transformer models trained on counting items in sequences, showing that different architectures can implement relation- or inventory-based counting mechanisms depending on various factors.
http://arxiv.org/abs/2407.11540v1
Compressor summary: The paper introduces NAIM, a transformer-based model that handles missing values in tabular datasets without imputation techniques, and shows its superior performance over existing models.
http://arxiv.org/abs/2407.11537v1
Compressor summary: The authors propose using adversarial examples as reconstruction targets for masked image modeling to enhance representation learning, improve generalization, and increase robustness.
http://arxiv.org/abs/2407.11536v1
Compressor summary: This study explores why medical LLMs struggle with long-context understanding and suggests adjusting the fine-tuning data composition to improve their performance.
http://arxiv.org/abs/2407.11534v1
Compressor summary: Low-Rank Quantization (LRQ) is a post-training weight quantization method for large language models that uses low-rank weight-scaling matrices to improve accuracy and reduce parameters, achieving better results than existing techniques under various quantization schemes.
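As background for what LRQ refines, plain per-channel round-to-nearest weight quantization looks like the sketch below (a generic illustration, not the paper's method, which replaces the per-channel scales with learned low-rank scaling matrices):

```python
import numpy as np

# Generic per-channel round-to-nearest weight quantization: one scale
# per output channel, weights rounded to signed integers.
def quantize(W, bits=4):
    qmax = 2 ** (bits - 1) - 1                            # 7 for 4-bit
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax   # one scale per row
    Q = np.round(W / scale).astype(np.int8)               # integer weights
    return Q, scale

W = np.random.default_rng(0).normal(size=(8, 16))
Q, scale = quantize(W)
W_hat = Q * scale                                          # dequantized weights

# Reconstruction error is bounded by half a quantization step per channel.
print(bool((np.abs(W - W_hat) <= scale / 2 + 1e-12).all()))  # → True
```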
http://arxiv.org/abs/2407.11532v1
Compressor summary: The paper proposes a new model, Length-Aware Latent Diffusion (LADiff), that can generate 3D human motions with variable target lengths from textual descriptors.
http://arxiv.org/abs/2407.11522v1
Compressor summary: FIRE is a new dataset for improving vision language models' ability to refine their responses based on user feedback in various tasks, and it is used to evaluate and develop better models.
http://arxiv.org/abs/2407.11514v1
Compressor summary: The paper presents ColorwAI, a framework for colorway creation (generating textile samples with alternative colors but the same pattern) that applies disentanglement methods to StyleGAN and Diffusion models; it introduces ShapleyVec, a variation of InterFaceGAN for supervised disentanglement, finds StyleGAN's W space most aligned with human color perception, and shows through expert evaluation that disentanglement can enable creative colorway creation.
http://arxiv.org/abs/2407.11511v1
Compressor summary: The paper reviews prompt-based reasoning with large language models, explores different approaches and open problems, and discusses the relation between reasoning and other aspects of artificial intelligence.
http://arxiv.org/abs/2407.11505v1
Compressor summary: The Haze-Aware Attention Network (HAA-Net) combines a novel attention module based on atmospheric scattering and a multiscale frequency enhancement module to effectively remove haze from images, outperforming existing methods.
http://arxiv.org/abs/2407.11502v1
Compressor summary: The paper proposes TextGen, a framework to improve visual text generation by optimizing control information using Fourier analysis and a two-stage generation process, achieving state-of-the-art results in Chinese and English.
http://arxiv.org/abs/2407.11501v1
Compressor summary: The paper proposes a new model, Diff-MTS, for generating multivariate time series data in the industrial field, which improves diversity, fidelity, and utility compared to existing GAN-based methods.
http://arxiv.org/abs/2407.11500v1
Compressor summary: The paper proposes a method to automatically grade knee osteoarthritis severity using self-supervised anomaly detection, denoising with CLIP, and dual centre representation learning, outperforming existing techniques and achieving human-level correlation.
http://arxiv.org/abs/2407.11499v1
Compressor summary: The paper proposes a method called "Bridge Past and Future" (BPF) that aligns models across stages to overcome inconsistent optimization objectives in incremental object detection, and introduces a new loss called "Distillation with Future" (DwF) that leverages background probability to mitigate forgetting and improve adaptability.
http://arxiv.org/abs/2407.11494v1
Compressor summary: The paper proposes Semantic Latent Directions, a method to improve stochastic human motion prediction by constraining the latent space to learn meaningful motion semantics and offering controllable predictions with diverse queries.
http://arxiv.org/abs/2407.11489v1
Compressor summary: The paper proposes a meta-learning approach for residential appliance scheduling that adapts quickly to changing contexts, reduces electricity bills, increases user comfort, and saves utility while using less data and training time.
http://arxiv.org/abs/2407.11487v1
Compressor summary: The paper proposes a navigation method that aligns instructions with trajectories on a directed graph, improving efficiency and performance compared to previous methods.
http://arxiv.org/abs/2407.11486v1
Compressor summary: The paper proposes an efficient framework for cervical cytopathology WSI classification using unsupervised and weakly supervised learning, which enhances the performance of various MIL methods and achieves SOTA results.
http://arxiv.org/abs/2407.11485v1
Compressor summary: The VerifAI project is an open-source system that generates and verifies referenced claims from scientific papers using semantic search, retrieval-augmented generation, and a verification engine.
http://arxiv.org/abs/2407.11484v1
Compressor summary: The survey explores the development and challenges of role-playing with language models, focusing on their ability to create complex character simulations using various methods and resources.
http://arxiv.org/abs/2407.11481v1
Compressor summary: This paper proposes a method to generate realistic 12-lead ECG signals from single-lead ECG using a multi-channel masked autoencoder, and introduces a benchmark for evaluating the quality of synthetic ECGs.
http://arxiv.org/abs/2407.11480v1
Compressor summary: This paper overviews generative models for industrial time series, discussing their applications, frameworks, technologies, and challenges.
http://arxiv.org/abs/2407.11477v1
Compressor summary: The XTraffic dataset combines spatiotemporally-aligned traffic and incident data to enable new research on traffic-related tasks with greater interpretability and practical relevance.
http://arxiv.org/abs/2407.11473v1
Compressor summary: The paper develops quantum versions of maximum entropy inference and graphical model learning algorithms, improves their convergence rates using quasi-Newton methods, and applies them to Hamiltonian learning.
http://arxiv.org/abs/2407.11471v1
Compressor summary: The paper proposes an algorithm for safe online convex optimization using only zero-order information, achieving sublinear regret and zero constraint violation with smooth and strongly convex constraints.
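The core primitive behind zero-order optimization can be illustrated with a small sketch. This is my own generic illustration of a two-point zeroth-order gradient estimate driving gradient descent on a quadratic, not the paper's safe-OCO algorithm (the constraint-handling and regret machinery are omitted); all names here are illustrative.

```python
import random

def zo_gradient(f, x, delta=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate of f at x.

    Samples a random unit direction u and forms the symmetric finite
    difference (f(x + delta*u) - f(x - delta*u)) / (2*delta) along u;
    scaling by the dimension d makes the estimate unbiased in expectation.
    """
    rng = rng or random.Random(0)
    d = len(x)
    u = [rng.gauss(0, 1) for _ in range(d)]
    norm = sum(v * v for v in u) ** 0.5
    u = [v / norm for v in u]
    fd = (f([xi + delta * ui for xi, ui in zip(x, u)])
          - f([xi - delta * ui for xi, ui in zip(x, u)])) / (2 * delta)
    return [d * fd * ui for ui in u]

# Minimize f(x) = ||x||^2 using only function evaluations.
f = lambda x: sum(v * v for v in x)
x = [2.0, -1.5]
rng = random.Random(1)
for _ in range(500):
    g = zo_gradient(f, x, rng=rng)
    x = [xi - 0.05 * gi for xi, gi in zip(x, g)]
```

For a quadratic objective the symmetric difference is exact along u, so each step contracts the component of x along the sampled direction and the iterate converges without ever querying a gradient oracle.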
http://arxiv.org/abs/2407.11468v1
Compressor summary: The paper proposes a new video pre-training method for facial action unit detection that uses multi-label properties, temporal label consistency, and prior knowledge matrices to improve performance on limited data.
http://arxiv.org/abs/2407.11456v1
Compressor summary: A reinforcement learning agent with specialized hemispheres can exploit generalist knowledge for better initial performance on novel tasks while maintaining learning capabilities.
http://arxiv.org/abs/2407.11451v1
Compressor summary: Isometric Diffusion is a technique that improves diffusion models by learning a disentangled latent space for better interpolation, inversion, and attribute control.
http://arxiv.org/abs/2407.11449v1
Compressor summary: Controllable Contextualized Image Captioning (Ctrl-CIC) generates focused captions for images based on a user-defined highlight, using two approaches and a GPT-4V evaluator.
http://arxiv.org/abs/2407.11448v1
Compressor summary: The paper proposes a Bayesian nonparametric framework for multiple instance learning in histopathology image analysis, using cascade of Dirichlet processes to improve feature aggregation and prevent overfitting.
http://arxiv.org/abs/2407.11442v1
Compressor summary: EARN Fairness is a new framework that helps stakeholders without AI expertise choose and agree on fairness metrics for AI models.
http://arxiv.org/abs/2407.11439v1
Compressor summary: Repurformer is a model that uses multi-hop relationships among proteins and compounds to generate diverse molecules with desired properties for drug discovery, overcoming the sample bias problem.
http://arxiv.org/abs/2407.11438v1
Compressor summary: The study analyzes personal disclosures in human-chatbot interactions, revealing privacy risks from leaking identifiable information and sensitive topics in various contexts.
http://arxiv.org/abs/2407.11433v1
Compressor summary: The paper introduces CycleHOI, a new learning framework for computer vision that bridges detection and generation tasks using cycle consistency loss, feature distillation, and data augmentation to improve human-object interaction detection.
http://arxiv.org/abs/2407.11431v1
Compressor summary: The paper presents an end-to-end framework that uses a recurrent neural network-like structure to mutually reinforce multi-view dense matching and point cloud surface optimization for indoor 3D reconstruction, improving both tasks and achieving better results.
http://arxiv.org/abs/2407.11427v1
Compressor summary: The authors propose a deep generative approach that models complex disease trajectories, particularly Systemic Sclerosis (SSc), by learning temporal latent representations that explain patient trajectories, disentangling the latent space semi-supervisedly with medical knowledge, and enabling the discovery of new disease aspects, sub-types, and personalized monitoring and prediction.
http://arxiv.org/abs/2407.11426v1
Compressor summary: The paper proposes a general framework for counterfactual explanations in machine learning that is robust to model and data changes.
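A counterfactual explanation answers "what minimal change to the input would flip the model's decision?" The following sketch is a generic illustration of that idea (not the paper's robust framework, which additionally guards against model and data shifts); the loan model, feature names, and search strategy are all hypothetical.

```python
import itertools

def find_counterfactual(predict, x, target, step=0.1, max_radius=20):
    """Search grid perturbations of x, in increasing L1 radius, for the
    closest point (in number of `step` moves) classified as `target`."""
    d = len(x)
    for radius in range(1, max_radius + 1):
        for deltas in itertools.product(range(-radius, radius + 1), repeat=d):
            if sum(abs(v) for v in deltas) != radius:
                continue  # only candidates exactly `radius` moves away
            cand = [xi + step * di for xi, di in zip(x, deltas)]
            if predict(cand) == target:
                return cand
    return None

# Hypothetical loan model: approve when income + 2*savings exceeds 3.
predict = lambda z: int(z[0] + 2 * z[1] > 3)
x = [1.0, 0.5]                       # currently rejected
cf = find_counterfactual(predict, x, target=1)
```

The returned point is a nearest approved applicant profile, which can be read as actionable advice ("raise savings by this much"); robustness, in the paper's sense, would require the counterfactual to stay valid when the model is retrained or the data drifts.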
http://arxiv.org/abs/2407.11424v1
Compressor summary: Diff-MI is a novel diffusion model-based method that improves the generative fidelity and effectiveness of model inversion attacks by incorporating the target classifier into the learning process and using an iterative image reconstruction technique.
http://arxiv.org/abs/2407.11422v1
Compressor summary: Reflective instruction tuning improves LVLMs' reasoning proficiency by learning rationales behind correct and incorrect responses, as demonstrated by the REVERIE dataset and benchmark results.
http://arxiv.org/abs/2407.11421v1
Compressor summary: The paper reveals that large language models can perform complex arithmetic calculations without explicit chain-of-thought steps, possibly using implicit discrete state representations, but these representations are not lossless and cause inaccuracies.
http://arxiv.org/abs/2407.11419v1
Compressor summary: The study proposes TeethDreamer, a framework that uses five intra-oral photos to reconstruct 3D dental models for remote orthodontic monitoring, improving on previous methods with better geometry accuracy.
http://arxiv.org/abs/2407.11417v1
Compressor summary: The SPINACH dataset and agent improve Knowledge Base Question Answering (KBQA) by handling complex questions and reasoning about large, incomplete schemas, achieving state-of-the-art results on several datasets.
http://arxiv.org/abs/2407.11414v1
Compressor summary: SDPT improves the performance of fusion-based VLPMs by using learnable unified prototype tokens to represent aligned semantics of text and image modalities across different prompts.
http://arxiv.org/abs/2407.11409v1
Compressor summary: The study examines how well GPT-3.5-Turbo model simulates political behavior and opinions across different countries, languages, demographics, and regimes, finding that it performs better in English-speaking bipartisan democracies.
http://arxiv.org/abs/2407.11406v1
Compressor summary: The study finds that modular programming does not necessarily improve the performance of code generation models using large language models, challenging conventional wisdom.
http://arxiv.org/abs/2407.11407v1
Compressor summary: The paper proposes a new graph convolutional network model that incorporates roadway maintenance work-zone information to improve traffic speed forecasting, with benefits for the economy and public well-being.
http://arxiv.org/abs/2407.11404v1
Compressor summary: The study uses EnMAP and Sentinel-2 data to accurately map fractional woody cover of three species in a South African savannah, helping to protect the ecosystem from invasive plants.
http://arxiv.org/abs/2407.11401v1
Compressor summary: EndoFinder is a framework that uses content-based image retrieval to find similar polyps in a reference database, enabling explainable diagnostics and optical biopsy for colorectal cancer screening.
http://arxiv.org/abs/2407.11398v1
Compressor summary: Animate3D is a novel framework for animating any static 3D model using multi-view video diffusion and 4D Score Distillation Sampling to achieve better spatiotemporal consistency.
http://arxiv.org/abs/2407.11394v1
Compressor summary: DreamCatalyst is a novel framework that improves 3D editing quality and reduces training time by interpreting Score Distillation Sampling as a diffusion reverse process, offering fast and high-quality modes for NeRF scene editing.
http://arxiv.org/abs/2407.11393v1
Compressor summary: The paper proposes a method to generate diverse, high-quality, and focused image descriptions using a structured semantic representation, which improves the performance of controllable image captioning models.
http://arxiv.org/abs/2407.11384v1
Compressor summary: InvAgent is a novel approach using large language models to manage multi-agent inventory systems, enhancing resilience and efficiency in supply chain management by leveraging zero-shot learning, explainability, and adaptability.
http://arxiv.org/abs/2407.11383v1
Compressor summary: The authors introduce a multilingual speech-based VQA dataset for medical diagnostics and evaluate different systems using acoustic and visual features.
http://arxiv.org/abs/2407.11382v1
Compressor summary: The paper presents a Segment, Lift, and Fit (SLF) algorithm that labels 3D objects from 2D prompts for autonomous driving by predicting 3D shapes instead of bounding boxes, requiring no training on a specific dataset and achieving better generalization and pseudo-label performance than previous methods.
http://arxiv.org/abs/2407.11380v1
Compressor summary: NAMER is a novel non-autoregressive model for handwritten mathematical expression recognition that leverages visual and linguistic contexts and achieves better performance and speed than existing methods.
http://arxiv.org/abs/2407.11379v1
Compressor summary: The paper explores how spectral analysis of model gradients can reveal transfer learning biases and frequency shortcuts in medical imaging, and suggests source data editing as a way to reduce overfitting to artifacts.
http://arxiv.org/abs/2407.11375v1
Compressor summary: The paper introduces MAMMI, a novel method for interpreting deep neural networks in medical imaging without expensive pixel-level annotations, enabling transparent clinical decision-making.
http://arxiv.org/abs/2407.11373v1
Compressor summary: The study proposes a neurosymbolic approach using Prolog to improve LLMs' reasoning skills and introduces a new dataset for testing non-linear reasoning abilities.
http://arxiv.org/abs/2407.11371v1
Compressor summary: The paper proposes a model for generating random annotations to estimate chance agreement in sequence annotation tasks, which can help evaluate their reliability.
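The idea of simulating random annotators to estimate chance agreement can be sketched in a few lines. This is a generic Monte Carlo illustration (not the paper's specific generative model): random label sequences are drawn from each annotator's empirical label distribution, and the resulting agreement rate serves as the chance baseline in a kappa-style reliability score.

```python
import random
from collections import Counter

def chance_agreement(ann_a, ann_b, trials=2000, seed=0):
    """Monte Carlo estimate of chance agreement between two annotators.

    Random annotations are sampled from each annotator's own empirical
    label distribution, so the baseline reflects their label biases.
    """
    rng = random.Random(seed)
    n = len(ann_a)
    pool_a = list(Counter(ann_a).elements())
    pool_b = list(Counter(ann_b).elements())
    total = 0.0
    for _ in range(trials):
        sim_a = [rng.choice(pool_a) for _ in range(n)]
        sim_b = [rng.choice(pool_b) for _ in range(n)]
        total += sum(x == y for x, y in zip(sim_a, sim_b)) / n
    return total / trials

# Toy BIO-style sequence annotations from two annotators.
a = ["O", "O", "B", "O", "I", "O", "O", "B"]
b = ["O", "B", "B", "O", "O", "O", "I", "B"]
observed = sum(x == y for x, y in zip(a, b)) / len(a)
expected = chance_agreement(a, b)
kappa = (observed - expected) / (1 - expected)
```

Because sequence labels like "O" dominate, raw agreement overstates reliability; subtracting the simulated chance baseline corrects for that.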
http://arxiv.org/abs/2407.11368v1
Compressor summary: The study compares three methods for translating ancient texts with sparse corpora and proposes a new inter-methodological approach that outperforms previous models in BLEU score.
http://arxiv.org/abs/2407.11361v1
Compressor summary: GPL is a novel method that enhances GNN training by capturing intrinsic graph characteristics using task-independent structure losses, improving node and graph representations and achieving state-of-the-art results on several tasks.
http://arxiv.org/abs/2407.11360v1
Compressor summary: The paper compares human-AI interactions to giraffes and acacias on the Savannah, discussing how humans adapt to and shape AI while addressing ethical risks using the HHH framework.
http://arxiv.org/abs/2407.11359v1
Compressor summary: The paper explores how Shapley value-based model interpretability methods can expose private features in machine learning models and suggests the need for privacy-preserving alternatives.
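The leakage risk can be made concrete with a toy computation. The sketch below (my own illustration, not the paper's analysis) computes exact Shapley values for a hypothetical additive payoff; since the attribution of each feature exactly recovers its private contribution, anyone who sees the explanation learns the underlying feature values.

```python
import itertools

def shapley_values(features, value_fn):
    """Exact Shapley values for a set of features.

    `value_fn(subset)` returns the model payoff for a feature subset.
    Exponential in len(features); fine for small illustrative cases.
    """
    n = len(features)
    fact = [1] * (n + 1)
    for i in range(1, n + 1):
        fact[i] = fact[i - 1] * i
    phi = {f: 0.0 for f in features}
    for f in features:
        others = [g for g in features if g != f]
        for r in range(len(others) + 1):
            for subset in itertools.combinations(others, r):
                # Weight of this coalition in the Shapley average.
                w = fact[len(subset)] * fact[n - len(subset) - 1] / fact[n]
                phi[f] += w * (value_fn(set(subset) | {f})
                               - value_fn(set(subset)))
    return phi

# Hypothetical additive "model": each feature contributes independently.
contrib = {"age": 2.0, "income": 5.0, "zip": 1.0}
phi = shapley_values(list(contrib), lambda s: sum(contrib[f] for f in s))
```

For an additive payoff each Shapley value equals the feature's own contribution, which is exactly the kind of per-record information a privacy-preserving explanation method would need to obscure.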
http://arxiv.org/abs/2407.11358v1
Compressor summary: The paper proposes a new graph neural network (SES) that combines explainable training and enhanced predictive learning to improve accuracy and interpretability of predictions.
http://arxiv.org/abs/2407.11348v1
Compressor summary: The study introduces a new way of creating and using fish disease images for automated detection in farmed flatfish, improving performance by 12%.
http://arxiv.org/abs/2407.11347v1
Compressor summary: The inverse image-formation module enhances visual SLAM pipelines by integrating physical imaging and optimizing variables to handle motion blur and appearance variation in casually captured videos.
http://arxiv.org/abs/2407.11345v1
Compressor summary: The text discusses the challenges of detecting different types of speech errors caused by aphasia using automatic methods and presents novel approaches based on pretrained transformers and end-to-end models that perform better than previous ones.
http://arxiv.org/abs/2407.11343v1
Compressor summary: Ev-GS is a new method that uses event cameras to reconstruct realistic views with less blur and improved quality, while being faster and more efficient than existing methods.
http://arxiv.org/abs/2407.11337v1
Compressor summary: CGNet is an end-to-end network that improves centerline graphs for autonomous driving by incorporating junction prediction, Bézier space continuity constraints, and iterative refinement of topological connectivity.
http://arxiv.org/abs/2407.11335v1
Compressor summary: The paper proposes LaMI-DETR, an open-vocabulary object detection method that leverages visual concepts and relationships to improve concept representation and avoid overfitting, achieving state-of-the-art performance on OV-LVIS.
http://arxiv.org/abs/2407.11315v1
Compressor summary: COMET is a novel approach for generating high-quality mathematical problems by combining stem generation and problem solving, using a three-stage fine-tuning framework guided by "Cone of Experience" and a Chinese multimodal mathematical problem dataset.
http://arxiv.org/abs/2407.11310v1
Compressor summary: The paper proposes a method using digital twins and multi-agent reinforcement learning to optimize task offloading and resource allocation in vehicular edge computing networks.
http://arxiv.org/abs/2407.11309v1
Compressor summary: The authors propose a novel method to improve the reconstruction of dynamic 3D scenes from 2D images by regularizing the native warp field within the dynamic Gaussian Splatting framework using an analytical velocity field derived from the forward warp field network.
http://arxiv.org/abs/2407.11306v1
Compressor summary: PADRe is a framework that replaces self-attention in transformer models with polynomial functions for faster computation without sacrificing accuracy.
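The general idea of swapping softmax attention for a polynomial can be sketched simply. This is a generic illustration of polynomial-similarity token mixing, not PADRe's actual construction (which may differ substantially); scores (q·k + 1)**degree replace exp(q·k), and avoiding the softmax keeps the computation purely polynomial.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def poly_mixing(X, degree=2):
    """Token mixing with a polynomial similarity in place of softmax
    attention. X is a list of token vectors; each output token is a
    normalized, similarity-weighted combination of all tokens."""
    n, d = len(X), len(X[0])
    out = []
    for i in range(n):
        # Even-degree polynomial keeps similarities nonnegative.
        sims = [(dot(X[i], X[j]) + 1.0) ** degree for j in range(n)]
        z = sum(sims)
        out.append([sum(s * X[j][k] for j, s in enumerate(sims)) / z
                    for k in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
mixed = poly_mixing(tokens)
```

Because the weights are nonnegative and normalized, each output row is a convex combination of the input tokens, mirroring what softmax attention computes while using only polynomial arithmetic.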
http://arxiv.org/abs/2407.11294v1
Compressor summary: The paper presents a graph-based masked autoencoder (GMAE) for generating realistic, context-sensitive urban layouts across various styles in US cities.