This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-10, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2407.07094v1
Compressor summary: AnyTaskTune is a fine-tuning method for large language models that improves performance on diverse domain-specific tasks by defining and enhancing targeted sub-tasks with specialized datasets.
http://arxiv.org/abs/2407.07093v1
Compressor summary: The paper introduces FBI-LLM, a large-scale binary language model that matches the performance of full-precision models and could enable specialized hardware for 1-bit LLMs.
http://arxiv.org/abs/2407.07092v1
Compressor summary: The paper proposes a method (V-VIPE) to represent 3D human pose in canonical coordinate space using a variational autoencoder, enabling various downstream tasks like retrieval and classification.
http://arxiv.org/abs/2407.07089v1
Compressor summary: Task arithmetic improves model efficiency by fine-tuning only linear layers, enhancing weight disentanglement and understanding the roles of representation and task-specific models.
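A toy sketch of the task-arithmetic idea mentioned above (not the paper's implementation): a "task vector" is the difference between fine-tuned and pretrained weights, and tasks are composed by adding these vectors back onto the pretrained model. Scalar weight lists stand in for real parameter tensors here.

```python
# Task arithmetic on toy weights: a task vector is (finetuned - pretrained),
# and multiple tasks are merged by summing their vectors onto the base model.

def task_vector(pretrained, finetuned):
    return [f - p for p, f in zip(pretrained, finetuned)]

def apply_tasks(pretrained, vectors, scale=1.0):
    # add each task vector (optionally scaled) onto the pretrained weights
    out = list(pretrained)
    for v in vectors:
        out = [w + scale * d for w, d in zip(out, v)]
    return out

base = [1.0, 2.0]
ft_a = [1.5, 2.0]   # fine-tuned on task A
ft_b = [1.0, 2.5]   # fine-tuned on task B
merged = apply_tasks(base, [task_vector(base, ft_a), task_vector(base, ft_b)])
print(merged)  # [1.5, 2.5]
```

The paper's point is that restricting fine-tuning to the linear layers makes these vectors interfere less when summed.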
http://arxiv.org/abs/2407.07088v1
Compressor summary: The paper presents new methods to train and verify deep reinforcement learning controllers for safety-critical domains, using k-induction, neural Lyapunov Barrier certificates, and reachability-based approaches.
http://arxiv.org/abs/2407.07087v1
Compressor summary: CopyBench is a benchmark to measure literal and non-literal copying in language models using copyrighted fiction books, showing that larger models have more copying issues.
http://arxiv.org/abs/2407.07086v1
Compressor summary: The Hypothetical Minds agent uses a cognitively-inspired architecture with a Theory of Mind module to generate and refine hypotheses about other agents' strategies, improving performance in multi-agent reinforcement learning tasks.
http://arxiv.org/abs/2407.07082v1
Compressor summary: The paper proposes OPEN, a method that meta-learns an update rule to improve reinforcement learning by addressing its non-stationarity, plasticity loss, and exploration needs.
http://arxiv.org/abs/2407.07080v1
Compressor summary: The paper introduces two Hebrew language models, DictaLM2.0 and DictaLM2.0-Instruct, trained on a large corpus of Hebrew and English data, and presents a new benchmark suite to evaluate them on various tasks.
http://arxiv.org/abs/2407.07078v1
Compressor summary: The text introduces MoSt-DSA, a deep learning method for interpolating DSA images that reduces radiation dose by synthesizing intermediate frames rather than acquiring more, achieving state-of-the-art performance in various aspects.
http://arxiv.org/abs/2407.07077v1
Compressor summary: The paper introduces Unsupervised Concept Extraction, a novel task of learning multiple concepts from a single image without human annotations, using pretrained diffusion models.
http://arxiv.org/abs/2407.07071v1
Compressor summary: The paper proposes a simple model to detect contextual hallucinations in LLMs by using the ratio of attention weights on the context versus newly generated tokens, and shows that it can reduce hallucinations in various tasks and models.
http://arxiv.org/abs/2407.07066v1
Compressor summary: The study proposes a framework that uses differential privacy and hyperdimensional computing to monitor additive manufacturing processes while protecting sensitive data and maintaining accuracy.
http://arxiv.org/abs/2407.07061v1
Compressor summary: The Internet of Agents (IoA) is a novel framework that enables effective collaboration among diverse, LLM-based agents by providing a flexible and scalable platform with instant messaging-like architecture, agent integration protocol, and dynamic mechanisms for teaming and conversation flow control.
http://arxiv.org/abs/2407.07056v1
Compressor summary: The study presents CAPformer, a method that learns to enhance low-light images while considering JPEG compression effects and using Brightness-Guided Self-Attention.
http://arxiv.org/abs/2407.07053v1
Compressor summary: The authors create a new benchmark to test LMMs on abstract image understanding, spatial relations reasoning, and visual element induction using synthetic data generated by language models.
http://arxiv.org/abs/2407.07046v1
Compressor summary: CorMulT is a two-stage semi-supervised model that learns modality correlations and uses them to improve multimodal sentiment analysis performance.
http://arxiv.org/abs/2407.07045v1
Compressor summary: The paper proposes a simple probabilistic model for learning classifiers from incomplete data in Knowledge Graphs, which can be converted into axioms and initialized with expert knowledge.
http://arxiv.org/abs/2407.07042v1
Compressor summary: ProtoSAM is a one-shot medical image segmentation framework that combines prototypical networks and SAM, achieving state-of-the-art results on several datasets.
http://arxiv.org/abs/2407.07041v1
Compressor summary: The paper shows how an expert can use a complex method to hide tampering in SAR images and fool forensic detectors.
http://arxiv.org/abs/2407.07038v1
Compressor summary: The ClimateSent-GAT Model uses Graph Attention Networks to classify disagreements in Reddit comment-reply pairs about climate change, improving on existing methods and helping communicate better.
http://arxiv.org/abs/2407.07035v1
Compressor summary: This survey reviews Vision-and-Language Navigation (VLN) methods and future opportunities using a principled framework for embodied planning and reasoning, with a focus on the role of foundation models.
http://arxiv.org/abs/2407.07030v1
Compressor summary: The text describes a pipeline for mining trajectories from sensor data and using various machine learning approaches to predict travel time on common routes in Islamabad, Pakistan.
http://arxiv.org/abs/2407.07026v1
Compressor summary: The paper proposes a CoDe network that complements image and text representations with OCR text semantics, decomposes them with projection and contrastive learning, and fuses them for multimodal sentiment analysis to address sentiments discrepancy.
http://arxiv.org/abs/2407.07024v1
Compressor summary: The paper proposes a self-training method using unlabeled YouTube videos to improve open-vocabulary temporal action localization, and introduces a new evaluation protocol.
http://arxiv.org/abs/2407.07020v1
Compressor summary: The Human-Like Trajectory Prediction model (HLTP++) improves autonomous driving by mimicking human cognitive processes and using a novel teacher-student knowledge distillation framework with a new efficient neural network, achieving better trajectory prediction than existing models in various scenarios.
http://arxiv.org/abs/2407.07018v1
Compressor summary: The paper introduces NATURAL, a family of causal effect estimators using large language models, which can efficiently estimate causal effects from unstructured text data.
http://arxiv.org/abs/2407.07011v1
Compressor summary: The paper investigates how induction heads contribute to learning and performing tasks using few examples (few-shot learning) and shows their importance for abstract pattern recognition and natural language processing tasks.
http://arxiv.org/abs/2407.07009v1
Compressor summary: The text describes a novel AI-based decision-making framework for wireless communications that uses perturbation to identify relevant inputs and improve performance and trustworthiness.
http://arxiv.org/abs/2407.07004v1
Compressor summary: The text analyzes the impact of five Brazilian legal binding precedents on the Federal Supreme Court's rulings and compares different methods of natural language processing for similar case retrieval, finding that the reasons for their ineffectiveness are varied and case-dependent.
http://arxiv.org/abs/2407.07003v1
Compressor summary: LECODU is a novel method for integrating AI and humans in classification tasks that optimizes accuracy and collaboration costs by combining learning to complement and defer strategies with estimating the optimal number of users.
http://arxiv.org/abs/2407.07000v1
Compressor summary: Metron is a new framework that evaluates large language models' performance for real-time applications by considering fluidity-index, a metric that reflects the LLM inference process impact on user experience.
http://arxiv.org/abs/2407.06991v1
Compressor summary: The paper presents a new 3D instance segmentation method that corrects labelled points in processed blocks using label propagation and improves accuracy without requiring overlap between blocks.
http://arxiv.org/abs/2407.06990v1
Compressor summary: The study explores using large language models mBART and mT5 in interactive machine translation, finding that mBART performs similarly to state-of-the-art models.
http://arxiv.org/abs/2407.06985v1
Compressor summary: The paper introduces PEER, a multi-agent framework for domain-specific problem-solving that integrates question decomposition, information retrieval, summarization, and self-assessment, achieving high performance with lower cost and better data privacy than GPT-4.
http://arxiv.org/abs/2407.06984v1
Compressor summary: CODERS is a one-stage method for 3D object understanding from stereo images that improves robot manipulation by detecting objects, estimating their pose, and reconstructing them with an implicit stereo matching module and a transform-decoder architecture.
http://arxiv.org/abs/2407.06979v1
Compressor summary: This study investigates how well virtual staining models trained on imaging data from three cell types and two conditions in drug screening can generalize to other scenarios, finding that non-toxic condition training improves performance and there is variability in generalization across cell types.
http://arxiv.org/abs/2407.06964v1
Compressor summary: The paper proposes a lightweight and efficient method to adapt pre-trained Vision Transformers for downstream tasks by using a query module and a customized classification head that avoids heavy intermediate features and memory-heavy training.
http://arxiv.org/abs/2407.06958v1
Compressor summary: The paper presents a new 3D instance segmentation method that learns coefficients and prototypes, produces overcomplete predictions, and achieves faster and more reliable performance than existing methods.
http://arxiv.org/abs/2407.06950v1
Compressor summary: The study adapts a state-of-the-art English OCR model, TrOCR, to Spanish using two methods and creates a resource-efficient pipeline for generating OCR datasets in any language.
http://arxiv.org/abs/2407.06946v1
Compressor summary: The authors propose a test to check if language models can recognize themselves using security questions and find no evidence of self-recognition in any of the examined models.
http://arxiv.org/abs/2407.06941v1
Compressor summary: Raply is a GPT-2 model that generates rap lyrics with rhymes and less offensive content by using a dataset without profanities.
http://arxiv.org/abs/2407.06938v1
Compressor summary: RodinHD is a method that creates realistic 3D avatars from portraits by addressing challenges like hairstyles, sharp details, and texture cues using novel data scheduling and cross-attention techniques.
http://arxiv.org/abs/2407.06937v1
Compressor summary: AbHuman is a large benchmark for synthesized human images with anomalies, and HumanRefiner is a plug-and-play approach to improve text-to-image generation by refining human anomalies.
http://arxiv.org/abs/2407.06930v1
Compressor summary: The text introduces a method that combines expert-driven ontology design with CRISP-DM data mining process to build and update application-specific ontologies for corrective maintenance of Cyber-Physical Systems, using an anomaly detection case study as an example.
http://arxiv.org/abs/2407.06917v1
Compressor summary: The study introduces GlobalBias, a dataset to analyze how large language models propagate harmful stereotypes across various gender-by-ethnicity groups and find that larger models have higher levels of biased outputs.
http://arxiv.org/abs/2407.06908v1
Compressor summary: The text discusses how emotions reveal our values and guide our actions, and explores how different religions are represented in LLMs, finding that some are more nuanced while others are stereotyped or stigmatized due to cultural bias and lack of NLP literature on religion.
http://arxiv.org/abs/2407.06904v1
Compressor summary: The paper proposes HGA, a hypergraph attention framework for semantic entity recognition that improves performance by capturing both entity boundaries and categories, and HGALayoutLM, a model based on HGA and GraphLayoutLM that sets new state-of-the-art results on several datasets.
http://arxiv.org/abs/2407.06893v1
Compressor summary: The paper proposes a method and system to classify and score sustainable funds' prospectuses based on their language specificity and transparency, using few-shot learners and a ratio metric, to help regulators, investors, and advisors assess ESG claims.
http://arxiv.org/abs/2407.06888v1
Compressor summary: The paper presents a complete set of quadratic constraints for the repeated ReLU that bounds its performance and stability in neural networks, including a less conservative Lipschitz bound compared to the standard approach.
http://arxiv.org/abs/2407.06886v1
Compressor summary: This paper surveys recent advancements in Embodied AI, focusing on perception, interaction, embodied agents, and sim-to-real adaptation using Multi-modal Large Models (MLMs) and World Models (WMs).
http://arxiv.org/abs/2407.06871v1
Compressor summary: The paper proposes a new image-to-video adaptation method that uses object discovery and slot attention to compress videos into object-centric tokens, enabling efficient temporal reasoning for video tasks with fewer parameters and better performance.
http://arxiv.org/abs/2407.06866v1
Compressor summary: The paper investigates how user context affects GPT-3.5's refusal guardrails and finds biases based on demographics, identity, and political ideology.
http://arxiv.org/abs/2407.06863v1
Compressor summary: The text presents a framework that uses structured knowledge bases and large language models to assess how well Text-to-Image models represent different cultures, building CUBE, a benchmark of cultural artifacts from 8 countries across cuisine, landmarks, and art, and revealing significant gaps in the cultural awareness of existing models.
http://arxiv.org/abs/2407.06861v1
Compressor summary: The paper introduces W2W-BEV, a cross-view geo-localization method that learns a bird's eye view (BEV) representation from the ground query image and adaptively matches BEV features to ground windows using a context-aware window matching strategy and cross-attention, improving accuracy under unknown orientation and limited field of view.
http://arxiv.org/abs/2407.06852v1
Compressor summary: The text introduces a new framework, TE-SSL, that uses time-to-event and event data as supervisory signals to improve disease progression analysis using deep learning and representation learning strategies.
http://arxiv.org/abs/2407.06851v1
Compressor summary: This paper explores using sentence encoders to detect and classify unsafe prompts for Large Language Models, introducing new datasets and a metric to measure their effectiveness.
http://arxiv.org/abs/2407.06849v1
Compressor summary: The paper presents TeVAE, an automatic online anomaly detection system for complex real-world data that can minimize false positives and detect root causes.
http://arxiv.org/abs/2407.06844v1
Compressor summary: The paper introduces a new task and algorithm for calibrating confidence scores in multi-label recognition problems, addressing semantic confusion and category correlations using dynamic correlation learning and regularization.
http://arxiv.org/abs/2407.06842v1
Compressor summary: The paper presents CE3D, a dialogue-based 3D scene editing approach that uses a large language model to interpret user input and autonomously invokes visual expert models, while also enabling flexible integration of existing visual models using Hash-Atlas.
http://arxiv.org/abs/2407.06841v1
Compressor summary: The paper introduces HTD-Mamba, a self-supervised method for hyperspectral target detection that uses spectrally contrastive learning and spatial-encoded spectral augmentation to address challenges caused by limited prior knowledge and spectral variations.
http://arxiv.org/abs/2407.06826v1
Compressor summary: VRDSynth is a program synthesis method that automatically extracts entity relations from multilingual visually rich documents using a domain-specific language, outperforming pre-trained models in multiple languages and reducing memory footprint.
http://arxiv.org/abs/2407.06823v1
Compressor summary: The authors present a method for automatic cue point estimation in music mixing, based on a pre-trained object detection transformer fine-tuned on a large annotated cue point dataset, which requires no low-level musical analysis and adheres to the high-level structure of dance music.
http://arxiv.org/abs/2407.06817v1
Compressor summary: AstroSpy is a hybrid model that uses spatial and spectral information to identify real and fake astronomical images, improving authenticity detection in the field of astronomy.
http://arxiv.org/abs/2407.06814v1
Compressor summary: The note discusses the history of informal semantics for logic programming using answer set semantics, comparing two popular paradigms: Answer Set Programming and ASP-Prolog.
http://arxiv.org/abs/2407.06813v1
Compressor summary: The authors aim to create an AI agent that can excel at diplomacy by combining strategic planning, social reasoning, and self-improvement through self-play games.
http://arxiv.org/abs/2407.06797v1
Compressor summary: The Entropy Decomposed Variational Autoencoder (ED-VAE) is a new method that improves the quality of samples and latent representations by explicitly including entropy and cross-entropy components in the ELBO formulation.
http://arxiv.org/abs/2407.06796v1
Compressor summary: The paper proposes a new method to defend wireless networks against adversarial attacks in modulation classification using neural rejection, label smoothing, and noise injection.
http://arxiv.org/abs/2407.06795v1
Compressor summary: CycleSAM is a method that improves one-shot surgical scene segmentation by using trained image-mask pairs, spatial cycle-consistency constraints, and a surgical-specific ResNet50 encoder to overcome limitations of the Segment-Anything Model.
http://arxiv.org/abs/2407.06794v1
Compressor summary: ERQ is a new method that reduces quantization error in vision transformers by strategically updating weights and activations with full-precision, achieving better compression efficiency than existing methods.
http://arxiv.org/abs/2407.06782v1
Compressor summary: The paper proposes a fuzzy color model and a novel fuzzy clustering algorithm to efficiently cluster arbitrary color data with uncertainty and vagueness.
http://arxiv.org/abs/2407.06780v1
Compressor summary: The authors propose a method called Conditional Dropout and Language-driven Quality Assessment to improve dual-modal salient object detection by handling noisy inputs and missing modalities, which leads to better performance than existing models.
http://arxiv.org/abs/2407.06779v1
Compressor summary: The authors describe their system for answering biomedical questions using pre-trained LLMs, prompt engineering, and post-processing techniques, achieving competitive scores on BioASQ 2024 tasks.
http://arxiv.org/abs/2407.06774v1
Compressor summary: The text introduces a new way to measure how well fuzzy clusters are separated, using the overlap between them, and shows it works well on some examples.
http://arxiv.org/abs/2407.06771v1
Compressor summary: The paper proposes a new reservoir computing method with improved input mapping and network architectures that reduce error and uncertainty in predicting chaotic and non-chaotic time series compared to existing methods.
http://arxiv.org/abs/2407.06765v1
Compressor summary: The authors propose new generalization bounds for nonlinear networks that consider them as perturbations of linear ones and require no training data to evaluate.
http://arxiv.org/abs/2407.06762v1
Compressor summary: MToMnet is a neural network that predicts human beliefs and their changes during interactions using multiple inputs like videos, gaze, and body language.
http://arxiv.org/abs/2407.06756v1
Compressor summary: The paper examines why periodic activation functions improve sample efficiency in deep RL and finds they learn high frequency representations, but have worse generalization on noisy states and can be mitigated by weight decay regularization.
http://arxiv.org/abs/2407.06748v1
Compressor summary: The IASIS project aims to turn big biomedical data into actionable information for decision makers by integrating and analyzing data from various sources using advanced methods and generating insights for public health activities and personalized care.
http://arxiv.org/abs/2407.06740v1
Compressor summary: The paper proposes a new way to train image-based explainer for recommender systems using positive-unlabelled learning to improve explainability with user-personalized negative examples.
http://arxiv.org/abs/2407.06730v1
Compressor summary: The paper proposes a new VPR method that fuses image and text features using attention mechanisms, improving robustness against viewpoint and appearance changes.
http://arxiv.org/abs/2407.06723v1
Compressor summary: The authors propose a new annotation method for image captioning using labelled graphs to describe scenes with compositionality and hierarchical information, improving performance on downstream models.
http://arxiv.org/abs/2407.06718v1
Compressor summary: The study presents a simple architecture for secure Enterprise applications using LLMs, RAG, and MoE to filter documents and experts based on user roles and security clearance levels.
http://arxiv.org/abs/2407.06714v1
Compressor summary: The paper proposes FAUG, a feature augmentation attack that improves adversarial transferability by injecting random noise into model intermediate features without extra computation costs.
http://arxiv.org/abs/2407.06712v1
Compressor summary: The paper introduces a geometric approach to analyze MDP algorithms and shows how to split them into classes with similar dynamics, enabling the creation of new optimal policy-finding methods.
http://arxiv.org/abs/2407.06709v1
Compressor summary: The paper introduces Top-K Pairwise Ranking (TKPR), a new measure for multi-label ranking tasks, and develops an empirical surrogate risk minimization framework with theoretical guarantees.
http://arxiv.org/abs/2407.06704v1
Compressor summary: The paper proposes a method to enhance self-supervised learning of visual representations by incorporating actions performed on objects, leading to better recognition of object categories.
http://arxiv.org/abs/2407.06699v1
Compressor summary: The paper presents CovEReD, a method to generate counterfactual data for document-level relation extraction models, which helps evaluate and reduce factual biases in these models.
http://arxiv.org/abs/2407.06698v1
Compressor summary: PSPU improves over PU learning by using pseudo-supervision from confident samples and a consistency loss to reduce overfitting and perform better on various datasets.
http://arxiv.org/abs/2407.06697v1
Compressor summary: The paper proposes certified continual learning, an approach to preserve the verified correctness of neural networks when they are re-trained over time for different tasks.
http://arxiv.org/abs/2407.06690v1
Compressor summary: The paper proposes a new hierarchical reinforcement learning method for LMDPs that learns low- and high-level tasks simultaneously using state space partitions, improving average-reward performance significantly.
http://arxiv.org/abs/2407.06682v1
Compressor summary: The study presents a new predictive model using Transformer and feature embedding to improve fault detection and virtual metrology in manufacturing processes with limited sensor data.
http://arxiv.org/abs/2407.06677v1
Compressor summary: The authors propose a new Transformer architecture called mixture-of-modules (MoM) that breaks the depth-ordered convention by dynamically selecting modules to compute tokens, achieving better performance and reduced redundancy in parameterization.
http://arxiv.org/abs/2407.06676v1
Compressor summary: The paper analyzes how exponential weights algorithm with constant learning rates behaves in repeated games and shows convergence properties to certain Nash equilibria.
http://arxiv.org/abs/2407.06673v1
Compressor summary: The paper presents CTRL-F, a hybrid network that integrates convolution and transformers for image classification, using a multi-level feature cross-attention module to exchange knowledge across feature levels from the convolution branch and novel representation fusion techniques to combine local and global responses, achieving state-of-the-art performance on image classification tasks with limited or large data.
http://arxiv.org/abs/2407.06660v1
Compressor summary: The text describes an intervention program that helps educators learn about AI and how to integrate it into their teaching practices in creative ways, considering ethical and pedagogical aspects.
http://arxiv.org/abs/2407.06658v1
Compressor summary: TriQXNet is a novel hybrid classical-quantum neural network that predicts the disturbance storm-time index, helping to mitigate the impacts of geomagnetic storms on infrastructure.
http://arxiv.org/abs/2407.06655v1
Compressor summary: Generative AI (genAI) affects teachers' agency in education, but hybrid intelligence combining human and artificial intelligence could enhance learning design and teacher influence.
http://arxiv.org/abs/2407.06654v1
Compressor summary: The proposed soft deduplication method reduces the sampling weight of duplicated data in large language models' pre-training datasets, improving training efficiency and downstream accuracy while preserving dataset integrity.
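An illustrative sketch of the soft-deduplication idea: instead of dropping duplicated documents, lower their sampling weight so each document contributes roughly once in expectation. The 1/k weighting below is an assumption for illustration, not necessarily the paper's exact scheme.

```python
# Assign each copy of a document duplicated k times a weight of 1/k, so the
# document as a whole keeps total sampling weight 1 instead of k.

def sampling_weights(doc_ids):
    counts = {}
    for d in doc_ids:
        counts[d] = counts.get(d, 0) + 1
    return [1.0 / counts[d] for d in doc_ids]

docs = ["a", "a", "a", "b", "c"]
weights = sampling_weights(docs)
# the three copies of "a" each get weight 1/3; "b" and "c" keep weight 1.0
print(weights)
```

These weights could then drive a weighted sampler during pre-training, down-weighting duplicates without discarding them.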
http://arxiv.org/abs/2407.06653v1
Compressor summary: The paper proposes a novel framework called MAR-rPPG that improves facial video-based remote photoplethysmography (rPPG) measurement by addressing ROI localization and motion artifacts issues with masked attention regularization and an enhanced EREA network.
http://arxiv.org/abs/2407.06650v1
Compressor summary: The authors propose an evaluation metric for simultaneous interpretation and machine translation that focuses on maintaining word order synchronization between languages, using rank correlation coefficients and cross-lingual pre-trained models.
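A hand-rolled sketch of the rank-correlation ingredient mentioned above: given aligned word positions in the source and the output, Kendall's tau measures how well the output preserves the source word order. The alignment inputs here are toy values; the paper's metric additionally relies on cross-lingual pre-trained models to obtain the alignments.

```python
# Kendall's tau over aligned word positions: +1 means the output follows the
# source order exactly, -1 means it is fully reversed.

def kendall_tau(ranks_a, ranks_b):
    n = len(ranks_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

src = [0, 1, 2, 3]
print(kendall_tau(src, [0, 1, 2, 3]))  # 1.0  (fully synchronized)
print(kendall_tau(src, [3, 2, 1, 0]))  # -1.0 (fully reordered)
```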
http://arxiv.org/abs/2407.06646v1
Compressor summary: The paper proposes two variants of LISTA, A-DLISTA and VLISTA, to solve compressed sensing problems with varying sensing matrices by jointly learning sparse representations and reconstructions while accounting for uncertainty in the dictionaries.
http://arxiv.org/abs/2407.06645v1
Compressor summary: The paper proposes a data selection method for large language models based on an "entropy law" that connects model performance to data compression ratio and first-epoch training loss, which helps improve model learning efficiency and diversity.
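A tiny sketch of the compression-ratio signal this "entropy law" relates to model performance: text that compresses very well is highly redundant, which this view links to lower training value. Computing the ratio with zlib is my illustrative choice, not necessarily the paper's compressor.

```python
import zlib

# Compressed size divided by raw size: redundant text scores low,
# information-dense text scores close to (or above) 1 for short inputs.

def compression_ratio(text):
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)

redundant = "the cat sat. " * 50
diverse = "Quantum flux perturbs nonlinear geodesics near compact manifolds."
print(compression_ratio(redundant) < compression_ratio(diverse))  # True
```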
http://arxiv.org/abs/2407.06642v1
Compressor summary: The paper proposes a reinforcement learning framework for personalized text-to-image generation that preserves visual details and structure, outperforming existing methods.
http://arxiv.org/abs/2407.06635v1
Compressor summary: The text presents a new method that combines generative models and synthetic anomalies to improve unsupervised anomaly detection in medical images, such as brain MRI.
http://arxiv.org/abs/2407.06628v1
Compressor summary: The paper presents a new method for action recognition that combines body-worn IMUs with egocentric videos, using self-supervised pretraining and graph-based modeling to achieve state-of-the-art performance and robustness.
http://arxiv.org/abs/2407.06622v1
Compressor summary: The text proposes an approach to explain time-stamped observations using simple events called surprises, which represent changes in fluents, and discusses how to minimize them.
http://arxiv.org/abs/2407.06617v1
Compressor summary: The paper proposes Mobius, a parallel training paradigm for text-to-video generation that saves memory and time compared to traditional 3D-Unet.
http://arxiv.org/abs/2407.06613v1
Compressor summary: The paper proposes Sparse-DeRF, a method to construct deblurred neural radiance fields from limited blurry images using regularization techniques that improve the quality of the results.
http://arxiv.org/abs/2407.06611v1
Compressor summary: CEIA is a framework that learns to align event and image data using contrastive learning to overcome the lack of paired event-text data for open-world event-based understanding, achieving versatility and performance in various multi-modal applications.
http://arxiv.org/abs/2407.06606v1
Compressor summary: The paper proposes a novel audio-visual framework that uses Branchformer architecture to design parameter-efficient systems for speech recognition in noisy environments.
http://arxiv.org/abs/2407.06600v1
Compressor summary: The text describes a method to improve concept bottleneck models by integrating clinical knowledge, making them more aligned with human decision-making and better at classifying medical images in different settings.
http://arxiv.org/abs/2407.06597v1
Compressor summary: The paper introduces Ranked Video Moment Retrieval (RVMR), a task that requires finding and ranking video moments from queries in natural language, and presents the TVR-Ranking dataset with relevance annotations for evaluating RVMR models.
http://arxiv.org/abs/2407.06585v1
Compressor summary: D-MASTER is a transformer-based framework that adapts to different domains for breast cancer detection from mammograms by masking and reconstructing multi-scale features, improving sensitivity and reducing false positives.
http://arxiv.org/abs/2407.06581v1
Compressor summary: VLMs struggle with simple visual tasks that humans find easy, suggesting that their perception of fine visual detail is poor at best and absent at worst.
http://arxiv.org/abs/2407.06579v1
Compressor summary: The paper introduces NoisyAG-News, a benchmark dataset for text classification with instance-dependent noise patterns, and shows that pre-trained language models struggle to handle such real-world noise.
http://arxiv.org/abs/2407.06576v1
Compressor summary: Anthology is a method that conditions large language models to adopt virtual personas based on life narratives, improving the representation of diverse human traits in behavioral studies.
http://arxiv.org/abs/2407.06570v1
Compressor summary: AGAN is a new attack method that exposes vulnerabilities in perceptual encryption techniques, breaking image privacy protection.
http://arxiv.org/abs/2407.06567v1
Compressor summary: FinCon is a large language model-based framework for enhanced financial decision-making with conceptual verbal reinforcement and a risk-control component.
http://arxiv.org/abs/2407.06566v1
Compressor summary: ETSEF is a novel framework that combines transfer, self-supervised, and ensemble learning with data enhancement techniques to improve automatic medical diagnostics using limited data samples.
http://arxiv.org/abs/2407.06564v1
Compressor summary: The text discusses the benefits and challenges of combining natural language processing, large language models, and knowledge graphs for enhancing artificial intelligence applications.
http://arxiv.org/abs/2407.06551v1
Compressor summary: The text describes a study that identifies six types of biases in evaluating generated responses using large language models, proposes a collection of test cases for each bias, and introduces methods to improve the robustness of these models.
http://arxiv.org/abs/2407.06547v1
Compressor summary: The Featural InfoWaveGAN model can learn Assamese vowel harmony from raw speech data, capturing its complexities and showing feature learning.
http://arxiv.org/abs/2407.06546v1
Compressor summary: The paper proposes a method to debug and understand the factors influencing end-to-end autonomous driving models, making them more transparent and trustworthy.
http://arxiv.org/abs/2407.06544v1
Compressor summary: The paper introduces cross-attention pooling (CAP), a novel approach for multiple-instance verification that uses two new attention functions to better distinguish between similar instances in a target bag, outperforming existing methods.
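The two attention functions CAP introduces are not specified in this summary, but cross-attention pooling in general means weighting each instance in the target bag by its similarity to a query before averaging. A minimal sketch under that assumption (the function name, scoring rule, and softmax formulation are illustrative, not the paper's):

```python
import math

def cross_attention_pool(query, bag, temperature=1.0):
    """Pool a bag of instance embeddings into one vector, weighting each
    instance by its scaled dot-product similarity to the query.

    `query` is a list of floats; `bag` is a list of equal-length vectors.
    Instances similar to the query dominate the pooled representation,
    which is what lets attention pooling separate near-identical bags.
    """
    scores = [sum(q * b for q, b in zip(query, inst)) / temperature
              for inst in bag]
    # numerically stable softmax over the bag
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    return [sum(w * inst[d] for w, inst in zip(weights, bag))
            for d in range(dim)]
```

For example, pooling the bag `[[1, 0], [0, 1]]` with query `[1, 0]` yields a vector dominated by the first instance.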
http://arxiv.org/abs/2407.06543v1
Compressor summary: The paper presents an unsupervised GAN method that detects concept drifts, tracks their history, and improves the model's performance for recurring drifts in less time and data, demonstrating its effectiveness on an astrophysics problem.
http://arxiv.org/abs/2407.06542v1
Compressor summary: The text discusses a three-stage training pipeline that improves language models' alignment, instruction-following, and conversational abilities, surpassing official instruct models.
http://arxiv.org/abs/2407.06540v1
Compressor summary: GvSeg is a versatile framework for various video segmentation tasks that considers the diversity of targets and adapts to task-specific requirements, outperforming existing methods.
http://arxiv.org/abs/2407.06538v1
Compressor summary: This paper proposes a framework that combines a multilingual encoder-based seq2seq model with knowledge distillation to improve translation for low-resource Indic languages not supported by mBART-50.
http://arxiv.org/abs/2407.06537v1
Compressor summary: The paper proposes an efficient and accurate conversation model for multi-session dialog systems that uses memory management techniques to improve response generation performance and resource utilization.
http://arxiv.org/abs/2407.06533v1
Compressor summary: The paper proposes LETS-C, a lightweight and accurate time series classifier that uses language embeddings and a simple CNN+MLP head instead of fine-tuning large language models.
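The LETS-C recipe, a frozen embedding model followed by a small trainable head, can be sketched roughly as follows. This is only an illustrative stand-in: the seeded random projection plays the role of the actual pretrained language-embedding model, and a plain one-hidden-layer MLP substitutes for the paper's CNN+MLP head:

```python
import random

def embed_series(series, dim=8, seed=42):
    """Stand-in for the frozen language-embedding step. LETS-C would
    serialize the time series and embed it with a pretrained language
    encoder; here a fixed random projection plays that role."""
    rng = random.Random(seed)
    proj = [[rng.gauss(0.0, 1.0) for _ in series] for _ in range(dim)]
    return [sum(p * x for p, x in zip(row, series)) for row in proj]

def classification_head(emb, num_classes=3, hidden=4, seed=7):
    """Lightweight head: one hidden layer with ReLU, then class logits.
    (The paper uses a small CNN+MLP; an MLP alone keeps the sketch short.)"""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in emb] for _ in range(hidden)]
    h = [max(0.0, sum(w * e for w, e in zip(row, emb))) for row in w1]
    w2 = [[rng.gauss(0.0, 0.5) for _ in h] for _ in range(num_classes)]
    return [sum(w * x for w, x in zip(row, h)) for row in w2]

# End-to-end pipeline on a toy series: embed once, classify cheaply.
logits = classification_head(embed_series([0.1, 0.4, 0.35, 0.8]))
predicted = max(range(len(logits)), key=lambda c: logits[c])
```

Only the head's parameters would be trained; the embedding model stays frozen, which is where the claimed lightweight-ness comes from.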
http://arxiv.org/abs/2407.06531v1
Compressor summary: DecoMotion is a new test-time optimization method that decomposes video content into static scenes and dynamic objects, improving robustness and appearance in motion estimation.
http://arxiv.org/abs/2407.06529v1
Compressor summary: The GNN-CL model combines graph neural networks, convolutional neural networks, and long short-term memory networks to improve financial fraud detection accuracy, analyzing complex transaction patterns with intelligent purification mechanisms and reinforcement learning strategies.
http://arxiv.org/abs/2407.06518v1
Compressor summary: The paper proposes a method that combines Graph Neural Networks with Deep Reinforcement Learning to efficiently allocate resources for Vehicle-to-Vehicle and Vehicle-to-Infrastructure communication in Internet of Vehicles technology.
http://arxiv.org/abs/2407.06516v1
Compressor summary: VQA-Diff is a novel framework that uses real-world knowledge from large language models and image prior knowledge from diffusion models to generate photorealistic 3D vehicle assets for autonomous driving, achieving robust zero-shot prediction and appearance control.
http://arxiv.org/abs/2407.06513v1
Compressor summary: The text surveys deep learning-based computer vision techniques for aerospace missions, which overcome the limitations of traditional methods and offer great potential, but also face challenges and need further research.
http://arxiv.org/abs/2407.06512v1
Compressor summary: The paper introduces LuSNAR, a multi-task, multi-scene, and multi-label lunar dataset for evaluating autonomous perception and navigation systems on the moon.
http://arxiv.org/abs/2407.06507v1
Compressor summary: The text describes how a deep Q-network algorithm can be used to optimize the economic span of a bridge, reducing its construction cost.
http://arxiv.org/abs/2407.06504v1
Compressor summary: Reprogramming Distillation is a novel framework that reprograms the foundation model's feature space for downstream tasks and establishes connections between the reprogrammed knowledge and student models for personalized lightweight deployment.
http://arxiv.org/abs/2407.06503v1
Compressor summary: LOPE is a preference-guided RL framework that improves exploration efficiency in hard-exploration tasks by using human feedback as guidance, avoiding learning a separate reward model.
http://arxiv.org/abs/2407.06501v1
Compressor summary: The paper introduces STORYSUMM, a new dataset for evaluating faithfulness in summarization methods, and shows that current automatic metrics are not accurate enough for this task.
http://arxiv.org/abs/2407.06496v1
Compressor summary: The paper shows that for some loss functions, the final iterate of DP-SGD leaks as much information as all intermediate iterates combined, and thus privacy amplification is not possible for these cases.
http://arxiv.org/abs/2407.06494v1
Compressor summary: DiffPhyCon is a novel method for controlling complex physical systems that minimizes energy and control objectives, explores globally, and can discover near-optimal control sequences, outperforming classical and deep learning approaches.
http://arxiv.org/abs/2407.06491v1
Compressor summary: VideoEval is a new benchmark suite that evaluates video foundation models on task adaptability and representation power, revealing their weaknesses and potential improvements.
http://arxiv.org/abs/2407.06488v1
Compressor summary: The paper investigates how large language models learn multiple tasks by identifying and analyzing task-sensitive neurons, and proposes a continuous fine-tuning method based on these findings.
http://arxiv.org/abs/2407.06486v1
Compressor summary: The paper proposes a dynamic framework that integrates an optimization function within LLMs' decision-making process, allowing them to offer tailored, optimal solutions to complex problems.
http://arxiv.org/abs/2407.06485v1
Compressor summary: The text introduces Crowd Knowledge Transfer (CrowdTransfer), a new approach to improve Artificial Intelligence of Things (AIoT) performance by sharing prior knowledge from multiple agents, and discusses its applications and challenges.
http://arxiv.org/abs/2407.06483v1
Compressor summary: The paper introduces a framework to study and compare different test-time interventions applied sequentially to language models, revealing their interactions and limitations.
http://arxiv.org/abs/2407.06479v1
Compressor summary: The text introduces an evaluation framework that measures interactivity in English as a Second Language (ESL) speakers' dialogues using micro-level features and machine learning models.
http://arxiv.org/abs/2407.06469v1
Compressor summary: The study presents a novel method that uses diffusion models and identity embeddings to generate high-quality scene images from sketch inputs by decomposing the task into object-level generation and scene-level construction while preserving foreground object details.
http://arxiv.org/abs/2407.06468v1
Compressor summary: AnatoMask is a novel self-supervised learning method for 3D medical image segmentation that dynamically masks and reconstructs anatomically significant regions to improve pretraining efficiency.
http://arxiv.org/abs/2407.06464v1
Compressor summary: SideSeeing is an initiative that provides tools and datasets for assessing the built environment, using synchronized video and sensor data from chest-mounted mobile devices to evaluate sidewalk accessibility near hospitals in Brazil and the USA.