This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-10, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2407.07094v1
Compressor summary: AnyTaskTune is a fine-tuning method for large language models that improves performance on diverse domain-specific tasks by defining and enhancing targeted sub-tasks with specialized datasets.
http://arxiv.org/abs/2407.07093v1
Compressor summary: The paper introduces FBI-LLM, a large-scale binary language model that matches the performance of full-precision models and could enable specialized hardware for 1-bit LLMs.
http://arxiv.org/abs/2407.07092v1
Compressor summary: The paper proposes a method (V-VIPE) to represent 3D human pose in canonical coordinate space using a variational autoencoder, enabling various downstream tasks like retrieval and classification.
http://arxiv.org/abs/2407.07089v1
Compressor summary: Task arithmetic improves model efficiency by fine-tuning only linear layers, enhancing weight disentanglement and understanding the roles of representation and task-specific models.
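A toy sketch of the task-arithmetic idea mentioned above (not the paper's implementation): a "task vector" is the difference between fine-tuned and pretrained weights, and tasks are composed by adding these vectors back onto the pretrained model. Scalar weight lists stand in for real parameter tensors here.

```python
# Task arithmetic on toy weights: a task vector is (finetuned - pretrained),
# and multiple tasks are merged by summing their vectors onto the base model.

def task_vector(pretrained, finetuned):
    return [f - p for p, f in zip(pretrained, finetuned)]

def apply_tasks(pretrained, vectors, scale=1.0):
    # add each task vector (optionally scaled) onto the pretrained weights
    out = list(pretrained)
    for v in vectors:
        out = [w + scale * d for w, d in zip(out, v)]
    return out

base = [1.0, 2.0]
ft_a = [1.5, 2.0]   # fine-tuned on task A
ft_b = [1.0, 2.5]   # fine-tuned on task B
merged = apply_tasks(base, [task_vector(base, ft_a), task_vector(base, ft_b)])
print(merged)  # [1.5, 2.5]
```

The paper's point is that restricting fine-tuning to the linear layers makes these vectors interfere less when summed.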
http://arxiv.org/abs/2407.07088v1
Compressor summary: The paper presents new methods to train and verify deep reinforcement learning controllers for safety-critical domains, using k-induction, neural Lyapunov Barrier certificates, and reachability-based approaches.
http://arxiv.org/abs/2407.07087v1
Compressor summary: CopyBench is a benchmark to measure literal and non-literal copying in language models using copyrighted fiction books, showing that larger models have more copying issues.
http://arxiv.org/abs/2407.07086v1
Compressor summary: The Hypothetical Minds agent uses a cognitively-inspired architecture with a Theory of Mind module to generate and refine hypotheses about other agents' strategies, improving performance in multi-agent reinforcement learning tasks.
http://arxiv.org/abs/2407.07082v1
Compressor summary: The paper proposes OPEN, a method that meta-learns an update rule to improve reinforcement learning by addressing its non-stationarity, plasticity loss, and exploration needs.
http://arxiv.org/abs/2407.07080v1
Compressor summary: The paper introduces two Hebrew language models, DictaLM2.0 and DictaLM2.0-Instruct, trained on a large corpus of Hebrew and English data, and presents a new benchmark suite to evaluate them on various tasks.
http://arxiv.org/abs/2407.07078v1
Compressor summary: The text introduces MoSt-DSA, a deep learning method for interpolating DSA images that reduces radiation dose by synthesizing intermediate frames rather than acquiring more, achieving state-of-the-art performance in various aspects.
http://arxiv.org/abs/2407.07077v1
Compressor summary: The paper introduces Unsupervised Concept Extraction, a novel task of learning multiple concepts from a single image without human annotations, using pretrained diffusion models.
http://arxiv.org/abs/2407.07071v1
Compressor summary: The paper proposes a simple model to detect contextual hallucinations in LLMs by using the ratio of attention weights on the context versus newly generated tokens, and shows that it can reduce hallucinations in various tasks and models.
http://arxiv.org/abs/2407.07066v1
Compressor summary: The study proposes a framework that uses differential privacy and hyperdimensional computing to monitor additive manufacturing processes while protecting sensitive data and maintaining accuracy.
http://arxiv.org/abs/2407.07061v1
Compressor summary: The Internet of Agents (IoA) is a novel framework that enables effective collaboration among diverse, LLM-based agents by providing a flexible and scalable platform with instant messaging-like architecture, agent integration protocol, and dynamic mechanisms for teaming and conversation flow control.
http://arxiv.org/abs/2407.07056v1
Compressor summary: The study presents CAPformer, a method that learns to enhance low-light images while considering JPEG compression effects and using Brightness-Guided Self-Attention.
http://arxiv.org/abs/2407.07053v1
Compressor summary: The authors create a new benchmark to test LMMs on abstract image understanding, spatial relations reasoning, and visual element induction using synthetic data generated by language models.
http://arxiv.org/abs/2407.07046v1
Compressor summary: CorMulT is a two-stage semi-supervised model that learns modality correlations and uses them to improve multimodal sentiment analysis performance.
http://arxiv.org/abs/2407.07045v1
Compressor summary: The paper proposes a simple probabilistic model for learning classifiers from incomplete data in Knowledge Graphs, which can be converted into axioms and initialized with expert knowledge.
http://arxiv.org/abs/2407.07042v1
Compressor summary: ProtoSAM is a one-shot medical image segmentation framework that combines prototypical networks and SAM, achieving state-of-the-art results on several datasets.
http://arxiv.org/abs/2407.07041v1
Compressor summary: The paper shows how an expert can use a complex method to hide tampering in SAR images and fool forensic detectors.
http://arxiv.org/abs/2407.07038v1
Compressor summary: The ClimateSent-GAT Model uses Graph Attention Networks to classify disagreements in Reddit comment-reply pairs about climate change, improving on existing methods and helping communicate better.
http://arxiv.org/abs/2407.07035v1
Compressor summary: This survey reviews Vision-and-Language Navigation (VLN) methods and future opportunities using a principled framework for embodied planning and reasoning, with a focus on the role of foundation models.
http://arxiv.org/abs/2407.07030v1
Compressor summary: The text describes a pipeline for mining trajectories from sensor data and using various machine learning approaches to predict travel time on common routes in Islamabad, Pakistan.
http://arxiv.org/abs/2407.07026v1
Compressor summary: The paper proposes a CoDe network that complements image and text representations with OCR text semantics, decomposes them with projection and contrastive learning, and fuses them for multimodal sentiment analysis to address sentiments discrepancy.
http://arxiv.org/abs/2407.07024v1
Compressor summary: The paper proposes a self-training method using unlabeled YouTube videos to improve open-vocabulary temporal action localization, and introduces a new evaluation protocol.
http://arxiv.org/abs/2407.07020v1
Compressor summary: The Human-Like Trajectory Prediction model (HLTP++) improves autonomous driving by mimicking human cognitive processes and using a novel teacher-student knowledge distillation framework with a new efficient neural network, achieving better trajectory prediction than existing models in various scenarios.
http://arxiv.org/abs/2407.07018v1
Compressor summary: The paper introduces NATURAL, a family of causal effect estimators using large language models, which can efficiently estimate causal effects from unstructured text data.
http://arxiv.org/abs/2407.07011v1
Compressor summary: The paper investigates how induction heads contribute to learning and performing tasks using few examples (few-shot learning) and shows their importance for abstract pattern recognition and natural language processing tasks.
http://arxiv.org/abs/2407.07009v1
Compressor summary: The text describes a novel AI-based decision-making framework for wireless communications that uses perturbation to identify relevant inputs and improve performance and trustworthiness.
http://arxiv.org/abs/2407.07004v1
Compressor summary: The text analyzes the impact of five Brazilian legal binding precedents on the Federal Supreme Court's rulings and compares different methods of natural language processing for similar case retrieval, finding that the reasons for their ineffectiveness are varied and case-dependent.
http://arxiv.org/abs/2407.07003v1
Compressor summary: LECODU is a novel method for integrating AI and humans in classification tasks that optimizes accuracy and collaboration costs by combining learning to complement and defer strategies with estimating the optimal number of users.
http://arxiv.org/abs/2407.07000v1
Compressor summary: Metron is a new framework that evaluates large language models' performance for real-time applications by considering fluidity-index, a metric that reflects the LLM inference process impact on user experience.
http://arxiv.org/abs/2407.06991v1
Compressor summary: The paper presents a new 3D instance segmentation method that corrects labelled points in processed blocks using label propagation and improves accuracy without requiring overlap between blocks.
http://arxiv.org/abs/2407.06990v1
Compressor summary: The study explores using large language models mBART and mT5 in interactive machine translation, finding that mBART performs similarly to state-of-the-art models.
http://arxiv.org/abs/2407.06985v1
Compressor summary: The paper introduces PEER, a multi-agent framework for domain-specific problem-solving that integrates question decomposition, information retrieval, summarization, and self-assessment, achieving high performance with lower cost and better data privacy than GPT-4.
http://arxiv.org/abs/2407.06984v1
Compressor summary: CODERS is a one-stage method for 3D object understanding from stereo images that improves robot manipulation by detecting objects, estimating their pose, and reconstructing them with an implicit stereo matching module and a transform-decoder architecture.
http://arxiv.org/abs/2407.06979v1
Compressor summary: This study investigates how well virtual staining models trained on imaging data from three cell types and two conditions in drug screening can generalize to other scenarios, finding that non-toxic condition training improves performance and there is variability in generalization across cell types.
http://arxiv.org/abs/2407.06964v1
Compressor summary: The paper proposes a lightweight and efficient method to adapt pre-trained Vision Transformers for downstream tasks by using a query module and a customized classification head that avoids heavy intermediate features and memory-heavy training.
http://arxiv.org/abs/2407.06958v1
Compressor summary: The paper presents a new 3D instance segmentation method that learns coefficients and prototypes, produces overcomplete predictions, and achieves faster and more reliable performance than existing methods.
http://arxiv.org/abs/2407.06950v1
Compressor summary: The study adapts a state-of-the-art English OCR model, TrOCR, to Spanish using two methods and creates a resource-efficient pipeline for generating OCR datasets in any language.
http://arxiv.org/abs/2407.06946v1
Compressor summary: The authors propose a test to check if language models can recognize themselves using security questions and find no evidence of self-recognition in any of the examined models.
http://arxiv.org/abs/2407.06941v1
Compressor summary: Raply is a GPT-2 model that generates rap lyrics with rhymes and less offensive content by using a dataset without profanities.
http://arxiv.org/abs/2407.06938v1
Compressor summary: RodinHD is a method that creates realistic 3D avatars from portraits by addressing challenges like hairstyles, sharp details, and texture cues using novel data scheduling and cross-attention techniques.
http://arxiv.org/abs/2407.06937v1
Compressor summary: AbHuman is a large benchmark for synthesized human images with anomalies, and HumanRefiner is a plug-and-play approach to improve text-to-image generation by refining human anomalies.
http://arxiv.org/abs/2407.06930v1
Compressor summary: The text introduces a method that combines expert-driven ontology design with CRISP-DM data mining process to build and update application-specific ontologies for corrective maintenance of Cyber-Physical Systems, using an anomaly detection case study as an example.
http://arxiv.org/abs/2407.06917v1
Compressor summary: The study introduces GlobalBias, a dataset to analyze how large language models propagate harmful stereotypes across various gender-by-ethnicity groups and find that larger models have higher levels of biased outputs.
http://arxiv.org/abs/2407.06908v1
Compressor summary: The text discusses how emotions reveal our values and guide our actions, and explores how different religions are represented in LLMs, finding that some are more nuanced while others are stereotyped or stigmatized due to cultural bias and lack of NLP literature on religion.
http://arxiv.org/abs/2407.06904v1
Compressor summary: The paper proposes HGA, a hypergraph attention framework for semantic entity recognition that improves performance by capturing both entity boundaries and categories, and HGALayoutLM, a model based on HGA and GraphLayoutLM that sets new state-of-the-art results on several datasets.
http://arxiv.org/abs/2407.06893v1
Compressor summary: The paper proposes a method and system to classify and score sustainable funds' prospectuses based on their language specificity and transparency, using few-shot learners and a ratio metric, to help regulators, investors, and advisors assess ESG claims.
http://arxiv.org/abs/2407.06888v1
Compressor summary: The paper presents a complete set of quadratic constraints for the repeated ReLU that bounds its performance and stability in neural networks, including a less conservative Lipschitz bound compared to the standard approach.
http://arxiv.org/abs/2407.06886v1
Compressor summary: This paper surveys recent advancements in Embodied AI, focusing on perception, interaction, embodied agents, and sim-to-real adaptation using Multi-modal Large Models (MLMs) and World Models (WMs).
http://arxiv.org/abs/2407.06871v1
Compressor summary: The paper proposes a new image-to-video adaptation method that uses object discovery and slot attention to compress videos into object-centric tokens, enabling efficient temporal reasoning for video tasks with fewer parameters and better performance.
http://arxiv.org/abs/2407.06866v1
Compressor summary: The paper investigates how user context affects GPT-3.5's refusal guardrails and finds biases based on demographics, identity, and political ideology.
http://arxiv.org/abs/2407.06863v1
Compressor summary: The text presents a framework that uses structured knowledge bases and large language models to assess how well Text-to-Image models represent different cultures, building CUBE, a benchmark of cultural artifacts from 8 countries across cuisine, landmarks, and art, and revealing significant gaps in the cultural awareness of existing models.
http://arxiv.org/abs/2407.06861v1
Compressor summary: The paper introduces W2W-BEV, a cross-view geo-localization method that learns a bird's eye view (BEV) representation from the ground query image and adaptively matches BEV features to ground windows using a context-aware window matching strategy and cross-attention, improving accuracy under unknown orientation and limited field of view.
http://arxiv.org/abs/2407.06852v1
Compressor summary: The text introduces a new framework, TE-SSL, that uses time-to-event and event data as supervisory signals to improve disease progression analysis using deep learning and representation learning strategies.
http://arxiv.org/abs/2407.06851v1
Compressor summary: This paper explores using sentence encoders to detect and classify unsafe prompts for Large Language Models, introducing new datasets and a metric to measure their effectiveness.
http://arxiv.org/abs/2407.06849v1
Compressor summary: The paper presents TeVAE, an automatic online anomaly detection system for complex real-world data that can minimize false positives and detect root causes.
http://arxiv.org/abs/2407.06844v1
Compressor summary: The paper introduces a new task and algorithm for calibrating confidence scores in multi-label recognition problems, addressing semantic confusion and category correlations using dynamic correlation learning and regularization.
http://arxiv.org/abs/2407.06842v1
Compressor summary: The paper presents CE3D, a dialogue-based 3D scene editing approach that uses a large language model to interpret user input and autonomously invokes visual expert models, while also enabling flexible integration of existing visual models using Hash-Atlas.
http://arxiv.org/abs/2407.06841v1
Compressor summary: The paper introduces HTD-Mamba, a self-supervised method for hyperspectral target detection that uses spectrally contrastive learning and spatial-encoded spectral augmentation to address challenges caused by limited prior knowledge and spectral variations.
http://arxiv.org/abs/2407.06826v1
Compressor summary: VRDSynth is a program synthesis method that automatically extracts entity relations from multilingual visually rich documents using a domain-specific language, outperforming pre-trained models in multiple languages and reducing memory footprint.
http://arxiv.org/abs/2407.06823v1
Compressor summary: The authors present a method for automatic cue point estimation in music mixing, based on a pre-trained object detection transformer fine-tuned on a large annotated cue point dataset, which requires no low-level musical analysis and adheres to the high-level structure of dance music.
http://arxiv.org/abs/2407.06817v1
Compressor summary: AstroSpy is a hybrid model that uses spatial and spectral information to identify real and fake astronomical images, improving authenticity detection in the field of astronomy.
http://arxiv.org/abs/2407.06814v1
Compressor summary: The note discusses the history of informal semantics for logic programming using answer set semantics, comparing two popular paradigms: Answer Set Programming and ASP-Prolog.
http://arxiv.org/abs/2407.06813v1
Compressor summary: The authors aim to create an AI agent that can excel at diplomacy by combining strategic planning, social reasoning, and self-improvement through self-play games.
http://arxiv.org/abs/2407.06797v1
Compressor summary: The Entropy Decomposed Variational Autoencoder (ED-VAE) is a new method that improves the quality of samples and latent representations by explicitly including entropy and cross-entropy components in the ELBO formulation.
http://arxiv.org/abs/2407.06796v1
Compressor summary: The paper proposes a new method to defend wireless networks against adversarial attacks in modulation classification using neural rejection, label smoothing, and noise injection.
http://arxiv.org/abs/2407.06795v1
Compressor summary: CycleSAM is a method that improves one-shot surgical scene segmentation by using trained image-mask pairs, spatial cycle-consistency constraints, and a surgical-specific ResNet50 encoder to overcome limitations of the Segment-Anything Model.
http://arxiv.org/abs/2407.06794v1
Compressor summary: ERQ is a new method that reduces quantization error in vision transformers by strategically updating weights and activations with full-precision, achieving better compression efficiency than existing methods.
http://arxiv.org/abs/2407.06782v1
Compressor summary: The paper proposes a fuzzy color model and a novel fuzzy clustering algorithm to efficiently cluster arbitrary color data with uncertainty and vagueness.
http://arxiv.org/abs/2407.06780v1
Compressor summary: The authors propose a method called Conditional Dropout and Language-driven Quality Assessment to improve dual-modal salient object detection by handling noisy inputs and missing modalities, which leads to better performance than existing models.
http://arxiv.org/abs/2407.06779v1
Compressor summary: The authors describe their system for answering biomedical questions using pre-trained LLMs, prompt engineering, and post-processing techniques, achieving competitive scores on BioASQ 2024 tasks.
http://arxiv.org/abs/2407.06774v1
Compressor summary: The text introduces a new way to measure how well fuzzy clusters are separated, using the overlap between them, and shows it works well on some examples.
http://arxiv.org/abs/2407.06771v1
Compressor summary: The paper proposes a new reservoir computing method with improved input mapping and network architectures that reduce error and uncertainty in predicting chaotic and non-chaotic time series compared to existing methods.
http://arxiv.org/abs/2407.06765v1
Compressor summary: The authors propose new generalization bounds for nonlinear networks that consider them as perturbations of linear ones and require no training data to evaluate.
http://arxiv.org/abs/2407.06762v1
Compressor summary: MToMnet is a neural network that predicts human beliefs and their changes during interactions using multiple inputs like videos, gaze, and body language.
http://arxiv.org/abs/2407.06756v1
Compressor summary: The paper examines why periodic activation functions improve sample efficiency in deep RL and finds they learn high frequency representations, but have worse generalization on noisy states and can be mitigated by weight decay regularization.
http://arxiv.org/abs/2407.06748v1
Compressor summary: The IASIS project aims to turn big biomedical data into actionable information for decision makers by integrating and analyzing data from various sources using advanced methods and generating insights for public health activities and personalized care.
http://arxiv.org/abs/2407.06740v1
Compressor summary: The paper proposes a new way to train image-based explainer for recommender systems using positive-unlabelled learning to improve explainability with user-personalized negative examples.
http://arxiv.org/abs/2407.06730v1
Compressor summary: The paper proposes a new VPR method that fuses image and text features using attention mechanisms, improving robustness against viewpoint and appearance changes.
http://arxiv.org/abs/2407.06723v1
Compressor summary: The authors propose a new annotation method for image captioning using labelled graphs to describe scenes with compositionality and hierarchical information, improving performance on downstream models.
http://arxiv.org/abs/2407.06718v1
Compressor summary: The study presents a simple architecture for secure Enterprise applications using LLMs, RAG, and MoE to filter documents and experts based on user roles and security clearance levels.
http://arxiv.org/abs/2407.06714v1
Compressor summary: The paper proposes FAUG, a feature augmentation attack that improves adversarial transferability by injecting random noise into model intermediate features without extra computation costs.
http://arxiv.org/abs/2407.06712v1
Compressor summary: The paper introduces a geometric approach to analyze MDP algorithms and shows how to split them into classes with similar dynamics, enabling the creation of new optimal policy-finding methods.
http://arxiv.org/abs/2407.06709v1
Compressor summary: The paper introduces Top-K Pairwise Ranking (TKPR), a new measure for multi-label ranking tasks, and develops an empirical surrogate risk minimization framework with theoretical guarantees.
http://arxiv.org/abs/2407.06704v1
Compressor summary: The paper proposes a method to enhance self-supervised learning of visual representations by incorporating actions performed on objects, leading to better recognition of object categories.
http://arxiv.org/abs/2407.06699v1
Compressor summary: The paper presents CovEReD, a method to generate counterfactual data for document-level relation extraction models, which helps evaluate and reduce factual biases in these models.
http://arxiv.org/abs/2407.06698v1
Compressor summary: PSPU improves over PU learning by using pseudo-supervision from confident samples and a consistency loss to reduce overfitting and perform better on various datasets.
http://arxiv.org/abs/2407.06697v1
Compressor summary: The paper proposes certified continual learning, an approach to preserve the verified correctness of neural networks when they are re-trained over time for different tasks.
http://arxiv.org/abs/2407.06690v1
Compressor summary: The paper proposes a new hierarchical reinforcement learning method for LMDPs that learns low- and high-level tasks simultaneously using state space partitions, improving average-reward performance significantly.
http://arxiv.org/abs/2407.06682v1
Compressor summary: The study presents a new predictive model using Transformer and feature embedding to improve fault detection and virtual metrology in manufacturing processes with limited sensor data.
http://arxiv.org/abs/2407.06677v1
Compressor summary: The authors propose a new Transformer architecture called mixture-of-modules (MoM) that breaks the depth-ordered convention by dynamically selecting modules to compute tokens, achieving better performance and reduced redundancy in parameterization.
http://arxiv.org/abs/2407.06676v1
Compressor summary: The paper analyzes how exponential weights algorithm with constant learning rates behaves in repeated games and shows convergence properties to certain Nash equilibria.
http://arxiv.org/abs/2407.06673v1
Compressor summary: The paper presents CTRL-F, a hybrid network that integrates convolution and transformers for image classification, using a multi-level feature cross-attention module to exchange knowledge across feature levels from the convolution branch and novel representation fusion techniques to combine local and global responses, achieving state-of-the-art performance on image classification tasks with limited or large data.
http://arxiv.org/abs/2407.06660v1
Compressor summary: The text describes an intervention program that helps educators learn about AI and how to integrate it into their teaching practices in creative ways, considering ethical and pedagogical aspects.
http://arxiv.org/abs/2407.06658v1
Compressor summary: TriQXNet is a novel hybrid classical-quantum neural network that predicts the disturbance storm-time index, helping to mitigate the impacts of geomagnetic storms on infrastructure.
http://arxiv.org/abs/2407.06655v1
Compressor summary: Generative AI (genAI) affects teachers' agency in education, but hybrid intelligence combining human and artificial intelligence could enhance learning design and teacher influence.
http://arxiv.org/abs/2407.06654v1
Compressor summary: The proposed soft deduplication method reduces the sampling weight of duplicated data in large language models' pre-training datasets, improving training efficiency and downstream accuracy while preserving dataset integrity.
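An illustrative sketch of the soft-deduplication idea: instead of dropping duplicated documents, lower their sampling weight so each document contributes roughly once in expectation. The 1/k weighting below is an assumption for illustration, not necessarily the paper's exact scheme.

```python
# Assign each copy of a document duplicated k times a weight of 1/k, so the
# document as a whole keeps total sampling weight 1 instead of k.

def sampling_weights(doc_ids):
    counts = {}
    for d in doc_ids:
        counts[d] = counts.get(d, 0) + 1
    return [1.0 / counts[d] for d in doc_ids]

docs = ["a", "a", "a", "b", "c"]
weights = sampling_weights(docs)
# the three copies of "a" each get weight 1/3; "b" and "c" keep weight 1.0
print(weights)
```

These weights could then drive a weighted sampler during pre-training, down-weighting duplicates without discarding them.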
http://arxiv.org/abs/2407.06653v1
Compressor summary: The paper proposes a novel framework called MAR-rPPG that improves facial video-based remote photoplethysmography (rPPG) measurement by addressing ROI localization and motion artifacts issues with masked attention regularization and an enhanced EREA network.
http://arxiv.org/abs/2407.06650v1
Compressor summary: The authors propose an evaluation metric for simultaneous interpretation and machine translation that focuses on maintaining word order synchronization between languages, using rank correlation coefficients and cross-lingual pre-trained models.
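A hand-rolled sketch of the rank-correlation ingredient mentioned above: given aligned word positions in the source and the output, Kendall's tau measures how well the output preserves the source word order. The alignment inputs here are toy values; the paper's metric additionally relies on cross-lingual pre-trained models to obtain the alignments.

```python
# Kendall's tau over aligned word positions: +1 means the output follows the
# source order exactly, -1 means it is fully reversed.

def kendall_tau(ranks_a, ranks_b):
    n = len(ranks_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

src = [0, 1, 2, 3]
print(kendall_tau(src, [0, 1, 2, 3]))  # 1.0  (fully synchronized)
print(kendall_tau(src, [3, 2, 1, 0]))  # -1.0 (fully reordered)
```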
http://arxiv.org/abs/2407.06646v1
Compressor summary: The paper proposes two variants of LISTA, A-DLISTA and VLISTA, to solve compressed sensing problems with varying sensing matrices by jointly learning sparse representations and reconstructions while accounting for uncertainty in the dictionaries.
http://arxiv.org/abs/2407.06645v1
Compressor summary: The paper proposes a data selection method for large language models based on an "entropy law" that connects model performance to data compression ratio and first-epoch training loss, which helps improve model learning efficiency and diversity.
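A tiny sketch of the compression-ratio signal this "entropy law" relates to model performance: text that compresses very well is highly redundant, which this view links to lower training value. Computing the ratio with zlib is my illustrative choice, not necessarily the paper's compressor.

```python
import zlib

# Compressed size divided by raw size: redundant text scores low,
# information-dense text scores close to (or above) 1 for short inputs.

def compression_ratio(text):
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)

redundant = "the cat sat. " * 50
diverse = "Quantum flux perturbs nonlinear geodesics near compact manifolds."
print(compression_ratio(redundant) < compression_ratio(diverse))  # True
```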
http://arxiv.org/abs/2407.06642v1
Compressor summary: The paper proposes a reinforcement learning framework for personalized text-to-image generation that preserves visual details and structure, outperforming existing methods.
http://arxiv.org/abs/2407.06635v1
Compressor summary: The text presents a new method that combines generative models and synthetic anomalies to improve unsupervised anomaly detection in medical images, such as brain MRI.
http://arxiv.org/abs/2407.06628v1
Compressor summary: The paper presents a new method for action recognition that combines body-worn IMUs with egocentric videos, using self-supervised pretraining and graph-based modeling to achieve state-of-the-art performance and robustness.
http://arxiv.org/abs/2407.06622v1
Compressor summary: The text proposes an approach to explain time-stamped observations using simple events called surprises, which represent changes in fluents, and discusses how to minimize them.
http://arxiv.org/abs/2407.06617v1
Compressor summary: The paper proposes Mobius, a parallel training paradigm for text-to-video generation that saves memory and time compared to traditional 3D-Unet.
http://arxiv.org/abs/2407.06613v1
Compressor summary: The paper proposes Sparse-DeRF, a method to construct deblurred neural radiance fields from limited blurry images using regularization techniques that improve the quality of the results.
http://arxiv.org/abs/2407.06611v1
Compressor summary: CEIA is a framework that learns to align event and image data using contrastive learning to overcome the lack of paired event-text data for open-world event-based understanding, achieving versatility and performance in various multi-modal applications.
http://arxiv.org/abs/2407.06606v1
Compressor summary: The paper proposes a novel audio-visual framework that uses Branchformer architecture to design parameter-efficient systems for speech recognition in noisy environments.
http://arxiv.org/abs/2407.06600v1
Compressor summary: The text describes a method to improve concept bottleneck models by integrating clinical knowledge, making them more aligned with human decision-making and better at classifying medical images in different settings.
http://arxiv.org/abs/2407.06597v1
Compressor summary: The paper introduces Ranked Video Moment Retrieval (RVMR), a task that requires finding and ranking video moments from queries in natural language, and presents the TVR-Ranking dataset with relevance annotations for evaluating RVMR models.
http://arxiv.org/abs/2407.06585v1
Compressor summary: D-MASTER is a transformer-based framework that adapts to different domains for breast cancer detection from mammograms by masking and reconstructing multi-scale features, improving sensitivity and reducing false positives.
http://arxiv.org/abs/2407.06581v1
Compressor summary: VLMs struggle with simple visual tasks that humans find easy, suggesting that their perception of fine visual detail is poor at best and absent at worst.
http://arxiv.org/abs/2407.06579v1
Compressor summary: The paper introduces NoisyAG-News, a benchmark dataset for text classification with instance-dependent noise patterns, and shows that pre-trained language models struggle to handle such real-world noise.
http://arxiv.org/abs/2407.06576v1
Compressor summary: Anthology is a method that conditions large language models to adopt virtual personas based on life narratives, improving the representation of diverse human traits in behavioral studies.
http://arxiv.org/abs/2407.06570v1
Compressor summary: AGAN is a new attack method that exposes vulnerabilities in perceptual encryption techniques, breaking image privacy protection.
http://arxiv.org/abs/2407.06567v1
Compressor summary: FinCon is a large language model-based framework for enhanced financial decision-making with conceptual verbal reinforcement and a risk-control component.
http://arxiv.org/abs/2407.06566v1
Compressor summary: ETSEF is a novel framework that combines transfer, self-supervised, and ensemble learning with data enhancement techniques to improve automatic medical diagnostics using limited data samples.
http://arxiv.org/abs/2407.06564v1
Compressor summary: The text discusses the benefits and challenges of combining natural language processing, large language models, and knowledge graphs for enhancing artificial intelligence applications.
http://arxiv.org/abs/2407.06551v1
Compressor summary: The text describes a study that identifies six types of biases in evaluating generated responses using large language models, proposes a collection of test cases for each bias, and introduces methods to improve the robustness of these models.
http://arxiv.org/abs/2407.06547v1
Compressor summary: The Featural InfoWaveGAN model can learn Assamese vowel harmony from raw speech data, capturing its complexities and showing feature learning.
http://arxiv.org/abs/2407.06546v1
Compressor summary: The paper proposes a method to debug and understand the factors influencing end-to-end autonomous driving models, making them more transparent and trustworthy.
http://arxiv.org/abs/2407.06544v1
Compressor summary: The paper introduces cross-attention pooling (CAP), a novel approach for multiple-instance verification that uses two new attention functions to better distinguish between similar instances in a target bag, outperforming existing methods.
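The two attention functions CAP introduces are not specified in this summary, but cross-attention pooling in general means weighting each instance in the target bag by its similarity to a query before averaging. A minimal sketch under that assumption (the function name, scoring rule, and softmax formulation are illustrative, not the paper's):

```python
import math

def cross_attention_pool(query, bag, temperature=1.0):
    """Pool a bag of instance embeddings into one vector, weighting each
    instance by its scaled dot-product similarity to the query.

    `query` is a list of floats; `bag` is a list of equal-length vectors.
    Instances similar to the query dominate the pooled representation,
    which is what lets attention pooling separate near-identical bags.
    """
    scores = [sum(q * b for q, b in zip(query, inst)) / temperature
              for inst in bag]
    # numerically stable softmax over the bag
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(query)
    return [sum(w * inst[d] for w, inst in zip(weights, bag))
            for d in range(dim)]
```

For example, pooling the bag `[[1, 0], [0, 1]]` with query `[1, 0]` yields a vector dominated by the first instance.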
http://arxiv.org/abs/2407.06543v1
Compressor summary: The paper presents an unsupervised GAN method that detects concept drifts, tracks their history, and improves the model's performance for recurring drifts in less time and data, demonstrating its effectiveness on an astrophysics problem.
http://arxiv.org/abs/2407.06542v1
Compressor summary: The text discusses a three-stage training pipeline that improves language models' alignment, instruction-following, and conversational abilities, surpassing official instruct models.
http://arxiv.org/abs/2407.06540v1
Compressor summary: GvSeg is a versatile framework for various video segmentation tasks that considers the diversity of targets and adapts to task-specific requirements, outperforming existing methods.
http://arxiv.org/abs/2407.06538v1
Compressor summary: This paper proposes a framework that combines a multilingual encoder-based seq2seq model with knowledge distillation to improve translation for low-resource Indic languages not supported by mBART-50.
http://arxiv.org/abs/2407.06537v1
Compressor summary: The paper proposes an efficient and accurate conversation model for multi-session dialog systems that uses memory management techniques to improve response generation performance and resource utilization.
http://arxiv.org/abs/2407.06533v1
Compressor summary: The paper proposes LETS-C, a lightweight and accurate time series classifier that uses language embeddings and a simple CNN+MLP head instead of fine-tuning large language models.
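The LETS-C recipe, a frozen embedding model followed by a small trainable head, can be sketched roughly as follows. This is only an illustrative stand-in: the seeded random projection plays the role of the actual pretrained language-embedding model, and a plain one-hidden-layer MLP substitutes for the paper's CNN+MLP head:

```python
import random

def embed_series(series, dim=8, seed=42):
    """Stand-in for the frozen language-embedding step. LETS-C would
    serialize the time series and embed it with a pretrained language
    encoder; here a fixed random projection plays that role."""
    rng = random.Random(seed)
    proj = [[rng.gauss(0.0, 1.0) for _ in series] for _ in range(dim)]
    return [sum(p * x for p, x in zip(row, series)) for row in proj]

def classification_head(emb, num_classes=3, hidden=4, seed=7):
    """Lightweight head: one hidden layer with ReLU, then class logits.
    (The paper uses a small CNN+MLP; an MLP alone keeps the sketch short.)"""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.5) for _ in emb] for _ in range(hidden)]
    h = [max(0.0, sum(w * e for w, e in zip(row, emb))) for row in w1]
    w2 = [[rng.gauss(0.0, 0.5) for _ in h] for _ in range(num_classes)]
    return [sum(w * x for w, x in zip(row, h)) for row in w2]

# End-to-end pipeline on a toy series: embed once, classify cheaply.
logits = classification_head(embed_series([0.1, 0.4, 0.35, 0.8]))
predicted = max(range(len(logits)), key=lambda c: logits[c])
```

Only the head's parameters would be trained; the embedding model stays frozen, which is where the claimed lightweight-ness comes from.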
http://arxiv.org/abs/2407.06531v1
Compressor summary: DecoMotion is a new test-time optimization method that decomposes video content into static scenes and dynamic objects, improving robustness and appearance in motion estimation.
http://arxiv.org/abs/2407.06529v1
Compressor summary: The GNN-CL model combines graph neural networks, convolutional neural networks, and long short-term memory networks to improve financial fraud detection accuracy, analyzing complex transaction patterns with intelligent purification mechanisms and reinforcement learning strategies.
http://arxiv.org/abs/2407.06518v1
Compressor summary: The paper proposes a method that combines Graph Neural Networks with Deep Reinforcement Learning to efficiently allocate resources for Vehicle-to-Vehicle and Vehicle-to-Infrastructure communication in Internet of Vehicles technology.
http://arxiv.org/abs/2407.06516v1
Compressor summary: VQA-Diff is a novel framework that uses real-world knowledge from large language models and image prior knowledge from diffusion models to generate photorealistic 3D vehicle assets for autonomous driving, achieving robust zero-shot prediction and appearance control.
http://arxiv.org/abs/2407.06513v1
Compressor summary: The text surveys deep learning-based computer vision techniques for aerospace missions, which overcome the limitations of traditional methods and offer great potential, but also face challenges and need further research.
http://arxiv.org/abs/2407.06512v1
Compressor summary: The paper introduces LuSNAR, a multi-task, multi-scene, and multi-label lunar dataset for evaluating autonomous perception and navigation systems on the moon.
http://arxiv.org/abs/2407.06507v1
Compressor summary: The text describes how a deep Q-network algorithm can be used to optimize the economic span of a bridge, reducing its construction cost.
http://arxiv.org/abs/2407.06504v1
Compressor summary: Reprogramming Distillation is a novel framework that reprograms the foundation model's feature space for downstream tasks and establishes connections between the reprogrammed knowledge and student models for personalized lightweight deployment.
http://arxiv.org/abs/2407.06503v1
Compressor summary: LOPE is a preference-guided RL framework that improves exploration efficiency in hard-exploration tasks by using human feedback as guidance, avoiding learning a separate reward model.
http://arxiv.org/abs/2407.06501v1
Compressor summary: The paper introduces STORYSUMM, a new dataset for evaluating faithfulness in summarization methods, and shows that current automatic metrics are not accurate enough for this task.
http://arxiv.org/abs/2407.06496v1
Compressor summary: The paper shows that for some loss functions, the final iterate of DP-SGD leaks as much information as all intermediate iterates combined, and thus privacy amplification is not possible for these cases.
http://arxiv.org/abs/2407.06494v1
Compressor summary: DiffPhyCon is a novel method for controlling complex physical systems that minimizes energy and control objectives, explores globally, and can discover near-optimal control sequences, outperforming classical and deep learning approaches.
http://arxiv.org/abs/2407.06491v1
Compressor summary: VideoEval is a new benchmark suite that evaluates video foundation models on task adaptability and representation power, revealing their weaknesses and potential improvements.
http://arxiv.org/abs/2407.06488v1
Compressor summary: The paper investigates how large language models learn multiple tasks by identifying and analyzing task-sensitive neurons, and proposes a continuous fine-tuning method based on these findings.
http://arxiv.org/abs/2407.06486v1
Compressor summary: The paper proposes a dynamic framework that integrates an optimization function within LLMs' decision-making process, allowing them to offer tailored, optimal solutions to complex problems.
http://arxiv.org/abs/2407.06485v1
Compressor summary: The text introduces Crowd Knowledge Transfer (CrowdTransfer), a new approach to improve Artificial Intelligence of Things (AIoT) performance by sharing prior knowledge from multiple agents, and discusses its applications and challenges.
http://arxiv.org/abs/2407.06483v1
Compressor summary: The paper introduces a framework to study and compare different test-time interventions applied sequentially to language models, revealing their interactions and limitations.
http://arxiv.org/abs/2407.06479v1
Compressor summary: The text introduces an evaluation framework that measures interactivity in English as a Second Language (ESL) speakers' dialogues using micro-level features and machine learning models.
http://arxiv.org/abs/2407.06469v1
Compressor summary: The study presents a novel method that uses diffusion models and identity embeddings to generate high-quality scene images from sketch inputs by decomposing the task into object-level generation and scene-level construction while preserving foreground object details.
http://arxiv.org/abs/2407.06468v1
Compressor summary: AnatoMask is a novel self-supervised learning method for 3D medical image segmentation that dynamically masks and reconstructs anatomically significant regions to improve pretraining efficiency.
http://arxiv.org/abs/2407.06464v1
Compressor summary: SideSeeing is an initiative that provides tools and datasets for assessing the built environment, using synchronized video and sensor data from chest-mounted mobile devices to evaluate sidewalk accessibility near hospitals in Brazil and the USA.