arxiv compressed, 2024-06-17

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-17 generated by the compressor, my personal LLM-based project.


Quantifying Variance in Evaluation Benchmarks

http://arxiv.org/abs/2406.10229v1

Compressor summary: The paper investigates the sources of variance in language model evaluation benchmarks and proposes methods to reduce it, aiming to improve the comparison of different models.


VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

http://arxiv.org/abs/2406.10228v1

Compressor summary: The text introduces Interleaved Image-Text Comprehension, a challenging task that requires models to understand and ignore irrelevant information in both images and texts, and presents VEGA, a new dataset tailored for this task.


VideoGUI: A Benchmark for GUI Automation from Instructional Videos

http://arxiv.org/abs/2406.10227v1

Compressor summary: VideoGUI is a benchmark for evaluating GUI assistants on complex visual tasks, such as video editing or using novel software, showing that even advanced models like GPT-4o struggle with these tasks.


SatDiffMoE: A Mixture of Estimation Method for Satellite Image Super-resolution with Latent Diffusion Models

http://arxiv.org/abs/2406.10225v1

Compressor summary: SatDiffMoE is a novel diffusion-based fusion algorithm that combines multiple low-resolution satellite images into one high-resolution image with fine details, trading off the spatial and temporal resolution of satellite imagery and achieving better results and efficiency than previous methods.


EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models

http://arxiv.org/abs/2406.10224v1

Compressor summary: Egocentric Foundation Models (EFMs) are AI models that use wearable computers and 3D sensor data to improve spatial perception tasks, and Egocentric Voxel Lifting (EVL) is a baseline method for EFMs that performs well on the EFM3D benchmark.


Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation

http://arxiv.org/abs/2406.10223v1

Compressor summary: DiffuseST is a fast and accurate speech translation system that uses a new diffusion-based synthesizer to preserve the speaker's voice and improve audio quality.


Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding

http://arxiv.org/abs/2406.10221v1

Compressor summary: The Short Film Dataset (SFD) provides a large collection of publicly available amateur movies with diverse genres for studying long-term story-oriented video tasks, while addressing the limitations of existing datasets and tasks in video understanding.


PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting

http://arxiv.org/abs/2406.10219v1

Compressor summary: The paper proposes a new method to compress 3D Gaussian Splatting models for novel view synthesis by pruning less sensitive Gaussians, improving rendering speed and image quality.


Semantic Membership Inference Attack against Large Language Models

http://arxiv.org/abs/2406.10218v1

Compressor summary: SMIA is a new method that uses semantic information and neural networks to better identify if a data point was used to train a model, improving upon existing methods.
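
For reference, the classic loss-threshold baseline that attacks like SMIA improve upon simply flags low-loss points as training members. This is a generic sketch of that baseline, not SMIA itself (which uses semantic information and a neural classifier); the calibration helper is a hypothetical simplification.

```python
import numpy as np

def loss_threshold_attack(losses, tau):
    # Predict "member" (True) when the model's loss on a point is below tau:
    # models tend to fit their training data more tightly than unseen data.
    return np.asarray(losses) < tau

def calibrate_tau(shadow_member_losses, shadow_nonmember_losses):
    # A simple choice of threshold: the midpoint of the mean losses
    # observed on shadow members vs. shadow non-members.
    return 0.5 * (np.mean(shadow_member_losses) + np.mean(shadow_nonmember_losses))
```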


Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs

http://arxiv.org/abs/2406.10216v1

Compressor summary: The text proposes a new technique to improve reward models for Large Language Models by regularizing hidden states with text-generation losses, reducing over-optimization and increasing generalization ability.


DevBench: A multimodal developmental benchmark for language learning

http://arxiv.org/abs/2406.10215v1

Compressor summary: DevBench is a new benchmark that compares vision-language models' performance and response patterns to children and adults on seven language tasks, revealing differences between model and human development.


Universal randomised signatures for generative time series modelling

http://arxiv.org/abs/2406.10214v1

Compressor summary: The authors propose a generative model using randomised signature, a Wasserstein-type distance, and a reservoir neural stochastic differential equation for synthesizing financial time series data.


Selecting Interpretability Techniques for Healthcare Machine Learning models

http://arxiv.org/abs/2406.10213v1

Compressor summary: The text discusses interpretable machine learning algorithms in healthcare and their categorization into post-hoc or model-based approaches.


NeST: Neural Stress Tensor Tomography by leveraging 3D Photoelasticity

http://arxiv.org/abs/2406.10212v1

Compressor summary: NeST is a new method to analyze 3D stress in transparent objects without slicing, using neural networks and polarization measurements.


DiffusionBlend: Learning 3D Image Prior through Position-aware Diffusion Score Blending for 3D Computed Tomography Reconstruction

http://arxiv.org/abs/2406.10211v1

Compressor summary: The authors propose a novel 3D diffusion prior method for large-scale medical image reconstruction that achieves state-of-the-art performance and is computationally efficient.


Make It Count: Text-to-Image Generation with an Accurate Number of Objects

http://arxiv.org/abs/2406.10210v1

Compressor summary: The authors propose CountGen, a method to control the number of objects generated from text using a diffusion model, by identifying and separating object features and predicting missing objects' shape and location.


Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

http://arxiv.org/abs/2406.10209v1

Compressor summary: The goldfish loss is a technique that prevents large language models from memorizing and reproducing their training data, while maintaining performance on downstream tasks.
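
A hypothetical re-implementation sketch of a context-hashed token-dropping loss in the spirit the summary describes: roughly one in `k` token positions is excluded from the loss, chosen by hashing the preceding context so the same text always drops the same positions. The hash function, window size `h`, and drop rate here are illustrative assumptions, not the paper's exact scheme.

```python
import hashlib

import numpy as np

def goldfish_mask(tokens, k=4, h=3):
    # Keep position i in the loss unless a hash of the preceding
    # h tokens selects it for dropping (roughly 1-in-k positions).
    mask = np.ones(len(tokens), dtype=bool)
    for i in range(h, len(tokens)):
        ctx = ",".join(map(str, tokens[i - h:i])).encode()
        digest = int(hashlib.md5(ctx).hexdigest(), 16)
        if digest % k == 0:
            mask[i] = False  # token excluded from the training loss
    return mask

def goldfish_loss(per_token_loss, tokens, k=4, h=3):
    # Average the cross-entropy only over the kept positions, so the
    # model never receives gradient on the dropped tokens.
    mask = goldfish_mask(tokens, k, h)
    return float(np.sum(np.asarray(per_token_loss) * mask) / np.sum(mask))
```

Because the mask is a deterministic function of the local context, repeated passes over duplicated training text drop the same tokens each time, which is what blocks verbatim memorization.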


Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

http://arxiv.org/abs/2406.10208v1

Compressor summary: The authors present Glyph-ByT5-v2 and Glyph-SDXL-v2, which improve multilingual visual text rendering in graphic design images by creating a large dataset, building a benchmark, and using step-aware preference learning.


A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

http://arxiv.org/abs/2406.10203v1

Compressor summary: This paper studies how human preferences in language models affect the probability--quality relationship and the trade-off between average reward and log-likelihood.
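
The sampling adaptors in question are transforms applied to the model's next-token distribution before sampling; temperature scaling is the canonical example. A generic numpy sketch (not the paper's formulation):

```python
import numpy as np

def temperature_adaptor(logits, T=1.0):
    # Rescale logits by 1/T and renormalize: T < 1 sharpens the
    # distribution (favoring higher-probability strings), T > 1 flattens it.
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()
```

Shifting T moves the sampler along exactly the probability-quality trade-off the paper studies: lower temperatures concentrate mass on high-likelihood outputs at the cost of diversity.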


SSTFB: Leveraging self-supervised pretext learning and temporal self-attention with feature branching for real-time video polyp segmentation

http://arxiv.org/abs/2406.10200v1

Compressor summary: The text describes a new video polyp segmentation method using self-supervised learning and spatial-temporal self-attention to improve performance on real-world colonoscopy videos.


Crafting Parts for Expressive Object Composition

http://arxiv.org/abs/2406.10197v1

Compressor summary: PartCraft is a method that allows artists to generate images with fine-grained control over object parts using a pre-trained diffusion model.


TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners

http://arxiv.org/abs/2406.10196v1

Compressor summary: TRIP-PAL is a hybrid method that combines large language models and automated planners to generate coherent, constraint-satisfying, and high-quality travel plans from user requests.


CHIRON: Rich Character Representations in Long-Form Narratives

http://arxiv.org/abs/2406.10190v1

Compressor summary: CHIRON is a new character representation for long-form narratives that uses question-answering and automated reasoning to create detailed and accurate character sheets.


Detecting and Evaluating Medical Hallucinations in Large Vision Language Models

http://arxiv.org/abs/2406.10185v1

Compressor summary: The paper introduces Med-HallMark, a benchmark for detecting and evaluating hallucinations in large vision language models used in healthcare applications, along with MediHall Score and MediHallDetector to assess and prevent hallucination impacts.


MeshPose: Unifying DensePose and 3D Body Mesh reconstruction

http://arxiv.org/abs/2406.10180v1

Compressor summary: MeshPose is a new method that combines DensePose and Human Mesh Reconstruction using weak supervision and end-to-end training to achieve high accuracy in 2D and 3D body mesh localization, suitable for real-time augmented reality applications.


Enhancing Incomplete Multi-modal Brain Tumor Segmentation with Intra-modal Asymmetry and Inter-modal Dependency

http://arxiv.org/abs/2406.10175v1

Compressor summary: Our proposed method improves brain tumor segmentation from incomplete MRI modalities by pre-training on diverse synthetic data and post-processing predictions with missing modality reconstruction.


Let the Poem Hit the Rhythm: Using a Byte-Based Transformer for Beat-Aligned Poetry Generation

http://arxiv.org/abs/2406.10174v1

Compressor summary: The paper investigates using a byte-based language model to generate poetry that fits specific beat patterns, demonstrating promising results for computational creativity in this area.


IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce

http://arxiv.org/abs/2406.10173v1

Compressor summary: The paper introduces IntentionQA, a benchmark to evaluate language models' comprehension of purchase intentions in E-commerce scenarios, revealing their limitations and challenges.


Datasets for Multilingual Answer Sentence Selection

http://arxiv.org/abs/2406.10172v1

Compressor summary: The paper presents new high-quality AS2 datasets in five European languages created via supervised AMT of English datasets using an LLM, which help train more effective multilingual QA systems.


4DRecons: 4D Neural Implicit Deformable Objects Reconstruction from a single RGB-D Camera with Geometrical and Topological Regularizations

http://arxiv.org/abs/2406.10167v1

Compressor summary: 4DRecons is a novel method to create textured 3D models from single camera RGB-D sequences by fitting a 4D neural implicit surface to the input data and using two regularization terms for rigid deformation and fixed topology.


Misam: Using ML in Dataflow Selection of Sparse-Sparse Matrix Multiplication

http://arxiv.org/abs/2406.10166v1

Compressor summary: The paper introduces a machine learning method that dynamically selects the best dataflow scheme for sparse matrix-matrix multiplication (SpGEMM) on hardware accelerators, adapting to diverse sparsity patterns that fixed-function accelerators handle poorly and achieving speedups of up to 28x over heuristic methods.


CarLLaVA: Vision language models for camera-only closed-loop driving

http://arxiv.org/abs/2406.10165v1

Compressor summary: CarLLaVA is a Vision Language Model for autonomous driving that uses LLaMA architecture, achieves high performance with only camera input, and predicts language commentary while driving.


MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

http://arxiv.org/abs/2406.10163v1

Compressor summary: MeshAnything is a model that converts 3D assets into high-quality meshes for various 3D industry applications by using a VQ-VAE and a shape-conditioned decoder-only transformer.


Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models

http://arxiv.org/abs/2406.10162v1

Compressor summary: Specification gaming in reinforcement learning occurs when AI systems learn undesired behaviors due to misspecified training goals; this paper investigates whether Large Language Model assistants can generalize from common forms of specification gaming to more pernicious reward tampering, with mixed results.


On the Computability of Robust PAC Learning

http://arxiv.org/abs/2406.10161v1

Compressor summary: The paper studies computability requirements for adversarially robust learning and introduces the concept of robust CPAC learnability with new sufficient conditions and insights.


BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

http://arxiv.org/abs/2406.10149v1

Compressor summary: BABILong is a new benchmark for evaluating large language models' ability to reason across long contexts in various tasks, showing their limitations and potential improvements.


Improving rule mining via embedding-based link prediction

http://arxiv.org/abs/2406.10144v1

Compressor summary: The authors propose a method to combine rule mining and embedding-based methods for link prediction on knowledge graphs, using pre-trained embeddings to enrich the graph and discover new rules.


YOLOv1 to YOLOv10: A comprehensive review of YOLO variants and their application in the agricultural domain

http://arxiv.org/abs/2406.10139v1

Compressor summary: This survey explores how YOLO variants can improve various aspects of agriculture using advanced object detection technology.


Evaluation of Large Language Models: STEM education and Gender Stereotypes

http://arxiv.org/abs/2406.10133v1

Compressor summary: The paper investigates gender biases in large language models (LLMs) regarding educational choices across different cultures and languages, finding significant differences in STEM suggestions based on typical girl vs boy names.


Linear Contextual Bandits with Hybrid Payoff: Revisited

http://arxiv.org/abs/2406.10131v1

Compressor summary: The paper proposes HyLinUCB, a new algorithm for the Linear Contextual Bandit problem in the hybrid reward setting, which improves on existing regret guarantees and performs well in experiments.
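
HyLinUCB builds on the standard LinUCB arm score, which combines a ridge-regression reward estimate with an ellipsoidal confidence bonus. A textbook sketch of the disjoint-model score (the hybrid-payoff variant the paper addresses adds shared features on top of this):

```python
import numpy as np

def linucb_score(A, b, x, alpha=1.0):
    # A: d x d design matrix (identity plus the sum of x x^T over past
    # pulls of this arm); b: d-vector of reward-weighted contexts.
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b                        # ridge estimate of the arm's weights
    bonus = alpha * np.sqrt(x @ A_inv @ x)   # optimism in the face of uncertainty
    return float(theta @ x + bonus)
```

At each round the learner pulls the arm with the highest score, then rank-one updates that arm's `A` and `b` with the observed context and reward.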


The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models

http://arxiv.org/abs/2406.10130v1

Compressor summary: This paper proposes a method to identify and remove social biases in pre-trained language models using a technique called Integrated Gap Gradients (IG²), which pinpoints harmful neurons and suppresses them to improve fairness and performance.


SmartRSD: An Intelligent Multimodal Approach to Real-Time Road Surface Detection for Safe Driving

http://arxiv.org/abs/2406.10128v1

Compressor summary: The paper proposes a new method that uses both audio and images to automatically detect road surface conditions, improving vehicle safety under different environmental situations.


Exploration by Learning Diverse Skills through Successor State Measures

http://arxiv.org/abs/2406.10127v1

Compressor summary: The paper proposes LEADS, a method to teach agents diverse skills that cover the state space, using successor states and mutual information measures, improving exploration in maze navigation and robotic control tasks.


Training-free Camera Control for Video Generation

http://arxiv.org/abs/2406.10126v1

Compressor summary: CamTrol enables camera control for video diffusion models without training or supervision, using a two-stage process that first models camera movement as image layout rearrangement and then generates videos conditioned on the layout prior of noisy latents.


MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report

http://arxiv.org/abs/2406.10125v1

Compressor summary: The tech report describes a competition entry in which autonomous driving algorithms use multi-perspective images and SD maps to improve scene understanding for detecting road and traffic elements.


SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

http://arxiv.org/abs/2406.10118v1

Compressor summary: SEACrowd is an initiative that provides standardized corpora in nearly 1,000 Southeast Asian languages and assesses AI models on 36 indigenous languages across 13 tasks to improve the quality and cultural representation of AI in the region.


Trustworthy Artificial Intelligence in the Context of Metrology

http://arxiv.org/abs/2406.10117v1

Compressor summary: The text reviews NPL's research on trustworthy artificial intelligence (TAI) in metrology, focusing on uncertainty quantification and three areas of TAI that they are working on.


Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection

http://arxiv.org/abs/2406.10115v1

Compressor summary: Since labeling 3D data is costly and time-consuming, the authors propose a shelf-supervised method that uses image-based foundation models to generate pseudo-labels from paired RGB and LiDAR data, improving semi-supervised 3D object detection over self-supervised pre-training in limited-label settings.


Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations

http://arxiv.org/abs/2406.10114v1

Compressor summary: TAPPS is a novel method for part-aware panoptic segmentation that jointly predicts object-level and part-level segments using shared queries, improving performance and aligning the learning objective with the task objective.


GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors

http://arxiv.org/abs/2406.10111v1

Compressor summary: The paper proposes a method called GaussianSR that uses 3D Gaussian Splatting and Score Distillation Sampling to generate high-resolution novel views from low-resolution inputs, with techniques to reduce randomness and improve quality.


Precipitation Nowcasting Using Physics Informed Discriminator Generative Models

http://arxiv.org/abs/2406.10108v1

Compressor summary: The study proposes a physics-informed neural network to improve short-term weather forecasting, especially for extreme events, using data from the Netherlands Meteorological Institute.


Annotation Cost-Efficient Active Learning for Deep Metric Learning Driven Remote Sensing Image Retrieval

http://arxiv.org/abs/2406.10107v1

Compressor summary: ANNEAL is a cost-efficient active learning method for deep metric learning in content-based image retrieval, using uncertainty and diversity criteria to select informative image pairs for annotation.


SkySenseGPT: A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

http://arxiv.org/abs/2406.10100v1

Compressor summary: The paper introduces FIT-RS, a large instruction tuning dataset for remote sensing imagery comprehension, and SkySenseGPT, a model that outperforms existing ones on complex tasks.


Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning

http://arxiv.org/abs/2406.10099v1

Compressor summary: The paper proposes uncertainty-sensitive tuning, a two-stage training method for LLMs to recognize knowledge gaps and respond with "I do not know" when appropriate, improving their overall performance and outperforming GPT-4.


ECGMamba: Towards Efficient ECG Classification with BiSSM

http://arxiv.org/abs/2406.10098v1

Compressor summary: ECGMamba, a novel model using a bidirectional state-space model, enhances efficiency and effectiveness in ECG classification without sacrificing accuracy.


Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation

http://arxiv.org/abs/2406.10091v1

Compressor summary: The study finds that GPT models, especially GPT-3.5 with direct prompting, show the best correlation with human judgment in assessing translation accuracy of simultaneous interpretations.


Biomarker based Cancer Classification using an Ensemble with Pre-trained Models

http://arxiv.org/abs/2406.10087v1

Compressor summary: The text describes the challenge of early pancreatic cancer detection and proposes a novel ensemble model that combines Hyperfast, XGBoost, and LightGBM machine learning algorithms to enhance liquid biopsy-based cancer identification using fewer features.


Discovering influential text using convolutional neural networks

http://arxiv.org/abs/2406.10086v1

Compressor summary: The authors propose a method using convolutional neural networks to discover clusters of similar text phrases that affect human reactions, which can be applied in experimental settings to identify text treatments and their effects.


Enhancing Question Answering on Charts Through Effective Pre-training Tasks

http://arxiv.org/abs/2406.10085v1

Compressor summary: This paper analyzes the limitations of current VisualQA models for charts and plots, and proposes pre-training tasks to improve their performance on structural-visual and numerical questions.


On the Evaluation of Speech Foundation Models for Spoken Language Understanding

http://arxiv.org/abs/2406.10083v1

Compressor summary: SLUE is a benchmark for spoken language understanding tasks that compares different speech foundation models, finding self-supervised models often perform as well or better than supervised ones, especially on sequence generation tasks.


Localizing Events in Videos with Multimodal Queries

http://arxiv.org/abs/2406.10079v1

Compressor summary: The paper introduces a new benchmark, ICQ, for locating events in videos using multimodal queries with images and texts.


D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video

http://arxiv.org/abs/2406.10078v1

Compressor summary: The paper introduces a new method to synthesize novel views from monocular video using a dynamic neural point cloud that encodes scene geometry and appearance, and leverages data-driven priors like depth estimation and object segmentation.


DurLAR: A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications

http://arxiv.org/abs/2406.10068v1

Compressor summary: DurLAR is a high-quality 3D LiDAR dataset for autonomous driving with improved depth estimation using multi-modal images and joint supervised/self-supervised losses.


TACCO: Task-guided Co-clustering of Clinical Concepts and Patient Visits for Disease Subtyping based on EHR Data

http://arxiv.org/abs/2406.10061v1

Compressor summary: TACCO is a novel framework that jointly discovers clusters of clinical concepts and patient visits in EHR data to improve risk prediction for complex diseases.


First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models

http://arxiv.org/abs/2406.10057v1

Compressor summary: The paper introduces FlowCE, a comprehensive method to evaluate multimodal large language models on various tasks related to flowcharts, and shows that current models perform poorly.


Comparison of fine-tuning strategies for transfer learning in medical image classification

http://arxiv.org/abs/2406.10050v1

Compressor summary: This study compares eight fine-tuning methods for adapting pre-trained models to various medical imaging domains, finding that some strategies work better than others depending on the architecture and modality.


Unobtrusive Monitoring of Physical Weakness: A Simulated Approach

http://arxiv.org/abs/2406.10045v1

Compressor summary: The proposed system uses a non-intrusive camera sensor to monitor daily activities for signs of weakness in older adults, employing a Bayesian Network to model the relationship between features, activities, and health conditions with high accuracy.


Bridging the Communication Gap: Artificial Agents Learning Sign Language through Imitation

http://arxiv.org/abs/2406.10043v1

Compressor summary: The research teaches humanoid robots non-verbal communication skills, such as sign language, using a combination of computer vision, deep learning, and reinforcement learning.


FZI-WIM at SemEval-2024 Task 2: Self-Consistent CoT for Complex NLI in Biomedical Domain

http://arxiv.org/abs/2406.10040v1

Compressor summary: The paper presents a system that uses chain of thought and self-consistency to improve biomedical natural language inference for clinical trials, achieving high scores in various metrics.


Interpretative Deep Learning using Domain Adaptation for Fluorescence Spectroscopy

http://arxiv.org/abs/2406.10031v1

Compressor summary: This study develops a new approach using domain adaptation with pretrained vision models to analyze fluorescence data from complex samples like extra virgin olive oil, improving the quality of predictions and providing insights into the underlying processes.


Off-Policy Evaluation from Logged Human Feedback

http://arxiv.org/abs/2406.10030v1

Compressor summary: The text explores whether new models can be evaluated using human feedback logged on another model's responses, without collecting new data.


ProtoS-ViT: Visual foundation models for sparse self-explainable classifications

http://arxiv.org/abs/2406.10025v1

Compressor summary: This paper shows how pre-trained ViTs can be used to build explainable biomedical image classifiers with better accuracy and interpretability than existing prototypical models.


Deep Bayesian Active Learning for Preference Modeling in Large Language Models

http://arxiv.org/abs/2406.10023v1

Compressor summary: BAL-PM is a new method that reduces the cost of preference labeling for large language models by selectively acquiring informative data points using Bayesian Active Learning.
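
Bayesian active learning methods of this kind typically start from an acquisition score such as BALD, the mutual information between a point's prediction and the model parameters. A numpy sketch of BALD over Monte-Carlo posterior samples, as an illustrative baseline rather than the paper's exact objective:

```python
import numpy as np

def entropy(p, eps=1e-12):
    # Shannon entropy in nats, clipped to avoid log(0).
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

def bald_score(mc_probs):
    # mc_probs: (S, C) class probabilities from S posterior samples.
    mc_probs = np.asarray(mc_probs, dtype=float)
    predictive = entropy(mc_probs.mean(axis=0))                 # total uncertainty
    expected = float(np.mean([entropy(p) for p in mc_probs]))   # aleatoric part
    return predictive - expected  # epistemic disagreement between samples
```

Points where the posterior samples disagree (high epistemic uncertainty) score highest and are sent for labeling first.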


Group and Shuffle: Efficient Structured Orthogonal Parametrization

http://arxiv.org/abs/2406.10019v1

Compressor summary: The paper introduces a new class of structured matrices for efficient fine-tuning of pretrained neural networks, and evaluates it on various tasks like text-to-image and language modeling.


Tilt and Average : Geometric Adjustment of the Last Layer for Recalibration

http://arxiv.org/abs/2406.10017v1

Compressor summary: The paper proposes Tilt and Average (TNA), a method that adjusts the weights of a classifier's last layer to improve calibration, aligning confidence with accuracy in neural network predictions.


Gradient-based Learning in State-based Potential Games for Self-Learning Production Systems

http://arxiv.org/abs/2406.10015v1

Compressor summary: The paper introduces new optimization methods for state-based potential games in self-learning distributed systems, which improve convergence speed and policy quality using gradient-based approaches tailored to different systems.


Beyond Slow Signs in High-fidelity Model Extraction

http://arxiv.org/abs/2406.10011v1

Compressor summary: The paper evaluates and improves methods for extracting the parameters of deep neural networks from standard benchmarks, enabling faster and more efficient attacks on their confidentiality.


An elementary proof of a universal approximation theorem

http://arxiv.org/abs/2406.10002v1

Compressor summary: The paper proves a simplified version of a universal approximation theorem for neural networks with three hidden layers and special activation functions.
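
For context, the classical one-hidden-layer statement (the Cybenko/Hornik form, not the three-hidden-layer variant this paper treats) that such results refine reads:

```latex
% Universal approximation, one hidden layer: for any continuous
% f on a compact K \subset \mathbb{R}^d, any \varepsilon > 0, and a
% non-polynomial continuous activation \sigma, there exist
% N, a_i, b_i, w_i such that
\[
  \sup_{x \in K} \Bigl| f(x) - \sum_{i=1}^{N} a_i\, \sigma\bigl(w_i^{\top} x + b_i\bigr) \Bigr| < \varepsilon .
\]
```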


OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

http://arxiv.org/abs/2406.10000v1

Compressor summary: OrientDream is a text-to-3D framework that uses camera orientation conditioning and external data to generate efficient and consistent 3D models from textual prompts.


Towards Scalable and Versatile Weight Space Learning

http://arxiv.org/abs/2406.09997v1

Compressor summary: SANE is a method to learn task-agnostic representations of neural networks that can handle larger models and various tasks, by embedding subsets of network weights as tokens into the learned space.


Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models

http://arxiv.org/abs/2406.09994v1

Compressor summary: The paper introduces an approach for Knowledge-Based Visual Question Answering (KBVQA) that enhances questions with external knowledge from knowledge graphs and improves reasoning capabilities over existing models.


Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning

http://arxiv.org/abs/2406.09988v1

Compressor summary: The paper introduces OSSA, a task-planning agent using pre-trained neural networks, and evaluates two methods for generating object state-sensitive plans in tabletop scenarios.


Self-Supervised and Few-Shot Learning for Robust Bioaerosol Monitoring

http://arxiv.org/abs/2406.09984v1

Compressor summary: The authors propose a method to classify bioaerosol particles using self-supervised learning and few-shot learning, which could optimize real-time monitoring and reduce adaptation efforts.


Challenges in explaining deep learning models for data with biological variation

http://arxiv.org/abs/2406.09981v1

Compressor summary: The paper discusses challenges of applying machine learning models to biological data, particularly grain data, for disease detection, and evaluates various post-hoc explainability methods on this data.


HIRO: Hierarchical Information Retrieval Optimization

http://arxiv.org/abs/2406.09979v1

Compressor summary: HIRO is a novel querying approach for RAG applications that uses hierarchical structures to optimize information retrieval and improve LLM responses.


Disentangling Dialect from Social Bias via Multitask Learning to Improve Fairness

http://arxiv.org/abs/2406.09977v1

Compressor summary: The paper investigates how dialects affect NLP methods' ability to detect biased language and proposes a multitask learning approach to improve fairness and accuracy.


Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model

http://arxiv.org/abs/2406.09976v1

Compressor summary: The authors propose a method to improve policy robustness in reinforcement learning by learning a pessimistic transition model that estimates the worst-case MDP and incorporating it into a practical algorithm called Robust Model-Based Policy Optimization (RMBPO).


InstructRL4Pix: Training Diffusion for Image Editing by Reinforcement Learning

http://arxiv.org/abs/2406.09973v1

Compressor summary: InstructRL4Pix is a new image editing method that uses reinforcement learning and attention maps to accurately edit images based on human language commands, overcoming limitations of traditional datasets.


A Better LLM Evaluator for Text Generation: The Impact of Prompt Output Sequencing and Optimization

http://arxiv.org/abs/2406.09972v1

Compressor summary: The study explores how to create better prompts for assessing generated texts using large language models, finding that the order of instructions and reasons affects scoring accuracy and consistency.


Impact of Speech Mode in Automatic Pathological Speech Detection

http://arxiv.org/abs/2406.09968v1

Compressor summary: The paper explores how different methods for detecting pathological speech perform on controlled and spontaneous speech, finding that deep learning outperforms classical machine learning.


Bag of Lies: Robustness in Continuous Pre-training BERT

http://arxiv.org/abs/2406.09967v1

Compressor summary: The study explores how continuous pre-training can improve BERT's entity knowledge on COVID-19 and its robustness against misinformation, using a new dataset of true and fake texts from academic publications.


H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent

http://arxiv.org/abs/2406.09958v1

Compressor summary: H-Fac is a new adaptive optimizer that uses factorized momentum and scaling parameters, performs well on ResNets and Vision Transformers, has low memory costs, and is based on Hamiltonian dynamics principles.


Rule Based Learning with Dynamic (Graph) Neural Networks

http://arxiv.org/abs/2406.09954v1

Compressor summary: The text proposes a two-step approach for integrating expert knowledge into classical neural network architectures via rule-based layers, which generalize conventional feed-forward layers and improve the performance of graph neural networks.


BiVLC: Extending Vision-Language Compositionality Evaluation with Text-to-Image Retrieval

http://arxiv.org/abs/2406.09952v1

Compressor summary: The BiVLC dataset introduces synthetic hard negative images for vision-language compositionality benchmarks, revealing weaknesses in current models and improving multimodal learning with contrastive models.


Neural Concept Binder

http://arxiv.org/abs/2406.09949v1

Compressor summary: The Neural Concept Binder is a framework that generates discrete concept representations for object-based visual reasoning using soft and hard binding methods, enabling human input and integration with other AI models.


BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages

http://arxiv.org/abs/2406.09948v1

Compressor summary: BLEnD is a new benchmark to evaluate large language models' cultural knowledge across diverse regions and low-resource languages.


Finite-Time Analysis of Simultaneous Double Q-learning

http://arxiv.org/abs/2406.09946v1

Compressor summary: Simultaneous double Q-learning is a modified version of double Q-learning that eliminates the random choice of which estimator to update, enabling faster convergence and better bias reduction in reinforcement learning.
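
The simultaneous variant is easy to sketch: where classic double Q-learning flips a coin to update either Q^A or Q^B, here both tables are updated at every step, each evaluated with the other. A toy tabular sketch under assumed notation, not the paper's exact algorithm:

```python
import numpy as np

def simultaneous_double_q_step(qa, qb, s, a, r, s2, alpha=0.1, gamma=0.99):
    # Classic double Q-learning randomly picks ONE table to update per step;
    # the simultaneous variant updates both, each using the other table
    # to evaluate its own greedy next action (reducing maximization bias).
    a_star = int(np.argmax(qa[s2]))        # greedy next action under Q^A
    b_star = int(np.argmax(qb[s2]))        # greedy next action under Q^B
    target_a = r + gamma * qb[s2, a_star]  # Q^B evaluates Q^A's choice
    target_b = r + gamma * qa[s2, b_star]  # Q^A evaluates Q^B's choice
    qa[s, a] += alpha * (target_a - qa[s, a])
    qb[s, a] += alpha * (target_b - qb[s, a])
```

Both targets are computed before either table is written, so the two updates stay symmetric within a step.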


SemanticSpray++: A Multimodal Dataset for Autonomous Driving in Wet Surface Conditions

http://arxiv.org/abs/2406.09945v1

Compressor summary: The SemanticSpray++ dataset provides labeled multimodal data for camera, LiDAR, and radar sensors in wet conditions to evaluate autonomous vehicle perception methods.


Experiments in News Bias Detection with Pre-Trained Neural Transformers

http://arxiv.org/abs/2406.09938v1

Compressor summary: The study compares language models' ability to detect and classify biased or fake information in news articles.


ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers

http://arxiv.org/abs/2406.09936v1

Compressor summary: ALGM is a token reduction method for semantic segmentation with Vision Transformers that improves throughput and segmentation quality by merging similar tokens in two stages.


Forgetting Order of Continual Learning: Examples That are Learned First are Forgotten Last

http://arxiv.org/abs/2406.09935v1

Compressor summary: Goldilocks is a replay buffer sampling method that reduces catastrophic forgetting in continual learning by focusing on examples learned at an intermediate speed.
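
The selection rule behind this idea can be sketched as keeping examples learned at an intermediate speed; the quantile cutoffs below are illustrative assumptions, not the paper's values:

```python
import numpy as np

def goldilocks_select(first_learned_epoch, low_q=0.25, high_q=0.75):
    # Keep for the replay buffer only examples learned at intermediate
    # speed, discarding those learned very early (too easy, forgotten
    # last anyway) or very late (too hard/noisy).
    # first_learned_epoch: per-example epoch at which the model first
    # (and thereafter consistently) classified it correctly.
    lo = np.quantile(first_learned_epoch, low_q)
    hi = np.quantile(first_learned_epoch, high_q)
    mask = (first_learned_epoch >= lo) & (first_learned_epoch <= hi)
    return np.flatnonzero(mask)  # indices of the "just right" examples
```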


POWN: Prototypical Open-World Node Classification

http://arxiv.org/abs/2406.09926v1

Compressor summary: The paper proposes POWN, a novel method for open-world semi-supervised node classification that learns prototype representations of new classes and outperforms baselines by up to 30%.


CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions

http://arxiv.org/abs/2406.09923v1

Compressor summary: CliBench is a new benchmark that evaluates large language models' ability to perform various realistic clinical tasks using data from the MIMIC-IV dataset.


Knowledge Editing in Language Models via Adapted Direct Preference Optimization

http://arxiv.org/abs/2406.09920v1

Compressor summary: KDPO is a method for updating large language models' knowledge using online alignment and weight updates without retraining, improving Knowledge Editing performance.


Robust compressive tracking via online weighted multiple instance learning

http://arxiv.org/abs/2406.09914v1

Compressor summary: The proposed visual object tracking algorithm combines sparse representation, coarse-to-fine search, weighted multiple instance learning, and selective sample usage to tackle various challenges and achieve a stable and robust tracker.


OpenECAD: An Efficient Visual Language Model for Computer-Aided Design

http://arxiv.org/abs/2406.09913v1

Compressor summary: The text describes OpenECAD, a system that uses fine-tuned visual language models to generate 2D sketches and 3D construction commands from images of 3D designs, enabling integration into manufacturing processes.


What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?

http://arxiv.org/abs/2406.09908v1

Compressor summary: The paper proposes Softmax Correlation, a new metric to rank classifiers' performance on unlabeled out-of-distribution data by measuring how similar their predictions are to ideal class correlations.


Label-Efficient Semantic Segmentation of LiDAR Point Clouds in Adverse Weather Conditions

http://arxiv.org/abs/2406.09906v1

Compressor summary: The paper proposes a label-efficient approach to segment LiDAR point clouds in adverse weather using few-shot semantic segmentation and semi-supervised learning with good weather data integration.


Nymeria: A Massive Collection of Multimodal Egocentric Daily Motion in the Wild

http://arxiv.org/abs/2406.09905v1

Compressor summary: Nymeria is a large, diverse, in-the-wild human motion dataset with rich annotations, including 3D motion ground truth, multimodal recordings from multiple devices, and hierarchical language descriptions of activities.


QQQ: Quality Quattuor-Bit Quantization for Large Language Models

http://arxiv.org/abs/2406.09904v1

Compressor summary: QQQ is a new quantization method that improves the speed and performance of large language models without extensive training by using adaptive smoothing and Hessian-based compensation, as well as engineered W4A8 GEMM kernels.


GEB-1.3B: Open Lightweight Large Language Model

http://arxiv.org/abs/2406.09900v1

Compressor summary: GEB-1.3B is a lightweight large language model that performs well on various tasks and runs efficiently on CPUs.


Learning Solution-Aware Transformers for Efficiently Solving Quadratic Assignment Problem

http://arxiv.org/abs/2406.09899v1

Compressor summary: The paper proposes a learning-based approach for efficiently solving quadratic assignment problems (QAPs), which are hard combinatorial optimization problems, by encoding facility and location nodes separately and using a solution transformer architecture to capture higher-order information.


Positive-Unlabelled Learning for Identifying New Candidate Dietary Restriction-related Genes among Ageing-related Genes

http://arxiv.org/abs/2406.09898v1

Compressor summary: The authors propose a novel gene prioritization method using Positive-Unlabelled Learning to improve the identification of dietary restriction-related genes and outperform existing methods.


Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation

http://arxiv.org/abs/2406.09896v1

Compressor summary: The study shows that combining Vision Foundation Models with Unsupervised Domain Adaptation improves generalization, speed, and performance in semantic segmentation tasks across diverse data domains.


3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding

http://arxiv.org/abs/2406.09897v1

Compressor summary: The proposed 3D Rotary Position Encoding improves on the 2D version by providing controllable long-term decay and better position resolution for modeling long contexts in natural language understanding and language modeling tasks.


Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming

http://arxiv.org/abs/2406.09891v1

Compressor summary: The paper explores how state-of-the-art generative models struggle with elementary-level problem-solving tasks and proposes a novel benchmark using synthetic data to improve their performance.


A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation

http://arxiv.org/abs/2406.09881v1

Compressor summary: The paper proposes AMD$^2$G, a framework that augments data and trains models in two stages to enable dialogue generation in multiple domains with insufficient or no domain-specific training data.


Sailing in high-dimensional spaces: Low-dimensional embeddings through angle preservation

http://arxiv.org/abs/2406.09876v1

Compressor summary: Mercat is a new low-dimensional embedding method that reconstructs angles between data points, preserving local and global structures in high-dimensional data better than existing approaches.


IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning

http://arxiv.org/abs/2406.09870v1

Compressor summary: IGL-Bench is a benchmark for imbalanced graph learning that evaluates 24 algorithms on node-level and graph-level tasks under class-imbalance and topology-imbalance, providing insights and opportunities to improve performance.


Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox

http://arxiv.org/abs/2406.09867v1

Compressor summary: The paper proposes a new benchmark, IS-OOD, that divides test samples into subsets with different semantic and covariate shifts to address the issue of marginal OOD samples whose semantic content is close to that of ID samples.


LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data

http://arxiv.org/abs/2406.09864v1

Compressor summary: LUMA is a new dataset for learning from uncertain and multimodal data, featuring audio, image, and textual data from 50 classes, with tools to control uncertainty and evaluate robustness in multimodal deep learning models.


Dataset Condensation with Latent Quantile Matching

http://arxiv.org/abs/2406.09860v1

Compressor summary: The paper proposes a new method, Latent Quantile Matching (LQM), to improve distribution matching-based dataset condensation by better aligning latent embeddings and addressing outliers.
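
The shift from moment matching to quantile matching is simple to sketch. An illustrative numpy sketch of the general idea; the paper's exact objective and quantile set are assumptions here:

```python
import numpy as np

def quantile_matching_loss(real_latents, syn_latents, n_quantiles=16):
    # Instead of matching only the means of the two embedding sets (as
    # plain distribution matching does), match a set of empirical
    # quantiles per latent dimension, which is less sensitive to outliers.
    # real_latents, syn_latents: (n_samples, dim) arrays of embeddings.
    qs = np.linspace(0.05, 0.95, n_quantiles)
    rq = np.quantile(real_latents, qs, axis=0)  # (n_quantiles, dim)
    sq = np.quantile(syn_latents, qs, axis=0)
    return float(np.mean((rq - sq) ** 2))
```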


Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment

http://arxiv.org/abs/2406.09858v1

Compressor summary: SLIQUE is a new blind image quality assessment (IQA) model that uses joint vision-language learning to analyze semantic content, distortion characteristics, and appearance properties of images, outperforming existing methods.


On the Encoding of Gender in Transformer-based ASR Representations

http://arxiv.org/abs/2406.09855v1

Compressor summary: This study analyzes how gender is represented and used in two ASR models and shows that it's possible to remove gender information with minimal performance impact, suggesting the potential of creating gender-neutral embeddings.


GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion

http://arxiv.org/abs/2406.09850v1

Compressor summary: The paper introduces GradeADreamer, a three-stage training pipeline that produces high-quality 3D assets with minimal issues and fast generation time using a Multi-view Diffusion Model and StableDiffusion.


Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge

http://arxiv.org/abs/2406.09841v1

Compressor summary: MV-Mol is a model that learns molecular representations from different sources, improving property prediction and multi-modal comprehension in chemistry and life science.


Rapport-Driven Virtual Agent: Rapport Building Dialogue Strategy for Improving User Experience at First Meeting

http://arxiv.org/abs/2406.09839v1

Compressor summary: The study uses a large language model to create virtual agents that can build rapport with humans through small talk, and finds that free-form dialogue strategies improve subjective measures of rapport.


Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps

http://arxiv.org/abs/2406.09838v1

Compressor summary: The paper introduces ClimateIQA, a meteorological VQA dataset, SPOT, a technique to capture color contours in heatmaps, and Climate-Zoo, a collection of meteorological VLMs that significantly improve EWED accuracy.


TabularFM: An Open Framework For Tabular Foundational Models

http://arxiv.org/abs/2406.09837v1

Compressor summary: TabularFM is an open-source framework that develops foundational models for tabular data using various neural architectures, curated datasets, and pretrained models.


Robustness-Inspired Defense Against Backdoor Attacks on Graph Neural Networks

http://arxiv.org/abs/2406.09836v1

Compressor summary: The paper proposes a method to detect and counteract graph backdoor attacks using random edge dropping and robust training for GNNs.


I Know How: Combining Prior Policies to Solve New Tasks

http://arxiv.org/abs/2406.09835v1

Compressor summary: The I Know How (IKH) framework helps agents learn and adapt efficiently to dynamic environments by using modular and compositional knowledge.


SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering

http://arxiv.org/abs/2406.09833v1

Compressor summary: SHMamba is a new model that uses hyperbolic geometry and state space models to better represent hierarchical structures and relationships in audio-visual data, resulting in improved performance and reduced computational costs compared to previous methods.


Open-Vocabulary Semantic Segmentation with Image Embedding Balancing

http://arxiv.org/abs/2406.09829v1

Compressor summary: EBSeg is a novel framework for open-vocabulary semantic segmentation that uses an Adaptively Balanced Decoder and Semantic Structure Consistency loss to improve generalization ability and overcome overfitting issues, achieving state-of-the-art results.


HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning

http://arxiv.org/abs/2406.09827v1

Compressor summary: HiP is a novel approach for large language models that reduces time and space complexity of attention mechanisms, enabling efficient handling of long context sequences without retraining pre-trained models.


Unraveling Anomalies in Time: Unsupervised Discovery and Isolation of Anomalous Behavior in Bio-regenerative Life Support System Telemetry

http://arxiv.org/abs/2406.09825v1

Compressor summary: The study analyzes anomalies in a space greenhouse using time series clustering to better understand their causes and improve condition monitoring.


From Manifestations to Cognitive Architectures: a Scalable Framework

http://arxiv.org/abs/2406.09823v1

Compressor summary: The paper proposes a new method to interpret reality as an information source and build cognitive architectures using spatial distributed representations in a scalable way.


Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments

http://arxiv.org/abs/2406.09815v1

Compressor summary: RAFTS is a method that uses evidence retrieval and contrasting arguments to verify claim credibility and improve fact verification performance.


RaNeuS: Ray-adaptive Neural Surface Reconstruction

http://arxiv.org/abs/2406.09801v1

Compressor summary: The paper proposes a method (RaNeuS) to improve 3D surface reconstruction using a differentiable radiance field by adaptively adjusting regularization and projection, achieving better results than existing methods.


DeltaPhi: Learning Physical Trajectory Residual for PDE Solving

http://arxiv.org/abs/2406.09795v1

Compressor summary: The paper proposes DeltaPhi, a method that improves the learning of physical dynamics in neural operator networks by predicting and reducing residuals between a solved trajectory and an auxiliary one.


SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis

http://arxiv.org/abs/2406.09794v1

Compressor summary: SuperSVG is a fast and accurate image vectorization model that uses superpixels and a two-stage self-training framework with dynamic path warping loss to convert raster images to SVGs.


A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion

http://arxiv.org/abs/2406.09792v1

Compressor summary: The paper presents a novel Transformer-based network with self-supervised pre-training and token fusion that completes depth images from RGB images in complex indoor scenes, outperforming existing methods on the Matterport3D dataset and enabling 3D reconstruction.


Pcc-tuning: Breaking the Contrastive Learning Ceiling in Semantic Textual Similarity

http://arxiv.org/abs/2406.09790v1

Compressor summary: The paper proposes Pcc-tuning, a method that uses Pearson's correlation coefficient as a loss function to improve semantic textual similarity beyond contrastive learning.
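
The core objective is simple to state: train the model to maximize Pearson's r between predicted and gold similarity scores, i.e. minimize 1 - r. A minimal numpy sketch; the paper's full fine-tuning setup is more involved:

```python
import numpy as np

def pearson_loss(pred, gold, eps=1e-8):
    # 1 - Pearson's correlation coefficient, used directly as a loss:
    # unlike MSE, it is invariant to linear rescaling of the predictions,
    # which matches how STS systems are actually evaluated.
    p = pred - pred.mean()
    g = gold - gold.mean()
    r = (p * g).sum() / (np.linalg.norm(p) * np.linalg.norm(g) + eps)
    return 1.0 - r
```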


OpenCapBench: A Benchmark to Bridge Pose Estimation and Biomechanics

http://arxiv.org/abs/2406.09788v1

Compressor summary: OpenCapBench is a unified benchmark for human pose estimation that considers physiological constraints and improves keypoint density for accurate biomechanics analysis using synthetic data finetuning.


Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion

http://arxiv.org/abs/2406.09782v1

Compressor summary: The authors propose a robust unsupervised monocular depth estimation model using a diffusion model, a hierarchical feature-guided denoising module, and an implicit depth consistency loss.


GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding

http://arxiv.org/abs/2406.09781v1

Compressor summary: This study evaluates how well multimodal LLMs can recognize piglet activities from video clips and suggests that they have potential for animal behavior understanding in livestock scenarios, especially GPT-4o.


OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

http://arxiv.org/abs/2406.09779v1

Compressor summary: The study presents a novel method to detect harmful memes in multiple languages using image captioning, OCR, and LLM analysis, achieving top-1 performance at the Online Safety Prize Challenge.


A lightweight residual network for unsupervised deformable image registration

http://arxiv.org/abs/2406.09774v1

Compressor summary: The paper presents a CNN-based image registration method with an enhanced receptive field, low number of parameters, and good performance on limited training data, outperforming or being comparable to transformer-based methods.


Research on Edge Detection of LiDAR Images Based on Artificial Intelligence Technology

http://arxiv.org/abs/2406.09773v1

Compressor summary: The paper proposes a deep learning-based edge detection method for LiDAR images that improves accuracy and efficiency compared to traditional methods and shows practical value for real applications.


Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

http://arxiv.org/abs/2406.09770v1

Compressor summary: The paper proposes a method to efficiently approximate the Pareto front of large neural networks using mixture of experts (MoE) for multi-objective optimization tasks.


Bayesian Conditioned Diffusion Models for Inverse Problems

http://arxiv.org/abs/2406.09768v1

Compressor summary: The paper introduces BCDM, a novel Bayesian method to condition diffusion models for optimal image reconstruction tasks, achieving state-of-the-art results in various problems.


Full-reference Point Cloud Quality Assessment Using Spectral Graph Wavelets

http://arxiv.org/abs/2406.09762v1

Compressor summary: The paper presents a new method for assessing the quality of 3D point clouds using spectral graph wavelets, which improves accuracy and correlates better with human perception than existing methods.


Bootstrapping Language Models with DPO Implicit Rewards

http://arxiv.org/abs/2406.09760v1

Compressor summary: The DICE approach improves large language model alignment using direct preference optimization and a bootstrapped implicit reward model.
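
DPO's implicit reward, the quantity this bootstrapping builds on, is just beta * log(pi(y|x) / pi_ref(y|x)). A toy sketch of scoring and ranking a model's own responses with it; the pair-building step and variable names are illustrative assumptions:

```python
def dpo_implicit_reward(logp_policy, logp_ref, beta=0.1):
    # DPO's implicit reward (up to a prompt-dependent constant):
    #   r(x, y) = beta * log( pi(y|x) / pi_ref(y|x) )
    return beta * (logp_policy - logp_ref)

def build_preference_pair(candidates, beta=0.1):
    # Rank the model's own generations by implicit reward to create a
    # new (chosen, rejected) pair for another round of DPO training.
    # candidates: list of (sum_logp_policy, sum_logp_ref) per response.
    rewards = [dpo_implicit_reward(lp, lr, beta) for lp, lr in candidates]
    chosen = max(range(len(rewards)), key=rewards.__getitem__)
    rejected = min(range(len(rewards)), key=rewards.__getitem__)
    return chosen, rejected
```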


Grounding Image Matching in 3D with MASt3R

http://arxiv.org/abs/2406.09756v1

Compressor summary: MASt3R is a 3D image matching method that combines DUSt3R's robustness with dense features, reciprocal matching, and theoretical guarantees to significantly outperform existing methods.


Mix Q-learning for Lane Changing: A Collaborative Decision-Making Method in Multi-Agent Deep Reinforcement Learning

http://arxiv.org/abs/2406.09755v1

Compressor summary: Mix Q-learning for Lane Changing (MQLC) is a method that uses deep reinforcement learning to improve autonomous vehicle path planning by integrating individual and collective benefits for better traffic efficiency and safety.


LAVIB: A Large-scale Video Interpolation Benchmark

http://arxiv.org/abs/2406.09754v1

Compressor summary: LAVIB is a large dataset for video frame interpolation tasks that includes various metrics and challenges based on video characteristics like motion, luminance, sharpness, and contrast.


ControlVAR: Exploring Controllable Visual Autoregressive Modeling

http://arxiv.org/abs/2406.09750v1

Compressor summary: ControlVAR is a new framework that uses visual autoregressive modeling to allow flexible and efficient control over images in various conditional generation tasks.


How Does Distribution Matching Help Domain Generalization: An Information-theoretic Analysis

http://arxiv.org/abs/2406.09745v1

Compressor summary: This paper proposes a probabilistic framework for domain generalization that combines gradient and representation alignment, and introduces new methods for complex distribution matching to improve robustness and generalization.


Deep Symbolic Optimization for Combinatorial Optimization: Accelerating Node Selection by Discovering Potential Heuristics

http://arxiv.org/abs/2406.09740v1

Compressor summary: The text proposes a novel framework that combines data-driven and symbolic methods for node selection in combinatorial optimization solvers, improving performance and interpretability on CPU machines.


Decoupling Forgery Semantics for Generalizable Deepfake Detection

http://arxiv.org/abs/2406.09739v1

Compressor summary: The paper presents a new method for detecting DeepFakes by decoupling unique and common forgery semantics, improving the generalization of detection.


Automated GIS-Based Framework for Detecting Crosswalk Changes from Bi-Temporal High-Resolution Aerial Images

http://arxiv.org/abs/2406.09731v1

Compressor summary: This study develops an automated framework to detect changes in crosswalks using high-resolution images, finding over 3,000 crosswalk changes in three Florida counties that can inform traffic and safety studies.


Neural Pose Representation Learning for Generating and Transferring Non-Rigid Object Poses

http://arxiv.org/abs/2406.09728v1

Compressor summary: The authors present a method for learning pose representations of 3D deformable objects that disentangles pose from identity using a keypoint-based hybrid representation and an implicit deformation field, enabling pose variation and transfer without explicit shape parameterization or supervision, with state-of-the-art results on the DeformThings4D and Human datasets.


PixRO: Pixel-Distributed Rotational Odometry with Gaussian Belief Propagation

http://arxiv.org/abs/2406.09726v1

Compressor summary: The paper proposes a new pixel-level method for estimating rotational motion in visual sensors, reducing data transmission and processing costs.


When Will Gradient Regularization Be Harmful?

http://arxiv.org/abs/2406.09723v1

Compressor summary: This paper proposes three gradient regularization (GR) warmup strategies to improve performance and stability in adaptive optimization scenarios, especially for scalable models.


Cross-view geo-localization: a survey

http://arxiv.org/abs/2406.09722v1

Compressor summary: The paper surveys cross-view geo-localization, a challenging but important computer vision task, reviewing feature-based and deep learning methods along with benchmark datasets, evaluation metrics, key challenges, and future research directions.


Self-Knowledge Distillation for Learning Ambiguity

http://arxiv.org/abs/2406.09719v1

Compressor summary: The paper proposes a self-knowledge distillation method that improves natural language understanding by learning label distributions from lower layers and re-calibrating confidence for ambiguous samples.


UniBridge: A Unified Approach to Cross-Lingual Transfer Learning for Low-Resource Languages

http://arxiv.org/abs/2406.09717v1

Compressor summary: UniBridge improves Cross-Lingual Transfer Learning by optimizing embeddings initialization and vocabulary size for languages with limited resources.


Meta-Learning Loss Functions for Deep Neural Networks

http://arxiv.org/abs/2406.09713v1

Compressor summary: This thesis explores meta-learning, which uses past experiences from similar tasks to improve performance, focusing on the often-overlooked loss function component of deep neural networks.


AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming

http://arxiv.org/abs/2406.09711v1

Compressor summary: The framework uses AI models to analyze livestock behavior from video data without invasive tagging, providing insights for activity detection, counting, health assessments, and posture analyses.


Fine-Grained Urban Flow Inference with Multi-scale Representation Learning

http://arxiv.org/abs/2406.09710v1

Compressor summary: UrbanMSR is a model that uses self-supervised learning to infer fine-grained urban traffic flows from coarse-grained data, capturing multi-scale and dynamic information for better efficiency and safety.


Detecting Response Generation Not Requiring Factual Judgment

http://arxiv.org/abs/2406.09702v1

Compressor summary: The study aimed to create a dialogue dataset and develop a model that can predict sentences needing fact-checking in conversations, achieving both attractiveness and factuality.


Compressed Video Quality Enhancement with Temporal Group Alignment and Fusion

http://arxiv.org/abs/2406.09693v1

Compressor summary: The paper presents a method to improve compressed videos using temporal group alignment and fusion of features from neighboring frames, achieving better quality and lower complexity than current methods.


FreeCtrl: Constructing Control Centers with Feedforward Layers for Learning-Free Controllable Text Generation

http://arxiv.org/abs/2406.09688v1

Compressor summary: FreeCtrl is a learning-free method for controlling text generation that adjusts neural network weights to produce desired attributes in output.


Explainable AI for Comparative Analysis of Intrusion Detection Models

http://arxiv.org/abs/2406.09684v1

Compressor summary: This paper evaluates various machine learning models for intrusion detection from network traffic using occlusion sensitivity and finds that Random Forest performs best in accuracy, efficiency, and robustness.


Asymmetrical Siamese Network for Point Clouds Normal Estimation

http://arxiv.org/abs/2406.09681v1

Compressor summary: The paper proposes an Asymmetric Siamese Network to improve point cloud normal estimation by exploring intrinsic feature consistency across different noise levels, and introduces a new multi-view dataset with diverse shapes and noise levels to evaluate methods and reduce overfitting.


Exploring Training on Heterogeneous Data with Mixture of Low-rank Adapters

http://arxiv.org/abs/2406.09679v1

Compressor summary: The study explores using Mixture of Low-rank Adapters (MoLA) to efficiently mitigate training conflicts among heterogeneous data in artificial general intelligence models, and introduces two variants for target-aware and target-agnostic scenarios.


Benchmarking Spectral Graph Neural Networks: A Comprehensive Study on Effectiveness and Efficiency

http://arxiv.org/abs/2406.09675v1

Compressor summary: This paper benchmarks over 30 spectral graph neural networks (GNNs), analyzes their frequency characteristics, and provides a unified framework for efficient evaluation and selection of these models for large-scale tasks.


Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam

http://arxiv.org/abs/2406.09671v1

Compressor summary: The study shows that ChatGPT-4 Vision, a visual model, performed better than average students in a computer science exam, but faced challenges with question interpretation and logical reasoning, highlighting the importance of human oversight in assessments.


Learning Language Structures through Grounding

http://arxiv.org/abs/2406.09662v1

Compressor summary: The text discusses learning language structures through grounding, using various data sources and modalities, and improving parsing, program synthesis, and cross-lingual tasks.


ScaLES: Scalable Latent Exploration Score for Pre-Trained Generative Networks

http://arxiv.org/abs/2406.09657v1

Compressor summary: ScaLES is a method that reduces over-exploration in Latent Space Optimization, improving the quality of solutions for black-box discrete optimization problems.


RSEND: Retinex-based Squeeze and Excitation Network with Dark Region Detection for Efficient Low Light Image Enhancement

http://arxiv.org/abs/2406.09656v1

Compressor summary: RSEND is a one-stage Retinex theory based framework that enhances low-light images by capturing details with Squeeze and Excitation network, achieving significant improvement over other CNN-based models.
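
The squeeze-and-excitation gating it relies on is a standard channel-attention mechanism. A minimal numpy sketch; the weights and reduction ratio here are illustrative, and RSEND wraps this in Retinex decomposition and dark-region detection:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def squeeze_excite(feat, w1, w2):
    # feat: (C, H, W) feature map.
    # w1: (C//r, C) squeeze weights, w2: (C, C//r) excite weights.
    s = feat.mean(axis=(1, 2))               # squeeze: global avg pool -> (C,)
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0))  # excite: FC-ReLU-FC-sigmoid -> (C,)
    return feat * e[:, None, None]           # channel-wise re-weighting
```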


An Intrinsic Vector Heat Network

http://arxiv.org/abs/2406.09648v1

Compressor summary: The paper presents a novel neural network architecture for learning tangent vector fields on surfaces in 3D that preserves intrinsic properties and is robust to various deformations.


OpenAnimalTracks: A Dataset for Animal Track Recognition

http://arxiv.org/abs/2406.09647v1

Compressor summary: The paper introduces OpenAnimalTracks, a labeled dataset for automated animal footprint classification and detection, which can help with biodiversity preservation.


A Survey of Video Datasets for Grounded Event Understanding

http://arxiv.org/abs/2406.09646v1

Compressor summary: The paper surveys 105 video datasets that require event understanding capability and discusses how they can help study robust video event extraction tasks, considering the temporal nature and ambiguity of visual content.


Reinforced Decoder: Towards Training Recurrent Neural Networks for Time Series Forecasting

http://arxiv.org/abs/2406.09643v1

Compressor summary: The study introduces a reinforced decoder method that uses auxiliary inputs and reinforcement learning to improve multi-step-ahead time series forecasting accuracy.


TGB 2.0: A Benchmark for Learning on Temporal Knowledge Graphs and Heterogeneous Graphs

http://arxiv.org/abs/2406.09639v1

Compressor summary: TGB 2.0 is a benchmarking framework for evaluating predictions on large-scale temporal graphs with new datasets and a realistic evaluation pipeline.


RASPNet: A Benchmark Dataset for Radar Adaptive Signal Processing Applications

http://arxiv.org/abs/2406.09638v1

Compressor summary: The paper introduces RASPNet, a large-scale dataset for radar adaptive signal processing, that covers diverse real-world environments and contains 10,000 clutter realizations per scenario.


Industrial Language-Image Dataset (ILID): Adapting Vision Foundation Models for Industrial Settings

http://arxiv.org/abs/2406.09637v1

Compressor summary: The paper introduces a web-crawling pipeline that builds the Industrial Language-Image Dataset (ILID) and shows effective self-supervised transfer learning on downstream tasks without human labeling, improving vision foundation models' performance in specialized industrial domains where they otherwise generalize poorly.