arxiv compressed, 2024-02-20

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-20, generated by the compressor, my personal LLM-based project.


Binary Opacity Grids: Capturing Fine Geometric Detail for Mesh-Based View Synthesis

http://arxiv.org/abs/2402.12377v1

Compressor summary: The paper proposes a method to improve surface reconstruction from volumetric density fields by using a discrete opacity grid, multiple ray casting, binary entropy minimization, and mesh fusion.


FiT: Flexible Vision Transformer for Diffusion Model

http://arxiv.org/abs/2402.12376v1

Compressor summary: The Flexible Vision Transformer (FiT) is a new image generation model that can handle unrestricted resolutions and aspect ratios, overcoming the limitations of existing diffusion models.


Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

http://arxiv.org/abs/2402.12374v1

Compressor summary: Sequoia is a new algorithm for faster inference with large language models that adapts to different settings and hardware, outperforming previous methods in speed and robustness.
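
The summary does not spell out Sequoia's tree construction or hardware-aware tuning, so the sketch below only shows the generic draft-and-verify speculative decoding loop (greedy variant) that such methods build on; the two logits functions and their (1, seq_len, vocab) output shape are assumed interfaces, not the paper's code.

    import torch

    def speculative_decode_greedy(target_logits_fn, draft_logits_fn, prompt_ids, k=4, max_new=64):
        # A small draft model proposes k tokens; the large target model checks them
        # in a single forward pass and keeps the longest agreeing prefix.
        ids = list(prompt_ids)
        while len(ids) - len(prompt_ids) < max_new:
            # 1) Draft k tokens greedily with the small model.
            draft, ctx = [], list(ids)
            for _ in range(k):
                logits = draft_logits_fn(torch.tensor([ctx]))   # (1, len(ctx), vocab)
                tok = int(logits[0, -1].argmax())
                draft.append(tok)
                ctx.append(tok)
            # 2) Verify every drafted position with one target-model forward pass.
            full = ids + draft
            target_logits = target_logits_fn(torch.tensor([full]))
            n_accept = 0
            for i, tok in enumerate(draft):
                pos = len(ids) - 1 + i          # logits at pos predict the token at pos + 1
                if int(target_logits[0, pos].argmax()) == tok:
                    n_accept += 1
                else:
                    break
            ids += draft[:n_accept]
            # 3) Append the target model's own next token (a correction or a bonus token).
            ids.append(int(target_logits[0, len(ids) - 1].argmax()))
        return ids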


HunFlair2 in a cross-corpus evaluation of named entity recognition and normalization tools

http://arxiv.org/abs/2402.12372v1

Compressor summary: The study evaluates biomedical text mining tools' performance on different corpora, showing that they perform worse than reported and suggesting further research for improved robustness.


AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies

http://arxiv.org/abs/2402.12370v1

Compressor summary: ANALOBENCH is a benchmark for testing analogical reasoning in language models; the evaluation suggests that models may struggle with lengthy scenarios and with recalling relevant experiences.


A synthetic data approach for domain generalization of NLI models

http://arxiv.org/abs/2402.12368v1

Compressor summary: The paper proposes a new approach to generate diverse and creative synthetic NLI data for improving the generalization of NLI models to out-of-distribution domains.


A Critical Evaluation of AI Feedback for Aligning Large Language Models

http://arxiv.org/abs/2402.12366v1

Compressor summary: RLAIF's improvement from reinforcement learning is mostly due to using a weaker teacher model, while supervised fine-tuning with GPT-4 as the teacher performs better and its gains vary across factors.


Universal Physics Transformers

http://arxiv.org/abs/2402.12365v1

Compressor summary: Universal Physics Transformers (UPTs) are a novel learning paradigm that can model various spatio-temporal problems in computational fluid dynamics without relying on grid- or particle-based latent structures.


Emergent Word Order Universals from Cognitively-Motivated Language Models

http://arxiv.org/abs/2402.12363v1

Compressor summary: The study shows that cognitive biases and predictability can explain word-order universals in languages using computational simulations with cognitively-motivated language models.


LoRA+: Efficient Low Rank Adaptation of Large Models

http://arxiv.org/abs/2402.12354v1

Compressor summary: The paper proposes LoRA+, a corrected version of LoRA that uses different learning rates for adapter matrices A and B, improving performance and finetuning speed without increasing computational cost.
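
A minimal sketch of the core LoRA+ idea described above, giving the adapter matrix B a larger learning rate than A via optimizer parameter groups; the adapter class, dimensions, and the learning-rate ratio are illustrative assumptions, not the paper's exact settings.

    import torch

    # Hypothetical LoRA adapter with matrices A and B (names and sizes are illustrative).
    class LoRAAdapter(torch.nn.Module):
        def __init__(self, d_in, d_out, rank=8):
            super().__init__()
            self.A = torch.nn.Parameter(torch.randn(rank, d_in) * 0.01)
            self.B = torch.nn.Parameter(torch.zeros(d_out, rank))

        def forward(self, x):
            return x @ self.A.T @ self.B.T

    adapter = LoRAAdapter(d_in=768, d_out=768)

    # LoRA+ gist: B gets a larger learning rate than A (the ratio here is a placeholder).
    base_lr, lr_ratio = 2e-4, 16
    optimizer = torch.optim.AdamW([
        {"params": [adapter.A], "lr": base_lr},
        {"params": [adapter.B], "lr": base_lr * lr_ratio},
    ])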


Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge

http://arxiv.org/abs/2402.12352v1

Compressor summary: The paper proposes a new method to improve large language models' ability to find relevant information in biomedical research by using a knowledge graph to reduce overrepresented concepts and achieve better results than existing methods.


GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations

http://arxiv.org/abs/2402.12348v1

Compressor summary: This paper evaluates large language models' strategic reasoning in various game scenarios using GTBench, a new environment with 10 tasks, and finds differences between open-source and commercial LLMs as well as the impact of code-pretraining and advanced reasoning methods.


Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

http://arxiv.org/abs/2402.12343v1

Compressor summary: The text introduces Emulated Disalignment (ED), an attack framework showing how safety alignment can inadvertently enable harmful outcomes in large language models when manipulated adversarially, and highlights the need to reevaluate open-sourcing such models.


Triple-Encoders: Representations That Fire Together, Wire Together

http://arxiv.org/abs/2402.12332v1

Compressor summary: The study proposes triple-encoders for dialog modeling that use co-occurrence learning to efficiently compute utterance mixtures and improve over bi-encoders and single-vector models.


Generating Survival Interpretable Trajectories and Data

http://arxiv.org/abs/2402.12331v1

Compressor summary: The paper presents an autoencoder-based model that generates survival trajectories, predictions, and additional data for event-based features, using a weighting scheme for robustness and a classification task to determine censored indicators; the model is efficient and has been tested on synthetic and real datasets.


Query-Based Adversarial Prompt Generation

http://arxiv.org/abs/2402.12329v1

Compressor summary: The text describes a new attack method on language models that exploits API access to remotely create harmful outputs, outperforming previous methods on GPT-3.5 and OpenAI's safety classifier.


Shall We Talk: Exploring Spontaneous Collaborations of Competing LLM Agents

http://arxiv.org/abs/2402.12327v1

Compressor summary: Large language model agents can spontaneously form collaborations and mimic human social interactions, which could be useful for computational social science.


LLM Agents for Psychology: A Study on Gamified Assessments

http://arxiv.org/abs/2402.12326v1

Compressor summary: PsychoGAT is a method that uses large language models to create engaging interactive fiction games for psychological assessment, achieving high-quality results across various metrics.


Landmark Stereo Dataset for Landmark Recognition and Moving Node Localization in a Non-GPS Battlefield Environment

http://arxiv.org/abs/2402.12320v1

Compressor summary: The paper proposes a new strategy for tracking troops in GPS-denied environments using landmark recognition and stereo matching, and aims to improve troop safety.


Dynamic Environment Responsive Online Meta-Learning with Fairness Awareness

http://arxiv.org/abs/2402.12319v1

Compressor summary: FairSAOML is a novel meta-learning algorithm that adapts to dynamic environments while ensuring fairness and accuracy in acquiring new tasks over time.


ARKS: Active Retrieval in Knowledge Soup for Code Generation

http://arxiv.org/abs/2402.12317v1

Compressor summary: The paper introduces ARKS, a strategy to improve code generation by integrating various sources of knowledge and using active retrieval to refine queries and update the knowledge base.


TILP: Differentiable Learning of Temporal Logical Rules on Knowledge Graphs

http://arxiv.org/abs/2402.12309v1

Compressor summary: TILP is a framework for learning temporal logical rules in temporal knowledge graphs, which improves performance and interpretability on various challenging scenarios.


Multi-View Conformal Learning for Heterogeneous Sensor Fusion

http://arxiv.org/abs/2402.12307v1

Compressor summary: The text discusses the importance of assessing individual predictions' confidence in machine learning models, especially for critical applications, and presents multi-view conformal models for heterogeneous sensor fusion that provide trustworthiness guarantees.


UncertaintyTrack: Exploiting Detection and Localization Uncertainty in Multi-Object Tracking

http://arxiv.org/abs/2402.12303v1

Compressor summary: UncertaintyTrack is a method that uses localization uncertainty estimates from probabilistic object detection to improve multi-object tracking for autonomous driving.


Is Open-Source There Yet? A Comparative Study on Commercial and Open-Source LLMs in Their Ability to Label Chest X-Ray Reports

http://arxiv.org/abs/2402.12298v1

Compressor summary: The paper compares GPT-4, a commercial large language model, to various open-source models in radiology report labeling tasks using different prompting techniques and finds that GPT-4 outperforms others in zero-shot but open-source models can match GPT-4 with few-shot prompts.


KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students

http://arxiv.org/abs/2402.12291v1

Compressor summary: KARL is a DKT-inspired flashcard scheduler that, unlike existing student models, captures semantic ties between flashcards by using retrieval and BERT embeddings for efficient recall predictions; combined with a novel teaching policy for online deployment, it outperforms existing models on a new dataset of diverse study histories and improves medium-term learning.


DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

http://arxiv.org/abs/2402.12289v1

Compressor summary: DriveVLM is an autonomous driving system that uses Vision-Language Models to understand and plan for complex urban scenarios, while DriveVLM-Dual combines its strengths with traditional methods for improved performance.


Refining Minimax Regret for Unsupervised Environment Design

http://arxiv.org/abs/2402.12284v1

Compressor summary: ReMiDi is an algorithm that trains reinforcement learning agents on levels generated by an adversary who balances exploration and exploitation, preventing learning stagnation.


Ontology Enhanced Claim Detection

http://arxiv.org/abs/2402.12282v1

Compressor summary: The authors propose a new claim detection method that combines sentence and ontology embeddings, which outperforms other models on two datasets.


Adaptive Skeleton Graph Decoding

http://arxiv.org/abs/2402.12280v1

Compressor summary: Skeleton graph decoding (SGD) improves LLM inference quality and performance by using dependencies between sub-problems and difficulty estimates to select an appropriate model size for parallel decoding.


Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks

http://arxiv.org/abs/2402.12279v1

Compressor summary: The paper compares different approaches to improve zero-shot cross-lingual text generation, finding that careful learning rate tuning and alternative backbone models can achieve similar performance to data translation.


WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment

http://arxiv.org/abs/2402.12275v1

Compressor summary: The WorldCoder agent builds a Python program as a world model from its environment interactions, balancing explanation and optimism, and improves efficiency on gridworlds.


End-to-end Supervised Prediction of Arbitrary-size Graphs with Partially-Masked Fused Gromov-Wasserstein Matching

http://arxiv.org/abs/2402.12269v1

Compressor summary: The paper proposes a new deep learning method for Supervised Graph Prediction using an original Optimal Transport-based loss and a transformer architecture, which performs well on various tasks and datasets.


High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models

http://arxiv.org/abs/2402.12267v1

Compressor summary: The study shows that large language models can significantly improve data-to-text generation for under-resourced languages, but BLEU scores may not be a reliable metric for evaluating them.


Uncertainty quantification in fine-tuned LLMs using LoRA ensembles

http://arxiv.org/abs/2402.12264v1

Compressor summary: The text discusses improving language models' performance on specific tasks using low-rank adaptation ensembles and analyzing their uncertainty quantification on multiple-choice datasets.
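
As a rough illustration of the idea, here is how one might aggregate an ensemble of LoRA-finetuned models on a multiple-choice question and use predictive entropy as an uncertainty signal; the scoring interface and names are assumptions for the sketch, not the paper's code.

    import math

    def ensemble_choice_probs(members, question, choices):
        # members: callables returning one probability per answer choice
        # (e.g. normalized likelihoods from separately finetuned LoRA adapters).
        avg = [0.0] * len(choices)
        for score_fn in members:
            probs = score_fn(question, choices)          # assumed to sum to 1
            for i, p in enumerate(probs):
                avg[i] += p / len(members)
        # Predictive entropy of the averaged distribution as an uncertainty estimate.
        entropy = -sum(p * math.log(p + 1e-12) for p in avg)
        return avg, entropy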


Towards a tailored mixed-precision sub-8bit quantization scheme for Gated Recurrent Units using Genetic Algorithms

http://arxiv.org/abs/2402.12263v1

Compressor summary: A modular integer quantization scheme for GRUs uses Genetic Algorithms to explore bit widths and optimize model size and accuracy, achieving Pareto efficiency on sequential tasks.


NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms

http://arxiv.org/abs/2402.12261v1

Compressor summary: The text discusses how new word forms (neologisms) cause data drift in large language models, leading to reduced performance, and proposes a benchmark to evaluate their generalization ability.


Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships

http://arxiv.org/abs/2402.12259v1

Compressor summary: Open3DSG is a method to predict 3D scene graphs from point clouds using open world vision language models without labeled data, enabling expression of rare and specific objects and relationships.


Shallow Synthesis of Knowledge in GPT-Generated Texts: A Case Study in Automatic Related Work Composition

http://arxiv.org/abs/2402.12255v1

Compressor summary: The study compares AI-assisted scholarly writing with original and purely AI-generated texts, finding that GPT-4 can help with brainstorming but needs human input for detailed synthesis.


Analysis of Levenshtein Transformer's Decoder and Its Variants

http://arxiv.org/abs/2402.12249v1

Compressor summary: The paper analyzes the decoder of the Levenshtein transformer (LevT), an efficient and high-quality machine translation model, identifying weaknesses in its decoding results that could be addressed with knowledge distillation (KD) and translation memory.


Understanding the Effects of Noise in Text-to-SQL: An Examination of the BIRD-Bench Benchmark

http://arxiv.org/abs/2402.12243v1

Compressor summary: Text-to-SQL models face challenges due to noise in questions, gold queries, and database values, which impacts the reliability of benchmarks and affects model performance.


Synthetic location trajectory generation using categorical diffusion models

http://arxiv.org/abs/2402.12242v1

Compressor summary: The text proposes using diffusion probabilistic models (DPMs) to generate realistic individual location trajectories (ILTs) for mobility research and decision-making.


Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis

http://arxiv.org/abs/2402.12241v1

Compressor summary: The paper analyzes how gradient descent in recurrent neural networks can achieve optimality without overparameterization for dynamical systems with long-term dependencies.


BEARS Make Neuro-Symbolic Models Aware of their Reasoning Shortcuts

http://arxiv.org/abs/2402.12240v1

Compressor summary: The paper proposes BEARS, an ensembling technique that makes NeSy predictors aware of reasoning shortcuts, which compromise their reliability, generalization, and confidence, by calibrating concept-level confidence and encouraging uncertainty about the shortcuts.


Mixed Gaussian Flow for Diverse Trajectory Prediction

http://arxiv.org/abs/2402.12238v1

Compressor summary: The paper proposes a flow-based model that transforms a mixed Gaussian prior into the future trajectory manifold, enabling diverse, controllable, and out-of-distribution trajectory generation with explicit interpretability.


Learning to Defer in Content Moderation: The Human-AI Interplay

http://arxiv.org/abs/2402.12237v1

Compressor summary: The paper presents a model of human-AI collaboration in content moderation that accounts for prediction uncertainty, time-varying factors, selective sampling, and review congestion, and develops a near-optimal learning algorithm that balances multiple losses when deciding on the classification, admission, and scheduling of posts.


The Fundamental Limits of Least-Privilege Learning

http://arxiv.org/abs/2402.12235v1

Compressor summary: The least-privilege principle for machine learning, which aims to find useful representations without revealing sensitive information, has a fundamental trade-off between utility and leakage that cannot be overcome by any learning technique.


Task-Oriented Dialogue with In-Context Learning

http://arxiv.org/abs/2402.12234v1

Compressor summary: The system combines large language models with business logic to create efficient chatbots that can handle complex dialogues.


Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers

http://arxiv.org/abs/2402.12233v1

Compressor summary: The paper explores the impact of updating either keys or values in feed-forward networks (FFNs) on language model performance and understanding.


Diffusion Tempering Improves Parameter Estimation with Probabilistic Integrators for Ordinary Differential Equations

http://arxiv.org/abs/2402.12231v1

Compressor summary: The text proposes diffusion tempering, a technique to improve gradient-based parameter optimization in ordinary differential equations by adjusting noise in probabilistic integrators.


AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

http://arxiv.org/abs/2402.12226v1

Compressor summary: AnyGPT is a multimodal language model that can process various modalities like speech, text, images, and music by using discrete representations without changing the LLM architecture or training paradigms.


Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

http://arxiv.org/abs/2402.12225v1

Compressor summary: The paper proposes Argus3D, a novel framework for 3D shape generation using discrete representation learning and an ensemble of public datasets, improving capacity and scalability over traditional auto-regressive models.


Reformatted Alignment

http://arxiv.org/abs/2402.12219v1

Compressor summary: The paper proposes ReAlign, a method to improve the alignment of large language models with human values by reformatting instruction data responses.


Polarization of Autonomous Generative AI Agents Under Echo Chambers

http://arxiv.org/abs/2402.12212v1

Compressor summary: The study shows that AI agents based on ChatGPT can become polarized in echo chamber environments due to their ability to update opinions considering their own and others' views, and suggests monitoring factors like persona to prevent this.


Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages

http://arxiv.org/abs/2402.12204v1

Compressor summary: SDRRL is a method to improve multilingual LLMs by self-distilling from resource-rich languages without relying solely on translation.


Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT

http://arxiv.org/abs/2402.12201v1

Compressor summary: The paper proposes a circuit discovery framework that uses sparse dictionary learning to extract monosemantic features from neural models and identify interpretable, local model behaviors.


Zero shot VLMs for hate meme detection: Are we there yet?

http://arxiv.org/abs/2402.12198v1

Compressor summary: The study investigates the effectiveness of visual language models in detecting harmful memes without labeled datasets and finds they struggle with zero-shot classification.


Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

http://arxiv.org/abs/2402.12195v1

Compressor summary: The paper proposes a browse-and-concentrate paradigm to improve multimodal context fusion in Multimodal Large Language Models for better understanding of multiple images and their instructions.


A Chinese Dataset for Evaluating the Safeguards in Large Language Models

http://arxiv.org/abs/2402.12193v1

Compressor summary: This paper introduces a dataset for evaluating the safety of Chinese LLMs and proposes fine-grained assessment criteria to identify and measure various types of risk in their responses.


Pan-Mamba: Effective pan-sharpening with State Space Model

http://arxiv.org/abs/2402.12192v1

Compressor summary: Pan-Mamba is a novel pansharpening network that uses the Mamba model to efficiently exchange and fuse multi-spectral and panchromatic images for high-resolution output.


Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships

http://arxiv.org/abs/2402.12189v1

Compressor summary: The paper presents a novel attack on neural language models that amplifies the exposure of their training data by fine-tuning them with generated texts based on membership probabilities.


ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

http://arxiv.org/abs/2402.12185v1

Compressor summary: This paper introduces ChartX, a multi-modal benchmark for evaluating large language models' ability to understand and reason with charts, and ChartVLM, a new model that outperforms existing models on chart tasks.


Colorizing Monochromatic Radiance Fields

http://arxiv.org/abs/2402.12184v1

Compressor summary: The paper presents a method to reproduce color from monochromatic images using Lab color space representation and an image colorization module, enabling colorful Neural Radiance Fields (NeRF) even with monochrome input.


MultiFIX: An XAI-friendly feature inducing approach to building models from multimodal data

http://arxiv.org/abs/2402.12183v1

Compressor summary: MultiFIX is an interpretability-focused multimodal data fusion pipeline that uses deep learning and symbolic expressions to combine features from different data modalities and make predictions in health domains.


Revisiting Data Augmentation in Deep Reinforcement Learning

http://arxiv.org/abs/2402.12181v1

Compressor summary: The text discusses data augmentation techniques for image-based deep reinforcement learning, analyzes their effects, compares them, and proposes a novel regularization term called tangent prop that improves performance and sample efficiency.


Examining Monitoring System: Detecting Abnormal Behavior In Online Examinations

http://arxiv.org/abs/2402.12179v1

Compressor summary: The Exam Monitoring System detects abnormal student behavior during online exams to help proctors prevent cheating.


Mafin: Enhancing Black-Box Embeddings with Model Augmented Fine-tuning

http://arxiv.org/abs/2402.12177v1

Compressor summary: Mafin is a new method that improves fine-tuning black-box embedding models for Retrieval Augmented Generation by adding a trainable model to the process.


Learning Discretized Bayesian Networks with GOMEA

http://arxiv.org/abs/2402.12175v1

Compressor summary: DBN-GOMEA is an improved evolutionary algorithm for learning Bayesian networks from real-valued data by jointly optimizing variable discretization, resulting in compact models that can be easily inspected by experts and incorporate their knowledge.


BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence

http://arxiv.org/abs/2402.12174v1

Compressor summary: BIDER is a method that improves large language models' answer quality and efficiency by refining retrieval documents into Key Supporting Evidence using knowledge synthesis, supervised fine-tuning, and preference alignment.


Endowing Pre-trained Graph Models with Provable Fairness

http://arxiv.org/abs/2402.12161v1

Compressor summary: GraphPAR is a novel framework that improves the fairness of pre-trained graph models using parameter-efficient adapters with provable lower bounds on prediction bias.


Transformer-based Causal Language Models Perform Clustering

http://arxiv.org/abs/2402.12151v1

Compressor summary: The study analyzes how a Transformer-based language model learns to follow instructions by clustering data within its hidden space, which helps it handle new instances.


Your Large Language Model is Secretly a Fairness Proponent and You Should Prompt it Like One

http://arxiv.org/abs/2402.12150v1

Compressor summary: FairThinking is a pipeline that generates roles for large language models to express diverse viewpoints and ensure fairness in their outputs.


MLFEF: Machine Learning Fusion Model with Empirical Formula to Explore the Momentum in Competitive Sports

http://arxiv.org/abs/2402.12149v1

Compressor summary: The article proposes two models to define and quantify momentum in tennis matches using data-driven and empirical approaches, and analyzes its importance and fluctuation patterns.


End-to-end multilingual fact-checking at scale

http://arxiv.org/abs/2402.12147v1

Compressor summary: Factiverse AI models excel at end-to-end fact-checking across many languages and beat top LLMs in performance.


Meta Ranking: Less Capable Language Models are Capable for Single Response Judgement

http://arxiv.org/abs/2402.12146v1

Compressor summary: The paper proposes MetaRanking, a method for less capable LLMs to judge response reliability by comparing query-response pairs with references, improving their performance on reasoning tasks and applications like query routing and data filtering.


Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers

http://arxiv.org/abs/2402.12138v1

Compressor summary: BiXT is a novel efficient Transformer-based approach that processes semantics and location simultaneously, enabling its application to various tasks without loss of performance or modality limitations.


SSTKG: Simple Spatio-Temporal Knowledge Graph for Intepretable and Versatile Dynamic Information Embedding

http://arxiv.org/abs/2402.12132v1

Compressor summary: This paper proposes SSTKG, a framework to construct and explore spatio-temporal knowledge graphs using a novel 3-step embedding method, improving prediction and recommendation accuracy.


3D Vascular Segmentation Supervised by 2D Annotation of Maximum Intensity Projection

http://arxiv.org/abs/2402.12128v1

Compressor summary: The paper proposes a weakly-supervised method for 3D vessel segmentation using maximum intensity projection and 2D labels, which improves performance and reduces annotation efforts.


Evaluating Image Review Ability of Vision Language Models

http://arxiv.org/abs/2402.12121v1

Compressor summary: The paper examines how large-scale vision language models (LVLMs) can generate diverse review texts for images and proposes a rank correlation analysis method to evaluate their review abilities.


DualView: Data Attribution from the Dual Perspective

http://arxiv.org/abs/2402.12118v1

Compressor summary: DualView is a fast and accurate method to explain how individual training data points influence a neural network's predictions on test data, and can be combined with feature attribution methods.


Is It a Free Lunch for Removing Outliers during Pretraining?

http://arxiv.org/abs/2402.12102v1

Compressor summary: The authors improve a softmax function that helps pretrain models for quantization by making it invariant to sequence length and suitable for causal language models.


Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation

http://arxiv.org/abs/2402.12100v1

Compressor summary: Groot is a new framework that uses semantic transformation to create more effective adversarial prompts for testing the safety of text-to-image models, achieving high success rates on popular models like DALL-E 3 and Midjourney.


Human Video Translation via Query Warping

http://arxiv.org/abs/2402.12099v1

Compressor summary: QueryWarp is a new method for translating human motion videos that uses appearance flows to warp query tokens in a diffusion model, achieving better temporal coherence than existing methods.


Towards Explainable LiDAR Point Cloud Semantic Segmentation via Gradient Based Target Localization

http://arxiv.org/abs/2402.12098v1

Compressor summary: The paper presents pGS-CAM, a method to generate saliency maps for semantic segmentation of LiDAR point clouds, which helps understand how SS models make predictions and improve them.


Major TOM: Expandable Datasets for Earth Observation

http://arxiv.org/abs/2402.12095v1

Compressor summary: Major TOM is a framework that enables data fusion and access for deep learning in Earth Observation by letting users merge and query multiple scattered datasets through a geographical indexing system (grid points) and metadata; Major TOM-Core is a large open-access dataset covering most of the land surface that serves as a template for future additions.


Do Large Language Models Understand Logic or Just Mimick Context?

http://arxiv.org/abs/2402.12091v1

Compressor summary: The paper explores how large language models reason using context and finds that they do not truly understand logical rules; instead, they rely on in-context learning for accurate answers.


Can LLMs Compute with Reasons?

http://arxiv.org/abs/2402.12080v1

Compressor summary: The paper proposes an inductive learning method to improve small language models' reasoning abilities using a distributed network of SLMs.


LVCHAT: Facilitating Long Video Comprehension

http://arxiv.org/abs/2402.12079v1

Compressor summary: LVChat improves long video comprehension for multimodal LLMs by using Frame-Scalable Encoding and Interleaved Frame Encoding to handle over-compression and long video input.


HIP Network: Historical Information Passing Network for Extrapolation Reasoning on Temporal Knowledge Graph

http://arxiv.org/abs/2402.12074v1

Compressor summary: The paper proposes a Historical Information Passing (HIP) network that uses temporal, structural and repetitive perspectives to predict future events based on historical information in temporal knowledge graphs.


Interpretable Brain-Inspired Representations Improve RL Performance on Visual Navigation Tasks

http://arxiv.org/abs/2402.12067v1

Compressor summary: The paper proposes using slow feature analysis (SFA), inspired by neuroscience, to generate interpretable representations of visual data for visual navigation tasks in reinforcement learning.


WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

http://arxiv.org/abs/2402.12065v1

Compressor summary: The paper proposes WKVQuant, a PTQ framework for quantizing weights and KV cache of LLMs, which improves efficiency and accuracy by using past-only quantization and two-dimensional quantization strategy.
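
The summary does not detail the past-only or two-dimensional quantization strategies, so the sketch below only shows a plain per-channel symmetric quantization of a cached key/value tensor, as a baseline for what quantizing the KV cache means; the shapes and bit width are assumptions.

    import torch

    def quantize_kv_per_channel(kv, n_bits=8):
        # kv: cached keys or values of shape (batch, heads, seq_len, head_dim).
        qmax = 2 ** (n_bits - 1) - 1
        scale = (kv.abs().amax(dim=(0, 1, 2), keepdim=True) / qmax).clamp(min=1e-8)
        q = torch.clamp(torch.round(kv / scale), -qmax - 1, qmax).to(torch.int8)
        return q, scale

    def dequantize_kv(q, scale):
        return q.to(torch.float32) * scale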


All Language Models Large and Small

http://arxiv.org/abs/2402.12061v1

Compressor summary: LONDI is a framework that learns to use large language models selectively for complex tasks, reducing computational costs and improving efficiency.


Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models

http://arxiv.org/abs/2402.12058v1

Compressor summary: The paper introduces Scaffold, a visual prompting technique for LMMs that uses dot matrices and multi-dimensional coordinates to enhance vision-language coordination.


Are LLM-based Evaluators Confusing NLG Quality Criteria?

http://arxiv.org/abs/2402.12055v1

Compressor summary: The text discusses a study that revealed confusion issues in LLMs' NLG evaluation and proposes a hierarchical classification system and fine-grained analysis using perturbation attacks to better understand and improve their performance.


Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs

http://arxiv.org/abs/2402.12052v1

Compressor summary: SlimPLM is a novel approach that improves large language models' knowledge acquisition by using a slim proxy model to detect and retrieve missing information, reducing computational costs and outperforming existing methods.


Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models

http://arxiv.org/abs/2402.12048v1

Compressor summary: The paper analyzes catastrophic forgetting in multi-modal language models and proposes Model Tailor, a method that adjusts a small number of fine-tuned parameters to improve performance on both original and new tasks.


A Lightweight Parallel Framework for Blind Image Quality Assessment

http://arxiv.org/abs/2402.12043v1

Compressor summary: The paper proposes a lightweight parallel framework for blind image quality assessment using pre-trained features, feature embedding network, self-supervised subtasks, and distortion-aware quality regression network to achieve superior performance with less computation and training time.


Linear bandits with polylogarithmic minimax regret

http://arxiv.org/abs/2402.12042v1

Compressor summary: The paper proposes a novel algorithm for linear stochastic bandits with subgaussian noise that decreases as the actions approach the unknown vector, achieving a minimax regret of $\log^3(T)$.


Surround-View Fisheye Optics in Computer Vision and Simulation: Survey and Challenge

http://arxiv.org/abs/2402.12041v1

Compressor summary: The paper surveys automotive surround-view fisheye optics, discussing the challenges of optical artifacts in computer vision for autonomous driving and ADAS, and examining different simulation methods for creating synthetic datasets.


Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations

http://arxiv.org/abs/2402.12038v1

Compressor summary: Self-AMPLIFY automatically generates natural language rationales from explanation methods applied to Small Language Models, improving their performance on reasoning tasks.


Language Model Adaptation to Specialized Domains through Selective Masking based on Genre and Topical Characteristics

http://arxiv.org/abs/2402.12036v1

Compressor summary: The authors propose a genre and topic-aware masking method for tailoring language models to specialized domains, which improves performance on the LegalGLUE benchmark.


Class-incremental Learning for Time Series: Benchmark and Evaluation

http://arxiv.org/abs/2402.12035v1

Compressor summary: The paper presents a unified experimental framework for class-incremental learning (CIL) in time series data, evaluates various methods, and studies the impact of design factors on performance.


Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs

http://arxiv.org/abs/2402.12030v1

Compressor summary: Universal Logit Distillation (ULD) is a new method that allows compressing knowledge from large language models to smaller ones without requiring them to share the same tokenizer, making it more versatile and useful for various applications.


Acquiring Clean Language Models from Backdoor Poisoned Datasets by Downscaling Frequency Space

http://arxiv.org/abs/2402.12026v1

Compressor summary: The paper proposes MuScleLoRA, a method to mitigate backdoor attacks in language models by adjusting the frequency space of the data and aligning gradients during adaptation.


Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?

http://arxiv.org/abs/2402.12025v1

Compressor summary: This paper reviews speech-to-text translation models that combine speech foundation models and large language models, highlighting their similarities and differences, and suggesting recommendations for future research.


Distilling Large Language Models for Text-Attributed Graph Learning

http://arxiv.org/abs/2402.12022v1

Compressor summary: The authors propose a method to combine large language models and graph models for learning graphs of connected texts, using an interpreter and a student model to transfer knowledge between them.


A Systematic Comparison of Contextualized Word Embeddings for Lexical Semantic Change

http://arxiv.org/abs/2402.12011v1

Compressor summary: The paper compares state-of-the-art models for Lexical Semantic Change across different tasks, languages, and benchmarks, finding APD and XL-LEXEME as the best approaches.


Training Green AI Models Using Elite Samples

http://arxiv.org/abs/2402.12010v1

Compressor summary: This paper proposes an evolutionary-based sampling framework to identify elite training samples for AI models that improve performance and save 98% energy compared to typical training practices.


Cluster Metric Sensitivity to Irrelevant Features

http://arxiv.org/abs/2402.12008v1

Compressor summary: This paper studies how different types of irrelevant features affect clustering performance using various metrics, and suggests that the Silhouette Coefficient and the Davies-Bouldin score can be used to optimize feature selection in unsupervised clustering.
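
A small, self-contained experiment in the spirit of the study: cluster synthetic data with and without appended irrelevant noise features and compare the Silhouette Coefficient and Davies-Bouldin score; the data sizes and cluster counts are arbitrary choices for illustration.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score, davies_bouldin_score

    # Informative features with clear cluster structure, plus irrelevant noise columns.
    X, _ = make_blobs(n_samples=500, centers=3, n_features=2, random_state=0)
    noise = np.random.default_rng(0).normal(size=(500, 5))

    for name, data in [("informative only", X), ("with noise features", np.hstack([X, noise]))]:
        labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
        sil = silhouette_score(data, labels)        # higher is better
        dbi = davies_bouldin_score(data, labels)    # lower is better
        print(f"{name}: silhouette={sil:.3f}, Davies-Bouldin={dbi:.3f}")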


Direct Consistency Optimization for Compositional Text-to-Image Personalization

http://arxiv.org/abs/2402.12004v1

Compressor summary: The authors propose a simple yet effective method for fine-tuning text-to-image diffusion models by optimizing consistency with reference images while preserving the pretrained model's style, leading to improved personalization and image-text alignment.


A Survey on Extractive Knowledge Graph Summarization: Applications, Approaches, Evaluation, and Future Directions

http://arxiv.org/abs/2402.12001v1

Compressor summary: This paper surveys extractive Knowledge Graph summarization, its applications, and existing methods, while suggesting future directions.


Remember This Event That Year? Assessing Temporal Information and Reasoning in Large Language Models

http://arxiv.org/abs/2402.11997v1

Compressor summary: This paper tests LLMs on a new temporal dataset (TempUN) and finds they struggle with retaining and reasoning about event sequences, especially closed-source models.


ISCUTE: Instance Segmentation of Cables Using Text Embedding

http://arxiv.org/abs/2402.11996v1

Compressor summary: The paper proposes a user-friendly text-promptable model that combines two existing models to achieve high performance in identifying deformable linear objects like wires and cables.


Network Inversion of Binarised Neural Nets

http://arxiv.org/abs/2402.11995v1

Compressor summary: The text describes a method to understand how neural networks make decisions by converting them into a logical formula, which can help ensure their reliability and efficiency.


Privacy-Preserving Low-Rank Adaptation for Latent Diffusion Models

http://arxiv.org/abs/2402.11989v1

Compressor summary: The paper proposes a method called Stable PrivateLoRA to adapt latent diffusion models for generating specific objects while protecting privacy from membership inference attacks.


Weakly Supervised Object Detection in Chest X-Rays with Differentiable ROI Proposal Networks and Soft ROI Pooling

http://arxiv.org/abs/2402.11985v1

Compressor summary: Weakly supervised object detection (WSup-OD) with image-classification labels is useful for natural images but transfers poorly to medical images due to different object characteristics; the authors propose WSRPN, a new method that generates bounding box proposals using ROI-attention and improves disease localization in chest X-ray images.


Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations

http://arxiv.org/abs/2402.11975v1

Compressor summary: COMEDY is a novel framework that uses compressive memory to generate, compress, and respond in conversations without relying on traditional retrieval modules or memory databases.


Bayesian Active Learning for Censored Regression

http://arxiv.org/abs/2402.11973v1

Compressor summary: The paper proposes a new method for Bayesian active learning in censored regression, called $\mathcal{C}$-BALD, which estimates the information gain from new data points and improves performance over existing methods.


What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects

http://arxiv.org/abs/2402.11968v1

Compressor summary: The paper explores the preferences of speakers of German dialects and regional languages towards natural language processing tools that work with their dialectal input, finding them more open to virtual assistants than to those that generate dialectal output.


Imbalance in Regression Datasets

http://arxiv.org/abs/2402.11963v1

Compressor summary: The paper highlights the problem of data imbalance in regression, which leads to naive models, and proposes a definition of imbalance for regression tasks to address it.


DB-LLM: Accurate Dual-Binarization for Efficient LLMs

http://arxiv.org/abs/2402.11960v1

Compressor summary: This paper proposes DB-LLM, a novel dual-binarization method for large language models that improves both accuracy and efficiency in ultra-low bit quantization, achieving significant reductions in perplexity and computational consumption compared to existing methods.


Automatic Evaluation for Mental Health Counseling using LLMs

http://arxiv.org/abs/2402.11958v1

Compressor summary: The paper proposes using large language models to automatically evaluate the quality of psychological counseling sessions, offering a cost-effective and dependable alternative to manual evaluations.


Event-Based Motion Magnification

http://arxiv.org/abs/2402.11957v1

Compressor summary: The paper presents a cost-effective dual-camera system that combines an event camera and an RGB camera to magnify high-frequency motions, integrating event streams with image features to estimate motion direction and magnitude, and using a Second-order Recurrent Propagation module and a temporal filter to handle long-term interpolations and noise.


Analysis of Multidomain Abstractive Summarization Using Salience Allocation

http://arxiv.org/abs/2402.11955v1

Compressor summary: The paper evaluates SEASON, a salience-allocation technique for abstractive summarization, against other models on news, dialogue, and financial datasets using metrics such as ROUGE and BERTScore, providing insights into salience allocation for abstractive text summarization.


Mini-Hes: A Parallelizable Second-order Latent Factor Analysis Model

http://arxiv.org/abs/2402.11948v1

Compressor summary: The paper proposes a new optimization method, Mini-Hes, for latent factor analysis models that represent high-dimensional and incomplete data and improve their performance in missing data estimation tasks.


LEMMA: Towards LVLM-Enhanced Multimodal Misinformation Detection with External Knowledge Augmentation

http://arxiv.org/abs/2402.11943v1

Compressor summary: The paper explores how Large Vision Language Models can help detect multimodal misinformation on social media, and proposes a method (LEMMA) that improves their performance by adding external knowledge.


The effect of Leaky ReLUs on the training and generalization of overparameterized networks

http://arxiv.org/abs/2402.11942v1

Compressor summary: The paper studies how different ReLU functions affect the performance of overparameterized neural networks and finds that a particular activation function provides the best trade-off between training and generalization errors.


Comprehensive Cognitive LLM Agent for Smartphone GUI Automation

http://arxiv.org/abs/2402.11941v1

Compressor summary: CoCo-Agent is a novel LLM that improves GUI automation by enhancing perception and action prediction with CEP and CAP techniques.


Team QUST at SemEval-2024 Task 8: A Comprehensive Study of Monolingual and Multilingual Approaches for Detecting AI-generated Text

http://arxiv.org/abs/2402.11934v1

Compressor summary: The paper describes QUST's participation and methods in Task 8 SemEval 2024, using data augmentation, various deep-learning models, and a stacking ensemble to achieve 8th place in multilingual subtask A.


SLADE: Detecting Dynamic Anomalies in Edge Streams without Labels via Self-Supervised Learning

http://arxiv.org/abs/2402.11933v1

Compressor summary: SLADE is a self-supervised method for detecting anomalies in edge streams by observing node interaction patterns and minimizing drift in node representations.


DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation

http://arxiv.org/abs/2402.11929v1

Compressor summary: The paper proposes a method to control fine-grained lighting in image generation using diffusion models, by guiding the process with radiance hints instead of exact geometry.


Separating common from salient patterns with Contrastive Representation Learning

http://arxiv.org/abs/2402.11928v1

Compressor summary: SepCLR is a new method for contrastive analysis that uses mutual information terms to learn semantically expressive representations and separate common and salient factors of variation in different datasets.


Energy-Efficient Edge Learning via Joint Data Deepening-and-Prefetching

http://arxiv.org/abs/2402.11925v1

Compressor summary: The paper proposes a novel offloading architecture for IoT devices that reduces energy consumption by sequentially transmitting important features and prefetching potentially needed ones for AI model training.


MRKE: The Multi-hop Reasoning Evaluation of LLMs by Knowledge Edition

http://arxiv.org/abs/2402.11924v1

Compressor summary: The authors introduce a new multi-hop question answering (MHQA) benchmark for evaluating large language models (LLMs) that considers data contamination and reasoning chain evaluation, finding significant performance gaps between LLMs on the original and edited HotpotQA datasets.


A Generative Pre-Training Framework for Spatio-Temporal Graph Transfer Learning

http://arxiv.org/abs/2402.11922v1

Compressor summary: GPDiff is a novel generative pre-training framework for spatio-temporal graph (STG) transfer learning that uses a diffusion model with a transformer-based denoising network to generate tailored model parameters, adapting to different cities and outperforming state-of-the-art baselines on traffic speed and crowd flow prediction.


A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

http://arxiv.org/abs/2402.11917v1

Compressor summary: The study analyzes how transformers use internal mechanisms to reason on a synthetic task, revealing depth-bounded recurrent processes that may apply to other tasks.


PhySU-Net: Long Temporal Context Transformer for rPPG with Self-Supervised Pre-training

http://arxiv.org/abs/2402.11913v1

Compressor summary: PhySU-Net is a new rPPG method that uses a transformer network and self-supervised pre-training with pseudo-labels to improve cardiac activity measurement from facial videos.


One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation

http://arxiv.org/abs/2402.11909v1

Compressor summary: The paper presents a new approach for creating high-quality, personalized head avatars from a few images per user, using a generative model with a 3DMM-anchored neural radiance field backbone and jointly optimizing 3DMM fitting and camera calibration for better adaptation.


Semantic Textual Similarity Assessment in Chest X-ray Reports Using a Domain-Specific Cosine-Based Metric

http://arxiv.org/abs/2402.11908v1

Compressor summary: The paper presents a new method for evaluating semantic similarity in medical reports using deep learning and improves upon existing metrics by providing more meaningful scores in the medical domain.


Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation

http://arxiv.org/abs/2402.11907v1

Compressor summary: The paper proposes a method called Direct Large Model Alignment (DLMA) that uses contrastive prompt pairs to evaluate and align large language models without human-annotated preference data, achieving better performance than existing methods.
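
One plausible reading of the self-rewarding contrastive idea, sketched under stated assumptions: score a candidate response by the gap between its log-likelihood under a "preferred behaviour" prompt and under a contrasting prompt, and use that score to build preference data; the interfaces and names below are hypothetical, not the paper's formulation.

    import torch
    import torch.nn.functional as F

    def sequence_logprob(logits_fn, prompt_ids, response_ids):
        # Sum of log p(response token | prompt + previous response tokens).
        ids = torch.tensor([prompt_ids + response_ids])
        logprobs = F.log_softmax(logits_fn(ids), dim=-1)     # (1, seq_len, vocab), assumed
        total = 0.0
        for i, tok in enumerate(response_ids):
            pos = len(prompt_ids) + i - 1                    # logits at pos predict token at pos + 1
            total += float(logprobs[0, pos, tok])
        return total

    def contrastive_self_reward(logits_fn, pos_prompt_ids, neg_prompt_ids, response_ids):
        # Higher when the response is more plausible under the "preferred" prompt.
        return (sequence_logprob(logits_fn, pos_prompt_ids, response_ids)
                - sequence_logprob(logits_fn, neg_prompt_ids, response_ids))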


Learning to Edit: Aligning LLMs with Knowledge Editing

http://arxiv.org/abs/2402.11905v1

Compressor summary: The paper proposes a Learning to Edit framework that teaches large language models to apply updated knowledge to questions efficiently and effectively without relying on memorization.


SoLA: Solver-Layer Adaption of LLM for Better Logic Reasoning

http://arxiv.org/abs/2402.11903v1

Compressor summary: SoLA is a novel method that uses a solver as an extra layer of a large language model to improve its logical reasoning and solve complex industrial problems.


Real-World Planning with PDDL+ and Beyond

http://arxiv.org/abs/2402.11901v1

Compressor summary: Nyx is a new PDDL+ planner that adapts easily to various real-world applications, overcoming limitations and increasing usage of AI planning.


Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models

http://arxiv.org/abs/2402.11900v1

Compressor summary: This paper explores how large language models can use factual shortcuts when reasoning through multi-hop facts, and proposes a method to reduce failures caused by these shortcuts.


SIBO: A Simple Booster for Parameter-Efficient Fine-Tuning

http://arxiv.org/abs/2402.11896v1

Compressor summary: SIBO is a technique that boosts the effectiveness of parameter-efficient fine-tuning for large language models by injecting an initial residual to reduce over-smoothing and improve performance.


Have Seen Me Before? Automating Dataset Updates Towards Reliable and Timely Evaluation

http://arxiv.org/abs/2402.11894v1

Compressor summary: The paper proposes two strategies to automate dataset updates for LLMs, using mimicking and extending techniques, to improve evaluation reliability and timeliness.


Discerning and Resolving Knowledge Conflicts through Adaptive Decoding with Contextual Information-Entropy Constraint

http://arxiv.org/abs/2402.11893v1

Compressor summary: The paper proposes a decoding method (COIECD) for language models that adapts to knowledge conflicts and improves performance on realistic datasets.


Revisiting Knowledge Distillation for Autoregressive Language Models

http://arxiv.org/abs/2402.11890v1

Compressor summary: ATKD is a method that improves knowledge distillation by making teaching more diverse and flexible for autoregressive language models, leading to better performance and generalization.


ROSE Doesn't Do That: Boosting the Safety of Instruction-Tuned Large Language Models with Reverse Prompt Contrastive Decoding

http://arxiv.org/abs/2402.11889v1

Compressor summary: ROSE is a method that improves the safety of instruction-tuned large language models without additional training by using reverse prompts to suppress undesired outputs.
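
A minimal sketch of reverse-prompt contrastive decoding as the summary describes it: compute next-token logits under the normal system prompt and under a deliberately "reversed" (unsafe) prompt, and subtract a scaled version of the latter before sampling; the prompts, the value of alpha, and the logits interface are illustrative assumptions.

    import torch

    def rose_style_next_token_logits(logits_fn, normal_prompt_ids, reverse_prompt_ids,
                                     response_ids, alpha=0.5):
        # logits_fn(ids) -> (1, seq_len, vocab) next-token logits (assumed interface).
        normal = logits_fn(torch.tensor([normal_prompt_ids + response_ids]))[0, -1]
        reverse = logits_fn(torch.tensor([reverse_prompt_ids + response_ids]))[0, -1]
        return normal - alpha * reverse     # sample or take argmax from the adjusted logits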


Generative Semi-supervised Graph Anomaly Detection

http://arxiv.org/abs/2402.11887v1

Compressor summary: The paper proposes a novel generative graph anomaly detection method that uses normal nodes to generate effective negative samples for training a one-class classifier, outperforming existing methods on real-world datasets.


The Colorful Future of LLMs: Evaluating and Improving LLMs as Emotional Supporters for Queer Youth

http://arxiv.org/abs/2402.11886v1

Compressor summary: The paper examines how Large Language Models can improve emotional support for queer youth online, but finds they need more personalization and empathy to be effective.


InMD-X: Large Language Models for Internal Medicine Doctors

http://arxiv.org/abs/2402.11883v1

Compressor summary: InMD-X is a set of large language models for internal medicine doctors to improve research, diagnosis, and documentation in healthcare.


NOTE: Notable generation Of patient Text summaries through Efficient approach based on direct preference optimization

http://arxiv.org/abs/2402.11882v1

Compressor summary: NOTE is a system that automates the generation of discharge summaries, critical documents covering all events during hospitalization, using lightweight models and LLM APIs to overcome privacy and performance issues in healthcare settings; it can also be applied to other types of summaries to reduce clinicians' workload.


Finite-Time Error Analysis of Online Model-Based Q-Learning with a Relaxed Sampling Model

http://arxiv.org/abs/2402.11877v1

Compressor summary: This paper investigates how using a model in $Q$-learning can reduce the number of samples needed for learning.


M2K-VDG: Model-Adaptive Multimodal Knowledge Anchor Enhanced Video-grounded Dialogue Generation

http://arxiv.org/abs/2402.11875v1

Compressor summary: The paper proposes M2K-VDG, a framework to reduce hallucinations in video-grounded dialogue generation by enhancing multimodal knowledge anchor tokens using model-adaptive and counterfactual methods.


Language-guided Image Reflection Separation

http://arxiv.org/abs/2402.11874v1

Compressor summary: The paper proposes a framework that uses language descriptions and cross-attention with contrastive learning to separate image layers in reflection separation problems, improving performance over existing methods.


LoRA Training in the NTK Regime has No Spurious Local Minima

http://arxiv.org/abs/2402.11867v1

Compressor summary: LoRA helps large language models fine-tune efficiently by avoiding bad local minima and finding good low-rank solutions that generalize well.


How Interpretable are Reasoning Explanations from Prompting Large Language Models?

http://arxiv.org/abs/2402.11863v1

Compressor summary: The paper evaluates various prompting techniques for large language models on commonsense reasoning tasks and proposes a new technique called Self-Entailment-Alignment Chain-of-thought that significantly improves interpretability.


Communication-Efficient Distributed Learning with Local Immediate Error Compensation

http://arxiv.org/abs/2402.11857v1

Compressor summary: LIEC-SGD is a new optimization algorithm for distributed learning that compresses data both ways and compensates errors to reduce communication overhead and speed up convergence.


ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image

http://arxiv.org/abs/2402.11849v1

Compressor summary: ComFusion is a novel approach that generates high-fidelity images by combining user-provided subject images with predefined text scenes, using pretrained models and preserving class-scene prior knowledge.


UnlearnCanvas: A Stylized Image Dataset to Benchmark Machine Unlearning for Diffusion Models

http://arxiv.org/abs/2402.11846v1

Compressor summary: The text discusses machine unlearning for diffusion models, introduces a new evaluation framework with a dataset (UnlearnCanvas) for assessing it, and benchmarks five methods to reveal their strengths and weaknesses.


Modularized Networks for Few-shot Hateful Meme Detection

http://arxiv.org/abs/2402.11845v1

Compressor summary: The paper presents a few-shot hateful meme detection method that composes LoRA modules with a module composer on top of large language models, using only a few labeled examples, and outperforms in-context learning on three datasets.


WildFake: A Large-scale Challenging Dataset for AI-Generated Images Detection

http://arxiv.org/abs/2402.11843v1

Compressor summary: WildFake is a large dataset for detecting AI-generated images with diverse content and generative models to assess detection robustness and generalizability.


An Endoscopic Chisel: Intraoperative Imaging Carves 3D Anatomical Models

http://arxiv.org/abs/2402.11840v1

Compressor summary: The text proposes a method to update preoperative 3D models using intraoperative endoscopy video for navigated sinus surgery, improving accuracy during surgical progression.


UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction

http://arxiv.org/abs/2402.11838v1

Compressor summary: The paper proposes UniST, a universal model for urban spatio-temporal prediction that generalizes well across diverse scenarios with minimal data.


Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization

http://arxiv.org/abs/2402.11835v1

Compressor summary: ABCs is a hybrid reinforcement learning algorithm that adapts to stationary or nonstationary environments by combining Boltzmann Q-learning and counterfactual regret minimization, achieving strong performance and convergence guarantees in various domains.
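A small sketch of the Boltzmann (softmax) policy over Q-values that the Q-learning half of this hybrid relies on; the temperature schedule and the counterfactual-regret side are not shown.

    # Sketch of a Boltzmann (softmax) policy over Q-values; the CFR half of the
    # hybrid algorithm is not shown here.
    import numpy as np

    def boltzmann_policy(q_values, temperature=1.0):
        z = q_values / temperature
        z = z - z.max()                          # numerical stability
        return np.exp(z) / np.exp(z).sum()

    # e.g. boltzmann_policy(np.array([1.0, 2.0, 0.5]), temperature=0.5)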


Rock Classification Based on Residual Networks

http://arxiv.org/abs/2402.11831v1

Compressor summary: The paper proposes two residual-network-based approaches for rock classification, improves accuracy through data augmentation, and reaches 73.7% accuracy with a backbone similar to BoTNet.
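An illustrative torchvision sketch of fine-tuning a standard residual network with data augmentation on a small image dataset; the backbone, augmentation recipe, dataset path, and class count below are placeholders, not the paper's exact setup.

    # Illustrative sketch: fine-tuning a standard ResNet with data augmentation.
    # Backbone, augmentations, and the dataset path are placeholders.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    train_tf = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(0.2, 0.2, 0.2),
        transforms.ToTensor(),
    ])
    dataset = datasets.ImageFolder("rocks/train", transform=train_tf)  # hypothetical path
    loader = DataLoader(dataset, batch_size=32, shuffle=True)

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))   # new classifier head

    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()
    for images, labels in loader:
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()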


Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios

http://arxiv.org/abs/2402.11826v1

Compressor summary: The paper presents a learning-based framework that fuses RGB and long-wave infrared images for monocular depth estimation, using separate per-modality networks, a confidence predictor network, and a multi-modal fusion network, and shows robust performance in challenging conditions where RGB alone fails.


Microstructures and Accuracy of Graph Recall by Large Language Models

http://arxiv.org/abs/2402.11821v1

Compressor summary: The paper investigates how large language models (LLMs) recall and encode graphs described in text, finding that they often underperform humans, favor certain structural patterns, and depend on the domain of the graph for better recall.


Head-wise Shareable Attention for Large Language Models

http://arxiv.org/abs/2402.11819v1

Compressor summary: The paper proposes two methods for fine-grained weight sharing across attention heads in large language models, reducing memory usage without sacrificing performance.
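A minimal sketch of the idea of sharing projection weights across attention heads via an index map; the paper's two specific sharing methods and its criteria for choosing which heads share are not reproduced here, and all names below are illustrative.

    # Minimal sketch of head-wise weight sharing: several heads can point at the
    # same projection weights via a share map, reducing the number of distinct
    # per-head parameters. How heads are chosen to share is not shown.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SharedHeadAttention(nn.Module):
        def __init__(self, d_model=512, n_heads=8, share_map=None):
            super().__init__()
            self.d_head = d_model // n_heads
            # share_map[i] = index of the parameter set head i uses
            self.share_map = share_map or list(range(n_heads))
            n_unique = max(self.share_map) + 1
            self.q = nn.ModuleList([nn.Linear(d_model, self.d_head) for _ in range(n_unique)])
            self.k = nn.ModuleList([nn.Linear(d_model, self.d_head) for _ in range(n_unique)])
            self.v = nn.ModuleList([nn.Linear(d_model, self.d_head) for _ in range(n_unique)])
            self.out = nn.Linear(n_heads * self.d_head, d_model)

        def forward(self, x):
            heads = []
            for j in self.share_map:              # heads with the same j share weights
                q, k, v = self.q[j](x), self.k[j](x), self.v[j](x)
                attn = F.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
                heads.append(attn @ v)
            return self.out(torch.cat(heads, dim=-1))

    # e.g. share_map=[0, 0, 1, 1, 2, 2, 3, 3] halves the distinct head projections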


Where It Really Matters: Few-Shot Environmental Conservation Media Monitoring for Low-Resource Languages

http://arxiv.org/abs/2402.11818v1

Compressor summary: NewsSerow is a method to automatically recognize environmental conservation content in low-resource languages using large language models, enabling efficient media monitoring for conservation organizations.


Avoiding Feature Suppression in Contrastive Learning: Learning What Has Not Been Learned Before

http://arxiv.org/abs/2402.11816v1

Compressor summary: MCL is a novel framework for contrastive learning that progressively learns new features while maintaining existing ones, avoiding feature suppression and improving representation quality for downstream tasks.
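For context, a sketch of the standard InfoNCE objective that contrastive methods like MCL build on; MCL's progressive, multistage training for avoiding feature suppression is not reproduced here.

    # Sketch of the standard InfoNCE contrastive loss; MCL's progressive training
    # scheme on top of it is not shown.
    import torch
    import torch.nn.functional as F

    def info_nce(z1, z2, temperature=0.1):
        # z1, z2: (batch, dim) embeddings of two augmented views of the same inputs
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.T / temperature          # pairwise similarities
        targets = torch.arange(z1.size(0))        # positives sit on the diagonal
        return F.cross_entropy(logits, targets)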


HU at SemEval-2024 Task 8A: Can Contrastive Learning Learn Embeddings to Detect Machine-Generated Text?

http://arxiv.org/abs/2402.11815v1

Compressor summary: The paper presents a single contrastive learning-based model for detecting machine-generated text that performs well with fewer parameters and data augmentation.


Interpretable Embedding for Ad-hoc Video Search

http://arxiv.org/abs/2402.11812v1

Compressor summary: The paper proposes a neural network that embeds queries with semantic concepts and improves video search performance and interpretability on TRECVid datasets.


FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema

http://arxiv.org/abs/2402.11811v1

Compressor summary: FIPO is a prompt crafting method that adapts to different language models and optimizes task instructions using modular fine-tuning.


Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding

http://arxiv.org/abs/2402.11809v1

Compressor summary: SPACE is an approach that accelerates large language models by parallelizing token generation and verification with semi-autoregressive inference and auto-correct decoding.


LLM as Prompter: Low-resource Inductive Reasoning on Arbitrary Knowledge Graphs

http://arxiv.org/abs/2402.11804v1

Compressor summary: The paper introduces ProLINK, a pretraining and prompting framework that uses large language models to improve graph neural networks for low-resource knowledge graph inductive reasoning tasks.


Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling

http://arxiv.org/abs/2402.11800v1

Compressor summary: The paper analyzes how delayed updates affect the convergence of stochastic approximation schemes in large-scale and multi-agent reinforcement learning, and proposes a delay-adaptive algorithm that converges faster than existing methods.
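A tiny sketch of the delayed-update pattern the analysis concerns: the gradient applied at step t was computed tau iterations earlier; the Markovian-sampling analysis and the delay-adaptive step-size rule are not reproduced, and the toy objective is an assumption.

    # Tiny sketch of delayed stochastic updates: the gradient applied now was
    # computed tau iterations ago. The paper's delay-adaptive rule is not shown.
    from collections import deque
    import numpy as np

    def delayed_sgd(w0, grad_fn, steps=1000, tau=5, lr=0.01):
        w = w0.copy()
        buffer = deque()
        for _ in range(steps):
            buffer.append(grad_fn(w))        # gradient at the current iterate
            if len(buffer) > tau:
                w -= lr * buffer.popleft()   # update uses a stale gradient
        return w

    # e.g. delayed_sgd(np.array([5.0]), grad_fn=lambda w: 2 * w)  # minimises w**2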


Unveiling the Magic: Investigating Attention Distillation in Retrieval-augmented Generation

http://arxiv.org/abs/2402.11794v1

Compressor summary: This paper examines how attention distillation works to improve retrieval-augmented generation models, identifying key factors and proposing indicators for better training.


Generative Kaleidoscopic Networks

http://arxiv.org/abs/2402.11793v1

Compressor summary: The paper proposes Generative Kaleidoscopic Networks that use over-generalization in Deep ReLU networks to create diverse samples from a fixed input distribution by recursively applying the network to random noise.


MM-SurvNet: Deep Learning-Based Survival Risk Stratification in Breast Cancer Through Multimodal Data Fusion

http://arxiv.org/abs/2402.11788v1

Compressor summary: The text proposes a new deep learning method for breast cancer management using histopathological imaging, genetic and clinical data, which outperforms existing methods and can help personalize treatments.


What Evidence Do Language Models Find Convincing?

http://arxiv.org/abs/2402.11782v1

Compressor summary: The study investigates how language models answer ambiguous queries by analyzing ConflictingQA, a dataset that contains controversial questions and evidence documents with different features, and suggests improving the quality of RAG corpora and training LLMs to align with human judgments.


Towards Theoretical Understandings of Self-Consuming Generative Models

http://arxiv.org/abs/2402.11778v1

Compressor summary: The paper develops a framework to analyze how training generative models in a self-consuming loop affects data distributions and shows that TV distance between synthetic and real data can be controlled by adjusting dataset sizes or proportions of real data, with a phase transition occurring at a certain threshold.
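A toy simulation of such a self-consuming loop, in which each generation is fit to a mixture of real data and samples from the previous generation; the Gaussian model and the mixing proportion lam are illustrative stand-ins for the "proportion of real data" the analysis says controls distribution drift.

    # Toy self-consuming loop: each generation fits a Gaussian to a mix of real
    # data and samples from the previous generation's model. Purely illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    real = rng.normal(loc=0.0, scale=1.0, size=5000)
    mu, sigma = 0.0, 1.0
    lam = 0.5                                   # proportion of real data per generation
    for gen in range(20):
        synthetic = rng.normal(mu, sigma, size=5000)
        n_real = int(lam * 5000)
        train = np.concatenate([real[:n_real], synthetic[: 5000 - n_real]])
        mu, sigma = train.mean(), train.std()   # "retrain" the generative model
        print(gen, round(mu, 3), round(sigma, 3))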


Uncovering Latent Human Wellbeing in Language Model Embeddings

http://arxiv.org/abs/2402.11777v1

Compressor summary: The study investigates if pre-trained language models implicitly learn human wellbeing concepts and finds that larger models perform better on a Utilitarianism task, suggesting pretraining imparts some understanding of ethics.


Dynamic Multi-Network Mining of Tensor Time Series

http://arxiv.org/abs/2402.11773v1

Compressor summary: The paper proposes a new method, Dynamic Multi-network Mining (DMM), for subsequence clustering and interpretation of tensor time series, which is accurate, interpretable, and scalable.


Evaluating the Effectiveness of Index-Based Treatment Allocation

http://arxiv.org/abs/2402.11771v1

Compressor summary: The paper develops new methods for evaluating index-based allocation policies using data from randomized trials, addressing challenges caused by dependencies between agents and enabling valid statistical conclusions.


Structured Chain-of-Thought Prompting for Few-Shot Generation of Content-Grounded QA Conversations

http://arxiv.org/abs/2402.11770v1

Compressor summary: The paper proposes a structured approach to generate question-answer conversations using a large language model and shows improved agent faithfulness and performance in out-of-domain evaluations.


ChatGPT Based Data Augmentation for Improved Parameter-Efficient Debiasing of LLMs

http://arxiv.org/abs/2402.11764v1

Compressor summary: The authors propose a novel method using ChatGPT to generate synthetic training data for debiasing large language models, achieving better performance and generalizability across categories with minimal retraining cost.


Reinforcement Learning as a Parsimonious Alternative to Prediction Cascades: A Case Study on Image Segmentation

http://arxiv.org/abs/2402.11760v1

Compressor summary: PaSeR is a cost-aware learning pipeline for computer vision tasks that achieves better accuracy while minimizing computational cost compared to cascaded models.


MARS: Meaning-Aware Response Scoring for Uncertainty Estimation in Generative LLMs

http://arxiv.org/abs/2402.11756v1

Compressor summary: The paper proposes Meaning-Aware Response Scoring (MARS), a new scoring function for estimating the correctness of generative large language models' outputs, which improves uncertainty estimation performance across different datasets and models.
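A rough sketch of a meaning-weighted response score in this spirit: token log-probabilities are averaged with weights meant to reflect each token's contribution to the answer's meaning. How MARS actually assigns those weights is not reproduced; the uniform fallback below is a placeholder.

    # Sketch of a meaning-weighted response score; the weighting model MARS uses
    # is not reproduced, so uniform weights serve as a placeholder here.
    import numpy as np

    def weighted_response_score(token_logprobs, importance=None):
        token_logprobs = np.asarray(token_logprobs, dtype=float)
        if importance is None:
            importance = np.ones_like(token_logprobs)       # placeholder weights
        importance = np.asarray(importance, dtype=float)
        w = importance / importance.sum()
        return float(np.exp((w * token_logprobs).sum()))    # weighted, length-normalised

    # e.g. weighted_response_score([-0.1, -2.3, -0.05], importance=[0.2, 1.0, 0.2])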


SPML: A DSL for Defending Language Models Against Prompt Attacks

http://arxiv.org/abs/2402.11755v1

Compressor summary: The paper introduces SPML, a domain-specific language for refining and monitoring the prompts that define LLM-based chatbots; it checks incoming attack prompts, optimizes costs, and streamlines chatbot definition with programming-language capabilities, outperforming GPT-4, GPT-3.5, and LLAMA at understanding attacker prompts.


ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

http://arxiv.org/abs/2402.11753v1

Compressor summary: The paper proposes a new ASCII art-based jailbreak attack on large language models (LLMs) that exploits their poor performance in recognizing such art to bypass safety measures and cause undesired behaviors.


Diagonalisation SGD: Fast & Convergent SGD for Non-Differentiable Models via Reparameterisation and Smoothing

http://arxiv.org/abs/2402.11752v1

Compressor summary: The paper proposes a method to improve gradient-based optimization for non-differentiable models by using a smoothed approximation that reduces variance and converges to stationary points.
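A small sketch of the reparameterisation-plus-smoothing idea in general: a non-differentiable indicator inside an expectation is replaced by a sigmoid surrogate so that reparameterised gradients exist. The paper's diagonalisation schedule (tightening the smoothing while optimising) and its variance analysis are not reproduced; the sharpness beta below is assumed fixed.

    # Sketch of reparameterisation + smoothing: the hard indicator 1[x > 0] inside
    # an expectation is replaced by sigmoid(beta * x) so gradients flow through
    # the reparameterised samples. The diagonalisation schedule is not shown.
    import torch

    mu = torch.tensor(0.5, requires_grad=True)
    opt = torch.optim.SGD([mu], lr=0.05)
    beta = 10.0                                     # smoothing sharpness (assumed fixed)
    for _ in range(200):
        eps = torch.randn(1024)
        x = mu + 0.3 * eps                          # reparameterised x ~ N(mu, 0.3^2)
        loss = torch.sigmoid(beta * x).mean()       # smooth surrogate for E[1[x > 0]]
        opt.zero_grad()
        loss.backward()
        opt.step()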


In-Context Learning Demonstration Selection via Influence Analysis

http://arxiv.org/abs/2402.11750v1

Compressor summary: InfICL is a demonstration selection method for In-Context Learning that uses influence functions to identify highly influential training samples without fine-tuning the large language model.


Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic

http://arxiv.org/abs/2402.11746v1

Compressor summary: RESTA is a method to improve the safety of aligned language models by adding a safety vector to their weights, reducing harmfulness while preserving task performance.
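A minimal task-arithmetic sketch of adding a "safety vector" back into fine-tuned weights; obtaining that vector as the difference between an aligned and an unaligned checkpoint, and the scaling factor alpha, are assumptions here rather than the paper's exact procedure.

    # Minimal task-arithmetic sketch: a safety vector (assumed here to be the
    # difference between aligned and unaligned checkpoints) is added to a
    # fine-tuned model's weights, scaled by alpha.
    import torch

    def add_safety_vector(finetuned_state, aligned_state, unaligned_state, alpha=1.0):
        restored = {}
        for name, w in finetuned_state.items():
            safety = aligned_state[name] - unaligned_state[name]
            restored[name] = w + alpha * safety
        return restored

    # usage: model.load_state_dict(add_safety_vector(ft_sd, aligned_sd, unaligned_sd))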


Machine-generated Text Localization

http://arxiv.org/abs/2402.11744v1

Compressor summary: The paper proposes a method to detect machine-generated text in documents by localizing the parts that are machine written, using contextual information and improving performance on five datasets.