arxiv compressed, 2024-08-19

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-19, generated by the compressor, my personal LLM-based project.


xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

http://arxiv.org/abs/2408.08872v1

Compressor summary: xGen-MM is a framework for developing large multimodal models with various applications, evaluation metrics, and safety features.


PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars

http://arxiv.org/abs/2408.08869v1

Compressor summary: PEDAL is a hybrid self-ensembling approach that uses diverse exemplar prompts and LLM aggregation to improve text generation accuracy while reducing inference cost.
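
As a rough illustration of the self-ensembling idea (a minimal sketch of my own, assuming a hypothetical llm_generate call and a simple majority vote in place of the paper's LLM-based aggregation):

import random
from collections import Counter

def llm_generate(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM with greedy decoding.
    return "42"

def pedal_answer(question, exemplar_pool, n_prompts=5, k_exemplars=3, seed=0):
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_prompts):
        # Each prompt samples a different few-shot exemplar set (prompt diversity).
        shots = "\n".join(rng.sample(exemplar_pool, k_exemplars))
        candidates.append(llm_generate(f"{shots}\n\nQ: {question}\nA:"))
    # Aggregate the greedy outputs; a majority vote keeps the sketch self-contained.
    return Counter(candidates).most_common(1)[0][0]

print(pedal_answer("What is 6 * 7?", [f"Q: example {i}\nA: ..." for i in range(10)]))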


DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models

http://arxiv.org/abs/2408.08855v1

Compressor summary: DPA is an unsupervised domain adaptation method for vision-language models that uses dual prototypes, robust self-training, and textual-visual alignment to improve performance on downstream tasks.


GeoTransformer: Enhancing Urban Forecasting with Geospatial Attention Mechanisms

http://arxiv.org/abs/2408.08852v1

Compressor summary: GeoTransformer is a new model that uses geospatial attention to incorporate urban information and improve predictions of GDP and ride-share demand.


PsychoLex: Unveiling the Psychological Mind of Large Language Models

http://arxiv.org/abs/2408.08848v1

Compressor summary: The paper introduces PsychoLex, a set of resources to improve LLMs' performance in psychological tasks, and presents the PsychoLexLLaMA model that outperforms general models in this domain.


FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats

http://arxiv.org/abs/2408.08841v1

Compressor summary: The paper proposes FLEXTAF-Single and FLEXTAF-Vote, two methods that use flexible tabular formats to improve table reasoning performance using Large Language Models (LLMs).


Entropy Coding of Unordered Data Structures

http://arxiv.org/abs/2408.08837v1

Compressor summary: Shuffle coding is a method for compressing unordered sequences of objects using bits-back coding, which works well for various data structures like graphs and molecules.
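
The headline saving is a standard information-theoretic fact worth making concrete (this back-of-envelope is mine, not the paper's algorithm): an unordered collection of n distinct objects has n! equivalent orderings, so an order-invariant code can save up to log2(n!) bits over encoding one arbitrary order.

import math

for n in (10, 100, 1000):
    savings = math.lgamma(n + 1) / math.log(2)  # log2(n!) via the log-gamma function
    print(f"n = {n:4d}: up to {savings:9.1f} bits saved by ignoring order")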


RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba

http://arxiv.org/abs/2408.08827v1

Compressor summary: The paper proposes AINet, a novel network for robust RGBT tracking that efficiently interacts features of all modalities and layers using two fusion mambas with dynamic order adjustment.


LEVIS: Large Exact Verifiable Input Spaces for Neural Networks

http://arxiv.org/abs/2408.08824v1

Compressor summary: The LEVIS framework helps identify verifiable input spaces for neural networks and assess their robustness in safety-critical applications using novel techniques.


Optimal Symmetries in Binary Classification

http://arxiv.org/abs/2408.08823v1

Compressor summary:
Key points:
- The paper explores how group symmetries affect binary classification tasks using a novel framework based on Neyman-Pearson optimality
- Smaller or more appropriate symmetry groups can improve generalisation and sample efficiency, contrary to common intuition
- The paper develops a theoretical foundation for designing group equivariant neural networks that align with the data probability distributions
- The paper shows that optimal performance is associated with subgroups of the likelihood ratio, not the largest possible groups

Summary: The paper proposes a new framework to design group equivariant neural networks that match the symmetry groups and subgroups of the data probability distributions for optimal binary classification tasks.
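
For context, the classical result this framework builds on (stated here for the reader; the paper's group-theoretic extension is more general) is the Neyman-Pearson lemma: the optimal binary classifier thresholds the likelihood ratio

    \Lambda(x) = \frac{p(x \mid y = 1)}{p(x \mid y = 0)} \;\gtrless\; \tau

so a symmetry group can only help classification if its action is compatible with \Lambda, which is why the optimal groups are tied to the likelihood ratio rather than simply being as large as possible.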


PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future

http://arxiv.org/abs/2408.08822v1

Compressor summary: PFDiff is a training-free method that improves the efficiency of existing fast ODE solvers for image generation by skipping timesteps and using gradient replacement and foresight updates.


An Empirical Examination of Balancing Strategy for Counterfactual Estimation on Time Series

http://arxiv.org/abs/2408.08815v1

Compressor summary: The paper examines how well balancing strategies work for counterfactual estimation with time series data, and suggests they may not always be effective.


CAT: Caution Aware Transfer in Reinforcement Learning via Distributional Risk

http://arxiv.org/abs/2408.08812v1

Compressor summary: The CAT framework balances reward and caution to improve safety in transfer RL.


Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge

http://arxiv.org/abs/2408.08808v1

Compressor summary: The paper introduces a novel data pipeline to create diverse, domain-specific benchmarks for evaluating large language models in various applications, improving their usefulness and alignment with human preferences.


CIKMar: A Dual-Encoder Approach to Prompt-Based Reranking in Educational Dialogue Systems

http://arxiv.org/abs/2408.08805v1

Compressor summary: CIKMar is a small but effective dialogue system for education that uses the Gemma language model and dual-encoder ranking to provide relevant answers, although it tends to favor theoretical explanations.


Leveraging FourierKAN Classification Head for Pre-Trained Transformer-based Text Classification

http://arxiv.org/abs/2408.08803v1

Compressor summary: FR-KAN is a better alternative to MLPs for text classification: it improves accuracy and training speed and reduces parameter count when used on top of transformer-based encoders.
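
A sketch of what a Fourier-basis KAN-style head can look like (my own PyTorch toy based on the general FourierKAN idea; the paper's FR-KAN layer may differ in details such as normalization and the frequency grid):

import torch
from torch import nn

class FourierKANHead(nn.Module):
    # Each output is a learned Fourier series over each input feature,
    # replacing the usual Linear/MLP classification head.
    def __init__(self, in_dim: int, out_dim: int, grid: int = 8):
        super().__init__()
        self.register_buffer("k", torch.arange(1, grid + 1).float())  # frequencies
        self.coeffs = nn.Parameter(
            torch.randn(2, out_dim, in_dim, grid) / (in_dim * grid) ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ang = x.unsqueeze(-1) * self.k                          # (B, in_dim, grid)
        basis = torch.stack([torch.cos(ang), torch.sin(ang)])  # (2, B, in, grid)
        # Sum learnable coefficients over basis type, input dims, and frequencies.
        return torch.einsum("sbig,soig->bo", basis, self.coeffs) + self.bias

head = FourierKANHead(768, 4)            # e.g., on top of a 768-d [CLS] embedding
print(head(torch.randn(2, 768)).shape)   # torch.Size([2, 4])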


Representation Learning of Geometric Trees

http://arxiv.org/abs/2408.08799v1

Compressor summary: The paper introduces a new representation learning framework for geometric trees that captures their hierarchical structure and spatial constraints using a unique message passing neural network and self-supervised training targets.


Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer

http://arxiv.org/abs/2408.08793v1

Compressor summary: The paper proposes an Orthogonal Compatible Aligned (OCA) approach to update visual retrieval systems without re-indexing or backfilling, preserving compatibility with old models and achieving state-of-the-art accuracy.


Neighbor Overlay-Induced Graph Attention Network

http://arxiv.org/abs/2408.08788v1

Compressor summary: NO-GAT is a new GNN model that leverages structural information and overlaid neighbors to improve node representations and attention coefficients.


Evaluating the Evaluator: Measuring LLMs' Adherence to Task Evaluation Instructions

http://arxiv.org/abs/2408.08781v1

Compressor summary: The paper investigates how much influence prompting LLMs as judges has on their alignment with human judgments, comparing different levels of instruction and a prompt-free method using perplexity.
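
For the prompt-free perplexity signal, the generic recipe looks like this (a minimal sketch with GPT-2; the paper's judge model and scoring details may differ):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns mean token cross-entropy.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

# Lower perplexity = the model finds the text more natural.
print(perplexity("The capital of France is Paris."))
print(perplexity("The capital of France is banana."))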


Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions

http://arxiv.org/abs/2408.08780v1

Compressor summary: The proposed ensemble prompt framework improves in-context learning performance by describing the selection criteria of multiple examples, but the improvement mainly comes from the ensemble format rather than the descriptive content.


DAC: Decomposed Automation Correction for Text-to-SQL

http://arxiv.org/abs/2408.08779v1

Compressor summary: Decomposed Automation Correction (DAC) is a new approach that improves text-to-SQL performance by decomposing the task into entity linking and skeleton parsing, and then correcting SQL based on the differences between the generated results and the initial query.


NEAR: A Training-Free Pre-Estimator of Machine Learning Model Performance

http://arxiv.org/abs/2408.08776v1

Compressor summary: NEAR is a zero-cost proxy for Neural Architecture Search that uses the effective rank of pre- and post-activation matrices to identify optimal networks without training.
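
The effective rank in question is usually the Roy-Vetterli definition, the exponential of the entropy of the normalized singular values (shown below as a generic sketch; whether NEAR uses exactly this variant isn't clear from the summary):

import numpy as np

def effective_rank(m: np.ndarray) -> float:
    s = np.linalg.svd(m, compute_uv=False)
    p = s / s.sum()                       # singular values as a distribution
    p = p[p > 0]                          # drop zeros before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 64))
print(effective_rank(low_rank))                   # close to 4
print(effective_rank(rng.normal(size=(64, 64))))  # much higher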


Speckle Noise Analysis for Synthetic Aperture Radar (SAR) Space Data

http://arxiv.org/abs/2408.08774v1

Compressor summary: The study compares six different speckle noise reduction techniques for Synthetic Aperture Radar images and recommends Lee or Kuan Filters depending on the application's needs.
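
For reference, the Lee filter the study recommends is a classical adaptive smoother; a textbook implementation (the standard formulation, not code from the study) looks like:

import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img: np.ndarray, size: int = 7) -> np.ndarray:
    mean = uniform_filter(img, size)
    sq_mean = uniform_filter(img * img, size)
    var = sq_mean - mean * mean              # local variance in each window
    noise_var = var.mean()                   # crude global speckle estimate
    gain = var / (var + noise_var + 1e-12)   # smooth flat areas, keep edges
    return mean + gain * (img - mean)

speckled = np.random.default_rng(0).gamma(1.0, 50.0, size=(128, 128))
smoothed = lee_filter(speckled)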


Pessimistic Iterative Planning for Robust POMDPs

http://arxiv.org/abs/2408.08770v1

Compressor summary: The pessimistic iterative planning (PIP) framework finds robust memory-based policies for robust partially observable Markov decision processes by alternating between selecting an adversarial POMDP and computing a finite-state controller for it, using the rFSCNet algorithm which trains recurrent neural networks with supervision policies.


Lower Layer Matters: Alleviating Hallucination via Multi-Layer Fusion Contrastive Decoding with Truthfulness Refocused

http://arxiv.org/abs/2408.08769v1

Compressor summary: LOL (LOwer Layer Matters) is a novel contrastive decoding framework that mitigates hallucination in large language models by fusing lower and final layers and using contextual guidance to enhance factual encoding.


VF-NeRF: Learning Neural Vector Fields for Indoor Scene Reconstruction

http://arxiv.org/abs/2408.08766v1

Compressor summary: The paper proposes a new method, VF-NeRF, that improves surface reconstruction of indoor scenes by using Vector Fields as an implicit representation, which can better model planar surfaces and sharp corners.


SYMPOL: Symbolic Tree-Based On-Policy Reinforcement Learning

http://arxiv.org/abs/2408.08761v1

Compressor summary: SYMPOL is a novel method for learning interpretable, tree-based policies in reinforcement learning using a policy gradient approach.


SE-SGformer: A Self-Explainable Signed Graph Transformer for Link Sign Prediction

http://arxiv.org/abs/2408.08754v1

Compressor summary: The paper introduces a new framework for signed graph neural networks that improves explainability and prediction accuracy using positional encoding, transformer architecture, and nearest neighbor-based decision process.


PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders

http://arxiv.org/abs/2408.08753v1

Compressor summary: The paper proposes PCP-MAE, a method for point cloud self-supervised learning that predicts centers of masked patches and uses them to reconstruct points, improving efficiency and performance over Point-MAE.


Task-Aware Dynamic Transformer for Efficient Arbitrary-Scale Image Super-Resolution

http://arxiv.org/abs/2408.08736v1

Compressor summary: The paper proposes a Task-Aware Dynamic Transformer (TADT) for efficient image super-resolution at arbitrary scales by adapting the feature extraction based on input images and upsampling scales.


Symbolic Parameter Learning in Probabilistic Answer Set Programming

http://arxiv.org/abs/2408.08732v1

Compressor summary: The paper proposes two algorithms to learn probabilities in probabilistic logic programs using symbolic equations, and shows they perform better than existing methods.


ChatZero: Zero-shot Cross-Lingual Dialogue Generation via Pseudo-Target Language

http://arxiv.org/abs/2408.08724v1

Compressor summary: ChatZero is a zero-shot dialogue generation model that uses cross-lingual code-switching and unsupervised contrastive learning to generate responses in low-resource languages.


Beyond KAN: Introducing KarSein for Adaptive High-Order Feature Interaction Modeling in CTR Prediction

http://arxiv.org/abs/2408.08713v1

Compressor summary: KarSein is a novel network that optimizes CTR prediction by adaptively modeling high-order feature interactions efficiently and accurately.


Beam Prediction based on Large Language Models

http://arxiv.org/abs/2408.08707v1

Compressor summary: LLMs improve beam prediction in mmWave communication by converting time series data into text and using Prompt-as-Prefix for contextual enrichment, outperforming traditional LSTM models.


Efficient Multi-Policy Evaluation for Reinforcement Learning

http://arxiv.org/abs/2408.08706v1

Compressor summary: The paper proposes a tailored behavior policy to improve the efficiency and unbiasedness of evaluating multiple target policies in reinforcement learning.


Beyond the Hype: A dispassionate look at vision-language models in medical scenario

http://arxiv.org/abs/2408.08704v1

Compressor summary: The text introduces RadVUQA, a benchmark to assess Large Vision-Language Models' abilities in radiological tasks, revealing weaknesses in multimodal comprehension and quantitative reasoning.


TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning

http://arxiv.org/abs/2408.08703v1

Compressor summary: The paper proposes a novel framework (TsCA) that uses conditional transport theory and the homology of visual-semantic interactions in CZSL to address calibration bias, generalization, and consistency issues in recognizing novel compositions.


HyCoT: Hyperspectral Compression Transformer with an Efficient Training Strategy

http://arxiv.org/abs/2408.08700v1

Compressor summary: The paper introduces HyCoT, a transformer-based autoencoder for pixelwise hyperspectral image compression that outperforms existing models with less computational complexity and faster training.


NFDI4DSO: Towards a BFO Compliant Ontology for Data Science

http://arxiv.org/abs/2408.08698v1

Compressor summary: The NFDI4DS project seeks to improve data accessibility and interoperability in AI by connecting digital artifacts to FAIR principles using a new ontology and knowledge graph.


Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling

http://arxiv.org/abs/2408.08696v1

Compressor summary: Token Recycling is a train-free technique that accelerates inference in large language models by reusing candidate tokens and achieving significant speedup with minimal storage.


Quantifying the Effectiveness of Student Organization Activities using Natural Language Processing

http://arxiv.org/abs/2408.08694v1

Compressor summary: The study develops a machine learning workflow using a BERT LLM and sentiment analysis to measure the effectiveness of student extracurricular activities based on emotional responses.


Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm

http://arxiv.org/abs/2408.08693v1

Compressor summary: The paper introduces a new evaluation paradigm for large language models in medical scenarios, showing their limitations and potential for improvement.


Explore-then-Commit Algorithms for Decentralized Two-Sided Matching Markets

http://arxiv.org/abs/2408.08690v1

Compressor summary: The paper proposes a decentralized algorithm for two-sided matching markets without prior preference knowledge or structural assumptions, and analyzes its performance.


The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation

http://arxiv.org/abs/2408.08688v1

Compressor summary: The paper evaluates multi-agent workflows for PO dataset generation using different prompting strategies and LLM configurations, finding that the LLM Feedback Loop with Llama as generator and Gemma as reviewer achieves high win rates.


Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks?

http://arxiv.org/abs/2408.08685v1

Compressor summary: The paper proposes LLM4RGNN, a framework that uses large language models to improve the robustness of graph neural networks against topology attacks by identifying and adding malicious and missing edges.


Research on Personalized Compression Algorithm for Pre-trained Models Based on Homomorphic Entropy Increase

http://arxiv.org/abs/2408.08684v1

Compressor summary: The article discusses challenges and solutions for deploying efficient and personalized Vision Transformer and Large Language Model AI on mobile devices using model pruning techniques.


LLM-PCGC: Large Language Model-based Point Cloud Geometry Compression

http://arxiv.org/abs/2408.08682v1

Compressor summary: Large language models can effectively compress point cloud data without text description or alignment operations, outperforming existing methods.


A Mean Field Ansatz for Zero-Shot Weight Transfer

http://arxiv.org/abs/2408.08681v1

Compressor summary: The paper proposes a mean field theory to explain how weights from small neural networks can be transferred to large ones, and tests it on simple MLPs and LLMs like GPT-3 and Llama-3.1.


Neural Reward Machines

http://arxiv.org/abs/2408.08677v1

Compressor summary: Neural Reward Machines (NRM) are a neurosymbolic framework that enables agents to reason and learn in non-Markovian RL tasks using semi-supervised symbol grounding, outperforming Deep RL methods.


Fine-tuning LLMs for Autonomous Spacecraft Control: A Case Study Using Kerbal Space Program

http://arxiv.org/abs/2408.08676v1

Compressor summary: This study shows how Large Language Models can be fine-tuned to control spacecraft in the Kerbal Space Program using language inputs and outputs, potentially expanding their applications for space operations.


Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning

http://arxiv.org/abs/2408.08670v1

Compressor summary: ALaST is a method that adapts the importance of layers during ViT fine-tuning and reduces computational cost, memory load, and training time by adjusting their compute budgets.


Robust Stochastic Shortest-Path Planning via Risk-Sensitive Incremental Sampling

http://arxiv.org/abs/2408.08668v1

Compressor summary: The paper proposes a risk-aware path planning algorithm that uses CVaR minimization to create more robust and less conservative paths for SSP problems in high-risk industries.
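
CVaR here is the standard tail-risk measure: the expected cost over the worst alpha-fraction of outcomes. A minimal empirical estimator (the paper embeds this objective inside incremental sampling, which this sketch does not reproduce):

import numpy as np

def cvar(costs: np.ndarray, alpha: float = 0.1) -> float:
    # Mean of the worst alpha-fraction of sampled costs.
    tail = np.sort(costs)[-max(1, int(np.ceil(alpha * costs.size))):]
    return float(tail.mean())

costs = np.random.default_rng(0).lognormal(mean=0.0, sigma=1.0, size=10_000)
print(f"mean     = {costs.mean():.2f}")
print(f"CVaR@0.1 = {cvar(costs, 0.1):.2f}")  # penalizes the heavy right tail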


QMambaBSR: Burst Image Super-Resolution with Query State Space Model

http://arxiv.org/abs/2408.08665v1

Compressor summary: QMambaBSR is a novel network for burst super-resolution that uses inter-frame querying, intra-frame scanning, adaptive upsampling, and sub-pixel extraction to reconstruct high-quality images with rich details.


MIA-Tuner: Adapting Large Language Models as Pre-training Text Detector

http://arxiv.org/abs/2408.08661v1

Compressor summary: MIA-Tuner is a novel method that uses instructions to guide LLMs to detect their own pre-training data, improving detection performance and privacy safeguards.


LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs

http://arxiv.org/abs/2408.08656v1

Compressor summary: The paper evaluates and reduces format bias in large language models, which affects their performance across different formats of tasks like multiple-choice questions, lists, and mappings.


TextCAVs: Debugging vision models using text

http://arxiv.org/abs/2408.08652v1

Compressor summary: TextCAVs is a method that creates concept activation vectors using text descriptions instead of image examples, enabling interactive and cost-effective explanations for deep learning models.


Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning

http://arxiv.org/abs/2408.08651v1

Compressor summary: The text discusses biases in language models, their impact on answer choices, and proposes two methods to reduce bias and improve accuracy.


An End-to-End Model for Photo-Sharing Multi-modal Dialogue Generation

http://arxiv.org/abs/2408.08650v1

Compressor summary: The paper proposes an end-to-end model for photo-sharing multi-modal dialogue generation that integrates a large language model with an image perceptron and generator, enabling better text and image alignment and gradient propagation.


Understanding Enthymemes in Argument Maps: Bridging Argument Mining and Logic-based Argumentation

http://arxiv.org/abs/2408.08648v1

Compressor summary: The paper proposes using classical and default logic to represent arguments identified by argument mining in an argument map, enabling automated reasoning on the argumentation.


Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning

http://arxiv.org/abs/2408.08640v1

Compressor summary: Math-PUMA is a method to improve multimodal large language models' math problem-solving by aligning textual and visual information using KL divergence and instruction tuning.


Magazine Supply Optimization: a Case-study

http://arxiv.org/abs/2408.08637v1

Compressor summary: AthenIA is a solution that optimizes magazine supply for 20,000 points of sale in France using a four-step pipeline and novel quantile regression method.


Historical Printed Ornaments: Dataset and Tasks

http://arxiv.org/abs/2408.08633v1

Compressor summary: The paper uses computer vision to study historical printed ornaments, introducing three tasks and a dataset for evaluation of state-of-the-art models.


A Survey on Benchmarks of Multimodal Large Language Models

http://arxiv.org/abs/2408.08632v1

Compressor summary:
Key points:
- The paper reviews 180 benchmarks and evaluation methods for MLLMs
- It covers perception, understanding, cognition, reasoning, domains, capabilities, and other modalities
- It argues that evaluation is crucial to support MLLM development

Summary: The paper presents a comprehensive survey of 180 benchmarks and evaluation methods for multimodal large language models (MLLMs), covering various aspects and applications, and emphasizing the importance of evaluation for their development.


Persona is a Double-edged Sword: Enhancing the Zero-shot Reasoning by Ensembling the Role-playing and Neutral Prompts

http://arxiv.org/abs/2408.08631v1

Compressor summary: The Jekyll & Hyde framework ensembles role-playing and neutral prompts to improve LLMs' reasoning ability, selecting the better solution from the two prompts with a robust evaluator that reduces position bias.


Navigating Uncertainties in Machine Learning for Structural Dynamics: A Comprehensive Review of Probabilistic and Non-Probabilistic Approaches in Forward and Inverse Problems

http://arxiv.org/abs/2408.08629v1

Compressor summary: This paper reviews uncertainty-aware methods for machine learning in structural dynamics, emphasizing Bayesian neural networks and their applications.


RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions

http://arxiv.org/abs/2408.08624v1

Compressor summary:
Key points:
- Clinical question answering systems can help clinicians but need realistic data
- RealMedQA is a new dataset of clinical questions generated by humans and an LLM
- The authors compare QA models on BioASQ and RealMedQA and show the LLM is more efficient
- They release their code and data to encourage further research

Summary: The authors introduce RealMedQA, a realistic dataset of clinical questions for question answering systems, and show that an LLM can generate better QA pairs than existing methods.


DeepDFA: Automata Learning through Neural Probabilistic Relaxations

http://arxiv.org/abs/2408.08622v1

Compressor summary: DeepDFA is a new method to find Deterministic Finite Automata from traces using a differentiable and discrete model that is interpretable, efficient, and robust.


Generative Dataset Distillation Based on Diffusion Model

http://arxiv.org/abs/2408.08610v1

Compressor summary: The paper proposes a novel high-speed and high-quality generative dataset distillation method based on Stable Diffusion, achieving significant improvement in image generation per class and placing third in the ECCV 2024 challenge.


Bi-Directional Deep Contextual Video Compression

http://arxiv.org/abs/2408.08604v1

Compressor summary: The paper presents a new bi-directional deep contextual video compression scheme (DCVC-B) for B-frames that significantly improves their compression performance compared to traditional codecs and recent advancements.


Learning A Low-Level Vision Generalist via Visual Task Prompt

http://arxiv.org/abs/2408.08601v1

Compressor summary: The paper proposes VPIP, a framework that uses visual task prompts to handle multiple low-level vision tasks with different input-target domains, improving image reconstruction quality and performance over existing methods.


RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction

http://arxiv.org/abs/2408.08593v1

Compressor summary:
Key points:
- Radio map (RM) technology can reduce communication costs for 6G networks by using location information
- Existing NN-based methods have suboptimal performance due to misalignment between generative and discriminative modeling
- RadioDiff is a new method that uses denoised diffusion, an attention U-Net, and decoupled diffusion to improve RM construction quality
- RadioDiff achieves state-of-the-art performance in accuracy, structural similarity, and peak signal-to-noise ratio

Summary: RadioDiff is a novel method for constructing radio maps that uses advanced neural networks to exploit location information and achieve high accuracy and quality in 6G network applications.


A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models

http://arxiv.org/abs/2408.08590v1

Compressor summary: This paper investigates how auto-regressive Language Models perform syllogistic reasoning and discovers a transferable circuit involving middle-term suppression for deriving valid conclusions, but finds that these mechanisms are influenced by world knowledge and not general logical principles.


GrassNet: State Space Model Meets Graph Neural Network

http://arxiv.org/abs/2408.08583v1

Compressor summary: GrassNet is a novel graph neural network that uses structured state space models to design and learn arbitrary graph spectral filters, overcoming limitations of traditional polynomial methods in spectral graph learning.


EraW-Net: Enhance-Refine-Align W-Net for Scene-Associated Driver Attention Estimation

http://arxiv.org/abs/2408.08570v1

Compressor summary: EraW-Net is a novel method for estimating driver attention in scenes across two fields of view using a W-shaped architecture, dynamic adaptive filtering, and global context sharing.


Unsupervised Non-Rigid Point Cloud Matching through Large Vision Models

http://arxiv.org/abs/2408.08568v1

Compressor summary: The paper presents a new learning-based framework for matching non-rigid point clouds using semantic features from large vision models, which improves generalization, robustness, and performance on various challenging datasets.


S$^3$Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching

http://arxiv.org/abs/2408.08567v1

Compressor summary: S$^3$Attention is a novel attention structure that balances information preservation and computation reduction by using a smoothing block and a matrix sketching method to handle long sequences in linear complexity.


Overview of the BioLaySumm 2024 Shared Task on the Lay Summarization of Biomedical Research Articles

http://arxiv.org/abs/2408.08566v1

Compressor summary: The paper describes the second edition of a shared task on summarizing biomedical research articles, which attracted more participants and saw increased use of large language models.


A New Chinese Landscape Paintings Generation Model based on Stable Diffusion using DreamBooth

http://arxiv.org/abs/2408.08561v1

Compressor summary: The study proposes a method that combines two techniques to generate Chinese landscape paintings with high quality and fidelity, outperforming other models.


A training regime to learn unified representations from complementary breast imaging modalities

http://arxiv.org/abs/2408.08560v1

Compressor summary: The text discusses a machine learning method to improve breast cancer screening using both DBT and FFDM images, which could reduce reliance on FFDM and increase accuracy in detecting lesions.


ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models

http://arxiv.org/abs/2408.08554v1

Compressor summary: ABQ-LLM is a novel quantization algorithm and inference framework that enables efficient arbitrary-precision quantized inference on GPUs, addressing the challenges of low-bit quantization and limited integer computing units in large language models.
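
To ground the terminology, symmetric uniform quantization to n bits works as below (a textbook sketch; ABQ-LLM's arbitrary-bit decomposition and GPU kernels go well beyond this):

import numpy as np

def quantize(w: np.ndarray, bits: int):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax           # map the weight range onto the grid
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
for bits in (8, 4, 2):
    q, s = quantize(w, bits)
    err = np.abs(w - dequantize(q, s)).mean()
    print(f"{bits}-bit: mean abs error {err:.4f}")  # error grows as bits shrink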


Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection

http://arxiv.org/abs/2408.08551v1

Compressor summary: The Multi-view Mixture-of-Experts Model (MvP) is a new method for detecting personality traits by analyzing user posts from multiple perspectives, improving performance and addressing previous limitations.


String Diagram of Optimal Transports

http://arxiv.org/abs/2408.08550v1

Compressor summary: The paper introduces a hierarchical framework of optimal transports using string diagrams and proposes a new algorithm to solve safety problems on them using cost matrix compositions.


SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models

http://arxiv.org/abs/2408.08545v1

Compressor summary: SelectLLM is a novel algorithm that efficiently selects a subset of large language models to overcome their individual limitations and achieve competitive performance on complex tasks with reduced latency.


Scaling up Multimodal Pre-training for Sign Language Understanding

http://arxiv.org/abs/2408.08544v1

Compressor summary: Sign language is a vital communication tool for the deaf-mute community that involves hand gestures and body movements, and various tasks are being studied to help hearing people understand it better.


Where is the signal in tokenization space?

http://arxiv.org/abs/2408.08541v1

Compressor summary: The paper explores non-canonical tokenizations in large language models and shows that aggregating their probabilities can improve model performance.
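
The marginalization the paper exploits can be shown on a toy unigram model (entirely illustrative vocabulary and probabilities, not the paper's setup): a string's probability is the sum over every tokenization that spells it, not just the canonical one.

from functools import lru_cache

vocab = {"un": 0.10, "believ": 0.05, "able": 0.10,
         "unbeliev": 0.02, "believable": 0.03}   # toy token probabilities

def string_prob(s: str) -> float:
    @lru_cache(maxsize=None)
    def rec(i: int) -> float:
        if i == len(s):
            return 1.0
        # Sum over every vocabulary token that matches at position i.
        return sum(p * rec(i + len(t))
                   for t, p in vocab.items() if s.startswith(t, i))
    return rec(0)

# Covers un|believ|able, unbeliev|able, and un|believable alike.
print(string_prob("unbelievable"))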


CommunityKG-RAG: Leveraging Community Structures in Knowledge Graphs for Advanced Retrieval-Augmented Generation in Fact-Checking

http://arxiv.org/abs/2408.08535v1

Compressor summary: CommunityKG-RAG is a new framework that uses community structures within Knowledge Graphs to improve the accuracy and relevance of information retrieval for fact-checking, without needing extra training.


Detecting Unsuccessful Students in Cybersecurity Exercises in Two Different Learning Environments

http://arxiv.org/abs/2408.08531v1

Compressor summary: The paper evaluates using data from cybersecurity exercises to predict students' performance and suggests automated tools to aid instructors in helping struggling students.


Privacy-Preserving Vision Transformer Using Images Encrypted with Restricted Random Permutation Matrices

http://arxiv.org/abs/2408.08529v1

Compressor summary: The proposed encryption method enhances vision transformer performance when working with encrypted images.


Focus on Focus: Focus-oriented Representation Learning and Multi-view Cross-modal Alignment for Glioma Grading

http://arxiv.org/abs/2408.08527v1

Compressor summary: The text introduces a new framework called Focus on Focus (FoF) that improves glioma grading by enhancing molecular-pathology representation and aligning histopathology features with molecular biomarkers using paired training and pathology-only inference.


Inverse design with conditional cascaded diffusion models

http://arxiv.org/abs/2408.08526v1

Compressor summary: The conditional cascaded diffusion model (cCDM) is a machine learning approach for inverse design that leverages diffusion models' strengths to predict higher-resolution solutions from lower ones, but its performance depends on the amount of high-resolution training data available.


GS-ID: Illumination Decomposition on Gaussian Splatting via Diffusion Prior and Parametric Light Source Optimization

http://arxiv.org/abs/2408.08524v1

Compressor summary: GS-ID is a framework that decomposes light sources in images using priors, environmental and direct components, learnable environment maps, and Spherical Gaussians for realistic relighting on Gaussian Splatting.


Visual-Friendly Concept Protection via Selective Adversarial Perturbations

http://arxiv.org/abs/2408.08518v1

Compressor summary: The paper proposes VCPro, a framework to protect key concepts in images with less visible perturbations using adversarial techniques.


Mitigating Degree Bias in Signed Graph Neural Networks

http://arxiv.org/abs/2408.08508v1

Compressor summary: The paper proposes Degree Debiased Signed Graph Neural Network (DD-SGNN) to address fairness issues related to signed graphs and evaluates its effectiveness on four real-world datasets.


Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding

http://arxiv.org/abs/2408.08506v1

Compressor summary: The paper proposes Ex3, a method to generate high-quality long novels using structure information extraction and fine-tuning of a large language model, improving over existing approaches.


Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness

http://arxiv.org/abs/2408.08502v1

Compressor summary: The paper proposes an Image-to-Image diffusion classifier that uses image translation to predict distinguishable labels for adversarial robustness, reducing computational costs and improving performance compared to existing methods.


The Limitations of Model Retraining in the Face of Performativity

http://arxiv.org/abs/2408.08499v1

Compressor summary: Stochastic optimization for performative shifts requires regularization to achieve optimal models.


Achieving Complex Image Edits via Function Aggregation with Diffusion Models

http://arxiv.org/abs/2408.08495v1

Compressor summary: FunEditor is a diffusion model that learns atomic editing functions to perform complex edits simultaneously, faster, and more accurately than existing methods.


Fishers Harvest Parallel Unlearning in Inherited Model Networks

http://arxiv.org/abs/2408.08493v1

Compressor summary: The paper proposes a novel unlearning framework that enables parallel unlearning among models with inheritance, using a new graph and algorithm that leverages Fisher Information Matrix to reduce computational overhead and achieve effective unlearning.


Adversarial Contrastive Learning Based Physics-Informed Temporal Networks for Cuffless Blood Pressure Estimation

http://arxiv.org/abs/2408.08488v1

Compressor summary: The paper proposes a novel method, PITN, that combines physics-informed neural networks with adversarial contrastive learning to estimate cuffless blood pressure accurately with limited data.


An Unsupervised Learning Framework Combined with Heuristics for the Maximum Minimal Cut Problem

http://arxiv.org/abs/2408.08484v1

Compressor summary: The paper proposes an unsupervised learning framework with heuristics for solving the Maximum Minimal Cut Problem, a challenging combinatorial optimization problem, by using graph neural networks and tree transformations.


Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models

http://arxiv.org/abs/2408.08470v1

Compressor summary: The paper proposes a contextual bandit approach to select an assistant model for large language models, improving performance under resource constraints.


A theory of understanding for artificial intelligence: composability, catalysts, and learning

http://arxiv.org/abs/2408.08463v1

Compressor summary: The paper proposes a composability framework for analyzing understanding in various subjects, including AIs, and explores the role of learning ability and catalysts in enhancing output quality.