arxiv compressed, 2024-08-12

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-12, generated by Compressor, my personal LLM-based project.


VITA: Towards Open-Source Interactive Omni Multimodal LLM

http://arxiv.org/abs/2408.05211v1

Compressor summary: VITA is an open-source multimodal language model that excels at processing and analyzing video, image, text, and audio modalities while offering advanced interactive experiences.


Multi-Garment Customized Model Generation

http://arxiv.org/abs/2408.05206v1

Compressor summary: The paper presents a method to generate realistic images of dressed models with various clothing combinations using a garment encoder and multi-garment feature fusion.


Kalman-Inspired Feature Propagation for Video Face Super-Resolution

http://arxiv.org/abs/2408.05205v1

Compressor summary: The paper proposes a new method, KEEP, for improving the quality of low-resolution face videos by using Kalman filtering principles to maintain temporal consistency and facial details.
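Background sketch (mine, not KEEP's actual formulation, which operates on deep facial features): the Kalman predict/update cycle the method draws on can be illustrated with a scalar state tracked through noisy per-frame observations.

```python
import numpy as np

def kalman_smooth_1d(zs, q=1e-3, r=0.25):
    # Scalar Kalman filter: a latent value x_t evolves slowly (process
    # noise q); zs are noisy per-frame observations (measurement noise r).
    x, p = zs[0], 1.0
    out = []
    for z in zs:
        p = p + q                  # predict: uncertainty grows
        k = p / (p + r)            # Kalman gain
        x = x + k * (z - x)        # update: blend prediction with measurement
        p = (1.0 - k) * p
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(1)
truth = np.ones(100)
noisy = truth + rng.normal(0, 0.5, 100)
smoothed = kalman_smooth_1d(noisy)
# The filtered track is closer to the truth than raw per-frame measurements.
print(np.abs(smoothed - truth).mean() < np.abs(noisy - truth).mean())
```

The same predict/update idea, applied to feature maps instead of scalars, is what gives the video frames their temporal consistency.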


TaSL: Task Skill Localization and Consolidation for Language Model Continual Learning

http://arxiv.org/abs/2408.05200v1

Compressor summary: TaSL is a novel CL framework for language models that enhances knowledge transfer without relying on memory replay by dividing the model into skill units, localizing their importance for new tasks, and consolidating them to prevent forgetting and enable bi-directional knowledge transfer.


Cell Morphology-Guided Small Molecule Generation with GFlowNets

http://arxiv.org/abs/2408.05196v1

Compressor summary: The text describes a novel approach for molecule design using unsupervised multimodal joint embedding that generates new molecules with similar phenotypic effects to a given image target without pre-annotated labels.


HistoKernel: Whole Slide Image Level Maximum Mean Discrepancy Kernels for Pan-Cancer Predictive Modelling

http://arxiv.org/abs/2408.05195v1

Compressor summary: HistoKernel is a novel method that uses Maximum Mean Discrepancy to measure distributional differences between Whole Slide Images and improve prediction performance in computational pathology.
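As background (not the paper's implementation): the Maximum Mean Discrepancy at the core of HistoKernel is a standard kernel two-sample statistic. A minimal NumPy sketch with an RBF kernel, using synthetic vectors as stand-ins for per-slide patch embeddings:

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2).
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

def mmd_squared(X, Y, gamma=1.0 / 16):
    # Biased (V-statistic) estimate of squared MMD between samples X and Y;
    # it is close to 0 when X and Y come from the same distribution.
    kxx = rbf_kernel(X, X, gamma).mean()
    kyy = rbf_kernel(Y, Y, gamma).mean()
    kxy = rbf_kernel(X, Y, gamma).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
same = mmd_squared(rng.normal(0, 1, (200, 8)), rng.normal(0, 1, (200, 8)))
diff = mmd_squared(rng.normal(0, 1, (200, 8)), rng.normal(3, 1, (200, 8)))
print(same < diff)  # a distribution shift yields a much larger MMD
```

Treating each Whole Slide Image as a bag of patch embeddings lets this distance act as a slide-level kernel for downstream predictive models.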


Separating Style from Substance: Enhancing Cross-Genre Authorship Attribution through Data Selection and Presentation

http://arxiv.org/abs/2408.05192v1

Compressor summary: The authors propose methods to improve machine authorship attribution by reducing reliance on topic information and focusing more on style across different genres and topics.


Deep-change at AXOLOTL-24: Orchestrating WSD and WSI Models for Semantic Change Modeling

http://arxiv.org/abs/2408.05184v1

Compressor summary: The paper presents new methods for modeling semantic change and a word-definition mismatch detection model that can improve understanding of polysemous words across different time periods.


ECG-FM: An Open Electrocardiogram Foundation Model

http://arxiv.org/abs/2408.05178v1

Compressor summary: ECG-FM is an open foundation model for ECG analysis that uses transformer architecture, pretraining on large data with ECG-specific augmentations and contrastive learning, achieving strong performance, rich embeddings, and interpretability in various downstream tasks.


Beyond Closure Models: Learning Chaotic-Systems via Physics-Informed Neural Operators

http://arxiv.org/abs/2408.05177v1

Compressor summary: The paper proposes a physics-informed neural operator (PINO) that can accurately predict the long-term behavior of chaotic systems without requiring expensive full-resolution simulations, achieving significant speedup and reduced error compared to closure models.


EasyInv: Toward Fast and Better DDIM Inversion

http://arxiv.org/abs/2408.05159v1

Compressor summary: EasyInv improves DDIM Inversion by better approximating inversion noise and prioritizing the initial latent state for faster and more accurate results.


Meta-Learning Guided Label Noise Distillation for Robust Signal Modulation Classification

http://arxiv.org/abs/2408.05151v1

Compressor summary:
Key points:
- The paper proposes a meta-learning guided label noise distillation method for robust automatic modulation classification (AMC).
- The method uses a teacher-student heterogeneous network (TSHN) framework to distill and reuse label noise.
- A multi-view signal (MVS) method improves performance on hard-to-classify categories with few-shot trusted labels.
Summary: The paper presents a robust AMC method that combines a TSHN framework and an MVS method to handle label noise and improve performance for IoT security.


Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

http://arxiv.org/abs/2408.05147v1

Compressor summary: Gemma Scope is a suite of sparse autoencoders trained on various layers of Gemma 2 models to facilitate safety and interpretability research.


Performative Prediction on Games and Mechanism Design

http://arxiv.org/abs/2408.05146v1

Compressor summary: The paper examines the consequences of performative prediction in multiagent scenarios, where accuracy maximization can harm social welfare, and proposes a method based on Bayesian agent behavior modeling to improve outcomes.


A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning

http://arxiv.org/abs/2408.05141v1

Compressor summary: The paper introduces a hybrid retrieval-augmented generation system that improves accuracy, reasoning, and numerical computation by integrating external knowledge bases and optimizing various components, achieving significant results on the CRAG dataset.


Cycle-Configuration: A Novel Graph-theoretic Descriptor Set for Molecular Inference

http://arxiv.org/abs/2408.05136v1

Compressor summary: The paper introduces cycle-configuration descriptors for molecular inference, which enable better prediction of chemical properties using mixed integer linear programming and machine learning.


Range Membership Inference Attacks

http://arxiv.org/abs/2408.05131v1

Compressor summary: Range membership inference attacks (RaMIAs) are a new method to measure privacy risks in machine learning models by testing if the model was trained on data in a specified range, rather than just checking for exact matches.


Cautious Calibration in Binary Classification

http://arxiv.org/abs/2408.05120v1

Compressor summary: The text introduces cautious calibration, a method to make machine learning probabilities intentionally underconfident for better decision-making in high-risk scenarios.


How Well Do LLMs Identify Cultural Unity in Diversity?

http://arxiv.org/abs/2408.05102v1

Compressor summary: CUNIT is a dataset for evaluating large language models' understanding of cultural unity across clothing and food concepts in 10 countries, revealing their limitations compared to humans.


MooER: LLM-based Speech Recognition and Translation Models from Moore Threads

http://arxiv.org/abs/2408.05101v1

Compressor summary: MooER is an LLM-based ASR/AST model that uses pseudo-labeled data for training and achieves performance competitive with open-source models.


Unlocking Decoding-time Controllability: Gradient-Free Multi-Objective Alignment with Contrastive Prompts

http://arxiv.org/abs/2408.05094v1

Compressor summary: MCA is a new method that uses expert and adversarial prompts to balance multiple alignment objectives of large language models without training separate models for each preference.


Order Matters in Hallucination: Reasoning Order as Benchmark and Reflexive Prompting for Large-Language-Models

http://arxiv.org/abs/2408.05093v1

Compressor summary: The study proposes a new method to assess and improve the consistency of large language models by comparing answers generated through different approaches, addressing the issue of fabricated responses and justifications.


PriPHiT: Privacy-Preserving Hierarchical Training of Deep Neural Networks

http://arxiv.org/abs/2408.05092v1

Compressor summary: The paper proposes a method to train deep neural networks on edge devices and cloud servers while protecting sensitive data from leaks.


Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation

http://arxiv.org/abs/2408.05090v1

Compressor summary: Loc4Plan is a novel framework for outdoor navigation tasks that uses spatial perception to improve action planning by locating the agent's position before following instructions.


UNIC: Universal Classification Models via Multi-teacher Distillation

http://arxiv.org/abs/2408.05088v1

Compressor summary:
Key points:
- The paper proposes a unique encoder that combines several pretrained models using multi-teacher distillation.
- It analyzes the benefits of improvements to the basic distillation setup, such as expendable projectors and teacher dropping.
- The resulting encoder matches or beats the best teacher on each classification task.
Summary: The paper presents a method to learn a unique encoder from multiple pretrained models using multi-teacher distillation with various improvements, and shows its effectiveness on different classification tasks.


Bootstrap Latents of Nodes and Neighbors for Graph Self-Supervised Learning

http://arxiv.org/abs/2408.05087v1

Compressor summary: Our proposed method improves graph self-supervised learning by introducing noisy positive pairs from neighboring nodes and using cross-attention to score their supportiveness, reducing computation and enhancing downstream tasks.


Generating novel experimental hypotheses from language models: A case study on cross-dative generalization

http://arxiv.org/abs/2408.05086v1

Compressor summary: The authors use neural network language models to investigate cross-dative generalization in language acquisition and propose a novel hypothesis regarding the role of exposure context features and harmonic alignment.


Generalizing Few Data to Unseen Domains Flexibly Based on Label Smoothing Integrated with Distributionally Robust Optimization

http://arxiv.org/abs/2408.05082v1

Compressor summary: The paper introduces distributionally robust optimization to label smoothing for deep neural networks, improving their generalization on small-scale datasets by flexibly shifting data to unseen domains.
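The distributionally robust coupling is the paper's contribution; the label-smoothing building block itself is standard and easy to sketch (illustrative code, not the authors'):

```python
import numpy as np

def smooth_labels(one_hot, eps=0.1):
    # Standard label smoothing: move eps of the probability mass from the
    # true class to a uniform distribution over all K classes.
    k = one_hot.shape[-1]
    return one_hot * (1.0 - eps) + eps / k

targets = np.eye(3)[[0, 2]]          # two one-hot targets, K = 3 classes
smoothed = smooth_labels(targets, eps=0.1)
print(smoothed[0])  # [0.93333333 0.03333333 0.03333333]
```

The paper's idea, roughly, is to choose how that mass is redistributed adversarially over a distribution ball rather than uniformly, which is what "flexibly shifting data to unseen domains" refers to.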


RT-Surv: Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records

http://arxiv.org/abs/2408.05074v1

Compressor summary: Large language models can improve radiotherapy survival prediction by structuring unstructured electronic health record data, achieving high accuracy and interpretability.


Masked adversarial neural network for cell type deconvolution in spatial transcriptomics

http://arxiv.org/abs/2408.05065v1

Compressor summary: The text introduces MACD, a method that uses adversarial learning to align spatial transcriptomics and single-cell RNA sequencing data for cell type deconvolution in disease tissues.


GLEAMS: Bridging the Gap Between Local and Global Explanations

http://arxiv.org/abs/2408.05060v1

Compressor summary: GLEAMS is a method that combines local and global approaches to explain machine learning algorithms by partitioning the input space and learning interpretable models in each subregion.


Graph Neural Networks as Ordering Heuristics for Parallel Graph Coloring

http://arxiv.org/abs/2408.05054v1

Compressor summary: The paper presents a graph neural network (GNN) based ordering heuristic for the graph coloring problem that balances quality, performance, and scalability, outperforming existing greedy heuristics.


BoFire: Bayesian Optimization Framework Intended for Real Experiments

http://arxiv.org/abs/2408.05040v1

Compressor summary: BoFire is an open-source Python package that uses Bayesian Optimization and other experimental designs to develop and optimize new chemistry, with features that make it adaptable to real-world settings like self-driving laboratories.


A conformalized learning of a prediction set with applications to medical imaging classification

http://arxiv.org/abs/2408.05037v1

Compressor summary: The text introduces an algorithm that improves uncertainty quantification for medical imaging classifiers by generating prediction sets with specified probabilities, achieving better performance than existing methods.


Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil

http://arxiv.org/abs/2408.05035v1

Compressor summary: The study compares GPT-3.5 and 4 with MariTalk and human performance on the ENEM test, a standardized exam for university admission in Brazil, to assess AI biases and differences in answering styles.


Livestock Fish Larvae Counting using DETR and YOLO based Deep Networks

http://arxiv.org/abs/2408.05032v1

Compressor summary: The authors evaluate neural network architectures for fish larvae counting using a new annotated image dataset.


Collaborative Static-Dynamic Teaching: A Semi-Supervised Framework for Stripe-Like Space Target Detection

http://arxiv.org/abs/2408.05029v1

Compressor summary: The paper introduces a novel semi-supervised learning framework for detecting stripe-like space targets, which improves generalization and uses a feedback loop to enhance pseudo-label quality.


Investigating a Benchmark for Training-set free Evaluation of Linguistic Capabilities in Machine Reading Comprehension

http://arxiv.org/abs/2408.05023v1

Compressor summary: The text proposes an alternative method for evaluating NLP systems using synthetic data sets that are natural and diverse, while not relying on crowd-sourced datasets.


RadarPillars: Efficient Object Detection from 4D Radar Point Clouds

http://arxiv.org/abs/2408.05020v1

Compressor summary:
Key points:
- 4D radar data is sparse and carries velocity information.
- Existing deep learning methods for 3D object detection are not optimized for this data.
- RadarPillars is a pillar-based network that decomposes radial velocity, uses PillarAttention, and studies layer scaling.
- RadarPillars achieves better detection results, efficiency, and real-time performance on the View-of-Delft dataset.
Summary: RadarPillars is a novel network for 3D object detection from 4D radar data that leverages radial velocity decomposition, PillarAttention, and layer scaling to outperform existing methods in accuracy, efficiency, and real-time performance.


Instruction Tuning-free Visual Token Complement for Multimodal LLMs

http://arxiv.org/abs/2408.05019v1

Compressor summary: The Visual Token Complement framework helps multimodal language models improve their responses by generating complementary visual tokens from text-to-image generation, without requiring additional image-text pairs.


DreamCouple: Exploring High Quality Text-to-3D Generation Via Rectified Flow

http://arxiv.org/abs/2408.05008v1

Compressor summary: The paper adapts Score Distillation Sampling to rectified flow models for 3D generation, addressing the over-smoothing issue with a novel coupled noise method called DreamCouple and achieving state-of-the-art results on NeRF and Gaussian splatting.


ProFuser: Progressive Fusion of Large Language Models

http://arxiv.org/abs/2408.04998v1

Compressor summary: The paper proposes ProFuser, a method that fuses large language models by considering both training and inference modes, improving performance in knowledge, reasoning, and safety.


On the use of neurosymbolic AI for defending against cyber attacks

http://arxiv.org/abs/2408.04996v1

Compressor summary: The paper argues for combining connectionist and symbolic AI to improve detection and response to cyber attacks, proposes use cases and challenges, and shows two experiments demonstrating feasibility of neurosymbolic AI.


Get Confused Cautiously: Textual Sequence Memorization Erasure with Selective Entropy Maximization

http://arxiv.org/abs/2408.04983v1

Compressor summary: The paper introduces EMSO, a framework for textual sequence memorization erasure in large language models, which balances the trade-off between effectiveness and model utility using entropy maximization and contrastive gradient metric.


reCSE: Portable Reshaping Features for Sentence Embedding in Self-supervised Contrastive Learning

http://arxiv.org/abs/2408.04975v1

Compressor summary: The text introduces a new sentence representation framework called reCSE that uses feature reshaping to improve semantic similarity and reduce memory consumption.


Towards aerodynamic surrogate modeling based on β-variational autoencoders

http://arxiv.org/abs/2408.04969v1

Compressor summary: The proposed surrogate model combines a β-VAE with PCA and Gaussian Process Regression to predict pressure distributions on a transonic wing using flight conditions, providing a fast, cost-effective, and accurate alternative to high-fidelity CFD data.


Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks

http://arxiv.org/abs/2408.04965v1

Compressor summary: The study investigates memorisation in neural language models across 12 tasks, finding it to be gradual, task-dependent, and challenging the generalisation-first hypothesis.


DAFT-GAN: Dual Affine Transformation Generative Adversarial Network for Text-Guided Image Inpainting

http://arxiv.org/abs/2408.04962v1

Compressor summary: The paper introduces DAFT-GAN, a model that uses two affine transformation networks to combine text and image features for text-guided image inpainting, improving quality and consistency.


Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery

http://arxiv.org/abs/2408.04958v1

Compressor summary: VQLA is a new method that improves surgical VQA by providing precise and context-aware answers to questions about surgical images, while C²G-ViL embeddings ensure safety and robustness in surgical scenarios.


LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description

http://arxiv.org/abs/2408.04957v1

Compressor summary: The paper introduces LLaVA-VSD, a large language-and-vision model that can classify, describe, and generate diverse sentences for visual spatial relationships in images using figure-caption pairs and a refinement step.


Model Debiasing by Learnable Data Augmentation

http://arxiv.org/abs/2408.04955v1

Compressor summary: This paper proposes a 2-stage learning pipeline that uses data augmentation and over-biased models to improve Deep Neural Networks' generalization capabilities on biased and unbiased datasets.


HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction

http://arxiv.org/abs/2408.04948v1

Compressor summary: The HybridRAG technique combines VectorRAG and GraphRAG to improve question-answer systems for extracting information from complex financial documents, outperforming traditional methods.


Quantitative Information Extraction from Humanitarian Documents

http://arxiv.org/abs/2408.04941v1

Compressor summary: The authors provide an annotated dataset and a custom NLP pipeline for extracting quantitative information from humanitarian texts, improving performance especially in documents about the Dominican Republic and some African countries.


Capsule Vision 2024 Challenge: Multi-Class Abnormality Classification for Video Capsule Endoscopy

http://arxiv.org/abs/2408.04940v1

Compressor summary: The Capsule Vision 2024 Challenge is a virtual competition on detecting abnormalities in video capsule endoscopy images using multi-class classification.


UAV-Enhanced Combination to Application: Comprehensive Analysis and Benchmarking of a Human Detection Dataset for Disaster Scenarios

http://arxiv.org/abs/2408.04922v1

Compressor summary: The C2A dataset improves machine learning models' performance in detecting humans in disaster scenes, enhancing search and rescue operations.


Avoid Wasted Annotation Costs in Open-set Active Learning with Pre-trained Vision-Language Model

http://arxiv.org/abs/2408.04917v1

Compressor summary: CLIPNAL is a novel active learning strategy that minimizes annotation costs by selectively collecting informative data without requiring out-of-distribution samples.


PTrajM: Efficient and Semantic-rich Trajectory Learning with Pretrained Trajectory-Mamba

http://arxiv.org/abs/2408.04916v1

Compressor summary: The paper introduces PTrajM, a novel method for efficient and semantic-rich vehicle trajectory learning that can extract continuous movement behavior and travel purposes from irregular and discrete trajectory points.


Knowledge Base Embeddings: Semantics and Theoretical Properties

http://arxiv.org/abs/2408.04913v1

Compressor summary: This paper studies how to embed knowledge bases in description logic into vector spaces while considering conceptual knowledge and geometric-based semantics.


A Geometric Nash Approach in Tuning the Learning Rate in Q-Learning Algorithm

http://arxiv.org/abs/2408.04911v1

Compressor summary: The paper presents a geometric method to optimize the learning rate in Q learning by linking it to the angle between time steps and reward vector, using the angular bisector and Nash Equilibrium concepts.
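The geometric/Nash rule for choosing the learning rate is the paper's contribution; for context, here is the standard tabular Q-learning update in which that learning rate α appears (illustrative sketch, not the paper's code):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Tabular Q-learning: move Q(s, a) a fraction alpha of the way toward
    # the bootstrapped TD target r + gamma * max_a' Q(s', a').
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((2, 2))            # 2 states x 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1, alpha=0.5)
print(Q[0, 1])  # 0.5 : half of the TD error (1.0 - 0.0) is applied
```

Tuning α well matters because it scales every TD error; the paper's proposal is to derive it geometrically from the angle between successive updates and the reward vector rather than fixing it by hand.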


Unleashing Artificial Cognition: Integrating Multiple AI Systems

http://arxiv.org/abs/2408.04910v1

Compressor summary: The study introduces a novel AI system that combines language models and query analysis to explain its decisions in Chess and potentially other domains.


Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy and Novel Ensemble Method

http://arxiv.org/abs/2408.04909v1

Compressor summary: The authors provide a comprehensive survey and taxonomy of image captioning metrics, propose EnsembEval, an ensemble of methods with the highest correlation to human judgements, and suggest that more diverse metrics can improve image captioning evaluation.


Towards a Generative Approach for Emotion Detection and Reasoning

http://arxiv.org/abs/2408.04906v1

Compressor summary: The paper proposes a novel generative question-answering method to detect and reason about emotions using large language models, instead of relying on textual entailment models with fixed labels.


GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models

http://arxiv.org/abs/2408.04905v1

Compressor summary: The text discusses the issue of glitch tokens in large language models, which can cause incorrect or harmful outputs, and presents GlitchProber, a tool for detecting and mitigating these tokens.


Axiomatic Characterisations of Sample-based Explainers

http://arxiv.org/abs/2408.04903v1

Compressor summary: The paper examines feature-based explainers for black-box classifiers, identifies desirable properties, characterizes their subfamilies, and introduces new instances with guaranteed existence and consistency of explanations.


Communicate to Play: Pragmatic Reasoning for Efficient Cross-Cultural Communication in Codenames

http://arxiv.org/abs/2408.04900v1

Compressor summary: RSA+C3, a method to improve cross-cultural communication, is tested and shown to ease collaboration in the game Codenames Duet by inferring sociocultural context from interaction.


Better Not to Propagate: Understanding Edge Uncertainty and Over-smoothing in Signed Graph Neural Networks

http://arxiv.org/abs/2408.04895v1

Compressor summary: The paper proposes a method to estimate graph properties and dynamically select between blocked and signed propagation schemes for GNNs, improving their performance on both homophilic and heterophilic graphs.


Clustering-friendly Representation Learning for Enhancing Salient Features

http://arxiv.org/abs/2408.04891v1

Compressor summary:
Key points:
- Representation learning with contrastive learning algorithms is effective for unlabeled datasets but struggles to distinguish important features from unimportant ones.
- The paper proposes a method that enhances features critical for unsupervised image clustering by using a reference dataset and a contrastive analysis approach.
- The method outperforms conventional contrastive analysis and deep clustering methods on three datasets with different backgrounds.
Summary: The paper presents a novel representation learning method for unsupervised image clustering that uses a reference dataset and a contrastive analysis approach to distinguish important features from unimportant ones, achieving higher clustering scores than existing methods.


On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey

http://arxiv.org/abs/2408.04879v1

Compressor summary: This paper reviews element-wise zero-shot image recognition techniques, which learn generalized knowledge from limited data to recognize and reason about unseen domains.


Unsupervised Episode Detection for Large-Scale News Events

http://arxiv.org/abs/2408.04873v1

Compressor summary: The paper proposes a novel task called episode detection to identify cohesive clusters of entities and actions in news articles related to key events, and introduces EpiMine, an unsupervised framework that significantly improves performance on this task.


SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation

http://arxiv.org/abs/2408.04872v1

Compressor summary: The paper introduces SCOI, a method that uses both syntactic and lexical information to select better in-context examples for machine translation.


UCB Exploration for Fixed-Budget Bayesian Best Arm Identification

http://arxiv.org/abs/2408.04869v1

Compressor summary:
Key points:
- The paper proposes a Bayesian UCB algorithm for fixed-budget best-arm identification.
- The algorithm learns prior information to enhance performance.
- The paper provides theoretical and empirical bounds on the regret and failure probability.
Summary: The paper introduces a Bayesian UCB exploration algorithm that learns prior information to improve fixed-budget best-arm identification, and shows its theoretical and empirical advantages.
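The Bayesian prior-learning is the paper's novelty; the classical UCB exploration it builds on can be sketched as follows (an illustrative frequentist UCB1 on Bernoulli arms, not the paper's algorithm):

```python
import math
import random

def ucb1_select(counts, values, t, c=2.0):
    # Pick the arm maximizing empirical mean + exploration bonus;
    # any arm that has never been played is tried first.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        values[a] / counts[a] + math.sqrt(c * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return scores.index(max(scores))

random.seed(0)
means = [0.2, 0.8, 0.5]                 # hidden Bernoulli arm means
counts, values = [0, 0, 0], [0.0, 0.0, 0.0]
for t in range(1, 2001):
    arm = ucb1_select(counts, values, t)
    reward = 1.0 if random.random() < means[arm] else 0.0
    counts[arm] += 1
    values[arm] += reward
print(counts.index(max(counts)))  # the best arm (index 1) dominates the pulls
```

In the fixed-budget setting, exploration like this runs for exactly T rounds and the most promising arm is recommended at the end; the paper's contribution is shaping the bonus with a learned prior.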


ChatGPT Meets Iris Biometrics

http://arxiv.org/abs/2408.04868v1

Compressor summary: The study shows that GPT-4 is effective at iris recognition, even under challenging conditions, outperforming Google's Gemini Advanced model.


An Evaluation of Standard Statistical Models and LLMs on Time Series Forecasting

http://arxiv.org/abs/2408.04867v1

Compressor summary: The paper studies how Large Language Models (LLMs) perform in predicting time series, finding that their accuracy drops when faced with complex or diverse data.


High dimensional Bayesian Optimization via Condensing-Expansion Projection

http://arxiv.org/abs/2408.04860v1

Compressor summary: The paper introduces CEPBO, a new random projection-based Bayesian optimization method for high-dimensional problems that does not rely on the effective subspace assumption and performs better than existing methods in most cases.


MSG-Chart: Multimodal Scene Graph for ChartQA

http://arxiv.org/abs/2408.04852v1

Compressor summary: The paper proposes a multimodal scene graph for charts to better understand their structure and semantics, improving performance on chart question answering tasks.


Your Classifier Can Be Secretly a Likelihood-Based OOD Detector

http://arxiv.org/abs/2408.04851v1

Compressor summary: INK is a novel OOD detection method that leverages likelihood interpretation of discriminative classifiers on hyperspherical embeddings, achieving state-of-the-art performance in various scenarios.


Ensemble BERT: A student social network text sentiment classification model based on ensemble learning and BERT architecture

http://arxiv.org/abs/2408.04849v1

Compressor summary: The paper proposes a new BERT-based ensemble learning network to assess emotional tendencies in middle school students' social network texts, finding that deeper networks are more efficient but ensembles offer better interpretability.


MDS-GNN: A Mutual Dual-Stream Graph Neural Network on Graphs with Incomplete Features and Structure

http://arxiv.org/abs/2408.04845v1

Compressor summary: MDS-GNN is a novel method that leverages both node features and graph structure for learning on incomplete graphs using contrastive learning.


Counterfactual Explanations with Probabilistic Guarantees on their Robustness to Model Change

http://arxiv.org/abs/2408.04842v1

Compressor summary: BetaRCE is a method for generating counterfactual explanations for machine learning models that can handle data or model changes, providing probabilistic guarantees on robustness and interpretability.


Kolmogorov-Arnold Network for Online Reinforcement Learning

http://arxiv.org/abs/2408.04841v1

Compressor summary: The paper explores using Kolmogorov-Arnold Networks in Proximal Policy Optimization and shows they can achieve comparable performance to MLPs with fewer parameters, suggesting efficiency gains.


mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

http://arxiv.org/abs/2408.04840v1

Compressor summary: mPLUG-Owl3 is a versatile multi-modal large language model that excels at understanding long image sequences by efficiently integrating vision and language with novel hyper attention blocks, achieving state-of-the-art performance on various benchmarks.


Dual-Channel Latent Factor Analysis Enhanced Graph Contrastive Learning for Recommendation

http://arxiv.org/abs/2408.04838v1

Compressor summary: LFA-GCL is a new graph contrastive learning technique that improves recommender systems by refining global collaborative graphs without noise and outperforming existing methods.


Self-augmented Gaussian Splatting with Structure-aware Masks for Sparse-view 3D Reconstruction

http://arxiv.org/abs/2408.04831v1

Compressor summary: The paper proposes a new method for 3D reconstruction from few images, using a coarse-to-fine Gaussian model with structure-aware mask and augmentation, achieving state-of-the-art results on two datasets.


Interventional Causal Structure Discovery over Graphical Models with Convergence and Optimality Guarantees

http://arxiv.org/abs/2408.04819v1

Compressor summary: Bloom is a novel framework for discovering causal structures from observational and interventional data, outperforming existing methods in efficiency and privacy.


Performance Metric for Multiple Anomaly Score Distributions with Discrete Severity Levels

http://arxiv.org/abs/2408.04817v1

Compressor summary: The text proposes a new performance metric (WS-AUROC) for anomaly detection in smart factories, which considers severity levels and physical quantities causing anomalies.


FUSE-ing Language Models: Zero-Shot Adapter Discovery for Prompt Optimization Across Tokenizers

http://arxiv.org/abs/2408.04816v1

Compressor summary: FUSE is a method to unify different language models' semantic embeddings using a tensor representation, enabling knowledge transfer across tokenizers for tasks like image captioning.


Towards improving Alzheimer's intervention: a machine learning approach for biomarker detection through combining MEG and MRI pipelines

http://arxiv.org/abs/2408.04815v1

Compressor summary: MEG can detect brain changes in Alzheimer's disease before symptoms appear, and combining MRI and MEG data improves the accuracy of diagnosis.


Rethinking Multiple Instance Learning: Developing an Instance-Level Classifier via Weakly-Supervised Self-Training

http://arxiv.org/abs/2408.04813v1

Compressor summary:
Key points:
- The MIL problem is formulated as a semi-supervised instance classification problem to fully utilize both labeled and unlabeled instances.
- Traditional self-training techniques degenerate when generating pseudo labels for unlabeled instances in MIL.
- A weakly-supervised self-training method is proposed that uses bag labels to prevent pseudo-label degradation and learn hard positive instances.
- Experiments on various datasets show the superiority of the proposed method over existing methods.
Summary: The paper proposes a weakly-supervised self-training method for MIL that leverages bag labels to generate non-degenerate pseudo labels and learn hard positive instances, achieving new SOTA performance on multiple datasets.


UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

http://arxiv.org/abs/2408.04810v1

Compressor summary: UniBench is a unified framework for evaluating vision-language models on over 50 benchmarks, revealing the limitations and strengths of current models.


On the Geometry of Deep Learning

http://arxiv.org/abs/2408.04809v1

Compressor summary: The paper explores how deep learning's connection to affine splines, continuous piecewise linear functions in multiple dimensions, can help understand and improve deep networks by analyzing their geometrical properties and input space tessellation.


Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation

http://arxiv.org/abs/2408.04804v1

Compressor summary: Hyper-YOLO is a new object detection method that uses hypergraph computations to capture complex high-order correlations among visual features, improving performance on various scale models and datasets.


FewShotNeRF: Meta-Learning-based Novel View Synthesis for Rapid Scene-Specific Adaptation

http://arxiv.org/abs/2408.04803v1

Compressor summary: FewShotNeRF is a method that uses meta-learning to quickly adapt NeRF models to new scenes with limited multi-view images, producing realistic views of various objects.


AI and Machine Learning Driven Indoor Localization and Navigation with Mobile Embedded Systems

http://arxiv.org/abs/2408.04797v1

Compressor summary: This article discusses the challenges of indoor navigation and how AI algorithms can help solve them using WiFi and sensors in mobile devices.