arxiv compressed, 2024-09-11

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-09-11 generated by the compressor, my personal LLM-based project.


GeoCalib: Learning Single-image Calibration with Geometric Optimization

http://arxiv.org/abs/2409.06704v1

Compressor summary: GeoCalib is a deep neural network that uses 3D geometry to estimate camera parameters from a single image, outperforming existing methods in accuracy and robustness.


LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation

http://arxiv.org/abs/2409.06703v1

Compressor summary: LEIA is a novel NeRF-based method to represent and interpolate dynamic 3D objects without relying on heuristics or motion information.


DANCE: Deep Learning-Assisted Analysis of Protein Sequences Using Chaos Enhanced Kaleidoscopic Images

http://arxiv.org/abs/2409.06694v1

Compressor summary:
Key points:
- Cancer is a disease of uncontrolled cell growth
- TCRs are proteins that help recognize antigens, including cancer-related ones
- TCR-based immunotherapies use sequencing technologies to find potent anti-cancer TCRs
- DANCE generates images from TCR sequences using Chaos Game Representation (CGR) and kaleidoscopic images
- The study classifies TCRs by their target cancer cells using deep learning vision models
Summary: The paper introduces DANCE, a method that converts TCR protein sequences into chaos-enhanced kaleidoscopic images for visual analysis and classification of their target cancer cells using deep learning.
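The Chaos Game Representation step underlying DANCE is a classical construction: each residue is assigned a fixed vertex, and the trajectory of repeated half-way jumps toward the vertices forms the image. A minimal sketch of plain CGR for protein sequences (the 20-gon vertex layout is my assumption; the paper's kaleidoscopic enhancement is omitted):

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Place each of the 20 residues at a vertex of a regular 20-gon on the unit circle
# (a hypothetical layout; the paper may use a different vertex assignment).
VERTICES = {
    aa: (math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
    for i, aa in enumerate(AMINO_ACIDS)
}

def cgr_points(sequence, start=(0.0, 0.0)):
    """Chaos Game Representation: from the current point, jump halfway
    toward the vertex of the next residue; the visited points form the image."""
    x, y = start
    pts = []
    for aa in sequence:
        vx, vy = VERTICES[aa]
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        pts.append((x, y))
    return pts
```

Rasterizing the visited points onto a grid then yields the image fed to a vision model.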


HybridFC: A Hybrid Fact-Checking Approach for Knowledge Graphs

http://arxiv.org/abs/2409.06692v1

Compressor summary: The paper proposes a hybrid fact-checking approach for knowledge graphs that combines different methods to achieve better performance than existing approaches.


Geometric-Averaged Preference Optimization for Soft Preference Labels

http://arxiv.org/abs/2409.06691v1

Compressor summary: The authors propose a distributional soft preference labeling method to improve Direct Preference Optimization (DPO) by using weighted geometric averaging, which improves performance on standard benchmarks for alignment research.
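To make the soft-label idea concrete, here is a minimal sketch of a DPO-style loss with a soft preference label p in [0.5, 1]. The scaling of the usual log-ratio margin by (2p - 1) is my reading of how geometric averaging of the two likelihoods plays out, not necessarily the paper's exact formulation:

```python
import math

def soft_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, p, beta=0.1):
    """DPO-style loss with a soft preference label p in [0.5, 1].

    Geometrically averaging the preferred/dispreferred likelihoods as
    pi_w**p * pi_l**(1-p) (hypothetically) scales the usual DPO
    log-ratio margin by (2p - 1): p = 1 recovers hard-label DPO,
    p = 0.5 zeroes out the learning signal for that pair.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    scaled = (2.0 * p - 1.0) * margin
    return -math.log(1.0 / (1.0 + math.exp(-scaled)))  # -log sigmoid
```

With p = 0.5 the loss is the constant log 2 regardless of the margin, so ambiguous pairs stop pushing the policy.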


GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction

http://arxiv.org/abs/2409.06685v1

Compressor summary: GigaGS is a novel 3D Gaussian Splatting method that efficiently and effectively reconstructs large-scale scene surfaces with high quality by applying partitioning and consistency constraints.


Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences

http://arxiv.org/abs/2409.06683v1

Compressor summary: The proposed method improves object pose distribution estimation in robotics by using CAD models and correspondence distributions, leading to faster convergence and better performance.


E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning

http://arxiv.org/abs/2409.06679v1

Compressor summary: E2LLM is a novel approach that improves large language models' ability to process long contexts while reducing computational complexity and leveraging pretrained models.


A Semantic Segmentation Approach on Sweet Orange Leaf Diseases Detection Utilizing YOLO

http://arxiv.org/abs/2409.06671v1

Compressor summary: The study uses advanced AI models like YOLOv8 to accurately diagnose diseases in sweet orange leaves, potentially transforming disease detection in agriculture and promoting sustainable farming practices.


DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models

http://arxiv.org/abs/2409.06669v1

Compressor summary: The paper introduces a new dynamic router mechanism for Mixture-of-Experts models that adapts the number of experts assigned to each input token based on its importance, improving performance on NLP tasks.


LLaMA-Omni: Seamless Speech Interaction with Large Language Models

http://arxiv.org/abs/2409.06666v1

Compressor summary: LLaMA-Omni is a new model that enables real-time, high-quality speech interaction with large language models without transcription and with low latency.


World-Grounded Human Motion Recovery via Gravity-View Coordinates

http://arxiv.org/abs/2409.06662v1

Compressor summary:
Key points:
- Novel method for recovering human motion from monocular video
- Uses a Gravity-View (GV) coordinate system to reduce ambiguity and error accumulation
- Outperforms state-of-the-art methods in accuracy and speed
Summary: The paper proposes a new method that estimates human motion from monocular video using a gravity-aligned coordinate system, improving accuracy and speed over existing methods.


Image Vectorization with Depth: convexified shape layers with depth ordering

http://arxiv.org/abs/2409.06648v1

Compressor summary: The paper proposes a new image vectorization method that considers depth ordering, convexification, and curvature-based inpainting to create scalable shape layers for better editing and semantic vectorization.


EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis

http://arxiv.org/abs/2409.06644v1

Compressor summary: EyeCLIP is a multi-modal visual-language foundation model that uses partial text data to improve early detection of eye diseases by leveraging large unlabeled and labeled data through a pretraining strategy.


TeXBLEU: Automatic Metric for Evaluate LaTeX Format

http://arxiv.org/abs/2409.06639v1

Compressor summary: TeXBLEU is a new metric for evaluating mathematical expressions in LaTeX format that performs better than traditional metrics and has high correlation with human evaluation.


SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation

http://arxiv.org/abs/2409.06633v1

Compressor summary: The authors propose SaRA, a method to improve image and video generation tasks by re-utilizing ineffective parameters in pre-trained diffusion models and fine-tuning them with a sparse weight matrix.


A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio

http://arxiv.org/abs/2409.06624v1

Compressor summary: The paper studies how to choose the additional-language mixture ratio and learning rate when continually pre-training large language models, improving their Chinese ability and adaptation to other domains, and deploys a 70B version in a chat system.


Exploring Italian sentence embeddings properties through multi-tasking

http://arxiv.org/abs/2409.06622v1

Compressor summary: The study examines how well LLMs capture syntactic and semantic information in Italian using synthetic data from BLMs, finding that abstract linguistic concepts are not well-represented in pre-trained sentence embeddings.


MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification

http://arxiv.org/abs/2409.06620v1

Compressor summary: The paper presents a unified framework for text-to-3D content generation that uses multi-view guidance and a novel densification algorithm to produce realistic 3D models efficiently.


Hierarchical Multi-Label Classification with Missing Information for Benthic Habitat Imagery

http://arxiv.org/abs/2409.06618v1

Compressor summary:
Key points:
- Apply self-supervised learning techniques to a seafloor imagery dataset (BenthicNet)
- Study performance on a complex hierarchical multi-label classification task
- Show benefits of pre-training on in-domain data over ImageNet
Summary: The authors use self-supervised learning on a large seafloor imagery dataset to improve hierarchical multi-label classification and show that pre-training with in-domain data outperforms ImageNet.


When to Extract ReID Features: A Selective Approach for Improved Multiple Object Tracking

http://arxiv.org/abs/2409.06617v1

Compressor summary: The paper proposes a selective feature extraction method for Multiple Object Tracking that reduces overhead and improves accuracy in occlusion scenarios.


Label-free Monitoring of Self-Supervised Learning Progress

http://arxiv.org/abs/2409.06612v1

Compressor summary: The study proposes label-free evaluation metrics for SSL encoders using unlabelled data and investigates their correlation with linear probe accuracy across different SSL methods.


Improving the Precision of CNNs for Magnetic Resonance Spectral Modeling

http://arxiv.org/abs/2409.06609v1

Compressor summary: This paper discusses challenges and solutions for using machine learning to analyze magnetic resonance spectroscopic imaging data, focusing on improving precision and error characterization.


A Practical Gated Recurrent Transformer Network Incorporating Multiple Fusions for Video Denoising

http://arxiv.org/abs/2409.06603v1

Compressor summary: The proposed GRTN network uses a multi-fusion gated recurrent Transformer to achieve state-of-the-art video denoising performance with minimal delay, by selectively fusing relevant information from previous frames.


Alleviating Hallucinations in Large Language Models with Scepticism Modeling

http://arxiv.org/abs/2409.06601v1

Compressor summary: The Skepticism Modeling (SM) approach improves large language models' (LLMs) uncertainty estimation by combining token and logits information with pre-training and fine-tuning on doubt-emotion-aware data.


GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering

http://arxiv.org/abs/2409.06595v1

Compressor summary: The paper introduces GroUSE, a benchmark to evaluate judge models in RAG systems, and finds that existing judges have limitations, while finetuning Llama-3 improves its performance.


Lightweight Multiscale Feature Fusion Super-Resolution Network Based on Two-branch Convolution and Transformer

http://arxiv.org/abs/2409.06590v1

Compressor summary:
Key points:
- Proposes a new multi-scale feature fusion network for single image super-resolution using convolutional and Transformer networks
- Fuses global and local information through a two-branch architecture
- Uses modular connections to supplement low-pixel images with shallow and deep features
- Outperforms other lightweight models with the same parameter count
Summary: The paper introduces a novel network that combines convolutional and Transformer networks for super-resolution, fusing global and local image features in two branches and supplementing low-pixel images with shallow and deep features, achieving better results than other lightweight models.


Developing the Temporal Graph Convolutional Neural Network Model to Predict Hip Replacement using Electronic Health Records

http://arxiv.org/abs/2409.06585v1

Compressor summary:
Key points:
- Hip replacement procedures improve quality of life and mobility
- A temporal graph convolutional neural network (TG-CNN) model predicts hip replacement risk one year in advance using primary care medical event codes
- The model achieves high accuracy and calibration and outperforms four baselines
Summary: The study developed a TG-CNN model that can accurately predict the need for hip replacement a year ahead by analysing primary care data, potentially improving patient care and health service efficiency.


Semi-Supervised 3D Object Detection with Channel Augmentation using Transformation Equivariance

http://arxiv.org/abs/2409.06583v1

Compressor summary: The authors propose a teacher-student framework using channel augmentation for 3D semi-supervised object detection, which improves performance on the KITTI dataset.


Exploring syntactic information in sentence embeddings through multilingual subject-verb agreement

http://arxiv.org/abs/2409.06567v1

Compressor summary: The paper explores how well multilingual pretrained language models capture abstract linguistic representations, using synthetic data and a new multiple-choice task on subject-verb agreement in various languages. It finds language-specific differences and shows that syntactic structure is not shared even among closely related languages.


Learn2Aggregate: Supervised Generation of Chvátal-Gomory Cuts Using Graph Neural Networks

http://arxiv.org/abs/2409.06559v1

Compressor summary: Learn2Aggregate is a machine learning framework that uses graph neural networks to selectively aggregate constraints when generating Chvátal-Gomory cuts, yielding faster and stronger mixed integer linear programming solutions.
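The aggregation-then-rounding step behind Chvátal-Gomory cuts is simple to state: pick nonnegative multipliers, combine the rows, then round everything down. A minimal sketch of that classical step (the learned part of Learn2Aggregate, choosing the multipliers with a GNN, is not shown; here they are just an input):

```python
import math

def cg_cut(rows, rhs, lam):
    """Chvátal-Gomory cut from a nonnegative aggregation vector lam.

    Aggregate lam^T A x <= lam^T b, then round the coefficients and the
    right-hand side down. The resulting cut is valid for integer x >= 0
    whenever Ax <= b holds.
    """
    agg = [sum(l * row[j] for l, row in zip(lam, rows)) for j in range(len(rows[0]))]
    agg_rhs = sum(l * b for l, b in zip(lam, rhs))
    return [math.floor(a) for a in agg], math.floor(agg_rhs)
```

For example, from the single constraint x1 + x2 <= 1.5 with multiplier 1, rounding gives the cut x1 + x2 <= 1, which cuts off the fractional point (0.75, 0.75).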


From LIMA to DeepLIMA: following a new path of interoperability

http://arxiv.org/abs/2409.06550v1

Compressor summary: The article presents LIMA, a framework for text analysis with deep neural networks, supporting over 60 languages and integrating with other platforms using Universal Dependencies.


Dynamic Decoupling of Placid Terminal Attractor-based Gradient Descent Algorithm

http://arxiv.org/abs/2409.06542v1

Compressor summary: The paper proposes four adaptive learning rates for gradient descent based on terminal attractor theory and terminal sliding mode theory to improve its convergence speed, and evaluates them in simulations.


Mapping News Narratives Using LLMs and Narrative-Structured Text Embeddings

http://arxiv.org/abs/2409.06540v1

Compressor summary: The authors propose a numerical representation of narratives based on Greimas' Actantial Model, which can be used to analyze news articles and understand how different texts present the same topic with different structures.


MENSA: A Multi-Event Network for Survival Analysis under Informative Censoring

http://arxiv.org/abs/2409.06525v1

Compressor summary: MENSA is a deep learning method for predicting the time until a patient with ALS loses various physical functions, improving on existing approaches by jointly learning covariate representations and event dependencies.


Deep Learning for Koopman Operator Estimation in Idealized Atmospheric Dynamics

http://arxiv.org/abs/2409.06522v1

Compressor summary: The paper presents methods to make data-driven weather forecasting models more interpretable using the Koopman operator, while addressing the challenges of applying it to large-scale atmospheric problems.


In Flight Boresight Rectification for Lightweight Airborne Pushbroom Imaging Spectrometry

http://arxiv.org/abs/2409.06520v1

Compressor summary: Our method automatically calibrates hyperspectral cameras on small aircraft using only spectral imagery and GPS/INS trajectory data, achieving accuracy similar to manual calibration.


Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games

http://arxiv.org/abs/2409.06518v1

Compressor summary: The paper explores how large language models represent knowledge about Olympic medal counts and finds they excel at reporting total medals but struggle with ranking information, unlike humans.


Aligning Machine and Human Visual Representations across Abstraction Levels

http://arxiv.org/abs/2409.06509v1

Compressor summary: The authors propose a method to make neural networks more human-like by transferring knowledge from a teacher model trained to imitate human judgments, improving their performance on various tasks and generalization abilities.


Neural Laplacian Operator for 3D Point Clouds

http://arxiv.org/abs/2409.06506v1

Compressor summary:
Key points:
- The discrete Laplacian operator is important for 3D geometry processing but hard to define on point clouds
- Previous methods used local triangulation, which is neither robust nor accurate
- The proposed method uses a KNN graph and GNNs to learn the Laplacian operator
- A novel training scheme imitates ground-truth Laplacian behavior on probe functions
- The method reduces error by an order of magnitude and handles sparse point clouds well
- The learned Laplacian operator enables geometry processing on point clouds
Summary: The paper proposes a method to learn the discrete Laplacian operator on point clouds using GNNs and a novel training scheme, achieving high accuracy and generalization, and enabling geometry processing applications on point clouds.
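The KNN-graph backbone of such a method is standard; what the paper learns is the edge weights. A minimal brute-force sketch of the fixed baseline, a binary-weight KNN graph Laplacian L = D - W (a GNN, as in the paper, would predict the weights instead):

```python
import math

def knn_graph(points, k):
    """Index lists of the k nearest neighbours of each point (brute force)."""
    nbrs = []
    for i, p in enumerate(points):
        d = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        nbrs.append([j for _, j in d[:k]])
    return nbrs

def graph_laplacian(points, k):
    """Unnormalised graph Laplacian L = D - W with binary KNN weights.

    Rows sum to zero by construction; a learned operator replaces the
    binary weights with GNN-predicted ones that imitate the ground-truth
    Laplacian on probe functions.
    """
    n = len(points)
    L = [[0.0] * n for _ in range(n)]
    for i, js in enumerate(knn_graph(points, k)):
        for j in js:
            L[i][j] -= 1.0
            L[i][i] += 1.0
    return L
```

Applying L to a function sampled on the points then approximates (up to scaling) its Laplacian, which is the behavior the training scheme supervises.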


Mitigating Hallucination in Visual-Language Models via Re-Balancing Contrastive Decoding

http://arxiv.org/abs/2409.06485v1

Compressor summary: The paper proposes a Re-Balancing Contrastive Decoding (RBD) method that improves attention distribution in Visual-Language Models (VLMs) to reduce textual bias and enhance visual information, mitigating hallucinations.


Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout

http://arxiv.org/abs/2409.06477v1

Compressor summary: The paper applies MPC, rollout, and RL to computer chess, using a new architecture for move selection that incorporates multiple chess engines and improves their performance, particularly for position evaluation.


Weakly-supervised Camera Localization by Ground-to-satellite Image Registration

http://arxiv.org/abs/2409.06471v1

Compressor summary: This paper proposes a weakly supervised learning method for improving camera pose accuracy using satellite images, without requiring accurate GPS labels for training.


An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition

http://arxiv.org/abs/2409.06468v1

Compressor summary: The paper proposes a context-balanced learning objective for contextual adapters in end-to-end speech recognition models to address data imbalance issues and improve performance on rare words.


Learning Generative Interactive Environments By Trained Agent Exploration

http://arxiv.org/abs/2409.06445v1

Compressor summary: The paper introduces GenieRedux, an improved model that uses reinforcement learning agents for data generation, enhancing its ability to adapt and perform well in complex environments.


Knowledge Distillation via Query Selection for Detection Transformer

http://arxiv.org/abs/2409.06443v1

Compressor summary: The paper proposes a novel query selection method for compressing DETR models using knowledge distillation, which improves performance and reduces size without high computational costs.


Prompt2Fashion: An automatically generated fashion dataset

http://arxiv.org/abs/2409.06442v1

Compressor summary: The authors use generative models to create a fashion image dataset tailored to users' preferences and needs, and discuss the importance of expert evaluation for such datasets.


Extending Explainable Ensemble Trees (E2Tree) to regression contexts

http://arxiv.org/abs/2409.06439v1

Compressor summary: The paper introduces E2Tree, a method to explain random forests in both classification and regression tasks, by showing relationships between response variables, predictors, and their associations using dissimilarity measures.


A Short Information-Theoretic Analysis of Linear Auto-Regressive Learning

http://arxiv.org/abs/2409.06437v1

Compressor summary: The note proves, via an information-theoretic analysis, that the Gaussian maximum likelihood estimator is consistent in linear auto-regressive models and comes close to optimal performance.
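For a scalar AR(1) model, the Gaussian MLE of the coefficient reduces to ordinary least squares on consecutive pairs, which is the estimator whose consistency such analyses establish. A minimal sketch (the AR(1) specialization is my illustration, not the note's general setting):

```python
def ar1_mle(x):
    """Gaussian MLE of the coefficient a in x_{t+1} = a * x_t + noise.

    With Gaussian noise this reduces to least squares:
    a_hat = sum(x_t * x_{t+1}) / sum(x_t ** 2).
    """
    num = sum(x[t] * x[t + 1] for t in range(len(x) - 1))
    den = sum(x[t] ** 2 for t in range(len(x) - 1))
    return num / den
```

On a noiseless trajectory the estimator recovers the coefficient exactly; consistency says it converges to it as the sample grows under noise.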


Length Desensitization in Direct Preference Optimization

http://arxiv.org/abs/2409.06411v1

Compressor summary: LD-DPO is a method to reduce verbosity in language models by decoupling length preference from other preferences, leading to more concise and human-aligned responses.


Sources of Uncertainty in 3D Scene Reconstruction

http://arxiv.org/abs/2409.06407v1

Compressor summary: The paper introduces a taxonomy of uncertainties in NeRF and GS methods for 3D scene reconstruction and proposes techniques to estimate and capture these uncertainties.


Symmetry Breaking in Neural Network Optimization: Insights from Input Dimension Expansion

http://arxiv.org/abs/2409.06402v1

Compressor summary: The paper proposes that symmetry breaking is important for neural network optimization and introduces a metric to measure it, which can help improve network design and performance.


Coarse-Grained Sense Inventories Based on Semantic Matching between English Dictionaries

http://arxiv.org/abs/2409.06386v1

Compressor summary: The paper proposes new coarse-grained sense inventories for natural language processing tasks by semantically matching WordNet and Cambridge dictionaries and shows their advantages in semantic coherence, resource dependency, and usability.


AMNS: Attention-Weighted Selective Mask and Noise Label Suppression for Text-to-Image Person Retrieval

http://arxiv.org/abs/2409.06385v1

Compressor summary: The paper proposes a method to improve text-to-image person retrieval by suppressing noise labels and using attention-weighted selective mask to handle noisy image-text pairings.


A Cross-Font Image Retrieval Network for Recognizing Undeciphered Oracle Bone Inscriptions

http://arxiv.org/abs/2409.06381v1

Compressor summary: The paper proposes a cross-font image retrieval network that helps decipher the ancient oracle bone script by finding similarities between characters in different font styles.