arxiv compressed, 2024-02-01

This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-01, generated by the compressor, my personal LLM-based project.


Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators

http://arxiv.org/abs/2401.18085v1

Compressor summary: Motion guidance is a technique that enables precise editing of image layouts, positions, poses, and shapes by using dense, complex motion fields guided by an optical flow network in the diffusion sampling process.
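
For intuition, the core mechanism is differentiating a motion loss through a flow estimator during sampling. A minimal sketch, with a stand-in convolution instead of a real optical flow network and no actual diffusion model:

```python
import torch

def guidance_grad(x, flow_net, target_flow):
    """Gradient of a motion loss w.r.t. the current sample; during diffusion
    sampling this gradient nudges the sample so its estimated optical flow
    matches the user-specified target flow."""
    x = x.detach().requires_grad_(True)
    loss = ((flow_net(x) - target_flow) ** 2).mean()
    return torch.autograd.grad(loss, x)[0]

flow_net = torch.nn.Conv2d(3, 2, 3, padding=1)  # stand-in for a real flow network
x = torch.randn(1, 3, 32, 32)                   # current diffusion sample
target = torch.zeros(1, 2, 32, 32)              # "keep everything still"
g = guidance_grad(x, flow_net, target)
print(g.shape)                                  # same shape as x
```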


Binding Touch to Everything: Learning Unified Multimodal Tactile Representations

http://arxiv.org/abs/2401.18084v1

Compressor summary: UniTouch is a unified tactile model that enables vision-based touch sensors to learn from multiple modalities and perform various touch sensing tasks in the zero-shot setting.


Improved Scene Landmark Detection for Camera Localization

http://arxiv.org/abs/2401.18083v1

Compressor summary: The paper proposes a method to improve scene landmark detection for camera localization by splitting landmarks into subgroups, using dense reconstructions, and having a compact architecture.


KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

http://arxiv.org/abs/2401.18079v1

Compressor summary: KVQuant is a new method that accurately compresses key-value cache activations in large language models for better memory efficiency and performance.
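
For intuition, a minimal sketch of per-channel quantization of cached keys; the 4-bit grid and function names are illustrative, and KVQuant's actual scheme is considerably more refined:

```python
import torch

def quantize_per_channel(x: torch.Tensor, bits: int = 4):
    """Quantize along the channel dimension (illustrative, not KVQuant itself)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().amax(dim=0, keepdim=True).clamp(min=1e-8) / qmax  # one scale per channel
    q = (x / scale).round().clamp(-qmax - 1, qmax).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

keys = torch.randn(128, 64)        # (tokens, head_dim) cached keys
q, s = quantize_per_channel(keys)  # store low-bit codes plus per-channel scales
error = (dequantize(q, s) - keys).abs().mean()
print(f"mean abs reconstruction error: {error:.4f}")
```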


CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting

http://arxiv.org/abs/2401.18075v1

Compressor summary: CARFF is a method that predicts future 3D scenes from past images using a probabilistic encoder and a Neural Radiance Field, handling uncertainty and dynamics for applications like autonomous driving.


Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?

http://arxiv.org/abs/2401.18070v1

Compressor summary: The study examines how well large language models can solve arithmetic word problems like children, focusing on comprehension, planning, and execution steps.


RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

http://arxiv.org/abs/2401.18059v1

Compressor summary: The text proposes a new approach called RAPTOR that improves retrieval-augmented language models by recursively embedding, clustering, and summarizing chunks of text, leading to better integration of information across long documents and improved performance on complex reasoning tasks.
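
A schematic of the recursive cluster-then-summarize loop; `embed` and `summarize` are stubs standing in for a sentence encoder and an LLM:

```python
import numpy as np
from sklearn.cluster import KMeans

def embed(texts):      # stub: a real system would call a sentence encoder
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 32))

def summarize(texts):  # stub: a real system would call an LLM
    return " / ".join(t[:40] for t in texts)

def build_tree(chunks, branching=2):
    """Recursively cluster chunks and summarize each cluster into a parent node."""
    layers = [chunks]
    while len(layers[-1]) > 1:
        current = layers[-1]
        k = max(1, len(current) // branching)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(embed(current))
        parents = [summarize([c for c, l in zip(current, labels) if l == i])
                   for i in range(k)]
        layers.append(parents)
    return layers  # retrieval can then search across all layers of the tree

tree = build_tree([f"chunk {i}: ..." for i in range(8)])
print([len(layer) for layer in tree])  # e.g. [8, 4, 2, 1]
```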


LongAlign: A Recipe for Long Context Alignment of Large Language Models

http://arxiv.org/abs/2401.18058v1

Compressor summary: LongAlign is a method for improving large language models' performance on long context tasks by fine-tuning them with instruction data, using packing, sorted batching, and loss weighting strategies, and introducing a new benchmark called LongBench-Chat.
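
A minimal sketch of the sorted-batching and packing ideas (group similar-length sequences to cut padding, or concatenate short ones up to a budget); the details here are illustrative, not LongAlign's exact recipe:

```python
def sorted_batches(seqs, batch_size):
    """Sort by length so each batch pads to a similar length (less wasted compute)."""
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))
    return [[seqs[i] for i in order[b:b + batch_size]]
            for b in range(0, len(seqs), batch_size)]

def pack(seqs, max_len):
    """Greedy next-fit over length-sorted sequences: concatenate into fixed-size packs."""
    packs = [[]]
    for s in sorted(seqs, key=len, reverse=True):
        if packs[-1] and sum(map(len, packs[-1])) + len(s) > max_len:
            packs.append([])
        packs[-1].append(s)
    return packs

seqs = [[0] * n for n in (5, 3, 9, 2, 7, 1)]
print([len(b) for b in sorted_batches(seqs, 2)])   # -> [2, 2, 2]
print([sum(map(len, p)) for p in pack(seqs, 10)])  # -> [9, 7, 10, 1]
```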


Rank Supervised Contrastive Learning for Time Series Classification

http://arxiv.org/abs/2401.18057v1

Compressor summary: Rank Supervised Contrastive Learning is a new technique for time series classification that uses targeted data augmentation, selective filtering, and a novel rank loss to capture fine-grained similarity information and achieve state-of-the-art performance.


Benchmarking Sensitivity of Continual Graph Learning for Skeleton-Based Action Recognition

http://arxiv.org/abs/2401.18054v1

Compressor summary: The paper proposes a new benchmark for continual graph learning with spatio-temporal graphs, studies the impact of learning order and GNN architecture on performance, and reveals novel insights on class-order and task-order sensitivity.


Epidemic Modeling using Hybrid of Time-varying SIRD, Particle Swarm Optimization, and Deep Learning

http://arxiv.org/abs/2401.18047v1

Compressor summary: The paper proposes a hybrid model using epidemic modeling, particle swarm optimization, and deep learning to better predict multiple waves of an epidemic, and shows its effectiveness in forecasting COVID-19 cases for the USA, India, and the UK.


Multipath parsing in the brain

http://arxiv.org/abs/2401.18046v1

Compressor summary: The study examines how humans process syntactic ambiguities in sentences by comparing two hypotheses using fMRI data and finding evidence for multipath parsing in brain regions like the superior temporal gyrus.


SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition

http://arxiv.org/abs/2401.18045v1

Compressor summary: SpeechComposer is a novel speech language model that enhances performance in multiple speech tasks by composing prompt tokens, enabling knowledge sharing among tasks.


Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability

http://arxiv.org/abs/2401.18040v1

Compressor summary: The study investigates intrinsic motivation reinforcement learning algorithms for dialogue systems, achieving improved performance and domain resilience by using random network distillation and curiosity-driven exploration.


Optimizing contrastive learning for cortical folding pattern detection

http://arxiv.org/abs/2401.18035v1

Compressor summary: The authors develop and test a self-supervised deep learning model that automatically detects cortical folding patterns from MRI scans of large cohorts, and demonstrate its ability to identify a double-parallel pattern associated with schizophrenia in the cingulate region, an approach that could be extended to other brain regions for biomarker detection.


Paramanu: A Family of Novel Efficient Indic Generative Foundation Language Models

http://arxiv.org/abs/2401.18034v1

Compressor summary: Gyan AI Paramanu is a family of novel Indic language models pretrained on a single GPU for 10 Indian languages, outperforming large LLMs while being much smaller and more efficient.


DROP: Decouple Re-Identification and Human Parsing with Task-specific Features for Occluded Person Re-identification

http://arxiv.org/abs/2401.18032v1

Compressor summary: The DROP method decouples and combines features for occluded person re-identification and human parsing, improving performance over existing approaches.


Supporting Anticipatory Governance using LLMs: Evaluating and Aligning Large Language Models with the News Media to Anticipate the Negative Impacts of AI

http://arxiv.org/abs/2401.18028v1

Compressor summary: The paper explores using large language models (LLMs) to generate and compare categories of negative AI impacts from news articles, finding that fine-tuned models perform better than instruction-based ones in reflecting the taxonomy of impacts.


Prompt-Driven LLM Safeguarding via Directed Representation Optimization

http://arxiv.org/abs/2401.18018v1

Compressor summary: The study explores how safety prompts affect language models' representations and proposes a method called DRO to optimize them for better LLM safety.


Desiderata for the Context Use of Question Answering Systems

http://arxiv.org/abs/2401.18001v1

Compressor summary: The paper surveys problems in QA systems' context use, proposes desiderata for evaluating them, and identifies novel trends across multiple issues.


Multilinear Operator Networks

http://arxiv.org/abs/2401.17992v1

Compressor summary: The paper introduces MONet, a new model for image recognition that uses only multilinear operators and outperforms previous polynomial networks, achieving similar results to modern neural network architectures.


Shrub of a thousand faces: an individual segmentation from satellite images using deep learning

http://arxiv.org/abs/2401.17985v1

Compressor summary: This study uses deep learning models and remotely sensed imagery to map individual juniper shrubs in Sierra Nevada, Spain, and develops a new evaluation metric for complex growth patterns.


Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study

http://arxiv.org/abs/2401.17981v1

Compressor summary: The paper studies how to improve MLLMs' image understanding by infusing detection information and evaluates different models for this purpose, achieving significant improvements in multimodal dialogue.


Entity Linking in the Job Market Domain

http://arxiv.org/abs/2401.17979v1

Compressor summary: The paper explores entity linking for skill mentions in the job market domain using neural models and the ESCO taxonomy.


Understanding polysemanticity in neural networks through coding theory

http://arxiv.org/abs/2401.17975v1

Compressor summary: The paper proposes a novel method to interpret neural networks using neuroscience and information theory tools, which can reveal the level of redundancy, smoothness, and differentiability of network codes, and explains how these properties affect learning performance and polysemantic neurons.


GUMsley: Evaluating Entity Salience in Summarization for 12 English Genres

http://arxiv.org/abs/2401.17974v1

Compressor summary: The paper introduces GUMsley, a dataset for evaluating entity salience in different text genres, and shows that using salient entities improves summarization quality.


MelNet: A Real-Time Deep Learning Algorithm for Object Detection

http://arxiv.org/abs/2401.17972v1

Compressor summary: The study introduces MelNet, a novel deep learning algorithm for object detection, and compares its performance with other models using the KITTI dataset.


HyperZ$\cdot$Z$\cdot$W Operator Connects Slow-Fast Networks for Full Context Interaction

http://arxiv.org/abs/2401.17948v1

Compressor summary: The paper proposes a novel Terminator architecture that uses coordinate-based implicit MLPs to generate hyper-kernels for enhancing feature extraction in self-attention mechanisms, achieving faster training convergence and better performance on image classification tasks.


[Lions: 1] and [Tigers: 2] and [Bears: 3], Oh My! Literary Coreference Annotation with LLMs

http://arxiv.org/abs/2401.17922v1

Compressor summary: Coreference annotation and resolution are essential for computational literary studies, but challenging for fiction due to structured outputs, inferences, and varied language; seq2seq systems can address these issues by learning to generate marked-up copies of sentences.


LOCOST: State-Space Models for Long Document Abstractive Summarization

http://arxiv.org/abs/2401.17919v1

Compressor summary: LOCOST is a state-space model-based encoder-decoder architecture that generates long text summaries from long context inputs with low complexity and memory efficiency, outperforming sparse transformers on full-book summarization tasks.
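
For intuition, the primitive behind such models is a linear state-space recurrence that runs in O(L) time and memory over the sequence; a toy dense version (LOCOST uses structured, learned variants):

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """Linear state-space recurrence: x_t = A x_{t-1} + B u_t, y_t = C x_t.
    State-space models use (structured) versions of this to mix information
    across very long sequences without quadratic attention."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:
        x = A @ x + B * u_t
        ys.append(C @ x)
    return np.array(ys)

A = np.diag([0.9, 0.5])       # toy diagonal state transition
B = np.array([1.0, 1.0])      # input projection
C = np.array([1.0, -1.0])     # output projection
print(ssm_scan(np.ones(5), A, B, C))
```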


Source-free Domain Adaptive Object Detection in Remote Sensing Images

http://arxiv.org/abs/2401.17916v1

Compressor summary: The paper proposes a source-free object detection method for remote sensing images that uses perturbation and alignment techniques to adapt to different domains without accessing the source data.


SNNLP: Energy-Efficient Natural Language Processing Using Spiking Neural Networks

http://arxiv.org/abs/2401.17911v1

Compressor summary: The paper proposes a new spike-based text encoding method for natural language processing tasks on spiking neural networks, which shows better performance and energy efficiency compared to traditional deep learning models.


Controllable Dense Captioner with Multimodal Embedding Bridging

http://arxiv.org/abs/2401.17910v1

Compressor summary: ControlCap is a multimodal embedding architecture that uses linguistic guidance to produce dense captions for images and videos, achieving state-of-the-art results.


Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation

http://arxiv.org/abs/2401.17904v1

Compressor summary: Hi-SAM is a unified model that excels in hierarchical text segmentation using SAM and offers automatic and promptable mask generation modes.


Employing Label Models on ChatGPT Answers Improves Legal Text Entailment Performance

http://arxiv.org/abs/2401.17897v1

Compressor summary: Legal text entailment using ChatGPT can be improved by consolidating its provisional answers with label models, achieving a state-of-the-art accuracy of 76.15%.
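
The simplest instance of consolidating provisional answers is a majority vote; the paper's label models are more sophisticated, but the sketch below conveys the aggregation step:

```python
from collections import Counter

def aggregate(answers_per_prompt):
    """Consolidate several provisional answers into one label by majority vote
    (a simple stand-in for the label models used in the paper)."""
    return Counter(answers_per_prompt).most_common(1)[0][0]

runs = ["entailed", "entailed", "not entailed", "entailed"]
print(aggregate(runs))  # -> "entailed"
```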


ReplaceAnything3D: Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields

http://arxiv.org/abs/2401.17895v1

Compressor summary: RAM3D is a new method that lets you replace specific objects in a 3D scene using text prompts and multi-view images, while keeping the scene consistent and realistic.


Reimagining Reality: A Comprehensive Survey of Video Inpainting Techniques

http://arxiv.org/abs/2401.17883v1

Compressor summary: The paper reviews recent advancements in video inpainting, evaluates them on visual quality and computational efficiency using human annotators and standardized hardware, and suggests future directions.


I Think, Therefore I am: Awareness in Large Language Models

http://arxiv.org/abs/2401.17882v1

Compressor summary: The paper introduces the concept of awareness in large language models and proposes a way to measure it using a new dataset.


PVLR: Prompt-driven Visual-Linguistic Representation Learning for Multi-Label Image Recognition

http://arxiv.org/abs/2401.17881v1

Compressor summary: PVLR is a framework that uses dual prompting strategies to leverage language models for multi-label image recognition, improving performance over previous methods.


AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error

http://arxiv.org/abs/2401.17879v1

Compressor summary: AEROBLADE detects deepfake images by measuring the reconstruction error of an autoencoder, without needing training or special tools.
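
A minimal sketch of the detection criterion with a stand-in autoencoder and a plain L1 distance; AEROBLADE itself reuses the latent diffusion model's own autoencoder, and the exact distance measure may differ:

```python
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    """Stand-in autoencoder; AEROBLADE uses the latent diffusion model's own AE."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 8, 4, stride=4)
        self.dec = nn.ConvTranspose2d(8, 3, 4, stride=4)

    def forward(self, x):
        return self.dec(self.enc(x))

@torch.no_grad()
def reconstruction_error(ae, img):
    """Low error suggests the image lies near the autoencoder's manifold,
    i.e. it may have been generated by the corresponding diffusion model."""
    return (ae(img) - img).abs().mean().item()

ae = TinyAE().eval()
img = torch.rand(1, 3, 64, 64)
score = reconstruction_error(ae, img)
print(f"reconstruction error: {score:.4f}")  # threshold this to flag generated images
```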


VR-based generation of photorealistic synthetic data for training hand-object tracking models

http://arxiv.org/abs/2401.17874v1

Compressor summary: Blender-hoisynth is an interactive tool that generates realistic 3D hand-object interaction data for training supervised learning models, enabling users to annotate and control the data with virtual reality hands.


Efficient Subseasonal Weather Forecast using Teleconnection-informed Transformers

http://arxiv.org/abs/2401.17870v1

Compressor summary: The authors propose a new machine learning model that uses teleconnection information to improve subseasonal forecasting and reduce carbon emissions compared to current methods.


Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

http://arxiv.org/abs/2401.17868v1

Compressor summary: Conv-LoRA is a method that improves image segmentation by integrating lightweight convolutional parameters into the Segment Anything Model, enhancing its performance across various domains.


Manipulating Predictions over Discrete Inputs in Machine Teaching

http://arxiv.org/abs/2401.17865v1

Compressor summary: The paper proposes a method to manipulate a student model's predictions in the discrete domain using machine teaching, combining combinatorial optimization with an iterative searching algorithm; it can be used to correct errors or cause misclassification for personal gain, and shows superior performance over baselines.


Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis

http://arxiv.org/abs/2401.17862v1

Compressor summary: The text introduces Proximity QA, a framework to improve MLLMs' depth perception and proximity analysis of objects in images using visual instruction tuning and a new dataset called Proximity-110K.


Semantic Anything in 3D Gaussians

http://arxiv.org/abs/2401.17857v1

Compressor summary: SA-GS is a novel interactive method for object segmentation in 3D Gaussian Splatting, enabling applications like VR, AR, and movie production.


Probing Language Models' Gesture Understanding for Enhanced Human-AI Interaction

http://arxiv.org/abs/2401.17858v1

Compressor summary: The project proposal investigates how Large Language Models can understand and generate gestures in different contexts by using established psycholinguistic study designs and evaluating their ability to simulate human behaviour.


Instruction-Guided Scene Text Recognition

http://arxiv.org/abs/2401.17851v1

Compressor summary: IGTR is a novel method that uses instruction learning to improve scene text recognition by understanding character attributes and handling various recognition tasks.


Global-Liar: Factuality of LLMs over Time and Geographic Regions

http://arxiv.org/abs/2401.17839v1

Compressor summary: The study evaluates GPT models' accuracy, stability, and biases, using a balanced dataset called 'Global-Liar,' and finds that GPT-4 has regional and temporal biases, while configuration settings affect factuality.


A Cross-View Hierarchical Graph Learning Hypernetwork for Skill Demand-Supply Joint Prediction

http://arxiv.org/abs/2401.17838v1

Compressor summary: The paper proposes a novel framework, CHGH, that uses graph learning to predict skill demand and supply variations in the labor market, considering their complex interconnections.


Predicting the Future with Simple World Models

http://arxiv.org/abs/2401.17835v1

Compressor summary: The Parsimonious Latent Space Model (PLSM) simplifies the dynamics of high-dimensional world models to improve their generalization and performance in various tasks.


Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation

http://arxiv.org/abs/2401.17828v1

Compressor summary: SWTformer is a method that uses Swin Transformer to generate class probabilities and CAMs from image-level labels, improving semantic segmentation accuracy by combining local and global views.


Neural Machine Translation for Malayalam Paraphrase Generation

http://arxiv.org/abs/2401.17827v1

Compressor summary: The study examines four methods to create Malayalam paraphrases and compares automated metrics and human evaluation, finding discrepancies that highlight the need for more nuanced evaluation for agglutinative languages.


A Survey of Pre-trained Language Models for Processing Scientific Text

http://arxiv.org/abs/2401.17824v1

Compressor summary: This paper reviews scientific language models (SciLMs) and compares their performance across various applications and data sets, highlighting the need for further research in this growing field.


Privacy-preserving data release leveraging optimal transport and particle gradient descent

http://arxiv.org/abs/2401.17823v1

Compressor summary: PrivPGD is a new method for generating private synthetic data from tabular datasets, using optimal transport and particle gradient descent, which improves upon existing methods and can handle domain-specific constraints.


Do Object Detection Localization Errors Affect Human Performance and Trust?

http://arxiv.org/abs/2401.17821v1

Compressor summary: The study examines how bounding box accuracy affects human trust and performance in object detection tasks and suggests using F1 score optimization and center dots for better results.


SWEA: Changing Factual Knowledge in Large Language Models via Subject Word Embedding Altering

http://arxiv.org/abs/2401.17809v1

Compressor summary: The text proposes an expandable framework for modifying subject word embeddings to edit knowledge in LLMs without damaging them or increasing inference overhead, and shows its performance on various datasets.


Advances in 3D Generation: A Survey

http://arxiv.org/abs/2401.17807v1

Compressor summary: The text provides an overview of 3D content generation methods, including representation, algorithms, datasets, and applications, to help readers understand the current state and future directions of the field.


SimAda: A Simple Unified Framework for Adapting Segment Anything Model in Underperformed Scenes

http://arxiv.org/abs/2401.17803v1

Compressor summary: The paper proposes SimAda, a simple framework to improve the generalization of SAM across various downstream tasks by adapting its general modules without dataset-specific designs.


Distillation Enhanced Time Series Forecasting Network with Momentum Contrastive Learning

http://arxiv.org/abs/2401.17802v1

Compressor summary: DE-TSMCL is a novel framework that leverages contrastive learning and data augmentation to improve long sequence time series forecasting by enhancing feature representations.


M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval

http://arxiv.org/abs/2401.17797v1

Compressor summary: M2-RAAP is a multi-modal recipe for improving zero-shot video-text retrieval by addressing data quality, input type, temporal modeling, and feature enhancement issues.


Graph Transformers without Positional Encodings

http://arxiv.org/abs/2401.17791v1

Compressor summary: Eigenformer introduces a new attention mechanism for graph representation learning that utilizes the Laplacian spectrum and achieves comparable or better performance than existing methods while being faster to train.


RADIN: Souping on a Budget

http://arxiv.org/abs/2401.17790v1

Compressor summary: This paper proposes a method to speed up model soups by using ensemble logits instead of subset selection, and shows its effectiveness in various settings.
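
For background, the basic "soup" operation that souping methods build on is plain weight averaging; a minimal sketch under the assumption of a no-argument model constructor (RADIN's contribution is cheaper candidate evaluation via ensemble logits, not shown here):

```python
import torch
import torch.nn as nn

def uniform_soup(models: list[nn.Module]) -> nn.Module:
    """Average the weights of several fine-tuned models into one 'soup' model."""
    soup = models[0].__class__()  # assumes a no-argument constructor
    avg = {k: torch.stack([m.state_dict()[k].float() for m in models]).mean(0)
           for k in models[0].state_dict()}
    soup.load_state_dict(avg)
    return soup

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

soup = uniform_soup([Net(), Net(), Net()])
print(soup(torch.randn(1, 4)))
```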


Robustly overfitting latents for flexible neural image compression

http://arxiv.org/abs/2401.17789v1

Compressor summary: SGA+ is a refinement method for variational-autoencoder-based neural image compression that improves performance, is less sensitive to hyperparameters, and extends to three-class rounding, reducing compression error on the Tecnick dataset and moving along the rate-distortion curve.


SDRDPy: An application to graphically visualize the knowledge obtained with supervised descriptive rule algorithms

http://arxiv.org/abs/2401.17783v1

Compressor summary: SDRDPy is an easy-to-use app that helps experts analyze and report on rules discovered by supervised data mining algorithms.


A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees

http://arxiv.org/abs/2401.17780v1

Compressor summary: The paper proposes a new primal-dual RL algorithm for online CMDPs with Uniform-PAC guarantees of convergence, sublinear regret, and polynomial sample complexity.


Double InfoGAN for Contrastive Analysis

http://arxiv.org/abs/2401.17776v1

Compressor summary: Double InfoGAN is a GAN-based method for contrastive analysis that achieves better latent separation and image quality than existing VAE-based methods on various visual datasets, including medical images.


SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks

http://arxiv.org/abs/2401.17773v1

Compressor summary: The paper presents a framework for learning cross-modal video representations by pre-training on raw data, using Shared Network Pre-training (SNP) and Significant Semantic Strengthening (S3) strategies to achieve state-of-the-art pixel-level video-text pre-training while balancing efficiency and performance.


Fine-Grained Zero-Shot Learning: Advances, Challenges, and Prospects

http://arxiv.org/abs/2401.17766v1

Compressor summary: The paper reviews recent advances in fine-grained analysis for zero-shot learning, providing a taxonomy of methods, a benchmark of datasets and models, and discussing challenges and future directions.


Tiered approach for rapid damage characterisation of infrastructure enabled by remote sensing and deep learning technologies

http://arxiv.org/abs/2401.17759v1

Compressor summary: The text discusses a new three-level approach to using technology for rapid damage assessment of bridges and other critical infrastructure during wars and disasters, which can improve decision-making and resilience.


CauESC: A Causal Aware Model for Emotional Support Conversation

http://arxiv.org/abs/2401.17755v1

Compressor summary: CauESC is a novel framework that recognizes emotion causes of distress, understands verbal grooming strategies, and improves emotional support in conversations by modeling causal and interactive effects of emotions.


PF-GNN: Differentiable particle filtering based approximation of universal graph representations

http://arxiv.org/abs/2401.17752v1

Compressor summary: The authors propose a method to improve the expressive power of GNNs by using exact isomorphism solver techniques and probabilistic sampling, achieving better graph representation learning with linear increase in runtime.


SwarmBrain: Embodied agent for real-time strategy game StarCraft II via large language models

http://arxiv.org/abs/2401.17749v1

Compressor summary: The paper presents SwarmBrain, an embodied StarCraft II agent built on large language models with two components, an Overmind Intelligence Matrix that orchestrates macro-level strategy and a Swarm ReflexNet that responds quickly to tactical situations, achieving victories against Computer opponents at various difficulty levels.


Algorithmic Robust Forecast Aggregation

http://arxiv.org/abs/2401.17743v1

Compressor summary: The paper proposes an algorithmic framework for robust forecast aggregation that finds the best aggregator for different information structures, and shows its effectiveness in numerical experiments.


Leveraging Human-Machine Interactions for Computer Vision Dataset Quality Enhancement

http://arxiv.org/abs/2401.17736v1

Compressor summary: Multilabelfy is a framework that combines human and machine intelligence to validate and enhance dataset quality for multi-label classification tasks.


COMET: Contrastive Mean Teacher for Online Source-Free Universal Domain Adaptation

http://arxiv.org/abs/2401.17728v1

Compressor summary: COMET is a novel online test-time adaptation method that adapts pre-trained models to new classes without source data, using contrastive and entropy losses within a mean teacher framework.
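
The mean-teacher part is simple to sketch: the teacher is an exponential moving average of the student, providing stable targets during online adaptation. A minimal example:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(teacher: nn.Module, student: nn.Module, momentum: float = 0.999):
    """Mean-teacher update: the teacher tracks an exponential moving average of
    the student, giving stable pseudo-targets during online adaptation."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1 - momentum)

student = nn.Linear(8, 3)
teacher = nn.Linear(8, 3)
teacher.load_state_dict(student.state_dict())  # start from the same weights
# ...one adaptation step on the student, then:
ema_update(teacher, student)
```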


Enhancing Large Language Model with Decomposed Reasoning for Emotion Cause Pair Extraction

http://arxiv.org/abs/2401.17716v1

Compressor summary: DECC is a framework that uses large language models to extract emotion-cause pairs from text by guiding them with chain-of-thought and enhancing with in-context learning.


3D-Plotting Algorithm for Insects using YOLOv5

http://arxiv.org/abs/2401.17714v1

Compressor summary: The study developed a simple and inexpensive method to monitor insects in 3D using computer vision techniques, enabling better understanding of their behavior and ecology.


Aesthetic Preference Prediction in Interior Design: Fuzzy Approach

http://arxiv.org/abs/2401.17710v1

Compressor summary: The paper presents a novel method for quantifying and predicting aesthetic preferences in interior design using fuzzy logic and image processing techniques.


Predicting suicidal behavior among Indian adults using childhood trauma, mental health questionnaires and machine learning cascade ensembles

http://arxiv.org/abs/2401.17705v1

Compressor summary: The study shows that machine learning algorithms can accurately predict suicidal behavior in young Indians based on childhood trauma and mental health data, suggesting potential for early intervention.


WSC+: Enhancing The Winograd Schema Challenge Using Tree-of-Experts

http://arxiv.org/abs/2401.17703v1

Compressor summary: The paper introduces Tree-of-Experts, a method to improve generating Winograd Schema Challenge questions, and WSC+, a new dataset with more categories and insights into LLMs' overconfidence and bias.


Unified Physical-Digital Face Attack Detection

http://arxiv.org/abs/2401.17699v1

Compressor summary: The authors propose a unified dataset and a vision-language model to detect both physical and digital attacks on face recognition systems in a single framework, improving detection performance.


Datacube segmentation via Deep Spectral Clustering

http://arxiv.org/abs/2401.17695v1

Compressor summary: The paper explores using unsupervised clustering methods to analyze large data cubes from physics experiments, such as X-ray fluorescence on artworks and simulated astrophysical observations.


Mitigating the Problem of Strong Priors in LMs with Context Extrapolation

http://arxiv.org/abs/2401.17692v1

Compressor summary: The authors propose a technique to mitigate the problem of strong priors in language models by generating weakened versions of instructions and extrapolating continuations from them, leading to improvements on eleven models across four tasks.


Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning

http://arxiv.org/abs/2401.17686v1

Compressor summary: The paper proposes Deductive Beam Search, which improves Large Language Models' reasoning capabilities by integrating chain-of-thought and deductive reasoning with step-wise beam search and verification.
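
A generic sketch of step-wise beam search with a verifier score; `expand` and `score` are toy stubs standing in for the LLM proposer and the deducibility verifier:

```python
def beam_search_reasoning(question, expand, score, width=3, depth=4):
    """Step-wise beam search over reasoning chains: `expand` proposes next steps
    (an LLM in the paper) and `score` rates how deducible each step is."""
    beams = [([], 0.0)]  # (steps so far, cumulative score)
    for _ in range(depth):
        candidates = [(steps + [nxt], total + score(question, steps, nxt))
                      for steps, total in beams
                      for nxt in expand(question, steps)]
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:width]
    return beams[0][0]

# toy stubs: propose two numbered continuations, prefer the "a" branch
expand = lambda q, steps: [f"step {len(steps)}a", f"step {len(steps)}b"]
score = lambda q, steps, nxt: 1.0 if nxt.endswith("a") else 0.5
print(beam_search_reasoning("2+2?", expand, score))
```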


Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain

http://arxiv.org/abs/2401.17671v1

Compressor summary: This paper investigates how high-performance large language models (LLMs) resemble the brain's language processing mechanisms and suggests that contextual information is crucial for improving both model performance and brain similarity.


Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation

http://arxiv.org/abs/2401.17664v1

Compressor summary: ImgAny is a novel multi-modal generative model that can create high-quality images from any combination of seven input modalities, mimicking human reasoning and perception.


Document Structure in Long Document Transformers

http://arxiv.org/abs/2401.17658v1

Compressor summary: The text discusses how long-document Transformers acquire and use document structure during pre-training and inference, and evaluates the effects of structure infusion on two challenging NLP tasks.


An attempt to generate new bridge types from latent space of energy-based model

http://arxiv.org/abs/2401.17657v1

Compressor summary: The paper proposes an energy-based model for bridge-type innovation that trains a neural-network energy function on a structured image dataset of four bridge types, treats the bridge-type population as a Boltzmann distribution, and uses Langevin dynamics to generate new bridge types from low-energy regions of the latent space.


All Beings Are Equal in Open Set Recognition

http://arxiv.org/abs/2401.17654v1

Compressor summary: DCTAU is a novel open-set recognition framework that models potential open space by expanding unknown classes near targeted known classes and uses a dual contrastive loss to effectively alleviate distribution disruption and imbalance issues.


A primer on synthetic health data

http://arxiv.org/abs/2401.17653v1

Compressor summary: The text discusses advances in creating realistic synthetic health datasets to preserve characteristics and enable safe data sharing without revealing patient identity, while addressing challenges, evaluation methods, deployment examples, regulation, ethics, access, governance, and future opportunities.


Exploring the Common Appearance-Boundary Adaptation for Nighttime Optical Flow

http://arxiv.org/abs/2401.17642v1

Compressor summary: The paper proposes a method to improve nighttime optical flow by using a common-latent space to align features between daytime and nighttime images.


Navigating the OverKill in Large Language Models

http://arxiv.org/abs/2401.17633v1

Compressor summary: The paper explores why large language models may refuse harmless queries and proposes Self-Contrastive Decoding, a technique to reduce this problem without retraining the model.
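
I have not verified the paper's exact formulation, but contrastive decoding methods generally combine two logit distributions arithmetically; a generic sketch of that arithmetic (Self-Contrastive Decoding differs in how the two distributions are constructed):

```python
import numpy as np

def contrastive_logits(logits_a: np.ndarray, logits_b: np.ndarray, alpha: float = 0.5):
    """Generic contrastive decoding: amplify what distribution A prefers over B.
    Schematic only; SCD builds both distributions from the same model with
    different emphasis on the input."""
    return (1 + alpha) * logits_a - alpha * logits_b

a = np.array([2.0, 0.5, 0.1])  # e.g. logits conditioned one way
b = np.array([1.8, 1.4, 0.1])  # e.g. logits conditioned another way
print(contrastive_logits(a, b))
```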


What Do Self-Supervised Speech and Speaker Models Learn? New Findings From a Cross Model Layer-Wise Analysis

http://arxiv.org/abs/2401.17632v1

Compressor summary: This paper investigates how speech self-supervised learning (SSL) and speaker self-supervised learning (SSSL) represent speech properties and speaker information, revealing differences in their capacities and layer usage.


Spatial-and-Frequency-aware Restoration method for Images based on Diffusion Models

http://arxiv.org/abs/2401.17629v1

Compressor summary: SaFaRI is a diffusion model for image restoration that preserves data-fidelity in spatial and frequency domains, achieving state-of-the-art performance on various noisy inverse problems.


Neighboring Perturbations of Knowledge Editing on Large Language Models

http://arxiv.org/abs/2401.17623v1

Compressor summary: This paper explores how appending new knowledge to large language models affects their existing knowledge and introduces a framework to minimize this impact.


Graph Multi-Similarity Learning for Molecular Property Prediction

http://arxiv.org/abs/2401.17615v1

Compressor summary: GraphMSL is a novel molecular representation learning framework that captures self-similarity and relative similarities using multimodal continuous similarity metrics, improving effectiveness in predicting molecular properties and enabling drug discovery.


IGCN: Integrative Graph Convolutional Networks for Multi-modal Data

http://arxiv.org/abs/2401.17612v1

Compressor summary: The paper introduces Integrative Graph Convolutional Networks (IGCN), a novel neural network approach for multi-modal data networks, which learns node embeddings from multiple topologies and fuses them using attention to improve model interpretability and performance on various node classification tasks.


LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement

http://arxiv.org/abs/2401.17609v1

Compressor summary: LaneGraph2Seq is a novel method for extracting lane graphs from images using a language model with vertex-edge encoding and connectivity enhancement, achieving better results than existing approaches.


Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition

http://arxiv.org/abs/2401.17604v1

Compressor summary: The Economical Cued Speech Fusion Transformer (EcoCued) is a new method that uses a novel Token-Importance-Aware Attention mechanism to improve automatic Cued Speech recognition by efficiently capturing cross-modal relationships between lip reading and hand cueing.


Topology-Aware Latent Diffusion for 3D Shape Generation

http://arxiv.org/abs/2401.17603v1

Compressor summary: The paper presents a generative model that combines latent diffusion with persistent homology to create diverse 3D shapes with controlled topological characteristics, embedding implicit shape representations into latent vectors and navigating them via diffusion, in a flexible framework supporting various input modalities and topology modifications.


Assertion Detection Large Language Model In-context Learning LoRA Fine-tuning

http://arxiv.org/abs/2401.17602v1

Compressor summary: The study proposes a novel method using Large Language Models and advanced reasoning techniques for assertion detection in clinical NLP, improving the understanding of medical conditions from unstructured texts.


Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data

http://arxiv.org/abs/2401.17600v1

Compressor summary: The paper introduces a benchmark to evaluate how well vision-language models perform on tasks involving Earth observation data, and finds that they excel at open-ended tasks but struggle with spatial reasoning.


SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization

http://arxiv.org/abs/2401.17597v1

Compressor summary: The paper proposes a method for summarizing long dialogues by using speaker information and pre-training on diverse datasets, achieving state-of-the-art performance.


Local Feature Matching Using Deep Learning: A Survey

http://arxiv.org/abs/2401.17592v1

Compressor summary: Local feature matching methods are categorized into detector-based and detector-free techniques, which use deep learning models to improve accuracy and robustness in computer vision applications like image retrieval, 3D reconstruction, and object recognition.


Local and Global Contexts for Conversation

http://arxiv.org/abs/2401.17588v1

Compressor summary: The paper proposes a local and global conversation model (LGCM) that uses both local and global contexts to generate accurate responses in open domain conversations.


Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks

http://arxiv.org/abs/2401.17585v1

Compressor summary: The paper introduces ReCoE, a dataset to analyze challenges in updating interconnected facts for accurate reasoning, and finds existing knowledge editing methods perform poorly on it.


Graph Contrastive Learning with Cohesive Subgraph Awareness

http://arxiv.org/abs/2401.17580v1

Compressor summary: CTAug is a framework that improves graph contrastive learning by preserving cohesion properties and enhancing the encoder's ability to discern subgraph patterns.


Scavenging Hyena: Distilling Transformers into Long Convolution Models

http://arxiv.org/abs/2401.17574v1

Compressor summary: The paper proposes using the Hyena mechanism instead of attention heads in transformer models for more efficient pre-training of large language models.
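
The primitive that makes this efficient is FFT-based long convolution, which runs in O(L log L) rather than attention's O(L^2); a minimal causal version:

```python
import torch

def long_conv(x: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Causal long convolution via FFT, the O(L log L) primitive that
    Hyena-style operators use in place of quadratic attention."""
    L = x.shape[-1]
    n = 2 * L  # zero-pad to avoid circular wrap-around
    y = torch.fft.irfft(torch.fft.rfft(x, n) * torch.fft.rfft(k, n), n)
    return y[..., :L]

x = torch.randn(1, 4, 1024)   # (batch, channels, length)
k = torch.randn(4, 1024)      # one implicit long filter per channel
print(long_conv(x, k).shape)  # -> torch.Size([1, 4, 1024])
```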


Rethinking Channel Dependence for Multivariate Time Series Forecasting: Learning from Leading Indicators

http://arxiv.org/abs/2401.17548v1

Compressor summary: LIFT is a new method for multivariate time series forecasting that leverages local lead-lag relationships between variates to improve accuracy by 5.5%.
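
For intuition, a toy estimator of the lead-lag shift between two series via lagged correlation; LIFT learns and exploits such relationships dynamically rather than with this fixed scan:

```python
import numpy as np

def estimate_lag(leader: np.ndarray, follower: np.ndarray, max_lag: int = 20) -> int:
    """Pick the shift (in steps) at which the leading series best correlates
    with the follower, the kind of local lead-lag structure LIFT exploits."""
    lags = range(1, max_lag + 1)
    corr = [np.corrcoef(leader[:-lag], follower[lag:])[0, 1] for lag in lags]
    return int(lags[int(np.argmax(corr))])

t = np.arange(300)
leader = np.sin(0.1 * t)
follower = np.sin(0.1 * (t - 7)) + 0.05 * np.random.randn(300)  # lags by 7 steps
print(estimate_lag(leader, follower))  # ~7
```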


Task-Oriented Diffusion Model Compression

http://arxiv.org/abs/2401.17547v1

Compressor summary: The paper presents a task-oriented compression method for I2I diffusion models that reduces both model size and the number of timesteps, maintaining near-optimal results for image editing and restoration while cutting computational cost.


Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs

http://arxiv.org/abs/2401.17544v1

Compressor summary: QFX is a trainable fixed-point quantization method that learns binary-point positions and minimizes DSP usage, achieving higher accuracy on FPGA deployment of deep learning models compared to post-training quantization.
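
A minimal sketch of fixed-point rounding with a fixed binary point; QFX's contribution is making the binary-point position trainable, which this toy version does not do:

```python
import torch

def fixed_point(x: torch.Tensor, total_bits: int = 8, frac_bits: int = 4):
    """Round to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    qmax = 2 ** (total_bits - 1) - 1
    q = torch.clamp(torch.round(x * scale), -qmax - 1, qmax)
    return q / scale

w = torch.randn(5)
print(w)
print(fixed_point(w))  # values snapped to multiples of 2**-4 = 0.0625
```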


Data-Effective Learning: A Comprehensive Medical Benchmark

http://arxiv.org/abs/2401.17542v1

Compressor summary: The paper introduces a benchmark for medical data-effective learning, which aims to use data efficiently and effectively to train AI models in healthcare.


Towards Understanding Variants of Invariant Risk Minimization through the Lens of Calibration

http://arxiv.org/abs/2401.17541v1

Compressor summary: The study investigates approximate IRM techniques through the lens of calibration and finds that Information Bottleneck-based IRM improves expected calibration error (ECE) while preserving accuracy without overfitting.


Enhancing Score-Based Sampling Methods with Ensembles

http://arxiv.org/abs/2401.17539v1

Compressor summary: Ensembles improve score-based sampling methods by using particle dynamics to approximate reverse diffusion drifts for modeling complex probability distributions without gradients.


PipeNet: Question Answering with Semantic Pruning over Knowledge Graphs

http://arxiv.org/abs/2401.17536v1

Compressor summary: The paper proposes a method to improve question answering efficiency by finding semantically related entity nodes in knowledge graphs and pruning noisy ones using dependency distance and graph attention network.


Learning to Stop Cut Generation for Efficient Mixed-Integer Linear Programming

http://arxiv.org/abs/2401.17527v1

Compressor summary: The paper proposes a novel reinforcement learning method (HYGRO) to learn optimal stopping strategies for cuts generation in mixed-integer linear programs, improving their solving efficiency.


Game-Theoretic Unlearnable Example Generator

http://arxiv.org/abs/2401.17523v1

Compressor summary: The paper proposes a novel game-theoretic approach for unlearnable example attacks on deep neural networks, called Game Unlearnable Example (GUE), which effectively degrades test accuracy by adding imperceptible perturbations to training data.


Towards Image Semantics and Syntax Sequence Learning

http://arxiv.org/abs/2401.17515v1

Compressor summary: The authors propose a two-stage approach to learn image grammar, which represents the semantics and order of parts in an image, to help image classifiers detect corruptions involving missing or disarrayed objects.


FEUDA: Frustratingly Easy Prompt Based Unsupervised Domain Adaptation

http://arxiv.org/abs/2401.17514v1

Compressor summary: FEUDA is a frustratingly easy UDA method that uses two instruction-tuning tasks and masked language modeling to adapt to different domains without labeled data from the target domain.


Linguistically Communicating Uncertainty in Patient-Facing Risk Prediction Models

http://arxiv.org/abs/2401.17511v1

Compressor summary: The paper discusses the difficulties of explaining AI models for healthcare risks in natural language and proposes a solution for predicting IVF outcomes.