This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-07, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.03326v1
Compressor summary: LLaVA-OneVision is a multimodal model that excels in single-image, multi-image, and video scenarios and enables strong transfer learning across different modalities/scenarios.
http://arxiv.org/abs/2408.03325v1
Compressor summary: CoverBench is a benchmark for verifying language models' outputs in complex reasoning settings, built from diverse datasets spanning multiple domains and reasoning types and standardized into a common format.
http://arxiv.org/abs/2408.03319v1
Compressor summary: The study analyzes hedges in Roadrunner cartoon narratives, comparing three LLM-based approaches for hedge detection and improving the gold standard coding.
http://arxiv.org/abs/2408.03314v1
Compressor summary: This paper studies how using more test-time computation can improve LLMs' performance on open-ended natural language tasks and proposes a "compute-optimal" strategy that adaptively allocates inference-time compute per prompt, leading to significant efficiency improvements.
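As a rough illustration of the adaptive-allocation idea (not the paper's actual method, which also considers verifier-guided strategies), here is a minimal best-of-N sketch in Python; the `model.sample(prompt)` interface and the disagreement-based difficulty proxy are hypothetical:

```python
from collections import Counter

def estimate_difficulty(prompt, model, probe_samples=4):
    """Cheap difficulty proxy: disagreement among a few probe samples."""
    answers = [model.sample(prompt) for _ in range(probe_samples)]
    top_count = Counter(answers).most_common(1)[0][1]
    return 1.0 - top_count / probe_samples  # 0.0 = unanimous agreement

def answer_with_adaptive_compute(prompt, model, max_samples=32):
    """Spend more samples on prompts the model finds hard, then majority-vote."""
    n = max(1, round(max_samples * estimate_difficulty(prompt, model)))
    answers = [model.sample(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```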
http://arxiv.org/abs/2408.03312v1
Compressor summary: The authors introduce a novel Masked Diffusion Transformer for co-speech gesture generation, which improves contextual reasoning and incorporates multi-modal information, achieving faster learning and inference speeds compared to traditional models.
http://arxiv.org/abs/2408.03304v1
Compressor summary: The authors propose a method to improve the automatic tracing of elaborate illustrations on Etruscan mirrors using a deep neural network that interactively refines existing annotations based on human guidance, achieving higher quality with less manual input and faster speed.
http://arxiv.org/abs/2408.03302v1
Compressor summary: TextIM is a framework for generating realistic human interactive motions based on textual descriptions, focusing on aligning part-level semantics and achieving spatial coherence.
http://arxiv.org/abs/2408.03297v1
Compressor summary: KaPO is a method to improve large language models' ability to handle knowledge conflicts and select relevant information in real retrieval scenarios using preference optimization.
http://arxiv.org/abs/2408.03291v1
Compressor summary: DopQ-ViT is a new method to compress vision transformers by using a distribution-friendly Tan Quantizer and a scaling factor compensation technique, which improves accuracy and efficiency in low-bit settings.
http://arxiv.org/abs/2408.03290v1
Compressor summary: The paper proposes SARA and Mo-SARA, which are adaptive low-rank methods for fine-tuning pre-trained models that adjust the rank based on layer performance and reduce parameters by selectively updating singular values.
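To make the "selectively updating singular values" idea concrete, here is a minimal sketch for a single linear layer; SARA/Mo-SARA's adaptive per-layer rank selection and mixture machinery are not reproduced:

```python
import torch
import torch.nn as nn

class SingularValueAdapter(nn.Module):
    """Freeze a pretrained weight's singular vectors and train only the
    singular values, i.e. one trainable parameter per rank component."""

    def __init__(self, weight):                 # weight: (out_features, in_features)
        super().__init__()
        U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
        self.register_buffer("U", U)            # frozen left singular vectors
        self.register_buffer("Vh", Vh)          # frozen right singular vectors
        self.s = nn.Parameter(S.clone())        # the only trainable part

    def forward(self, x):                       # x: (..., in_features)
        h = x @ self.Vh.T                       # project into rank space
        return (h * self.s) @ self.U.T          # rescale and project back out
```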
http://arxiv.org/abs/2408.03287v1
Compressor summary: The paper introduces a graph-based neural network method for detecting malicious behavior in large networks that models network entity interactions as a heterogeneous graph and performs local graph inference for expressivity and scalability, outperforming the state-of-the-art PTP algorithm and generalizing well to new entities.
http://arxiv.org/abs/2408.03284v1
Compressor summary: The paper proposes ReSyncer, a framework that creates high-fidelity lip-synced videos with various features suitable for virtual presenters and performers.
http://arxiv.org/abs/2408.03282v1
Compressor summary: The paper presents a transformer-based image retrieval re-ranking model that uses local descriptors to estimate asymmetric similarity between query and database images with low memory usage (1KB per image), adapting to different applications and outperforming current methods in both performance and memory efficiency.
http://arxiv.org/abs/2408.03281v1
Compressor summary: StructEval is a new framework that evaluates large language models by conducting structured assessments across multiple cognitive levels and critical concepts, improving reliability and consistency in model evaluation.
http://arxiv.org/abs/2408.03256v1
Compressor summary: The paper proposes a synthetic data method to improve text-to-SQL models' domain generalization and error supervision, leading to SENSE, an open-source model that outperforms existing methods on SPIDER and BIRD benchmarks.
http://arxiv.org/abs/2408.03247v1
Compressor summary: The paper examines how Large Language Models use their stored knowledge for reasoning tasks, finding that they sometimes rely on shortcuts instead of recalling facts accurately, and explores techniques to improve their reasoning performance.
http://arxiv.org/abs/2408.03246v1
Compressor summary: The paper proposes Reasoning with Attributions, a method to improve language models' ability to reason across multiple steps and handle noisy contexts by prompting them to attribute their assertions to the supporting context.
http://arxiv.org/abs/2408.03236v1
Compressor summary: The paper proposes a DOA estimation algorithm for partially-calibrated sparse subarrays, using coarray properties to estimate more sources than physical sensors and outperforming other methods.
http://arxiv.org/abs/2408.03230v1
Compressor summary: The authors present a MoCo v2-based contrastive learning framework that learns image complexity representations and enhances computer vision tasks without expensive manual annotations or human biases.
http://arxiv.org/abs/2408.03223v1
Compressor summary: StreamiNNC optimizes computational efficiency for deep learning time-series processing with overlapping windows by exploiting shift-invariance and reducing zero-padding and pooling errors.
http://arxiv.org/abs/2408.03219v1
Compressor summary: The paper proposes a meta-learning transformer-based optimizer to enhance continual learning in machine learning models by selectively updating parameters and preventing unnecessary forgetting.
http://arxiv.org/abs/2408.03209v1
Compressor summary: IPAdapter-Instruct is a method that uses natural-image conditioning and Instruct prompts to switch between different image generation tasks with one model, improving efficiency and quality.
http://arxiv.org/abs/2408.03202v1
Compressor summary: The paper introduces DENN, a framework for multi-label text classification that mitigates embedding alignment and confidence estimation biases in the kNN approach by proposing debiased contrastive learning and confidence estimation strategies.
http://arxiv.org/abs/2408.03195v1
Compressor summary: RELIEF is a method that uses reinforcement learning to strategically add feature prompts to certain graph nodes to improve task performance and data efficiency.
http://arxiv.org/abs/2408.03193v1
Compressor summary: The paper proposes an online hard sample mining technique for efficient NeRF training that reduces compute time, memory usage, and improves view-synthesis quality.
http://arxiv.org/abs/2408.03178v1
Compressor summary: The "Object Images" approach creates realistic 3D models by representing complex shapes as 2D images, allowing for efficient image generation and PBR material support.
http://arxiv.org/abs/2408.03172v1
Compressor summary: The paper explores Parameter Efficient Fine-Tuning methods for low-resource Marathi BERT models and shows they can achieve competitive results with less computational cost.
http://arxiv.org/abs/2408.03164v1
Compressor summary: DCLS improves model interpretability by aligning model attention with human visual attention, and Threshold-Grad-CAM further enhances interpretability for some models.
http://arxiv.org/abs/2408.03156v1
Compressor summary: The study proposes a new CT reconstruction method that combines denoising diffusion models with iterative CT reconstruction, optimizing the fidelity loss and suppressing anatomical structure changes, resulting in high-quality images while preserving structures.
http://arxiv.org/abs/2408.03152v1
Compressor summary: The paper proposes a Two-Sided Constraint for Graph Convolutional Networks to address both causes of over-smoothing by using random masking and contrastive constraint, improving node discriminability and reducing convergence issues.
http://arxiv.org/abs/2408.03149v1
Compressor summary: EGMS is a multimodal summarization model that uses dual encoders and a gating mechanism to integrate fine-grained entity knowledge from images for enhanced textual summary generation.
http://arxiv.org/abs/2408.03143v1
Compressor summary: SuperSimpleNet is a discriminative model that detects surface defects on objects using normal or abnormal training images, improving performance and speed over previous methods.
http://arxiv.org/abs/2408.03130v1
Compressor summary: This literature review explores different techniques for compressing large language models, such as quantization, pruning, knowledge distillation, and architectural optimizations, and categorizes them into a taxonomy.
http://arxiv.org/abs/2408.03127v1
Compressor summary: The paper presents a method for classifying statements about clinical trial reports using Mistral-7B, an open-source large language model, with promising results and some limitations.
http://arxiv.org/abs/2408.03125v1
Compressor summary: COMMENTATOR is a tool that speeds up multilingual text annotation, especially for code-mixed languages like Hinglish.
http://arxiv.org/abs/2408.03120v1
Compressor summary: The paper introduces a new dataset with text descriptions for plant diseases, which can help recognize them better in real-world images and handle variability within and between classes.
http://arxiv.org/abs/2408.03119v1
Compressor summary: The paper introduces Euas-20, a dataset to evaluate large language models' translation abilities across different languages and the impact of pre-training data.
http://arxiv.org/abs/2408.03099v1
Compressor summary: The paper proposes FT-Topic, an unsupervised fine-tuning method for LLMs that uses sentence groups to improve topic modeling performance, and introduces SenClu, a fast inference algorithm based on the approach.
http://arxiv.org/abs/2408.03094v1
Compressor summary: The 500xCompressor method compresses natural language contexts into one special token, improving inference speed, reducing costs, and enhancing user experience without requiring fine-tuning of the large language model.
http://arxiv.org/abs/2408.03093v1
Compressor summary: The paper proposes a data-driven approach for learning robust MDP policies across unknown environments: an interval MDP model is built from finite samples of trajectories, and a single policy is synthesized that meets the requirements with a bounded risk, trading off performance against risk while exploiting knowledge of the state space, graph structure, and parametric structure.
http://arxiv.org/abs/2408.03092v1
Compressor summary: The paper proposes WIDEN, a method to merge large language models with different parameter changes, improving their capabilities in various tasks.
http://arxiv.org/abs/2408.03079v1
Compressor summary: The paper proposes UniCE, a unified framework for event causality extraction that addresses complex causality, subtask interaction, and knowledge fusion, achieving state-of-the-art results and outperforming ChatGPT.
http://arxiv.org/abs/2408.03074v1
Compressor summary: The paper surveys how pragmatic abilities are tested in large language models, dividing them into discourse and interactional pragmatics, and discussing their analysis methods.
http://arxiv.org/abs/2408.03070v1
Compressor summary: This paper investigates how pretrained language models encode negation and its impact on neighboring words, finding that they capture negation scope and its effect on NPI licensing.
http://arxiv.org/abs/2408.03062v1
Compressor summary: This study trains an LSTM network on a custom dataset to explore argument structure construction (ASC) representation and processing in the brain, finding that the model can effectively differentiate between various construction types and may reflect linguistic processing in humans.
http://arxiv.org/abs/2408.03060v1
Compressor summary: The paper proposes Masked Gaussian Fields (MGFs), which generate accurate building surface reconstructions from images using multi-level masks and innovative losses, improving accuracy and efficiency over traditional methods and other state-of-the-art solutions.
http://arxiv.org/abs/2408.03043v1
Compressor summary: The paper introduces targeted visual prompting to improve region-based question answering in medical images using multimodal large language models.
http://arxiv.org/abs/2408.03033v1
Compressor summary: The article discusses L3iTC's participation in a financial text challenge, where they fine-tuned large language models for classification and summarization using low GPU memory and 4-bit quantization.
http://arxiv.org/abs/2408.03030v1
Compressor summary: The study proposes Fore-Background Contrast Attention (FBCA), which uses background information to improve pedestrian detection under low-light conditions, achieving state-of-the-art results on three datasets.
http://arxiv.org/abs/2408.03029v1
Compressor summary: The text proposes a novel reward shaping method for reinforcement learning that uses success rates from historical experiences and Beta distributions to balance exploration and exploitation efficiently.
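The mechanism is concrete enough for a small sketch: keep per-state success/failure counts, model the success rate as a Beta posterior, and add a sampled rate as a shaping bonus. Names and the exact shaping rule below are assumptions, not the paper's formulation:

```python
import random
from collections import defaultdict

class BetaRewardShaper:
    """Shape rewards with success rates sampled from Beta posteriors.

    Each key (e.g., a discretized state or goal) keeps success/failure
    counts; sampling from Beta(alpha, beta) is noisy where counts are
    low, which encourages exploration, and sharp where they are high.
    """

    def __init__(self, bonus_scale=0.1):
        self.successes = defaultdict(lambda: 1)   # Beta prior alpha = 1
        self.failures = defaultdict(lambda: 1)    # Beta prior beta = 1
        self.bonus_scale = bonus_scale

    def update(self, key, success):
        if success:
            self.successes[key] += 1
        else:
            self.failures[key] += 1

    def shaped_reward(self, key, env_reward):
        rate = random.betavariate(self.successes[key], self.failures[key])
        return env_reward + self.bonus_scale * rate
```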
http://arxiv.org/abs/2408.03014v1
Compressor summary: The paper proposes CKNN, a method for unsupervised video anomaly detection that filters out anomaly clusters in the training dataset, achieving better results than previous methods and comparable to those trained with anomaly-free data.
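A minimal sketch of the clean-then-kNN pattern the summary describes; the cluster-dropping rule here (removing the smallest clusters) is an assumption, not CKNN's actual criterion:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def cleaned_knn_scores(train_feats, test_feats, n_clusters=10, n_drop=2, k=5):
    """Cluster training features, drop the smallest clusters (assumed to
    hold the contaminating anomalies), then score test features by mean
    kNN distance to the cleaned feature bank."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(train_feats)
    sizes = np.bincount(km.labels_, minlength=n_clusters)
    smallest = np.argsort(sizes)[:n_drop]
    bank = train_feats[~np.isin(km.labels_, smallest)]
    nn = NearestNeighbors(n_neighbors=k).fit(bank)
    dists, _ = nn.kneighbors(test_feats)
    return dists.mean(axis=1)                    # higher = more anomalous
```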
http://arxiv.org/abs/2408.03010v1
Compressor summary: The authors propose a hybrid system that combines large language models with knowledge graphs to improve factual correctness and completeness in answering natural language queries, especially for medical applications.
http://arxiv.org/abs/2408.03006v1
Compressor summary: The paper proposes a dual-path collaborative generation network that dynamically perceives and generates emotional captions for videos, balancing factual content and emotional cues.
http://arxiv.org/abs/2408.02993v1
Compressor summary: The paper proposes DreamLCM, a method to improve text-to-3D quality by incorporating a Latent Consistency Model, and introduces two strategies to enhance generation further.
http://arxiv.org/abs/2408.02987v1
Compressor summary: The paper introduces a Compact-Dynamic Graph Convolutional Network (CDGCN) that uses a unified tensor graph convolution framework to simultaneously process spatial and temporal patterns for spatiotemporal signal recovery, achieving better results than existing models.
http://arxiv.org/abs/2408.02983v1
Compressor summary: Diffusion-based Feature Replay (DiffFR) is a simple and effective method for non-exemplar class-incremental learning that uses self-supervised learning and prototype calibration to reduce catastrophic forgetting and improve feature representation.
http://arxiv.org/abs/2408.02976v1
Compressor summary: The EmpRL framework uses reinforcement learning to generate empathetic responses in dialogue systems by training a T5 model with an empathy reward function based on three communication mechanisms.
http://arxiv.org/abs/2408.02971v1
Compressor summary: WINO is a fast and accurate surrogate solver that interpolates electric field predictions across a wide range of wavelengths using Fourier Group Convolution Shuffling and conditioning techniques.
http://arxiv.org/abs/2408.02970v1
Compressor summary: EC-Guide is a model-agnostic e-commerce guide that improves the performance of large language models on various tasks through instruction tuning and quantization, achieving top ranks at the Amazon KDD Cup '24.
http://arxiv.org/abs/2408.02966v1
Compressor summary: The paper proposes a method to compress irregular point clouds using adaptive conditional probability modeling and a dual-layer architecture with implicit neural representation, achieving high rate-distortion performance, low complexity, and arbitrary-scale upsampling.
http://arxiv.org/abs/2408.02965v1
Compressor summary: The paper presents a new data-driven framework for building stochastic and non-local closure models for complex dynamical systems using a conditional diffusion model and a neural operator, improving their performance in real-world applications.
http://arxiv.org/abs/2408.02964v1
Compressor summary: The paper evaluates three large language models' accuracy and consistency in answering 1050 nutrition questions using different prompts and techniques, finding that GPT-4o with chain-of-thought self-consistency (CoT-SC) prompting performed best overall and Gemini 1.5 Pro with zero-shot (ZS) prompting had the highest consistency.
http://arxiv.org/abs/2408.02960v1
Compressor summary: ADDRESS is a new MAPF-LNS variant that uses restricted Thompson Sampling to adaptively select promising destroy heuristics for large-scale multi-agent path finding, achieving cost improvements of 50% or more.
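The bandit view is easy to sketch: each destroy heuristic is an arm with a Beta posterior over its probability of improving the solution, and "restricted" sampling (here, limiting draws to the top-m arms by posterior mean, an assumption about ADDRESS's exact rule) focuses compute on promising heuristics:

```python
import random

class ThompsonHeuristicSelector:
    """Bandit over LNS destroy heuristics with a Beta posterior per arm."""

    def __init__(self, heuristics):
        self.stats = {h: [1, 1] for h in heuristics}   # [alpha, beta] priors

    def select(self, top_m=None):
        pool = list(self.stats)
        if top_m is not None:                          # restricted sampling
            pool.sort(key=lambda h: -self.stats[h][0] / sum(self.stats[h]))
            pool = pool[:top_m]
        draws = {h: random.betavariate(*self.stats[h]) for h in pool}
        return max(draws, key=draws.get)

    def update(self, heuristic, improved):
        self.stats[heuristic][0 if improved else 1] += 1
```

In an LNS loop, `select` picks the destroy heuristic for the next iteration and `update` records whether the repaired solution lowered the cost.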
http://arxiv.org/abs/2408.02957v1
Compressor summary: MATR is a memory-augmented transformer that leverages long-term context for online temporal action localization by selectively preserving past segment features and predicting action start and end times.
http://arxiv.org/abs/2408.02954v1
Compressor summary: FakeMix is a new benchmark for detecting manipulated segments in videos and audio, revealing limitations of existing deepfake detection models.
http://arxiv.org/abs/2408.02948v1
Compressor summary: The study explores whether occupations have typical genders in the way bananas have typical colors, but finds that gender mentioning is tied more to the femaleness of an occupation than to typicality.
http://arxiv.org/abs/2408.02945v1
Compressor summary: The paper explores self-supervised learning with wav2vec 2.0 for multi-channel end-to-end ASR and finds feature-wise quantization to be the most effective pre-training method.
http://arxiv.org/abs/2408.02936v1
Compressor summary: The paper proposes a method to improve ensemble learning by using confidence tensors to integrate weak base learners more efficiently and enhancing generalization performance with a smooth convex objective function.
http://arxiv.org/abs/2408.02932v1
Compressor summary: The paper proposes ANCMM, an algorithm that learns sparse, doubly stochastic similarity graphs for clustering problems using Marcus mapping, which relates to optimal transport efficiency.
http://arxiv.org/abs/2408.02930v1
Compressor summary: The text discusses how the "small agent, big world" frame motivates the need for continual learning and proposes two desiderata for designing better synthetic environments to evaluate agents' performance.
http://arxiv.org/abs/2408.02927v1
Compressor summary: The paper introduces HARMONIC, a new framework that uses large language models (LLMs) to generate realistic and private tabular data by fine-tuning them with a k-nearest-neighbors algorithm and evaluating their performance and privacy risks with specific metrics.
http://arxiv.org/abs/2408.02923v1
Compressor summary: The paper introduces a new method called intermediate direct preference optimization (DPO) for fine-tuning large language models that uses losses from multiple intermediate layers to improve performance.
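For reference, the standard DPO loss compares policy and reference log-probabilities of a chosen (w) versus rejected (l) response; a weighted sum over several tapped layers is one plausible reading of "intermediate DPO" (how the paper attaches heads to intermediate layers and weights their losses is an assumption here):

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO: -log sigmoid of the scaled log-ratio margin."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

def intermediate_dpo_loss(layer_logps, ref_logps, weights, beta=0.1):
    """Weighted sum of DPO losses from several intermediate-layer heads.

    layer_logps / ref_logps: per-layer (logp_w, logp_l) tensor pairs.
    """
    total = torch.zeros(())
    for (lw, ll), (rw, rl), w in zip(layer_logps, ref_logps, weights):
        total = total + w * dpo_loss(lw, ll, rw, rl, beta)
    return total
```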
http://arxiv.org/abs/2408.02919v1
Compressor summary: The paper proposes a data checklist, a set of principled unit tests based on V-information literature, to systematically evaluate datasets for annotation artifacts and improve LLM alignment.
http://arxiv.org/abs/2408.02907v1
Compressor summary: IIER is a new framework for question-answering tasks that uses chunk interactions to enhance retrieval and improve performance.
http://arxiv.org/abs/2408.02906v1
Compressor summary: The paper proposes dual-view pyramid pooling (DVPP), a new method to aggregate features in deep neural networks for better medical image classification and confidence calibration by combining spatial pooling and cross-channel pooling.
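The two views are simple to illustrate: pool over spatial positions (one value per channel) and pool across channels (one value per position), then fuse. How DVPP builds its pyramid and fuses levels is paper-specific; this PyTorch sketch only shows the two complementary views the summary names:

```python
import torch
import torch.nn as nn

class DualViewPooling(nn.Module):
    """Concatenate spatial pooling (per channel) with cross-channel
    pooling (per position); feed the result to the classifier head."""

    def forward(self, x):                      # x: (B, C, H, W), fixed H and W
        spatial = x.mean(dim=(2, 3))           # (B, C): pool over positions
        cross = x.mean(dim=1).flatten(1)       # (B, H*W): pool over channels
        return torch.cat([spatial, cross], dim=1)
```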
http://arxiv.org/abs/2408.02904v1
Compressor summary: The paper presents a two-stage framework for accurate Egyptian license plate recognition using image processing and deep learning, with applications in traffic management.
http://arxiv.org/abs/2408.02901v1
Compressor summary: Lighthouse is a user-friendly library for video moment retrieval and highlight detection that addresses the research community's lack of comprehensive experiments and user-friendly tooling.
http://arxiv.org/abs/2408.02899v1
Compressor summary: The paper proposes a method to represent stocks as vectors using both text and network data, and shows that it improves performance on related tasks and fund creation in wealth management.
http://arxiv.org/abs/2408.02897v1
Compressor summary: The paper proposes a metric-driven approach to choosing low precision numerics for deep learning models, which can reduce hardware costs and enable scaling of training.
http://arxiv.org/abs/2408.02891v1
Compressor summary: Our new technique uses conditional diffusion models to create diverse and semantically consistent augmented images for better object detection.
http://arxiv.org/abs/2408.02888v1
Compressor summary: VizECGNet is a multi-modal deep learning model that uses only printed ECG images to diagnose cardiovascular diseases, outperforming signal-based models with higher precision, recall, and F1-Score.
http://arxiv.org/abs/2408.02879v1
Compressor summary: The authors propose a realistic humanoid agent that communicates through speech and body movements and manipulates objects in real time, integrating audio and visual inputs via a large language model.