This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-26, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2406.17777v1
Compressor summary: Text-Animator is a novel method for generating videos with accurate and coherent visual texts by controlling camera movement and refining text motions.
http://arxiv.org/abs/2406.17774v1
Compressor summary: The authors propose a fast and accurate method to estimate material properties of objects under uncontrolled lighting using signal-processing techniques, while also quantifying uncertainty for improved acquisition quality.
http://arxiv.org/abs/2406.17770v1
Compressor summary: MG-LLaVA is a multi-modal large language model that enhances visual processing by using multi-granularity features and outperforms existing models on perception tasks.
http://arxiv.org/abs/2406.17764v1
Compressor summary: The authors introduce BMIKE-53, a benchmark for evaluating cross-lingual knowledge editing methods on 53 languages, and propose MIKE, a gradient-free method that shows promising results.
http://arxiv.org/abs/2406.17763v1
Compressor summary: The paper presents DiffusionPDE, a method that uses generative diffusion models to solve partial differential equations (PDEs) with missing information, achieving better results than existing methods.
http://arxiv.org/abs/2406.17762v1
Compressor summary: The authors use various ATP and AI methods to solve over 3000 previously unsolved Mizar problems, increasing the percentage of ATP-solved Mizar problems from 75% to above 80%.
http://arxiv.org/abs/2406.17761v1
Compressor summary: CaLMQA is a diverse dataset of complex questions in 23 languages that reveals limitations of large language models in handling low-resource, culturally specific questions.
http://arxiv.org/abs/2406.17759v1
Compressor summary: This paper trains sparse autoencoders on attention layer outputs in transformers to decompose them into interpretable features, discovering different feature families and roles, and using them to better understand and explain model behavior.
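To make the technique concrete, here is a minimal sketch of a sparse autoencoder over attention-layer outputs; the dimensions, expansion factor, and L1 sparsity penalty are illustrative assumptions, not the paper's configuration.

```python
# Minimal sparse-autoencoder sketch (assumed dims and sparsity coefficient).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))      # non-negative feature activations
        return self.decoder(features), features

sae = SparseAutoencoder(d_model=768, d_hidden=768 * 8)   # over-complete dictionary
attn_out = torch.randn(64, 768)                          # stand-in for attention outputs
recon, feats = sae(attn_out)
# Reconstruction loss plus an L1 penalty that pushes features toward sparsity.
loss = ((recon - attn_out) ** 2).mean() + 1e-3 * feats.abs().mean()
loss.backward()
```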
http://arxiv.org/abs/2406.17758v1
Compressor summary: MotionBooth is a framework that animates customized subjects with precise control over their shape, attributes, and motions using text-to-video models and training-free techniques.
http://arxiv.org/abs/2406.17755v1
Compressor summary: TrialMind is a generative AI pipeline for conducting medical systematic reviews, using large language models and human expert oversight, that outperforms traditional methods in literature search, screening, and data extraction.
http://arxiv.org/abs/2406.17753v1
Compressor summary: The authors study how well Large Language Models (LLMs) can produce persuasive text and create a new dataset, Persuasive-Pairs, to measure and compare LLMs' abilities across various domains.
http://arxiv.org/abs/2406.17748v1
Compressor summary: Shampoo is an optimization algorithm that approximates the Gauss-Newton component or the covariance matrix using a Kronecker product, and its approximation is close to the optimal one.
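For readers unfamiliar with Shampoo, a minimal single-matrix sketch of its Kronecker-factored preconditioning is below; the learning rate and the dense matrix-power computation are simplifications of the production algorithm.

```python
# Single-matrix Shampoo sketch: precondition with L^{-1/4} G R^{-1/4}, where
# L and R are the Kronecker factors accumulated from gradient statistics.
import numpy as np

def matrix_power(mat, power, eps=1e-6):
    """Symmetric matrix power via eigendecomposition (dense, for illustration)."""
    vals, vecs = np.linalg.eigh(mat + eps * np.eye(mat.shape[0]))
    return (vecs * np.maximum(vals, eps) ** power) @ vecs.T

def shampoo_step(param, grad, L, R, lr=1e-2):
    L += grad @ grad.T                       # left factor statistic (m x m)
    R += grad.T @ grad                       # right factor statistic (n x n)
    precond = matrix_power(L, -0.25) @ grad @ matrix_power(R, -0.25)
    return param - lr * precond, L, R

m, n = 8, 4
param, grad = np.random.randn(m, n), np.random.randn(m, n)
L, R = np.zeros((m, m)), np.zeros((n, n))
param, L, R = shampoo_step(param, grad, L, R)
```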
http://arxiv.org/abs/2406.17746v1
Compressor summary: The text proposes a taxonomy for memorization in language models, considering various factors that affect each type of memorization and using it to build a predictive model.
http://arxiv.org/abs/2406.17744v1
Compressor summary: Models trained to follow length instructions outperform standard models in evaluations that take response length into account.
http://arxiv.org/abs/2406.17741v1
Compressor summary: Point-SAM is a transformer-based 3D model that leverages 2D knowledge from SAM to segment point clouds with part-level and object-level annotations, achieving state-of-the-art performance on various benchmarks.
http://arxiv.org/abs/2406.17740v1
Compressor summary: The authors propose a new framework for fine-tuning large Transformer models using structured unrestricted-rank matrices, which offer more flexibility and parameter efficiency than existing methods like Adapters and LoRA.
http://arxiv.org/abs/2406.17739v1
Compressor summary: ATTEMPT is a novel method for updating taxonomies by inserting new concepts at the appropriate position using pre-trained language models.
http://arxiv.org/abs/2406.17737v1
Compressor summary: The study examines how the quality of responses from large language models varies depending on a user's English proficiency, education level, and country of origin, finding that these models are less reliable for users with lower proficiency or education, and those outside the US.
http://arxiv.org/abs/2406.17720v1
Compressor summary: Arboretum is a large dataset of diverse species images from iNaturalist with rich annotations for AI applications in biodiversity assessment and agriculture research.
http://arxiv.org/abs/2406.17718v1
Compressor summary: The paper examines how auxiliary tasks like observation reconstruction and latent self-prediction affect representation learning in reinforcement learning, and shows that latent self-prediction is more helpful as an auxiliary task than observation reconstruction when dealing with distractions and non-linear functions.
http://arxiv.org/abs/2406.17716v1
Compressor summary: The ViANLI dataset is an adversarial NLP dataset for Vietnamese natural language inference that challenges current machine learning models, and training on it improves their performance.
http://arxiv.org/abs/2406.17714v1
Compressor summary: The paper proposes a modular, compositional approach to estimate individual treatment effects in structured systems composed of multiple heterogeneous components, with benefits such as systematic generalization and improved overlap guarantees.
http://arxiv.org/abs/2406.17711v1
Compressor summary: JEST is an algorithm that selects batches of data jointly and improves training speed and efficiency in multimodal contrastive learning.
http://arxiv.org/abs/2406.17707v1
Compressor summary: The authors propose a new method to estimate forces in robotic surgery using video data and frequency domain analysis of organ motion.
http://arxiv.org/abs/2406.17697v1
Compressor summary: HGTDP-DTA is a novel method for predicting drug-target binding affinity using dynamic prompts and a hybrid Graph-Transformer architecture that integrates structural, sequence, and contextual information.
http://arxiv.org/abs/2406.17692v1
Compressor summary: Alignment changes large language models' output distribution, but the effects are mostly superficial and can be replicated without fine-tuning.
http://arxiv.org/abs/2406.17688v1
Compressor summary: UMD is a new auto-encoder that combines patch-based and noise-based image corruption techniques, leading to improved generative and representation learning performance.
http://arxiv.org/abs/2406.17681v1
Compressor summary: The authors propose a method to dynamically evaluate language models by variabilizing benchmarks and sampling new values from test cases, ensuring fresh evaluations and reducing data contamination.
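A toy sketch of the variabilization idea, assuming a hypothetical template format: constants in a test case become placeholders, and fresh values are sampled at evaluation time so the exact instance is unlikely to be memorized.

```python
# Hypothetical "variabilized" benchmark item; the schema is an assumption.
import random

template = {
    "question": "If Alice has {a} apples and buys {b} more, how many does she have?",
    "answer": lambda a, b: a + b,
}

def sample_instance(template, rng):
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    return template["question"].format(a=a, b=b), template["answer"](a, b)

rng = random.Random(0)
question, answer = sample_instance(template, rng)
print(question, "->", answer)
```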
http://arxiv.org/abs/2406.17680v1
Compressor summary: UAD is a vision-based autonomous driving method that uses unsupervised learning to reduce annotation requirements and computation overhead while improving performance on nuScenes and CARLA.
http://arxiv.org/abs/2406.17679v1
Compressor summary: The LoGoCAF framework uses a two-branch semantic segmentation architecture with local-to-global encoder and MLP decoder to fuse hyperspectral and X-modality data for efficient, accurate, and generalizable classification.
http://arxiv.org/abs/2406.17675v1
Compressor summary: The paper presents a framework to study psychological attributes in large language models, creating a benchmark with six dimensions, and finds discrepancies between self-reported traits and real-world behaviors.
http://arxiv.org/abs/2406.17673v1
Compressor summary: The paper introduces LaTable, a novel diffusion model for generating tabular data that works across different datasets and improves out-of-distribution performance, while exploring its limitations in zero-shot settings.
http://arxiv.org/abs/2406.17663v1
Compressor summary: LLM-ARC combines a large language model with an automated reasoning critic to improve logical reasoning and achieve state-of-the-art accuracy on the FOLIO benchmark.
http://arxiv.org/abs/2406.17660v1
Compressor summary: Grass is a novel sparse projection-based optimization method that reduces memory usage and computational costs for large language model training, enabling half-precision pretraining on a 13B parameter model with a $2\times$ throughput improvement.
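A rough sketch of gradient sparse projection in this spirit (the row-norm selection rule is an illustrative assumption, not Grass's actual projection): only a few rows of the gradient are kept, so optimizer state shrinks accordingly.

```python
# Structured-sparse gradient projection (illustrative row-norm selection).
import torch

def sparse_project(grad: torch.Tensor, k: int):
    idx = torch.topk(grad.norm(dim=1), k).indices   # k most active rows
    return grad[idx], idx

grad = torch.randn(1024, 512)
proj_grad, idx = sparse_project(grad, k=64)         # state is now 64 x 512
update = torch.zeros_like(grad)
update[idx] = 0.01 * proj_grad                      # scatter the sparse update back
```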
http://arxiv.org/abs/2406.17659v1
Compressor summary: DKPROMPT combines vision-language models with domain knowledge in PDDL to improve robot task planning in open worlds.
http://arxiv.org/abs/2406.17650v1
Compressor summary: ELIZA, the first chatbot created by Joseph Weizenbaum in the early 1960s, was actually meant for research on human-machine conversation, but its accidental release and misunderstanding led to its fame as a chatbot and loss of the original source for over 50 years.
http://arxiv.org/abs/2406.17649v1
Compressor summary: The paper proposes a meta algorithm to make any Reinforcement Learning (RL) algorithm privacy-preserving in the setting of population processes, such as controlling epidemics, and shows that it achieves reasonable trade-offs between privacy and utility.
http://arxiv.org/abs/2406.17647v1
Compressor summary: Variationist is a new tool that helps researchers explore and visualize language variation and bias across multiple variables and metrics.
http://arxiv.org/abs/2406.17642v1
Compressor summary: Large Language Models (LLMs) often generate false information, and this study explores why traditional methods fail to prevent it and proposes a new model called Lamini-1 that uses multiple memory experts to store facts and reduce hallucinations.
http://arxiv.org/abs/2406.17640v1
Compressor summary: BayTTA is a framework that optimizes test-time augmentation for computer vision tasks using Bayesian Model Averaging, improving accuracy and robustness on various medical image analysis and gene editing datasets.
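As a hedged illustration of the core idea, predictions from candidate augmentation policies can be combined under Bayesian Model Averaging weights; the validation log-likelihoods below are placeholders, not the paper's estimation procedure.

```python
# Combine augmentation-policy predictions via Bayesian Model Averaging.
import numpy as np

def bma_tta(pred_per_aug, val_log_liks):
    """pred_per_aug: (n_augs, n_samples, n_classes) predicted probabilities."""
    w = np.exp(val_log_liks - val_log_liks.max())
    w /= w.sum()                                    # posterior weights over policies
    return np.einsum("a,asc->sc", w, pred_per_aug)

preds = np.random.dirichlet(np.ones(3), size=(4, 10))  # 4 policies, 10 samples
val_ll = np.array([-1.2, -0.8, -1.5, -0.9])             # placeholder likelihoods
combined = bma_tta(preds, val_ll)                        # (10, 3) averaged predictions
```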
http://arxiv.org/abs/2406.17639v1
Compressor summary: AlignCLIP is a method to improve cross-modal alignment in CLIP embeddings by sharing parameters and separating uni-modal embeddings, reducing the modality gap while maintaining performance on various tasks.
http://arxiv.org/abs/2406.17636v1
Compressor summary: The proposed method improves text-to-image diffusion models by using a perceptual objective in the U-Net embedding space, leading to better human preference alignment and reduced computational cost.
http://arxiv.org/abs/2406.17633v1
Compressor summary: The study shows that using large language models to generate training labels can replace human annotations for text classification tasks in computational social science, leading to similar performance with faster and cheaper methods.
http://arxiv.org/abs/2406.17628v1
Compressor summary: The text proposes a method called ViLocal that uses contrastive learning to identify inpainted regions in videos and localize them using a 3D Uniformer encoder and a lightweight convolution decoder.
http://arxiv.org/abs/2406.17626v1
Compressor summary: The text discusses a study of large language models' safety under multi-turn dialogue coreference, revealing their vulnerability to attacks that exploit coreference across turns.
http://arxiv.org/abs/2406.17624v1
Compressor summary: This paper reviews current research on personality in large language models, categorizing studies into self-assessment, exhibition, and recognition, and providing a comprehensive overview of findings, challenges, resources, and future directions.
http://arxiv.org/abs/2406.17617v1
Compressor summary: The research introduces an embedded neuromorphic testbench using SPLEAT accelerator to train and deploy efficient SNNs for event-based object detection on low-power hardware.
http://arxiv.org/abs/2406.17614v1
Compressor summary: The paper proposes a regularization technique called MSRS for training speech recognition models from scratch, which reduces costs and improves performance.
http://arxiv.org/abs/2406.17611v1
Compressor summary: The paper proposes a variable compression scheme for distributed GNN training that reduces communication volume without sacrificing accuracy, and shows its effectiveness in empirical results.
http://arxiv.org/abs/2406.17608v1
Compressor summary: The paper proposes using a generative model to create multiple views of test images for medical image segmentation, improving performance and error estimation.
http://arxiv.org/abs/2406.17601v1
Compressor summary: Director3D is a framework for generating realistic 3D scenes and camera trajectories from textual descriptions, using a combination of transformers, diffusion models, and refinement losses.
http://arxiv.org/abs/2406.17600v1
Compressor summary: The study suggests using expert labels and explanations with LLMs to approximate human label variation in NLI tasks, improving the scalability of annotations.
http://arxiv.org/abs/2406.17591v1
Compressor summary: DocParseNet combines deep learning and multi-modal learning to improve text and image recognition in scanned documents, achieving high accuracy and efficiency for real-world document processing applications.
http://arxiv.org/abs/2406.17588v1
Compressor summary: LongIns is a new benchmark dataset that tests large language models' long-context and reasoning abilities in various settings, revealing their limitations in handling short context windows.
http://arxiv.org/abs/2406.17585v1
Compressor summary: The paper provides a guide on learning Dynamic Bayesian Networks from multiple trajectory samples, covering formalism, structure-weight interdependence, learning methods, and optimization functions, with comparisons of various algorithms.
http://arxiv.org/abs/2406.17583v1
Compressor summary: The authors propose a categorical approach to define and compare AI models' interpretability using string diagrams, revealing common themes in XAI and demonstrating explainability benefits of compositionally-interpretable models.
http://arxiv.org/abs/2406.17575v1
Compressor summary: The paper proposes a continual learning method for universal 3D medical image registration using meta-learning and sharpness-aware meta-continual learning, achieving better or comparable results to sequential or centralized multi-task training strategies on four datasets.
http://arxiv.org/abs/2406.17574v1
Compressor summary: The research introduces a new text-to-SQL dataset for IoT devices and shows that joint training on query generation and data inference improves performance.
http://arxiv.org/abs/2406.17566v1
Compressor summary: The authors create a French dataset for evaluating and improving toxicity detection in language models, as current efforts mainly focus on English.
http://arxiv.org/abs/2406.17563v1
Compressor summary: This paper evaluates activation steering methods for language models and proposes Dynamic Activation Composition, an approach to modulate steering intensity based on multiple properties during generation.
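A minimal sketch of activation steering with a time-varying intensity, assuming a toy model and an arbitrary decay schedule; the actual method modulates intensity based on multiple properties during generation.

```python
# Toy activation steering: add a scaled steering vector to a layer's output
# via a forward hook; alpha changes across generation steps.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
steering_vec = torch.randn(16)
alpha = {"value": 1.0}                     # mutable so the hook sees updates

def steer(module, inputs, output):
    return output + alpha["value"] * steering_vec

handle = model[0].register_forward_hook(steer)
for step in range(3):
    alpha["value"] = 1.0 / (step + 1)      # e.g. decay intensity over steps
    _ = model(torch.randn(1, 16))
handle.remove()
```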
http://arxiv.org/abs/2406.17559v1
Compressor summary: Edge tuning uses pretrained models as feature extractors on cloud servers and fine-tunes small networks on edge devices with minimal information transfer and high adaptation capability using MIET.
http://arxiv.org/abs/2406.17557v1
Compressor summary: FineWeb is a large pretraining dataset for language models that outperforms other open datasets and reveals insights into data curation strategies.
http://arxiv.org/abs/2406.17553v1
Compressor summary: The paper explores using large language models to predict actions in Minecraft collaborative building with few-shot prompts and analyzes performance gaps.
http://arxiv.org/abs/2406.17542v1
Compressor summary: CDQuant is a simple and scalable alternative to GPTQ that outperforms it in compressing large language models with minimal impact on performance.
http://arxiv.org/abs/2406.17541v1
Compressor summary:
Key points:
- Method for generating synthetic dataset for semantic segmentation using latent diffusion model
- No need for additional segmentation models
- Part of submission to CVPR 2024 workshop challenge
- Self-attentions for semantic information condensation
- Non-prompt-influencing cross-attentions for mask classification
- Mask refinement step using only output image by Stable Diffusion
Summary: The authors present a method to create synthetic images with segmented objects using a latent diffusion model, without extra segmentation models, and propose various attention mechanisms and a mask refinement step for the CVPR 2024 workshop challenge.
http://arxiv.org/abs/2406.17538v1
Compressor summary: The paper proposes a novel network that uses motion magnification, channel attention, temporal modeling, and self-knowledge distillation to enhance micro-expression recognition performance.
http://arxiv.org/abs/2406.17537v1
Compressor summary: The text proposes a semi-supervised deep learning method called SincVAE for detecting epileptic seizures in EEG data, which can identify early and late stages of seizures.
http://arxiv.org/abs/2406.17535v1
Compressor summary: The text introduces a structured benchmark using INVALSI tests to evaluate Large Language Models in Italian, providing a reference point for researchers and assessing their performance against human results.
http://arxiv.org/abs/2406.17534v1
Compressor summary: The authors propose a framework that combines in-context learning with large language models to improve few-shot hierarchical text classification by using a retrieval database, label-aware representations, and a novel contrastive learning objective.
http://arxiv.org/abs/2406.17532v1
Compressor summary: Large language models can understand some aspects of Description Logic ontologies, but struggle with others like transitivity and handling large amounts of data.
http://arxiv.org/abs/2406.17530v1
Compressor summary: The Point Tree Transformer (PTT) is a novel transformer-based approach for point cloud registration that efficiently extracts local and global features while maintaining linear computational complexity by constructing hierarchical feature trees and using a new Point Tree Attention mechanism.
http://arxiv.org/abs/2406.17526v1
Compressor summary: LumberChunker is a method that uses an LLM to dynamically segment documents for dense retrieval, and it outperforms other methods on the GutenQA benchmark.
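A hedged sketch of LLM-driven chunking in this spirit; `ask_llm` is a hypothetical stand-in for an actual model call, and the windowed loop is an assumption rather than LumberChunker's exact procedure.

```python
# LLM-driven dynamic chunking sketch; ask_llm is a hypothetical stand-in.
def ask_llm(paragraphs):
    # A real call would prompt a model, e.g. "Return the index of the first
    # paragraph that starts a new topic." Here: a deterministic placeholder.
    return max(len(paragraphs) // 2, 1)

def chunk_document(paragraphs, window=8):
    chunks, start = [], 0
    while start < len(paragraphs):
        view = paragraphs[start : start + window]
        cut = ask_llm(view) if len(view) > 1 else 1
        chunks.append(" ".join(view[:cut]))
        start += cut
    return chunks

doc = [f"Paragraph {i}." for i in range(20)]
print(chunk_document(doc))
```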
http://arxiv.org/abs/2406.17523v1
Compressor summary: The paper studies how reliable hyper-parameter selection affects value-based deep reinforcement learning agents and introduces a new score to measure consistency.
http://arxiv.org/abs/2406.17520v1
Compressor summary: The authors propose a multimodal approach using vision-based retrieval and language-based reasoning for visual place recognition in robotics, without requiring supervised training.
http://arxiv.org/abs/2406.17519v1
Compressor summary: The paper proposes a decoding method to improve retrieval-augmented LLMs by prioritizing relevant and reliable external knowledge, reducing distractibility issues.
http://arxiv.org/abs/2406.17518v1
Compressor summary: The text proposes a method to build causal knowledge networks for effective adaptive learning systems using Bayesian networks and recommendations based on human-centered explainable AI in education.
http://arxiv.org/abs/2406.17517v1
Compressor summary: The authors propose a technique to improve graph autoencoders by transferring node similarity knowledge from raw graphs to reconstructed graphs using a KL constraint, enhancing their distinctiveness and performance.
http://arxiv.org/abs/2406.17513v1
Compressor summary: The study examines how different language model characteristics affect their ability to represent mental states and reason about them, using probes and prompt variations.
http://arxiv.org/abs/2406.17503v1
Compressor summary: WAVE is a multitasking method that initializes variable-sized models with weight templates learned from pre-trained models, improving efficiency and performance.
http://arxiv.org/abs/2406.17484v1
Compressor summary: The paper proposes a two-stage fine-tuning pipeline for large language models to improve their performance on diverse medical tasks by encoding knowledge and filtering noise, as well as aligning the model with task-specific data.
http://arxiv.org/abs/2406.17483v1
Compressor summary: TRIP is a hardware-efficient hard attention framework for event-based vision processing on neuromorphic processors that produces low-resolution ROIs for efficient and accurate classification, achieving state-of-the-art accuracies and significant improvements in computation, latency, and energy.
http://arxiv.org/abs/2406.17474v1
Compressor summary: The study explores transformer-based models for named entity recognition, finding that combining different data representation strategies improves performance across multiple languages and datasets.
http://arxiv.org/abs/2406.17472v1
Compressor summary: The authors present a new Image Quality Assessment dataset with 6073 high-resolution, aesthetic images, annotated by experts and enriched with metadata, to improve perceptual image quality evaluation research.
http://arxiv.org/abs/2406.17469v1
Compressor summary: The paper proposes S2-ShadowNet, a network that uses both visible and infrared images for shadow removal, by learning cross-domain mapping and exploiting a spherical feature space with similarity and orthogonality losses.
http://arxiv.org/abs/2406.17467v1
Compressor summary: The text describes how deep neural networks initially learn the optimal constant solution (OCS), which is a pattern in the target labels, before adapting to more complex functions during training. This OCS phase is observed not only in linear networks but also in nonlinear ones and human learners, suggesting it as a universal learning principle.
http://arxiv.org/abs/2406.17465v1
Compressor summary: The paper proposes a method to improve tool learning for large language models using iterative feedback between the tool usage model and the tool retriever model, addressing challenges such as complex user instructions and misalignment between models.
http://arxiv.org/abs/2406.17462v1
Compressor summary: TDL is a method to visualize and understand the data evolution in diffusion models by embedding high-dimensional samples into a lower-dimensional space preserving their relations and evolutionary structure.
http://arxiv.org/abs/2406.17460v1
Compressor summary: The paper compares different self-supervised learning tasks for vision transformers and introduces a framework using masked image modelling and clustering that performs well on low-shot downstream tasks.
http://arxiv.org/abs/2406.17458v1
Compressor summary: The paper proposes a deep learning method using self-attention and Markov networks for continuous urban change detection from satellite image time series, achieving promising results.
http://arxiv.org/abs/2406.17456v1
Compressor summary: The paper proposes a contextual augmentation method for creating synthetic data in Grammatical Error Correction, which improves error distribution consistency and uses relabeling to reduce noisy labels.
http://arxiv.org/abs/2406.17453v1
Compressor summary: The paper proposes a method to improve LLM-generated questions' informativeness for 20-question game dialogues using Direct Preference Optimization on question pairs.
http://arxiv.org/abs/2406.17450v1
Compressor summary: The paper proposes an enhanced Masked Autoencoder model that uses token-level reconstruction and pseudo labeling with a decoupled teacher network to improve performance on various image tasks.
http://arxiv.org/abs/2406.17443v1
Compressor summary: The paper introduces biomechanical notions to convert keypoint data into joint angles that are suitable for machine learning and interpretation by humans in various applications like sports and medicine.
http://arxiv.org/abs/2406.17442v1
Compressor summary: Mamba is a new architecture that uses state space models to improve 3D point cloud semantic segmentation with linear complexity and strong global modeling capability.
http://arxiv.org/abs/2406.17438v1
Compressor summary: Implicit-Zoo is a large dataset for neural implicit functions that improves performance in computer vision tasks like image classification, semantic segmentation, and 3D pose regression.
http://arxiv.org/abs/2406.17437v1
Compressor summary:
Key points:
- Paper proposes a novel recognition-based approach for question-answering on handwritten documents
- Model uses transformer-based document retrieval and ensemble methods
- Achieves state-of-the-art results on HW-SQuAD and BenthamQA datasets
- Code and trained models will be publicly available
Summary: The paper presents a novel recognition-based approach that uses transformer-based document retrieval and ensemble methods to improve question-answering on handwritten documents, achieving state-of-the-art results and releasing code and models.
http://arxiv.org/abs/2406.17433v1
Compressor summary: This paper studies how data balancing can affect fairness and robustness in machine learning, and emphasizes the need to consider the causal graph for effective mitigation strategies.
http://arxiv.org/abs/2406.17430v1
Compressor summary: The paper proposes a taxonomy of speech-specific risks, such as sarcasm, imitation, and biases, and evaluates LMMs' effectiveness in detecting them.
http://arxiv.org/abs/2406.17427v1
Compressor summary: The Extreme Learning Machine (ELM) lacks rigorous mathematical justification; the authors refute its proofs, construct a counterexample dataset, and offer alternative foundational statements.
http://arxiv.org/abs/2406.17419v1
Compressor summary: Loong is a novel benchmark for evaluating large language models in realistic long-context scenarios through extended multi-document question answering with diverse tasks and context lengths.
http://arxiv.org/abs/2406.17418v1
Compressor summary: The paper introduces a new framework, SE-VGAE, that uses unsupervised disentangled representation learning to generate and interpret architectural layout graphs from floor plan images.
http://arxiv.org/abs/2406.17415v1
Compressor summary: The authors propose a variable quantization approach for large language models, where different layers are quantized at varying bit levels based on their importance, resulting in minimal performance drop and compressed model size.
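To illustrate the general idea (the paper's importance measure and allocation rule may differ), here is a toy bit-allocation scheme paired with a uniform symmetric quantizer:

```python
# Toy importance-based bit allocation plus a uniform symmetric quantizer.
import numpy as np

def allocate_bits(importance, budgets=(8, 4, 2)):
    """Rank layers by importance and split them across bit-width tiers."""
    order = np.argsort(importance)[::-1]
    tiers = np.array_split(order, len(budgets))
    bits = np.empty(len(importance), dtype=int)
    for tier, b in zip(tiers, budgets):
        bits[tier] = b
    return bits

def quantize(w, n_bits):
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

importance = np.array([0.9, 0.1, 0.5, 0.7, 0.2, 0.3])  # e.g. per-layer sensitivity
bits = allocate_bits(importance)                         # bit width per layer
w_q = quantize(np.random.randn(256), bits[0])            # quantize layer 0
```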
http://arxiv.org/abs/2406.17414v1
Compressor summary: The proposed method, based on Deep Sets, estimates the essential matrix by identifying outliers and modeling noise in point matches, outperforming more complex networks.
http://arxiv.org/abs/2406.17413v1
Compressor summary: The Depth-Guided Semi-Supervised Instance Segmentation framework uses depth maps to generate precise contours for distinct instances, overcoming limitations of RGB information, and achieves better performance than previous methods.
http://arxiv.org/abs/2406.17405v1
Compressor summary: The paper investigates how demographic biases, especially stereotypical ones, in facial expression recognition datasets affect machine learning models' predictions.
http://arxiv.org/abs/2406.17404v1
Compressor summary: The Make Some Noise (MSN) training framework improves parallel decoding and inference speed of large language models without sacrificing performance by introducing noise and using a tree-based retrieval-augmented decoding strategy.
http://arxiv.org/abs/2406.17399v1
Compressor summary: The study analyzes how to improve the quality of samples from a specific type of model using different techniques, focusing on the stability of gradients for better guidance.
http://arxiv.org/abs/2406.17396v1
Compressor summary: SyncNoise is a novel approach for consistent 3D scene editing using 2D diffusion models, achieving high-quality results with global and local consistency across multiple viewpoints.
http://arxiv.org/abs/2406.17385v1
Compressor summary: The study shows that LLMs give lower-quality or factually incorrect responses to non-native English speakers more frequently than to native speakers, and this difference persists across different regions.
http://arxiv.org/abs/2406.17382v1
Compressor summary: This paper tests seven human pose estimation methods on infant videos and finds ViTPose to be the best performer for understanding infant movement in natural settings.
http://arxiv.org/abs/2406.17381v1
Compressor summary: ILR is a new continual learning approach that uses rectifier units to correct and update old task representations in deep neural networks, improving performance on several benchmarks.
http://arxiv.org/abs/2406.17378v1
Compressor summary: The text describes an interesting phenomenon in large language models where the text embeddings can be aligned with key tokens, and shows its potential applications in information retrieval and understanding fuzzy concepts.
http://arxiv.org/abs/2406.17377v1
Compressor summary: The text explores three cross-lingual methods to adapt an English-dominated LLM to Indic languages and finds that additional supervisory signals and continued pre-training in one low-resource language help improve performance.
http://arxiv.org/abs/2406.17375v1
Compressor summary: The authors create a Bangla dataset for measuring gender bias in pretrained language models and show that context length affects bias metrics, calling for more nuanced analysis.
http://arxiv.org/abs/2406.17374v1
Compressor summary: The paper proposes a mathematical formalization of experimental studies in machine learning and develops a quantifiable measure of generalizability, which is applied to two benchmarks and can be used with a provided Python module.
http://arxiv.org/abs/2406.17363v1
Compressor summary: The paper presents Irish-to-English speech translation systems using Whisper with various data augmentation techniques to improve performance.
http://arxiv.org/abs/2406.17346v1
Compressor summary: SCORE is a new tool for machine learning applications that helps users understand how uncertain their predictions are by using stacked confusion matrices and reject curves.
http://arxiv.org/abs/2406.17345v1
Compressor summary: The paper introduces NerfBaselines, a framework to simplify installation and evaluation of novel view synthesis methods, and provides a web platform for comparison.
http://arxiv.org/abs/2406.17343v1
Compressor summary: The paper proposes Q-DiT, a technique to improve image synthesis quality and efficiency for diffusion transformers by fine-grained quantization and other techniques.
http://arxiv.org/abs/2406.17342v1
Compressor summary: Point-MAGE is a framework that leverages generative modeling and representation learning for point cloud data, achieving state-of-the-art performance in various tasks.
http://arxiv.org/abs/2406.17341v1
Compressor summary: ConStruct is a novel framework that allows hard-constraining graph diffusion models to incorporate specific properties such as planarity or acyclicity, ensuring valid graphs for practical applications.
http://arxiv.org/abs/2406.17328v1
Compressor summary: DSKD is a novel framework to compress large language models by unifying their output spaces and aligning their representations using cross-model attention.
http://arxiv.org/abs/2406.17326v1
Compressor summary: The study applies reinforcement learning (RL) to evolutionary game theory, using the SARSA algorithm to model how cooperative behavior emerges and changes among self-interested agents.
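A self-contained SARSA sketch on a repeated prisoner's dilemma shows the kind of setup involved; the payoffs, the uniformly random opponent, and the hyper-parameters are standard illustrative choices, not the paper's experiment.

```python
# On-policy SARSA in a repeated prisoner's dilemma; the state is the
# opponent's last action, and the opponent here plays uniformly at random.
import random

PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ["C", "D"]
alpha, gamma, eps = 0.1, 0.9, 0.1

def policy(Q, state):
    if random.random() < eps:                     # epsilon-greedy exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

Q = {(s, a): 0.0 for s in ACTIONS for a in ACTIONS}
state = "C"
action = policy(Q, state)
for _ in range(10_000):
    opp = random.choice(ACTIONS)
    reward = PAYOFF[(action, opp)]
    next_state = opp                              # remember opponent's last move
    next_action = policy(Q, next_state)
    # SARSA update uses the action actually taken next (on-policy).
    Q[(state, action)] += alpha * (reward + gamma * Q[(next_state, next_action)]
                                   - Q[(state, action)])
    state, action = next_state, next_action
print(Q)
```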
http://arxiv.org/abs/2406.17324v1
Compressor summary: The study examines the use of large language models in astronomy papers and finds a significant increase in words favored by ChatGPT, suggesting widespread adoption and need for ethical guidelines.
http://arxiv.org/abs/2406.17323v1
Compressor summary: Key points: - Scattered light artefacts in astronomical observations are problematic and need automated detection - A dataset of images with different types of artefacts from the XMM-Newton space telescope is presented - A hybrid model combining CNNs and transformers is used to detect and mask artefacts using instance segmentation - The method and dataset can help advance artefact detection in astronomical observations Summary: The authors present a new dataset and a hybrid model that can detect and mask scattered light artefacts in astronomical images from the XMM-Newton space telescope using instance segmentation.
http://arxiv.org/abs/2406.17322v1
Compressor summary: ALPBench is a tool for benchmarking and comparing active learning pipelines, which consists of 86 real-world datasets and supports various query strategies and learning algorithms.
http://arxiv.org/abs/2406.17319v1
Compressor summary: The paper introduces a new dual-channel modality fusion network (DMF-Net) for completing partial point clouds using image guidance, which performs better than existing methods.
http://arxiv.org/abs/2406.17312v1
Compressor summary: The paper proposes a method to select which response pairs to annotate for iterative preference learning, considering uncertainty and distribution shifts, to achieve better performance with less annotation cost.
http://arxiv.org/abs/2406.17309v1
Compressor summary: MM-Screenplayer is an advanced system that converts long videos into textual screenplays by organizing them into scenes and using a "Look Back" strategy to validate uncertain information, achieving high scores in a challenge.
http://arxiv.org/abs/2406.17305v1
Compressor summary: The paper explores using Retrieval Augmented Instruction Tuning (RA-IT) for information extraction in open named entity recognition tasks, showing its effectiveness across languages and data sizes.
http://arxiv.org/abs/2406.17304v1
Compressor summary: The paper explores using large language models (LLMs) for evaluating dialogue quality, finding that larger models, algorithmic example selection, chain-of-thought reasoning, and fine-tuning improve performance.
http://arxiv.org/abs/2406.17300v1
Compressor summary: CausalScore is a novel metric that measures the relevance of responses in open-domain dialogues by estimating the causal strength between dialogue histories and responses, outperforming existing metrics in aligning with human judgements.
http://arxiv.org/abs/2406.17298v1
Compressor summary: The paper investigates the high computational cost of differentially private deep learning training and compares methods to reduce it.
http://arxiv.org/abs/2406.17297v1
Compressor summary: OS-Det3D is a two-stage framework that uses a novel 3D Object Discovery Network and Joint Objectness Selection module to improve camera 3D object detection for both known and unknown objects.
http://arxiv.org/abs/2406.17296v1
Compressor summary: BlockLLM reduces memory requirements for training large language models by selecting and updating a small subset of parameters without changing the model architecture or training procedure.
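A rough sketch of the flavor of block-wise parameter selection, assuming gradient norms as the selection criterion (an illustrative choice, not necessarily BlockLLM's):

```python
# After backward, keep gradients only for the k parameter tensors with the
# largest gradient norms; the rest are frozen for this step.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(),
                      nn.Linear(32, 32), nn.ReLU(),
                      nn.Linear(32, 8))
loss = model(torch.randn(4, 32)).pow(2).mean()
loss.backward()

k = 2
params = [p for p in model.parameters() if p.grad is not None]
norms = torch.stack([p.grad.norm() for p in params])
keep = set(torch.topk(norms, k).indices.tolist())
for i, p in enumerate(params):
    if i not in keep:
        p.grad = None              # optimizer will skip these blocks this step
```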
http://arxiv.org/abs/2406.17294v1
Compressor summary: The authors create MathV360K, a diverse multimodal dataset for image instruction fine-tuning, and introduce Math-LLaVA, a model that improves multimodal mathematical reasoning with this dataset.
http://arxiv.org/abs/2406.17287v1
Compressor summary: The study shows that Large Language Models can predict personality traits from counseling dialogues using role-play simulations and questionnaires, outperforming previous methods.
http://arxiv.org/abs/2406.17283v1
Compressor summary: The paper introduces a recursive encoding system for cuneiform signs that simplifies sign lookup, allows for computer processing, and enables new methods of rendering and displaying signs and tablets.
http://arxiv.org/abs/2406.17282v1
Compressor summary: SetBERT is a small and effective BERT-based model for enhancing logic-structured queries by using inversed-contrastive loss and outperforming BERT-base.
http://arxiv.org/abs/2406.17281v1
Compressor summary: The paper presents two new methods to improve Graph Neural Networks (GNNs) by dynamically adjusting node distances and local graph structures for better representation and aggregation of complex and dynamic graphs.
http://arxiv.org/abs/2406.17276v1
Compressor summary: Speculative decoding uses adaptive draft trees to generate multiple tokens per step, improving inference efficiency of autoregressive language models.
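For context, a simplified greedy speculative-decoding step looks like the sketch below; the paper's adaptive draft trees generalize the single draft sequence shown here, and both models are toy callables.

```python
# Greedy speculative decoding with a single draft sequence (the adaptive
# draft *trees* in the paper generalize this); models are toy callables.
def speculative_step(prefix, draft_model, target_model, k=4):
    draft = []
    for _ in range(k):                              # cheap model proposes k tokens
        draft.append(draft_model(prefix + draft))
    accepted = []
    for tok in draft:
        # In practice one batched target pass scores all drafts at once;
        # written as a loop here for clarity.
        verified = target_model(prefix + accepted)
        if verified == tok:
            accepted.append(tok)                    # draft token accepted
        else:
            accepted.append(verified)               # replace first mismatch, stop
            break
    return accepted

draft_model = lambda ctx: len(ctx) % 5              # stand-ins for real LMs
target_model = lambda ctx: len(ctx) % 3
print(speculative_step([1, 2, 3], draft_model, target_model))
```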
http://arxiv.org/abs/2406.17274v1
Compressor summary: The paper introduces a comprehensive benchmark for evaluating uncertainty estimation in text summarization, incorporating various NLG metrics and human annotations.
http://arxiv.org/abs/2406.17272v1
Compressor summary: The paper proposes a solution to improve speech recognition by connecting speech encoders to large language models with fine-tuning schemes, modality alignment enhancement, and methods to reduce insertion errors.
http://arxiv.org/abs/2406.17271v1
Compressor summary: The authors propose a method called DARG to generate dynamic and diverse evaluation data for Large Language Models, revealing their performance and bias patterns under different complexity levels.
http://arxiv.org/abs/2406.17265v1
Compressor summary: The paper introduces a new algorithm for assessing LiDAR point cloud quality in outdoor autonomous driving environments using both image data and ground truth annotations, improving detection performance.
http://arxiv.org/abs/2406.17263v1
Compressor summary: The paper proposes GMKI, an efficient derivative-free sampler for handling multi-modal distributions in Bayesian inference for large-scale inverse problems.
http://arxiv.org/abs/2406.17262v1
Compressor summary: D2LLM combines a bi-encoder and an interaction module to achieve both efficiency and nuanced understanding in semantic search, outperforming five baselines on three tasks.
http://arxiv.org/abs/2406.17261v1
Compressor summary: TRAWL optimizes large language models through tensor decomposition to improve performance without retraining or extra data, making AI systems more efficient and sustainable.
http://arxiv.org/abs/2406.17260v1
Compressor summary: The authors present a method to reduce hallucination in role-playing dialogues by adjusting the influence of large language models' world knowledge with a confidence threshold, and demonstrate its effectiveness on a new dataset.
http://arxiv.org/abs/2406.17257v1
Compressor summary:
Key points:
- TTS models face challenges in synthesizing speech for multiple languages
- Standard approach uses transformers and large multilingual datasets
- Paper proposes using PETL methods (adapters, hypernetworks) for better performance with fewer parameters
Summary: The paper introduces a new method to improve TTS models for multiple languages by using PETL techniques that require fewer parameters and achieve similar or better results than standard fine-tuning.
http://arxiv.org/abs/2406.17256v1
Compressor summary: The paper proposes MoMo, a diffusion-based approach for video frame interpolation that enhances visual quality by focusing on intermediate motion modeling, using a novel diffusion U-Net architecture and achieving superior perceptual quality with reduced computational demands.
http://arxiv.org/abs/2406.17255v1
Compressor summary: MPCoder uses explicit and implicit residual learning to generate personalized, multi-user code with a contrastive adapter and a novel evaluation metric.
http://arxiv.org/abs/2406.17254v1
Compressor summary: ScalpVision is an AI-driven system that uses innovative methods to segment hair and generate data for diagnosing scalp diseases and alopecia.
http://arxiv.org/abs/2406.17253v1
Compressor summary: The study explores how different levels of "perplexingness" affect the ability to update large language models with new knowledge, and introduces a novel dataset to investigate this phenomenon.
http://arxiv.org/abs/2406.17251v1
Compressor summary: Graph contrastive learning improves representations by incorporating latent shape properties of graphs at multiple resolutions, enhancing performance in unsupervised graph classification.
http://arxiv.org/abs/2406.17245v1
Compressor summary:
Key points:
- LMs struggle with catastrophic forgetting in continual learning (CL)
- MIGU is a rehearsal-free and task-label-free method that updates parameters with large output magnitudes in linear layers
- MIGU is universal, effective, and can integrate with existing CL types
Summary: MIGU is a novel method that improves LMs' continual learning performance by updating parameters based on output magnitude distribution in linear layers, without rehearsal or task labels.
http://arxiv.org/abs/2406.17241v1
Compressor summary: The authors propose a novel method to learn the meanings of circuit representations in GPT2-XL using knowledge editing and explore their properties, such as size, composition, and overlap with other datasets.
http://arxiv.org/abs/2406.17238v1
Compressor summary: The paper proposes an Expansive Synthesis model that generates large-quality datasets from minimal samples by using expander graph mappings, feature interpolation, Koopman operator, and optimal transport, achieving performance on par with classifiers trained on full-scale datasets.
http://arxiv.org/abs/2406.17236v1
Compressor summary: LIPE is a two-stage framework that learns a personalized identity prior for text-based non-rigid image editing, improving consistency and quality in editing results.
http://arxiv.org/abs/2406.17232v1
Compressor summary: The study explored how integrating human belief networks into large language models can improve their alignment with human opinions on related topics.
http://arxiv.org/abs/2406.17231v1
Compressor summary: The CogMG framework uses knowledge graphs to improve question-answering by large language models, reducing hallucinations and increasing accuracy.
http://arxiv.org/abs/2406.17224v1
Compressor summary: The paper proposes using large language models and symbolic programs to create interpretable and expressive predictive models for classification tasks, and introduces IL-Bench, a collection of diverse tasks for evaluation.
http://arxiv.org/abs/2406.17219v1
Compressor summary: The paper proposes a new face anonymization method that distracts both intrinsic and extrinsic identity attention, allowing for flexible manipulation of appearance and geometry structure to protect privacy better than existing methods.
http://arxiv.org/abs/2406.17216v1
Compressor summary: Existing machine unlearning methods fail to effectively remove the effects of data poisoning on deep learning models, highlighting the need for more rigorous evaluation metrics and provable guarantees.
http://arxiv.org/abs/2406.17213v1
Compressor summary: The study combines article text and image features to identify news frames, finding that relevant images improve frame prediction and concreteness affects image usefulness.
http://arxiv.org/abs/2406.17199v1
Compressor summary: The text introduces a new method called Graph-centric Contrastive framework for Graph Matching (GCGM) that uses graph augmentations for contrastive learning without side information and outperforms existing self-supervised methods in pattern recognition tasks.
http://arxiv.org/abs/2406.17188v1
Compressor summary: The text proposes a new algorithm, Geometric Median Matching, for selecting informative subsets from large noisy datasets to train deep learning models efficiently and robustly.
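As background for the robust aggregate involved, here is Weiszfeld's algorithm for the geometric median; the subset-selection step itself is only hinted at in comments, and this is not the paper's implementation.

```python
# Weiszfeld's algorithm: the geometric median resists a minority of
# outliers, unlike the mean.
import numpy as np

def geometric_median(X, iters=100, eps=1e-8):
    y = X.mean(axis=0)
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(X - y, axis=1), eps)
        w = 1.0 / d                                  # inverse-distance weights
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

X = np.vstack([np.random.randn(95, 2), 10 + np.random.randn(5, 2)])  # 5% outliers
print("median:", geometric_median(X), "mean:", X.mean(axis=0))
# A matching-based selector would then greedily pick a subset whose mean
# tracks the geometric median, pruning the noisy points.
```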