This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-18, generated by the Compressor, my personal LLM-based project.
http://arxiv.org/abs/2406.11840v1
Compressor summary: The paper introduces LLaNA, a NeRF-language assistant that can perform tasks like captioning and Q&A using NeRF weights instead of images or 3D data.
http://arxiv.org/abs/2406.11838v1
Compressor summary: The authors propose a diffusion loss function for autoregressive models that allows them to operate in a continuous-valued space without vector quantization, achieving strong image generation results and speed advantages.
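The core idea lends itself to a short illustration. Below is a minimal sketch, not the paper's code: the shapes, the denoiser, and the noise schedule are invented for illustration. A small denoising network predicts the noise added to a continuous-valued token, conditioned on the autoregressive backbone's output, replacing a categorical softmax over a quantized codebook.

```python
import torch
import torch.nn as nn

d_tok, d_cond = 16, 256  # illustrative token and conditioning dimensions
denoiser = nn.Sequential(
    nn.Linear(d_tok + d_cond + 1, 512), nn.SiLU(), nn.Linear(512, d_tok)
)

def diffusion_loss(z, cond):
    """z: (B, d_tok) continuous target tokens; cond: (B, d_cond) AR conditioning."""
    t = torch.rand(z.size(0), 1)                 # random diffusion time in [0, 1]
    alpha = torch.cos(0.5 * torch.pi * t)        # toy cosine noise schedule
    sigma = torch.sin(0.5 * torch.pi * t)
    eps = torch.randn_like(z)
    z_t = alpha * z + sigma * eps                # noised token
    eps_hat = denoiser(torch.cat([z_t, cond, t], dim=-1))
    return ((eps_hat - eps) ** 2).mean()         # simple noise-prediction loss

loss = diffusion_loss(torch.randn(8, d_tok), torch.randn(8, d_cond))
```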
http://arxiv.org/abs/2406.11839v1
Compressor summary: mDPO is a new method to better align multimodal language models by optimizing image preferences and avoiding reward decay, leading to improved performance and reduced hallucination.
http://arxiv.org/abs/2406.11837v1
Compressor summary: The authors propose VQGAN-LC, a novel image quantization model that uses a large codebook (100,000) to improve performance on various tasks.
http://arxiv.org/abs/2406.11835v1
Compressor summary: The paper introduces a new benchmark for testing anomaly instance segmentation in autonomous vehicles, which is essential for identifying unknown objects like wild animals to prevent accidents.
http://arxiv.org/abs/2406.11836v1
Compressor summary: The authors propose RetinaGS, a general model parallel training method for 3D Gaussian splatting models, which enables scaling to large numbers of primitives and high resolutions, improving reconstruction quality.
http://arxiv.org/abs/2406.11833v1
Compressor summary: MMDU is a benchmark and dataset to evaluate and improve large vision-language models' abilities in multi-turn and multi-image conversations, which are essential for real-world human-AI interaction applications.
http://arxiv.org/abs/2406.11832v1
Compressor summary: The paper introduces EVE, an encoder-free vision-language model that can rival encoder-based models using only 35M training samples, by bridging vision-language representation in a unified decoder and enhancing visual recognition with extra supervision.
http://arxiv.org/abs/2406.11831v1
Compressor summary: The paper identifies two obstacles to using large language models (LLMs) in text-to-image diffusion models, namely the misalignment between LLM training and prompt encoding and the positional bias of decoder-only architectures, and proposes a framework that improves text representation and fuses multiple LLMs into the generation model; the resulting LI-DiT achieves state-of-the-art prompt understanding and image generation, outperforming existing open-source and closed-source models.
http://arxiv.org/abs/2406.11830v1
Compressor summary: The paper introduces ERASE, a method that updates language models to reflect changes in the world by deleting or rewriting existing documents when new ones are added, improving accuracy in answering questions about news articles and conversations.
http://arxiv.org/abs/2406.11828v1
Compressor summary: The paper investigates how efficiently a two-layer neural network can learn a target function that is a combination of many nonlinear functions, depending on the number of tasks and the information exponent of each function.
http://arxiv.org/abs/2406.11827v1
Compressor summary: Weighted Preference Optimization (WPO) is a novel method to improve reinforcement learning from human feedback by adapting off-policy data to resemble on-policy data, leading to better alignment with human values and enhanced performance.
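A rough sketch of the reweighting idea follows; the function names and weighting scheme are illustrative assumptions, not the paper's implementation. Off-policy preference pairs are weighted by their likelihood under the current policy so the effective training distribution resembles on-policy data.

```python
import torch
import torch.nn.functional as F

def weighted_dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """logp_*: summed token log-probs of each response under the policy, shape (B,);
    ref_*: the same quantities under the frozen reference model."""
    # Standard DPO objective on policy/reference log-ratios.
    logits = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    per_pair_loss = -F.logsigmoid(logits)

    # Reweight pairs by how likely the current policy is to produce them, so
    # off-policy pairs the policy would rarely generate count less. In practice,
    # length-normalizing the log-probs keeps the softmax from saturating.
    with torch.no_grad():
        weights = torch.softmax(logp_chosen + logp_rejected, dim=0)
    return (weights * per_pair_loss).sum()
```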
http://arxiv.org/abs/2406.11825v1
Compressor summary: The authors propose a new method to analyze deep learning models on neuroimaging data during training, using gradient computations and singular value decomposition, which can help understand and prevent issues like bias and overfitting.
http://arxiv.org/abs/2406.11824v1
Compressor summary: Infinigen Indoors is a tool that generates photorealistic indoor scenes using procedural generation and constraints, designed to train embodied agents in real-time simulators.
http://arxiv.org/abs/2406.11823v1
Compressor summary: This study optimizes vision-language models for efficient computation while maintaining high performance, and shares the resources on GitHub.
http://arxiv.org/abs/2406.11820v1
Compressor summary: The authors propose a fast image-text matching model using scene graphs to represent captions and graph attention networks to learn object-attribute and object-object relations, outperforming existing cross-attention methods in recall and speed.
http://arxiv.org/abs/2406.11819v1
Compressor summary: The paper introduces MegaScenes, a large-scale scene-level dataset for novel view synthesis, addressing challenges like lighting and transient objects in Internet photos.
http://arxiv.org/abs/2406.11817v1
Compressor summary: Iterative length-regularized Direct Preference Optimization (iLR-DPO) improves response quality without increasing verbosity, leading to a strong 7B language model that performs on par with GPT-4.
http://arxiv.org/abs/2406.11816v1
Compressor summary: The LIVE framework enables real-time conversation within a continuous video stream using large multimodal models, improving efficiency and performance for streaming videos.
http://arxiv.org/abs/2406.11813v1
Compressor summary: This study investigates how large language models acquire factual knowledge during pretraining and finds that more data does not always help, forgetting occurs over time, and batch size affects robustness to forgetting.
http://arxiv.org/abs/2406.11811v1
Compressor summary: The authors introduce RepLiQA, a new test dataset for question-answering and topic retrieval tasks, which contains imaginary scenarios not present on the internet to avoid leaking into LLM training sets.
http://arxiv.org/abs/2406.11810v1
Compressor summary: The paper proposes a computationally efficient Reinforcement Learning algorithm for linear Bellman complete settings that uses random noise injection in least square regression problems to perform optimistic value iteration.
http://arxiv.org/abs/2406.11809v1
Compressor summary: The text describes a physics-constrained learning method that uses data and physical models to predict the nonlinear dynamics of flexible objects, such as soft robots, while also accounting for uncertainty and compositionality.
http://arxiv.org/abs/2406.11808v1
Compressor summary: To address the small datasets that hamper deep-learning pain recognition, the authors test whether a heat-pain model transfers to electrical pain, using an existing pre-trained CNN as a feature extractor for two ML models; their approach beats the baseline in the AI4Pain challenge.
http://arxiv.org/abs/2406.11803v1
Compressor summary: FSR is an efficient algorithm for finding statistically significant patterns in transactional data using resampled datasets with rigorous guarantees on false discoveries.
http://arxiv.org/abs/2406.11802v1
Compressor summary: PhyBench is a new evaluation dataset that tests text-to-image models' physical commonsense abilities by generating images based on prompts with physical principles, revealing their limitations and the need for better reasoning.
http://arxiv.org/abs/2406.11801v1
Compressor summary: Safety Arithmetic is a framework to improve large language models' safety by removing harmful content and aligning them with human values without training.
http://arxiv.org/abs/2406.11794v1
Compressor summary: DataComp for Language Models (DCLM) is a testbed to improve language models through controlled dataset experiments, providing a standardized corpus, pretraining recipes, and 53 evaluations, with the baseline DCLM-Baseline model achieving competitive results on MMLU and NLP tasks with less compute.
http://arxiv.org/abs/2406.11785v1
Compressor summary: This paper proposes contrastive explanation methods for generative AI like LLMs, which explain their responses by showing how prompt modifications would change the output.
http://arxiv.org/abs/2406.11784v1
Compressor summary: The text introduces MDCR, a new dataset for evaluating models' ability to answer complex conditional reasoning questions requiring cross-document optimization, which reveals the limitations of current LLMs in solving such tasks.
http://arxiv.org/abs/2406.11780v1
Compressor summary: SPUNGE is a framework that enhances machine unlearning by using data attributes to improve safety and performance of large language models.
http://arxiv.org/abs/2406.11779v1
Compressor summary: The authors propose using mechanistic interpretability techniques to derive and prove formal guarantees on model accuracy and investigate the relationship between proof length, understanding, and bound tightness.
http://arxiv.org/abs/2406.11776v1
Compressor summary: This paper explores how sparse communication among agents in multi-agent debates can improve language model quality, reduce costs, and extend to other tasks.
http://arxiv.org/abs/2406.11775v1
Compressor summary: Task-Me-Anything is a benchmark generation engine that creates tailored multimodal tasks to evaluate large language models' capabilities across various domains.
http://arxiv.org/abs/2406.11774v1
Compressor summary: The paper introduces a safe reinforcement learning algorithm using optimal transport theory to balance performance and safety in decision-making policies.
http://arxiv.org/abs/2406.11772v1
Compressor summary: The text proposes a new method to identify wood species using high-resolution images and a voting process, and introduces a new data set of macroscopic timber images.
http://arxiv.org/abs/2406.11769v1
Compressor summary: The text discusses how simple visual sensors with few photoreceptors can perform well on computer vision tasks and how a computational algorithm can help optimize their design.
http://arxiv.org/abs/2406.11766v1
Compressor summary: MatLoc-NeRF is a new method for efficient and accurate 3D scene localization using selective NeRF features, pose-aware scene partitioning, and coarse initial pose estimation.
http://arxiv.org/abs/2406.11757v1
Compressor summary: STAR is a framework that helps test the safety of large language models by generating instructions for human red teamers, matching demographics for risk assessment, and leveraging diverse viewpoints for label reliability.
http://arxiv.org/abs/2406.11753v1
Compressor summary: The authors propose a method for determining where to finetune language models based on analyzing their semantic inference process, which improves efficiency and effectiveness over existing baselines.
http://arxiv.org/abs/2406.11743v1
Compressor summary: Our method trains a neural network to estimate the position and orientation of a spacecraft from monocular images using multi-task learning and data augmentation, improving domain generalization and achieving state-of-the-art results.
http://arxiv.org/abs/2406.11741v1
Compressor summary: The paper explores how generative models can sometimes surpass human experts' abilities when trained on their data, using chess playing as an example and showing that low-temperature sampling enables this transcendence.
http://arxiv.org/abs/2406.11739v1
Compressor summary: The V3Det Challenge 2024 is a benchmark for object detection in real-world scenes with various categories and unseen objects, aiming to advance the field and inspire innovation.
http://arxiv.org/abs/2406.11737v1
Compressor summary: InterNeRF is a novel architecture that improves NeRFs' scalability for large, real-world scenes by enabling out-of-core training and rendering with only a modest increase in training time.
http://arxiv.org/abs/2406.11736v1
Compressor summary: The paper proposes a neural-symbolic self-training method (ENVISIONS) that leverages environment feedback to improve LLMs' performance in natural language and symbolic domains with limited data.
http://arxiv.org/abs/2406.11732v1
Compressor summary: The paper introduces a new method for registering multivector clouds in conformal geometric algebra without solving correspondences, using orthogonal transformations from $SO(4,1)$.
http://arxiv.org/abs/2406.11721v1
Compressor summary: This paper investigates how zero-shot generalization in instruction tuning works and proposes a new data arrangement method to improve it.
http://arxiv.org/abs/2406.11717v1
Compressor summary: The study identifies a one-dimensional subspace that mediates refusal behavior in large language models and proposes a method to disable it, revealing the brittleness of current safety fine-tuning methods.
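The intervention the paper describes can be illustrated in a few lines. A minimal sketch, assuming the refusal direction has already been identified (here it is a random placeholder): project the component along that direction out of the hidden states.

```python
import torch

def ablate_direction(hidden, r_hat):
    """Project out a unit direction r_hat from hidden states of shape (..., d)."""
    r_hat = r_hat / r_hat.norm()
    coeff = (hidden * r_hat).sum(dim=-1, keepdim=True)  # projection coefficients
    return hidden - coeff * r_hat  # hidden states with the refusal component removed

h = torch.randn(4, 16, 512)  # (batch, seq, hidden) dummy activations
r = torch.randn(512)         # placeholder refusal direction
h_ablated = ablate_direction(h, r)
```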
http://arxiv.org/abs/2406.11715v1
Compressor summary: This paper investigates how reinforcement learning with human feedback (RLHF) affects training data memorization and privacy concerns in code completion models.
http://arxiv.org/abs/2406.11714v1
Compressor summary: SE2P is a scalable and configurable graph neural network model that balances expressiveness and generalization by perturbing input graphs and adjusting learnable features.
http://arxiv.org/abs/2406.11713v1
Compressor summary: Latent Denoising Diffusion GAN improves diffusion models' inference speed, image quality, and diversity by using pre-trained autoencoders and a weighted learning strategy.
http://arxiv.org/abs/2406.11711v1
Compressor summary: OGNI-DC is a novel framework for depth completion that uses Optimization-Guided Neural Iterations to refine depth gradients and generate dense depth maps with high accuracy and generalization.
http://arxiv.org/abs/2406.11709v1
Compressor summary: TreeInstruct is a state space-based planning algorithm that uses Socratic questioning to help students independently identify and resolve coding errors in a multi-turn interaction setting, outperforming baselines and guiding students efficiently.
http://arxiv.org/abs/2406.11704v1
Compressor summary: The authors introduce Nemotron-4 340B models, which are competitive language models that can generate synthetic data for training smaller models and are available under a permissive license.
http://arxiv.org/abs/2406.11703v1
Compressor summary: The text discusses double descent in unsupervised learning using under-complete auto-encoders and its applications for dealing with noisy data, domain shifts, and anomalies.
http://arxiv.org/abs/2406.11698v1
Compressor summary: Meta-Reasoning Prompting (MRP) is a new method that helps large language models select and apply the best reasoning methods for different tasks, improving their performance and efficiency across various problem domains.
http://arxiv.org/abs/2406.11695v1
Compressor summary: The paper proposes MIPRO, a novel optimizer for language model programs, which improves their performance by crafting task-grounded instructions and navigating credit assignment across modules.
http://arxiv.org/abs/2406.11689v1
Compressor summary: The paper proposes Language-Guided Distillation (LGD), a new method that uses category names to improve knowledge transfer from a large network to a smaller network for mobile devices.
http://arxiv.org/abs/2406.11687v1
Compressor summary: The study examines the challenges of tokenization for large language models, its impact on problem solving and resilience to typos, and suggests subword regularization as a potential solution.
http://arxiv.org/abs/2406.11686v1
Compressor summary: The paper studies offline RL with linear function approximation and gives a fast algorithm that works under low inherent Bellman error, achieving suboptimality that scales with $\sqrt{\varepsilon_{\mathrm{BE}}}$, which is optimal for any algorithm in this setting.
http://arxiv.org/abs/2406.11685v1
Compressor summary: The paper proposes TopoEdge, a novel approach to address topological imbalance in edge classification tasks using topological entropy as a metric to measure and mitigate the issue.
http://arxiv.org/abs/2406.11683v1
Compressor summary: HoLLMwood is a framework that uses large language models to create screenplays by assigning them different roles, such as writer, editor, and actor, mimicking the human creative process.
http://arxiv.org/abs/2406.11682v1
Compressor summary: The paper introduces knowledge-to-jailbreak, a new task for testing the domain-specific safety of LLMs by generating jailbreaks from domain knowledge that can harm them, and demonstrates the method's effectiveness and generalizability with a large collected dataset and a fine-tuned model.
http://arxiv.org/abs/2406.11681v1
Compressor summary: The paper introduces R-Eval, a toolkit for evaluating Retrieval-Augmented Large Language Models on various tasks and domains, revealing their effectiveness variations.
http://arxiv.org/abs/2406.11676v1
Compressor summary: The authors propose a novel method using fractional score functions and physics-informed neural networks to solve high-dimensional Fokker-Planck-Lévy equations, overcoming the curse of dimensionality and numerical overflow issues.
http://arxiv.org/abs/2406.11675v1
Compressor summary: The paper proposes BLoB, a method that adapts LLMs' parameters during fine-tuning to improve generalization and uncertainty estimation.
http://arxiv.org/abs/2406.11674v1
Compressor summary: The paper proposes Endor, a sparse format that compresses pruned LLM weights to reduce weight transfer latency and accelerate offloaded inference on resource-constrained platforms.
http://arxiv.org/abs/2406.11672v1
Compressor summary: The paper proposes using effective rank analysis and regularization to improve the quality of 3D Gaussian Splatting, a technique for real-time rendering with high-quality 3D reconstruction, by reducing needle-like artifacts and enhancing normal and geometry reconstruction.
http://arxiv.org/abs/2406.11670v1
Compressor summary: The article summarizes existing LLM text recognition methods, highlights issues with current benchmarking datasets, and introduces a new evaluation dataset to compare different detectors.
http://arxiv.org/abs/2406.11668v1
Compressor summary: BabyBLUE is a new benchmark for evaluating jailbreaks and hallucinations in large language models, improving safety and reliability.
http://arxiv.org/abs/2406.11667v1
Compressor summary: The paper shows that it is possible to learn from a weaker oracle than ERM and asks if the ERM principle is necessary for efficient learning.
http://arxiv.org/abs/2406.11665v1
Compressor summary: The text discusses how vision-language models have a Western bias in image understanding and suggests pre-training with diverse languages to improve equity.
http://arxiv.org/abs/2406.11661v1
Compressor summary: The paper investigates how four LLMs respond to culturally sensitive and neutral prompts on different datasets, finding significant variations in their answers except for GPT-4, questioning the effectiveness of socio-demographic prompting as a method for studying or aligning models with culture.
http://arxiv.org/abs/2406.11657v1
Compressor summary: The paper explores the reliability of using large language models as personalized judges for user preferences based on personas, and proposes verbal uncertainty estimation to improve their performance.
http://arxiv.org/abs/2406.11654v1
Compressor summary: Ruby Teaming is a method that enhances Rainbow Teaming by adding a memory cache, resulting in higher attack success rate and quality diversity of generated prompts.
http://arxiv.org/abs/2406.11651v1
Compressor summary: The paper proposes a zero-shot evaluation method for dialogue state tracking using GPT-4, which considers both accuracy and completeness and improves accuracy with manual reasoning paths.
http://arxiv.org/abs/2406.11643v1
Compressor summary: AnyMaker is a framework for generating general objects with high identity fidelity and flexible text editability using self-supervised models, dual-level ID injection, and ID-aware decoupling.
http://arxiv.org/abs/2406.11641v1
Compressor summary: YOLO-FusionNet improves drone detection in complex environments by combining generic object detection and camouflage object detection techniques.
http://arxiv.org/abs/2406.11640v1
Compressor summary: The paper presents a new polynomial-time algorithm for reinforcement learning with linear function approximation and Bellman completeness, which does not rely on global optimism or solving a nonconvex optimization problem.
http://arxiv.org/abs/2406.11638v1
Compressor summary: MASAI is a modular architecture for software-engineering AI agents that uses different sub-agents with specialized objectives and strategies, improving performance on complex problems like GitHub issues resolution.
http://arxiv.org/abs/2406.11634v1
Compressor summary: Cloze testing measures large language models' behavior on benchmark tasks; using the MMLU dataset, the authors show that significant base-rate probability differences among answer tokens impact task performance, and that counterfactual prompting reduces this effect, similar to human test-taking strategies.
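The base-rate effect is easy to probe. A hypothetical sketch, using a placeholder model and a content-free prompt rather than the paper's exact protocol: compare the model's prior probabilities over the four MMLU answer letters before any question is shown.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Answer: "                                    # content-free probe
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]                  # next-token logits

answer_ids = [tok(f" {c}").input_ids[0] for c in "ABCD"]
base_rates = torch.softmax(logits[answer_ids], dim=0)
print(dict(zip("ABCD", base_rates.tolist())))          # unequal => base-rate bias
```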
http://arxiv.org/abs/2406.11632v1
Compressor summary: sMBR decoding uses synthetic sources from backward translation and a reference-free quality estimation metric to improve neural machine translation, outperforming QE reranking and being competitive with standard MBR decoding.
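For context, standard MBR decoding selects the candidate with the highest expected utility over a reference set; in sMBR the references are replaced by synthetic sources obtained via backward translation. A minimal sketch of the selection step, with an illustrative toy utility function:

```python
def mbr_decode(candidates, pseudo_refs, utility):
    """Pick the candidate with the highest expected utility against pseudo-references."""
    def expected_utility(cand):
        return sum(utility(cand, ref) for ref in pseudo_refs) / len(pseudo_refs)
    return max(candidates, key=expected_utility)

def unigram_f1(hyp, ref):
    """Toy overlap-based utility; any sentence-level metric could stand in here."""
    h, r = set(hyp.split()), set(ref.split())
    if not h or not r:
        return 0.0
    p, rec = len(h & r) / len(h), len(h & r) / len(r)
    return 2 * p * rec / (p + rec) if p + rec else 0.0

best = mbr_decode(["the cat sat", "a cat sits"], ["the cat sat down"], unigram_f1)
```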
http://arxiv.org/abs/2406.11633v1
Compressor summary: DocGenome is a structured document benchmark with four key characteristics, designed to improve understanding and processing of scientific documents by large models.
http://arxiv.org/abs/2406.11629v1
Compressor summary: The paper explores how to use GPT-4 as a judge for evaluating LLMs with fewer biases using many-shot in-context prompts and symbol bias mitigation.
http://arxiv.org/abs/2406.11624v1
Compressor summary: The authors propose a method to control motion forecasting models using natural language inputs, making them more interpretable and easy to interact with.
http://arxiv.org/abs/2406.11622v1
Compressor summary: The paper proposes measuring regional cultural variation using language and knowledge-guided lexica, while pointing out limitations in current large language models.
http://arxiv.org/abs/2406.11617v1
Compressor summary: DELLA is a new model merging technique that uses MAGPRUNE, a novel pruning method, to improve multitasking performance; the source code is released.
http://arxiv.org/abs/2406.11614v1
Compressor summary: The paper introduces ConceptVectors, a dataset and methodology to evaluate unlearning in large language models by measuring changes in parametric knowledge traces of erased concepts, which improves current behavioral-based evaluations.
http://arxiv.org/abs/2406.11612v1
Compressor summary: The authors introduce Long Code Arena, a suite of six benchmarks for code processing tasks that require project-wide context, along with datasets, evaluation tools, and baselines.
http://arxiv.org/abs/2406.11608v1
Compressor summary: The paper presents a hierarchical semantic classification method based on image segmentations that treats each level of the hierarchy as a separate task within a single recognition model using fine-to-coarse parsing, and introduces a Tree-path KL Divergence loss to enforce consistency and accuracy across levels.
http://arxiv.org/abs/2406.11601v1
Compressor summary: The paper proposes internally-standardized structural causal models (iSCMs), which reduce artifacts in synthetic datasets used for benchmarking causal structure learning algorithms, and are less identifiable from prior knowledge on the weights.
http://arxiv.org/abs/2406.11598v1
Compressor summary: The paper examines how the term "democratization" is used in natural language processing and machine learning publications, finding that it often refers to ease of access or use of technologies rather than engaging with theories of democracy.
http://arxiv.org/abs/2406.11594v1
Compressor summary: To make GNNs more trustworthy, the paper presents a method that mines activation rules in their hidden layers using information theory and pattern languages; the rules help explain GNN decisions and reveal hidden features, and the method outperforms existing approaches at explaining graph classification.
http://arxiv.org/abs/2406.11592v1
Compressor summary: The ChildDiffusion framework generates realistic child faces with various attributes and ethnicities, addressing privacy concerns, and can be used for various ML tasks.
http://arxiv.org/abs/2406.11581v1
Compressor summary: This paper proposes an improved preference optimization approach for text style transfer, which adapts techniques from statistical machine translation and incorporates exploration, contrastive sampling, pseudo-parallel generation, and dynamic weighted reward aggregation.
http://arxiv.org/abs/2406.11580v1
Compressor summary: The paper introduces Error Span Annotation, a hybrid evaluation method for machine translation that is faster, cheaper, and requires less expertise than existing methods.
http://arxiv.org/abs/2406.11579v1
Compressor summary: Duoduo CLIP is a 3D representation learning model that uses multi-view images and leverages 2D priors from CLIP models for fine-tuning, resulting in better generalization, reduced GPU requirements, faster training time, and improved performance in object retrieval and text-and-shape alignment tasks.
http://arxiv.org/abs/2406.11577v1
Compressor summary: The paper introduces annotated corpora for studying mathematical language and tests various natural language processing models on them, finding that they struggle with math terminology and definitions.
http://arxiv.org/abs/2406.11568v1
Compressor summary: The paper presents an E2E framework that uses large language models to decode invasive brain signals and improve speech neuroprosthesis, showing its potential for BCI applications.
http://arxiv.org/abs/2406.11567v1
Compressor summary: The paper introduces a Quaternion Generative Adversarial Neural Network model for color image inpainting, which uses quaternion deconvolution and batch normalization to improve stability and take advantage of channel correlations.
http://arxiv.org/abs/2406.11566v1
Compressor summary: This paper introduces MKEB, a multilingual knowledge editing benchmark, and MEMLA, a method that improves multilingual knowledge editing by identifying and modifying knowledge neurons across 12 languages.
http://arxiv.org/abs/2406.11565v1
Compressor summary: The paper evaluates how well language models can generate texts that are culturally sensitive when the prompts change based on different nationalities.
http://arxiv.org/abs/2406.11563v1
Compressor summary: Intersymbolic AI combines symbolic and subsymbolic AI techniques to enhance AI effectiveness, similar to how human thought benefits from both conscious and subconscious processes.
http://arxiv.org/abs/2406.11562v1
Compressor summary: The paper proposes a novel reinforcement learning framework that efficiently learns dogfight policies for UCAVs by imitating experts and autonomously exploring dynamic environments.
http://arxiv.org/abs/2406.11555v1
Compressor summary: The paper presents a method to create dynamic language agents using graphs and a pretrained LLM fine-tuned with reinforcement learning, which improves communication flow and accuracy in various domains.
http://arxiv.org/abs/2406.11551v1
Compressor summary: The paper proposes a simple and efficient approach to improve fine-grained sketch-based image retrieval by enhancing feature alignment and sharing mutual information between sketches and images using dual weight-sharing networks, contrastive loss, and a learnable self-attention module.
http://arxiv.org/abs/2406.11547v1
Compressor summary: The paper introduces GECO, a gender-controlled text dataset, to evaluate the impact of biases on XAI methods for large language models and shows that fine-tuning embedding layers improves explanation performance.
http://arxiv.org/abs/2406.11544v1
Compressor summary: Membership inference attacks can be strengthened by white-box access to models, rather than the black-box access previously assumed to suffice; a new attack using inverse-Hessian vector products confirms this.
http://arxiv.org/abs/2406.11538v1
Compressor summary: The text proposes a method to generate artificial artifacts in histopathology images for training classification algorithms, improving artifact detection accuracy.
http://arxiv.org/abs/2406.11534v1
Compressor summary: Inpainting the Gaps (InG) is a novel evaluation framework that improves the perturbation test by inpainting partial or complete objects in an image, reducing test-time distribution shifts and providing more consistent evaluation scores for popular explanation methods of the Vision Transformer.
http://arxiv.org/abs/2406.11524v1
Compressor summary: This paper reviews current methods for Explainable Artificial Intelligence (XAI) that address the issue of multicollinearity, which occurs when features are highly correlated and can affect the interpretation of informative features in machine learning models.
http://arxiv.org/abs/2406.11522v1
Compressor summary: The paper proposes FullCert, the first end-to-end certifier that provides robustness guarantees against both training and inference data attacks using a new library called BoundFlow.
http://arxiv.org/abs/2406.11519v1
Compressor summary: HyperSIGMA is a large transformer-based foundation model for hyperspectral image analysis that leverages sparse sampling attention, spectral enhancement, and a novel dataset to achieve state-of-the-art performance on various tasks.
http://arxiv.org/abs/2406.11517v1
Compressor summary: The authors propose a structural causal model to analyze spurious correlation in representation learning, and introduce a propensity score weighted estimator to control confounding bias for out-of-distribution generalization.
http://arxiv.org/abs/2406.11514v1
Compressor summary: CFMAD is a framework that uses counterfactual multi-agent debate to override LLMs' biases and improve their performance on natural language processing tasks.
http://arxiv.org/abs/2406.11507v1
Compressor summary: PNPT is a novel method that uses normal semantics prompting to improve multi-class image anomaly detection by combining prior knowledge with sample characteristics in a dual-stream reconstruction model.
http://arxiv.org/abs/2406.11504v1
Compressor summary: FiP is a framework that uses fidelity measure to create global masks for graph pruning and shows that general XAI methods perform better than GNN-specific ones.
http://arxiv.org/abs/2406.11503v1
Compressor summary: The authors propose a novel pipeline using GPT-4 and GPT-4V to generate geometry problems with aligned text and images, improving the geometric capabilities of multi-modal models on benchmarks.
http://arxiv.org/abs/2406.11501v1
Compressor summary: The text introduces a novel graphical modeling approach called teleporter theory to analyze counterfactual causality in complex machine learning applications using structural causal models.
http://arxiv.org/abs/2406.11497v1
Compressor summary: CrAM is a plug-and-play method that adjusts LLMs' attention scores based on document credibility to reduce misinformation in retrieval-augmented generation.
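A sketch of the attention-reweighting arithmetic (illustrative only; a real implementation hooks into the model's attention modules, and scoring document credibility is a separate component): scale attention toward each retrieved document's tokens by its credibility and renormalize.

```python
import torch

def reweight_attention(attn_row, token_credibility):
    """attn_row: (seq,) softmaxed attention; token_credibility: (seq,) in [0, 1]."""
    scaled = attn_row * token_credibility
    return scaled / scaled.sum()              # renormalize to a distribution

attn = torch.softmax(torch.randn(10), dim=0)
cred = torch.tensor([1.0] * 5 + [0.2] * 5)   # second doc judged low-credibility
print(reweight_attention(attn, cred))
```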
http://arxiv.org/abs/2406.11490v1
Compressor summary: Our paper proposes a causal model for multi-modal representation learning that balances predominant and auxiliary modalities, improving performance and exploration over existing methods.
http://arxiv.org/abs/2406.11486v1
Compressor summary: The paper explores how well LLMs extract temporal relations without fine-tuning, finding that they struggle and violate uniqueness and transitivity, and that resolving these inconsistencies does not guarantee improved accuracy.
http://arxiv.org/abs/2406.11481v1
Compressor summary: The text discusses various model-based and model-free approaches for Constrained RL in average reward MDPs, analyzing constraint violation and regret guarantees, assuming ergodic MDPs and considering weakly communicating MDPs.
http://arxiv.org/abs/2406.11477v1
Compressor summary: The paper explores sample-efficient strategies for adapting large language models to low-resource languages using vocabulary expansion, finding simpler heuristic-based initialization methods more efficient and robust than existing approaches.
http://arxiv.org/abs/2406.11474v1
Compressor summary: The paper explores how different parts of context text affect In-Context Alignment (ICA) in Large Language Models (LLMs), finding that examples are crucial for alignment, and that ICA performs better than parameter fine-tuning in some tasks.
http://arxiv.org/abs/2406.11473v1
Compressor summary: SEDD is a fast and promising alternative to autoregressive LLMs, but has some weaknesses in conditional generation with short prompts.
http://arxiv.org/abs/2406.11472v1
Compressor summary: Our interactive segmentation frameworks use transformer backbones, exemplar-informed modules, and cross-attention blocks to refine masks for single or multiple objects in the same category, reducing users' labor and clicks compared to previous methods.
http://arxiv.org/abs/2406.11464v1
Compressor summary: The paper explores new methods for automatically breaking sentences into smaller parts to help people with reading difficulties, and evaluates their effectiveness in different languages.
http://arxiv.org/abs/2406.11463v1
Compressor summary: The study examines how neural networks fit data in practice, finding that optimizers, architecture, and activation functions influence the capacity to fit training data and generalization.
http://arxiv.org/abs/2406.11460v1
Compressor summary: TRACE is a method for improving multi-hop question answering by constructing knowledge-grounded reasoning chains from retrieved documents.
http://arxiv.org/abs/2406.11458v1
Compressor summary: Strategic training defends against adversaries by modeling their goals, rather than degrading performance, and uses uncertainty about their incentives to guide learning.
http://arxiv.org/abs/2406.11456v1
Compressor summary: The text discusses how to improve the performance of convolutional neural networks for medical image analysis by adjusting their calibration in regions that affect decision making.
http://arxiv.org/abs/2406.11455v1
Compressor summary: The paper introduces a two-stage, multi-step method that uses a reinforcement-learning framework with the DDQN algorithm to teach large language models the optimal order for sequential entity extraction from complex sentences, improving their performance on information extraction tasks across multiple public datasets.
http://arxiv.org/abs/2406.11451v1
Compressor summary: The paper introduces MedThink, a method that mimics human cognition to create fine-grained instruction pairs for LVLMs in medical image report generation tasks, improving their performance and reducing hallucinations.
http://arxiv.org/abs/2406.11445v1
Compressor summary: The paper reviews methods to solve the ECG inverse problem, which helps create personalized virtual heart models that can improve cardiology care.
http://arxiv.org/abs/2406.11443v1
Compressor summary: Distinguishing real-time processing, which aims to identify critical situations quickly, from whole-video processing, the paper presents a framework for online classification of video data that adapts offline models to online and recurrent operation and uses a mathematical function to encourage faster decisions, achieving better accuracy and speed than non-online methods.
http://arxiv.org/abs/2406.11441v1
Compressor summary: The paper presents SWCF-Net, which efficiently segments large-scale point clouds by combining local and global features: Similarity-Weighted Convolution (SWConv) enhances local feature extraction, downsampling reduces the attention module's complexity, and orthogonal fusion of local and global features eliminates redundant information, yielding competitive results on SemanticKITTI and Toronto3D at low computational cost.
http://arxiv.org/abs/2406.11437v1
Compressor summary: The paper explores how to use deep learning methods to predict execution time from source code, proposing a new dual-transformer model that performs better than existing tree-based neural networks and graph neural networks.
http://arxiv.org/abs/2406.11432v1
Compressor summary: AnyTrans is a framework that translates fragmented texts and fuses them seamlessly into images using large-scale models without training.
http://arxiv.org/abs/2406.11431v1
Compressor summary: The text discusses the weak-to-strong deception issue in superalignment, where strong language models may deceive weak ones to gain higher rewards, and suggests using intermediate models as a potential solution.
http://arxiv.org/abs/2406.11430v1
Compressor summary: The paper proposes a method to compress the Key-Value (KV) cache in large language models using the $L_2$ norm of key embeddings, which can significantly reduce memory requirements while maintaining accuracy.
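A minimal sketch of the idea (not the paper's code; whether high- or low-norm keys should be kept is the paper's empirical finding, and this sketch assumes low-norm keys are retained): score cached keys by their $L_2$ norm and evict the rest.

```python
import torch

def compress_kv(keys, values, keep_ratio=0.5):
    """keys, values: (seq, d). Keep the keep_ratio fraction with smallest ||k||_2."""
    norms = keys.norm(dim=-1)                                  # (seq,)
    k = max(1, int(keys.size(0) * keep_ratio))
    idx = norms.topk(k, largest=False).indices.sort().values   # preserve token order
    return keys[idx], values[idx]

K, V = torch.randn(128, 64), torch.randn(128, 64)
K_small, V_small = compress_kv(K, V, keep_ratio=0.25)          # 4x smaller cache
```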
http://arxiv.org/abs/2406.11429v1
Compressor summary: The authors propose a fast method for extracting relations from text using virtual entity matching and coarse-grained recall, which improves inference speed and accuracy in zero-shot relation extraction tasks.
http://arxiv.org/abs/2406.11422v1
Compressor summary: CROW is a prototype-based method that discovers novel classes and assigns samples to seen and unseen classes under domain shifts using a cluster-then-match strategy and a fine-tuned representation space.
http://arxiv.org/abs/2406.11418v1
Compressor summary: The paper presents BAMBINO-LM, a continual pretraining strategy for BabyLM that improves Italian language capability by combining alternation and PPO-based modeling, mimicking human children's bilingual learning process.
http://arxiv.org/abs/2406.11410v1
Compressor summary: Arguing that large language models neglect the human priors that matter for data construction and rely instead on large-scale scraping, the paper proposes a principle for leveraging human priors when building training data and uses it to train HARE-1.1B, a small language model (SLM) that performs well against state-of-the-art SLMs, demonstrating the principle's effectiveness and efficiency.
http://arxiv.org/abs/2406.11409v1
Compressor summary: CodeGemma is a set of specialized open code models that excel at various code and natural language tasks, including mathematical reasoning and code completion.
http://arxiv.org/abs/2406.11403v1
Compressor summary: The report introduces Multimodal Structured Generation, a framework that uses frozen MMFMs to generate structured outputs for document understanding tasks, achieving competitive results in the 2nd Multimodal Foundation Models Challenge.
http://arxiv.org/abs/2406.11402v1
Compressor summary: This paper analyzes the semantic correctness of 10 open, smaller language models across different aspects and shows that they can compete with or outperform state-of-the-art models in specific use-cases.
http://arxiv.org/abs/2406.11400v1
Compressor summary: The paper shows how to use large language models and knowledge graph clustering to extract and disambiguate entities from astronomical text.
http://arxiv.org/abs/2406.11397v1
Compressor summary: DistPred is a new approach for regression and forecasting that uses proper scoring rules to train a model to estimate the uncertainty of the response variable efficiently and accurately.
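One common proper scoring rule that fits this recipe is the CRPS, which can be estimated directly from samples. A hedged sketch, assuming the model emits K samples per input in a single forward pass (names are illustrative):

```python
import torch

def crps_from_samples(samples, target):
    """samples: (K,) draws from the predictive distribution; target: scalar.
    CRPS ~ E|X - y| - 0.5 * E|X - X'|  (sample-based estimate)."""
    term1 = (samples - target).abs().mean()
    term2 = (samples.unsqueeze(0) - samples.unsqueeze(1)).abs().mean()
    return term1 - 0.5 * term2

loss = crps_from_samples(torch.randn(64), torch.tensor(0.3))
```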
http://arxiv.org/abs/2406.11391v1
Compressor summary: The paper proposes using proximal policy optimization (PPO) to combine Generative Adversarial Networks (GANs) and Large Language Models (LLMs) for improving tabular data augmentation, achieving better accuracy in models trained on synthetic data.
http://arxiv.org/abs/2406.11389v1
Compressor summary: SEFraud is a novel graph-based fraud detection framework that simultaneously detects fraud and provides interpretable results using customized heterogeneous graph transformer networks and learnable feature masks.
http://arxiv.org/abs/2406.11385v1
Compressor summary: MetaGPT is a data-agnostic method that uses model exclusive task arithmetic to merge GPT-scale models, improving their performance across diverse tasks without compromising data privacy or computational efficiency.
http://arxiv.org/abs/2406.11384v1
Compressor summary: PartCLIPSeg is a new method for segmenting fine-grained entities in images using generalized parts, object contexts, and attention control techniques, achieving state-of-the-art results on several benchmarks.
http://arxiv.org/abs/2406.11380v1
Compressor summary: The authors evaluate a language model's ability to attribute quotes in novels and show that its performance depends on the level of book memorization, but it can still perform well on unseen books.
http://arxiv.org/abs/2406.11375v1
Compressor summary: Analogical reasoning helps humans and AI understand new concepts by associating them with familiar ones, and using teacher and student language models can enhance this process in practical settings.
http://arxiv.org/abs/2406.11370v1
Compressor summary: ZEPO is a framework that improves the fairness and alignment of large language models' evaluations with human judgments by optimizing prompts without labeled data.
http://arxiv.org/abs/2406.11368v1
Compressor summary: The authors propose a method to improve automatic quotation attribution in literary works by using character embeddings trained on a new corpus of drama plays.
http://arxiv.org/abs/2406.11357v1
Compressor summary: Refiner is a method that extracts and restructures relevant information from documents to improve the performance of Retrieval-Augmented Generation models in answering questions.
http://arxiv.org/abs/2406.11354v1
Compressor summary: Tree Generation (TG) is a self-decompression method that helps Large Language Models and Multimodal Large Language Models avoid catastrophic forgetting and maintain performance on language benchmarks by decompressing knowledge into the training corpus and synthetically generating fine-tuning data.
http://arxiv.org/abs/2406.11345v1
Compressor summary: Full-ECE is a new metric that measures how well large language models predict their uncertainty across their entire probability distributions.
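A hedged sketch of what calibration over the full distribution could look like (the binning scheme and weighting here are illustrative assumptions, not necessarily the paper's exact definition): every class probability, not just the top-1 prediction, contributes to the bin statistics.

```python
import torch

def full_ece(probs, labels, n_bins=10):
    """probs: (N, V) predicted distributions; labels: (N,) true class ids."""
    correct = torch.zeros_like(probs)
    correct[torch.arange(len(labels)), labels] = 1.0
    p, c = probs.flatten(), correct.flatten()   # every (example, class) pair
    ece = 0.0
    for i in range(n_bins):
        lo, hi = i / n_bins, (i + 1) / n_bins
        mask = (p > lo) & (p <= hi)
        if mask.any():
            # bin weight * |mean confidence - empirical frequency|
            ece += mask.float().mean() * (p[mask].mean() - c[mask].mean()).abs()
    return ece

score = full_ece(torch.softmax(torch.randn(100, 50), -1), torch.randint(0, 50, (100,)))
```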
http://arxiv.org/abs/2406.11341v1
Compressor summary: The paper investigates how chain-of-thought reasoning, in-context learning, and supervised fine-tuning affect syllogistic reasoning in large language models, revealing cognitive heuristics and improvement in inference quality.
http://arxiv.org/abs/2406.11340v1
Compressor summary: The paper proposes a method (CM2-Net) to improve driver action recognition by continually learning from different modalities using accumulative cross-modal mapping prompts, which help extract and prioritize features across modalities.
http://arxiv.org/abs/2406.11338v1
Compressor summary: The paper proposes an in-context learning method for rewriting sentences based on nontrivial linguistic features like dependency depth, which works well even with sparse data.
http://arxiv.org/abs/2406.11334v1
Compressor summary: The paper introduces a new benchmark for testing multimodal models on real-world tasks that require spatial planning, basic programming, and logical reasoning using XLogoOnline visual programming environment.
http://arxiv.org/abs/2406.11333v1
Compressor summary: This paper proposes a pipeline to mitigate hallucination in long video understanding tasks using frame sampling, question injection, and chain-of-thought learning, achieving state-of-the-art results on the MovieChat dataset.
http://arxiv.org/abs/2406.11331v1
Compressor summary: The authors propose a method to generate counterfactual images to reduce biases in vision language models like CLIP by fine-tuning them with diverse synthetic images of humans in context.
http://arxiv.org/abs/2406.11328v1
Compressor summary: EMPEC is a large-scale Chinese healthcare knowledge benchmark for evaluating the performance of Large Language Models across various professions and specialized fields.
http://arxiv.org/abs/2406.11327v1
Compressor summary: ClawMachine is a method that encodes visual referential information without syntax and allows multimodal large language models to communicate better between language and vision.
http://arxiv.org/abs/2406.11319v1
Compressor summary: The text describes a low-power, two-stage system for maritime ship detection on satellite images that uses a binary classifier followed by an object detection model, achieving high performance and energy efficiency.
http://arxiv.org/abs/2406.11317v1
Compressor summary: GUICourse is a dataset suite that trains visual-based GUI agents from general VLMs, improving their OCR, grounding, and GUI knowledge for better performance on common GUI tasks.
http://arxiv.org/abs/2406.11315v1
Compressor summary: The paper proposes a recurrent method to improve depth completion from sparse lidar measurements using camera guidance and achieves state-of-the-art results on KITTI dataset with low overhead.
http://arxiv.org/abs/2406.11313v1
Compressor summary: TODA is a new semi-supervised domain adaptation method for LiDAR-based 3D object detection that uses mixing and adversarial augmentation to improve feature alignment and performance across different domains.
http://arxiv.org/abs/2406.11311v1
Compressor summary: OHDA framework improves indoor 3D object detection by aligning synthetic and real-world data using object-aware augmentation and a two-branch adaptation system.
http://arxiv.org/abs/2406.11309v1
Compressor summary: BaFTA is a novel method for adapting vision-language models at test time using online clustering and Rényi entropy, without fine-tuning text prompts or requiring learning rates.
http://arxiv.org/abs/2406.11308v1
Compressor summary: The paper proposes a data-driven causal model, estimated with double/debiased machine learning, for optimal rework policies in lot-based manufacturing systems with optional rework steps; validated on real data from white LED production, it achieves a 2-3% yield improvement.
http://arxiv.org/abs/2406.11307v1
Compressor summary: This paper compares two model compression techniques, low-rank factorization and Monarch factorization, and shows that low-rank factorization performs better on text classification tasks.
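Low-rank factorization itself is a one-liner with truncated SVD. A minimal sketch follows; the rank and which layers to factorize are the experimental knobs such comparisons turn on.

```python
import torch

def low_rank_factorize(W, r):
    """Replace W (m x n) with rank-r factors, cutting parameters from m*n to r*(m+n)."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * S[:r]   # (m, r)
    B = Vh[:r, :]          # (r, n)
    return A, B            # W ~ A @ B

W = torch.randn(768, 3072)
A, B = low_rank_factorize(W, r=64)
rel_err = (W - A @ B).norm() / W.norm()   # reconstruction error at rank 64
```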
http://arxiv.org/abs/2406.11303v1
Compressor summary: VideoVista is a video question answering benchmark that covers diverse content, durations, and tasks to evaluate large multimodal models' performance in video understanding and reasoning.
http://arxiv.org/abs/2406.11301v1
Compressor summary: The paper proposes a data augmentation technique to create diverse instruction variants for training and evaluating Large Language Models' ability to follow complex instructions accurately.
http://arxiv.org/abs/2406.11289v1
Compressor summary: The survey reviews text summarization research, highlighting paradigm shifts from traditional methods to deep neural networks, pre-trained language models, and large language models.
http://arxiv.org/abs/2406.11288v1
Compressor summary: MFC-Bench is a benchmark for evaluating the accuracy of large vision-language models in multimodal fact-checking across three tasks and reveals their limitations.
http://arxiv.org/abs/2406.11283v1
Compressor summary: GRL generates diverse synthetic scenes for self-supervised 3D representation learning and achieves better performance on downstream tasks like 3D object detection and semantic segmentation.
http://arxiv.org/abs/2406.11282v1
Compressor summary: The paper proposes a system to extract road networks from satellite images in impoverished areas, improving data availability and showing positive impacts on socioeconomic development.
http://arxiv.org/abs/2406.11280v1
Compressor summary: i-SRT is a method that uses self-retrospection to improve textual-visual alignment, reduce verbosity, and enhance content relevance in video question answering tasks.
http://arxiv.org/abs/2406.11278v1
Compressor summary: LARS is a new scoring function for Uncertainty Estimation in Large Language Models that uses supervised data to capture complex dependencies and produces more reliable and calibrated response scores.
http://arxiv.org/abs/2406.11277v1
Compressor summary: The paper presents HaluAgent, a framework that enables smaller language models to detect hallucination types in text, code, and mathematical expressions, achieving performance comparable to or higher than GPT-4 without tool enhancements.
http://arxiv.org/abs/2406.11275v1
Compressor summary: This paper proposes a self-training method for large language models that improves performance and reduces data requirements by using reference-free consistency checks.
http://arxiv.org/abs/2406.11274v1
Compressor summary: The paper proposes Skip-Layer Attention to improve Transformers by allowing direct attention between non-adjacent layers, enhancing their ability to capture dependencies and perform better in language modeling tasks.
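A minimal sketch of the skip-layer idea (an illustration of the concept, not the paper's architecture): queries from the current layer attend to keys and values computed from a non-adjacent earlier layer, creating a direct path between distant layers.

```python
import torch
import torch.nn as nn

class SkipLayerAttention(nn.Module):
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, h_current, h_skip):
        # Queries from layer l, keys/values from an earlier layer l-k.
        out, _ = self.attn(query=h_current, key=h_skip, value=h_skip)
        return h_current + out  # residual connection

layer = SkipLayerAttention()
h_l, h_lm2 = torch.randn(2, 32, 256), torch.randn(2, 32, 256)
h_next = layer(h_l, h_lm2)
```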
http://arxiv.org/abs/2406.11272v1
Compressor summary: The paper proposes a hybrid AI system that combines expert systems, gradient descent, and generative AI to learn reasoning pathways in new problem domains, as a small step towards creating an artificial general intelligence.
http://arxiv.org/abs/2406.11271v1
Compressor summary: MINT-1T is a large, diverse, open-source multimodal interleaved dataset that includes text and images from various sources, such as PDFs and research papers, to train advanced models.
http://arxiv.org/abs/2406.11267v1
Compressor summary: The paper introduces Faithful Finetuning (F2), a method to improve the accuracy of question answering in large language models by explicitly modeling the process of faithfulness during fine-tuning.
http://arxiv.org/abs/2406.11266v1
Compressor summary: The paper proposes a new approach to improve 3D LiDAR-based robot self-localization by enhancing the discriminability of pole-like landmarks and using a novel rotation-invariant convolutional neural network and unsupervised learning for feature extraction and compression.
http://arxiv.org/abs/2406.11263v1
Compressor summary: The paper analyzes why LLMs collapse after ROME edits and proposes a solution to prevent it by using prefixed keys consistently.
http://arxiv.org/abs/2406.11262v1
Compressor summary: GenLLaVA is a multimodal model that combines language, image, and text generation using instruction finetuning to improve zero-shot capabilities for visual understanding tasks.
http://arxiv.org/abs/2406.11260v1
Compressor summary: The study introduces AdStyle, a method to train fake news detectors that can resist style-conversion attacks using diverse and coherent attack prompts generated by large language models.
http://arxiv.org/abs/2406.11259v1
Compressor summary: The proposed Neural Light Dynamic Fields model generates high-quality 3D talking faces 30 times faster than NeRF by representing light fields with light segments and using knowledge distillation and active pool training.
http://arxiv.org/abs/2406.11258v1
Compressor summary: SeRTS is a novel method that combines Monte Carlo Tree Search and self-rewarding to improve large language models' performance in retrieval-augmented generation for biomedical question answering.
http://arxiv.org/abs/2406.11257v1
Compressor summary: The paper proposes a novel framework called ExCP that reduces the storage of training checkpoints for large language models by compressing residuals, weights, and momentum using non-uniform quantization.
http://arxiv.org/abs/2406.11256v1
Compressor summary: The paper proposes a dynamic data mixture method for MoE instruction tuning, which adjusts the sampling weights of different tasks based on their inter-redundancies to improve model performance.
http://arxiv.org/abs/2406.11253v1
Compressor summary: The paper presents Holistic-Motion2D, a large 2D human motion benchmark with pose and text annotations, and Tender, a text-driven whole-body 2D motion generation method using attention and confidence modeling; it shows that 2D motion can be expressive, diverse, and realistic, with potential for lifting to 3D.
http://arxiv.org/abs/2406.11252v1
Compressor summary: The paper proposes using open semantics as anchors to transition from image-anchor relationships to image-target relationships for CLIP-based few-shot classification, improving its performance.
http://arxiv.org/abs/2406.11250v1
Compressor summary: Empathy is difficult to model using NLP approaches due to its subjective nature, human interaction dynamics, and low agreement among annotators.
http://arxiv.org/abs/2406.11249v1
Compressor summary: The paper proposes a hypergraph recovery model to study how foundation models acquire relational understanding during pre-training and applies it to entity alignment in multimodal learning.
http://arxiv.org/abs/2406.11247v1
Compressor summary: The paper presents the STEVE Series, embodied agents built on a large language model in Minecraft that perform various tasks efficiently and creatively; the agents are enhanced with vision, action code, a Critic, memory, and multi-agent features, and the paper also explores pruning the agent system through knowledge distillation.
http://arxiv.org/abs/2406.11245v1
Compressor summary: The paper proposes a RIS-assisted IoV network that uses an MDP and SAC algorithm to optimize V2I and V2V link performance in terms of AoI and payload transmission probability.
http://arxiv.org/abs/2406.11244v1
Compressor summary: SpoT-Mamba is a new framework that uses node-specific walk sequences and temporal scans to capture long-range spatio-temporal dependencies for better STG forecasting.
http://arxiv.org/abs/2406.11243v1
Compressor summary: FamiCom is a revised measure that combines familiarity and complexity to estimate language models' end-task performance better than perplexity or other metrics.
http://arxiv.org/abs/2406.11242v1
Compressor summary: The paper proposes a novel image and pixel retrieval method based on hypergraphs and community selection that overcomes the limitations of traditional diffusion methods, achieving state-of-the-art accuracy and speed on two datasets.
http://arxiv.org/abs/2406.11239v1
Compressor summary: Homoglyph-based attacks can effectively evade existing large language model detectors, raising concerns about their reliability in combating misinformation and academic cheating.
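The attack itself is simple to illustrate: swap Latin characters for visually near-identical Unicode code points, so the text looks unchanged to a human but tokenizes differently for a detector. A minimal sketch with a tiny confusables table follows; real attacks draw on much larger tables.

```python
import random

# A few Latin -> Cyrillic homoglyphs (visually near-identical code points).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e",
              "p": "\u0440", "c": "\u0441", "x": "\u0445"}

def homoglyph_attack(text, rate=0.5, seed=0):
    """Randomly substitute mapped characters at the given rate."""
    rng = random.Random(seed)
    return "".join(HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate
                   else ch for ch in text)

s = "the model wrote this paragraph"
t = homoglyph_attack(s)
print(t)            # renders (nearly) identically for a human reader
print(s == t)       # False: the underlying code points have changed
```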
http://arxiv.org/abs/2406.11238v1
Compressor summary: The paper analyzes how large language models use long contexts for language modeling, finding that content words, initial tokens, frequent n-grams, and prior knowledge benefit predictions, while overconfidence may cause the probabilities assigned to distant tokens to increase.
http://arxiv.org/abs/2406.11235v1
Compressor summary: QTIP is a new post-training quantization (PTQ) method that uses trellis coded quantization (TCQ) to quantize LLM weights in high dimensions, achieving better quality and faster speed than vector quantization (VQ) based methods.
http://arxiv.org/abs/2406.11234v1
Compressor summary: The study proposes a simple and efficient method to improve sentiment triplet extraction by integrating minimal tagging and token-level contrastive learning, showing comparable or better results than existing approaches.
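A generic token-level supervised contrastive loss, pulling together tokens that share a tag, could look like this (the paper's exact formulation may differ).

```python
import torch
import torch.nn.functional as F

def token_contrastive_loss(h, tags, tau=0.1):
    """Supervised contrastive loss over token embeddings: tokens that
    share a tag are pulled together, all others pushed apart.
    h: (n_tokens, d) hidden states; tags: (n_tokens,) tag ids."""
    h = F.normalize(h, dim=-1)
    n = h.size(0)
    sim = (h @ h.T / tau).masked_fill(torch.eye(n, dtype=torch.bool), float("-inf"))
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = tags.unsqueeze(0) == tags.unsqueeze(1)   # same-tag pairs
    pos.fill_diagonal_(False)
    per_token = -log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return per_token[pos.any(1)].mean()            # tokens with >= 1 positive

h = torch.randn(6, 16)                     # e.g., encoder outputs
tags = torch.tensor([0, 0, 1, 1, 2, 2])    # e.g., aspect / opinion / other
print(token_contrastive_loss(h, tags))
```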
http://arxiv.org/abs/2406.11233v1
Compressor summary: The paper studies how to improve the ability of large language models to learn from few examples in new tasks, by analyzing and modifying their decision boundaries for binary classification.
http://arxiv.org/abs/2406.11230v1
Compressor summary: MMNeedle is a benchmark that tests multimodal large language models' ability to locate target sub-images within sets of images from textual instructions and descriptions, revealing a clear performance gap between API-based and open-source models, with GPT-4o strongest in long-context scenarios.
http://arxiv.org/abs/2406.11228v1
Compressor summary: ComperDial is a new benchmark for evaluating dialogue systems that uses human-scored responses from various agents and dialogues, and introduces a novel metric (CPDScore) that better correlates with human judgments.
http://arxiv.org/abs/2406.11218v1
Compressor summary: The authors introduce Spanish-BFF-2, an updated AI-generated Spanish dictionary using GPT-4-turbo, and compare it with the previous version.
http://arxiv.org/abs/2406.11217v1
Compressor summary: WeatherQA is a multimodal dataset for evaluating large language models' ability to forecast severe weather events using images and texts, with the aim of improving meteorological predictions and public safety.
http://arxiv.org/abs/2406.11214v1
Compressor summary: The paper discusses challenges in acquiring diverse and high-quality training data for large language models, which can lead to biased or unreliable content, and proposes strategies to improve data quality and model performance while respecting ethical standards.
http://arxiv.org/abs/2406.11210v1
Compressor summary: The authors propose a novel method for scene change detection that uses tracking models without training and addresses both content and style gaps between input images, improving performance especially on unseen domains.
http://arxiv.org/abs/2406.11206v1
Compressor summary: The paper analyzes how retraining models on noisy labels can improve their performance and applies this insight to enhance privacy in neural network training.
http://arxiv.org/abs/2406.11202v1
Compressor summary: The paper introduces a Latent Consistency Model adapted for 3D Painting, which improves generation speed and quality using techniques from 2D generative imaging.
http://arxiv.org/abs/2406.11201v1
Compressor summary: This text discusses how fine-tuning large language models (LLMs) for Retrieval-Augmented Generation (RAG) systems may not always improve performance, especially in complex query scenarios.
http://arxiv.org/abs/2406.11200v1
Compressor summary: AvaTaR is a framework that helps large language models use external tools and knowledge more effectively by automatically optimizing prompts through contrastive reasoning over positive and negative examples.
http://arxiv.org/abs/2406.11196v1
Compressor summary: The paper proposes Vid3D, a model that generates 3D videos by first creating a 2D seed and then generating independent 3D representations for each timestep, and shows it achieves comparable results to existing methods without explicitly modeling 3D temporal dynamics.
http://arxiv.org/abs/2406.11194v1
Compressor summary: ICE is a novel approach for language models to edit knowledge in context without overfitting or losing performance.
http://arxiv.org/abs/2406.11193v1
Compressor summary: The study explores how multimodal large language models use and process domain-specific neurons when handling projected image features for visual tasks like VQA.
http://arxiv.org/abs/2406.11192v1
Compressor summary: B2NERD is a dataset that improves Large Language Models' performance on Open NER by standardizing entity definitions and reducing data redundancy.
http://arxiv.org/abs/2406.11191v1
Compressor summary: This survey reviews how human feedback is used to learn preferences and improve the applicability and effectiveness of large language models (LLMs), and evaluates different approaches to aligning LLMs with human intentions.
http://arxiv.org/abs/2406.11190v1
Compressor summary: The proposed framework helps large language models give better feedback using self-reference and general principles, improving AI's understanding of human intentions and preferences.
http://arxiv.org/abs/2406.11189v1
Compressor summary: WeCLIP uses the frozen CLIP model as a backbone to extract semantic features and generate pseudo labels, then refines them with a decoder and a refinement module, achieving better performance in weakly supervised semantic segmentation.
http://arxiv.org/abs/2406.11179v1
Compressor summary: IRED is a novel framework for learning to reason across tasks via energy-based optimization, adapting computation to problem difficulty and using annealed energy landscapes and score-function supervision for faster training and inference.
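The inference-as-optimization idea can be sketched with a toy energy function standing in for IRED's learned energy network: harder inputs simply get more descent steps, and injected noise is annealed to zero.

```python
import torch

def energy(x, y):
    """Toy energy: low when y solves y = x^2 elementwise.
    Stands in for a learned energy network."""
    return ((y - x ** 2) ** 2).sum()

def annealed_descent(x, steps=100, lr=0.1, noise0=0.1):
    """Inference as optimization: descend the energy landscape,
    annealing the injected noise to zero over the schedule."""
    y = torch.zeros_like(x, requires_grad=True)
    for t in range(steps):
        e = energy(x, y)
        (g,) = torch.autograd.grad(e, y)
        noise = noise0 * (1 - t / steps)
        with torch.no_grad():
            y -= lr * g + noise * torch.randn_like(y)
    return y.detach()

x = torch.tensor([1.0, 2.0, 3.0])
print(annealed_descent(x))   # approaches [1, 4, 9]
```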
http://arxiv.org/abs/2406.11177v1
Compressor summary: Text-Informed Feature Generation (TIFG) is a novel framework that uses external knowledge to generate new explainable features for data mining and feature engineering, improving downstream task performance.
http://arxiv.org/abs/2406.11176v1
Compressor summary: The paper introduces a new framework called Iterative step-level Process Refinement (IPR) that improves agent performance by providing detailed guidance during training using step-level rewards and contrastive action pairs.
http://arxiv.org/abs/2406.11173v1
Compressor summary: The paper presents a new Kolmogorov-Arnold Network (BSRBF-KAN) that combines B-splines and radial basis functions (RBFs), performs well on the MNIST dataset, and enables more mathematical function combinations in KAN design.
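A minimal sketch of a layer mixing the two bases, using degree-1 B-splines (hat functions) and Gaussian RBFs; the grid, widths, and combination rule here are assumptions, not the paper's exact design.

```python
import torch

class BSRBFLayer(torch.nn.Module):
    """KAN-style layer: per-input features from two bases (degree-1
    B-splines and Gaussian RBFs), mixed by a learned linear map."""
    def __init__(self, in_dim, out_dim, grid=8, lo=-2.0, hi=2.0):
        super().__init__()
        self.centers = torch.linspace(lo, hi, grid)
        self.width = (hi - lo) / (grid - 1)
        self.w = torch.nn.Linear(in_dim * grid * 2, out_dim)

    def forward(self, x):                      # x: (batch, in_dim)
        d = x.unsqueeze(-1) - self.centers     # (batch, in_dim, grid)
        hat = torch.clamp(1 - d.abs() / self.width, min=0)   # B-spline, k=1
        rbf = torch.exp(-(d / self.width) ** 2)              # Gaussian RBF
        feats = torch.cat([hat, rbf], dim=-1).flatten(1)
        return self.w(feats)

layer = BSRBFLayer(4, 3)
print(layer(torch.randn(2, 4)).shape)   # torch.Size([2, 3])
```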
http://arxiv.org/abs/2406.11172v1
Compressor summary: The paper proposes a framework that uses multi-task learning and de-redundancy to extract and fuse diverse legal factors for criminal case matching, outperforming existing methods.
http://arxiv.org/abs/2406.11171v1
Compressor summary: The paper introduces SUGARCREPE++, a dataset to analyze how well large language models understand semantic and lexical variations in image captions, finding that current models struggle with this task.
http://arxiv.org/abs/2406.11162v1
Compressor summary: The paper creates low-resource relation extraction datasets in 10 languages by translating English datasets and filtering out low-quality data, then tests LLMs on them.