This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-28, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2402.17766v1
Compressor summary: ShapeLLM is a 3D LLM that understands objects from multiple views and interacts with them using language.
http://arxiv.org/abs/2402.17764v1
Compressor summary: The text introduces BitNet b1.58, a 1-bit Large Language Model that achieves similar performance to full-precision models while being more cost-effective and enabling new hardware optimizations.
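Not the paper's code, but a minimal sketch of the core 1.58-bit idea it describes: weights are constrained to {-1, 0, 1} via absmean scaling; the function and tensor names below are illustrative.

```python
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-8):
    # Scale weights by their mean absolute value, then round each entry
    # into {-1, 0, 1}; the scale is kept for the dequantized forward pass.
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale

w = torch.randn(256, 256)
w_q, s = absmean_ternary(w)
print(w_q.unique())  # tensor([-1., 0., 1.])
```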
http://arxiv.org/abs/2402.17762v1
Compressor summary: The paper studies how large language models have a few extremely large activation values that affect their behavior and introduce biases.
http://arxiv.org/abs/2402.17759v1
Compressor summary: The paper proposes a theory for optimal learning of language models by maximizing data compression ratio and shows that it leads to faster learning and improved performance in experiments.
http://arxiv.org/abs/2402.17758v1
Compressor summary: ADL4D is a large 4D HOI dataset with multiple subjects, objects, and actions for learning hand interactions in daily activities.
http://arxiv.org/abs/2402.17756v1
Compressor summary: The paper presents an efficient algorithm for agnostically learning Single-Index Models under the $L_2^2$ loss, achieving a constant-factor approximation to the optimal loss for a wide range of distributions and link functions, giving the first efficient approximate learner for Gaussian data with nontrivial link functions, and introduces a new local error bound notion called alignment sharpness.
http://arxiv.org/abs/2402.17753v1
Compressor summary: The text introduces a machine-human pipeline for generating very long-term open-domain dialogues, evaluates the performance of large language models on various tasks, and presents a dataset (LoCoMo) with 35 sessions of conversations.
http://arxiv.org/abs/2402.17747v1
Compressor summary: Partial human observations in reinforcement learning can lead to deception and overjustification, which the paper addresses by studying ambiguity in the return function.
http://arxiv.org/abs/2402.17744v1
Compressor summary: The authors propose a novel method to analyze 3D polarized light images of the human brain's hippocampus using unfolding methods and self-supervised contrastive learning, which can identify classical subfield boundaries.
http://arxiv.org/abs/2402.17739v1
Compressor summary: reBandit is an online RL algorithm that uses random effects and Bayesian priors to deliver personalized mobile health interventions for reducing cannabis use among emerging adults.
http://arxiv.org/abs/2402.17733v1
Compressor summary: The authors propose a method to adapt large language models for multiple tasks in translation workflows by combining pretraining on multilingual data with finetuning on task-specific instructions, achieving competitive results compared to general-purpose models and releasing new datasets and evaluation tools.
http://arxiv.org/abs/2402.17730v1
Compressor summary: The text introduces a new method for learning mixtures of continuous-time Markov chains (CTMCs) from sequential data, explores its impact on learnability, and applies it to analyze user preferences on social media and NBA team tactics.
http://arxiv.org/abs/2402.17729v1
Compressor summary: The paper proposes a new adversarial training method, FAAL, that ensures both robustness and fairness of models across different categories by finding the worst distribution among them.
http://arxiv.org/abs/2402.17726v1
Compressor summary: The paper introduces a novel Visual Reference Prompt encoder that helps the Segment Anything Model use annotated reference images as prompts for object segmentation, achieving state-of-the-art performance with minimal parameters and generalizing well to unseen objects and domains.
http://arxiv.org/abs/2402.17723v1
Compressor summary: The paper presents a framework for joint video-audio generation that bridges the latent spaces of existing pre-trained models with a multimodal latent aligner built on ImageBind, showing superior performance on various vision-audio tasks.
http://arxiv.org/abs/2402.17720v1
Compressor summary: SMART is an online learning algorithm that adapts to data and achieves near-optimal regret by switching between follow-the-leader and worst-case policies.
http://arxiv.org/abs/2402.17718v1
Compressor summary: The text proposes a digital twin framework that uses machine learning and optimization techniques to predict and control heat accumulation in laser-based additive manufacturing, improving material properties and part quality.
http://arxiv.org/abs/2402.17717v1
Compressor summary: AmbigNLG is a new task that tackles ambiguity in instructions for natural language generation, with a dataset and taxonomy designed to improve instruction clarity and LLM performance.
http://arxiv.org/abs/2402.17710v1
Compressor summary: The paper proposes ProxConnect++, a principled method for neural network binarization with automatic theoretical guarantees and enhanced performance in image classification tasks.
http://arxiv.org/abs/2402.17709v1
Compressor summary: The text discusses the difference between rule-based and case-based reasoning in language models, and proposes a method (RFFT) to improve math problem-solving by teaching transformers to follow explicit rules.
http://arxiv.org/abs/2402.17706v1
Compressor summary: The paper introduces LCPAQ, a fast and effective method to quantize neural networks for low-resource hardware by using mixed-precision, adaptive quantization, and low-cost proxy search.
http://arxiv.org/abs/2402.17700v1
Compressor summary: RAVEL is a dataset for comparing interpretability methods in neural networks and introduces MDAS, a new method that finds distributed representations satisfying multiple causal criteria.
http://arxiv.org/abs/2402.17699v1
Compressor summary: The paper proposes a cyclical scheduling method to efficiently and accurately sample multimodal discrete distributions in deep models, overcoming the limitations of gradient-based discrete sampling.
http://arxiv.org/abs/2402.17690v1
Compressor summary: The paper explores how AI and learning algorithms have evolved and are used in autonomous vehicles, influencing their decision-making, development life cycle, ethical considerations, and performance improvement.
http://arxiv.org/abs/2402.17689v1
Compressor summary: The paper evaluates machine learning tree-ensemble methods for predicting wireless communication quality in the automotive industry, using data from a cellular test network and radio environment characteristics to improve accuracy and support longer prediction horizons.
http://arxiv.org/abs/2402.17682v1
Compressor summary: NextLevelBERT uses higher-level semantic text embeddings to improve large language models' performance on long-document tasks without sacrificing much detail.
http://arxiv.org/abs/2402.17680v1
Compressor summary: The paper mitigates catastrophic forgetting in video captioning with sequential inputs by combining fine-grained sensitivity selection with two-stage knowledge distillation to retain old-task information, and introduces a metric for the forgetting rate, evaluated on the MSR-VTT dataset.
http://arxiv.org/abs/2402.17678v1
Compressor summary: CAD-SIGNet is a model that can recover the design history and provide multiple plausible design choices for CAD models given 3D scans of physical objects.
http://arxiv.org/abs/2402.17672v1
Compressor summary: The paper introduces a new deep learning method for PolSAR image classification that fuses three branches and outperforms existing approaches on several datasets.
http://arxiv.org/abs/2402.17671v1
Compressor summary: This text discusses recent progress in improving the reliability and trustworthiness of foundation models within in-context learning frameworks, addressing issues such as toxicity, hallucination, disparity, adversarial vulnerability, and inconsistency.
http://arxiv.org/abs/2402.17666v1
Compressor summary: The paper presents a multi-agent deep learning method for satellite routing in low Earth orbit constellations, using a global neural network to learn optimal paths and local ones for fast on-board routing.
http://arxiv.org/abs/2402.17664v1
Compressor summary: The paper proposes a new method for digitizing cloth using data collected under strict measuring protocols, a new dataset, and a Bayesian differentiable cloth model.
http://arxiv.org/abs/2402.17660v1
Compressor summary: The paper presents advancements in TorchMD-Net, a neural network-based molecular simulation software with improved efficiency, modular design, and physical priors integration.
http://arxiv.org/abs/2402.17655v1
Compressor summary: The paper proposes a confidence-aware multi-field calibration method for ad ranking that adjusts the calibration intensity based on sample statistics and uses multiple feature fields to mitigate data sparsity.
http://arxiv.org/abs/2402.17653v1
Compressor summary: The text describes a segmentation network that can detect errors caused by different test domains without extra annotation, using uncurated data and a novel benchmark based on the SAX Dataset.
http://arxiv.org/abs/2402.17649v1
Compressor summary: Large language models exhibit left-leaning political stances that vary by policy domain and that appear more reliably as model size increases.
http://arxiv.org/abs/2402.17644v1
Compressor summary: The QRData benchmark evaluates Large Language Models' ability to reason with quantitative data from real-world sources, finding that current models struggle with causal reasoning and using both data and code simultaneously.
http://arxiv.org/abs/2402.17641v1
Compressor summary: IVON is a better optimizer than Adam for large neural networks, offering lower computational costs and better predictive uncertainty estimates, and can improve tasks such as fine-tuning, model merging, generalization error prediction, and sensitivity estimation.
http://arxiv.org/abs/2402.17633v1
Compressor summary: The paper introduces YTSeg, a benchmark for spoken content segmentation, and MiniSeg, a model that performs hierarchical segmentation and smart chaptering.
http://arxiv.org/abs/2402.17630v1
Compressor summary: InFusE is a novel approach that uses variable premise size and simplifies summary sentences for diverse summarization tasks, outperforming existing methods based on off-the-shelf NLI models.
http://arxiv.org/abs/2402.17624v1
Compressor summary: The paper proposes CustomSketching, a framework that extracts novel sketch concepts from sketch-image pairs to enable fine-grained image synthesis and editing for large text-to-image models.
http://arxiv.org/abs/2402.17622v1
Compressor summary: The paper presents a semantic segmentation network that uses foundation models and Masked Image Modeling to produce high-quality uncertainty estimates for safety-critical applications, and shows its effectiveness on the SAX Segmentation benchmark.
http://arxiv.org/abs/2402.17614v1
Compressor summary: The authors propose a new approach for cross-domain few-shot segmentation that uses task adaptation and consistency across augmented views, without training or relying on a main segmentation network, achieving state-of-the-art performance.
http://arxiv.org/abs/2402.17613v1
Compressor summary: The paper proposes an integrated system for assessing writing and correcting errors using NLP and machine learning to help second language learners improve their proficiency efficiently and cost-effectively.
http://arxiv.org/abs/2402.17611v1
Compressor summary: This paper evaluates various pretraining methods for detecting defects in solar cells using electroluminescence images and finds that supervised, semi-supervised, and self-supervised pretraining schemes yield similar performance, while some are better for underrepresented classes.
http://arxiv.org/abs/2402.17608v1
Compressor summary: The paper examines how fine-tuning pre-trained Encoder-Decoder models like T5 on linguistic tasks improves their ability to predict sentence complexity in Italian and English.
http://arxiv.org/abs/2402.17606v1
Compressor summary: This paper introduces TBGAT, a novel GNN architecture for job shop scheduling, which embeds disjunctive graphs in a forward and backward view and uses topological sorts as features to capture the graph topology.
http://arxiv.org/abs/2402.17601v1
Compressor summary: The study presents a novel method for sleep detection using weakly supervised learning, which outperforms conventional methods and improves calibration accuracy.
http://arxiv.org/abs/2402.17595v1
Compressor summary: This paper explores implicit regularization in non-linear neural networks for matrix sensing problems, proposes Spectral Neural Networks (SNN) architecture, and shows its effectiveness with theoretical analysis and experiments.
http://arxiv.org/abs/2402.17589v1
Compressor summary: The paper proposes an end-to-end contrastive learning framework for learning with noisy labels that improves performance by using a Pseudo-Label Relaxed loss and a two-dimensional Gaussian Mixture Model.
http://arxiv.org/abs/2402.17572v1
Compressor summary: Hyperdimensional computing is an efficient and interpretable alternative to deep learning for bioinformatics, using high-dimensional vectors to represent biological concepts and simple operators to manipulate them.
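As a rough illustration of the hyperdimensional computing paradigm the summary refers to (not code from the paper): concepts become random high-dimensional bipolar vectors, and simple elementwise operators bind and bundle them; all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10_000  # hypervector dimensionality

# Random bipolar hypervectors standing in for two biological symbols.
gene = rng.choice([-1, 1], size=d)
expr_level = rng.choice([-1, 1], size=d)

bound = gene * expr_level             # binding: elementwise product
bundled = np.sign(gene + expr_level)  # bundling: elementwise majority

print((gene @ bound) / d)    # ~0: a bound vector is dissimilar to its parts
print((gene @ bundled) / d)  # ~0.5: a bundle stays similar to its parts
```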
http://arxiv.org/abs/2402.17574v1
Compressor summary: Agent-Pro is a large language model agent that uses policy-level reflection and optimization to learn from interactive experiences and improve its behavior in complex dynamic scenarios like games.
http://arxiv.org/abs/2402.17570v1
Compressor summary: The paper introduces a new method for Gaussian Processes using contaminated normal noise to handle heteroscedastic variance and outliers, and shows its advantages over neural networks in predicting geomagnetic ground perturbations.
http://arxiv.org/abs/2402.17564v1
Compressor summary: The paper proposes a new method called GPO that uses gradient-based optimization techniques to improve the performance of language models as prompt optimizers.
http://arxiv.org/abs/2402.17563v1
Compressor summary: SADM is a new diffusion model that learns structural relationships among samples in the data distribution using adversarial training, improving generative performance on various tasks.
http://arxiv.org/abs/2402.17562v1
Compressor summary: The text discusses the factors that influence the robustness of 3D object detectors, focusing on architecture, voxel encoding, data augmentations, and anchor strategies, and how they can be improved for different domains such as sensor type, weather, and location.
http://arxiv.org/abs/2402.17561v1
Compressor summary: The paper proposes a patch-based harmonization network for composite images, which improves local visual coherence and achieves state-of-the-art results on iHarmony4 and a new human portrait dataset.
http://arxiv.org/abs/2402.17555v1
Compressor summary: The paper proposes a new method for semantic segmentation using scribble annotations and pseudo-labels, which considers both local and global cues, corrects feature representations, and reduces uncertainty with a distance entropy loss.
http://arxiv.org/abs/2402.17554v1
Compressor summary: The authors propose a method to assess the reliability of Machine Learning predictions in critical contexts like medicine, using Autoencoders and proxy models, and demonstrate its effectiveness on a Multiple Sclerosis prediction model.
http://arxiv.org/abs/2402.17553v1
Compressor summary: The paper introduces OmniACT, a dataset for testing virtual agents' ability to generate executable programs from screen images and natural language tasks, with GPT-4 as the current best baseline but still far from human proficiency.
http://arxiv.org/abs/2402.17546v1
Compressor summary: The paper presents CoCoA, a psychological counseling agent that uses CBT techniques, memory management, and dynamic prompting to address cognitive distortions in client statements.
http://arxiv.org/abs/2402.17532v1
Compressor summary: The paper proposes a novel method to generate text by selecting context-aware phrases from supporting documents, overcoming challenges in training oracles, and showing improved performance on knowledge-intensive tasks and open-ended text generation.
http://arxiv.org/abs/2402.17527v1
Compressor summary: The authors use word-level exact matching to test how well language models capture human variability in word prediction, finding that the models are poorly calibrated to human uncertainty and that ECE fails to reflect this.
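For reference, a minimal sketch of the standard Expected Calibration Error (ECE) that the summary refers to, assuming top-1 confidences and binary correctness labels:

```python
import numpy as np

def expected_calibration_error(conf: np.ndarray, correct: np.ndarray,
                               n_bins: int = 10) -> float:
    # Bin predictions by confidence; ECE is the bin-size-weighted average
    # gap between mean confidence and empirical accuracy per bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece
```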
http://arxiv.org/abs/2402.17525v1
Compressor summary: The text provides a comprehensive survey of diffusion models for image editing, covering various methods and tasks, and introduces a benchmark and metric for evaluating them.
http://arxiv.org/abs/2402.17521v1
Compressor summary: The paper introduces an advanced sampler that efficiently downsamples point clouds using adaptive voxel sizes and a network compatible with arbitrary voxel sizes, achieving high accuracy on ShapeNetPart and ScanNet benchmarks.
http://arxiv.org/abs/2402.17517v1
Compressor summary: The paper proposes TDSM, a method to train conditional diffusion models with noisy labels by incorporating transition probabilities and improving the quality of generated samples.
http://arxiv.org/abs/2402.17516v1
Compressor summary: QUCE is a method that improves interpretability of DNNs by minimizing path uncertainty and generating certain counterfactual examples.
http://arxiv.org/abs/2402.17514v1
Compressor summary: The paper proposes an adaptive resolution SEEM method with robust localization, point pseudo-labels, and a loss function to improve crowd counting performance without extensive training data.
http://arxiv.org/abs/2402.17512v1
Compressor summary: The "Latte Transformer" model improves the time complexity of the standard attention mechanism by using latent vectors, enabling efficient language generation for large context windows.
http://arxiv.org/abs/2402.17510v1
Compressor summary: The paper uses synthetic shortcuts to show that contrastive losses alone are insufficient for vision-language models to learn task-optimal image-caption representations, and proposes two mitigations: latent target decoding and implicit feature modification.
http://arxiv.org/abs/2402.17507v1
Compressor summary: The paper proposes an efficient method to enhance information flow in multi-head self-attention by decomposing the attention operation into query- and key-less components, which reduces computational complexity while maintaining performance.
http://arxiv.org/abs/2402.17501v1
Compressor summary: The paper introduces Healthcare as Sequence Modeling, a paradigm for reinforcement learning in healthcare that uses event streams to represent interactions between patients and providers, and presents MIMIC-SEQ, a new benchmark dataset based on clinical records from MIMIC-IV.
http://arxiv.org/abs/2402.17497v1
Compressor summary: REAR is a new approach for question answering that helps large language models assess the relevance of retrieved documents to improve their performance.
http://arxiv.org/abs/2402.17493v1
Compressor summary: Large language models can predict postoperative risks using clinical texts with various training strategies, improving performance compared to traditional word embeddings and offering opportunities for personalized perioperative care.
http://arxiv.org/abs/2402.17487v1
Compressor summary: Neural network-based image compression outperforms classical methods and has led to JPEG-AI standardization; this work applies gradual algorithmic optimizations that improve the speed and performance of the current verification model.
http://arxiv.org/abs/2402.17486v1
Compressor summary: The paper presents a method to efficiently generate and enhance deep learning models without training, achieving comparable or better performance and few-shot learning capabilities, as well as possible adversarial defense.
http://arxiv.org/abs/2402.17485v1
Compressor summary: EMO is a new framework for generating talking head videos that captures audio cues and facial movements more accurately and expressively than previous methods.
http://arxiv.org/abs/2402.17483v1
Compressor summary: The paper proposes AlignMiF, a method to align LiDAR and camera data within a multimodal implicit field, improving novel view synthesis performance.
http://arxiv.org/abs/2402.17478v1
Compressor summary: The paper introduces ArPro, a large propaganda detection dataset for multiple languages, and evaluates GPT-4's performance on fine-grained propaganda detection tasks.
http://arxiv.org/abs/2402.17472v1
Compressor summary: RAGFormer is a novel framework that combines local and global features for fraud detection using GNN and Transformer networks, outperforming previous methods on heterogeneous graphs.
http://arxiv.org/abs/2402.17470v1
Compressor summary: The text discusses neural network-based image compression codecs and proposes a method to improve the JPEG-AI verification model's bit distribution by adopting VVC intra's adaptable bit distribution structure.
http://arxiv.org/abs/2402.17464v1
Compressor summary: The paper proposes a part-whole hierarchy message passing network for efficient 3D part assembly, using super-parts to provide hints about part poses and enable interpretability.
http://arxiv.org/abs/2402.17463v1
Compressor summary: Dual Chunk Attention improves LLMs' ability to handle long context sequences without finetuning, enabling better performance on practical tasks and as an open-source alternative to proprietary models.
http://arxiv.org/abs/2402.17457v1
Compressor summary: The study finds that learning rate transfer in neural networks can be explained by the consistency of sharpness across model sizes, which is related to feature learning dynamics.
http://arxiv.org/abs/2402.17453v1
Compressor summary: DS-Agent is a novel framework that combines large language models and case-based reasoning to automate data science tasks, improving task success rates and reducing costs.
http://arxiv.org/abs/2402.17447v1
Compressor summary: The paper compares named entity recognition (NER) models for extracting recipe ingredients from unstructured text across three datasets with different levels of annotation, finding that a spaCy-transformer model achieves the highest macro-F1 scores.
http://arxiv.org/abs/2402.17440v1
Compressor summary: The paper proposes a method to optimize hyperparameters based on the network architecture, which improves generalization and affects AutoML comparisons.
http://arxiv.org/abs/2402.17437v1
Compressor summary: The text introduces a new model (ESCM) for generating empathetic responses that considers both emotions and semantics as dynamic variables in dialogue.
http://arxiv.org/abs/2402.17433v1
Compressor summary: The paper proposes a new model (CET-MAE) and a framework (E2T-PTR) that improve EEG-based language decoding for brain-computer interfaces by integrating self-supervised learning and pre-trained modules, achieving state-of-the-art results.
http://arxiv.org/abs/2402.17431v1
Compressor summary: KANDY is a benchmarking framework that generates diverse learning and reasoning tasks inspired by Kandinsky patterns to test AI models' performance in continual and semi-supervised learning with symbol compositionality, challenging both neural and symbolic approaches.
http://arxiv.org/abs/2402.17430v1
Compressor summary: MapQR is a method for constructing online vectorized maps in autonomous driving that enhances query capabilities using scatter-and-gather queries and exploits prior information to improve accuracy and efficiency.
http://arxiv.org/abs/2402.17427v1
Compressor summary: VastGaussian is a new method that improves 3D Gaussian Splatting for large scene reconstruction by using progressive partitioning, airspace-aware visibility, parallel optimization, and decoupled appearance modeling, achieving fast and high-quality results.
http://arxiv.org/abs/2402.17424v1
Compressor summary: The paper presents a framework that uses Vision Transformers with different linear projections to automatically identify plant diseases from leaf images, reaching a Hamming loss of 0.054 with its best model, and proposes a low-cost Raspberry Pi-based hardware design for scanning leaves.
http://arxiv.org/abs/2402.17423v1
Compressor summary: RIBBO learns a black-box optimization algorithm from data using expressive sequence models and regret-to-go tokens to generate query points that meet user-desired regret.
http://arxiv.org/abs/2402.17420v1
Compressor summary: PANDAS is a simple method to detect new classes and adapt an object detector to spot them using prototypes for old and new classes.
http://arxiv.org/abs/2402.17417v1
Compressor summary: CARZero is a novel approach for radiology zero-shot classification that uses cross-attention mechanisms and large language models to align image and text features and improve performance on chest radiograph diagnostic sets.
http://arxiv.org/abs/2402.17414v1
Compressor summary: The paper proposes a conditional coding-based neural video codec that solves two critical problems and achieves superior performance over previous methods.
http://arxiv.org/abs/2402.17412v1
Compressor summary: DiffuseKronA is a novel Kronecker product-based adaptation module for text-to-image generation that reduces parameters by up to 99.95% and improves image quality while being more interpretable and less sensitive to hyperparameters.
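A hedged sketch of the Kronecker-product adaptation idea, i.e. a LoRA-style weight update factored as A ⊗ B rather than a low-rank product; the shapes and names below are illustrative, not the paper's.

```python
import torch

def krona_delta(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Kronecker-factored update: delta_W = A ⊗ B stores only
    # a.numel() + b.numel() parameters yet expands to a dense
    # (a1*b1) x (a2*b2) matrix.
    return torch.kron(a, b)

# Illustrative shapes: adapting a 768 x 768 projection with two small
# factors (24*32 = 768), far fewer parameters than a dense update.
a = torch.randn(24, 24, requires_grad=True)
b = torch.randn(32, 32, requires_grad=True)
delta_w = krona_delta(a, b)  # shape (768, 768)
```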
http://arxiv.org/abs/2402.17411v1
Compressor summary: The text describes a dataset and baselines for LLM consistency research, using LightGBM as the base model to evaluate and improve NLP models.
http://arxiv.org/abs/2402.17410v1
Compressor summary: The text proposes a method to analyze noise propagation in multi-layer CNNs for MRI reconstructions using an image space formalism, which allows accurate noise resilience characterization.
http://arxiv.org/abs/2402.17406v1
Compressor summary: LSPT is a novel approach to visual representation learning that leverages long-term gated prompts and patch tokens for improved performance on downstream tasks.
http://arxiv.org/abs/2402.17403v1
Compressor summary: The paper introduces a new benchmark for evaluating video generation models like Sora by assessing their ability to satisfy geometric constraints in 3D reconstruction, indicating their adherence to real-world physics principles.
http://arxiv.org/abs/2402.17400v1
Compressor summary: The paper explores continual domain-adaptive pretraining in large language models, introduces a new benchmark to measure adaptability, and reveals insights about model size, domain progression, and knowledge transfer.
http://arxiv.org/abs/2402.17396v1
Compressor summary: The paper benchmarks GPT-4 on three algorithmic tasks with controllable difficulty, showing that despite LLMs' general lack of systematic generalization, advanced prompting techniques boost its performance on all tasks.
http://arxiv.org/abs/2402.17392v1
Compressor summary: The paper compares human-written and bot-generated texts' semantic structures using n-grams and finds differences across four languages.
http://arxiv.org/abs/2402.17390v1
Compressor summary: The text discusses how machine learning models may experience negative flips when updated, affecting both accuracy and robustness to adversarial examples, and proposes a technique called robustness-congruent adversarial training to address this issue.
http://arxiv.org/abs/2402.17389v1
Compressor summary: FairBelief is a method to analyze and assess the biases in language models that may negatively affect minorities and underrepresented groups.
http://arxiv.org/abs/2402.17385v1
Compressor summary: The study analyzes the factors that influence decision-making with LLM support and proposes a dependency framework that systematizes their possible interactions, identifying trust, mental models, and information processing as key aspects for improving decision quality in human-AI collaboration.
http://arxiv.org/abs/2402.17377v1
Compressor summary: KoDialogBench is a benchmark for evaluating language models' conversational skills in Korean, revealing room for improvement and highlighting effective training techniques.
http://arxiv.org/abs/2402.17376v1
Compressor summary: The proposed framework optimizes time steps for numerical ODE solvers in diffusion probabilistic models (DPMs) to improve image synthesis efficiency and quality by minimizing the distance between the ground-truth solution and the approximate solution.
http://arxiv.org/abs/2402.17372v1
Compressor summary: This paper proposes a new technique for matching point clouds using graph Laplacian eigenmaps and a new operator called Coupled Laplacian, which improves accuracy on object anomaly localization and bone side estimation tasks.
http://arxiv.org/abs/2402.17371v1
Compressor summary: The paper introduces a dataset of ancient Hebrew poetry with labeled metaphors for studying figurative language in the Humanities.
http://arxiv.org/abs/2402.17370v1
Compressor summary: The paper presents a fast and accurate method for segmenting ore images using a lightweight MLP framework with a feature pyramid network and a novel loss function.
http://arxiv.org/abs/2402.17364v1
Compressor summary: The paper introduces DynTet, a hybrid representation that combines neural networks and tetrahedra grids to generate realistic and animatable head avatars with improved fidelity, lip synchronization, and real-time performance.
http://arxiv.org/abs/2402.17360v1
Compressor summary: The paper introduces CAPT, a transformer-based method for accurately and robustly estimating joint parameters and states of articulated objects from a single point cloud, with improved performance by using a motion loss approach and a double voting strategy.
http://arxiv.org/abs/2402.17358v1
Compressor summary: The paper proposes a new alignment method for LLMs using priority rules and presents PriorityDistill, an approach to distill these rules from LLM simulations for robust rule integration.
http://arxiv.org/abs/2402.17355v1
Compressor summary: RECOST is a framework that uses external knowledge to improve data-efficient instruction tuning by evaluating and selecting high-quality samples synthesized by LLMs.
http://arxiv.org/abs/2402.17351v1
Compressor summary: ICP-Flow is a learning-free method that uses histogram-based initialization and Iterative Closest Point algorithm to estimate rigid transformation between LiDAR scans for autonomous driving, achieving high performance and real-time inference.
http://arxiv.org/abs/2402.17345v1
Compressor summary: The paper proposes Local-aware Graph Contrastive Learning, a self-supervised framework that uses masking-based modeling to capture the local graph information that contrastive learning tends to underweight relative to global patterns, outperforming existing graph representation learners.
http://arxiv.org/abs/2402.17343v1
Compressor summary: The paper proposes a human-AI collaboration method to improve Bayesian optimization by incorporating expert preferences for unmeasured properties in the surrogate modeling, handling biases, and discussing convergence behavior.
http://arxiv.org/abs/2402.17339v1
Compressor summary: The paper proposes SocialCVAE, a model that predicts pedestrian trajectories using behavioral uncertainty and socially explainable interaction energy map to improve accuracy.
http://arxiv.org/abs/2402.17333v1
Compressor summary: This paper presents a framework that generates synthetic multiple-choice question answering data without manual annotation, using named entities and knowledge graphs.
http://arxiv.org/abs/2402.17327v1
Compressor summary: The paper introduces a data selection method based on $k$-means clustering and sensitivity sampling that selects a small subset whose average loss approximates that of the whole dataset, and shows its effectiveness for fine-tuning foundation models and for linear regression.
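A minimal sketch of $k$-means-based sensitivity sampling as it is commonly formulated, assuming squared distance to the nearest center as the sensitivity proxy; details may differ from the paper's method, and the names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def sensitivity_sample(x: np.ndarray, k: int, m: int, seed: int = 0):
    # Fit k-means, then sample m points with probability proportional to
    # their squared distance to the nearest center (a sensitivity proxy);
    # importance weights make subset averages unbiased.
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(x)
    cost = np.min(km.transform(x), axis=1) ** 2 + 1e-12
    p = cost / cost.sum()
    idx = np.random.default_rng(seed).choice(len(x), size=m,
                                             replace=False, p=p)
    return idx, 1.0 / (m * p[idx])  # subset indices and importance weights
```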
http://arxiv.org/abs/2402.17323v1
Compressor summary: The paper introduces SDDGR, a novel generative replay method for class incremental object detection that uses diffusion models and knowledge distillation to mitigate catastrophic forgetting and achieve state-of-the-art results.
http://arxiv.org/abs/2402.17319v1
Compressor summary: The report introduces UniNet, a vanilla framework for multi-task visual perception that combines DETR3D, Mask2Former, and BinsFormer with InternImage-L backbone, achieving a 49.6 overall score in the VCL Challenge.
http://arxiv.org/abs/2402.17316v1
Compressor summary: CEMA is a paradigm that allows online adaptation of edge models using forward propagation, limited data transmission, and distilled normalization layers.
http://arxiv.org/abs/2402.17311v1
Compressor summary: The paper introduces SKT5SciSumm, a hybrid framework for summarizing long scientific texts using Sentence-Transformer and T5 models, achieving state-of-the-art performance on Multi-XScience dataset.
http://arxiv.org/abs/2402.17310v1
Compressor summary: The paper proposes a new method for analyzing cell images in drug screening that tracks cells and quantifies cytoplasm-to-nuclei signal ratio using automatic thresholding and labeling algorithms.
http://arxiv.org/abs/2402.17304v1
Compressor summary: This study investigates if multimodal large language models can capture both global and local image information, finding that they excel at local object detection but struggle with global semantic understanding.
http://arxiv.org/abs/2402.17302v1
Compressor summary: The study explores using large language models to create commonsense question answering datasets for Indonesian and Sundanese, finding GPT-4 Turbo generates good questions in Indonesian but not Sundanese, and LLMs perform better on their own datasets.
http://arxiv.org/abs/2402.17298v1
Compressor summary: The study proposes a novel method called ArcSin to improve zero-shot cross-modal transfer for visual tasks by adaptively injecting noise into textual elements, expanding the domain generalization potential and preserving content integrity.
http://arxiv.org/abs/2402.17296v1
Compressor summary: The authors construct a real-world paired video dataset covering both underexposed and overexposed scenes and propose a Retinex-based Video Exposure Correction Network (VECNet) that outperforms existing methods.
http://arxiv.org/abs/2402.17292v1
Compressor summary: DivAvatar generates diverse 3D avatars from text using a finetuned 3D generative model and novel techniques for appearance and geometry quality.
http://arxiv.org/abs/2402.17287v1
Compressor summary: The paper proposes a spectral method, Kernel-based Entropic Novelty (KEN) score, to measure the novelty of multi-modal generative models compared to a reference dataset.
http://arxiv.org/abs/2402.17285v1
Compressor summary: The DMGASR method combines a Group-Autoencoder with a diffusion model to improve hyperspectral image super-resolution, overcoming challenges such as convergence issues and long inference time.
http://arxiv.org/abs/2402.17275v1
Compressor summary: OSASIS is a novel one-shot image stylization method that preserves structure and semantics, and outperforms existing methods in various experimental settings.
http://arxiv.org/abs/2402.17270v1
Compressor summary: This survey summarizes AI advancements in understanding and enhancing cooperation in social dilemmas, covering multi-agent cooperation, human-agent cooperation, and using AI to improve human cooperation.
http://arxiv.org/abs/2402.17264v1
Compressor summary: The paper introduces EINet, a novel fusion-based network that explicitly interacts with LiDAR and camera modalities for place recognition in GPS-denied scenarios, and proposes a new benchmark based on the nuScenes dataset.
http://arxiv.org/abs/2402.17263v1
Compressor summary: MELoRA is a method that uses mini-ensembles of low-rank adapters to fine-tune large language models with fewer parameters and better performance than LoRA on various NLP tasks.
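A hedged sketch of the mini-ensemble idea: n small rank-r adapters on feature slices, stacked block-diagonally, reach an effective update rank of n·r at the parameter cost of a single rank-r LoRA. The class and parameter names are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class MiniLoRAEnsemble(nn.Module):
    # n independent rank-r adapters, each acting on a d/n slice of the
    # features; together they form one block-diagonal update of rank
    # up to n*r with the same parameter count as one rank-r LoRA.
    def __init__(self, d: int, n: int, r: int):
        super().__init__()
        assert d % n == 0
        self.chunk = d // n
        self.down = nn.ModuleList(
            nn.Linear(self.chunk, r, bias=False) for _ in range(n))
        self.up = nn.ModuleList(
            nn.Linear(r, self.chunk, bias=False) for _ in range(n))
        for up in self.up:
            nn.init.zeros_(up.weight)  # start as a zero update, LoRA-style

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        parts = x.split(self.chunk, dim=-1)
        return torch.cat([up(dn(p)) for dn, up, p
                          in zip(self.down, self.up, parts)], dim=-1)
```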
http://arxiv.org/abs/2402.17262v1
Compressor summary: This paper shows that large language models can be tricked into generating harmful information through multi-turn dialogue, revealing safety issues in these models.
http://arxiv.org/abs/2402.17257v1
Compressor summary: RIME is a robust preference-based reinforcement learning algorithm that uses human preferences as rewards, improves learning from noisy data, and incorporates a warm start to bridge the performance gap during transition.
http://arxiv.org/abs/2402.17256v1
Compressor summary: The paper evaluates large language models' performance on out-of-domain intent detection in task-oriented dialogue systems, finding they perform well with little data but still lag behind fine-tuned models.
http://arxiv.org/abs/2402.17251v1
Compressor summary: CDS-CZSL is a novel framework that improves attribute recognition by considering object diversity and context, achieving state-of-the-art results in both closed and open world scenarios.