This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-20, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2402.12377v1
Compressor summary: The paper proposes a method to improve surface reconstruction from volumetric density fields by using a discrete opacity grid, multiple ray casting, binary entropy minimization, and mesh fusion.
http://arxiv.org/abs/2402.12376v1
Compressor summary: The Flexible Vision Transformer (FiT) is a new image generation model that can handle unrestricted resolutions and aspect ratios, overcoming the limitations of existing diffusion models.
http://arxiv.org/abs/2402.12374v1
Compressor summary: Sequoia is a new algorithm for faster inference with large language models that adapts to different settings and hardware, outperforming previous methods in speed and robustness.
http://arxiv.org/abs/2402.12372v1
Compressor summary: The study evaluates biomedical text mining tools' performance on different corpora, showing that they perform worse than reported and suggesting further research for improved robustness.
http://arxiv.org/abs/2402.12370v1
Compressor summary: ANALOBENCH is a benchmark for testing analogical reasoning ability in language models, which may struggle with lengthy scenarios and recalling relevant experiences.
http://arxiv.org/abs/2402.12368v1
Compressor summary: The paper proposes a new approach to generate diverse and creative synthetic NLI data for improving the generalization of NLI models to out-of-distribution domains.
http://arxiv.org/abs/2402.12366v1
Compressor summary: The study finds that RLAIF's improvement over supervised fine-tuning mostly comes from using a weaker teacher model for the SFT step; supervised fine-tuning with GPT-4 as the teacher performs better, and the gains vary across other factors.
http://arxiv.org/abs/2402.12365v1
Compressor summary: Universal Physics Transformers (UPTs) are a novel learning paradigm that can model various spatio-temporal problems in computational fluid dynamics without relying on grid- or particle-based latent structures.
http://arxiv.org/abs/2402.12363v1
Compressor summary: The study shows that cognitive biases and predictability can explain word-order universals in languages using computational simulations with cognitively-motivated language models.
http://arxiv.org/abs/2402.12354v1
Compressor summary: The paper proposes LoRA+, a corrected version of LoRA that uses different learning rates for adapter matrices A and B, improving performance and finetuning speed without increasing computational cost.
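A rough sketch of the LoRA+ idea in PyTorch (not the paper's code): the adapter's B matrix gets its own optimizer parameter group with a larger learning rate than A. The module, the base rate, and the 16x learning-rate ratio are illustrative assumptions.

```python
# Illustrative LoRA+ setup: separate learning rates for adapter matrices A and B.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)           # frozen pretrained weight
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T

model = nn.Sequential(LoRALinear(512, 512), nn.ReLU(), LoRALinear(512, 512))
base_lr, lr_ratio = 1e-4, 16                              # ratio is an assumed value
params_A = [p for n, p in model.named_parameters() if "lora_A" in n]
params_B = [p for n, p in model.named_parameters() if "lora_B" in n]
optimizer = torch.optim.AdamW([
    {"params": params_A, "lr": base_lr},
    {"params": params_B, "lr": base_lr * lr_ratio},       # B learns faster than A
])
```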
http://arxiv.org/abs/2402.12352v1
Compressor summary: The paper proposes a new method to improve large language models' ability to find relevant information in biomedical research by using a knowledge graph to reduce overrepresented concepts and achieve better results than existing methods.
http://arxiv.org/abs/2402.12348v1
Compressor summary: This paper evaluates large language models' strategic reasoning in various game scenarios using GTBench, a new environment with 10 tasks, and finds differences between open-source and commercial LLMs as well as the impact of code-pretraining and advanced reasoning methods.
http://arxiv.org/abs/2402.12343v1
Compressor summary: The text introduces Emulated Disalignment (ED), an attack framework showing how safety alignment can inadvertently enable harmful outcomes in large language models when they are adversarially manipulated, and highlights the need to reevaluate open-sourcing such models.
http://arxiv.org/abs/2402.12332v1
Compressor summary: The study proposes triple-encoders for dialog modeling that use co-occurrence learning to efficiently compute utterance mixtures and improve over bi-encoders and single-vector models.
http://arxiv.org/abs/2402.12331v1
Compressor summary: The paper presents an efficient autoencoder model that generates survival trajectories, predictions, and additional data for event-based features, using a weighting scheme for robustness and a classification task to determine censored indicators; it has been tested on synthetic and real datasets.
http://arxiv.org/abs/2402.12329v1
Compressor summary: The text describes a new attack method on language models that exploits API access to remotely create harmful outputs, outperforming previous methods on GPT-3.5 and OpenAI's safety classifier.
http://arxiv.org/abs/2402.12327v1
Compressor summary: Large language model agents can spontaneously form collaborations and mimic human social interactions, which could be useful for computational social science.
http://arxiv.org/abs/2402.12326v1
Compressor summary: PsychoGAT is a method that uses large language models to create engaging interactive fiction games for psychological assessment, achieving high quality results in various metrics.
http://arxiv.org/abs/2402.12320v1
Compressor summary: The paper proposes a new strategy for tracking troops in GPS-denied environments using landmark recognition and stereo matching, and aims to improve troop safety.
http://arxiv.org/abs/2402.12319v1
Compressor summary: FairSAOML is a novel meta-learning algorithm that adapts to dynamic environments while ensuring fairness and accuracy in acquiring new tasks over time.
http://arxiv.org/abs/2402.12317v1
Compressor summary: The paper introduces ARKS, a strategy to improve code generation by integrating various sources of knowledge and using active retrieval to refine queries and update the knowledge base.
http://arxiv.org/abs/2402.12309v1
Compressor summary: TILP is a framework for learning temporal logical rules in temporal knowledge graphs, which improves performance and interpretability on various challenging scenarios.
http://arxiv.org/abs/2402.12307v1
Compressor summary: The text discusses the importance of assessing individual predictions' confidence in machine learning models, especially for critical applications, and presents multi-view conformal models for heterogeneous sensor fusion that provide trustworthiness guarantees.
http://arxiv.org/abs/2402.12303v1
Compressor summary: UncertaintyTrack is a method that uses localization uncertainty estimates from probabilistic object detection to improve multi-object tracking for autonomous driving.
http://arxiv.org/abs/2402.12298v1
Compressor summary: The paper compares GPT-4, a commercial large language model, to various open-source models in radiology report labeling tasks using different prompting techniques and finds that GPT-4 outperforms others in zero-shot but open-source models can match GPT-4 with few-shot prompts.
http://arxiv.org/abs/2402.12291v1
Compressor summary: KARL is a DKT-inspired flashcard scheduler that, unlike existing student models, captures semantic ties between flashcards by using retrieval and BERT embeddings for efficient recall predictions; it outperforms existing models on a new dataset of diverse study histories and, combined with a novel teaching policy for online deployment, improves medium-term learning.
http://arxiv.org/abs/2402.12289v1
Compressor summary: DriveVLM is an autonomous driving system that uses Vision-Language Models to understand and plan for complex urban scenarios, while DriveVLM-Dual combines its strengths with traditional methods for improved performance.
http://arxiv.org/abs/2402.12284v1
Compressor summary: ReMiDi is an algorithm that trains reinforcement learning agents on levels generated by an adversary who balances exploration and exploitation, preventing learning stagnation.
http://arxiv.org/abs/2402.12282v1
Compressor summary: The authors propose a new claim detection method that combines sentence and ontology embeddings, which outperforms other models on two datasets.
http://arxiv.org/abs/2402.12280v1
Compressor summary: SGD improves LLM inference quality and performance by using dependencies between sub-problems and difficulty estimates to select an appropriate model size for parallel decoding.
http://arxiv.org/abs/2402.12279v1
Compressor summary: The paper compares different approaches to improve zero-shot cross-lingual text generation, finding that careful learning rate tuning and alternative backbone models can achieve similar performance to data translation.
http://arxiv.org/abs/2402.12275v1
Compressor summary: Our agent builds a Python program from its environment interactions, using a world model that balances explanation and optimism, improving efficiency on gridworlds.
http://arxiv.org/abs/2402.12269v1
Compressor summary: The paper proposes a new deep learning method for Supervised Graph Prediction using an original Optimal Transport-based loss and a transformer architecture, which performs well on various tasks and datasets.
http://arxiv.org/abs/2402.12267v1
Compressor summary: The study shows that large language models can significantly improve data-to-text generation for under-resourced languages, but BLEU scores may not be a reliable metric for evaluating them.
http://arxiv.org/abs/2402.12264v1
Compressor summary: The text discusses improving language models' performance on specific tasks using low-rank adaptation ensembles and analyzing their uncertainty quantification on multiple-choice datasets.
http://arxiv.org/abs/2402.12263v1
Compressor summary: A modular integer quantization scheme for GRUs uses Genetic Algorithms to explore bit widths and optimize model size and accuracy, achieving Pareto efficiency on sequential tasks.
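As a toy illustration of a genetic search over bit widths like the one described above (the fitness function and module names are made-up placeholders, not the paper's setup):

```python
# Toy genetic search over per-module bit widths, trading an accuracy proxy
# against model size; fitness and modules are placeholders.
import random

MODULES = ["update_gate", "reset_gate", "candidate", "output"]
CHOICES = [2, 4, 8, 16]

def fitness(bits):
    size_cost = sum(bits) / (16 * len(bits))                     # smaller is better
    acc_proxy = sum(min(b, 8) for b in bits) / (8 * len(bits))   # saturates at 8 bits
    return acc_proxy - 0.5 * size_cost

def mutate(bits):
    child = list(bits)
    child[random.randrange(len(child))] = random.choice(CHOICES)
    return child

population = [[random.choice(CHOICES) for _ in MODULES] for _ in range(20)]
for _ in range(50):                                              # generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    population = parents + [mutate(random.choice(parents)) for _ in range(10)]

best = max(population, key=fitness)
print(dict(zip(MODULES, best)))
```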
http://arxiv.org/abs/2402.12261v1
Compressor summary: The text discusses how new word forms (neologisms) cause data drift in large language models, leading to reduced performance, and proposes a benchmark to evaluate their generalization ability.
http://arxiv.org/abs/2402.12259v1
Compressor summary: Open3DSG is a method to predict 3D scene graphs from point clouds using open world vision language models without labeled data, enabling expression of rare and specific objects and relationships.
http://arxiv.org/abs/2402.12255v1
Compressor summary: The study compares AI-assisted scholarly writing with original and purely AI-generated texts, finding that GPT-4 can help with brainstorming but needs human input for detailed synthesis.
http://arxiv.org/abs/2402.12249v1
Compressor summary: The text is about a machine translation model called the Levenshtein transformer (LevT), which offers high efficiency and quality but may have weaknesses in its decoding results that could be improved by knowledge distillation (KD) and translation memory.
http://arxiv.org/abs/2402.12243v1
Compressor summary: Text-to-SQL models face challenges due to noise in questions, gold queries, and database values, which impacts the reliability of benchmarks and affects model performance.
http://arxiv.org/abs/2402.12242v1
Compressor summary: The text proposes using diffusion probabilistic models (DPMs) to generate realistic individual location trajectories (ILTs) for mobility research and decision-making.
http://arxiv.org/abs/2402.12241v1
Compressor summary: The paper analyzes how gradient descent in recurrent neural networks can achieve optimality without overparameterization for dynamical systems with long-term dependencies.
http://arxiv.org/abs/2402.12240v1
Compressor summary: NeSy predictors combine symbolic knowledge with learning but can pick up reasoning shortcuts (RSs) that hurt their reliability, generalization, and confidence; the paper proposes bears, an ensembling technique that calibrates concept-level confidence and encourages uncertainty about these shortcuts.
http://arxiv.org/abs/2402.12238v1
Compressor summary: The paper proposes a flow-based model that transforms a mixed Gaussian prior into the future trajectory manifold, enabling diverse, controllable, and out-of-distribution trajectory generation with explicit interpretability.
http://arxiv.org/abs/2402.12237v1
Compressor summary: The paper models human-AI collaboration in content moderation, accounting for prediction uncertainty, time-varying factors, selective sampling, and the congestion that delays human reviews, and develops a near-optimal learning algorithm that decides how to classify, admit, and schedule posts while balancing multiple losses.
http://arxiv.org/abs/2402.12235v1
Compressor summary: The least-privilege principle for machine learning, which aims to find useful representations without revealing sensitive information, has a fundamental trade-off between utility and leakage that cannot be overcome by any learning technique.
http://arxiv.org/abs/2402.12234v1
Compressor summary: The system combines large language models with business logic to create efficient chatbots that can handle complex dialogues.
http://arxiv.org/abs/2402.12233v1
Compressor summary: The paper explores the impact of updating either keys or values in feed-forward networks (FFNs) on language model performance and understanding.
http://arxiv.org/abs/2402.12231v1
Compressor summary: The text proposes diffusion tempering, a technique to improve gradient-based parameter optimization in ordinary differential equations by adjusting noise in probabilistic integrators.
http://arxiv.org/abs/2402.12226v1
Compressor summary: AnyGPT is a multimodal language model that can process various modalities like speech, text, images, and music by using discrete representations without changing the LLM architecture or training paradigms.
http://arxiv.org/abs/2402.12225v1
Compressor summary: The paper proposes Argus3D, a novel framework for 3D shape generation using discrete representation learning and an ensemble of public datasets, improving capacity and scalability over traditional auto-regressive models.
http://arxiv.org/abs/2402.12219v1
Compressor summary: The paper proposes ReAlign, a method to improve the alignment of large language models with human values by reformatting instruction data responses.
http://arxiv.org/abs/2402.12212v1
Compressor summary: The study shows that AI agents based on ChatGPT can become polarized in echo chamber environments due to their ability to update opinions considering their own and others' views, and suggests monitoring factors like persona to prevent this.
http://arxiv.org/abs/2402.12204v1
Compressor summary: SDRRL is a method to improve multilingual LLMs by self-distilling from resource-rich languages without relying solely on translation.
http://arxiv.org/abs/2402.12201v1
Compressor summary: The paper proposes a circuit discovery framework that uses sparse dictionary learning to extract monosemantic features from neural models and identify interpretable, local model behaviors.
http://arxiv.org/abs/2402.12198v1
Compressor summary: The study investigates the effectiveness of visual language models in detecting harmful memes without labeled datasets and finds they struggle with zero-shot classification.
http://arxiv.org/abs/2402.12195v1
Compressor summary: The paper proposes a browse-and-concentrate paradigm to improve multimodal context fusion in Multimodal Large Language Models for better understanding of multiple images and their instructions.
http://arxiv.org/abs/2402.12193v1
Compressor summary: This paper introduces a dataset for evaluating the safety of Chinese LLMs and proposes fine-grained assessment criteria to identify and measure various types of risk in their responses.
http://arxiv.org/abs/2402.12192v1
Compressor summary: Pan-Mamba is a novel pansharpening network that uses the Mamba model to efficiently exchange and fuse multi-spectral and panchromatic images for high-resolution output.
http://arxiv.org/abs/2402.12189v1
Compressor summary: The paper presents a novel attack on neural language models that amplifies the exposure of their training data by fine-tuning them with generated texts based on membership probabilities.
http://arxiv.org/abs/2402.12185v1
Compressor summary: This paper introduces ChartX, a multi-modal benchmark for evaluating large language models' ability to understand and reason with charts, and ChartVLM, a new model that outperforms existing models on chart tasks.
http://arxiv.org/abs/2402.12184v1
Compressor summary: The paper presents a method to reproduce color from monochromatic images using Lab color space representation and an image colorization module, enabling colorful Neural Radiance Fields (NeRF) even with monochrome input.
http://arxiv.org/abs/2402.12183v1
Compressor summary: MultiFIX is an interpretability-focused multimodal data fusion pipeline that uses deep learning and symbolic expressions to combine features from different data modalities and make predictions in health domains.
http://arxiv.org/abs/2402.12181v1
Compressor summary: The text discusses data augmentation techniques for image-based deep reinforcement learning, analyzes their effects, compares them, and proposes a novel regularization term called tangent prop that improves performance and sample efficiency.
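A minimal sketch of a tangent-prop-style penalty (the encoder, the augmentation, and epsilon are assumptions, not the paper's implementation): it penalizes the finite-difference directional derivative of the encoder output along the direction of a small image augmentation.

```python
# Tangent-prop-style regularizer sketch: penalize how fast the encoder output
# changes along the direction of an image augmentation.
import torch
import torch.nn as nn

def tangent_prop_loss(encoder, obs, augment, eps=1e-2):
    with torch.no_grad():
        direction = augment(obs) - obs                  # tangent direction of the augmentation
    z = encoder(obs)
    z_shifted = encoder(obs + eps * direction)
    return ((z_shifted - z) / eps).pow(2).mean()        # squared directional derivative

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, 128))
obs = torch.rand(8, 3, 84, 84)
shift = lambda x: torch.roll(x, shifts=2, dims=-1)      # toy "random shift" augmentation
loss = tangent_prop_loss(encoder, obs, shift)
loss.backward()
```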
http://arxiv.org/abs/2402.12179v1
Compressor summary: The Exam Monitoring System detects abnormal student behavior during online exams to help proctors prevent cheating.
http://arxiv.org/abs/2402.12177v1
Compressor summary: Mafin is a new method that improves fine-tuning black-box embedding models for Retrieval Augmented Generation by adding a trainable model to the process.
http://arxiv.org/abs/2402.12175v1
Compressor summary: DBN-GOMEA is an improved evolutionary algorithm for learning Bayesian networks from real-valued data by jointly optimizing variable discretization, resulting in compact models that can be easily inspected by experts and incorporate their knowledge.
http://arxiv.org/abs/2402.12174v1
Compressor summary: BIDER is a method that improves large language models' answer quality and efficiency by refining retrieval documents into Key Supporting Evidence using knowledge synthesis, supervised fine-tuning, and preference alignment.
http://arxiv.org/abs/2402.12161v1
Compressor summary: GraphPAR is a novel framework that improves the fairness of pre-trained graph models using parameter-efficient adapters with provable lower bounds on prediction bias.
http://arxiv.org/abs/2402.12151v1
Compressor summary: The study analyzes how a Transformer-based language model learns to follow instructions by clustering data within its hidden space, which helps it handle new instances.
http://arxiv.org/abs/2402.12150v1
Compressor summary: FairThinking is a pipeline that generates roles for large language models to express diverse viewpoints and ensure fairness in their outputs.
http://arxiv.org/abs/2402.12149v1
Compressor summary: The article proposes two models to define and quantify momentum in tennis matches using data-driven and empirical approaches, and analyzes its importance and fluctuation patterns.
http://arxiv.org/abs/2402.12147v1
Compressor summary: Factiverse AI models excel at end-to-end fact-checking across many languages and beat top LLMs in performance.
http://arxiv.org/abs/2402.12146v1
Compressor summary: The paper proposes MetaRanking, a method for less capable LLMs to judge response reliability by comparing query-response pairs with references, improving their performance on reasoning tasks and applications like query routing and data filtering.
http://arxiv.org/abs/2402.12138v1
Compressor summary: BiXT is a novel efficient Transformer-based approach that processes semantics and location simultaneously, enabling its application to various tasks without loss of performance or modality limitations.
http://arxiv.org/abs/2402.12132v1
Compressor summary: This paper proposes SSTKG, a framework to construct and explore spatio-temporal knowledge graphs using a novel 3-step embedding method, improving prediction and recommendation accuracy.
http://arxiv.org/abs/2402.12128v1
Compressor summary: The paper proposes a weakly-supervised method for 3D vessel segmentation using maximum intensity projection and 2D labels, which improves performance and reduces annotation efforts.
http://arxiv.org/abs/2402.12121v1
Compressor summary: The paper examines how large-scale vision language models (LVLMs) can generate diverse review texts for images and proposes a rank correlation analysis method to evaluate their review abilities.
http://arxiv.org/abs/2402.12118v1
Compressor summary: DualView is a fast and accurate method to explain how individual training data points influence a neural network's predictions on test data, and can be combined with feature attribution methods.
http://arxiv.org/abs/2402.12102v1
Compressor summary: The authors improve a softmax function that helps pretrain models for quantization by making it invariant to sequence length and suitable for causal language models.
http://arxiv.org/abs/2402.12100v1
Compressor summary: Groot is a new framework that uses semantic transformation to create more effective adversarial prompts for testing the safety of text-to-image models, achieving high success rates on popular models like DALL-E 3 and Midjourney.
http://arxiv.org/abs/2402.12099v1
Compressor summary: QueryWarp is a new method for translating human motion videos that uses appearance flows to warp query tokens in a diffusion model, achieving better temporal coherence than existing methods.
http://arxiv.org/abs/2402.12098v1
Compressor summary: The paper presents pGS-CAM, a method to generate saliency maps for semantic segmentation of LiDAR point clouds, which helps understand how SS models make predictions and improve them.
http://arxiv.org/abs/2402.12095v1
Compressor summary: Major TOM is a framework that lets users merge and access scattered Earth Observation datasets for deep learning through a geographical grid-point indexing system and metadata; Major TOM-Core is a large open-access dataset covering most of the land surface and serving as a template for future additions.
http://arxiv.org/abs/2402.12091v1
Compressor summary: The paper explores how large language models reason using context and finds that they do not truly understand logical rules; instead, they rely on in-context learning for accurate answers.
http://arxiv.org/abs/2402.12080v1
Compressor summary: The paper proposes an inductive learning method to improve small language models' reasoning abilities using a distributed network of SLMs.
http://arxiv.org/abs/2402.12079v1
Compressor summary: LVChat improves long video comprehension for multimodal LLMs by using Frame-Scalable Encoding and Interleaved Frame Encoding to handle over-compression and long video input.
http://arxiv.org/abs/2402.12074v1
Compressor summary: The paper proposes a Historical Information Passing (HIP) network that uses temporal, structural and repetitive perspectives to predict future events based on historical information in temporal knowledge graphs.
http://arxiv.org/abs/2402.12067v1
Compressor summary: The paper proposes using slow feature analysis (SFA), inspired by neuroscience, to generate interpretable representations of visual data for visual navigation tasks in reinforcement learning.
http://arxiv.org/abs/2402.12065v1
Compressor summary: The paper proposes WKVQuant, a PTQ framework for quantizing weights and KV cache of LLMs, which improves efficiency and accuracy by using past-only quantization and two-dimensional quantization strategy.
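For context, a minimal sketch of the kind of per-token asymmetric integer quantization that KV-cache schemes build on; WKVQuant's past-only and two-dimensional strategies go beyond this simple round-trip, and the shapes below are assumptions.

```python
# Per-token asymmetric quantize/dequantize of a KV tensor (illustrative only).
import torch

def quantize_per_token(x, n_bits=8):
    qmax = 2 ** n_bits - 1
    x_min = x.amin(dim=-1, keepdim=True)
    x_max = x.amax(dim=-1, keepdim=True)
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    zero_point = (-x_min / scale).round()
    q = (x / scale + zero_point).round().clamp(0, qmax).to(torch.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.float() - zero_point) * scale

k_cache = torch.randn(1, 128, 64)                    # (batch, seq_len, head_dim)
q, s, z = quantize_per_token(k_cache)
print((k_cache - dequantize(q, s, z)).abs().max())   # reconstruction error
```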
http://arxiv.org/abs/2402.12061v1
Compressor summary: LONDI is a framework that learns to use large language models selectively for complex tasks, reducing computational costs and improving efficiency.
http://arxiv.org/abs/2402.12058v1
Compressor summary: The paper introduces Scaffold, a visual prompting technique for LMMs that uses dot matrices and multi-dimensional coordinates to enhance vision-language coordination.
http://arxiv.org/abs/2402.12055v1
Compressor summary: The text discusses a study that revealed confusion issues in LLMs' NLG evaluation and proposes a hierarchical classification system and fine-grained analysis using perturbation attacks to better understand and improve their performance.
http://arxiv.org/abs/2402.12052v1
Compressor summary: SlimPLM is a novel approach that improves large language models' knowledge acquisition by using a slim proxy model to detect and retrieve missing information, reducing computational costs and outperforming existing methods.
http://arxiv.org/abs/2402.12048v1
Compressor summary: The paper analyzes catastrophic forgetting in multi-modal language models and proposes Model Tailor, a method that adjusts a small number of fine-tuned parameters to improve performance on both original and new tasks.
http://arxiv.org/abs/2402.12043v1
Compressor summary: The paper proposes a lightweight parallel framework for blind image quality assessment using pre-trained features, feature embedding network, self-supervised subtasks, and distortion-aware quality regression network to achieve superior performance with less computation and training time.
http://arxiv.org/abs/2402.12042v1
Compressor summary: The paper proposes a novel algorithm for linear stochastic bandits with subgaussian noise that decreases as the actions approach the unknown vector, achieving a minimax regret of $\log^3(T)$.
http://arxiv.org/abs/2402.12041v1
Compressor summary: The paper surveys automotive surround-view fisheye optics, discussing the challenges of optical artifacts in computer vision for autonomous driving and ADAS, and examining different simulation methods for creating synthetic datasets.
http://arxiv.org/abs/2402.12038v1
Compressor summary: Self-AMPLIFY automatically generates natural language rationales from explanation methods applied to Small Language Models, improving their performance on reasoning tasks.
http://arxiv.org/abs/2402.12036v1
Compressor summary: The authors propose a genre and topic-aware masking method for tailoring language models to specialized domains, which improves performance on the LegalGLUE benchmark.
http://arxiv.org/abs/2402.12035v1
Compressor summary: The paper presents a unified experimental framework for class-incremental learning (CIL) in time series data, evaluates various methods, and studies the impact of design factors on performance.
http://arxiv.org/abs/2402.12030v1
Compressor summary: Universal Logit Distillation (ULD) is a new method that allows compressing knowledge from large language models to smaller ones without requiring them to share the same tokenizer, making it more versatile and useful for various applications.
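A hedged sketch of how a tokenizer-agnostic logit loss can be set up: compare sorted output distributions so teacher and student vocabularies never need to align. The exact ULD loss may differ in details, and the vocabulary sizes below are assumptions.

```python
# Tokenizer-agnostic distillation sketch: match sorted output distributions.
import torch
import torch.nn.functional as F

def sorted_prob_distillation_loss(student_logits, teacher_logits):
    p_s = F.softmax(student_logits, dim=-1).sort(dim=-1, descending=True).values
    p_t = F.softmax(teacher_logits, dim=-1).sort(dim=-1, descending=True).values
    # pad the smaller vocabulary with zeros so positions align
    if p_s.size(-1) < p_t.size(-1):
        p_s = F.pad(p_s, (0, p_t.size(-1) - p_s.size(-1)))
    elif p_t.size(-1) < p_s.size(-1):
        p_t = F.pad(p_t, (0, p_s.size(-1) - p_t.size(-1)))
    return (p_s - p_t).abs().sum(dim=-1).mean()

student_logits = torch.randn(4, 32000)   # student vocabulary size (assumed)
teacher_logits = torch.randn(4, 50257)   # teacher vocabulary size (assumed)
loss = sorted_prob_distillation_loss(student_logits, teacher_logits)
```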
http://arxiv.org/abs/2402.12026v1
Compressor summary: The paper proposes MuScleLoRA, a method to mitigate backdoor attacks in language models by adjusting the frequency space of the data and aligning gradients during adaptation.
http://arxiv.org/abs/2402.12025v1
Compressor summary: This paper reviews speech-to-text translation models that combine speech foundation models and large language models, highlighting their similarities and differences, and suggesting recommendations for future research.
http://arxiv.org/abs/2402.12022v1
Compressor summary: The authors propose a method to combine large language models and graph models for learning graphs of connected texts, using an interpreter and a student model to transfer knowledge between them.
http://arxiv.org/abs/2402.12011v1
Compressor summary: The paper compares state-of-the-art models for Lexical Semantic Change across different tasks, languages, and benchmarks, finding APD and XL-LEXEME as the best approaches.
http://arxiv.org/abs/2402.12010v1
Compressor summary: This paper proposes an evolutionary-based sampling framework to identify elite training samples for AI models that improve performance and save 98% energy compared to typical training practices.
http://arxiv.org/abs/2402.12008v1
Compressor summary: This paper studies how different types of irrelevant features affect clustering performance using various metrics, and suggests that the Silhouette Coefficient and the Davies-Bouldin score can be used to optimize feature selection in unsupervised clustering.
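A quick sketch of scoring candidate feature subsets with the two metrics mentioned above, on synthetic data with deliberately irrelevant features (the subset size and data are made up):

```python
# Score candidate feature subsets by clustering quality: silhouette up is better,
# Davies-Bouldin down is better. Data and subset size are illustrative.
from itertools import combinations
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=3, n_features=6, random_state=0)
X[:, 4:] = np.random.rand(300, 2)                  # two irrelevant noise features

best = None
for subset in combinations(range(X.shape[1]), 4):  # try all 4-feature subsets
    Xs = X[:, subset]
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xs)
    sil = silhouette_score(Xs, labels)
    dbi = davies_bouldin_score(Xs, labels)
    if best is None or sil > best[1]:
        best = (subset, sil, dbi)

print("best subset (features, silhouette, Davies-Bouldin):", best)
```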
http://arxiv.org/abs/2402.12004v1
Compressor summary: The authors propose a simple yet effective method for fine-tuning text-to-image diffusion models by optimizing consistency with reference images while preserving the pretrained model's style, leading to improved personalization and image-text alignment.
http://arxiv.org/abs/2402.12001v1
Compressor summary: This paper surveys extractive Knowledge Graph summarization, its applications, and existing methods, while suggesting future directions.
http://arxiv.org/abs/2402.11997v1
Compressor summary: This paper tests LLMs on a new temporal dataset (TempUN) and finds they struggle with retaining and reasoning about event sequences, especially closed-source models.
http://arxiv.org/abs/2402.11996v1
Compressor summary: The paper proposes a user-friendly text-promptable model that combines two existing models to achieve high performance in identifying deformable linear objects like wires and cables.
http://arxiv.org/abs/2402.11995v1
Compressor summary: The text describes a method to understand how neural networks make decisions by converting them into a logical formula, which can help ensure their reliability and efficiency.
http://arxiv.org/abs/2402.11989v1
Compressor summary: The paper proposes a method called Stable PrivateLoRA to adapt latent diffusion models for generating specific objects while protecting privacy from membership inference attacks.
http://arxiv.org/abs/2402.11985v1
Compressor summary: Weakly supervised object detection (WSup-OD) is a useful technique in image classification settings, but it doesn't work well for medical images due to differences in object characteristics. The authors propose WSRPN, a new method that generates bounding box proposals using ROI-attention and improves disease localization in chest X-ray images.
http://arxiv.org/abs/2402.11975v1
Compressor summary: COMEDY is a novel framework that uses compressive memory to generate, compress, and respond in conversations without relying on traditional retrieval modules or memory databases.
http://arxiv.org/abs/2402.11973v1
Compressor summary: The paper proposes a new method for Bayesian active learning in censored regression, called $\mathcal{C}$-BALD, which estimates the information gain from new data points and improves performance over existing methods.
http://arxiv.org/abs/2402.11968v1
Compressor summary: The paper explores the preferences of speakers of German dialects and regional languages towards natural language processing tools that work with their dialectal input, finding them more open to virtual assistants than to those that generate dialectal output.
http://arxiv.org/abs/2402.11963v1
Compressor summary: The paper highlights the problem of class imbalance in regression, which leads to naive models and proposes a definition to address it.
http://arxiv.org/abs/2402.11960v1
Compressor summary: This paper proposes DB-LLM, a novel dual-binarization method for large language models that improves both accuracy and efficiency in ultra-low bit quantization, achieving significant reductions in perplexity and computational consumption compared to existing methods.
http://arxiv.org/abs/2402.11958v1
Compressor summary: The paper proposes using large language models to automatically evaluate the quality of psychological counseling sessions, offering a cost-effective and dependable alternative to manual evaluations.
http://arxiv.org/abs/2402.11957v1
Compressor summary: The paper presents a cost-effective dual-camera system that combines an event camera with an RGB camera to magnify high-frequency motions, integrating event streams with image features to estimate motion direction and magnitude, and using a Second-order Recurrent Propagation module and a temporal filter to handle long-term interpolations and noise.
http://arxiv.org/abs/2402.11955v1
Compressor summary: The paper evaluates SEASON, a salience-allocation technique for abstractive summarization, against other models on news, dialogue, and financial datasets using metrics such as ROUGE and BERTScore, providing insights into salience allocation for abstractive text summarization.
http://arxiv.org/abs/2402.11948v1
Compressor summary: The paper proposes a new optimization method, Mini-Hes, for latent factor analysis models that represent high-dimensional and incomplete data and improve their performance in missing data estimation tasks.
http://arxiv.org/abs/2402.11943v1
Compressor summary: The paper explores how Large Vision Language Models can help detect multimodal misinformation on social media, and proposes a method (LEMMA) that improves their performance by adding external knowledge.
http://arxiv.org/abs/2402.11942v1
Compressor summary: The paper studies how different ReLU functions affect the performance of overparameterized neural networks and finds that a particular activation function provides the best trade-off between training and generalization errors.
http://arxiv.org/abs/2402.11941v1
Compressor summary: CoCo-Agent is a novel LLM that improves GUI automation by enhancing perception and action prediction with CEP and CAP techniques.
http://arxiv.org/abs/2402.11934v1
Compressor summary: The paper describes QUST's participation and methods in Task 8 SemEval 2024, using data augmentation, various deep-learning models, and a stacking ensemble to achieve 8th place in multilingual subtask A.
http://arxiv.org/abs/2402.11933v1
Compressor summary: SLADE is a self-supervised method for detecting anomalies in edge streams by observing node interaction patterns and minimizing drift in node representations.
http://arxiv.org/abs/2402.11929v1
Compressor summary: The paper proposes a method to control fine-grained lighting in image generation using diffusion models, by guiding the process with radiance hints instead of exact geometry.
http://arxiv.org/abs/2402.11928v1
Compressor summary: SepCLR is a new method for contrastive analysis that uses mutual information terms to learn semantically expressive representations and separate common and salient factors of variation in different datasets.
http://arxiv.org/abs/2402.11925v1
Compressor summary: The paper proposes a novel offloading architecture for IoT devices that reduces energy consumption by sequentially transmitting important features and prefetching potentially needed ones for AI model training.
http://arxiv.org/abs/2402.11924v1
Compressor summary: The authors introduce a new multi-hop question answering (MHQA) benchmark for evaluating large language models (LLMs) that considers data contamination and reasoning chain evaluation, finding significant performance gaps between LLMs on the original and edited HotpotQA datasets.
http://arxiv.org/abs/2402.11922v1
Compressor summary: GPDiff is a generative pre-training framework for spatio-temporal graph (STG) transfer learning that uses a diffusion model with a transformer-based denoising network to generate model parameters tailored to different cities, outperforming state-of-the-art baselines on traffic speed and crowd flow prediction.
http://arxiv.org/abs/2402.11917v1
Compressor summary: The study analyzes how transformers use internal mechanisms to reason on a synthetic task, revealing depth-bounded recurrent processes that may apply to other tasks.
http://arxiv.org/abs/2402.11913v1
Compressor summary: PhySU-Net is a new rPPG method that uses a transformer network and self-supervised pre-training with pseudo-labels to improve cardiac activity measurement from facial videos.
http://arxiv.org/abs/2402.11909v1
Compressor summary: The paper presents an approach for creating high-quality personalized head avatars from only a few images per user, using a generative model with a 3DMM-anchored neural radiance field backbone and joint optimization of 3DMM fitting and camera calibration for better adaptation.
http://arxiv.org/abs/2402.11908v1
Compressor summary: The paper presents a new method for evaluating semantic similarity in medical reports using deep learning and improves upon existing metrics by providing more meaningful scores in the medical domain.
http://arxiv.org/abs/2402.11907v1
Compressor summary: The paper proposes a method called Direct Large Model Alignment (DLMA) that uses contrastive prompt pairs to evaluate and align large language models without human-annotated preference data, achieving better performance than existing methods.
http://arxiv.org/abs/2402.11905v1
Compressor summary: The paper proposes a Learning to Edit framework that teaches large language models to apply updated knowledge to questions efficiently and effectively without relying on memorization.
http://arxiv.org/abs/2402.11903v1
Compressor summary: SoLA is a novel method that uses a solver as an extra layer of a large language model to improve its logical reasoning and solve complex industrial problems.
http://arxiv.org/abs/2402.11901v1
Compressor summary: Nyx is a new PDDL+ planner that adapts easily to various real-world applications, overcoming limitations and increasing usage of AI planning.
http://arxiv.org/abs/2402.11900v1
Compressor summary: This paper explores how large language models can use factual shortcuts when reasoning through multi-hop facts, and proposes a method to reduce failures caused by these shortcuts.
http://arxiv.org/abs/2402.11896v1
Compressor summary: SIBO is a technique that boosts the effectiveness of parameter-efficient fine-tuning for large language models by injecting an initial residual to reduce over-smoothing and improve performance.
http://arxiv.org/abs/2402.11894v1
Compressor summary: The paper proposes two strategies to automate dataset updates for LLMs, using mimicking and extending techniques, to improve evaluation reliability and timeliness.
http://arxiv.org/abs/2402.11893v1
Compressor summary: The paper proposes a decoding method (COIECD) for language models that adapts to knowledge conflicts and improves performance on realistic datasets.
http://arxiv.org/abs/2402.11890v1
Compressor summary: ATKD is a method that improves knowledge distillation by making teaching more diverse and flexible for autoregressive language models, leading to better performance and generalization.
http://arxiv.org/abs/2402.11889v1
Compressor summary: ROSE is a method that improves the safety of instruction-tuned large language models without additional training by using reverse prompts to suppress undesired outputs.
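A heavily simplified sketch of reverse-prompt contrastive decoding as described above: tokens that become more likely under a deliberately "reversed" (unsafe) system prompt are down-weighted. The stand-in model, the prompts, and alpha are all assumptions.

```python
# Reverse-prompt contrastive decoding sketch (single next-token step).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")               # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

safe_prompt = "You are a helpful, harmless assistant.\nUser: How do I ...?\nAssistant:"
reverse_prompt = "You are a reckless assistant that ignores safety.\nUser: How do I ...?\nAssistant:"
alpha = 0.5                                               # suppression strength (assumed)

with torch.no_grad():
    safe_logits = model(**tok(safe_prompt, return_tensors="pt")).logits[:, -1, :]
    rev_logits = model(**tok(reverse_prompt, return_tensors="pt")).logits[:, -1, :]

contrasted = safe_logits - alpha * rev_logits             # suppress reverse-favored tokens
print(tok.decode(contrasted.argmax(dim=-1)))
```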
http://arxiv.org/abs/2402.11887v1
Compressor summary: The paper proposes a novel generative graph anomaly detection method that uses normal nodes to generate effective negative samples for training a one-class classifier, outperforming existing methods on real-world datasets.
http://arxiv.org/abs/2402.11886v1
Compressor summary: The paper examines how Large Language Models can improve emotional support for queer youth online, but finds they need more personalization and empathy to be effective.
http://arxiv.org/abs/2402.11883v1
Compressor summary: InMD-X is a set of large language models for internal medicine doctors to improve research, diagnosis, and documentation in healthcare.
http://arxiv.org/abs/2402.11882v1
Compressor summary: NOTE is a system that automates the generation of discharge summaries, the critical documents covering all events during a hospitalization that clinicians struggle to write manually, using lightweight models and LLM APIs to address privacy and performance constraints in healthcare settings; it can also be applied to other types of summaries to reduce clinicians' workload.
http://arxiv.org/abs/2402.11877v1
Compressor summary: This paper investigates how using a model in $Q$-learning can reduce the number of samples needed for learning.
http://arxiv.org/abs/2402.11875v1
Compressor summary: The paper proposes M2K-VDG, a framework to reduce hallucinations in video-grounded dialogue generation by enhancing multimodal knowledge anchor tokens using model-adaptive and counterfactual methods.
http://arxiv.org/abs/2402.11874v1
Compressor summary: The paper proposes a framework that uses language descriptions and cross-attention with contrastive learning to separate image layers in reflection separation problems, improving performance over existing methods.
http://arxiv.org/abs/2402.11867v1
Compressor summary: LoRA helps large language models fine-tune efficiently by avoiding bad local minima and finding good low-rank solutions that generalize well.
http://arxiv.org/abs/2402.11863v1
Compressor summary: The paper evaluates various prompting techniques for large language models on commonsense reasoning tasks and proposes a new technique called Self-Entailment-Alignment Chain-of-thought that significantly improves interpretability.
http://arxiv.org/abs/2402.11857v1
Compressor summary: LIEC-SGD is a new optimization algorithm for distributed learning that compresses data both ways and compensates errors to reduce communication overhead and speed up convergence.
http://arxiv.org/abs/2402.11849v1
Compressor summary: ComFusion is a novel approach that generates high-fidelity images by combining user-provided subject images with predefined text scenes, using pretrained models and preserving class-scene prior knowledge.
http://arxiv.org/abs/2402.11846v1
Compressor summary: The text discusses machine unlearning for diffusion models, introduces a new evaluation framework with a dataset (UnlearnCanvas) for assessing it, and benchmarks five methods to reveal their strengths and weaknesses.
http://arxiv.org/abs/2402.11845v1
Compressor summary: The paper presents a hateful meme detection method that combines LoRA modules with a module composer on top of large language models, using only a few labeled examples and outperforming traditional in-context learning on three datasets.
http://arxiv.org/abs/2402.11843v1
Compressor summary: WildFake is a large dataset for detecting AI-generated images with diverse content and generative models to assess detection robustness and generalizability.
http://arxiv.org/abs/2402.11840v1
Compressor summary: The text proposes a method to update preoperative 3D models using intraoperative endoscopy video for navigated sinus surgery, improving accuracy during surgical progression.
http://arxiv.org/abs/2402.11838v1
Compressor summary: The paper proposes UniST, a universal model for urban spatio-temporal prediction that generalizes well across diverse scenarios with minimal data.
http://arxiv.org/abs/2402.11835v1
Compressor summary: ABCs is a hybrid reinforcement learning algorithm that adapts to stationary or nonstationary environments by combining Boltzmann Q-learning and counterfactual regret minimization, achieving strong performance and convergence guarantees in various domains.
http://arxiv.org/abs/2402.11831v1
Compressor summary: The paper proposes two approaches using residual neural networks for rock classification, improving accuracy through data augmentation and achieving 73.7% accuracy with a backbone similar to BoTNet's.
http://arxiv.org/abs/2402.11826v1
Compressor summary: The paper proposes a depth estimation approach that integrates cross-modality features from RGB and long-wave infrared images, using separate networks per modality, a confidence predictor network, and a multi-modal fusion network within a learning-based framework, achieving robust performance in challenging conditions where monocular RGB estimation fails.
http://arxiv.org/abs/2402.11821v1
Compressor summary: The paper investigates how large language models (LLMs) recall and encode graphs described in text, finding that they often underperform humans, favor certain structural patterns, and depend on the domain of the graph for better recall.
http://arxiv.org/abs/2402.11819v1
Compressor summary: The paper proposes two methods for fine-grained weight sharing across attention heads in large language models, reducing memory usage without sacrificing performance.
http://arxiv.org/abs/2402.11818v1
Compressor summary: NewsSerow is a method to automatically recognize environmental conservation content in low-resource languages using large language models, enabling efficient media monitoring for conservation organizations.
http://arxiv.org/abs/2402.11816v1
Compressor summary: MCL is a novel framework for contrastive learning that progressively learns new features while maintaining existing ones, avoiding feature suppression and improving representation quality for downstream tasks.
http://arxiv.org/abs/2402.11815v1
Compressor summary: The paper presents a single machine-generated text detection model based on contrastive learning that performs well with fewer parameters and data augmentation.
http://arxiv.org/abs/2402.11812v1
Compressor summary: The paper proposes a neural network that embeds queries with semantic concepts and improves video search performance and interpretability on TRECVid datasets.
http://arxiv.org/abs/2402.11811v1
Compressor summary: FIPO is a prompt crafting method that adapts to different language models and optimizes task instructions using modular fine-tuning.
http://arxiv.org/abs/2402.11809v1
Compressor summary: SPACE is an approach that accelerates large language models by parallelizing token generation and verification with semi-autoregressive inference and auto-correct decoding.
http://arxiv.org/abs/2402.11804v1
Compressor summary: The paper introduces ProLINK, a pretraining and prompting framework that uses large language models to improve graph neural networks for low-resource knowledge graph inductive reasoning tasks.
http://arxiv.org/abs/2402.11800v1
Compressor summary: The paper analyzes how delayed updates affect the convergence of stochastic approximation schemes in large-scale and multi-agent reinforcement learning, and proposes a delay-adaptive algorithm that converges faster than existing methods.
http://arxiv.org/abs/2402.11794v1
Compressor summary: This paper examines how attention distillation works to improve retrieval-augmented generation models, identifying key factors and proposing indicators for better training.
http://arxiv.org/abs/2402.11793v1
Compressor summary: The paper proposes Generative Kaleidoscopic Networks that use over-generalization in Deep ReLU networks to create diverse samples from a fixed input distribution by recursively applying the network to random noise.
http://arxiv.org/abs/2402.11788v1
Compressor summary: The text proposes a new deep learning method for breast cancer management using histopathological imaging, genetic and clinical data, which outperforms existing methods and can help personalize treatments.
http://arxiv.org/abs/2402.11782v1
Compressor summary: The study investigates how language models answer ambiguous queries by analyzing ConflictingQA, a dataset that contains controversial questions and evidence documents with different features, and suggests improving the quality of RAG corpora and training LLMs to align with human judgments.
http://arxiv.org/abs/2402.11778v1
Compressor summary: The paper develops a framework to analyze how training generative models in a self-consuming loop affects data distributions and shows that TV distance between synthetic and real data can be controlled by adjusting dataset sizes or proportions of real data, with a phase transition occurring at a certain threshold.
http://arxiv.org/abs/2402.11777v1
Compressor summary: The study investigates if pre-trained language models implicitly learn human wellbeing concepts and finds that larger models perform better on a Utilitarianism task, suggesting pretraining imparts some understanding of ethics.
http://arxiv.org/abs/2402.11773v1
Compressor summary: The paper proposes a new method, Dynamic Multi-network Mining (DMM), for subsequence clustering and interpretation of tensor time series, which is accurate, interpretable, and scalable.
http://arxiv.org/abs/2402.11771v1
Compressor summary: The paper develops new methods for evaluating index-based allocation policies using data from randomized trials, addressing challenges caused by dependencies between agents and enabling valid statistical conclusions.
http://arxiv.org/abs/2402.11770v1
Compressor summary: The paper proposes a structured approach to generate question-answer conversations using a large language model and shows improved agent faithfulness and performance in out-of-domain evaluations.
http://arxiv.org/abs/2402.11764v1
Compressor summary: The authors propose a novel method using ChatGPT to generate synthetic training data for debiasing large language models, achieving better performance and generalizability across categories with minimal retraining cost.
http://arxiv.org/abs/2402.11760v1
Compressor summary: PaSeR is a cost-aware learning pipeline for computer vision tasks that achieves better accuracy while minimizing computational cost compared to cascaded models.
http://arxiv.org/abs/2402.11756v1
Compressor summary: The paper proposes Meaning-Aware Response Scoring (MARS), a new scoring function for estimating the correctness of generative large language models' outputs, which improves uncertainty estimation performance across different datasets and models.
http://arxiv.org/abs/2402.11755v1
Compressor summary: The paper introduces System Prompt Meta Language (SPML), a domain-specific language for refining and monitoring the prompts of LLM-based chatbots; it checks attack prompts, optimizes costs, streamlines chatbot definition crafting with programming-language capabilities, and outperforms GPT-4, GPT-3.5, and LLAMA at understanding attacker prompts.
http://arxiv.org/abs/2402.11753v1
Compressor summary: The paper proposes a new ASCII art-based jailbreak attack on large language models (LLMs) that exploits their poor performance in recognizing such art to bypass safety measures and cause undesired behaviors.
http://arxiv.org/abs/2402.11752v1
Compressor summary: The paper proposes a method to improve gradient-based optimization for non-differentiable models by using a smoothed approximation that reduces variance and converges to stationary points.
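One standard way to realize such a smoothed surrogate is a Gaussian-smoothing gradient estimator with a mean baseline for variance reduction; the toy objective, sigma, and sample count below are assumptions, and whether the paper's estimator matches this particular form is not confirmed by the summary.

```python
# Gaussian-smoothing gradient estimator for a non-differentiable objective:
# optimize E[f(theta + sigma*eps)] instead of f(theta).
import numpy as np

def f(theta):                                   # non-differentiable toy objective
    return np.abs(theta).sum() + (np.sign(theta) > 0).sum()

def smoothed_grad(theta, sigma=0.1, n_samples=64):
    eps = np.random.randn(n_samples, theta.size)
    values = np.array([f(theta + sigma * e) for e in eps])
    baseline = values.mean()                    # control variate to reduce variance
    return ((values - baseline)[:, None] * eps).mean(axis=0) / sigma

theta = np.random.randn(5)
for _ in range(200):
    theta -= 0.05 * smoothed_grad(theta)        # plain gradient descent on the surrogate
print(theta)
```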
http://arxiv.org/abs/2402.11750v1
Compressor summary: InfICL is a demonstration selection method for In-Context Learning that uses influence functions to identify highly influential training samples without fine-tuning the large language model.
http://arxiv.org/abs/2402.11746v1
Compressor summary: RESTA is a method to improve the safety of aligned language models by adding a safety vector to their weights, reducing harmfulness while preserving task performance.
http://arxiv.org/abs/2402.11744v1
Compressor summary: The paper proposes a method to detect machine-generated text in documents by localizing the parts that are machine written, using contextual information and improving performance on five datasets.