This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-27, generated by the compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2406.18534v1
Compressor summary: The paper introduces Compositional Concept Extraction (CCE), a method to find more useful concept representations for foundation models that can be combined to explain the full sample.
http://arxiv.org/abs/2406.18533v1
Compressor summary: Grendel is a distributed system that partitions and parallelizes 3D Gaussian Splatting (3DGS) computation across multiple GPUs, improving rendering quality for large-scale 3D reconstruction tasks.
http://arxiv.org/abs/2406.18532v1
Compressor summary: The paper proposes a framework for language agents to optimize themselves using natural language versions of weights, loss, and gradients, enabling them to evolve in the wild.
http://arxiv.org/abs/2406.18530v1
Compressor summary: The paper presents a method to improve soccer game commentary generation by aligning video and text using manual annotations and a multi-modal pipeline, leading to better results on a benchmark dataset.
http://arxiv.org/abs/2406.18529v1
Compressor summary: The paper proposes a new algorithm to learn efficiently in constrained reinforcement learning with linear function approximation and $q_{\pi}$-realizability, achieving polynomial sample complexity.
http://arxiv.org/abs/2406.18528v1
Compressor summary: PrExMe is a large-scale comparison of prompts for evaluating natural language generation models, revealing the stability and variability of different strategies in machine translation and summarization tasks.
http://arxiv.org/abs/2406.18524v1
Compressor summary: MultiDiff is a novel method for generating consistent novel views from a single image using depth predictors and video-diffusion models to improve geometric stability and pixel accuracy.
http://arxiv.org/abs/2406.18522v1
Compressor summary: ChronoMagic-Bench is a new text-to-video benchmark that evaluates models' ability to generate coherent and diverse time-lapse videos across various categories of natural phenomena.
http://arxiv.org/abs/2406.18521v1
Compressor summary: CharXiv is a new evaluation suite for multimodal language models that tests their ability to understand diverse and challenging charts, revealing significant performance gaps and weaknesses.
http://arxiv.org/abs/2406.18518v1
Compressor summary: APIGen generates diverse and reliable function-calling datasets for agents, achieving state-of-the-art performance on benchmarks.
http://arxiv.org/abs/2406.18516v1
Compressor summary: The paper proposes a domain adaptation method for image restoration tasks using diffusion models and noise prediction in the noise-space to align synthetic and real-world data to a common clean distribution.
http://arxiv.org/abs/2406.18512v1
Compressor summary: The research explores how conversational explanations work and tests the effectiveness of large language models like GPT-4 in generating explanation dialogues using a 5-Levels dataset with annotated explanatory acts.
http://arxiv.org/abs/2406.18505v1
Compressor summary: The study investigates how well large language models can understand an agent's behavior in the physical world by using their knowledge and reasoning, finding that they are not yet fully capable without further improvements.
http://arxiv.org/abs/2406.18501v1
Compressor summary: The paper proposes a method to diagnose if large language models' in-context learning is equivalent to gradient-based learning using the inverse frequency effect and finds evidence supporting this hypothesis.
http://arxiv.org/abs/2406.18495v1
Compressor summary: WildGuard is a tool that detects malicious intent and safety risks in interactions with large language models (LLMs) and evaluates their refusal rates, outperforming existing tools and matching GPT-4's performance.
http://arxiv.org/abs/2406.18481v1
Compressor summary: The authors propose a robust surgical phase recognition method that can handle missing annotations and introduce SkipTag@K, an annotation approach that reduces costs while maintaining performance.
http://arxiv.org/abs/2406.18462v1
Compressor summary: The paper introduces GaussianDreamerPro, a framework that binds 3D Gaussians to reasonable geometry to generate high-quality assets with improved details and applicability in various tasks.
http://arxiv.org/abs/2406.18460v1
Compressor summary: The study explores using role-play zero-shot prompting with multilingual LLMs and an instruction-following model to create efficient and cost-effective open-domain conversational agents, achieving high results in French.
http://arxiv.org/abs/2406.18459v1
Compressor summary: The paper introduces a progressive approach that uses low-resolution images to guide high-resolution image generation with diffusion models, producing high-quality results without extra training or fine-tuning.
http://arxiv.org/abs/2406.18453v1
Compressor summary: The authors propose a novel method for relative pose estimation without training data that uses 2.5D shape from RGB-D reference, off-the-shelf differentiable renderer, and semantic cues from pretrained model.
http://arxiv.org/abs/2406.18450v1
Compressor summary: Sim-OPRL is an offline preference-based RL algorithm that uses a learned environment model to get preference feedback without interacting with the real environment.
http://arxiv.org/abs/2406.18449v1
Compressor summary: The paper proposes CALLMSAE, a framework that uses large language models to generate event graphs from documents, focusing on salient events and their relations.
http://arxiv.org/abs/2406.18445v1
Compressor summary: The paper proposes an autotuning-based framework to optimize hyperparameters in SVMs using mixed kernels for high-dimensional data in HEP and MKH applications, achieving high classification accuracy.
http://arxiv.org/abs/2406.18443v1
Compressor summary: The paper presents a novel open-set object detection framework that uses conditional evidence decoupling to reject unknown classes and improve performance with scarce training data.
http://arxiv.org/abs/2406.18430v1
Compressor summary: The paper explores how training features on a specific domain affects distance measurement in facial images using self-supervision learning.
http://arxiv.org/abs/2406.18423v1
Compressor summary: The study proposes an equivariant graph convolutional network (EGCN) as a more accurate and efficient emulator for ice sheet dynamics modeling than convolutional neural networks (CNNs).
http://arxiv.org/abs/2406.18422v1
Compressor summary: The paper proposes a simple technique to translate 2D X-ray images into 3D CT-like reconstructions by concatenating multiple 2D views and using neural optimal transport for effective information integration.
http://arxiv.org/abs/2406.18420v1
Compressor summary: The text discusses Mixtures of Experts (MoEs) in Deep Reinforcement Learning (DRL), their benefits for handling non-stationarity, and how multi-task training can enhance MoE's performance and understanding.
http://arxiv.org/abs/2406.18417v1
Compressor summary: Latent diffusion models can generate realistic Arctic sea-ice states using less computational resources than traditional methods, but they may introduce smoothing that affects their accuracy.
http://arxiv.org/abs/2406.18414v1
Compressor summary: BiTrack is a novel 3D object tracking framework that fuses 2D-3D detection, generates reliable trajectories, and refines them using point-level registration, data association, and re-optimization techniques.
http://arxiv.org/abs/2406.18406v1
Compressor summary: IRCAN is a framework that strengthens large language models to handle contextual cues and resolve knowledge conflicts using context-aware attribution scores and neuron reweighting.
http://arxiv.org/abs/2406.18403v1
Compressor summary: JUDGE-BENCH is a dataset for evaluating LLMs' ability to replicate human annotations in NLP, showing they still cannot fully replace human judgments.
http://arxiv.org/abs/2406.18400v1
Compressor summary: LLMs can recall facts based on context, acting like an associative memory model with self-attention and value matrix.
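As a toy illustration of that associative-memory view (a hypothetical NumPy sketch, not the paper's model), attention over stored key-value pairs recovers a value from a noisy version of its key:

```python
import numpy as np

# Hypothetical sketch: attention as associative memory.
# Store key-value pairs; a query close to a stored key retrieves its value.
rng = np.random.default_rng(0)
d = 64
keys = rng.normal(size=(5, d))    # 5 stored "facts" (keys)
values = rng.normal(size=(5, d))  # the values associated with each key

def attend(query, K, V, temp=0.1):
    scores = K @ query / temp              # similarity of the query to each key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over the stored items
    return weights @ V                     # weighted recall of the values

query = keys[2] + 0.01 * rng.normal(size=d)  # noisy version of key 2
recalled = attend(query, keys, values)
print(np.allclose(recalled, values[2], atol=0.1))  # → True
```

With a sharp softmax, the attention weights concentrate almost entirely on the matching key, so the output is essentially the stored value — the same mechanism the paper frames via self-attention and the value matrix.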
http://arxiv.org/abs/2406.18387v1
Compressor summary: Our model uses self-generated geometric hints from previous frames to improve depth estimation and 3D scene reconstruction in real time.
http://arxiv.org/abs/2406.18380v1
Compressor summary: Graph Neural Networks (GNNs) update node representations with message-passing layers, typically using MLPs as the transformation; the paper shows that Kolmogorov-Arnold Networks (KANs), based on a theorem for function representation, are a promising alternative that performs well in graph regression tasks.
http://arxiv.org/abs/2406.18375v1
Compressor summary: The authors propose a method to improve skin cancer diagnosis for minority groups using synthetic images generated from majority group data.
http://arxiv.org/abs/2406.18373v1
Compressor summary: The paper proposes dynamic data pruning for ASR, which can save training time and maintain performance by selecting relevant data.
http://arxiv.org/abs/2406.18365v1
Compressor summary: The paper introduces a large corpus for natural language generation evaluation and a new language model (Themis) that can perform flexible and accurate evaluations without references.
http://arxiv.org/abs/2406.18364v1
Compressor summary: The paper proposes an improved BERTSum-LSTM model for information extraction from Chinese news, addressing the challenges of complex semantics, large information volume, and language particularity.
http://arxiv.org/abs/2406.18361v1
Compressor summary: The paper introduces SDSeg, a latent diffusion segmentation model that overcomes challenges in medical image segmentation with stable diffusion and single-step reverse process.
http://arxiv.org/abs/2406.18360v1
Compressor summary: This paper introduces a new driving view synthesis dataset and benchmark for testing autonomous vehicle systems in challenging scenarios beyond real-world data.
http://arxiv.org/abs/2406.18354v1
Compressor summary: GKAN is a new graph neural network model that combines spline functions on edges with accuracy and interpretability, performing well in various graph-based tasks.
http://arxiv.org/abs/2406.18351v1
Compressor summary: The paper proposes a decision framework using reinforcement learning, feedback graph, and intrinsic motivation to improve sample efficiency and inventory control in real-world applications.
http://arxiv.org/abs/2406.18350v1
Compressor summary: The paper proposes using Knowledge Distillation and Logits Regularization to reduce the spiking activity of spiking neural networks without sacrificing performance, making them more energy-efficient for Edge applications.
http://arxiv.org/abs/2406.18346v1
Compressor summary: The paper analyzes the challenges and limitations of using Reinforcement Learning from Feedback methods to align Artificial Intelligence systems with human values and intentions, and suggests a more critical and reflective approach to their application.
http://arxiv.org/abs/2406.18345v1
Compressor summary: The emotion transformer (EmT) model leverages prior neurophysiological knowledge to enhance EEG emotion decoding by capturing long-term contextual information and performing well on cross-subject classification and regression tasks.
http://arxiv.org/abs/2406.18344v1
Compressor summary: The study shows that deep neural networks trained with different objectives share common features related to distinct brain regions, revealing the formation of visual concepts and the processing of visual information in networks.
http://arxiv.org/abs/2406.18340v1
Compressor summary: The proposed system offers Spanish grammar coaching with informative feedback, using a linguistic formalism and a fast parsing algorithm that reduces reliance on neural methods and costs.
http://arxiv.org/abs/2406.18334v1
Compressor summary: The paper proposes Compress Then Explain (CTE), a new method to improve the efficiency and accuracy of machine learning explanations by using distribution compression through kernel thinning instead of standard i.i.d. sampling.
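The idea of replacing standard i.i.d. sampling with distribution compression can be illustrated by greedy kernel herding, a simpler relative of the kernel thinning CTE uses (a hypothetical sketch, not the paper's algorithm): pick a small coreset whose kernel mean embedding tracks the full sample's.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    # RBF kernel matrix between two point sets
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def herd(X, m, gamma=0.5):
    K = rbf(X, X, gamma)
    mean_embed = K.mean(axis=1)  # kernel mean of the full sample
    chosen, running = [], np.zeros(len(X))
    for t in range(m):
        # greedily pick the point that best corrects the current discrepancy
        scores = mean_embed - running / (t + 1)
        masked = np.where(np.isin(np.arange(len(X)), chosen), -np.inf, scores)
        i = int(np.argmax(masked))
        chosen.append(i)
        running += K[:, i]
    return X[chosen]

rng = np.random.default_rng(1)
sample = rng.normal(size=(200, 2))
coreset = herd(sample, 10)
print(coreset.shape)  # → (10, 2)
```

The compressed set can then stand in for the full sample when estimating explanation statistics, which is the efficiency gain CTE targets.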
http://arxiv.org/abs/2406.18333v1
Compressor summary: The text introduces a novel module for sign language recognition that leverages local and global contexts using intra-inter gloss attention, improving accuracy without prior knowledge.
http://arxiv.org/abs/2406.18332v1
Compressor summary: The paper proposes a principle-based taxonomy and evaluation dimensions for early classification of time series, and presents experiments comparing nine state-of-the-art methods using an open-source library.
http://arxiv.org/abs/2406.18330v1
Compressor summary: The authors propose a technique for structure-based drug design that uses virtual receptors and protein language embeddings to improve performance and speed.
http://arxiv.org/abs/2406.18326v1
Compressor summary: PaCoST is a method to detect cheating in large language models by comparing their confidence under original and contaminated benchmarks.
http://arxiv.org/abs/2406.18321v1
Compressor summary: The paper introduces MathOdyssey, a new dataset for testing large language models' abilities in complex math problems, and shows that current LLMs still struggle with the hardest tasks.
http://arxiv.org/abs/2406.18314v1
Compressor summary: ContactNet is a novel attention-based Graph Neural Network that can classify accurate protein-protein interaction models from docking algorithms without multiple sequence alignment, achieving higher accuracy than current methods.
http://arxiv.org/abs/2406.18312v1
Compressor summary: The authors propose that integrating memory into large language models can help achieve artificial general intelligence by simplifying complex inferences and connecting related information.
http://arxiv.org/abs/2406.18311v1
Compressor summary: The paper proposes an online multi-task learning method that updates task relatedness iteratively using weight vectors and shows its superior performance over a conventional method on three datasets.
http://arxiv.org/abs/2406.18310v1
Compressor summary: This paper introduces STAR-RL, a hierarchical reinforcement learning framework for super-resolution of pathology images, which improves interpretability and diagnosis accuracy.
http://arxiv.org/abs/2406.18309v1
Compressor summary: The paper introduces FCM-Former, a machine learning tool that automates immunophenotyping for diagnosing childhood acute leukemia, achieving 96.5% accuracy.
http://arxiv.org/abs/2406.18305v1
Compressor summary: The S3 model, a simple yet powerful baseline for multimodal dialog, achieves near state-of-the-art results using a pre-trained language model and modality encoders with a small data mixture.
http://arxiv.org/abs/2406.18297v1
Compressor summary: This study explores using open-source large language models to identify check-worthy political statements and proposes a data pruning method to improve efficiency and performance.
http://arxiv.org/abs/2406.18295v1
Compressor summary: Foundation Models are better than problem-specific models for solving multiple Computer Vision problems with high accuracy, especially in Earth Observation applications where they are more efficient with limited data.
http://arxiv.org/abs/2406.18294v1
Compressor summary: The study proposes a Hierarchical Context Pruning strategy to improve repository-level code completion accuracy by modeling code files at the function level and removing irrelevant content, reducing input length for Repo-Code LLMs.
http://arxiv.org/abs/2406.18293v1
Compressor summary: The paper proposes a method to automatically optimize hyperparameters and reward functions together for deep reinforcement learning, improving performance on complex tasks.
http://arxiv.org/abs/2406.18284v1
Compressor summary: RealTalk is a new audio-driven framework for realistic face generation that preserves individual traits and works in real-time.
http://arxiv.org/abs/2406.18279v1
Compressor summary: The CAS model assesses confidence levels of semantic segmentation algorithms for Earth observation (EO) tasks like land cover classification, improving performance and preventing errors using satellite data.
http://arxiv.org/abs/2406.18278v1
Compressor summary: GDA-Net is a novel method for attributing deepfakes to their GAN architectures, even if they are generated from different seeds or fine-tuned versions of the models.
http://arxiv.org/abs/2406.18276v1
Compressor summary: The authors present a framework for building knowledge graphs from Sanskrit texts, enhancing text analysis and paving the way for further advancements in computational Sanskrit.
http://arxiv.org/abs/2406.18266v1
Compressor summary: The authors present the first open-source Romanian Large Language Models (RoLLMs), trained on translated texts and benchmarks, achieving state-of-the-art results across various categories.
http://arxiv.org/abs/2406.18259v1
Compressor summary: This paper introduces a new ternary text classification scheme to better detect and explain when texts are written by LLMs or humans, addressing the growing concerns about authorship in the era of advanced LLMs.
http://arxiv.org/abs/2406.18256v1
Compressor summary: The paper presents LLaMIPa, a discourse parser that uses an LLM finetuned on SDRT-style annotations and can process discourse data incrementally, improving performance over encoder-only models.
http://arxiv.org/abs/2406.18253v1
Compressor summary: The text discusses visual grounding in vision-question answering (VQA), formalizes it as visually grounded reasoning, and proposes a method to create out-of-distribution tests that emphasize visual grounding.
http://arxiv.org/abs/2406.18245v1
Compressor summary: The text discusses a method to improve causal event extraction tasks by using evaluation models trained with reinforcement learning and weak-to-strong supervision, achieving high agreement with human judgement.
http://arxiv.org/abs/2406.18242v1
Compressor summary: ConStyle v2 is a plug-and-play prompter that improves U-Net style image restoration models using pre-training, classification, and knowledge distillation techniques.
http://arxiv.org/abs/2406.18239v1
Compressor summary: This paper tests a new text-to-text annotation tool that uses large foundation models and shows it can perform as well as fine-tuned BERT on German Twitter data without any labeled training data.
http://arxiv.org/abs/2406.18237v1
Compressor summary: PlaMo is a system that combines scene-aware path planning and physics-based control to enable humanoids to navigate complex 3D environments.
http://arxiv.org/abs/2406.18236v1
Compressor summary: CoDA is a software tool that allows the visual analysis of complex dendroid coral colonies, helping to understand their growth patterns and shapes.
http://arxiv.org/abs/2406.18227v1
Compressor summary: The GUIDE dataset provides annotated instructional videos with guidelines to help beginners learn new tasks more effectively and offers three sub-tasks to evaluate model comprehension ability.
http://arxiv.org/abs/2406.18221v1
Compressor summary: PAE is a novel technique that removes personal information from large language models without retraining them, enhancing their data privacy and preventing private data leakage.
http://arxiv.org/abs/2406.18220v1
Compressor summary: The paper proposes a method to incorporate procedural knowledge into deep learning models for video prediction, leading to better performance than data-driven models alone.
http://arxiv.org/abs/2406.18219v1
Compressor summary: This paper explores how mixture-of-experts (MoE) works in large language models, revealing its features and suggesting improvements for router design and expert allocation.
http://arxiv.org/abs/2406.18215v1
Compressor summary: The paper proposes new approximation algorithms for incomplete multi-graph matching, which is important for computer vision tasks like image or shape matching, and shows that they significantly outperform existing methods in terms of objective and runtime.
http://arxiv.org/abs/2406.18214v1
Compressor summary: The study proposes a method called "Trimming the fat" to prune redundant information from 3D models, improving their scalability and performance while reducing memory usage and computation time.
http://arxiv.org/abs/2406.18200v1
Compressor summary: SeeD is a framework that improves the efficiency and speed of tree-search-based reasoning methods in large language models.
http://arxiv.org/abs/2406.18199v1
Compressor summary: The text introduces a novel technique that combines octree-based implicit surface representations and Gaussian splatting to accurately reconstruct the geometry of target objects from multi-view images, especially under strong lighting conditions.
http://arxiv.org/abs/2406.18198v1
Compressor summary: The paper proposes a new method (VDG) for dynamic scene reconstruction that integrates self-supervised vision optimization, improves initialization and decomposition, works with RGB images, and outperforms existing methods.
http://arxiv.org/abs/2406.18197v1
Compressor summary: The paper proposes a data-driven approach to learn prompts for prompt-based anomaly detection using synthesized samples and a modified attention mechanism for pixel-wise anomaly segmentation.
http://arxiv.org/abs/2406.18193v1
Compressor summary: MammothModa is a multi-modal language model with improved visual capabilities, extended context window, and high-quality bilingual datasets, achieving state-of-the-art performance in various benchmarks.
http://arxiv.org/abs/2406.18192v1
Compressor summary: The paper proposes a method to adapt large language models for specific cultural contexts by tuning them with cultural knowledge and safety values data, improving their performance and adaptability.
http://arxiv.org/abs/2406.18187v1
Compressor summary: The paper proposes Selective Prompt Tuning (SPT), a method to improve conversational AI personalization by adaptively selecting and enhancing prompts using context-prompt contrastive learning and prompt fusion learning.
http://arxiv.org/abs/2406.18179v1
Compressor summary: The DeepExtremeCubes database is a new tool that uses satellite images and other data to help scientists study how extreme heatwaves and droughts affect ecosystems around the world.
http://arxiv.org/abs/2406.18178v1
Compressor summary: The paper argues that to advance AI through game research, we need to tackle Knightian uncertainty, or the ability to adapt to sudden rule changes in games without prior information or models.
http://arxiv.org/abs/2406.18176v1
Compressor summary: The "VIPriors" workshop's fourth edition featured data-impaired challenges for computer vision tasks with limited data, where participants used inductive biases, data augmentation, and model ensembles to improve data efficiency.
http://arxiv.org/abs/2406.18173v1
Compressor summary: UIO-LLMs are memory-enhanced transformers that use incremental optimization techniques to handle long texts more effectively and efficiently.
http://arxiv.org/abs/2406.18166v1
Compressor summary: The paper introduces Triple Set Prediction (TSP), a novel graph-level KG completion task, and proposes GPHT, a subgraph-based method for fast prediction of missing triples, along with two baselines and evaluation metrics.
http://arxiv.org/abs/2406.18164v1
Compressor summary: NeBuLa is a model that uses prior context and nonlinguistic cues to improve language-to-action predictions in collaborative tasks.
http://arxiv.org/abs/2406.18159v1
Compressor summary: The paper proposes a diffusion model-based approach to generate 3D scenes from human motion sequences that adhere to spatial constraints, avoid collisions, and respect layout constraints.
http://arxiv.org/abs/2406.18151v1
Compressor summary: The paper introduces SynRS3D, a large synthetic remote sensing 3D dataset, and RS3DAda, a multi-task unsupervised domain adaptation method that enables global monocular 3D semantic understanding from synthetic data.
http://arxiv.org/abs/2406.18146v1
Compressor summary: The authors introduce a new dataset (Med-GRIT-270k) and a multimodal language model (BiRD) for biomedical image refer and ground conversations, which can help develop intelligent biomedical assistants.
http://arxiv.org/abs/2406.18144v1
Compressor summary: The text reviews selective breeding techniques in insect farming, discussing how to improve traits, formulate objectives, and address genetic diversity issues for a more sustainable food source.
http://arxiv.org/abs/2406.18140v1
Compressor summary: The paper explores Novel Class Discovery (NCD) in a cross-domain setting, introduces an exclusive style removal module to improve performance across different distributions, and builds a fair benchmark for future NCD research.
http://arxiv.org/abs/2406.18139v1
Compressor summary: LOOK-M is a novel approach that efficiently compresses the multimodal KV cache in long-context MLLMs, enabling faster decoding and maintained or improved task performance.
http://arxiv.org/abs/2406.18135v1
Compressor summary: The text describes an ASR web application and interface that handles large volumes of audio files, transcriptions, and voice activity detection using neural networks and machine learning techniques.
http://arxiv.org/abs/2406.18134v1
Compressor summary: The paper evaluates how well large language models handle relevant or irrelevant retrieved context without explicit relevance judgments, finding that fine-tuning on mixed context improves robustness to retrieval inaccuracies.
http://arxiv.org/abs/2406.18133v1
Compressor summary: ConvoCache is a caching system for spoken chatbots that reuses responses to similar prompts, improving speed and cost-efficiency.
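The core reuse idea can be sketched as a similarity-keyed cache (a hypothetical toy, with a bag-of-letters embedding standing in for whatever sentence encoder ConvoCache actually uses):

```python
import numpy as np

def embed(text):
    # toy bag-of-letters embedding; a real system would use a sentence encoder
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord('a')] += 1
    n = np.linalg.norm(v)
    return v / n if n else v

class ResponseCache:
    def __init__(self, threshold=0.95):
        self.entries = []          # list of (embedding, cached response)
        self.threshold = threshold

    def get(self, prompt):
        q = embed(prompt)
        for e, resp in self.entries:
            if float(q @ e) >= self.threshold:  # cosine sim of unit vectors
                return resp                     # reuse the cached response
        return None                             # cache miss: call the model

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))

cache = ResponseCache()
cache.put("How are you today?", "I'm doing well, thanks!")
print(cache.get("How are you today"))  # → I'm doing well, thanks!
```

A hit skips the expensive generation step entirely, which is where the speed and cost savings come from.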
http://arxiv.org/abs/2406.18131v1
Compressor summary: The text proposes a novel architecture that reduces information leakage in unsupervised sequential disentanglement by using a subtraction inductive bias with only one sample, improving performance on various data-modality benchmarks.
http://arxiv.org/abs/2406.18129v1
Compressor summary: The paper proposes a novel Complex-to-Simple framework to improve sim-to-real domain adaptation in 3D object detection using fixed-size anchor heads, RoI augmentation, corner-format representation of aleatoric uncertainty, and noise-aware mean teacher method.
http://arxiv.org/abs/2406.18125v1
Compressor summary: The authors present a comprehensive approach to resume classification using large-scale datasets and advanced language models, achieving significant improvements over traditional machine learning methods.
http://arxiv.org/abs/2406.18122v1
Compressor summary: The paper introduces Poisoned-LangChain, an indirect jailbreak attack that exploits a poisoned external knowledge base in retrieval-augmented generation to make large language models produce malicious content, succeeding against six different LLMs across three scenarios.
http://arxiv.org/abs/2406.18120v1
Compressor summary: The paper presents methods for machine translation and speech recognition of code-switched Egyptian Arabic-English, using large language models and achieving significant improvements.
http://arxiv.org/abs/2406.18119v1
Compressor summary: The paper proposes a method to evaluate and improve roster robustness for employee scheduling using machine learning predictions of absenteeism and optimization.
http://arxiv.org/abs/2406.18116v1
Compressor summary: The authors propose BADGE, a framework that uses a large language model (LLM) to generate and evaluate badminton reports automatically, potentially enhancing sports promotion.
http://arxiv.org/abs/2406.18113v1
Compressor summary: The paper presents Mr. BLIP, a simple and effective model that uses image-text pretrained multimodal language models to achieve state-of-the-art results in video moment retrieval tasks without additional input signals or complex architectures.
http://arxiv.org/abs/2406.18108v1
Compressor summary: The paper proposes a method to improve ASR models by adjusting the importance of tokens based on their likelihood of being errors, helping to reduce accuracy loss from transcription errors in training data.
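A minimal version of the token-weighting idea (a hypothetical sketch, not the paper's exact scheme) down-weights label tokens the model itself finds improbable, treating them as likely transcription errors:

```python
import numpy as np

# Hypothetical sketch: weight each target token's loss by the model's own
# confidence in that token, so probable transcription errors count less.
def weighted_nll(log_probs, targets, floor=0.2):
    # log_probs: (T, V) per-token log-probabilities; targets: (T,) label ids
    tok_lp = log_probs[np.arange(len(targets)), targets]
    p = np.exp(tok_lp)                  # model confidence in each label token
    weights = np.maximum(p, floor)      # low-confidence labels count less
    weights = weights / weights.mean()  # keep the overall loss scale unchanged
    return float(-(weights * tok_lp).mean())

# Two tokens: the model trusts the first label (p=0.7) more than the
# second, error-like one (p=0.1, clipped to the floor of 0.2).
log_probs = np.log(np.array([[0.7, 0.3],
                             [0.1, 0.9]]))
targets = np.array([0, 1])
print(round(weighted_nll(log_probs, targets), 3))  # → 0.215
```

The floor keeps suspect tokens from being ignored entirely, so genuinely hard (but correct) labels still contribute some gradient.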
http://arxiv.org/abs/2406.18094v1
Compressor summary: The paper describes an approach to automatically generate "Brief Hospital Course" and "Discharge Instructions" sections for EHRs using LoRA fine-tuning on ClinicalT5-large, achieving a ROUGE-1 score of 0.394.
http://arxiv.org/abs/2406.18088v1
Compressor summary: The study introduces a new multimodal opinion expression identification task, using text and speech inputs, and proposes an LLM-driven method that significantly improves performance.
http://arxiv.org/abs/2406.18085v1
Compressor summary: The paper proposes global and local knowledge constraints for multilingual knowledge graph completion, improving the performance on Hits@1 and Hits@10 by over 12% and 16%.
http://arxiv.org/abs/2406.18082v1
Compressor summary: The paper introduces an on-device Planner-Action framework for AI agents that separates planning (Phi-3 Mini) from action execution (the Octopus model), using model fine-tuning instead of in-context learning and multi-LoRA training to reduce costs, improve response times, and handle multi-domain queries.
http://arxiv.org/abs/2406.18079v1
Compressor summary: MFDNet is a lightweight network that uses Laplacian Pyramid to decompose images into low and high-frequency bands for flare removal while preserving image quality.
http://arxiv.org/abs/2406.18078v1
Compressor summary: The paper proposes a self-training framework with a pseudo-label scorer for aspect-based sentiment analysis, which improves performance by filtering out mismatches and using a human-annotated comparison dataset.
http://arxiv.org/abs/2406.18074v1
Compressor summary: DSPNet is a novel few-shot semantic segmentation method for medical imaging that constructs high-fidelity prototypes for object foreground and background using multi-modal clustering and channel-aware regulation.
http://arxiv.org/abs/2406.18070v1
Compressor summary: The authors present EgoVideo, a novel egocentric foundation model for various tasks in the Ego4D and EPIC-Kitchens challenges, showcasing its versatility and effectiveness.
http://arxiv.org/abs/2406.18068v1
Compressor summary: The paper presents a method to generate realistic co-speech facial expressions and upper-body gestures for digital characters using RGB video data and multimodal learning.
http://arxiv.org/abs/2406.18067v1
Compressor summary: MEJEM is a new dialect identification model that uses energy margin loss to detect out-of-distribution data better than existing methods.
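Not from the paper: the MEJEM summary mentions energy-based out-of-distribution detection, so here is a minimal sketch of the standard energy score often used in that line of work, E(x) = -T * logsumexp(logits / T), where lower energy typically indicates in-distribution inputs (this is the generic formulation, not MEJEM's margin loss).

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Generic energy score for OOD detection: E(x) = -T * logsumexp(logits / T).
    Lower (more negative) energy typically indicates in-distribution inputs."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max()  # stabilize the log-sum-exp
    return -temperature * (m + np.log(np.exp(z - m).sum()))

# a confident (peaked) prediction gets lower energy than a flat one
assert energy_score([10.0, 0.0, 0.0]) < energy_score([1.0, 1.0, 1.0])
```

A margin-based training loss of the kind the summary describes would push in-distribution energies below one threshold and out-of-distribution energies above another.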
http://arxiv.org/abs/2406.18066v1
Compressor summary: The paper presents a method to learn how to filter nonlinear dynamical systems using variational inference, which can improve existing filtering techniques like the ensemble Kalman filter.
http://arxiv.org/abs/2406.18064v1
Compressor summary: vRAG-Eval is a system that grades answer quality in RAG applications using binary scores, and GPT-4's grades show promising alignment with those of human experts.
http://arxiv.org/abs/2406.18060v1
Compressor summary: The paper introduces AdaZeta, a framework to improve the performance, convergence, and memory efficiency of fine-tuning large language models using zeroth-order methods.
http://arxiv.org/abs/2406.18053v1
Compressor summary: BrHPO is a new HRL algorithm that allows both levels to communicate and correct errors, improving subgoal reachability and performance on long-horizon tasks.
http://arxiv.org/abs/2406.18051v1
Compressor summary: ViT-1.58b is a novel, low-precision quantized ViT model that achieves comparable performance to full-precision models while reducing memory and computational costs for resource-constrained environments.
http://arxiv.org/abs/2406.18050v1
Compressor summary: MGNet predicts pedestrian trajectories by forecasting intermediate goals using a CVAE, attention module, and goal evaluator, improving safety and efficiency in applications like autonomous vehicles.
http://arxiv.org/abs/2406.18048v1
Compressor summary: The paper introduces ScanFormer, an efficient image-text alignment model that iteratively extracts linguistically-relevant visual patches from images using informativeness prediction and patch selection.
http://arxiv.org/abs/2406.18045v1
Compressor summary: PharmGPT is a specialized LLM suite that outperforms general models on bio-pharmaceutical and chemical tasks, opening new possibilities for NLP in these domains.
http://arxiv.org/abs/2406.18043v1
Compressor summary: GenRL is a multimodal framework that connects vision-language models with generative world models for reinforcement learning, enabling generalist embodied agents to learn diverse tasks in different domains.
http://arxiv.org/abs/2406.18038v1
Compressor summary: The MT2ST framework combines the benefits of multi-task and single-task learning in word embedding training, reducing overfitting and training time significantly.
http://arxiv.org/abs/2406.18037v1
Compressor summary: This paper proposes SMG-Learning, a novel training paradigm for deep networks to improve sequential learning from different medical image sites by using Parallel Gradient Alignment and Site-Modulated Diffusion techniques.
http://arxiv.org/abs/2406.18034v1
Compressor summary: The paper introduces DoctorFLAN, a Chinese medical dataset for tuning LLMs to be effective medical assistants that collaborate with doctors, built from a survey of doctors' needs.
http://arxiv.org/abs/2406.18035v1
Compressor summary: The paper introduces a new concept called "local linear recovery" to analyze how deep neural network models can reliably recover target functions at overparameterization.
http://arxiv.org/abs/2406.18033v1
Compressor summary: Soft Q-learning uses value function estimates to derive optimal value bounds and improve training performance.
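Not from the paper: a minimal sketch of the standard soft (entropy-regularized) value identity behind soft Q-learning, V(s) = alpha * logsumexp(Q(s,·)/alpha), which is sandwiched between the greedy value and the greedy value plus an entropy bonus; the paper's specific bounds may differ.

```python
import numpy as np

def soft_value(q, alpha=1.0):
    """Soft (entropy-regularized) state value: V(s) = alpha * logsumexp(Q(s,.) / alpha)."""
    q = np.asarray(q, dtype=float)
    m = q.max()  # stabilize the log-sum-exp
    return m + alpha * np.log(np.exp((q - m) / alpha).sum())

q = np.array([1.0, 2.0, 0.5])
v = soft_value(q, alpha=0.5)
# the soft value lies between the greedy value and an entropy-bonus upper bound:
assert q.max() <= v <= q.max() + 0.5 * np.log(len(q))
```

Bounds of this shape let value-function estimates bracket the optimal value, which is the general flavor of result the summary alludes to.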
http://arxiv.org/abs/2406.18031v1
Compressor summary: The article presents a structure flow field method that uses PDEs and predictor-update algorithms to generate real-time high-speed motion information for robotic devices and autonomous vehicles using images and depth measurements.
http://arxiv.org/abs/2406.18027v1
Compressor summary: The proposed framework uses in-context learning to align internal and external knowledge, improving the accuracy of lung lesion information extraction from text using a two-stage approach.
http://arxiv.org/abs/2406.18022v1
Compressor summary: The paper proposes an automated machine learning method for selecting the best Off-Policy Evaluation (OPE) estimator based on synthetic tasks.
http://arxiv.org/abs/2406.18020v1
Compressor summary: MolFusion is a method for combining molecular representations to predict drug properties better by aligning molecular-level and atomic-level information.
http://arxiv.org/abs/2406.18012v1
Compressor summary: The paper introduces OmniAD, a novel network for unsupervised anomaly detection between imperfectly aligned images of infrastructure scenes using refined reverse distillation and new data augmentation strategies.
http://arxiv.org/abs/2406.18011v1
Compressor summary: The Expressive Keypoints method, with Skeleton Transformation and Instance Pooling, improves skeleton-based action recognition by capturing subtle human actions through fine-grained joint details.
http://arxiv.org/abs/2406.18002v1
Compressor summary: The paper proposes an algorithm that uses limited supervision from a large LLM to improve a small LLM's (sLLM's) generative quality by adaptively trusting or ignoring the LLM's predictions based on the sLLM's confidence.
http://arxiv.org/abs/2406.17998v1
Compressor summary: This paper proposes a generative model that simulates changes over time using a probabilistic graphical model, enabling cheap and automatic data generation for training deep vision models in Earth's surface dynamics.
http://arxiv.org/abs/2406.17992v1
Compressor summary: DELD is a method that detects evolving disinformation generated by large language models using pre-trained language models and soft prompts.
http://arxiv.org/abs/2406.17990v1
Compressor summary: Explicit diversity conditions improve question answering system accuracy by generating more diverse and relevant synthetic data, especially in low-resource domains.
http://arxiv.org/abs/2406.17989v1
Compressor summary: This paper studies the learnability and benefits of MLP layers with sparse activations, which are common in neural network architectures but harder to optimize.
http://arxiv.org/abs/2406.17988v1
Compressor summary: DICE is an end-to-end method that uses a Transformer-based architecture to recover 3D hand-face interactions from a single image, achieving state-of-the-art performance and interactive speed.
http://arxiv.org/abs/2406.17987v1
Compressor summary: Cora is a neuro-symbolic AI platform developed by Elemental Cognition that enhances natural language understanding and reasoning for high-stakes domains, outperforming pure LLM or RAG approaches.