This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-23, generated by Compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2407.15850v1
Compressor summary: The paper presents AutoAD-Zero, a method to generate audio descriptions for movies and TV series using visual and textual cues without training, which outperforms some fine-tuned models.
http://arxiv.org/abs/2407.15848v1
Compressor summary: The paper introduces BoostMVSNeRFs, a method to enhance MVS-based NeRF rendering quality in large-scale scenes by selecting and combining multiple cost volumes during volume rendering without additional training.
http://arxiv.org/abs/2407.15845v1
Compressor summary: The paper proposes a novel method for reconstructing data from trained classifiers in realistic settings, enabling data reconstruction beyond visual data and improving privacy risk awareness.
http://arxiv.org/abs/2407.15843v1
Compressor summary: The paper proposes object-centric slot representations in Bird's eye view for self-driving, which outperform other approaches in driving tasks and forecasting future scenes.
http://arxiv.org/abs/2407.15844v1
Compressor summary: The paper proposes an end-to-end method for predicting 3D hand meshes from RGB images, preserving contextual and scale information, and shows its effectiveness in experiments.
http://arxiv.org/abs/2407.15842v1
Compressor summary: Artist is a training-free method that controls content and style generation in diffusion models for text-driven stylization by separating denoising processes and suppressing irrelevant content.
http://arxiv.org/abs/2407.15841v1
Compressor summary: SF-LLaVA is a training-free video large language model that combines spatial and temporal features from video frames using a SlowFast design, achieving high performance on various video tasks.
http://arxiv.org/abs/2407.15838v1
Compressor summary: MMInstruct is a diverse visual instruction tuning dataset that improves the performance of Vision Large Language Models by addressing existing limitations in instruction quality and diversity.
http://arxiv.org/abs/2407.15837v1
Compressor summary: Latent MIM combines masked image modeling (MIM) with latent space reconstruction to learn high-level visual representations from unlabeled images, addressing challenges like representation collapsing and region correlation.
http://arxiv.org/abs/2407.15835v1
Compressor summary: The paper introduces dMel, a simple speech tokenization method that outperforms existing methods on speech recognition and synthesis tasks.
http://arxiv.org/abs/2407.15828v1
Compressor summary: This study introduces a new, large-scale, spontaneous, and acoustically clean spoken dialogue corpus for human-AI interactions called J-CHAT and demonstrates its effectiveness in improving dialogue generation models.
http://arxiv.org/abs/2407.15820v1
Compressor summary: This paper explores how different discount factors affect reinforcement learning performance and suggests using shorter horizons for partially observable environments.
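The horizon intuition behind this summary can be sketched in a few lines (an illustrative toy, not the paper's experiments): the discount factor gamma implicitly sets an effective planning horizon of roughly 1/(1-gamma) steps, so a smaller gamma down-weights distant rewards, which can help when late observations are unreliable.

```python
def discounted_return(rewards, gamma):
    """Discounted return G = sum_t gamma**t * r_t.
    Smaller gamma means a shorter effective horizon (~1/(1-gamma) steps),
    so late rewards contribute less -- the knob the paper studies."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [1.0] * 4
print(discounted_return(rewards, 0.5))  # 1 + 0.5 + 0.25 + 0.125 = 1.875
print(discounted_return(rewards, 0.0))  # only the immediate reward: 1.0
```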
http://arxiv.org/abs/2407.15819v1
Compressor summary: Chain-of-Sight is a module that accelerates MLLMs' pre-training by efficiently using visual details and extending visual tokens, reducing training time without sacrificing performance.
http://arxiv.org/abs/2407.15816v1
Compressor summary: The authors propose a multi-task algorithm for predicting multiple DNA alterations from H&E images, which could help prioritize samples for molecular testing and improve the detection of rare mutations.
http://arxiv.org/abs/2407.15814v1
Compressor summary: The paper investigates how well language models can interpret uncertainty expressions and map them to probabilities, finding that most models perform similarly to humans but are more biased by prior knowledge.
http://arxiv.org/abs/2407.15811v1
Compressor summary: The authors propose a low-cost method for training text-to-image generative models using deferred masking, synthetic images, and improved transformer architecture, achieving competitive results with much lower computational and financial costs than existing approaches.
http://arxiv.org/abs/2407.15810v1
Compressor summary: The authors introduce a new face dataset for adversarial audits and robust FRS training, highlight disparities in gender prediction accuracy across Global North and South, and propose low-resource bias mitigation techniques using few-shot and contrastive learning.
http://arxiv.org/abs/2407.15806v1
Compressor summary: The paper introduces FSboard, the largest fingerspelling recognition dataset for American Sign Language, collected from Deaf signers using mobile cameras, and presents a baseline model achieving 11.1% CER.
http://arxiv.org/abs/2407.15798v1
Compressor summary: The EMC framework generates contextually appropriate and diverse listener facial reactions from the speaker's multimodal behaviour and emotional context, while remaining robust to missing modalities.
http://arxiv.org/abs/2407.15795v1
Compressor summary: AdaCLIP is a vision-language model that uses learnable prompts to detect anomalies in images from unseen categories, achieving better performance than other methods.
http://arxiv.org/abs/2407.15793v1
Compressor summary: The paper proposes a novel approach called Continual Generative training for Incremental prompt-Learning that uses generative replay to adapt Vision-Language Models to new tasks while preserving zero-shot capabilities.
http://arxiv.org/abs/2407.15792v1
Compressor summary: The paper proposes an algorithm that estimates the means of mixtures with outliers, achieving optimal error guarantees with minimal overhead and leveraging mixture structure when possible.
http://arxiv.org/abs/2407.15788v1
Compressor summary: The paper introduces a system using Large Language Models to extract company tickers, sentiment analysis, and summaries from raw financial news without relying on pre-structured data feeds.
http://arxiv.org/abs/2407.15787v1
Compressor summary: The authors propose an unsupervised learning method to synthesize mastoidectomy volumes from preoperative CT scans for cochlear implant intraoperative navigation.
http://arxiv.org/abs/2407.15786v1
Compressor summary: LICORICE is a novel RL algorithm that learns interpretable policies using few human labels and concept ensembles.
http://arxiv.org/abs/2407.15780v1
Compressor summary: The paper studies how hard it is to explain different ML models with transparent mechanisms, focusing on abductive and contrastive problems.
http://arxiv.org/abs/2407.15773v1
Compressor summary: The paper proposes a new test-time adaptation method called STAMP, which uses a stable memory bank to improve recognition and outlier rejection for both known and unknown classes during inference.
http://arxiv.org/abs/2407.15763v1
Compressor summary: The paper proposes a method for open-world object detection that uses self-supervised learning and virtual outlier synthesis to detect anomalous objects without relying on class labels, achieving state-of-the-art results on various datasets.
http://arxiv.org/abs/2407.15762v1
Compressor summary: CLP is a general framework for finetuning language models on multiple objectives, enabling them to learn from feedback and balance conflicting goals like creativity and safety.
http://arxiv.org/abs/2407.15756v1
Compressor summary: Model editing helps deep learning models generalize better to distribution shifts in uranium ore concentrate classification.
http://arxiv.org/abs/2407.15754v1
Compressor summary: LongVideoBench is a new question-answering benchmark for multimodal models that tests their ability to understand and reason over long videos with subtitles, posing challenges even for advanced proprietary models.
http://arxiv.org/abs/2407.15738v1
Compressor summary: UGS and LDS are proposed methods that optimize the mini-batch sampling process to improve model accuracy and reduce training time in DDL systems affected by non-IID data and straggler effects.
http://arxiv.org/abs/2407.15736v1
Compressor summary: OMoS-QA is a dataset for immigration counseling that contains questions, documents, and answers in German and English, used to compare five pretrained LLMs on extractive question answering, finding favorable trade-offs between precision and recall.
http://arxiv.org/abs/2407.15734v1
Compressor summary: TaskGen is a framework that uses agents and tasks to solve various problems with high success rates, by managing information efficiently and reducing verbosity.
http://arxiv.org/abs/2407.15731v1
Compressor summary: The IIMM measures how much a vision-language model will learn or forget after fine-tuning and can help predict performance gains and losses.
http://arxiv.org/abs/2407.15724v1
Compressor summary: Maximizing dataset diversity measured by $A$ (generalized entropy) improves image classification performance in deep learning, especially for medical imaging.
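One concrete member of the generalized-entropy family can illustrate the diversity idea (an assumption for illustration; the paper's exact measure $A$ may be defined differently): Shannon entropy over the label distribution is maximal when classes are balanced.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy of the label distribution -- one member of the
    generalized-entropy family; the paper's exact measure A may differ."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

balanced = ["a", "b", "c", "d"] * 25   # uniform over 4 classes
skewed = ["a"] * 97 + ["b", "c", "d"]  # heavily imbalanced
assert shannon_entropy(balanced) > shannon_entropy(skewed)
```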
http://arxiv.org/abs/2407.15723v1
Compressor summary: The authors propose a new approach to generate floorplans with numerical constraints using data structures and evaluate it with a new dataset and benchmarks.
http://arxiv.org/abs/2407.15720v1
Compressor summary: The study investigates how large language models perform on various composite tasks, finding that they handle simpler tasks well but struggle with complex ones that require multiple steps of reasoning.
http://arxiv.org/abs/2407.15719v1
Compressor summary: The authors propose GFE-Mamba, a classifier that uses generative feature extraction to integrate multimodal data from assessment scales, MRI, and PET for accurate prediction of Alzheimer's disease progression in patients with mild cognitive impairment.
http://arxiv.org/abs/2407.15711v1
Compressor summary: AssistantBench is a benchmark for testing language agents on complex web tasks, showing their limitations and introducing a new agent called SeePlanAct that performs better than others.
http://arxiv.org/abs/2407.15708v1
Compressor summary: Swin Spikeformer is a novel model for dynamic scene reconstruction from spike streams that uses shifted window self-attention and temporal spike attention to achieve state-of-the-art performance in high-speed imaging with spike cameras.
http://arxiv.org/abs/2407.15707v1
Compressor summary: The paper proposes a meta-tracker that predicts the best visual tracker for a given video sequence based on initial frames and outperforms existing trackers on various benchmarks.
http://arxiv.org/abs/2407.15703v1
Compressor summary: The paper proposes a Transformer+Denoising Diffusion model that estimates probability density distribution for regression problems in science, using astronomical observations as an example.
http://arxiv.org/abs/2407.15694v1
Compressor summary: The paper investigates AI-generated text detection for Hindi, evaluating 26 LLMs, introducing a new dataset ($AG_{hi}$), and proposing a detectability index ($ADI_{hi}$).
http://arxiv.org/abs/2407.15680v1
Compressor summary: HaloQuest is a new dataset for testing vision-language models on multimodal hallucination using synthetic images and real ones, helping to improve their reliability and performance.
http://arxiv.org/abs/2407.15671v1
Compressor summary: The paper argues for more attention to philosophical aspects of AI technology and its use, and criticizes common theories of knowledge that influence current AI practices.
http://arxiv.org/abs/2407.15668v1
Compressor summary: SLVideo is a software that recognizes both hand and facial signs in Sign Language videos, enabling searching and creating a thesaurus of similar videos.
http://arxiv.org/abs/2407.15648v1
Compressor summary: The paper proposes a class-agnostic tree-transformer framework that predicts sequential assembly actions for 3D objects with primitive bricks from multi-view images using synthetic-to-real transfer learning and action-to-silhouette projection.
http://arxiv.org/abs/2407.15646v1
Compressor summary: The study evaluates the effect of Gaussian blur on image sharpness and object detection performance in automotive simulation using Virtual KITTI dataset.
http://arxiv.org/abs/2407.15645v1
Compressor summary: Psychometric alignment is a metric that measures how well language models reflect human knowledge distribution and can be used to improve their performance in various domains.
http://arxiv.org/abs/2407.15642v1
Compressor summary: Cinemo is a novel image animation method that improves motion controllability, temporal consistency, and smoothness by learning motion residuals, using a structural similarity index, and refining noise with discrete cosine transformation.
http://arxiv.org/abs/2407.15626v1
Compressor summary: The paper proposes a Reinforcement Learning framework for Visual Odometry, which adapts dynamically based on real-time conditions and reduces reliance on heuristic design choices.
http://arxiv.org/abs/2407.15621v1
Compressor summary: RadioRAG is an end-to-end framework that uses real-time online data to improve the accuracy of LLMs in answering radiology-specific questions, showing consistent improvements across various models.
http://arxiv.org/abs/2407.15613v1
Compressor summary: The paper proposes a network to extract and align multi-view semantic concepts from documents and images for better document-based zero-shot learning.
http://arxiv.org/abs/2407.15612v1
Compressor summary: The paper proposes using GPT-4 to automatically identify moves in written discourse by creating prompts informed by linguistic expertise.
http://arxiv.org/abs/2407.15611v1
Compressor summary: The paper introduces DMC, a feature selection method for high-dimensional small datasets that considers both feature values and the distribution of observations in the response variable, and uses GAwAR to find the best combination of features for binary classification.
http://arxiv.org/abs/2407.15608v1
Compressor summary: StylusAI is a new architecture that uses diffusion models to blend handwriting styles between English and German, improving legibility and diversity while outperforming existing models on two datasets.
http://arxiv.org/abs/2407.15605v1
Compressor summary: The paper evaluates how perspective changes affect foundation models' performance in recognizing fine-grained human activities, comparing different architectures and strategies for handling temporal information.
http://arxiv.org/abs/2407.15595v1
Compressor summary: Discrete Flow Matching is a novel generative model for discrete data that improves quality and efficiency compared to previous methods.
http://arxiv.org/abs/2407.15593v1
Compressor summary: The paper proposes a data-driven active localization method with viewpoint selection and self-supervised training that outperforms existing methods and can be integrated into real-world robotics applications, along with an open-source implementation.
http://arxiv.org/abs/2407.15590v1
Compressor summary: UMBEnet is a brain-like network that processes human emotions using multiple sensory modalities, achieving state-of-the-art results in facial expression recognition.
http://arxiv.org/abs/2407.15589v1
Compressor summary: This paper studies object-centric representations for visual question answering and compares them with foundation models on synthetic and real data.
http://arxiv.org/abs/2407.15588v1
Compressor summary: ERAlign is an unsupervised cross-lingual entity alignment framework that uses semantic textual features and a verification process to achieve near-perfect alignment despite noisy data.
http://arxiv.org/abs/2407.15580v1
Compressor summary: aMCL is a learning method that combines simulated annealing with MCL to improve prediction diversity and avoid suboptimal local minima in ambiguous tasks.
http://arxiv.org/abs/2407.15569v1
Compressor summary: The paper explores how RAFT, a method that combines chain-of-thought with supervised fine-tuning and retrieval augmented generation, improves the performance and reasoning abilities of generative dialogue models across various tasks and languages.
http://arxiv.org/abs/2407.15566v1
Compressor summary: HAP-VR is a framework to improve video retrieval by addressing challenges in similarity measure and loss estimation, achieving better performance than existing methods on benchmark datasets.
http://arxiv.org/abs/2407.15556v1
Compressor summary: The paper introduces SETTP, a method for effective style transfer in low-resource scenarios, which learns and transfers source style prompts and uses instance-level prompts to reduce semantic bias.
http://arxiv.org/abs/2407.15554v1
Compressor summary: DNMap is a storage-efficient method for large-scale 3D neural mapping that uses discrete representations, component vectors, and low-resolution continuous embeddings.
http://arxiv.org/abs/2407.15549v1
Compressor summary: The text discusses how targeted latent adversarial training (LAT) can improve the robustness of large language models (LLMs) to various undesirable behaviors, such as jailbreaking, backdoors, and unlearning specific knowledge.
http://arxiv.org/abs/2407.15545v1
Compressor summary: The paper proposes a memory-efficient method for neural network training by saving output tensors instead of input tensors in pointwise nonlinearity layers, which improves performance without sacrificing accuracy.
http://arxiv.org/abs/2407.15540v1
Compressor summary: The authors propose a lightweight auto-encoder network that compresses 3D maps while preserving descriptor matching performance for camera relocalization.
http://arxiv.org/abs/2407.15537v1
Compressor summary: EPO is a new method for Constrained Reinforcement Learning that uses adaptive penalties generated by a Penalty Metric Network to balance policy performance and constraint satisfaction efficiently.
http://arxiv.org/abs/2407.15531v1
Compressor summary: The paper proposes a new double deep learning-based architecture that efficiently codes event data from event cameras and performs classification using point cloud representations, achieving similar performance to original data even with lossy compression.
http://arxiv.org/abs/2407.15527v1
Compressor summary: The paper proposes a new deep learning model that incorporates human-understandable concepts and enables users to verify its decision-making process before deployment.
http://arxiv.org/abs/2407.15526v1
Compressor summary: The paper introduces Knowledge Recycling, a pipeline that optimizes synthetic data generation and use for training classifiers, improving their quality, usefulness, and privacy properties.
http://arxiv.org/abs/2407.15525v1
Compressor summary: The paper presents a framework for efficient importance sampling of mini-batch samples for gradient estimation from multiple distributions, which adapts to noisy gradients and vector-valued gradients, leading to faster training convergence.
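The core importance-sampling trick the summary refers to can be sketched generically (a sketch only; the paper's scheme adapts the sampling distribution to noisy, vector-valued gradients across multiple distributions): sample indices non-uniformly, then reweight each draw so the estimator stays unbiased.

```python
import random

def importance_weighted_mean(values, probs, k, seed=0):
    """Importance-sampling estimate of mean(values): draw k indices
    with probability probs[i], reweighting each draw by 1/(n*probs[i])
    to keep the estimate unbiased. Illustrative stand-in for the
    paper's gradient estimator."""
    rng = random.Random(seed)
    n = len(values)
    idx = rng.choices(range(n), weights=probs, k=k)
    return sum(values[i] / (n * probs[i]) for i in idx) / k

# With all mass on the single nonzero entry, every draw recovers the mean.
assert importance_weighted_mean([6.0, 0.0, 0.0], [1.0, 0.0, 0.0], k=8) == 2.0
```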
http://arxiv.org/abs/2407.15516v1
Compressor summary: The study explores how dropping MLP and attention layers from LLMs at inference time can speed up their performance with minimal loss in accuracy.
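Why dropping layers is even possible is easiest to see with the residual stream: each sublayer only adds to its input, so skipping one leaves the stream intact. A toy sketch (hypothetical blocks, not real LLM sublayers) of the mechanism:

```python
def run_blocks(x, blocks, drop=()):
    """Run a stack of residual blocks, skipping the indices in `drop`.
    Because each block adds to a residual stream, a skipped block is
    simply the identity -- a toy version of the inference-time
    layer dropping the study measures on real LLMs."""
    for i, block in enumerate(blocks):
        if i in drop:
            continue  # dropped layer: identity on the residual stream
        x = x + block(x)
    return x

blocks = [lambda _: 1, lambda _: 10, lambda _: 100]  # toy "sublayers"
print(run_blocks(0, blocks))             # 0 + 1 + 10 + 100 = 111
print(run_blocks(0, blocks, drop={1}))   # block 1 skipped: 0 + 1 + 100 = 101
```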
http://arxiv.org/abs/2407.15512v1
Compressor summary: The paper proposes two new methods for handling missing data in multi-sensor machine learning models for Earth observation and evaluates their effectiveness using experiments on three datasets.
http://arxiv.org/abs/2407.15510v1
Compressor summary: The text discusses abstraction as a key element for AI, introduces the concept of anti-unification, and proposes an algebraic approach to this process based on recent applications in similarity and analogy.
http://arxiv.org/abs/2407.15508v1
Compressor summary: Quantized Large Language Models (LLMs) can achieve excellent performance while reducing resource consumption using innovative methods based on Learnable Singular-value Increment (LSI).
http://arxiv.org/abs/2407.15507v1
Compressor summary: The paper presents a new approach for high-resolution image generation with diffusion models that improves computational efficiency and image quality by shifting non-overlapping windows over time instead of generating and averaging multiple overlapping predictions.
http://arxiv.org/abs/2407.15504v1
Compressor summary: The paper proposes a framework for compressing prompts for large language models, shows that query-aware compression is crucial, and introduces a new method to improve the performance of existing schemes.
http://arxiv.org/abs/2407.15502v1
Compressor summary: The paper introduces Web Rendering Parameters Generation (WebRPG), a new task that automates web page visualization based on their HTML code, and develops a dataset, baseline models, and evaluation methods for it.
http://arxiv.org/abs/2407.15500v1
Compressor summary: TextureCrop improves synthetic image detection accuracy by focusing on high-frequency parts of images where generation artifacts are more common, outperforming center cropping and resizing methods in detecting harmful AI-generated content.
http://arxiv.org/abs/2407.15498v1
Compressor summary: The paper proposes a corpus refining strategy for Chinese Spelling Correction that combines data from two augmentation methods and filters noisy data to improve accuracy and calibration.
http://arxiv.org/abs/2407.15489v1
Compressor summary: This paper compares multilingual pretraining objectives in a controlled way, finding that the model architecture and multilingual translation are key factors for success.
http://arxiv.org/abs/2407.15488v1
Compressor summary: The paper introduces DiffX, a novel diffusion model that generates cross-modal images (RGB+X) guided by layouts and text descriptions using a modality-shared latent space and gated cross-attention.
http://arxiv.org/abs/2407.15487v1
Compressor summary: This paper studies why vision-language models struggle with compositional image understanding and proposes a method, in-context learning, to improve their performance on complex reasoning tasks.
http://arxiv.org/abs/2407.15481v1
Compressor summary: The proposed method adjusts foreground illumination in composite images using ground-truth reflectance guidance and diverse reflectance generation, producing multiple harmonized results.
http://arxiv.org/abs/2407.15479v1
Compressor summary: The paper explores how to use pre-trained networks for recognizing object affordances without modifying them and tests two methods that achieve high accuracy.
http://arxiv.org/abs/2407.15476v1
Compressor summary: The paper proposes a multi-objective deep reinforcement learning framework for traffic allocation in e-commerce platforms that balances multiple objectives and handles long-term value and cold start issues.
http://arxiv.org/abs/2407.15472v1
Compressor summary: The paper presents a novel approach for multispectral image classification that learns discriminant, illumination-robust features directly from raw images using three techniques (raw spectral constancy, MSFA-preserving transformations, and raw-mixing), improving classification performance and reducing computational cost compared to existing methods.
http://arxiv.org/abs/2407.15459v1
Compressor summary: The Text-to-Battery Recipe (T2BR) protocol uses natural language processing to automatically extract recipes for LiFePO4 batteries from research papers, providing valuable insights and accelerating innovation.
http://arxiv.org/abs/2407.15452v1
Compressor summary: GraphScale improves scalability and efficiency of graph neural networks and node representation learning on large graphs by separating data storage and computation in a distributed setting.
http://arxiv.org/abs/2407.15447v1
Compressor summary: SIGMA is a new method for pretraining videos that uses optimal transport to generate semantic and temporal features, leading to better video representations than existing methods.
http://arxiv.org/abs/2407.15446v1
Compressor summary: The paper proposes a method to realistically insert humans into various scenes using semantic masks and subject-conditioned inpainting, achieving high realism and preserving background and identity.
http://arxiv.org/abs/2407.15441v1
Compressor summary: The paper presents a system to detect and fix hallucination errors in LLMs using NER, NLI, SBD, and a decision tree, with a rewriting mechanism that balances precision, speed, and cost.
http://arxiv.org/abs/2407.15439v1
Compressor summary: The paper studies how to fairly choose among options in situations where receiving feedback is delayed or correlated with rewards, such as crowdsourcing and online advertising.
http://arxiv.org/abs/2407.15435v1
Compressor summary: The paper proposes a method to use raw 3D models to enhance the visual quality and accuracy of 3D architectural scenes reconstructed using 3D Gaussian Splatting, a mainstream technology in the industry.
http://arxiv.org/abs/2407.15427v1
Compressor summary: The paper presents a novel end-to-end method using YOLOv5 and multiscale modules to improve PCB defect detection accuracy, generalization, and real-time performance, outperforming existing methods.
http://arxiv.org/abs/2407.15425v1
Compressor summary: The paper studies how much information large transformer models can memorize and generalize using common training algorithms and synthetic data, and proposes a model to design task-specific models with the optimal number of parameters.
http://arxiv.org/abs/2407.15421v1
Compressor summary: The paper studies how the recurrent neural network of Guez et al. (2019), trained with reinforcement learning, plans in Sokoban, finding that adding extra computation steps at test time improves its performance and reveals planning behavior.
http://arxiv.org/abs/2407.15420v1
Compressor summary: LocoTrack is a fast and accurate model for tracking any point across video sequences, using novel local 4D correlation and a lightweight encoder to overcome matching ambiguities.
http://arxiv.org/abs/2407.15415v1
Compressor summary: LLaST is a framework that uses large language models to improve speech-to-text translation systems by optimizing model architecture and using various techniques like multilingual data augmentation.
http://arxiv.org/abs/2407.15414v1
Compressor summary: This paper proposes a shuffling mechanism for Differentially Private Stochastic Gradient Descent (DPSGD) that enhances model utility without compromising privacy, using permutation invariance and an approximation on sum of lognormal distributions to analyze its performance.
http://arxiv.org/abs/2407.15408v1
Compressor summary: The authors propose a method (CAR) to evaluate and improve the temporal alignment of language and 3D human motion representations in motion-language latent spaces.
http://arxiv.org/abs/2407.15017v1
Compressor summary: This paper analyzes how large language models use, create, and change their knowledge, and identifies the challenges and opportunities for creating trustworthy artificial general intelligence.
http://arxiv.org/abs/2407.15399v1
Compressor summary: The text describes a new attack method on large language models that exploits human conversation strategies to extract harmful information by manipulating the nature of the provided responses.
http://arxiv.org/abs/2407.15396v1
Compressor summary: The paper proposes a novel framework for scene graph generation that considers the semantic diversity of predicates to improve unbiased predictions.
http://arxiv.org/abs/2407.15390v1
Compressor summary: ALLaM is a large Arabic language model that leverages knowledge transfer, vocabulary expansion, and human preferences to achieve state-of-the-art performance in various benchmarks.
http://arxiv.org/abs/2407.15375v1
Compressor summary: The paper introduces ESPADA, a comprehensive and flexible pronunciation dictionary for Spanish with over 628,000 entries and various annotations, to improve speech forced alignment and dialectal research in the Spanish language.
http://arxiv.org/abs/2407.15374v1
Compressor summary: The paper presents a new English linguistic corpus from Twitter posts collected from news agencies and individuals, with annotations and visualizations for studying language patterns.
http://arxiv.org/abs/2407.15369v1
Compressor summary: The SDD framework uses directional characteristics and sparse constraints to improve infrared small target detection, outperforming ten existing methods.
http://arxiv.org/abs/2407.15366v1
Compressor summary: Perspective-taking prompting (PeT) is a new method that helps large language models reduce toxicity and bias by encouraging them to consider different human perspectives and self-correct their responses.
http://arxiv.org/abs/2407.15362v1
Compressor summary: The text describes a new approach (mSTAR) that uses multimodal data from multiple sources, such as images and reports, to improve the performance of computational pathology models on various clinical tasks.
http://arxiv.org/abs/2407.15360v1
Compressor summary: The paper analyzes why transformer-based models struggle with integer multiplication and proposes improvements that enhance their performance and interpretability, outperforming LLMs such as GPT-4.
http://arxiv.org/abs/2407.15359v1
Compressor summary: The paper describes a hybrid method for generating discharge summaries using NER and GatorTronGPT, achieving 5th place in the "Discharge Me!" Challenge.
http://arxiv.org/abs/2407.15355v1
Compressor summary: The paper proposes ANR, a novel method that combines localized attention and MLP for efficient and accurate implicit neural representation, achieving improved reconstruction results on four datasets.
http://arxiv.org/abs/2407.15354v1
Compressor summary: The VectorFormer is a new camera-based 3D object detector that combines high-resolution vectors with low-resolution Bird's-Eye-View representations to improve 3D geometry detection efficiency and accuracy in multi-camera images.
http://arxiv.org/abs/2407.15353v1
Compressor summary: The paper proposes a customized retrieval augmented generation framework with domain-specific techniques for EDA tool documentation question-answering, and releases an evaluation benchmark ORD-QA.
http://arxiv.org/abs/2407.15352v1
Compressor summary: The authors introduce MAVEN-Fact, a large and high-quality event factuality detection dataset based on the MAVEN dataset, which helps improve understanding of textual events.
http://arxiv.org/abs/2407.15351v1
Compressor summary: The paper proposes using a Large Language Model as a Bayesian Inference module to improve Graph Neural Network interpretability and address learning bias in unsupervised models.
http://arxiv.org/abs/2407.15349v1
Compressor summary: RoadPainter is a new method for accurately detecting lane centerlines in road scenes using multi-view images and improving topological reasoning with additional points and an optional SD map module.
http://arxiv.org/abs/2407.15346v1
Compressor summary: DKA is a framework that improves KVQA by disentangling knowledge acquisition and using LLM feedback to generate simple sub-questions for accurate answers.
http://arxiv.org/abs/2407.15343v1
Compressor summary: Multi-prompt decoding improves text generation by using many candidates from a prompt bank and selecting the best one with Minimum Bayes Risk decoding.
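The MBR selection step named here is simple to sketch: score each candidate by its average utility against all the others and keep the "consensus" output. This toy uses token-overlap F1 as a stand-in utility (real systems would use BLEU, BERTScore, or similar).

```python
def token_f1(a, b):
    """Crude token-overlap F1 -- a stand-in for the real utility
    function (e.g. BLEU or BERTScore) MBR decoding would use."""
    ta, tb = set(a.split()), set(b.split())
    inter = len(ta & tb)
    if not inter:
        return 0.0
    p, r = inter / len(ta), inter / len(tb)
    return 2 * p * r / (p + r)

def mbr_select(candidates):
    """Minimum Bayes Risk selection: return the candidate with the
    highest average utility against every other candidate."""
    return max(candidates,
               key=lambda c: sum(token_f1(c, o) for o in candidates if o is not c))

cands = ["the cat sat", "the cat sat down", "a cat sat", "totally unrelated text"]
print(mbr_select(cands))  # the consensus-like candidate: "the cat sat"
```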
http://arxiv.org/abs/2407.15341v1
Compressor summary: The proposed CFICL method enhances sentiment recognition and improves DimABSA prediction accuracy using in-context learning and similarity-based example selection.
http://arxiv.org/abs/2407.15337v1
Compressor summary: The paper proposes a method to reconstruct 3D scenes from LWIR and RGB images using a multispectral radiance field, improving thermal super-resolution and object visibility.
http://arxiv.org/abs/2407.15328v1
Compressor summary: The paper proposes a novel framework for diffusion models that reduces data memorization risk using iterative ensemble training and anti-gradient control, improving performance on four datasets.
http://arxiv.org/abs/2407.15325v1
Compressor summary: ODYSSEY is a new framework for Minecraft agents that uses a large language model and an open-world skill library to explore the game world and solve various tasks.
http://arxiv.org/abs/2407.15317v1
Compressor summary: Open-CD is a comprehensive change detection toolbox with various methods, components, and analysis scripts that aims to facilitate research and collaboration in the field.
http://arxiv.org/abs/2407.15312v1
Compressor summary: The Fuzzy-guided Multi-granularity Deep Neural Network (FMDNN) is a novel approach to histopathological image classification that mimics the multi-granular diagnostic method of pathologists and uses fuzzy logic to handle redundant information, improving accuracy and interpretability.
http://arxiv.org/abs/2407.15302v1
Compressor summary: The authors improved the accuracy and reliability of infrared thermometers by integrating machine learning algorithms with infrared thermography, which can help diagnose infectious diseases like COVID-19.