This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-16, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2407.10973v1
Compressor summary: Key points:
- The paper introduces Make-An-Agent, a policy generator that uses conditional diffusion models to create policies from behavior embeddings
- The model is trained on policy network checkpoints and their trajectories
- It can generate versatile and scalable policies with few-shot demonstrations
- It works on various tasks, domains, and robots, including real-world locomotion
Summary: Make-An-Agent is a novel policy generator that creates versatile and scalable policies from few-shot demonstrations using conditional diffusion models and behavior embeddings.
http://arxiv.org/abs/2407.10972v1
Compressor summary: VGBench is a benchmark for testing Large Language Models' ability to understand and generate vector graphics in various formats and contexts.
http://arxiv.org/abs/2407.10971v1
Compressor summary: ValueWalk is a Bayesian IRL method that simplifies reward recovery by focusing on Q-values instead of rewards, making computation faster and enabling efficient sampling using Hamiltonian Monte Carlo.
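The relationship ValueWalk exploits — that rewards are determined by Q-values through the Bellman equation — can be sketched in a tabular setting. This is an illustrative sketch only; the shapes and the greedy-policy assumption are mine, not the paper's Bayesian sampler:

```python
import numpy as np

def reward_from_q(q, transitions, gamma=0.99):
    """Recover per-state-action rewards from Q-values via the inverse
    Bellman equation: r(s, a) = Q(s, a) - gamma * E_s'[max_a' Q(s', a')].

    q           : (S, A) array of Q-values
    transitions : (S, A, S) array of transition probabilities
    """
    v_next = q.max(axis=1)             # (S,) greedy next-state values
    expected_v = transitions @ v_next  # (S, A) expected value of the successor state
    return q - gamma * expected_v
```

Because this map from Q to r is cheap and closed-form, sampling in Q-space (e.g. with Hamiltonian Monte Carlo) avoids the expensive reward-to-Q planning step inside each sampler iteration.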
http://arxiv.org/abs/2407.10969v1
Compressor summary: Q-Sparse is an efficient method for training large language models that achieves comparable results while reducing inference time and costs, and it works well with 1-bit LLMs like BitNet b1.58.
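The activation sparsity at the heart of Q-Sparse can be illustrated with a plain top-K magnitude filter. A minimal sketch, not the paper's kernel — the per-row granularity and the choice of K are assumptions here:

```python
import numpy as np

def topk_sparsify(x, k):
    """Keep only the k largest-magnitude entries in each row, zeroing the
    rest, so downstream matmuls can skip the zeroed activations."""
    # indices of all but the k largest-|x| entries per row
    idx = np.argsort(np.abs(x), axis=-1)[..., :-k]
    out = x.copy()
    np.put_along_axis(out, idx, 0.0, axis=-1)
    return out
```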
http://arxiv.org/abs/2407.10967v1
Compressor summary: BECAUSE is an algorithm that uses causal representation for states and actions in offline reinforcement learning to reduce objective mismatch and improve performance.
http://arxiv.org/abs/2407.10964v1
Compressor summary: FUNGI is a simple method to enhance vision encoders' features using self-supervised gradients, leading to consistent performance improvements across various tasks and domains.
http://arxiv.org/abs/2407.10960v1
Compressor summary: FLUTE is a fast lookup table engine for large language models that uses weight-only quantization and offline restructuring to reduce memory movement, enabling faster inference with competitive quantization performance.
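The lookup-table dequantization idea behind FLUTE can be shown in a few lines. This is a generic sketch of LUT-based weight-only quantization; FLUTE's fused GPU kernel and offline weight restructuring are not reproduced, and the per-row scale layout is an assumption:

```python
import numpy as np

def lut_dequantize(codes, lut, scale):
    """Dequantize weight codes through a lookup table.

    codes : integer array of quantization indices, shape (rows, cols)
    lut   : (levels,) table mapping each code to a quantized value
    scale : (rows,) per-row scale factors
    """
    return lut[codes] * scale[:, None]
```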
http://arxiv.org/abs/2407.10958v1
Compressor summary: InVi is a method to insert or replace objects in videos using text-to-image models, addressing challenges of quality control, blending, and temporal coherence by employing a two-step process and extended-attention layers.
http://arxiv.org/abs/2407.10957v1
Compressor summary: The Ref-AVS task introduces a new way to segment objects in visual scenes using natural language expressions with audio and visual cues, and a new method is proposed that effectively utilizes these multimodal cues for precise segmentation.
http://arxiv.org/abs/2407.10956v1
Compressor summary: Spider2-V is a benchmark for testing multimodal agents' ability to automate data science and engineering tasks in real-world enterprise software systems.
http://arxiv.org/abs/2407.10953v1
Compressor summary: The paper introduces a multilingual dataset for mutual reinforcement effect research and a method to translate it using large language models, leading to better open-domain information extraction with a unified model.
http://arxiv.org/abs/2407.10949v1
Compressor summary: The text explains how researchers constructed a Transformer-based ELIZA chatbot to understand the mechanisms underlying naturalistic conversations.
http://arxiv.org/abs/2407.10947v1
Compressor summary: The paper proposes using text cues from image captions to improve audio guidance for segmenting sounding objects in visual scenes.
http://arxiv.org/abs/2407.10944v1
Compressor summary: The authors propose a scalable method for extracting naturalistic user feedback from chat data and show that it improves language model performance and alignment with human preferences.
http://arxiv.org/abs/2407.10937v1
Compressor summary: IDOL is a novel method for generating high-quality human-centric videos and depth maps using dual-modal U-Net and motion consistency losses.
http://arxiv.org/abs/2407.10935v1
Compressor summary: STARS is a self-supervised method for 3D action recognition that improves semantic clustering and generalization using encoder-decoder masked prediction and nearest-neighbor contrastive learning.
http://arxiv.org/abs/2407.10930v1
Compressor summary: The paper proposes a method to fine-tune NLP systems with multiple stages by optimizing both the language models and prompting strategies together, without gold labels for intermediate stages.
http://arxiv.org/abs/2407.10923v1
Compressor summary: The paper proposes a novel text-guided out-painting framework using a State-Space Model called Mamba to generate 360-degree images from narrow field of view images, improving visual continuity and context richness with two modules: VCR and GMA.
http://arxiv.org/abs/2407.10920v1
Compressor summary: CulturalVQA is a new benchmark for assessing vision-language models' cultural understanding across 11 countries, revealing disparities in their performance and areas of improvement.
http://arxiv.org/abs/2407.10918v1
Compressor summary: Key points:
- Object recognition systems can be fooled by adversarial perturbations
- Part-based models improve robustness but lack part annotations
- PIN++ provides part segmentation annotations for ImageNet-1K categories
- MPM is a multi-scale part-supervised model that learns from part annotations
- MPM outperforms baselines on adversarial, corruption, and out-of-distribution tests
Summary: The paper introduces PIN++, a dataset with part segmentation annotations for ImageNet-1K, and MPM, a multi-scale part-supervised model that improves object recognition robustness against various challenges.
http://arxiv.org/abs/2407.10916v1
Compressor summary: The paper introduces H2GB, a new graph benchmark that combines heterophily and heterogeneity, along with UnifiedGT and H2G-former, a model variant that excels at this challenge.
http://arxiv.org/abs/2407.10910v1
Compressor summary: DataDream is a framework that uses few-shot examples to generate realistic classification datasets, improving image classification accuracy with fewer real data examples.
http://arxiv.org/abs/2407.10902v1
Compressor summary: This research aims to create a system that can accurately identify hand gestures representing numbers using computer vision, machine learning, and OpenCV techniques to improve human interaction with technology.
http://arxiv.org/abs/2407.10878v1
Compressor summary: The authors use deep neural networks to identify drivers of natural gas demand and create a counterfactual scenario without the Russian-Ukrainian war to estimate its impact on German energy sectors.
http://arxiv.org/abs/2407.10876v1
Compressor summary: The paper proposes RepVF, a unified representation for concurrent multi-task 3D perception in autonomous driving, and RFTR, a network that exploits the connections between tasks using a hierarchical structure of queries, reducing computational redundancy and feature competition.
http://arxiv.org/abs/2407.10870v1
Compressor summary: GPT-4o is a large AI model that can understand hand gestures from ultrasound without much training, making it useful for various applications.
http://arxiv.org/abs/2407.10867v1
Compressor summary: The paper proposes a method to certify Graph Neural Networks against data poisoning and backdoor attacks by using the neural tangent kernel and a novel mixed-integer linear program.
http://arxiv.org/abs/2407.10862v1
Compressor summary: The paper proposes a new method, R3D-AD, for detecting anomalies in 3D parts using a diffusion model that obscures the defects and learns to correct them, as well as a simulation strategy to generate diverse defect shapes for better generalization.
http://arxiv.org/abs/2407.10860v1
Compressor summary: The Human-Centric Transformer is a method for recognizing actions in videos across different domains by focusing on human cues and their interactions with contexts.
http://arxiv.org/abs/2407.10855v1
Compressor summary: Weighted Grouped-Query Attention (WGQA) improves attention mechanisms in language models by introducing new learnable parameters for key and value heads, leading to better performance with minimal inference overhead.
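On my reading, the difference from plain grouped-query attention is that the head-to-group aggregation is learned rather than a fixed mean. A sketch under that assumption — shapes and the pre-normalized weights are illustrative, not WGQA's exact parameterization:

```python
import numpy as np

def group_kv_heads(kv, weights):
    """Aggregate key/value heads into groups.

    kv      : (heads, dim) per-head key or value projections
    weights : (groups, heads) aggregation weights; plain GQA uses uniform
              1/heads-per-group rows, WGQA would learn them (assumed
              softmax-normalized upstream)
    """
    return weights @ kv  # (groups, dim)
```

With uniform rows this reduces exactly to GQA's mean pooling, which is why the extra parameters add almost no inference overhead.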
http://arxiv.org/abs/2407.10853v1
Compressor summary: The paper presents a decision framework to help practitioners assess bias and fairness risks in large language models (LLMs) by defining various metrics for different types of risks and use cases.
http://arxiv.org/abs/2407.10844v1
Compressor summary: The paper proposes distribution-free techniques for uncertainty prediction in graph neural networks for molecular property prediction, especially relaxed energy calculations, and evaluates latent distance methods as a well-calibrated and economical approach.
http://arxiv.org/abs/2407.10839v1
Compressor summary: The paper proposes a Reward Model that estimates reward signals from few annotated samples and enables Offline Reinforcement Learning with many reward-free transitions.
http://arxiv.org/abs/2407.10836v1
Compressor summary: The study proposes a novel framework called DG-PINNs, which improves the efficiency of solving inverse problems in PDEs by pre-training with only data loss and then fine-tuning with a composite loss function.
http://arxiv.org/abs/2407.10835v1
Compressor summary: The paper compares three exploration methods in deep transfer learning for knowledge transfer, finding that upper confidence bound performs best on a virtual drone problem.
http://arxiv.org/abs/2407.10834v1
Compressor summary: MetaLLM is a framework that intelligently selects the best large language model for classification tasks, improving accuracy and cost-effectiveness.
http://arxiv.org/abs/2407.10831v1
Compressor summary: Event cameras can perceive 3D environments even in extreme conditions, and the proposed temporal event stereo framework uses previous time steps to improve stereo matching performance efficiently.
http://arxiv.org/abs/2407.10829v1
Compressor summary: BiasScanner is a tool that helps users identify and understand media bias in online news articles using a pre-trained language model and a browser plug-in.
http://arxiv.org/abs/2407.10827v1
Compressor summary: The study tracks how circuits in decoder-only large language models evolve during training and finds that their algorithms and components remain consistent across model size and pre-training stages.
http://arxiv.org/abs/2407.10825v1
Compressor summary: Clean-label backdoor attacks are stealthy ways to manipulate deep neural networks by poisoning a small set of target class samples, which can be done with limited information and pose a serious threat when using third-party datasets for training.
http://arxiv.org/abs/2407.10820v1
Compressor summary: The paper proposes a novel logic-based explainer for Monte Carlo tree search (MCTS) used in transportation routing services, which translates user requirements into rigorous logic specifications and provides human-readable explanations of the algorithm's operation.
http://arxiv.org/abs/2407.10817v1
Compressor summary: FLAMe, a family of large language models trained on diverse quality assessment tasks, outperforms proprietary LLMs like GPT-4 in various evaluation benchmarks and is less biased than LLM-as-a-Judge models.
http://arxiv.org/abs/2407.10814v1
Compressor summary: The paper proposes a multi-instance prompt learning framework that combines visual and textual prior knowledge with pre-trained models for few-shot pathology image analysis, improving diagnosis of key patterns.
http://arxiv.org/abs/2407.10810v1
Compressor summary: FabGPT is a multimodal model for IC fabrication that detects defects, analyzes causes, and answers questions on processes using SEM images and text.
http://arxiv.org/abs/2407.10807v1
Compressor summary: The paper proposes a novel natural language data stream classification method using convolutional deep networks to detect fake news from text data encoded as discrete digital signals.
http://arxiv.org/abs/2407.10806v1
Compressor summary: Set-Mixer is a novel network architecture for point cloud recognition that improves robustness to noise corruption by enhancing communication among points and preserving relative spatial information.
http://arxiv.org/abs/2407.10805v1
Compressor summary: Think-on-Graph 2.0 is a framework that uses a knowledge graph to guide language models for better information retrieval and reasoning, improving their accuracy and performance.
http://arxiv.org/abs/2407.10804v1
Compressor summary: Mix-CPT is a new domain adaptation framework for large language models that combines knowledge memorization, utilization, and format alignment with minimal data requirements.
http://arxiv.org/abs/2407.10803v1
Compressor summary: The article proposes a self-supervised learning method for pre-training visual autonomous driving agents that improves their performance compared to classification-based methods.
http://arxiv.org/abs/2407.10802v1
Compressor summary: Key points:
- Current optical flow and point-tracking methods rely on synthetic datasets
- Event cameras have advantages in challenging visual conditions
- A novel self-supervised loss combines contrast maximization with a non-linear motion prior
- The method improves performance in dense motion estimation and optical flow estimation
Summary: The paper proposes a new self-supervised loss that leverages event camera data and enhances performance in motion estimation and optical flow tasks.
http://arxiv.org/abs/2407.10795v1
Compressor summary: The paper introduces a new contrastive decoding method for improving large language models' reasoning performance on diverse languages by skipping some layers to avoid language mismatch.
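The general contrastive-decoding recipe — penalizing the final layer's distribution with an earlier layer's — can be sketched as follows. This shows the standard technique only; the paper's layer-skipping schedule for avoiding language mismatch is not reproduced:

```python
import numpy as np

def contrastive_logits(final_logits, early_logits, alpha=1.0):
    """Sharpen the final layer's prediction by subtracting an earlier
    ('amateur') layer's log-probabilities, down-weighting tokens the
    early layer is already confident about."""
    logp_final = final_logits - np.logaddexp.reduce(final_logits)
    logp_early = early_logits - np.logaddexp.reduce(early_logits)
    return logp_final - alpha * logp_early
```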
http://arxiv.org/abs/2407.10794v1
Compressor summary: Graphusion is a zero-shot knowledge graph construction framework that fuses global information from text, improving performance on link prediction and NLP tasks like QA.
http://arxiv.org/abs/2407.10793v1
Compressor summary: GraphEval is a framework to evaluate large language model responses using knowledge graph structures, which helps detect inconsistencies (hallucinations) and potentially correct them.
http://arxiv.org/abs/2407.10784v1
Compressor summary: AdapTable is a novel tabular test-time adaptation method that modifies output probabilities to handle distribution shifts, skewed entropy, latent space decision boundaries, confidence calibration issues, and model bias in tabular data.
http://arxiv.org/abs/2407.10780v1
Compressor summary: The text discusses how natural gradient descent, data decorrelation, and approximate methods for backpropagation can improve neural network training by considering local curvature and reducing correlations in data.
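One concrete instance of the data-decorrelation idea mentioned above is ZCA whitening; the paper's specific decorrelation rule may differ, so treat this as an illustration:

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """Decorrelate features with ZCA whitening: rotate into the covariance
    eigenbasis, rescale each direction to unit variance, rotate back."""
    xc = x - x.mean(axis=0)
    cov = xc.T @ xc / (len(x) - 1)
    vals, vecs = np.linalg.eigh(cov)
    w = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return xc @ w
```

After whitening, the feature covariance is (approximately) the identity, which is the condition under which plain gradient descent starts to resemble a natural-gradient update.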
http://arxiv.org/abs/2407.10779v1
Compressor summary: Automated decision-making systems using causal ML models face challenges in complex social environments, which affect their performance in various tasks, as illustrated by a real-world jobseekers dataset.
http://arxiv.org/abs/2407.10775v1
Compressor summary: The paper proposes a framework for constrained reinforcement learning using gradient-based primal-dual algorithms, introduces an exploration-agnostic algorithm called C-PG, and shows its effectiveness on constrained control problems.
http://arxiv.org/abs/2407.10768v1
Compressor summary: MSegRNN is a variant of SegRNN that uses Mamba structure and other enhancements to improve memory efficiency and performance in long-term time series forecasting.
http://arxiv.org/abs/2407.10762v1
Compressor summary: The authors propose a new augmentation method using Neural Radiance Fields to generate diverse synthetic images for improving 6D pose estimation on spacecraft poses.
http://arxiv.org/abs/2407.10761v1
Compressor summary: The paper proposes a physics-informed machine learning model that combines neural networks and physical laws for better smart manufacturing outcomes in laser metal deposition.
http://arxiv.org/abs/2407.10758v1
Compressor summary: The paper proposes a novel method that uses stochastic competition to create sparse task-specific representations in deep networks, enabling efficient continual learning on edge devices with limited resources.
http://arxiv.org/abs/2407.10756v1
Compressor summary: GTPT is a new method for efficient human pose estimation using Transformer that introduces keypoints coarsely and prunes redundant tokens, achieving high performance on COCO and COCO-WholeBody datasets.
http://arxiv.org/abs/2407.10754v1
Compressor summary: Swarms of drones using anomaly detection and adaptive sampling can effectively detect and track occluded targets in dense vegetation with high accuracy and low latency.
http://arxiv.org/abs/2407.10753v1
Compressor summary: OPEN is a novel multi-view 3D object detector that uses object-wise depth information to improve detection accuracy, achieving state-of-the-art performance on nuScenes.
http://arxiv.org/abs/2407.10749v1
Compressor summary: SEED is a 3D DETR method that addresses challenges in point cloud detection by using dual query selection and deformable grid attention modules, achieving state-of-the-art performance on Waymo and nuScenes datasets.
http://arxiv.org/abs/2407.10747v1
Compressor summary: The authors examine how large language models can be used for coding unstructured political texts using codebooks and propose instruction-tuning as a way to improve performance.
http://arxiv.org/abs/2407.10745v1
Compressor summary: Key points:
- Conspiracy theories and critical texts need different annotation schemes
- Inter-group conflict is important in oppositional narratives
- XAI-DisInfodemics corpus contains annotated COVID-19 Telegram messages
- NLP-based automatization can distinguish conspiracy vs. critical texts
Summary: The paper proposes a new annotation scheme for conspiracy theories and critical texts, uses a multilingual corpus of COVID-19 messages, and shows that NLP can detect inter-group conflict and violence in oppositional narratives.
http://arxiv.org/abs/2407.10738v1
Compressor summary: Key points:
- The paper proposes AccDiffusion, a method for patch-wise higher-resolution image generation without training
- It decouples the vanilla prompt into patch-content-aware prompts to avoid repeated object generation
- It introduces dilated sampling with window interaction for better global consistency
Summary: AccDiffusion is a new method that generates high-resolution images from low-resolution ones by using patch-content-aware prompts and dilated sampling, avoiding repeated objects and improving global consistency.
http://arxiv.org/abs/2407.10737v1
Compressor summary: Key points:
- The text presents Vi-ST, a model that uses a Vision Transformer and a spatiotemporal convolutional neural network to study the temporal features of visual coding in natural scenes
- The model performs well in generalization tests and reveals the importance of each temporal module
- The text introduces a new metric for evaluating visual coding based on neuronal activity
Summary: Vi-ST is a novel model that uses deep learning to unravel how the brain encodes dynamic visual scenes with neurons, and proposes a new metric for measuring this encoding.
http://arxiv.org/abs/2407.10735v1
Compressor summary: The paper examines the nature of Large Language Models (LLMs) like ChatGPT and argues that they are not autonomous agents but interlocutors or linguistic automata that can create realistic conversation experiences with humans.
http://arxiv.org/abs/2407.10736v1
Compressor summary: The paper discusses the forensic implications of image laundering using Stable Diffusion models and proposes a two-stage detection pipeline to differentiate between real, laundered, and synthetic images.
http://arxiv.org/abs/2407.10734v1
Compressor summary: The paper proposes a method to train deep neural networks (DNNs) efficiently on low-resource microcontrollers using quantized training and dynamic partial gradient updates.
http://arxiv.org/abs/2407.10733v1
Compressor summary: Mask-JEPA is a self-supervised learning framework for segmentation models that combines mask classification architectures with a joint embedding predictive architecture; it addresses challenges in extracting representations and training the decoder, achieving competitive results and adaptability across various datasets.
http://arxiv.org/abs/2407.10730v1
Compressor summary: ConvBench is a benchmark for evaluating and comparing convolution algorithms in deep learning models by assessing 9243 operations from 1097 real-world models and providing detailed performance and execution breakdown graphs.
http://arxiv.org/abs/2407.10725v1
Compressor summary: The paper introduces CLAVE, a framework for assessing Large Language Models' values using two complementary LLMs, and ValEval, a dataset with diverse value systems to benchmark evaluators.
http://arxiv.org/abs/2407.10723v1
Compressor summary: The paper proposes new methods to improve object detection in computer vision models by enhancing their ability to learn and generalize from novel compositions of objects and attributes.
http://arxiv.org/abs/2407.10718v1
Compressor summary: Sibyl is a large language model-based agent framework that uses a global workspace, a multi-agent debate-based jury, and tools to enhance complex reasoning skills.
http://arxiv.org/abs/2407.10709v1
Compressor summary: The paper presents a computer vision method to automatically identify and analyze maps, especially those with designated names of regions and landmarks, using a Convolutional Neural Network and the VinMap dataset.
http://arxiv.org/abs/2407.10707v1
Compressor summary: Our method uses Gaussian Splatting to efficiently render animatable avatars from sparse-view or monocular videos under novel viewpoints, poses, and lightings.
http://arxiv.org/abs/2407.10704v1
Compressor summary: The paper proposes quantization as a regularization technique for vision-language models, reducing overfitting and catastrophic forgetting while maintaining efficiency and generalization, and provides code on GitHub.
http://arxiv.org/abs/2407.10703v1
Compressor summary: The paper proposes a method to translate annotated day events into night events using Diffusion GAN and improves event-based models' performance on night scenes.
http://arxiv.org/abs/2407.10702v1
Compressor summary: Neural Collapse occurs when training deep neural networks for classification tasks with equal feature dimension and class numbers, and is related to saddle points in popular unconstrained feature models.
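The geometry Neural Collapse refers to — class means converging to a simplex equiangular tight frame — has a standard closed-form construction, shown below for context (this is textbook material, independent of the paper's saddle-point analysis):

```python
import numpy as np

def simplex_etf(k):
    """Rows are k unit vectors in R^k (spanning a (k-1)-dim subspace)
    whose pairwise cosine similarity is exactly -1/(k-1): the simplex
    equiangular tight frame that last-layer class means collapse to."""
    return np.sqrt(k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)
```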
http://arxiv.org/abs/2407.10701v1
Compressor summary: DocBench is a new benchmark for evaluating large language model-based document reading systems on real documents and questions across various domains.
http://arxiv.org/abs/2407.10696v1
Compressor summary: The paper proposes a new method that combines unsupervised active contour models with deep learning for robust image segmentation, especially useful in histology where labeling data is scarce.
http://arxiv.org/abs/2407.10695v1
Compressor summary: Our method enhances NeRF with image inpainting to handle transient objects and improve volume rendering quality for realistic novel view synthesis using uncontrolled photos in the wild.
http://arxiv.org/abs/2407.10694v1
Compressor summary: The paper proposes FRD-ReID, a person re-identification network that controllably separates clothing-unrelated and clothing-related features using human parsing masks and attention mechanisms.
http://arxiv.org/abs/2407.10688v1
Compressor summary: The paper proposes PPGNN, a method that infers a latent graph structure from node features, refines it with Probability Passing, and applies it to GNNs for non-Euclidean data analysis while accounting for noise and improving efficiency.
http://arxiv.org/abs/2407.10687v1
Compressor summary: FRI-Net is a novel method that reconstructs 2D floorplans from 3D point clouds using an implicit representation with structural regularization, outperforming existing methods on two challenging datasets.
http://arxiv.org/abs/2407.10683v1
Compressor summary: Key points:
- Text-to-image generation with diffusion models can produce factually inconsistent images (image hallucination)
- Image hallucination is classified into three types based on language model studies
- Factual information from external images is used to generate realistic images using image editing tools
Summary: The paper proposes a method to improve text-to-image generation by detecting and correcting three types of image hallucination using factual information from external images and image editing tools.
http://arxiv.org/abs/2407.10681v1
Compressor summary: Geometric Mixup (GeoMix) is a simple and interpretable technique that uses in-place graph editing to synthesize nodes and establish connections for them, improving node classification with limited labeled data using graph neural networks (GNNs).
http://arxiv.org/abs/2407.10671v1
Compressor summary: The Qwen2 series is a new line of large language models and multimodal models that outperform previous models and have strong performance in various tasks across multiple languages.
http://arxiv.org/abs/2407.10670v1
Compressor summary: The paper proposes Query Rewriter+, Knowledge Filter, Memory Knowledge Reservoir, and Retriever Trigger to improve the RAG system's response quality and efficiency by addressing issues such as Information Plateaus, Ambiguity, Irrelevant Knowledge, and Redundant Retrieval.
http://arxiv.org/abs/2407.10663v1
Compressor summary: The paper introduces a new model that uses neural distance fields to capture the shape and movement of the heart chambers and their relation to clinical demography, improving on existing methods for anatomical sequence completion.
http://arxiv.org/abs/2407.10662v1
Compressor summary: The XAI Experience Quality (XEQ) Scale is a new evaluation tool that measures the quality of XAI experiences based on four dimensions: learning, utility, fulfilment and engagement.
http://arxiv.org/abs/2407.10657v1
Compressor summary: The paper proposes a method to generate synthetic formulas for fine-tuning LLMs using NL and shows that validating these formulas improves performance and problem-solving ability.
http://arxiv.org/abs/2407.10655v1
Compressor summary: The paper introduces OVLW-DETR, a fast and flexible open-vocabulary detector that uses vision-language model embeddings to detect novel categories guided by text.
http://arxiv.org/abs/2407.10652v1
Compressor summary: The study shows that using large language models can significantly speed up and improve the accuracy of literature review filtering, reducing manual work and achieving high recalls.
http://arxiv.org/abs/2407.10649v1
Compressor summary: The paper introduces APC, a ViT-based method for weakly supervised semantic segmentation that improves patch embeddings using Adaptive-K Pooling and Patch Contrastive Learning, and enhances efficiency by adopting an end-to-end single-stage training approach.
http://arxiv.org/abs/2407.10645v1
Compressor summary: This study examines how prompt selection affects text annotation accuracy using large language models and proposes a method to optimize prompts.
http://arxiv.org/abs/2407.10641v1
Compressor summary: DDIP and D3IP are methods to improve 3D image reconstruction from generative diffusion priors, using meta-learning and efficient adaptation for diverse tasks and data.
http://arxiv.org/abs/2407.10633v1
Compressor summary: SkewSize is a metric that measures and characterizes model biases in various settings by analyzing the interaction between spurious variables and predictions.
http://arxiv.org/abs/2407.10629v1
Compressor summary: The paper explores using reinforcement learning to address bias in imbalanced classification by scaling the reward function based on contextual multi-armed bandits.
http://arxiv.org/abs/2407.10627v1
Compressor summary: Arena Learning is a method that uses AI-driven annotations to evaluate and improve large language models efficiently.
http://arxiv.org/abs/2407.10626v1
Compressor summary: NoviCode is a new task that challenges Text-to-Code models to generate executable programs, using API access and control structures, from natural language descriptions written by novice non-programmers; aligning NL utterances with the code structure improves performance.
http://arxiv.org/abs/2407.10625v1
Compressor summary: WildVidFit is an image-based model that generates realistic video try-on sequences by conditioning on garment descriptions and human motion, using diffusion guidance from pre-trained models to maintain temporal coherence.
http://arxiv.org/abs/2407.10596v1
Compressor summary: This paper evaluates various CNN models and data augmentation techniques for hierarchical mobile robot localization using omnidirectional images, and provides public code on the project website.
http://arxiv.org/abs/2407.10592v1
Compressor summary: InsertDiffusion is a training-free diffusion architecture that efficiently embeds realistic object visualizations into images without extensive training or fine-tuning.
http://arxiv.org/abs/2407.10586v1
Compressor summary: The text presents a new method to reconstruct detailed 3D human shapes from monocular images by generating multi-view normal maps and using an attention-based neural implicit model.
http://arxiv.org/abs/2407.10583v1
Compressor summary: The text discusses three dogmas in modern reinforcement learning that have shaped the field but may need reevaluation for realizing its full potential.
http://arxiv.org/abs/2407.10582v1
Compressor summary: The paper proposes a method to generate task-specific data for low-resource languages using large language models and teacher models, improving cross-lingual performance on various tasks.
http://arxiv.org/abs/2407.10580v1
Compressor summary: The paper proposes a hybrid intelligence approach that uses human input and artificial intelligence to create more energy-efficient machine learning models.
http://arxiv.org/abs/2407.10575v1
Compressor summary: This paper reviews defenses against AI-generated visual media in computer vision applications, covering detection, disruption, and authentication methods, as well as trustworthiness aspects like robustness and fairness.
http://arxiv.org/abs/2407.10574v1
Compressor summary: The paper presents a fast and accurate CNN model for breast cancer classification using Bagging and stacking ensemble methods, achieving high accuracy and recall rates and outperforming VGG16 and ResNet-50 in comparative experiments.
http://arxiv.org/abs/2407.10567v1
Compressor summary: PULPo is a probabilistic deformable image registration method that accurately estimates uncertainty using Laplacian pyramids on hierarchical levels.
http://arxiv.org/abs/2407.10563v1
Compressor summary: The paper introduces Pathformer3D, a novel 3D scanpath Transformer for predicting eye movements in 360° images that uses self-attention to model visual working memory and outperforms existing methods.
http://arxiv.org/abs/2407.10559v1
Compressor summary: Key points:
- Contrast agents are crucial for medical imaging but have limitations
- The CAR problem is to reduce their dosage while maintaining visual enhancement
- A learned inverse problem (LIP) approach is proposed and tested on pre-clinical images
Summary: The paper proposes a new method to reduce contrast agent dosage in medical imaging using a learned inverse problem approach and shows its effectiveness on pre-clinical images.
http://arxiv.org/abs/2407.10558v1
Compressor summary: ConTEXTure is a network that creates texture maps for 3D meshes using text prompts and multiple images from different viewpoints, improving accuracy by generating consistent images with Zero123++.
http://arxiv.org/abs/2407.10554v1
Compressor summary: The paper reviews recent surveys in Natural Language Generation (NLG) to identify gaps and challenges posed by Large Language Models (LLMs) and propose a research roadmap for the field.
http://arxiv.org/abs/2407.10550v1
Compressor summary: The paper proposes a self-supervised method for detecting face forgery videos using natural spatiotemporal consistency of real face videos, improving generalization and robustness over existing methods.
http://arxiv.org/abs/2407.10545v1
Compressor summary: LightCL is a compact algorithm that improves continual learning efficiency by compressing resource consumption and enhancing generalizability using new metrics on neural network layers.
http://arxiv.org/abs/2407.10543v1
Compressor summary: The authors propose five methods to help DNN-based perception models identify unfamiliar regions in images, which can improve their decision-making when facing low competency situations.
http://arxiv.org/abs/2407.10542v1
Compressor summary: The paper introduces PMT and PMTR, a new framework for reliable and efficient geometric shape assembly using local correspondences and high-order feature transforms.
http://arxiv.org/abs/2407.10536v1
Compressor summary: The paper proposes a method to locate robots indoors using Siamese Neural Networks that compare panoramic images from a catadioptric vision system, achieving better performance than previous approaches in challenging lighting conditions such as cloudy and night scenes.
http://arxiv.org/abs/2407.10534v1
Compressor summary: The paper presents a graph neural network method that unifies the different label spaces of multiple datasets, allowing semantic segmentation models to be trained across them effectively without extra manual reannotation, and achieves state-of-the-art results among multi-dataset training methods.
http://arxiv.org/abs/2407.10528v1
Compressor summary: The paper proposes a method to generate realistic human motions by using local actions as control signals and blending them with graph attention networks and motion diffusion.
http://arxiv.org/abs/2407.10510v1
Compressor summary: The authors introduce DigestDS, a dataset for predicting TCM prescriptions for digestive system diseases, and propose TCM-FTP, a method that fine-tunes pre-trained language models to achieve significant improvements in prediction accuracy.
http://arxiv.org/abs/2407.10499v1
Compressor summary: The paper introduces CIBench, a framework to evaluate how well large language models use code interpreters for data science tasks with or without human help.
http://arxiv.org/abs/2407.10495v1
Compressor summary: The authors propose using the Gromov-Wasserstein distance as a regularization mechanism in hyperbolic neural networks to better preserve the original data structure.
http://arxiv.org/abs/2407.10494v1
Compressor summary: The paper proposes a novel framework called Learning-to-Unlearn that uses meta-learning to balance between erasing specific data samples and maintaining the overall performance of the model.
http://arxiv.org/abs/2407.10490v1
Compressor summary: The text studies how large language models learn new tasks by analyzing their step-by-step changes and influences on different responses, helping to understand and improve their behavior.
http://arxiv.org/abs/2407.10488v1
Compressor summary: The text explores how to understand the internal workings of vision & language models, focusing on their handling of negation, using CLIP as an example.
http://arxiv.org/abs/2407.10487v1
Compressor summary: Lite2Relight is a novel technique that enables realistic 3D view synthesis and light editing of human portraits with efficient volumetric representation and robust face geometry, outperforming state-of-the-art methods.
http://arxiv.org/abs/2407.10486v1
Compressor summary: The paper introduces two modules for improving query-focused summarization using large language models, addressing document length and fine-grained alignment.
http://arxiv.org/abs/2407.10485v1
Compressor summary: The text proposes a new method for tracking objects from UAVs that improves accuracy and efficiency by modeling motion with detection features and addressing the long-tailed distribution of object motion.
http://arxiv.org/abs/2407.10484v1
Compressor summary: The paper explains why Euclidean classifiers work well with Riemannian features in GCP by analyzing matrix functions from a Riemannian geometry perspective and showing their effectiveness in visual classification tasks.
http://arxiv.org/abs/2407.10483v1
Compressor summary: The paper presents G-PCGRL, a reinforcement learning-based method for procedurally generating graph data structures in games, which adapts the PCGRL framework, introduces new representations, and satisfies constraints while being faster, more reliable, and more controllable than random search and an evolutionary algorithm.
http://arxiv.org/abs/2407.10482v1
Compressor summary: NGP-RT improves the rendering speed of Instant-NGP by using hash features, lightweight attention, and a pre-computed occupancy distance grid for fast and high-quality novel view synthesis.
http://arxiv.org/abs/2407.10481v1
Compressor summary: SuperPADL is a framework that uses both reinforcement and supervised learning to train controllers for real-time physics-based text-to-motion on thousands of diverse motion clips, allowing users to create interactive animations.
http://arxiv.org/abs/2407.10476v1
Compressor summary: The paper proposes a method for generating realistic and aesthetically pleasing kinetic typography videos from text prompts using guided video diffusion models, static and dynamic captions, and glyph loss.
http://arxiv.org/abs/2407.10459v1
Compressor summary: DiffStega is a training-free diffusion-based coverless image steganography method that uses a password-dependent reference image and the text as keys, enhancing security and versatility.
http://arxiv.org/abs/2407.10457v1
Compressor summary: The study explores performance differences between greedy decoding and sampling methods in large language models, highlighting non-determinism's impact on evaluations and showing the potential of smaller models.
http://arxiv.org/abs/2407.10456v1
Compressor summary: The paper proposes using multiple high scoring translations from minimum Bayes risk decoding in knowledge distillation for neural machine translation, achieving better results than existing methods.
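As background for the distillation idea above, minimum Bayes risk decoding can be sketched generically: sample several candidate translations, score each by its average utility against the other samples, and rank by that expected utility. The sketch below uses a toy token-F1 utility as a stand-in for a real metric (BLEU, COMET); the names, data, and utility function are illustrative, not the paper's.

```python
def mbr_rank(candidates, utility):
    """Rank candidates by average utility against all other samples
    (a Monte Carlo estimate of expected utility under the model)."""
    scores = []
    for i, hyp in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        scores.append(sum(utility(hyp, ref) for ref in others) / len(others))
    # Highest expected utility first; the paper's proposal would keep
    # several top-ranked hypotheses as distillation targets, not just one.
    return sorted(zip(candidates, scores), key=lambda p: -p[1])

def token_f1(a, b):
    """Toy utility: token-set F1 overlap (stand-in for BLEU/COMET)."""
    ta, tb = set(a.split()), set(b.split())
    common = len(ta & tb)
    if common == 0:
        return 0.0
    prec, rec = common / len(ta), common / len(tb)
    return 2 * prec * rec / (prec + rec)

samples = ["the cat sat", "the cat sat down", "a dog ran", "the cat sat"]
ranked = mbr_rank(samples, token_f1)
```

Because duplicated or near-duplicated samples reinforce each other's utility, consensus translations rise to the top of the ranking while outliers sink.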
http://arxiv.org/abs/2407.10454v1
Compressor summary: The text introduces Deflated Dynamics Value Iteration, a faster method than standard Value Iteration for computing the value function of Markov decision processes and reinforcement learning problems by removing dominant eigen-structures from the transition matrix.
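For context, the standard Value Iteration that DDVI accelerates repeatedly applies the Bellman optimality operator, V <- max_a (R_a + gamma * P_a V), until a fixed point; the deflation idea speeds up this iteration by removing dominant eigen-structure from the transition matrix. Below is a plain VI baseline on a toy two-state MDP (the MDP and names are illustrative, and this is the classical algorithm, not DDVI itself).

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Standard VI: V <- max_a (R[a] + gamma * P[a] @ V).
    P[a] is the transition matrix for action a, R[a] the reward vector."""
    V = np.zeros(P.shape[1])
    while True:
        V_new = np.max(R + gamma * (P @ V), axis=0)  # Bellman backup per action
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Toy 2-state, 2-action MDP (deterministic transitions)
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in place
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1: swap states
R = np.array([[1.0, 0.0],                 # action 0 rewards per state
              [0.0, 2.0]])                # action 1 rewards per state
V = value_iteration(P, R)
```

The fixed point here is V = [10, 11]: staying in state 0 earns 1/(1 - 0.9) = 10, and state 1 collects 2 then moves to state 0. Convergence is geometric at rate gamma, which is exactly the bottleneck that deflating the dominant eigen-structure targets.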
http://arxiv.org/abs/2407.10453v1
Compressor summary: The authors propose using a large language model to enhance medication recommendation by combining text and medical codes, improving performance on two datasets.
http://arxiv.org/abs/2407.10452v1
Compressor summary: GraphPrint is a framework for predicting drug target affinity using graph representations of protein 3D structures, which improve over traditional features alone.
http://arxiv.org/abs/2407.10449v1
Compressor summary: The paper introduces a fast and stable algorithm for constructing elliptical slice sampling intersections, which improves Monte Carlo methods.
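For reference, a basic elliptical slice sampling transition (in the style of Murray et al.) looks roughly as follows: draw an auxiliary Gaussian variable, define an ellipse through the current state, and shrink an angle bracket until a point above the likelihood threshold is found. The sketch assumes an identity prior covariance and illustrative names; the paper's contribution concerns constructing the ellipse-slice intersections faster and more stably, which this generic version does not implement.

```python
import numpy as np

def ess_step(f, log_lik, rng):
    """One elliptical slice sampling transition for a target proportional
    to N(f; 0, I) * exp(log_lik(f)). Assumes identity prior covariance."""
    nu = rng.standard_normal(f.shape)            # auxiliary draw from the prior
    log_y = log_lik(f) + np.log(rng.uniform())   # slice threshold
    theta = rng.uniform(0.0, 2 * np.pi)
    theta_min, theta_max = theta - 2 * np.pi, theta
    while True:
        f_prop = f * np.cos(theta) + nu * np.sin(theta)  # point on the ellipse
        if log_lik(f_prop) > log_y:
            return f_prop
        # Shrink the angle bracket toward theta = 0 and retry; this
        # always terminates because theta = 0 recovers the current state.
        if theta < 0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)

# Sanity check: with a flat likelihood the chain targets the N(0, 1) prior.
rng = np.random.default_rng(0)
f = np.zeros(1)
draws = []
for _ in range(5000):
    f = ess_step(f, lambda x: 0.0, rng)
    draws.append(f[0])
```

The method needs no step-size tuning and accepts every transition, which is why making the underlying ellipse construction fast and numerically stable matters for the overall sampler.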
http://arxiv.org/abs/2407.10448v1
Compressor summary: The paper proposes a new method for causal effect estimation with hidden confounders using saddle-point optimization and neural networks.
http://arxiv.org/abs/2407.10445v1
Compressor summary: The paper investigates the vulnerability of image-to-image networks to backdoor attacks and proposes a novel attack technique that can compromise these networks without affecting their normal behavior on clean images.
http://arxiv.org/abs/2407.10441v1
Compressor summary: This study uses a reinforcement learning-based simulation to investigate how exit availability and configuration affect evacuation and harm rates in active shooter incidents in office environments, finding that more exits near the shooter's starting point improve safety.
http://arxiv.org/abs/2407.10439v1
Compressor summary: This paper proposes PolyRoom, a Transformer model that uses uniform sampling representation, room-aware query initialization, and room-aware self-attention to reconstruct floorplans from point clouds while overcoming common challenges.
http://arxiv.org/abs/2407.10433v1
Compressor summary: The authors present a multi-stage framework for 3D tooth segmentation in dental CBCT images that requires less annotated data and is more robust to domain shift than existing deep learning methods, placing third in the STS-3D challenge and outperforming other semi-supervised approaches.
http://arxiv.org/abs/2407.10430v1
Compressor summary: MStar is a new inductive KG reasoning model that uses C-MPNNs and multiple query-specific starting entities to improve message propagation efficiency and mitigate noise in training samples.
http://arxiv.org/abs/2407.10419v1
Compressor summary: The Omni-Dimensional Frequency Learner (ODFL) model improves frequency-based methods for time series analysis by addressing channel redundancy, sparse frequency energy distribution, and semantic diversity in the spectrum feature.
http://arxiv.org/abs/2407.10413v1
Compressor summary: The study used generative AI to create realistic images of fruits, which improved fruit detection and quality assessment using deep learning models like YOLO.
http://arxiv.org/abs/2407.10406v1
Compressor summary: The paper proposes a transformer-based depth network and feature matching scheme to improve scale-awareness for full surround monodepth methods, achieving better performance than state-of-the-art methods.
http://arxiv.org/abs/2407.10403v1
Compressor summary: The paper introduces a reward shaping technique for Multi-Agent Reinforcement Learning (MARL) to improve cooperation and efficiency in planning paths for multiple agents.
http://arxiv.org/abs/2407.10399v1
Compressor summary: The study shows that Moiré patterns significantly reduce the accuracy of deepfake detectors when used on camera-captured videos from digital screens.
http://arxiv.org/abs/2407.10389v1
Compressor summary: The study proposes a framework that enhances NeRF rendering quality without increasing computational complexity by using a mixture of experts with varying resolutions and a novel gate formulation.
http://arxiv.org/abs/2407.10385v1
Compressor summary: The text proposes a visual prompting method using multimodal language models to improve the performance and efficiency of sensor data analysis for different sensory tasks.
http://arxiv.org/abs/2407.10380v1
Compressor summary: NTSEBench is a new dataset for evaluating large language and vision models on complex cognitive multi-modal reasoning tasks based on questions from an Indian examination.
http://arxiv.org/abs/2407.10374v1
Compressor summary: The paper adapts Mamba, a computationally efficient state space model, to pedestrian attribute recognition through two frameworks and evaluates its performance alongside hybrid Mamba-Transformer variants.
http://arxiv.org/abs/2407.10366v1
Compressor summary: Proteus is a simple method to distill large vision foundation models into smaller equivalents without the original training data, achieving comparable or better performance than other models.