This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-08-20, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2408.10205v1
Compressor summary: The text proposes a framework that combines connectionist AI (Kolmogorov-Arnold Networks) with science for discovering features, structures, and formulas in physical laws.
http://arxiv.org/abs/2408.10204v1
Compressor summary: CLAT is an approach that improves both clean accuracy and adversarial robustness of neural networks by fine-tuning only critical layers, reducing parameters, and adapting to changes in layer criticality.
http://arxiv.org/abs/2408.10202v1
Compressor summary: The paper proposes a new method called SANER to reduce societal bias in CLIP without losing attribute information or using attribute annotations during debiasing.
http://arxiv.org/abs/2408.10198v1
Compressor summary: MeshFormer is a sparse-view 3D reconstruction model that uses transformers, 3D convolutions, input normal maps, and SDF supervision to efficiently train and generate high-quality textured meshes with geometric details.
http://arxiv.org/abs/2408.10189v1
Compressor summary: Key points:
- The paper proposes MOHAWK, a method to distill pretrained Transformers into alternative architectures like SSMs.
- MOHAWK matches different degrees of granularity in the mixing matrices and hidden units of Transformers and SSMs.
- Phi-Mamba, a Mamba-2 variant, achieves strong performance with less than 1% of the training data typically used for non-Transformer models.
Summary: MOHAWK learns to convert pretrained Transformers into state space models by matching their mixing matrices and hidden units, enabling Phi-Mamba to outperform past non-Transformer models with much less data.
http://arxiv.org/abs/2408.10188v1
Compressor summary: LongVILA is a system that enables efficient long-context training and inference for vision-language models, improving performance on tasks like long video captioning.
http://arxiv.org/abs/2408.10187v1
Compressor summary: The text discusses using remote sensing data and Machine Learning algorithms to detect floating plastic debris in the ocean, and introduces the Marine Debris Archive as a standard dataset for evaluating these methods.
http://arxiv.org/abs/2408.10178v1
Compressor summary: NeuRodin is a new neural framework that improves surface reconstruction in volume rendering by addressing challenges in SDF-based methods and retaining flexibility of density-based methods.
http://arxiv.org/abs/2408.10175v1
Compressor summary: The study shows that occlusions degrade the fairness of face recognition systems by increasing bias against some demographic groups, especially Africans.
http://arxiv.org/abs/2408.10174v1
Compressor summary: The study proposes a novel zero-shot model fusion method called SMILE that addresses parameter interference issues and improves performance without extra training data or parameters.
http://arxiv.org/abs/2408.10161v1
Compressor summary: The paper presents NeuFlow v2, an efficient optical flow method that balances high accuracy with reduced computational costs, achieving 10x-70x speedup and running at over 20 FPS on a Jetson Orin Nano.
http://arxiv.org/abs/2408.10154v1
Compressor summary: LoopSplat improves 3D scene mapping accuracy by triggering loop closure online and aligning submaps via 3D Gaussian Splats registration.
http://arxiv.org/abs/2408.10153v1
Compressor summary: The paper proposes a method to translate unrealistic synthetic data to realistic clinical data for training monocular depth estimation in colonoscopy videos, improving its generalization to the clinical domain.
http://arxiv.org/abs/2408.10151v1
Compressor summary: The MLNeedle test evaluates LLMs' ability to retrieve relevant information from multilingual contexts and reveals their limitations in handling long contexts across languages.
http://arxiv.org/abs/2408.10147v1
Compressor summary: This paper studies how transformer models can learn new tasks from few examples during inference, using non-linear regression to show they can acquire contextual knowledge for generalization.
http://arxiv.org/abs/2408.10145v1
Compressor summary: MS-Mamba is a novel image restoration method that uses a multi-scale state-space model and adaptive gradient block to enhance detail and contrast, achieving state-of-the-art results with low computational complexity.
http://arxiv.org/abs/2408.10141v1
Compressor summary: The study shows how to use large language models to automatically generate leaderboard data from AI research articles, improving the speed and accuracy of information extraction.
http://arxiv.org/abs/2408.10135v1
Compressor summary: The text introduces a new algorithm that improves NeRF's ability to generate and optimize meshes from multi-view images by refining SDF and appearance representation, and adaptively incorporating additional images for training.
http://arxiv.org/abs/2408.10134v1
Compressor summary: The paper proposes a new model, DQI, to measure depth quality in stereoscopic omnidirectional images, which improves upon existing methods and considers human visual system characteristics.
http://arxiv.org/abs/2408.10130v1
Compressor summary: The paper proposes a method to improve lyric generation by integrating rhyme information into a pre-trained language model.
http://arxiv.org/abs/2408.10126v1
Compressor summary: The paper proposes a new algorithm for learning assumption-based argumentation from background knowledge and examples using transformation rules and answer set programming.
http://arxiv.org/abs/2408.10124v1
Compressor summary: MolGraph-LarDo integrates Large language models and Domain-specific small models for accurate molecular property prediction by using a two-stage prompt strategy and multi-modal alignment.
http://arxiv.org/abs/2408.10120v1
Compressor summary: The paper proposes Geo2Seq, a method to convert 3D molecular geometries into discrete sequences, enabling better generation of molecules using language models.
http://arxiv.org/abs/2408.10119v1
Compressor summary: Key points:
- T2V generation is challenging due to complex real-world motions.
- The paper proposes Factorized-Dreamer, a framework that uses limited and low-quality data to generate HQ videos.
- Factorized-Dreamer has several designs, such as an adapter, cross attention, a T5 encoder, and PredictNet.
- A noise schedule is used to ensure quality and stability of video generation.
- Experiments show the effectiveness of Factorized-Dreamer on various tasks.
Summary: The paper presents Factorized-Dreamer, a framework that can generate high-quality videos from limited and low-quality data using text and image embeddings, cross attention, a T5 encoder, PredictNet, and a noise schedule.
http://arxiv.org/abs/2408.10115v1
Compressor summary: GLIMMER is an unsupervised multi-document summarization approach that uses sentence graphs and semantic clusters to generate fluent and informative summaries, outperforming existing methods and pre-trained models.
http://arxiv.org/abs/2408.10113v1
Compressor summary: The paper proposes using Monte Carlo Tree Search (MCTS) as a guide for Reinforcement Learning (RL) agents to improve performance, especially in Off-Policy settings, and shows significant results on Atari 100k benchmark.
http://arxiv.org/abs/2408.10111v1
Compressor summary: PLUTUS is a large-scale, open-source, transformer-based model that uses contrastive learning and attention mechanisms to capture complex patterns in financial time series data.
http://arxiv.org/abs/2408.10107v1
Compressor summary: MixDiff is a framework for detecting out-of-distribution inputs for machine learning models without accessing their parameters or activations, by applying input-level perturbations and comparing the model outputs of two similar samples.
http://arxiv.org/abs/2408.10086v1
Compressor summary: ARMADA is a novel multimodal data augmentation method that generates semantically consistent image-text pairs by manipulating visual attributes of entities using knowledge bases and large language models.
http://arxiv.org/abs/2408.10085v1
Compressor summary: MASALA is a new XAI method that automatically determines the appropriate local region for explaining each instance, unlike existing methods that require a user-defined locality size.
http://arxiv.org/abs/2408.10084v1
Compressor summary: TANGO is a density-based clustering algorithm that uses global typicality to improve local dependencies, achieving better peak selection and sub-cluster characterization than mode-seeking methods.
http://arxiv.org/abs/2408.10075v1
Compressor summary: Multimodal RLHF methods use a latent variable formulation to infer user-specific preferences and learn reward models and policies tailored to each individual, improving alignment with diverse populations.
http://arxiv.org/abs/2408.10073v1
Compressor summary: The paper presents a new Sign Language Assessment tool that models natural human motion and provides useful feedback for language learners.
http://arxiv.org/abs/2408.10072v1
Compressor summary: The paper introduces a new face forgery analysis task with descriptions and reasoning, and proposes an assistive system based on a multimodal language model and a decision system that provides user-friendly and explainable results.
http://arxiv.org/abs/2408.10055v1
Compressor summary: The text discusses the theoretical foundations of deep reinforcement learning, focusing on exploration methods, and proposing a novel Bayesian actor-critic algorithm with empirical evaluation on benchmarks.
http://arxiv.org/abs/2408.10053v1
Compressor summary: The paper proposes a comprehensive checklist for contextual integrity-based privacy research that covers social identities, private attributes, and existing regulations using large language models.
http://arxiv.org/abs/2408.10046v1
Compressor summary: The paper proposes a method for unsupervised class incremental learning that uses fine-grained prototypes, granularity alignment, and a strategy to minimize overlap between classes to discover unknown novel classes and preserve historical knowledge.
http://arxiv.org/abs/2408.10041v1
Compressor summary: Implicit Gaussian Splatting (IGS) is an efficient and compact model for photo-realistic novel view synthesis that integrates explicit point clouds with implicit feature embeddings using a multi-level tri-plane architecture and progressive training scheme, achieving high rendering quality while consuming only a few MBs.
http://arxiv.org/abs/2408.10040v1
Compressor summary: The P-O algorithm creates virtual human expert agents to generate many valid schedules and uses reinforcement learning to improve them, achieving breakthrough performance in automatic manufacturing scheduling.
http://arxiv.org/abs/2408.10039v1
Compressor summary: The paper introduces a multi-step clinical diagnostic dataset (MSDiagnosis) and a framework that combines forward and backward inference with reflection and refinement to improve the performance of language models in complex medical diagnosis tasks.
http://arxiv.org/abs/2408.10015v1
Compressor summary: The paper presents a new method for finding optimal deterministic policies for continuous-state and -action constrained MDPs using a primal-dual policy gradient approach, which is applied to robot navigation and fluid control problems.
http://arxiv.org/abs/2408.10012v1
Compressor summary: The paper proposes CLIPCleaner, a method that uses a vision-language model to select clean samples from noisy labels, outperforming existing methods on benchmark datasets.
http://arxiv.org/abs/2408.10011v1
Compressor summary: PinnDE is an open-source python library for solving differential equations using physics-informed neural networks (PINNs) and deep operator networks (DeepONets).
http://arxiv.org/abs/2408.10007v1
Compressor summary: Key points:
- The paper proposes a self-supervised pre-training framework using real 3D data and pseudo-3D data derived from images.
- It uses efficient token embedding and a 2D reconstruction target to overcome data-scaling and efficiency challenges.
- It achieves state-of-the-art performance in 3D perception tasks.
Summary: The paper presents a self-supervised pre-training method for 3D perception that leverages real and pseudo-3D data, and improves efficiency with a novel token embedding and reconstruction target, leading to superior results.
http://arxiv.org/abs/2408.10006v1
Compressor summary: P-sLSTM is a modified sLSTM algorithm that improves time series forecasting by incorporating patching and channel independence, achieving state-of-the-art results with theoretical justifications.
http://arxiv.org/abs/2408.10003v1
Compressor summary: The text describes how a living knowledge graph was created by merging and extending ontologies to represent mathematical models and algorithms semantically and enrich them with metadata, including subject-specific properties.
http://arxiv.org/abs/2408.10002v1
Compressor summary: The paper introduces algorithms to find trade-offs between quality and fairness in clustering problems by exploring the complete Pareto front.
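Exploring a complete Pareto front, as in the summary above, amounts to enumerating the non-dominated trade-off points between objectives such as clustering quality and fairness. A minimal generic sketch of a dominance filter for two objectives to minimize (illustrative only, not the paper's algorithm):

```python
def pareto_front(points):
    """Return the non-dominated subset of (quality_cost, unfairness) pairs,
    where lower is better in both objectives.

    A point p is dominated if some other point q is at least as good in
    both objectives (weak dominance; duplicates would remove each other).
    """
    front = []
    for p in points:
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return front

solutions = [(1.0, 5.0), (2.0, 3.0), (3.0, 3.5), (4.0, 1.0)]
print(pareto_front(solutions))  # (3.0, 3.5) is dominated by (2.0, 3.0)
```

This brute-force filter is quadratic in the number of candidate solutions; the paper's contribution is presumably in generating the candidate clusterings efficiently, which this sketch does not cover.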
http://arxiv.org/abs/2408.09995v1
Compressor summary: The study combines two self-supervised methods to create balanced representations of transactional data, improving performance in tasks like sequence classification and next-event prediction in banking applications.
http://arxiv.org/abs/2408.09984v1
Compressor summary: The paper proposes a method to improve open-domain continual learning in vision-language models by using category-aware prototypes as Task-ID discriminators and domain prior prompts.
http://arxiv.org/abs/2408.09976v1
Compressor summary: The paper proposes an efficient method to approximate the whole Pareto set in multi-objective optimization problems using bilevel optimization and differentiable cross-entropy methods, which improves upon previous naive approaches.
http://arxiv.org/abs/2408.09974v1
Compressor summary: AdaZero is an end-to-end adaptive framework for reinforcement learning that balances exploration and exploitation based on entropy, achieving significant improvements in various environments.
http://arxiv.org/abs/2408.09967v1
Compressor summary: This paper proposes a new method that combines linear programming and an unsupervised machine learning model to solve complex optimization problems with constraints while preserving interpretability and adapting to different scenarios.
http://arxiv.org/abs/2408.09966v1
Compressor summary: The paper proposes a method to control the sparsity of deep neural networks using continuous sparsification and shows improved performance in high-sparsity scenarios.
http://arxiv.org/abs/2408.09958v1
Compressor summary: AdaResNet automatically adjusts the ratio between input pixels and transformed data in skip connections, improving the performance of deep neural networks.
http://arxiv.org/abs/2408.09957v1
Compressor summary: The paper presents py-ciu, a Python tool for generating explanations from machine learning models using the CIU method, which offers novel features compared to existing methods.
http://arxiv.org/abs/2408.09951v1
Compressor summary: The paper proposes a new AI-based fiber model for Beyond 5G communications that reduces re-training time and increases efficiency by using linear combinations of pre-trained models.
http://arxiv.org/abs/2408.09949v1
Compressor summary: C$^2$RL is a novel pretraining paradigm for gloss-free Sign Language Representation Learning that emphasizes Implicit Content Learning and Explicit Context Learning to improve performance in tasks like Sign Language Translation and Sign Language Retrieval.
http://arxiv.org/abs/2408.09948v1
Compressor summary: The paper introduces a dataset and method to study and predict human attention during image captioning tasks using CLIP models and NeVA algorithms, improving existing models.
http://arxiv.org/abs/2408.09947v1
Compressor summary: The paper proposes a new fiber transmission model that can handle different bit rates without retraining and uses universal solutions based on the novelty principle.
http://arxiv.org/abs/2408.09946v1
Compressor summary: The text describes an approach to evaluate autonomous game players for social deduction games using a variant of SpyFall called SpyGame, introducing new metrics and qualitative analysis methods to assess their skills in intent identification and camouflage.
http://arxiv.org/abs/2408.09945v1
Compressor summary: The authors introduce a benchmark for translating classical Chinese poetry into English and propose RAT, a retrieval-augmented machine translation method that improves translation quality using knowledge about classical poetry and an automatic evaluation metric based on GPT-4.
http://arxiv.org/abs/2408.09940v1
Compressor summary: The ML-CrAIST model uses spatial and channel self-attention and a cross-attention block to effectively utilize multi-scale image details and improve single-image super-resolution performance, outperforming state-of-the-art methods.
http://arxiv.org/abs/2408.09939v1
Compressor summary: The researchers propose a new automated method to provide context for images, which can help detect misinformation and support fact-checking efforts.
http://arxiv.org/abs/2408.09929v1
Compressor summary: The paper explores how to learn beneficial noise for contrastive learning using Positive-incentive Noise (Pi-Noise) and proposes a framework to generate such noise as data augmentations.
http://arxiv.org/abs/2408.09920v1
Compressor summary: The paper proposes a human visual attention estimation strategy to improve existing image quality assessment models by measuring the statistical dependency between degraded and reference images.
http://arxiv.org/abs/2408.09918v1
Compressor summary: The paper analyzes how two types of temporal message passing mechanisms in graph neural networks differ in their expressive power and performance on color-persistent temporal graphs.
http://arxiv.org/abs/2408.09916v1
Compressor summary: VisEdit is a novel model editor for vision-language models that edits intermediate visual representations in relevant regions to correct knowledge, based on attribution analysis showing the importance of these representations for token predictions.
http://arxiv.org/abs/2408.09914v1
Compressor summary: The study investigates the potential of Active Learning for identifying disaster-related posts on social media, finding that it outperforms other methods with minimal labelling effort.
http://arxiv.org/abs/2408.09908v1
Compressor summary: The paper explores properties and performance of soft-margin SVMs with $p$-norm hinge loss, called $p$SVMs, and proposes a generalized version of the SMO algorithm to train them.
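The $p$-norm hinge loss referenced above generalizes the classic soft-margin hinge loss $\max(0, 1 - y f(x))$ by raising it to the $p$-th power. A small illustrative sketch (not the paper's code or its generalized SMO trainer):

```python
def p_hinge_loss(margin, p=2):
    """p-norm hinge loss: max(0, 1 - margin)**p, where margin = y * f(x).
    p=1 recovers the standard soft-margin SVM hinge loss; larger p
    penalizes small margin violations less and large ones more."""
    return max(0.0, 1.0 - margin) ** p

# A confidently correct prediction (margin > 1) incurs zero loss.
print(p_hinge_loss(1.5, p=2))  # 0.0
print(p_hinge_loss(0.5, p=1))  # 0.5
print(p_hinge_loss(0.5, p=2))  # 0.25
```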
http://arxiv.org/abs/2408.09899v1
Compressor summary: The Lesion Concept Explainer (LCE) framework combines attribution and concept-based methods to explain the decisions of Deep Neural Networks for medical images, especially ultrasound images, using a fine-tuned Segment Anything Model (SAM).
http://arxiv.org/abs/2408.09896v1
Compressor summary: The UTGDiff model generates molecules from textual instructions using a unified text-graph transformer derived from language models, achieving better performance than sequence-based methods with fewer parameters.
http://arxiv.org/abs/2408.09895v1
Compressor summary: The authors propose a "Performance Law" equation that predicts the MMLU score of large language models based on their architecture and training data size, helping in selecting architectures and allocating computational resources efficiently.
http://arxiv.org/abs/2408.09891v1
Compressor summary: The paper explores optimal rates for differential privacy optimization with heavy-tailed gradients using a simple clipping method and an iterative updating method, improving on existing methods and matching the minimax lower bound.
http://arxiv.org/abs/2408.09882v1
Compressor summary: GINO-Q is a new algorithm that efficiently learns optimal policies for restless multi-armed bandits without requiring indexability, outperforming existing methods.
http://arxiv.org/abs/2408.09881v1
Compressor summary: The text proposes a method to estimate reliable uncertainty for spatio-temporal surrogate models using conformal prediction with minimal computational cost and broad applicability.
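Conformal prediction of the kind mentioned above typically wraps any point predictor: compute absolute residuals on a held-out calibration set, then use their empirical quantile as the half-width of a prediction interval. A generic split-conformal sketch (not the paper's spatio-temporal method):

```python
import math

def conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Split conformal prediction: the interval half-width is the
    ceil((n+1)(1-alpha))-th smallest calibration residual |y - y_hat|,
    giving coverage >= 1 - alpha under exchangeability."""
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    q = sorted(cal_residuals)[min(k, n) - 1]
    return (y_pred - q, y_pred + q)

residuals = [0.1, 0.3, 0.2, 0.5, 0.4, 0.25, 0.15, 0.35, 0.45, 0.05]
print(conformal_interval(residuals, y_pred=2.0, alpha=0.1))  # (1.5, 2.5)
```

The appeal noted in the summary follows directly: the wrapper only needs forward passes of the surrogate, so the added computational cost is one pass over the calibration set.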
http://arxiv.org/abs/2408.09873v1
Compressor summary: Hyperspectral imaging can predict sepsis and mortality rates by monitoring microcirculatory changes in the palm and fingers, improving diagnosis and treatment management.
http://arxiv.org/abs/2408.09869v1
Compressor summary: Docling is an open-source package that converts PDF documents using AI models for layout analysis and table structure recognition, running efficiently on common hardware.
http://arxiv.org/abs/2408.09865v1
Compressor summary: The paper proposes a model called MAPLE that generates fine-grained explanations for recommending items to users, using aspect categories as input and achieving better performance than existing review-generation models.
http://arxiv.org/abs/2408.09858v1
Compressor summary: ShortCircuit is a transformer-based architecture that uses supervised and reinforcement learning to generate efficient Boolean circuits from truth tables, outperforming existing tools.
http://arxiv.org/abs/2408.09857v1
Compressor summary: TaSL is a framework that improves Continual Dialogue State Tracking by using group-wise techniques and skill consolidation to balance knowledge preservation and adaptation.
http://arxiv.org/abs/2408.09856v1
Compressor summary: TeamLoRA is a novel PEFT method that combines collaboration and competition among task-specific LoRA modules to enhance multi-task learning efficiency and performance.
http://arxiv.org/abs/2408.09853v1
Compressor summary: The Self-Directed Turing Test is a new way to evaluate Large Language Models' human-like behaviour in natural language conversations by allowing more dynamic exchanges and reducing human involvement.
http://arxiv.org/abs/2408.09849v1
Compressor summary: Key points:
- Large language models (LLMs) are useful but costly to fine-tune with external supervision.
- LLM self-improvement involves training on self-generated data, which may have low quality.
- The paper proposes a new metric called DS weight to filter out correct but highly shifted samples.
- The approach improves reasoning ability and competes with methods using pre-trained reward models.
Summary: The paper introduces DS weight, a new metric to filter self-generated data for LLM self-improvement, which enhances reasoning and rivals external supervision.
http://arxiv.org/abs/2408.09846v1
Compressor summary: The paper introduces Reason-of-Select distillation, a method that enhances dialogue systems with meta-reasoning and domain bootstrapping to improve continual learning and mitigate forgetting.
http://arxiv.org/abs/2408.09841v1
Compressor summary: The paper applies xAI frameworks to explain the reasoning behind scheduling decisions of a DRL agent, but finds that current methods lack falsifiability, consistent terminology, and causal interpretations; they propose a hypotheses-based workflow to address these issues.
http://arxiv.org/abs/2408.09840v1
Compressor summary: The survey explores various methods and models for combining machine learning with physics knowledge to improve prediction and forecast using partial differential equations, considering both architectural and data-driven approaches and their industrial applications.
http://arxiv.org/abs/2408.09838v1
Compressor summary: CDE is a new algorithm that uses curriculum learning and Q-function subspaces to improve learning efficiency and adaptability in complex multi-agent domains like train scheduling.
http://arxiv.org/abs/2408.09834v1
Compressor summary: The text describes a method called Direct Preference Optimization (DPO) for fine-tuning language models based on human preferences without reinforcement learning, and proposes MinorDPO as an improvement to address some limitations.
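The DPO objective mentioned above is a logistic loss on the difference of log-probability ratios between a preferred and a rejected response. A minimal numeric sketch, with scalar log-probabilities standing in for full sequence scores from a policy and a frozen reference model (illustrative only, and not covering the MinorDPO modification):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))),
    where w is the human-preferred response and l the rejected one."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the policy favors the chosen response more than
# the reference does; at indifference it equals log(2).
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))  # below log(2): chosen favored
print(dpo_loss(-2.0, -2.0, -2.0, -2.0))  # exactly log(2): indifferent
```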
http://arxiv.org/abs/2408.09825v1
Compressor summary: The paper proposes TDNetGen, a novel framework that uses generative data augmentation to predict network resilience without needing labeled data or prior knowledge of network dynamics.
http://arxiv.org/abs/2408.09822v1
Compressor summary: SurgicaL-CD is a new method to create realistic surgical images for training machine learning models using diffusion and consistency distillation, improving quality and utility over previous methods.
http://arxiv.org/abs/2408.09821v1
Compressor summary: SympNets are a new type of neural network that can approximate symplectic maps and solve Hamiltonian systems more accurately and efficiently than existing methods.
http://arxiv.org/abs/2408.09819v1
Compressor summary: Key points:
- The paper introduces CMoralEval, a large and diverse moral evaluation dataset for Chinese LLMs.
- Data sources are TV programs discussing moral norms and newspaper articles on moral anomalies.
- Its morality taxonomy and principles are based on traditional Chinese culture and contemporary norms.
- A platform with AI-assisted instance generation and annotation streamlines the construction and evaluation of CMoralEval.
- Experiments show that CMoralEval is a challenging benchmark for Chinese LLMs.
Summary: The paper presents CMoralEval, a new dataset to test the morality of Chinese LLMs using TV programs and newspaper articles as data sources, with a morality taxonomy and principles derived from Chinese culture and norms.
http://arxiv.org/abs/2408.09818v1
Compressor summary: Liquid Fourier LDNets (LFLDNets) are an extension of Latent Dynamics Networks for creating surrogate models of complex differential equations, improving performance, accuracy, and efficiency in computational cardiology applications.
http://arxiv.org/abs/2408.09815v1
Compressor summary: PITuning is a framework that uses pre-trained language models to improve user intent prediction on smartphones by adapting to diverse event sequences and addressing long-tailed preferences.
http://arxiv.org/abs/2408.09807v1
Compressor summary: Key points:
- Reinforcement learning (RL) is a paradigm for training intelligent agents from experience.
- Model-based (MB) RL methods are better suited for the reset-free setting than previous methods.
- The MoReFree agent adapts exploration and policy learning to prioritize task-relevant states.
- MoReFree outperforms privileged baselines with less supervision and data.
Summary: MoReFree is a model-based reset-free RL agent that learns from experience and prioritizes task-relevant states, achieving superior performance with minimal supervision.
http://arxiv.org/abs/2408.09800v1
Compressor summary: This paper proposes a novel method to generate realistic and annotated images of complex table structures using latent diffusion models, which improves the performance of object detection models like YOLOv5.
http://arxiv.org/abs/2408.09798v1
Compressor summary: Text-centric adversarial training improves robustness of multimodal models by converting diverse inputs into unified textual representation despite noise, order changes, and missing modalities.
http://arxiv.org/abs/2408.09794v1
Compressor summary: This paper shows how incorporating knowledge base information into language models improves text classification and enables faster, efficient classifiers using matrix factorization.
http://arxiv.org/abs/2408.09792v1
Compressor summary: Key points:
- The paper proposes a framework for compositional representation learning for music data using generative models.
- The framework can perform unsupervised audio source separation, generation, and variation generation.
- The framework achieves comparable or superior performance to other methods and has lower computational cost.
Summary: The paper presents a novel framework that leverages generative models and compositional representation learning for music data, enabling unsupervised source separation, generation, and variation generation with high quality and low computational cost.
http://arxiv.org/abs/2408.09790v1
Compressor summary: SECL is a novel contrastive learning method for graph clustering that leverages network structures and outperforms existing methods.
http://arxiv.org/abs/2408.09787v1
Compressor summary: The text describes an autonomous animation-making agent called Anim-Director that uses large multimodal models and generative AI tools to create coherent and context-rich animations from concise narratives or simple instructions.
http://arxiv.org/abs/2408.09786v1
Compressor summary: The paper proposes a method to learn disentangled visual features across different compositions using a compositional graph and CLIP with adapters, improving Compositional Zero-shot Learning performance.
http://arxiv.org/abs/2408.09785v1
Compressor summary: GoNoGo is a Large Language Model agent system that automates software deployment decisions in the automotive industry, reducing costs and delays while meeting functional and industrial constraints.
http://arxiv.org/abs/2408.09777v1
Compressor summary: The paper proposes a two-step approach to summarize long regulatory texts and shows that its effectiveness depends on the encoder-decoder model used, highlighting challenges in evaluating generated texts.
http://arxiv.org/abs/2408.09775v1
Compressor summary: The paper proposes new adaptive decentralized algorithms for distributed machine learning tasks and proves their near-optimal sample complexity.
http://arxiv.org/abs/2408.09773v1
Compressor summary: The paper analyzes how large language models perceive their knowledge boundaries through probabilistic and verbalized confidence, finding that probabilistic perception is more accurate but both are affected by question frequency and natural language expression challenges.
http://arxiv.org/abs/2408.09768v1
Compressor summary: The paper proposes MalLight, a novel traffic signal control framework that uses reinforcement learning to optimize the functioning of surrounding signals and reduce congestion and collisions caused by malfunctioning signals.
http://arxiv.org/abs/2408.09765v1
Compressor summary: The paper proposes IBWS, an iterative method for robustly ranking elements using crowd-sourced data, and evaluates cheaper direct assessment methods that can scale to large datasets.
http://arxiv.org/abs/2408.09757v1
Compressor summary: This study shows how changing demonstrations in in-context learning can improve the fairness of large language models without losing accuracy and proposes a new technique to curate diverse data samples for better performance and fairness.
http://arxiv.org/abs/2408.09752v1
Compressor summary: The paper introduces IrisGeneral, a comprehensive dataset for iris anti-spoofing evaluation, and proposes Masked-MoE, a novel method to improve generalization across devices and racial groups using multiple sub-neural networks.
http://arxiv.org/abs/2408.09746v1
Compressor summary: The authors propose a solution for automated prostate cancer grading in mpMRI that incorporates prior knowledge, addresses data imbalance, and maintains interpretability using feature extraction, adaptive feedback loss, and an enhanced cascade classifier.
http://arxiv.org/abs/2408.09744v1
Compressor summary: RealCustom++ is a new method for text-to-image customization that uses real words instead of pseudo-words to improve both subject similarity and text controllability in generated images.
http://arxiv.org/abs/2408.09743v1
Compressor summary: The paper proposes a novel context-guided efficient X-ray medical report generation framework using Mamba as the vision backbone and context retrieval from the training set to enhance feature representation and generate high-quality reports.
http://arxiv.org/abs/2408.09742v1
Compressor summary: The paper develops and evaluates paired completion, a novel method for detecting and quantifying issue framing in textual discourse using next-token log probabilities from generative large language models, which shows promising results in scalability, accuracy and low bias.
http://arxiv.org/abs/2408.09739v1
Compressor summary: TraDiffusion is a training-free method for controlling image generation with mouse movements that can manipulate various aspects of the image while following a specified trajectory.
http://arxiv.org/abs/2408.09734v1
Compressor summary: MAFEA is a novel framework for few-shot object counting that encodes query and exemplar features to be mutually aware of each other, reducing target confusion and achieving state-of-the-art performance on two benchmarks.
http://arxiv.org/abs/2408.09723v1
Compressor summary: The paper proposes sTransformer, a new Transformer-based model with STCN and Sequence-guided Mask Attention to improve long-term time-series forecasting by capturing sequential and temporal information.
http://arxiv.org/abs/2408.09722v1
Compressor summary: This paper reviews recent progress in adapting few-shot learning methods for open-world settings, where data is uncertain, incomplete, and dynamic, and discusses the challenges, strengths, and weaknesses of three types of open-world few-shot learning approaches.
http://arxiv.org/abs/2408.09720v1
Compressor summary: The paper introduces MSP60K, a large-scale cross-domain pedestrian attribute recognition dataset, and proposes LLM-PAR, a framework that combines vision transformers with language models for better performance.
http://arxiv.org/abs/2408.09717v1
Compressor summary: The paper proposes a novel Semantic-Aware Dual Encoder Model (SEMDR) that uses a legal clue tracing mechanism to conduct fine-grained semantic reasoning between criminal facts and instruments for accurate Legal Judgment Prediction (LJP).
http://arxiv.org/abs/2408.09715v1
Compressor summary: HYDEN uses hyperbolic space to learn image-text representations that handle semantic uncertainty in the medical domain, outperforming baselines on zero-shot tasks.
http://arxiv.org/abs/2408.09709v1
Compressor summary: The paper introduces Histo-DD, a novel dataset distillation algorithm for histopathology image analysis that better handles high colour heterogeneity and generates more informative synthetic samples than previous methods.
http://arxiv.org/abs/2408.09706v1
Compressor summary: MePT is a novel method that uses diverse visual prompts to improve VLMs' generalization ability for various downstream tasks.
http://arxiv.org/abs/2408.09705v1
Compressor summary: The paper introduces Community-centric Graph Eraser (CGE), a graph unlearning framework that maps community subgraphs to nodes, efficiently removing specific data from graph neural networks while requiring less data and fewer parameters than existing, less efficient methods that ignore structural information.
http://arxiv.org/abs/2408.09703v1
Compressor summary: PMformer is a new Transformer-based model that captures partial relationships among some time-series features and achieves better forecasting results than existing univariate or complete-multivariate models, while also being efficient and robust to missing data.
http://arxiv.org/abs/2408.09702v1
Compressor summary: The text describes a method to realistically insert virtual objects into real-world images by using a diffusion model to guide an inverse rendering process that recovers scene lighting and other parameters, improving the appearance of the composited object.
http://arxiv.org/abs/2408.09701v1
Compressor summary: The paper proposes a zero-shot cross-lingual approach using a neural projection technique to improve code generation for non-English prompts, addressing biases and limitations of current Large Language Models.
http://arxiv.org/abs/2408.09695v1
Compressor summary: The paper proposes LightWeather, a lightweight and effective Transformer-based model for global weather forecasting, using absolute positional encoding to capture spatial-temporal correlations without attention mechanisms.
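The role of absolute positional encoding in a setup like LightWeather can be sketched with the standard transformer-style sinusoidal encoding applied to a station's coordinates and time step (an illustrative assumption; the paper's exact encoding and dimensions may differ):

```python
import math

def absolute_pos_encoding(value, dim=8, base=10000.0):
    """Sinusoidal absolute encoding of one scalar (latitude, longitude, or time)."""
    enc = []
    for i in range(dim // 2):
        freq = base ** (-2 * i / dim)  # geometrically decreasing frequencies
        enc.append(math.sin(value * freq))
        enc.append(math.cos(value * freq))
    return enc

def station_encoding(lat, lon, t, dim=8):
    """Concatenate absolute encodings of a station's position and the time step,
    so a plain MLP can distinguish stations and times without attention."""
    return (absolute_pos_encoding(lat, dim)
            + absolute_pos_encoding(lon, dim)
            + absolute_pos_encoding(t, dim))
```

The point of the summary is that feeding such absolute encodings to a simple feed-forward model can substitute for attention when capturing spatial-temporal correlations.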
http://arxiv.org/abs/2408.09688v1
Compressor summary: The authors propose a task to convert spoken language transcripts into written text with improved readability and evaluate the performance of large language models on this task using a new benchmark dataset.
http://arxiv.org/abs/2408.09682v1
Compressor summary: The paper explores using large language models to simulate field experiments and proposes two prompting strategies, observer and participant modes, which show promising results in certain scenarios but also identify limitations.
http://arxiv.org/abs/2408.09680v1
Compressor summary: MambaLoc is a new visual localization model that uses a selective state space (SSM) module to improve training efficiency, robustness in sparse data environments, and global feature extraction, and introduces GIS, which combines Non-local Neural Networks' global modeling with SSM's computational efficiency.
http://arxiv.org/abs/2408.09676v1
Compressor summary: SherlockNet is a self-supervised learning framework for handwriting authentication that handles noisy data, high-dimensional features, and lack of supervision by using energy-oriented contrastive learning and personalized fine-tuning.
http://arxiv.org/abs/2408.09674v1
Compressor summary: The paper proposes a super-resolution framework that trains a single neural network across multiple scales and introduces Implicit Grid Convolution (IGConv), a novel upsampler that improves performance.
http://arxiv.org/abs/2408.09667v1
Compressor summary: BLADE is a benchmark to evaluate agents' abilities in data-driven scientific discovery by comparing their analyses with expert-verified ground truth.
http://arxiv.org/abs/2408.09665v1
Compressor summary: The paper presents SG-GS, which reconstructs realistic human avatars from monocular videos using semantics-embedded 3D Gaussians with skeleton and cloth-dynamics deformation, aided by the SHA tool for efficient body-part semantic labeling and by semantic projection, semantic-guided density regularization, and semantic-aware neighborhood-consistency regularization, achieving state-of-the-art performance.
http://arxiv.org/abs/2408.09663v1
Compressor summary: CHASE is a method for creating realistic human avatars with supervision from intrinsic 3D consistency and 3D geometry contrastive learning, achieving better performance than current state-of-the-art methods in both full and sparse input settings.
http://arxiv.org/abs/2408.09656v1
Compressor summary: This study tests if ChatGPT-3.5, an AI language model, shows similar biases to humans when generating random numbers, finding that it avoids repetition better than humans.
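The repetition-avoidance effect described here can be quantified in a few lines (a hedged sketch of one possible analysis, not the study's actual methodology): compare a sequence's adjacent-repeat rate against the 1/d rate expected from a uniform generator over d values.

```python
def repeat_rate(seq):
    """Fraction of adjacent pairs in which the same number appears twice in a row.

    For an ideal uniform generator over d distinct values this converges to 1/d;
    a repetition-avoiding generator (human- or model-like) scores well below that.
    """
    pairs = list(zip(seq, seq[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)
```

Running this on model-generated digit sequences and comparing the result to the 1/d baseline is one way to make "avoids repetition better than humans" measurable.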
http://arxiv.org/abs/2408.09655v1
Compressor summary: The paper proposes two nearest neighbor methods for nonparametric contextual bandits with unbounded contexts and analyzes their regret bounds under different conditions.
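The general shape of a nearest-neighbour contextual bandit can be sketched as follows: for each arm, estimate the reward at the current context by averaging the rewards of the k nearest contexts previously observed for that arm, and pull the arm with the highest estimate. This is an illustrative textbook version only; the class and parameter names are assumptions, and the paper's two methods and their regret analysis under unbounded contexts are more involved.

```python
import math

class KNNBandit:
    """k-nearest-neighbour contextual bandit (illustrative sketch)."""
    def __init__(self, n_arms, k=3):
        self.n_arms = n_arms
        self.k = k
        self.history = [[] for _ in range(n_arms)]  # per-arm (context, reward) pairs

    def _estimate(self, arm, context):
        obs = self.history[arm]
        if len(obs) < self.k:
            return float("inf")  # force exploration of under-sampled arms
        nearest = sorted(obs, key=lambda cr: math.dist(cr[0], context))[: self.k]
        return sum(r for _, r in nearest) / self.k

    def select(self, context):
        """Pull the arm whose k nearest past contexts had the highest mean reward."""
        return max(range(self.n_arms), key=lambda a: self._estimate(a, context))

    def update(self, arm, context, reward):
        self.history[arm].append((context, reward))
```

The nonparametric appeal is visible even in the sketch: no model of the reward surface is assumed, so the estimator adapts to whatever local structure the contexts have.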
http://arxiv.org/abs/2408.09650v1
Compressor summary: ExpoMamba is a fast and efficient model that enhances low-light images by combining frequency components with a modified U-Net, outperforming traditional models in speed and quality.
http://arxiv.org/abs/2408.09647v1
Compressor summary: The study analyzes how CLIP detects deepfakes by recognizing similar concepts and introduces C2P-CLIP, an improved method that enhances detection performance using category-related concepts.
http://arxiv.org/abs/2408.09640v1
Compressor summary: The authors propose a new method to improve token representation for bidirectional language models by adding a small backward LM, which improves performance in named entity recognition and other tasks.
http://arxiv.org/abs/2408.09639v1
Compressor summary: The study compares different methods to measure the grammatical knowledge of large language models and suggests using a variety of methods for comprehensive evaluation.