This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-28, generated by the compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2406.19395v1
Compressor summary: The paper introduces dataset size recovery, a method to estimate how many samples were used to train a model, using its weights and LoRA matrices.
http://arxiv.org/abs/2406.19394v1
Compressor summary: The paper introduces a self-training framework called HUWSOD for weakly supervised object detection, which uses innovative proposal generators and does not require external modules or additional supervision.
http://arxiv.org/abs/2406.19393v1
Compressor summary: The paper introduces a new anomaly detection problem using 3D shapes and a large dataset of images with diverse anomalies, and proposes a transformer-based approach to solve it.
http://arxiv.org/abs/2406.19392v1
Compressor summary: ReXTime is a benchmark to test AI models' ability to reason about cause-and-effect relationships across video segments, and it shows that current models are not yet as good as humans at this task.
http://arxiv.org/abs/2406.19391v1
Compressor summary: Fibottention is a sparse, efficient, and general self-attention architecture based on Fibonacci sequences for visual tasks that captures fine-grained details while reducing computational overhead.
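The summary names only the mechanism, so here is an illustrative guess (a minimal sketch, not necessarily the paper's actual construction) at what a Fibonacci-based sparse attention mask could look like: each token attends to itself and to tokens at Fibonacci offsets, giving roughly O(n log n) attended pairs instead of O(n^2).

```python
import torch

def fibonacci_mask(seq_len: int) -> torch.Tensor:
    """Boolean attention mask where token i attends to tokens at
    Fibonacci offsets (i±1, i±2, i±3, i±5, ...). A hypothetical reading
    of 'Fibonacci-based sparse attention', not the paper's exact rule."""
    fibs, a, b = [], 1, 2
    while a < seq_len:
        fibs.append(a)
        a, b = b, a + b
    mask = torch.eye(seq_len, dtype=torch.bool)  # always attend to self
    for i in range(seq_len):
        for f in fibs:
            if i - f >= 0:
                mask[i, i - f] = True
            if i + f < seq_len:
                mask[i, i + f] = True
    return mask

m = fibonacci_mask(196)  # e.g. 14x14 ViT patch tokens
print(m.float().mean())  # fraction of attended pairs, well below 1.0
```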
http://arxiv.org/abs/2406.19390v1
Compressor summary: Key points:
- New system for automatic 2D floorplan reconstruction using SALVe, a novel learned alignment verifier
- Inputs: sparse 360° panoramas with semantic features of windows, doors, and openings
- Outputs: room poses, layouts, and floorplan
- Outperforms state-of-the-art SfM systems in completeness and accuracy
Summary: The authors present a new system that uses SALVe, a learned alignment verifier, to reconstruct 2D floorplans from sparse 360° panoramas with semantic features, achieving better results than existing methods.
http://arxiv.org/abs/2406.19389v1
Compressor summary: OMG-LLaVA is a framework that combines pixel-level image understanding with reasoning abilities, enabling flexible user interaction via visual and text prompts.
http://arxiv.org/abs/2406.19384v1
Compressor summary: The text shows that Large Language Models are remarkably robust, still predicting well even after layers are deleted or swapped, and suggests that inference proceeds in four stages across different models.
http://arxiv.org/abs/2406.19380v1
Compressor summary: TabReD is a new collection of tabular machine learning benchmarks that reflect real-world scenarios by including time-based splits and feature engineering.
http://arxiv.org/abs/2406.19371v1
Compressor summary: This paper introduces Suri, a dataset for multi-constraint instruction following in long-form text generation, and proposes Instructional ORPO (I-ORPO), an alignment method that uses synthetic negative feedback from LLM-generated corrupted instructions to improve quality.
http://arxiv.org/abs/2406.19370v1
Compressor summary: The authors propose analyzing how generative models learn abstract concepts using a concept space framework, finding that hidden capabilities emerge suddenly in the learning process.
http://arxiv.org/abs/2406.19369v1
Compressor summary: RWKV-SAM is a fast and accurate segmentation model that combines convolution and RWKV (Receptance Weighted Key Value) operations in its backbone and uses an efficient decoder for multiscale tokens.
http://arxiv.org/abs/2406.19364v1
Compressor summary: SimTxtSeg is a novel framework that uses simple text cues to generate pseudo-labels and fuse text and image features for weakly-supervised medical image segmentation.
http://arxiv.org/abs/2406.19362v1
Compressor summary: The paper proposes a new framework, STAL3D, that combines self-training and adversarial learning for unsupervised domain adaptation in 3D object detection, improving performance on cross-domain tasks and addressing issues like background interference and source domain size bias.
http://arxiv.org/abs/2406.19358v1
Compressor summary: This text compares the cross-lingual sentiment analysis performance of SMLMs and LLMs, finding that SMLMs excel in zero-shot settings while LLMs adapt better in few-shot settings.
http://arxiv.org/abs/2406.19356v1
Compressor summary: DiVERT is a novel method for generating and understanding implausible multiple-choice question distractors in math by using a large language model.
http://arxiv.org/abs/2406.19354v1
Compressor summary: The paper discusses challenges and proposes a testbed for model editing, which involves updating knowledge in language models, using a semi-synthetic dataset based on Wikidata.
http://arxiv.org/abs/2406.19353v1
Compressor summary: The paper introduces CORE4D, a large-scale 4D human-object-human interaction dataset that helps study collaborative object rearrangement and provides new challenges for generating human-object interactions.
http://arxiv.org/abs/2406.19349v1
Compressor summary: The paper introduces IndoToxic2024, a dataset for Indonesian hate speech and toxicity classification, focusing on vulnerable groups during the presidential election, and evaluates its effectiveness with various models.
http://arxiv.org/abs/2406.19341v1
Compressor summary: The paper proposes a bi-level learning method for a visual conditioning token that can adapt a deep neural network model to different domains in image classification, improving its performance by up to 1.9%.
http://arxiv.org/abs/2406.19320v1
Compressor summary: $\Delta$-IRIS is a fast and efficient model-based RL agent that uses discrete autoencoders and autoregressive transformers to predict future deltas, achieving state-of-the-art results on the Crafter benchmark.
http://arxiv.org/abs/2406.19317v1
Compressor summary: The paper shows how Large Language Models can improve Contextual Multi-Armed Bandits by simulating human preferences, reducing online learning regret and data-gathering costs.
http://arxiv.org/abs/2406.19316v1
Compressor summary: This paper proposes two methods to improve training data for Scene Graph Generation, which are Feature Space Triplet Augmentation and Soft Transfer, and shows their effectiveness in achieving high Recall scores.
http://arxiv.org/abs/2406.19314v1
Compressor summary: LiveBench is a new benchmark for large language models that updates questions from recent sources, scores answers automatically, and covers various challenging tasks to avoid test set contamination and biases.
http://arxiv.org/abs/2406.19307v1
Compressor summary: This paper reviews the current state of research on commonsense causality, which is essential for human intelligence and decision-making, but lacks systematic exploration.
http://arxiv.org/abs/2406.19302v1
Compressor summary: Key points:
- Climate change is accelerating due to human actions
- Satellite images help observe and measure effects on natural areas
- A deep learning framework maps land naturalness from Sentinel-2 data using contextual and geographical priors
- Quantifying naturalness aids environmental stewardship
Summary: The text describes how satellite images and a deep learning framework can map land naturalness, which is affected by human actions and climate change, to help protect the environment.
http://arxiv.org/abs/2406.19301v1
Compressor summary: MCNC is a new method to compress large AI models by constraining their parameter space to low-dimensional nonlinear manifolds, achieving high compression rates and performance across various tasks.
http://arxiv.org/abs/2406.19300v1
Compressor summary: scTree is a new method for single-cell RNA sequencing data that corrects batch effects and learns a tree structure representing clusters and their hierarchies, improving understanding of cellular landscapes.
http://arxiv.org/abs/2406.19299v1
Compressor summary: PNeRV is a patch-wise implicit neural representation for videos that preserves spatiotemporal continuity using polynomial neural networks and hierarchical sampling, achieving better performance in tasks like compression and downstream applications.
http://arxiv.org/abs/2406.19298v1
Compressor summary: The paper introduces Decomp Diffusion, an unsupervised method that decomposes images into compositional components, allowing for flexible scene composition.
http://arxiv.org/abs/2406.19297v1
Compressor summary: This paper explores how different modalities (e.g., vision and language) evolve at different rates when training models on a sequence of tasks, proposes a modality-aware feature distillation method to improve performance, and shows its effectiveness in multimodal continual learning settings.
http://arxiv.org/abs/2406.19292v1
Compressor summary: Finetuning large language models on a synthetic dataset of numerical key-value retrieval tasks enhances their information retrieval and reasoning abilities in long-context settings without causing hallucination or sacrificing general benchmark performance.
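The task format is easy to picture; here is a minimal sketch of generating one synthetic numerical key-value retrieval example, where the JSON layout and prompt wording are my assumptions rather than the paper's exact template.

```python
import json
import random

def make_kv_retrieval_example(n_pairs: int = 50) -> dict:
    """One synthetic numerical key-value retrieval example, in the spirit
    of the summary. Format details are assumed, not taken from the paper."""
    kv = {str(random.randint(10**7, 10**8)): str(random.randint(10**7, 10**8))
          for _ in range(n_pairs)}
    key = random.choice(list(kv))  # the key the model must look up
    prompt = (f"JSON data: {json.dumps(kv)}\n"
              f"What is the value associated with key {key}?")
    return {"prompt": prompt, "answer": kv[key]}

example = make_kv_retrieval_example()
print(example["prompt"][:120], "...", example["answer"])
```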
http://arxiv.org/abs/2406.19290v1
Compressor summary: This paper investigates various methods and applications in 2D and 3D human pose estimation, comparing state-of-the-art algorithms and discussing future directions.
http://arxiv.org/abs/2406.19280v1
Compressor summary: The paper introduces PubMedVision, a large and high-quality dataset for medical image-text pairs created by refining existing data and using GPT-4V to denoise and reformat it, leading to improved medical multimodal capabilities of language models.
http://arxiv.org/abs/2406.19276v1
Compressor summary: VERISCORE is a metric for evaluating factuality in diverse long-form generation tasks that can handle both verifiable and unverifiable claims, and it reveals that different models perform differently across tasks.
http://arxiv.org/abs/2406.19272v1
Compressor summary: SCBMs (Stochastic Concept Bottleneck Models) use distributional parameterization to model concept dependencies, improving the effectiveness of interventions in interpretable Concept Bottleneck Models.
http://arxiv.org/abs/2406.19271v1
Compressor summary: The research proposes a system that collects and filters web data using trusted AI models to ensure pure and reliable training data for Large Language Models.
http://arxiv.org/abs/2406.19263v1
Compressor summary: The paper introduces a Tree-of-Lens (ToL) agent that uses a Hierarchical Layout Tree to understand and describe screen content and layout based on user-indicated points, outperforming other tools on a new Screen Point-and-Read benchmark.
http://arxiv.org/abs/2406.19258v1
Compressor summary: GCFormer is a new graph Transformer that creates positive and negative token sequences to better capture diverse graph information and improve node representation quality for node classification tasks.
http://arxiv.org/abs/2406.19256v1
Compressor summary: AIDRIN is a framework that evaluates the quality and suitability of data for AI using various metrics, visualizations, and reports.
http://arxiv.org/abs/2406.19255v1
Compressor summary: This paper proposes a method called Finsta to improve video-language models by aligning text and video using fine-grained scene graphs, achieving better results on various tasks.
http://arxiv.org/abs/2406.19253v1
Compressor summary: The authors propose a new architecture for predicting space-time sequences that combines CNNs with advection and reaction-diffusion components, inspired by physical processes, to improve performance and explainability.
http://arxiv.org/abs/2406.19251v1
Compressor summary: The AutoRAG-HP framework uses a hierarchical multi-armed bandit method to efficiently tune hyper-parameters for Retrieval-Augmented Generation systems, achieving high recall with fewer API calls than grid search.
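As a simplified stand-in for the hierarchical scheme, a flat UCB1 bandit over discrete RAG hyper-parameter configurations illustrates the general idea; the arms and the eval_recall scorer below are placeholders, not AutoRAG-HP's actual search space.

```python
import math
import random

# Flat UCB1 over hypothetical RAG hyper-parameter configurations.
arms = [{"top_k": k, "chunk_size": c} for k in (3, 5, 10) for c in (256, 512)]

def eval_recall(cfg: dict) -> float:
    """Placeholder: run the RAG pipeline with cfg and score recall."""
    return random.random()

counts = [0] * len(arms)
values = [0.0] * len(arms)
for t in range(1, 101):
    # Pick the arm with the highest upper confidence bound.
    ucb = [float("inf") if n == 0 else v + math.sqrt(2 * math.log(t) / n)
           for v, n in zip(values, counts)]
    i = max(range(len(arms)), key=ucb.__getitem__)
    r = eval_recall(arms[i])
    counts[i] += 1
    values[i] += (r - values[i]) / counts[i]  # running mean reward

best = arms[max(range(len(arms)), key=values.__getitem__)]
print("best config:", best)
```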
http://arxiv.org/abs/2406.19249v1
Compressor summary: NTFormer is a new graph Transformer that uses a novel token generator called Node2Par to express rich graph features, enabling it to learn node representations without requiring graph-specific modifications.
http://arxiv.org/abs/2406.19247v1
Compressor summary: The paper proposes a new method for assessing image quality by combining local manifold learning with contrastive learning, which improves differentiation and performance compared to existing methods.
http://arxiv.org/abs/2406.19244v1
Compressor summary: Key points:
- GNNs are powerful for graph learning but have limited expressiveness
- $K$-hop GNNs aggregate information from neighbors within $K$ hops
- A substructure encoding function enhances the expressive power of $K$-hop GNNs
- The method is provably more powerful than previous works and achieves state-of-the-art results
Summary: The paper proposes a substructure encoding function to improve the expressive power of $K$-hop graph neural networks, which outperforms previous works and matches the 3-WL test.
http://arxiv.org/abs/2406.19238v1
Compressor summary: The study analysed how large language models respond to political statements and found that demographic features and question formats affect their stances, revealing biases and patterns in their justifications.
http://arxiv.org/abs/2406.19237v1
Compressor summary: FlowVQA is a new benchmark for testing visual question-answering models on flowcharts with various reasoning tasks.
http://arxiv.org/abs/2406.19236v1
Compressor summary: HA-VLN is a new approach to navigation that considers dynamic human activities and uses novel datasets and agents to improve real-world applicability and robustness.
http://arxiv.org/abs/2406.19232v1
Compressor summary: RuBLiMP is a new benchmark for testing Russian language models' grammatical knowledge by providing diverse minimal pairs covering various linguistic phenomena.
http://arxiv.org/abs/2406.19228v1
Compressor summary: The text discusses a framework for LLMs to detect silent tool errors and plan better, focusing on their use as tools rather than just choosing them.
http://arxiv.org/abs/2406.19227v1
Compressor summary: ARTE is a framework that aligns teacher instructional content with student preferences to generate tailored training examples for Knowledge Distillation on edge devices.
http://arxiv.org/abs/2406.19226v1
Compressor summary: Key points:
- The paper proposes SimClass, a multi-agent framework using LLMs for virtual classroom teaching
- SimClass simulates classroom interaction patterns and enhances user experience
- Agents collaborate to create enlivening interactions and improve the learning process
Summary: SimClass is a novel framework that uses large language models to simulate and enhance classroom interactions in a virtual setting, improving the user's learning experience.
http://arxiv.org/abs/2406.19225v1
Compressor summary: The ProtoGMM model uses Gaussian mixtures to estimate multi-prototype distributions for semantic segmentation in unlabeled target domains by leveraging supervised models from labeled source domains, improving predictions with contrastive learning.
http://arxiv.org/abs/2406.19223v1
Compressor summary: T-FREE is a novel tokenizer that improves efficiency and cross-lingual transfer learning by embedding words using sparse activation patterns over character triplets without a reference corpus.
http://arxiv.org/abs/2406.19217v1
Compressor summary: The paper proposes a novel error detection method for robot-assisted surgeries using contextual information from videos and reasoning modules inspired by natural language processing.
http://arxiv.org/abs/2406.19215v1
Compressor summary: SeaKR is a new model that uses LLMs' self-aware uncertainty to retrieve and integrate knowledge for better question answering.
http://arxiv.org/abs/2406.19195v1
Compressor summary: The paper proposes a method to estimate the long-term heterogeneous dose-response curve by using optimal transport weighting to account for unobserved confounders and providing theoretical guarantees for counterfactual prediction error.
http://arxiv.org/abs/2406.19189v1
Compressor summary: The study proposes BENDR, a BERT-based model that uses pre-training and fine-tuning on large datasets to improve seizure detection from EEG recordings with low false positives and acceptable sensitivity.
http://arxiv.org/abs/2406.19188v1
Compressor summary: The paper introduces a method to make direct alignment in large language models more consistent across different completion lengths using a new averaging operator.
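One plausible reading of the "averaging operator" is replacing summed completion log-probabilities with per-token averages inside a DPO-style loss; the sketch below assumes the policy-minus-reference log-probs are precomputed, and it is not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dpo_loss_length_avg(logp_chosen: torch.Tensor,
                        logp_rejected: torch.Tensor,
                        len_chosen: torch.Tensor,
                        len_rejected: torch.Tensor,
                        beta: float = 0.1) -> torch.Tensor:
    """DPO-style loss with per-token averaged log-probabilities, a
    hypothetical version of the summary's 'averaging operator'. Inputs
    are summed (policy minus reference) log-probs per completion."""
    r_c = logp_chosen / len_chosen     # average over completion length
    r_r = logp_rejected / len_rejected
    return -F.logsigmoid(beta * (r_c - r_r)).mean()
```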
http://arxiv.org/abs/2406.19185v1
Compressor summary: CoPG is an RL algorithm that can finetune LLMs with off-policy data and outperforms direct alignment methods and policy gradient in generalization.
http://arxiv.org/abs/2406.19175v1
Compressor summary: The paper discusses using synthetic and real-world X-ray images to train an object detection model, finding that a mix of synthetic and unlabeled real-world data is more cost-efficient than using only labeled real-world data.
http://arxiv.org/abs/2406.19172v1
Compressor summary: The paper presents three techniques to detect and fix annotation errors in the OntoNotes 5.0 English NER corpus, improving model performance.
http://arxiv.org/abs/2406.19170v1
Compressor summary: The study investigates whether explanations help users understand an AI system's limitations when it performs poorly on a visual task involving full-color or grayscale images, finding that explanations do not improve users' perceptions of the system's capabilities and limitations.
http://arxiv.org/abs/2406.19162v1
Compressor summary: The paper proposes a new method for estimating the migration direction of cells from a single image using deep circular regression and achieving better accuracy than previous methods.
http://arxiv.org/abs/2406.19156v1
Compressor summary: The paper proposes a neural network (HCMGNN) to predict gene-microbe-disease associations using causal metapaths and semantic sharing.
http://arxiv.org/abs/2406.19154v1
Compressor summary: The dual deep neural network (D-DNet) is a more efficient and accurate model for predicting PM2.5 and AOD550 levels than traditional models or the CAMS 4D-Var system, using real-time observations to improve forecasting.
http://arxiv.org/abs/2406.19150v1
Compressor summary: RAVEN is a framework that enhances vision-language models with retrieval augmentation for multiple tasks, improving efficiency and performance.
http://arxiv.org/abs/2406.19148v1
Compressor summary: The authors propose a random background augmentation method called BackMix for neural networks to improve echocardiogram view classification by focusing on the image content and reducing spurious correlations.
http://arxiv.org/abs/2406.19146v1
Compressor summary: The paper compares two scaling laws for optimal model size as a function of compute budget, identifies factors causing discrepancies, and shows how to obtain agreement with one law by correcting these factors.
http://arxiv.org/abs/2406.19136v1
Compressor summary: Key points:
- Solubility prediction is crucial for drug effectiveness and safety
- Traditional methods fail to capture complex molecular structures
- A novel deep learning framework combines attention-based transformers, LSTM networks, and GCNs
- It outperforms benchmark models and offers insights for drug design and selection
Summary: The text presents a novel deep learning framework that improves the prediction of drug solubility by capturing complex molecular structures better than traditional methods, with potential applications for drug discovery.
http://arxiv.org/abs/2406.19131v1
Compressor summary: The authors introduce CELLO, a novel dataset for causal reasoning tasks involving interactions between humans and objects, and propose CELLO-CoT, a chain-of-thought prompting strategy to improve large vision-language models' performance on these tasks.
http://arxiv.org/abs/2406.19130v1
Compressor summary: The paper proposes an evidential Concept Embedding Model that improves interpretable deep learning methods in medical image analysis by modeling concept uncertainty and rectifying misalignments for better clinical diagnosis explanations.
http://arxiv.org/abs/2406.19121v1
Compressor summary: ARLC is a model that learns abductive reasoning with context, achieving high accuracy and interpretability on Raven's matrices tasks, surpassing neuro-symbolic and large language models with fewer parameters.
http://arxiv.org/abs/2406.19116v1
Compressor summary: CHEW is a dataset of Wikipedia changes in text, used to test LLMs' timeline understanding and identify meaning shifts.
http://arxiv.org/abs/2406.19112v1
Compressor summary: The paper proposes an improved training method for smaller LLMs by leveraging knowledge from larger models and a novel post-training domain alignment phase, achieving better results than state-of-the-art models with more parameters.
http://arxiv.org/abs/2406.19107v1
Compressor summary: The text describes a new lightweight face detector with a customized backbone network and two multi-task losses that achieves high accuracy on the WIDER FACE dataset.
http://arxiv.org/abs/2406.19102v1
Compressor summary: Statements are a new data structure for extracting quantitative facts from ESG reports using deep learning models like SemTabNet, which can facilitate exploratory data analysis.
http://arxiv.org/abs/2406.19101v1
Compressor summary: DocKylin is a document-centric MLLM that uses pixel-level slimming and token-level slimming to improve visual content understanding in high-resolution document images.
http://arxiv.org/abs/2406.19097v1
Compressor summary: This survey examines 50 datasets and models addressing fairness and bias in Large Multimodal Models (LMMs) and proposes a new category of bias quantification, termed preuse.
http://arxiv.org/abs/2406.19092v1
Compressor summary: ASWA is a technique that improves generalization by updating a running average of model parameters only when it helps the validation performance, combining SWA and early stopping.
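The update rule as described is simple enough to sketch; this minimal version tentatively folds the current weights into the running average and keeps the update only if an assumed validate helper reports improvement.

```python
import torch

def aswa_step(model: torch.nn.Module, swa_params: dict, swa_n: int,
              best_val: float, validate) -> tuple:
    """One ASWA update, following the summary's description: tentatively
    fold current weights into the running average, keep it only if the
    validation score improves. `validate(params) -> float` is an assumed
    helper that evaluates a model with the given averaged parameters.
    Initialize with swa_params = {n: p.detach().clone() for n, p in
    model.named_parameters()}, swa_n = 1, best_val = validate(swa_params)."""
    candidate = {name: (swa_params[name] * swa_n + p.detach()) / (swa_n + 1)
                 for name, p in model.named_parameters()}
    score = validate(candidate)
    if score > best_val:                 # accept the averaged weights
        return candidate, swa_n + 1, score
    return swa_params, swa_n, best_val   # reject, keep previous average
```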
http://arxiv.org/abs/2406.19087v1
Compressor summary: The text compares how humans and artificial neural networks represent images, finding similarities and differences in their strategies and highlighting the need for better alignment.
http://arxiv.org/abs/2406.19073v1
Compressor summary: AMBROSIA is a new benchmark for testing text-to-SQL parsers' ability to handle ambiguous questions with different types of uncertainty, using controlled databases generated from scratch.
http://arxiv.org/abs/2406.19071v1
Compressor summary: The paper proposes a new method to improve empathetic responses from conversational agents using preference datasets and optimization algorithms, evaluating it on two metrics and a benchmark dataset.
http://arxiv.org/abs/2406.19070v1
Compressor summary: FAGhead is a method that creates realistic 3D human avatars from monocular videos with controllable expressions and poses, using Point-based Learnable Representation Field (PLRF) and alpha rendering.
http://arxiv.org/abs/2406.19066v1
Compressor summary: The paper proposes using uncertain identity data to train classifiers for better algorithmic fairness.
http://arxiv.org/abs/2406.19065v1
Compressor summary: The paper introduces STBench, a benchmark dataset for evaluating spatio-temporal understanding in large language models, and assesses 13 LLMs on four dimensions of this capability.
http://arxiv.org/abs/2406.19057v1
Compressor summary: REC-based detection uses language descriptions to detect objects outside existing class vocabularies but can produce false positive predictions; filtering detections by size mitigates these errors, improving semantic segmentation and data annotation efficiency.
http://arxiv.org/abs/2406.19055v1
Compressor summary: SimpleFusion is a simple framework for fusing visible and infrared images using two plain convolutional neural networks without downsampling, preserving complementary information between the modalities.
http://arxiv.org/abs/2406.19054v1
Compressor summary: The text describes an accessible prototype system for non-experts to interact with machine learning and gain insights into user behavior, using a novel methodology for interactive machine learning and multimodal interaction.
http://arxiv.org/abs/2406.19049v1
Compressor summary: The study explores when the accuracy-on-the-line relationship in machine learning breaks down due to noisy data, nuisance features, and spurious features, leading to "Accuracy-on-the-wrong-line".
http://arxiv.org/abs/2406.19048v1
Compressor summary: The paper proposes a novel bidirectional complementary Lidar-camera fusion framework, called BiCo-Fusion, that can achieve robust semantic- and spatial-aware 3D object detection by mutually enhancing multi-modal features and adaptively selecting them.
http://arxiv.org/abs/2406.19040v1
Compressor summary: The paper analyzes differential privacy in empirical risk minimization for semi-sensitive data and provides better bounds on excess risk.
http://arxiv.org/abs/2406.19032v1
Compressor summary: The paper proposes a method to improve large language models' generalization by using the reliability of weak supervision signals in the alignment process, which helps reduce errors from noisy supervision and enhance model accuracy.
http://arxiv.org/abs/2406.19030v1
Compressor summary: The paper introduces DiffLoss, a naturalness-oriented and semantic-aware optimization mechanism for image restoration that leverages diffusion models' distribution coverage and high-level semantic space to improve visual and semantic perception quality.
http://arxiv.org/abs/2406.19015v1
Compressor summary: The authors use Gaussian process models to analyze battery resistance, develop fault detection rules, and understand cell-level failure mechanisms for lithium iron phosphate batteries based on a large dataset of returned batteries.
http://arxiv.org/abs/2406.19006v1
Compressor summary: VideoMambaPro improves video action recognition by addressing limitations in Mamba's token processing with masked backward computation and elemental residual connections, outperforming transformer models while being more efficient.
http://arxiv.org/abs/2406.18999v1
Compressor summary: The paper proposes a re-ordering approach to improve image-based species identification by using DNA barcodes to detect out-of-distribution classes.
http://arxiv.org/abs/2406.18996v1
Compressor summary: The paper proposes a zero-shot domain adaptation method that uses data augmentation, domain adversarial learning, and dual-level contrastive learning to learn domain-invariant features with low task bias for the target task of interest.
http://arxiv.org/abs/2406.18992v1
Compressor summary: SSCBM is a new framework that improves concept bottleneck models by using semi-supervised learning and aligning unlabeled data with concepts, making them more accurate and efficient even with limited labeled data.
http://arxiv.org/abs/2406.18990v1
Compressor summary: The article proposes a real-time surrogate model for parameterized PDEs using a hybrid method of POD and SVR, aimed at interactive analysis in digital twin applications.
http://arxiv.org/abs/2406.18967v1
Compressor summary: UNest is a novel unpaired medical image synthesis architecture that uses structural inductive biases and structural attention to improve the synthesis of anatomical regions, achieving significant improvements over recent methods on three modalities.
http://arxiv.org/abs/2406.18966v1
Compressor summary: UniGen is a framework that uses large language models to generate diverse, accurate, and controllable text datasets for various applications, such as benchmarking and data augmentation.
http://arxiv.org/abs/2406.18958v1
Compressor summary: AnyControl is a framework for image synthesis that handles diverse control signals and produces high-quality images faithful to the input text.
http://arxiv.org/abs/2406.18954v1
Compressor summary: Alignment methods improve bot performance in following rules compared to instruction fine-tuning alone.
http://arxiv.org/abs/2406.18944v1
Compressor summary: This paper analyzes the vulnerability of personalized diffusion models to adversarial perturbations and proposes a method that aligns the latent image with its semantic meaning and applies contrastive learning to prevent performance degradation.
http://arxiv.org/abs/2406.18941v1
Compressor summary: CLIP3D-AD is a novel 3D few-shot anomaly detection method that adapts CLIP for classification and segmentation using synthesized anomalous images, image and text adapters, and multi-view fusion.
http://arxiv.org/abs/2406.18939v1
Compressor summary: The text proposes a fuzzy logic framework to evaluate and standardize different definitions of group fairness in AI systems, considering uncertain and context-specific beliefs.
http://arxiv.org/abs/2406.18931v1
Compressor summary: The paper proposes a new learning system that simplifies hyperparameter tuning and improves training efficiency using semi-adaptive synergetic two-way pseudoinverse learning subsystems trained without gradient descent.
http://arxiv.org/abs/2406.18930v1
Compressor summary: The book covers various aspects of AI research, from fundamentals to applications, and targets students and professionals interested in the field.
http://arxiv.org/abs/2406.18927v1
Compressor summary: The paper proposes a novel method for rectifying deviated fisheye images by learning a distortion vector map that captures local distortion features and improves performance with data augmentation.
http://arxiv.org/abs/2406.18926v1
Compressor summary: The study compares two GPT-2 models on a novel decision-making task, showing that fine-tuned models rely more on pretrained representations than scratch-trained ones, which develop more specific mechanisms.
http://arxiv.org/abs/2406.18925v1
Compressor summary: The paper introduces VisArgs, a dataset for evaluating AI's ability to understand visual arguments, and shows that current AI models struggle with identifying relevant visual cues and perform better when given them as input.
http://arxiv.org/abs/2406.18924v1
Compressor summary: The paper proposes an efficient multi-objective reinforcement learning algorithm that learns a continuous representation of the Pareto set using a single hypernet, achieving better performance on robot control problems.
http://arxiv.org/abs/2406.18922v1
Compressor summary: The paper proposes a better way to estimate training time and loss for transformer models based on memory copies, allowing for more efficient architecture design.
http://arxiv.org/abs/2406.18921v1
Compressor summary: The paper proposes using psychological questions to enhance small role-playing language models, improving their dialogue generation and character portrayal.
http://arxiv.org/abs/2406.18916v1
Compressor summary: UnifiedTQA is a trustful question answering framework that supports multiple types of structured data using a Condition Graph representation and a two-level querying method with dynamic demonstration retrieval.
http://arxiv.org/abs/2406.18910v1
Compressor summary: The paper introduces a method for generating diverse and accurate speaking-style captions by first predicting speaking-style factors and then sampling from them.
http://arxiv.org/abs/2406.18908v1
Compressor summary: The authors propose a semi-supervised segmentation method using synthetic images and optical flow clues for detecting obstacles in railway scenarios with varying conditions.
http://arxiv.org/abs/2406.18907v1
Compressor summary: The authors compare traditional and BERT-based dynamic topic models for Roman literature and find that while quantitative metrics favor the former, the latter yields better insights and is easier to use.
http://arxiv.org/abs/2406.18906v1
Compressor summary: The authors develop a task to assess how well large language models can recognize different poetic forms and elements, and discuss the challenges of creating benchmarks for poetry and other creative tasks.
http://arxiv.org/abs/2406.18901v1
Compressor summary: The paper proposes an autoencoder-based method to analyze and reduce spurious correlations, improving out-of-distribution generalization for deep neural networks on the Global Wheat Head Detection dataset.
http://arxiv.org/abs/2406.18898v1
Compressor summary: The paper introduces a large-scale 360° video dataset collected from diverse real-world environments, with pose and depth information, for learning-based tasks such as depth estimation and view synthesis.
http://arxiv.org/abs/2406.18895v1
Compressor summary: The paper explores using large language models for generating interlinear glossed text without any training and shows that targeted example selection can improve performance.
http://arxiv.org/abs/2406.18893v1
Compressor summary: The paper proposes new methods to improve text-to-image customization by aligning generated images with user-supplied reference images and fixing issues in existing methods.
http://arxiv.org/abs/2406.18884v1
Compressor summary: The paper proposes a novel multi-level sequential three-way decision method for group decision-making (S3W-GDM) that accounts for vagueness, hesitation, and variation in GDM problems, using granular computing to improve efficiency.
http://arxiv.org/abs/2406.18880v1
Compressor summary: The paper explores how large language models can be used for low-resource language tasks without labeled data by using a novel in-context learning approach called Self-Supervised Prompting (SSP).
http://arxiv.org/abs/2406.18872v1
Compressor summary: Self-play improves language models in both cooperative and competitive settings, even when objectives change from cooperation to competition or vice versa.
http://arxiv.org/abs/2406.18868v1
Compressor summary: RAIL is a method for learning from multiple domains without forgetting or relying on extra data, preserving the VLM's zero-shot ability through recursive ridge regression and feature projection.
http://arxiv.org/abs/2406.18865v1
Compressor summary: DCEM is an algorithm for learning from selective labels with disparate censorship, which mitigates labeling biases without compromising performance on synthetic and clinical data.
http://arxiv.org/abs/2406.18864v1
Compressor summary: The paper studies how modality gap affects cross-modality transfer and proposes MoNA, a meta-learning method that reduces the gap by transforming target data.
http://arxiv.org/abs/2406.18861v1
Compressor summary: The researchers developed a machine learning model to predict traffic incident durations in Sydney, using data on road characteristics, incidents, and socio-economic factors, and found that XGBoost performed best.
http://arxiv.org/abs/2406.18859v1
Compressor summary: The study investigates if large language models can generate patient-friendly versions of radiology reports using self-correction prompts and a new evaluation method with radiologists and laypeople.
http://arxiv.org/abs/2406.18856v1
Compressor summary: The paper introduces a Chinese-English financial news dataset (FFN) and evaluates ChatGPT, ERNIE-bot, and OpenNMT models for financial machine translation, highlighting challenges and opportunities in this domain.
http://arxiv.org/abs/2406.18854v1
Compressor summary: The paper proposes Tri-Hom, a new composite metric that considers three aspects of graph homophily and shows its effectiveness in understanding the performance of Graph Neural Networks.
http://arxiv.org/abs/2406.18853v1
Compressor summary: The paper proposes a multi-objective decoding algorithm that combines multiple language models to better serve diverse user needs and shows its effectiveness in improving various metrics.
http://arxiv.org/abs/2406.18851v1
Compressor summary: LICO is a model that enhances large language models for black-box optimization in the molecular domain using in-context predictions.
http://arxiv.org/abs/2406.18849v1
Compressor summary: The paper introduces Dysca, a dynamic and scalable benchmark for evaluating large vision-language models on novel images, questions, and answers across different styles, scenarios, and question types.
http://arxiv.org/abs/2406.18848v1
Compressor summary: The paper proposes a sparse self-attention model using domain knowledge to impute missing hourly step count data from wearable sensors in real-world settings.
http://arxiv.org/abs/2406.18847v1
Compressor summary: The paper proposes LAPDOG, a method that uses external knowledge and story retrieval to generate personalized dialogues based on persona profiles.
http://arxiv.org/abs/2406.18845v1
Compressor summary: The paper proposes a dual-stream framework for event stream-based pattern recognition that uses differentiated fusion, Transformer, GNN, and a hybrid interaction readout mechanism to achieve state-of-the-art performance on multiple datasets.
http://arxiv.org/abs/2406.18844v1
Compressor summary: This paper investigates the generalizability of backdoor attacks on large vision-language models during instruction tuning and proposes modifications to improve attack effectiveness across different domains.
http://arxiv.org/abs/2406.18839v1
Compressor summary: The study proposes using simpler questions before retrieving visual or non-visual information to improve knowledge-based visual question-answering performance on three datasets.
http://arxiv.org/abs/2406.18837v1
Compressor summary: Our hybrid approach combines deep learning and optical flow to perform motion segmentation without training data, using object proposals, clustering, and depth maps as cues.
http://arxiv.org/abs/2406.18836v1
Compressor summary: The paper presents a new image retrieval method using masked image-text pairs to learn the relationship between a query image and text, improving accuracy.
http://arxiv.org/abs/2406.18832v1
Compressor summary: OutlierTune is an efficient method for quantizing large language models' activations by pre-executing dequantization and symmetrization, achieving better generalization and hardware efficiency.
http://arxiv.org/abs/2406.18817v1
Compressor summary: Key points:
- A novel non-rigid point set registration method based on clustering centroids and members
- Tikhonov regularization with an $\ell_1$-induced Laplacian kernel for smooth and robust displacement fields
- A clustering-improved Nyström method to reduce the computational complexity and storage of the Gram matrix
- High accuracy, low dimensionality, and the ability to handle large deformations
Summary: The paper proposes a new non-rigid point set registration method that uses clustering analysis, Tikhonov regularization with an $\ell_1$ kernel, and a clustering-improved Nyström method to achieve high accuracy, low complexity, and robustness.
http://arxiv.org/abs/2406.18809v1
Compressor summary: DEC is a flexible framework for unsupervised domain adaptation in semantic segmentation that uses synthetic multi-source datasets to improve performance on real-world datasets.
http://arxiv.org/abs/2406.18805v1
Compressor summary: The text proposes a unified algorithmic framework for minimizing regret in interactive problems with adaptive agents, by casting them as online control problems and analyzing their properties.
http://arxiv.org/abs/2406.18802v1
Compressor summary: Random feature techniques can approximate positive-definite kernels with infinite-dimensional dot products using optimal sampling policies.
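A classic concrete instance is random Fourier features for the RBF kernel (Rahimi and Recht); the sketch below uses plain Gaussian sampling and does not implement the paper's optimal sampling policies.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 5, 2000, 0.5  # input dim, number of features, RBF width

# Random Fourier features for k(x, y) = exp(-gamma * ||x - y||^2):
# z(x) = sqrt(2/D) * cos(W x + b), with rows of W ~ N(0, 2*gamma*I)
# and b ~ Uniform[0, 2*pi], so that E[z(x)·z(y)] = k(x, y).
W = rng.normal(0.0, np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)

def z(x: np.ndarray) -> np.ndarray:
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))
approx = z(x) @ z(y)
print(exact, approx)  # close for large D
```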
http://arxiv.org/abs/2406.18800v1
Compressor summary: Infinite-width NTK models can access richer features than finite models, but their performance is limited by weak optimizers like SGD.