This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-06-23, generated by Compressor, my personal LLM-based project.
http://arxiv.org/abs/2406.14563v1
Compressor summary: The paper proposes a two-step method to create safer and more aligned large language models by generating and incorporating synthetic data during merging.
http://arxiv.org/abs/2406.14562v1
Compressor summary: Whiteboard-of-thought prompting lets multimodal language models draw out intermediate reasoning steps as images, improving their performance on four difficult visual reasoning tasks posed in natural language.
http://arxiv.org/abs/2406.14561v1
Compressor summary: This paper presents correct methods for computing word probabilities in language models built on subwords, highlighting issues with beginning-of-word (bow)-marking tokenizers and showing how these corrections affect the results of linguistic studies.
http://arxiv.org/abs/2406.14555v1
Compressor summary: The text reviews multimodal-guided image editing techniques using text-to-image diffusion models, presenting a unified framework and discussing various editing scenarios, challenges, and future directions.
http://arxiv.org/abs/2406.14553v1
Compressor summary: We compress xCOMET, a machine translation evaluation metric, using distillation, quantization, and pruning techniques to make it more efficient and accessible.
http://arxiv.org/abs/2406.14551v1
Compressor summary: SaSPA is a text-to-image diffusion method that generates diverse and accurate synthetic images for fine-grained visual classification tasks without using real images as guidance, outperforming existing methods in various settings.
http://arxiv.org/abs/2406.14550v1
Compressor summary: GraphReader is a graph-based system that helps LLMs process long texts more effectively by using an agent to explore the text as a graph and generate answers based on gathered insights.
http://arxiv.org/abs/2406.14549v1
Compressor summary: The study investigates how machine learning models memorize and leak sensitive information during training, and proposes a diagnostic test to detect latent memorized sequences.
http://arxiv.org/abs/2406.14548v1
Compressor summary: ECT improves the training efficiency of consistency models by viewing them as a special case of diffusion models and fine-tuning them with a differential equation.
http://arxiv.org/abs/2406.14546v1
Compressor summary: The text explores how large language models can infer hidden information from training data using inductive out-of-context reasoning, which could pose safety risks if they acquire dangerous knowledge.
http://arxiv.org/abs/2406.14545v1
Compressor summary: The research develops a zero-knowledge framework to extract database schema elements from text-to-SQL models without compromising privacy and security.
http://arxiv.org/abs/2406.14544v1
Compressor summary: Prism is a framework that separates perception and reasoning in vision language models to assess their strengths, improve performance, and reduce costs.
http://arxiv.org/abs/2406.14541v1
Compressor summary: This paper shows that large language models struggle to generate realistic tables and proposes a method to make them better at it.
http://arxiv.org/abs/2406.14539v1
Compressor summary: iCD is a new method to improve text-to-image generation by encoding real images into its latent space and enabling precise image manipulation using dynamic guidance.
http://arxiv.org/abs/2406.14537v1
Compressor summary: The paper proposes MacroHFT, a novel reinforcement learning method for high-frequency trading that combines multiple sub-agents with different financial indicators and a hyper-agent to handle market fluctuations using memory and context.
http://arxiv.org/abs/2406.14532v1
Compressor summary: This paper investigates how finetuning LLMs on synthetic data affects math reasoning efficiency and proposes using per-step negatives to address spurious correlations and improve performance.
http://arxiv.org/abs/2406.14529v1
Compressor summary: The paper compares KANs and MLPs on tabular datasets, finding that KANs have better accuracy and F1 scores but are computationally more expensive than MLPs.
http://arxiv.org/abs/2406.14528v1
Compressor summary: DeciMamba extends Mamba's context to handle long-range NLP tasks more effectively.
http://arxiv.org/abs/2406.14526v1
Compressor summary: The study evaluates how image and video generation models can produce copyrighted characters, even without explicit prompts, and suggests combining existing and new mitigation strategies to reduce this issue.
http://arxiv.org/abs/2406.14517v1
Compressor summary: The paper proposes PostMark, a post-hoc watermarking technique for LLM-generated text that does not require logit access and is more robust to paraphrasing attacks than existing methods.
http://arxiv.org/abs/2406.14515v1
Compressor summary: MMBench-Video is a benchmark for evaluating large vision-language models' video understanding skills using free-form questions and YouTube videos.
http://arxiv.org/abs/2406.14514v1
Compressor summary: The paper proposes a layered graph approach to model dynamic crime scenarios with moving attackers and defenders, and compares it with a MILP approach in terms of computational time and solution quality.
http://arxiv.org/abs/2406.14511v1
Compressor summary: The authors investigate how chain of thought rationales improve model distillation and find that placing them after labels and using only a few key tokens are crucial for achieving better performance.
http://arxiv.org/abs/2406.14510v1
Compressor summary: The paper proposes a weakly supervised diffusion model for consistent and identity-preserving removal of small attributes like glasses in videos, using synthetic imperfect data and strong video priors.
http://arxiv.org/abs/2406.14508v1
Compressor summary: The study shows that large language models can generate persuasive political messages, but their persuasive advantage over smaller models shrinks sharply with increasing size, and coherence is a key factor in their effectiveness.
http://arxiv.org/abs/2406.14507v1
Compressor summary: The paper proposes a cubic-regularized Newton's method for efficiently unlearning neural networks while mitigating catastrophic forgetting and preserving data ownership rights.
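For context, the cubic-regularized Newton update referenced here (in its generic Nesterov-Polyak form; the paper's unlearning-specific variant is not reproduced) augments the second-order model of the loss with a cubic penalty:

```latex
% Generic cubic-regularized Newton step (Nesterov--Polyak), shown only as
% background for the summary above; the paper's unlearning variant may differ.
% g_k: gradient, H_k: Hessian, M > 0: cubic regularization constant.
\[
s_k = \operatorname*{arg\,min}_{s}\; g_k^{\top} s
      + \tfrac{1}{2}\, s^{\top} H_k\, s
      + \tfrac{M}{6}\, \lVert s \rVert^{3},
\qquad
x_{k+1} = x_k + s_k .
\]
```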
http://arxiv.org/abs/2406.14504v1
Compressor summary: The paper introduces a new task for evaluating how well large language models can adapt translations to different cultures, revealing their strengths and weaknesses in this area.
http://arxiv.org/abs/2406.14503v1
Compressor summary: The CAIL 2023 Argument Mining Track aims to identify and extract argument pairs from trial dialogs using summarized judgment documents and trial recordings, introducing CAIL2023-ArgMine, a new dataset of annotated cases.
http://arxiv.org/abs/2406.14500v1
Compressor summary: This paper proposes a method to improve radiology report summarization by generating layperson summaries using non-expert communication techniques and few-shot learning, resulting in more accurate and accessible summaries.
http://arxiv.org/abs/2406.14498v1
Compressor summary: The paper introduces SensorCaps, a dataset of IMU-derived activity narrations, OpenSQA, an instruction-following dataset, and LLaSA, a multimodal AI agent that can interpret and respond to questions about human activities and motions.
http://arxiv.org/abs/2406.14496v1
Compressor summary: FOCI is a multiple-choice benchmark for testing fine-grained object classification skills in LVLMs, which reveals CLIP models' superior performance over LVLMs.
http://arxiv.org/abs/2406.14495v1
Compressor summary: The paper introduces the rational KAN, a Kolmogorov-Arnold network that uses Padé approximation and rational Jacobi functions as its basis functions, and evaluates it on deep learning and physics-informed tasks.
http://arxiv.org/abs/2406.14492v1
Compressor summary: The text discusses the problem of image caption hallucinations by large vision-language models and challenges the claim that adding grounding objectives reduces them, using a more realistic evaluation protocol.
http://arxiv.org/abs/2406.14491v1
Compressor summary: The paper introduces Instruction Pre-Training, a method that uses instruction-response pairs to enhance language models before fine-tuning them on specific tasks.
http://arxiv.org/abs/2406.14485v1
Compressor summary: The workshop explored how XAI can enhance artistic expression and understanding in HCI, Interaction Design, AI, and digital arts fields.
http://arxiv.org/abs/2406.14483v1
Compressor summary: The paper proposes a method to estimate uncertainty in neural weather forecasts, improving trust and usefulness of the predictions.
http://arxiv.org/abs/2406.14482v1
Compressor summary: The paper introduces a large-scale benchmark dataset for multi-category visible-thermal small object detection (RGBT SOD) and proposes a robust evaluation measure called SAFit.
http://arxiv.org/abs/2406.14481v1
Compressor summary: The authors use deep neural networks to predict brain recordings from movies and identify sites where vision and language are integrated in the human brain.
http://arxiv.org/abs/2406.14479v1
Compressor summary: The paper analyzes similarity between representations of transformer models' hidden layers using cosine similarity, proposes an aligned training approach to enhance similarity, and shows its benefits for multi-exit models.
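As a minimal illustration of the similarity measure involved (a generic sketch with random stand-ins for hidden states, not the paper's code):

```python
import numpy as np

# Random stand-ins for the hidden states of two transformer layers, shaped
# (batch_size, hidden_dim); in practice these would be a model's
# intermediate outputs for the same batch of inputs.
rng = np.random.default_rng(0)
layer_a = rng.normal(size=(8, 768))
layer_b = rng.normal(size=(8, 768))

def mean_cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Average per-example cosine similarity between two layers' states."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return float(np.mean(np.sum(x * y, axis=1)))

print(f"mean cosine similarity: {mean_cosine_similarity(layer_a, layer_b):.3f}")
```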
http://arxiv.org/abs/2406.14478v1
Compressor summary: This study proposes a machine learning model that predicts surface roughness in material extrusion processes using printing parameters, reducing the need for extensive experiments.
http://arxiv.org/abs/2406.14477v1
Compressor summary: The SafeSora dataset helps align large vision models with human values in text-to-video generation tasks by providing preference annotations on helpfulness and harmlessness dimensions.
http://arxiv.org/abs/2406.14476v1
Compressor summary: The text describes a novel approach to learning state representations in agents that couple descriptive and normative aspects via telic states, allowing for goal-directed behavior with minimal policy complexity.
http://arxiv.org/abs/2406.14473v1
Compressor summary: The paper proposes a data-centric viewpoint on large language models, highlighting the importance of data in various scenarios such as benchmarks, curation, attribution, and transfer, and suggesting new research directions to improve AI and LLM research.
http://arxiv.org/abs/2406.14462v1
Compressor summary: The paper explores using personas to make large language models more diverse and human-like in subjective social tasks, but finds they still struggle with implicit biases.
http://arxiv.org/abs/2406.14459v1
Compressor summary: This paper studies how BERT's performance degrades when some of its parameters are corrupted and then fine-tuned, finding that bottom-layer corruption is more harmful than top-layer corruption.
http://arxiv.org/abs/2406.14458v1
Compressor summary: The paper explores using AI/ML for highly accurate user positioning in 6G IIoT applications and reports promising results.
http://arxiv.org/abs/2406.14457v1
Compressor summary: The paper proposes a reinforcement learning method that balances understanding and generation tasks in task-oriented dialogue systems, improving performance on three datasets and few-shot ability.
http://arxiv.org/abs/2406.14456v1
Compressor summary: The paper presents a compositional representation learning approach for time series classification that uses an unsupervised method to segment sequential data into coherent components based on change space, showing competitive performance on public benchmarks.
http://arxiv.org/abs/2406.14455v1
Compressor summary: MM-GTUNets is a new framework that uses graph transformers and reward learning to predict brain disorders from multiple data types, outperforming existing methods on large datasets.
http://arxiv.org/abs/2406.14449v1
Compressor summary: APEER is a novel algorithm that generates refined prompts for relevance ranking with large language models, improving performance over manual prompts and showing better transferability across tasks and LLMs.
http://arxiv.org/abs/2406.14446v1
Compressor summary: The paper presents a method to improve human activity recognition in smart homes using contrastive learning and seed points from an initial bootstrapping phase.
http://arxiv.org/abs/2406.14442v1
Compressor summary: The study compares different graph neural network architectures for case-control classification using omics data from Parkinson's disease and control samples.
http://arxiv.org/abs/2406.14436v1
Compressor summary: The text proposes three models for stochastic video generation that incorporate camera motion and actions into image reconstruction using multi-modal learning.
http://arxiv.org/abs/2406.14434v1
Compressor summary: The paper introduces a benchmark for evaluating truthfulness in multilingual language models and proposes FaMSS, a method to optimize data allocation across languages and data types, which reduces representation disparity and improves multilingual capabilities.
http://arxiv.org/abs/2406.14427v1
Compressor summary: The paper proposes a framework for efficient control with inference costs, where agents balance utility and task performance depending on the task demands, resulting in different inference strategies.
http://arxiv.org/abs/2406.14425v1
Compressor summary: SynDARin is a method to create QA datasets for low-resource languages by using English paragraphs and generating synthetic questions and answers from them, which are then translated and validated.
http://arxiv.org/abs/2406.14422v1
Compressor summary: FutureNet and Lane Occupancy Field (LOF) are proposed methods to improve motion prediction in autonomous driving by encoding future scenarios and jointly predicting lane occupancy of surrounding agents.
http://arxiv.org/abs/2406.14412v1
Compressor summary: The paper introduces two datasets for 3D canine pose estimation in different environments and analyzes various models' performance on them.
http://arxiv.org/abs/2406.14408v1
Compressor summary: FVEL is a tool that combines formal verification with large language models to improve code verification efficiency and accuracy.
http://arxiv.org/abs/2406.14404v1
Compressor summary: The paper proposes QuEE, a dynamic network that combines quantization and early exiting to reduce computational resources during inference in machine learning models.
http://arxiv.org/abs/2406.14401v1
Compressor summary: FairSFS is a new algorithm for streaming feature selection that ensures fairness by preventing sensitive data from affecting the model output.
http://arxiv.org/abs/2406.14399v1
Compressor summary: The WEATHER-5K dataset is a new, comprehensive, and publicly available resource for global station weather forecasting that improves accuracy by addressing limitations of existing datasets.
http://arxiv.org/abs/2406.14398v1
Compressor summary: ATAC-Net is a deep learning framework that uses a few known anomalous samples and attention-guided cropping to improve visual anomaly detection in quality control and manufacturing.
http://arxiv.org/abs/2406.14394v1
Compressor summary: SEC-QA is a framework for continuously generating financial multi-document QA pairs, refreshed using recent document collections.
http://arxiv.org/abs/2406.14388v1
Compressor summary: ADS is a method to generate high-quality posterior distributions for partially observed signals by actively selecting measurements with maximum entropy using guided diffusion.
http://arxiv.org/abs/2406.14377v1
Compressor summary: FastECG is a computationally efficient semi-supervised learning method that adapts pre-trained models for robust detection of cardiovascular diseases using electrocardiography data with limited supervision.
http://arxiv.org/abs/2406.14373v1
Compressor summary: The text explores how large language models can simulate social dynamics and replicate forces that shape human societies using a simulated agent society based on Hobbes's Social Contract Theory.
http://arxiv.org/abs/2406.14370v1
Compressor summary: The authors present a new dataset for bank check signature verification and propose an object detection network with a dilation module that improves detection and verification of genuine and forged signatures.
http://arxiv.org/abs/2406.14367v1
Compressor summary: PoseBench is a benchmark to evaluate the robustness of pose estimation models against real-world corruption, revealing vulnerabilities in current state-of-the-art methods and suggesting design improvements.
http://arxiv.org/abs/2406.14365v1
Compressor summary: The authors took third rank in the MICCAI2023 LNQ challenge on weakly supervised pathological lymph node segmentation in the mediastinum with a model that used pseudo labeling, the TotalSegmentator toolbox, and public TCIA datasets, finding that incorporating all visible lymph nodes improved segmentation performance while models trained only on enlarged lymph nodes could not generalize to smaller ones.
http://arxiv.org/abs/2406.14361v1
Compressor summary: The paper shows that current AI models for power grid control fail when a line is disconnected, and suggests using graph theory to improve them.
http://arxiv.org/abs/2406.14360v1
Compressor summary: The paper proposes a method, EBAD-NeRF, that uses event cameras to improve NeRF performance in scenes with motion blur by jointly optimizing camera poses and NeRF parameters.
http://arxiv.org/abs/2406.14349v1
Compressor summary: The text discusses the need for evaluating the stability and robustness of Explainable AI techniques, proposing a test for non-adversarial perturbations and an ensemble approach for analysing XAI methods in neural networks and tabular datasets.
http://arxiv.org/abs/2406.14343v1
Compressor summary: iWISDM is a new benchmark for evaluating multimodal models' ability to follow complex instructions in vision-language tasks, revealing a significant gap with human performance.
http://arxiv.org/abs/2406.14341v1
Compressor summary: HoTPP is a new benchmark for evaluating event sequence prediction models over a horizon, addressing the limitations of existing metrics and challenging traditional autoregressive methods.
http://arxiv.org/abs/2406.14336v1
Compressor summary: The text describes a method that uses a language model to extract spatial relations from historical narratives about the English Lake District, visualizing them as a network.
http://arxiv.org/abs/2406.14335v1
Compressor summary: The paper proposes self-supervised Interpretable Concept Embedding Models (ICEMs) that can interpret and control Large-Language Models (LLMs) by predicting concept labels, offering meaningful explanations, and allowing human interventions.
http://arxiv.org/abs/2406.14329v1
Compressor summary: The paper introduces Adaptive Adversarial Cross-Entropy (AACE) loss for Sharpness-Aware Minimization (SAM), which improves model generalization by adjusting perturbation directions based on the model's convergence stage.
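For background, standard SAM (the generic formulation from the SAM literature; AACE's adaptive perturbation is the paper's contribution and is not reproduced here) seeks a flat minimum by minimizing the loss at the worst-case point within a radius ρ, using a first-order approximation for the perturbation:

```latex
% Standard SAM objective and its first-order perturbation (background only;
% AACE adjusts the loss used to derive the perturbation direction).
\[
\min_{w} \; \max_{\lVert \epsilon \rVert_2 \le \rho} L(w + \epsilon),
\qquad
\epsilon^{*} \approx \rho \,
  \frac{\nabla_{w} L(w)}{\lVert \nabla_{w} L(w) \rVert_2} .
\]
```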
http://arxiv.org/abs/2406.14328v1
Compressor summary: The paper investigates Green ML, examining various model architectures and hyperparameters to identify energy-efficient practices for sustainable ML operations.
http://arxiv.org/abs/2406.14326v1
Compressor summary: medIKAL is a framework that combines Large Language Models with knowledge graphs to enhance diagnostic capabilities in Electronic Medical Records by assigning weights, merging LLM results, and refining through path-based reranking and prompt templates.
http://arxiv.org/abs/2406.14324v1
Compressor summary: The paper introduces attention-oriented metrics (ATOMs) to study how reinforcement learning agents learn to focus on different aspects of a game and how this affects their performance.
http://arxiv.org/abs/2406.14322v1
Compressor summary: The text discusses the importance of ensuring uniform privacy protection for users when fine-tuning large language models on sensitive data using differential privacy techniques.
http://arxiv.org/abs/2406.14319v1
Compressor summary: The paper presents a low-latency inference framework for large language models that allows them to infer from incomplete prompts and reduces response time significantly while maintaining accuracy.
http://arxiv.org/abs/2406.14314v1
Compressor summary: The paper introduces goal identification from observed UI trajectories, a task of inferring user intentions from GUI interactions, proposes a novel evaluation metric that measures paraphrasing of task descriptions within a specific UI environment, and shows through experiments with humans and models on the Android-In-The-Wild and Mind2Web datasets that Gemini-1.5 Pro outperforms GPT-4 but still underperforms humans.
http://arxiv.org/abs/2406.14313v1
Compressor summary: The paper introduces a new method, FUn-FuSIC, that improves few-shot transfer for knowledge base question answering (KBQA) by handling unanswerable questions using logical forms and feedback from a large language model.
http://arxiv.org/abs/2406.14312v1
Compressor summary: K-Tokeniser improves clinical text processing by using semantic-based tokenisation and requires no pre-training, leading to better results in various tasks.
http://arxiv.org/abs/2406.14310v1
Compressor summary: The paper proposes a novel automated method to link high-level business requirements with technical system requirements using advanced natural language processing techniques and shows significant efficiency improvements over existing methods.
http://arxiv.org/abs/2406.14297v1
Compressor summary: The paper discusses using AI on spacecraft for efficient data analysis and transmission, demonstrating it with a CNN model for NASA's MMS mission that is reduced in size and precision while maintaining accuracy.
http://arxiv.org/abs/2406.14288v1
Compressor summary: MAGI is a community-aware graph clustering framework that uses modularity maximization as a contrastive pretext task to avoid semantic drift and achieve scalability, outperforming state-of-the-art methods on multiple datasets.
http://arxiv.org/abs/2406.14284v1
Compressor summary: The paper proposes a method to generate grammatically incorrect Bangla sentences and creates a corpus, Vaiyakarana, which can help improve automatic grammar correction in the language.
http://arxiv.org/abs/2406.14283v1
Compressor summary: Q* is a framework that guides large language models to make better decisions in reasoning tasks without per-task fine-tuning, avoiding significant computational overhead.
http://arxiv.org/abs/2406.14282v1
Compressor summary: The paper proposes a new framework to improve large language models' performance in complex question-answering tasks by using planning data from knowledge graphs.
http://arxiv.org/abs/2406.14281v1
Compressor summary: FairX is an open-source tool for analyzing and training models on fairness, utility, and explainability of data using various metrics and synthetic data generation.
http://arxiv.org/abs/2406.14277v1
Compressor summary: The paper proposes a method for open-domain question-answering that improves retrieval by breaking down questions into sub-questions and adding self-generated passages to guide answer extraction.
http://arxiv.org/abs/2406.14275v1
Compressor summary: Step-Back Profiling is a method to personalize large language models for scientific writing by capturing user characteristics, and it outperforms baselines on various tasks.
http://arxiv.org/abs/2406.14274v1
Compressor summary: SP-TCL is a simple and effective approach for weakly-supervised partial domain adaptation that uses self-paced learning to discover and transfer knowledge from a noisily labeled source domain to an unlabeled target domain.
http://arxiv.org/abs/2406.14273v1
Compressor summary: The paper explores how AI might affect job satisfaction and meaning in IT work by interviewing experts who envision humans remaining dominant and AI as a complement.
http://arxiv.org/abs/2406.14272v1
Compressor summary: The authors propose a new task, dataset, and model for generating realistic 3D talking heads from speech in different languages, improving lip-sync accuracy.
http://arxiv.org/abs/2406.14267v1
Compressor summary: The paper analyzes existing evaluation frameworks for multilingual NLP models, discusses their limitations, and explores using machine translation to evaluate MLMs across a wide range of languages, showing that current approaches may overestimate performance on low-resource languages.
http://arxiv.org/abs/2406.14266v1
Compressor summary: The text describes a novel machine learning tool that helps academic educators improve their lectures by automatically analysing lecture videos and providing feedback on didactic features.
http://arxiv.org/abs/2406.14265v1
Compressor summary: VeriFlow is an architecture that uses a flow-based density model so that neural networks' safety and reliability can be verified with SMT and abstract interpretation methods under fine-grained probabilistic control.
http://arxiv.org/abs/2406.14261v1
Compressor summary: The paper proposes a self-supervised method for unsupervised video person re-identification using tracklet partitioning, clustering, and pseudo label generation, achieving state-of-the-art results.
http://arxiv.org/abs/2406.14259v1
Compressor summary: MEAT is a method to improve model robustness in adversarial training by searching for the median of historical model weights, reducing weight anomalies and overfitting.
http://arxiv.org/abs/2406.14255v1
Compressor summary: DuMapNet is an end-to-end system that creates standardized, vectorized maps of lanes in cities using a transformer-based network and contextual information from neighboring areas, reducing costs by 95%.
http://arxiv.org/abs/2406.14250v1
Compressor summary: The paper introduces E-ANT, a large Chinese GUI navigation dataset with human traces and screenshots, to help improve and evaluate multimodal language models for this task.
http://arxiv.org/abs/2406.14240v1
Compressor summary: CityNav is a new dataset for language-goal aerial navigation using real-world cities' point cloud data, which reveals the importance of human-driven strategies and 2D spatial maps for efficient city-scale navigation.
http://arxiv.org/abs/2406.14239v1
Compressor summary: The paper introduces LeYOLO, a new family of efficient YOLO-based object detection models that combines efficient backbone scaling, a Fast Pyramidal Architecture Network (FPAN), and a Decoupled Network-in-Network (DNiN) detection head, achieving a competitive FLOP-to-accuracy ratio across various resource constraints.
http://arxiv.org/abs/2406.14235v1
Compressor summary: The proposed method uses paired human-robot videos to adapt pre-trained models for robotic manipulation tasks, improving performance on different benchmarks.
http://arxiv.org/abs/2406.14232v1
Compressor summary: The paper proposes an adversarial training method for structural health monitoring that improves model robustness by using circle loss to keep examples away from the decision boundary.
http://arxiv.org/abs/2406.14231v1
Compressor summary: aeon is a Python library that supports a broad range of machine learning tasks on time series data, providing efficient algorithm implementations behind a scikit-learn compatible API.
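A minimal usage sketch of the scikit-learn-style interface described above; the specific loader and classifier names follow aeon's documented module layout but should be treated as assumptions rather than claims from the paper:

```python
# Assumed API: aeon exposes scikit-learn-style estimators for time series;
# the import paths below follow aeon's documented layout.
from aeon.classification.distance_based import KNeighborsTimeSeriesClassifier
from aeon.datasets import load_arrow_head

X_train, y_train = load_arrow_head(split="train")
X_test, y_test = load_arrow_head(split="test")

# DTW-based k-nearest neighbors, a classic time series classification baseline.
clf = KNeighborsTimeSeriesClassifier(distance="dtw")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```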
http://arxiv.org/abs/2406.14230v1
Compressor summary: GETA is a new way to test how well large language models follow ethical guidelines by dynamically creating challenges matched to each model's abilities.
http://arxiv.org/abs/2406.14228v1
Compressor summary: EvoAgent is a method to create diverse multi-agent systems from existing large language models by applying evolutionary algorithms, improving their problem-solving abilities.
http://arxiv.org/abs/2406.14226v1
Compressor summary: The authors propose a self-supervised teacher-student method for single-view depth estimation from endoscopic images that uses illumination as a self-supervisory signal and Bayesian deep networks to quantify the uncertainty of this ill-posed problem, which is critical for applications like medical robotics.
http://arxiv.org/abs/2406.14220v1
Compressor summary: This study used deep learning techniques to improve land cover mapping accuracy using different types of satellite images, and found that the LinkNet model performed well with multispectral images.
http://arxiv.org/abs/2406.14219v1
Compressor summary: The paper presents AIPS, an autonomous system that generates complex inequality theorems and solves high-level mathematical problems without human guidance or large datasets, outperforming existing methods.
http://arxiv.org/abs/2406.14214v1
Compressor summary: REVEAL-IT is a novel framework for explaining the learning process of an agent in complex environments using visualizations and a GNN-based explainer.
http://arxiv.org/abs/2406.14213v1
Compressor summary: The paper proposes adding symbolic working memory to Transformers for machine translation tasks, improving prediction quality by storing relevant keywords and handling text diversity.
http://arxiv.org/abs/2406.14208v1
Compressor summary: SeCoKD is a method that uses self-knowledge distillation to improve the In-Context Learning ability of large language models with fewer demonstrations and better performance on reasoning tasks.
http://arxiv.org/abs/2406.14207v1
Compressor summary: LayerMatch is a layer-specific pseudo-label strategy that improves semi-supervised learning performance by mitigating the impact of noisy labels in the linear classification layer and accelerating clustering in the feature extraction layer.
http://arxiv.org/abs/2406.14206v1
Compressor summary: The text introduces Live Video Captioning, a new approach for generating captions in real-time for video streams, and proposes a model using deformable transformers and temporal filtering to overcome its challenges.
http://arxiv.org/abs/2406.14201v1
Compressor summary: The paper analyzes failure cases and predicts segmentation errors in computer vision tasks using uncertainty-based metrics like entropy.
http://arxiv.org/abs/2406.14197v1
Compressor summary: The text explains how chain-of-thought reasoning enhances language models' performance by extending their computational power to a level similar to probabilistic Turing machines.
http://arxiv.org/abs/2406.14194v1
Compressor summary: VLBiasBench is a comprehensive benchmark to evaluate social biases in large vision-language models using diverse images and questions.
http://arxiv.org/abs/2406.14192v1
Compressor summary: The paper proposes a universal framework for improving LLMs' temporal reasoning, studying 38 temporal reasoning tasks, finding that mathematical data helps but is not enough, and introducing a self-critic temporal optimization method that yields Timo, a model achieving state-of-the-art temporal reasoning performance.
http://arxiv.org/abs/2406.14191v1
Compressor summary: This paper surveys temporal knowledge graph question answering (TKGQA) methods, categorizes temporal questions, and suggests future research directions.
http://arxiv.org/abs/2406.14189v1
Compressor summary: The paper introduces a new natural language generation method using tree-traversing order and compares it with diffusion models in graphic generation, while also presenting SenTree, a module for generating approximate binary trees.
http://arxiv.org/abs/2406.14183v1
Compressor summary: The text introduces a framework that uses spectral geometry principles to compare and align different neural model representations, improving interpretability and performance on various applications.
http://arxiv.org/abs/2406.14178v1
Compressor summary: The paper proposes EvSegSNN, a low-power semantic segmentation method using spiking neural networks and event cameras, achieving better performance than existing models with fewer parameters and less computation.
http://arxiv.org/abs/2406.14177v1
Compressor summary: The paper presents SimulSeamless, a system for speech-to-text translation that combines two models and performs well in multiple languages at IWSLT 2024.
http://arxiv.org/abs/2406.14171v1
Compressor summary: The paper proposes ranking LLMs based on data compression, showing that compression length reflects the model's performance, and suggests using compression ratio as a metric for evaluating large language models.
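To make the proposed metric concrete (a generic sketch with made-up numbers, not the paper's pipeline): under arithmetic coding, the size to which a model compresses a text is approximately its total negative log-likelihood in bits, so a compression ratio falls directly out of per-token log-probabilities:

```python
import math

# Hypothetical per-token natural-log probabilities that a language model
# assigns to a held-out text; in practice these come from the model's
# forward pass over the evaluation corpus.
token_logprobs = [-2.1, -0.4, -1.7, -0.9, -3.2]

# Under arithmetic coding, compressed size ~= total negative log-likelihood
# in bits, i.e. the sum of -log2 p(token).
compressed_bits = sum(-lp / math.log(2) for lp in token_logprobs)

# Compression ratio = raw size / compressed size. For illustration only,
# assume the raw text is 5 bytes (one byte per token).
raw_bits = 5 * 8
print(f"compressed bits: {compressed_bits:.1f}")
print(f"compression ratio: {raw_bits / compressed_bits:.2f}")
```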
http://arxiv.org/abs/2406.14167v1
Compressor summary: The authors propose a method that uses large language models to generate word definitions as senses, which helps detect semantic changes over time and provides interpretability.
http://arxiv.org/abs/2406.14164v1
Compressor summary: The paper proposes a new data-driven guided decoding method for generating diagnostic texts from medical images that incorporates existing tags of key conditions, and evaluates it on different systems and datasets.
http://arxiv.org/abs/2406.14161v1
Compressor summary: AMBER is an imitation learning approach that uses a graph neural network to predict the optimal mesh resolution for complex engineering systems based on expert examples.
http://arxiv.org/abs/2406.14155v1
Compressor summary: The paper proposes methods to reduce political biases in large language models and generate balanced overviews from diverse viewpoints.
http://arxiv.org/abs/2406.14150v1
Compressor summary: The authors propose a multi-modal model called IsoFormer that connects DNA, RNA, and proteins to predict differential transcript expression across human tissues.
http://arxiv.org/abs/2406.14144v1
Compressor summary: The paper investigates how to identify and analyze the neurons responsible for safety behaviors in large language models, which can help improve their alignment and reduce risks.
http://arxiv.org/abs/2406.14137v1
Compressor summary: The text discusses the problem of LVLMs generating irrelevant or biased responses, and proposes a method called MACAROON to improve their ability to ask for clarifications and generate contrastive response pairs.
http://arxiv.org/abs/2406.14132v1
Compressor summary: The CoMAN method helps Online Food Ordering Services allocate budgets more efficiently by predicting users' response to incentives and adapting to spatio-temporal preferences, leading to higher conversion rates and orders.
http://arxiv.org/abs/2406.14131v1
Compressor summary: The study proposes methods for classifying sexually explicit content using different approaches, aiming to improve automated detection of child sexual abuse materials (CSAM) and reduce human reviewers' exposure to harmful images.
http://arxiv.org/abs/2406.14130v1
Compressor summary: The paper proposes ExVideo, a post-tuning method that enhances video synthesis models to generate longer videos with less training time and without compromising quality or generalization capabilities.
http://arxiv.org/abs/2406.14129v1
Compressor summary: Event-Bench is a benchmark for evaluating video event understanding in multimodal large language models, addressing the short-cut bias issue and providing a cost-effective method to enhance video MLLMs using merged video instructions.
http://arxiv.org/abs/2406.14124v1
Compressor summary: The authors propose data pruning based on sample importance measured by information content, which improves the performance of large language models when training with limited data.
http://arxiv.org/abs/2406.14122v1
Compressor summary: EduQate is a method for personalized learning that uses a network to model interdependent content and Q-learning to optimize arm selection based on student progress.
http://arxiv.org/abs/2406.14120v1
Compressor summary: The paper introduces a new HSI classification model combining CNNs for local feature extraction and transformers for long-range context modelling, which outperforms existing methods on four datasets.
http://arxiv.org/abs/2406.14115v1
Compressor summary: The text reviews existing methods for data selection in fine-tuning Large Language Models, proposes a three-stage scheme, and compares them using a unified method to improve efficiency and feasibility.