This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-07-04, generated by Compressor, my personal LLM-based project.
http://arxiv.org/abs/2407.03321v1
Compressor summary: The paper introduces a new benchmark (named only via a LaTeX macro in the abstract) to evaluate language models' ability to generate PDDL code from natural language descriptions of planning tasks, addressing shortcomings of existing evaluation methods.
http://arxiv.org/abs/2407.03320v1
Compressor summary: The text introduces IXC-2.5, a large vision-language model with long-context capabilities and improved vision-language comprehension for various applications, achieving GPT-4V-level performance with fewer resources.
http://arxiv.org/abs/2407.03314v1
Compressor summary: The paper introduces BACON, a graph structure that enhances vision language models' abilities in various tasks by simplifying complex visual scenes into basic elements.
http://arxiv.org/abs/2407.03310v1
Compressor summary: Turing Programs enable length generalization on various algorithmic tasks by decomposing them into steps resembling a Turing Machine's computation.
http://arxiv.org/abs/2407.03302v1
Compressor summary: This paper reviews how emergent communication research can be applied to various fields like machine learning and linguistics by studying how language-like systems arise in multi-agent reinforcement learning environments.
http://arxiv.org/abs/2407.03300v1
Compressor summary: DisCo-Diff models introduce discrete latent variables to simplify the learning process of diffusion models and improve their performance on various tasks, including image synthesis and molecular docking.
http://arxiv.org/abs/2407.03297v1
Compressor summary: The paper proposes a new way to adjust the noise in diffusion models during training to improve efficiency and performance.
http://arxiv.org/abs/2407.03292v1
Compressor summary: The paper presents a biomechanical-constrained non-rigid medical image registration algorithm using physics-informed neural networks (PINNs) that generalizes linear elasticity to nonlinear models and solves the inverse parameter estimation problem under PINNs.
http://arxiv.org/abs/2407.03291v1
Compressor summary: VCHAR is a novel framework for recognizing complex human activities in smart environments without precise labeling of atomic activities, providing video-based explanations for better user comprehension.
http://arxiv.org/abs/2407.03282v1
Compressor summary: The paper explores how Large Language Models can estimate their own hallucination risk before generating responses and analyzes the internal mechanisms involved in this process.
http://arxiv.org/abs/2407.03277v1
Compressor summary: The text describes a dataset of commercial translations spanning 12 translation directions over six years, which can be used to evaluate MT metrics based on their preference for newer translations.
http://arxiv.org/abs/2407.03268v1
Compressor summary: FRESCO is a framework to study how images on social media platforms affect society by analyzing them at three levels and using a new metric called the FRESCO score.
http://arxiv.org/abs/2407.03263v1
Compressor summary: UniSeg3D is a 3D segmentation framework that handles six tasks with one model, enhancing understanding of 3D scenes by sharing knowledge across tasks.
http://arxiv.org/abs/2407.03261v1
Compressor summary: The paper proposes neural operators for modeling magnetic hysteresis and shows they outperform traditional methods and generalize well to novel input fields.
http://arxiv.org/abs/2407.03257v1
Compressor summary: The paper proposes ModernNCA, a modernized version of Neighborhood Component Analysis that improves similarity learning for tabular data and surpasses most deep tabular models in accuracy and efficiency.
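ModernNCA builds on the classical Neighborhood Component Analysis objective, in which each point stochastically selects a neighbor with probability given by a softmax over negative squared distances, and the objective is the expected fraction of same-class selections. A minimal NumPy sketch of that classical objective (an illustration of vanilla NCA, not the paper's ModernNCA variant; function names are my own):

```python
import numpy as np

def nca_softmax_probs(X):
    """Row-stochastic neighbor-selection probabilities from squared distances."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # a point never selects itself
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def nca_objective(X, y):
    """Expected fraction of points whose selected neighbor shares their label."""
    p = nca_softmax_probs(X)
    same = (y[:, None] == y[None, :]).astype(float)
    return (p * same).sum(axis=1).mean()
```

In classical NCA this objective is maximized over a learned linear projection of X; ModernNCA's specific refinements are not described in the summary.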
http://arxiv.org/abs/2407.03253v1
Compressor summary: The paper introduces Sentence Transformers Fine-tuning (STF), a system that classifies tweet topics accurately by fine-tuning pretrained sentence transformer language models, outperforming state-of-the-art approaches without requiring much labeled data.
http://arxiv.org/abs/2407.03251v1
Compressor summary: ACTRESS is a new approach for semi-supervised visual grounding that uses active sampling and selective retraining to improve performance with sparse labeled data.
http://arxiv.org/abs/2407.03243v1
Compressor summary: The paper proposes Attention-Driven Constraint Balancing (AttBalance), a framework that improves visual grounding tasks using transformer-based models and attention mechanisms, achieving state-of-the-art results.
http://arxiv.org/abs/2407.03240v1
Compressor summary: The paper proposes a cyclic learning mechanism for multi-view 3D detection and tracking tasks that suppresses irrelevant regions in historical frames and improves object awareness, resulting in consistent performance gains over baselines.
http://arxiv.org/abs/2407.03236v1
Compressor summary: The paper presents a new approach to train ATD models, which improve Arabic text comprehension and processing by using finetuned transformers and the Noisy-Student method, achieving state-of-the-art results.
http://arxiv.org/abs/2407.03234v1
Compressor summary: The study shows that adding a space to LLMs' inputs can cause them to generate unsafe or biased outputs, highlighting the need for better alignment methods.
http://arxiv.org/abs/2407.03232v1
Compressor summary: A study shows that adding a space to input can trick LLMs into generating unsafe outputs, highlighting the need for better model alignment.
http://arxiv.org/abs/2407.03227v1
Compressor summary: The text proposes a method to improve Text-to-SQL semantic parsing using Large Language Models and in-context learning with few-shot examples and approximate SQL query generation.
http://arxiv.org/abs/2407.03217v1
Compressor summary: MHNet is a novel deep learning model that uses hierarchical and high-order features from multi-view brain functional networks derived from rs-fMRI data for neurodevelopmental disorder prediction, outperforming state-of-the-art methods.
http://arxiv.org/abs/2407.03216v1
Compressor summary: The paper proposes a novel object-centric model that learns disentangled representations and discovers blocks to predict dynamic visual states, achieving better accuracy and interpretability in various settings.
http://arxiv.org/abs/2407.03211v1
Compressor summary: Quantization affects multilingual LLMs differently and negatively impacts non-Latin script languages and challenging tasks.
http://arxiv.org/abs/2407.03210v1
Compressor summary: The text discusses using adversarial explanations and autoencoders to improve AI decision-making, robustness, and human oversight in real-world applications.
http://arxiv.org/abs/2407.03205v1
Compressor summary: The text introduces a method to detect oriented objects in aerial images using a complex plane OBB representation, a conformer RPN head, and a category-aware dynamic label assignment.
http://arxiv.org/abs/2407.03204v1
Compressor summary: EVA is a method to create more realistic and expressive digital human avatars from monocular video by combining a sculpted 3D model with SMPL-X and improving alignment, density control, and confidence prediction.
http://arxiv.org/abs/2407.03200v1
Compressor summary: The paper proposes SegVG, a method that uses segmentation signals from box annotations for Visual Grounding and mitigates domain discrepancy with Triple Alignment module, achieving SOTA performance.
http://arxiv.org/abs/2407.03197v1
Compressor summary: The paper proposes a dynamic feature aggregation module for temporal action detection models that adapts kernel weights and receptive fields at different timestamps, improving performance on various benchmarks.
http://arxiv.org/abs/2407.03194v1
Compressor summary: The paper proves that ensembles have prediction instabilities and suggests balancing information use with risk management.
http://arxiv.org/abs/2407.03185v1
Compressor summary: The authors propose a transformer-based time series forecasting method that uses multiple resolutions, cross-series information, and novel modules to improve performance on a real-world pricing problem.
http://arxiv.org/abs/2407.03181v1
Compressor summary: The Divergent CoT (DCoT) method improves the performance of large language models by requiring them to generate multiple divergent reasoning chains in a single inference step, enabling self-correction.
http://arxiv.org/abs/2407.03179v1
Compressor summary: The paper proposes a modified Sigmoid function with learnable parameters as an attention mechanism to highlight salient motion features in videos for action recognition tasks.
http://arxiv.org/abs/2407.03172v1
Compressor summary: The paper presents an advanced ensemble technique for 3D image reconstruction from 2D images, developed by the authors who participated in Kaggle's Image Matching Challenge and conducted a review of top-performing methods.
http://arxiv.org/abs/2407.03169v1
Compressor summary: The paper proposes a decoder-only architecture for speech-to-text translation using large language models and shows its effectiveness on two benchmarks, while analyzing the impact of different fine-tuning techniques and task formulation.
http://arxiv.org/abs/2407.03168v1
Compressor summary: LivePortrait is a video-driven portrait animation framework that uses implicit keypoints to create lifelike videos from single images, with improved efficiency and controllability compared to diffusion-based methods.
http://arxiv.org/abs/2407.03165v1
Compressor summary: The paper presents a method to generate consistent normals for point clouds using a boundary energy derived from the Dirichlet energy of the generalized winding number field, which improves robustness to noise and complex structures.
http://arxiv.org/abs/2407.03163v1
Compressor summary: The paper proposes a YOLOv8 model with a Global Context block that improves fracture detection and reaches state-of-the-art performance on a wrist X-ray dataset.
http://arxiv.org/abs/2407.03157v1
Compressor summary: The paper proposes Positional Integrity Encoding (PIE) for efficient and accurate code prediction in real-time editing scenarios, reducing computational overhead by over 85%.
http://arxiv.org/abs/2407.03154v1
Compressor summary: The authors propose using protein language models as a reward function to generate new protein sequences with reinforcement learning, while periodically finetuning a proxy model.
http://arxiv.org/abs/2407.03152v1
Compressor summary: Stereo Risk is a new deep-learning method for stereo matching that uses continuous risk minimization to estimate scene depth better than existing methods.
http://arxiv.org/abs/2407.03145v1
Compressor summary: The paper proposes a two-phase training method for language translation using large pre-trained models, which improves accuracy for aligned source and target sentence orders and spoken language, especially with added tags and interleaved sentences.
http://arxiv.org/abs/2407.03140v1
Compressor summary: The authors propose machine learning models for target detection and uncertainty estimation in RDM images to improve GMTI radar tracking performance.
http://arxiv.org/abs/2407.03130v1
Compressor summary: ADClick is a novel interactive image segmentation algorithm that generates anomaly masks for real defective images with high accuracy, using only a few manual clicks per image and innovative residual features and language prompts.
http://arxiv.org/abs/2407.03129v1
Compressor summary: The paper explores how different prompts affect large language models' social biases and task performance, showing that they are highly sensitive to prompts and have tradeoffs between them.
http://arxiv.org/abs/2407.03125v1
Compressor summary: The article summarizes the theoretical foundations and recent advances in graph learning models, focusing on their expressiveness, generalization, optimization, and unique phenomena.
http://arxiv.org/abs/2407.03118v1
Compressor summary: The authors test an algorithm for personalized learning paths using convolutional neural networks on a large digital self-learning platform, and find that it does not significantly improve learners' effort or performance compared to group-based or individual non-adaptive treatments.
http://arxiv.org/abs/2407.03115v1
Compressor summary: The paper proposes a new adversarial attack method that reduces $L_0$-norm distortion while maintaining low $L_2$-norm loss, making the perturbations sparse and imperceptible to humans.
http://arxiv.org/abs/2407.03108v1
Compressor summary: This paper evaluates the reliability and stability of various explainable AI (XAI) methods using a diabetes dataset and four machine learning models, finding eXirt to be the most reliable XAI method, while all but one of the others were sensitive to perturbations.
http://arxiv.org/abs/2407.03106v1
Compressor summary: The paper proposes a new loss function called Anti-Collapse Loss for deep metric learning that improves feature representation, avoids embedding space collapse, and enhances model performance.
http://arxiv.org/abs/2407.03105v1
Compressor summary: GFlowNets are a learning paradigm for sampling from unnormalized probability distributions that can capture complex patterns; the paper investigates how they generalize to novel, longer trajectories.
http://arxiv.org/abs/2407.03104v1
Compressor summary: KeyVideoLLM is a method to efficiently select keyframes from videos for large language models, improving data management, speed, and video question-answering performance.
http://arxiv.org/abs/2407.03103v1
Compressor summary: Cactus is a realistic dialogue dataset for training open-source language models as psychological counselors using Cognitive Behavioral Therapy techniques.
http://arxiv.org/abs/2407.03094v1
Compressor summary: The paper proposes a new method for predicting causal effects of continuous treatments using conformal prediction, accounting for uncertainty in propensity score estimation, and demonstrates its effectiveness on synthetic and real datasets.
http://arxiv.org/abs/2407.03082v1
Compressor summary: The paper proposes a new framework (SBRL-HAP) for estimating treatment effects that works well both on in-distribution and out-of-distribution data, addressing selection bias and distribution shift issues.
http://arxiv.org/abs/2407.03080v1
Compressor summary: The paper proposes a novel way to generate realistic synthetic tabular data using Deep Generative Models with artificial inductive bias from transfer learning and meta-learning techniques, improving the quality of the synthetic data in limited real-data environments.
http://arxiv.org/abs/2407.03076v1
Compressor summary: The paper investigates multi-task learning for document-level neural machine translation to make the model sensitive to the choice of context and shows better performance in low-resource settings, but struggles with generating source from context.
http://arxiv.org/abs/2407.03065v1
Compressor summary: The paper proposes a policy optimization algorithm that eliminates a warm-up phase and achieves rate-optimal regret in different reinforcement learning settings.
http://arxiv.org/abs/2407.03061v1
Compressor summary: ALTER is a framework that enhances large language models' table-based reasoning by augmenting NL questions and tables with relevant information.
http://arxiv.org/abs/2407.03059v1
Compressor summary: The text introduces a fairness-aware dataset for job recommendation in advertising, which preserves predictive power and addresses the challenge of balancing fairness and utility in high-impact domains.
http://arxiv.org/abs/2407.03056v1
Compressor summary: KDPL is a prompt-learning approach based on unsupervised knowledge distillation from more powerful models; it eliminates the need for labeled examples during adaptation and improves zero-shot generalization and transferability of learned prompts, even without knowing the training class names.
http://arxiv.org/abs/2407.03051v1
Compressor summary: The paper proposes QDPO, a new technique that aligns quantized large language models with their full-precision versions, improving chatbot performance and efficiency.
http://arxiv.org/abs/2407.03049v1
Compressor summary: The paper proposes eight enhancements for Monte-Carlo Tree Search (MCTS) in General Video Game Playing (GVGP), which improve win percentages and approach competitive levels with existing agents.
http://arxiv.org/abs/2407.03043v1
Compressor summary: The paper proposes SlerpFace, a novel face template protection technique that rotates and drops out features of face templates to prevent identity-preserving synthetic face image attacks using diffusion models.
http://arxiv.org/abs/2407.03040v1
Compressor summary: The R2S framework uses CoD-Chain of Dialogue logic to guide LLMs in generating knowledge-intensive multi-turn dialogues for instruction tuning, covering diverse domains like Wikipedia (English), Science (Chinese), and Artifacts (Chinese).
http://arxiv.org/abs/2407.03036v1
Compressor summary: SAFT is a simple method that improves CLIP's performance on out-of-distribution data by only updating important parameters during fine-tuning.
http://arxiv.org/abs/2407.03033v1
Compressor summary: The text proposes a new method called ISWSST that uses quantum mechanics ideas to improve semantic segmentation of multispectral imagery, addressing several issues in the current approaches and achieving better accuracy.
http://arxiv.org/abs/2407.03032v1
Compressor summary: The paper presents experimental results on Arabic readability assessment using various methods, achieving good scores by combining different techniques.
http://arxiv.org/abs/2407.03020v1
Compressor summary: The paper introduces the task of CODAfication, which normalizes Dialectal Arabic into a standardized written form, and presents new models and methods to improve its performance.
http://arxiv.org/abs/2407.03018v1
Compressor summary: The paper introduces GeCA, a novel generative model inspired by biological evolution that improves retinal disease classification in Fundus and OCT images.
http://arxiv.org/abs/2407.03010v1
Compressor summary: The paper presents CAVIS, a framework that uses contextual information to improve object tracking and instance matching in video segmentation tasks, achieving state-of-the-art results, especially on difficult videos.
http://arxiv.org/abs/2407.03009v1
Compressor summary: The text discusses how heatmaps from image classification networks can be used for weakly supervised segmentation and improved interpretability, and proposes a novel semi-supervised segmentation method using differentiable heatmap architectures.
http://arxiv.org/abs/2407.03008v1
Compressor summary: The paper proposes a model-agnostic framework for Video Question-Answering that enhances compositional reasoning by integrating video aligner and answer aggregator modules, and evaluates it on various datasets using new metrics and an automatic question decomposition pipeline.
http://arxiv.org/abs/2407.03007v1
Compressor summary: This paper investigates how different factors affect the performance of tool learning methods in large language models and offers insights for improving their practical use.
http://arxiv.org/abs/2407.03006v1
Compressor summary: The paper presents FCDiffusion, a diffusion-based framework for text-guided image-to-image translation using Discrete Cosine Transform to filter latent features and control different aspects of the translation.
http://arxiv.org/abs/2407.03005v1
Compressor summary: The study examines how Wav2Vec2, a deep neural speech model, processes and resolves phonotactic constraints in ambiguous sounds and finds that this ability emerges early in the model's Transformer module.
http://arxiv.org/abs/2407.03004v1
Compressor summary: The study semioLLM evaluates how well large language models can diagnose epilepsy using text descriptions of seizures, revealing both their strengths and limitations for clinical applications.
http://arxiv.org/abs/2407.03000v1
Compressor summary: VIVA is a benchmark that tests vision-language models' ability to use human values to make decisions in real-world situations, revealing their limitations and potential improvements.
http://arxiv.org/abs/2407.02996v1
Compressor summary: The study examines value consistency of large language models across various scenarios and topics, finding them relatively consistent except on controversial topics.
http://arxiv.org/abs/2407.02990v1
Compressor summary: The paper proposes a new efficient method for 3D Human Pose Estimation using a Graph and Skipped Transformer architecture that exploits spatio-temporal information and achieves superior performance with reduced computational complexity.
http://arxiv.org/abs/2407.02988v1
Compressor summary: The paper reviews YOLO object detection algorithms, focusing on their improvements and suitability for edge deployment.
http://arxiv.org/abs/2407.02987v1
Compressor summary: LoRA-Guard is a method to adapt guardrails for content moderation on resource-constrained devices like mobile phones by sharing knowledge between LLMs and guardrail models.
http://arxiv.org/abs/2407.02984v1
Compressor summary: The paper proposes using Genetic Programming to generate datasets with semantic variability for interpreting black box deep learning models in gene regulation, achieving good diversity and outperforming a random baseline.
http://arxiv.org/abs/2407.02978v1
Compressor summary: This paper presents a model to classify AI-generated or human text and evaluates its effectiveness, addressing concerns about machine-generated text misuse in various contexts.
http://arxiv.org/abs/2407.02977v1
Compressor summary: The study tests how well LLMs like GPT-4 and Mistral can evaluate scientific summaries by comparing their judgments to human annotators, finding weak correlation between them.
http://arxiv.org/abs/2407.02968v1
Compressor summary: This paper proposes and tests lightweight multi-class anomaly detection models for visual inspection systems, showing that they can be deployed on edge devices with low latency and memory requirements.
http://arxiv.org/abs/2407.02964v1
Compressor summary: The proposed Finite State Machine prompting method enhances large language models' reasoning capabilities for complex tasks by iteratively decomposing questions into sub-questions and self-correcting, improving accuracy and trustworthiness.
http://arxiv.org/abs/2407.02961v1
Compressor summary: The paper proposes a fast and interpretable method (FKEA) to evaluate the diversity of generated data using random Fourier features and kernel approximation, which can handle large-scale generative models.
http://arxiv.org/abs/2407.02946v1
Compressor summary: The text describes a novel 3D image registration method that uses depth information from a time-of-flight camera to accurately align images from different cameras, improving the assessment of plant phenotypes.
http://arxiv.org/abs/2407.02945v1
Compressor summary: The paper introduces a method for extrapolated view synthesis in urban scenes using LiDAR and prior knowledge, improving rendering quality for views outside the training camera distribution.
http://arxiv.org/abs/2407.02937v1
Compressor summary: The study applies a multilingual anonymization system to nine languages, achieving robust results with varying quality of speech synthesis components.
http://arxiv.org/abs/2407.02936v1
Compressor summary: GraCoRe is a benchmark for evaluating large language models' graph comprehension and reasoning abilities across various types of graphs and tasks, revealing insights into their performance and limitations.
http://arxiv.org/abs/2407.02934v1
Compressor summary: PosMLP-Video is a lightweight MLP-like backbone for video recognition that uses relative positional encoding and spatio-temporal factorized positional MLP blocks to achieve competitive performance on video understanding tasks.
http://arxiv.org/abs/2407.02920v1
Compressor summary: EgoFlowNet is a point-level scene flow estimation network that predicts a binary mask and uses all input points for ego-motion and scene flow estimation, improving performance over existing methods on realistic KITTI scenes.
http://arxiv.org/abs/2407.02918v1
Compressor summary: The paper proposes a new method for real-time 3D reconstruction of surgical scenes without using Structure-from-Motion, by leveraging optical flow priors and scene consistency checks.
http://arxiv.org/abs/2407.02917v1
Compressor summary: The paper explores various aspects of negotiation-based conversations using a preliminary version of the Talkamatic Dialogue Manager.
http://arxiv.org/abs/2407.02914v1
Compressor summary: The paper analyzes how ensemble learning in machine learning affects accuracy and energy consumption, and suggests designing small ensembles with subset-based training, majority voting, and energy-efficient algorithms for a green AI approach.
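The majority-voting combination mentioned above is the standard ensemble rule: every model votes a label per sample, and the most frequent label wins. A minimal sketch (a generic illustration, not the paper's implementation; names are my own):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model predictions (a list of lists, one inner list
    per model, aligned by sample index) into one label per sample."""
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined
```

Because each model's vote counts equally, small ensembles of cheap models trained on data subsets can be combined this way without any extra learned combiner, which fits the paper's energy-efficiency argument.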
http://arxiv.org/abs/2407.02913v1
Compressor summary: SFC is a new algorithm that improves quantized convolution efficiency by extending the DFT with symbolic computing and introducing correction terms, achieving a 3.68x reduction in multiplications while maintaining accuracy.
http://arxiv.org/abs/2407.02910v1
Compressor summary: The paper introduces a hybrid task of domain generalization on sparse classes for industrial anomaly detection, presents three new datasets based on modified MVTec AD, and proposes embedding-based methods (SEMLP and Labeled PatchCore), with SEMLP achieving the best performance at 87.2% average AUROC.
http://arxiv.org/abs/2407.02906v1
Compressor summary: RS-Diffusion is a new method to correct Rolling Shutter artifacts in single images using diffusion techniques and patch-attention, and introduces a new dataset with ground-truth data.
http://arxiv.org/abs/2407.02894v1
Compressor summary: Translatotron-V is an end-to-end image translation model that uses target text decoding and visual tokenization to reduce the language alignment burden and improve performance while preserving visual features.
http://arxiv.org/abs/2407.02893v1
Compressor summary: The paper proposes a novel method called UGTST that selectively annotates a few target domain samples to improve prostate segmentation in cross-center medical images using deep learning models.
http://arxiv.org/abs/2407.02891v1
Compressor summary: The paper presents a novel post-training quantization method, GPTQT, that reduces memory usage and speeds up processing in large language models by using progressive two-step linear and binary coding.
http://arxiv.org/abs/2407.02887v1
Compressor summary: EGIInet is a novel framework that efficiently combines 2D and 3D information for point cloud completion using a unified encoding process and an explicitly guided information interaction strategy, achieving state-of-the-art results with fewer parameters.
http://arxiv.org/abs/2407.02881v1
Compressor summary: ShiftAddAug improves the accuracy of neural networks using multiplication-free operators by augmenting them with costly multiplication and a novel weight sharing method, achieving significant gains in image classification and semantic segmentation tasks.
http://arxiv.org/abs/2407.02880v1
Compressor summary: aTLAS algorithm combines pre-trained model components to enhance knowledge composition and transfer using linear combinations of parameter blocks with different learned coefficients.
http://arxiv.org/abs/2407.02870v1
Compressor summary: The authors propose new features for assessing privacy risks in time-series models using seasonality and trend components estimated from health data.
http://arxiv.org/abs/2407.02863v1
Compressor summary: The text describes a data-driven approach to model realistic and diverse behaviors of road users in multi-agent simulations, using clustering methods on raw data from different environments.
http://arxiv.org/abs/2407.02861v1
Compressor summary: The proposed method uses Physics-Informed Real NVP neural networks with self-supervised training to enhance fault detection in satellite multivariate time series, showing significant performance improvements and potential for other applications.
http://arxiv.org/abs/2407.02856v1
Compressor summary: Machine learning models perform worse on incomplete network data and need at least 7 packets in the test set for reliable anomaly detection.
http://arxiv.org/abs/2407.02854v1
Compressor summary: UniGloR is a self-supervised method for sign language translation and production that works without gloss annotations and achieves good results on multiple tasks.
http://arxiv.org/abs/2407.02853v1
Compressor summary: Plant Doctor is an AI system that uses video footage to diagnose and track leaf damage in urban street plants, helping control disease spread in cities.
http://arxiv.org/abs/2407.02846v1
Compressor summary: The text proposes a novel method called DA4LG for language grounding with 3D objects, which uses multi-task learning to align vision and language across domains.
http://arxiv.org/abs/2407.02842v1
Compressor summary: MindBench is a new benchmark for structured document analysis that includes various tasks and challenges to improve current models' performance.
http://arxiv.org/abs/2407.02837v1
Compressor summary: The paper proposes two methods to protect personal data in texts: a feature-based one and a context-aware one using Multilingual-BERT, which performs better and considers semantic relationships.
http://arxiv.org/abs/2407.02835v1
Compressor summary: The paper proposes a new unsupervised domain adaptation method for object detection that uses a pairwise attentive adversarial network with a Domain Mixup module to align features from different domains and improve adaptation.
http://arxiv.org/abs/2407.02834v1
Compressor summary: The text discusses how Aspect-based Sentiment Analysis, a method that analyzes specific aspects of customer feedback, is important for businesses to understand market trends and improve their competitive edge.
http://arxiv.org/abs/2407.02832v1
Compressor summary: For UAV-view geo-localization (matching drone images with satellite images), the paper proposes a style alignment method and a dynamic observation module that transform visual style, reduce noise, and use a deconstruction loss, achieving state-of-the-art performance on benchmark datasets.
http://arxiv.org/abs/2407.02830v1
Compressor summary: This paper presents an algorithm that removes reflection noise from TLS point clouds, improving 3D vision tasks in urban environments with reflective surfaces.
http://arxiv.org/abs/2407.02827v1
Compressor summary: The paper analyzes implicit gradient descent for training two-layer physics-informed neural networks and shows it converges faster and more reliably than common gradient descent.
http://arxiv.org/abs/2407.02825v1
Compressor summary: This paper introduces a new method for finding representation learning functions using CGANs for causal inference when two distributions are balanced.
http://arxiv.org/abs/2407.02821v1
Compressor summary: The concatenation pre-processing algorithm improves data quality, process model fit, and critical health outcome predictions by reducing dataset complexities in healthcare datasets.
http://arxiv.org/abs/2407.02820v1
Compressor summary: The paper studies how sense-aware contextual word embeddings encode semantic changes in different contexts and dimensions, using fine-tuned language models and various analyses.
http://arxiv.org/abs/2407.02819v1
Compressor summary: The paper proposes a faster way to train language models by pre-aggregating the corpus with a collapsed n-gram distribution, which improves model quality and convergence rate.
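The summary does not give the paper's exact procedure, but the core idea of collapsing a corpus into n-gram statistics, so each distinct (context, next-token) pair is stored once with a count rather than revisited on every pass, can be sketched generically (an illustration under my own naming, not the paper's implementation):

```python
from collections import Counter

def collapsed_ngram_counts(corpus_tokens, n):
    """Aggregate a tokenized corpus into (context, next-token) counts,
    so downstream training can weight each distinct n-gram by frequency
    instead of re-reading every occurrence."""
    counts = Counter()
    for tokens in corpus_tokens:
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            counts[(context, tokens[i + n - 1])] += 1
    return counts
```

Normalizing these counts per context yields the collapsed n-gram distribution; training against that distribution touches each distinct context once per epoch regardless of how often it appears in the raw corpus.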
http://arxiv.org/abs/2407.02814v1
Compressor summary: The paper proposes a framework that uses causal mediation analysis to understand and reduce biases in vision-language models by focusing on image features, which have the largest impact on bias.
http://arxiv.org/abs/2407.02813v1
Compressor summary: The text proposes a method called Dy-DCA that uses a dynamic deep neural network and content-aware data processing to improve video quality and efficiency, reducing model number and optimizing compilation for real-time performance on mobile devices.
http://arxiv.org/abs/2407.02811v1
Compressor summary: The paper introduces SPLITZ, a novel method that splits classifiers into two parts, constrains the Lipschitz constant of one part and smooths the other, to improve robustness against adversarial examples.
http://arxiv.org/abs/2407.02794v1
Compressor summary: The authors propose a new method to decompose grayscale images into three parts: structural, smooth, and oscillatory, using regularization terms and an efficient algorithm.
http://arxiv.org/abs/2407.02783v1
Compressor summary: The report explores the potential of very large language models (>50B parameters) through supervised fine-tuning and progressive growth experiments, and shares an open-source 1T model checkpoint.
http://arxiv.org/abs/2407.02779v1
Compressor summary: MED is a framework for training one KGE model that can serve multiple scenarios with different dimensional requirements by cropping sub-models without additional training, using mutual learning, evolutionary improvement, and dynamic loss weighting to enhance performance.
http://arxiv.org/abs/2407.02778v1
Compressor summary: Our proposed method, SED, handles label noise in a self-adaptive and class-balanced way by using a novel sample selection strategy, mean-teacher model, sample re-weighting mechanism, and consistency regularization.
http://arxiv.org/abs/2407.02776v1
Compressor summary: The text introduces a framework for building and simulating quantum finite-state automata (QFAs) using predefined construction methods and improving accuracy on noisy quantum computers.
http://arxiv.org/abs/2407.02775v1
Compressor summary: MLKD-BERT is a novel method that improves knowledge distillation by exploring multi-level knowledge and adjusting student attention heads, leading to better performance and faster inference on BERT models.
http://arxiv.org/abs/2407.02772v1
Compressor summary: The generalized Newton's method (GeN) is a Hessian-informed optimizer that automatically selects the learning rate for faster convergence without tuning, and performs well on language and vision tasks.
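The summary does not spell out GeN's exact update, but the standard Hessian-informed step size comes from minimizing the local quadratic model along the gradient direction, giving eta = (g.g)/(g.Hg), computable with a single Hessian-vector product. A generic sketch on a convex quadratic (illustrative only; not the paper's algorithm):

```python
import numpy as np

def hessian_informed_lr(grad, hvp):
    """Quadratic-model-optimal step along -grad: eta* = (g.g)/(g.Hg)."""
    gHg = grad @ hvp
    return (grad @ grad) / gHg if gHg > 0 else 1e-3  # fallback for non-convex curvature

# Demo: minimize f(x) = 0.5 x.A x - b.x, whose gradient is A x - b.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
x = np.zeros(2)
for _ in range(50):
    g = A @ x - b
    eta = hessian_informed_lr(g, A @ g)  # here the Hessian-vector product is just A @ g
    x = x - eta * g
```

In deep learning the Hessian-vector product g.Hg is typically obtained via automatic differentiation rather than an explicit Hessian, which keeps the per-step cost close to an extra backward pass.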