This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-01-30, generated by the Compressor, my personal LLM-based summarization project.
http://arxiv.org/abs/2401.16424v1
Compressor summary: The paper discusses current computer vision methods for studying animal behavior in videos and suggests future directions to overcome practical challenges.
http://arxiv.org/abs/2401.16423v1
Compressor summary: Key points:
- Objective: synchronize audio and video in real-world videos with sparse cues
- Contributions: novel model, multi-modal pre-training, state-of-the-art performance, AudioSet dataset, interpretability, audio-visual synchronizability
Summary: The paper presents a new audio-visual synchronization model that uses multi-modal pre-training and performs well on real-world videos with sparse cues. It also explores new aspects like interpretability and synchronizability.
http://arxiv.org/abs/2401.16422v1
Compressor summary: The paper studies how strategic users choosing among multiple online services affect convergence or oscillation in service optimization, and shows that memory-based retraining can ensure convergent behavior for some loss functions.
http://arxiv.org/abs/2401.16421v1
Compressor summary: BiPE is a new positional encoding method that combines intra- and inter-segment encodings to improve semantic information and extrapolation capabilities for language sequences.
http://arxiv.org/abs/2401.16420v1
Compressor summary: InternLM-XComposer2 is a powerful vision-language model that creates and understands text-image compositions, using Partial LoRA to balance precision and creativity.
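As background for the Partial LoRA technique mentioned above, here is a minimal, generic LoRA adapter sketch in PyTorch. It only illustrates the standard low-rank update that LoRA-style methods build on, not InternLM-XComposer2's actual Partial LoRA implementation; the class name, rank, and scaling values are assumptions.

```python
# Minimal sketch of a generic LoRA adapter layer (PyTorch), for illustration only.
# This is NOT the paper's Partial LoRA; it shows the standard low-rank update
# base(x) + (alpha/r) * B(A(x)) that LoRA-style methods build on.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)     # stands in for a pretrained layer
        for p in self.base.parameters():                      # freeze the pretrained weights
            p.requires_grad_(False)
        self.lora_A = nn.Linear(in_features, r, bias=False)   # low-rank down-projection
        self.lora_B = nn.Linear(r, out_features, bias=False)  # low-rank up-projection
        nn.init.zeros_(self.lora_B.weight)                     # start as a zero (identity) update
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))
```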
http://arxiv.org/abs/2401.16419v1
Compressor summary: The paper presents a new SEBN model with linear constraints, Gaussian Processes, and a Horseshoe prior to learn semi-parametric relationships, improve interpretability, and achieve better performance on synthetic and UCI Liver Disorders datasets.
http://arxiv.org/abs/2401.16416v1
Compressor summary: Endo-4DGS is a real-time method for dynamic scene reconstruction in robot-assisted surgery using 4D Gaussian Splatting and monocular views, which improves surgical outcomes.
http://arxiv.org/abs/2401.16412v1
Compressor summary: The study uses neural networks to test the vulnerability of different voting methods to strategic manipulation under various conditions and finds significant differences in manipulability among them.
http://arxiv.org/abs/2401.16405v1
Compressor summary: The paper presents a sparse fine-tuning method that improves performance of large language models without increasing memory requirements, making it compatible with quantization and efficient optimizers.
http://arxiv.org/abs/2401.16403v1
Compressor summary: ViLexNorm is a Vietnamese lexical normalization corpus built from social media sentences, supporting the transformation of words into their standard forms to improve various NLP tasks.
http://arxiv.org/abs/2401.16402v1
Compressor summary: This paper surveys Visual Anomaly Detection (VAD), discussing its challenges and recent advancements in dealing with scarce, diverse, and complex data.
http://arxiv.org/abs/2401.16398v1
Compressor summary: Key points:
- Behavioral cloning uses demonstrations to learn a policy
- The authors propose using latent spaces of pre-trained models to find similar experiences and copy behavior
- The approach is tested on the MineRL dataset in the Minecraft environment
- The search-based approach outperforms learning-based models in accuracy and perceptual evaluation
Summary: The paper proposes a search-based method for behavioral cloning using latent spaces of pre-trained models, which achieves better results than learning-based models in Minecraft.
http://arxiv.org/abs/2401.16393v1
Compressor summary: Key points:
- The Amazon is facing a severe drought affecting the Rio Negro River
- Researchers used a U-net model to map water surfaces from Sentinel-1 satellite images with high accuracy
- The water surface reached its lowest level in November 2023, reduced by 68.1% compared to the maximum
Summary: Using deep learning and satellite data, researchers mapped the drastic decline of the Rio Negro River water surface in the Amazon drought.
http://arxiv.org/abs/2401.16386v1
Compressor summary: This paper surveys recent progress in using pre-trained models for continual learning, categorizes existing methods, and discusses fairness in comparisons.
http://arxiv.org/abs/2401.16383v1
Compressor summary: The paper's ILP approach uses minimal unsatisfiable subprograms (MUSPs) to soundly and efficiently prune the search space, reducing learning times by up to 99%.
http://arxiv.org/abs/2401.16380v1
Compressor summary: WRAP uses an instruction-tuned model to paraphrase documents in specific styles, speeding up pre-training and improving LLM performance on noisy data.
http://arxiv.org/abs/2401.16375v1
Compressor summary: The paper analyzes different layout generation methods, proposes a learning-based locator to detect errors in generated layouts using wireframe images, and shows improved results over existing approaches.
http://arxiv.org/abs/2401.16373v1
Compressor summary: Bayesian optimization is a useful tool for optimizing complex functions in various fields, but there are still challenges and opportunities to make it more efficient and effective.
http://arxiv.org/abs/2401.16367v1
Compressor summary: TQCompressor compresses pre-trained language models like GPT-2 by using improved tensor decompositions and a training strategy that achieves better performance than other compression methods.
http://arxiv.org/abs/2401.16355v1
Compressor summary: PathMMU is a high-quality benchmark for large multimodal models in pathology, but even advanced AI models struggle to match human expertise.
http://arxiv.org/abs/2401.16349v1
Compressor summary: Key points:
- A system that matches resumes and jobs using data augmentation and contrastive learning
- ConFit creates an augmented dataset by paraphrasing resume and job sections
- Outperforms prior methods in ranking jobs and resumes
Summary: ConFit is a system that uses data augmentation and contrastive learning to match resumes and jobs, creating an enhanced dataset and improving ranking performance.
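For readers unfamiliar with the contrastive-learning component mentioned in the ConFit summary, the sketch below shows a generic InfoNCE-style loss over paired embeddings (e.g., resume/job pairs). It is an illustration of the general technique under assumed batch pairing and temperature, not ConFit's actual objective.

```python
# Minimal sketch of an InfoNCE-style contrastive loss over paired embeddings.
# Generic illustration only; the in-batch pairing and temperature are assumptions,
# not ConFit's exact training objective.
import torch
import torch.nn.functional as F

def info_nce(resume_emb, job_emb, temperature=0.07):
    # resume_emb, job_emb: (batch, dim); row i of each is a matching pair
    r = F.normalize(resume_emb, dim=-1)
    j = F.normalize(job_emb, dim=-1)
    logits = r @ j.t() / temperature                       # similarity of every resume to every job
    targets = torch.arange(r.size(0), device=r.device)     # the diagonal entries are the positives
    return F.cross_entropy(logits, targets)
```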
http://arxiv.org/abs/2401.16348v1
Compressor summary: The paper compares different types of topic models in real-world content analysis and document annotation tasks, finding that some neural models perform better than classical ones despite questionable validity of automated evaluation metrics.
http://arxiv.org/abs/2401.16347v1
Compressor summary: The paper proposes two methods for cross-modal retrieval using multiple input modalities and shows their effectiveness in improving retrieval performance.
http://arxiv.org/abs/2401.16335v1
Compressor summary: The paper proposes Iterative Data Smoothing (IDS), which updates the data and the reward model simultaneously during RLHF, reducing reward model degradation and better aligning language models with human-centric values.
http://arxiv.org/abs/2401.16329v1
Compressor summary: Key points:
- Signature synthesis is a technique to generate artificial signatures for verification
- The paper proposes a 3D framework using lognormality principle and neuromotor control processes
- The paper shows the synthesis of trajectories, velocities, duplicates, and air writing and gestures
- The paper demonstrates the performance and human likeness of the synthetic signatures
Summary: The paper presents a 3D signature synthesis framework that generates artificial specimens for verification, based on neuromotor control processes and lognormality principle. It also shows how to synthesize trajectories, duplicates, air writing and gestures, and evaluates the quality and realism of the synthetic signatures.
http://arxiv.org/abs/2401.16327v1
Compressor summary: The authors propose a novel method to improve neural operators' generalization across multiple governing PDEs using physics-informed contrastive pretraining.
http://arxiv.org/abs/2401.16318v1
Compressor summary: The text describes a new method to extract and generalize interactions between input variables encoded by deep neural networks for the same task, which improves explainability in AI.
http://arxiv.org/abs/2401.16313v1
Compressor summary: ACES is a large challenge set that tests 50 machine translation metrics on 68 different types of errors across 146 languages, revealing their strengths and weaknesses, and suggesting improvements for metric design.
http://arxiv.org/abs/2401.16305v1
Compressor summary: MixSup is a 3D object detection method that uses cheap cluster labels for semantics and expensive box labels for poses and shapes, achieving high performance with reduced annotation effort.
http://arxiv.org/abs/2401.16304v1
Compressor summary: The paper proposes a regression-based method for visual place recognition that improves ranking accuracy and efficiency by using camera field-of-view overlap as ground truth.
http://arxiv.org/abs/2401.16299v1
Compressor summary: The paper proposes methods to improve generalization of pretrained Graph Neural Networks for molecular property prediction by jointly training them with auxiliary tasks and adapting their weights or gradients.
http://arxiv.org/abs/2401.16298v1
Compressor summary: Selective Uncertainty-based Active Learning prioritizes pixels within target areas and near decision boundaries to improve medical image segmentation, outperforming conventional uncertainty-based methods.
http://arxiv.org/abs/2401.16294v1
Compressor summary: A new method for explaining machine learning models uses convex hulls and dual representations to generate examples and calculate feature importance values, improving on LIME.
http://arxiv.org/abs/2401.16293v1
Compressor summary: The paper proposes using textual entailment to validate facts extracted from language models for knowledge base population, improving accuracy and reducing unintended or hallucinatory results.
http://arxiv.org/abs/2401.16291v1
Compressor summary: MachineLearnAthon is a new teaching concept that uses real-world ML challenges with industrial data sets to promote interdisciplinary and practical skills in students of various backgrounds.
http://arxiv.org/abs/2401.16287v1
Compressor summary: GAPS is a novel model that solves geometry math problems by generating solution programs as compositions of operators and operands, and outperforms Geoformer on calculation and proving tasks.
http://arxiv.org/abs/2401.16285v1
Compressor summary: The paper proposes a new method for detecting misleading content using linguistic features and symbolic knowledge, achieving state-of-the-art performance in various datasets.
http://arxiv.org/abs/2401.16284v1
Compressor summary: The paper proposes new strategies to improve deep learning methods for estimating the pose of an object in computer vision and robotics by addressing their limitations.
http://arxiv.org/abs/2401.16282v1
Compressor summary: The paper introduces MAPLE, a method for verifying claims using limited data and unlabelled pairwise data, which outperforms existing approaches in fact-checking tasks.
http://arxiv.org/abs/2401.16280v1
Compressor summary: Key points:
- A large video understanding model is used for human fall detection on untrimmed video
- A pretrained vision transformer detects three classes of actions: Fall, Lying and Other/ADL
- A simple cutup method and a preprocessing pipeline are used to create labeled action clips
- The method achieves state-of-the-art results on the HQFSD dataset under real-time settings
Summary: The paper proposes a method that uses a video understanding model with a pretrained vision transformer to detect falls and other activities on untrimmed videos, and shows its effectiveness on a public dataset.
http://arxiv.org/abs/2401.16270v1
Compressor summary: The paper proposes using axis-aligned octagons for region-based knowledge graph embeddings to overcome limitations in modeling relational composition and rules.
http://arxiv.org/abs/2401.16265v1
Compressor summary: CO2 is a new approach that allows large language models to be trained efficiently even on clusters with limited communication bandwidth by combining local updating, asynchronous communication, and advanced techniques for stability and convergence.
http://arxiv.org/abs/2401.16247v1
Compressor summary: The paper explores human-based red teaming for machine translation, assessing model performance and reliability by generating edge cases that reveal critical errors.
http://arxiv.org/abs/2401.16240v1
Compressor summary: The paper proposes a novel method for unsupervised abstractive summarization of social media user timelines for mental health monitoring, using a hierarchical variational autoencoder and a large language model.
http://arxiv.org/abs/2401.16236v1
Compressor summary: The authors propose a system that uses DRL to optimize communication between an observer and a robot controller in a 5G wireless network, improving performance on a simulated task.
http://arxiv.org/abs/2401.16235v1
Compressor summary: The paper introduces a player pressure map to visualize and evaluate the pressure experienced by soccer teams in game scenes, helping coaches improve players' performance under pressure.
http://arxiv.org/abs/2401.16232v1
Compressor summary: The paper evaluates liveness detection models for biometric security and highlights the challenges and gaps in cross-database scenarios.
http://arxiv.org/abs/2401.16224v1
Compressor summary: The paper proposes Diffutoon, a toon shading method based on diffusion models that can render photorealistic videos in anime style and edit them according to prompts.
http://arxiv.org/abs/2401.16215v1
Compressor summary: The paper proposes a method to combine small rules into large ones for inductive logic programming, using constraint solvers to improve accuracy on various domains like game playing and drug design.
http://arxiv.org/abs/2401.16209v1
Compressor summary: MultiMUC is a multilingual parallel corpus for template filling with translations into five languages and human annotations.
http://arxiv.org/abs/2401.16197v1
Compressor summary: Key points:
- Geospatial data is crucial for predictive models but may perpetuate historical biases and exclusionary practices
- The paper proposes a toolkit to identify and mitigate such biases, especially in ordinal regression with spatial attributes
- The paper illustrates the methodology using a Parisian real estate dataset and discusses the implications of geographical aggregation levels for fairness and calibration
Summary: The paper introduces a toolkit to address geospatial data biases in predictive models, especially in ordinal regression, and applies it to a Parisian real estate dataset, examining how different aggregation levels affect fairness and calibration.
http://arxiv.org/abs/2401.16193v1
Compressor summary: The text proposes a novel feature-based diversity constraint for coreset selection that considers both similarity and contribution of dimensions, improving performance and diversity in deep learning.
http://arxiv.org/abs/2401.16189v1
Compressor summary: FIMP is a novel method for autonomous driving that predicts multi-agent motions by capturing potential future interactions using feature-level decoding and affinity learning.
http://arxiv.org/abs/2401.16184v1
Compressor summary: The text introduces a new method for analyzing language models' latent space that provides absolute and model-centric insights into their semantics, improving their performance and interpretability.
http://arxiv.org/abs/2401.16182v1
Compressor summary: The French government created LLaMandement, a Large Language Model that generates neutral summaries of legislative proposals and helps process parliamentary sessions efficiently and effectively.
http://arxiv.org/abs/2401.16176v1
Compressor summary: This paper provides a comprehensive overview of structure-preserving graph transformers and categorizes their strategies according to their design objectives for preserving graph structures.
http://arxiv.org/abs/2401.16164v1
Compressor summary: The paper proposes a new Hessian-free algorithm for solving constrained Bi-Level Optimization problems in machine learning, using a smooth proximal Lagrangian value function to handle constraints.
http://arxiv.org/abs/2401.16160v1
Compressor summary: The authors propose a sparse mixture of LoRA experts for instruction finetuning MLLMs to handle data conflicts from distinct domains and achieve better performance.
http://arxiv.org/abs/2401.16158v1
Compressor summary: Key points:
- Mobile-Agent is a vision-centric mobile device agent that can perform complex tasks on apps.
- It does not require system-specific customizations or XML files of apps.
- Mobile-Eval is a benchmark for evaluating Mobile-Agent, and the results are promising.
Summary: Mobile-Agent is a vision-centric mobile device agent that can navigate and operate apps without system-specific customizations, using Mobile-Eval to demonstrate its accuracy and versatility.
http://arxiv.org/abs/2401.16157v1
Compressor summary: The proposed approach uses spatial-aware initialization noise to improve text-to-image generation, leveraging inverted reference images for layout control.
http://arxiv.org/abs/2401.16144v1
Compressor summary: The paper proposes a new training method for neural radiance fields (NeRFs) that improves rendering quality by dividing input views into groups based on visual similarity and training specialized models on each group, then aggregating their knowledge using distillation.
http://arxiv.org/abs/2401.16137v1
Compressor summary: X-PEFT is a novel method that uses binary masks to select adapters efficiently for multiple profiles, outperforming conventional adapter tuning with much less memory usage.
http://arxiv.org/abs/2401.16133v1
Compressor summary: The paper's new mixed-integer programming approach improves the accuracy of optimal classification trees, especially for small and medium-sized datasets, while incorporating both linear and nonlinear metrics.
http://arxiv.org/abs/2401.16131v1
Compressor summary: CIMIL-CRC, a deep neural network framework, uses clinical information to improve the classification of colorectal cancer subtypes from whole-slide images.
http://arxiv.org/abs/2401.16124v1
Compressor summary: The paper explores how to generalize dynamic constraints in ASP to improve temporal problem solving and evaluates the effect on solver performance.
http://arxiv.org/abs/2401.16122v1
Compressor summary: DeFlow uses a GRU refinement module to transition from voxel-based to point-based features for scene flow estimation, improving performance and efficiency on large-scale point cloud data.
http://arxiv.org/abs/2401.16119v1
Compressor summary: TriDiRA is a novel approach that disentangles three types of modality-specific representations to improve multimodal learning for affective analysis tasks.
http://arxiv.org/abs/2401.16110v1
Compressor summary: The paper proposes a framework for improving vision-based roadside detection in autonomous vehicles by mitigating background overfitting and generating diverse training data using unlabeled images, leading to better performance on new scenes.
http://arxiv.org/abs/2401.16107v1
Compressor summary: The text describes an AI framework that mimics real-world medical consultations to improve automatic diagnosis using large language models without much training time.
http://arxiv.org/abs/2401.16104v1
Compressor summary: This paper introduces a three-step deep learning algorithm for detecting and analyzing defects in industrial computed tomography using sinograms instead of reconstructed images, achieving high accuracy and precision.
http://arxiv.org/abs/2401.16102v1
Compressor summary: The paper proposes a flexible deep learning model (FPNN) that effectively predicts battery life using electrochemical features extracted from video-like data, achieving high accuracy and interpretability in different tasks.
http://arxiv.org/abs/2401.16092v1
Compressor summary: The paper introduces MAGBIG, a benchmark to study gender bias in multilingual text-to-image generation models, and finds that they deviate from normative assumptions and differ across languages.
http://arxiv.org/abs/2401.16088v1
Compressor summary: The paper proposes two fairness metrics for algorithmic recourse that consider time and effort, and tests them on a simulated recourse system.
http://arxiv.org/abs/2401.16087v1
Compressor summary: The paper introduces HRIQ, a high-resolution image quality database for training blind image quality assessment models to accurately predict MOS of high-resolution images.
http://arxiv.org/abs/2401.16086v1
Compressor summary: The paper shows that non-fluent target-side synthetic training samples can improve multilingual machine translation performance across different tasks and domains.
http://arxiv.org/abs/2401.16078v1
Compressor summary: The paper investigates how word-level linguistic annotations affect under-resourced neural machine translation and finds that source-language annotations are generally helpful, while target-language annotations perform better with part-of-speech tags than morpho-syntactic description tags.
http://arxiv.org/abs/2401.16076v1
Compressor summary: The paper presents a method to predict the most engaging moments for creating trailers using both visual and dialogue information, and tests it on a new soap opera dataset.
http://arxiv.org/abs/2401.16055v1
Compressor summary: The paper shows that an attacker can build a local model from a victim model's outputs, that the choice of vocabulary does not significantly affect the attack's performance, and that the victim's vocabulary can be extracted from its outputs.
http://arxiv.org/abs/2401.16051v1
Compressor summary: The paper proposes dynamic prototype adaptation (DPA), a method that learns task-specific prototypes for segmenting point clouds with minimal supervision, and outperforms existing methods on two benchmarks.
http://arxiv.org/abs/2401.16045v1
Compressor summary: The paper introduces TENLPA, a novel model that uses type information in knowledge graphs to improve complex logical query answering by constructing type-based entity-relation graphs and adaptively adjusting neural link predictors.
http://arxiv.org/abs/2401.16035v1
Compressor summary: The paper proposes a second order velocity field method for kinematic surface fitting that improves symmetry detection and morphological classification in medical image analysis.
http://arxiv.org/abs/2401.16027v1
Compressor summary: The study presents a refined X23D model that generates accurate 3D spine reconstructions from few intraoperative fluoroscopic images, bridging the domain gap between synthetic and real data for improved surgical navigation in orthopedic surgeries.
http://arxiv.org/abs/2401.16025v1
Compressor summary: The paper introduces SPO, a new algorithm that improves on PPO by using a better clipping method for KL divergence, which enhances stability and performance in reinforcement learning environments.
http://arxiv.org/abs/2401.16024v1
Compressor summary: The study proposes a method to learn rules in vector-symbolic architectures for Raven's progressive matrices, improving abstract reasoning in artificial intelligence and outperforming connectionist models.
http://arxiv.org/abs/2401.16012v1
Compressor summary: The paper investigates challenging metaphors for NLP models and proposes an automatic pipeline to identify them, showing significant drops in performance on downstream tasks.
http://arxiv.org/abs/2401.16011v1
Compressor summary: GPS is a new graph contrastive learning approach that uses graph pooling to generate multi-scale positive views, improving representation learning performance on graphs.
http://arxiv.org/abs/2401.15996v1
Compressor summary: AccessLens is a system that uses machine learning and 3D printing to identify and solve everyday physical interface barriers for diverse people.
http://arxiv.org/abs/2401.15989v1
Compressor summary: DECS is a deep embedding clustering algorithm without pseudo targets that uses sample stability to pull samples to their clusters and outperforms existing methods.
http://arxiv.org/abs/2401.15987v1
Compressor summary: The proposed data-driven method refines coarse hand motion for interacting with objects in virtual reality and robotics, using a hand-centric representation and a new hierarchical architecture.
http://arxiv.org/abs/2401.15977v1
Compressor summary: Motion-I2V is a framework for generating videos from images that can handle large motion variations, support precise control over motion trajectories, and enable zero-shot video translation.
http://arxiv.org/abs/2401.15975v1
Compressor summary: StableIdentity is a method that can insert a person's identity into various contexts using just one face image, achieving high-quality human-centric generation while preserving the identity.
http://arxiv.org/abs/2401.15973v1
Compressor summary: The paper proposes OMSI, a method that adapts sample weights for each sample in a mini-batch to improve continual learning performance and accuracy.
http://arxiv.org/abs/2401.15969v1
Compressor summary: The paper studies different routers for Mixture-of-Experts models in computer vision tasks and finds that Expert Choice routers, soft MoEs, and routers adapted from language models perform better.
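For context on the routers compared in this study, the snippet below sketches a standard top-k "token choice" MoE router. It is a generic illustration only, not Expert Choice, soft MoE, or the paper's code; the dimensions and k are assumptions.

```python
# Minimal sketch of a standard top-k "token choice" MoE router, for context only.
# Generic illustration; not Expert Choice, soft MoE, or any variant from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    def __init__(self, dim, num_experts, k=2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)  # learned gating over experts
        self.k = k

    def forward(self, tokens):                   # tokens: (num_tokens, dim)
        logits = self.gate(tokens)               # (num_tokens, num_experts)
        weights = F.softmax(logits, dim=-1)
        topk_w, topk_idx = weights.topk(self.k, dim=-1)
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        return topk_idx, topk_w                  # expert assignments and mixing weights per token
```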
http://arxiv.org/abs/2401.15966v1
Compressor summary: The study uses GPT-4 and OsakaED, two large language models, to generate counseling dialogues based on cognitive behavioral therapy scenarios, and finds that GPT-4 significantly improves mood, empathy, and other dialogue qualities compared to a scenario-based system.
http://arxiv.org/abs/2401.15964v1
Compressor summary: The Spatio-Temporal Attention Graph Neural Network combines graph and temporal convolutional neural networks for predicting industrial system lifespans, improving precision and explainability with multi-head attention.
http://arxiv.org/abs/2401.15952v1
Compressor summary: The paper introduces a new unsupervised domain adaptation method called class-aware optimal transport, which uses deep neural networks and higher-order moment matching to transfer knowledge from labeled source domain to unlabeled target domain.
http://arxiv.org/abs/2401.15949v1
Compressor summary: The paper proposes a new layer called EML to replace convolution layers in CNNs, reducing computation complexity and enabling parallelization, and introduces TFDMNet, a network structure that combines the advantages of both EMLs and convolution layers.
http://arxiv.org/abs/2401.15948v1
Compressor summary: The paper explores issues with conditional normalising flows and suggests using adversarial training to improve them, testing the method on synthetic and real data.
http://arxiv.org/abs/2401.15947v1
Compressor summary: The authors propose MoE-tuning, a training strategy to create sparse large vision-language models that can improve performance and reduce hallucinations with constant computational cost.
http://arxiv.org/abs/2401.15944v1
Compressor summary: The paper proposes a Domain Matching module to enhance reference-based image super-resolution models in real-world scenarios with domain gaps, such as satellite imaging, where existing models struggle.
http://arxiv.org/abs/2401.15942v1
Compressor summary: The paper proposes a novel multi-center linear classifier for image classification that samples multiple sub-centers from conditional Gaussian distributions to capture intra-class local structures more efficiently.
http://arxiv.org/abs/2401.15938v1
Compressor summary: The paper proposes a method to reduce motion-induced errors in 3D shape measurement using fringe patterns by leveraging encoder data and camera-projector geometry, requiring low cost and easy implementation.
http://arxiv.org/abs/2401.15935v1
Compressor summary: The study explores combining generative and contrastive self-supervised learning techniques for event sequence representation and shows their universal benefits across various applications.
http://arxiv.org/abs/2401.15934v1
Compressor summary: The paper introduces HICH-IT, a new multimodal dataset for hypertensive intracerebral hemorrhage that includes textual information and head CT images to improve AI accuracy in diagnosis and treatment.
http://arxiv.org/abs/2401.15927v1
Compressor summary: The text introduces E-EVAL, a benchmark for evaluating large language models in the Chinese K-12 education domain, which reveals their strengths and limitations across various subjects.
http://arxiv.org/abs/2401.15914v1
Compressor summary: The paper proposes a novel approach called OGEN that improves out-of-distribution generalization of vision-language models by using a feature generator and an adaptive self-distillation mechanism to regularize the model.
http://arxiv.org/abs/2401.15911v1
Compressor summary: The paper proposes Distribution-consistency Structural Causal Models (DiscoSCMs) to better model counterfactuals in causal frameworks, introducing a new parameter and theoretical results.
http://arxiv.org/abs/2401.15903v1
Compressor summary: The authors propose a theory for identifying latent representations in comparative deep generative models, which can improve analysis of patterns between different data sets using piece-wise affine mixing functions and constrained optimization.
http://arxiv.org/abs/2401.15902v1
Compressor summary: CENet is a simple network that fuses RGB and depth images for autonomous driving using a fast guidance module and a decoupled prediction head, achieving competitive performance on benchmarks.
http://arxiv.org/abs/2401.15900v1
Compressor summary: Key points:
- The paper presents a self-supervised method for cross-view video reconstruction using masked autoencoder (MAE) and cross-attention mechanism
- The method leverages geometry information to improve robustness to viewpoint changes and temporal modeling of static regions
- The method achieves state-of-the-art results on various datasets
Summary: The paper proposes a self-supervised MAE framework for cross-view video reconstruction that uses geometry information and cross-attention to improve robustness and temporal modeling, and achieves state-of-the-art performance.
http://arxiv.org/abs/2401.15896v1
Compressor summary: The text introduces a large bilingual dataset and a new pretraining method to improve vision-language models' understanding of images in both Chinese and English, achieving state-of-the-art results on multimodal tasks.
http://arxiv.org/abs/2401.15894v1
Compressor summary: Cy2Mixer is a novel spatio-temporal GNN that leverages topological non-trivial invariants and gated multi-layer perceptrons to capture spatial, temporal, and topological information in graphs for traffic analysis.
http://arxiv.org/abs/2401.15893v1
Compressor summary: The paper presents a novel downscaling framework for tidal current data that uses deep learning, addresses its unique characteristics, and reduces computational cost.
http://arxiv.org/abs/2401.15886v1
Compressor summary: The paper explores how gray level texture features can be used to automatically segment and classify RNAscope transcripts in breast cancer tissue, potentially improving the speed and accuracy of diagnosis and treatment.
http://arxiv.org/abs/2401.15885v1
Compressor summary: The paper proposes solutions to improve long-tailed object detection by addressing the regression bias caused by class-specific regression heads for rare categories, achieving state-of-the-art results on LVIS dataset.
http://arxiv.org/abs/2401.15884v1
Compressor summary: The paper proposes a method called CRAG to enhance the robustness of text generation by using web searches, confidence evaluation, and selective information extraction.
http://arxiv.org/abs/2401.15879v1
Compressor summary: lil'HDoC improves sample complexity for identifying good arms with small threshold gaps compared to HDoC and other algorithms.
http://arxiv.org/abs/2401.15875v1
Compressor summary: WSTATT is a deep learning model that combines weather and satellite data to create accurate and early crop maps, accounting for physical processes of crop growth.
http://arxiv.org/abs/2401.15872v1
Compressor summary: The paper proposes a Q-network based on radial basis functions for deep reinforcement learning to solve complex inventory management problems, demonstrating better performance than simple methods and current DRL approaches.
http://arxiv.org/abs/2401.15866v1
Compressor summary: The authors explore using noisy labels to train networks for explainable machine learning tasks, resulting in faster and more cost-effective approximations compared to exact label training.
http://arxiv.org/abs/2401.15865v1
Compressor summary: The paper proposes LiDAR-PTQ, a Post-Training Quantization method for 3D lidar detection tasks that achieves state-of-the-art performance and speedup while being cost-effective.
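As a reminder of what post-training quantization (PTQ) does in general, the sketch below shows basic symmetric int8 quantization of a weight tensor. It illustrates only the generic PTQ idea, not LiDAR-PTQ's calibration or detection-specific procedure.

```python
# Minimal sketch of symmetric int8 post-training quantization of a weight tensor.
# Generic illustration of the PTQ idea only; LiDAR-PTQ's actual method is not reproduced here.
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0                              # per-tensor scale from calibration
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale                         # approximate reconstruction of the weights
```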
http://arxiv.org/abs/2401.15864v1
Compressor summary: The paper proposes a spatial and temporal method to improve learned video compression by handling motion inconsistency and occlusion in local regions.
http://arxiv.org/abs/2401.15863v1
Compressor summary: The paper introduces IADD, a method that improves dataset distillation by assigning importance weights to network parameters, resulting in more robust and generalizable distilled datasets.
http://arxiv.org/abs/2401.15861v1
Compressor summary: The paper proposes DrBERT, a method that improves BERT's encoder by enhancing its decoder, leading to better natural language processing performance without increasing inference time or serving costs.
http://arxiv.org/abs/2401.15859v1
Compressor summary: DiFF is a new dataset with over 500,000 face-focused diffusion-generated images that aim to study and improve forgery detection methods.
http://arxiv.org/abs/2401.15856v1
Compressor summary: The paper introduces a Noise Injection method to study how Reinforcement Learning agents generalize when state transition probabilities change slightly in similar environments.
http://arxiv.org/abs/2401.15855v1
Compressor summary: The paper introduces Cross-Scale MAE, a self-supervised model for remote sensing image analysis that uses scale augmentation and cross-scale consistency constraints to learn consistent and meaningful representations.
http://arxiv.org/abs/2401.15854v1
Compressor summary: The paper introduces a hierarchical deep learning model for the Sequential Sentence Classification task in medical abstracts, using sentence embeddings from an LSTM network and further enhancements from a C-RNN and MLP.
http://arxiv.org/abs/2401.15847v1
Compressor summary: The paper introduces Multipanel Visual Question Answering, a benchmark that challenges models in comprehending multipanel images with 6,600 questions and answers, revealing the limitations of current large vision language models.
http://arxiv.org/abs/2401.15846v1
Compressor summary: The paper proposes a meta-learning approach using recurrent neural networks and monotonic neural networks to predict human activity events from short sequences, taking into account temporal periodic patterns.
http://arxiv.org/abs/2401.15842v1
Compressor summary: The paper introduces a modular method using a large language model to improve Visual Question Answering Grounding tasks with low computational resources and various pre-trained models.
http://arxiv.org/abs/2401.15841v1
Compressor summary: Key points:
- The paper proposes a novel framework for 3D object reconstruction from a single image using multi-view images.
- The framework addresses inconsistent lighting, misaligned geometry, and sparse views using three techniques: intrinsic decomposition guidance, transient-mono prior guidance, and view augmentation.
- The framework achieves superior performance compared to the state-of-the-art method Syncdreamer.
Summary: The paper presents a new 3D reconstruction framework that uses multi-view images and tackles lighting, geometry, and view issues with three guidance techniques and fusion strategies, outperforming the current best method.
http://arxiv.org/abs/2401.15840v1
Compressor summary: The paper proposes a new framework to make AI more transparent and interpretable by integrating emergent communication into AI models, enabling causal understanding of outputs.
http://arxiv.org/abs/2401.15834v1
Compressor summary: This paper explores how using fewer base classes for feature extraction can improve few-shot learning and presents simple, intuitive methods that can be applied to any few-shot solution.
http://arxiv.org/abs/2401.15820v1
Compressor summary: The text proposes a novel framework for interpreting neural models in image scene classification using knowledge graph-based methods, addressing limitations in concept completeness, fusion, and manipulation.
http://arxiv.org/abs/2401.15814v1
Compressor summary: OntoMedRec is a model that uses medical ontologies to improve medication recommendation by addressing data sparsity issues in electronic health records.