This page contains one-sentence summaries of cs.AI/ML/CV/CL papers announced on 2024-02-01, generated by the compressor, my personal LLM-based project.
http://arxiv.org/abs/2401.18085v1
Compressor summary: Motion guidance is a technique that enables precise editing of image layouts, positions, poses, and shapes by using dense, complex motion fields guided by an optical flow network in the diffusion sampling process.
http://arxiv.org/abs/2401.18084v1
Compressor summary: UniTouch is a unified tactile model that enables vision-based touch sensors to learn from multiple modalities and perform various touch sensing tasks in the zero-shot setting.
http://arxiv.org/abs/2401.18083v1
Compressor summary: The paper proposes a method to improve scene landmark detection for camera localization by splitting landmarks into subgroups, using dense reconstructions, and having a compact architecture.
http://arxiv.org/abs/2401.18079v1
Compressor summary: KVQuant is a new method that accurately compresses key-value cache activations in large language models for better memory efficiency and performance.
http://arxiv.org/abs/2401.18075v1
Compressor summary: CARFF is a method that predicts future 3D scenes from past images using a probabilistic encoder and a Neural Radiance Field, handling uncertainty and dynamics for applications like autonomous driving.
http://arxiv.org/abs/2401.18070v1
Compressor summary: The study examines how well large language models can solve arithmetic word problems like children, focusing on comprehension, planning, and execution steps.
http://arxiv.org/abs/2401.18059v1
Compressor summary: The text proposes a new approach called RAPTOR that improves retrieval-augmented language models by recursively embedding, clustering, and summarizing chunks of text, leading to better integration of information across documents and complex reasoning tasks.
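The RAPTOR idea above (recursively cluster chunks and summarize each cluster, building a tree that retrieval can search at every level) can be sketched in a few lines. This is a toy illustration only: the real system uses learned embeddings, soft Gaussian-mixture clustering, and an LLM summarizer, whereas `cluster` and `summarize` here are stand-in placeholders.

```python
# Toy sketch of a RAPTOR-style summary tree. `cluster` and `summarize`
# are placeholders for the paper's embedding-based clustering and LLM
# summarization; only the recursive tree-building structure is faithful.
from itertools import islice

def summarize(chunks):
    # placeholder for an LLM summarizer: concatenate and truncate
    return " ".join(chunks)[:80]

def cluster(chunks, size=2):
    # placeholder for soft clustering: fixed-size adjacent groups
    it = iter(chunks)
    return [g for g in iter(lambda: list(islice(it, size)), [])]

def build_tree(chunks):
    # level 0 is the raw chunks; each higher level summarizes clusters
    # of the level below, until a single root summary remains
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        levels.append([summarize(g) for g in cluster(levels[-1])])
    return levels  # retrieval can then search across all levels at once
```

For four chunks this yields three levels: the chunks, two cluster summaries, and one root summary.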
http://arxiv.org/abs/2401.18058v1
Compressor summary: LongAlign is a method for improving large language models' performance on long context tasks by fine-tuning them with instruction data, using packing, sorted batching, and loss weighting strategies, and introducing a new benchmark called LongBench-Chat.
http://arxiv.org/abs/2401.18057v1
Compressor summary: Rank Supervised Contrastive Learning is a new technique for time series classification that uses targeted data augmentation, selective filtering, and a novel rank loss to capture fine-grained similarity information and achieve state-of-the-art performance.
http://arxiv.org/abs/2401.18054v1
Compressor summary: The paper proposes a new benchmark for continual graph learning with spatio-temporal graphs, studies the impact of learning order and GNN architecture on performance, and reveals novel insights on class-order and task-order sensitivity.
http://arxiv.org/abs/2401.18047v1
Compressor summary: The paper proposes a hybrid model using epidemic modeling, particle swarm optimization, and deep learning to better predict multiple waves of an epidemic, and shows its effectiveness in forecasting COVID-19 cases for the USA, India, and the UK.
http://arxiv.org/abs/2401.18046v1
Compressor summary: The study examines how humans process syntactic ambiguities in sentences by comparing two hypotheses using fMRI data and finding evidence for multipath parsing in brain regions like the superior temporal gyrus.
http://arxiv.org/abs/2401.18045v1
Compressor summary: SpeechComposer is a novel speech language model that enhances performance in multiple speech tasks by composing prompt tokens, enabling knowledge sharing among tasks.
http://arxiv.org/abs/2401.18040v1
Compressor summary: The study investigates intrinsic motivation reinforcement learning algorithms for dialogue systems, achieving improved performance and domain resilience by using random network distillation and curiosity-driven exploration.
http://arxiv.org/abs/2401.18035v1
Compressor summary: The authors develop and test a self-supervised deep learning model that automatically detects cortical folding patterns from MRI scans of large cohorts, and demonstrate its ability to identify a pattern associated with schizophrenia in the cingulate region.
http://arxiv.org/abs/2401.18034v1
Compressor summary: Gyan AI Paramanu is a family of novel Indic language models pretrained on a single GPU for 10 Indian languages, outperforming large LLMs while being much smaller and more efficient.
http://arxiv.org/abs/2401.18032v1
Compressor summary: The DROP method decouples and combines features for occluded person re-identification and human parsing, improving performance over existing approaches.
http://arxiv.org/abs/2401.18028v1
Compressor summary: The paper explores using large language models (LLMs) to generate and compare categories of negative AI impacts from news articles, finding that fine-tuned models perform better than instruction-based ones in reflecting the taxonomy of impacts.
http://arxiv.org/abs/2401.18018v1
Compressor summary: The study explores how safety prompts affect language models' representations and proposes a method called DRO to optimize them for better LLM safety.
http://arxiv.org/abs/2401.18001v1
Compressor summary: The paper surveys problems with QA systems, proposes desiderata for evaluating them, and identifies novel trends across multiple issues.
http://arxiv.org/abs/2401.17992v1
Compressor summary: The paper introduces MONet, a new model for image recognition that uses only multilinear operators and outperforms previous polynomial networks, achieving similar results to modern neural network architectures.
http://arxiv.org/abs/2401.17985v1
Compressor summary: This study uses deep learning models and remotely sensed imagery to map individual juniper shrubs in Sierra Nevada, Spain, and develops a new evaluation metric for complex growth patterns.
http://arxiv.org/abs/2401.17981v1
Compressor summary: The paper studies how to improve MLLMs' image understanding by infusing detection information and evaluates different models for this purpose, achieving significant improvements in multimodal dialogue.
http://arxiv.org/abs/2401.17979v1
Compressor summary: The paper explores entity linking for skill mentions in the job market domain using neural models and the ESCO taxonomy.
http://arxiv.org/abs/2401.17975v1
Compressor summary: The paper proposes a novel method to interpret neural networks using neuroscience and information theory tools, which can reveal the level of redundancy, smoothness, and differentiability of network codes, and explains how these properties affect learning performance and polysemantic neurons.
http://arxiv.org/abs/2401.17974v1
Compressor summary: The paper introduces GUMsley, a dataset for evaluating entity salience in different text genres, and shows that using salient entities improves summarization quality.
http://arxiv.org/abs/2401.17972v1
Compressor summary: The study introduces MelNet, a novel deep learning algorithm for object detection, and compares its performance with other models using the KITTI dataset.
http://arxiv.org/abs/2401.17948v1
Compressor summary: The paper proposes a novel Terminator architecture that uses coordinate-based implicit MLPs to generate hyper-kernels for enhancing feature extraction in self-attention mechanisms, achieving faster training convergence and better performance on image classification tasks.
http://arxiv.org/abs/2401.17922v1
Compressor summary: Coreference annotation and resolution is essential for computational literary studies, but challenging for fiction due to structured outputs, inferences, and varied language; seq2seq systems can address these issues by learning to generate marked-up copies of sentences.
http://arxiv.org/abs/2401.17919v1
Compressor summary: LOCOST is a state-space model-based encoder-decoder architecture that generates long text summaries from long context inputs with low complexity and memory efficiency, outperforming sparse transformers on full-book summarization tasks.
http://arxiv.org/abs/2401.17916v1
Compressor summary: The paper proposes a source-free object detection method for remote sensing images that uses perturbation and alignment techniques to adapt to different domains without accessing the source data.
http://arxiv.org/abs/2401.17911v1
Compressor summary: The paper proposes a new spike-based text encoding method for natural language processing tasks on spiking neural networks, which shows better performance and energy efficiency compared to traditional deep learning models.
http://arxiv.org/abs/2401.17910v1
Compressor summary: ControlCap is a multimodal embedding architecture that uses linguistic guidance to produce dense captions for images and videos, achieving state-of-the-art results.
http://arxiv.org/abs/2401.17904v1
Compressor summary: Hi-SAM is a unified model that excels in hierarchical text segmentation using SAM and offers automatic and promptable mask generation modes.
http://arxiv.org/abs/2401.17897v1
Compressor summary: Legal text entailment using ChatGPT can be improved by consolidating its provisional answers with label models, achieving a state-of-the-art accuracy of 76.15%.
http://arxiv.org/abs/2401.17895v1
Compressor summary: RAM3D is a new method that lets you replace specific objects in a 3D scene using text prompts and multi-view images, while keeping the scene consistent and realistic.
http://arxiv.org/abs/2401.17883v1
Compressor summary: The paper reviews recent advancements in video inpainting, evaluates them on visual quality and computational efficiency using human annotators and standardized hardware, and suggests future directions.
http://arxiv.org/abs/2401.17882v1
Compressor summary: The paper introduces the concept of awareness in large language models and proposes a way to measure it using a new dataset.
http://arxiv.org/abs/2401.17881v1
Compressor summary: PVLR is a framework that uses dual prompting strategies to leverage language models for multi-label image recognition, improving performance over previous methods.
http://arxiv.org/abs/2401.17879v1
Compressor summary: AEROBLADE detects deepfake images by measuring the reconstruction error of an autoencoder, without needing training or special tools.
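The reconstruction-error test behind AEROBLADE can be illustrated with a toy stand-in: an image that an autoencoder reconstructs almost perfectly is suspiciously close to the generator's manifold. The block-average "autoencoder" and the threshold below are illustrative assumptions, not the paper's latent-diffusion autoencoder or calibrated decision rule.

```python
# Toy sketch of reconstruction-error-based detection. The real method
# uses the autoencoder of a latent diffusion model; here a block-average
# downsample/upsample stands in for it.
import numpy as np

def reconstruction_error(img, factor=2):
    # "encode" by block-averaging, "decode" by nearest-neighbor upsampling
    h, w = img.shape
    small = img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    recon = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return float(np.mean((img - recon) ** 2))

def looks_generated(img, threshold=0.01):
    # low reconstruction error suggests the image lies close to the
    # autoencoder's manifold, i.e. it may have been generated by it
    return reconstruction_error(img) < threshold
```

A smooth (easily reconstructed) image scores far lower than one full of high-frequency noise, which is the signal the detector thresholds on.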
http://arxiv.org/abs/2401.17874v1
Compressor summary: Blender-hoisynth is an interactive tool that generates realistic 3D hand-object interaction data for training supervised learning models, enabling users to annotate and control the data with virtual reality hands.
http://arxiv.org/abs/2401.17870v1
Compressor summary: The authors propose a new machine learning model that uses teleconnection information to improve subseasonal forecasting and reduce carbon emissions compared to current methods.
http://arxiv.org/abs/2401.17868v1
Compressor summary: Conv-LoRA is a method that improves image segmentation by integrating lightweight convolutional parameters into the Segment Anything Model, enhancing its performance across various domains.
http://arxiv.org/abs/2401.17865v1
Compressor summary: The paper proposes a machine teaching method that manipulates a student model's predictions in the discrete domain by optimizing the training data with combinatorial optimization and an iterative search algorithm, which can be used for error correction or malicious manipulation and outperforms conventional baselines.
http://arxiv.org/abs/2401.17862v1
Compressor summary: The text introduces Proximity QA, a framework to improve MLLMs' depth perception and proximity analysis of objects in images using visual instruction tuning and a new dataset called Proximity-110K.
http://arxiv.org/abs/2401.17857v1
Compressor summary: SA-GS is a novel interactive method for object segmentation in 3D Gaussian Splatting, enabling applications like VR, AR, and movie production.
http://arxiv.org/abs/2401.17858v1
Compressor summary: The project proposal investigates how Large Language Models can understand and generate gestures in different contexts by using established psycholinguistic study designs and evaluating their ability to simulate human behaviour.
http://arxiv.org/abs/2401.17851v1
Compressor summary: IGTR is a novel method that uses instruction learning to improve scene text recognition by understanding character attributes and handling various recognition tasks.
http://arxiv.org/abs/2401.17839v1
Compressor summary: The study evaluates GPT models' accuracy, stability, and biases, using a balanced dataset called 'Global-Liar,' and finds that GPT-4 has regional and temporal biases, while configuration settings affect factuality.
http://arxiv.org/abs/2401.17838v1
Compressor summary: The paper proposes a novel framework, CHGH, that uses graph learning to predict skill demand and supply variations in the labor market, considering their complex interconnections.
http://arxiv.org/abs/2401.17835v1
Compressor summary: The Parsimonious Latent Space Model (PLSM) simplifies the dynamics of high-dimensional world models to improve their generalization and performance in various tasks.
http://arxiv.org/abs/2401.17828v1
Compressor summary: SWTformer is a method that uses Swin Transformer to generate class probabilities and CAMs from image-level labels, improving semantic segmentation accuracy by combining local and global views.
http://arxiv.org/abs/2401.17827v1
Compressor summary: The study examines four methods to create Malayalam paraphrases and compares automated metrics and human evaluation, finding discrepancies that highlight the need for more nuanced evaluation for agglutinative languages.
http://arxiv.org/abs/2401.17824v1
Compressor summary: This paper reviews scientific language models (SciLMs) and compares their performance across various applications and data sets, highlighting the need for further research in this growing field.
http://arxiv.org/abs/2401.17823v1
Compressor summary: PrivPGD is a new method for generating private data synthesis from tabular datasets, using optimal transport and particle gradient descent, which improves upon existing methods and can handle domain-specific constraints.
http://arxiv.org/abs/2401.17821v1
Compressor summary: The study examines how bounding box accuracy affects human trust and performance in object detection tasks and suggests using F1 score optimization and center dots for better results.
http://arxiv.org/abs/2401.17809v1
Compressor summary: The text proposes an expandable framework for modifying subject word embeddings to edit knowledge in LLMs without damaging them or increasing inference overhead, and shows its performance on various datasets.
http://arxiv.org/abs/2401.17807v1
Compressor summary: The text provides an overview of 3D content generation methods, including representation, algorithms, datasets, and applications, to help readers understand the current state and future directions of the field.
http://arxiv.org/abs/2401.17803v1
Compressor summary: The paper proposes SimAda, a simple framework to improve the generalization of SAM across various downstream tasks by adapting its general modules without dataset-specific designs.
http://arxiv.org/abs/2401.17802v1
Compressor summary: DE-TSMCL is a novel framework that leverages contrastive learning and data augmentation to improve long sequence time series forecasting by enhancing feature representations.
http://arxiv.org/abs/2401.17797v1
Compressor summary: M2-RAAP is a multi-modal recipe for improving zero-shot video-text retrieval by addressing data quality, input type, temporal modeling, and feature enhancement issues.
http://arxiv.org/abs/2401.17791v1
Compressor summary: Eigenformer introduces a new attention mechanism for graph representation learning that utilizes the Laplacian spectrum and achieves comparable or better performance than existing methods while being faster to train.
http://arxiv.org/abs/2401.17790v1
Compressor summary: This paper proposes a method to speed up model soups by using ensemble logits instead of subset selection, and shows its effectiveness in various settings.
http://arxiv.org/abs/2401.17789v1
Compressor summary: SGA+ is a refinement method for the latent representations of variational-autoencoder-based neural image compression models that reduces compression error on the Tecnick dataset, lowers sensitivity to hyperparameters, and extends to three-class rounding.
http://arxiv.org/abs/2401.17783v1
Compressor summary: SDRDPy is an easy-to-use app that helps experts analyze and report on rules discovered by supervised data mining algorithms.
http://arxiv.org/abs/2401.17780v1
Compressor summary: The paper proposes a new primal-dual RL algorithm for online CMDPs with Uniform-PAC guarantees of convergence, sublinear regret, and polynomial sample complexity.
http://arxiv.org/abs/2401.17776v1
Compressor summary: Double InfoGAN is a GAN-based method for contrastive analysis that achieves better latent separation and image quality than existing VAE-based methods on various visual datasets, including medical images.
http://arxiv.org/abs/2401.17773v1
Compressor summary: The paper presents a framework that pre-trains cross-modal video representations from raw data using Shared Network Pre-training (SNP) and Significant Semantic Strengthening (S3) strategies, achieving state-of-the-art pixel-level video-text pre-training while balancing efficiency and performance.
http://arxiv.org/abs/2401.17766v1
Compressor summary: The paper reviews recent advances in fine-grained analysis for zero-shot learning, providing a taxonomy of methods, a benchmark of datasets and models, and discussing challenges and future directions.
http://arxiv.org/abs/2401.17759v1
Compressor summary: The text discusses a new three-level approach to using technology for rapid damage assessment of bridges and other critical infrastructure during wars and disasters, which can improve decision-making and resilience.
http://arxiv.org/abs/2401.17755v1
Compressor summary: CauESC is a novel framework that recognizes emotion causes of distress, understands verbal grooming strategies, and improves emotional support in conversations by modeling causal and interactive effects of emotions.
http://arxiv.org/abs/2401.17752v1
Compressor summary: The authors propose a method to improve the expressive power of GNNs by using exact isomorphism solver techniques and probabilistic sampling, achieving better graph representation learning with linear increase in runtime.
http://arxiv.org/abs/2401.17749v1
Compressor summary: The paper presents SwarmBrain, an embodied StarCraft II agent that uses large language models, combining an Overmind Intelligence Matrix for macro-level strategy with a Swarm ReflexNet for rapid tactical responses, and achieves victory against Computer players at different difficulty levels.
http://arxiv.org/abs/2401.17743v1
Compressor summary: The paper proposes an algorithmic framework for robust forecast aggregation that finds the best aggregator for different information structures, and shows its effectiveness in numerical experiments.
http://arxiv.org/abs/2401.17736v1
Compressor summary: Multilabelfy is a framework that combines human and machine intelligence to validate and enhance dataset quality for multi-label classification tasks.
http://arxiv.org/abs/2401.17728v1
Compressor summary: COMET is a novel online test-time adaptation method that adapts pre-trained models to new classes without source data, using contrastive and entropy losses within a mean teacher framework.
http://arxiv.org/abs/2401.17716v1
Compressor summary: DECC is a framework that uses large language models to extract emotion-cause pairs from text by guiding them with chain-of-thought and enhancing with in-context learning.
http://arxiv.org/abs/2401.17714v1
Compressor summary: The study developed a simple and inexpensive method to monitor insects in 3D using computer vision techniques, enabling better understanding of their behavior and ecology.
http://arxiv.org/abs/2401.17710v1
Compressor summary: The paper presents a novel method for quantifying and predicting aesthetic preferences in interior design using fuzzy logic and image processing techniques.
http://arxiv.org/abs/2401.17705v1
Compressor summary: The study shows that machine learning algorithms can accurately predict suicidal behavior in young Indians based on childhood trauma and mental health data, suggesting potential for early intervention.
http://arxiv.org/abs/2401.17703v1
Compressor summary: The paper introduces Tree-of-Experts, a method to improve generating Winograd Schema Challenge questions, and WSC+, a new dataset with more categories and insights into LLMs' overconfidence and bias.
http://arxiv.org/abs/2401.17699v1
Compressor summary: The authors propose a unified dataset and a vision-language model to detect both physical and digital attacks on face recognition systems in a single framework, improving detection performance.
http://arxiv.org/abs/2401.17695v1
Compressor summary: The paper explores using unsupervised clustering methods to analyze large data cubes from physics experiments, such as X-ray fluorescence on artworks and simulated astrophysical observations.
http://arxiv.org/abs/2401.17692v1
Compressor summary: The authors propose a technique to mitigate the strong-priors problem in language models by generating weakened versions of instructions and extrapolating continuations from them, yielding improvements on eleven models across four tasks.
http://arxiv.org/abs/2401.17686v1
Compressor summary: The paper proposes Deductive Beam Search, which improves Large Language Models' reasoning capabilities by integrating chain-of-thought and deductive reasoning with step-wise beam search and verification.
http://arxiv.org/abs/2401.17671v1
Compressor summary: This paper investigates how high-performance large language models (LLMs) resemble the brain's language processing mechanisms and suggests that contextual information is crucial for improving both model performance and brain similarity.
http://arxiv.org/abs/2401.17664v1
Compressor summary: ImgAny is a novel multi-modal generative model that can create high-quality images from any combination of seven input modalities, mimicking human reasoning and perception.
http://arxiv.org/abs/2401.17658v1
Compressor summary: The text discusses how long-document Transformers acquire and use document structure during pre-training and inference, and evaluates the effects of structure infusion on two challenging NLP tasks.
http://arxiv.org/abs/2401.17657v1
Compressor summary: The paper proposes an energy-based model for bridge-type innovation that trains a neural network energy function on a structured image dataset of four bridge types, treats the bridge-type population as a Boltzmann distribution, and uses Langevin dynamics to generate new bridge types from low-energy regions of the latent space.
http://arxiv.org/abs/2401.17654v1
Compressor summary: DCTAU is a novel open-set recognition framework that models potential open space by expanding unknown classes near targeted known classes and uses a dual contrastive loss to effectively alleviate distribution disruption and imbalance issues.
http://arxiv.org/abs/2401.17653v1
Compressor summary: The text discusses advances in creating realistic synthetic health datasets to preserve characteristics and enable safe data sharing without revealing patient identity, while addressing challenges, evaluation methods, deployment examples, regulation, ethics, access, governance, and future opportunities.
http://arxiv.org/abs/2401.17642v1
Compressor summary: The paper proposes a method to improve nighttime optical flow by using a common-latent space to align features between daytime and nighttime images.
http://arxiv.org/abs/2401.17633v1
Compressor summary: The paper explores why large language models may refuse harmless queries and proposes Self-Contrastive Decoding, a technique to reduce this problem without retraining the model.
http://arxiv.org/abs/2401.17632v1
Compressor summary: This paper investigates how speech self-supervised learning (SSL) and speaker self-supervised learning (SSSL) represent speech properties and speaker information, revealing differences in their capacities and layer usage.
http://arxiv.org/abs/2401.17629v1
Compressor summary: SaFaRI is a diffusion model for image restoration that preserves data-fidelity in spatial and frequency domains, achieving state-of-the-art performance on various noisy inverse problems.
http://arxiv.org/abs/2401.17623v1
Compressor summary: This paper explores how appending new knowledge to large language models affects their existing knowledge and introduces a framework to minimize this impact.
http://arxiv.org/abs/2401.17615v1
Compressor summary: GraphMSL is a novel molecular representation learning framework that captures self-similarity and relative similarities using multimodal continuous similarity metrics, improving effectiveness in predicting molecular properties and enabling drug discovery.
http://arxiv.org/abs/2401.17612v1
Compressor summary: The paper introduces Integrative Graph Convolutional Networks (IGCN), a novel neural network approach for multi-modal data networks, which learns node embeddings from multiple topologies and fuses them using attention to improve model interpretability and performance on various node classification tasks.
http://arxiv.org/abs/2401.17609v1
Compressor summary: LaneGraph2Seq is a novel method for extracting lane graphs from images using a language model with vertex-edge encoding and connectivity enhancement, achieving better results than existing approaches.
http://arxiv.org/abs/2401.17604v1
Compressor summary: The Economical Cued Speech Fusion Transformer (EcoCued) is a new method that uses a novel Token-Importance-Aware Attention mechanism to improve automatic Cued Speech recognition by efficiently capturing cross-modal relationships between lip reading and hand cueing.
http://arxiv.org/abs/2401.17603v1
Compressor summary: The paper presents a generative model that combines latent diffusion with persistent homology to create diverse 3D shapes with controlled topological characteristics, embedding implicit shape representations into latent vectors and supporting various input modalities and topology modifications.
http://arxiv.org/abs/2401.17602v1
Compressor summary: The study proposes a novel method using Large Language Models and advanced reasoning techniques for assertion detection in clinical NLP, improving the understanding of medical conditions from unstructured texts.
http://arxiv.org/abs/2401.17600v1
Compressor summary: The paper introduces a benchmark to evaluate how well vision-language models perform on tasks involving Earth observation data, and finds that they excel at open-ended tasks but struggle with spatial reasoning.
http://arxiv.org/abs/2401.17597v1
Compressor summary: The paper proposes a method for summarizing long dialogues by using speaker information and pre-training on diverse datasets, achieving state-of-the-art performance.
http://arxiv.org/abs/2401.17592v1
Compressor summary: Local feature matching methods are categorized into detector-based and detector-free techniques, which use deep learning models to improve accuracy and robustness in computer vision applications like image retrieval, 3D reconstruction, and object recognition.
http://arxiv.org/abs/2401.17588v1
Compressor summary: The paper proposes a local and global conversation model (LGCM) that uses both local and global contexts to generate accurate responses in open domain conversations.
http://arxiv.org/abs/2401.17585v1
Compressor summary: The paper introduces ReCoE, a dataset to analyze challenges in updating interconnected facts for accurate reasoning, and finds existing knowledge editing methods perform poorly on it.
http://arxiv.org/abs/2401.17580v1
Compressor summary: CTAug is a framework that improves graph contrastive learning by preserving cohesion properties and enhancing the encoder's ability to discern subgraph patterns.
http://arxiv.org/abs/2401.17574v1
Compressor summary: The paper proposes replacing attention heads with the Hyena mechanism in transformer models for more efficient pre-training of large language models.
http://arxiv.org/abs/2401.17548v1
Compressor summary: LIFT is a new method for multivariate time series forecasting that leverages local lead-lag relationships between variates to improve accuracy by 5.5%.
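The lead-lag relationship LIFT exploits can be illustrated with a simple lagged-correlation scan: for each candidate variate, find the shift at which it best predicts the target. This is a hedged sketch of the general idea only; the paper's actual method for estimating and exploiting leading indicators is more involved.

```python
# Toy sketch of lead-lag estimation between two variates of a
# multivariate series: scan lags 1..max_lag and return the lag at
# which `candidate` (shifted back) best correlates with `target`.
import numpy as np

def best_lead(target, candidate, max_lag=5):
    best_lag, best_r = 0, -np.inf
    for lag in range(1, max_lag + 1):
        # candidate[t] is compared against target[t + lag]
        r = np.corrcoef(target[lag:], candidate[:-lag])[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

For a sine wave and a copy of it shifted three steps ahead, the scan recovers a lead of 3 with near-perfect correlation.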
http://arxiv.org/abs/2401.17547v1
Compressor summary: The paper presents a task-oriented compression method for computationally expensive I2I models that reduces both model size and the number of timesteps, improving efficiency for image editing and restoration while maintaining near-optimal results.
http://arxiv.org/abs/2401.17544v1
Compressor summary: QFX is a trainable fixed-point quantization method that learns binary-point positions and minimizes DSP usage, achieving higher accuracy on FPGA deployment of deep learning models compared to post-training quantization.
http://arxiv.org/abs/2401.17542v1
Compressor summary: The paper introduces a benchmark for medical data-effective learning, which aims to use data efficiently and effectively to train AI models in healthcare.
http://arxiv.org/abs/2401.17541v1
Compressor summary: The study investigates approximate IRM techniques for robustness and finds Information Bottleneck-based IRM improves ECE while preserving accuracy without overfitting.
http://arxiv.org/abs/2401.17539v1
Compressor summary: Ensembles improve score-based sampling methods by using particle dynamics to approximate reverse diffusion drifts for modeling complex probability distributions without gradients.
http://arxiv.org/abs/2401.17536v1
Compressor summary: The paper proposes a method to improve question answering efficiency by finding semantically related entity nodes in knowledge graphs and pruning noisy ones using dependency distance and graph attention network.
http://arxiv.org/abs/2401.17527v1
Compressor summary: The paper proposes a novel reinforcement learning method (HYGRO) to learn optimal stopping strategies for cuts generation in mixed-integer linear programs, improving their solving efficiency.
http://arxiv.org/abs/2401.17523v1
Compressor summary: The paper proposes a novel game-theoretic approach for unlearnable example attacks on deep neural networks, called Game Unlearnable Example (GUE), which effectively degrades test accuracy by adding imperceptible perturbations to training data.
http://arxiv.org/abs/2401.17515v1
Compressor summary: The authors propose a two-stage approach to learn image grammar, which represents the semantics and order of parts in an image, to help image classifiers detect corruptions involving missing or disarrayed objects.
http://arxiv.org/abs/2401.17514v1
Compressor summary: FEUDA is a frustratingly easy UDA method that uses two instruction-tuning tasks and masked language modeling to adapt to different domains without labeled data from the target domain.
http://arxiv.org/abs/2401.17511v1
Compressor summary: The paper discusses the difficulties of explaining AI models for healthcare risks in natural language and proposes a solution for predicting IVF outcomes.