Ward: News

VisionDirector: Vision-Language Guided Closed-Loop Refinement for Generative Image Synthesis

arXiv:2512.19243v1 Announce Type: new
Abstract: Generative models can now produce photorealistic imagery, yet they still struggle with the long, multi-goal prompts that professional designers issue. To expose this gap and better evaluate models' performance in real-world settings, we introduce Long…

Socratic Students: Teaching Language Models to Learn by Asking Questions

arXiv:2512.13102v2 Announce Type: replace
Abstract: Large Language Models (LLMs) excel at static interactions, where they answer user queries by retrieving knowledge encoded in their parameters. However, in many real-world settings, such as educational tutoring or medical assistance, relevant infor…

Fraud Detection Through Large-Scale Graph Clustering with Heterogeneous Link Transformation

arXiv:2512.19061v1 Announce Type: new
Abstract: Collaborative fraud, where multiple fraudulent accounts coordinate to exploit online payment systems, poses significant challenges due to the formation of complex network structures. Traditional detection methods that rely solely on high-confidence id…

TRACE: Your Diffusion Model is Secretly an Instance Edge Detector

arXiv:2503.07982v3 Announce Type: replace
Abstract: High-quality instance and panoptic segmentation has traditionally relied on dense instance-level annotations such as masks, boxes, or points, which are costly, inconsistent, and difficult to scale. Unsupervised and weakly-supervised approaches red…

Transforming Data Management In EDA: Preparing For The AI Era

Ensuring data moves smoothly across multiple disciplines, tools, and globally distributed teams.
The post Transforming Data Management In EDA: Preparing For The AI Era appeared first on Semiconductor Engineering.

DeltaMIL: Gated Memory Integration for Efficient and Discriminative Whole Slide Image Analysis

arXiv:2512.19331v1 Announce Type: new
Abstract: Whole Slide Images (WSIs) are typically analyzed using multiple instance learning (MIL) methods. However, the scale and heterogeneity of WSIs generate highly redundant and dispersed information, making it difficult to identify and integrate discrimina…

Low-Rank Expert Merging for Multi-Source Domain Adaptation in Person Re-Identification

arXiv:2508.06831v2 Announce Type: replace
Abstract: Adapting person re-identification (reID) models to new target environments remains a challenging problem that is typically addressed using unsupervised domain adaptation (UDA) methods. Recent works show that when labeled data originates from sever…

Self-Consistent Probability Flow for High-Dimensional Fokker-Planck Equations

arXiv:2512.19196v1 Announce Type: cross
Abstract: Solving high-dimensional Fokker-Planck (FP) equations is a challenge in computational physics and stochastic dynamics, due to the curse of dimensionality (CoD) and the bottleneck of evaluating second-order diffusion terms. Existing deep learning app…

Explainable Graph Spectral Clustering For GloVe-like Text Embeddings

arXiv:2508.14075v2 Announce Type: replace
Abstract: In a previous paper, we proposed an introduction to the explainability of Graph Spectral Clustering results for textual documents, given that document similarity is computed as cosine similarity in term vector space. In this paper, we generalize…

Trajectory Planning for UAV-Based Smart Farming Using Imitation-Based Triple Deep Q-Learning

arXiv:2512.18604v1 Announce Type: new
Abstract: Unmanned aerial vehicles (UAVs) have emerged as a promising auxiliary platform for smart agriculture, capable of simultaneously performing weed detection, recognition, and data collection from wireless sensors. However, trajectory planning for UAV-bas…

FPBench: A Comprehensive Benchmark of Multimodal Large Language Models for Fingerprint Analysis

arXiv:2512.18073v1 Announce Type: new
Abstract: Multimodal LLMs (MLLMs) have gained significant traction in complex data analysis, visual question answering, generation, and reasoning. Recently, they have been used for analyzing the biometric utility of iris and face images. However, their capabili…

No Pose at All: Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views

arXiv:2508.01171v2 Announce Type: replace
Abstract: We introduce SPFSplat, an efficient framework for 3D Gaussian splatting from sparse multi-view images, requiring no ground-truth poses during training or inference. It employs a shared feature extraction backbone, enabling simultaneous prediction …

Overcoming Spectral Bias via Cross-Attention

arXiv:2512.18586v1 Announce Type: new
Abstract: Spectral bias implies an imbalance in training dynamics, whereby high-frequency components may converge substantially more slowly than low-frequency ones. To alleviate this issue, we propose a cross-attention-based architecture that adaptively reweigh…

Neologism Learning as a Parameter-Efficient Alternative to Fine-Tuning for Model Steering

arXiv:2512.18551v1 Announce Type: new
Abstract: In language modeling, neologisms are new tokens trained to represent a concept not already included in a given model's vocabulary. Neologisms can be used to encourage specific behavior in models, for example by appending prompts with "Give me a neolog…

SurgiPose: Estimating Surgical Tool Kinematics from Monocular Video for Surgical Robot Learning

arXiv:2512.18068v1 Announce Type: new
Abstract: Imitation learning (IL) has shown immense promise in enabling autonomous dexterous manipulation, including learning surgical tasks. To fully unlock the potential of IL for surgery, access to clinical datasets is needed, which unfortunately lack the ki…

Confidence Calibration in Vision-Language-Action Models

arXiv:2507.17383v2 Announce Type: replace
Abstract: Trustworthy robot behavior requires not only high levels of task success but also that the robot can reliably quantify how likely it is to succeed. To this end, we present a first-of-its-kind study of confidence calibration in vision-language-acti…

LIR$^3$AG: A Lightweight Rerank Reasoning Strategy Framework for Retrieval-Augmented Generation

arXiv:2512.18329v1 Announce Type: new
Abstract: Retrieval-Augmented Generation (RAG) effectively enhances Large Language Models (LLMs) by incorporating retrieved external knowledge into the generation process. Reasoning models improve LLM performance in multi-hop QA tasks, which require integrating…

Sharpness-Controlled Group Relative Policy Optimization with Token-Level Probability Shaping

arXiv:2511.00066v2 Announce Type: replace
Abstract: Reinforcement learning with verifiable rewards (RLVR) has become a practical route to improve large language model reasoning, and Group Relative Policy Optimization (GRPO) is a widely used optimizer in this setting. This paper revisits GRPO from a…

Exploring Zero-Shot ACSA with Unified Meaning Representation in Chain-of-Thought Prompting

arXiv:2512.19651v1 Announce Type: new
Abstract: Aspect-Category Sentiment Analysis (ACSA) provides granular insights by identifying specific themes within reviews and their associated sentiment. While supervised learning approaches dominate this field, the scarcity and high cost of annotated data f…

How I learned to stop worrying and love AI slop

Lately, everywhere I scroll, I keep seeing the same fish-eyed CCTV view: a grainy wide shot from the corner of a living room, a driveway at night, an empty grocery store. Then something impossible happens. JD Vance shows up at the doorstep in a crazy outfit. A car folds into itself like paper and dr…