Mobrief

The Brief

Product 14h ago

Quantization-Aware Training in TorchAO (II)

In a previous Quantization-Aware Training (QAT) post, the PyTorch Blog introduced the initial QAT flow in TorchAO for large language models targeting edge devices with ExecuTorch.

Since then, the flow has been extended to also target fast CUDA kernels, like the ones in MSLK, for fast inference in vLLM, and incorporated into popular fine-tuning frameworks like Unsloth and Axolotl.…

Why it matters

The post also explores more advanced QAT techniques like PARQ for lower-bit quantization (prototype). Unsloth integration: Recover…

PyTorch Blog
Business 10h ago

Nvidia Takes on Telco Industry With Open Source Model

While Nvidia's approach focuses on enabling more autonomous workflows for telco companies, it faces competition from traditional network vendors such as Ericsson and Nokia.

AI Business
Business yesterday

Amazon Spends Another $21B to Beef up Spain's AI Infrastructure

The latest round of funding signifies another escalation in Amazon's commitment to the country.

AI Business
Product 13h ago

LangSmith CLI & Skills

LangChain is releasing a CLI along with its first set of skills to give AI coding agents expertise in the LangSmith ecosystem.

This includes adding tracing to agents, understanding their execution, building test sets, and evaluating performance. On LangChain's eval set, this bumps Claude Code’s performance on these tasks from 17% to 92%.…

LangChain Blog
Research 12h ago

Google faces wrongful death suit after Gemini allegedly convinced a man to die and become digital

According to a lawsuit filed in a US federal court in Northern California on Wednesday, Google's chatbot Gemini allegedly drove 36-year-old Jonathan Gavalas from Florida to suicide.


The Decoder
Business 10h ago

Black Forest Labs' new Self-Flow technique makes training multimodal AI models 2.8x more efficient

To create coherent images or videos, generative AI diffusion models like Stable Diffusion or FLUX have typically relied on external "teachers"—frozen encoders like CLIP or DINOv2—to provide the semantic understanding…

But this reliance has come at a cost: a "bottleneck" where scaling up the model no longer yields better results because the external teacher has hit its limit. Today, German AI startup Black Forest Labs (maker of the…

VentureBeat
ALL STORIES

60 stories from 88 sources

Research just now

Chatbot Arena Elo Rankings: Top 20 Models

Chatbot Arena Elo rankings for the top 20 models. Compare and track AI model performance.

LMArena Elo Rankings
Labs 1h ago

User may experience errors in ChatGPT

Status: Resolved. All impacted services have now fully recovered.

Affected components: Conversations (Operational)

OpenAI Status
Product 2h ago

Revert "[BE] Apply up007 and up045 to .ci through tools"

This reverts commit f1da356.

Reverted #176458 on behalf of https://github.com/pytorch-auto-revert. Reason: reverted automatically by PyTorch's autorevert. To avoid this behaviour, add the tag autorevert: disable (comment).

PyTorch Releases
Product 4h ago

[user-streams] Add stream support to inductor wrapper codegen

PyTorch Releases update to user-streams: Add stream support to inductor wrapper codegen.

PyTorch Releases
Product 5h ago

[MPS] Fix masked_scatter to preserve scalar tensor shape

PyTorch Releases update to MPS: Fix masked_scatter to preserve scalar tensor shape.

PyTorch Releases
Labs 5h ago

API Error Rates

Status: Resolved. All impacted services have now fully recovered.

Affected components: Realtime (Operational), Files (Operational), Embeddings (Operational), Responses (Operational), Codex (Operational), Images (Operational), Batch (Operational), Chat Completions (Operational), Audio…

OpenAI Status
Press 8h ago

Grammarly Is Offering ‘Expert’ AI Reviews From Your Favorite Authors—Dead or Alive

The tool, offered by the recently-rebranded company Superhuman, gives feedback based on the work of famous dead and living writers—without their permission.

Wired AI
Press 9h ago

What AI Models for War Actually Look Like

While companies like Anthropic debate limits on military uses of AI, Smack Technologies is training models to plan battlefield operations.

Wired AI
Product 9h ago

Embed Amazon Quick Suite chat agents in enterprise applications

AWS Machine Learning: Embed Amazon Quick Suite chat agents in enterprise applications.

AWS Machine Learning
Product 9h ago

Unlock powerful call center analytics with Amazon Nova foundation models

Call center analytics play a crucial role in improving customer experience and operational efficiency.

With foundation models (FMs), you can improve the quality and efficiency of call center operations and analytics. Organizations can use generative AI to assist human customer support agents and managers of contact…

Why it matters

Organizations can use generative AI to assist human customer support agents and managers of contact center teams, so they can gain…

AWS Machine Learning
Product 10h ago

How Ricoh built a scalable intelligent document processing solution on AWS

This post is cowritten by Jeremy Jacobson and Rado Fulek from Ricoh.

This post demonstrates how enterprises can overcome document processing scaling limits by combining generative AI, serverless architecture, and standardized frameworks. Ricoh engineered a repeatable, reusable…

Why it matters

Ricoh engineered a repeatable, reusable framework using the AWS GenAI Intelligent Document Processing (IDP) Accelerator.

AWS Machine Learning
Research 12h ago

OpenAI's Codex app lands on Windows after topping a million Mac downloads in its first week

OpenAI brings its AI coding tool Codex to Windows, with native support for Windows environments and over 1.6 million weekly active users.


The Decoder
Tech 12h ago

Google’s AI-powered workspace is now available to more users in Search

Google is bringing Canvas to everyone in the US using AI Mode in Search.

The feature opens up a dedicated workspace within its AI-powered search tool, allowing it to use the latest information from Search to organize plans, develop tools, and draft documents in a panel alongside your chat.…

Why it matters

Though Google initially launched Canvas inside […]

The Verge Tech
Labs 13h ago

Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

Microsoft Research: Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model.

Microsoft Research
Product 13h ago

v5.3.0: EuroBERT, VibeVoice ASR, TimesFM2.5, PP-DocLayoutV2, OlmoHybrid, ModernVBert, Higgs Audio V2

New model additions: EuroBERT. EuroBERT is a multilingual encoder model based on a refreshed transformer architecture, akin to Llama but with bidirectional attention.

It supports a mixture of European and widely spoken languages, with sequences of up to 8192 tokens. Add eurobert (#39455) by @ArthurZucker. VibeVoice ASR: VibeVoice…

HF Transformers Releases
Research 13h ago

Meta signs multi-year AI deal with News Corp worth up to $50 million a year

Meta is paying News Corp up to $50 million a year for AI training data.

Good for individual publishers, bad for the industry as a whole.

The Decoder
News 13h ago

5 Essential Security Patterns for Robust Agentic AI

Machine Learning Mastery: 5 Essential Security Patterns for Robust Agentic AI.

Machine Learning Mastery
Research 13h ago

GPT-5.4 reportedly brings a million-token context window and an extreme reasoning mode

GPT-5.4 is coming soon: double the context window of GPT-5.2, more reliable performance on long-running tasks, and a new "extreme" thinking mode.


The Decoder
Labs 14h ago

Elevated errors on Claude Haiku 4.5

Mar 4, 17:01 UTC: Resolved - Errors have returned to the baseline as of 8:08 PT / 16:08 UTC.

Mar 4, 16:13 UTC: Monitoring - A fix has been implemented and Anthropic is monitoring the results.

Mar 4, 15:58 UTC: Investigating - The earlier issues with Haiku 4.5 have reappeared.

Anthropic Status
Labs 14h ago

Tuning Flash Attention for Peak Performance in NVIDIA CUDA Tile

This post dives into one of the most critical workloads in modern AI, Flash Attention. You’ll learn how to implement Flash Attention using NVIDIA...

NVIDIA Developer
Labs 14h ago

Use Canvas in AI Mode to get things done and bring your ideas to life, right in Search.

Canvas in AI Mode is now available for everyone in the U.S.

Plus, it can now help you draft documents or build interactive tools.

Google AI Blog
Product 14h ago

Small models, high quality: Inside BMW Group’s experiments evaluating domain-specific language models

A car you can talk to has been a longstanding dream, whether as the basis for television shows or more recent smartphone integrations.

One way of achieving better, more natural voice commands is by incorporating AI foundation models into vehicle systems, which offer more intelligence than traditional voice commands. AI foundation models can connect…

Why it matters

AI foundation models can connect everyday questions with vehicle functions in a seamless dialogue.

Google Cloud AI Blog
Research 14h ago

Supreme Court AI copyright decision sounds sweeping but actually settles very little

AI inventor Stephen Thaler wanted the US Supreme Court to recognize a machine as the sole author of an image.

The court refused, but the ruling only covers this extreme case. It says nothing about whether people can claim copyright for work they create with AI tools.

The Decoder
Research 14h ago

US military uses Anthropic's Claude for AI-driven strike planning in Iran war

In the war against Iran, the US military is using generative AI at scale for target selection and strike planning for the first time.

Of all models, it's the one from the company Washington just banned.

The Decoder
Business 15h ago

OpenAI Says ChatGPT Instant 5.3 is Less Cringe, More Accurate

The AI model maker said it is responding to user criticisms.

AI Business
Research 16h ago

Do Your Customers Have Analysis Paralysis? Find Out

Key takeaways: Analysis paralysis in customers happens when you offer too many choices with not enough differentiation.

Instead, use integrated data to provide personalized recommendations that guide shoppers. Salesforce CRM can help you see where customers are struggling or dropping out of the customer journey, so you can fix it and…

Salesforce AI Research
Press 17h ago

Bridging the operational AI gap

The transformational potential of AI is already well established.

Enterprise use cases are building momentum and organizations are transitioning from pilot projects to AI in production. Companies are no longer just talking about AI; they are redirecting budgets and resources to make…

Why it matters

Many are already experimenting with agentic AI, which promises new levels of automation.

MIT Technology Review
Business 17h ago

Pentagon vendor cutoff exposes the AI dependency map most enterprises never built

The federal directive ordering all U.S. government agencies to cease using Anthropic technology comes with a six-month phaseout window. That timeline assumes agencies already know where Anthropic’s models sit inside their workflows. Most don’t today.

VentureBeat
News 17h ago

Escaping the Prototype Mirage: Why Enterprise AI Stalls

Too many prototypes, too few products

Towards Data Science
Press 17h ago

The Download: Earth’s rumblings, and AI for strikes on Iran

This is today’s edition of The Download, MIT Technology Review's weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Listen to Earth’s rumbling, secret soundtrack: The boom of a calving glacier. The crackling rumble of a wildfire. The roar of a surging storm front.

MIT Technology Review
Product 18h ago

MCP Apps support on Vercel

Teams can now build and deploy MCP Apps on Vercel with full support for Next.js. MCP Apps are similar to ChatGPT apps, but are a provider-agnostic open standard for embedded UIs.

They run inside iframes and communicate with any compatible host, such as ChatGPT, using a shared bridge. This architecture uses ui/* JSON-RPC over postMessage, enabling a single UI to function across any compatible…

Vercel Blog
News 19h ago

RAG with Hybrid Search: How Does Keyword Search Work?

Understanding keyword search, TF-IDF, and BM25

Towards Data Science
Research 19h ago

Meta creates new applied AI engineering division

Meta is building a new applied AI engineering organization, according to an internal memo obtained by the Wall Street Journal.


The Decoder
Research 20h ago

Anthropic nears $20 billion revenue run rate despite Pentagon feud

Anthropic is on track to generate nearly $20 billion in annual revenue based on current performance, according to Bloomberg.


The Decoder
Business 20h ago

Anthropic’s Skyrocketing Revenue, A Contract Compromise?, Nvidia Earnings

Anthropic's enterprise business is reaching escape velocity, which increases the importance of finding a compromise with the government.

Then, agents dramatically increase demand for Nvidia chips, even if they threaten software.

Stratechery
Research 20h ago

OpenAI is building a GitHub competitor that could challenge its biggest investor

OpenAI is building its own alternative to GitHub, Microsoft's widely used platform for code management and collaboration, according to The Information.


The Decoder
Labs 21h ago

Extending single-minus amplitudes to gravitons

A new preprint extends single-minus amplitudes to gravitons, with GPT-5.2 Pro helping derive and verify nonzero graviton tree amplitudes in quantum gravity.

OpenAI News
Business yesterday

Capgemini Joins OpenAI's Frontier Alliance to Scale Enterprise AI

The partners are looking to close the gap between AI experimentation and real-world enterprise deployment.

AI Business
Business yesterday

Did Alibaba just kneecap its powerful Qwen AI team? Key figures depart in wake of latest open source release

Alibaba's Qwen team of AI researchers has been among the most prolific and well regarded in the international machine learning community, shipping dozens of powerful generalized and specialized generative models…

But now, just 24 hours after shipping the open source Qwen3.5 small model series, a release that drew public praise from Elon Musk for its "impressive intelligence density," the project’s technical architect and…

VentureBeat
Research 12h ago

SimpliHuMoN: Simplifying Human Motion Prediction

Human motion prediction combines the tasks of trajectory forecasting and human pose prediction.

For each of the two tasks, specialized models have been developed. Combining these models for holistic human motion prediction is non-trivial, and recent methods have struggled to compete on established benchmarks for…

arXiv · Machine Learning
Research 12h ago

Accurate and Efficient Hybrid-Ensemble Atmospheric Data Assimilation in Latent Space with Uncertainty Quantification

Data assimilation (DA) combines model forecasts and observations to estimate the optimal state of the atmosphere with its uncertainty, providing initial conditions for weather prediction and reanalyses for climate…

Yet, existing traditional and machine-learning DA methods struggle to achieve accuracy, efficiency and uncertainty quantification simultaneously. Here, the authors propose HLOBA (Hybrid-Ensemble Latent…

Why it matters

Here, the authors propose HLOBA (Hybrid-Ensemble Latent Observation-Background Assimilation), a three-dimensional hybrid-ensemble DA…

arXiv · Machine Learning
Research 12h ago

SELDON: Supernova Explosions Learned by Deep ODE Networks

The discovery rate of optical transients will explode to 10 million public alerts per night once the Vera C. Rubin Observatory's Legacy Survey of Space and Time comes online, overwhelming the traditional physics-based inference pipelines. A continuous-time forecasting AI model is of interest because it can deliver…

Why it matters

A continuous-time forecasting AI model is of interest because it can deliver millisecond-scale inference for thousands of objects per…

arXiv · Machine Learning
Research 12h ago

A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development

WebGIS development requires rigor, yet agentic AI frequently fails due to five large language model (LLM) limitations: context constraints, cross-session forgetting, stochasticity, instruction failure, and adaptation…

The authors propose a dual-helix governance framework reframing these challenges as structural governance problems that model capacity alone cannot resolve. They implement the framework as a 3-track…

arXiv · Artificial Intelligence
Research 12h ago

ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training

Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and $π^3$ have a computational cost that scales quadratically with the number of input images, making…

Sequential-reconstruction approaches reduce this cost but sacrifice reconstruction quality. The authors introduce ZipMap, a stateful feed-forward model that achieves linear-time, bidirectional 3D reconstruction while…

arXiv · Artificial Intelligence
Research 12h ago

AgentIR: Reasoning-Aware Retrieval for Deep Research Agents

Deep Research agents are rapidly emerging as primary consumers of modern retrieval systems.

Unlike human users who issue and refine queries without documenting their intermediate thought processes, Deep Research agents generate explicit natural language reasoning before each search call, revealing rich…

Why it matters

To exploit this overlooked signal, the authors introduce: (1) Reasoning-Aware Retrieval, a retrieval paradigm that jointly embeds the…

arXiv · NLP & Language
Research 12h ago

Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy

YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparency and ethics, especially when creators fail to disclose their…

Although regulatory agencies like the US Federal Trade Commission (FTC) have issued guidelines to address these issues, non-compliance and consumer harm persist, and the extent of these problems remains unclear. In…

arXiv · Machine Learning
Research 12h ago

TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar species within the same genus or family.

The authors introduce TaxonRL, a reinforcement learning approach using Group Relative Policy Optimization with intermediate rewards that decomposes the reasoning process into hierarchical taxonomic predictions.…

Why it matters

The method incentivizes models to explicitly reason about species-level, genus-level, and family-level features before making…

arXiv · NLP & Language
Research 12h ago

Helios: Real Real-Time Long Video Generation Model

The authors introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline.

They report breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics such as self-forcing, error-banks, or keyframe sampling; (2) real-time…

Why it matters

Specifically, Helios is a 14B autoregressive diffusion model with a unified input representation that natively supports T2V, I2V, and V2V…

arXiv · Computer Vision
Research 12h ago

Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local…

Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. The authors introduce Adversarially-Aligned Jacobian…

Why it matters

The authors introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity…

arXiv · Artificial Intelligence
Research 12h ago

Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

Conversational agents are increasingly deployed in knowledge-intensive settings, where correct behavior depends on retrieving and applying domain-specific knowledge from large, proprietary, and unstructured corpora…

Yet most existing benchmarks evaluate retrieval or tool use independently of each other, creating a gap in realistic, fully agentic evaluation over unstructured data in long-horizon interactions. The authors…

arXiv · NLP & Language
Research 12h ago

Low-Resource Guidance for Controllable Latent Audio Diffusion

Generative audio requires fine-grained controllable outputs, yet most existing methods require model retraining on specific controls or inference-time controls (e.g., guidance) that can also be…

By examining the bottlenecks of existing guidance-based controls, in particular their high cost-per-step due to decoder backpropagation, the authors introduce a guidance-based approach through selective TFG and…

arXiv · Artificial Intelligence
Research 12h ago

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Multimodal web agents that process both screenshots and accessibility trees are increasingly deployed to interact with web interfaces, yet their dual-stream architecture opens an underexplored attack surface: an…

The authors' vulnerability analysis on MiniWob++ reveals that attacks including a visual component far outperform text-only injections, exposing critical gaps in text-centric VLM safety training. Motivated by this…

arXiv · NLP & Language
Research 12h ago

Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights

The Unscented Kalman Filter (UKF) is a ubiquitous tool for nonlinear state estimation; however, its performance is limited by the static parameterization of the Unscented Transform (UT).

Conventional weighting schemes, governed by fixed scaling parameters, assume implicit Gaussianity and fail to adapt to time-varying dynamics or heavy-tailed measurement noise. This work introduces the Meta-Adaptive…

Why it matters

This work introduces the Meta-Adaptive UKF (MA-UKF), a framework that reformulates sigma-point weight synthesis as a hyperparameter…

arXiv · Machine Learning
Research 12h ago

Dissecting Quantization Error: A Concentration-Alignment Perspective

Quantization can drastically increase the efficiency of large language and vision models, but typically incurs an accuracy drop.

Recently, function-preserving transforms (e.g. rotations, Hadamard transform, channel-wise scaling) have been successfully applied to reduce post-training quantization error, yet a principled explanation remains…

Why it matters

The authors analyze linear-layer quantization via the signal-to-quantization-noise ratio (SQNR), showing that for uniform integer…

arXiv · Artificial Intelligence