Mobrief

The Brief

Product 1h ago

60 million Copilot code reviews and counting

Since GitHub's initial launch of Copilot code review (CCR) last April, usage has grown 10X and now accounts for more than one in five code reviews on GitHub.

Behind the scenes, GitHub has been running continuous experiments to improve comment quality. It has also moved to an agentic architecture that retrieves repository context and reasons across changes. At…

GitHub Blog
Business 2h ago

OpenAI launches GPT-5.4 with native computer use mode, financial plugins for Microsoft Excel, Google Sheets

The AI updates aren't slowing down.

Literally two days after OpenAI launched a new underlying AI model for ChatGPT called GPT-5.3 Instant, the company has unveiled another, even more massive upgrade: GPT-5.4. Actually, GPT-5.4 comes in two varieties:…

Why it matters

Actually, GPT-5.4 comes in two varieties: GPT-5.4 Thinking and GPT-5.4 Pro, the latter designed for the most complex tasks.

VentureBeat
Labs 2h ago

NVIDIA Blackwell Sets STAC-AI Record for LLM Inference in Finance

Large language models (LLMs) are revolutionizing the financial trading landscape by enabling sophisticated analysis of vast amounts of unstructured data to...

NVIDIA Developer
Product 2h ago

FlexAttention + FlashAttention-4: Fast and Flexible

TL;DR: On Hopper and Blackwell GPUs, FlexAttention now has a FlashAttention-4 backend.

The PyTorch team added support for automatically generating CuTeDSL score/mask modification functions and for JIT-instantiating FlashAttention-4 for custom attention variants. This leads to performance gains of 1.2×…

Why it matters

This leads to performance gains of 1.2× to 3.2× over the existing Triton implementation on compute-bound workloads.

PyTorch Blog
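FlexAttention's premise is that users write a small function that rewrites each attention score (adding a bias, or masking it out) and the framework fuses that function into an optimized kernel, now including a FlashAttention-4 backend on Hopper and Blackwell. The pure-Python sketch below only illustrates that contract on plain lists; `attention`, `causal`, and `distance_penalty` are hypothetical names, and the real PyTorch `score_mod` also receives batch and head indices and operates on tensors, so treat this as a conceptual model rather than the PyTorch API or kernel.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
# Hypothetical hook type: (raw_score, q_idx, kv_idx) -> modified score.
ScoreMod = Callable[[float, int, int], float]


def attention(q: Matrix, k: Matrix, v: Matrix, score_mod: ScoreMod) -> Matrix:
    """Single-head scaled dot-product attention with a user-supplied score hook."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out: Matrix = []
    for qi, q_row in enumerate(q):
        # Raw score q.k / sqrt(d), then let the hook rewrite it (bias, mask, ...).
        scores = [
            score_mod(scale * sum(a * b for a, b in zip(q_row, k_row)), qi, ki)
            for ki, k_row in enumerate(k)
        ]
        m = max(scores)
        # Numerically stable softmax; exp(-inf) == 0.0, so masked keys drop out.
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_row[j] for w, v_row in zip(weights, v)) / total
            for j in range(len(v[0]))
        ])
    return out


def causal(score: float, q_idx: int, kv_idx: int) -> float:
    # Mask: a query may only attend to keys at or before its own position.
    return score if kv_idx <= q_idx else -math.inf


def distance_penalty(score: float, q_idx: int, kv_idx: int) -> float:
    # Bias: ALiBi-like linear penalty on |q_idx - kv_idx| (slope 0.5 is arbitrary).
    return score - 0.5 * abs(q_idx - kv_idx)


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(x, x, x, causal)[0])  # [1.0, 0.0]: row 0 sees only key 0
print(attention(x, x, x, distance_penalty))
```

The point of the design is that masking and biasing are expressed as the same kind of function, so one kernel template can serve many attention variants.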
Labs 3h ago

Controlling Floating-Point Determinism in NVIDIA CCCL

A computation is considered deterministic if multiple runs with the same input data produce the same bitwise result.

While this may seem like a simple property...

NVIDIA Developer
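Why can multiple runs on the same input differ? Floating-point addition is not associative, so a parallel reduction whose summation order depends on scheduling (for example, one built on atomics) can change the result. The Python sketch below is a generic illustration of that effect and of the usual remedy, a reduction with a fixed order; it is not CCCL's API, and `sum_in_order` and `tree_sum` are hypothetical names.

```python
from typing import List, Sequence


def sum_in_order(values: Sequence[float], order: Sequence[int]) -> float:
    """Sequential accumulation; the rounded result depends on the visiting order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


def tree_sum(values: Sequence[float]) -> float:
    """Pairwise reduction over a fixed tree shape: same inputs, same bits, every run."""
    level: List[float] = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry an unpaired last element to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]


vals = [1e16, 1.0, -1e16]
print(sum_in_order(vals, [0, 1, 2]))  # 0.0: 1.0 is lost when rounded into 1e16
print(sum_in_order(vals, [0, 2, 1]))  # 1.0: the large terms cancel first
print(tree_sum(vals) == tree_sum(vals))  # True: fixed order is reproducible
```

A fixed order buys bitwise reproducibility, not a "more correct" answer: `tree_sum(vals)` still returns 0.0 here, it just does so on every run.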
ALL STORIES

60 stories from 88 sources

Research just now

Chatbot Arena Elo Rankings: Top 20 Models

Chatbot Arena Elo rankings of the top 20 models. Compare and track AI model performance.

LMArena Elo Rankings
Labs 1h ago

ChatGPT Issues Sending Messages

Status: Monitoring. OpenAI has applied the mitigation and is monitoring the recovery.

Affected components: Conversations (Degraded performance)

OpenAI Status
News 1h ago

AI in Multiple GPUs: ZeRO & FSDP

Learn how the Zero Redundancy Optimizer works, how to implement it from scratch, and how to use it in PyTorch.

Towards Data Science
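The core ZeRO idea can be simulated in a single process: every rank still sees the full parameters and gradients, as in plain data parallelism, but each rank keeps optimizer state for, and updates, only its own contiguous slice, and an all-gather reassembles the parameters. This sketch assumes momentum SGD and uses hypothetical names (`ShardedMomentumSGD`, `all_gather`) with no real communication; PyTorch's production counterparts are `torch.distributed.optim.ZeroRedundancyOptimizer` and FSDP, the latter also sharding gradients and parameters.

```python
from typing import List, Tuple


def shard_bounds(n: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Contiguous, near-equal slice [start, end) of n elements owned by `rank`."""
    base, rem = divmod(n, world_size)
    start = rank * base + min(rank, rem)
    end = start + base + (1 if rank < rem else 0)
    return start, end


class ShardedMomentumSGD:
    """ZeRO stage-1 style: each rank stores optimizer state only for its own slice."""

    def __init__(self, n_params: int, world_size: int, rank: int,
                 lr: float = 0.1, beta: float = 0.9) -> None:
        self.start, self.end = shard_bounds(n_params, world_size, rank)
        self.lr, self.beta = lr, beta
        self.momentum = [0.0] * (self.end - self.start)  # ~1/world_size of the state

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        """Update and return this rank's slice of the (replicated) parameters."""
        new_slice = []
        for j, i in enumerate(range(self.start, self.end)):
            self.momentum[j] = self.beta * self.momentum[j] + grads[i]
            new_slice.append(params[i] - self.lr * self.momentum[j])
        return new_slice


def all_gather(slices: List[List[float]]) -> List[float]:
    """Stand-in for the collective: concatenate every rank's slice in rank order."""
    return [x for s in slices for x in s]


params = [1.0, 2.0, 3.0, 4.0, 5.0]
grads = [0.5, -1.0, 0.25, 2.0, 0.0]
ranks = [ShardedMomentumSGD(len(params), 2, r) for r in range(2)]
sharded = all_gather([opt.step(params, grads) for opt in ranks])
baseline = ShardedMomentumSGD(len(params), 1, 0).step(params, grads)
print(sharded == baseline)  # True: same update, about half the optimizer state per rank
```

The update is mathematically identical to the unsharded one; what changes is where the optimizer state lives, which is where the memory savings come from.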
Research 1h ago

OpenAI launches GPT-5.4 Thinking and Pro combining coding, reasoning, and computer use in one model

GPT-5.4 is OpenAI's most capable model yet, combining coding, computer operation, and reasoning in a single package for the first time.

The Decoder
Product 1h ago

register correct static dispatch for __and__ and __or__

PyTorch Releases: register correct static dispatch for __and__ and __or__.

PyTorch Releases
Tech 1h ago

Birdbuddy’s AI-powered hummingbird feeder is matching its best price to date

Spring is peak bird-watching season, and if you want a closer look this year, Birdbuddy’s Smart Hummingbird Feeder Pro Solar has you covered.

Birdbuddy is now selling the solar-powered feeder, normally $299, for $189 ($110 off), which matches its lowest price to date. Although the feeder is designed for hummingbirds, when paired with its companion […]

The Verge Tech
Research 2h ago

Stop Trying to DIY Your Way to Conversational Analytics

For most organizations, data and analytics remains a cost center—a massive investment in lakes and warehouses that hasn’t yet paid its way.

Businesses have hired brilliant analysts. Yet, for the average employee, data remains a friction-filled resource. When a sales leader needs to know why revenue is dipping, they shouldn’t have to log a ticket and hope…

Salesforce AI Research
Press 2h ago

Trump gets data center companies to pledge to pay for power generation

On Wednesday, the Trump administration announced that a large collection of tech companies had signed on to what it's calling the Ratepayer Protection Pledge.

By agreeing, the initial signatories—Amazon, Google, Meta, Microsoft, OpenAI, Oracle, and xAI—are saying they will pay for the new generation and transmission capacities needed for any additional data centers they…

Why it matters

But the agreement has no enforcement mechanism, and it will likely run into issues with hardware supplies.

Ars Technica AI
Research 2h ago

How to Power Data 360 with Code Extension

Code Extension resolves a dilemma an architect faces: the need to operationalize complex data manipulations without leaving the Salesforce trust boundary.

Traditionally, handling advanced requirements such as parsing complex XML, managing encrypted data, or designing custom AI chunking algorithms led architects to export data to external systems or rely on unmanaged local…

Why it matters

Moving the data to perform advanced processing on it introduces risks, including rogue code, security vulnerabilities, and compliance…

Salesforce AI Research
Labs 2h ago

Ask a Techspert: How does AI understand my visual searches?

Learn more about AI Mode in Search’s query fan-out method for visual search.

Google AI Blog
Research 3h ago

ChatGPT users research products but won't buy there, forcing OpenAI to rethink its commerce strategy

OpenAI wanted to turn ChatGPT into a shopping destination, but only about a dozen retailers signed up and users weren't buying.

Now the company is handing off purchases to app partners like Instacart and Target.

The Decoder
Product 3h ago

Make kulinseth and albanD emeritus for MPS/Metal backend

PyTorch Releases: Make kulinseth and albanD emeritus for MPS/Metal backend.

PyTorch Releases
Labs 3h ago

Tuning Flash Attention for Peak Performance in NVIDIA CUDA Tile

In this post, NVIDIA dives into one of the most critical workloads in modern AI, Flash Attention. You’ll learn how to implement Flash Attention using NVIDIA...

NVIDIA Developer
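The post's details are truncated here, but the central trick of Flash Attention in general is an online softmax: keys and values are processed block by block while a running maximum and normalizer are rescaled, so the full row of attention scores never has to be stored. Below is a scalar Python sketch of that recurrence for a single query row; it is not CUDA Tile code, performs no real memory tiling, and `reference_row` and `online_row` are hypothetical names.

```python
import math
from typing import List

Matrix = List[List[float]]


def reference_row(scores: List[float], values: Matrix) -> List[float]:
    """Materialize every score, softmax them, and take the weighted sum of values."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z
            for j in range(len(values[0]))]


def online_row(scores: List[float], values: Matrix, block: int = 2) -> List[float]:
    """Stream over key/value blocks keeping only a running max, normalizer, accumulator."""
    m = -math.inf                 # running max of scores seen so far
    denom = 0.0                   # running softmax denominator, relative to m
    acc = [0.0] * len(values[0])  # running unnormalized output, relative to m
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        m_new = max(m, max(s_blk))
        rescale = math.exp(m - m_new)  # shrink old statistics to the new max (0.0 at first)
        p = [math.exp(s - m_new) for s in s_blk]
        denom = denom * rescale + sum(p)
        acc = [a * rescale + sum(pi * v[j] for pi, v in zip(p, v_blk))
               for j, a in enumerate(acc)]
        m = m_new
    return [a / denom for a in acc]  # normalize once at the end


scores = [0.3, 2.0, -1.0, 0.7, 1.5]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5], [0.5, -0.5]]
print(reference_row(scores, values))
print(online_row(scores, values, block=2))  # same values up to rounding
```

Block size changes only the grouping of work, not the result, which is what lets a GPU kernel choose tile shapes purely for performance.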
Product 3h ago

Scaling AI opportunity across the globe: Learnings from GitHub and Andela

Across the globe, developer talent is abundant.

But what has been historically inequitable is the access to emerging technologies, mentorship, and enablement when those technologies are reshaping the industry. Developers in regions like Africa, South America, and…

Why it matters

Developers in regions like Africa, South America, and Southeast Asia can build products at scale, yet access to emerging tools and…

GitHub Blog
Product 3h ago

The ultimate Nano Banana prompting guide

Creating precise, high-quality images often involves endless trial and error.

You need a model that actually understands what you’re asking for. Built on the Gemini 3 family of models, Nano Banana models apply deep reasoning capabilities to fully understand your prompt before generating an…

Google Cloud AI Blog
Tech 3h ago

Roblox is censoring chats with AI

Roblox is using AI to alter the content of chat messages on its platform in real time using a new feature rolling out today.

Real-time chat rephrasing goes beyond the current filtering for banned language, which replaces certain words and phrases with "#" symbols. Now, Roblox says those words and phrases can be "translated into […]

The Verge Tech
Business 4h ago

Gemini’s Canvas in AI Mode Available in Google Search in US

The expansion of the workspace platform follows a gradual rollout.

AI Business
Tech 4h ago

Meta’s AI glasses reportedly send sensitive footage to human reviewers in Kenya

Meta's AI-powered smart glasses could be sending sensitive footage to human reviewers in Nairobi, Kenya, according to an investigation by the Swedish outlets Svenska Dagbladet and Göteborgs-Posten.

The report, which was published last week, claims Meta contractors in Kenya have seen videos captured with the smart glasses that show "bathroom visits, sex and other intimate […]

The Verge Tech
Research 4h ago

Google Search quietly becomes an AI assistant as Canvas feature launches for US users

Google is turning AI search into a workspace.

Canvas lets users build interactive dashboards, documents, and code prototypes directly in AI Mode.

The Decoder
Labs 4h ago

The latest AI news we announced in February

Here are Google’s latest AI updates from February 2026

Google AI Blog
Product 4h ago

Drive organizational growth with Amazon Lex multi-developer CI/CD pipeline

As your conversational AI initiatives evolve, developing Amazon Lex assistants becomes increasingly complex.

Multiple developers working on the same shared Lex instance leads to configuration conflicts, overwritten changes, and slower iteration cycles. Scaling Amazon Lex development requires isolated environments, version…

AWS Machine Learning
Research 4h ago

Olmo Hybrid and future LLM architectures

So-called hybrid architectures are far from new in open-weight models these days.

We now have the recent Qwen 3.5 (previewed by Qwen3-Next), Kimi Linear last fall (a smaller release than their flagship Kimi K2 models), Nvidia’s Nemotron 3 Nano (with the bigger models…

Why it matters

This is one of those times when a research trend looks like it’s getting adopted everywhere at once (maybe the Muon optimizer too, soon?).

Interconnects (Nathan Lambert)
Product 4h ago

Building custom model provider for Strands Agents with LLMs hosted on SageMaker AI endpoints

AWS Machine Learning: Building custom model provider for Strands Agents with LLMs hosted on SageMaker AI endpoints.

AWS Machine Learning
Product 4h ago

Deploying PyTorch Models to the Micro-Edge with ExecuTorch and Arm

PyTorch Blog: Deploying PyTorch Models to the Micro-Edge with ExecuTorch and Arm.

PyTorch Blog
Business 5h ago

Databricks built a RAG agent it says can handle every kind of enterprise search

Most enterprise RAG pipelines are optimized for one search behavior.

They fail silently on the others. A model trained to synthesize cross-document reports handles constraint-driven entity search poorly. A model tuned for simple lookup tasks falls apart on multi-step reasoning over…

VentureBeat
Research 5h ago

Social Relationship Management Software For Small Teams

Key takeaways: Social relationship management software lets small teams respond to customer inquiries faster and with more personalized context, focusing on the relationship aspect of marketing.

Integrating social with a unified data system, like a CRM, ensures that every interaction is recorded and accessible to your teams. Salesforce offers automated tools for connecting to your social network, helping to…

Salesforce AI Research
Research 6h ago

Tech giants make non-binding White House pledge to cover AI data center energy costs

Google, Microsoft, Meta, Amazon, Oracle, xAI, and OpenAI signed a voluntary pledge at the White House to cover the electricity costs of their data centers themselves.

The Decoder
Press 6h ago

The Download: an AI agent’s hit piece, and preventing lightning

This is today’s edition of The Download, MIT Technology Review's weekday newsletter that provides a daily dose of what’s going on in the world of technology.

Online harassment is entering its AI era. Scott Shambaugh didn’t think twice when he denied an AI agent’s request to contribute to matplotlib, a software library he helps manage. Then things got weird. In the middle of…

MIT Technology Review
Research 6h ago

Apple puts AI disclosure responsibility on labels and distributors

Apple Music is rolling out Transparency Tags that let labels and distributors flag AI-generated content across four categories: Artwork, Tracks, Compositions, and Music Videos.

The Decoder
Research 6h ago

Anthropic CEO attacks OpenAI's Pentagon deal as "safety theater" while investors scramble for de-escalation

Anthropic CEO Dario Amodei attacks OpenAI's Pentagon contract as "80% safety theater" in a leaked memo and accuses the Trump administration of punishing his company for a lack of political loyalty.

OpenAI hastily updates its contract, investors push for de-escalation, and a major tech industry group backs Anthropic. Meanwhile, Amodei is making a last-ditch attempt to negotiate directly with the Under Secretary…

The Decoder
Tech 6h ago

Apple Music adds optional labels for AI songs and visuals

Apple is asking artists and record labels on its music streaming platform to voluntarily label songs that were made using AI.

The new "Transparency Tags" metadata system for Apple Music was announced in a newsletter to industry partners yesterday, according to Music Business Worldwide, and covers four categories, including track,…

The Verge Tech
Business 7h ago

Euro Regulators Question Meta Over AI Glasses Privacy Fears

The allegations involve a U.S.-based data annotation and labeling vendor.

AI Business
Product 7h ago

From idea to secure checkout in minutes with Stripe

Building commerce applications looks very different than it did even a few months ago. Teams are no longer treating storefronts and billing systems as long-running integration projects that happen after the…

They iterate quickly, deploy globally by default, and increasingly rely on AI tools to generate UI, checkout flows, and subscription logic. Commerce is becoming more programmable and increasingly agent-driven. As AI…

Vercel Blog
Product 7h ago

GPT 5.4 is now on AI Gateway

GPT-5.4 and GPT-5.4 Pro are now available on AI Gateway. This model brings the agentic and reasoning leaps from GPT-5.3-Codex to all domains.

This includes knowledge work like reports, spreadsheets, presentations, and analysis in addition to coding. It handles complex multi-step workflows more reliably, including tasks that involve tools, research, and…

Why it matters

GPT-5.4 is faster and also more token-efficient than previous iterations (GPT-5.2).

Vercel Blog
Research 7h ago

Alibaba's chief AI developer quits, taking key team members with him

Alibaba's chief AI developer Junyang Lin has unexpectedly resigned, and several core members of the Qwen team followed him out the door.

The departures were reportedly triggered by an internal reorganization.

The Decoder
News 8h ago

How Human Work Will Remain Valuable in an AI World

The Road to Reality — Episode 1

Towards Data Science
News 9h ago

Vector Databases vs. Graph RAG for Agent Memory: When to Use Which

Machine Learning Mastery: Vector Databases vs. Graph RAG for Agent Memory: When to Use Which.

Machine Learning Mastery
Press 9h ago

How much wildfire prevention is too much?

The race to prevent the worst wildfires has been an increasingly high-tech one.

Companies are proposing AI fire detection systems and drones that can stamp out early blazes. And now, one Canadian startup says it’s going after lightning. Lightning-sparked fires can be a big deal: The Canadian…

MIT Technology Review
Business 9h ago

An Interview with Gregory Allen About Anthropic and the U.S. Government

Stratechery: An Interview with Gregory Allen About Anthropic and the U.S. Government.

Stratechery
Research yesterday

SimpliHuMoN: Simplifying Human Motion Prediction

Human motion prediction combines the tasks of trajectory forecasting and human pose prediction.

For each of the two tasks, specialized models have been developed. Combining these models for holistic human motion prediction is non-trivial, and recent methods have struggled to compete on established benchmarks for…

arXiv · Machine Learning
Research yesterday

Accurate and Efficient Hybrid-Ensemble Atmospheric Data Assimilation in Latent Space with Uncertainty Quantification

Data assimilation (DA) combines model forecasts and observations to estimate the optimal state of the atmosphere with its uncertainty, providing initial conditions for weather prediction and reanalyses for climate…

Yet existing traditional and machine-learning DA methods struggle to achieve accuracy, efficiency, and uncertainty quantification simultaneously. Here, the authors propose HLOBA (Hybrid-Ensemble Latent…

Why it matters

Here, the authors propose HLOBA (Hybrid-Ensemble Latent Observation-Background Assimilation), a three-dimensional hybrid-ensemble DA…

arXiv · Machine Learning
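A one-variable version of the analysis step makes "combining model forecasts and observations with uncertainty" concrete: the textbook Kalman (optimal-interpolation) update, with the background variance taken as a hybrid of a static value and an ensemble estimate. This is a hedged sketch with illustrative numbers; the 50/50 blending weight and the function names are hypothetical, and HLOBA itself operates in a learned latent space in three dimensions, which is not reproduced here.

```python
import statistics
from typing import List, Tuple


def hybrid_background_variance(ensemble: List[float], static_var: float,
                               weight: float) -> float:
    """Blend a fixed (climatological) variance with the ensemble's flow-dependent spread."""
    ens_var = statistics.variance(ensemble)  # sample variance of the members
    return weight * static_var + (1.0 - weight) * ens_var


def analysis_update(x_b: float, var_b: float, y: float,
                    var_o: float) -> Tuple[float, float]:
    """Scalar analysis step combining background x_b and observation y."""
    gain = var_b / (var_b + var_o)   # trust the observation more when var_b is large
    x_a = x_b + gain * (y - x_b)     # background plus weighted innovation
    var_a = (1.0 - gain) * var_b     # analysis variance never exceeds var_b
    return x_a, var_a


members = [279.0, 280.0, 281.0]      # hypothetical ensemble forecasts (K)
x_b = statistics.fmean(members)
var_b = hybrid_background_variance(members, static_var=4.0, weight=0.5)
x_a, var_a = analysis_update(x_b, var_b, y=282.0, var_o=1.0)
print(round(x_a, 3), round(var_a, 3))  # 281.429 0.714
```

The analysis lands between forecast and observation, and its variance is smaller than either input variance, which is the uncertainty-reduction property that DA schemes in general aim to preserve.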
Research yesterday

SELDON: Supernova Explosions Learned by Deep ODE Networks

The discovery rate of optical transients will explode to 10 million public alerts per night once the Vera C. Rubin Observatory's Legacy Survey of Space and Time comes online, overwhelming the traditional physics-based inference pipelines. A continuous-time forecasting AI model is of interest because it can deliver…

Why it matters

A continuous-time forecasting AI model is of interest because it can deliver millisecond-scale inference for thousands of objects per…

arXiv · Machine Learning
Research yesterday

A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development

WebGIS development requires rigor, yet agentic AI frequently fails due to five large language model (LLM) limitations: context constraints, cross-session forgetting, stochasticity, instruction failure, and adaptation…

The authors propose a dual-helix governance framework that reframes these challenges as structural governance problems that model capacity alone cannot resolve. They implement the framework as a 3-track…

arXiv · Artificial Intelligence
Research yesterday

ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training

Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and π³ have a computational cost that scales quadratically with the number of input images, making…

Sequential-reconstruction approaches reduce this cost but sacrifice reconstruction quality. The authors introduce ZipMap, a stateful feed-forward model that achieves linear-time, bidirectional 3D reconstruction while…

arXiv · Machine Learning
Research yesterday

AgentIR: Reasoning-Aware Retrieval for Deep Research Agents

Deep Research agents are rapidly emerging as primary consumers of modern retrieval systems.

Unlike human users who issue and refine queries without documenting their intermediate thought processes, Deep Research agents generate explicit natural language reasoning before each search call, revealing rich…

Why it matters

To exploit this overlooked signal, the authors introduce (1) Reasoning-Aware Retrieval, a retrieval paradigm that jointly embeds the…

arXiv · NLP & Language
Research yesterday

Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy

YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparency and ethics, especially when creators fail to disclose their…

Although regulatory agencies like the US Federal Trade Commission (FTC) have issued guidelines to address these issues, non-compliance and consumer harm persist, and the extent of these problems remains unclear. In…

arXiv · Machine Learning
Research yesterday

TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar species within the same genus or family.

The authors introduce TaxonRL, a reinforcement learning approach using Group Relative Policy Optimization with intermediate rewards that decomposes the reasoning process into hierarchical taxonomic predictions. The method…

Why it matters

The method incentivizes models to explicitly reason about species-level, genus-level, and family-level features before making…

arXiv · NLP & Language
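The summary describes rewarding intermediate family-, genus-, and species-level predictions and optimizing with Group Relative Policy Optimization. A minimal sketch of those two pieces follows, assuming a simple weighted partial-credit reward (the level weights, taxa, and function names are hypothetical, not the paper's) and the standard GRPO practice of standardizing each reward against the other samples in its group; the policy-gradient update itself is omitted.

```python
import statistics
from typing import Dict, List

# Hypothetical per-level weights; the paper's actual reward design may differ.
LEVEL_WEIGHTS = {"family": 0.2, "genus": 0.3, "species": 0.5}


def hierarchical_reward(pred: Dict[str, str], gold: Dict[str, str]) -> float:
    """Intermediate reward: partial credit for each taxonomic level predicted correctly."""
    return sum((w for level, w in LEVEL_WEIGHTS.items()
                if pred.get(level) == gold[level]), 0.0)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward against its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


gold = {"family": "Felidae", "genus": "Panthera", "species": "Panthera leo"}
group = [  # four sampled answers for the same image
    {"family": "Felidae", "genus": "Panthera", "species": "Panthera leo"},
    {"family": "Felidae", "genus": "Panthera", "species": "Panthera pardus"},
    {"family": "Felidae", "genus": "Acinonyx", "species": "Acinonyx jubatus"},
    {"family": "Canidae", "genus": "Canis", "species": "Canis lupus"},
]
rewards = [hierarchical_reward(p, gold) for p in group]
print(rewards)  # [1.0, 0.5, 0.2, 0.0]
print([round(a, 2) for a in group_relative_advantages(rewards)])
```

Partial credit means a sample that gets the genus right but the species wrong still earns a positive advantage relative to one that misses the family, giving a denser learning signal than a single correct/incorrect reward.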
Research yesterday

Helios: Real Real-Time Long Video Generation Model

The authors introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline.

They make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics such as self-forcing, error-banks, or keyframe sampling; (2) real-time…

Why it matters

Specifically, Helios is a 14B autoregressive diffusion model with a unified input representation that natively supports T2V, I2V, and V2V…

arXiv · Computer Vision
Research yesterday

Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local…

Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness. The authors introduce Adversarially-Aligned Jacobian…

Why it matters

The authors introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity…

arXiv · Machine Learning
Research yesterday

Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

Conversational agents are increasingly deployed in knowledge-intensive settings, where correct behavior depends on retrieving and applying domain-specific knowledge from large, proprietary, and unstructured corpora…

Yet most existing benchmarks evaluate retrieval or tool use independently of each other, creating a gap in realistic, fully agentic evaluation over unstructured data in long-horizon interactions. The authors…

arXiv · Artificial Intelligence
Research yesterday

Low-Resource Guidance for Controllable Latent Audio Diffusion

Generative audio requires fine-grained controllable outputs, yet most existing methods require model retraining on specific controls or inference-time controls (e.g., guidance) that can also be…

By examining the bottlenecks of existing guidance-based controls, in particular their high cost per step due to decoder backpropagation, the authors introduce a guidance-based approach through selective TFG and…

arXiv · Machine Learning
Research yesterday

Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks

Multimodal web agents that process both screenshots and accessibility trees are increasingly deployed to interact with web interfaces, yet their dual-stream architecture opens an underexplored attack surface: an…

The authors' vulnerability analysis on MiniWob++ reveals that attacks including a visual component far outperform text-only injections, exposing critical gaps in text-centric VLM safety training. Motivated by this…

arXiv · Machine Learning
Research yesterday

Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights

The Unscented Kalman Filter (UKF) is a ubiquitous tool for nonlinear state estimation; however, its performance is limited by the static parameterization of the Unscented Transform (UT).

Conventional weighting schemes, governed by fixed scaling parameters, assume implicit Gaussianity and fail to adapt to time-varying dynamics or heavy-tailed measurement noise. This work introduces the Meta-Adaptive…

Why it matters

This work introduces the Meta-Adaptive UKF (MA-UKF), a framework that reformulates sigma-point weight synthesis as a hyperparameter…

arXiv · Machine Learning
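To see what "static parameterization" means, here is the standard scaled unscented transform for a scalar state: the sigma-point weights are fixed functions of the constants alpha, beta, and kappa and never adapt to the data. The defaults alpha = 1, beta = 2, kappa = 2 (the common kappa = 3 - n choice for n = 1) are assumptions for the demo, and the function names are hypothetical; MA-UKF's learned, recurrent weight adaptation is not shown.

```python
import math
from typing import Callable, List, Tuple


def ut_weights(n: int, alpha: float, beta: float,
               kappa: float) -> Tuple[List[float], List[float], float]:
    """Standard scaled-UT weights; fixed once alpha, beta, kappa are chosen."""
    lam = alpha ** 2 * (n + kappa) - n
    w_i = 1.0 / (2.0 * (n + lam))
    wm = [lam / (n + lam)] + [w_i] * (2 * n)                               # mean weights
    wc = [lam / (n + lam) + (1.0 - alpha ** 2 + beta)] + [w_i] * (2 * n)  # covariance weights
    return wm, wc, lam


def unscented_transform_1d(f: Callable[[float], float], mean: float, var: float,
                           alpha: float = 1.0, beta: float = 2.0,
                           kappa: float = 2.0) -> Tuple[float, float]:
    """Propagate a scalar Gaussian through f using 2n + 1 = 3 sigma points."""
    wm, wc, lam = ut_weights(1, alpha, beta, kappa)
    spread = math.sqrt((1 + lam) * var)
    sigma = [mean, mean + spread, mean - spread]
    ys = [f(x) for x in sigma]
    y_mean = sum(w * y for w, y in zip(wm, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(wc, ys))
    return y_mean, y_var


print(unscented_transform_1d(lambda x: 2.0 * x + 1.0, 3.0, 0.5))  # ~(7.0, 2.0)
print(unscented_transform_1d(lambda x: x * x, 3.0, 0.5))          # mean ~9.5 = 3**2 + 0.5
```

For a linear map the transform is exact, and for x squared the mean is still exact; the fixed weights become a limitation under stronger nonlinearity or heavy-tailed noise, which is the gap the paper targets.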
Research yesterday

Dissecting Quantization Error: A Concentration-Alignment Perspective

Quantization can drastically increase the efficiency of large language and vision models, but typically incurs an accuracy drop.

Recently, function-preserving transforms (e.g. rotations, Hadamard transform, channel-wise scaling) have been successfully applied to reduce post-training quantization error, yet a principled explanation remains…

Why it matters

The authors analyze linear-layer quantization via the signal-to-quantization-noise ratio (SQNR), showing that for uniform integer…

arXiv · Machine Learning
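For reference, SQNR for plain symmetric integer quantization takes only a few lines to compute. The sketch below is a generic illustration, not the paper's analysis: it shows how a single outlier inflates the quantization step for every other value and lowers SQNR, the kind of effect that function-preserving transforms such as rotations or channel-wise scaling are commonly used to reduce. Per-tensor symmetric rounding, the synthetic inputs, and the function names are assumptions.

```python
import math
from typing import List


def fake_quantize(x: List[float], bits: int = 8) -> List[float]:
    """Symmetric per-tensor uniform quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for int8
    peak = max(abs(v) for v in x)
    if peak == 0.0:
        return list(x)
    scale = peak / qmax                        # step size of the integer grid
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in x]


def sqnr_db(x: List[float], x_hat: List[float]) -> float:
    """Signal-to-quantization-noise ratio in decibels."""
    signal = sum(v * v for v in x)
    noise = sum((v - w) ** 2 for v, w in zip(x, x_hat))
    return math.inf if noise == 0.0 else 10.0 * math.log10(signal / noise)


smooth = [math.sin(0.1 * i) for i in range(100)]
with_outlier = smooth + [50.0]  # one large value stretches the grid for everything else
print(round(sqnr_db(smooth, fake_quantize(smooth)), 1))
print(round(sqnr_db(with_outlier, fake_quantize(with_outlier)), 1))  # noticeably lower
```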