The Brief
Quantization-Aware Training in TorchAO (II)
In its previous Quantization-Aware Training (QAT) blog post, the PyTorch team introduced the initial QAT flow in TorchAO for large language models targeting edge devices with ExecuTorch.
Since then, the team has extended this flow to also target fast CUDA kernels, like the ones in MSLK, for fast inference in vLLM, and incorporated it into popular fine-tuning frameworks like Unsloth and Axolotl.…
The team also explored more advanced QAT techniques like PARQ for lower-bit quantization (prototype): Unsloth integration: Recover…
Nvidia Takes on Telco Industry With Open Source Model
While Nvidia's approach focuses on enabling more autonomous workflows for telco companies, it faces competition from traditional network vendors such as Ericsson and Nokia.
Amazon Spends Another $21B to Beef up Spain's AI Infrastructure
The latest round of funding signifies another escalation in Amazon's commitment to the country.
LangSmith CLI & Skills
LangChain is releasing a CLI along with its first set of skills to give AI coding agents expertise in the LangSmith ecosystem.
This includes adding tracing to agents, understanding their execution, building test sets, and evaluating performance. On LangChain's eval set, this bumps Claude Code’s performance on these tasks from 17% to 92%.…
Google faces wrongful death suit after Gemini allegedly convinced a man to die and become digital
According to a lawsuit filed in a US federal court in Northern California on Wednesday, Google's chatbot Gemini allegedly drove 36-year-old Jonathan Gavalas from Florida to suicide.
The article Google faces wrongful death suit after Gemini allegedly convinced a man to die and become digital appeared first on The Decoder.
Black Forest Labs' new Self-Flow technique makes training multimodal AI models 2.8x more efficient
To create coherent images or videos, generative AI diffusion models like Stable Diffusion or FLUX have typically relied on external "teachers"—frozen encoders like CLIP or DINOv2—to provide the semantic understanding…
But this reliance has come at a cost: a "bottleneck" where scaling up the model no longer yields better results because the external teacher has hit its limit. Today, German AI startup Black Forest Labs (maker of the…
60 stories from 88 sources
Chatbot Arena Elo Rankings — Top 20 Models
LMArena Elo Rankings — Chatbot Arena Elo Rankings — Top 20 Models. Compare and track AI model performance.
User may experience errors in ChatGPT
Status: Resolved. All impacted services have now fully recovered.
Affected components: Conversations (Operational)
Revert "[BE] Apply up007 and up045 to .ci through tools"
This reverts commit f1da356.
Reverted #176458 on behalf of https://github.com/pytorch-auto-revert due to Reverted automatically by pytorch's autorevert, to avoid this behaviour add the tag autorevert: disable ( comment )
[user-streams] Add stream support to inductor wrapper codegen
PyTorch Releases update to user-streams: Add stream support to inductor wrapper codegen.
[MPS] Fix masked_scatter to preserve scalar tensor shape
PyTorch Releases update to MPS: Fix masked_scatter to preserve scalar tensor shape.
API Error Rates
Status: Resolved. All impacted services have now fully recovered.
Affected components: Realtime (Operational), Files (Operational), Embeddings (Operational), Responses (Operational), Codex (Operational), Images (Operational), Batch (Operational), Chat Completions (Operational), Audio…
Grammarly Is Offering ‘Expert’ AI Reviews From Your Favorite Authors—Dead or Alive
The tool, offered by the recently-rebranded company Superhuman, gives feedback based on the work of famous dead and living writers—without their permission.
What AI Models for War Actually Look Like
While companies like Anthropic debate limits on military uses of AI, Smack Technologies is training models to plan battlefield operations.
Embed Amazon Quick Suite chat agents in enterprise applications
AWS Machine Learning: Embed Amazon Quick Suite chat agents in enterprise applications.
Unlock powerful call center analytics with Amazon Nova foundation models
Call center analytics play a crucial role in improving customer experience and operational efficiency.
With foundation models (FMs), you can improve the quality and efficiency of call center operations and analytics.
Organizations can use generative AI to assist human customer support agents and managers of contact center teams, so they can gain…
How Ricoh built a scalable intelligent document processing solution on AWS
This post is co-written by Jeremy Jacobson and Rado Fulek from Ricoh.
This post demonstrates how enterprises can overcome document processing scaling limits by combining generative AI, serverless architecture, and standardized frameworks.
Ricoh engineered a repeatable, reusable framework using the AWS GenAI Intelligent Document Processing (IDP) Accelerator.
OpenAI's Codex app lands on Windows after topping a million Mac downloads in its first week
OpenAI brings its AI coding tool Codex to Windows, with native support for Windows environments and over 1.6 million weekly active users.
The article OpenAI's Codex app lands on Windows after topping a million Mac downloads in its first week appeared first on The Decoder.
Google’s AI-powered workspace is now available to more users in Search
Google is bringing Canvas to everyone in the US using AI Mode in Search.
The feature opens up a dedicated workspace within its AI-powered search tool, allowing it to use the latest information from Search to organize plans, develop tools, and draft documents in a panel alongside your chat.…
Though Google initially launched Canvas inside […]
Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model
Microsoft Research: Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model.
v5.3.0: EuroBERT, VibeVoice ASR, TimesFM2.5, PP-DocLayoutV2, OlmoHybrid, ModernVBert, Higgs Audio V2
New model additions: EuroBERT. EuroBERT is a multilingual encoder model based on a refreshed transformer architecture, akin to Llama but with bidirectional attention.
It supports a mixture of European and widely spoken languages, with sequences of up to 8192 tokens. Links: Documentation | Paper | Blog Post. Add eurobert (#39455) by @ArthurZucker in #39455. VibeVoice ASR: VibeVoice…
Meta signs multi-year AI deal with News Corp worth up to $50 million a year
Meta is paying News Corp up to $50 million a year for AI training data.
Good for individual publishers, bad for the industry as a whole. The article Meta signs multi-year AI deal with News Corp worth up to $50 million a year appeared first on The Decoder.
5 Essential Security Patterns for Robust Agentic AI
Machine Learning Mastery: 5 Essential Security Patterns for Robust Agentic AI.
GPT-5.4 reportedly brings a million-token context window and an extreme reasoning mode
GPT-5.4 is coming soon: double the context window of GPT-5.2, more reliable performance on long-running tasks, and a new "extreme" thinking mode.
The article GPT-5.4 reportedly brings a million-token context window and an extreme reasoning mode appeared first on The Decoder.
Elevated errors on Claude Haiku 4.5
Mar 4, 17:01 UTC Resolved - Errors have returned to the baseline as of 8:08 PT / 16:08 UTC.
Mar 4, 16:13 UTC Monitoring - A fix has been implemented and Anthropic is monitoring the results.
Mar 4, 15:58 UTC Investigating - The earlier issues with Haiku 4.5 have reappeared.
Tuning Flash Attention for Peak Performance in NVIDIA CUDA Tile
In this post, the NVIDIA Developer blog dives into one of the most critical workloads in modern AI, Flash Attention, where you’ll learn: How to implement Flash Attention using NVIDIA...
Use Canvas in AI Mode to get things done and bring your ideas to life, right in Search.
Canvas in AI Mode is now available for everyone in the U.S.
Plus, it can now help you draft documents or build interactive tools.
Small models, high quality: Inside BMW Group’s experiments evaluating domain-specific language models
A car you can talk to has been a longstanding dream, whether as the basis for television shows or more recent smartphone integrations.
One way of achieving better, more natural voice commands is by incorporating AI foundation models into vehicle systems, which offer more intelligence than traditional voice commands.
AI foundation models can connect everyday questions with vehicle functions in a seamless dialogue.
Supreme Court AI copyright decision sounds sweeping but actually settles very little
AI inventor Stephen Thaler wanted the US Supreme Court to recognize a machine as the sole author of an image.
The court refused, but the ruling only covers this extreme case. It says nothing about whether people can claim copyright for work they create with AI tools.
The article Supreme Court AI copyright decision sounds sweeping but actually settles very little appeared first on The Decoder.
US military uses Anthropic's Claude for AI-driven strike planning in Iran war
In the war against Iran, the US military is using generative AI at scale for target selection and strike planning for the first time.
Of all models, it's the one from the company Washington just banned. The article US military uses Anthropic's Claude for AI-driven strike planning in Iran war appeared first on The Decoder.
OpenAI Says ChatGPT Instant 5.3 is Less Cringe, More Accurate
The AI model maker said it is responding to user criticisms.
Do Your Customers Have Analysis Paralysis? Find Out
Key takeaways: Analysis paralysis in customers happens when you offer too many choices with not enough differentiation.
Instead, use integrated data to provide personalized recommendations that guide shoppers. Salesforce CRM can help you see where customers are struggling or dropping out of the customer journey, so you can fix it and…
Bridging the operational AI gap
The transformational potential of AI is already well established.
Enterprise use cases are building momentum and organizations are transitioning from pilot projects to AI in production. Companies are no longer just talking about AI; they are redirecting budgets and resources to make…
Many are already experimenting with agentic AI, which promises new levels of automation.
Pentagon vendor cutoff exposes the AI dependency map most enterprises never built
The federal directive ordering all U.S. government agencies to cease using Anthropic technology comes with a six-month phaseout window. That timeline assumes agencies already know where Anthropic’s models sit inside their workflows. Most don’t today.
Escaping the Prototype Mirage: Why Enterprise AI Stalls
Too many prototypes, too few products
The Download: Earth’s rumblings, and AI for strikes on Iran
This is today’s edition of The Download, MIT Technology Review's weekday newsletter that provides a daily dose of what’s going on in the world of technology.
Listen to Earth’s rumbling, secret soundtrack: The boom of a calving glacier. The crackling rumble of a wildfire. The roar of a surging storm front.
MCP Apps support on Vercel
Teams can now build and deploy MCP Apps on Vercel with full support for Next.js. MCP Apps are similar to ChatGPT apps, but are a provider-agnostic open standard for embedded UIs.
They run inside iframes and communicate with any compatible host, such as ChatGPT, using a shared bridge. This architecture uses ui/* JSON-RPC over postMessage, enabling a single UI to function across any compatible…
RAG with Hybrid Search: How Does Keyword Search Work?
Understanding keyword search, TF-IDF, and BM25
Meta creates new applied AI engineering division
Meta is building a new applied AI engineering organization, according to an internal memo obtained by the Wall Street Journal.
The article Meta creates new applied AI engineering division appeared first on The Decoder.
Anthropic nears $20 billion revenue run rate despite Pentagon feud
Anthropic is on track to generate nearly $20 billion in annual revenue based on current performance, according to Bloomberg.
The article Anthropic nears $20 billion revenue run rate despite Pentagon feud appeared first on The Decoder.
Anthropic’s Skyrocketing Revenue, A Contract Compromise?, Nvidia Earnings
Anthropic's enterprise business is reaching escape velocity, which increases the importance of finding a compromise with the government.
Then, agents dramatically increase demand for Nvidia chips, even if they threaten software.
OpenAI is building a GitHub competitor that could challenge its biggest investor
OpenAI is building its own alternative to GitHub, Microsoft's widely used platform for code management and collaboration, according to The Information.
The article OpenAI is building a GitHub competitor that could challenge its biggest investor appeared first on The Decoder.
Extending single-minus amplitudes to gravitons
A new preprint extends single-minus amplitudes to gravitons, with GPT-5.2 Pro helping derive and verify nonzero graviton tree amplitudes in quantum gravity.
Capgemini Joins OpenAI's Frontier Alliance to Scale Enterprise AI
The partners are looking to close the gap between AI experimentation and real-world enterprise deployment.
Did Alibaba just kneecap its powerful Qwen AI team? Key figures depart in wake of latest open source release
Alibaba's Qwen team of AI researchers has been among the most prolific and well regarded in the international machine learning community, shipping dozens of powerful generalized and specialized generative models…
But now, just 24 hours after shipping the open source Qwen3.5 small model series —a release that drew public praise from Elon Musk for its "impressive intelligence density" —the project’s technical architect and…
SimpliHuMoN: Simplifying Human Motion Prediction
Human motion prediction combines the tasks of trajectory forecasting and human pose prediction.
For each of the two tasks, specialized models have been developed. Combining these models for holistic human motion prediction is non-trivial, and recent methods have struggled to compete on established benchmarks for…
Accurate and Efficient Hybrid-Ensemble Atmospheric Data Assimilation in Latent Space with Uncertainty Quantification
Data assimilation (DA) combines model forecasts and observations to estimate the optimal state of the atmosphere with its uncertainty, providing initial conditions for weather prediction and reanalyses for climate…
Yet, existing traditional and machine-learning DA methods struggle to achieve accuracy, efficiency and uncertainty quantification simultaneously.
Here, the authors propose HLOBA (Hybrid-Ensemble Latent Observation-Background Assimilation), a three-dimensional hybrid-ensemble DA…
SELDON: Supernova Explosions Learned by Deep ODE Networks
The discovery rate of optical transients will explode to 10 million public alerts per night once the Vera C. Rubin Observatory's Legacy Survey of Space and Time comes online, overwhelming the traditional physics-based inference pipelines.
A continuous-time forecasting AI model is of interest because it can deliver millisecond-scale inference for thousands of objects per…
A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development
WebGIS development requires rigor, yet agentic AI frequently fails due to five large language model (LLM) limitations: context constraints, cross-session forgetting, stochasticity, instruction failure, and adaptation…
The authors propose a dual-helix governance framework reframing these challenges as structural governance problems that model capacity alone cannot resolve. They implement the framework as a 3-track…
ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training
Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and $π^3$ have a computational cost that scales quadratically with the number of input images, making…
Sequential-reconstruction approaches reduce this cost but sacrifice reconstruction quality. The authors introduce ZipMap, a stateful feed-forward model that achieves linear-time, bidirectional 3D reconstruction while…
AgentIR: Reasoning-Aware Retrieval for Deep Research Agents
Deep Research agents are rapidly emerging as primary consumers of modern retrieval systems.
Unlike human users who issue and refine queries without documenting their intermediate thought processes, Deep Research agents generate explicit natural language reasoning before each search call, revealing rich…
To exploit this overlooked signal, the authors introduce: (1) Reasoning-Aware Retrieval, a retrieval paradigm that jointly embeds the…
Turning Trust to Transactions: Tracking Affiliate Marketing and FTC Compliance in YouTube's Influencer Economy
YouTube has evolved into a powerful platform where creators monetize their influence through affiliate marketing, raising concerns about transparency and ethics, especially when creators fail to disclose their…
Although regulatory agencies like the US Federal Trade Commission (FTC) have issued guidelines to address these issues, non-compliance and consumer harm persist, and the extent of these problems remains unclear. In…
TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning
Traditional vision-language models struggle with contrastive fine-grained taxonomic reasoning, particularly when distinguishing between visually similar species within the same genus or family.
The authors introduce TaxonRL, a reinforcement learning approach using Group Relative Policy Optimization with intermediate rewards that decomposes the reasoning process into hierarchical taxonomic predictions.
Their method incentivizes models to explicitly reason about species-level, genus-level, and family-level features before making…
Helios: Real Real-Time Long Video Generation Model
The authors introduce Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline.
They make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics such as self-forcing, error-banks, or keyframe sampling; (2) real-time…
Specifically, Helios is a 14B autoregressive diffusion model with a unified input representation that natively supports T2V, I2V, and V2V…
Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization
As Large Language Models (LLMs) transition into autonomous multi-agent ecosystems, robust minimax training becomes essential yet remains prone to instability when highly non-linear policies induce extreme local…
Standard remedies that enforce global Jacobian bounds are overly conservative, suppressing sensitivity in all directions and inducing a large Price of Robustness.
The authors introduce Adversarially-Aligned Jacobian Regularization (AAJR), a trajectory-aligned approach that controls sensitivity…
Knowledge: Evaluating Conversational Agents over Unstructured Knowledge
Conversational agents are increasingly deployed in knowledge-intensive settings, where correct behavior depends on retrieving and applying domain-specific knowledge from large, proprietary, and unstructured corpora…
Yet most existing benchmarks evaluate retrieval or tool use independently of each other, creating a gap in realistic, fully agentic evaluation over unstructured data in long-horizon interactions. The authors…
Low-Resource Guidance for Controllable Latent Audio Diffusion
Generative audio requires fine-grained controllable outputs, yet most existing methods require model retraining on specific controls or inference-time controls (e.g., guidance) that can also be…
By examining the bottlenecks of existing guidance-based controls, in particular their high cost-per-step due to decoder backpropagation, the authors introduce a guidance-based approach through selective TFG and…
Dual-Modality Multi-Stage Adversarial Safety Training: Robustifying Multimodal Web Agents Against Cross-Modal Attacks
Multimodal web agents that process both screenshots and accessibility trees are increasingly deployed to interact with web interfaces, yet their dual-stream architecture opens an underexplored attack surface: an…
The authors' vulnerability analysis on MiniWob++ reveals that attacks including a visual component far outperform text-only injections, exposing critical gaps in text-centric VLM safety training. Motivated by this…
Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights
The Unscented Kalman Filter (UKF) is a ubiquitous tool for nonlinear state estimation; however, its performance is limited by the static parameterization of the Unscented Transform (UT).
Conventional weighting schemes, governed by fixed scaling parameters, assume implicit Gaussianity and fail to adapt to time-varying dynamics or heavy-tailed measurement noise.
This work introduces the Meta-Adaptive UKF (MA-UKF), a framework that reformulates sigma-point weight synthesis as a hyperparameter…
Dissecting Quantization Error: A Concentration-Alignment Perspective
Quantization can drastically increase the efficiency of large language and vision models, but typically incurs an accuracy drop.
Recently, function-preserving transforms (e.g., rotations, Hadamard transform, channel-wise scaling) have been successfully applied to reduce post-training quantization error, yet a principled explanation remains…
The authors analyze linear-layer quantization via the signal-to-quantization-noise ratio (SQNR), showing that for uniform integer…