SMAC: Score-Matched Actor-Critics for Robust Offline-to-Online Transfer
Modern offline reinforcement learning (RL) methods find performant actor-critics; however, fine-tuning these actor-critics online with value-based RL algorithms typically causes an immediate drop in performance.