Group Relative Policy Optimization (GRPO) was introduced in February 2024 in the paper "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" from DeepSeek-AI, Tsinghua University, and Peking University. It is a reinforcement learning algorithm that builds on Proximal Policy Optimization (PPO), which OpenAI introduced in 2017 for general reinforcement learning tasks and which later proved useful for LLMs through reinforcement learning from human feedback (RLHF). GRPO is a simpler and more efficient algorithm that was initially designed to improve mathematical reasoning capabilities while reducing memory consumption in large language models, and in this article we will explore how it does so. First, for readers not familiar with reinforcement learning (RL): a central concept is the policy. For an LLM, the policy is the model's learned strategy that maps input text (states) to output text (actions). In simple terms, the policy is like a brain that tells the model what to do in a given state. Most RL methods are geared towards optimizing this policy, i.e., finding a policy that gives the best actions for a given state.
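To make the idea of a policy concrete, here is a minimal sketch in plain Python. It is not taken from the DeepSeekMath paper: the four-token vocabulary, the hand-written logits, and the names `VOCAB`, `toy_policy`, and `sample_action` are all hypothetical stand-ins for what a real LLM computes with a neural network. The only point is that a policy takes a state (the text so far) and returns a probability distribution over actions (the next token), from which an action is sampled.

```python
import math
import random

# Hypothetical toy vocabulary; a real LLM has tens of thousands of tokens.
VOCAB = ["2", "4", "5", "<eos>"]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_policy(state):
    """Map a state (text so far) to a distribution over VOCAB.

    The logits are hand-written for illustration (an assumption);
    in an LLM they would come from the network's forward pass.
    """
    if state.endswith("2 + 2 ="):
        logits = [0.0, 3.0, 0.5, -1.0]  # favors the token "4"
    else:
        logits = [0.0, 0.0, 0.0, 2.0]   # otherwise favors stopping
    return softmax(logits)


def sample_action(state, rng):
    """Sample one action (next token) according to the policy."""
    probs = toy_policy(state)
    return rng.choices(VOCAB, weights=probs, k=1)[0]


state = "2 + 2 ="
probs = toy_policy(state)
action = sample_action(state, random.Random(0))

print({tok: round(p, 3) for tok, p in zip(VOCAB, probs)})
print("sampled action:", action)
```

Under this picture, optimizing the policy means adjusting whatever produces the logits so that actions leading to better outcomes become more probable. In the toy version the logits are fixed by hand, so nothing is learned; it only illustrates the state-to-action mapping.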
In a significant move that signals the growing importance of artificial intelligence infrastructure in the United States, President Donald Trump announced an unprecedented $500 billion joint venture called "Stargate" during a White House briefing on January 21, 2025. The initiative brings together tech giants OpenAI, Oracle, and SoftBank in what could become the largest AI infrastructure project in American history.
The world of artificial intelligence continues to evolve at an unprecedented pace, and at the forefront of this revolution stands DeepSeek V3, the latest innovation from the Chinese AI company, DeepSeek. Released in December 2024, this open-source large language model (LLM) has made waves in the AI community, rivaling even the most advanced closed-source models. In this article, we dive into the features, performance, training methodology, and implications of DeepSeek V3.
Looking for a practical approach to learning deep reinforcement learning (DRL)? This comprehensive course takes you from the fundamentals to advanced concepts through hands-on implementation using Python and PyTorch.