OpenAI: Investigating the consequences of accidentally grading CoT during RL (alignment.openai.com)
2 points by pretext 11 hours ago