OpenAI: Investigating the consequences of accidentally grading CoT during RL