Research Blog
Hello! We are a sub-group of the Machine Learning and Statistical Data Analysis Lab (mslab) at The University of Tokyo, with a focus on model evaluation, alignment, and safety.
CapReward: Penalizing implausibly high pass rates to prevent reward hacking in coding RL
CapReward is a peaked reward function for RL fine-tuning of coding agents: it rewards pass rates up to a known cap and penalizes performance beyond it, breaking the monotonic incentive to game accessible tests.
June 2026
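To make the "peaked" shape in the CapReward summary concrete, here is a minimal sketch of one reward that rises up to a cap and falls beyond it. The function name `peaked_reward`, the linear shape, and the `penalty_slope` parameter are all assumptions made for illustration; the summary above does not specify the functional form actually used in CapReward.

```python
def peaked_reward(pass_rate: float, cap: float, penalty_slope: float = 1.0) -> float:
    """Illustrative peaked reward (hypothetical form, not taken from the post).

    Rewards the pass rate as-is up to `cap`, then subtracts a linear
    penalty for every bit of pass rate above the cap.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be in [0, 1]")
    if not 0.0 < cap <= 1.0:
        raise ValueError("cap must be in (0, 1]")

    if pass_rate <= cap:
        # At or below the cap: the usual monotonic incentive applies.
        return pass_rate
    # Above the cap: reward decreases, so exceeding the cap never pays off.
    return cap - penalty_slope * (pass_rate - cap)
```

With `cap=0.8`, a pass rate of 0.8 scores higher than a pass rate of 1.0 under this sketch, which is the property the summary describes: the incentive stops being monotonic in the pass rate.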
CapCode: Detecting cheating in coding agents with capped, randomized tests
CapCode constructs coding benchmarks whose best non-cheating pass rate is deliberately below 1, turning above-cap performance into a statistical signal of test-gaming while preserving model rankings.
June 2026
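One way to read "above-cap performance as a statistical signal" is as a one-sided binomial test against the cap. The sketch below is an assumption for illustration only: the helper name `above_cap_p_value` is hypothetical, and it treats tests as independent trials that an honest agent passes with probability at most `cap`, which may not match the test CapCode actually uses.

```python
from math import comb


def above_cap_p_value(passed: int, total: int, cap: float) -> float:
    """Upper-tail binomial probability P(X >= passed) for X ~ Binomial(total, cap).

    Assumption (not from the post): tests are independent and an honest
    agent passes each one with probability `cap`. A small value means
    the observed pass count is hard to explain without test-gaming.
    """
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    if not 0.0 <= cap <= 1.0:
        raise ValueError("cap must be in [0, 1]")

    # Sum the binomial probability mass from `passed` up to `total`.
    return sum(
        comb(total, k) * cap**k * (1.0 - cap) ** (total - k)
        for k in range(passed, total + 1)
    )
```

Under these assumptions, a run that passes far more tests than the cap predicts yields a small tail probability, while a run at or below the cap does not.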
CapBencher: Give Your LLM Benchmark a Built-in Alarm for Test-Set Overfitting
CapBencher caps best-possible accuracy by injecting controlled randomness into logically correct answers, enabling public benchmarks that can detect leakage and leaderboard gaming.
February 2026, ICML 2026