Research Blog

Hello! We are a sub-group of the Machine Learning and Statistical Data Analysis Lab (mslab) at The University of Tokyo, focusing on model evaluation, alignment, and safety.

Posts

CapReward: Penalizing implausibly high pass rates to prevent reward hacking in coding RL

CapReward is a peaked reward function for RL fine-tuning of coding agents: it rewards pass rates up to a known cap and penalizes performance beyond it, breaking the monotonic incentive to game accessible tests.

June 2026
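
To make the "peaked" shape in the CapReward summary concrete, here is a minimal sketch. It assumes a piecewise-linear form, and the names `capped_reward` and `penalty_slope` are hypothetical, chosen for illustration. The post itself may define the reward differently.

```python
def capped_reward(pass_rate: float, cap: float, penalty_slope: float = 1.0) -> float:
    """Illustrative peaked reward (assumed piecewise-linear form, not the post's definition).

    The reward rises with the pass rate up to `cap`, then falls for pass rates above it.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must lie in [0, 1]")
    if not 0.0 < cap <= 1.0:
        raise ValueError("cap must lie in (0, 1]")
    if penalty_slope <= 0.0:
        raise ValueError("penalty_slope must be positive")

    if pass_rate <= cap:
        # Below or at the cap: reward grows with the pass rate.
        return pass_rate
    # Above the cap: subtract a penalty proportional to the excess.
    return cap - penalty_slope * (pass_rate - cap)
```

With a cap of 0.8, a pass rate of 1.0 earns less than a pass rate of 0.8 under this sketch, so the reward no longer increases monotonically with the pass rate.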
CapCode: Detecting cheating in coding agents with capped, randomized tests

CapCode constructs coding benchmarks whose best non-cheating pass rate is deliberately below 1, turning above-cap performance into a statistical signal of test-gaming while preserving model rankings.

June 2026
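
One way to read "statistical signal" in the CapCode summary is as a one-sided binomial tail test. The sketch below is an assumption made for illustration, not the procedure from the post: it treats tests as independent and supposes a non-cheating agent passes each test with probability at most `cap`. The function name `above_cap_p_value` is hypothetical.

```python
from math import comb


def above_cap_p_value(passes: int, total: int, cap: float) -> float:
    """Probability of at least `passes` successes out of `total` under Binomial(total, cap).

    Assumption (illustrative only): tests are independent and a non-cheating
    agent passes each one with probability at most `cap`.
    """
    if total <= 0 or not 0 <= passes <= total:
        raise ValueError("need 0 <= passes <= total and total > 0")
    if not 0.0 < cap < 1.0:
        raise ValueError("cap must lie strictly between 0 and 1")

    # Sum the binomial probability mass from `passes` up to `total`.
    return sum(
        comb(total, k) * cap**k * (1.0 - cap) ** (total - k)
        for k in range(passes, total + 1)
    )
```

A small value, for example from `above_cap_p_value(10, 10, 0.5)`, would suggest that the observed pass rate is hard to explain without test-gaming under these assumptions.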
CapBencher: Give Your LLM Benchmark a Built-in Alarm for Test-Set Overfitting

CapBencher caps best-possible accuracy by injecting controlled randomness into logically correct answers, enabling public benchmarks that can detect leakage and leaderboard gaming.

February 2026, ICML 2026