
It's because batch size is dynamic. A different batch size changes how the floating-point reductions are grouped, so the output can differ even at temp 0.
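A minimal sketch of why grouping matters: floating-point addition is not associative, so summing the same numbers in a different order (as a different batch/tile size would) can yield a different result. The specific constants below are just an illustration, not taken from any real inference kernel.

```python
# Floating-point addition is not associative: the same three terms,
# grouped differently, produce different sums in IEEE-754 doubles.
a = (0.1 + 1e16) - 1e16  # the 0.1 is absorbed into 1e16 and lost -> 0.0
b = 0.1 + (1e16 - 1e16)  # the large terms cancel first -> 0.1

print(a, b)   # 0.0 0.1
assert a != b
```

At the scale of a matrix multiply, the reduction order depends on how the batch is tiled across the hardware, so a change in batch size can shift logits by tiny amounts, occasionally flipping which token has the highest score.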


Batch size is dynamic, and in MoE models the experts chosen apparently depend on the whole batch (not only your single inference request, which sounds weird to me, but I'm just an end user). No one has audited the inference pipeline for floating-point nondeterminism, and I'm not even sure temperature 0 implies deterministic sampling: the softmax formula divides the logits by the temperature (e^(logit/T)), so T = 0 isn't a valid value anyway and has to be special-cased by the implementation.
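A sketch of the special-casing described above, under the usual convention that T = 0 means greedy (argmax) decoding rather than a literal division by zero. This is an illustrative implementation, not code from any specific inference engine.

```python
import math

def softmax_with_temperature(logits, T):
    """Return a probability distribution over tokens.

    T > 0: standard temperature-scaled softmax, e^(logit/T) normalized.
    T == 0: the formula divides by zero, so implementations special-case
    it as greedy decoding: all probability mass on the max logit.
    """
    if T == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax_with_temperature([1.0, 2.0, 3.0], 0))    # [0.0, 0.0, 1.0]
print(softmax_with_temperature([1.0, 2.0, 3.0], 1.0))  # smooth distribution
```

Note that even with this special case, T = 0 is only deterministic if the logits themselves are: if batching nondeterminism nudges two near-tied logits past each other, the argmax flips.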




