
It's because batch size is dynamic. A different batch size changes how the floating-point reductions are grouped, so the output can differ even at temp 0.
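A minimal sketch of why grouping matters: floating-point addition is not associative, so summing the same numbers in a different order (as a different batch/tile size would) can yield a different result. The specific constants below are just an illustration, not taken from any real inference kernel.

```python
# Floating-point addition is not associative: the same three terms,
# grouped differently, produce different sums in IEEE-754 doubles.
a = (0.1 + 1e16) - 1e16  # the 0.1 is absorbed into 1e16 and lost -> 0.0
b = 0.1 + (1e16 - 1e16)  # the large terms cancel first -> 0.1

print(a, b)   # 0.0 0.1
assert a != b
```

At the scale of a matrix multiply, the reduction order depends on how the batch is tiled across the hardware, so a change in batch size can shift logits by tiny amounts, occasionally flipping which token has the highest score.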


Batch size is dynamic, and in MoE models the experts chosen apparently depend on the whole batch (not only your single inference request, which sounds weird to me, but I'm just an end user). No one has audited the inference pipeline for floating-point nondeterminism, and I'm not even sure temperature 0 implies deterministic sampling: the softmax formula divides the logits by the temperature (e^(logit/T)), so T = 0 isn't a valid value anyway and has to be special-cased by the implementation.
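A sketch of the special-casing described above, under the usual convention that T = 0 means greedy (argmax) decoding rather than a literal division by zero. This is an illustrative implementation, not code from any specific inference engine.

```python
import math

def softmax_with_temperature(logits, T):
    """Return a probability distribution over tokens.

    T > 0: standard temperature-scaled softmax, e^(logit/T) normalized.
    T == 0: the formula divides by zero, so implementations special-case
    it as greedy decoding: all probability mass on the max logit.
    """
    if T == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax_with_temperature([1.0, 2.0, 3.0], 0))    # [0.0, 0.0, 1.0]
print(softmax_with_temperature([1.0, 2.0, 3.0], 1.0))  # smooth distribution
```

Note that even with this special case, T = 0 is only deterministic if the logits themselves are: if batching nondeterminism nudges two near-tied logits past each other, the argmax flips.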




