Depending on the routing function, you can figure out every active expert for a single token before its forward pass starts, and then pipeline the expert loading so it overlaps with compute.
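As a rough illustration of the idea, here is a minimal sketch that assumes a routing function depending only on the token ID and the layer index (not on hidden states), which is what makes the expert set knowable up front. Everything named here (`route`, `load_expert`, `forward_token`, the constants) is hypothetical and stands in for whatever a real model and loader would use; the "load" and "compute" steps are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_LAYERS = 4   # assumed model depth
NUM_EXPERTS = 8  # assumed experts per layer
TOP_K = 2        # assumed active experts per token per layer


def route(token_id: int, layer: int) -> list[int]:
    """Hypothetical router: depends only on (token_id, layer), not on activations."""
    base = (token_id * 31 + layer * 17) % NUM_EXPERTS
    return [(base + i) % NUM_EXPERTS for i in range(TOP_K)]


def load_expert(layer: int, expert: int) -> str:
    """Placeholder for a slow weight load (e.g. disk or host memory to accelerator)."""
    return f"weights[L{layer}/E{expert}]"


def forward_token(token_id: int) -> list[list[str]]:
    # The full expert plan is known before any layer runs.
    plan = [route(token_id, layer) for layer in range(NUM_LAYERS)]
    used = []
    with ThreadPoolExecutor(max_workers=TOP_K) as pool:
        # Start loading the first layer's experts.
        pending = [pool.submit(load_expert, 0, e) for e in plan[0]]
        for layer in range(NUM_LAYERS):
            current = pending
            # Kick off the next layer's loads before using this layer's weights.
            if layer + 1 < NUM_LAYERS:
                pending = [
                    pool.submit(load_expert, layer + 1, e) for e in plan[layer + 1]
                ]
            weights = [f.result() for f in current]
            used.append(weights)  # placeholder for the expert computation
    return used
```

If the router instead depends on each layer's hidden state, the plan cannot be built ahead of time like this, and you would need some form of prediction or speculative loading.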