Max, I see two problems with your comment.
1) The paper didn't show a 0% FNR. Tables 4, 7, and B.2 are pretty explicit about that, and it's not hard to work out from the others either.
2) A 0% error rate requires some pretty serious assumptions to hold. For that kind of result not to be incredibly suspect, there would have to be zero noise in the data, in the analysis, and at every other stage. I do not see that being true of the dataset in question.
Even high scores are suspect. Generalizing the previous point: a score is suspect if it is higher than the noise level allows. Can you truly attest that this condition holds?
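To make the noise point concrete, here's a minimal sketch in Python. It assumes binary labels and that labels are flipped independently of the model's own mistakes, which is a simplification and not something I know about the dataset in question. The function names and the 2% noise rate in the comments are made up for illustration. Under those assumptions, a classifier that is always right still only agrees with the recorded labels about (1 - noise rate) of the time, so a measured score above that ceiling means the model is reproducing labeling errors.

```python
import random


def expected_measured_accuracy(true_accuracy, noise_rate):
    """Expected agreement with noisy binary labels.

    Assumes each recorded label is flipped with probability noise_rate,
    independently of whether the model is right or wrong.
    """
    # The model matches a recorded label either when it is right and the
    # label is intact, or when it is wrong and the label was flipped.
    return true_accuracy * (1 - noise_rate) + (1 - true_accuracy) * noise_rate


def simulate_perfect_classifier(n, noise_rate, seed=0):
    """Score an always-correct classifier against labels with random flips."""
    rng = random.Random(seed)
    truth = [rng.randint(0, 1) for _ in range(n)]
    recorded = [1 - y if rng.random() < noise_rate else y for y in truth]
    predictions = truth  # a perfect classifier predicts the true label
    agreements = sum(p == r for p, r in zip(predictions, recorded))
    return agreements / n
```

With a hypothetical 2% label noise, `expected_measured_accuracy(1.0, 0.02)` gives the ceiling, and `simulate_perfect_classifier(100_000, 0.02)` should land close to it. Anything reported above that is the part that needs explaining.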
I suspect that you're introducing data leakage. I haven't looked closely enough at your training and data to determine how it's happening, but you'll probably need a pretty deep analysis, as leakage is really easy to sneak in and can do so in non-obvious ways. A very common one is tuning hyperparameters on test results: you don't have to pass data to pass information. Another sly way is for the test set not to be significantly disjoint from the training set. If the perturbation is too small, then you aren't testing generalization, you're testing on a slightly noisy training set (and your training should already be introducing noise to help regularize, so you end up just measuring your training performance).
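For the disjointness point, here's a rough sketch of the kind of check I mean: flag test examples that are near-copies of training examples using character shingles and Jaccard similarity. This is a generic technique, not anything taken from your pipeline. The function names, the shingle size of 5, and the 0.8 threshold are all arbitrary choices of mine, and the brute-force comparison only scales to small samples. It also does nothing about leakage through hyperparameter tuning, which needs a held-out set that was never looked at during development.

```python
def shingles(text, k=5):
    """Set of overlapping k-character substrings, after lowercasing and
    collapsing runs of whitespace."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + k] for i in range(max(1, len(normalized) - k + 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(train_texts, test_texts, threshold=0.8, k=5):
    """Return (train_index, test_index, similarity) for every pair whose
    shingle sets overlap at or above the threshold."""
    train_sets = [shingles(t, k) for t in train_texts]
    hits = []
    for j, test_text in enumerate(test_texts):
        test_set = shingles(test_text, k)
        for i, train_set in enumerate(train_sets):
            score = jaccard(train_set, test_set)
            if score >= threshold:
                hits.append((i, j, score))
    return hits
```

If a meaningful share of the test set shows up in the output, the test score is mostly telling you about training performance.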
Your numbers are too good and that's suspect. You need a lot more evidence to suggest they mean what you want them to mean.
Find me a clean public dataset with no AI involvement and I will be happy to report Pangram's false positive rate on it.