1. HW always shades pixels in 2x2 blocks (quads) so it can compute derivatives, even if you don't use them.
2. Accessing SV_PrimitiveID is surprisingly slow on Nvidia/AMD; if you write it out in the PS, you will take a huge perf hit in HW. There are ways to work around this, but they aren't trivial, they differ between vendors, and you have to be aware of the issue in the first place! I think some of the "software" > "hardware" raster stuff may come from this.
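To make point 1 concrete, here is a toy CPU model (a sketch, not how any specific GPU is implemented) that counts how many pixel-shader lanes a triangle launches when shading always happens in 2x2 quads. Assumptions: quads are aligned to even pixel coordinates, coverage is sampled at pixel centers, there is no MSAA, and no top-left fill rule is applied (pixels exactly on an edge count as covered). The names `covered_pixels` and `quad_stats` are made up for illustration.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area of (a, b, p): the sign tells which side of a->b the point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def covered_pixels(tri, width, height):
    """Return the set of (x, y) pixels whose centers fall inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    pixels = set()
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Accept either winding; no top-left tie-breaking (simplification).
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                pixels.add((x, y))
    return pixels


def quad_stats(pixels):
    """Return (covered pixels, launched lanes, lane efficiency) for 2x2 quad shading."""
    quads = {(x // 2, y // 2) for (x, y) in pixels}  # any covered pixel launches its whole quad
    lanes = 4 * len(quads)
    return len(pixels), lanes, (len(pixels) / lanes if lanes else 0.0)


# A tiny triangle covering one pixel still pays for a full quad: 1 useful lane out of 4.
print(quad_stats(covered_pixels(((0.4, 0.4), (0.7, 0.4), (0.4, 0.7)), 4, 4)))
# A large triangle wastes lanes only along its edges.
print(quad_stats(covered_pixels(((0, 0), (64, 0), (0, 64)), 64, 64)))
```

The extra lanes in partially covered quads (helper lanes) run the shader but their results are thrown away, which is why pixel-sized triangles are so expensive for the HW path: in this model the single-pixel triangle runs at 25% lane efficiency, while the large one stays above 98%.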
The HW shader in this demo looks wonky though: it should be writing out the visibility buffer, but instead it writes out a vec4 of color data, so of course that is going to hurt perf. It also passes down way too many varyings.
In a high-triangle-count HW rasterizer you want the visibility buffer PS to do as little compute as possible and write as little as possible, so it should take only 1 or 2 input varyings and simply write them out.
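As an illustration of "write as little as possible": a visibility buffer typically stores a single 32-bit uint per pixel, so the PS output is just a shift and an OR. The sketch below shows that bit math on the CPU. The layout (7 bits of triangle index, 25 bits of cluster ID, i.e. clusters of at most 128 triangles) is an assumption chosen for illustration, not something the demo uses, and the function names are hypothetical. Where the IDs come from (SV_PrimitiveID or a per-vertex varying) is out of scope here.

```python
TRIANGLE_BITS = 7                      # assumption: at most 128 triangles per cluster
CLUSTER_BITS = 32 - TRIANGLE_BITS      # remaining 25 bits identify the cluster
TRIANGLE_MASK = (1 << TRIANGLE_BITS) - 1


def pack_visibility(cluster_id, triangle_id):
    """Pack (cluster, triangle) into one 32-bit value, as a visibility-buffer PS would write it."""
    if not 0 <= triangle_id <= TRIANGLE_MASK:
        raise ValueError("triangle_id out of range")
    if not 0 <= cluster_id < (1 << CLUSTER_BITS):
        raise ValueError("cluster_id out of range")
    return (cluster_id << TRIANGLE_BITS) | triangle_id


def unpack_visibility(value):
    """Recover (cluster, triangle) in the later material/shading pass."""
    return value >> TRIANGLE_BITS, value & TRIANGLE_MASK


packed = pack_visibility(3, 5)
print(packed, unpack_visibility(packed))  # 389 (3, 5)
```

Everything else (barycentrics, normals, UVs, material) is reconstructed later from these IDs, which is what keeps the rasterization pass cheap.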