ChatGPT has a giant system prompt that you have no control over. Try using Llama and create a system prompt with clear instructions and examples. If you were going to use a model in a production system you would also want to either fine tune it or train a BERT-like model as a classifier that just outputs a score. Maybe even more than one for ranking along different dimensions.