
Generally, the easiest approach:

1. Sample a set of prompts / answers from historical usage.

2. Run those prompts through several frontier models again; where the models disagree on an answer, hand-pick the one you're looking for.

3. Test different models using OpenRouter and score each along cost / speed / accuracy dimensions against your test set.

4. Analyze the results and pick the best, then prompt-optimize to make it even better. Repeat as needed.
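
Steps 2 and 3 could be sketched roughly like this in Python. Everything here is an assumption for illustration: a "model" is just a hypothetical callable that takes a prompt and returns (answer, cost in USD), which you would wrap around whatever client you use (OpenRouter or otherwise), and accuracy is plain exact-match, which only makes sense for short, closed-form answers.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: prompt -> (answer text, cost in USD for that call).
ModelFn = Callable[[str], Tuple[str, float]]


def build_test_set(
    prompts: List[str], reference_models: Dict[str, ModelFn]
) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Step 2: label prompts where all reference models agree.

    Returns (labeled, needs_review). Prompts in needs_review map to each
    model's answer so a human can hand-pick the expected one.
    """
    labeled: Dict[str, str] = {}
    needs_review: Dict[str, Dict[str, str]] = {}
    for prompt in prompts:
        answers = {
            name: fn(prompt)[0].strip() for name, fn in reference_models.items()
        }
        if len(set(answers.values())) == 1:
            labeled[prompt] = next(iter(answers.values()))
        else:
            needs_review[prompt] = answers
    return labeled, needs_review


@dataclass
class Score:
    accuracy: float        # fraction of exact-match answers
    total_cost: float      # summed per-call cost in USD
    mean_latency_s: float  # average wall-clock seconds per prompt


def score_model(fn: ModelFn, test_set: Dict[str, str]) -> Score:
    """Step 3: score one candidate on cost / speed / accuracy."""
    if not test_set:
        raise ValueError("test_set is empty")
    correct = 0
    total_cost = 0.0
    elapsed = 0.0
    for prompt, expected in test_set.items():
        start = time.perf_counter()
        answer, cost = fn(prompt)
        elapsed += time.perf_counter() - start
        total_cost += cost
        if answer.strip() == expected:
            correct += 1
    n = len(test_set)
    return Score(correct / n, total_cost, elapsed / n)
```

After resolving needs_review by hand and merging it into the labeled set, call score_model once per candidate and compare the resulting Score values. For free-form answers you would likely swap the exact-match check for a rubric or a judge model.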




