Continuous Prompt Evaluation

Rank The Best LLMs For Your AI Needs

Evaluate prompts across every major LLM in real time, so your team makes performance-driven decisions that scale with your needs.

We'll email you when TDPrompt is publicly online.

Why choose TDPrompt?

Adaptability at
speed

Stay agile: identify and switch to the optimal LLM as performance shifts, without disrupting your workflow.

Know which LLM
wins

Obtain prompt accuracy and performance metrics that go beyond standard benchmarks.

Quality that
keeps pace

Seamlessly transition your entire QA team into an AI Eval Engineering team.

Prompt Evaluation features ready for you

Real time
model ranking

Make instant decisions based on your live LLM ranking, comparing models on prompt accuracy, latency, token consumption, and cost.

Continuous
performance monitoring

Continuously evaluate your LLM prompt performance, onboard new models as they reach the market, and seamlessly upgrade to new versions.

Advanced reporting
and dashboards

Compare models and share insights with stakeholders through intuitive dashboards, customizable queries and detailed reports.

Built on Mocha
testing framework

Leverage the reliability and familiarity of battle tested frameworks your engineering team already knows and trusts.

What can you test with TDPrompt?

From basic LLM prompts to the most complex scenarios, the TDPrompt library allows your test suites to be evaluated against many LLMs and generates evaluation metrics in a single step.

Examples

A minimal TDPrompt setup: initialize the library with the needed models, send a "request for arithmetic operations" prompt, and verify the LLM response contains the numeric result.

test/plain-vanilla.js

A sample restaurant order agent: it initializes a chat conversation (system + user messages), prompts the LLM with a complex, rule-driven scenario (including business rules, constraints, and required fields), and asserts that the model returns a strict JSON object with the initially ordered items.

test/restaurant-order-creation.js

Validate the restaurant order prompts by iterating the chat conversation and asking for updates to the initial order, then asserting that the model returns a strict JSON object whose items reflect the current state.

test/restaurant-order-update.js

This example demonstrates prompt-accuracy testing by verifying that the returned order accurately represents the most recent state: it asserts that each item is present, that quantities match the expected values, and that any updates or modifications are correctly reflected in the model's JSON response.

test/scenario-restaurant-order-validate-state.js

Evaluation Metrics

Simply run your tests as you always do; there is no need to reinvent the wheel.

test/run-tdprompt-test.sh
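Since the suites are plain Mocha tests, the runner script could be as simple as the sketch below, assuming Mocha is installed as a dev dependency and the tests live under test/.

```shell
#!/usr/bin/env sh
# Hypothetical runner script; assumes Mocha is available via npx and the
# TDPrompt test suites live under test/.
npx mocha test/ --reporter spec
```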

Generate custom LLM ranking reports with detailed prompt evaluation metrics, including accuracy, latency, token usage, and cost analysis.

test/run-tdprompt-report.sh
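The reporting step might be wrapped in a script like the sketch below. The `tdprompt` CLI name and its flags are assumptions for illustration, not documented commands.

```shell
#!/usr/bin/env sh
# Hypothetical report invocation; the `tdprompt` command and its flags are
# assumed names, not a documented interface.
npx tdprompt report --metrics accuracy,latency,tokens,cost --format html
```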