Skip to main content

Testsets

A testset is a versioned bag of testcases: an artifact you commit rows into, where each commit produces an immutable revision. Evaluations pin a specific testset revision, so the same run keeps replaying the same inputs even after you add or change rows.

Testsets are versioned. For commit semantics, include_archived, and how revision IDs stay stable, see Versioning. The rest of this page covers what is specific to testsets.

Body

A testset revision commits an ordered list of testcase IDs plus commit metadata. The testcases themselves are immutable blobs scoped to the testset.

A testcase payload:

{
"id": "e44bf373-44ec-52ce-aa5c-a3bd982e783b",
"testset_id": "019d9ca2-0000-0000-0000-000000000000",
"data": {
"country": "Germany",
"capital": "Berlin",
"testcase_dedup_id": "gr-001"
}
}

The schema of a testcase is implicit in the keys of data. There is no separate column list. Adding a column means committing rows that have the new key.

Dedup IDs

A testcase can carry a caller-supplied testcase_dedup_id (inside data) that the server uses to identify the same row across re-imports and edits. If you don't provide one, the server generates one from the content hash.

CSV and older JSON uploads also accept a top-level __dedup_id__ field (deprecated). The server normalises it into data.testcase_dedup_id on its way in.

Testcase immutability

A testcase is content-addressed. Its id is a hash of data plus the testset_id (plus an optional salt), so two testcases with the same content in the same testset collapse to the same row.

Because a testcase is a blob, editing a row does not mutate the existing testcase. The commit writes a new blob with a new ID, then creates a new revision whose testcase list points at the new ID. The old blob is still reachable by its original ID. That is why revision history is reproducible: a revision is just a list of pointers to blobs that never change.

Import and export

Testcases move in and out of a testset as CSV or JSON arrays.

EndpointAction
POST /testsets/revisions/{id}/uploadCommit a new revision from a CSV or JSON file.
POST /testsets/revisions/{id}/downloadDownload the revision's testcases as CSV or JSON.
POST /simple/testsets/uploadCreate a new testset and seed its first revision in one call.
POST /simple/testsets/{id}/uploadCommit a new revision on an existing simple testset.

Each row may include __id__ (preserve an existing testcase ID), __flags__, __tags__, and __meta__. Empty metadata columns are dropped from exports.

Retrieve semantics

POST /testsets/revisions/retrieve resolves a testset, variant, or revision reference to a single revision. Two options control what comes back:

  • include_testcases defaults to true. Full testcase objects are returned unless you opt out.
  • include_testcase_ids defaults to true. The ordered ID list is returned unless you opt out.

For instance, to retrieve a revision with only the ordered ID list and no testcase bodies:

curl -X POST "$AGENTA_HOST/api/testsets/revisions/retrieve" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey $AGENTA_API_KEY" \
-d '{
"testset_ref": {"id": "019d9ca1-0000-0000-0000-000000000000"},
"include_testcases": false
}'
info

This is the opposite default of POST /queries/revisions/retrieve (see Query Pattern), where include_traces defaults to false because trace materialisation is expensive. Keep the asymmetry in mind when you build clients that consume both.

Simple endpoints

The /simple/testsets/ surface collapses the artifact, variant, and latest revision into one flat record with its testcases merged in:

curl "$AGENTA_HOST/api/simple/testsets/019d9ca1-0000-0000-0000-000000000000" \
-H "Authorization: ApiKey $AGENTA_API_KEY"

Use it when you want the "current testset" without tracking lineage. For commits, forks, or specific-revision retrieval, use /testsets/, /testsets/variants/, and /testsets/revisions/. See Simple Endpoints for the general pattern.

Relationship to evaluations

Evaluations are run against a testset. Each evaluation pins a specific testset_revision_id when the run is configured. Committing a new revision on the testset does not retroactively change a pinned run.

Example

Create a testset with two rows, add a row, and retrieve the latest revision.

# 1. Create a simple testset with seed rows.
curl -X POST "$AGENTA_HOST/api/simple/testsets/" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey $AGENTA_API_KEY" \
-d '{
"testset": {
"slug": "country-capitals",
"name": "country-capitals",
"data": {
"testcases": [
{"data": {"country": "France", "capital": "Paris"}},
{"data": {"country": "Japan", "capital": "Tokyo"}}
]
}
}
}'

The response includes id, revision_id, and the testcase IDs that were created.

# 2. Add a row as a new revision using the structured commit endpoint.
curl -X POST "$AGENTA_HOST/api/testsets/revisions/commit" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey $AGENTA_API_KEY" \
-d '{
"testset_revision_commit": {
"testset_id": "019d9ca1-0000-0000-0000-000000000000",
"testset_revision_id": "019d9ca1-0000-0000-0000-000000000001",
"message": "Add Brazil",
"delta": {
"rows": {
"add": [{"data": {"country": "Brazil", "capital": "Brasilia"}}]
}
}
}
}'
# 3. Retrieve the latest revision of the testset.
curl -X POST "$AGENTA_HOST/api/testsets/revisions/retrieve" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey $AGENTA_API_KEY" \
-d '{"testset_ref": {"id": "019d9ca1-0000-0000-0000-000000000000"}}'
info

An evaluation that pinned revision 019d9ca1-0000-0000-0000-000000000001 keeps replaying the original two rows even though the testset now has three.

Lifecycle

Testsets, variants, and revisions are soft-deleted. Use POST /testsets/{id}/archive and POST /testsets/{id}/unarchive to flip deleted_at. The same pattern applies at the variant and revision level. Archiving a testset hides it from /query responses unless the request sets include_archived: true. See Versioning.