Reference/SDK

braintrust

An isomorphic JS library for logging data to Braintrust. braintrust is distributed as a library on NPM. It is also open source and available on GitHub.

Quickstart

Install the library with npm (or yarn).

npm install braintrust

Then, create a file like hello.eval.ts with the following content:

import { Eval } from "braintrust";
 
function isEqual({ output, expected }: { output: string; expected?: string }) {
  return { name: "is_equal", score: output === expected ? 1 : 0 };
}
 
Eval("Say Hi Bot", {
  data: () => {
    return [
      {
        input: "Foo",
        expected: "Hi Foo",
      },
      {
        input: "Bar",
        expected: "Hello Bar",
      },
    ]; // Replace with your eval dataset
  },
  task: (input: string) => {
    return "Hi " + input; // Replace with your LLM call
  },
  scores: [isEqual],
});

Finally, run the script with npx braintrust eval hello.eval.ts.

BRAINTRUST_API_KEY=<YOUR_BRAINTRUST_API_KEY> npx braintrust eval hello.eval.ts

Classes

Interfaces

Functions

BaseExperiment

BaseExperiment<Input, Expected, Metadata>(options?): BaseExperiment<Input, Expected, Metadata>

Use this to specify that the dataset should actually be the data from a previous (base) experiment. If you do not specify a name, Braintrust will automatically figure out the best base experiment to use based on your git history (or fall back to timestamps).

Type parameters

NameType
Inputunknown
Expectedunknown
Metadataextends BaseMetadata = void

Parameters

NameTypeDescription
optionsObject
options.name?stringThe name of the base experiment to use. If unspecified, Braintrust will automatically figure out the best base using your git history (or fall back to timestamps).

Returns

BaseExperiment<Input, Expected, Metadata>


Eval

Eval<Input, Output, Expected, Metadata, EvalReport>(name, evaluator, reporterOrOpts?): Promise<EvalResultWithSummary<Input, Output, Expected, Metadata>>

Type parameters

NameType
InputInput
OutputOutput
Expectedvoid
Metadataextends BaseMetadata = void
EvalReportboolean

Parameters

NameType
namestring
evaluatorEvaluator<Input, Output, Expected, Metadata>
reporterOrOpts?string | ReporterDef<EvalReport> | EvalOptions<EvalReport>

Returns

Promise<EvalResultWithSummary<Input, Output, Expected, Metadata>>


Reporter

Reporter<EvalReport>(name, reporter): ReporterDef<EvalReport>

Type parameters

Name
EvalReport

Parameters

NameType
namestring
reporterReporterBody<EvalReport>

Returns

ReporterDef<EvalReport>


_internalGetGlobalState

_internalGetGlobalState(): BraintrustState

Returns

BraintrustState


_internalSetInitialState

_internalSetInitialState(): void

Returns

void


buildLocalSummary

buildLocalSummary(evaluator, results): ExperimentSummary

Parameters

NameType
evaluatorEvaluatorDef<any, any, any, any>
resultsEvalResult<any, any, any, any>[]

Returns

ExperimentSummary


createFinalValuePassThroughStream

createFinalValuePassThroughStream<T>(onFinal): TransformStream<T, BraintrustStreamChunk>

Create a stream that passes through the final value of the stream. This is used to implement BraintrustStream.finalValue().

Type parameters

NameType
Textends string | { data: string ; type: "text_delta" } | { data: string ; type: "json_delta" } | Uint8Array

Parameters

NameTypeDescription
onFinal(result: unknown) => voidA function to call with the final value of the stream.

Returns

TransformStream<T, BraintrustStreamChunk>

A new stream that passes through the final value of the stream.


currentExperiment

currentExperiment(options?): Experiment | undefined

Returns the currently-active experiment (set by braintrust.init). Returns undefined if no current experiment has been set.

Parameters

NameType
options?OptionalStateArg

Returns

Experiment | undefined


currentLogger

currentLogger<IsAsyncFlush>(options?): Logger<IsAsyncFlush> | undefined

Returns the currently-active logger (set by braintrust.initLogger). Returns undefined if no current logger has been set.

Type parameters

NameType
IsAsyncFlushextends boolean

Parameters

NameType
options?AsyncFlushArg<IsAsyncFlush> & OptionalStateArg

Returns

Logger<IsAsyncFlush> | undefined


currentSpan

currentSpan(options?): Span

Return the currently-active span for logging (set by one of the traced methods). If there is no active span, returns a no-op span object, which supports the same interface as spans but does no logging.

See Span for full details.

Parameters

NameType
options?OptionalStateArg

Returns

Span


devNullWritableStream

devNullWritableStream(): WritableStream

Returns

WritableStream


flush

flush(options?): Promise<void>

Flush any pending rows to the server.

Parameters

NameType
options?OptionalStateArg

Returns

Promise<void>


getSpanParentObject

getSpanParentObject<IsAsyncFlush>(options?): Span | Experiment | Logger<IsAsyncFlush>

Mainly for internal use. Return the parent object for starting a span in a global context.

Type parameters

NameType
IsAsyncFlushextends boolean

Parameters

NameType
options?AsyncFlushArg<IsAsyncFlush> & OptionalStateArg

Returns

Span | Experiment | Logger<IsAsyncFlush>


init

init<IsOpen>(options): InitializedExperiment<IsOpen>

Log in, and then initialize a new experiment in a specified project. If the project does not exist, it will be created.

Type parameters

NameType
IsOpenextends boolean = false

Parameters

NameTypeDescription
optionsReadonly<FullInitOptions<IsOpen>>Options for configuring init().

Returns

InitializedExperiment<IsOpen>

The newly created Experiment.

init<IsOpen>(project, options?): InitializedExperiment<IsOpen>

Legacy form of init which accepts the project name as the first parameter, separately from the remaining options. See init(options) for full details.

Type parameters

NameType
IsOpenextends boolean = false

Parameters

NameType
projectstring
options?Readonly<InitOptions<IsOpen>>

Returns

InitializedExperiment<IsOpen>


initDataset

initDataset<IsLegacyDataset>(options): Dataset<IsLegacyDataset>

Create a new dataset in a specified project. If the project does not exist, it will be created.

Type parameters

NameType
IsLegacyDatasetextends boolean = false

Parameters

NameTypeDescription
optionsReadonly<FullInitDatasetOptions<IsLegacyDataset>>Options for configuring initDataset().

Returns

Dataset<IsLegacyDataset>

The newly created Dataset.

initDataset<IsLegacyDataset>(project, options?): Dataset<IsLegacyDataset>

Legacy form of initDataset which accepts the project name as the first parameter, separately from the remaining options. See initDataset(options) for full details.

Type parameters

NameType
IsLegacyDatasetextends boolean = false

Parameters

NameType
projectstring
options?Readonly<InitDatasetOptions<IsLegacyDataset>>

Returns

Dataset<IsLegacyDataset>


initExperiment

initExperiment<IsOpen>(options): InitializedExperiment<IsOpen>

Alias for init(options).

Type parameters

NameType
IsOpenextends boolean = false

Parameters

NameType
optionsReadonly<InitOptions<IsOpen>>

Returns

InitializedExperiment<IsOpen>

initExperiment<IsOpen>(project, options?): InitializedExperiment<IsOpen>

Alias for init(project, options).

Type parameters

NameType
IsOpenextends boolean = false

Parameters

NameType
projectstring
options?Readonly<InitOptions<IsOpen>>

Returns

InitializedExperiment<IsOpen>


initLogger

initLogger<IsAsyncFlush>(options?): Logger<IsAsyncFlush>

Create a new logger in a specified project. If the project does not exist, it will be created.

Type parameters

NameType
IsAsyncFlushextends boolean = false

Parameters

NameTypeDescription
optionsReadonly<InitLoggerOptions<IsAsyncFlush>>Additional options for configuring init().

Returns

Logger<IsAsyncFlush>

The newly created Logger.


invoke

invoke<Input, Output, Stream>(args): Promise<InvokeReturn<Stream, Output>>

Invoke a Braintrust function, returning a BraintrustStream or the value as a plain Javascript object.

Type parameters

NameType
InputInput
OutputOutput
Streamextends boolean = false

Parameters

NameTypeDescription
argsInvokeFunctionArgs<Input, Output, Stream> & LoginOptions & { forceLogin?: boolean }The arguments for the function (see InvokeFunctionArgs for more details).

Returns

Promise<InvokeReturn<Stream, Output>>

The output of the function.


loadPrompt

loadPrompt(options): Promise<Prompt>

Load a prompt from the specified project.

Parameters

NameTypeDescription
optionsLoadPromptOptionsOptions for configuring loadPrompt().

Returns

Promise<Prompt>

The prompt object.

Throws

If the prompt is not found.

Throws

If multiple prompts are found with the same slug in the same project (this should never happen).

Example

const prompt = await loadPrompt({
  projectName: "My Project",
  slug: "my-prompt",
});

log

log(event): string

Log a single event to the current experiment. The event will be batched and uploaded behind the scenes.

Parameters

NameTypeDescription
eventExperimentLogFullArgsThe event to log. See Experiment.log for full details.

Returns

string

The id of the logged event.


logError

logError(span, error): void

Parameters

NameType
spanSpan
errorunknown

Returns

void


login

login(options?): Promise<BraintrustState>

Log into Braintrust. This will prompt you for your API token, which you can find at https://www.braintrust.dev/app/token. This method is called automatically by init().

Parameters

NameTypeDescription
optionsLoginOptions & { forceLogin?: boolean }Options for configuring login().

Returns

Promise<BraintrustState>


loginToState

loginToState(options?): Promise<BraintrustState>

Parameters

NameType
optionsLoginOptions

Returns

Promise<BraintrustState>


newId

newId(): string

Returns

string


parseCachedHeader

parseCachedHeader(value): number | undefined

Parameters

NameType
valueundefined | null | string

Returns

number | undefined


reportFailures

reportFailures<Input, Output, Expected, Metadata>(evaluator, failingResults, «destructured»): void

Type parameters

NameType
InputInput
OutputOutput
ExpectedExpected
Metadataextends BaseMetadata

Parameters

NameType
evaluatorEvaluatorDef<Input, Output, Expected, Metadata>
failingResultsEvalResult<Input, Output, Expected, Metadata>[]
«destructured»ReporterOpts

Returns

void


setFetch

setFetch(fetch): void

Set the fetch implementation to use for requests. You can specify it here, or when you call login.

Parameters

NameTypeDescription
fetch(input: URL | RequestInfo, init?: RequestInit) => Promise<Response>(input: string | URL | Request, init?: RequestInit) => Promise<Response>MDN Reference

Returns

void


startSpan

startSpan<IsAsyncFlush>(args?): Span

Lower-level alternative to traced. This allows you to start a span yourself, and can be useful in situations where you cannot use callbacks. However, spans started with startSpan will not be marked as the "current span", so currentSpan() and traced() will be no-ops. If you want to mark a span as current, use traced instead.

See traced for full details.

Type parameters

NameType
IsAsyncFlushextends boolean = false

Parameters

NameType
args?StartSpanArgs & AsyncFlushArg<IsAsyncFlush> & OptionalStateArg

Returns

Span


summarize

summarize(options?): Promise<ExperimentSummary>

Summarize the current experiment, including the scores (compared to the closest reference experiment) and metadata.

Parameters

NameTypeDescription
optionsObjectOptions for summarizing the experiment.
options.comparisonExperimentId?stringThe experiment to compare against. If None, the most recent experiment on the origin's main branch will be used.
options.summarizeScores?booleanWhether to summarize the scores. If False, only the metadata will be returned.

Returns

Promise<ExperimentSummary>

A summary of the experiment, including the scores (compared to the closest reference experiment) and metadata.


traceable

traceable<F, IsAsyncFlush>(fn, args?): IsAsyncFlush extends false ? (...args: Parameters<F>) => Promise<Awaited<ReturnType<F>>> : F

A synonym for wrapTraced. If you're porting from systems that use traceable, you can use this to make your codebase more consistent.

Type parameters

NameType
Fextends (...args: any[]) => any
IsAsyncFlushextends boolean = false

Parameters

NameType
fnF
args?StartSpanArgs & SetCurrentArg & AsyncFlushArg<IsAsyncFlush>

Returns

IsAsyncFlush extends false ? (...args: Parameters<F>) => Promise<Awaited<ReturnType<F>>> : F


traced

traced<IsAsyncFlush, R>(callback, args?): PromiseUnless<IsAsyncFlush, R>

Toplevel function for starting a span. It checks the following (in precedence order):

  • Currently-active span
  • Currently-active experiment
  • Currently-active logger

and creates a span under the first one that is active. Alternatively, if parent is specified, it creates a span under the specified parent row. If none of these are active, it returns a no-op span object.

See Span.traced for full details.

Type parameters

NameType
IsAsyncFlushextends boolean = false
Rvoid

Parameters

NameType
callback(span: Span) => R
args?StartSpanArgs & SetCurrentArg & AsyncFlushArg<IsAsyncFlush> & OptionalStateArg

Returns

PromiseUnless<IsAsyncFlush, R>


updateSpan

updateSpan(«destructured»): void

Update a span using the output of span.export(). It is important that you only resume updating to a span once the original span has been fully written and flushed, since otherwise updates to the span may conflict with the original span.

Parameters

NameType
«destructured»{ exported: string } & Omit<Partial<ExperimentEvent>, "id"> & OptionalStateArg

Returns

void


withDataset

withDataset<R, IsLegacyDataset>(project, callback, options?): R

This function is deprecated. Use initDataset instead.

Type parameters

NameType
RR
IsLegacyDatasetextends boolean = false

Parameters

NameType
projectstring
callback(dataset: Dataset<IsLegacyDataset>) => R
optionsReadonly<InitDatasetOptions<IsLegacyDataset>>

Returns

R


withExperiment

withExperiment<R>(project, callback, options?): R

This function is deprecated. Use init instead.

Type parameters

Name
R

Parameters

NameType
projectstring
callback(experiment: Experiment) => R
optionsReadonly<LoginOptions & { forceLogin?: boolean } & { baseExperiment?: string ; baseExperimentId?: string ; dataset?: AnyDataset ; description?: string ; experiment?: string ; gitMetadataSettings?: { collect: "some" | "none" | "all" ; fields?: ("dirty" | "tag" | "commit" | "branch" | "author_name" | "author_email" | "commit_message" | "commit_time" | "git_diff")[] } ; isPublic?: boolean ; metadata?: Record<string, unknown> ; projectId?: string ; repoInfo?: { author_email?: null | string ; author_name?: null | string ; branch?: null | string ; commit?: null | string ; commit_message?: null | string ; commit_time?: null | string ; dirty?: null | boolean ; git_diff?: null | string ; tag?: null | string } ; setCurrent?: boolean ; state?: BraintrustState ; update?: boolean } & InitOpenOption<false> & SetCurrentArg>

Returns

R


withLogger

withLogger<IsAsyncFlush, R>(callback, options?): R

This function is deprecated. Use initLogger instead.

Type parameters

NameType
IsAsyncFlushextends boolean = false
Rvoid

Parameters

NameType
callback(logger: Logger<IsAsyncFlush>) => R
optionsReadonly<LoginOptions & { forceLogin?: boolean } & { projectId?: string ; projectName?: string ; setCurrent?: boolean ; state?: BraintrustState } & AsyncFlushArg<IsAsyncFlush> & SetCurrentArg>

Returns

R


wrapAISDKModel

wrapAISDKModel<T>(model): T

Wrap an ai-sdk model (created with .chat(), .completion(), etc.) to add tracing. If Braintrust is not configured, this is a no-op

Type parameters

NameType
Textends object

Parameters

NameType
modelT

Returns

T

The wrapped object.


wrapOpenAI

wrapOpenAI<T>(openai): T

Wrap an OpenAI object (created with new OpenAI(...)) to add tracing. If Braintrust is not configured, this is a no-op

Currently, this only supports the v4 API.

Type parameters

NameType
Textends object

Parameters

NameType
openaiT

Returns

T

The wrapped OpenAI object.


wrapOpenAIv4

wrapOpenAIv4<T>(openai): T

Type parameters

NameType
Textends OpenAILike

Parameters

NameType
openaiT

Returns

T


wrapTraced

wrapTraced<F, IsAsyncFlush>(fn, args?): IsAsyncFlush extends false ? (...args: Parameters<F>) => Promise<Awaited<ReturnType<F>>> : F

Wrap a function with traced, using the arguments as input and return value as output. Any functions wrapped this way will automatically be traced, similar to the @traced decorator in Python. If you want to correctly propagate the function's name and define it in one go, then you can do so like this:

const myFunc = wrapTraced(async function myFunc(input) {
  const result = await client.chat.completions.create({
    model: "gpt-3.5-turbo",
    messages: [{ role: "user", content: input }],
  });
  return result.choices[0].message.content ?? "unknown";
});

Now, any calls to myFunc will be traced, and the input and output will be logged automatically. If tracing is inactive, i.e. there is no active logger or experiment, it's just a no-op.

Type parameters

NameType
Fextends (...args: any[]) => any
IsAsyncFlushextends boolean = false

Parameters

NameTypeDescription
fnFThe function to wrap.
args?StartSpanArgs & SetCurrentArg & AsyncFlushArg<IsAsyncFlush>Span-level arguments (e.g. a custom name or type) to pass to traced.

Returns

IsAsyncFlush extends false ? (...args: Parameters<F>) => Promise<Awaited<ReturnType<F>>> : F

The wrapped function.

Type Aliases

AnyDataset

Ƭ AnyDataset: Dataset<boolean>


BaseExperiment

Ƭ BaseExperiment<Input, Expected, Metadata>: Object

Type parameters

NameType
InputInput
ExpectedExpected
Metadataextends BaseMetadata = DefaultMetadataType

Type declaration

NameType
_phantom?[Input, Expected, Metadata]
_type"BaseExperiment"
name?string

BaseMetadata

Ƭ BaseMetadata: Record<string, unknown> | void


BraintrustStreamChunk

Ƭ BraintrustStreamChunk: z.infer<typeof braintrustStreamChunkSchema>

A chunk of data from a Braintrust stream. Each chunk type matches an SSE event type.


ChatPrompt

Ƭ ChatPrompt: Object

Type declaration

NameType
messagesOpenAIMessage[]
tools?any[]

CommentEvent

Ƭ CommentEvent: IdField & { _audit_metadata?: Record<string, unknown> ; _audit_source: Source ; comment: { text: string } ; created: string ; origin: { id: string } } & ParentExperimentIds | ParentProjectLogIds


CompiledPrompt

Ƭ CompiledPrompt<Flavor>: CompiledPromptParams & { span_info?: { metadata: { prompt: { id: string ; project_id: string ; variables: Record<string, unknown> ; version: string } } ; name?: string ; spanAttributes?: Record<any, any> } } & Flavor extends "chat" ? ChatPrompt : Flavor extends "completion" ? CompletionPrompt : {}

Type parameters

NameType
Flavorextends "chat" | "completion"

CompiledPromptParams

Ƭ CompiledPromptParams: Omit<NonNullable<PromptData["options"]>["params"], "use_cache"> & { model: NonNullable<NonNullable<PromptData["options"]>["model"]> }


CompletionPrompt

Ƭ CompletionPrompt: Object

Type declaration

NameType
promptstring

DatasetRecord

Ƭ DatasetRecord<IsLegacyDataset>: IsLegacyDataset extends true ? LegacyDatasetRecord : NewDatasetRecord

Type parameters

NameType
IsLegacyDatasetextends boolean = typeof DEFAULT_IS_LEGACY_DATASET

DefaultMetadataType

Ƭ DefaultMetadataType: void


DefaultPromptArgs

Ƭ DefaultPromptArgs: Partial<CompiledPromptParams & AnyModelParam & ChatPrompt & CompletionPrompt>


EndSpanArgs

Ƭ EndSpanArgs: Object

Type declaration

NameType
endTime?number

EvalCase

Ƭ EvalCase<Input, Expected, Metadata>: { input: Input ; tags?: string[] } & Expected extends void ? {} : { expected: Expected } & Metadata extends void ? {} : { metadata: Metadata }

Type parameters

Name
Input
Expected
Metadata

EvalScorerArgs

Ƭ EvalScorerArgs<Input, Output, Expected, Metadata>: EvalCase<Input, Expected, Metadata> & { output: Output }

Type parameters

NameType
InputInput
OutputOutput
ExpectedExpected
Metadataextends BaseMetadata = DefaultMetadataType

EvalTask

Ƭ EvalTask<Input, Output>: (input: Input, hooks: EvalHooks) => Promise<Output> | (input: Input, hooks: EvalHooks) => Output

Type parameters

Name
Input
Output

ExperimentLogFullArgs

Ƭ ExperimentLogFullArgs: Partial<Omit<OtherExperimentLogFields, "output" | "scores">> & Required<Pick<OtherExperimentLogFields, "output" | "scores">> & Partial<InputField | InputsField> & Partial<IdField>


ExperimentLogPartialArgs

Ƭ ExperimentLogPartialArgs: Partial<OtherExperimentLogFields> & Partial<InputField | InputsField>


FullInitOptions

Ƭ FullInitOptions<IsOpen>: { project?: string } & InitOptions<IsOpen>

Type parameters

NameType
IsOpenextends boolean

FullLoginOptions

Ƭ FullLoginOptions: LoginOptions & { forceLogin?: boolean }


IdField

Ƭ IdField: Object

Type declaration

NameType
idstring

InitOptions

Ƭ InitOptions<IsOpen>: FullLoginOptions & { baseExperiment?: string ; baseExperimentId?: string ; dataset?: AnyDataset ; description?: string ; experiment?: string ; gitMetadataSettings?: GitMetadataSettings ; isPublic?: boolean ; metadata?: Record<string, unknown> ; projectId?: string ; repoInfo?: RepoInfo ; setCurrent?: boolean ; state?: BraintrustState ; update?: boolean } & InitOpenOption<IsOpen>

Type parameters

NameType
IsOpenextends boolean

InputField

Ƭ InputField: Object

Type declaration

NameType
inputunknown

InputsField

Ƭ InputsField: Object

Type declaration

NameType
inputsunknown

InvokeReturn

Ƭ InvokeReturn<Stream, Output>: Stream extends true ? BraintrustStream : Output

The return type of the invoke function. Conditionally returns a BraintrustStream if stream is true, otherwise returns the output of the function using the Zod schema's type if present.

Type parameters

NameType
Streamextends boolean
OutputOutput

LogCommentFullArgs

Ƭ LogCommentFullArgs: IdField & { _audit_metadata?: Record<string, unknown> ; _audit_source: Source ; comment: { text: string } ; created: string ; origin: { id: string } } & ParentExperimentIds | ParentProjectLogIds


LogFeedbackFullArgs

Ƭ LogFeedbackFullArgs: IdField & Partial<Omit<OtherExperimentLogFields, "output" | "metrics" | "datasetRecordId"> & { comment: string ; source: Source }>


OtherExperimentLogFields

Ƭ OtherExperimentLogFields: Object

Type declaration

NameType
datasetRecordIdstring
errorunknown
expectedunknown
metadataRecord<string, unknown>
metricsRecord<string, unknown>
outputunknown
scoresRecord<string, number | null>
tagsstring[]

PromiseUnless

Ƭ PromiseUnless<B, R>: B extends true ? R : Promise<Awaited<R>>

Type parameters

Name
B
R

SerializedBraintrustState

Ƭ SerializedBraintrustState: z.infer<typeof loginSchema>


SetCurrentArg

Ƭ SetCurrentArg: Object

Type declaration

NameType
setCurrent?boolean

StartSpanArgs

Ƭ StartSpanArgs: Object

Type declaration

NameType
event?StartSpanEventArgs
name?string
parent?string
spanAttributes?Record<any, any>
startTime?number
type?SpanType

WithTransactionId

Ƭ WithTransactionId<R>: R & { _xact_id: TransactionId }

Type parameters

Name
R

Variables

LEGACY_CACHED_HEADER

Const LEGACY_CACHED_HEADER: "x-cached"


NOOP_SPAN

Const NOOP_SPAN: NoopSpan


X_CACHED_HEADER

Const X_CACHED_HEADER: "x-bt-cached"


braintrustStreamChunkSchema

Const braintrustStreamChunkSchema: ZodUnion<[ZodObject<{ data: ZodString ; type: ZodLiteral<"text_delta"> }, "strip", ZodTypeAny, { data: string ; type: "text_delta" }, { data: string ; type: "text_delta" }>, ZodObject<{ data: ZodString ; type: ZodLiteral<"json_delta"> }, "strip", ZodTypeAny, { data: string ; type: "json_delta" }, { data: string ; type: "json_delta" }>]>