Skip to main content

How to run an evaluation asynchronously

We can run evaluations asynchronously via the SDK using aevaluate(), which accepts all of the same arguments as evaluate() but expects the application function to be asynchronous. You can learn more about how to use the evaluate() function here.

Python only

This guide is only relevant when using the Python SDK. In JS/TS the evaluate() function is already async. You can see how to use it here.

Use aevaluate()

Requires langsmith>=0.2.0

from langsmith import aevaluate, wrappers, Client
from openai import AsyncOpenAI

# Optionally wrap the OpenAI client to trace all model calls.
oai_client = wrappers.wrap_openai(AsyncOpenAI())

# Optionally add the 'traceable' decorator to trace the inputs/outputs of this function.
@traceable
async def researcher_app(inputs: dict) -> str:
instructions = """You are an excellent researcher. Given a high-level research idea, \

list 5 concrete questions that should be investigated to determine if the idea is worth pursuing."""

response = await oai_client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": instructions},
{"role": "user", "content": inputs["idea"]},
],
)
return response.choices[0].message.content

# Evaluator functions can be sync or async
def concise(inputs: dict, output: dict) -> bool:
return len(output["output"]) < 3 * len(inputs["idea"])

ls_client = Client()

examples = ["universal basic income", "nuclear fusion", "hyperloop", "nuclear powered rockets"]
dataset = ls_client.create_dataset("research ideas")
ls_client.create_examples(
dataset_name=dataset.name,
inputs=[{"idea": e} for e in examples,
)

results = await aevaluate(
researcher_app,
data=dataset,
evaluators=[concise],
# Optional, no max_concurrency by default but it is recommended to set one.
max_concurrency=2,
experiment_prefix="gpt-4o-mini-baseline" # Optional, random by default.
)

Was this page helpful?


You can leave detailed feedback on GitHub.