Glossary

A - C - G - I - M - P - R - S - T - U

A

AI Proxy

The Scorecard AI Proxy is a feature that allows multiple LLM APIs to be managed using a standardized format. The AI Proxy makes it easy to switch between models on different platforms (e.g. Azure, OpenAI, Cohere) without rewriting code.
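
As an illustration, here is a minimal sketch assuming the proxy exposes an OpenAI-compatible endpoint; the base URL, API key, and model identifiers below are placeholders, not actual proxy configuration:

```python
# Minimal sketch: calling models from different providers through one
# OpenAI-compatible endpoint. Base URL, key, and model names are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://proxy.example.com/v1",  # hypothetical proxy endpoint
    api_key="YOUR_API_KEY",
)

# The request format stays the same across providers; switching models
# is just a change of the model identifier.
for model in ["gpt-4o-mini", "command-r"]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Say hello in one word."}],
    )
    print(model, "->", response.choices[0].message.content)
```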

C

Cookbooks

The Scorecard Cookbooks are code examples showing how to implement Scorecard end-to-end. In contrast to the Guides, which each explain a single action, a cookbook implements a complete use case from beginning to end.

Context

A context is an optional input field of a Testcase that supplements the user query input field with additional background information. A context is particularly useful in applications that require access to a knowledge base or dialog history.

G

Guides

The Scorecard Guides are step-by-step instructions for learning how to use Scorecard. Each guide explains a specific action on the Scorecard platform and walks users through it to achieve a particular goal.

I

Ideal Response

An ideal response is an optional input field of a Testcase that defines the ideal response to a given user query. The ideal response can be specific text, structured data, or criteria that the LLM’s response must meet.

M

Metric

A metric is a standard of measurement. In LLM evaluation, a metric serves as a benchmark for assessing the quality of LLM responses. When defining metrics in Scorecard, users can choose between predefined Core Metrics (including metrics from MLflow and RAGAS) and custom metrics.
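
As a toy illustration of what a simple custom metric might compute (plain Python; this is not Scorecard’s custom-metric interface):

```python
# Toy custom metric: normalized exact match between an LLM response and
# the ideal response. Illustrative only, not Scorecard's interface.
def exact_match(response: str, ideal_response: str) -> float:
    """Return 1.0 if the normalized texts match, else 0.0."""
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())
    return 1.0 if normalize(response) == normalize(ideal_response) else 0.0

print(exact_match("  Paris ", "paris"))  # 1.0
print(exact_match("Paris.", "paris"))    # 0.0 (punctuation differs)
```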

P

Playground

The Scorecard Playground is a tool for quickly trying out and iterating on prompts or models by running single Testcases or entire Testsets in a mock environment. It allows users to evaluate and refine their configurations without executing their production LLM system.

Prompt

In the context of large language models (LLMs), a prompt refers to the specific instructions a user provides to produce an expected output.

R

Retrieval Augmented Generation (RAG)

RAG stands for Retrieval Augmented Generation, a technique in AI where a model retrieves relevant information from a database to help generate more accurate and informative responses. It combines the strengths of information retrieval and text generation.
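
A toy sketch of the retrieve-then-generate pattern, with keyword overlap standing in for a real retriever:

```python
# Toy RAG pipeline: retrieve the most relevant document by word overlap,
# then build an augmented prompt for the generator. Illustrative only.
KNOWLEDGE_BASE = [
    "Scorecard Testsets are collections of Testcases.",
    "A Scoring Config bundles metrics for repeatable evaluation.",
]

def retrieve(query: str, docs: list[str]) -> str:
    """Pick the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str) -> str:
    context = retrieve(query, KNOWLEDGE_BASE)
    return f"Context: {context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is a Scoring Config?"))
```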

Run

A Run in Scorecard refers to the process of evaluating a Testset against defined metrics or a Scoring Config. The results of a Scorecard Run can be inspected under “Runs & Results” in the Scorecard UI.
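
Conceptually, a run loops over a Testset, invokes the system under test, and scores each response. A plain-Python sketch of that idea (not the Scorecard SDK):

```python
# Conceptual sketch of a run: evaluate every Testcase in a Testset
# against a set of metric functions. Plain Python, not the Scorecard SDK.
def run_evaluation(testset, system, metrics):
    """Call the system under test on each Testcase, then apply each metric."""
    results = []
    for case in testset:
        response = system(case["user_query"])
        scores = {name: metric(response, case) for name, metric in metrics.items()}
        results.append({"testcase": case, "response": response, "scores": scores})
    return results
```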

S

Scorecard API

The Scorecard API is the application programming interface (API) of the Scorecard platform and offers various API endpoints to integrate with.

Scorecard platform

The Scorecard platform includes the Scorecard UI, the Scorecard SDKs, and the Scorecard API.

Scorecard SDK

Scorecard currently offers software development kits (SDKs) for two programming languages: Python and Node.js. The SDKs help you get Scorecard up and running in no time and provide a user-friendly development experience.

Scorecard UI

The Scorecard UI is the graphical user interface of the Scorecard platform. Among other things, it provides the Scorecard Playground and dashboards to visualize LLM evaluation results. You can access the Scorecard UI here.

Scoring Config

A Scorecard Scoring Config is a collection of metrics that are used together to score an LLM application. Instead of manually selecting the same metrics each time an application is scored, defining this set of metrics once makes it easy to repeatedly score in the same way. Different Scoring Configs can serve different purposes while ensuring consistency in testing, e.g. a Scoring Config for RAG applications, a Scoring Config for translation use cases, etc.
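
One way to picture a Scoring Config is as a named, reusable bundle of metrics. The field and metric names below are illustrative, not the Scorecard schema:

```python
# Illustrative shape of a Scoring Config: a reusable, named set of
# metrics. Field and metric names are hypothetical.
rag_scoring_config = {
    "name": "RAG applications",
    "metrics": ["faithfulness", "answer_relevance", "context_precision"],
}

translation_scoring_config = {
    "name": "Translation use cases",
    "metrics": ["bleu", "fluency", "adequacy"],
}
```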

Span

A span represents a single operation or sub-activity within a trace, offering a focused view of one portion of an LLM system’s activity. Each span contains metadata such as timestamps, duration, and contextual information.

T

Testset

A Scorecard Testset is a collection of Testcases used to evaluate the performance of an LLM application across a variety of inputs and scenarios.

Testcase

A Scorecard Testcase is an individual input to an LLM that is used for scoring. It consists of an optional user query to the LLM, an optional context, an optional ideal response to the given user query, and additional optional custom fields (see the API reference for more information).
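
Sketched as a Python dict, a Testcase might look like the following; the field names are illustrative, so consult the API reference for the authoritative schema:

```python
# Illustrative Testcase shape; all fields are optional. Field names are
# hypothetical -- see the API reference for the authoritative schema.
testcase = {
    "user_query": "What is our refund policy?",
    "context": "Refunds are accepted within 30 days of purchase.",
    "ideal_response": "Purchases can be refunded within 30 days.",
    "custom_fields": {"product_line": "subscriptions"},
}
```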

Trace

A trace represents an overall record of the activity of an LLM system: an end-to-end execution path from user input to LLM output. A single trace consists of a series of time intervals known as spans.
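
A minimal sketch of the trace/span relationship using dataclasses (an illustrative structure, not a real tracing library’s API):

```python
# Minimal trace/span sketch: a trace is an ordered series of timed
# spans, each carrying metadata. Illustrative, not a real tracing API.
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    start: float  # epoch seconds
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start

@dataclass
class Trace:
    spans: list[Span] = field(default_factory=list)

trace = Trace(spans=[
    Span("retrieve_context", start=0.00, end=0.12),
    Span("llm_generation", start=0.12, end=1.45),
])
print([(s.name, round(s.duration, 2)) for s in trace.spans])
```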

Tracing

Tracing records the flow of requests or transactions across the different components of an LLM system. This helps identify where potential delays occur and how services interact with each other. Relevant events, errors, and activities are documented in traces.

Trust Center

The Scorecard Trust Center is a centralized hub that provides information about Scorecard’s security, privacy, compliance, and data protection practices, designed to build trust with customers and stakeholders.

U

User Query

A user query is an optional input field of a Testcase that defines the user’s prompt to the LLM. A user query can range from a simple text message to complex structured data.