Annotation Standard

Document Version: 2.0
Last Updated: 2025-10-01
Target Audience: Session recorders, annotators, reviewers

Glossary

Session: An asciinema recording of a terminal session, typically showing the deployment or configuration of software.

Annotated Session: A session that has structured annotations added to it, conforming to this specification.

Recording: The raw asciinema capture of terminal activity.

Annotation: Structured metadata added to a recording that describes what is happening, why, and the outcome.

Timeline: A hierarchical structure in the annotator tool in which annotations are organized. Each goal level has its own timeline.

Goal Hierarchy: The nested structure of goals, subgoals, and sub-subgoals that describe the overall task and its component steps.

Annotation Tag: A specific marker type (e.g., goal, mistake, successResult) used to categorize parts of the session.

Person A (Recorder): The individual(s) who created the original terminal session recording.

Person B (Annotator): The individual who interviews Person A and adds structured annotations to the recording.

.session File: A zip file containing the annotated recording, the audio recordings and their transcriptions, and metadata conforming to this specification.

Overview & Purpose

Purpose of this document

This document exists to:

Provide an annotation process.
Provide an annotation standard.
Collect guidance for people participating in annotation.

What are Annotated Sessions?

An annotated session is an asciinema recording of terminal activity (typically software deployment) that has been enriched with structured annotations describing:

What actions were performed
Why those actions were taken
Whether goals were achieved
What mistakes occurred and their consequences
What discoveries changed the approach

Why Create Them?

The AutoDocs project aims to train Large Language Models (LLMs) to understand and work with terminal recordings. Annotated sessions serve as training data, teaching models to:

Recognize common deployment patterns
Understand error recovery strategies
Learn from human problem-solving approaches
Eventually perform automatic annotation

Once sufficient manual annotations exist, trained models can automatically annotate new sessions, enabling advanced tooling like automated documentation generation.

Document Purpose

This document provides:

An annotation process for creating annotated sessions
An annotation standard defining what to annotate and how
Guidelines for annotators to ensure consistency and quality

The Annotation Process

The process of creating an annotated session involves four distinct phases:

Phase 1: Recording → Phase 2: Interview → Phase 3: Annotation → Phase 4: Submission

Phase 1: Recording

Objective: Create a terminal recording of meaningful operations work, software development, or other productive terminal-based activities.
Who: Person A (one or more individuals)

Steps:

Set up the recording environment:
- Use a Debian Stable system (see Appendix D)
- Upgrade to unstable if required packages are missing
- Reuse a VM from a previous deployment, or start with a new one
Start the asciinema recording client
Attempt to do useful work
Keep all input/output inside the terminal:
- Use CLI web tools (curl, wget, lynx, links, etc.)
Stop recording when work is done

Output: .cast asciinema recording file

Common Issues:

Using GUI tools
Stopping recording too early
Forgetting --stdin on asciinema's command line
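Before moving on to the interview, you can check that the .cast file actually captured keyboard input. The sketch below assumes the asciicast v2 file format written by asciinema 2.x. In that format the first line is a JSON header, and each following line is one [time, code, data] event. Events with code "i" are only present when input capture (--stdin) was enabled. Newer asciinema releases may write a different format version; in that case the sketch raises an error instead of guessing. The function names are hypothetical and not part of asciinema.

```python
import json


def check_cast(path):
    """Read an asciicast v2 file and count its events by type.

    Returns (header, counts), where counts maps each event code
    ("o" = terminal output, "i" = keyboard input) to its number of events.
    """
    counts = {}
    with open(path, encoding="utf-8") as f:
        # Line 1 is a JSON object describing the recording.
        header = json.loads(f.readline())
        if header.get("version") != 2:
            raise ValueError(f"expected asciicast v2, got version {header.get('version')!r}")
        # Every following non-empty line is a JSON array: [time, code, data].
        for line in f:
            if not line.strip():
                continue
            _time, code, _data = json.loads(line)
            counts[code] = counts.get(code, 0) + 1
    return header, counts


def has_keyboard_input(path):
    """True if keystrokes were captured, i.e. the recording contains "i" events."""
    _header, counts = check_cast(path)
    return counts.get("i", 0) > 0
```

If has_keyboard_input returns False for a fresh recording, --stdin was most likely left off the asciinema command line.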

Phase 2: Interview

Objective: Ensure the annotator fully understands what was done and why.
Who: Person A and Person B

Steps:

Record audio of both sides separately
Review the recording together (recommended)
Explain what was done and why
Record follow-up clarification conversations if needed

Output:

Audio recordings (both sides)
Full understanding of actions and rationale

Why Separate Audio Recordings?

Better transcription quality
Clear speaker attribution
Higher-quality training data

Recommended ways to review the recording together:

tmate + asciinema playback can work (during playback, holding the . key fast-forwards)
Screen share while watching the recording
Screen share while using the annotator tool

If you break into a separate “experimental” session during the meeting:

This is allowed and often useful.
Record that new session with asciinema as well, so it can be reviewed and understood by someone who was not present.

Phase 3: Annotation

Objective: Add structured annotations using the annotator tool.
Who: Person B

Steps:

Load the recording
Create timelines for goal hierarchy levels
Apply annotations per standard (see Annotation Standard)
Review for completeness and consistency
Note the version of this specification used for the annotation (the Document Version stated at the top of this document)
If clarification is needed, contact Person A:
- Record audio of both ends of the clarification conversation
- Include the audio (and later transcription) in the .session package

Annotator tool:

Use the annotator tool hosted at: https://github.com/arthurwolf/annotator
(If a specific branch is required for a cohort/release, use the branch specified by the project, e.g. from_students when applicable.)

Output: Annotated recording

Phase 4: Submission

Objective: Package and submit the session.

Steps:

Transcribe audio with Whisper (with timestamps)
Create .session archive
Include required files (see Technical Specification)
Upload to repository
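Whisper's transcription output includes start and end times for each segment. The sketch below turns those segments into a timestamped, speaker-labelled transcript. It assumes the segments have the shape of the dictionaries the openai-whisper Python package returns under result["segments"], each with "start", "end", and "text". The line format and the helper names are illustrative assumptions; this specification does not fix a transcript format.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_transcript(segments, speaker):
    """Render Whisper-style segments as one timestamped line per segment."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {speaker}: {seg['text'].strip()}")
    return "\n".join(lines)
```

With the openai-whisper package installed, the segments would come from a call such as whisper.load_model("base").transcribe("personA.wav")["segments"]. The file name is hypothetical. Running the helper once per speaker's audio file keeps the speaker attribution that separate recordings provide.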

Annotation Standard

This section defines what to annotate and how.

Goal hierarchy and timelines

Goals, subgoals, sub-subgoals (and deeper levels as needed) form a goal hierarchy.
Ideally, every part of the session should belong to a goal.
Subgoals should represent meaningful “chunks” of work contributing to a parent goal.
Goals, subgoals, and sub-(sub…) goals should exist as separate timelines in the annotator tool:
- Think: one “Goal” timeline spanning the whole file,
- Subgoal timelines covering sections that represent large steps,
- Deeper timelines for finer-grained structure as needed.
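One way to picture the goal hierarchy is as a tree whose depth levels become the annotator's timelines. The sketch below is a hypothetical in-memory model, not the annotator tool's data format. Goal, timelines, and misnested are illustrative names, and times are assumed to be seconds into the recording.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    label: str    # e.g. "[Goal_001] Deploy GNU package foo"
    start: float  # seconds into the recording
    end: float
    children: list = field(default_factory=list)  # nested Goal objects


def timelines(root):
    """Group goals by depth: [0] is the Goal timeline, [1] the subgoals, and so on."""
    levels = []
    frontier = [root]
    while frontier:
        levels.append([(g.label, g.start, g.end) for g in frontier])
        # The next timeline holds every child of every goal on this one.
        frontier = [child for g in frontier for child in g.children]
    return levels


def misnested(goal):
    """Return labels of descendants whose span leaks outside their parent's span."""
    bad = []
    for child in goal.children:
        if child.start < goal.start or child.end > goal.end:
            bad.append(child.label)
        bad.extend(misnested(child))
    return bad
```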

Success markers

Use success markers to capture whether a goal (or subgoal) was achieved at the moment where the result is observable.

successResult — the goal was achieved (success is visible in the recording output)
successFailure — the goal failed (failure is visible)
successUnknown — success cannot be determined from the recording (or cannot possibly be visible)

When to apply success markers:

Apply success markers only when the success/failure of a command corresponds to the attempted end of a goal.
Example:
- If a goal is “get my program to compile”, then the output of make (where compilation occurs) is the target moment.
For activities where the result cannot be observed (e.g., some parts of OS installs), successUnknown is acceptable.
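Part of the placement rule can be expressed as a simple check. This sketch assumes each marker has a tag and a time, and that the goal it reports on has a known start and end. check_success_marker is a hypothetical helper. Being inside the goal's span is only a necessary condition; a human still has to judge whether the moment is really the attempted end of the goal.

```python
SUCCESS_TAGS = {"successResult", "successFailure", "successUnknown"}


def check_success_marker(tag, time, goal_start, goal_end):
    """Return a list of problems with one success marker; empty means it passes."""
    problems = []
    if tag not in SUCCESS_TAGS:
        problems.append(f"unknown success tag {tag!r}")
    # A marker reports on one goal, so it should sit within that goal's span.
    if not goal_start <= time <= goal_end:
        problems.append("marker lies outside the goal it reports on")
    return problems
```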

Tools and subtools (commands and interfaces)

Commands can be annotated as tools/subtools.
Provide:
- An in-context description (considering the current goal and what has happened so far)
- Optionally, an out-of-context description (what the command generally does)

This includes screen-based interactions when they are part of the workflow (if the annotator supports representing them).
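The sketch below is a hypothetical record for one tool annotation. It makes explicit which description is required and which is optional. The field names, the nesting of subtools under a tool, and the text layout produced by render are assumptions for illustration, not the annotator tool's format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolAnnotation:
    command: str
    in_context: str                       # required: why this command, here, for this goal
    out_of_context: Optional[str] = None  # optional: what the command does in general
    subtools: list = field(default_factory=list)  # nested ToolAnnotation objects

    def __post_init__(self):
        if not self.in_context.strip():
            raise ValueError("an in-context description is required")


def render(tool, depth=0):
    """Render a tool and its subtools as indented text lines."""
    pad = "  " * depth
    kind = "tool" if depth == 0 else "subtool"
    lines = [f"{pad}{kind}: {tool.command}", f"{pad}  in context: {tool.in_context}"]
    if tool.out_of_context:
        lines.append(f"{pad}  out of context: {tool.out_of_context}")
    for sub in tool.subtools:
        lines.extend(render(sub, depth + 1))
    return lines
```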

Mistakes

Tag every mistake, down to individual typos, and record why it is a mistake and what its repercussions were.

mistake — the incorrect action or decision
mistakeReason — why it was wrong
mistakeResult — what happened because of it (error message, time lost, wrong path taken, etc.)

Secrets and sensitive data

Use specific tags for secrets and secret-like flows:

passwordPrompt
password
passwordAgain
secret

Annotate secret handling carefully, and prefer describing the flow rather than reproducing secret material.
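Sometimes a secret ends up in text you are writing, for example pasted into a description. Replacing it before saving keeps the annotation about the flow rather than the secret. Recordings made with asciinema's --stdin option also store keystrokes. A password typed at a prompt that did not echo it may therefore still be present in the recording's input events. The sketch below assumes the annotator already knows the secret strings. redact is a hypothetical helper, and the placeholder text is arbitrary.

```python
def redact(text, secrets, placeholder="<redacted>"):
    """Replace every occurrence of each known secret in text with a placeholder."""
    # Longest secrets first, so one that contains another is replaced whole.
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        text = text.replace(secret, placeholder)
    return text
```

For example, redact("typed hunter2 at the sudo prompt", ["hunter2"]) yields "typed <redacted> at the sudo prompt".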

Discoveries

Discoveries are things learned during the session that change the plan/goals/approach.

discovery (or the project’s chosen “Discoveries” tag) — a new fact that changes what happens next
Apply when the discovery causes a change in goals, subgoals, or strategy.

Licensing and legally significant text

If you reproduce a “legally significant” portion of a licensed work (see GNU guidance on “Legally Significant”), list the license in your annotation.

license
copyright

When in doubt, prefer summarizing instead of reproducing large chunks of licensed text.

Application Guidelines

How to apply goal markers

When starting a goal, add a unique goal identifier (format guideline):
- [Goal_ + Unique_identifier] followed by the goal description
When a goal is completed:
- close out the goal and return to the parent goal level (change “level” back)
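The identifier convention can be checked mechanically. This sketch assumes labels of the form used in the worked example, such as [Goal_001] or [Subgoal_001a], followed by a space and the description. The regular expression and helper names are assumptions; adjust them to whatever your annotator version expects.

```python
import re

# Assumed shape: "[<Level>_<identifier>] <description>".
LABEL_RE = re.compile(r"^\[(?P<level>[A-Za-z-]+)_(?P<ident>[A-Za-z0-9]+)\]\s+(?P<desc>\S.*)$")


def parse_goal_label(label):
    """Split a goal label into (level, identifier, description)."""
    m = LABEL_RE.match(label.strip())
    if m is None:
        raise ValueError(f"not a goal label: {label!r}")
    return m.group("level"), m.group("ident"), m.group("desc")


def duplicate_ids(labels):
    """Return identifiers such as "Goal_001" that appear on more than one label."""
    seen, dupes = set(), []
    for label in labels:
        level, ident, _ = parse_goal_label(label)
        key = f"{level}_{ident}"
        if key in seen and key not in dupes:
            dupes.append(key)
        seen.add(key)
    return dupes
```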

How to apply mistakes

Whenever there is a mistake:

Apply mistake, mistakeReason, and mistakeResult.
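The sketch below checks that every mistake has its reason and result. It assumes annotations are available as (time, tag, text) tuples in time order. It also assumes that a mistake's reason and result are recorded after it and before the next mistake. That ordering is an assumption of this sketch, not a requirement stated here. incomplete_mistakes is a hypothetical name.

```python
FOLLOW_UPS = {"mistakeReason", "mistakeResult"}


def incomplete_mistakes(annotations):
    """Return the times of mistakes lacking a mistakeReason or a mistakeResult.

    annotations: (time, tag, text) tuples sorted by time.
    """
    problems = []
    current_time, seen = None, set()
    for time, tag, _text in annotations:
        if tag == "mistake":
            # A new mistake closes the previous one; check what it collected.
            if current_time is not None and not FOLLOW_UPS <= seen:
                problems.append(current_time)
            current_time, seen = time, set()
        elif tag in FOLLOW_UPS and current_time is not None:
            seen.add(tag)
    # Check the last mistake once the annotations run out.
    if current_time is not None and not FOLLOW_UPS <= seen:
        problems.append(current_time)
    return problems
```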

How to apply success markers

Apply successResult, successFailure, or successUnknown only when:
- The result of a goal is visible, OR
- The result cannot possibly be visible.

General guidance

Annotate as you would prefer sessions to be annotated: be as in-depth as you would like.
The goal of annotation is to describe precisely what is happening in the session:
- what is done,
- why it is done,
- and what outcome it leads to.

Technical Specification

The .session archive

A .session file is a zip archive with a .session extension containing (at minimum) the annotated recording and the supporting metadata and audio/transcripts.

Required contents

The .session archive must include:

session.yaml — session metadata (see below)
recording.asciinema — the asciinema recording (annotated)
Audio recordings of the interview and any follow-up clarification conversations
Transcriptions of those audio files (produced by Whisper), including timestamps
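The sketch below packages a session with Python's standard zipfile module. It assumes the required session.yaml and recording.asciinema sit at the root of the archive. It also assumes audio and transcript files are stored alongside them under their own file names; this specification does not fix those names or a directory layout. build_session and missing_required are hypothetical helpers.

```python
import zipfile
from pathlib import Path

REQUIRED = ("session.yaml", "recording.asciinema")


def build_session(out_path, recording, session_yaml, extra_files=()):
    """Write a .session zip; extra_files holds audio recordings and transcripts."""
    out_path = Path(out_path)
    if out_path.suffix != ".session":
        raise ValueError("archive must use the .session extension")
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the required files under the names this specification uses.
        zf.write(session_yaml, arcname="session.yaml")
        zf.write(recording, arcname="recording.asciinema")
        for f in extra_files:
            zf.write(f, arcname=Path(f).name)
    return out_path


def missing_required(path):
    """Return the required entries that are absent from a .session archive."""
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
    return [n for n in REQUIRED if n not in names]
```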

session.yaml fields

session.yaml should contain at least:

version: Version number for the format specification (this document) used when creating this .session file.

Optional / planned fields (may be included if supported by the project tooling):

mission (FUTUREWORK): overarching goal being attempted
plan (FUTUREWORK): intended plan (does not need to match what happened)
tests (FUTUREWORK): an array of “unit test”-like checks indicating success/failure
- intended to be written in TinTin++
- loaded after an SSH session has been instantiated
- includes a time limit for considering the test to have gone wrong
- examples:
  - “X command was correctly installed to location Y”
  - “Running command --version returns the right version 0.11”
system (optional): command/answer pairs that define the environment
- examples:
  - cat /etc/issue.net: Ubuntu 22.04 LTS\n
  - ip address: ...
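The sketch below writes a minimal session.yaml without third-party libraries. It relies on the fact that a JSON string is also a valid YAML double-quoted scalar. It assumes version is stored as a string, so that "2.0" is not read back as a float, and that system is a mapping from command to answer. Both are assumptions of this sketch rather than requirements stated above, and render_session_yaml is a hypothetical name.

```python
import json


def render_session_yaml(version, system=None):
    """Render a minimal session.yaml with a version and optional system pairs."""
    # json.dumps produces double-quoted strings that YAML parses the same way.
    lines = [f"version: {json.dumps(str(version))}"]
    if system:
        lines.append("system:")
        for command, answer in system.items():
            lines.append(f"  {json.dumps(command)}: {json.dumps(answer)}")
    return "\n".join(lines) + "\n"
```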

Quick Reference

Core concepts

Use separate timelines for each goal level (goal / subgoal / sub-subgoal).
Add success markers only at moments where a goal outcome is observable (or provably unobservable).

Common tags (non-exhaustive)

Goals: goal (plus unique goal identifiers per project convention)
Success: successResult, successFailure, successUnknown
Mistakes: mistake, mistakeReason, mistakeResult
Secrets: passwordPrompt, password, passwordAgain, secret
Discoveries: discovery (or “Discoveries” tag used by the project)
Licensing: license, copyright
Tools: tool/subtool annotations (in-context and optionally out-of-context)

Complete Worked Example

[Goal_001] Deploy GNU package "foo" on Debian Stable VM
  [Subgoal_001a] Install build dependencies
    tool: apt-get install ...
    successResult: dependencies installed

  [Subgoal_001b] Configure build
    tool: ./configure --prefix=/usr/local ... (with an unsupported flag)
    mistake: used wrong configure flag
    mistakeReason: flag is not supported by this version
    mistakeResult: configure exits with error, must retry
    tool: ./configure --prefix=/usr/local
    successResult: configure completed

  [Subgoal_001c] Build and verify
    tool: make
    successResult: compilation completed
    tool: make test
    successUnknown: tests run but output does not clearly confirm pass/fail (or tests are not available)

  discovery: package requires newer libc than expected → adjust approach / select different target or environment

(Replace identifiers, tags, and exact phrasing with the conventions supported by the current annotator tool version.)

Appendices

Appendix A: Recording environment reminders

Use a Debian Stable VM for recordings.
If Debian Stable lacks required packages, upgrading the VM to unstable is allowed.
VMs may be reused across sessions.

Appendix B: Keeping everything inside the terminal

All input/output should be attempted through the terminal session.
For web interactions, prefer CLI tools such as:
- lynx, links, curl, wget, etc.

Appendix C: Interview and follow-ups

Record both sides separately for quality.
If any follow-up clarification occurs later:
- record both ends again,
- include audio and transcripts in the .session.

Appendix D: Debian Stable VM details

(See project-specific VM setup instructions for the current cohort/repository; keep this appendix synchronized with those instructions.)

