Annotation Standard
Document Version: 2.0
Last Updated: 2025-10-01
Target Audience: Session recorders, annotators, reviewers
Table of Contents
- Glossary
- Overview & Purpose
- The Annotation Process
- Annotation Standard
- Application Guidelines
- Technical Specification
- Quick Reference
- Complete Worked Example
- Appendices
Glossary
Session An asciinema recording of a terminal session, typically showing the deployment or configuration of software.
Annotated Session A session that has structured annotations added to it, conforming to this specification.
Recording The raw asciinema capture of terminal activity.
Annotation Structured metadata added to a recording that describes what is happening, why, and the outcome.
Timeline A hierarchical structure in the annotator tool where annotations are organized. Different goal levels exist on separate timelines.
Goal Hierarchy The nested structure of goals, subgoals, and sub-subgoals that describe the overall task and its component steps.
Annotation Tag A specific marker type (e.g., goal, mistake, successResult) used to categorize parts of the session.
Person A (Recorder) The individual(s) who created the original terminal session recording.
Person B (Annotator) The individual who interviews Person A and adds structured annotations to the recording.
.session File A zip file containing the annotated recording, audio transcriptions, and metadata conforming to this specification.
Overview & Purpose
Purpose of this document
This document exists to:
- Provide an annotation process.
- Provide an annotation standard.
- Collect guidance for people participating in annotation.
What are Annotated Sessions?
An annotated session is an asciinema recording of terminal activity (typically software deployment) that has been enriched with structured annotations describing:
- What actions were performed
- Why those actions were taken
- Whether goals were achieved
- What mistakes occurred and their consequences
- What discoveries changed the approach
Why Create Them?
The AutoDocs project aims to train Large Language Models (LLMs) to understand and work with terminal recordings. Annotated sessions serve as training data, teaching models to:
- Recognize common deployment patterns
- Understand error recovery strategies
- Learn from human problem-solving approaches
- Eventually perform automatic annotation
Once sufficient manual annotations exist, trained models can automatically annotate new sessions, enabling advanced tooling like automated documentation generation.
Document Purpose
This document provides:
- An annotation process for creating annotated sessions
- An annotation standard defining what to annotate and how
- Guidelines for annotators to ensure consistency and quality
The Annotation Process
The process of creating an annotated session involves four distinct phases:
Phase 1: Recording → Phase 2: Interview → Phase 3: Annotation → Phase 4: Submission
Phase 1: Recording
Objective: Create a terminal recording of meaningful operations work, software development, or other terminal-based productive activity.
Who: Person A (one or more individuals)
Steps:
- Set up the recording environment:
  - Use a Debian Stable system (see Appendix D)
  - Upgrade to unstable if required packages are missing
  - VMs may be reused from previous deployments, or may be new
- Start the asciinema recording client
- Attempt to do useful work
- Keep all input/output inside the terminal:
  - Use CLI web tools (curl, wget, lynx, links, etc.)
- Stop recording when work is done
Output: .cast asciinema recording file
Common Issues:
- Using GUI tools
- Stopping recording too early
- Forgetting --stdin on asciinema's command line
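The last issue above can be caught before the interview. The following is a minimal sketch, assuming the recording is in the asciicast v2 format (a JSON header line followed by one JSON array per event, where "o" marks output and "i" marks input). The function name summarize_cast and the sample data are hypothetical and only for illustration.

```python
import json


def summarize_cast(cast_text: str) -> dict:
    """Summarize an asciicast v2 recording passed in as text."""
    lines = [line for line in cast_text.splitlines() if line.strip()]
    header = json.loads(lines[0])  # first line: JSON object with version, width, height
    counts = {"o": 0, "i": 0}
    last_time = 0.0
    for line in lines[1:]:
        # Each event line is a three-element array: [time, type, data]
        time_s, event_type, _data = json.loads(line)
        counts[event_type] = counts.get(event_type, 0) + 1
        last_time = max(last_time, float(time_s))
    return {
        "version": header.get("version"),
        "output_events": counts["o"],
        "input_events": counts["i"],  # 0 suggests input was not captured
        "duration_seconds": last_time,
    }


# Hypothetical four-line recording used only to exercise the function.
SAMPLE = "\n".join([
    '{"version": 2, "width": 80, "height": 24}',
    '[0.5, "o", "$ "]',
    '[1.2, "i", "l"]',
    '[1.3, "o", "l"]',
])
summary = summarize_cast(SAMPLE)
```

A recording whose input_events count is 0 was most likely made without input capture and may need to be redone.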
Phase 2: Interview
Objective: Ensure the annotator fully understands what was done and why.
Who: Person A and Person B
Steps:
- Record audio of both sides separately
- Review the recording together (recommended)
- Explain what was done and why
- Record follow-up clarification conversations if needed
Output:
- Audio recordings (both sides)
- Full understanding of actions and rationale
Why Separate Audio Recordings?
- Better transcription quality
- Clear speaker attribution
- Higher-quality training data
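To illustrate the speaker-attribution point, here is a minimal sketch that merges two per-speaker transcripts into one time-ordered transcript. It assumes each transcript is a list of segments with "start", "end", and "text" keys (the shape Whisper's Python API returns under "segments") and that both audio recordings were started at the same moment. The function name and sample data are hypothetical.

```python
from typing import Dict, List


def merge_transcripts(per_speaker: Dict[str, List[dict]]) -> List[str]:
    """Interleave per-speaker segments into one attributed, time-ordered transcript."""
    tagged = [
        (segment["start"], speaker, segment["text"].strip())
        for speaker, segments in per_speaker.items()
        for segment in segments
    ]
    tagged.sort(key=lambda item: item[0])  # order by segment start time
    return [f"[{start:07.2f}] {speaker}: {text}" for start, speaker, text in tagged]


# Hypothetical segments; real ones would come from the two transcriptions.
merged = merge_transcripts({
    "Person A": [
        {"start": 0.0, "end": 2.0, "text": " I ran make first."},
        {"start": 6.0, "end": 8.0, "text": " Because the README said so."},
    ],
    "Person B": [
        {"start": 3.5, "end": 5.0, "text": " Why make and not cmake?"},
    ],
})
```

Because each file contains exactly one speaker, attribution comes from the file itself rather than from error-prone speaker detection.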
Recommended ways to review the recording together:
- tmate + asciinema playback can work (during playback, holding the . key fast-forwards)
- Screen share while watching the recording
- Screen share while using the annotator tool
If you break into a separate “experimental” session during the meeting:
- This is allowed and often useful.
- Record that new session with asciinema as well, so it can be reviewed and understood by someone who was not present.
Phase 3: Annotation
Objective: Add structured annotations using the annotator tool.
Who: Person B
Steps:
- Load the recording
- Create timelines for goal hierarchy levels
- Apply annotations per standard (see Annotation Standard)
- Review for completeness and consistency
- Note the spec version used for this annotation (this document version)
- If clarification is needed, contact Person A:
- Record audio of both ends of the clarification conversation
- Include the audio (and later transcription) in the .session package
Annotator tool:
- Use the annotator tool hosted at: https://github.com/arthurwolf/annotator
- (If a specific branch is required for a cohort/release, use the branch specified by the project, e.g. from_students when applicable.)
Output: Annotated recording
Phase 4: Submission
Objective: Package and submit the session.
Steps:
- Transcribe audio with Whisper (with timestamps)
- Create the .session archive
- Include required files (see Technical Specification)
- Upload to repository
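The packaging step can be sketched with Python's standard zipfile module. This is only an illustration under stated assumptions: the function name build_session_archive is hypothetical, audio and transcript files are stored under their own file names at the archive root, and session.yaml carries only the version field written as a quoted string. The project tooling may lay the archive out differently.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def build_session_archive(out_path, recording, extra_files: Iterable, spec_version: str = "2.0") -> Path:
    """Zip a recording plus audio/transcript files into a .session archive."""
    out_path = Path(out_path)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Minimal metadata: the spec version used for this annotation.
        archive.writestr("session.yaml", f'version: "{spec_version}"\n')
        # The annotated recording is stored under the name the spec requires.
        archive.write(recording, arcname="recording.asciinema")
        # Audio recordings and transcriptions keep their own names (assumption).
        for path in extra_files:
            archive.write(path, arcname=Path(path).name)
    return out_path
```

Example call, with hypothetical file names: build_session_archive("demo.session", "demo.cast", ["interview_a.wav", "interview_a.srt"]).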
Annotation Standard
This section defines what to annotate and how.
Goal hierarchy and timelines
- Goals, subgoals, sub-subgoals (and deeper levels as needed) form a goal hierarchy.
- Ideally, every part of the session should belong to a goal.
- Subgoals should represent meaningful “chunks” of work contributing to a parent goal.
- Goals, subgoals, and sub-(sub…) goals should exist as separate timelines in the annotator tool:
- Think: one “Goal” timeline spanning the whole file,
- Subgoal timelines covering sections that represent large steps,
- Deeper timelines for finer-grained structure as needed.
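One way to picture the relationship between the goal hierarchy and timelines is as a tree whose depth selects the timeline. The sketch below is a hypothetical data model, not the annotator tool's internal format: the Goal class, its fields, and timelines_by_depth are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Goal:
    description: str
    start: float  # seconds from the start of the recording
    end: float
    children: List["Goal"] = field(default_factory=list)


def timelines_by_depth(root: Goal) -> Dict[int, List[Goal]]:
    """Group goals by nesting depth: 0 = goal, 1 = subgoal, 2 = sub-subgoal, ..."""
    timelines: Dict[int, List[Goal]] = {}
    stack = [(root, 0)]
    while stack:
        goal, depth = stack.pop()
        timelines.setdefault(depth, []).append(goal)
        for child in goal.children:
            stack.append((child, depth + 1))
    for goals in timelines.values():
        goals.sort(key=lambda g: g.start)  # each timeline reads left to right in time
    return timelines


# Hypothetical session: one goal, two subgoals, one sub-subgoal.
root = Goal("Deploy foo", 0.0, 300.0, [
    Goal("Install dependencies", 0.0, 90.0),
    Goal("Configure build", 90.0, 200.0, [Goal("Fix configure flag", 120.0, 150.0)]),
])
timelines = timelines_by_depth(root)
```

Here timelines[0] holds the single top-level goal spanning the file, timelines[1] the subgoals, and timelines[2] the finer-grained step.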
Success markers
Use success markers to capture whether a goal (or subgoal) was achieved at the moment where the result is observable.
- successResult: the goal was achieved (success is visible in the recording output)
- successFailure: the goal failed (failure is visible)
- successUnknown: success cannot be determined from the recording (or cannot possibly be visible)
When to apply success markers:
- Apply success markers only when the success/failure of a command corresponds to the attempted end of a goal.
- Example:
  - If a goal is “get my program to compile”, then the output of make (where compilation occurs) is the target moment.
- For activities where the result cannot be observed (e.g., some parts of OS installs), successUnknown is acceptable.
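The choice between the three markers can be summarized as a small decision function. This is a sketch of the rule as described above; the function name and parameters are hypothetical, and the annotator still has to judge visibility and outcome by reading the recording.

```python
from typing import Optional


def choose_success_marker(outcome_visible: bool, achieved: Optional[bool] = None) -> str:
    """Pick a success marker for the moment a goal's attempt ends."""
    if not outcome_visible:
        # The recording cannot show whether the goal was reached.
        return "successUnknown"
    if achieved is None:
        raise ValueError("a visible outcome must be judged as achieved or failed")
    return "successResult" if achieved else "successFailure"
```

For the compile example, a make run that ends without errors maps to choose_success_marker(True, True), while an unobservable part of an OS install maps to choose_success_marker(False).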
Tools and subtools (commands and interfaces)
- Commands can be annotated as tools/subtools.
- Provide:
- An in-context description (considering the current goal and what has happened so far)
- Optionally, an out-of-context description (what the command generally does)
This includes screen-based interactions when they are part of the workflow (if the annotator supports representing them).
Mistakes
Every mistake down to the typo should be tagged, including why it is a mistake and the repercussions.
- mistake: the incorrect action or decision
- mistakeReason: why it was wrong
- mistakeResult: what happened because of it (error message, time lost, wrong path taken, etc.)
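Since the three mistake tags belong together, a reviewer could check for incomplete mistake annotations with something like the sketch below. It assumes, purely for illustration, that one mistake annotation is represented as a dictionary keyed by tag name; the annotator tool's real storage format may differ.

```python
REQUIRED_MISTAKE_TAGS = ("mistake", "mistakeReason", "mistakeResult")


def missing_mistake_tags(annotation: dict) -> list:
    """Return the mistake tags that are absent or empty in one annotation."""
    return [
        tag for tag in REQUIRED_MISTAKE_TAGS
        if not str(annotation.get(tag, "")).strip()
    ]


# Hypothetical annotations: one complete, one missing its consequence.
complete = {
    "mistake": "used wrong configure flag",
    "mistakeReason": "flag is not supported by this version",
    "mistakeResult": "configure exits with error, must retry",
}
incomplete = {"mistake": "typo in package name", "mistakeReason": "misspelled"}
```

An empty result means the annotation carries the incorrect action, the reason, and the consequence.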
Secrets and sensitive data
Use specific tags for secrets and secret-like flows:
- passwordPrompt
- password
- passwordAgain
- secret
Annotate secret handling carefully, and prefer describing the flow rather than reproducing secret material.
Discoveries
Discoveries are things learned during the session that change the plan/goals/approach.
- discovery (or the project’s chosen “Discoveries” tag): a new fact that changes what happens next
- Apply when the discovery causes a change in goals, subgoals, or strategy.
Licensing and legally significant text
If you reproduce a “legally significant” portion of a licensed work (see GNU guidance on “Legally Significant”), list the license in your annotation.
- license
- copyright
When in doubt, prefer summarizing instead of reproducing large chunks of licensed text.
Application Guidelines
How to apply goal markers
- When starting a goal, add a unique goal identifier (format guideline):
  - [Goal_ + Unique_identifier] followed by the goal description
- When a goal is completed:
  - Close out the goal and return to the parent goal level (change “level” back)
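The identifier guideline can be made concrete with a small formatter and parser. This sketch assumes labels look like those in the worked example later in this document ([Goal_001], [Subgoal_001a]); the regular expression, function names, and the restriction to the Goal and Subgoal prefixes are assumptions, and deeper levels would need whatever prefix the project chooses.

```python
import re

# Matches "[Goal_001] description" or "[Subgoal_001a] description".
GOAL_LABEL = re.compile(r"^\[(Goal|Subgoal)_([A-Za-z0-9]+)\]\s+(.+)$")


def format_goal_label(identifier: str, description: str, level: str = "Goal") -> str:
    """Build a goal label from its parts."""
    return f"[{level}_{identifier}] {description}"


def parse_goal_label(label: str) -> dict:
    """Split a goal label into level, identifier, and description."""
    match = GOAL_LABEL.match(label)
    if match is None:
        raise ValueError(f"not a goal label: {label!r}")
    level, identifier, description = match.groups()
    return {"level": level, "identifier": identifier, "description": description}
```

Parsing every label in a session and checking that no identifier appears twice is one simple way to verify uniqueness.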
How to apply mistakes
Whenever there is a mistake:
- Apply mistake, mistakeReason, and mistakeResult.
How to apply success markers
- Apply successResult, successFailure, or successUnknown only when:
  - The result of a goal is visible, OR
  - The result cannot possibly be visible.
General guidance
- Annotate as you would prefer sessions to be annotated: be as in-depth as you would like.
- The goal of annotation is to describe precisely what is happening in the session:
- what is done,
- why it is done,
- and what outcome it leads to.
Technical Specification
The .session archive
A .session file is a zip archive with a .session extension containing (at minimum) the annotated recording and the supporting metadata and audio/transcripts.
Required contents
The .session archive must include:
- session.yaml: session metadata (see below)
- recording.asciinema: the asciinema recording (annotated)
- Audio recordings of the interview and any follow-up clarification conversations
- Transcriptions of those audio files (produced by Whisper), including timestamps
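A pre-submission check of the required contents might look like the following sketch. The two fixed file names come from the list above; the audio and transcript file extensions are assumptions (this specification does not name them), as are the function names.

```python
import zipfile
from typing import Iterable, List

REQUIRED_FILES = ("session.yaml", "recording.asciinema")
AUDIO_SUFFIXES = (".wav", ".flac", ".mp3", ".ogg")        # assumption: not fixed by the spec
TRANSCRIPT_SUFFIXES = (".txt", ".srt", ".vtt", ".json")   # assumption: not fixed by the spec


def check_session_names(names: Iterable[str]) -> List[str]:
    """Return a list of problems found in an archive's file names."""
    names = list(names)
    problems = [f"missing {required}" for required in REQUIRED_FILES if required not in names]
    if not any(name.lower().endswith(AUDIO_SUFFIXES) for name in names):
        problems.append("no audio recording found")
    if not any(name.lower().endswith(TRANSCRIPT_SUFFIXES) for name in names):
        problems.append("no transcription found")
    return problems


def check_session_archive(path: str) -> List[str]:
    """Open a .session file (a zip archive) and check its contents."""
    with zipfile.ZipFile(path) as archive:
        return check_session_names(archive.namelist())
```

An empty list means the archive contains at least the metadata file, the recording, one audio file, and one transcription.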
session.yaml fields
session.yaml should contain at least:
- version: Version number for the format specification (this document) used when creating this .session file.
Optional / planned fields (may be included if supported by the project tooling):
- mission (FUTUREWORK): overarching goal being attempted
- plan (FUTUREWORK): intended plan (does not need to match what happened)
- tests (FUTUREWORK): an array of “unit test”-like checks indicating success/failure
  - intended to be written in TinTin++
  - loaded after an SSH session has been instantiated
  - includes a time limit for considering the test to have gone wrong
  - examples:
    - “X command was correctly installed to location Y”
    - “Running command --version returns the right version 0.11”
- system (optional): command/answer pairs that define the environment
  - examples:
    - cat /etc/issue.net: Ubuntu 22.04 LTS\n
    - ip address: ...
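The command/answer pairs for the system field could be gathered automatically at recording time. The sketch below is an assumption about how that might be done: it runs each command in a shell and maps the command string to its captured output. The function names are hypothetical, and how the resulting mapping is serialized into session.yaml is left to the project tooling.

```python
import subprocess
from typing import Callable, Dict, Iterable


def run_shell(command: str) -> str:
    """Run one command in a shell and return its standard output."""
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=False
    )
    return completed.stdout


def collect_system_facts(
    commands: Iterable[str], run: Callable[[str], str] = run_shell
) -> Dict[str, str]:
    """Map each command to the answer it produced on this system."""
    return {command: run(command) for command in commands}
```

Example call: collect_system_facts(["cat /etc/issue.net", "ip address"]). The run parameter exists so the collection logic can be exercised without touching the real system.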
Quick Reference
Core concepts
- Use separate timelines for each goal level (goal / subgoal / sub-subgoal).
- Add success markers only at moments where a goal outcome is observable (or provably unobservable).
Common tags (non-exhaustive)
- Goals: goal (plus unique goal identifiers per project convention)
- Success: successResult, successFailure, successUnknown
- Mistakes: mistake, mistakeReason, mistakeResult
- Secrets: passwordPrompt, password, passwordAgain, secret
- Discoveries: discovery (or “Discoveries” tag used by the project)
- Licensing: license, copyright
- Tools: tool/subtool annotations (in-context and optionally out-of-context)
Complete Worked Example
[Goal_001] Deploy GNU package "foo" on Debian Stable VM
  [Subgoal_001a] Install build dependencies
    tool: apt-get install ...
    successResult: dependencies installed
  [Subgoal_001b] Configure build
    tool: ./configure --prefix=/usr/local <unsupported flag>
    mistake: used wrong configure flag
    mistakeReason: flag is not supported by this version
    mistakeResult: configure exits with error, must retry
    tool: ./configure --prefix=/usr/local
    successResult: configure completed
  [Subgoal_001c] Build and verify
    tool: make
    successResult: compilation completed
    tool: make test
    successUnknown: tests run but output does not clearly confirm pass/fail (or tests are not available)
    discovery: package requires newer libc than expected → adjust approach / select different target or environment
(Replace identifiers, tags, and exact phrasing with the conventions supported by the current annotator tool version.)
Appendices
Appendix A: Recording environment reminders
- Use a Debian Stable VM for recordings.
- If Debian Stable lacks required packages, upgrading the VM to unstable is allowed.
- VMs may be reused across sessions.
Appendix B: Keeping everything inside the terminal
- All input/output should be attempted through the terminal session.
- For web interactions, prefer CLI tools such as:
lynx,links,curl,wget, etc.
Appendix C: Interview and follow-ups
- Record both sides separately for quality.
- If any follow-up clarification occurs later:
- record both ends again,
- include audio and transcripts in the .session.
Appendix D: Debian Stable VM details
(See project-specific VM setup instructions for the current cohort/repository; keep this appendix synchronized with those instructions.)