AnnotationStandard


Librecode Annotation Standard

Document Version: 2.0
Last Updated: 2025-10-01
Target Audience: Session recorders, annotators, reviewers


Table of Contents

  1. Glossary
  2. Overview & Purpose
  3. The Annotation Process
  4. Annotation Standard
  5. Application Guidelines
  6. Technical Specification
  7. Quick Reference
  8. Complete Worked Example
  9. Appendices

Glossary

Session: An asciinema recording of a terminal session, typically showing the deployment or configuration of software.

Annotated Session: A session that has structured annotations added to it, conforming to this specification.

Recording: The raw asciinema capture of terminal activity.

Annotation: Structured metadata added to a recording that describes what is happening, why, and the outcome.

Timeline: A hierarchical structure in the annotator tool where annotations are organized. Different goal levels exist on separate timelines.

Goal Hierarchy: The nested structure of goals, subgoals, and sub-subgoals that describe the overall task and its component steps.

Annotation Tag: A specific marker type (e.g., goal, mistake, successResult) used to categorize parts of the session.

Person A (Recorder): The individual(s) who created the original terminal session recording.

Person B (Annotator): The individual who interviews Person A and adds structured annotations to the recording.

.session File: A zip file containing the annotated recording, audio transcriptions, and metadata conforming to this specification.


Overview & Purpose

What are Annotated Sessions?

An annotated session is an asciinema recording of terminal activity (typically software deployment) that has been enriched with structured annotations describing:

  • What actions were performed
  • Why those actions were taken
  • Whether goals were achieved
  • What mistakes occurred and their consequences
  • What discoveries changed the approach

Why Create Them?

The Librecode project aims to train Large Language Models (LLMs) to understand and work with terminal recordings. Annotated sessions serve as training data, teaching models to:

  • Recognize common deployment patterns
  • Understand error recovery strategies
  • Learn from human problem-solving approaches
  • Eventually perform automatic annotation

Once sufficient manual annotations exist, trained models can automatically annotate new sessions, enabling advanced tooling like automated documentation generation.

Document Purpose

This document provides:

  1. An annotation process for creating annotated sessions
  2. An annotation standard defining what to annotate and how
  3. Guidelines for annotators to ensure consistency and quality

The Annotation Process

The process of creating an annotated session involves four distinct phases:

Phase 1: Recording → Phase 2: Interview → Phase 3: Annotation → Phase 4: Submission

Phase 1: Recording

Objective: Create a terminal recording of meaningful software deployment activity.
Who: Person A (one or more individuals)

Steps:

  1. Set up the recording environment:
    • Use a Debian Stable VM (see Appendix D)
    • Upgrade to unstable if required packages are missing
  2. Start the asciinema recording client
  3. Attempt to deploy a piece of GNU software
  4. Keep all input/output inside the terminal:
    • Use CLI web tools (curl, wget, lynx, etc.)
  5. Stop recording when deployment completes

Output: .cast asciinema recording file
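
For example, the recording in steps 2-5 might be captured as follows. This is a minimal sketch assuming the standard asciinema CLI is installed; the file name deploy-example.cast is illustrative only:

  # Start capturing the terminal session to a .cast file
  asciinema rec deploy-example.cast

  # ... perform the entire deployment inside this shell,
  # using CLI web tools (curl, wget, lynx) where needed ...

  # Stop the recording by exiting the recorded shell
  exit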

Common Issues:

  • Using GUI tools
  • Stopping recording too early

Phase 2: Interview

Objective: Ensure the annotator fully understands what was done and why.
Who: Person A and Person B

Steps:

  1. Record audio of both sides separately
  2. Review the recording together
  3. Explain what was done and why
  4. Record follow-up clarification conversations if needed

Output:

  • Audio recordings
  • Full understanding of actions and rationale
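
As a sketch of step 1, each side can be captured on its own machine with any CLI audio recorder. The commands below assume ALSA's arecord is available; the file names are illustrative only:

  # Person A records their own microphone (stop with Ctrl-C)
  arecord -f cd interview-person-a.wav

  # Person B records theirs separately
  arecord -f cd interview-person-b.wav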

Why Separate Audio Recordings?

  • Better transcription quality
  • Clear speaker attribution
  • Higher-quality training data

Phase 3: Annotation

Objective: Add structured annotations using the annotator tool.
Who: Person B

Steps:

  1. Load the recording
  2. Create timelines for goal hierarchy levels
  3. Apply annotations per standard
  4. Review completeness
  5. Note spec version used

Output: Annotated recording
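
As a purely hypothetical illustration of what an annotation carries (the authoritative tags and fields are defined in the Annotation Standard and Technical Specification sections), a single goal annotation might pair a tag with a time range and a description:

  # Hypothetical sketch only; field names are illustrative, not normative
  annotations:
    - tag: goal
      timeline: 1          # goal-hierarchy level; one timeline per level
      start: 12.5          # seconds into the recording (assumed unit)
      end: 87.2
      text: Install and configure the target software
    - tag: mistake
      start: 45.0
      end: 52.3
      text: Ran the installer without root privileges; the command failed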


Phase 4: Submission

Objective: Package and submit the session.

Steps:

  1. Transcribe audio with Whisper
  2. Create .session archive
  3. Include:
    • session.yaml
    • recording.asciinema
    • Audio + transcriptions
  4. Upload to repository
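
A minimal sketch of steps 1-3, assuming the openai-whisper CLI and zip are installed; the interview file names and the archive name are illustrative only:

  # Transcribe each speaker's audio track separately
  whisper interview-person-a.wav --model small --output_format txt
  whisper interview-person-b.wav --model small --output_format txt

  # Package everything into the .session archive (a plain zip file)
  zip example.session \
      session.yaml \
      recording.asciinema \
      interview-person-a.wav interview-person-a.txt \
      interview-person-b.wav interview-person-b.txt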

---