Revision as of 19:35, 22 January 2026

```mediawiki

Librecode Annotation Standard

Document Version: 2.0
Last Updated: 2025-10-01
Target Audience: Session recorders, annotators, reviewers

Glossary

Session: An asciinema recording of a terminal session, typically showing the deployment or configuration of software.

Annotated Session: A session that has structured annotations added to it, conforming to this specification.

Recording: The raw asciinema capture of terminal activity.

Annotation: Structured metadata added to a recording that describes what is happening, why, and the outcome.

Timeline: A hierarchical structure in the annotator tool where annotations are organized. Different goal levels exist on separate timelines.

Goal Hierarchy: The nested structure of goals, subgoals, and sub-subgoals that describe the overall task and its component steps.

Annotation Tag: A specific marker type (e.g., goal, mistake, successResult) used to categorize parts of the session.

Person A (Recorder): The individual(s) who created the original terminal session recording.

Person B (Annotator): The individual who interviews Person A and adds structured annotations to the recording.

.session File: A zip file containing the annotated recording, audio transcriptions, and metadata conforming to this specification.

Overview & Purpose

Purpose of this document

This document exists to:

Provide an annotation process.
Provide an annotation standard.
Collect guidance for people participating in annotation.

An annotated session is defined as an asciinema recording, with annotations added to it, conforming to a version of this document.

What are Annotated Sessions?

An annotated session is an asciinema recording of terminal activity (typically software deployment) that has been enriched with structured annotations describing:

What actions were performed
Why those actions were taken
Whether goals were achieved
What mistakes occurred and their consequences
What discoveries changed the approach

Why Create Them?

The Librecode project aims to train Large Language Models (LLMs) to understand and work with terminal recordings. Annotated sessions serve as training data, teaching models to:

Recognize common deployment patterns
Understand error recovery strategies
Learn from human problem-solving approaches
Eventually perform automatic annotation

Once sufficient manual annotations exist, trained models can automatically annotate new sessions, enabling advanced tooling like automated documentation generation.

Document Purpose

This document provides:

An annotation process for creating annotated sessions
An annotation standard defining what to annotate and how
Guidelines for annotators to ensure consistency and quality

The Annotation Process

The process of creating an annotated session involves four distinct phases:

Phase 1: Recording → Phase 2: Interview → Phase 3: Annotation → Phase 4: Submission

Phase 1: Recording

Objective: Create a terminal recording of meaningful software deployment activity.
Who: Person A (one or more individuals)

Steps:

Set up the recording environment:
- Use a Debian Stable VM (see Appendix D)
- Upgrade to unstable if required packages are missing
VMs may be reused from previous deployments, or may be new
Start the asciinema recording client
Attempt to deploy a piece of GNU software
Keep all input/output inside the terminal:
- Use CLI web tools (curl, wget, lynx, links, etc.)
Stop recording when deployment completes

Output: .cast asciinema recording file

Common Issues:

Using GUI tools
Stopping recording too early

Phase 2: Interview

Objective: Ensure the annotator fully understands what was done and why.
Who: Person A and Person B

Steps:

Record audio of both sides separately
Review the recording together (recommended)
Explain what was done and why
Record follow-up clarification conversations if needed

Output:

Audio recordings (both sides)
Full understanding of actions and rationale

Why Separate Audio Recordings?

Better transcription quality
Clear speaker attribution
Higher-quality training data

Recommended ways to review the recording together:

tmate + asciinema playback can work (during playback, pause with Space, then hold . to step forward frame by frame, which acts as a fast-forward)
Screen share while watching the recording
Screen share while using the annotator tool

If you break into a separate “experimental” session during the meeting:

This is allowed and often useful.
Record that new session with asciinema as well, so it can be reviewed and understood by someone who was not present.

Phase 3: Annotation

Objective: Add structured annotations using the annotator tool.
Who: Person B

Steps:

Load the recording
Create timelines for goal hierarchy levels
Apply annotations per standard (see Annotation Standard)
Review for completeness and consistency
Note the spec version used for this annotation (the version of this document)
If clarification is needed, contact Person A:
- Record audio of both ends of the clarification conversation
- Include the audio (and later transcription) in the .session package

Annotator tool:

Use the annotator tool hosted at: https://github.com/arthurwolf/annotator
(If a specific branch is required for a cohort/release, use the branch specified by the project, e.g. from_students when applicable.)

Output: Annotated recording

Phase 4: Submission

Objective: Package and submit the session.

Steps:

Transcribe audio with Whisper (with timestamps)
Create .session archive
Include required files (see Technical Specification)
Upload to repository

Annotation Standard

This section defines what to annotate and how.

Goal hierarchy and timelines

Goals, subgoals, sub-subgoals (and deeper levels as needed) form a goal hierarchy.
Ideally, every part of the session should belong to a goal.
Subgoals should represent meaningful “chunks” of work contributing to a parent goal.
Goals, subgoals, and sub-(sub…) goals should exist as separate timelines in the annotator tool:
- Think: one “Goal” timeline spanning the whole file,
- Subgoal timelines covering sections that represent large steps,
- Deeper timelines for finer-grained structure as needed.

Success markers

Use success markers to capture whether a goal (or subgoal) was achieved at the moment where the result is observable.

successResult — the goal was achieved (success is visible in the recording output)
successFailure — the goal failed (failure is visible)
successUnknown — success cannot be determined from the recording (or cannot possibly be visible)

When to apply success markers:

Apply success markers only when the success/failure of a command corresponds to the attempted end of a goal.
Example:
- If a goal is “get my program to compile”, then the output of make (where compilation occurs) is the target moment.
For activities where the result cannot be observed (e.g., some parts of OS installs), successUnknown is acceptable.

Tools and subtools (commands and interfaces)

Commands can be annotated as tools/subtools.
Provide:
- An in-context description (considering the current goal and what has happened so far)
- Optionally, an out-of-context description (what the command generally does)

This includes screen-based interactions when they are part of the workflow (if the annotator supports representing them).

Mistakes

Every mistake, down to individual typos, should be tagged, along with why it is a mistake and what its repercussions were.

mistake — the incorrect action or decision
mistakeReason — why it was wrong
mistakeResult — what happened because of it (error message, time lost, wrong path taken, etc.)

Secrets and sensitive data

Use specific tags for secrets and secret-like flows:

passwordPrompt
password
passwordAgain
secret

Annotate secret handling carefully, and prefer describing the flow rather than reproducing secret material.

Discoveries

Discoveries are things learned during the session that change the plan/goals/approach.

discovery (or the project’s chosen “Discoveries” tag): a new fact that changes what happens next
Apply when the discovery causes a change in goals, subgoals, or strategy.

Licensing and legally significant text

If you reproduce a “legally significant” portion of a licensed work (see GNU guidance on “Legally Significant”), list the license in your annotation.

license
copyright

When in doubt, prefer summarizing instead of reproducing large chunks of licensed text.

Application Guidelines

How to apply goal markers

When starting a goal, add a unique goal identifier (format guideline):
- [Goal_<unique identifier>] followed by the goal description, for example: [Goal_001] Deploy GNU package "foo" on Debian Stable VM
When a goal is completed:
- close out the goal and return to the parent goal level (change “level” back)

How to apply mistakes

Whenever there is a mistake:

Apply mistake, mistakeReason, and mistakeResult.

How to apply success markers

Apply successResult, successFailure, or successUnknown only when:
- The result of a goal is visible, OR
- The result cannot possibly be visible.

General guidance

Annotate sessions the way you would want them annotated for you: go into as much depth as you find useful.
The goal of annotation is to describe precisely what is happening in the session:
- what is done,
- why it is done,
- and what outcome it leads to.

Technical Specification

The .session archive

A .session file is a zip archive with a .session extension containing (at minimum) the annotated recording and the supporting metadata and audio/transcripts.

Required contents

The .session archive must include:

session.yaml — session metadata (see below)
recording.asciinema — the asciinema recording (annotated)
Audio recordings of the interview and any follow-up clarification conversations
Transcriptions of those audio files (produced by Whisper), including timestamps

session.yaml fields

session.yaml should contain at least:

version: Version number for the format specification (this document) used when creating this .session file.

Optional / planned fields (may be included if supported by the project tooling):

mission (FUTUREWORK): overarching goal being attempted
plan (FUTUREWORK): intended plan (does not need to match what happened)
tests (FUTUREWORK): an array of “unit test”-like checks indicating success/failure
- intended to be written in TinTin++
- loaded after an SSH session has been instantiated
- includes a time limit for considering the test to have gone wrong
- examples:
  - “X command was correctly installed to location Y”
  - “Running command --version returns the right version 0.11”
system (optional): command/answer pairs that define the environment
- examples:
  - cat /etc/issue.net: Ubuntu 22.04 LTS\n
  - ip address: ...

Quick Reference

Core concepts

Use separate timelines for each goal level (goal / subgoal / sub-subgoal).
Add success markers only at moments where a goal outcome is observable (or provably unobservable).

Common tags (non-exhaustive)

Goals: goal (plus unique goal identifiers per project convention)
Success: successResult, successFailure, successUnknown
Mistakes: mistake, mistakeReason, mistakeResult
Secrets: passwordPrompt, password, passwordAgain, secret
Discoveries: discovery (or “Discoveries” tag used by the project)
Licensing: license, copyright
Tools: tool/subtool annotations (in-context and optionally out-of-context)

Complete Worked Example

[Goal_001] Deploy GNU package "foo" on Debian Stable VM
  [Subgoal_001a] Install build dependencies
    tool: apt-get install ...
    successResult: dependencies installed

  [Subgoal_001b] Configure build
    tool: ./configure --prefix=/usr/local <unsupported-flag>
    mistake: used wrong configure flag
    mistakeReason: flag is not supported by this version
    mistakeResult: configure exits with error, must retry
    tool: ./configure --prefix=/usr/local
    successResult: configure completed

  [Subgoal_001c] Build and verify
    tool: make
    successResult: compilation completed
    tool: make test
    successUnknown: tests run but output does not clearly confirm pass/fail (or tests are not available)

  discovery: package requires newer libc than expected → adjust approach / select different target or environment

(Replace identifiers, tags, and exact phrasing with the conventions supported by the current annotator tool version.)

Appendices

Appendix A: Recording environment reminders

Use a Debian Stable VM for recordings.
If Debian Stable lacks required packages, upgrading the VM to unstable is allowed.
VMs may be reused across sessions.

Appendix B: Keeping everything inside the terminal

All input and output should go through the terminal session wherever possible.
For web interactions, prefer CLI tools such as:
- lynx, links, curl, wget, etc.

Appendix C: Interview and follow-ups

Record both sides separately for quality.
If any follow-up clarification occurs later:
- record both ends again,
- include audio and transcripts in the .session.

Appendix D: Debian Stable VM details

(See project-specific VM setup instructions for the current cohort/repository; keep this appendix synchronized with those instructions.)
```
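
The required contents listed under Technical Specification can be checked mechanically before upload. The following is a minimal sketch in Python, not part of the project tooling. The function names check_session_archive and build_example_archive are hypothetical, the version check is a naive line scan rather than real YAML parsing, and audio and transcript files are not checked because this document does not fix their file names.

```python
import io
import zipfile

# File names taken from the "Required contents" list in this document.
REQUIRED_FILES = ("session.yaml", "recording.asciinema")


def check_session_archive(data: bytes) -> list[str]:
    """Return a list of problems found in a .session archive given as bytes."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())
        for required in REQUIRED_FILES:
            if required not in names:
                problems.append(f"missing {required}")
        if "session.yaml" in names:
            text = archive.read("session.yaml").decode("utf-8")
            # Naive check (assumption): a top-level "version:" key on its own line.
            if not any(line.startswith("version:") for line in text.splitlines()):
                problems.append("session.yaml has no version field")
    return problems


def build_example_archive(include_recording: bool = True) -> bytes:
    """Build an in-memory example archive; the file contents are placeholders."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("session.yaml", 'version: "2.0"\n')
        if include_recording:
            archive.writestr("recording.asciinema", "")
    return buffer.getvalue()
```

For a file on disk, the same check could be run on the bytes of the archive, for example check_session_archive(open(path, "rb").read()), where an empty list means no problems were found.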

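The tag vocabulary from the Quick Reference, and the rule that every mistake is accompanied by a mistakeReason and a mistakeResult, can be sketched as a small consistency check. This assumes annotations written in the informal outline notation of the Complete Worked Example, which is an illustration and not necessarily how the annotator tool stores annotations. The names check_outline and KNOWN_TAGS are hypothetical, and treating tool and subtool as tag names follows the worked example's "tool:" lines rather than a confirmed tool convention.

```python
import re

# Tag names copied from the "Common tags" list; "tool"/"subtool" are assumed
# from the worked example's notation.
KNOWN_TAGS = {
    "goal", "successResult", "successFailure", "successUnknown",
    "mistake", "mistakeReason", "mistakeResult",
    "passwordPrompt", "password", "passwordAgain", "secret",
    "discovery", "license", "copyright", "tool", "subtool",
}

# "[Goal_001] description" or "[Subgoal_001a] description"
GOAL_LINE = re.compile(r"^\[(?:Goal|Subgoal)_(\w+)\]\s+(.+)$")
# "tagName: free text"
TAG_LINE = re.compile(r"^(\w+):\s*(.*)$")


def check_outline(text: str) -> list[str]:
    """Return a list of problems found in an outline-style annotation draft."""
    problems = []
    tags_seen = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or GOAL_LINE.match(line):
            continue
        match = TAG_LINE.match(line)
        if match is None:
            problems.append(f"line {number}: not a goal or tag line")
            continue
        tag = match.group(1)
        if tag not in KNOWN_TAGS:
            problems.append(f"line {number}: unknown tag {tag}")
        tags_seen.append(tag)
    # Each mistake should come with a reason and a result.
    for companion in ("mistakeReason", "mistakeResult"):
        if tags_seen.count(companion) < tags_seen.count("mistake"):
            problems.append(f"fewer {companion} tags than mistake tags")
    return problems
```

The check only counts tags across the whole draft; it does not verify that each reason and result sits next to its own mistake, which would need the timeline positions from the annotator tool.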
@@ Line 1: / Line 1: @@
+```mediawiki
 = Librecode Annotation Standard =
@@ Line 55: / Line 56: @@
 == Overview & Purpose ==
+=== Purpose of this document ===
+This document exists to:
+* Provide an annotation process.
+* Provide an annotation standard.
+* Collect guidance for people participating in annotation.
+An '''annotated session''' is defined as an asciinema recording, with annotations added to it, conforming to a version of this document.
 === What are Annotated Sessions? ===
@@ Line 101: / Line 111: @@
 #* Use a Debian Stable VM (see Appendix D)
 #* Upgrade to unstable if required packages are missing
+# VMs may be reused from previous deployments, or may be new
 # Start the <code>asciinema</code> recording client
 # Attempt to deploy a piece of GNU software
 # Keep all input/output inside the terminal:
-#* Use CLI web tools (<code>curl</code>, <code>wget</code>, <code>lynx</code>, etc.)
+#* Use CLI web tools (<code>curl</code>, <code>wget</code>, <code>lynx</code>, <code>links</code>, etc.)
 # Stop recording when deployment completes
@@ Line 122: / Line 133: @@
 '''Steps:'''
 # Record audio of both sides separately
-# Review the recording together
+# Review the recording together (recommended)
 # Explain what was done and why
 # Record follow-up clarification conversations if needed
 '''Output:'''
-* Audio recordings
+* Audio recordings (both sides)
 * Full understanding of actions and rationale
@@ Line 134: / Line 145: @@
 * Clear speaker attribution
 * Higher-quality training data
+'''Recommended ways to review the recording together:'''
+* <code>tmate</code> + asciinema playback can work (during playback, holding <code>.</code> fast-forwards)
+* Screen share while watching the recording
+* Screen share while using the annotator tool
+'''If you break into a separate “experimental” session during the meeting:'''
+* This is allowed and often useful.
+* Record that new session with <code>asciinema</code> as well, so it can be reviewed and understood by someone who was not present.
 ----
@@ Line 145: / Line 165: @@
 # Load the recording
 # Create timelines for goal hierarchy levels
-# Apply annotations per standard
+# Apply annotations per standard (see [[#Annotation Standard|Annotation Standard]])
-# Review completeness
+# Review for completeness and consistency
-# Note spec version used
+# Note the spec version used for this annotation (this document version)
+# If clarification is needed, contact Person A:
+#* Record audio of both ends of the clarification conversation
+#* Include the audio (and later transcription) in the <code>.session</code> package
+'''Annotator tool:'''
+* Use the annotator tool hosted at: <code>https://github.com/arthurwolf/annotator</code>
+* (If a specific branch is required for a cohort/release, use the branch specified by the project, e.g. <code>from_students</code> when applicable.)
 '''Output:''' Annotated recording
@@ Line 158: / Line 185: @@
 '''Steps:'''
-# Transcribe audio with Whisper
+# Transcribe audio with Whisper (with timestamps)
 # Create <code>.session</code> archive
-# Include:
+# Include required files (see [[#Technical Specification|Technical Specification]])
-#* <code>session.yaml</code>
-#* <code>recording.asciinema</code>
-#* Audio + transcriptions
 # Upload to repository
----
+----
+== Annotation Standard ==
+This section defines what to annotate and how.
+=== Goal hierarchy and timelines ===
+* Goals, subgoals, sub-subgoals (and deeper levels as needed) form a '''goal hierarchy'''.
+* Ideally, every part of the session should belong to a goal.
+* Subgoals should represent meaningful “chunks” of work contributing to a parent goal.
+* Goals, subgoals, and sub-(sub…) goals should exist as '''separate timelines''' in the annotator tool:
+** Think: one “Goal” timeline spanning the whole file,
+** Subgoal timelines covering sections that represent large steps,
+** Deeper timelines for finer-grained structure as needed.
+=== Success markers ===
+Use success markers to capture whether a goal (or subgoal) was achieved at the moment where the result is observable.
+* <code>successResult</code> — the goal was achieved (success is visible in the recording output)
+* <code>successFailure</code> — the goal failed (failure is visible)
+* <code>successUnknown</code> — success cannot be determined from the recording (or cannot possibly be visible)
+'''When to apply success markers:'''
+* Apply success markers only when the success/failure of a command corresponds to the attempted end of a goal.
+* Example:
+** If a goal is “get my program to compile”, then the output of <code>make</code> (where compilation occurs) is the target moment.
+* For activities where the result cannot be observed (e.g., some parts of OS installs), <code>successUnknown</code> is acceptable.
+=== Tools and subtools (commands and interfaces) ===
+* Commands can be annotated as tools/subtools.
+* Provide:
+** An '''in-context''' description (considering the current goal and what has happened so far)
+** Optionally, an '''out-of-context''' description (what the command generally does)
+This includes screen-based interactions when they are part of the workflow (if the annotator supports representing them).
+=== Mistakes ===
+Every mistake down to the typo should be tagged, including why it is a mistake and the repercussions.
+* <code>mistake</code> — the incorrect action or decision
+* <code>mistakeReason</code> — why it was wrong
+* <code>mistakeResult</code> — what happened because of it (error message, time lost, wrong path taken, etc.)
+=== Secrets and sensitive data ===
+Use specific tags for secrets and secret-like flows:
+* <code>passwordPrompt</code>
+* <code>password</code>
+* <code>passwordAgain</code>
+* <code>secret</code>
+Annotate secret handling carefully, and prefer describing the flow rather than reproducing secret material.
+=== Discoveries ===
+Discoveries are things learned during the session that change the plan/goals/approach.
+* <code>discovery</code> (or project’s chosen “Discoveries” tag) — a new fact that changes what happens next
+* Apply when the discovery causes a change in goals, subgoals, or strategy.
+=== Licensing and legally significant text ===
+If you reproduce a “legally significant” portion of a licensed work (see GNU guidance on “Legally Significant”), list the license in your annotation.
+* <code>license</code>
+* <code>copyright</code>
+When in doubt, prefer summarizing instead of reproducing large chunks of licensed text.
+----
+== Application Guidelines ==
+=== How to apply goal markers ===
+* When starting a goal, add a unique goal identifier (format guideline):
+** <code>[Goal_ + Unique_identifier]</code> followed by the goal description
+* When a goal is completed:
+** close out the goal and return to the parent goal level (change “level” back)
+=== How to apply mistakes ===
+Whenever there is a mistake:
+* Apply <code>mistake</code>, <code>mistakeReason</code>, and <code>mistakeResult</code>.
+=== How to apply success markers ===
+* Apply <code>successResult</code>, <code>successFailure</code>, or <code>successUnknown</code> only when:
+** The result of a goal is visible, OR
+** The result cannot possibly be visible.
+=== General guidance ===
+* Annotate as you would prefer sessions to be annotated: be as in-depth as you would like.
+* The goal of annotation is to describe precisely what is happening in the session:
+** what is done,
+** why it is done,
+** and what outcome it leads to.
+----
+== Technical Specification ==
+=== The .session archive ===
+A <code>.session</code> file is a zip archive with a <code>.session</code> extension containing (at minimum) the annotated recording and the supporting metadata and audio/transcripts.
+=== Required contents ===
+The <code>.session</code> archive must include:
+* <code>session.yaml</code> — session metadata (see below)
+* <code>recording.asciinema</code> — the asciinema recording (annotated)
+* Audio recordings of the interview and any follow-up clarification conversations
+* Transcriptions of those audio files (produced by Whisper), including timestamps
+=== session.yaml fields ===
+<code>session.yaml</code> should contain at least:
+* <code>version</code>: Version number for the format specification (this document) used when creating this <code>.session</code> file.
+Optional / planned fields (may be included if supported by the project tooling):
+* <code>mission</code> (FUTUREWORK): overarching goal being attempted
+* <code>plan</code> (FUTUREWORK): intended plan (does not need to match what happened)
+* <code>tests</code> (FUTUREWORK): an array of “unit test”-like checks indicating success/failure
+** intended to be written in TinTin++
+** loaded after an SSH session has been instantiated
+** includes a time limit for considering the test to have gone wrong
+** examples:
+*** “X command was correctly installed to location Y”
+*** “Running <code>command --version</code> returns the right version 0.11”
+* <code>system</code> (optional): command/answer pairs that define the environment
+** examples:
+*** <code>cat /etc/issue.net</code>: <code>Ubuntu 22.04 LTS\n</code>
+*** <code>ip address</code>: <code>...</code>
+----
+== Quick Reference ==
+=== Core concepts ===
+* Use separate timelines for each goal level (goal / subgoal / sub-subgoal).
+* Add success markers only at moments where a goal outcome is observable (or provably unobservable).
+=== Common tags (non-exhaustive) ===
+* Goals: <code>goal</code> (plus unique goal identifiers per project convention)
+* Success: <code>successResult</code>, <code>successFailure</code>, <code>successUnknown</code>
+* Mistakes: <code>mistake</code>, <code>mistakeReason</code>, <code>mistakeResult</code>
+* Secrets: <code>passwordPrompt</code>, <code>password</code>, <code>passwordAgain</code>, <code>secret</code>
+* Discoveries: <code>discovery</code> (or “Discoveries” tag used by the project)
+* Licensing: <code>license</code>, <code>copyright</code>
+* Tools: tool/subtool annotations (in-context and optionally out-of-context)
+----
+== Complete Worked Example ==
+<pre>
+[Goal_001] Deploy GNU package "foo" on Debian Stable VM
+  [Subgoal_001a] Install build dependencies
+    tool: apt-get install ...
+    successResult: dependencies installed
+  [Subgoal_001b] Configure build
+    tool: ./configure --prefix=/usr/local
+    mistake: used wrong configure flag
+    mistakeReason: flag is not supported by this version
+    mistakeResult: configure exits with error, must retry
+    tool: ./configure --prefix=/usr/local
+    successResult: configure completed
+  [Subgoal_001c] Build and verify
+    tool: make
+    successResult: compilation completed
+    tool: make test
+    successUnknown: tests run but output does not clearly confirm pass/fail (or tests are not available)
+  discovery: package requires newer libc than expected → adjust approach / select different target or environment
+</pre>
+(Replace identifiers, tags, and exact phrasing with the conventions supported by the current annotator tool version.)
+----
+== Appendices ==
+=== Appendix A: Recording environment reminders ===
+* Use a Debian Stable VM for recordings.
+* If Debian Stable lacks required packages, upgrading the VM to unstable is allowed.
+* VMs may be reused across sessions.
+=== Appendix B: Keeping everything inside the terminal ===
+* All input/output should be attempted through the terminal session.
+* For web interactions, prefer CLI tools such as:
+** <code>lynx</code>, <code>links</code>, <code>curl</code>, <code>wget</code>, etc.
+=== Appendix C: Interview and follow-ups ===
+* Record both sides separately for quality.
+* If any follow-up clarification occurs later:
+** record both ends again,
+** include audio and transcripts in the <code>.session</code>.
+=== Appendix D: Debian Stable VM details ===
+(See project-specific VM setup instructions for the current cohort/repository; keep this appendix synchronized with those instructions.)
+```

AnnotationStandard: Difference between revisions
