AnnotationStandard: Difference between revisions
ArthurWolf (talk | contribs) Initial port from the gist at https://gist.githubusercontent.com/arthurwolf/c4521a8baa306c016efd50ee23ffe56e/raw/26081cbc230641b004554778eefb11eee2a2fcf3/annotation.md , conversion from Markdown to MediaWiki formats |
ArthurWolf (talk | contribs) Migrating missing data from the older cryptpad.fr annotation standard. |
||
| Line 1: | Line 1: | ||
```mediawiki | |||
= Librecode Annotation Standard = | = Librecode Annotation Standard = | ||
| Line 55: | Line 56: | ||
== Overview & Purpose == | == Overview & Purpose == | ||
=== Purpose of this document === | |||
This document exists to: | |||
* Provide an annotation process. | |||
* Provide an annotation standard. | |||
* Collect guidance for people participating in annotation. | |||
An '''annotated session''' is defined as an asciinema recording, with annotations added to it, conforming to a version of this document. | |||
=== What are Annotated Sessions? === | === What are Annotated Sessions? === | ||
| Line 101: | Line 111: | ||
#* Use a Debian Stable VM (see Appendix D) | #* Use a Debian Stable VM (see Appendix D) | ||
#* Upgrade to unstable if required packages are missing | #* Upgrade to unstable if required packages are missing | ||
# VMs may be reused from previous deployments, or may be new | |||
# Start the <code>asciinema</code> recording client | # Start the <code>asciinema</code> recording client | ||
# Attempt to deploy a piece of GNU software | # Attempt to deploy a piece of GNU software | ||
# Keep all input/output inside the terminal: | # Keep all input/output inside the terminal: | ||
#* Use CLI web tools (<code>curl</code>, <code>wget</code>, <code>lynx</code>, etc.) | #* Use CLI web tools (<code>curl</code>, <code>wget</code>, <code>lynx</code>, <code>links</code>, etc.) | ||
# Stop recording when deployment completes | # Stop recording when deployment completes | ||
| Line 122: | Line 133: | ||
'''Steps:''' | '''Steps:''' | ||
# Record audio of both sides separately | # Record audio of both sides separately | ||
# Review the recording together | # Review the recording together (recommended) | ||
# Explain what was done and why | # Explain what was done and why | ||
# Record follow-up clarification conversations if needed | # Record follow-up clarification conversations if needed | ||
'''Output:''' | '''Output:''' | ||
* Audio recordings | * Audio recordings (both sides) | ||
* Full understanding of actions and rationale | * Full understanding of actions and rationale | ||
| Line 134: | Line 145: | ||
* Clear speaker attribution | * Clear speaker attribution | ||
* Higher-quality training data | * Higher-quality training data | ||
'''Recommended ways to review the recording together:''' | |||
* <code>tmate</code> + asciinema playback can work (during playback, holding <code>.</code> fast-forwards) | |||
* Screen share while watching the recording | |||
* Screen share while using the annotator tool | |||
'''If you break into a separate “experimental” session during the meeting:''' | |||
* This is allowed and often useful. | |||
* Record that new session with <code>asciinema</code> as well, so it can be reviewed and understood by someone who was not present. | |||
---- | ---- | ||
| Line 145: | Line 165: | ||
# Load the recording | # Load the recording | ||
# Create timelines for goal hierarchy levels | # Create timelines for goal hierarchy levels | ||
# Apply annotations per standard | # Apply annotations per standard (see [[#Annotation Standard|Annotation Standard]]) | ||
# Review completeness | # Review for completeness and consistency | ||
# Note spec version used | # Note the spec version used for this annotation (this document version) | ||
# If clarification is needed, contact Person A: | |||
#* Record audio of both ends of the clarification conversation | |||
#* Include the audio (and later transcription) in the <code>.session</code> package | |||
'''Annotator tool:''' | |||
* Use the annotator tool hosted at: <code>https://github.com/arthurwolf/annotator</code> | |||
* (If a specific branch is required for a cohort/release, use the branch specified by the project, e.g. <code>from_students</code> when applicable.) | |||
'''Output:''' Annotated recording | '''Output:''' Annotated recording | ||
| Line 158: | Line 185: | ||
'''Steps:''' | '''Steps:''' | ||
# Transcribe audio with Whisper | # Transcribe audio with Whisper (with timestamps) | ||
# Create <code>.session</code> archive | # Create <code>.session</code> archive | ||
# Include | # Include required files (see [[#Technical Specification|Technical Specification]]) | ||
# | |||
# Upload to repository | # Upload to repository | ||
--- | ---- | ||
== Annotation Standard == | |||
This section defines what to annotate and how. | |||
=== Goal hierarchy and timelines === | |||
* Goals, subgoals, sub-subgoals (and deeper levels as needed) form a '''goal hierarchy'''. | |||
* Ideally, every part of the session should belong to a goal. | |||
* Subgoals should represent meaningful “chunks” of work contributing to a parent goal. | |||
* Goals, subgoals, and sub-(sub…) goals should exist as '''separate timelines''' in the annotator tool: | |||
** Think: one “Goal” timeline spanning the whole file, | |||
** Subgoal timelines covering sections that represent large steps, | |||
** Deeper timelines for finer-grained structure as needed. | |||
=== Success markers === | |||
Use success markers to capture whether a goal (or subgoal) was achieved at the moment where the result is observable. | |||
* <code>successResult</code> — the goal was achieved (success is visible in the recording output) | |||
* <code>successFailure</code> — the goal failed (failure is visible) | |||
* <code>successUnknown</code> — success cannot be determined from the recording (or cannot possibly be visible) | |||
'''When to apply success markers:''' | |||
* Apply success markers only when the success/failure of a command corresponds to the attempted end of a goal. | |||
* Example: | |||
** If a goal is “get my program to compile”, then the output of <code>make</code> (where compilation occurs) is the target moment. | |||
* For activities where the result cannot be observed (e.g., some parts of OS installs), <code>successUnknown</code> is acceptable. | |||
=== Tools and subtools (commands and interfaces) === | |||
* Commands can be annotated as tools/subtools. | |||
* Provide: | |||
** An '''in-context''' description (considering the current goal and what has happened so far) | |||
** Optionally, an '''out-of-context''' description (what the command generally does) | |||
This includes screen-based interactions when they are part of the workflow (if the annotator supports representing them). | |||
=== Mistakes === | |||
Every mistake down to the typo should be tagged, including why it is a mistake and the repercussions. | |||
* <code>mistake</code> — the incorrect action or decision | |||
* <code>mistakeReason</code> — why it was wrong | |||
* <code>mistakeResult</code> — what happened because of it (error message, time lost, wrong path taken, etc.) | |||
=== Secrets and sensitive data === | |||
Use specific tags for secrets and secret-like flows: | |||
* <code>passwordPrompt</code> | |||
* <code>password</code> | |||
* <code>passwordAgain</code> | |||
* <code>secret</code> | |||
Annotate secret handling carefully, and prefer describing the flow rather than reproducing secret material. | |||
=== Discoveries === | |||
Discoveries are things learned during the session that change the plan/goals/approach. | |||
* <code>discovery</code> (or project’s chosen “Discoveries” tag) — a new fact that changes what happens next | |||
* Apply when the discovery causes a change in goals, subgoals, or strategy. | |||
=== Licensing and legally significant text === | |||
If you reproduce a “legally significant” portion of a licensed work (see GNU guidance on “Legally Significant”), list the license in your annotation. | |||
* <code>license</code> | |||
* <code>copyright</code> | |||
When in doubt, prefer summarizing instead of reproducing large chunks of licensed text. | |||
---- | |||
== Application Guidelines == | |||
=== How to apply goal markers === | |||
* When starting a goal, add a unique goal identifier (format guideline): | |||
** <code>[Goal_ + Unique_identifier]</code> followed by the goal description | |||
* When a goal is completed: | |||
** close out the goal and return to the parent goal level (change “level” back) | |||
=== How to apply mistakes === | |||
Whenever there is a mistake: | |||
* Apply <code>mistake</code>, <code>mistakeReason</code>, and <code>mistakeResult</code>. | |||
=== How to apply success markers === | |||
* Apply <code>successResult</code>, <code>successFailure</code>, or <code>successUnknown</code> only when: | |||
** The result of a goal is visible, OR | |||
** The result cannot possibly be visible. | |||
=== General guidance === | |||
* Annotate as you would prefer sessions to be annotated: be as in-depth as you would like. | |||
* The goal of annotation is to describe precisely what is happening in the session: | |||
** what is done, | |||
** why it is done, | |||
** and what outcome it leads to. | |||
---- | |||
== Technical Specification == | |||
=== The .session archive === | |||
A <code>.session</code> file is a zip archive with a <code>.session</code> extension containing (at minimum) the annotated recording and the supporting metadata and audio/transcripts. | |||
=== Required contents === | |||
The <code>.session</code> archive must include: | |||
* <code>session.yaml</code> — session metadata (see below) | |||
* <code>recording.asciinema</code> — the asciinema recording (annotated) | |||
* Audio recordings of the interview and any follow-up clarification conversations | |||
* Transcriptions of those audio files (produced by Whisper), including timestamps | |||
=== session.yaml fields === | |||
<code>session.yaml</code> should contain at least: | |||
* <code>version</code>: Version number for the format specification (this document) used when creating this <code>.session</code> file. | |||
Optional / planned fields (may be included if supported by the project tooling): | |||
* <code>mission</code> (FUTUREWORK): overarching goal being attempted | |||
* <code>plan</code> (FUTUREWORK): intended plan (does not need to match what happened) | |||
* <code>tests</code> (FUTUREWORK): an array of “unit test”-like checks indicating success/failure | |||
** intended to be written in TinTin++ | |||
** loaded after an SSH session has been instantiated | |||
** includes a time limit for considering the test to have gone wrong | |||
** examples: | |||
*** “X command was correctly installed to location Y” | |||
*** “Running <code>command --version</code> returns the right version 0.11” | |||
* <code>system</code> (optional): command/answer pairs that define the environment | |||
** examples: | |||
*** <code>cat /etc/issue.net</code>: <code>Ubuntu 22.04 LTS\n</code> | |||
*** <code>ip address</code>: <code>...</code> | |||
---- | |||
== Quick Reference == | |||
=== Core concepts === | |||
* Use separate timelines for each goal level (goal / subgoal / sub-subgoal). | |||
* Add success markers only at moments where a goal outcome is observable (or provably unobservable). | |||
=== Common tags (non-exhaustive) === | |||
* Goals: <code>goal</code> (plus unique goal identifiers per project convention) | |||
* Success: <code>successResult</code>, <code>successFailure</code>, <code>successUnknown</code> | |||
* Mistakes: <code>mistake</code>, <code>mistakeReason</code>, <code>mistakeResult</code> | |||
* Secrets: <code>passwordPrompt</code>, <code>password</code>, <code>passwordAgain</code>, <code>secret</code> | |||
* Discoveries: <code>discovery</code> (or “Discoveries” tag used by the project) | |||
* Licensing: <code>license</code>, <code>copyright</code> | |||
* Tools: tool/subtool annotations (in-context and optionally out-of-context) | |||
---- | |||
== Complete Worked Example == | |||
<pre> | |||
[Goal_001] Deploy GNU package "foo" on Debian Stable VM | |||
[Subgoal_001a] Install build dependencies | |||
tool: apt-get install ... | |||
successResult: dependencies installed | |||
[Subgoal_001b] Configure build | |||
tool: ./configure --prefix=/usr/local | |||
mistake: used wrong configure flag | |||
mistakeReason: flag is not supported by this version | |||
mistakeResult: configure exits with error, must retry | |||
tool: ./configure --prefix=/usr/local | |||
successResult: configure completed | |||
[Subgoal_001c] Build and verify | |||
tool: make | |||
successResult: compilation completed | |||
tool: make test | |||
successUnknown: tests run but output does not clearly confirm pass/fail (or tests are not available) | |||
discovery: package requires newer libc than expected → adjust approach / select different target or environment | |||
</pre> | |||
(Replace identifiers, tags, and exact phrasing with the conventions supported by the current annotator tool version.) | |||
---- | |||
== Appendices == | |||
=== Appendix A: Recording environment reminders === | |||
* Use a Debian Stable VM for recordings. | |||
* If Debian Stable lacks required packages, upgrading the VM to unstable is allowed. | |||
* VMs may be reused across sessions. | |||
=== Appendix B: Keeping everything inside the terminal === | |||
* All input/output should be attempted through the terminal session. | |||
* For web interactions, prefer CLI tools such as: | |||
** <code>lynx</code>, <code>links</code>, <code>curl</code>, <code>wget</code>, etc. | |||
=== Appendix C: Interview and follow-ups === | |||
* Record both sides separately for quality. | |||
* If any follow-up clarification occurs later: | |||
** record both ends again, | |||
** include audio and transcripts in the <code>.session</code>. | |||
=== Appendix D: Debian Stable VM details === | |||
(See project-specific VM setup instructions for the current cohort/repository; keep this appendix synchronized with those instructions.) | |||
``` | |||
Revision as of 19:35, 22 January 2026
```mediawiki
Librecode Annotation Standard
Document Version: 2.0
Last Updated: 2025-10-01
Target Audience: Session recorders, annotators, reviewers
Table of Contents
- Glossary
- Overview & Purpose
- The Annotation Process
- Annotation Standard
- Application Guidelines
- Technical Specification
- Quick Reference
- Complete Worked Example
- Appendices
Glossary
Session: An asciinema recording of a terminal session, typically showing the deployment or configuration of software.
Annotated Session: A session that has structured annotations added to it, conforming to this specification.
Recording: The raw asciinema capture of terminal activity.
Annotation: Structured metadata added to a recording that describes what is happening, why, and the outcome.
Timeline: A hierarchical structure in the annotator tool where annotations are organized. Different goal levels exist on separate timelines.
Goal Hierarchy: The nested structure of goals, subgoals, and sub-subgoals that describe the overall task and its component steps.
Annotation Tag: A specific marker type (e.g., goal, mistake, successResult) used to categorize parts of the session.
Person A (Recorder): The individual(s) who created the original terminal session recording.
Person B (Annotator): The individual who interviews Person A and adds structured annotations to the recording.
.session File: A zip file containing the annotated recording, audio transcriptions, and metadata conforming to this specification.
Overview & Purpose
Purpose of this document
This document exists to:
- Provide an annotation process.
- Provide an annotation standard.
- Collect guidance for people participating in annotation.
An annotated session is defined as an asciinema recording, with annotations added to it, conforming to a version of this document.
What are Annotated Sessions?
An annotated session is an asciinema recording of terminal activity (typically software deployment) that has been enriched with structured annotations describing:
- What actions were performed
- Why those actions were taken
- Whether goals were achieved
- What mistakes occurred and their consequences
- What discoveries changed the approach
Why Create Them?
The Librecode project aims to train Large Language Models (LLMs) to understand and work with terminal recordings. Annotated sessions serve as training data, teaching models to:
- Recognize common deployment patterns
- Understand error recovery strategies
- Learn from human problem-solving approaches
- Eventually perform automatic annotation
Once sufficient manual annotations exist, trained models can automatically annotate new sessions, enabling advanced tooling like automated documentation generation.
Document Purpose
This document provides:
- An annotation process for creating annotated sessions
- An annotation standard defining what to annotate and how
- Guidelines for annotators to ensure consistency and quality
The Annotation Process
The process of creating an annotated session involves four distinct phases:
Phase 1: Recording → Phase 2: Interview → Phase 3: Annotation → Phase 4: Submission
Phase 1: Recording
Objective: Create a terminal recording of meaningful software deployment activity.
Who: Person A (one or more individuals)
Steps:
- Set up the recording environment:
  - Use a Debian Stable VM (see Appendix D)
  - Upgrade to unstable if required packages are missing
- VMs may be reused from previous deployments, or may be new
- Start the asciinema recording client
- Attempt to deploy a piece of GNU software
- Keep all input/output inside the terminal:
  - Use CLI web tools (curl, wget, lynx, links, etc.)
- Stop recording when deployment completes
Output: .cast asciinema recording file
Common Issues:
- Using GUI tools
- Stopping recording too early
Phase 2: Interview
Objective: Ensure the annotator fully understands what was done and why.
Who: Person A and Person B
Steps:
- Record audio of both sides separately
- Review the recording together (recommended)
- Explain what was done and why
- Record follow-up clarification conversations if needed
Output:
- Audio recordings (both sides)
- Full understanding of actions and rationale
Why Separate Audio Recordings?
- Better transcription quality
- Clear speaker attribution
- Higher-quality training data
Recommended ways to review the recording together:
- tmate + asciinema playback can work (during playback, holding the . key fast-forwards)
- Screen share while watching the recording
- Screen share while using the annotator tool
If you break into a separate “experimental” session during the meeting:
- This is allowed and often useful.
- Record that new session with asciinema as well, so it can be reviewed and understood by someone who was not present.
Phase 3: Annotation
Objective: Add structured annotations using the annotator tool.
Who: Person B
Steps:
- Load the recording
- Create timelines for goal hierarchy levels
- Apply annotations per standard (see Annotation Standard)
- Review for completeness and consistency
- Note the spec version used for this annotation (this document version)
- If clarification is needed, contact Person A:
  - Record audio of both ends of the clarification conversation
  - Include the audio (and later transcription) in the .session package
Annotator tool:
- Use the annotator tool hosted at: https://github.com/arthurwolf/annotator
- (If a specific branch is required for a cohort/release, use the branch specified by the project, e.g. from_students when applicable.)
Output: Annotated recording
Phase 4: Submission
Objective: Package and submit the session.
Steps:
- Transcribe audio with Whisper (with timestamps)
- Create the .session archive
- Include required files (see Technical Specification)
- Upload to repository
Annotation Standard
This section defines what to annotate and how.
Goal hierarchy and timelines
- Goals, subgoals, sub-subgoals (and deeper levels as needed) form a goal hierarchy.
- Ideally, every part of the session should belong to a goal.
- Subgoals should represent meaningful “chunks” of work contributing to a parent goal.
- Goals, subgoals, and sub-(sub…) goals should exist as separate timelines in the annotator tool:
  - Think: one “Goal” timeline spanning the whole file,
  - Subgoal timelines covering sections that represent large steps,
  - Deeper timelines for finer-grained structure as needed.
Success markers
Use success markers to capture whether a goal (or subgoal) was achieved at the moment where the result is observable.
- successResult: the goal was achieved (success is visible in the recording output)
- successFailure: the goal failed (failure is visible)
- successUnknown: success cannot be determined from the recording (or cannot possibly be visible)
When to apply success markers:
- Apply success markers only when the success/failure of a command corresponds to the attempted end of a goal.
- Example:
  - If a goal is “get my program to compile”, then the output of make (where compilation occurs) is the target moment.
- For activities where the result cannot be observed (e.g., some parts of OS installs), successUnknown is acceptable.
Tools and subtools (commands and interfaces)
- Commands can be annotated as tools/subtools.
- Provide:
  - An in-context description (considering the current goal and what has happened so far)
  - Optionally, an out-of-context description (what the command generally does)
This includes screen-based interactions when they are part of the workflow (if the annotator supports representing them).
Mistakes
Every mistake down to the typo should be tagged, including why it is a mistake and the repercussions.
- mistake: the incorrect action or decision
- mistakeReason: why it was wrong
- mistakeResult: what happened because of it (error message, time lost, wrong path taken, etc.)
Secrets and sensitive data
Use specific tags for secrets and secret-like flows:
- passwordPrompt
- password
- passwordAgain
- secret
Annotate secret handling carefully, and prefer describing the flow rather than reproducing secret material.
Discoveries
Discoveries are things learned during the session that change the plan/goals/approach.
- discovery (or project’s chosen “Discoveries” tag): a new fact that changes what happens next
- Apply when the discovery causes a change in goals, subgoals, or strategy.
Licensing and legally significant text
If you reproduce a “legally significant” portion of a licensed work (see GNU guidance on “Legally Significant”), list the license in your annotation.
- license
- copyright
When in doubt, prefer summarizing instead of reproducing large chunks of licensed text.
Application Guidelines
How to apply goal markers
- When starting a goal, add a unique goal identifier (format guideline):
  - [Goal_ + Unique_identifier] followed by the goal description
- When a goal is completed:
  - close out the goal and return to the parent goal level (change “level” back)
How to apply mistakes
Whenever there is a mistake:
- Apply mistake, mistakeReason, and mistakeResult.
How to apply success markers
- Apply successResult, successFailure, or successUnknown only when:
  - The result of a goal is visible, OR
  - The result cannot possibly be visible.
General guidance
- Annotate as you would prefer sessions to be annotated: be as in-depth as you would like.
- The goal of annotation is to describe precisely what is happening in the session:
  - what is done,
  - why it is done,
  - and what outcome it leads to.
Technical Specification
The .session archive
A .session file is a zip archive with a .session extension containing (at minimum) the annotated recording and the supporting metadata and audio/transcripts.
Required contents
The .session archive must include:
- session.yaml: session metadata (see below)
- recording.asciinema: the asciinema recording (annotated)
- Audio recordings of the interview and any follow-up clarification conversations
- Transcriptions of those audio files (produced by Whisper), including timestamps
session.yaml fields
session.yaml should contain at least:
- version: Version number for the format specification (this document) used when creating this .session file.
Optional / planned fields (may be included if supported by the project tooling):
- mission (FUTUREWORK): overarching goal being attempted
- plan (FUTUREWORK): intended plan (does not need to match what happened)
- tests (FUTUREWORK): an array of “unit test”-like checks indicating success/failure
  - intended to be written in TinTin++
  - loaded after an SSH session has been instantiated
  - includes a time limit for considering the test to have gone wrong
  - examples:
    - “X command was correctly installed to location Y”
    - “Running command --version returns the right version 0.11”
- system (optional): command/answer pairs that define the environment
  - examples:
    - cat /etc/issue.net: Ubuntu 22.04 LTS\n
    - ip address: ...
Quick Reference
Core concepts
- Use separate timelines for each goal level (goal / subgoal / sub-subgoal).
- Add success markers only at moments where a goal outcome is observable (or provably unobservable).
Common tags (non-exhaustive)
- Goals: goal (plus unique goal identifiers per project convention)
- Success: successResult, successFailure, successUnknown
- Mistakes: mistake, mistakeReason, mistakeResult
- Secrets: passwordPrompt, password, passwordAgain, secret
- Discoveries: discovery (or “Discoveries” tag used by the project)
- Licensing: license, copyright
- Tools: tool/subtool annotations (in-context and optionally out-of-context)
Complete Worked Example
[Goal_001] Deploy GNU package "foo" on Debian Stable VM
  [Subgoal_001a] Install build dependencies
    tool: apt-get install ...
    successResult: dependencies installed
  [Subgoal_001b] Configure build
    tool: ./configure --prefix=/usr/local <unsupported flag>
    mistake: used wrong configure flag
    mistakeReason: flag is not supported by this version
    mistakeResult: configure exits with error, must retry
    tool: ./configure --prefix=/usr/local
    successResult: configure completed
  [Subgoal_001c] Build and verify
    tool: make
    successResult: compilation completed
    tool: make test
    successUnknown: tests run but output does not clearly confirm pass/fail (or tests are not available)
    discovery: package requires newer libc than expected → adjust approach / select different target or environment
(Replace identifiers, tags, and exact phrasing with the conventions supported by the current annotator tool version.)
Appendices
Appendix A: Recording environment reminders
- Use a Debian Stable VM for recordings.
- If Debian Stable lacks required packages, upgrading the VM to unstable is allowed.
- VMs may be reused across sessions.
Appendix B: Keeping everything inside the terminal
- All input/output should be attempted through the terminal session.
- For web interactions, prefer CLI tools such as:
  - lynx, links, curl, wget, etc.
Appendix C: Interview and follow-ups
- Record both sides separately for quality.
- If any follow-up clarification occurs later:
  - record both ends again,
  - include audio and transcripts in the .session.
Appendix D: Debian Stable VM details
(See project-specific VM setup instructions for the current cohort/repository; keep this appendix synchronized with those instructions.)
```
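The required contents listed in the Technical Specification can be checked mechanically before upload. The following is a minimal Python sketch, not part of the project tooling. It assumes a flat archive layout, and because this standard does not name the audio or transcript files, it recognises them by file extension only. The function names build_session and missing_contents and the two extension lists are hypothetical, chosen for illustration.

```python
import zipfile
from pathlib import Path

# File names taken from the "Required contents" list of the standard.
REQUIRED_FILES = ("session.yaml", "recording.asciinema")

# Assumption: the standard does not name audio or transcript files, so this
# sketch recognises them by (hypothetical) extension only.
AUDIO_SUFFIXES = (".wav", ".ogg", ".mp3", ".flac")
TRANSCRIPT_SUFFIXES = (".txt", ".json", ".srt", ".vtt")


def build_session(archive_path, files):
    """Zip every path in `files` into `archive_path`, stored flat by file name."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, files):
            archive.write(path, arcname=path.name)


def missing_contents(archive_path):
    """Return a list of problems found; an empty list means nothing is missing."""
    with zipfile.ZipFile(archive_path) as archive:
        names = archive.namelist()
    problems = [f"missing {name}" for name in REQUIRED_FILES if name not in names]
    if not any(name.lower().endswith(AUDIO_SUFFIXES) for name in names):
        problems.append("no audio recording found")
    if not any(name.lower().endswith(TRANSCRIPT_SUFFIXES) for name in names):
        problems.append("no transcription found")
    return problems
```

Calling missing_contents on an archive returns an empty list when session.yaml, recording.asciinema, at least one audio file, and at least one transcript are present; otherwise each entry names what still has to be added. The check covers presence only and does not inspect the version field inside session.yaml.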