<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.faikvm.com/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=ArthurWolf</id>
	<title>CLAIF Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.faikvm.com/mediawiki/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=ArthurWolf"/>
	<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php/Special:Contributions/ArthurWolf"/>
	<updated>2026-04-09T09:23:52Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.43.6</generator>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=Prompting-guidelines&amp;diff=62</id>
		<title>Prompting-guidelines</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=Prompting-guidelines&amp;diff=62"/>
		<updated>2026-01-22T23:20:08Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Librecode Prompting Guidelines for Students =&lt;br /&gt;
&lt;br /&gt;
Some prompting guidelines given to students as they start working on the Librecode project, or if they need to write prompts for any part of the project.&lt;br /&gt;
&lt;br /&gt;
== Prompt Engineering Guides / Documentation ==&lt;br /&gt;
&lt;br /&gt;
* https://platform.openai.com/docs/guides/prompt-engineering&lt;br /&gt;
* https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api&lt;br /&gt;
* https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/overview&lt;br /&gt;
* https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices&lt;br /&gt;
* https://ai.google.dev/gemini-api/docs/prompting-strategies&lt;br /&gt;
* https://api-docs.deepseek.com/guides/thinking_mode&lt;br /&gt;
* https://api-docs.deepseek.com/guides/json_mode&lt;br /&gt;
* https://docs.mistral.ai/capabilities/completion/prompting_capabilities&lt;br /&gt;
* https://www.llama.com/docs/how-to-guides/prompting/&lt;br /&gt;
* https://cohere.com/llmu/prompt-engineering-basics&lt;br /&gt;
* https://docs.ai21.com/docs/prompt-engineering&lt;br /&gt;
* https://qwen.readthedocs.io/en/latest/getting_started/concepts.html&lt;br /&gt;
* https://bigmodel.cn/dev/howuse/prompt&lt;br /&gt;
* https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html&lt;br /&gt;
* https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/prompt-engineering?view=foundry-classic&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;If you have to read just one, read the Anthropic one.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== The Big Important Pieces ==&lt;br /&gt;
&lt;br /&gt;
# Use XML-like tags to structure the document, including on multiple levels. You can use Markdown inside those tags&#039; contents, but avoid structuring using Markdown headers. Models tend to understand structure better using XML-like tags.&lt;br /&gt;
# Provide examples, both of good output and of bad output, clearly delimited by XML-like tags. Give &#039;&#039;multiple&#039;&#039; examples. Providing multiple examples is called the &amp;quot;few-shot&amp;quot; / &amp;quot;multishot&amp;quot; technique, and can sometimes completely replace fine-tuning.&lt;br /&gt;
# Research what format/style was used to train the model you are trying to use or fine-tune; it can often give major insights and solve issues.&lt;br /&gt;
# Unless you have a good reason not to, it&#039;s generally a good idea to use a temperature of 0 (or the equivalent top-k / top-p / etc.).&lt;br /&gt;
# Keep your system prompt simple and short. The place to put instructions is the prompt itself. Putting too much in the system prompt is a common beginner mistake.&lt;br /&gt;
# Templating formats like Handlebars or Jinja make for nicer, readable prompt templates/files.&lt;br /&gt;
# Provide context: explain in the prompt what the prompt is &amp;quot;for&amp;quot;, what the project the prompt is being used for is all about, and any other useful context you can think of.&lt;br /&gt;
# Use an LLM to rewrite your prompts. In particular, give it these links, these rules, and any other rules you can think of, with instructions to rewrite the prompt following them. Make it clear that it is writing text that will be read by a machine (not by a human), so it can write compact text without any pleasantries. This generally results in much better prompts.&lt;br /&gt;
# Use coding agents to work on your prompt templates. Gemini is free.&lt;br /&gt;
# Beyond examples, describe the output: its length, format, style, etc.&lt;br /&gt;
# If using a thinking model, actually instruct it to think, and even give it examples of how to think; examples of what a useful chain-of-thought looks like for a specific input.&lt;br /&gt;
# It&#039;s stupid, but giving it a &amp;quot;persona&amp;quot; where you tell it it&#039;s a &amp;quot;world expert&amp;quot; at whatever you&#039;re asking it to do still reliably increases performance by a noticeable amount, even to this day. See studies. Threatening the model also increases performance, but I personally can never get around to doing it...&lt;br /&gt;
# Prefill answers: after the end of your prompt, write the beginning of the answer you expect the model to give. This can help prevent some issues, but note it won&#039;t work for thinking models.&lt;br /&gt;
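&lt;br /&gt;
To make points 1, 6, and 13 concrete, here is a minimal sketch of what a prompt template could look like (a hypothetical Jinja template; the tag names and variables are invented for the example):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;context&amp;gt;&lt;br /&gt;
This prompt is part of the {{ project_name }} pipeline. {{ project_description }}&lt;br /&gt;
&amp;lt;/context&amp;gt;&lt;br /&gt;
&amp;lt;task&amp;gt;&lt;br /&gt;
{{ task_description }}&lt;br /&gt;
&amp;lt;/task&amp;gt;&lt;br /&gt;
&amp;lt;examples&amp;gt;&lt;br /&gt;
{% for example in examples %}&lt;br /&gt;
&amp;lt;example&amp;gt;{{ example }}&amp;lt;/example&amp;gt;&lt;br /&gt;
{% endfor %}&lt;br /&gt;
&amp;lt;/examples&amp;gt;&lt;br /&gt;
&amp;lt;data&amp;gt;&lt;br /&gt;
{{ input_data }}&lt;br /&gt;
&amp;lt;/data&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Render it and send it with temperature 0; for non-thinking models you can additionally prefill the start of the reply (e.g. end the request with an assistant turn already containing &amp;quot;Answer: &amp;quot;).&lt;br /&gt;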
&lt;br /&gt;
&lt;br /&gt;
== Prompt Example: Assigning the `group` of the `sortme` Event (HLC Membership) ==&lt;br /&gt;
&lt;br /&gt;
This is an example of a prompt from the project, followed by an &amp;quot;improved&amp;quot; version that applies the advice above.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Source prompt (GitHub):&#039;&#039;&#039;&lt;br /&gt;
* Raw: https://raw.githubusercontent.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/refs/heads/lv/model0-fine-tuning/models/model_0/system_prompt.txt&lt;br /&gt;
* Repo view: https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/blob/lv/model0-fine-tuning/models/model_0/system_prompt.txt&lt;br /&gt;
&lt;br /&gt;
=== Base Prompt (as in the repo) ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# Goal&lt;br /&gt;
Your goal is to use a set of higher-level-communications (HLCs) and one final possibly incomplete HLC to assign a group to the last event, by determining whether it should be considered to be a part of the final HLC.&lt;br /&gt;
&lt;br /&gt;
# Definitions&lt;br /&gt;
A higher-level communication (HLC) is a series of related events, representing a single idea, concept, or value.&lt;br /&gt;
 * The first HLC starts at the beginning of the dataset you are evaluating.&lt;br /&gt;
 * Events in an HLC are contiguous; no event from any other HLC will occur between the first and last event of a given HLC.&lt;br /&gt;
 * HLCs are complete only when the content of the HLC represents an idea such as one of the examples given; you cannot reason about HLC membership without examining the content.&lt;br /&gt;
 * Each HLC will have a unique `group` assigned.&lt;br /&gt;
&lt;br /&gt;
Examples of HLCs include:&lt;br /&gt;
 * A Bash shell prompt&lt;br /&gt;
 * A Bash shell command&lt;br /&gt;
 * A response to a shell command&lt;br /&gt;
 * A complete keyboard shortcut&lt;br /&gt;
 * A series of backspaces, deletions, navigations, or additions causing a typo&lt;br /&gt;
 * A series of backspaces, deletions, navigations, or additions correcting a typo&lt;br /&gt;
&lt;br /&gt;
An event captures communication in a terminal session.&lt;br /&gt;
 * Events can be one of:&lt;br /&gt;
   * `&amp;lt;user_input&amp;gt;` -- user keyboard presses or cut-and-paste buffer pastes.&lt;br /&gt;
   * `&amp;lt;system_output&amp;gt;` -- responses from software.&lt;br /&gt;
 * All events include a `timestamp` (in seconds) that indicates how much time has passed since the session began.&lt;br /&gt;
 * Events are always provided in non-decreasing timestamp order; ties are in-order in the dataset.&lt;br /&gt;
 * Events that are part of the same HLC will have the same `group`, with the exception of the final HLC, which may need many events added to it to become a complete HLC.&lt;br /&gt;
 * Only the last event will have a `sortme` attribute; there will only be one event with a `sortme` attribute in the dataset.&lt;br /&gt;
&lt;br /&gt;
Each `group` is identified by 0 or a positive integer.&lt;br /&gt;
 * They are used to identify an HLC, are unique, contiguous, and increase by 1 in the dataset each time one HLC stops and another starts.&lt;br /&gt;
&lt;br /&gt;
The last event is the event immediately prior to the dataset&#039;s end:&lt;br /&gt;
 * The last event has a `sortme` attribute set to `True`.&lt;br /&gt;
 * The last event has no group assigned. This implies nothing about its HLC membership.&lt;br /&gt;
 * The last event has the highest `timestamp` in the dataset.&lt;br /&gt;
 * The event before the last event is always a part of the final HLC.&lt;br /&gt;
&lt;br /&gt;
The final HLC is the last HLC in the dataset.&lt;br /&gt;
 * The final HLC may or may not be complete.&lt;br /&gt;
 * The final HLC always contains the event prior to the last event.&lt;br /&gt;
 * The last event may or may not be a part of the final HLC.&lt;br /&gt;
&lt;br /&gt;
# Instructions:&lt;br /&gt;
You will be given a dataset to be evaluated within a pair of `data` tags which will contain a series of terminal session events. At the end of the dataset, you can find the final HLC, and the last event.&lt;br /&gt;
&lt;br /&gt;
Your task is to determine what group the last event should have, by considering whether it should be a part of the final HLC.&lt;br /&gt;
&lt;br /&gt;
## How to Respond:&lt;br /&gt;
&lt;br /&gt;
Respond with the following two items:&lt;br /&gt;
 * An explanation in English less than or equal to 200 characters in length on why you believe the last event should be considered to be a part of the final HLC, or why it should not.&lt;br /&gt;
   * Do not add code blocks, or other multi-line formatting.&lt;br /&gt;
   * If you determine the last event should be considered a part of the final HLC, state what type of HLC you believe the final HLC to be, and whether you believe adding this event to the HLC would make it complete.&lt;br /&gt;
 * An answer, either:&lt;br /&gt;
   * The integer `group` of the final HLC -- If you mean to imply that the last event should be joined to the final HLC&lt;br /&gt;
   * `NEW` -- If you mean to imply that an HLC should be assigned to the next integer after the current final HLC&#039;s `group`, and you mean to imply the last event should be in that new `group`&lt;br /&gt;
&lt;br /&gt;
Use the following template to format your response:&lt;br /&gt;
&amp;lt;!-- 200 or fewer characters in English here --&amp;gt;&lt;br /&gt;
Answer: &amp;lt;!-- Integer or `NEW` here --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
### Example Responses&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
The last Event belongs to the final HLC, because it continues the input of the `ssh` command at the Bash prompt.&lt;br /&gt;
Answer: 1&lt;br /&gt;
&lt;br /&gt;
```&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
The last Event belongs to a new HLC, because it contains the first characters of the response to the `ssh` command the user entered at the Bash prompt.&lt;br /&gt;
Answer: NEW&lt;br /&gt;
&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
# Notes:&lt;br /&gt;
 * Do not rely only on `group`s; use content and interaction flow. Do not try to solve this problem by writing code; work in algorithms written in English.&lt;br /&gt;
 * Most of the time, the dataset will end in an incomplete HLC, even if you were to add the last element to the final HLC; this is normal, as we are processing terminal input as it arrives, not a complete terminal session.&lt;br /&gt;
 * In a terminal session, if the remote software wants the user to see what they are typing, it has to repeat the characters back to the user. Echoed characters are common, and usually are a part of the same HLC.&lt;br /&gt;
&lt;br /&gt;
# Dataset to be evaluated:&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Improved Prompt (structured tags, multishot examples, context-rich, deterministic) ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;prompt&amp;gt;&lt;br /&gt;
  &amp;lt;persona&amp;gt;&lt;br /&gt;
    You are a world expert in segmenting terminal session events into Higher-Level Communications (HLCs) for the AutoDoc / Librecode annotation pipeline.&lt;br /&gt;
    You are precise, deterministic, and do not invent fields or groups.&lt;br /&gt;
  &amp;lt;/persona&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;context&amp;gt;&lt;br /&gt;
    AutoDoc converts Asciinema terminal recordings into structured events and derived documentation.&lt;br /&gt;
    Librecode uses annotated sessions as training data so models learn terminal workflows, error recovery, and common patterns.&lt;br /&gt;
    This task is a small deterministic classification: decide whether the final event continues the final HLC or starts the next HLC.&lt;br /&gt;
  &amp;lt;/context&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;task&amp;gt;&lt;br /&gt;
    Given a dataset of terminal events inside &amp;lt;data&amp;gt; ... &amp;lt;/data&amp;gt;, assign the correct group to the single event marked sortme=&amp;quot;True&amp;quot;.&lt;br /&gt;
    Output either:&lt;br /&gt;
    - the integer group of the final HLC (join it), OR&lt;br /&gt;
    - NEW (start a new HLC with the next integer after the current final HLC group).&lt;br /&gt;
  &amp;lt;/task&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;definitions&amp;gt;&lt;br /&gt;
    &amp;lt;hlc&amp;gt;&lt;br /&gt;
      An HLC is a contiguous run of related events expressing one coherent unit (one idea/action), such as:&lt;br /&gt;
      - a shell prompt display&lt;br /&gt;
      - a single command being typed (including edits, backspaces, cursor moves, pastes)&lt;br /&gt;
      - the system output produced by a command&lt;br /&gt;
      - a complete keyboard shortcut / UI interaction sequence&lt;br /&gt;
    &amp;lt;/hlc&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;event&amp;gt;&lt;br /&gt;
      Each event is either user input or system output.&lt;br /&gt;
      Events are ordered by non-decreasing timestamp.&lt;br /&gt;
      Only one event has sortme=&amp;quot;True&amp;quot; (the last event).&lt;br /&gt;
      The event immediately before the sortme event is always part of the final HLC.&lt;br /&gt;
    &amp;lt;/event&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;groups&amp;gt;&lt;br /&gt;
      Group identifiers are 0 or positive integers.&lt;br /&gt;
      They are contiguous and increase by 1 each time a new HLC begins.&lt;br /&gt;
      All events except the sortme event already have a group.&lt;br /&gt;
    &amp;lt;/groups&amp;gt;&lt;br /&gt;
  &amp;lt;/definitions&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;input_format&amp;gt;&lt;br /&gt;
    The dataset is wrapped in &amp;lt;data&amp;gt; ... &amp;lt;/data&amp;gt;.&lt;br /&gt;
    Inside &amp;lt;data&amp;gt;, each event is represented as either:&lt;br /&gt;
      - &amp;lt;user_input ...&amp;gt;TEXT&amp;lt;/user_input&amp;gt;&lt;br /&gt;
      - &amp;lt;system_output ...&amp;gt;TEXT&amp;lt;/system_output&amp;gt;&lt;br /&gt;
    Attributes may include:&lt;br /&gt;
      - timestamp=&amp;quot;SECONDS_SINCE_START&amp;quot;&lt;br /&gt;
      - group=&amp;quot;INTEGER&amp;quot;   (missing only on the sortme event)&lt;br /&gt;
      - sortme=&amp;quot;True&amp;quot;     (present only on the final event)&lt;br /&gt;
    Use only what is present.&lt;br /&gt;
  &amp;lt;/input_format&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;decision_process&amp;gt;&lt;br /&gt;
    &amp;lt;step_1&amp;gt;&lt;br /&gt;
      Identify final_group := the group of the event immediately before the sortme event.&lt;br /&gt;
      That event is guaranteed to be in the final HLC.&lt;br /&gt;
    &amp;lt;/step_1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;step_2&amp;gt;&lt;br /&gt;
      Identify the final HLC &amp;quot;type&amp;quot; by reading the content of the final_group events near the end (prompt vs command input vs command output vs edit sequence vs shortcut).&lt;br /&gt;
    &amp;lt;/step_2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;step_3&amp;gt;&lt;br /&gt;
      Decide whether the sortme event is the same coherent unit (same HLC) or the start of the next unit (new HLC).&lt;br /&gt;
      Use content and interaction flow; do NOT decide using group numbers alone.&lt;br /&gt;
    &amp;lt;/step_3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;continuation_indicators&amp;gt;&lt;br /&gt;
      Strong signals the sortme event belongs to final_group:&lt;br /&gt;
      - It continues the same command line being typed (more characters, paste, or edits on the same line).&lt;br /&gt;
      - It is echoed text that corresponds to recent user input (terminal echo), still part of the same action.&lt;br /&gt;
      - It continues the same output stream of a command (more lines of the same response, same error message, same progress output).&lt;br /&gt;
      - It continues the same edit/repair sequence (backspaces/cursor moves fixing a typo) that was already happening in final_group.&lt;br /&gt;
    &amp;lt;/continuation_indicators&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;new_hlc_indicators&amp;gt;&lt;br /&gt;
      Strong signals the sortme event should be NEW:&lt;br /&gt;
      - A shell prompt appears after command output (prompt is its own HLC in this schema).&lt;br /&gt;
      - New user typing begins after a prompt HLC (typing a command is a new HLC).&lt;br /&gt;
      - The sortme event clearly starts a different activity type than final_group (e.g., final_group is output, sortme is the next prompt; or final_group is prompt, sortme is command input).&lt;br /&gt;
      - A completed action boundary is visible (e.g., the command output ends and a fresh prompt appears; or a prompt ends and the user begins a new command).&lt;br /&gt;
    &amp;lt;/new_hlc_indicators&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;tie_breaker&amp;gt;&lt;br /&gt;
      If ambiguous, prefer joining final_group unless there is a clear boundary marker (prompt boundary, command submit boundary, or obvious activity-type switch).&lt;br /&gt;
      Do not guess new groups without evidence.&lt;br /&gt;
    &amp;lt;/tie_breaker&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;thinking&amp;gt;&lt;br /&gt;
      If you are a thinking-capable model: think privately.&lt;br /&gt;
      Do NOT reveal hidden reasoning.&lt;br /&gt;
      The explanation must be &amp;lt;= 200 characters.&lt;br /&gt;
    &amp;lt;/thinking&amp;gt;&lt;br /&gt;
  &amp;lt;/decision_process&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;output_format&amp;gt;&lt;br /&gt;
    Output exactly 2 lines, no extra whitespace lines, no code blocks, no bullets:&lt;br /&gt;
    Line 1: an English explanation &amp;lt;= 200 characters.&lt;br /&gt;
    Line 2: &amp;quot;Answer: &amp;quot; followed by either an integer (final_group) or &amp;quot;NEW&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
    Allowed:&lt;br /&gt;
      Explanation: single line, short, concrete, references content (prompt/command/output/edit).&lt;br /&gt;
    Forbidden:&lt;br /&gt;
      - multi-line explanations&lt;br /&gt;
      - markdown formatting&lt;br /&gt;
      - additional fields&lt;br /&gt;
      - JSON/YAML output&lt;br /&gt;
  &amp;lt;/output_format&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;examples&amp;gt;&lt;br /&gt;
    &amp;lt;good_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;0.0&amp;quot; group=&amp;quot;0&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;0.2&amp;quot; group=&amp;quot;1&amp;quot;&amp;gt;ssh user@host&amp;lt;/user_input&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;0.3&amp;quot; group=&amp;quot;1&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt; -p 22&amp;lt;/user_input&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;output&amp;gt;&lt;br /&gt;
        Continues the same command input line (ssh options), so it belongs to the final command-typing HLC.&lt;br /&gt;
        Answer: 1&lt;br /&gt;
      &amp;lt;/output&amp;gt;&lt;br /&gt;
    &amp;lt;/good_example&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;good_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;10.0&amp;quot; group=&amp;quot;4&amp;quot;&amp;gt;ls\r&amp;lt;/user_input&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;10.1&amp;quot; group=&amp;quot;5&amp;quot;&amp;gt;file_a\nfile_b\n&amp;lt;/system_output&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;10.2&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;output&amp;gt;&lt;br /&gt;
        The prompt is a new HLC after command output, so the final event starts the next group.&lt;br /&gt;
        Answer: NEW&lt;br /&gt;
      &amp;lt;/output&amp;gt;&lt;br /&gt;
    &amp;lt;/good_example&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;good_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;20.0&amp;quot; group=&amp;quot;7&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;20.5&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt;c&amp;lt;/user_input&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;output&amp;gt;&lt;br /&gt;
        Prompt (group 7) is complete; new typing begins a new command-input HLC, so it must be NEW.&lt;br /&gt;
        Answer: NEW&lt;br /&gt;
      &amp;lt;/output&amp;gt;&lt;br /&gt;
    &amp;lt;/good_example&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;bad_output_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;0.0&amp;quot; group=&amp;quot;0&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;0.2&amp;quot; group=&amp;quot;1&amp;quot;&amp;gt;ssh user@host&amp;lt;/user_input&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;0.3&amp;quot; group=&amp;quot;1&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt; -p 22&amp;lt;/user_input&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;bad_output&amp;gt;&lt;br /&gt;
        Here is my reasoning:&lt;br /&gt;
        - It looks like it continues the previous thing.&lt;br /&gt;
        Answer: 1&lt;br /&gt;
      &amp;lt;/bad_output&amp;gt;&lt;br /&gt;
      &amp;lt;why_bad&amp;gt;&lt;br /&gt;
        Invalid: extra lines + bullets. Output must be exactly 2 lines.&lt;br /&gt;
      &amp;lt;/why_bad&amp;gt;&lt;br /&gt;
    &amp;lt;/bad_output_example&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;bad_output_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;10.2&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;bad_output&amp;gt;&lt;br /&gt;
        The prompt should be group 999 because it is a prompt and prompts should be high numbers.&lt;br /&gt;
        Answer: 999&lt;br /&gt;
      &amp;lt;/bad_output&amp;gt;&lt;br /&gt;
      &amp;lt;why_bad&amp;gt;&lt;br /&gt;
        Invalid: invented group number; only final_group or NEW are allowed.&lt;br /&gt;
      &amp;lt;/why_bad&amp;gt;&lt;br /&gt;
    &amp;lt;/bad_output_example&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;bad_output_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;10.2&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;bad_output&amp;gt;&lt;br /&gt;
        The prompt is new.&lt;br /&gt;
        Answer: NEW!!!&lt;br /&gt;
      &amp;lt;/bad_output&amp;gt;&lt;br /&gt;
      &amp;lt;why_bad&amp;gt;&lt;br /&gt;
        Invalid: Answer must be exactly NEW (no punctuation).&lt;br /&gt;
      &amp;lt;/why_bad&amp;gt;&lt;br /&gt;
    &amp;lt;/bad_output_example&amp;gt;&lt;br /&gt;
  &amp;lt;/examples&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;runtime_settings&amp;gt;&lt;br /&gt;
    Prefer deterministic decoding (temperature 0 or equivalent).&lt;br /&gt;
  &amp;lt;/runtime_settings&amp;gt;&lt;br /&gt;
&amp;lt;/prompt&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;data&amp;gt;&lt;br /&gt;
&amp;lt;&amp;lt;&amp;lt;DATASET_EVENTS_GO_HERE&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/data&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=Prompting-guidelines&amp;diff=61</id>
		<title>Prompting-guidelines</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=Prompting-guidelines&amp;diff=61"/>
		<updated>2026-01-22T23:18:15Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Librecode Prompting Guidelines for Students =&lt;br /&gt;
&lt;br /&gt;
Some prompting guidelines given to students as they start working on the Librecode project, or if they need to write prompts for any part of the project.&lt;br /&gt;
&lt;br /&gt;
== Prompt Engineering Guides / Documentation ==&lt;br /&gt;
&lt;br /&gt;
* https://platform.openai.com/docs/guides/prompt-engineering&lt;br /&gt;
* https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api&lt;br /&gt;
* https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/overview&lt;br /&gt;
* https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices&lt;br /&gt;
* https://ai.google.dev/gemini-api/docs/prompting-strategies&lt;br /&gt;
* https://api-docs.deepseek.com/guides/thinking_mode&lt;br /&gt;
* https://api-docs.deepseek.com/guides/json_mode&lt;br /&gt;
* https://docs.mistral.ai/capabilities/completion/prompting_capabilities&lt;br /&gt;
* https://www.llama.com/docs/how-to-guides/prompting/&lt;br /&gt;
* https://cohere.com/llmu/prompt-engineering-basics&lt;br /&gt;
* https://docs.ai21.com/docs/prompt-engineering&lt;br /&gt;
* https://qwen.readthedocs.io/en/latest/getting_started/concepts.html&lt;br /&gt;
* https://bigmodel.cn/dev/howuse/prompt&lt;br /&gt;
* https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html&lt;br /&gt;
* https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/prompt-engineering?view=foundry-classic&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;If you have to read just one, read the Anthropic one.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== The Big Important Pieces ==&lt;br /&gt;
&lt;br /&gt;
# Use XML-like tags to structure the document, including on multiple levels. You can use Markdown inside those tags&#039; contents, but avoid structuring using Markdown headers. Models tend to understand structure better using XML-like tags.&lt;br /&gt;
# Provide examples, both of good output and of bad output, clearly delimited by XML-like tags. Give &#039;&#039;multiple&#039;&#039; examples. Providing multiple examples is called the &amp;quot;few-shot&amp;quot; / &amp;quot;multishot&amp;quot; technique, and can sometimes completely replace fine-tuning.&lt;br /&gt;
# Research what format/style was used to train the model you are trying to use or fine-tune; it can often give major insights and solve issues.&lt;br /&gt;
# Unless you have a good reason not to, it&#039;s generally a good idea to use a temperature of 0 (or the equivalent top-k / top-p / etc.).&lt;br /&gt;
# Keep your system prompt simple and short. The place to put instructions is the prompt itself. Putting too much in the system prompt is a common beginner mistake.&lt;br /&gt;
# Templating formats like Handlebars or Jinja make for nicer, readable prompt templates/files.&lt;br /&gt;
# Provide context: explain in the prompt what the prompt is &amp;quot;for&amp;quot;, what the project the prompt is being used for is all about, and any other useful context you can think of.&lt;br /&gt;
# Use an LLM to rewrite your prompts. In particular, give it these links, these rules, and any other rules you can think of, with instructions to rewrite the prompt following them. Make it clear that it is writing text that will be read by a machine (not by a human), so it can write compact text without any pleasantries. This generally results in much better prompts.&lt;br /&gt;
# Use coding agents to work on your prompt templates. Gemini is free.&lt;br /&gt;
# Beyond examples, describe the output: its length, format, style, etc.&lt;br /&gt;
# If using a thinking model, actually instruct it to think, and even give it examples of how to think; examples of what a useful chain-of-thought looks like for a specific input.&lt;br /&gt;
# It&#039;s stupid, but giving it a &amp;quot;persona&amp;quot; where you tell it it&#039;s a &amp;quot;world expert&amp;quot; at whatever you&#039;re asking it to do still reliably increases performance by a noticeable amount, even to this day. See studies. Threatening the model also increases performance, but I personally can never get around to doing it...&lt;br /&gt;
# Prefill answers: after the end of your prompt, write the beginning of the answer you expect the model to give. This can help prevent some issues, but note it won&#039;t work for thinking models.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Prompt Example: Assigning the `group` of the `sortme` Event (HLC Membership) ==&lt;br /&gt;
&lt;br /&gt;
This is an example of a prompt from the project, followed by an &amp;quot;improved&amp;quot; version that applies the advice above.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Source prompt (GitHub):&#039;&#039;&#039;&lt;br /&gt;
* Raw: https://raw.githubusercontent.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/refs/heads/lv/model0-fine-tuning/models/model_0/system_prompt.txt&lt;br /&gt;
* Repo view: https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/blob/lv/model0-fine-tuning/models/model_0/system_prompt.txt&lt;br /&gt;
&lt;br /&gt;
=== Base Prompt (as in the repo) ===&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
# Goal&lt;br /&gt;
Your goal is to use a set of higher-level-communications (HLCs) and one final possibly incomplete HLC to assign a group to the last event, by determining whether it should be considered to be a part of the final HLC.&lt;br /&gt;
&lt;br /&gt;
# Definitions&lt;br /&gt;
A higher-level communication (HLC) is a series of related events, representing a single idea, concept, or value.&lt;br /&gt;
 * The first HLC starts at the beginning of the dataset you are evaluating.&lt;br /&gt;
 * Events in an HLC are contiguous; no event from any other HLC will occur between the first and last event of a given HLC.&lt;br /&gt;
 * HLCs are complete only when the content of the HLC represents an idea such as one of the examples given; you cannot reason about HLC membership without examining the content.&lt;br /&gt;
 * Each HLC will have a unique `group` assigned.&lt;br /&gt;
&lt;br /&gt;
Examples of HLCs include:&lt;br /&gt;
 * A Bash shell prompt&lt;br /&gt;
 * A Bash shell command&lt;br /&gt;
 * A response to a shell command&lt;br /&gt;
 * A complete keyboard shortcut&lt;br /&gt;
 * A series of backspaces, deletions, navigations, or additions causing a typo&lt;br /&gt;
 * A series of backspaces, deletions, navigations, or additions correcting a typo&lt;br /&gt;
&lt;br /&gt;
An event captures communication in a terminal session.&lt;br /&gt;
 * Events can be one of:&lt;br /&gt;
   * `&amp;lt;user_input&amp;gt;` -- user keyboard presses or cut-and-paste buffer pastes.&lt;br /&gt;
   * `&amp;lt;system_output&amp;gt;` -- responses from software.&lt;br /&gt;
 * All events include a `timestamp` (in seconds) that indicates how much time has passed since the session began.&lt;br /&gt;
 * Events are always provided in non-decreasing timestamp order; ties are in-order in the dataset.&lt;br /&gt;
 * Events that are part of the same HLC will have the same `group`, with the exception of the final HLC, which may need many events added to it to become a complete HLC.&lt;br /&gt;
 * Only the last event will have a `sortme` attribute; there will only be one event with a `sortme` attribute in the dataset.&lt;br /&gt;
&lt;br /&gt;
Each `group` is identified by 0, or a positive integer.&lt;br /&gt;
 * They are used to identify a HLC, are unique, contiguous, and increase by 1 in the dataset each time one HLC stops, and another starts.&lt;br /&gt;
&lt;br /&gt;
The last event is the event immediately prior to the dataset&#039;s end:&lt;br /&gt;
 * The last event has a `sortme` attribute set to `True`.&lt;br /&gt;
 * The last event has no group assigned. This implies nothing about its HLC membership.&lt;br /&gt;
 * The last event has the highest `timestamp` in the dataset.&lt;br /&gt;
 * The event before the last event is always a part of the final HLC.&lt;br /&gt;
&lt;br /&gt;
The final HLC is the last HLC in the dataset.&lt;br /&gt;
 * The final HLC may or may not be complete.&lt;br /&gt;
 * The final HLC always contains the event prior to the last event.&lt;br /&gt;
 * The last event may or may not be a part of the final HLC.&lt;br /&gt;
&lt;br /&gt;
# Instructions:&lt;br /&gt;
You will be given a dataset to be evaluated within a pair of `data` tags which will contain a series of terminal session events. At the end of the dataset, you can find the final HLC, and the last event.&lt;br /&gt;
&lt;br /&gt;
Your task is to determine what group the last event should have, by considering whether it should be a part of the final HLC.&lt;br /&gt;
&lt;br /&gt;
## How to Respond:&lt;br /&gt;
&lt;br /&gt;
Respond with the following two items:&lt;br /&gt;
 * An explanation in English less than or equal to 200 characters in length on why you believe the last event should be considered to be a part of the final HLC, or why it should not.&lt;br /&gt;
   * Do not add code blocks, or other multi-line formatting.&lt;br /&gt;
   * If you determine the last event should be considered a part of the final HLC, state what type of HLC you believe the final HLC to be, and whether you believe adding this event to the HLC would make it complete.&lt;br /&gt;
 * An answer, either:&lt;br /&gt;
   * The integer `group` of the final HLC -- If you mean to imply that the last event should be joined to the final HLC&lt;br /&gt;
   * `NEW` -- If you mean to imply that an HLC should be assigned to the next integer after the current final HLC&#039;s `group`, and you mean to imply the last event should be in that new `group`&lt;br /&gt;
&lt;br /&gt;
Use the following template to format your response:&lt;br /&gt;
&amp;lt;!-- 200 or fewer characters in English here --&amp;gt;&lt;br /&gt;
Answer: &amp;lt;!-- Integer or `NEW` here --&amp;gt;&lt;br /&gt;
&lt;br /&gt;
### Example Responses&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
The last Event belongs to the final HLC, because it continues the input of the `ssh` command at the Bash prompt.&lt;br /&gt;
Answer: 1&lt;br /&gt;
&lt;br /&gt;
```&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
The last Event belongs to a new HLC, because it contains the first characters of the response to the `ssh` command the user entered at the Bash prompt.&lt;br /&gt;
Answer: NEW&lt;br /&gt;
&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
# Notes:&lt;br /&gt;
 * Do not rely only on `group`s; use content and interaction flow. Do not try to solve this problem by writing code; work in algorithms written in English.&lt;br /&gt;
 * Most of the time, the dataset will end in an incomplete HLC, even if you were to add the last element to the final HLC; this is normal, as we are processing terminal input as it arrives, not a complete terminal session.&lt;br /&gt;
 * In a terminal session, if the remote software wants the user to see what they are typing, it has to repeat the characters back to the user. Echoed characters are common, and usually are a part of the same HLC.&lt;br /&gt;
&lt;br /&gt;
# Dataset to be evaluated:&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;
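The &amp;quot;How to Respond&amp;quot; rules above are strict enough to check mechanically. As a sketch (not part of the repo; the function name is invented for illustration), a consumer of this prompt&#039;s output could validate the two-line reply like this:&lt;br /&gt;

```python
import re

def parse_hlc_response(text: str):
    """Validate the two-line reply: explanation, then 'Answer: <int or NEW>'.

    Returns (explanation, answer) or raises ValueError on malformed output.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError("expected exactly two non-empty lines")
    explanation, answer_line = lines
    if len(explanation) > 200:
        raise ValueError("explanation exceeds 200 characters")
    # Only the final HLC's integer group or the literal NEW is allowed.
    m = re.fullmatch(r"Answer: (\d+|NEW)", answer_line.strip())
    if not m:
        raise ValueError("answer must be 'Answer: <integer>' or 'Answer: NEW'")
    return explanation, m.group(1)

explanation, answer = parse_hlc_response(
    "Continues the same command input line, so it joins the final HLC.\nAnswer: 1"
)
```

A validator like this rejects the failure modes the improved prompt&#039;s bad-output examples call out (extra lines, invented group numbers, `NEW!!!`), which makes the downstream pipeline safer.&lt;br /&gt;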
=== Improved Prompt (structured tags, multishot examples, context-rich, deterministic) ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;text&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;prompt&amp;gt;&lt;br /&gt;
  &amp;lt;persona&amp;gt;&lt;br /&gt;
    You are a world expert in segmenting terminal session events into Higher-Level Communications (HLCs) for the AutoDoc / Librecode annotation pipeline.&lt;br /&gt;
    You are precise, deterministic, and do not invent fields or groups.&lt;br /&gt;
  &amp;lt;/persona&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;context&amp;gt;&lt;br /&gt;
    AutoDoc converts Asciinema terminal recordings into structured events and derived documentation.&lt;br /&gt;
    Librecode uses annotated sessions as training data so models learn terminal workflows, error recovery, and common patterns.&lt;br /&gt;
    This task is a small deterministic classification: decide whether the final event continues the final HLC or starts the next HLC.&lt;br /&gt;
  &amp;lt;/context&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;task&amp;gt;&lt;br /&gt;
    Given a dataset of terminal events inside &amp;lt;data&amp;gt; ... &amp;lt;/data&amp;gt;, assign the correct group to the single event marked sortme=&amp;quot;True&amp;quot;.&lt;br /&gt;
    Output either:&lt;br /&gt;
    - the integer group of the final HLC (join it), OR&lt;br /&gt;
    - NEW (start a new HLC with the next integer after the current final HLC group).&lt;br /&gt;
  &amp;lt;/task&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;definitions&amp;gt;&lt;br /&gt;
    &amp;lt;hlc&amp;gt;&lt;br /&gt;
      An HLC is a contiguous run of related events expressing one coherent unit (one idea/action), such as:&lt;br /&gt;
      - a shell prompt display&lt;br /&gt;
      - a single command being typed (including edits, backspaces, cursor moves, pastes)&lt;br /&gt;
      - the system output produced by a command&lt;br /&gt;
      - a complete keyboard shortcut / UI interaction sequence&lt;br /&gt;
    &amp;lt;/hlc&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;event&amp;gt;&lt;br /&gt;
      Each event is either user input or system output.&lt;br /&gt;
      Events are ordered by non-decreasing timestamp.&lt;br /&gt;
      Only one event has sortme=&amp;quot;True&amp;quot; (the last event).&lt;br /&gt;
      The event immediately before the sortme event is always part of the final HLC.&lt;br /&gt;
    &amp;lt;/event&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;groups&amp;gt;&lt;br /&gt;
      Group identifiers are 0 or positive integers.&lt;br /&gt;
      They are contiguous and increase by 1 each time a new HLC begins.&lt;br /&gt;
      All events except the sortme event already have a group.&lt;br /&gt;
    &amp;lt;/groups&amp;gt;&lt;br /&gt;
  &amp;lt;/definitions&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;input_format&amp;gt;&lt;br /&gt;
    The dataset is wrapped in &amp;lt;data&amp;gt; ... &amp;lt;/data&amp;gt;.&lt;br /&gt;
    Inside &amp;lt;data&amp;gt;, each event is represented as either:&lt;br /&gt;
      - &amp;lt;user_input ...&amp;gt;TEXT&amp;lt;/user_input&amp;gt;&lt;br /&gt;
      - &amp;lt;system_output ...&amp;gt;TEXT&amp;lt;/system_output&amp;gt;&lt;br /&gt;
    Attributes may include:&lt;br /&gt;
      - timestamp=&amp;quot;SECONDS_SINCE_START&amp;quot;&lt;br /&gt;
      - group=&amp;quot;INTEGER&amp;quot;   (missing only on the sortme event)&lt;br /&gt;
      - sortme=&amp;quot;True&amp;quot;     (present only on the final event)&lt;br /&gt;
    Use only what is present.&lt;br /&gt;
  &amp;lt;/input_format&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;decision_process&amp;gt;&lt;br /&gt;
    &amp;lt;step_1&amp;gt;&lt;br /&gt;
      Identify final_group := the group of the event immediately before the sortme event.&lt;br /&gt;
      That event is guaranteed to be in the final HLC.&lt;br /&gt;
    &amp;lt;/step_1&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;step_2&amp;gt;&lt;br /&gt;
      Identify the final HLC &amp;quot;type&amp;quot; by reading the content of the final_group events near the end (prompt vs command input vs command output vs edit sequence vs shortcut).&lt;br /&gt;
    &amp;lt;/step_2&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;step_3&amp;gt;&lt;br /&gt;
      Decide whether the sortme event is the same coherent unit (same HLC) or the start of the next unit (new HLC).&lt;br /&gt;
      Use content and interaction flow; do NOT decide using group numbers alone.&lt;br /&gt;
    &amp;lt;/step_3&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;continuation_indicators&amp;gt;&lt;br /&gt;
      Strong signals the sortme event belongs to final_group:&lt;br /&gt;
      - It continues the same command line being typed (more characters, paste, or edits on the same line).&lt;br /&gt;
      - It is echoed text that corresponds to recent user input (terminal echo), still part of the same action.&lt;br /&gt;
      - It continues the same output stream of a command (more lines of the same response, same error message, same progress output).&lt;br /&gt;
      - It continues the same edit/repair sequence (backspaces/cursor moves fixing a typo) that was already happening in final_group.&lt;br /&gt;
    &amp;lt;/continuation_indicators&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;new_hlc_indicators&amp;gt;&lt;br /&gt;
      Strong signals the sortme event should be NEW:&lt;br /&gt;
      - A shell prompt appears after command output (prompt is its own HLC in this schema).&lt;br /&gt;
      - New user typing begins after a prompt HLC (typing a command is a new HLC).&lt;br /&gt;
      - The sortme event clearly starts a different activity type than final_group (e.g., final_group is output, sortme is the next prompt; or final_group is prompt, sortme is command input).&lt;br /&gt;
      - A completed action boundary is visible (e.g., the command output ends and a fresh prompt appears; or a prompt ends and the user begins a new command).&lt;br /&gt;
    &amp;lt;/new_hlc_indicators&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;tie_breaker&amp;gt;&lt;br /&gt;
      If ambiguous, prefer joining final_group unless there is a clear boundary marker (prompt boundary, command submit boundary, or obvious activity-type switch).&lt;br /&gt;
      Do not guess new groups without evidence.&lt;br /&gt;
    &amp;lt;/tie_breaker&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;thinking&amp;gt;&lt;br /&gt;
      If you are a thinking-capable model: think privately.&lt;br /&gt;
      Do NOT reveal hidden reasoning.&lt;br /&gt;
      The explanation must be &amp;lt;= 200 characters.&lt;br /&gt;
    &amp;lt;/thinking&amp;gt;&lt;br /&gt;
  &amp;lt;/decision_process&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;output_format&amp;gt;&lt;br /&gt;
    Output exactly 2 lines, no extra whitespace lines, no code blocks, no bullets:&lt;br /&gt;
    Line 1: an English explanation &amp;lt;= 200 characters.&lt;br /&gt;
    Line 2: &amp;quot;Answer: &amp;quot; followed by either an integer (final_group) or &amp;quot;NEW&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
    Allowed:&lt;br /&gt;
      Explanation: single line, short, concrete, references content (prompt/command/output/edit).&lt;br /&gt;
    Forbidden:&lt;br /&gt;
      - multi-line explanations&lt;br /&gt;
      - markdown formatting&lt;br /&gt;
      - additional fields&lt;br /&gt;
      - JSON/YAML output&lt;br /&gt;
  &amp;lt;/output_format&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;examples&amp;gt;&lt;br /&gt;
    &amp;lt;good_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;0.0&amp;quot; group=&amp;quot;0&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;0.2&amp;quot; group=&amp;quot;1&amp;quot;&amp;gt;ssh user@host&amp;lt;/user_input&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;0.3&amp;quot; group=&amp;quot;1&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt; -p 22&amp;lt;/user_input&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;output&amp;gt;&lt;br /&gt;
        Continues the same command input line (ssh options), so it belongs to the final command-typing HLC.&lt;br /&gt;
        Answer: 1&lt;br /&gt;
      &amp;lt;/output&amp;gt;&lt;br /&gt;
    &amp;lt;/good_example&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;good_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;10.0&amp;quot; group=&amp;quot;4&amp;quot;&amp;gt;ls\r&amp;lt;/user_input&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;10.1&amp;quot; group=&amp;quot;5&amp;quot;&amp;gt;file_a\nfile_b\n&amp;lt;/system_output&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;10.2&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;output&amp;gt;&lt;br /&gt;
        The prompt is a new HLC after command output, so the final event starts the next group.&lt;br /&gt;
        Answer: NEW&lt;br /&gt;
      &amp;lt;/output&amp;gt;&lt;br /&gt;
    &amp;lt;/good_example&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;good_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;20.0&amp;quot; group=&amp;quot;7&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;20.5&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt;c&amp;lt;/user_input&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;output&amp;gt;&lt;br /&gt;
        Prompt (group 7) is complete; new typing begins a new command-input HLC, so it must be NEW.&lt;br /&gt;
        Answer: NEW&lt;br /&gt;
      &amp;lt;/output&amp;gt;&lt;br /&gt;
    &amp;lt;/good_example&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;bad_output_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;0.0&amp;quot; group=&amp;quot;0&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;0.2&amp;quot; group=&amp;quot;1&amp;quot;&amp;gt;ssh user@host&amp;lt;/user_input&amp;gt;&lt;br /&gt;
          &amp;lt;user_input timestamp=&amp;quot;0.3&amp;quot; group=&amp;quot;1&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt; -p 22&amp;lt;/user_input&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;bad_output&amp;gt;&lt;br /&gt;
        Here is my reasoning:&lt;br /&gt;
        - It looks like it continues the previous thing.&lt;br /&gt;
        Answer: 1&lt;br /&gt;
      &amp;lt;/bad_output&amp;gt;&lt;br /&gt;
      &amp;lt;why_bad&amp;gt;&lt;br /&gt;
        Invalid: extra lines + bullets. Output must be exactly 2 lines.&lt;br /&gt;
      &amp;lt;/why_bad&amp;gt;&lt;br /&gt;
    &amp;lt;/bad_output_example&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;bad_output_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;10.2&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;bad_output&amp;gt;&lt;br /&gt;
        The prompt should be group 999 because it is a prompt and prompts should be high numbers.&lt;br /&gt;
        Answer: 999&lt;br /&gt;
      &amp;lt;/bad_output&amp;gt;&lt;br /&gt;
      &amp;lt;why_bad&amp;gt;&lt;br /&gt;
        Invalid: invented group number; only final_group or NEW are allowed.&lt;br /&gt;
      &amp;lt;/why_bad&amp;gt;&lt;br /&gt;
    &amp;lt;/bad_output_example&amp;gt;&lt;br /&gt;
&lt;br /&gt;
    &amp;lt;bad_output_example&amp;gt;&lt;br /&gt;
      &amp;lt;input&amp;gt;&lt;br /&gt;
        &amp;lt;data&amp;gt;&lt;br /&gt;
          &amp;lt;system_output timestamp=&amp;quot;10.2&amp;quot; sortme=&amp;quot;True&amp;quot;&amp;gt;$ &amp;lt;/system_output&amp;gt;&lt;br /&gt;
        &amp;lt;/data&amp;gt;&lt;br /&gt;
      &amp;lt;/input&amp;gt;&lt;br /&gt;
      &amp;lt;bad_output&amp;gt;&lt;br /&gt;
        The prompt is new.&lt;br /&gt;
        Answer: NEW!!!&lt;br /&gt;
      &amp;lt;/bad_output&amp;gt;&lt;br /&gt;
      &amp;lt;why_bad&amp;gt;&lt;br /&gt;
        Invalid: Answer must be exactly NEW (no punctuation).&lt;br /&gt;
      &amp;lt;/why_bad&amp;gt;&lt;br /&gt;
    &amp;lt;/bad_output_example&amp;gt;&lt;br /&gt;
  &amp;lt;/examples&amp;gt;&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;runtime_settings&amp;gt;&lt;br /&gt;
    Prefer deterministic decoding (temperature 0 or equivalent).&lt;br /&gt;
  &amp;lt;/runtime_settings&amp;gt;&lt;br /&gt;
&amp;lt;/prompt&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;data&amp;gt;&lt;br /&gt;
&amp;lt;&amp;lt;&amp;lt;DATASET_EVENTS_GO_HERE&amp;gt;&amp;gt;&amp;gt;&lt;br /&gt;
&amp;lt;/data&amp;gt;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=Student_Groups&amp;diff=60</id>
		<title>Student Groups</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=Student_Groups&amp;diff=60"/>
		<updated>2026-01-22T23:09:36Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Student Groups =&lt;br /&gt;
&lt;br /&gt;
This page documents the University of Toronto Mississauga (UTM) student groups who contributed to the LibreCode / Annotator ecosystem between mid-2024 and late-2025, specifically around: converting asciinema terminal recordings into structured events, annotations, and derived documentation artifacts using LLMs.&lt;br /&gt;
&lt;br /&gt;
== Project context ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mentors&#039;&#039;&#039;&lt;br /&gt;
* Julia Longtin — https://github.com/julialongtin&lt;br /&gt;
* Arthur Wolf — https://github.com/arthurwolf&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mentorship program&#039;&#039;&#039;&lt;br /&gt;
* Human Feedback Foundation (Linux Foundation entity): https://humanfeedback.io/&lt;br /&gt;
* University of Toronto Mississauga: https://www.utm.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core LibreCode resources&#039;&#039;&#039;&lt;br /&gt;
* Annotator repository: https://github.com/arthurwolf/annotator&lt;br /&gt;
* LibreCode / FaikVM wiki: https://wiki.faikvm.com/mediawiki/index.php/Main_Page&lt;br /&gt;
* Public hosted annotator instance: https://linuxpmi.org/&lt;br /&gt;
* Prompting guidelines/advice: [[Prompting-guidelines]]   &lt;br /&gt;
&lt;br /&gt;
Most student repositories are hosted under this GitHub organization:  &lt;br /&gt;
https://github.com/CSC392-CSC492-Building-AI-ML-systems&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Fall 2024 – Early AutoDocs prototype ==&lt;br /&gt;
&lt;br /&gt;
=== What they worked on ===&lt;br /&gt;
This group produced an early prototype of what later became &#039;&#039;AutoDocs&#039;&#039;: tooling to segment asciinema terminal recordings into meaningful chunks and generate higher-level annotations.&lt;br /&gt;
&lt;br /&gt;
This work served primarily as a proof of concept and a starting point for later cohorts.&lt;br /&gt;
&lt;br /&gt;
=== Contributors ===&lt;br /&gt;
&#039;&#039;(TODO: add names / GitHub links if identified)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Code and artifacts ===&lt;br /&gt;
* Archived Fall 2024 code base (referenced in later repos):&lt;br /&gt;
** Mentioned in the AutoDocs README as &#039;&#039;&#039;“Fall 2024 Team’s Code Base”&#039;&#039;&#039;&lt;br /&gt;
** Linked from the AutoDocs repository:&lt;br /&gt;
*** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
&lt;br /&gt;
=== Notes ===&lt;br /&gt;
Later documentation notes that much of this code is outdated or non-functional, but it remains historically important.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Winter 2025 – AutoDocs expansion + documentation ==&lt;br /&gt;
&lt;br /&gt;
=== What they worked on ===&lt;br /&gt;
This cohort rebuilt and extended the AutoDocs pipeline into a more complete system and produced formal documentation of their work.&lt;br /&gt;
&lt;br /&gt;
The repository includes a tagged release explicitly described as a rewrite of the project by the Winter 2025 team.  &lt;br /&gt;
Release link: https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent/releases/tag/winter-2025&lt;br /&gt;
&lt;br /&gt;
=== Contributors ===&lt;br /&gt;
&#039;&#039;&#039;Known from the Winter 2025 release note and repository contributors list&#039;&#039;&#039;&lt;br /&gt;
* Brian Zhang — https://github.com/Pyosimros&lt;br /&gt;
* Vraj Patel — https://github.com/Vraj-Patel1&lt;br /&gt;
* Dan Nguyen — https://github.com/nuhgooyin&lt;br /&gt;
* Adreano La Rosa — &#039;&#039;(listed in release note; GitHub handle not yet confirmed)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Additional contributors shown by GitHub&#039;&#039;&#039;&lt;br /&gt;
* Abdallah Enaya — https://github.com/abdullah-enaya&lt;br /&gt;
* Renee K — https://github.com/renee-k&lt;br /&gt;
* aml-8 — https://github.com/aml-8&lt;br /&gt;
* Christopher Flores — https://github.com/cfstar188&lt;br /&gt;
* Uyiosa Iyekekpolor — https://github.com/uyoyo0&lt;br /&gt;
* eyexjay — https://github.com/eyexjay&lt;br /&gt;
&lt;br /&gt;
=== Code and artifacts ===&lt;br /&gt;
* Main AutoDocs repository:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
* Release tag capturing the Winter 2025 state:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent/releases/tag/winter-2025&lt;br /&gt;
* Public talk page referencing this pipeline (AI Tinkerers Toronto, March 2025):&lt;br /&gt;
** https://toronto.aitinkerers.org/talks/rsvp_14QYpww1FyE&lt;br /&gt;
&lt;br /&gt;
=== Notes ===&lt;br /&gt;
The current AutoDocs repository states that it was “modified and extended from the Winter 2025 team’s code base”.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== AutoDocs (consolidated / ongoing repository) ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;This is a living repository spanning multiple student cohorts rather than a single group.&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Purpose ===&lt;br /&gt;
AutoDocs processes asciinema terminal recordings and produces structured outputs such as:&lt;br /&gt;
* segmented command &#039;&#039;events&#039;&#039;,&lt;br /&gt;
* annotated explanations,&lt;br /&gt;
* derived artifacts (for example, Dockerfiles).&lt;br /&gt;
&lt;br /&gt;
=== People ===&lt;br /&gt;
* Julia Longtin (lead contact): https://github.com/julialongtin&lt;br /&gt;
* Model publisher referenced in the README:&lt;br /&gt;
** &#039;&#039;&#039;bria7801&#039;&#039;&#039; on Hugging Face:&lt;br /&gt;
*** https://huggingface.co/bria7801/model-0&lt;br /&gt;
*** https://huggingface.co/bria7801/model-1&lt;br /&gt;
*** https://huggingface.co/bria7801/model-3&lt;br /&gt;
&lt;br /&gt;
=== Repository ===&lt;br /&gt;
* https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
&lt;br /&gt;
=== Contents (high level) ===&lt;br /&gt;
* ``data/``, ``frontend/``, ``models/`` directories&lt;br /&gt;
* Multiple parser scripts (Parser 0 / 1 / 2)&lt;br /&gt;
* References to fine-tuned model checkpoints via Hugging Face links&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Autumn 2025 – DocStream consolidation (Educational AI Agent) ==&lt;br /&gt;
&lt;br /&gt;
This appears to be a later cohort or iteration that built on AutoDocs and re-framed the system as &#039;&#039;&#039;DocStream&#039;&#039;&#039; (same core idea: streamed asciinema logs → events → hierarchical annotations → documentation).  &lt;br /&gt;
Repository: https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent&lt;br /&gt;
&lt;br /&gt;
=== What they worked on ===&lt;br /&gt;
From the repository README, DocStream:&lt;br /&gt;
* converts raw, noisy terminal activity into structured, reproducible developer documentation,&lt;br /&gt;
* processes streamed asciinema logs,&lt;br /&gt;
* segments them into meaningful events,&lt;br /&gt;
* generates hierarchical annotations explaining terminal activity,&lt;br /&gt;
* includes an evaluation harness (based on an extended EleutherAI LM Evaluation Harness) with task and metric scaffolding under ``data/llm_Evaluation/``.&lt;br /&gt;
&lt;br /&gt;
=== Code and artifacts ===&lt;br /&gt;
* Main repository:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent&lt;br /&gt;
* Repository structure pointers:&lt;br /&gt;
** ``data/`` — datasets and evaluation harness inputs&lt;br /&gt;
** ``models/model_0/`` — segmentation training and inference&lt;br /&gt;
** ``models/model_1/`` — annotation training and inference&lt;br /&gt;
** ``demo/`` — front-end visualization demo&lt;br /&gt;
** ``runpod/`` — deployment and runpod materials&lt;br /&gt;
* White paper included in the repository root:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/blob/main/WhitePaper.docx&lt;br /&gt;
* Previous iteration explicitly linked from the DocStream README:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
&lt;br /&gt;
=== Models ===&lt;br /&gt;
* &#039;&#039;&#039;Model 0 — Event Segmentation&#039;&#039;&#039;&lt;br /&gt;
** Segments streamed terminal logs into XML-structured &#039;&#039;events&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Model 1 — Hierarchical Annotation&#039;&#039;&#039;&lt;br /&gt;
** Reads Model 0 event chunks and generates summaries with hierarchical depth (goal / subtask structure)&lt;br /&gt;
&lt;br /&gt;
=== Contributors ===&lt;br /&gt;
GitHub accounts that appear repeatedly in the commit history, likely the core Autumn 2025 contributors:&lt;br /&gt;
* Ryan Pankratz — https://github.com/ryan-pankratz  &lt;br /&gt;
** (also appears as: https://github.com/RyanPankratz)&lt;br /&gt;
* Victor Shea — https://github.com/VictorShea&lt;br /&gt;
* Moe Reda — https://github.com/Moe-Reda&lt;br /&gt;
* Patea4 — https://github.com/Patea4&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=Student_Groups&amp;diff=59</id>
		<title>Student Groups</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=Student_Groups&amp;diff=59"/>
		<updated>2026-01-22T23:09:03Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Student Groups =&lt;br /&gt;
&lt;br /&gt;
This page documents the University of Toronto Mississauga (UTM) student groups who contributed to the LibreCode / Annotator ecosystem between mid-2024 and late-2025, specifically around: converting asciinema terminal recordings into structured events, annotations, and derived documentation artifacts using LLMs.&lt;br /&gt;
&lt;br /&gt;
== Project context ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mentors&#039;&#039;&#039;&lt;br /&gt;
* Julia Longtin — https://github.com/julialongtin&lt;br /&gt;
* Arthur Wolf — https://github.com/arthurwolf&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mentorship program&#039;&#039;&#039;&lt;br /&gt;
* Human Feedback Foundation (Linux Foundation entity): https://humanfeedback.io/&lt;br /&gt;
* University of Toronto Mississauga: https://www.utm.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core LibreCode resources&#039;&#039;&#039;&lt;br /&gt;
* Annotator repository: https://github.com/arthurwolf/annotator&lt;br /&gt;
* LibreCode / FaikVM wiki: https://wiki.faikvm.com/mediawiki/index.php/Main_Page&lt;br /&gt;
* Public hosted annotator instance: https://linuxpmi.org/&lt;br /&gt;
* Prompting guidelines/advice: https://wiki.faikvm.com/mediawiki/index.php/Prompting-guidelines   &lt;br /&gt;
&lt;br /&gt;
Most student repositories are hosted under this GitHub organization:  &lt;br /&gt;
https://github.com/CSC392-CSC492-Building-AI-ML-systems&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Fall 2024 – Early AutoDocs prototype ==&lt;br /&gt;
&lt;br /&gt;
=== What they worked on ===&lt;br /&gt;
This group produced an early prototype of what later became &#039;&#039;AutoDocs&#039;&#039;: tooling to segment asciinema terminal recordings into meaningful chunks and generate higher-level annotations.&lt;br /&gt;
&lt;br /&gt;
This work served primarily as a proof of concept and a starting point for later cohorts.&lt;br /&gt;
&lt;br /&gt;
=== Contributors ===&lt;br /&gt;
&#039;&#039;(TODO: add names / GitHub links if identified)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Code and artifacts ===&lt;br /&gt;
* Archived Fall 2024 code base (referenced in later repos):&lt;br /&gt;
** Mentioned in the AutoDocs README as &#039;&#039;&#039;“Fall 2024 Team’s Code Base”&#039;&#039;&#039;&lt;br /&gt;
** Linked from the AutoDocs repository:&lt;br /&gt;
*** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
&lt;br /&gt;
=== Notes ===&lt;br /&gt;
Later documentation notes that much of this code is outdated or non-functional, but it remains historically important.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Winter 2025 – AutoDocs expansion + documentation ==&lt;br /&gt;
&lt;br /&gt;
=== What they worked on ===&lt;br /&gt;
This cohort rebuilt and extended the AutoDocs pipeline into a more complete system and produced formal documentation of their work.&lt;br /&gt;
&lt;br /&gt;
The repository includes a tagged release explicitly described as a rewrite of the project by the Winter 2025 team.  &lt;br /&gt;
Release link: https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent/releases/tag/winter-2025&lt;br /&gt;
&lt;br /&gt;
=== Contributors ===&lt;br /&gt;
&#039;&#039;&#039;Known from the Winter 2025 release note and repository contributors list&#039;&#039;&#039;&lt;br /&gt;
* Brian Zhang — https://github.com/Pyosimros&lt;br /&gt;
* Vraj Patel — https://github.com/Vraj-Patel1&lt;br /&gt;
* Dan Nguyen — https://github.com/nuhgooyin&lt;br /&gt;
* Adreano La Rosa — &#039;&#039;(listed in release note; GitHub handle not yet confirmed)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Additional contributors shown by GitHub&#039;&#039;&#039;&lt;br /&gt;
* Abdallah Enaya — https://github.com/abdullah-enaya&lt;br /&gt;
* Renee K — https://github.com/renee-k&lt;br /&gt;
* aml-8 — https://github.com/aml-8&lt;br /&gt;
* Christopher Flores — https://github.com/cfstar188&lt;br /&gt;
* Uyiosa Iyekekpolor — https://github.com/uyoyo0&lt;br /&gt;
* eyexjay — https://github.com/eyexjay&lt;br /&gt;
&lt;br /&gt;
=== Code and artifacts ===&lt;br /&gt;
* Main AutoDocs repository:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
* Release tag capturing the Winter 2025 state:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent/releases/tag/winter-2025&lt;br /&gt;
* Public talk page referencing this pipeline (AI Tinkerers Toronto, March 2025):&lt;br /&gt;
** https://toronto.aitinkerers.org/talks/rsvp_14QYpww1FyE&lt;br /&gt;
&lt;br /&gt;
=== Notes ===&lt;br /&gt;
The current AutoDocs repository states that it was “modified and extended from the Winter 2025 team’s code base”.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== AutoDocs (consolidated / ongoing repository) ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;This is a living repository spanning multiple student cohorts rather than a single group.&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Purpose ===&lt;br /&gt;
AutoDocs processes asciinema terminal recordings and produces structured outputs such as:&lt;br /&gt;
* segmented command &#039;&#039;events&#039;&#039;,&lt;br /&gt;
* annotated explanations,&lt;br /&gt;
* derived artifacts (for example, Dockerfiles).&lt;br /&gt;
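The pipeline above can be sketched in Python. This is a minimal illustration, not the AutoDocs code: the asciinema v2 &amp;lt;code&amp;gt;.cast&amp;lt;/code&amp;gt; layout (one JSON header line, then one JSON array per terminal event) is real, but the prompt-based splitting heuristic is an assumption standing in for the trained segmentation model.&lt;br /&gt;

```python
import json

def parse_cast(path):
    """Parse an asciinema v2 .cast file into (header, events).

    Each event line is a JSON array: [time_seconds, "o" or "i", text].
    """
    with open(path) as f:
        header = json.loads(f.readline())
        events = [json.loads(line) for line in f if line.strip()]
    return header, events

def segment_commands(events, prompt="$ "):
    """Heuristically split terminal events into per-command chunks.

    A new chunk starts whenever the (assumed) shell prompt appears in
    an output event; the trained models replace this heuristic with
    learned segmentation.
    """
    chunks, current = [], []
    for t, kind, text in events:
        if kind == "o" and prompt in text and current:
            chunks.append(current)
            current = []
        current.append((t, kind, text))
    if current:
        chunks.append(current)
    return chunks
```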
&lt;br /&gt;
=== People ===&lt;br /&gt;
* Julia Longtin (lead contact): https://github.com/julialongtin&lt;br /&gt;
* Model publisher referenced in the README:&lt;br /&gt;
** &#039;&#039;&#039;bria7801&#039;&#039;&#039; on Hugging Face:&lt;br /&gt;
*** https://huggingface.co/bria7801/model-0&lt;br /&gt;
*** https://huggingface.co/bria7801/model-1&lt;br /&gt;
*** https://huggingface.co/bria7801/model-3&lt;br /&gt;
&lt;br /&gt;
=== Repository ===&lt;br /&gt;
* https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
&lt;br /&gt;
=== Contents (high level) ===&lt;br /&gt;
* &amp;lt;code&amp;gt;data/&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;frontend/&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;models/&amp;lt;/code&amp;gt; directories&lt;br /&gt;
* Multiple parser scripts (Parser 0 / 1 / 2)&lt;br /&gt;
* References to fine-tuned model checkpoints via Hugging Face links&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Autumn 2025 – DocStream consolidation (Educational AI Agent) ==&lt;br /&gt;
&lt;br /&gt;
This appears to be a later cohort or iteration that built on AutoDocs and re-framed the system as &#039;&#039;&#039;DocStream&#039;&#039;&#039; (same core idea: streamed asciinema logs → events → hierarchical annotations → documentation).  &lt;br /&gt;
Repository: https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent&lt;br /&gt;
&lt;br /&gt;
=== What they worked on ===&lt;br /&gt;
From the repository README, DocStream:&lt;br /&gt;
* converts raw, noisy terminal activity into structured, reproducible developer documentation,&lt;br /&gt;
* processes streamed asciinema logs,&lt;br /&gt;
* segments them into meaningful events,&lt;br /&gt;
* generates hierarchical annotations explaining terminal activity,&lt;br /&gt;
* includes an evaluation harness (based on an extended EleutherAI LM Evaluation Harness) with task and metric scaffolding under &amp;lt;code&amp;gt;data/llm_Evaluation/&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== Code and artifacts ===&lt;br /&gt;
* Main repository:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent&lt;br /&gt;
* Repository structure pointers:&lt;br /&gt;
** &amp;lt;code&amp;gt;data/&amp;lt;/code&amp;gt; — datasets and evaluation harness inputs&lt;br /&gt;
** &amp;lt;code&amp;gt;models/model_0/&amp;lt;/code&amp;gt; — segmentation training and inference&lt;br /&gt;
** &amp;lt;code&amp;gt;models/model_1/&amp;lt;/code&amp;gt; — annotation training and inference&lt;br /&gt;
** &amp;lt;code&amp;gt;demo/&amp;lt;/code&amp;gt; — front-end visualization demo&lt;br /&gt;
** &amp;lt;code&amp;gt;runpod/&amp;lt;/code&amp;gt; — deployment and runpod materials&lt;br /&gt;
* White paper included in the repository root:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/blob/main/WhitePaper.docx&lt;br /&gt;
* Previous iteration explicitly linked from the DocStream README:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
&lt;br /&gt;
=== Models ===&lt;br /&gt;
* &#039;&#039;&#039;Model 0 — Event Segmentation&#039;&#039;&#039;&lt;br /&gt;
** Segments streamed terminal logs into XML-structured &#039;&#039;events&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Model 1 — Hierarchical Annotation&#039;&#039;&#039;&lt;br /&gt;
** Reads Model 0 event chunks and generates summaries with hierarchical depth (goal / subtask structure)&lt;br /&gt;
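A concrete (hypothetical) sketch of this two-model handoff: the exact XML schema Model 0 emits is not documented here, so the tag and attribute names below are illustrative assumptions; only the shape — events carrying a command and its output, consumed by an annotation pass — follows the description above.&lt;br /&gt;

```python
import xml.etree.ElementTree as ET

# Build a hypothetical Model 0 event chunk programmatically; the real
# schema may differ (tag names here are assumptions).
root = ET.Element("events")
ev = ET.SubElement(root, "event", start="0.1", end="2.4")
cmd = ET.SubElement(ev, "command")
cmd.text = "make install"
out = ET.SubElement(ev, "output")
out.text = "installed to /usr/local"

def annotate(events_root):
    """Sketch of a Model 1 pass: one summary line per event element."""
    summaries = []
    for event in events_root.iter("event"):
        command = event.findtext("command", default="")
        summaries.append("ran " + repr(command) + " (" +
                         event.get("start", "?") + "s)")
    return summaries
```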
&lt;br /&gt;
=== Contributors ===&lt;br /&gt;
GitHub accounts that appear repeatedly in the commit history and are likely the core Autumn 2025 contributors:&lt;br /&gt;
* Ryan Pankratz — https://github.com/ryan-pankratz  &lt;br /&gt;
** (also appears as: https://github.com/RyanPankratz)&lt;br /&gt;
* Victor Shea — https://github.com/VictorShea&lt;br /&gt;
* Moe Reda — https://github.com/Moe-Reda&lt;br /&gt;
* Patea4 — https://github.com/Patea4&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=Prompting-guidelines&amp;diff=58</id>
		<title>Prompting-guidelines</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=Prompting-guidelines&amp;diff=58"/>
		<updated>2026-01-22T23:08:15Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: Created page with &amp;quot;= Librecode Prompting Guidelines for Students =  Some prompting guidelines given to students as they start working on the Librecode project, or if they need to write prompts for any part of the project.  == Prompt Engineering Guides / Documentation ==  * https://platform.openai.com/docs/guides/prompt-engineering * https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api * https://platform.claude.com/docs/en/build-with-claude/p...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Librecode Prompting Guidelines for Students =&lt;br /&gt;
&lt;br /&gt;
Some prompting guidelines given to students as they start working on the Librecode project, or if they need to write prompts for any part of the project.&lt;br /&gt;
&lt;br /&gt;
== Prompt Engineering Guides / Documentation ==&lt;br /&gt;
&lt;br /&gt;
* https://platform.openai.com/docs/guides/prompt-engineering&lt;br /&gt;
* https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api&lt;br /&gt;
* https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/overview&lt;br /&gt;
* https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-4-best-practices&lt;br /&gt;
* https://ai.google.dev/gemini-api/docs/prompting-strategies&lt;br /&gt;
* https://api-docs.deepseek.com/guides/thinking_mode&lt;br /&gt;
* https://api-docs.deepseek.com/guides/json_mode&lt;br /&gt;
* https://docs.mistral.ai/capabilities/completion/prompting_capabilities&lt;br /&gt;
* https://www.llama.com/docs/how-to-guides/prompting/&lt;br /&gt;
* https://cohere.com/llmu/prompt-engineering-basics&lt;br /&gt;
* https://docs.ai21.com/docs/prompt-engineering&lt;br /&gt;
* https://qwen.readthedocs.io/en/latest/getting_started/concepts.html&lt;br /&gt;
* https://bigmodel.cn/dev/howuse/prompt&lt;br /&gt;
* https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-engineering-guidelines.html&lt;br /&gt;
* https://learn.microsoft.com/en-us/azure/ai-foundry/openai/concepts/prompt-engineering?view=foundry-classic&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;If you have to read just one, read the Anthropic one.&#039;&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
== The Big Important Pieces ==&lt;br /&gt;
&lt;br /&gt;
# Use XML-like tags to structure the prompt, including on multiple levels. You can use Markdown inside those tags&#039; contents, but avoid structuring with Markdown headers: models tend to understand structure better through XML-like tags.&lt;br /&gt;
# Provide examples, both of good output and of bad output, clearly delimited by XML-like tags. Give &#039;&#039;multiple&#039;&#039; examples. Providing multiple examples is called the &amp;quot;few-shot&amp;quot; / &amp;quot;multishot&amp;quot; technique, and can sometimes completely replace fine-tuning.&lt;br /&gt;
# Research what format/style was used to train the model you are trying to use or fine-tune; it often can give major insights and solve issues.&lt;br /&gt;
# Unless you have a good reason not to, it&#039;s generally a good idea to use a temperature of 0 (or equivalently restrictive top-k / top-p settings).&lt;br /&gt;
# Keep your system prompt simple and short. The place to put instructions is the prompt itself. Putting too much in the system prompt is a common beginner mistake.&lt;br /&gt;
# Templating formats like Handlebars or Jinja make for nicer, readable prompt templates/files.&lt;br /&gt;
# Provide context: explain in the prompt what the prompt is &amp;quot;for&amp;quot;, what the project the prompt is being used for is all about, and any other useful context you can think of.&lt;br /&gt;
# Use an LLM to rewrite your prompts: give it these links, these rules, any other rules you can think of, and instructions to rewrite the prompt following them. Make it clear it is writing text that will be read by a machine (not by a human), so it can be compact and skip pleasantries. This generally results in much better prompts.&lt;br /&gt;
# Use coding agents to work on your prompt templates. Gemini is free.&lt;br /&gt;
# Beyond examples, describe the output: its length, format, style, etc.&lt;br /&gt;
# If using a thinking model, actually instruct it to think, and even give it examples of how to think; examples of what a useful chain-of-thought looks like for a specific input.&lt;br /&gt;
# It&#039;s stupid, but giving it a &amp;quot;persona&amp;quot; where you tell it it&#039;s a &amp;quot;world expert&amp;quot; at doing whatever you&#039;re asking it to do reliably increases performance, even to this day, by a noticeable amount. See studies. Threatening the model also increases performance, but I personally can never get around to doing it...&lt;br /&gt;
# Prefill answers: after the end of your prompt, write the beginning of the answer you want the model to continue from. This can sometimes prevent certain issues, but note it won&#039;t work for thinking models.&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=AnnotationStandard&amp;diff=57</id>
		<title>AnnotationStandard</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=AnnotationStandard&amp;diff=57"/>
		<updated>2026-01-22T19:35:08Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: Migrating missing data from the older cryptpad.fr annotation standard.&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Librecode Annotation Standard =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Document Version:&#039;&#039;&#039; 2.0  &lt;br /&gt;
&#039;&#039;&#039;Last Updated:&#039;&#039;&#039; 2025-10-01  &lt;br /&gt;
&#039;&#039;&#039;Target Audience:&#039;&#039;&#039; Session recorders, annotators, reviewers&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Table of Contents ==&lt;br /&gt;
# [[#Glossary|Glossary]]&lt;br /&gt;
# [[#Overview &amp;amp; Purpose|Overview &amp;amp; Purpose]]&lt;br /&gt;
# [[#The Annotation Process|The Annotation Process]]&lt;br /&gt;
# [[#Annotation Standard|Annotation Standard]]&lt;br /&gt;
# [[#Application Guidelines|Application Guidelines]]&lt;br /&gt;
# [[#Technical Specification|Technical Specification]]&lt;br /&gt;
# [[#Quick Reference|Quick Reference]]&lt;br /&gt;
# [[#Complete Worked Example|Complete Worked Example]]&lt;br /&gt;
# [[#Appendices|Appendices]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Glossary ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Session&#039;&#039;&#039;  &lt;br /&gt;
An asciinema recording of a terminal session, typically showing the deployment or configuration of software.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Annotated Session&#039;&#039;&#039;  &lt;br /&gt;
A session that has structured annotations added to it, conforming to this specification.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Recording&#039;&#039;&#039;  &lt;br /&gt;
The raw asciinema capture of terminal activity.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Annotation&#039;&#039;&#039;  &lt;br /&gt;
Structured metadata added to a recording that describes what is happening, why, and the outcome.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Timeline&#039;&#039;&#039;  &lt;br /&gt;
A hierarchical structure in the annotator tool where annotations are organized. Different goal levels exist on separate timelines.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Goal Hierarchy&#039;&#039;&#039;  &lt;br /&gt;
The nested structure of goals, subgoals, and sub-subgoals that describe the overall task and its component steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Annotation Tag&#039;&#039;&#039;  &lt;br /&gt;
A specific marker type (e.g., &amp;lt;code&amp;gt;goal&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;mistake&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;successResult&amp;lt;/code&amp;gt;) used to categorize parts of the session.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Person A (Recorder)&#039;&#039;&#039;  &lt;br /&gt;
The individual(s) who created the original terminal session recording.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Person B (Annotator)&#039;&#039;&#039;  &lt;br /&gt;
The individual who interviews Person A and adds structured annotations to the recording.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;.session File&#039;&#039;&#039;  &lt;br /&gt;
A zip file containing the annotated recording, audio transcriptions, and metadata conforming to this specification.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Overview &amp;amp; Purpose ==&lt;br /&gt;
&lt;br /&gt;
=== Purpose of this document ===&lt;br /&gt;
&lt;br /&gt;
This document exists to:&lt;br /&gt;
* Provide an annotation process.&lt;br /&gt;
* Provide an annotation standard.&lt;br /&gt;
* Collect guidance for people participating in annotation.&lt;br /&gt;
&lt;br /&gt;
An &#039;&#039;&#039;annotated session&#039;&#039;&#039; is defined as an asciinema recording, with annotations added to it, conforming to a version of this document.&lt;br /&gt;
&lt;br /&gt;
=== What are Annotated Sessions? ===&lt;br /&gt;
&lt;br /&gt;
An &#039;&#039;&#039;annotated session&#039;&#039;&#039; is an asciinema recording of terminal activity (typically software deployment) that has been enriched with structured annotations describing:&lt;br /&gt;
* What actions were performed&lt;br /&gt;
* Why those actions were taken&lt;br /&gt;
* Whether goals were achieved&lt;br /&gt;
* What mistakes occurred and their consequences&lt;br /&gt;
* What discoveries changed the approach&lt;br /&gt;
&lt;br /&gt;
=== Why Create Them? ===&lt;br /&gt;
&lt;br /&gt;
The Librecode project aims to train Large Language Models (LLMs) to understand and work with terminal recordings. Annotated sessions serve as training data, teaching models to:&lt;br /&gt;
* Recognize common deployment patterns&lt;br /&gt;
* Understand error recovery strategies&lt;br /&gt;
* Learn from human problem-solving approaches&lt;br /&gt;
* Eventually perform automatic annotation&lt;br /&gt;
&lt;br /&gt;
Once sufficient manual annotations exist, trained models can automatically annotate new sessions, enabling advanced tooling like automated documentation generation.&lt;br /&gt;
&lt;br /&gt;
=== Document Purpose ===&lt;br /&gt;
&lt;br /&gt;
This document provides:&lt;br /&gt;
# An annotation process for creating annotated sessions&lt;br /&gt;
# An annotation standard defining what to annotate and how&lt;br /&gt;
# Guidelines for annotators to ensure consistency and quality&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== The Annotation Process ==&lt;br /&gt;
&lt;br /&gt;
The process of creating an annotated session involves four distinct phases:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Phase 1: Recording → Phase 2: Interview → Phase 3: Annotation → Phase 4: Submission&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Phase 1: Recording ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Objective:&#039;&#039;&#039; Create a terminal recording of meaningful software deployment activity.  &lt;br /&gt;
&#039;&#039;&#039;Who:&#039;&#039;&#039; Person A (one or more individuals)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Steps:&#039;&#039;&#039;&lt;br /&gt;
# Set up the recording environment:&lt;br /&gt;
#* Use a Debian Stable VM (see Appendix D)&lt;br /&gt;
#* Upgrade to unstable if required packages are missing&lt;br /&gt;
# VMs may be reused from previous deployments, or may be new&lt;br /&gt;
# Start the &amp;lt;code&amp;gt;asciinema&amp;lt;/code&amp;gt; recording client&lt;br /&gt;
# Attempt to deploy a piece of GNU software&lt;br /&gt;
# Keep all input/output inside the terminal:&lt;br /&gt;
#* Use CLI web tools (&amp;lt;code&amp;gt;curl&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;wget&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;lynx&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;links&amp;lt;/code&amp;gt;, etc.)&lt;br /&gt;
# Stop recording when deployment completes&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Output:&#039;&#039;&#039; &amp;lt;code&amp;gt;.cast&amp;lt;/code&amp;gt; asciinema recording file&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Common Issues:&#039;&#039;&#039;&lt;br /&gt;
* Using GUI tools&lt;br /&gt;
* Stopping recording too early&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Phase 2: Interview ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Objective:&#039;&#039;&#039; Ensure the annotator fully understands what was done and why.  &lt;br /&gt;
&#039;&#039;&#039;Who:&#039;&#039;&#039; Person A and Person B&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Steps:&#039;&#039;&#039;&lt;br /&gt;
# Record audio of both sides separately&lt;br /&gt;
# Review the recording together (recommended)&lt;br /&gt;
# Explain what was done and why&lt;br /&gt;
# Record follow-up clarification conversations if needed&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Output:&#039;&#039;&#039;&lt;br /&gt;
* Audio recordings (both sides)&lt;br /&gt;
* Full understanding of actions and rationale&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Why Separate Audio Recordings?&#039;&#039;&#039;&lt;br /&gt;
* Better transcription quality&lt;br /&gt;
* Clear speaker attribution&lt;br /&gt;
* Higher-quality training data&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Recommended ways to review the recording together:&#039;&#039;&#039;&lt;br /&gt;
* &amp;lt;code&amp;gt;tmate&amp;lt;/code&amp;gt; + asciinema playback can work (during playback, holding &amp;lt;code&amp;gt;.&amp;lt;/code&amp;gt; fast-forwards)&lt;br /&gt;
* Screen share while watching the recording&lt;br /&gt;
* Screen share while using the annotator tool&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;If you break into a separate “experimental” session during the meeting:&#039;&#039;&#039;&lt;br /&gt;
* This is allowed and often useful.&lt;br /&gt;
* Record that new session with &amp;lt;code&amp;gt;asciinema&amp;lt;/code&amp;gt; as well, so it can be reviewed and understood by someone who was not present.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Phase 3: Annotation ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Objective:&#039;&#039;&#039; Add structured annotations using the annotator tool.  &lt;br /&gt;
&#039;&#039;&#039;Who:&#039;&#039;&#039; Person B&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Steps:&#039;&#039;&#039;&lt;br /&gt;
# Load the recording&lt;br /&gt;
# Create timelines for goal hierarchy levels&lt;br /&gt;
# Apply annotations per standard (see [[#Annotation Standard|Annotation Standard]])&lt;br /&gt;
# Review for completeness and consistency&lt;br /&gt;
# Note the spec version used for this annotation (this document version)&lt;br /&gt;
# If clarification is needed, contact Person A:&lt;br /&gt;
#* Record audio of both ends of the clarification conversation&lt;br /&gt;
#* Include the audio (and later transcription) in the &amp;lt;code&amp;gt;.session&amp;lt;/code&amp;gt; package&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Annotator tool:&#039;&#039;&#039;&lt;br /&gt;
* Use the annotator tool hosted at: &amp;lt;code&amp;gt;https://github.com/arthurwolf/annotator&amp;lt;/code&amp;gt;&lt;br /&gt;
* (If a specific branch is required for a cohort/release, use the branch specified by the project, e.g. &amp;lt;code&amp;gt;from_students&amp;lt;/code&amp;gt; when applicable.)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Output:&#039;&#039;&#039; Annotated recording&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Phase 4: Submission ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Objective:&#039;&#039;&#039; Package and submit the session.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Steps:&#039;&#039;&#039;&lt;br /&gt;
# Transcribe audio with Whisper (with timestamps)&lt;br /&gt;
# Create &amp;lt;code&amp;gt;.session&amp;lt;/code&amp;gt; archive&lt;br /&gt;
# Include required files (see [[#Technical Specification|Technical Specification]])&lt;br /&gt;
# Upload to repository&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Annotation Standard ==&lt;br /&gt;
&lt;br /&gt;
This section defines what to annotate and how.&lt;br /&gt;
&lt;br /&gt;
=== Goal hierarchy and timelines ===&lt;br /&gt;
&lt;br /&gt;
* Goals, subgoals, sub-subgoals (and deeper levels as needed) form a &#039;&#039;&#039;goal hierarchy&#039;&#039;&#039;.&lt;br /&gt;
* Ideally, every part of the session should belong to a goal.&lt;br /&gt;
* Subgoals should represent meaningful “chunks” of work contributing to a parent goal.&lt;br /&gt;
* Goals, subgoals, and sub-(sub…) goals should exist as &#039;&#039;&#039;separate timelines&#039;&#039;&#039; in the annotator tool:&lt;br /&gt;
** Think: one “Goal” timeline spanning the whole file,&lt;br /&gt;
** Subgoal timelines covering sections that represent large steps,&lt;br /&gt;
** Deeper timelines for finer-grained structure as needed.&lt;br /&gt;
&lt;br /&gt;
=== Success markers ===&lt;br /&gt;
&lt;br /&gt;
Use success markers to capture whether a goal (or subgoal) was achieved at the moment where the result is observable.&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;successResult&amp;lt;/code&amp;gt; — the goal was achieved (success is visible in the recording output)&lt;br /&gt;
* &amp;lt;code&amp;gt;successFailure&amp;lt;/code&amp;gt; — the goal failed (failure is visible)&lt;br /&gt;
* &amp;lt;code&amp;gt;successUnknown&amp;lt;/code&amp;gt; — success cannot be determined from the recording (or cannot possibly be visible)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;When to apply success markers:&#039;&#039;&#039;&lt;br /&gt;
* Apply success markers only when the success/failure of a command corresponds to the attempted completion of a goal.&lt;br /&gt;
* Example:&lt;br /&gt;
** If a goal is “get my program to compile”, then the output of &amp;lt;code&amp;gt;make&amp;lt;/code&amp;gt; (where compilation occurs) is the target moment.&lt;br /&gt;
* For activities where the result cannot be observed (e.g., some parts of OS installs), &amp;lt;code&amp;gt;successUnknown&amp;lt;/code&amp;gt; is acceptable.&lt;br /&gt;
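The decision rule above is simple enough to state as code; this is just a restatement of the standard (the function and argument names are illustrative, not part of any tool):&lt;br /&gt;

```python
def success_marker(outcome_observable, succeeded=False):
    """Pick the marker for the moment a goal's outcome is (or provably
    cannot be) visible in the recording, per the standard above."""
    if not outcome_observable:
        # Result cannot be determined from the recording.
        return "successUnknown"
    return "successResult" if succeeded else "successFailure"
```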
&lt;br /&gt;
=== Tools and subtools (commands and interfaces) ===&lt;br /&gt;
&lt;br /&gt;
* Commands can be annotated as tools/subtools.&lt;br /&gt;
* Provide:&lt;br /&gt;
** An &#039;&#039;&#039;in-context&#039;&#039;&#039; description (considering the current goal and what has happened so far)&lt;br /&gt;
** Optionally, an &#039;&#039;&#039;out-of-context&#039;&#039;&#039; description (what the command generally does)&lt;br /&gt;
&lt;br /&gt;
This includes screen-based interactions when they are part of the workflow (if the annotator supports representing them).&lt;br /&gt;
&lt;br /&gt;
=== Mistakes ===&lt;br /&gt;
&lt;br /&gt;
Every mistake, down to typos, should be tagged, including why it was a mistake and what its repercussions were.&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;mistake&amp;lt;/code&amp;gt; — the incorrect action or decision&lt;br /&gt;
* &amp;lt;code&amp;gt;mistakeReason&amp;lt;/code&amp;gt; — why it was wrong&lt;br /&gt;
* &amp;lt;code&amp;gt;mistakeResult&amp;lt;/code&amp;gt; — what happened because of it (error message, time lost, wrong path taken, etc.)&lt;br /&gt;
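A sketch of the three-part record an annotator fills in for each mistake (the class and field names mirror the tags above but are otherwise illustrative; the annotator tool&#039;s internal format may differ):&lt;br /&gt;

```python
from dataclasses import dataclass

@dataclass
class MistakeAnnotation:
    """The three tags the standard requires for every mistake."""
    mistake: str        # the incorrect action or decision
    mistakeReason: str  # why it was wrong
    mistakeResult: str  # what happened because of it

    def complete(self):
        # A mistake entry is only useful when all three parts are filled in.
        return all([self.mistake, self.mistakeReason, self.mistakeResult])
```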
&lt;br /&gt;
=== Secrets and sensitive data ===&lt;br /&gt;
&lt;br /&gt;
Use specific tags for secrets and secret-like flows:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;passwordPrompt&amp;lt;/code&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;password&amp;lt;/code&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;passwordAgain&amp;lt;/code&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;secret&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Annotate secret handling carefully, and prefer describing the flow rather than reproducing secret material.&lt;br /&gt;
&lt;br /&gt;
=== Discoveries ===&lt;br /&gt;
&lt;br /&gt;
Discoveries are things learned during the session that change the plan/goals/approach.&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;discovery&amp;lt;/code&amp;gt; (or project’s chosen “Discoveries” tag) — a new fact that changes what happens next&lt;br /&gt;
* Apply when the discovery causes a change in goals, subgoals, or strategy.&lt;br /&gt;
&lt;br /&gt;
=== Licensing and legally significant text ===&lt;br /&gt;
&lt;br /&gt;
If you reproduce a “legally significant” portion of a licensed work (see GNU guidance on “Legally Significant”), list the license in your annotation.&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;license&amp;lt;/code&amp;gt;&lt;br /&gt;
* &amp;lt;code&amp;gt;copyright&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
When in doubt, prefer summarizing instead of reproducing large chunks of licensed text.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Application Guidelines ==&lt;br /&gt;
&lt;br /&gt;
=== How to apply goal markers ===&lt;br /&gt;
&lt;br /&gt;
* When starting a goal, add a unique goal identifier (format guideline):&lt;br /&gt;
** &amp;lt;code&amp;gt;[Goal_ + Unique_identifier]&amp;lt;/code&amp;gt; followed by the goal description&lt;br /&gt;
* When a goal is completed:&lt;br /&gt;
** close out the goal and return to the parent goal level (change “level” back)&lt;br /&gt;
&lt;br /&gt;
=== How to apply mistakes ===&lt;br /&gt;
&lt;br /&gt;
Whenever there is a mistake:&lt;br /&gt;
* Apply &amp;lt;code&amp;gt;mistake&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;mistakeReason&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;mistakeResult&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== How to apply success markers ===&lt;br /&gt;
&lt;br /&gt;
* Apply &amp;lt;code&amp;gt;successResult&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;successFailure&amp;lt;/code&amp;gt;, or &amp;lt;code&amp;gt;successUnknown&amp;lt;/code&amp;gt; only when:&lt;br /&gt;
** The result of a goal is visible, OR&lt;br /&gt;
** The result cannot possibly be visible.&lt;br /&gt;
&lt;br /&gt;
=== General guidance ===&lt;br /&gt;
&lt;br /&gt;
* Annotate sessions as you would want others to annotate them: be as in-depth as you would like annotations to be.&lt;br /&gt;
* The goal of annotation is to describe precisely what is happening in the session:&lt;br /&gt;
** what is done,&lt;br /&gt;
** why it is done,&lt;br /&gt;
** and what outcome it leads to.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Technical Specification ==&lt;br /&gt;
&lt;br /&gt;
=== The .session archive ===&lt;br /&gt;
&lt;br /&gt;
A &amp;lt;code&amp;gt;.session&amp;lt;/code&amp;gt; file is a zip archive with a &amp;lt;code&amp;gt;.session&amp;lt;/code&amp;gt; extension containing (at minimum) the annotated recording and the supporting metadata and audio/transcripts.&lt;br /&gt;
&lt;br /&gt;
=== Required contents ===&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;.session&amp;lt;/code&amp;gt; archive must include:&lt;br /&gt;
* &amp;lt;code&amp;gt;session.yaml&amp;lt;/code&amp;gt; — session metadata (see below)&lt;br /&gt;
* &amp;lt;code&amp;gt;recording.asciinema&amp;lt;/code&amp;gt; — the asciinema recording (annotated)&lt;br /&gt;
* Audio recordings of the interview and any follow-up clarification conversations&lt;br /&gt;
* Transcriptions of those audio files (produced by Whisper), including timestamps&lt;br /&gt;
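A minimal check of the two fixed required names can be sketched with Python&#039;s &amp;lt;code&amp;gt;zipfile&amp;lt;/code&amp;gt; module; audio and transcript filenames vary per session, so this sketch does not attempt to validate them:&lt;br /&gt;

```python
import zipfile

REQUIRED = ("session.yaml", "recording.asciinema")

def validate_session(path):
    """Return the list of required files missing from a .session archive.

    An empty list means the archive passes this (partial) check; a real
    validator would also confirm audio/transcript pairs are present.
    """
    with zipfile.ZipFile(path) as z:
        names = set(z.namelist())
    return [n for n in REQUIRED if n not in names]
```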
&lt;br /&gt;
=== session.yaml fields ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;session.yaml&amp;lt;/code&amp;gt; should contain at least:&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;version&amp;lt;/code&amp;gt;: Version number for the format specification (this document) used when creating this &amp;lt;code&amp;gt;.session&amp;lt;/code&amp;gt; file.&lt;br /&gt;
&lt;br /&gt;
Optional / planned fields (may be included if supported by the project tooling):&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;code&amp;gt;mission&amp;lt;/code&amp;gt; (FUTUREWORK): overarching goal being attempted&lt;br /&gt;
* &amp;lt;code&amp;gt;plan&amp;lt;/code&amp;gt; (FUTUREWORK): intended plan (does not need to match what happened)&lt;br /&gt;
* &amp;lt;code&amp;gt;tests&amp;lt;/code&amp;gt; (FUTUREWORK): an array of “unit test”-like checks indicating success/failure&lt;br /&gt;
** intended to be written in TinTin++&lt;br /&gt;
** loaded after an SSH session has been instantiated&lt;br /&gt;
** includes a time limit for considering the test to have gone wrong&lt;br /&gt;
** examples:&lt;br /&gt;
*** “X command was correctly installed to location Y”&lt;br /&gt;
*** “Running &amp;lt;code&amp;gt;command --version&amp;lt;/code&amp;gt; returns the right version 0.11”&lt;br /&gt;
* &amp;lt;code&amp;gt;system&amp;lt;/code&amp;gt; (optional): command/answer pairs that define the environment&lt;br /&gt;
** examples:&lt;br /&gt;
*** &amp;lt;code&amp;gt;cat /etc/issue.net&amp;lt;/code&amp;gt;: &amp;lt;code&amp;gt;Ubuntu 22.04 LTS\n&amp;lt;/code&amp;gt;&lt;br /&gt;
*** &amp;lt;code&amp;gt;ip address&amp;lt;/code&amp;gt;: &amp;lt;code&amp;gt;...&amp;lt;/code&amp;gt;&lt;br /&gt;
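To illustrate the minimal shape of &amp;lt;code&amp;gt;session.yaml&amp;lt;/code&amp;gt; (only &amp;lt;code&amp;gt;version&amp;lt;/code&amp;gt; is required), here is a sketch that builds the metadata and emits it with a deliberately tiny hand-rolled serializer; real tooling should use a proper YAML library:&lt;br /&gt;

```python
# Minimal session.yaml content as a Python dict; "version" is the only
# required field, "system" is one of the optional fields from the spec.
session_meta = {
    "version": "2.0",
    "system": {
        "cat /etc/issue.net": "Ubuntu 22.04 LTS\n",
    },
}

def to_yaml(meta):
    """Tiny serializer for this flat/nested shape only (illustration;
    it quotes values via repr rather than proper YAML escaping)."""
    lines = []
    for key, value in meta.items():
        if isinstance(value, dict):
            lines.append(key + ":")
            for k, v in value.items():
                lines.append("  " + repr(k) + ": " + repr(v))
        else:
            lines.append(key + ": " + repr(value))
    return "\n".join(lines) + "\n"
```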
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Quick Reference ==&lt;br /&gt;
&lt;br /&gt;
=== Core concepts ===&lt;br /&gt;
* Use separate timelines for each goal level (goal / subgoal / sub-subgoal).&lt;br /&gt;
* Add success markers only at moments where a goal outcome is observable (or provably unobservable).&lt;br /&gt;
&lt;br /&gt;
=== Common tags (non-exhaustive) ===&lt;br /&gt;
* Goals: &amp;lt;code&amp;gt;goal&amp;lt;/code&amp;gt; (plus unique goal identifiers per project convention)&lt;br /&gt;
* Success: &amp;lt;code&amp;gt;successResult&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;successFailure&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;successUnknown&amp;lt;/code&amp;gt;&lt;br /&gt;
* Mistakes: &amp;lt;code&amp;gt;mistake&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;mistakeReason&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;mistakeResult&amp;lt;/code&amp;gt;&lt;br /&gt;
* Secrets: &amp;lt;code&amp;gt;passwordPrompt&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;password&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;passwordAgain&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;secret&amp;lt;/code&amp;gt;&lt;br /&gt;
* Discoveries: &amp;lt;code&amp;gt;discovery&amp;lt;/code&amp;gt; (or “Discoveries” tag used by the project)&lt;br /&gt;
* Licensing: &amp;lt;code&amp;gt;license&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;copyright&amp;lt;/code&amp;gt;&lt;br /&gt;
* Tools: tool/subtool annotations (in-context and optionally out-of-context)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Complete Worked Example ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
[Goal_001] Deploy GNU package &amp;quot;foo&amp;quot; on Debian Stable VM&lt;br /&gt;
  [Subgoal_001a] Install build dependencies&lt;br /&gt;
    tool: apt-get install ...&lt;br /&gt;
    successResult: dependencies installed&lt;br /&gt;
&lt;br /&gt;
  [Subgoal_001b] Configure build&lt;br /&gt;
    tool: ./configure --prefix=/usr/local&lt;br /&gt;
    mistake: used wrong configure flag&lt;br /&gt;
    mistakeReason: flag is not supported by this version&lt;br /&gt;
    mistakeResult: configure exits with error, must retry&lt;br /&gt;
    tool: ./configure --prefix=/usr/local&lt;br /&gt;
    successResult: configure completed&lt;br /&gt;
&lt;br /&gt;
  [Subgoal_001c] Build and verify&lt;br /&gt;
    tool: make&lt;br /&gt;
    successResult: compilation completed&lt;br /&gt;
    tool: make test&lt;br /&gt;
    successUnknown: tests run but output does not clearly confirm pass/fail (or tests are not available)&lt;br /&gt;
&lt;br /&gt;
  discovery: package requires newer libc than expected → adjust approach / select different target or environment&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
(Replace identifiers, tags, and exact phrasing with the conventions supported by the current annotator tool version.)&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Appendices ==&lt;br /&gt;
&lt;br /&gt;
=== Appendix A: Recording environment reminders ===&lt;br /&gt;
* Use a Debian Stable VM for recordings.&lt;br /&gt;
* If Debian Stable lacks required packages, upgrading the VM to unstable is allowed.&lt;br /&gt;
* VMs may be reused across sessions.&lt;br /&gt;
&lt;br /&gt;
=== Appendix B: Keeping everything inside the terminal ===&lt;br /&gt;
* All input and output should stay inside the terminal session.&lt;br /&gt;
* For web interactions, prefer CLI tools such as:&lt;br /&gt;
** &amp;lt;code&amp;gt;lynx&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;links&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;curl&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;wget&amp;lt;/code&amp;gt;, etc.&lt;br /&gt;
&lt;br /&gt;
=== Appendix C: Interview and follow-ups ===&lt;br /&gt;
* Record both sides separately for quality.&lt;br /&gt;
* If any follow-up clarification occurs later:&lt;br /&gt;
** record both ends again,&lt;br /&gt;
** include audio and transcripts in the &amp;lt;code&amp;gt;.session&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== Appendix D: Debian Stable VM details ===&lt;br /&gt;
(See project-specific VM setup instructions for the current cohort/repository; keep this appendix synchronized with those instructions.)&lt;br /&gt;
&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=Main_Page&amp;diff=53</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=Main_Page&amp;diff=53"/>
		<updated>2026-01-11T23:40:13Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Welcome to the CopyLeft Artificial Intelligence wiki!&lt;br /&gt;
&lt;br /&gt;
Here, you will find documents, howtos, and notes about our efforts to create CopyLeft-respecting Free Software Artificial Intelligence systems.&lt;br /&gt;
&lt;br /&gt;
== Projects ==&lt;br /&gt;
&lt;br /&gt;
=== AutoDoc ===&lt;br /&gt;
&lt;br /&gt;
The [[AutoDoc]] project aims to convert asciinema terminal recordings into structured events, annotations, and derived documentation artifacts using LLMs.&lt;br /&gt;
&lt;br /&gt;
* [[AnnotationStandard]]: Standard for the annotation file format used to annotate &amp;lt;code&amp;gt;asciinema&amp;lt;/code&amp;gt; sessions.&lt;br /&gt;
&lt;br /&gt;
== Locations ==&lt;br /&gt;
We have many hackers working in many locations. Here we describe our sites, so that we can share infrastructure:&lt;br /&gt;
&lt;br /&gt;
* [[FAIKVM.COM]]&lt;br /&gt;
* [[LIBRECODE.EU]]&lt;br /&gt;
&lt;br /&gt;
== Hardware ==&lt;br /&gt;
Like all projects, we have hardware dependencies. We try to document them and their behaviors so that our distributed team can easily reproduce one another&#039;s environments.&lt;br /&gt;
&lt;br /&gt;
Our infrastructure includes, but is not limited to, the following devices:&lt;br /&gt;
* [[IGEL M350C]]&lt;br /&gt;
* [[ASRock H81m-HDS (v1.0)]]&lt;br /&gt;
* [[DLink DWR-978]]&lt;br /&gt;
&lt;br /&gt;
== FAIKVM ==&lt;br /&gt;
Our infrastructure stack is a combination of FAI, KVM, and a pile of shell scripts.&lt;br /&gt;
&lt;br /&gt;
This infrastructure includes, but is not limited to, the following VM types:&lt;br /&gt;
* [[WikiServer]]&lt;br /&gt;
* [[KanServer]]&lt;br /&gt;
* [[FAIServer]]&lt;br /&gt;
* [[QEMUHost]]&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=AnnotationStandard&amp;diff=52</id>
		<title>AnnotationStandard</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=AnnotationStandard&amp;diff=52"/>
		<updated>2026-01-11T23:39:31Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: Initial port from the gist at https://gist.githubusercontent.com/arthurwolf/c4521a8baa306c016efd50ee23ffe56e/raw/26081cbc230641b004554778eefb11eee2a2fcf3/annotation.md , conversion from Markdown to MediaWiki formats&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Librecode Annotation Standard =&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Document Version:&#039;&#039;&#039; 2.0  &lt;br /&gt;
&#039;&#039;&#039;Last Updated:&#039;&#039;&#039; 2025-10-01  &lt;br /&gt;
&#039;&#039;&#039;Target Audience:&#039;&#039;&#039; Session recorders, annotators, reviewers&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Table of Contents ==&lt;br /&gt;
# [[#Glossary|Glossary]]&lt;br /&gt;
# [[#Overview &amp;amp; Purpose|Overview &amp;amp; Purpose]]&lt;br /&gt;
# [[#The Annotation Process|The Annotation Process]]&lt;br /&gt;
# [[#Annotation Standard|Annotation Standard]]&lt;br /&gt;
# [[#Application Guidelines|Application Guidelines]]&lt;br /&gt;
# [[#Technical Specification|Technical Specification]]&lt;br /&gt;
# [[#Quick Reference|Quick Reference]]&lt;br /&gt;
# [[#Complete Worked Example|Complete Worked Example]]&lt;br /&gt;
# [[#Appendices|Appendices]]&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Glossary ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Session&#039;&#039;&#039;  &lt;br /&gt;
An asciinema recording of a terminal session, typically showing the deployment or configuration of software.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Annotated Session&#039;&#039;&#039;  &lt;br /&gt;
A session that has structured annotations added to it, conforming to this specification.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Recording&#039;&#039;&#039;  &lt;br /&gt;
The raw asciinema capture of terminal activity.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Annotation&#039;&#039;&#039;  &lt;br /&gt;
Structured metadata added to a recording that describes what is happening, why, and the outcome.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Timeline&#039;&#039;&#039;  &lt;br /&gt;
A hierarchical structure in the annotator tool where annotations are organized. Different goal levels exist on separate timelines.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Goal Hierarchy&#039;&#039;&#039;  &lt;br /&gt;
The nested structure of goals, subgoals, and sub-subgoals that describe the overall task and its component steps.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Annotation Tag&#039;&#039;&#039;  &lt;br /&gt;
A specific marker type (e.g., &amp;lt;code&amp;gt;goal&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;mistake&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;successResult&amp;lt;/code&amp;gt;) used to categorize parts of the session.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Person A (Recorder)&#039;&#039;&#039;  &lt;br /&gt;
The individual(s) who created the original terminal session recording.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Person B (Annotator)&#039;&#039;&#039;  &lt;br /&gt;
The individual who interviews Person A and adds structured annotations to the recording.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;.session File&#039;&#039;&#039;  &lt;br /&gt;
A zip file containing the annotated recording, audio transcriptions, and metadata conforming to this specification.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Overview &amp;amp; Purpose ==&lt;br /&gt;
&lt;br /&gt;
=== What are Annotated Sessions? ===&lt;br /&gt;
&lt;br /&gt;
An &#039;&#039;&#039;annotated session&#039;&#039;&#039; is an asciinema recording of terminal activity (typically software deployment) that has been enriched with structured annotations describing:&lt;br /&gt;
* What actions were performed&lt;br /&gt;
* Why those actions were taken&lt;br /&gt;
* Whether goals were achieved&lt;br /&gt;
* What mistakes occurred and their consequences&lt;br /&gt;
* What discoveries changed the approach&lt;br /&gt;
&lt;br /&gt;
=== Why Create Them? ===&lt;br /&gt;
&lt;br /&gt;
The Librecode project aims to train Large Language Models (LLMs) to understand and work with terminal recordings. Annotated sessions serve as training data, teaching models to:&lt;br /&gt;
* Recognize common deployment patterns&lt;br /&gt;
* Understand error recovery strategies&lt;br /&gt;
* Learn from human problem-solving approaches&lt;br /&gt;
* Eventually perform automatic annotation&lt;br /&gt;
&lt;br /&gt;
Once sufficient manual annotations exist, trained models can automatically annotate new sessions, enabling advanced tooling like automated documentation generation.&lt;br /&gt;
&lt;br /&gt;
=== Document Purpose ===&lt;br /&gt;
&lt;br /&gt;
This document provides:&lt;br /&gt;
# An annotation process for creating annotated sessions&lt;br /&gt;
# An annotation standard defining what to annotate and how&lt;br /&gt;
# Guidelines for annotators to ensure consistency and quality&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== The Annotation Process ==&lt;br /&gt;
&lt;br /&gt;
The process of creating an annotated session involves four distinct phases:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Phase 1: Recording → Phase 2: Interview → Phase 3: Annotation → Phase 4: Submission&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
=== Phase 1: Recording ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Objective:&#039;&#039;&#039; Create a terminal recording of meaningful software deployment activity.  &lt;br /&gt;
&#039;&#039;&#039;Who:&#039;&#039;&#039; Person A (one or more individuals)&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Steps:&#039;&#039;&#039;&lt;br /&gt;
# Set up the recording environment:&lt;br /&gt;
#* Use a Debian Stable VM (see Appendix D)&lt;br /&gt;
#* Upgrade to unstable if required packages are missing&lt;br /&gt;
# Start the &amp;lt;code&amp;gt;asciinema&amp;lt;/code&amp;gt; recording client&lt;br /&gt;
# Attempt to deploy a piece of GNU software&lt;br /&gt;
# Keep all input/output inside the terminal:&lt;br /&gt;
#* Use CLI web tools (&amp;lt;code&amp;gt;curl&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;wget&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;lynx&amp;lt;/code&amp;gt;, etc.)&lt;br /&gt;
# Stop recording when deployment completes&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Output:&#039;&#039;&#039; &amp;lt;code&amp;gt;.cast&amp;lt;/code&amp;gt; asciinema recording file&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Common Issues:&#039;&#039;&#039;&lt;br /&gt;
* Using GUI tools&lt;br /&gt;
* Stopping recording too early&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Phase 2: Interview ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Objective:&#039;&#039;&#039; Ensure the annotator fully understands what was done and why.  &lt;br /&gt;
&#039;&#039;&#039;Who:&#039;&#039;&#039; Person A and Person B&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Steps:&#039;&#039;&#039;&lt;br /&gt;
# Record audio of both sides separately&lt;br /&gt;
# Review the recording together&lt;br /&gt;
# Explain what was done and why&lt;br /&gt;
# Record follow-up clarification conversations if needed&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Output:&#039;&#039;&#039;&lt;br /&gt;
* Audio recordings&lt;br /&gt;
* Full understanding of actions and rationale&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Why Separate Audio Recordings?&#039;&#039;&#039;&lt;br /&gt;
* Better transcription quality&lt;br /&gt;
* Clear speaker attribution&lt;br /&gt;
* Higher-quality training data&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Phase 3: Annotation ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Objective:&#039;&#039;&#039; Add structured annotations using the annotator tool.  &lt;br /&gt;
&#039;&#039;&#039;Who:&#039;&#039;&#039; Person B&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Steps:&#039;&#039;&#039;&lt;br /&gt;
# Load the recording&lt;br /&gt;
# Create timelines for goal hierarchy levels&lt;br /&gt;
# Apply annotations per standard&lt;br /&gt;
# Review completeness&lt;br /&gt;
# Note spec version used&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Output:&#039;&#039;&#039; Annotated recording&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
=== Phase 4: Submission ===&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Objective:&#039;&#039;&#039; Package and submit the session.&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Steps:&#039;&#039;&#039;&lt;br /&gt;
# Transcribe audio with Whisper&lt;br /&gt;
# Create &amp;lt;code&amp;gt;.session&amp;lt;/code&amp;gt; archive&lt;br /&gt;
# Include:&lt;br /&gt;
#* &amp;lt;code&amp;gt;session.yaml&amp;lt;/code&amp;gt;&lt;br /&gt;
#* &amp;lt;code&amp;gt;recording.asciinema&amp;lt;/code&amp;gt;&lt;br /&gt;
#* Audio + transcriptions&lt;br /&gt;
# Upload to repository&lt;br /&gt;
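The packaging step above can be sketched with Python&#039;s standard &amp;lt;code&amp;gt;zipfile&amp;lt;/code&amp;gt; module. This is a minimal illustration, not project tooling: only &amp;lt;code&amp;gt;session.yaml&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;recording.asciinema&amp;lt;/code&amp;gt; are named by this standard, while the function name and the assumption that audio files and transcripts simply sit alongside them in one directory are hypothetical.&lt;br /&gt;

```python
import zipfile
from pathlib import Path

def package_session(work_dir: str, out_path: str) -> str:
    """Bundle a session's artifacts into a .session zip archive.

    Requires session.yaml and recording.asciinema in work_dir; any other
    files present (audio, transcripts) are included as well.
    """
    work = Path(work_dir)
    for name in ("session.yaml", "recording.asciinema"):
        if not (work / name).is_file():
            raise FileNotFoundError("missing required file: " + name)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(work.rglob("*")):
            if path.is_file():
                # Store entries relative to the session directory root.
                archive.write(path, path.relative_to(work))
    return out_path
```

Calling it with the session directory and an output path such as &amp;lt;code&amp;gt;foo.session&amp;lt;/code&amp;gt; produces the archive to upload; the required-file check mirrors the inclusion list above.&lt;br /&gt;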
&lt;br /&gt;
----&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=Main_Page&amp;diff=7</id>
		<title>Main Page</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=Main_Page&amp;diff=7"/>
		<updated>2025-12-22T20:35:43Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Welcome to the CopyLeft Artificial Intelligence wiki!&lt;br /&gt;
&lt;br /&gt;
Here, you will find documents, howtos, and notes about our efforts to create CopyLeft-respecting Free Software Artificial Intelligence systems.&lt;br /&gt;
&lt;br /&gt;
== Hardware ==&lt;br /&gt;
Like all projects, we have hardware dependencies. We try to document them and their behaviors so that our distributed team can easily reproduce one another&#039;s environments.&lt;br /&gt;
&lt;br /&gt;
Our infrastructure includes, but is not limited to, the following devices:&lt;br /&gt;
* [[IGEL M350C]]&lt;br /&gt;
* [[ASRock H81m-HDS (v1.0)]]&lt;br /&gt;
&lt;br /&gt;
== FAIKVM ==&lt;br /&gt;
Our infrastructure stack is a combination of FAI, KVM, and a pile of shell scripts.&lt;br /&gt;
&lt;br /&gt;
This infrastructure includes, but is not limited to, the following VM types:&lt;br /&gt;
* [[WikiServer]]&lt;br /&gt;
* [[FAIServer]]&lt;br /&gt;
* [[QEMUHost]]&lt;br /&gt;
&lt;br /&gt;
== Projects ==&lt;br /&gt;
&lt;br /&gt;
=== Autodoc ===&lt;br /&gt;
&lt;br /&gt;
The Autodoc project aims to convert asciinema terminal recordings into structured events, annotations, and derived documentation artifacts using LLMs.&lt;br /&gt;
&lt;br /&gt;
Student groups that have worked on the Autodoc project are listed at: [[Student_Groups]]&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=Student_Groups&amp;diff=6</id>
		<title>Student Groups</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=Student_Groups&amp;diff=6"/>
		<updated>2025-12-22T20:33:29Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: Initial page creation&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Student Groups =&lt;br /&gt;
&lt;br /&gt;
This page documents the University of Toronto Mississauga (UTM) student groups who contributed to the LibreCode / Annotator ecosystem between mid-2024 and late-2025, specifically around: converting asciinema terminal recordings into structured events, annotations, and derived documentation artifacts using LLMs.&lt;br /&gt;
&lt;br /&gt;
== Project context ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mentors&#039;&#039;&#039;&lt;br /&gt;
* Julia Longtin — https://github.com/julialongtin&lt;br /&gt;
* Arthur Wolf — https://github.com/arthurwolf&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Mentorship program&#039;&#039;&#039;&lt;br /&gt;
* Human Feedback Foundation (Linux Foundation entity): https://humanfeedback.io/&lt;br /&gt;
* University of Toronto Mississauga: https://www.utm.utoronto.ca/&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Core LibreCode resources&#039;&#039;&#039;&lt;br /&gt;
* Annotator repository: https://github.com/arthurwolf/annotator&lt;br /&gt;
* LibreCode / FaikVM wiki: https://wiki.faikvm.com/mediawiki/index.php/Main_Page&lt;br /&gt;
* Public hosted annotator instance: https://linuxpmi.org/&lt;br /&gt;
&lt;br /&gt;
Most student repositories are hosted under this GitHub organization:  &lt;br /&gt;
https://github.com/CSC392-CSC492-Building-AI-ML-systems&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Fall 2024 – Early AutoDocs prototype ==&lt;br /&gt;
&lt;br /&gt;
=== What they worked on ===&lt;br /&gt;
This group produced an early prototype of what later became &#039;&#039;AutoDocs&#039;&#039;: tooling to segment asciinema terminal recordings into meaningful chunks and generate higher-level annotations.&lt;br /&gt;
&lt;br /&gt;
This work served primarily as a proof of concept and a starting point for later cohorts.&lt;br /&gt;
&lt;br /&gt;
=== Contributors ===&lt;br /&gt;
&#039;&#039;(TODO: add names / GitHub links if identified)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Code and artifacts ===&lt;br /&gt;
* Archived Fall 2024 code base (referenced in later repos):&lt;br /&gt;
** Mentioned in the AutoDocs README as &#039;&#039;&#039;“Fall 2024 Team’s Code Base”&#039;&#039;&#039;&lt;br /&gt;
** Linked from the AutoDocs repository:&lt;br /&gt;
*** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
&lt;br /&gt;
=== Notes ===&lt;br /&gt;
Later documentation notes that much of this code is outdated or non-functional, but it remains historically important.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Winter 2025 – AutoDocs expansion + documentation ==&lt;br /&gt;
&lt;br /&gt;
=== What they worked on ===&lt;br /&gt;
This cohort rebuilt and extended the AutoDocs pipeline into a more complete system and produced formal documentation of their work.&lt;br /&gt;
&lt;br /&gt;
The repository includes a tagged release explicitly described as a rewrite of the project by the Winter 2025 team.  &lt;br /&gt;
Release link: https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent/releases/tag/winter-2025&lt;br /&gt;
&lt;br /&gt;
=== Contributors ===&lt;br /&gt;
&#039;&#039;&#039;Known from the Winter 2025 release note and repository contributors list&#039;&#039;&#039;&lt;br /&gt;
* Brian Zhang — https://github.com/Pyosimros&lt;br /&gt;
* Vraj Patel — https://github.com/Vraj-Patel1&lt;br /&gt;
* Dan Nguyen — https://github.com/nuhgooyin&lt;br /&gt;
* Adreano La Rosa — &#039;&#039;(listed in release note; GitHub handle not yet confirmed)&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;&#039;Additional contributors shown by GitHub&#039;&#039;&#039;&lt;br /&gt;
* Abdallah Enaya — https://github.com/abdullah-enaya&lt;br /&gt;
* Renee K — https://github.com/renee-k&lt;br /&gt;
* aml-8 — https://github.com/aml-8&lt;br /&gt;
* Christopher Flores — https://github.com/cfstar188&lt;br /&gt;
* Uyiosa Iyekekpolor — https://github.com/uyoyo0&lt;br /&gt;
* eyexjay — https://github.com/eyexjay&lt;br /&gt;
&lt;br /&gt;
=== Code and artifacts ===&lt;br /&gt;
* Main AutoDocs repository:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
* Release tag capturing the Winter 2025 state:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent/releases/tag/winter-2025&lt;br /&gt;
* Public talk page referencing this pipeline (AI Tinkerers Toronto, March 2025):&lt;br /&gt;
** https://toronto.aitinkerers.org/talks/rsvp_14QYpww1FyE&lt;br /&gt;
&lt;br /&gt;
=== Notes ===&lt;br /&gt;
The current AutoDocs repository states that it was “modified and extended from the Winter 2025 team’s code base”.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== AutoDocs (consolidated / ongoing repository) ==&lt;br /&gt;
&lt;br /&gt;
&#039;&#039;This is a living repository spanning multiple student cohorts rather than a single group.&#039;&#039;&lt;br /&gt;
&lt;br /&gt;
=== Purpose ===&lt;br /&gt;
AutoDocs processes asciinema terminal recordings and produces structured outputs such as:&lt;br /&gt;
* segmented command &#039;&#039;events&#039;&#039;,&lt;br /&gt;
* annotated explanations,&lt;br /&gt;
* derived artifacts (for example, Dockerfiles).&lt;br /&gt;
&lt;br /&gt;
=== People ===&lt;br /&gt;
* Julia Longtin (lead contact): https://github.com/julialongtin&lt;br /&gt;
* Model publisher referenced in the README:&lt;br /&gt;
** &#039;&#039;&#039;bria7801&#039;&#039;&#039; on Hugging Face:&lt;br /&gt;
*** https://huggingface.co/bria7801/model-0&lt;br /&gt;
*** https://huggingface.co/bria7801/model-1&lt;br /&gt;
*** https://huggingface.co/bria7801/model-3&lt;br /&gt;
&lt;br /&gt;
=== Repository ===&lt;br /&gt;
* https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
&lt;br /&gt;
=== Contents (high level) ===&lt;br /&gt;
* &amp;lt;code&amp;gt;data/&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;frontend/&amp;lt;/code&amp;gt;, and &amp;lt;code&amp;gt;models/&amp;lt;/code&amp;gt; directories&lt;br /&gt;
* Multiple parser scripts (Parser 0 / 1 / 2)&lt;br /&gt;
* References to fine-tuned model checkpoints via Hugging Face links&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Autumn 2025 – DocStream consolidation (Educational AI Agent) ==&lt;br /&gt;
&lt;br /&gt;
This appears to be a later cohort or iteration that built on AutoDocs and re-framed the system as &#039;&#039;&#039;DocStream&#039;&#039;&#039; (same core idea: streamed asciinema logs → events → hierarchical annotations → documentation).  &lt;br /&gt;
Repository: https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent&lt;br /&gt;
&lt;br /&gt;
=== What they worked on ===&lt;br /&gt;
From the repository README, DocStream:&lt;br /&gt;
* converts raw, noisy terminal activity into structured, reproducible developer documentation,&lt;br /&gt;
* processes streamed asciinema logs,&lt;br /&gt;
* segments them into meaningful events,&lt;br /&gt;
* generates hierarchical annotations explaining terminal activity,&lt;br /&gt;
* includes an evaluation harness (based on an extended EleutherAI LM Evaluation Harness) with task and metric scaffolding under &amp;lt;code&amp;gt;data/llm_Evaluation/&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
=== Code and artifacts ===&lt;br /&gt;
* Main repository:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent&lt;br /&gt;
* Repository structure pointers:&lt;br /&gt;
** &amp;lt;code&amp;gt;data/&amp;lt;/code&amp;gt; — datasets and evaluation harness inputs&lt;br /&gt;
** &amp;lt;code&amp;gt;models/model_0/&amp;lt;/code&amp;gt; — segmentation training and inference&lt;br /&gt;
** &amp;lt;code&amp;gt;models/model_1/&amp;lt;/code&amp;gt; — annotation training and inference&lt;br /&gt;
** &amp;lt;code&amp;gt;demo/&amp;lt;/code&amp;gt; — front-end visualization demo&lt;br /&gt;
** &amp;lt;code&amp;gt;runpod/&amp;lt;/code&amp;gt; — deployment and runpod materials&lt;br /&gt;
* White paper included in the repository root:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/blob/main/WhitePaper.docx&lt;br /&gt;
* Previous iteration explicitly linked from the DocStream README:&lt;br /&gt;
** https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent&lt;br /&gt;
&lt;br /&gt;
=== Models ===&lt;br /&gt;
* &#039;&#039;&#039;Model 0 — Event Segmentation&#039;&#039;&#039;&lt;br /&gt;
** Segments streamed terminal logs into XML-structured &#039;&#039;events&#039;&#039;&lt;br /&gt;
* &#039;&#039;&#039;Model 1 — Hierarchical Annotation&#039;&#039;&#039;&lt;br /&gt;
** Reads Model 0 event chunks and generates summaries with hierarchical depth (goal / subtask structure)&lt;br /&gt;
&lt;br /&gt;
=== Contributors ===&lt;br /&gt;
GitHub accounts that appear repeatedly in the commit history, and are likely the core Autumn 2025 contributors:&lt;br /&gt;
* Ryan Pankratz — https://github.com/ryan-pankratz  &lt;br /&gt;
** (also appears as: https://github.com/RyanPankratz)&lt;br /&gt;
* Victor Shea — https://github.com/VictorShea&lt;br /&gt;
* Moe Reda — https://github.com/Moe-Reda&lt;br /&gt;
* Patea4 — https://github.com/Patea4&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=WikiServer&amp;diff=5</id>
		<title>WikiServer</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=WikiServer&amp;diff=5"/>
		<updated>2025-12-21T19:39:08Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is the VM type that hosts this wiki.&lt;br /&gt;
&lt;br /&gt;
Immediately after install, run the scripts in /root to configure the environment.&lt;br /&gt;
&lt;br /&gt;
To add a user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
cd /usr/share/mediawiki/&lt;br /&gt;
php maintenance/run.php createAndPromote &amp;lt;username&amp;gt; &amp;lt;password&amp;gt; --bureaucrat --sysop&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
	<entry>
		<id>https://wiki.faikvm.com/mediawiki/index.php?title=WikiServer&amp;diff=4</id>
		<title>WikiServer</title>
		<link rel="alternate" type="text/html" href="https://wiki.faikvm.com/mediawiki/index.php?title=WikiServer&amp;diff=4"/>
		<updated>2025-12-21T19:38:21Z</updated>

		<summary type="html">&lt;p&gt;ArthurWolf: changed from markdown code block to mediawiki code block&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;This is the VM type that hosts this wiki.&lt;br /&gt;
&lt;br /&gt;
Immediately after install, run the scripts in /root to configure the environment.&lt;br /&gt;
&lt;br /&gt;
To add a user:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
cd /usr/share/mediawiki/&lt;br /&gt;
php maintenance/run.php createAndPromote &amp;lt;username&amp;gt; &amp;lt;password&amp;gt; --bureaucrat --sysop&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;/div&gt;</summary>
		<author><name>ArthurWolf</name></author>
	</entry>
</feed>