Prompting-guidelines: Difference between revisions

From CLAIF Wiki
Created page with "= Librecode Prompting Guidelines for Students = Some prompting guidelines given to students as they start working on the Librecode project, or if they need to write prompts for any part of the project. == Prompt Engineering Guides / Documentation == * https://platform.openai.com/docs/guides/prompt-engineering * https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api * https://platform.claude.com/docs/en/build-with-claude/p..."
 
# It's stupid, but giving the model a "persona" in which you tell it that it is a "world expert" at whatever you are asking it to do is reported to increase performance by a noticeable amount, even to this day (see the studies on persona prompting). Threatening the model is also reported to increase performance, but I personally can never bring myself to do it.
# Prefill answers: after the end of your prompt, write the beginning of the response you expect the model to produce. This can help prevent some issues, such as output that strays from the required format, but note that it won't work for thinking models.
== Prompt Example: Assigning the `group` of the `sortme` Event (HLC Membership) ==
This is an example of a prompt from the project, and under it an "improved" version following the above advice.
'''Source prompt (GitHub):'''
* Raw: https://raw.githubusercontent.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/refs/heads/lv/model0-fine-tuning/models/model_0/system_prompt.txt
* Repo view: https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/blob/lv/model0-fine-tuning/models/model_0/system_prompt.txt
=== Base Prompt (as in the repo) ===
<syntaxhighlight lang="text">
# Goal
Your goal is to use a set of higher-level-communications (HLCs) and one final possibly incomplete HLC to assign a group to the last event, by determining whether it should be considered to be a part of the final HLC.
# Definitions
A higher-level communication (HLC) is a series of related events, representing a single idea, concept, or value.
* The first HLC starts at the beginning of the dataset you are evaluating.
* Events in an HLC are contiguous, no event from any other HLC will occur between the first and last event of a given HLC.
* HLCs are complete only when the content of the HLC represents an idea such as one of the examples given; You cannot reason about HLC membership without examining the content.
* Each HLC will have a unique `group` assigned.
Examples of HLCs include:
* A Bash shell prompt
* A Bash shell command
* A response to a shell command
* A complete keyboard shortcut
* A series of backspaces, deletions, navigations, or additions causing a typo
* A series of backspaces, deletions, navigations, or additions correcting a typo
An event captures communication in a terminal session.
* Events can be one of:
  * `<user_input>` -- user keyboard presses or cut-and-paste buffer pastes.
  * `<system_output>` -- responses from software.
* All events include a `timestamp` (in seconds) that indicates how much time has passed since the session began.
* Events are always provided in non-decreasing timestamp order; ties are in-order in the dataset.
* Events that are part of the same HLC will have the same `group`, with the exception of the final HLC, which may need many events added to it to become a complete HLC.
* Only the last event will have a `sortme` attribute; there will only be one event with a `sortme` attribute in the dataset.
Each `group` is identified by 0, or a positive integer.
* They are used to identify a HLC, are unique, contiguous, and increase by 1 in the dataset each time one HLC stops, and another starts.
The last event is the event immediately prior to the dataset's end:
* The last event has a `sortme` attribute set to `True`.
* The last event has no group assigned. This implies nothing about its HLC membership.
* The last event has the highest `timestamp` in the dataset.
* The event before the last event is always a part of the final HLC.
The final HLC is the last HLC in the dataset.
* The final HLC may or may not be complete.
* The final HLC always contains the event prior to the last event.
* The last event may or may not be a part of the final HLC.
# Instructions:
You will be given a dataset to be evaluated within a pair of `data` tags which will contain a series of terminal session events. At the end of the dataset, you can find the final HLC, and the last event.
Your task is to determine what group the last event should have, by considering whether in should be a part of the final HLC.
## How to Respond:
Respond with the following two items:
* An explanation in English less than or equal to 200 characters in length on why you believe the last event should be considered to be a part of the final HLC, or why it should not.
  * Do not add code blocks, or other multi-line formatting.
  * If you determine the last event should be considered a part of the final HLC, state what type of HLC you believe the final HLC to be, and whether you believe adding this event to the HLC would make it complete.
* An answer, either:
  * The integer `group` of the final HLC -- If you mean to imply that the last event should be joined to the final HLC
  * `NEW` -- If you mean to imply that an HLC should be assigned to the next integer after the current final HLC's `group`, and you mean to imply the last event should be in that new `group`
Use the following template to format your response:
<!-- 200 or fewer characters in English here -->
Answer: <!-- Integer or `NEW` here -->
### Example Responses
```
The last Event belongs to the final HLC, because it continues the input of the `ssh` command at the Bash prompt.
Answer: 1
```
```
The last Event belongs to a new HLC, because it contains the first characters of the response to the `ssh` command the user entered at the Bash prompt.
Answer: NEW
```
# Notes:
* Do not rely only on `group`s; use content and interaction flow. Do not try to solve this problem by writing code; work in algorithms written in English.
* Most of the time, the dataset will end in an incomplete HLC, even if you were to add the last element to the final HLC; this is normal, as we are processing terminal input as it arrives, not a complete terminal session.
* In a terminal session, if the remote software wants the user to see what they are typing, it has to repeat the characters back to the user. Echoed characters are common, and usually are a part of the same HLC.
# Dataset to be evaluated:
</syntaxhighlight>
=== Improved Prompt (structured tags, multishot examples, context-rich, deterministic) ===
<syntaxhighlight lang="text">
<prompt>
  <persona>
    You are a world expert in segmenting terminal session events into Higher-Level Communications (HLCs) for the AutoDoc / Librecode annotation pipeline.
    You are precise, deterministic, and do not invent fields or groups.
  </persona>
  <context>
    AutoDoc converts Asciinema terminal recordings into structured events and derived documentation.
    Librecode uses annotated sessions as training data so models learn terminal workflows, error recovery, and common patterns.
    This task is a small deterministic classification: decide whether the final event continues the final HLC or starts the next HLC.
  </context>
  <task>
    Given a dataset of terminal events inside <data> ... </data>, assign the correct group to the single event marked sortme="True".
    Output either:
    - the integer group of the final HLC (join it), OR
    - NEW (start a new HLC with the next integer after the current final HLC group).
  </task>
  <definitions>
    <hlc>
      An HLC is a contiguous run of related events expressing one coherent unit (one idea/action), such as:
      - a shell prompt display
      - a single command being typed (including edits, backspaces, cursor moves, pastes)
      - the system output produced by a command
      - a complete keyboard shortcut / UI interaction sequence
    </hlc>
    <event>
      Each event is either user input or system output.
      Events are ordered by non-decreasing timestamp.
      Only one event has sortme="True" (the last event).
      The event immediately before the sortme event is always part of the final HLC.
    </event>
    <groups>
      Group identifiers are 0 or positive integers.
      They are contiguous and increase by 1 each time a new HLC begins.
      All events except the sortme event already have a group.
    </groups>
  </definitions>
  <input_format>
    The dataset is wrapped in <data> ... </data>.
    Inside <data>, each event is represented as either:
      - <user_input ...>TEXT</user_input>
      - <system_output ...>TEXT</system_output>
    Attributes may include:
      - timestamp="SECONDS_SINCE_START"
      - group="INTEGER"  (missing only on the sortme event)
      - sortme="True"    (present only on the final event)
    Use only what is present.
  </input_format>
  <decision_process>
    <step_1>
      Identify final_group := the group of the event immediately before the sortme event.
      That event is guaranteed to be in the final HLC.
    </step_1>
    <step_2>
      Identify the final HLC "type" by reading the content of the final_group events near the end (prompt vs command input vs command output vs edit sequence vs shortcut).
    </step_2>
    <step_3>
      Decide whether the sortme event is the same coherent unit (same HLC) or the start of the next unit (new HLC).
      Use content and interaction flow; do NOT decide using group numbers alone.
    </step_3>
    <continuation_indicators>
      Strong signals the sortme event belongs to final_group:
      - It continues the same command line being typed (more characters, paste, or edits on the same line).
      - It is echoed text that corresponds to recent user input (terminal echo), still part of the same action.
      - It continues the same output stream of a command (more lines of the same response, same error message, same progress output).
      - It continues the same edit/repair sequence (backspaces/cursor moves fixing a typo) that was already happening in final_group.
    </continuation_indicators>
    <new_hlc_indicators>
      Strong signals the sortme event should be NEW:
      - A shell prompt appears after command output (prompt is its own HLC in this schema).
      - New user typing begins after a prompt HLC (typing a command is a new HLC).
      - The sortme event clearly starts a different activity type than final_group (e.g., final_group is output, sortme is the next prompt; or final_group is prompt, sortme is command input).
      - A completed action boundary is visible (e.g., the command output ends and a fresh prompt appears; or a prompt ends and the user begins a new command).
    </new_hlc_indicators>
    <tie_breaker>
      If ambiguous, prefer joining final_group unless there is a clear boundary marker (prompt boundary, command submit boundary, or obvious activity-type switch).
      Do not guess new groups without evidence.
    </tie_breaker>
    <thinking>
      If you are a thinking-capable model: think privately.
      Do NOT reveal hidden reasoning.
      The explanation must be <= 200 characters.
    </thinking>
  </decision_process>
  <output_format>
    Output exactly 2 lines, no extra whitespace lines, no code blocks, no bullets:
    Line 1: an English explanation <= 200 characters.
    Line 2: "Answer: " followed by either an integer (final_group) or "NEW".
    Allowed:
      Explanation: single line, short, concrete, references content (prompt/command/output/edit).
    Forbidden:
      - multi-line explanations
      - markdown formatting
      - additional fields
      - JSON/YAML output
  </output_format>
  <examples>
    <good_example>
      <input>
        <data>
          <system_output timestamp="0.0" group="0">$ </system_output>
          <user_input timestamp="0.2" group="1">ssh user@host</user_input>
          <user_input timestamp="0.3" sortme="True"> -p 22</user_input>
        </data>
      </input>
      <output>
        Continues the same command input line (ssh options), so it belongs to the final command-typing HLC.
        Answer: 1
      </output>
    </good_example>
    <good_example>
      <input>
        <data>
          <user_input timestamp="10.0" group="4">ls\r</user_input>
          <system_output timestamp="10.1" group="5">file_a\nfile_b\n</system_output>
          <system_output timestamp="10.2" sortme="True">$ </system_output>
        </data>
      </input>
      <output>
        The prompt is a new HLC after command output, so the final event starts the next group.
        Answer: NEW
      </output>
    </good_example>
    <good_example>
      <input>
        <data>
          <system_output timestamp="20.0" group="7">$ </system_output>
          <user_input timestamp="20.5" sortme="True">c</user_input>
        </data>
      </input>
      <output>
        Prompt (group 7) is complete; new typing begins a new command-input HLC, so it must be NEW.
        Answer: NEW
      </output>
    </good_example>
    <bad_output_example>
      <input>
        <data>
          <system_output timestamp="0.0" group="0">$ </system_output>
          <user_input timestamp="0.2" group="1">ssh user@host</user_input>
          <user_input timestamp="0.3" sortme="True"> -p 22</user_input>
        </data>
      </input>
      <bad_output>
        Here is my reasoning:
        - It looks like it continues the previous thing.
        Answer: 1
      </bad_output>
      <why_bad>
        Invalid: extra lines + bullets. Output must be exactly 2 lines.
      </why_bad>
    </bad_output_example>
    <bad_output_example>
      <input>
        <data>
          <system_output timestamp="10.2" sortme="True">$ </system_output>
        </data>
      </input>
      <bad_output>
        The prompt should be group 999 because it is a prompt and prompts should be high numbers.
        Answer: 999
      </bad_output>
      <why_bad>
        Invalid: invented group number; only final_group or NEW are allowed.
      </why_bad>
    </bad_output_example>
    <bad_output_example>
      <input>
        <data>
          <system_output timestamp="10.2" sortme="True">$ </system_output>
        </data>
      </input>
      <bad_output>
        The prompt is new.
        Answer: NEW!!!
      </bad_output>
      <why_bad>
        Invalid: Answer must be exactly NEW (no punctuation).
      </why_bad>
    </bad_output_example>
  </examples>
  <runtime_settings>
    Prefer deterministic decoding (temperature 0 or equivalent).
  </runtime_settings>
</prompt>
<data>
<<<DATASET_EVENTS_GO_HERE>>>
</data>
</syntaxhighlight>

Revision as of 23:18, 22 January 2026

Librecode Prompting Guidelines for Students

Some prompting guidelines given to students as they start working on the Librecode project, or if they need to write prompts for any part of the project.

Prompt Engineering Guides / Documentation

If you have to read just one, read the Anthropic one.

The Big Important Pieces

  1. Use XML-like tags to structure the document, including on multiple levels. You can use Markdown inside those tags' contents, but avoid structuring using Markdown headers. Models tend to understand structure better using XML-like tags.
  2. Provide examples, both of good output and of bad output, clearly delimited by XML-like tags. Give multiple examples. Providing multiple examples is called the "few-shot" / "multishot" technique, and can sometimes completely replace fine-tuning.
  3. Research what format/style was used to train the model you are trying to use or fine-tune; it often can give major insights and solve issues.
  4. Unless you have a good reason not to, it's generally a good idea to use a temperature of 0 (or the equivalent top-k / top-p / etc.).
  5. Keep your system prompt simple and short. The place to put instructions is the prompt itself. Putting too much in the system prompt is a common beginner mistake.
  6. Templating formats like Handlebars or Jinja make for nicer, readable prompt templates/files.
  7. Provide context: explain in the prompt what the prompt is "for", what the project the prompt is being used for is all about, and any other useful context you can think of.
  8. Use an LLM to rewrite your prompts. Give it these links, these rules, and any other rules you can think of, and instruct it to rewrite the prompt accordingly. Make it clear that it is writing text that will be read by a machine (not by a human), so it can write compact text without any pleasantries. This generally results in much better prompts.
  9. Use coding agents to work on your prompt templates. Gemini is free.
  10. Beyond examples, describe the output: its length, format, style, etc.
  11. If using a thinking model, actually instruct it to think, and even give it examples of how to think; examples of what a useful chain-of-thought looks like for a specific input.
  12. It's stupid, but giving the model a "persona" in which you tell it that it is a "world expert" at whatever you are asking it to do is reported to increase performance by a noticeable amount, even to this day (see the studies on persona prompting). Threatening the model is also reported to increase performance, but I personally can never bring myself to do it.
  13. Prefill answers: after the end of your prompt, write the beginning of the response you expect the model to produce. This can help prevent some issues, such as output that strays from the required format, but note that it won't work for thinking models.
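
As a minimal sketch of guidelines 1, 2, 6 and 7, the snippet below assembles a prompt from XML-like tags with clearly delimited good and bad examples. It uses Python's standard `string.Template` as a stand-in for Handlebars or Jinja. The tag names and the `build_prompt` and `render_example` helpers are illustrative assumptions, not part of the Librecode code base.

```python
from string import Template

# Hypothetical template: the tag names are illustrative, not a project standard.
PROMPT_TEMPLATE = Template(
    "<prompt>\n"
    "  <context>$context</context>\n"
    "  <task>$task</task>\n"
    "  <examples>\n"
    "$examples"
    "  </examples>\n"
    "</prompt>"
)


def render_example(kind: str, text: str) -> str:
    """Wrap one example in a tag that says whether it is good or bad output."""
    tag = f"{kind}_example"
    return f"    <{tag}>{text}</{tag}>\n"


def build_prompt(context: str, task: str, good: list[str], bad: list[str]) -> str:
    """Fill the template with context, task, and delimited few-shot examples."""
    examples = "".join(render_example("good", g) for g in good)
    examples += "".join(render_example("bad", b) for b in bad)
    return PROMPT_TEMPLATE.substitute(context=context, task=task, examples=examples)


prompt = build_prompt(
    context="Annotating terminal sessions for Librecode.",
    task="Decide whether the last event joins the final HLC.",
    good=["Answer: NEW"],
    bad=["Answer: NEW!!!"],
)
print(prompt)
```

Keeping the template in one constant (or in a separate file) makes it easy to review the prompt text without reading the surrounding code.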


Prompt Example: Assigning the `group` of the `sortme` Event (HLC Membership)

This is an example of a prompt from the project, and under it an "improved" version following the above advice.

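Source prompt (GitHub):

* Raw: https://raw.githubusercontent.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/refs/heads/lv/model0-fine-tuning/models/model_0/system_prompt.txt
* Repo view: https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent/blob/lv/model0-fine-tuning/models/model_0/system_prompt.txt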

Base Prompt (as in the repo)

<syntaxhighlight lang="text">

# Goal

Your goal is to use a set of higher-level-communications (HLCs) and one final possibly incomplete HLC to assign a group to the last event, by determining whether it should be considered to be a part of the final HLC.

# Definitions

A higher-level communication (HLC) is a series of related events, representing a single idea, concept, or value.

* The first HLC starts at the beginning of the dataset you are evaluating.
* Events in an HLC are contiguous, no event from any other HLC will occur between the first and last event of a given HLC.
* HLCs are complete only when the content of the HLC represents an idea such as one of the examples given; You cannot reason about HLC membership without examining the content.
* Each HLC will have a unique `group` assigned.

Examples of HLCs include:

* A Bash shell prompt
* A Bash shell command
* A response to a shell command
* A complete keyboard shortcut
* A series of backspaces, deletions, navigations, or additions causing a typo
* A series of backspaces, deletions, navigations, or additions correcting a typo

An event captures communication in a terminal session.

* Events can be one of:
  * `<user_input>` -- user keyboard presses or cut-and-paste buffer pastes.
  * `<system_output>` -- responses from software.
* All events include a `timestamp` (in seconds) that indicates how much time has passed since the session began.
* Events are always provided in non-decreasing timestamp order; ties are in-order in the dataset.
* Events that are part of the same HLC will have the same `group`, with the exception of the final HLC, which may need many events added to it to become a complete HLC.
* Only the last event will have a `sortme` attribute; there will only be one event with a `sortme` attribute in the dataset.

Each `group` is identified by 0, or a positive integer.

* They are used to identify a HLC, are unique, contiguous, and increase by 1 in the dataset each time one HLC stops, and another starts.

The last event is the event immediately prior to the dataset's end:

* The last event has a `sortme` attribute set to `True`.
* The last event has no group assigned. This implies nothing about its HLC membership.
* The last event has the highest `timestamp` in the dataset.
* The event before the last event is always a part of the final HLC.

The final HLC is the last HLC in the dataset.

* The final HLC may or may not be complete.
* The final HLC always contains the event prior to the last event.
* The last event may or may not be a part of the final HLC.

# Instructions:

You will be given a dataset to be evaluated within a pair of `data` tags which will contain a series of terminal session events. At the end of the dataset, you can find the final HLC, and the last event.

Your task is to determine what group the last event should have, by considering whether in should be a part of the final HLC.

## How to Respond:

Respond with the following two items:

* An explanation in English less than or equal to 200 characters in length on why you believe the last event should be considered to be a part of the final HLC, or why it should not.
  * Do not add code blocks, or other multi-line formatting.
  * If you determine the last event should be considered a part of the final HLC, state what type of HLC you believe the final HLC to be, and whether you believe adding this event to the HLC would make it complete.
* An answer, either:
  * The integer `group` of the final HLC -- If you mean to imply that the last event should be joined to the final HLC
  * `NEW` -- If you mean to imply that an HLC should be assigned to the next integer after the current final HLC's `group`, and you mean to imply the last event should be in that new `group`

Use the following template to format your response:

<!-- 200 or fewer characters in English here -->
Answer: <!-- Integer or `NEW` here -->

### Example Responses

```
The last Event belongs to the final HLC, because it continues the input of the `ssh` command at the Bash prompt.
Answer: 1
```

```
The last Event belongs to a new HLC, because it contains the first characters of the response to the `ssh` command the user entered at the Bash prompt.
Answer: NEW
```

# Notes:

* Do not rely only on `group`s; use content and interaction flow. Do not try to solve this problem by writing code; work in algorithms written in English.
* Most of the time, the dataset will end in an incomplete HLC, even if you were to add the last element to the final HLC; this is normal, as we are processing terminal input as it arrives, not a complete terminal session.
* In a terminal session, if the remote software wants the user to see what they are typing, it has to repeat the characters back to the user. Echoed characters are common, and usually are a part of the same HLC.

# Dataset to be evaluated:

</syntaxhighlight>
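
The response format described in the base prompt can be checked mechanically before a result is accepted. The following is a sketch only: `parse_response` and its error messages are hypothetical, and it assumes that a valid response is exactly two lines (the explanation, then the `Answer:` line), as the template implies.

```python
import re

MAX_EXPLANATION_CHARS = 200
# The answer line is either a non-negative integer or the literal NEW.
ANSWER_RE = re.compile(r"Answer: ([0-9]+|NEW)")


def parse_response(response: str, final_group: int) -> int:
    """Return the group for the sortme event, or raise ValueError if malformed."""
    lines = response.strip().split("\n")
    if len(lines) != 2:
        raise ValueError("expected exactly two lines")
    explanation, answer_line = lines
    if len(explanation) > MAX_EXPLANATION_CHARS:
        raise ValueError("explanation longer than 200 characters")
    match = ANSWER_RE.fullmatch(answer_line.strip())
    if match is None:
        raise ValueError("answer line must be 'Answer: <integer>' or 'Answer: NEW'")
    if match.group(1) == "NEW":
        # NEW means the next integer after the final HLC's group.
        return final_group + 1
    group = int(match.group(1))
    if group != final_group:
        raise ValueError("an integer answer must equal the final HLC's group")
    return group


print(parse_response("Continues the ssh command input.\nAnswer: 1", final_group=1))
print(parse_response("Prompt starts a new HLC.\nAnswer: NEW", final_group=5))
```

Rejecting malformed responses early, rather than trying to repair them, keeps invented group numbers out of the annotated data.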

Improved Prompt (structured tags, multishot examples, context-rich, deterministic)

<syntaxhighlight lang="text">
<prompt>

 <persona>
   You are a world expert in segmenting terminal session events into Higher-Level Communications (HLCs) for the AutoDoc / Librecode annotation pipeline.
   You are precise, deterministic, and do not invent fields or groups.
 </persona>
 <context>
   AutoDoc converts Asciinema terminal recordings into structured events and derived documentation.
   Librecode uses annotated sessions as training data so models learn terminal workflows, error recovery, and common patterns.
   This task is a small deterministic classification: decide whether the final event continues the final HLC or starts the next HLC.
 </context>
 <task>
   Given a dataset of terminal events inside <data> ... </data>, assign the correct group to the single event marked sortme="True".
   Output either:
   - the integer group of the final HLC (join it), OR
   - NEW (start a new HLC with the next integer after the current final HLC group).
 </task>
 <definitions>
   <hlc>
     An HLC is a contiguous run of related events expressing one coherent unit (one idea/action), such as:
     - a shell prompt display
     - a single command being typed (including edits, backspaces, cursor moves, pastes)
     - the system output produced by a command
     - a complete keyboard shortcut / UI interaction sequence
   </hlc>
   <event>
     Each event is either user input or system output.
     Events are ordered by non-decreasing timestamp.
     Only one event has sortme="True" (the last event).
     The event immediately before the sortme event is always part of the final HLC.
   </event>
   <groups>
     Group identifiers are 0 or positive integers.
     They are contiguous and increase by 1 each time a new HLC begins.
     All events except the sortme event already have a group.
   </groups>
 </definitions>
 <input_format>
   The dataset is wrapped in <data> ... </data>.
   Inside <data>, each event is represented as either:
     - <user_input ...>TEXT</user_input>
     - <system_output ...>TEXT</system_output>
   Attributes may include:
     - timestamp="SECONDS_SINCE_START"
     - group="INTEGER"   (missing only on the sortme event)
     - sortme="True"     (present only on the final event)
   Use only what is present.
 </input_format>
 <decision_process>
   <step_1>
     Identify final_group := the group of the event immediately before the sortme event.
     That event is guaranteed to be in the final HLC.
   </step_1>
   <step_2>
     Identify the final HLC "type" by reading the content of the final_group events near the end (prompt vs command input vs command output vs edit sequence vs shortcut).
   </step_2>
   <step_3>
     Decide whether the sortme event is the same coherent unit (same HLC) or the start of the next unit (new HLC).
     Use content and interaction flow; do NOT decide using group numbers alone.
   </step_3>
   <continuation_indicators>
     Strong signals the sortme event belongs to final_group:
     - It continues the same command line being typed (more characters, paste, or edits on the same line).
     - It is echoed text that corresponds to recent user input (terminal echo), still part of the same action.
     - It continues the same output stream of a command (more lines of the same response, same error message, same progress output).
     - It continues the same edit/repair sequence (backspaces/cursor moves fixing a typo) that was already happening in final_group.
   </continuation_indicators>
   <new_hlc_indicators>
     Strong signals the sortme event should be NEW:
     - A shell prompt appears after command output (prompt is its own HLC in this schema).
     - New user typing begins after a prompt HLC (typing a command is a new HLC).
     - The sortme event clearly starts a different activity type than final_group (e.g., final_group is output, sortme is the next prompt; or final_group is prompt, sortme is command input).
     - A completed action boundary is visible (e.g., the command output ends and a fresh prompt appears; or a prompt ends and the user begins a new command).
   </new_hlc_indicators>
   <tie_breaker>
     If ambiguous, prefer joining final_group unless there is a clear boundary marker (prompt boundary, command submit boundary, or obvious activity-type switch).
     Do not guess new groups without evidence.
   </tie_breaker>
   <thinking>
     If you are a thinking-capable model: think privately.
     Do NOT reveal hidden reasoning.
     The explanation must be <= 200 characters.
   </thinking>
 </decision_process>
 <output_format>
   Output exactly 2 lines, no extra whitespace lines, no code blocks, no bullets:
   Line 1: an English explanation <= 200 characters.
   Line 2: "Answer: " followed by either an integer (final_group) or "NEW".
   Allowed:
     Explanation: single line, short, concrete, references content (prompt/command/output/edit).
   Forbidden:
     - multi-line explanations
     - markdown formatting
     - additional fields
     - JSON/YAML output
 </output_format>
 <examples>
   <good_example>
     <input>
       <data>
         <system_output timestamp="0.0" group="0">$ </system_output>
         <user_input timestamp="0.2" group="1">ssh user@host</user_input>
         <user_input timestamp="0.3" sortme="True"> -p 22</user_input>
       </data>
     </input>
     <output>
       Continues the same command input line (ssh options), so it belongs to the final command-typing HLC.
       Answer: 1
     </output>
   </good_example>
   <good_example>
     <input>
       <data>
         <user_input timestamp="10.0" group="4">ls\r</user_input>
         <system_output timestamp="10.1" group="5">file_a\nfile_b\n</system_output>
         <system_output timestamp="10.2" sortme="True">$ </system_output>
       </data>
     </input>
     <output>
       The prompt is a new HLC after command output, so the final event starts the next group.
       Answer: NEW
     </output>
   </good_example>
   <good_example>
     <input>
       <data>
         <system_output timestamp="20.0" group="7">$ </system_output>
         <user_input timestamp="20.5" sortme="True">c</user_input>
       </data>
     </input>
     <output>
       Prompt (group 7) is complete; new typing begins a new command-input HLC, so it must be NEW.
       Answer: NEW
     </output>
   </good_example>
   <bad_output_example>
     <input>
       <data>
         <system_output timestamp="0.0" group="0">$ </system_output>
         <user_input timestamp="0.2" group="1">ssh user@host</user_input>
         <user_input timestamp="0.3" sortme="True"> -p 22</user_input>
       </data>
     </input>
     <bad_output>
       Here is my reasoning:
       - It looks like it continues the previous thing.
       Answer: 1
     </bad_output>
     <why_bad>
       Invalid: extra lines + bullets. Output must be exactly 2 lines.
     </why_bad>
   </bad_output_example>
   <bad_output_example>
     <input>
       <data>
         <system_output timestamp="10.2" sortme="True">$ </system_output>
       </data>
     </input>
     <bad_output>
       The prompt should be group 999 because it is a prompt and prompts should be high numbers.
       Answer: 999
     </bad_output>
     <why_bad>
       Invalid: invented group number; only final_group or NEW are allowed.
     </why_bad>
   </bad_output_example>
   <bad_output_example>
     <input>
       <data>
         <system_output timestamp="10.2" sortme="True">$ </system_output>
       </data>
     </input>
     <bad_output>
       The prompt is new.
       Answer: NEW!!!
     </bad_output>
     <why_bad>
       Invalid: Answer must be exactly NEW (no punctuation).
     </why_bad>
   </bad_output_example>
 </examples>
 <runtime_settings>
   Prefer deterministic decoding (temperature 0 or equivalent).
 </runtime_settings>

</prompt>

<data>
<<<DATASET_EVENTS_GO_HERE>>>
</data>
</syntaxhighlight>
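
To illustrate how the dataset placeholder might be filled, here is a hedged sketch that renders a list of events into the `<data>` block described in the prompt's input format. The `Event` class and `render_dataset` function are hypothetical, and escaping the event text with `xml.sax.saxutils.escape` is an assumption about how the pipeline could protect the tag structure, not a documented behaviour of AutoDoc.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Event:
    kind: str                    # "user_input" or "system_output"
    timestamp: float             # seconds since the session began
    text: str
    group: Optional[int] = None  # None marks the single sortme event


def render_dataset(events: list[Event]) -> str:
    """Render events as a <data> block; only the last event may lack a group."""
    if not events or events[-1].group is not None:
        raise ValueError("the last event must be the ungrouped sortme event")
    if any(event.group is None for event in events[:-1]):
        raise ValueError("every event before the last one needs a group")
    lines = ["<data>"]
    for event in events:
        attrs = f"timestamp={quoteattr(str(event.timestamp))}"
        if event.group is None:
            attrs += ' sortme="True"'
        else:
            attrs += f" group={quoteattr(str(event.group))}"
        # Escape &, < and > so event text cannot break the tag structure.
        lines.append(f"<{event.kind} {attrs}>{escape(event.text)}</{event.kind}>")
    lines.append("</data>")
    return "\n".join(lines)


dataset = render_dataset([
    Event("system_output", 20.0, "$ ", group=7),
    Event("user_input", 20.5, "c"),
])
print(dataset)
```

The two validity checks mirror the prompt's own rules: exactly one event carries `sortme`, and it is the only one without a `group`.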