Student Groups

From CLAIF Wiki
Revision as of 23:09, 22 January 2026 by ArthurWolf (talk | contribs)

This page documents the University of Toronto Mississauga (UTM) student groups that contributed to the LibreCode / Annotator ecosystem between mid-2024 and late 2025. Their work centered on using LLMs to convert asciinema terminal recordings into structured events, annotations, and derived documentation artifacts.

Project context

Mentors

Mentorship program

Core LibreCode resources

Most student repositories are hosted under this GitHub organization: https://github.com/CSC392-CSC492-Building-AI-ML-systems


Fall 2024 – Early AutoDocs prototype

What they worked on

This group produced an early prototype of what later became AutoDocs: tooling to segment asciinema terminal recordings into meaningful chunks and generate higher-level annotations.

This work served primarily as a proof of concept and a starting point for later cohorts.

Contributors

(TODO: add names / GitHub links if identified)

Code and artifacts

Notes

Later documentation notes that much of this code is outdated or non-functional, but it remains historically important.


Winter 2025 – AutoDocs expansion + documentation

What they worked on

This cohort rebuilt and extended the AutoDocs pipeline into a more complete system and produced formal documentation of their work.

The repository includes a tagged release explicitly described as a rewrite of the project by the Winter 2025 team. Release link: https://github.com/CSC392-CSC492-Building-AI-ML-systems/educational-AI-agent/releases/tag/winter-2025

Contributors

Known from the Winter 2025 release note and repository contributors list

Additional contributors shown by GitHub

Code and artifacts

Notes

The current AutoDocs repository states that it was “modified and extended from the Winter 2025 team’s code base”.


AutoDocs (consolidated / ongoing repository)

This is a living repository spanning multiple student cohorts rather than a single group.

Purpose

AutoDocs processes asciinema terminal recordings and produces structured outputs such as:

  • segmented command events,
  • annotated explanations,
  • derived artifacts (for example, Dockerfiles).
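
As a rough illustration of the first step (recording → segmented events), the sketch below parses an asciicast v2 recording, which is a JSON header line followed by one ``[time, event_type, data]`` JSON array per line, and splits the output stream wherever the terminal pauses. This is not the AutoDocs parser: the function names, the sample data, and the pause-based heuristic (``min_gap``) are assumptions made for illustration only. The actual pipeline uses LLM-based segmentation.

```python
import json
from typing import Iterable, List, Tuple

Event = Tuple[float, str, str]  # (seconds since start, event type, data)


def parse_asciicast_v2(lines: Iterable[str]) -> Tuple[dict, List[Event]]:
    """Parse asciicast v2 text: a JSON header line, then one JSON array per event."""
    it = iter(lines)
    header = json.loads(next(it))
    events: List[Event] = []
    for line in it:
        line = line.strip()
        if not line:
            continue
        t, kind, data = json.loads(line)
        events.append((float(t), kind, data))
    return header, events


def segment_by_pause(events: List[Event], min_gap: float = 2.0) -> List[str]:
    """Hypothetical heuristic: start a new segment when output pauses for min_gap seconds."""
    segments: List[str] = []
    current: List[str] = []
    last_t = 0.0
    for t, kind, data in events:
        if kind != "o":  # keep only terminal output events
            continue
        if current and t - last_t >= min_gap:
            segments.append("".join(current))
            current = []
        current.append(data)
        last_t = t
    if current:
        segments.append("".join(current))
    return segments


# Made-up sample recording, for illustration only.
SAMPLE = [
    '{"version": 2, "width": 80, "height": 24}',
    '[0.10, "o", "$ ls\\r\\n"]',
    '[0.15, "o", "README.md\\r\\n"]',
    '[3.40, "o", "$ pwd\\r\\n"]',
    '[3.45, "o", "/home/user\\r\\n"]',
]

header, events = parse_asciicast_v2(SAMPLE)
segments = segment_by_pause(events)
```

With the sample above, the pause between 0.15 s and 3.40 s exceeds ``min_gap``, so the output is split into two segments, one per command.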

People

Repository

Contents (high level)

  • ``data/``, ``frontend/``, ``models/`` directories
  • Multiple parser scripts (Parser 0 / 1 / 2)
  • References to fine-tuned model checkpoints via Hugging Face links

Autumn 2025 – DocStream consolidation (Educational AI Agent)

This appears to be a later cohort or iteration that built on AutoDocs and reframed the system as DocStream. The core idea is unchanged: streamed asciinema logs → events → hierarchical annotations → documentation. Repository: https://github.com/CSC392-CSC492-Building-AI-ML-systems/Autumn2025EducationalAIAgent

What they worked on

From the repository README, DocStream:

  • converts raw, noisy terminal activity into structured, reproducible developer documentation,
  • processes streamed asciinema logs,
  • segments them into meaningful events,
  • generates hierarchical annotations explaining terminal activity,
  • includes an evaluation harness (based on an extended EleutherAI LM Evaluation Harness) with task and metric scaffolding under ``data/llm_Evaluation/``.

Code and artifacts

Models

  • Model 0 — Event Segmentation
    • Segments streamed terminal logs into XML-structured events
  • Model 1 — Hierarchical Annotation
    • Reads Model 0 event chunks and generates summaries with hierarchical depth (goal / subtask structure)
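
The hand-off between the two models can be pictured with the minimal sketch below. It wraps event chunks in XML and then renders per-event summaries as an indented outline, where the depth is the nesting level under a goal. The element names (``events``, ``event``), the ``id`` attribute, and the hand-written summaries are all hypothetical; the schema actually emitted by Model 0 and the output format of Model 1 are not documented on this page.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def events_to_xml(chunks: List[str]) -> str:
    """Wrap each chunk in a hypothetical <event id="..."> element."""
    root = ET.Element("events")
    for i, text in enumerate(chunks):
        ev = ET.SubElement(root, "event", {"id": str(i)})
        ev.text = text
    return ET.tostring(root, encoding="unicode")


def render_outline(xml_text: str, summaries: Dict[str, Tuple[int, str]]) -> str:
    """Indent each event's summary by its depth (0 = goal, 1 = subtask, ...)."""
    root = ET.fromstring(xml_text)
    lines = []
    for ev in root.iter("event"):
        depth, summary = summaries[ev.get("id")]
        lines.append("  " * depth + summary)
    return "\n".join(lines)


# Stand-in for Model 0 output: two already-segmented chunks.
xml_text = events_to_xml(["$ mkdir demo", "$ cd demo"])

# Stand-in for Model 1 output: (depth, summary) per event id, written by hand.
summaries = {
    "0": (0, "Set up a project directory"),
    "1": (1, "Enter the new directory"),
}
outline = render_outline(xml_text, summaries)
```

Keeping the event boundaries in XML and the annotations in a separate mapping mirrors the two-stage split described above, where segmentation and annotation are produced by different models.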

Contributors

GitHub accounts appearing repeatedly in the commit history and likely core Autumn 2025 contributors: