MassGen v0.1.1: Custom Tools with GitHub Issue Market Analysis

MassGen is focused on case-driven development. MassGen v0.1.1 introduces a comprehensive custom tools system that enables users to extend agent capabilities with domain-specific Python functions. This case study demonstrates Self-Evolution through Market Analysis where agents use custom tools combined with web search to analyze GitHub issues, research market trends, and autonomously drive feature prioritization.


(planning-phase)=

📋 PLANNING PHASE

(evaluation-design)=

📝 Evaluation Design

Prompt

The prompt tests whether MassGen agents can autonomously analyze their own codebase, GitHub issues, and market trends to drive product development:

Analyze the massgen dir and MassGen GitHub issues to understand what features users are requesting. Also research current trends in multi-agent AI systems and LLM orchestration. Based on the existing code, the open issues and market trends, write a prioritized recommendation report for the next release.

This prompt requires agents to:

  1. Access and analyze the MassGen codebase (via filesystem access)
  2. Fetch and analyze GitHub issues (via custom tool)
  3. Research market trends (via web search)
  4. Synthesize all sources into actionable recommendations

Baseline Config

Prior to v0.1.1, MassGen lacked a native custom tools system. Users had two main options for extending agents with domain-specific capabilities like GitHub API integration:

While MCP servers worked, they added operational complexity. There was no simple, native way to add Python functions as tools directly within the MassGen process.

Baseline Command

The closest pre-v0.1.1 equivalent relies on web search alone (no custom GitHub tool):

uv run massgen \
  --config massgen/configs/basic/multi/gemini_gpt5nano_claude.yaml \
  "Research multi-agent AI trends and make recommendations for MassGen's next release."

Limitations:

(evaluation-analysis)=

🔧 Evaluation Analysis

Results & Failure Modes

Before v0.1.1, users attempting similar tasks would face:

No Custom Tool Integration:

Limited Self-Evolution:

Example Failure: Asking agents to “analyze GitHub issues” would result in:

Success Criteria

  1. Custom Tool System Works: Users can register Python functions as tools via YAML config
  2. Tool Discovery: Agents automatically discover and use custom tools
  3. Multi-Source Analysis: Agents combine custom tools + web search + filesystem access
  4. Structured Data Access: Custom tool provides structured GitHub issue data
  5. Self-Evolution: Agents demonstrate market-driven feature prioritization
  6. Multi-Agent Collaboration: Both agents use custom tools and reach consensus

(desired-features)=

🎯 Desired Features

With these goals defined, v0.1.1 implements:


(testing-phase)=

🚀 TESTING PHASE

(implementation-details)=

📦 Implementation Details

Version

MassGen v0.1.1

(new-features)=

✨ New Features

Custom Tools System:

GitHub Issue Analyzer Tool (massgen/tool/_self_evolution/_github_issue_analyzer.py):

Configuration Integration:

New Config

massgen/configs/tools/custom_tools/github_issue_market_analysis.yaml:

agents:
  - id: "agent_a"
    backend:
      type: "gemini"
      model: "gemini-2.5-pro"
      cwd: "workspace1"

      custom_tools:
        - name: ["fetch_github_issues"]
          category: "market_analysis"
          path: "massgen/tool/_self_evolution/_github_issue_analyzer.py"
          function: ["fetch_github_issues"]

      enable_web_search: true

  - id: "agent_b"
    backend:
      type: "openai"
      model: "gpt-5-mini"
      cwd: "workspace2"

      text:
        verbosity: "medium"
      reasoning:
        effort: "low"
        summary: "auto"

      custom_tools:
        - name: ["fetch_github_issues"]
          category: "market_analysis"
          path: "massgen/tool/_self_evolution/_github_issue_analyzer.py"
          function: ["fetch_github_issues"]

      enable_web_search: true

orchestrator:
  snapshot_storage: "snapshots"
  agent_temporary_workspace: "temp_workspaces"
  context_paths:
    - path: "massgen"
      permission: "read"
  voting_sensitivity: "balanced"
  answer_novelty_requirement: "lenient"
  max_new_answers_per_agent: 5
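To make the `custom_tools` entries above more concrete, here is a minimal sketch of how such an entry could be resolved into a callable. It is not MassGen's actual loader: `load_tool` and `resolve_custom_tools` are hypothetical helpers, and the `custom_tool__` prefix is taken from the run logs later in this case study on the assumption that it is applied uniformly.

```python
import importlib.util
from pathlib import Path
from typing import Callable

# Prefix observed in the run logs; treating it as a fixed convention is an assumption.
TOOL_PREFIX = "custom_tool__"


def load_tool(path: str, function: str) -> Callable:
    """Import the module at `path` and return the attribute named `function`."""
    file = Path(path)
    spec = importlib.util.spec_from_file_location(file.stem, file)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    tool = getattr(module, function, None)
    if not callable(tool):
        raise AttributeError(f"{function!r} is not a callable in {path}")
    return tool


def resolve_custom_tools(entries: list[dict]) -> dict[str, Callable]:
    """Map every configured function to a prefixed tool name (hypothetical helper)."""
    registry: dict[str, Callable] = {}
    for entry in entries:
        # `name` and `function` are parallel lists in the YAML entry.
        for name, function in zip(entry["name"], entry["function"]):
            registry[TOOL_PREFIX + name] = load_tool(entry["path"], function)
    return registry
```

Passing the parsed `custom_tools` list of one agent to `resolve_custom_tools` would yield a dictionary keyed by `custom_tool__fetch_github_issues`, matching the name the agents see in the logs.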

Command

uv run massgen \
  --config massgen/configs/tools/custom_tools/github_issue_market_analysis.yaml \
  "Analyze the massgen dir and MassGen GitHub issues to understand what features users are requesting. Also research current trends in multi-agent AI systems and LLM orchestration. Based on the existing code, the open issues and market trends, write a prioritized recommendation report for the next release."

(agents)=

🤖 Agents

Both agents have:

(demo)=

🎥 Demo

Watch the full demonstration of MassGen v0.1.1 custom tools in action:

MassGen v0.1.1 Custom Tools Demo

Click to watch: MassGen v0.1.1: Custom Tools with GitHub Issue Market Analysis

Execution Log: (local, not in repo) .massgen/massgen_logs/log_20251020_012622/

Runtime: ~11 minutes (01:26:23 - 01:37:XX)


(evaluation-and-analysis)=

📊 EVALUATION & ANALYSIS

Results

The v0.1.1 custom tools feature enabled a multi-source analysis that combined structured GitHub issue data, web search, and codebase access. Prior versions would have had to substitute web search for the custom tool.

The Collaborative Process

Agent Behavior:

Both agents followed a comprehensive analysis workflow:

  1. Codebase Analysis: Used filesystem MCP tools to explore the massgen/ directory structure
  2. GitHub Issue Analysis: Used the custom fetch_github_issues tool to retrieve and categorize open issues
  3. Market Research: Used web search to research current trends in multi-agent AI and LLM orchestration
  4. Synthesis: Combined all three sources into prioritized recommendations
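Step 2 of the workflow above (retrieve and categorize issues) can be sketched as follows. This is an illustration only, not the code in `_github_issue_analyzer.py`: the keyword map and function names are hypothetical, and the input is assumed to be a list of issue objects shaped like those returned by the GitHub REST API (`title`, `comments`, `labels[].name`, `reactions.total_count`).

```python
from collections import defaultdict

# Hypothetical keyword map; the categories used by the real tool are not shown here.
CATEGORY_KEYWORDS = {
    "adapters": ("adapter", "plugin", "backend"),
    "observability": ("log", "trace", "metric"),
    "docs": ("doc", "example", "tutorial"),
}


def engagement(issue: dict) -> int:
    """Comment count plus total reactions, as a rough engagement score."""
    reactions = issue.get("reactions", {}).get("total_count", 0)
    return issue.get("comments", 0) + reactions


def categorize(issue: dict) -> str:
    """Return the first category whose keyword occurs in the title or labels."""
    labels = " ".join(label["name"] for label in issue.get("labels", []))
    text = f"{issue.get('title', '')} {labels}".lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "other"


def rank_by_category(issues: list[dict]) -> dict[str, list[dict]]:
    """Group issues by category and sort each group by descending engagement."""
    grouped = defaultdict(list)
    for issue in issues:
        grouped[categorize(issue)].append(issue)
    return {
        category: sorted(items, key=engagement, reverse=True)
        for category, items in grouped.items()
    }
```

Substring matching is deliberately crude (for example, `log` also matches `blog`). It is enough to show why structured fields such as comment and reaction counts make ranking straightforward compared with scraped text.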

Coordination Pattern (from coordination_table.txt):

The Voting Pattern

Voting Summary:

Voting Rationale (from logs):

Agent A’s vote for agent2.2:

“The agent provided a comprehensive and well-structured…”

Agent A’s vote for agent2.4 (winning answer):

“Agent 2’s answer is exceptionally comprehensive…”

Agent B’s vote for agent2.4:

“Comprehensive analysis of code, issues, and market trends…”

Winner: Agent B (agent_b) selected as winner with answer agent2.4
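The votes quoted above can be tallied with a few lines of Python. This is a sketch under a stated assumption, namely that an agent's later vote replaces its earlier one; the case study does not describe MassGen's actual selection rule, and ties are not handled here.

```python
from collections import Counter


def final_winner(votes: list[tuple[str, str]]) -> str:
    """Pick the answer with the most votes, counting only each voter's latest vote.

    `votes` is a chronological list of (voter_id, answer_id) pairs.
    """
    latest: dict[str, str] = {}
    for voter, answer_id in votes:
        latest[voter] = answer_id  # assumption: a later vote supersedes an earlier one
    tally = Counter(latest.values())
    return tally.most_common(1)[0][0]


# Votes as reported in this case study.
votes = [
    ("agent_a", "agent2.2"),
    ("agent_a", "agent2.4"),
    ("agent_b", "agent2.4"),
]
print(final_winner(votes))  # agent2.4
```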

Answer Evolution Over Time

Agent B’s answers improved across four iterations, each produced roughly 30 to 40 seconds after the previous one. The progression shows how multi-agent collaboration and iterative refinement push an answer toward something more useful.

Agent B’s Answer Progression (4 iterations):

agent2.1 (First Answer) - 01:29:42

Structure:

Characteristics:

Weakness: Lacks concrete implementation details and actionable next steps.


agent2.2 (Second Answer) - 01:30:14

Improvements:

Characteristics:

Vote: Agent A voted for this answer, noting it was “comprehensive and well-structured”


agent2.3 (Third Answer) - 01:30:53

Improvements:

Characteristics:


agent2.4 (Fourth Answer - WINNER) - 01:31:32

Major Improvements:

Characteristics:

Votes: Both Agent A and Agent B voted for this answer


Key Insights from Evolution:

  1. Information Hierarchy Matters: Moving the executive summary to the top (agent2.4) made the answer immediately actionable
  2. Concrete > Abstract: Adding code examples and specific timelines dramatically improved usefulness
  3. Actionability Wins: The winning answer provided clear “what to do next” guidance
  4. Refinement Works: Each iteration built on the previous, adding missing elements

Why agent2.4 Won:

From Agent A’s final vote:

“Agent 2’s answer is exceptionally comprehensive…”

The winning answer wasn’t just longer - it was implementation-ready. It provided:

This evolution demonstrates how multi-agent collaboration with voting drives toward not just correct answers, but maximally useful answers.


The Final Answer

Winner: Agent B (GPT-5-mini)

Answer Quality:

The winning answer demonstrated exceptional synthesis of multiple data sources:

Executive Summary (from final answer):

“Goal: Make MassGen the easiest, safest, and fastest way to build multi-agent LLM orchestrations for experimentation and light production use.”

“Highest-impact next-release focus (MVP): (1) stabilize a minimal Adapter/Plugin contract + registry and ship 2–3 official adapters, (2) add structured observability with run/step trace IDs, and (3) ship a lightweight in-process scheduler/task-queue with pause/resume/checkpoint semantics.”

Data Sources Utilized (from answer):

  1. ✅ Codebase Analysis: “The massgen code tree (core concepts: agent, planner, executor), adapters, CLI surface, examples, and tests”
  2. ✅ GitHub Issues: “A snapshot of open GitHub issues and community requests (categorized: adapters, observability, orchestration, docs, examples)”
  3. ✅ Market Trends: “Current market trends in multi-agent/LLM orchestration: growth of adapters/plugin ecosystems, demand for observability/traceability, lightweight orchestration…”

Prioritized Recommendations:

The answer provided a detailed roadmap with:

Concrete Deliverables:

The answer included implementation-ready artifacts:

Custom Tool Usage

Tool Registration (from logs):

01:26:23 | INFO | Registered custom tool: fetch_github_issues from massgen/tool/_self_evolution/_github_issue_analyzer.py (category: market_analysis, desc: 'Fetch and analyze GitHub issues for market-driven ...')

Tool Discovery (from logs):

01:26:24 | INFO | Stream chunk [content]: 🔧 Custom Tool: Custom tools initiated (1 tools available): custom_tool__fetch_github_issues
01:26:30 | INFO | 🔍 [DEBUG] Available custom tools: ['custom_tool__fetch_github_issues']
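When reproducing this run, the registration and discovery log lines can be cross-checked automatically. The sketch below assumes the log format matches the excerpts shown above exactly; the regular expressions and helper names are illustrative and not part of MassGen.

```python
import re

# Patterns written against the log excerpts in this case study (assumed stable).
REGISTERED = re.compile(
    r"Registered custom tool: (?P<name>\S+) from (?P<path>\S+) "
    r"\(category: (?P<category>[^,]+),"
)
AVAILABLE = re.compile(r"Available custom tools: \[(?P<tools>[^\]]*)\]")


def registered_tools(log_text: str) -> list[dict]:
    """Return name, path, and category for every registration line."""
    return [match.groupdict() for match in REGISTERED.finditer(log_text)]


def available_tools(log_text: str) -> set[str]:
    """Collect every tool name listed in an 'Available custom tools' line."""
    found: set[str] = set()
    for match in AVAILABLE.finditer(log_text):
        for item in match.group("tools").split(","):
            if item.strip():
                found.add(item.strip(" '\""))
    return found


def missing_tools(log_text: str, prefix: str = "custom_tool__") -> list[str]:
    """Registered tools that never appear, with the prefix, as available."""
    available = available_tools(log_text)
    return [
        tool["name"]
        for tool in registered_tools(log_text)
        if prefix + tool["name"] not in available
    ]
```

An empty list from `missing_tools` indicates that every registered tool was also exposed to the agent, which is the condition the first two success criteria rely on.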

Evidence of Use:

The final answer explicitly references GitHub issue analysis:

Comparison: Without the custom tool, agents would have attempted web search scraping, which would have:

Self-Evolution Demonstrated

Self-Evolution through Market Analysis ✅

This case study demonstrates how MassGen can autonomously drive its own product roadmap through:

  1. User Feedback Analysis: Custom tool fetches and categorizes GitHub issues to understand user needs
  2. Market Intelligence: Web search provides competitive landscape and trend analysis
  3. Codebase Understanding: Filesystem access enables gap analysis between current state and user needs
  4. Data-Driven Prioritization: Synthesis of all sources produces actionable, prioritized recommendations
  5. Implementation-Ready Output: Provides concrete API designs, timelines, and acceptance criteria

Self-Evolution Pipeline:

GitHub Issues (User Needs)
    + Market Trends (Competitive Landscape)
    + Codebase Analysis (Current Capabilities)
    → Prioritized Feature Roadmap
    → Implementation Artifacts

(conclusion)=

🎯 Conclusion

Why Custom Tools Enable Self-Evolution

The v0.1.1 custom tools feature is transformative for self-evolution because it:

  1. Lowers Integration Barrier: Users can add domain-specific capabilities with a small amount of Python code
  2. Enables Data Access: Custom tools provide structured API access (vs. unstructured web scraping)
  3. Maintains Type Safety: Automatic schema generation from type hints ensures correctness
  4. Works Everywhere: Cross-backend compatibility means tools work with any model
  5. Composes with Existing Tools: Custom tools + web search + filesystem = powerful synthesis
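Point 3 above (schema generation from type hints) can be illustrated with the standard library alone. The following is a minimal sketch of the general technique, not MassGen's generator: `build_schema`, the type map, and the `search_issues` example function are all hypothetical, and unannotated or complex types simply fall back to `"string"`.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Minimal Python-to-JSON-Schema type map; a real generator would cover far more cases.
JSON_TYPES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    list: "array",
    dict: "object",
}


def build_schema(func: Callable[..., Any]) -> dict:
    """Derive a function-calling style schema from a function's signature."""
    hints = get_type_hints(func)
    properties: dict[str, dict] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Hypothetical tool signature, for illustration only.
def search_issues(repo: str, state: str = "open", limit: int = 20) -> str:
    """Search issues in a repository."""
    return ""


schema = build_schema(search_issues)
```

Here `schema["parameters"]["required"]` contains only `repo`, because the other two parameters have defaults, and `limit` is typed as `integer`. Deriving the schema from the signature keeps the tool description and the Python function from drifting apart.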

Broader Implications

For MassGen Development:

For MassGen Users:

Comparison to Baseline

| Capability | Pre-v0.1.1 | v0.1.1 Custom Tools |
|---|---|---|
| GitHub Issue Analysis | ❌ Web scraping only | ✅ Structured API access |
| Multi-Source Synthesis | ⚠️ Limited (web + code) | ✅ Comprehensive (API + web + code) |
| Extension Mechanism | ❌ Requires backend impl | ✅ Simple Python function |
| Type Safety | ❌ Manual validation | ✅ Automatic from hints |
| Cross-Backend | ❌ Backend-specific | ✅ Works everywhere |
| Self-Evolution | ⚠️ Basic (web research only) | ✅ Advanced (API + analysis) |

Success Metrics

✅ Custom Tool System Works: Tool registered and discovered correctly

✅ Tool Discovery: Agents found and used custom_tool__fetch_github_issues

✅ Multi-Source Analysis: Combined GitHub API + web search + filesystem

✅ Structured Data Access: Tool provided categorized, engagement-ranked issues

✅ Self-Evolution: Produced actionable, data-driven product roadmap

✅ Multi-Agent Collaboration: Both agents used tools; consensus reached via voting

Next Steps for Self-Evolution

This case study demonstrates market-driven self-evolution. The custom tools system enables progression to more advanced capabilities, which will be explored in future versions:


(status-tracker)=

📌 Status Tracker