This case study shows how MassGen reached unanimous consensus through vote switching: agents changed their votes to reward the response with the most detail and the clearest structure on a current events query. It was run on MassGen v0.0.3.
Command:
massgen --config @examples/basic/multi/gemini_4o_claude "Which AI won IMO 2025?"
Prompt: Which AI won IMO 2025?
Agents: Agent 1 (gemini2.5flash), Agent 2 (gpt-4o), Agent 3 (claude-3-5-haiku)
Duration: 141.0s | 735 chunks | 15 events
The three agents approached the query about a recent AI achievement with different levels of detail and research depth:
Agent 1 (gemini2.5flash) provided accurate foundational information, correctly identifying that both Google’s Gemini Deep Think and OpenAI’s experimental model achieved gold medal scores. However, its initial response lacked specific performance metrics and structural organization.
Agent 2 (gpt-4o) conducted comprehensive web research and delivered a well-structured response with specific details: both models solved “five out of six problems” and achieved gold medal-level performance, with clear distinctions between official participation (Google) and independent verification (OpenAI).
Agent 3 (claude-3-5-haiku) performed extensive research with detailed background context, providing rich information about IMO structure, grading processes, and broader implications, but with less concise organization.
The voting pattern shows the agents rewarding quality and comprehensiveness rather than defending their own answers:
Initial Position:
The Decisive Shift:
This resulted in a unanimous 3-0 consensus for Agent 2.
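The 3-0 outcome can be illustrated with a small tally. This is a minimal sketch, not MassGen's implementation: the `tally` function and the agent labels are hypothetical names chosen for illustration, and only the final votes (all three agents choosing Agent 2) come from this case study.

```python
from collections import Counter


def tally(votes: dict[str, str]) -> tuple[str, bool]:
    """Return (winner, unanimous) for a mapping of voter -> chosen agent.

    Hypothetical helper for illustration; assumes at least one vote.
    """
    counts = Counter(votes.values())
    winner, top_count = counts.most_common(1)[0]
    # Consensus is unanimous when every voter picked the same agent.
    return winner, top_count == len(votes)


# Final-round votes as reported in this case study: a 3-0 result for Agent 2.
final_votes = {"agent1": "agent2", "agent2": "agent2", "agent3": "agent2"}

winner, unanimous = tally(final_votes)
print(f"winner={winner}, unanimous={unanimous}")
```

A split vote, such as two agents choosing one answer and the third choosing another, would return the majority answer with `unanimous` set to `False`.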
The agents specifically recognized Agent 2’s superior qualities:
Agent 2 presented the final response, featuring:
This case study shows MassGen promoting the strongest response through vote switching. Rather than simply defending their own answers, the agents assessed quality: Agent 1 specifically acknowledged that Agent 2’s additional detail about performance metrics made it “more comprehensive.” The unanimous consensus emerged from agents recognizing concrete improvements in structure, specificity, and presentation. On current events queries, accuracy is the baseline, while comprehensiveness and clarity determine the winning response. The final answer balanced factual accuracy with presentation quality, making it both informative and well organized.