Introducing Tendem:
The first Hybrid AI + Human agent
Tendem is a hybrid AI + Human agent that combines AI automation with human expertise to deliver business-ready results. It automates the entire workflow—from task breakdown to expert matching and quality verification—resulting in higher-quality outputs than AI-only tools and faster, better results than human-only marketplaces.
Tendem performance validation
To assess Tendem's success, we conducted rigorous in-house testing across 94 diverse, hard real-world business tasks. The benchmark measures client-perceived outcomes across three key dimensions: result quality, execution time, and price.
Tendem Internal Benchmark
Benchmark Composition: 94 tasks across four key business functions
The success of the Tendem Agent is rooted in its process, which ensures human judgment is applied precisely where the AI is most brittle.
Testing across 94 complex business tasks reveals that a Hybrid AI + Human approach solves the "last mile" problem of quality.
Overall Good rate: Tendem 74.5%, Upwork 53.2%, ChatGPT Agent 40.4%
Median total time: Tendem 16.4 hours, Upwork 35.0 hours, ChatGPT Agent (no value given)
Tendem Benchmark Insights
Quality First
Tendem raises the overall Good rate by 21.3 percentage points (pp) over Upwork (74.5% versus 53.2%). The largest quality gain is in Completeness (+22.3 pp), which suggests that step-gates prevent omissions and enforce acceptance criteria.
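The headline gain can be checked with a few lines of arithmetic. The sketch below uses the reported Good rates and assumes each rate is a plain fraction of the 94 benchmark tasks; the raw counts are not stated here, so the implied counts are an inference rather than reported data. The function names are illustrative.

```python
# Good rates as reported in the benchmark summary (percent of tasks rated Good).
TOTAL_TASKS = 94
good_rate = {"Tendem": 74.5, "Upwork": 53.2, "ChatGPT Agent": 40.4}


def pp_gain(a: float, b: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(a - b, 1)


def implied_count(rate_percent: float, total: int = TOTAL_TASKS) -> int:
    """Nearest whole number of tasks consistent with a reported rate.

    Assumption: the rate is a simple fraction of `total` tasks.
    """
    return round(rate_percent / 100 * total)


gain = pp_gain(good_rate["Tendem"], good_rate["Upwork"])
counts = {name: implied_count(rate) for name, rate in good_rate.items()}

print(f"Tendem vs Upwork: +{gain} pp")
print(counts)
```

A percentage-point gain is an absolute difference between two rates. Measured relative to the Upwork rate of 53.2%, the same gap is roughly a 40% relative improvement.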
Speed
Tendem cuts the median total time by 53% compared with the human-only baseline (16.4 hours versus 35.0 hours on Upwork). This is driven by faster connection and execution times.
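The 53% figure follows from the two median times. This minimal sketch assumes the human-only baseline is the Upwork median of 35.0 hours; `percent_reduction` is an illustrative helper, not part of any Tendem tooling.

```python
def percent_reduction(baseline: float, value: float) -> float:
    """Relative reduction of `value` against `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - value) / baseline * 100


# Median total times from the benchmark summary, in hours.
upwork_hours = 35.0  # assumed to be the human-only baseline
tendem_hours = 16.4

reduction = percent_reduction(upwork_hours, tendem_hours)
print(f"Median total time reduced by {reduction:.0f}%")
```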
Closing the Gap
For ambiguous, multi-document, or spec-heavy tasks, the hybrid step-gates substantially cut omission/fabrication errors, solving the "last-mile gap" where AI-only systems fail.
Underlying Strength
Tendem's autonomous AI agent is competitive on web browsing/tool-use and close to leading models on hard knowledge, providing a solid backbone for the hybrid system.
External Industry Benchmarks
Tendem’s AI agent — tested in fully automated mode without any human input — performs on par with leading AI systems on standard industry benchmarks:
| System | BrowseComp | Humanity’s Last Exam | GAIA |
| --- | --- | --- | --- |
| Tendem’s AI Agent | 71.0% | 39.0% | 78.2% |
| ChatGPT Agent | 68.9% | 41.6% | — |
| ChatGPT Deep Research | 51.5% | 26.6% | 67.4% |
| Manus | — | — | 73.4% |
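When comparing systems across these benchmarks, the blank cells need explicit handling so that an unreported score is never treated as zero. The sketch below is one way to do that. It takes the first benchmark column to be BrowseComp, which is an assumption based on the publicly reported ChatGPT Agent and Deep Research scores; the `scores` layout and the `leader` helper are illustrative.

```python
from typing import Optional

# Scores in percent; None marks a cell left blank (a dash) in the table.
# Assumption: the three columns are BrowseComp, Humanity's Last Exam, GAIA.
scores: dict[str, dict[str, Optional[float]]] = {
    "Tendem's AI Agent": {"BrowseComp": 71.0, "HLE": 39.0, "GAIA": 78.2},
    "ChatGPT Agent": {"BrowseComp": 68.9, "HLE": 41.6, "GAIA": None},
    "ChatGPT Deep Research": {"BrowseComp": 51.5, "HLE": 26.6, "GAIA": 67.4},
    "Manus": {"BrowseComp": None, "HLE": None, "GAIA": 73.4},
}


def leader(benchmark: str) -> str:
    """Return the top-scoring system among those that report `benchmark`."""
    reported = {
        system: row[benchmark]
        for system, row in scores.items()
        if row[benchmark] is not None
    }
    return max(reported, key=reported.get)


for name in ("BrowseComp", "HLE", "GAIA"):
    print(f"{name}: {leader(name)}")
```

Filtering out `None` before taking the maximum keeps the comparison limited to systems that actually report a score on that benchmark.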
Contributors
Toloka AI
