Introducing Tendem:
The first Hybrid AI + Human agent

Tendem is a hybrid AI + Human agent that combines AI automation with human expertise to deliver business-ready results. It automates the entire workflow, from task breakdown to expert matching and quality verification. The result is higher-quality output than AI-only tools and faster, better results than human-only marketplaces.

Tendem performance validation

To evaluate Tendem, we ran in-house tests on 94 diverse, difficult real-world business tasks. The benchmark measures client-perceived outcomes across three key dimensions: result quality, execution time, and price.

System Comparison

We compare Tendem (Hybrid) against Upwork (Human-only) and ChatGPT Agent (AI-only).

Real-World Constraints

Tasks mirror professional task execution, including ambiguous briefs, multi-stage complexity, and the use of attached input files.

Quality Metrics

Results are graded by independent human QA experts on a four-point scale across four criteria: Overall Quality, Accuracy, Completeness, and Style & Formatting.
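As a rough illustration of how four-point grades could be turned into a "% of high-quality results" figure, the sketch below counts a result as high quality when its grade on a chosen criterion reaches a cut-off. The cut-off of 3 out of 4, the function name good_rate, and the example grades are all assumptions made for illustration; the source does not state the actual threshold or aggregation rule.

```python
from typing import Dict, List

# The four grading criteria named in the benchmark description.
CRITERIA = ("Overall Quality", "Accuracy", "Completeness", "Style & Formatting")

# Assumption: a grade of 3 or 4 on the four-point scale counts as "high quality".
GOOD_THRESHOLD = 3


def good_rate(grades: List[Dict[str, int]], criterion: str) -> float:
    """Return the percentage of results graded at or above the threshold."""
    if criterion not in CRITERIA:
        raise ValueError(f"unknown criterion: {criterion}")
    if not grades:
        raise ValueError("no grades supplied")
    good = sum(1 for g in grades if g[criterion] >= GOOD_THRESHOLD)
    return 100.0 * good / len(grades)


# Made-up grades for four results, for illustration only (not benchmark data).
example_grades = [
    {"Overall Quality": 4, "Accuracy": 4, "Completeness": 3, "Style & Formatting": 4},
    {"Overall Quality": 3, "Accuracy": 3, "Completeness": 2, "Style & Formatting": 3},
    {"Overall Quality": 2, "Accuracy": 3, "Completeness": 2, "Style & Formatting": 3},
    {"Overall Quality": 1, "Accuracy": 2, "Completeness": 1, "Style & Formatting": 2},
]

overall_rate = good_rate(example_grades, "Overall Quality")
completeness_rate = good_rate(example_grades, "Completeness")
```

With these illustrative grades, two of four results pass on Overall Quality and one of four on Completeness.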

Tendem Internal Benchmark

Benchmark Composition: 94 tasks across four key business functions

Operations (28 Tasks): Data structuring, scheduling, document retrieval.

Marketing (24 Tasks): Market research, content creation, proofreading.

Analysis (22 Tasks): Dashboards, exploratory data analysis, competitive research.

Sales (20 Tasks): Contact data collection, field enrichment.

How Tendem achieves superior results

The success of the Tendem Agent is rooted in its process, which ensures human judgment is applied precisely where the AI is most brittle.

Plan with Gated Steps

The AI Agent decomposes the task and gates critical, high-risk steps under Human Expert supervision.

Hybrid Execution

The AI carries out the routine work quickly, while the Human Expert steps in for contextual accuracy and refinement.

Multi-Layer Quality Assurance

Automated online checks are followed by a final human QA pass against the user's requirements.

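The three stages described in this section can be sketched as a small control loop. This is a minimal sketch, not Tendem's implementation: the Step and Result types, the high_risk flag, the callback names, and the rule that a failed automated check is escalated to the expert are all hypothetical choices made to show the shape of a gated hybrid workflow.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    high_risk: bool  # hypothetical flag: True means the step is gated for expert review


@dataclass
class Result:
    outputs: List[str] = field(default_factory=list)
    approved: bool = False


def run_task(
    steps: List[Step],
    ai_execute: Callable[[Step], str],
    expert_review: Callable[[Step, str], str],
    automated_check: Callable[[str], bool],
    final_qa: Callable[[List[str]], bool],
) -> Result:
    """Run a decomposed task: AI drafts every step, experts handle gated ones."""
    result = Result()
    for step in steps:
        draft = ai_execute(step)  # the AI does the routine work
        if step.high_risk:
            draft = expert_review(step, draft)  # gated step: expert refines the draft
        elif not automated_check(draft):
            # Assumption: a failed automated check escalates the step to the expert.
            draft = expert_review(step, draft)
        result.outputs.append(draft)
    # Final human QA pass over all outputs against the user's requirements.
    result.approved = final_qa(result.outputs)
    return result


# Toy usage with stand-in callbacks (illustrative only).
steps = [Step("collect contacts", high_risk=False), Step("verify emails", high_risk=True)]
result = run_task(
    steps,
    ai_execute=lambda s: f"draft:{s.name}",
    expert_review=lambda s, d: d.replace("draft", "reviewed"),
    automated_check=lambda d: bool(d),
    final_qa=lambda outs: all(outs),
)
```

In the toy run, only the gated step passes through the expert callback, so the outputs are one AI draft and one reviewed draft, and the final QA callback approves the batch.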

Tendem delivers 74.5% high-quality results, outperforming humans and pure AI

Testing across 94 complex business tasks reveals that a Hybrid AI + Human approach solves the "last mile" problem of quality.

Quality comparison

% of high-quality results

Tendem: 74.5%
Upwork: 53.2%
ChatGPT Agent: 40.4%

Speed comparison

Total time to final results (median)

Tendem: 16.4 hours
Upwork: 35.0 hours
ChatGPT Agent: 0.13 hours

Tendem Benchmark Insights

Quality First

Tendem improves the overall Good rate by +21.3 percentage points (pp) versus Upwork. The largest quality gain is in Completeness (+22.3 pp), indicating that step-gates prevent omissions and enforce acceptance criteria.

Speed

Tendem cuts the median total time by 53% compared with the human-only baseline (16.4 hours versus 35.0 hours on Upwork). This is driven by faster connection and execution times.
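The headline deltas in the Quality First and Speed insights can be reproduced from the figures reported in the quality and speed comparisons. The short check below uses only those published numbers; the helper names pp_gain and time_reduction_pct are ours, introduced for illustration.

```python
# Reported share of high-quality results (%), from the quality comparison.
quality = {"Tendem": 74.5, "Upwork": 53.2, "ChatGPT Agent": 40.4}

# Reported median total time to final results (hours), from the speed comparison.
median_hours = {"Tendem": 16.4, "Upwork": 35.0, "ChatGPT Agent": 0.13}


def pp_gain(system: str, baseline: str) -> float:
    """Difference in high-quality rate, in percentage points."""
    return round(quality[system] - quality[baseline], 1)


def time_reduction_pct(system: str, baseline: str) -> float:
    """Relative reduction in median total time, in percent."""
    return round((1 - median_hours[system] / median_hours[baseline]) * 100, 1)


gain_vs_upwork = pp_gain("Tendem", "Upwork")
gain_vs_chatgpt_agent = pp_gain("Tendem", "ChatGPT Agent")
faster_than_upwork = time_reduction_pct("Tendem", "Upwork")
```

Note the distinction between the two measures: the quality gain is an absolute difference in percentage points, while the time saving is a relative reduction against the Upwork median.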

Closing the Gap

For ambiguous, multi-document, or spec-heavy tasks, the hybrid step-gates substantially cut omission/fabrication errors, solving the "last-mile gap" where AI-only systems fail.

Underlying Strength

Tendem's autonomous AI agent is competitive on web browsing/tool-use and close to leading models on hard knowledge, providing a solid backbone for the hybrid system.

External Industry Benchmarks

Tendem’s AI agent, tested in fully automated mode without any human input, performs on par with leading AI systems on standard industry benchmarks:

System                  BrowseComp   Humanity’s Last Exam   GAIA
Tendem’s AI Agent       71.0%        39.0%                  78.2%
ChatGPT Agent           68.9%        41.6%                  —
ChatGPT Deep Research   51.5%        26.6%                  67.4%
Manus                   —            —                      73.4%

Contributors

Sergei Tilga

Toloka AI

Konstantin Chernyshev

Toloka AI
