Wetstone
Now in private beta
// for hiring teams

Your take-home just got solved by GPT in 30 seconds

Wetstone is how you hire engineers who can actually judge, spec, and debug AI-generated system and code design — not just prompt it.

// 01 · the problem

The interview is broken.
You already know this.

01

Take-homes are LLM fodder.

Any candidate with Claude Code passes your take-home. Signal is zero.

02

Live coding is theater.

Either you watch them fight an IDE, or you watch them fight an AI tool they'd never use on the job.

03

You still don't know if they can design.

None of this tells you whether they'll catch a load-bearing flaw in an AI-generated system or a subtle bug in AI-generated code.

// 02 · the platform

Everything you'd expect from a technical assessment platform. Built for 2026.

Custom problem sets

Pick from 500+ problems or commission private ones tied to your stack.

Live and take-home modes

Timed, proctored, or async. Both work.

Auto-graded submissions

Code execution + LLM-judge harness + rubric scoring on design and correctness.

Integrated video interviews

Screen share + code editor + playback. No Zoom tab chaos.

Plagiarism & AI-use detection

We can see when candidates paste from another model.

Candidate scorecards

Rubric-level breakdowns on system and code design, not just pass/fail.

ATS integrations

Greenhouse, Lever, Ashby, Workable.

Team dashboards

Track funnel metrics, calibrate interviewers, compare candidates fairly.

Wetstone Rating verification

Candidates can share their public rating directly into your pipeline.

SOC 2 + SSO

Because your security team will ask.

// 03 · how a problem works

One bug. Four steps.

// Read plausible AI-generated code, find the failing assumption, patch it, and see how the top 1% diagnosed it.

retrieve.py
BUG-HUNT · 002
def retrieve_context(query: str, k: int = 5):
    embedding = embed(query)
    results = vector_store.query(embedding, top_k=k)
    docs = [r.document for r in results]
    reranked = sorted(docs, key=lambda d: d.score)
    return reranked[:k]

# downstream:
context = retrieve_context(user_question)
answer = llm.generate(prompt, context)

// 1,247 attempted · 31% caught it · median 4m 12s

Try a sample →
  1. Read

    Skim the code and scenario. The bug is plausible by design.

  2. Diagnose

    Click the suspicious line. Explain the failing assumption in one sentence.

  3. Fix & submit

    Patch it. Hidden tests run against your change.

  4. Compare

    See how the top 1% diagnosed it, and how fast.
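
To make the "hidden tests" step concrete, here is a minimal sketch of the kind of check that could run against a patch. Wetstone's real tests and the intended bug in the sample are not shown on this page. The sketch assumes one plausible reading: that the relevance score lives on the search result rather than on the document, and that the best match should come first. `Result`, `rerank`, and the test function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical search hit: a document plus its similarity score."""
    document: str
    score: float


def rerank(results: list[Result], k: int = 5) -> list[str]:
    """Return the k highest-scoring documents, best match first."""
    # Assumption: the score is stored on the result, not on the document,
    # so we sort the results themselves and only then extract documents.
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    return [r.document for r in ranked[:k]]


def hidden_test_orders_by_descending_score() -> bool:
    """One check a hidden test could make: strongest matches survive the cut."""
    hits = [Result("weak", 0.2), Result("strong", 0.9), Result("medium", 0.5)]
    return rerank(hits, k=2) == ["strong", "medium"]
```

A test of this shape checks behavior, not the exact line that changed, so any patch that returns the highest-scoring documents in order would pass it.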

// 04 · how it works

Three steps from broken loop to better signal.

01

Kickoff

30-minute call to match problems to your stack and bar.

02

Deploy

A Wetstone link replaces your take-home. Send it to candidates today.

03

Hire sharper

You get a calibrated signal on AI-generated system and code design. We track outcomes with you.

// 05 · pricing

Honest pricing. Annual discounts.

Starter
Free

3 assessments

for teams that want to try it out

Startup
$500/mo

20 assessments

for teams hiring 1–3 engineers

Most popular
Growth
$2,000/mo

unlimited assessments · 1 team

for 5–20 hires/year

Enterprise
Custom

SSO, SOC 2, custom problem authoring

talk to us

Get a demo.

// Our team will reach out within one business day.

// We never share your email. SOC 2 in progress.
