Mathematics Model Prompt Evaluator

Remote · USA · Full-time

Role Overview

We are seeking expert mathematicians to author and verify high-quality open-ended prompts for AI model evaluation. You will craft and review challenging, unambiguous mathematical problems across core subdomains, assessing AI reasoning quality and helping establish rigorous evaluation standards for frontier language models.

**You will be assigned one of two task types:**

**Authoring Task**

Create 5 original, open-ended prompts from your assigned subdomain at varying difficulty levels (undergraduate, advanced undergraduate, or graduate/professional). Prompts should require human judgment to evaluate the quality of the AI's response, such as chain-of-thought reasoning or proof construction.

**Verification Task**

Review 5 authored prompts for clarity, scope alignment, difficulty accuracy, and uniqueness. Edit prompts and difficulty ratings where needed.

**Mathematics Subdomains Covered**

Probability & Statistics, Algebra (incl. Linear Algebra), Ordinary/Partial Differential Equations & Dynamical Systems, Geometry, Graph Theory, Number Theory.

**Key Responsibilities**

- Author clear, unambiguous, open-ended mathematical prompts that elicit evaluable AI responses
- Verify prompts are within the scope of the assigned subdomain and correctly rated for difficulty
- Ensure all 5 prompts in a task are sufficiently distinct from one another with varying difficulty levels
- Apply expert judgment to assess the depth and quality of mathematical reasoning required
- Edit prompts and difficulty assignments where standards are not met

**Ideal Qualifications**

- Master's degree or higher in Mathematics, Applied Mathematics, Statistics, or a closely related field
- 2–6 years of professional or research experience in a quantitative field
- Strong command of graduate-level mathematical concepts including proof writing, analysis, and formal reasoning
- Experience in academic research, mathematical competition design, or quantitative industry roles is a plus
- Excellent written English and ability to craft precise, well-scoped technical questions

**More About the Opportunity**

- Expected commitment: 10+ hours/week
- Asynchronous, fully remote work
