Principal Coding Annotator & LLM Evaluation Engineer
Braintrust · United Kingdom
Job description
About the role
We are building and evaluating state-of-the-art large language models (LLMs) and need an experienced software engineer to join our evaluation and annotation team. This six‑month contract (with possible extension) focuses on designing coding benchmarks, assessing model outputs, and feeding insights back into model improvement pipelines.
Key responsibilities
- Create high‑quality coding prompts and reference solutions similar to benchmark suites such as SWE‑Bench.
- Evaluate LLM‑generated code for tasks including generation, refactoring, debugging, and implementation.
- Identify, document, and analyze model failure modes, edge cases, and reasoning gaps.
- Conduct head‑to‑head comparisons between private Mistral‑based LLMs and leading external models.
- Build or configure coding environments that support evaluation and reinforcement‑learning workflows.
- Follow detailed annotation and evaluation guidelines to ensure high consistency.
Required profile
- 10+ years of professional software development experience.
- Strong Python programming skills (required) and knowledge of at least one additional language (bonus).
- At least 1 year of experience in coding annotation or LLM evaluation, even part‑time.
- Proven ability to apply structured evaluation criteria and write clear technical feedback.
- Fluent written and spoken English.
- Team‑lead or mentoring experience is a strong plus.
Required skills
- Python
What we offer
- Contract engagement of 6 months with potential for long‑term extension.
- Option to work onsite in Paris or London; remote work from anywhere in Europe is possible for strong candidates.
- Opportunity to work hands‑on with cutting‑edge LLMs and influence model reliability.
- High‑impact technical work within a focused senior team.
Published 2 hours ago
Expires 1 month from now