- Own the E2E framework: Build, maintain, and scale our automated testing framework using Playwright (TypeScript/Python).
- Test the unpredictable: Design strategies to test non-deterministic LLM outputs, AI agents, and RAG pipelines where standard assertions don't always work.
- Tackle LLM-specific challenges: Build guardrails and automated checks for prompt drift, hallucinations, latency, and context window limits.
- Evaluate agent behavior: Create scenarios to test how our AI agents handle edge cases, multi-step reasoning, and error recovery in real-world document processing workflows.
- Integrate and collaborate: Wire your tests into our CI/CD pipelines so we can ship quickly without breaking the core AI logic. Work closely with AI researchers, backend engineers, and product managers to define what "quality" means for an AI agent.
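As an illustration of why "standard assertions don't always work" for LLM output, a common workaround is to assert on properties of the response (valid JSON, required fields, sane value ranges) rather than on an exact golden string. A minimal sketch, with hypothetical field names for a document-extraction response:

```python
import json

def check_extraction(raw_response: str) -> list[str]:
    """Return a list of failed property checks for an LLM extraction
    response, instead of comparing it against one exact expected string."""
    failures = []
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    # Structural checks: required fields must exist, regardless of wording.
    for field in ("invoice_number", "total_amount", "currency"):
        if field not in data:
            failures.append(f"missing field: {field}")
    # Semantic tolerance: the amount must be a non-negative number,
    # not an exact match against a single golden value.
    amount = data.get("total_amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        failures.append("total_amount is not a non-negative number")
    return failures

# Two differently worded model outputs can both pass the same checks.
ok = '{"invoice_number": "INV-42", "total_amount": 99.5, "currency": "EUR"}'
bad = '{"invoice_number": "INV-42"}'
print(check_extraction(ok))   # → []
print(check_extraction(bad))  # lists the missing/invalid fields
```

The point is that the test encodes invariants the output must satisfy, so a rephrased but correct model response still passes.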
- Experience: 5+ years in QA Automation or Software Engineering in Test (SDET).
- Playwright expertise: You have deep, hands-on experience building reliable, scalable test suites in Playwright. You know how to handle flaky tests, parallel execution, and complex DOM structures.
- Coding chops: Strong programming skills in TypeScript, JavaScript, or Python.
- AI/LLM testing experience: You understand how LLMs work under the hood. You know the challenges of testing them (non-determinism, evaluating accuracy vs. exact match, security/injection risks) and have used tools or frameworks such as LLM-as-a-judge, LangSmith, or DeepEval to evaluate them.
- Systems thinking: You can look at a complex architecture involving a frontend, backend APIs, vector databases, and LLM endpoints and know where things are likely to break.
- Communication: You can clearly explain complex QA issues to both highly technical machine learning engineers and non-technical stakeholders.
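The "LLM-as-a-judge" pattern mentioned above can be sketched in a few lines: a second model grades the system's answer against a rubric, and the test asserts on the grade rather than the text. Here `call_judge_model` is a stand-in for a real model API call, so the grading logic is a stub for illustration only:

```python
# Minimal LLM-as-a-judge sketch. The judge call is stubbed; in practice it
# would hit a real LLM endpoint and return the model's reply.

JUDGE_PROMPT = (
    "You are grading an answer for factual accuracy against the reference.\n"
    "Reference: {reference}\nAnswer: {answer}\n"
    "Reply with a single number from 1 (wrong) to 5 (fully correct)."
)

def call_judge_model(prompt: str) -> str:
    # Stubbed judge: a real implementation would call an LLM API here.
    # This stub just rewards answers that mention the reference figure.
    return "5" if "1905" in prompt.split("Answer:")[1] else "1"

def judge_score(reference: str, answer: str) -> int:
    reply = call_judge_model(JUDGE_PROMPT.format(reference=reference, answer=answer))
    return int(reply.strip())

# The assertion is a threshold on the judge's score, not string equality.
score = judge_score("Special relativity was published in 1905.",
                    "Einstein published it in 1905.")
assert score >= 4
```

Frameworks like DeepEval wrap this pattern with ready-made metrics, but the core idea is exactly this threshold-on-a-grade assertion.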
- Experience in the financial tech or document automation space.
- Familiarity with containerization (Docker, Kubernetes) and advanced CI/CD setups (GitHub Actions, GitLab CI).
- Experience testing API performance and LLM endpoint latency.
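Latency testing of an LLM endpoint usually means asserting on a percentile rather than a mean, since generation times are long-tailed. A minimal sketch, with the request timings stubbed (a real test would wrap actual endpoint calls with `time.perf_counter`):

```python
import math
import random

def p95(samples: list[float]) -> float:
    """95th-percentile latency via the nearest-rank method."""
    ordered = sorted(samples)
    idx = math.ceil(0.95 * len(ordered)) - 1
    return ordered[idx]

def measure_endpoint(n_requests: int = 100) -> list[float]:
    # Stubbed timings in seconds: a real test would time actual
    # LLM endpoint requests instead of drawing random numbers.
    random.seed(7)
    return [random.uniform(0.2, 1.5) for _ in range(n_requests)]

latencies = measure_endpoint()
BUDGET_S = 2.0  # illustrative per-request p95 budget in seconds
assert p95(latencies) <= BUDGET_S, f"p95 latency {p95(latencies):.2f}s over budget"
```

Gating CI on the percentile (rather than the average) catches the slow tail of requests that users actually notice.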
- We trust amazing people to do amazing things and make a long-term impact - we give you freedom and ownership of meaningful work that directly impacts the business.
- We're building a positive organizational culture where personal and professional growth are just as important as business growth.
- We believe different perspectives make Hypatos a better community - that's why we're committed to building a diverse and inclusive environment where you feel you belong.
- Beyond a top-of-market compensation package including company shares, you will enjoy a personal development budget, meal allowance, sporting activities, and free beers :)
