Immediate joiners required! 2 days WFO (work from office) - Hinjewadi Phase 3
Who are we
Fulcrum Digital is an agile, next-generation digital acceleration company providing digital transformation and technology services, from ideation through implementation. These services apply across a variety of industries, including banking & financial services, insurance, retail, higher education, food, healthcare, and manufacturing.
Key Responsibilities
- Design and execute test cases for LLM agents, RAG pipelines, agentic workflows, and AI-assisted decision tools
- Validate AI outputs against ground truth using structured accuracy scoring (NAICS, risk exposure flags, hazard group mapping)
- Detect hallucinations, reasoning gaps, source fabrication, and misattributions in model-generated content
- Run multi-model comparative testing across GPT, Claude, Gemini, and Perplexity, evaluating accuracy, latency, and output completeness
- Test prompt versions iteratively and track accuracy changes across prompt cycles
- Validate citation accuracy, document ingestion pipelines, and cross-document context handling
- Design edge case and negative tests for AI-specific failure modes: content filter triggers, tool call limits, missing documents, and incomplete synthesis
- Perform regression testing after model upgrades, prompt changes, and backend fixes, and maintain structured QA sign-off in JIRA
What Makes This Role Different from Traditional QA
- You evaluate whether an AI is reasoning correctly, not just whether the UI behaves as expected
- You build evaluation rubrics for non-deterministic outputs and apply LLM-as-a-Judge techniques to score quality at scale
- You treat every model or prompt change as a potential accuracy regression, not just a functional one
- You understand that in live AI systems, a passing test today does not guarantee a passing test tomorrow
Required Skillsets
- 7–8 years of QA experience, with a minimum of 2 years in Generative AI / LLM-based projects
- Hands-on experience testing chatbots, RAG systems, or agentic AI pipelines
- Proven ability to perform ground truth validation and detect hallucinations and reasoning failures
- Familiarity with multi-model evaluation, prompt-aware testing, and JIRA-based defect reporting
Preferred Skillsets
- Background in insurance or regulated industries; exposure to underwriting or risk classification concepts
- Familiarity with Azure OpenAI, AWS Bedrock, or SharePoint-integrated AI environments
- Knowledge of AI governance, content filtering, and PII redaction validation
