JOB DETAILS

AI Solutions and Platforms Operations Engineer

Company: PepsiCo
Location: India
Work Mode: On Site
Posted: May 18, 2026
About the Company
PepsiCo is a playground for curious people. We invite thinkers, doers, and changemakers to champion innovation, take calculated risks, and challenge the status quo. From executives to team members on the front lines, we’re excited about the future. We take chances. Together, we dare to make the world a better place. Our associates are the magic ingredient. Each of them plays an integral role in helping create deep connections between people and our products. Think about your last group celebration: Chances are, one of our iconic brands was by your side. At PepsiCo, you’re invited to be a part of a global team of innovators who make, move, and sell these products—which are enjoyed by more than 1 billion people a day. A career at PepsiCo means working in a culture where everyone’s welcome. Here, you can dare to be yourself. No matter who you are or where you’re from, you can influence the people around you and the world at large. By showing up, you’ll have the opportunity to learn, develop and grow your skills for the future. Our supportive teams can fuel your professional goals to make a global impact on people and the planet. Join us. Dare for Better.
About the Role
Overview

The AI Observability Engineer (Agentic Frameworks & AI Agent Operations Center Developer) builds and operationalizes agentic AI solutions using modern orchestration frameworks and contributes to an AI Agent Operations Center that enables safe, reliable, and observable agent behavior at scale. This role focuses on developing agent workflows (planning, tool execution, memory, and RAG), integrating guardrails and evaluations, and delivering operational capabilities such as run management, telemetry, and incident triage for production agents. 


Responsibilities

  1. AI Agent Operations Center (70%)
    • Build “operations center” capabilities for agent runtime management: agent registry, versioning, deployment tracking, and run histories
    • Enable operational workflows such as incident triage, replay/debug runs, trace correlation, and root-cause analysis across agent steps
    • Implement operational dashboards and views for agent health: success rate, latency, tool failure rate, cost per run, and loop detection
    • Instrument agent flows end-to-end using OpenTelemetry (or equivalent), enabling correlation across prompts, tool calls, retrieval, and responses
    • Implement semantic conventions and tagging standards (agent name/version, tool name, model provider, environment, tenant/app)
    • Partner with SRE/observability teams to ensure production-grade monitoring, alerting, and operational readiness
  2. Collaboration with Teams (10%)
    • Collaborate with transformation teams and business stakeholders to understand requirements and tailor AI agents to specific domains.
    • Work closely with AI platform teams to build scalable and cross-domain AI agents while ensuring end-to-end observability.
  3. Integration & Deployment (10%)
    • Build and maintain CI/CD pipelines for agent services and operations center components, including automated testing and deployment
    • Automate onboarding for new agent use cases (templates, scaffolding, configuration checks)
    • Drive best practices for secure, scalable, and cost-effective agent deployments
  4. Continuous Learning (10%)
    • Stay updated with the latest advancements in AI and machine learning technologies and integrate these into existing or new AI agents.
    • Conduct thorough testing and validation to ensure the reliability and accuracy of AI agents and solutions.

Qualifications

Key Skills/Experience Required

Minimum Qualifications:

  • Education: Bachelor’s in Computer Science, AI/ML, Data Science, or a related field.
  • Experience: 3–5+ years of software engineering experience; 1+ years building and observing AI/ML or GenAI applications preferred
  • Required Expertise:
    • Hands-on experience with agentic frameworks (CrewAI, LangChain, Semantic Kernel, AutoGen, or similar)
    • Proficiency in Python (primary) and familiarity with APIs/microservices patterns
    • Strong experience with RAG patterns (embeddings, vector search, retrieval evaluation, chunking strategies)
    • Experience with cloud environments (Azure/AWS/GCP) and containerized deployments (Kubernetes/AKS/EKS)
    • Familiarity with observability fundamentals (logs/metrics/traces) and production troubleshooting
    • Experience building internal developer platforms or operational consoles (agent registry, run tracking, dashboards)
    • Familiarity with OpenTelemetry, distributed tracing, and telemetry pipelines
    • Experience with Azure AI Search / vector databases, prompt/version management, and evaluation frameworks
    • Knowledge of Responsible AI practices: data handling, safety guardrails, audit trails, and redaction strategies
    • FinOps exposure: token/GPU cost optimization and chargeback/showback reporting
  • Technical Proficiency:
    • Agent orchestration design (planning, tool execution, memory, RAG)
    • Strong engineering discipline: testing, versioning, CI/CD, automation
    • Operational mindset: reliability, debuggability, and incident response support
  • Problem-Solving: Ability to translate business challenges into technical solutions.
  • Collaboration Skills: Effective at working within cross-functional teams.
  • Agility: Flexibility to adapt to changing requirements and new technologies.
  • Communication Skills: Capable of explaining complex technical concepts to non-technical stakeholders.

Key Skills
Agentic Frameworks, Python, RAG, OpenTelemetry, Kubernetes, Azure AI Search, CI/CD, Vector Databases, Distributed Tracing, Responsible AI, FinOps, Microservices
Categories
Software, Technology, Engineering, Data & Analytics
Job Information
Core Responsibilities: Build and operationalize an AI Agent Operations Center to manage agent runtime, versioning, and telemetry. Develop agent workflows including planning, tool execution, and RAG while ensuring production-grade monitoring and observability.
Job Type: Full time
Experience Level: 2-5 years
Company Size: 137351
Visa Sponsorship: No
Language: English
Working Hours: 40 hours