JOB DETAILS

Site Reliability Engineer

Company: Cogent People Inc
Location: Columbia
Work Mode: On Site
Posted: June 8, 2026
About The Company
Cogent People is an expert IT and business consulting company serving federal and state government agencies. With more than twelve years of experience in data innovation, software engineering, and systems integration, we have established a reputation in the government contracting industry for exceeding expectations and providing long-lasting value, progressive solutions, and superior delivery and results. We are recognized for our integrity and commitment to excellence. We are a certified 8(a), minority-owned small and disadvantaged business based in Columbia, MD.
About the Role

Description

 

Employment Type: Full-time, W2 position with Cogent People Inc. This is a direct hire position with full benefits. 


Location: Hybrid (Columbia, MD, 3 days per week) or Remote (as applicable to role)


Work Authorization Requirements 


To comply with government contracting requirements, candidates must meet all of the following: 

  • Must be a U.S. Citizen, Permanent Resident, or valid EAD holder  
  • Must have lived in the United States for at least 3 of the past 5 years  
  • Must be currently authorized to work in the U.S. without sponsorship  

Sponsorship (H-1B) is not available for this position (now or in the future). 


Candidates who do not meet these requirements will not be considered. 


Clearance Requirement 


Public Trust clearance, or the ability to obtain it, is required depending on assignment.


About Cogent People Inc. 


Cogent People Inc. is a government consulting and technology services firm supporting mission-critical federal and commercial programs. We deliver secure, scalable, and modern digital solutions across complex IT environments. 

Our teams thrive at the intersection of engineering excellence and mission impact, building systems that matter. 


Job Overview  

 

Cogent People Inc. is seeking a Site Reliability Engineer to support system reliability, monitoring, and operational stability across environments.


This role is responsible for implementing observability and automation practices, supporting production systems, and ensuring system performance and availability. The position plays a key role in incident response, root cause analysis, and ongoing system optimization in collaboration with DevOps and development teams.


The ideal candidate will bring experience in system monitoring, DevOps practices, and production support, along with the ability to collaborate across cross-functional engineering teams in a fast-paced environment.

This position may be contingent upon contract award.

Requirements

 

What You'll Do 


System Reliability & Observability 

  • Support system reliability, monitoring, and operational stability across environments  
  • Implement and maintain observability practices, including monitoring, logging, and alerting  
  • Contribute to automation efforts that improve system reliability and operational efficiency  

Incident Response & Performance Optimization 

  • Participate in incident response activities and production support  
  • Perform root cause analysis for system issues and outages  
  • Support performance optimization and tuning of applications and infrastructure  

DevOps & Collaboration 

  • Work with DevOps and development teams to maintain production readiness  
  • Contribute to continuous improvement of deployment and operational processes  
  • Collaborate across engineering teams to support stable and scalable systems  

What We’re Looking For 

  • Bachelor’s degree in Computer Science, Information Systems, or a related field, or an equivalent combination of education and experience  
  • Experience in system reliability, DevOps, or production support roles  
  • Experience with monitoring, logging, and observability tools  
  • Understanding of incident management and root cause analysis processes  
  • Familiarity with cloud environments and infrastructure concepts  
  • Experience supporting automated deployment or operational workflows  
  • Strong problem-solving and troubleshooting skills  
  • Excellent written and verbal communication skills  
  • Ability to work effectively in fast-paced, production-critical environments  
  • Strong collaboration skills across development and operations teams  

What Will Set You Apart 

  • Experience with AWS or other cloud platforms  
  • Familiarity with infrastructure-as-code tools (e.g., Terraform or similar)  
  • Experience with tools such as Splunk, Datadog, Prometheus, or similar observability platforms  
  • Experience with CI/CD pipelines and DevOps automation tools  
  • Prior experience supporting enterprise-scale or regulated environments  
  • Knowledge of application performance tuning and distributed systems behavior 

Why Cogent People Inc.? 


At Cogent People, we combine technical excellence with a mission-driven culture. Our teams work on meaningful, high-impact projects that support government and enterprise transformation initiatives. 


We offer: 

  • Competitive compensation  
  • Career growth and professional development opportunities  
  • Exposure to complex, mission-critical systems  
  • A collaborative and supportive team environment  
  • Long-term client engagements with stability and continuity  

We are a Certified Great Place to Work, committed to building an inclusive and high-performance culture. 


Benefits 

  • Medical, Dental, and Vision Insurance (comprehensive coverage)  
  • 401(k) with company match  
  • Company-paid life insurance  
  • Short-term and long-term disability coverage  
  • Paid Time Off: 3 weeks annually + 10 paid holidays  
  • Employee assistance and wellness resources (as applicable)  

Compliance Notice 


Cogent People Inc. conducts employment verification for all candidates. Misrepresentation of work authorization, residency history, or professional experience will result in disqualification. 


We are an Equal Opportunity Employer and evaluate all applicants based on qualifications, experience, and role requirements.


We do not engage third-party recruiters for this role unless explicitly stated.

Key Skills
System Reliability, Observability, Incident Response, Root Cause Analysis, DevOps, Production Support, Cloud Infrastructure, Automation, Monitoring, Logging, Alerting, Performance Tuning, CI/CD, Infrastructure as Code, Troubleshooting, Collaboration
Categories
Technology, Software, Engineering, Government & Public Sector, Consulting
Benefits
Medical Insurance, Dental Insurance, Vision Insurance, 401(k) with Company Match, Company-paid Life Insurance, Short-term Disability Coverage, Long-term Disability Coverage, Paid Time Off, Employee Assistance and Wellness Resources
Job Information
Core Responsibilities
The role focuses on maintaining system reliability, observability, and operational stability across environments. Key duties include implementing monitoring practices, managing incident response, and collaborating with DevOps teams for system optimization.
Job Type: Full time
Experience Level: 2-5
Company Size: 30
Visa Sponsorship: No
Language: English
Working Hours: 40 hours