JOB DETAILS

Lead SW Architect

Company: NeuReality
Location: Israel
Work Mode: On Site
Posted: June 14, 2026
About the Company

AI infrastructure has a hidden problem: the network and orchestration layer. As models scale to trillions of parameters and inference demand explodes, two bottlenecks emerge: how data moves between GPUs and how workloads are managed across them. The industry added more GPUs, scaled clusters, optimized models. But utilization still hovers around 50-70%. The compute is there, idle, burning watts. The bottleneck isn't the silicon. It's how data moves and how work gets distributed.

Traditional networking was built for general-purpose workloads, not AI's east-west traffic and microsecond-sensitive synchronization. Traditional orchestration treats GPUs as generic compute, blind to the demands of prefill, decode, and model synchronization. Every GPU cycle wasted waiting is money and energy lost. We asked: What if the network wasn't just faster, but intelligent? What if orchestration understood AI workloads natively?

NR-NEXUS is an inference operating system for large-scale inference. Hardware-agnostic, it unifies fragmented open-source frameworks into a single production platform, running across hyperscale clouds, GPU clusters, and emerging XPUs.

NR2 AI-SuperNIC eliminates data-movement bottlenecks limiting GPU utilization. It executes the networking data path in hardware with no CPUs in the critical path, integrates in-network compute to offload communication operations, and supports open Ethernet-based networking.

Together, they transform distributed GPU and XPU clusters into high-throughput token factories. The result: GPUs at near-100% utilization. Inference scales without adding racks. Energy consumption drops. This isn't incremental optimization. It's rethinking the data path and control plane so AI infrastructure matches AI ambition.

For our customers: maximum performance from existing hardware. Lower cost, lower power, lower latency, higher throughput.

NeuReality is headquartered in Tel Aviv with offices across North America and Europe.

About the Role

NeuReality is seeking a Lead System Architect to join our system architecture team and help define NR-NEXUS, our next-generation AI inference platform.

Responsibilities

  • Lead the software architecture and technical roadmap for NeuReality's NR-NEXUS platform.
  • Write system specifications for the NR-NEXUS product.
  • Research AI infrastructure, SaaS platforms, model serving, and inference trends.
  • Work with engineering to translate technical capabilities into product value.
  • Work closely with engineering teams to optimize performance, scalability, and feature delivery.
  • Define performance goals and lead profiling, benchmarking, and optimization efforts for GenAI and distributed AI workloads.
  • Collaborate with customers, partners, and open-source communities to ensure ecosystem compatibility and adoption.
  • Mentor software engineers and provide technical leadership.

Requirements

  • 7+ years of software engineering experience, including 3+ years in software architecture or technical leadership.
  • Strong experience with Kubernetes-based platforms and cloud-native architecture.
  • Deep understanding of GenAI/LLM infrastructure and distributed workloads.
  • Experience designing management software or SaaS platforms for production systems.
  • Strong background in distributed systems, microservices, APIs, and automation.
  • Hands-on experience with observability stacks, monitoring, logging, alerting, and SLA/SLO tracking.
  • Experience with CI/CD, deployment automation, upgrades, and rollback mechanisms.
  • Good understanding of security, authentication, authorization, and integration with customer data center environments.

Nice to have

  • Deep understanding of GenAI/LLM inference infrastructure, including model serving, scaling, batching, latency, throughput, and resource utilization.
  • Experience with production AI inference clusters using GPUs, AI accelerators, or other specialized compute infrastructure.
  • Understanding of how distributed inference systems operate, including scheduling, load balancing, autoscaling, failover, and cluster-level observability.
  • Experience with LLM serving frameworks such as vLLM, Triton Inference Server, TensorRT-LLM, or similar.
  • Familiarity with GPU/accelerator orchestration, device plugins, resource scheduling, and cluster capacity planning.
  • Familiarity with GPU communication technologies such as GPUDirect RDMA, NCCL, NVLink, or UALink.
  • Experience optimizing communication for distributed AI/ML workloads.
  • Knowledge of Prometheus, Grafana, OpenTelemetry, Helm, Argo CD, Istio, KServe, Kubeflow, or similar tools.
  • Experience deploying software in on-prem, edge, private cloud, or hybrid environments.


Key Skills
Software Architecture, Kubernetes, Cloud-Native Architecture, GenAI/LLM Infrastructure, Distributed Systems, Microservices, SaaS Platforms, Observability Stacks, CI/CD, GPU Orchestration, Model Serving, API Design
Categories
Software, Technology, Engineering, Management & Leadership, Data & Analytics
Job Information
Core Responsibilities: Lead the software architecture and technical roadmap for the NR-NEXUS AI inference platform. Collaborate with engineering teams to optimize performance, scalability, and feature delivery for distributed AI workloads.
Job Type: Full time
Experience Level: 5-10
Company Size: 80
Visa Sponsorship: No
Language: English
Working Hours: 40 hours