
AI Systems Reliability Engineering

AI infrastructure engineering for runtime, control plane, compute, serving, reliability, interoperability, and telemetry. The senior AI engineer's scope covers the GPU or CPU capacity plan, runtime topology, control plane, serving gateway, observability, and failover path. The work draws on model profiles, inference traffic, batch jobs, latency targets, storage movement, and operational SLO records, and is verified with measurable checks for throughput, tail latency, saturation point, failure recovery, version compatibility, and cost-to-serve profile.

Architecture & Risk Blueprint

Senior AI engineering discovery for architecture, risk boundaries, data readiness, and implementation scope.
  • For AI Systems Reliability Engineering, the AI engineer will map target workflows, data owners, integration points, and known failure modes before design starts
  • Define the GPU or CPU capacity plan, runtime topology, control plane, serving gateway, observability, and failover path, with clear service boundaries, control points, and engineering assumptions
  • Prepare the dataset, model, runtime, access, and logging requirements needed for AI Systems Reliability Engineering
  • Build the evaluation plan for throughput, tail latency, saturation point, failure recovery, version compatibility, and cost-to-serve profile, so that acceptance is measurable rather than impression-based
  • Document risks around unsafe action, data leakage, dependency failure, integration drift, unclear accountability, and evidence gaps, then turn them into mitigation tasks with named owners
  • Deliver architecture diagrams, runbooks, test records, release notes, acceptance criteria, and an engineering backlog for procurement, technical review, and implementation approval
USD 2,806,707 (IDR 48,415,700,000)
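
A measurable acceptance check of the kind this evaluation plan calls for could be sketched as follows. This is a minimal illustration, not the delivered tooling: the class name AcceptanceCriterion, the metric names, the sample latencies, and every threshold value are hypothetical, and the nearest-rank percentile is one common definition among several.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriterion:
    """One measurable acceptance rule (hypothetical structure)."""
    name: str
    threshold: float
    higher_is_better: bool  # True for throughput, False for latency

    def passes(self, measured: float) -> bool:
        if self.higher_is_better:
            return measured >= self.threshold
        return measured <= self.threshold


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at
    least pct percent of all samples are at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical load-test output: per-request latency (ms) and test duration (s).
latencies_ms = [120, 135, 128, 142, 131, 900, 125, 138, 129, 133]
duration_s = 2.0

measured = {
    "throughput_rps": len(latencies_ms) / duration_s,
    "p99_latency_ms": percentile(latencies_ms, 99),
}

# Assumed thresholds; real values would come from the agreed latency targets.
criteria = [
    AcceptanceCriterion("throughput_rps", threshold=4.0, higher_is_better=True),
    AcceptanceCriterion("p99_latency_ms", threshold=500.0, higher_is_better=False),
]

results = {c.name: c.passes(measured[c.name]) for c in criteria}
for name, ok in results.items():
    print(f"{name}: measured={measured[name]} -> {'PASS' if ok else 'FAIL'}")
```

In this made-up run the single slow request dominates the p99 figure, so the throughput criterion passes while the tail-latency criterion fails. That is the reason the plan lists tail latency separately from throughput.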

Controlled Prototype & Evaluation

A limited working system to test behavior, integration boundaries, evaluation criteria, and operational limits.
  • For AI Systems Reliability Engineering, the AI engineer will implement a controlled prototype around the highest-risk workflow, not a presentation mockup
  • Connect model profiles, inference traffic, batch jobs, latency targets, storage movement, and operational SLO records to a limited runtime using test credentials, sandbox data, and strict access separation
  • Build a first slice of the deployment pipeline, model registry, inference gateway, autoscaling policy, telemetry collector, and rollback workflow, with traceable inputs, outputs, errors, and reviewer notes
  • Run evaluation cases for throughput, tail latency, saturation point, failure recovery, version compatibility, and cost-to-serve profile, and record pass, fail, and uncertain outcomes
  • Review security, data handling, prompt or policy behavior, and integration limits before production planning
  • Produce a prototype report with architecture changes, blocked items, engineering estimates, and release criteria
USD 3,971,930 (IDR 68,515,800,000)
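
Recording pass, fail, and uncertain outcomes for the evaluation cases could look roughly like the sketch below. The names EvalCase and classify, the tolerance band, and all numbers are assumptions made for illustration. The rule that a missing or borderline measurement counts as uncertain is one possible convention, not a stated method.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNCERTAIN = "uncertain"


@dataclass(frozen=True)
class EvalCase:
    """One evaluation case with an upper limit (hypothetical structure)."""
    check: str
    measured: Optional[float]  # None when no measurement could be taken
    limit: float               # value must stay below this limit
    tolerance: float           # assumed measurement-noise band around the limit


def classify(case: EvalCase) -> Outcome:
    # No measurement, or a measurement inside the noise band: uncertain.
    if case.measured is None:
        return Outcome.UNCERTAIN
    if abs(case.measured - case.limit) <= case.tolerance:
        return Outcome.UNCERTAIN
    return Outcome.PASS if case.measured < case.limit else Outcome.FAIL


# Hypothetical prototype results.
cases = [
    EvalCase("tail latency p99 (ms)", measured=410.0, limit=500.0, tolerance=25.0),
    EvalCase("failure recovery (s)", measured=95.0, limit=60.0, tolerance=5.0),
    EvalCase("cost per 1k requests (USD)", measured=0.52, limit=0.50, tolerance=0.05),
    EvalCase("saturation point (rps)", measured=None, limit=1000.0, tolerance=50.0),
]

record = {case.check: classify(case) for case in cases}
summary = Counter(outcome.value for outcome in record.values())

for check, outcome in record.items():
    print(f"{check}: {outcome.value}")
print(dict(summary))
```

Keeping uncertain as its own outcome stops borderline or unmeasured cases from being counted as passes in the prototype report.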

Operations, Reliability & Governance

Operating model for reliability, monitoring, release control, evidence records, and long-term technical governance.
  • For AI Systems Reliability Engineering, the AI engineer will set up operating controls for release approval, model or policy change, incident response, and evidence storage
  • Define service indicators for throughput, tail latency, saturation point, failure recovery, version compatibility, and cost-to-serve profile, each with alert thresholds, a review cadence, and escalation roles
  • Create monitoring for the deployment pipeline, model registry, inference gateway, autoscaling policy, telemetry collector, and rollback workflow, covering latency, errors, drift, cost, version changes, and abnormal behavior
  • Run tabletop recovery checks for unsafe action, data leakage, dependency failure, integration drift, unclear accountability, and evidence gaps, and document the decision path for urgent rollback or shutdown
  • Prepare governance records for access review, data retention, evaluation evidence, vendor dependency, and stakeholder reporting
  • Maintain an engineering roadmap for reliability, cost, security, and capability improvements after launch
USD 7,456,000 (IDR 128,616,000,000)
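
A service indicator with alert thresholds and escalation roles could be modelled along these lines. This is a sketch under stated assumptions: the ServiceIndicator class, the two-level warn/page scheme, the role names, and all threshold values are hypothetical and would be replaced by the agreed operating model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceIndicator:
    """An indicator where higher values are worse (hypothetical structure)."""
    name: str
    warn_at: float   # at or above this value, notify warn_role
    page_at: float   # at or above this value, page page_role
    warn_role: str
    page_role: str

    def evaluate(self, value: float) -> tuple[str, Optional[str]]:
        """Return (alert level, role to notify); the role is None when healthy."""
        if value >= self.page_at:
            return ("page", self.page_role)
        if value >= self.warn_at:
            return ("warn", self.warn_role)
        return ("ok", None)


# Assumed indicators and roles, for illustration only.
indicators = {
    "p99_latency_ms": ServiceIndicator(
        "p99_latency_ms", warn_at=400.0, page_at=800.0,
        warn_role="on-call engineer", page_role="incident lead",
    ),
    "gpu_utilization": ServiceIndicator(
        "gpu_utilization", warn_at=0.85, page_at=0.95,
        warn_role="capacity owner", page_role="incident lead",
    ),
}

# Hypothetical readings from the telemetry collector.
readings = {"p99_latency_ms": 450.0, "gpu_utilization": 0.97}

alerts = {name: indicators[name].evaluate(value) for name, value in readings.items()}
for name, (level, role) in alerts.items():
    print(f"{name}: {level}" + (f" -> notify {role}" if role else ""))
```

Storing the escalation role next to each threshold keeps the decision path for urgent rollback or shutdown explicit in configuration rather than in tribal knowledge.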