Draft a technology stack document that maps AI/ML accelerator options (Intel Habana Gaudi2, AMD Instinct, Google TPU v5e) against latency-sensitive ad-tech inference workloads at 5M req/s with sub-20ms P99 requirements

Generate a draft technology stack document that maps AI/ML accelerator options (Intel Habana Gaudi2, AMD Instinct, Google TPU v5e) against latency-sensitive ad-tech inference workloads at 5M req/s with sub-20ms P99 requirements, for the Computer Systems Design and Related Services industry.

Computer Systems Design and Related Services

Agent Configuration

Allowed: XLSX, PDF, VSDX, PNG, CSV

Max size: 50MB

Upload existing architecture diagrams, current performance data, or baseline documentation in Excel, PDF, or Visio format
Specify the sustained traffic pattern for ad-tech inference workloads
Define the critical latency threshold and percentile requirements for ad auction decisions
Select the types of ML models currently deployed or planned for the inference workload
Define the primary optimization targets for accelerator selection
Specify regulatory or data residency constraints impacting architecture decisions
Indicate primary infrastructure environments for deployment
Specify hardware budget ranges and cost optimization priorities
Provide detailed metrics from the existing GPU-based inference deployment, including current bottleneck locations, measured throughput, and known latency spikes
Specify the APIs, frameworks, libraries, and deployment tools that must remain compatible with the new accelerator choice
Identify the key stakeholder who will drive the final accelerator decision
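
The traffic, latency, and baseline-metric inputs above lend themselves to a quick sanity check before any accelerator comparison. The following is a minimal sketch in Python, not part of the agent itself: the function names (`percentile`, `meets_slo`, `devices_needed`) are hypothetical, the nearest-rank percentile definition is an assumption (monitoring tools may interpolate differently), and the per-device throughput in the usage lines is a placeholder rather than a measured or vendor-published figure.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to limit float rounding.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(samples_ms, threshold_ms=20.0, pct=99):
    """True if the measured percentile latency is below the threshold
    (defaults mirror the sub-20ms P99 requirement in the title)."""
    return percentile(samples_ms, pct) < threshold_ms


def devices_needed(target_rps, per_device_rps, headroom=0.3):
    """Accelerators required to serve target_rps while keeping a
    `headroom` fraction of each device's capacity in reserve."""
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    usable_rps = per_device_rps * (1 - headroom)
    return math.ceil(target_rps / usable_rps)


if __name__ == "__main__":
    # Placeholder latencies in milliseconds; replace with measured data.
    baseline_ms = [4.2, 5.1, 6.3, 7.8, 9.0, 11.5, 12.2, 14.9, 18.7, 23.4]
    print("P99 (ms):", percentile(baseline_ms, 99))
    print("Meets sub-20ms P99:", meets_slo(baseline_ms))

    # 50_000 req/s per device is a HYPOTHETICAL placeholder, not a
    # benchmark result for Gaudi2, Instinct, or TPU v5e.
    print("Devices for 5M req/s:", devices_needed(5_000_000, 50_000))
```

Per-device throughput should come from benchmarks of the actual models at the target batch size and latency budget, since throughput measured without a latency constraint tends to overstate usable capacity.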