Millisecond AI: LLM Inference at the 5G Edge

AI & Data Innovation

Talk

Session Code

Sess-17

Day 1

11:30 - 12:00 EST


About the Session

With 5G’s rollout, delivering AI services in milliseconds has never been more critical. This session shows how to deploy large language models (LLMs) on edge infrastructure, using Open5GS for a virtualized 5G core and Ollama for local inference, to cut latency and cloud costs. We’ll walk through a production-grade architecture and demo a simulated 5G device calling an edge-hosted AI endpoint. You’ll learn how to optimize workload placement, enable CPU-only inference, and balance reliability against resource constraints when integrating AI into telecom networks. By the end, you’ll have a practical blueprint for bringing real-time intelligence to users and unlocking new edge-driven innovation in the AI & Data Innovation track.
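To make the demo concrete, here is a minimal sketch of the kind of request a simulated 5G device might send to an edge-hosted Ollama endpoint. The endpoint address, model name, and thread count below are illustrative assumptions, not details from the session; the payload shape follows Ollama's `/api/generate` REST interface.

```python
import json
import urllib.request

# Assumed address of the edge node running Ollama (hypothetical hostname).
OLLAMA_URL = "http://edge-node.local:11434/api/generate"

def build_inference_request(prompt: str, model: str = "llama3") -> dict:
    """Assemble a non-streaming Ollama /api/generate payload."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,                # single JSON response; simpler for a demo
        "options": {"num_thread": 4},   # CPU-only inference: cap worker threads
    }

def query_edge_endpoint(payload: dict, timeout: float = 5.0) -> str:
    """POST the payload to the edge endpoint (requires a reachable server)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]

# Build (but don't send) a sample request, as a stand-in for the UE's call.
payload = build_inference_request("Summarize the local cell's KPI report.")
print(json.dumps(payload, indent=2))
```

In a real deployment, the tight `timeout` and capped thread count reflect the session's theme: on constrained edge hardware, reliability comes from bounding each request's resource footprint rather than assuming cloud-scale capacity.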


Speakers