Atriva AI Inference Service Architecture
This document describes the architecture of the Atriva AI Inference Service, a RESTful API backend that consumes decoded frames shared by the video-pipeline container and returns inference results to the Atriva Core API.
1. System Overview
The Atriva AI Inference Service operates as a microservice within the Atriva ecosystem, bridging the video-pipeline and the Core API through AI-powered analysis.
┌─────────────────────────────────────────────────────────────────────────────┐
│ Atriva Platform │
│ │
│ ┌──────────────────┐ Shared Volume ┌──────────────────────────────┐ │
│ │ Video-Pipeline │ ──────────────────► │ AI Inference Service │ │
│ │ Container │ (Decoded Frames) │ (OpenVINO) │ │
│ │ │ │ │ │
│ │ • RTSP/RTMP │ │ • FastAPI REST API │ │
│ │ • Decode │ │ • OpenVINO Runtime │ │
│ │ • Frame Export │ │ • Model Management │ │
│ └──────────────────┘ └──────────────┬───────────────┘ │
│ │ │
│ │ Inference │
│ │ Results │
│ ▼ │
│ ┌──────────────────────────────┐ │
│ │ Atriva Core API Backend │ │
│ │ │ │
│ │ • Event Processing │ │
│ │ • Detection Storage │ │
│ │ • Alert Management │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
2. Core Components
2.1 FastAPI REST Server
The service exposes a RESTful API built with FastAPI, providing:
| Component | Purpose |
|---|---|
| main.py | Application entry point, server configuration |
| routes.py | API endpoint definitions and request handling |
| services.py | Business logic and model inference orchestration |
| models.py | Pydantic schemas for request/response validation |
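A minimal sketch of how these modules might fit together, assuming the layout in the table above; the code is illustrative, not the service's actual implementation:

```python
# main.py -- hypothetical application entry point
from fastapi import FastAPI

from routes import router  # endpoint definitions live in routes.py

app = FastAPI(title="Atriva AI Inference Service")
app.include_router(router)

if __name__ == "__main__":
    import uvicorn

    # Port 8001 matches the container topology in section 5.1.
    uvicorn.run(app, host="0.0.0.0", port=8001)
```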
2.2 Shared Frame Access Layer
Frames decoded by the video-pipeline container are accessed through a shared volume:
/shared/frames/
├── camera1/
│ ├── frame_0001.jpg
│ ├── frame_0002.jpg
│ └── latest.jpg
├── camera2/
│ └── ...
└── metadata/
└── cameras.json
The shared_data.py module, sketched after this list, handles:
- Camera discovery and enumeration
- Frame file access and validation
- Latest frame retrieval per camera
- Frame metadata management
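A minimal sketch of the frame access layer, assuming the directory layout shown above; function names and implementation details are assumptions:

```python
# shared_data.py -- illustrative frame access layer
import json
from pathlib import Path

FRAMES_ROOT = Path("/shared/frames")

def list_cameras() -> list[str]:
    """Enumerate camera directories, skipping the metadata folder."""
    return sorted(
        p.name for p in FRAMES_ROOT.iterdir()
        if p.is_dir() and p.name != "metadata"
    )

def latest_frame_path(camera_id: str) -> Path:
    """Return the path to a camera's most recent frame, validating it exists."""
    path = FRAMES_ROOT / camera_id / "latest.jpg"
    if not path.is_file():
        raise FileNotFoundError(f"No latest frame for camera {camera_id!r}")
    return path

def load_camera_metadata() -> dict:
    """Read the camera metadata file written by the video-pipeline."""
    with open(FRAMES_ROOT / "metadata" / "cameras.json") as f:
        return json.load(f)
```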
2.3 OpenVINO Inference Engine
The inference backend leverages Intel OpenVINO for optimized execution:
┌─────────────────────────────────────────────────────┐
│ OpenVINO Inference Engine │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Model │ │ Compiled │ │ Infer │ │
│ │ Loading │──│ Model │──│ Request │ │
│ │ (.xml) │ │ Cache │ │ Queue │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Supported Devices: CPU (INT8/FP16/FP32) │
└─────────────────────────────────────────────────────┘
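A sketch of the load/compile/infer chain from the diagram, using the OpenVINO Python API (openvino >= 2023.1); the model path and cache directory are assumptions:

```python
import numpy as np
import openvino as ov

core = ov.Core()

# 1. Model loading: read the IR (.xml + .bin) from the model registry.
model = core.read_model("/models/yolov8n/yolov8n.xml")

# 2. Compiled model cache: persisting compiled blobs on disk lets the
#    service skip recompilation after a restart.
core.set_property({"CACHE_DIR": "/tmp/inference/ov_cache"})
compiled = core.compile_model(model, "CPU")

# 3. Infer request: run a dummy input through the network.
input_tensor = np.zeros(tuple(compiled.input(0).shape), dtype=np.float32)
result = compiled(input_tensor)
```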
2.4 Model Registry
Pre-trained models optimized for OpenVINO:
| Model | Type | Use Case |
|---|---|---|
| YOLOv8n | Detection | General object detection |
| YOLOv8s/m | Detection | Higher accuracy detection |
| LPRNet | Recognition | License plate reading |
| Vehicle Tracking | Detection + Tracking | Vehicle analytics |
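One plausible shape for the registry is a simple name-to-IR-path mapping; the paths below are assumptions based on the /models mount described in section 5.2:

```python
# Hypothetical model registry; the actual layout under /models may differ.
MODEL_REGISTRY = {
    "yolov8n": "/models/yolov8n/yolov8n.xml",
    "yolov8s": "/models/yolov8s/yolov8s.xml",
    "yolov8m": "/models/yolov8m/yolov8m.xml",
    "lprnet": "/models/lprnet/lprnet.xml",
}
```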
3. Data Flow
3.1 Frame Ingestion Flow
Video-Pipeline Container AI Inference Service
│ │
│ 1. Decode RTSP/RTMP Stream │
▼ │
┌───────────┐ │
│ Decoded │ │
│ Frame │ │
└─────┬─────┘ │
│ │
│ 2. Write to Shared Volume │
▼ │
┌───────────┐ 3. Read Frame ┌─────┴─────┐
│ /shared/ │ ──────────────────────► │ Frame │
│ frames/ │ │ Loader │
└───────────┘ └─────┬─────┘
│
▼
┌───────────┐
│ Inference │
│ Engine │
└───────────┘
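A sketch of step 3 ("Read Frame") on the inference side: poll latest.jpg and hand each new frame to the inference engine. The mtime-based change check is an assumption about how the loader detects new frames:

```python
import time
from pathlib import Path

import cv2  # opencv-python

def watch_latest_frame(camera_id: str, interval: float = 0.1):
    """Yield each new frame written by the video-pipeline container."""
    frame_path = Path("/shared/frames") / camera_id / "latest.jpg"
    last_mtime = 0.0
    while True:
        if frame_path.is_file():
            mtime = frame_path.stat().st_mtime
            if mtime > last_mtime:
                frame = cv2.imread(str(frame_path))
                if frame is not None:  # guard against partially written files
                    last_mtime = mtime
                    yield frame
        time.sleep(interval)
```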
3.2 Inference Request Flow
┌──────────────────┐
│ Atriva Core API │
│ or Client │
└────────┬─────────┘
│
│ POST /inference/latest-frame
│ POST /shared/cameras/{id}/inference
▼
┌──────────────────────────────────────────────────────┐
│ AI Inference Service │
│ │
│ 1. Validate Request │
│ │ │
│ ▼ │
│ 2. Load Frame from Shared Volume │
│ │ │
│ ▼ │
│ 3. Preprocess (Resize, Normalize, Tensor Convert) │
│ │ │
│ ▼ │
│ 4. Run OpenVINO Inference │
│ │ │
│ ▼ │
│ 5. Post-process (NMS, Decode Boxes, Filter) │
│ │ │
│ ▼ │
│ 6. Return JSON Response │
│ │
└──────────────────────────────────────────────────────┘
│
│ { "detections": [...], "camera_id": "..." }
▼
┌──────────────────┐
│ Atriva Core API │
└──────────────────┘
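A sketch of steps 3 and 5 for a YOLOv8-style model. The 640x640 input, the [0, 1] normalization, and the decoded row layout (x1, y1, x2, y2, score, class_id) are common conventions assumed here, and NMS is omitted for brevity:

```python
import cv2
import numpy as np

def preprocess(frame: np.ndarray, size: int = 640) -> np.ndarray:
    """Step 3: resize, normalize to [0, 1], convert HWC BGR -> NCHW RGB."""
    resized = cv2.resize(frame, (size, size))
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    return rgb.transpose(2, 0, 1)[np.newaxis, ...]  # shape (1, 3, H, W)

def postprocess(decoded_rows: np.ndarray, conf_threshold: float = 0.5) -> list[dict]:
    """Step 5: filter decoded boxes by confidence (NMS omitted)."""
    detections = []
    for x1, y1, x2, y2, score, class_id in decoded_rows[:, :6]:
        if score >= conf_threshold:
            detections.append({
                "class_id": int(class_id),
                "confidence": float(score),
                "bbox": [int(x1), int(y1), int(x2), int(y2)],
            })
    return detections
```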
4. API Integration Points
4.1 Inbound APIs (Consumed by Core API)
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Service health and shared volume status |
| /models | GET | Available models list |
| /shared/cameras | GET | Cameras with available frames |
| /shared/cameras/{id}/inference | POST | Run inference on the camera's latest frame |
| /inference/latest-frame | POST | Run inference on a specified camera's frame |
| /inference/background | POST | Start a background inference worker |
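For illustration, a client call as the Core API might issue it; the request body fields are assumptions, while the response fields follow the payload in section 4.2:

```python
import requests

# Hostname assumes the "ai-inference" service name on the Docker network.
resp = requests.post(
    "http://ai-inference:8001/shared/cameras/camera1/inference",
    json={"model_name": "yolov8n", "confidence_threshold": 0.5},
    timeout=10,
)
resp.raise_for_status()
for det in resp.json()["detections"]:
    print(det["class_name"], det["confidence"], det["bbox"])
```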
4.2 Outbound Communication (To Core API)
Inference results are returned synchronously via HTTP response:
```json
{
  "camera_id": "camera1",
  "frame_path": "/shared/frames/camera1/latest.jpg",
  "timestamp": "2025-12-14T10:30:00Z",
  "model_name": "yolov8n",
  "detections": [
    {
      "class_id": 2,
      "class_name": "car",
      "confidence": 0.92,
      "bbox": [150, 200, 350, 450]
    }
  ]
}
```
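A sketch of how models.py might declare this payload with Pydantic (v2 assumed); field types are inferred from the example above:

```python
from pydantic import BaseModel, ConfigDict

class Detection(BaseModel):
    class_id: int
    class_name: str
    confidence: float
    bbox: list[int]  # [x1, y1, x2, y2] in pixels

class InferenceResponse(BaseModel):
    # Allow the "model_" field prefix, which Pydantic v2 reserves by default.
    model_config = ConfigDict(protected_namespaces=())

    camera_id: str
    frame_path: str
    timestamp: str
    model_name: str
    detections: list[Detection]
```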
5. Deployment Architecture
5.1 Container Topology
┌─────────────────────────────────────────────────────────────┐
│ Docker Host │
│ │
│ ┌─────────────────┐ ┌─────────────────────────┐ │
│ │ video-pipeline │ │ ai-inference │ │
│ │ │ │ │ │
│ │ Port: N/A │ │ Port: 8001:8001 │ │
│ │ │ │ │ │
│ │ Volumes: │ │ Volumes: │ │
│ │ - /shared │◄───────►│ - /shared (readonly) │ │
│ │ │ │ - /models │ │
│ └─────────────────┘ └─────────────────────────┘ │
│ │ │
│ │ REST API │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ atriva-core-api │ │
│ │ │ │
│ │ Port: 8080:8080 │ │
│ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
5.2 Volume Mounts
| Volume | Purpose | Access |
|---|---|---|
| /shared/frames | Decoded video frames | Read-only |
| /models | OpenVINO model files | Read-only |
| /tmp/inference | Temporary processing | Read-write |
6. Performance Considerations
6.1 Accelerator Selection
| Accelerator | Precision | Use Case |
|---|---|---|
| cpui8 | INT8 | Maximum throughput, slight accuracy trade-off |
| cpu16 | FP16 | Balanced performance/accuracy |
| cpu32 | FP32 | Maximum accuracy, baseline performance |
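As a sketch of how these names might map onto OpenVINO compile options: INT8 execution typically comes from a quantized IR rather than a runtime flag, while FP16/FP32 can be requested via the inference precision hint. The dispatch and file names below are assumptions, not the service's documented behavior:

```python
import openvino as ov

# Hypothetical accelerator-name dispatch: "cpui8" is assumed to select a
# quantized IR; "cpu16"/"cpu32" are assumed to set the precision hint.
def compile_for(core: ov.Core, accelerator: str) -> ov.CompiledModel:
    if accelerator == "cpui8":
        # INT8 weights are baked into a quantized model file.
        model = core.read_model("/models/yolov8n/yolov8n-int8.xml")
        return core.compile_model(model, "CPU")
    model = core.read_model("/models/yolov8n/yolov8n.xml")
    hint = {"cpu16": "f16", "cpu32": "f32"}[accelerator]
    return core.compile_model(model, "CPU", {"INFERENCE_PRECISION_HINT": hint})
```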
6.2 Optimization Strategies
- Model Caching: Compiled models are cached to avoid recompilation
- Frame Polling: Efficient file-based frame access from shared volume
- Async Inference: Non-blocking inference for high-throughput scenarios (see the sketch after this list)
- Batch Processing: Multiple frames processed in single inference call (when supported)
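A sketch of the async strategy using OpenVINO's AsyncInferQueue; the queue depth, model path, and dummy inputs are illustrative:

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("/models/yolov8n/yolov8n.xml")  # illustrative path
compiled = core.compile_model(model, "CPU")

results = []

def on_done(request, frame_id):
    # Collect the first output tensor of each completed request.
    results.append((frame_id, request.get_output_tensor(0).data.copy()))

queue = ov.AsyncInferQueue(compiled, 4)  # up to 4 in-flight infer requests
queue.set_callback(on_done)

# Dummy preprocessed frames matching the model's input shape.
frames = [np.zeros(tuple(compiled.input(0).shape), dtype=np.float32)
          for _ in range(8)]
for frame_id, tensor in enumerate(frames):
    queue.start_async({0: tensor}, userdata=frame_id)

queue.wait_all()  # block until every queued request has completed
```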
7. Security Considerations
- Shared volume mounted as read-only for frame access
- API endpoints validated with Pydantic schemas
- No direct external network access required
- Internal service communication only
Next Steps
- ➡️ API Endpoints — Detailed endpoint documentation
- ➡️ Models — Supported models and preparation
- ➡️ Development — Local development setup