# EU AI Act Annex III

Regulation (EU) 2024/1689, the **Artificial Intelligence Act**. Edge is an **AI evaluation system**, not an AI system in the regulated sense. Its outputs (eval scores, golden datasets) are nevertheless used to *evaluate* AI systems that may themselves be classified as **high-risk** under Annex III. The mapping below documents how Edge supports the provider or deployer of a high-risk AI system.

## When does Annex III apply?

| Annex III area                    | Example AI use covered                                       |
| --------------------------------- | ------------------------------------------------------------ |
| 1. Biometrics                     | Remote biometric identification                              |
| 2. Critical infrastructure        | Safety components of critical digital infrastructure         |
| 3. Education                      | Admission, evaluation, exam proctoring                       |
| 4. Employment                     | CV screening, evaluation, monitoring                         |
| 5. Essential services             | **Credit scoring, life and health insurance** ← banking core |
| 6. Law enforcement                | Risk assessment, evidence reliability                        |
| 7. Migration, asylum, border      | Risk assessment, document verification                       |
| 8. Justice & democratic processes | Decision-support for judges                                  |

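Banking AI features that evaluate creditworthiness or establish a credit score fall under **Area 5**. Fraud detection and KYC features need a case-by-case classification, because Annex III point 5(b) excepts AI systems used to detect financial fraud. AI features used internally for advisor productivity *may* be outside Annex III but are commonly held to similar standards by the bank's governance, risk, and compliance (GRC) function.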

## How Edge contributes to provider/deployer obligations

| Obligation                              | Article | How Edge contributes                                                                                                    |
| --------------------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------- |
| **Risk management system**              | Art 9   | Edge produces the evidence that feeds the risk register: eval scores per release, regression alerts, model drift trends |
| **Data and data governance**            | Art 10  | Frozen golden datasets are the evaluation ground truth; lifecycle (DRAFT → APPROVED → FROZEN) is logged                 |
| **Technical documentation**             | Art 11  | This whole site, plus the eval report exports per release                                                               |
| **Record-keeping**                      | Art 12  | `evaluation_runs` collection is the append-only record of every eval; retention indefinite                              |
| **Transparency**                        | Art 13  | Per-item drilldown explains why a model passed or failed each metric                                                    |
| **Human oversight**                     | Art 14  | Annotation Studio + Golden Review workflow keep humans in the loop on every golden item                                 |
| **Accuracy, robustness, cybersecurity** | Art 15  | The 24 evaluation metrics measure accuracy; bias/toxicity guardrails are planned (Gate 18)                              |
| **Post-market monitoring**              | Art 72  | Production traffic can run online-capable metrics (7 of the 24) without ground truth                                    |
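
The golden dataset lifecycle in the Art 10 row (DRAFT → APPROVED → FROZEN, with every step logged) can be pictured as a small state machine. The sketch below is illustrative only: `GoldenDataset`, `GoldenState`, `transition`, and the log fields are hypothetical names, and it assumes the lifecycle only moves forward one step at a time, which this page does not state.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class GoldenState(Enum):
    DRAFT = "DRAFT"
    APPROVED = "APPROVED"
    FROZEN = "FROZEN"


# Assumption: transitions only move forward, and FROZEN is terminal.
ALLOWED_TRANSITIONS = {
    GoldenState.DRAFT: {GoldenState.APPROVED},
    GoldenState.APPROVED: {GoldenState.FROZEN},
    GoldenState.FROZEN: set(),
}


@dataclass
class GoldenDataset:
    name: str
    state: GoldenState = GoldenState.DRAFT
    history: list = field(default_factory=list)  # log entries, only ever appended

    def transition(self, target: GoldenState, actor: str) -> None:
        """Move to `target` if allowed, and log who did it and when."""
        if target not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {target.value} is not allowed")
        self.history.append({
            "from": self.state.value,
            "to": target.value,
            "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.state = target


dataset = GoldenDataset(name="credit-scoring-v1")
dataset.transition(GoldenState.APPROVED, actor="reviewer@example.com")
dataset.transition(GoldenState.FROZEN, actor="release-manager@example.com")
```

Keeping the history as an append-only list mirrors the record-keeping idea in the Art 12 row: entries are added, never edited or removed.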

## Online-capable metrics (Art 72 post-market monitoring)

These 7 metrics can run on production traffic without golden answers:

| Metric                 | Why it's online-capable                     |
| ---------------------- | ------------------------------------------- |
| `answer_relevancy`     | Only needs question + answer                |
| `faithfulness`         | Needs answer + retrieved context (RAG path) |
| `contextual_relevancy` | Needs question + retrieved context          |
| `hallucination`        | Needs answer + context                      |
| `bias`                 | Needs answer only                           |
| `toxicity`             | Needs answer only                           |
| `directive_compliance` | Needs answer + retrieved context            |

See [Architecture / Data flow](/architecture/data-flow.md) for the execution path.
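
As a rough illustration of the table above, the sketch below picks which online-capable metrics can run on a given production record, based only on the inputs that record carries. The metric names and their required inputs come from the table; the `runnable_metrics` function and the record field names (`question`, `answer`, `context`) are assumptions made for the example, not Edge's actual API.

```python
# Inputs each online-capable metric needs, copied from the table above.
ONLINE_METRIC_INPUTS = {
    "answer_relevancy": {"question", "answer"},
    "faithfulness": {"answer", "context"},
    "contextual_relevancy": {"question", "context"},
    "hallucination": {"answer", "context"},
    "bias": {"answer"},
    "toxicity": {"answer"},
    "directive_compliance": {"answer", "context"},
}


def runnable_metrics(record: dict) -> list[str]:
    """Return the online metrics whose required inputs are present and non-empty."""
    available = {key for key, value in record.items() if value}
    return sorted(
        name for name, needed in ONLINE_METRIC_INPUTS.items() if needed <= available
    )


# A non-RAG production record carries no retrieved context.
no_rag = {"question": "What is my card limit?", "answer": "Your limit is 2,000 EUR."}
print(runnable_metrics(no_rag))  # ['answer_relevancy', 'bias', 'toxicity']
```

A record from the RAG path that also carries a non-empty `context` would qualify for all 7 metrics.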

## Model card publication (planned — Gate 19)

Each Edge release will publish a model card describing:

* Evaluation methodology (which metrics, which goldens).
* Performance on the standard benchmark set.
* Known limitations.
* Last-evaluated date.

Target: Q4 2026.

## Bias / toxicity guardrails (planned — Gate 18)

Today, bias and toxicity are only *measured*. Gate 18 adds *guardrails*: production traffic can be configured to reject responses that score above a threshold, with each rejection logged in `admin_actions`.
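
Because Gate 18 is still planned, the following is only a sketch of how such a threshold guardrail might behave. The threshold values, the `apply_guardrail` function, and the log entry fields are all hypothetical, and a plain list stands in for the `admin_actions` collection.

```python
from datetime import datetime, timezone

# Hypothetical thresholds on a 0-1 scale; higher scores mean more bias/toxicity.
THRESHOLDS = {"bias": 0.5, "toxicity": 0.3}


def apply_guardrail(response_id: str, scores: dict, admin_actions: list) -> bool:
    """Return True if the response may be served, False if it was rejected."""
    breaches = {
        metric: score
        for metric, score in scores.items()
        if metric in THRESHOLDS and score > THRESHOLDS[metric]
    }
    if not breaches:
        return True
    # Log the rejection; the list stands in for the `admin_actions` collection.
    admin_actions.append({
        "action": "guardrail_reject",
        "response_id": response_id,
        "breaches": breaches,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return False


admin_actions = []
served = apply_guardrail("resp-001", {"bias": 0.1, "toxicity": 0.7}, admin_actions)
```

Here `served` is `False` because the toxicity score exceeds its threshold, and one entry describing the breach is appended to `admin_actions`.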

## What Edge does **not** do

* Edge does **not** make decisions about individuals. The deployer's AI system does. Edge measures that system.
* Edge does **not** certify any third-party AI system as Annex III compliant. The deployer's compliance team makes that determination using Edge evidence as one input.

