Otoroshi — FAQ

A practical Q&A for teams evaluating Otoroshi as their API gateway and API management platform. Questions are grouped by theme so you can jump straight to what matters to you.

TL;DR — Otoroshi is an Apache-2.0 API gateway and API Management platform built by MAIF since 2017. It runs in production at MAIF, Clever Cloud and many other organizations, and powers the commercial Cloud APIM offer (SaaS or on-prem with Clever Cloud) which adds hosted operations, premium support and enterprise extensions on top of the open-source core.


API lifecycle and versioning

Does Otoroshi manage the full lifecycle of an API?

Yes. The API entity models the lifecycle as a state machine:

staging ──► published ──► deprecated ──► removed
  • staging — design phase, only draft/test routes are served (a dedicated header exposes draft endpoints safely).
  • published — live, consumer plans accept new subscriptions.
  • deprecated — traffic still flows, no new subscriptions.
  • removed — switched off.
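
The lifecycle above can be pictured as a small forward-only state machine. The sketch below is illustrative only and is not Otoroshi's implementation: the names ApiState, advance and accepts_new_subscriptions are hypothetical, and it assumes that only the transitions drawn in the diagram are allowed.

```python
from enum import Enum


class ApiState(Enum):
    STAGING = "staging"
    PUBLISHED = "published"
    DEPRECATED = "deprecated"
    REMOVED = "removed"


# Forward-only transitions, read directly off the diagram.
# Assumption: no other moves (such as going back to staging) are modeled here.
NEXT_STATE = {
    ApiState.STAGING: ApiState.PUBLISHED,
    ApiState.PUBLISHED: ApiState.DEPRECATED,
    ApiState.DEPRECATED: ApiState.REMOVED,
}


def advance(state: ApiState) -> ApiState:
    """Return the next lifecycle state, or raise if the API is already removed."""
    if state not in NEXT_STATE:
        raise ValueError(f"{state.value} is a terminal state")
    return NEXT_STATE[state]


def accepts_new_subscriptions(state: ApiState) -> bool:
    """Only a published API accepts new subscriptions."""
    return state is ApiState.PUBLISHED
```

For example, advance(ApiState.PUBLISHED) yields ApiState.DEPRECATED, a state in which traffic still flows but accepts_new_subscriptions returns False.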

An API groups its routes, backends, plugin flows, plans, subscriptions, deployments and developer-portal pages under a single entity with a shared domain and context path. At runtime it compiles down to standard routes, so there is no extra overhead.

How does API versioning work?

Each API carries an explicit version field and keeps a history of previous versions. A single API can expose several versions concurrently, behind different context paths or headers. The developer portal can surface multiple live versions to consumers, and API deployments keep a snapshot of what was published and when.

Can I update APIs without downtime?

Yes. All routes, APIs, API keys, certificates, plugins and most runtime settings are dynamic: changes propagate across the cluster within seconds, with no restart and no request loss — including for on-the-fly certificate generation.


Environments and CI/CD

How do I manage staging / pre-prod / prod?

Otoroshi is infrastructure-as-code friendly. Three tools cover every workflow:

  • Admin REST API — every object is a JSON resource. Anything the UI can do, the API can do.
  • otoroshictl CLI — dump, diff, apply and sync declarative YAML/JSON files. Designed for GitOps and CI/CD.
  • Kubernetes CRDs — declare routes, APIs, certificates, API keys, auth modules, etc. as Kubernetes manifests, reconciled by the Otoroshi operator / Ingress controller.

Typical promotion patterns:

  • Git-driven — config in Git, CI applies via otoroshictl apply or kubectl apply.
  • Config management — Puppet / Salt / Ansible call the admin API or otoroshictl.
  • Export/import — full datastore export/import for cold copy or bootstrapping.

Is there a CI/CD path for APIs?

Yes. The whole gateway is API-first and declarative, so the standard pattern is Git PR → CI (otoroshictl diff) → automated apply to the target environment. The same pipeline handles plugins, certificates, auth modules, JWT verifiers and data exporters.


Roles, RBAC and admin access

What roles can I model?

Otoroshi has a fine-grained RBAC model on two levels:

  • Organizations (tenants) — the broad isolation unit.
  • Teams — groupings inside an organization.

Each entity belongs to exactly one organization and one or more teams. Admin users and API keys carry rights of the form:

[{ "tenant": "orga-prod:rw", "teams": ["team-apis:rw", "team-platform:r"] }]

Access modes are r (read), w (write), not (forbidden), with wildcards (*). This covers:

  • a global super-admin,
  • a team lead with full control on their perimeter,
  • a security officer with read-only on everything,
  • an API owner scoped to one API or team,
  • a support / analytics role with read-only on events and stats.
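
To make the rights format concrete, here is a minimal sketch of how such a rights array could be evaluated. It is not Otoroshi's own code: the function names are hypothetical, and the matching rules are assumptions (a grant is "name:mode", * is treated as a glob on the name, not denies, and an entry applies only when both the tenant and one team grant the requested mode).

```python
from fnmatch import fnmatchcase


def parse_grant(grant: str) -> tuple[str, str]:
    """Split 'name:mode' into (name pattern, mode), e.g. 'team-apis:rw'."""
    name, _, mode = grant.rpartition(":")
    return name, mode


def mode_allows(mode: str, wanted: str) -> bool:
    """'not' forbids everything; otherwise each wanted letter must be granted."""
    return mode != "not" and all(letter in mode for letter in wanted)


def is_allowed(rights: list[dict], tenant: str, team: str, wanted: str) -> bool:
    """Check whether `rights` grant `wanted` ('r', 'w' or 'rw') on tenant/team."""
    for entry in rights:
        tenant_pattern, tenant_mode = parse_grant(entry["tenant"])
        if not fnmatchcase(tenant, tenant_pattern):
            continue
        if not mode_allows(tenant_mode, wanted):
            continue
        for grant in entry["teams"]:
            team_pattern, team_mode = parse_grant(grant)
            if fnmatchcase(team, team_pattern) and mode_allows(team_mode, wanted):
                return True
    return False


# The rights array shown above.
rights = [{"tenant": "orga-prod:rw", "teams": ["team-apis:rw", "team-platform:r"]}]
```

With these rights, is_allowed(rights, "orga-prod", "team-platform", "r") holds while the same call with "w" does not, which matches the read-only profiles listed above.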

How does SSO work for the admin UI?

Admin authentication is delegated to an authentication module: OAuth 2.0 / 2.1 with PKCE, OpenID Connect, LDAP (with nested groups), SAML v2, Basic, WebAuthn / FIDO2 passwordless, Auth0 passwordless, JWT, or a WASM-based custom module. Modules can be chained. Rights can be computed at login time from IdP groups/claims.

Can I restrict an admin to analytics only?

Yes. Assign read-only rights scoped to the analytics perimeter, or expose only the events / metrics part of the API via a scoped API key.


Multi-tenancy and governance

Can a single Otoroshi host several isolated tenants?

Yes. The Organizations / Teams model is built for that: one cluster, strictly isolated tenants, each with its own admins, routes, APIs, keys and certificates. Governance can be sliced by environment, business domain, subsidiary or exposure type (internal / intra-group / external).

Isolation is enforced at read and write level on every admin API call and UI page. Entities without an explicit team assignment fall back to a default team, which keeps small deployments simple.

Can one control plane drive multiple data planes?

Yes:

  • Leader / worker clustering — one control-plane cluster (leaders + datastore) and any number of data-plane workers across zones, regions or datacenters.
  • Relay routing — tag routes by network location (provider, region, zone, datacenter, rack) and let leaders forward traffic to the right zone without exposing backends publicly.
  • Tunnels — TCP/UDP/WebSocket tunnels to reach private backends without opening inbound ports.

Can gateways run as independent micro-gateways?

Yes, three patterns are supported:

  • Worker-only — a gateway that only routes traffic, backed by the in-memory fast datastore, synced from the leader cluster.
  • Backup mode — workers keep serving traffic even if the leader cluster is unreachable, using the last known-good snapshot.
  • Sidecar injection — Kubernetes sidecar deployment for service-mesh use cases.

Resilience and operations

What happens if the datastore or leader is unreachable?

Workers keep serving traffic. They cache configuration and use an on-disk backup of the last valid state. A datastore or leader outage does not take your APIs down — only administrative writes are suspended until the control plane recovers.

How do rolling restarts work?

The gateway holds no state in the JVM beyond cache, so rolling restarts are trivial: drain, stop, restart. Clustering handles the rest. A full restart with cache purge plus datastore restore follows the same pattern: restore the Redis / Postgres / Cassandra backup, start leaders first, workers resync automatically.

/live, /ready, /startup and /health endpoints are exposed for Kubernetes probes.

Is there IaC for the gateway itself?

Yes: an official Helm chart, Kustomize overlays and CRDs. Reference deployments are provided for AWS, Azure, GCP, Kubernetes, Docker, Clever Cloud and bare-metal. otoroshictl can fully bootstrap an empty instance from a Git repository.

What does an upgrade look like?

A standard rolling update:

  1. Back up the datastore (Redis / PostgreSQL / Cassandra).
  2. Upgrade the leader cluster, one node at a time.
  3. Upgrade the worker cluster node by node.

Downgrades follow the same pattern. Schema changes between minor versions are handled automatically; breaking changes are called out in release notes.

Is there an SLA?

The OSS distribution carries no contractual SLA — it is free software. SLAs (e.g. 99.5 % availability, 1 h response time, 4 h recovery time) are part of a Cloud APIM support subscription, where Cloud APIM engineers provide incident response and security patches with guaranteed reaction and repair times.


Consumers, API keys and subscriptions

How do API keys work?

The API key entity models machine-to-machine and developer credentials:

  • clientId + clientSecret (configurable lengths), with multiple extraction methods: custom headers, HTTP Basic, Bearer, JWT-wrapped, or client-id-only.
  • Quotas — per-second throttling, daily and monthly caps, enforced cluster-wide.
  • Rate limiting — distributed, per-IP, per-key, per-route, or custom.
  • Restrictions — allow/deny rules on HTTP methods and paths.
  • Expiration (validUntil) and read-only mode.
  • Automatic secret rotation with a grace period during which both secrets are accepted (zero-downtime renewal).
  • Revocation — flip enabled=false or delete; effect is immediate.
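
The grace period used during secret rotation can be sketched as follows. This is an illustrative model, not Otoroshi's implementation: RotatingSecret and its fields are hypothetical names, and the clock is passed in explicitly so the behavior is easy to test.

```python
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


def _same(a: str, b: str) -> bool:
    """Constant-time string comparison."""
    return hmac.compare_digest(a.encode(), b.encode())


@dataclass
class RotatingSecret:
    current: str
    previous: Optional[str] = None
    previous_valid_until: Optional[datetime] = None

    def rotate(self, new_secret: str, now: datetime, grace: timedelta) -> None:
        """Install a new secret and keep the old one valid until now + grace."""
        self.previous = self.current
        self.previous_valid_until = now + grace
        self.current = new_secret

    def accepts(self, candidate: str, now: datetime) -> bool:
        """Accept the current secret, or the previous one inside the grace period."""
        if _same(candidate, self.current):
            return True
        in_grace = (
            self.previous_valid_until is not None
            and now < self.previous_valid_until
        )
        return in_grace and self.previous is not None and _same(candidate, self.previous)
```

During the grace window both secrets pass accepts, so consumers can switch to the new secret at their own pace; once the window closes only the current secret is valid.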

How does the subscription flow work?

Two options:

  • Direct — an operator creates the API key from the admin UI or API and hands it to the consumer.
  • Self-service via Daikoku — MAIF's companion developer portal: API marketplace, search, documentation, plans with pricing and contracts, subscription requests, approval workflows, team billing and consumer analytics. Bidirectional API key sync with Otoroshi.

Otoroshi also ships its own developer-portal pages bound to the API entity (logo, banners, docs, nav) for lighter use cases that do not require Daikoku.

Can consumers see their own stats?

Yes. Per-API-key analytics are tracked and exposed:

  • In the admin UI, filtered to their keys.
  • Via the analytics API (restricted by access rights).
  • Via Daikoku, with a per-plan and per-key dashboard over the configured retention window (typically 1 month in hot storage, up to 1 year or more with an external Elasticsearch / Kafka pipeline).

Access can be gated by any auth module (OIDC, LDAP, SAML, etc.) so authorization stays in your existing IdP — or in an OPA-style policy engine via the built-in Open Policy Agent WASM plugin.


Authentication and authorization

How do I authenticate machines?

  • API keys (headers, Basic, Bearer, JWT-wrapped, client-id-only).
  • mTLS client certificates, with per-route trust store customization.
  • JWT validation — multiple verifiers per route, JWE/encrypted JWT, re-signing and transformation.
  • HMAC request signing / validation.
  • Biscuit tokens (datalog-based fine-grained authorization).
  • OpenFGA fine-grained authorization.
  • IP allow / deny lists with CIDR, country (MaxMind / IPStack), or time-restricted access.
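
As an illustration of CIDR-based allow / deny lists, the sketch below uses Python's standard ipaddress module. The precedence rules are assumptions made for the example (deny wins over allow, and an empty allow list means "allow everything not denied"); they are not taken from Otoroshi's plugin configuration.

```python
from ipaddress import ip_address, ip_network


def ip_allowed(client_ip: str, allow: list[str], deny: list[str]) -> bool:
    """Return True if client_ip passes the allow / deny CIDR lists."""
    addr = ip_address(client_ip)
    # Assumption: a deny match always wins.
    if any(addr in ip_network(cidr) for cidr in deny):
        return False
    # Assumption: an empty allow list accepts every address that is not denied.
    if not allow:
        return True
    return any(addr in ip_network(cidr) for cidr in allow)
```

For instance, with allow = ["10.0.0.0/8"] and deny = ["10.1.2.0/24"], the address 10.2.0.1 is accepted and 10.1.2.3 is rejected.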

How do I authenticate end users?

  • OAuth 2.0 / 2.1 with PKCE.
  • OpenID Connect.
  • LDAP (with nested groups).
  • SAML v2.
  • OAuth 1.0a.
  • Basic.
  • WebAuthn / FIDO2 passwordless.
  • Auth0 passwordless.
  • Internal user store.
  • WASM-based custom authentication.
  • Modules can be chained (e.g. LDAP + MFA).

What level of granularity do I get?

Every security plugin (API key, JWT, mTLS, OAuth, RBAC, Biscuit, OpenFGA, Coraza WAF, etc.) is part of the plugin chain and can be applied:

  • per gateway (global plugins),
  • per API,
  • per route,
  • per HTTP method,
  • per path or sub-path,
  • per header / query-param / cookie value.

Public vs. private path separation is first-class on every route.

How do I validate inputs without writing code?

Three ready-to-use layers:

  • Query-string / header / cookie — built-in validators (exact, regex, wildcard, required, JSON-Path).
  • Body — JSON-Path context validator, JQ body-filter plugin, Coraza WAF parsing bodies against OWASP CRS.
  • WASM validators — drop in a JSON Schema, protobuf or custom-format validator compiled to WebAssembly.

What policies and security headers ship out of the box?

  • Per-route CORS policy.
  • Security headers: HSTS, CSP, X-Frame-Options, X-XSS-Protection, X-Content-Type-Options, Referrer-Policy, Permissions-Policy.
  • WAF — Coraza with OWASP Core Rule Set.
  • Payload size limits, both upload and download.
  • Bandwidth throttling on request and response.
  • URL allow / deny lists, URL rewriting, path stripping, redirections.

Routing, load balancing and traffic management

What load-balancing strategies are supported?

  • Round Robin
  • Random
  • Sticky (cookie-based session affinity)
  • IP Address Hash
  • Best Response Time
  • Weighted Best Response Time
  • Least Connections
  • Power of Two Random Choices
  • Header Hash
  • Cookie Hash
  • Query Hash
  • Backend failover targets
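
Of these strategies, Power of Two Random Choices is perhaps the least self-explanatory: pick two backends at random and send the request to the less loaded of the two. The sketch below shows the idea only; the pick_backend name and the in_flight load metric are hypothetical, and which load signal Otoroshi actually compares is not stated here.

```python
import random


def pick_backend(in_flight: dict[str, int], rng: random.Random) -> str:
    """Sample two distinct backends and return the one with fewer in-flight requests."""
    candidates = list(in_flight)
    if len(candidates) == 1:
        return candidates[0]
    first, second = rng.sample(candidates, 2)
    return first if in_flight[first] <= in_flight[second] else second
```

Because the most loaded backend loses every pairwise comparison, it is never chosen while any less loaded backend exists, yet the gateway only needs to inspect two targets per request.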

What can I match a route on?

  • HTTP method.
  • Hostname (exact, wildcard).
  • Path (exact, prefix, wildcard, regex with parameter validation).
  • Header values (exact, regex, wildcard).
  • Query-param values.
  • Cookie values.
  • Target predicates: geolocation, cloud region, datacenter, rack, zone.

What about circuit breakers, retries and caching?

Built-in plugins cover circuit breakers with configurable thresholds and half-open states, retries with exponential backoff, response caching, gzip / brotli compression, request / response mirroring, traffic capture (GoReplay format), canary deployments (percentage-based and time-controlled) and chaos engineering (Snow Monkey — latency and failure injection).
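
For readers unfamiliar with the half-open state, here is a minimal circuit-breaker sketch. It is a generic illustration, not Otoroshi's plugin: the class, its parameters (failure_threshold, reset_timeout) and the choice to pass the clock in as seconds are all assumptions made for the example.

```python
from typing import Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int, reset_timeout: float) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to wait before a trial call
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def state(self, now: float) -> str:
        if self.opened_at is None:
            return "closed"
        if now - self.opened_at >= self.reset_timeout:
            return "half-open"  # a trial request may go through
        return "open"

    def allow_request(self, now: float) -> bool:
        return self.state(now) != "open"

    def record_success(self) -> None:
        """A successful call closes the circuit and clears the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        """Open on reaching the threshold, or re-open after a failed trial call."""
        self.failures += 1
        if self.state(now) == "half-open" or self.failures >= self.failure_threshold:
            self.opened_at = now
```

After failure_threshold consecutive failures the breaker rejects calls for reset_timeout seconds, then lets a trial call through: success closes the circuit, failure re-opens it for another full timeout.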

What service discovery integrations exist?

DNS, Eureka (internal & external), Kubernetes API (namespace scanning), Otoroshi self-registration protocol, Tailscale.


Protocols

Which protocols are supported?

Protocol | Ingress | Egress
HTTP/1.1 | yes | yes
HTTP/2 (incl. H2C) | yes | yes
HTTP/3 (QUIC) | yes | yes
WebSocket (with message validation, transformation, mirroring) | yes | yes
SSE (Server-Sent Events) | yes | yes
gRPC & gRPC-Web | yes (via Netty listener) | yes
GraphQL (proxy, query composition, schema-first backend) | yes | yes
REST | yes | yes
SOAP (SOAP client, JSON↔XML, WS-* via plugin chain) | yes | yes
mTLS (end-to-end) | yes | yes
TCP (raw, with SNI routing) | yes | yes

What about event-driven protocols (Kafka, MQTT, etc.)?

Cloud APIM ships an event-native extension that translates event-bus protocols (Kafka, Pulsar, RabbitMQ, MQTT, NATS) to HTTP and back. You can expose Kafka topics as HTTP endpoints, send requests to RabbitMQ queues, or bridge MQTT to your REST catalog — with the same plugin chain (auth, quotas, transformation, observability) as for HTTP traffic.


Performance

What overhead does Otoroshi add?

On a realistic production-grade plugin chain (TLS + API key + JWT + rate limit + access log), measured overhead is typically below 10 ms p95 on reference hardware — well under the 20 ms threshold often required in RFPs.

Notable points:

  • Core proxy on Akka HTTP, with an optional Reactor Netty server (also used for HTTP/3).
  • O(log n) prefix-tree router, comfortable with tens of thousands of concurrent routes.
  • Data plane keeps configuration in memory, synced from the control plane.
  • Plugins that dominate added latency: Coraza WAF (body inspection), heavy WASM, remote JWKS fetches on first hit, external remote-call plugins. All are individually measurable in the built-in analytics.
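
To show why a prefix-tree router scales well with the number of routes, here is a toy segment-based trie with longest-prefix matching. It is a simplified illustration and not Otoroshi's router: RouteTrie and its methods are hypothetical, and hostnames, methods, wildcards and regexes are left out.

```python
from typing import Optional


def _segments(path: str) -> list[str]:
    """Split '/api/users/' into ['api', 'users']."""
    return [segment for segment in path.split("/") if segment]


class RouteTrie:
    def __init__(self) -> None:
        self.children: dict[str, "RouteTrie"] = {}
        self.route: Optional[str] = None  # route id registered at this prefix

    def insert(self, path: str, route: str) -> None:
        node = self
        for segment in _segments(path):
            node = node.children.setdefault(segment, RouteTrie())
        node.route = route

    def lookup(self, path: str) -> Optional[str]:
        """Return the route registered on the longest matching path prefix."""
        node, best = self, self.route
        for segment in _segments(path):
            child = node.children.get(segment)
            if child is None:
                break
            node = child
            if node.route is not None:
                best = node.route
        return best
```

A lookup visits at most one node per path segment, so adding more routes widens the tree without lengthening the walk for any given request.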

Official benchmarks and profiling guidance are part of the Cloud APIM support offer.


Observability

What health, metrics and logs endpoints are exposed?

  • /health, /live, /ready, /startup (protected by access key or API key).
  • /metrics in JSON or Prometheus format.
  • OpenTelemetry (OTLP) export of server metrics and logs, with gzip, gRPC and mTLS support.
  • Native exporters for Datadog, Prometheus, StatsD and New Relic.

How are events and alerts handled?

Every request, admin action and security event produces a structured JSON event. 25+ built-in exporters can ship events in real time to:

Elasticsearch, Apache Kafka, Apache Pulsar, PostgreSQL, webhooks, files, S3, Mailer (Mailgun / Mailjet / Sendgrid / SMTP), console / logger, Splunk, Datadog, New Relic, GoReplay (file & S3), TCP / UDP / Syslog, JMS, WASM-based custom exporter, workflow-based custom exporter.

Alerts are events filtered/projected by an exporter, so alert routing (PagerDuty, OpsGenie, Slack…) lives in your existing alerting stack.

Does Otoroshi support distributed tracing?

Yes. W3C Trace Context propagation across calls makes Otoroshi transparent to your existing tracer (Jaeger, Zipkin, Datadog APM, Tempo, etc.).
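
For reference, the W3C traceparent header (version 00) has the form version-traceid-parentid-flags. The sketch below shows one way an intermediary could continue a trace by keeping the trace id and minting a new span id. It is a generic illustration of the header format; whether Otoroshi forwards the header unchanged or records a span of its own is not specified here, and child_traceparent is a hypothetical name.

```python
import re
import secrets

# Version 00 layout: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def child_traceparent(incoming: str) -> str:
    """Keep the trace id and flags, replace the parent id with a fresh span id."""
    match = TRACEPARENT.match(incoming)
    if match is None:
        raise ValueError("not a version-00 traceparent header")
    trace_id, _parent_id, flags = match.groups()
    span_id = secrets.token_hex(8)  # 8 random bytes, rendered as 16 hex characters
    return f"00-{trace_id}-{span_id}-{flags}"
```

Because the trace id is preserved, spans recorded upstream and downstream of the gateway still join into a single trace in the tracer's UI.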

Can I redact sensitive fields from logs?

Yes. The event pipeline supports projection and transformation before export, so passwords, tokens or credit-card numbers in headers, query parameters or bodies can be stripped, hashed or masked before logs leave the gateway. Fine-grained header/body logging can be enabled per route for debugging without leaking to central logs.
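
Conceptually, masking works like the sketch below, which walks a JSON-like event and replaces the values of sensitive keys. The event shape, the key names and the redact function are hypothetical; in Otoroshi this is configured on the exporter rather than written as code.

```python
# Hypothetical set of field names to mask, compared case-insensitively.
SENSITIVE_KEYS = frozenset({"authorization", "password", "card_number"})


def redact(value, sensitive=SENSITIVE_KEYS):
    """Return a copy of a JSON-like value with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            key: "***" if key.lower() in sensitive else redact(item, sensitive)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive) for item in value]
    return value
```

The function returns a new structure and leaves the original event untouched, so a detailed local debug log and a masked central log can be produced from the same event.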

What about security supervision?

Built-in detectors for Log4Shell and React2Shell, Fail2Ban-style auto-banning, anomaly events on auth failures, rate-limit hits and quota breaches, certificate expiration alerts, 90-day route state history, security.txt (RFC 9116) and robots.txt support.


Certificates, TLS and PKI

What does the built-in PKI cover?

  • Internal PKI — generate CAs, sub-CAs and leaf certificates on the fly, sign external CSRs, OCSP responder and AIA endpoint, JWKS exposure for public keys, P12 import.
  • ACME / Let's Encrypt automated issuance and renewal.
  • On-the-fly certificate generation from a CA, with no request loss.
  • Dynamic TLS termination, per-listener and per-hostname.
  • End-to-end mTLS on both legs (front and backend), with per-route trusted-CA customization.
  • Bidirectional sync between Kubernetes secrets and Otoroshi certificates.
  • Tailscale certificates integration.

Developer portal and marketplace

What does Otoroshi ship for developer experience?

  • A built-in developer portal bound to the API entity — pages, navigation, logos, banners, plans, subscriptions, auto-generated credentials.
  • An integration with Daikoku (MAIF, Apache 2.0) for the full marketplace experience: catalogs, categories, search, SLA & pricing (metered or flat), self-service subscription workflows, email notifications, team billing, consumer analytics, internal / intra-group / external audiences, version surfacing, sandbox APIs, community pages, full contractual flow (subscription → approval → delivery → usage → renewal).

Both portals expose OpenAPI / AsyncAPI documentation uploaded with the API definition.


Deployment

Where can I deploy Otoroshi?

  • Kubernetes — Helm chart, Kustomize overlays, Ingress controller, CRDs, admission webhook, sidecar injection.
  • AWS — ECS, EKS, EC2 with reference templates.
  • Azure — AKS, App Service.
  • GCP — GKE, Cloud Run.
  • Clever Cloud — first-class PaaS deployment.
  • Docker / bare-metal / on-premises — official images and systemd-ready packaging.
  • SaaS — Cloud APIM offers a managed Otoroshi with the same features plus premium extensions.

Which storage backends are supported?

Redis, PostgreSQL, Cassandra, in-memory (with file persistence), S3, HTTP.


Extensibility

How do I extend Otoroshi?

  • 200+ built-in plugins — auth, security, caching, compression, rate limiting, circuit breaking, transformation (jq, regex, XML/JSON, SOAP), GraphQL composition, HTML patching, redirection, mocks, maintenance/build modes, etc.
  • WASM plugins — write middleware in Rust, Go, AssemblyScript, JS, TS, Python or C, and run it sandboxed (Extism-based).
  • Scala plugins — native JVM plugins packaged as jars for maximum performance.
  • Workflows — visual flow engine with a step-by-step debugger for no-code processing pipelines.
  • Admin extensions — add your own entities, admin API endpoints and admin-UI pages alongside built-in ones.
  • Open Policy Agent — evaluate OPA policies inside the gateway via WASM.

Security and support

Has Otoroshi been audited?

It has been in production at MAIF (a top-10 French insurance company) since 2017, with internal security reviews, pen-tests and public, transparent CVE responses on GitHub. Formal third-party audits and bug-bounty programs are part of the Cloud APIM commercial offer.

How are vulnerabilities handled?

  • Disclosures go through GitHub security advisories (private disclosure, coordinated fix, public CVE).
  • Cloud APIM subscribers get proactive notifications, pre-release patches when possible, and severity-based remediation windows:
    • Critical — patch within 24–72 h.
    • High — patch within 1 week.
    • Medium — patch in the next monthly release.
    • Low — patch in the next minor release.
  • Anonymous usage reporting is opt-in.

What is the licensing model?

Apache 2.0. MAIF is committed to keeping Otoroshi open-source. There is no dual-license or BSL trap: contributions are accepted under an Apache 2.0 CLA, and development happens in the open on GitHub. Cloud APIM adds commercial extensions and support on top of the unmodified open-source core.

What does commercial support cover?

Cloud APIM contracts include tiered SLAs on availability (99.5 % and up), response time (GTI, down to 1 h), recovery time (GTR, down to 4 h), and engineering access for roadmap influence and architecture reviews. Support covers the OSS distribution even when self-hosted.

What is the release cadence?

  • Minor releases every 4 to 8 weeks (features, fixes, security patches).
  • On-demand patch releases for critical issues.
  • LTS tracks maintained 12–24 months are part of the Cloud APIM offer — production teams that do not want to track master can stay on an LTS branch and apply backported patches only.

Who uses Otoroshi?

Public adopters include MAIF, SNCF Connect & Tech, French public-sector entities, and many more. Community channels: GitHub, Discord. Reference calls with existing users can be arranged by Cloud APIM on request.


Costs and operations

What is the cheapest path to production?

Self-host the OSS distribution on a small Kubernetes cluster (or 3 VMs) with PostgreSQL, one leader and two workers. This is a valid production setup for low-to-medium traffic, costing only the infrastructure you provide.

When do I actually need Cloud APIM?

  • You need a contractual SLA and 24/7 incident response.
  • You want premium extensions: event gateway (Kafka, Pulsar, RabbitMQ, MQTT → HTTP), AI gateway, etc.
  • You want Clever Cloud engineers to run the platform for you (SaaS).
  • You want long-term LTS branches with guaranteed backport windows.

What skills do I need to operate Otoroshi?

Standard SRE / platform-engineering skills: Kubernetes (or VM) operations, PostgreSQL or Redis administration, Prometheus / Grafana / OTEL monitoring, basic JVM tuning. No Scala knowledge required — UI, API, CLI and CRDs cover everything. Custom plugins require either WASM (any language) or Scala (native plugins).


Quick reference by profile

Profile | Primary tools
Developer (API producer) | API entity UI, otoroshictl, K8s CRDs, admin API, dev portal
Developer (API consumer) | Daikoku / developer portal, consumer analytics, sandbox APIs
API product owner | Daikoku / developer portal, subscription workflows, stats
Platform / ops admin | Admin UI, admin API, OTEL / Prometheus / Grafana, cluster view
Security officer | Read-only admin, Coraza WAF, audit events, advisories feed
Finance / billing | Daikoku billing, per-plan metering