ADR-007: Push vs Pull Ingestion Model
Status: Proposed Date: 2026-03-22
Context
Portitor integrates with external systems that expose fundamentally different connectivity models:
Push sources call Portitor. The external system sends an HTTP webhook when an event occurs (order placed, order packed, shipment dispatched). Portitor is the server; the integration partner is the client.
Pull sources must be polled. The external system exposes no webhook capability — it may offer a REST API to query for new records, an FTP/SFTP drop folder, or a flat file on a schedule. Portitor must initiate the connection.
This distinction is not theoretical. Production experience with Nordic e-commerce integrations includes:
- Bring (FTP): No API. Order files dropped to an FTP server. Portitor polled on an interval, scanning for new or changed files since the last cursor position.
- Prime Cargo (webhook): HTTP callbacks authenticated via JWT, firing on
OrderPackedevents. - PostNord, DHL (REST API polling): Tracking updates retrieved by querying their API on a schedule.
Treating these as the same plugin shape would force unnatural abstractions on both sides.
Decision
Push and pull ingestion are distinct plugin shapes with separate interfaces.
Push Source Plugin
Implements an HTTP handler. The platform registers it as a webhook endpoint. Authentication is delegated to the AuthPlugin (see ADR-006).
PushSourcePlugin
path() → string // e.g. "/webhooks/prime-cargo"
handle(request, identity) → EventListPull Source Plugin
Implements a polling loop. The platform schedules it. Credentials for the remote system are held in plugin configuration — not in JWT tokens.
PullSourcePlugin
schedule() → CronExpression // e.g. "*/5 * * * *"
fetch(cursor: S3Cursor) → EventList
cursorKey() → string // S3 key where cursor is persistedThe S3Cursor is a pointer to the last successfully processed position — a timestamp, a filename, a sequence number, or any opaque string the plugin defines. The platform persists and restores the cursor between executions, giving the plugin exactly-once-like processing without a database.
Consequences
Positive:
- FTP, SFTP, REST polling, and file-based integrations are first-class citizens
- Push plugins have no polling overhead — they fire on demand
- The S3Cursor pattern replaces ad-hoc "date file" state tracking with a durable, replayable position pointer stored in the event store itself
- Auth concerns are cleanly separated: push delegates to
AuthPlugin, pull uses outbound credentials in config
Negative:
- Two plugin interfaces instead of one — plugin authors must choose the right shape for their integration partner
- Pull plugins introduce scheduler infrastructure as a platform dependency
Cursor Persistence
The platform stores the cursor at:
{company}/cursors/{plugin-id}/cursor.jsonOn each pull execution:
- Platform reads the cursor from S3 (or initialises to epoch if absent)
- Plugin receives cursor, fetches events since that position
- Platform writes each event payload to S3 and enqueues an order pointer for the pipeline (ADR-012) — identical to the push ingest path
- Platform updates the cursor in S3 after all events in the batch are written
If step 3 or 4 fails, the cursor is not advanced — the next execution reprocesses from the same position. Plugins must be idempotent.
The cursor is updated once per batch, not per event. A mid-batch failure leaves all fetched orders in the pipeline and the cursor at its previous position; the next execution will re-fetch the same batch. Idempotency in step 3 ensures duplicate writes are safe.
Alternatives Considered
- Single plugin interface with a
modeflag: Rejected — forces pull plugins to implement a no-op HTTP handler and push plugins to implement a no-op scheduler, obscuring intent - Polling as a platform concern (plugin just provides a URL to poll): Rejected — the polling logic and cursor semantics are too integration-specific to abstract at the platform level (FTP scanning differs fundamentally from REST pagination)