DB Models

All of the information needed to store and query files and runs is held in a PostgreSQL database. A Django ORM API is used to query file information and to create runs.

File - Represents a file that exists on either a local or a remote file system.

FileMetadata - Represents arbitrary information associated with a File object. Stored as unstructured JSON.

Port - Represents an input to or output from a pipeline. Used to link File objects to the inputs and outputs of Pipeline Runs.

Pipeline - Stores the information related to a pipeline whose code is hosted on GitHub.

Run - Represents a run of a pipeline.

Operator - Represents the logic that must be executed to build the inputs to a pipeline. This model is linked to a Python class of type Operator, which implements the method get_jobs(). That method returns a list of serialized job information objects.
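
The Operator contract described above can be illustrated with a minimal sketch in plain Python. This is not the project's actual code: only the Operator class name and the get_jobs() method come from the description. The File dataclass here is a simplified stand-in for the Django model, and GroupBySampleOperator, the sample_id metadata key, and the shape of the job dictionary are all assumptions made for the example.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class File:
    """Simplified stand-in for the File model (not the real Django model)."""
    path: str
    # Plays the role of FileMetadata: arbitrary, unstructured JSON-like data.
    metadata: Dict[str, Any] = field(default_factory=dict)


class Operator(ABC):
    """Base class: subclasses turn files into pipeline inputs."""

    def __init__(self, files: List[File]):
        self.files = files

    @abstractmethod
    def get_jobs(self) -> List[str]:
        """Return a list of serialized job information objects."""


class GroupBySampleOperator(Operator):
    """Hypothetical operator that emits one job per sample."""

    def get_jobs(self) -> List[str]:
        # Group file paths by the (assumed) "sample_id" metadata key.
        by_sample: Dict[str, List[str]] = {}
        for f in self.files:
            sample = f.metadata.get("sample_id", "unknown")
            by_sample.setdefault(sample, []).append(f.path)
        # Serialize each job as a JSON string; the layout is illustrative only.
        return [
            json.dumps({"sample_id": sample, "inputs": {"files": sorted(paths)}},
                       sort_keys=True)
            for sample, paths in sorted(by_sample.items())
        ]


if __name__ == "__main__":
    files = [
        File("/data/a_R1.fastq", {"sample_id": "A"}),
        File("/data/a_R2.fastq", {"sample_id": "A"}),
        File("/data/b.fastq", {"sample_id": "B"}),
    ]
    for job in GroupBySampleOperator(files).get_jobs():
        print(job)
```

In the real system, the serialized jobs would presumably be used to create Run records, with Port objects linking the File objects to each run's inputs. That step is omitted here because its details are not given in this section.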

Schema Diagram
