Stardag

🚧 Work in progress 🚧

This documentation is still taking shape. Questions, feedback, and suggestions are very welcome — feel free to email us or open an issue on GitHub if anything is unclear or missing.

Declarative and composable DAGs

Stardag provides a clean Python API for representing persistently stored assets, the code that produces them, and their dependencies as a declarative Directed Acyclic Graph (DAG). It is a spiritual—but highly modernized—descendant of Luigi, designed for iterative data and ML workflows.

It emphasizes ease of use, composability, and compatibility with existing data workflow frameworks, rather than locking you into a closed ecosystem.

Stardag is built on top of, and integrates seamlessly with, Pydantic. It uses expressive type annotations to reduce boilerplate and make task I/O contracts explicit. This enables composable tasks and pipelines, while still maintaining a fully declarative specification of every produced asset.
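To make "a fully declarative specification of every produced asset" concrete, here is a minimal standard-library sketch of the general idea: if a task is described only by typed parameters (including its upstream tasks), a stable identifier for its output can be derived from that description alone. This is an illustration under stated assumptions, not Stardag's implementation. `RangeSpec`, `SumSpec`, and `spec_id` are hypothetical names, and plain dataclasses stand in for Pydantic models.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RangeSpec:
    """Hypothetical declarative spec: parameters only, no computation."""
    limit: int


@dataclass(frozen=True)
class SumSpec:
    """Hypothetical spec whose input is another spec (its upstream dependency)."""
    integers: RangeSpec


def spec_id(spec) -> str:
    """Derive a stable identifier from the spec's type name and parameters."""
    payload = {"type": type(spec).__name__, "params": asdict(spec)}
    canonical = json.dumps(payload, sort_keys=True)  # canonical ordering of keys
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


a = SumSpec(integers=RangeSpec(limit=4))
b = SumSpec(integers=RangeSpec(limit=4))
c = SumSpec(integers=RangeSpec(limit=5))

# Equal specifications map to the same identifier; changing an upstream
# parameter changes the identifier of everything downstream.
same = spec_id(a) == spec_id(b)
different = spec_id(a) != spec_id(c)
```

In this sketch the identifier depends only on the declared parameters, so it can be computed before anything is executed.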

See the Core Concepts section for a deeper dive into the architecture, and Design Philosophy for the guiding principles behind the project.


Why Use Stardag?

Stardag’s primary objective is to boost productivity in Data Science, Machine Learning, and AI workflows, where the line between production and development or experimentation is often blurry.

It provides lightweight tools to structure data processing, make dependencies explicit, and maintain a clear overview of how assets are produced. Crucially, it brings many of the benefits of Data-as-Code (DaC) to everyday workflows: managing complexity, improving reproducibility, and reducing boilerplate, without sacrificing flexibility or developer ergonomics.

Quick Example

import stardag as sd

@sd.task
def get_range(limit: int) -> list[int]:
    return list(range(limit))

@sd.task
def get_sum(integers: sd.Depends[list[int]]) -> int:
    return sum(integers)

# Declarative DAG specification - no computation yet
sum_task = get_sum(integers=get_range(limit=4))

# Materialize all tasks' targets
sd.build(sum_task)

# Load the result
assert sum_task.output().load() == 6
# Inspect intermediate results
assert sum_task.integers.output().load() == [0, 1, 2, 3]
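The "declare first, build later" pattern in the example above can be sketched in plain Python. The toy below is only an illustration of the pattern and makes no claim about how `sd.task` or `sd.build` work internally. `Node`, `build`, and the in-memory `store` are hypothetical stand-ins, with a dictionary taking the place of persistent storage.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Node:
    """Hypothetical node: a function plus its inputs, which may be other nodes."""
    func: Callable[..., Any]
    kwargs: dict[str, Any] = field(default_factory=dict)


def build(node: Node, store: dict[int, Any]) -> Any:
    """Materialize upstream nodes first, then this one; reuse stored results."""
    key = id(node)
    if key not in store:
        resolved = {
            name: build(value, store) if isinstance(value, Node) else value
            for name, value in node.kwargs.items()
        }
        store[key] = node.func(**resolved)
    return store[key]


calls: list[str] = []  # records which functions actually ran


def get_range(limit: int) -> list[int]:
    calls.append("get_range")
    return list(range(limit))


def get_sum(integers: list[int]) -> int:
    calls.append("get_sum")
    return sum(integers)


# Declaring the graph runs nothing.
range_node = Node(get_range, {"limit": 4})
sum_node = Node(get_sum, {"integers": range_node})
calls_before_build = list(calls)

# Building resolves dependencies in order and keeps intermediate results.
store: dict[int, Any] = {}
result = build(sum_node, store)
```

Calling `build(sum_node, store)` a second time returns the stored value without re-running either function, which is the behavior a persistent target store provides across runs.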

The Stardag Offering

Stardag consists of three components:

| Component | Description |
|-----------|-------------|
| SDK | Python library for defining and building DAGs |
| CLI | Command-line tools for authentication and configuration |
| Platform | Optional API service and Web UI for monitoring and collaboration |

What's Next?

Getting Started

- Installation
- Quick Start
- Your First DAG

Go Deeper

- Core Concepts
- How-To Guides
- Configuration
- Platform