Tasks
Tasks are the fundamental building blocks of Stardag. A task represents a unit of work that produces an output.
What is a Task?
A task is:
- A specification of what to compute
- A Pydantic model with typed parameters
- Serializable to JSON for storage and transfer
- Hashable to produce a deterministic ID
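To illustrate how a serializable specification can yield a deterministic ID, here is a minimal sketch that uses only the standard library. It is not Stardag's actual hashing scheme: the WordCountSpec class, the spec_id helper, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WordCountSpec:
    """Hypothetical task specification: typed parameters only, no results."""

    path: str
    lowercase: bool = True


def spec_id(spec) -> str:
    """Derive a deterministic ID from the class name and the parameters."""
    payload = {"type": type(spec).__name__, "params": asdict(spec)}
    # sort_keys and fixed separators make the JSON text canonical
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = WordCountSpec(path="data/corpus.txt")
b = WordCountSpec(path="data/corpus.txt", lowercase=True)
c = WordCountSpec(path="data/other.txt")

same_id = spec_id(a) == spec_id(b)       # equal parameters give equal IDs
different_id = spec_id(a) != spec_id(c)  # different parameters give different IDs
```

Because the ID depends only on the specification, two processes that construct the same parameters agree on the ID without having to coordinate.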
The Task Contract and Core Interface
Below is a minimal example of a task:
import stardag as sd

# Some external persistent state (typically *not* in memory as here)
world_state = {}


class MyTask(sd.BaseTask):
    # Declare any parameters
    parameter: str

    def run(self):
        # do some work
        result = len(self.parameter)
        # persist the result
        world_state[self.parameter] = result

    def complete(self):
        # let the outside world know if this task is complete
        return self.parameter in world_state
Contrived as it is, this example emphasizes the fundamental contract of a Stardag task. At the very least, any task must implement the methods complete and run, where:
- complete should return True only if the task's desired world state is achieved
- run should only execute successfully once this state is achieved
To define how tasks depend on other tasks, each task must also implement the method requires, for which the default BaseTask implementation simply returns None (no dependencies). When a task does return one or more tasks, it can, and should, make the assumption that:
- all tasks returned from self.requires() are complete when self.run() is executed.
To some extent, that's it.
This allows us to implement build logic that traverses the Directed Acyclic Graph (DAG) of tasks and executes run in the correct order until the final desired tasks are complete.
# instantiate an instance
my_task = MyTask(parameter="hello")
# build (or "materialize") the task and upstream
sd.build(my_task)
assert world_state == {"hello": 5}
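The build logic described above can be approximated in plain Python. The sketch below is not Stardag's implementation: SketchTask, upstream, build_sketch, and the two example tasks are hypothetical names. It also assumes that a task which is already complete does not need its dependencies built, and it omits cycle detection.

```python
class SketchTask:
    """Hypothetical stand-in for a task; not Stardag's BaseTask."""

    def complete(self) -> bool:
        raise NotImplementedError

    def run(self) -> None:
        raise NotImplementedError

    def requires(self):
        # Default: no dependencies
        return None


def upstream(task):
    """Normalize the return value of requires() to a list of tasks."""
    deps = task.requires()
    if deps is None:
        return []
    if isinstance(deps, SketchTask):
        return [deps]
    return list(deps)


def build_sketch(task):
    """Depth-first build: complete all dependencies, then run the task."""
    if task.complete():
        return
    for dep in upstream(task):
        build_sketch(dep)
    task.run()
    if not task.complete():
        raise RuntimeError(f"{task!r} ran but is not complete")


store = {}  # stands in for external persistent state
run_order = []


class Length(SketchTask):
    def __init__(self, text):
        self.text = text

    def complete(self):
        return ("length", self.text) in store

    def run(self):
        store[("length", self.text)] = len(self.text)
        run_order.append("Length")


class Doubled(SketchTask):
    def __init__(self, text):
        self.text = text

    def requires(self):
        return Length(self.text)

    def complete(self):
        return ("doubled", self.text) in store

    def run(self):
        # Safe to read: the dependency is complete when run() executes
        store[("doubled", self.text)] = 2 * store[("length", self.text)]
        run_order.append("Doubled")


build_sketch(Doubled("hello"))
build_sketch(Doubled("hello"))  # second call is a no-op: already complete
```

The check after task.run() enforces the contract directly: a run that returns without error but leaves the task incomplete is treated as a failure.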
The following section covers how most tasks use Targets, and in particular FileSystemTargets, to persistently store their output so that downstream tasks can retrieve it as input.