Skip to content

Targets

Targets represent where and how task outputs are stored.

The Typical Task uses a Target

In most scenarios, downstream tasks need to load the output from upstream dependencies. For this purpose the class TargetTask, which inherits BaseTask, introduces the concept of Targets.

Its extension of BaseTask can be summarized in seven lines of code:

class TargetTask(BaseTask, Generic[TargetType]):
    def complete(self) -> bool:
        return self.target().exists()

    @abstractmethod
    def target(self) -> TargetType:
        ...

That is, it adds a default implementation of complete which checks if the return value of the new (abstract) method target exists. Generally, the only requirement on the TargetType returned from target is that it can report its existence. Strictly speaking, that it implements the protocol:

class Target(Protocol):
    def exists(self) -> bool:
        """Check if the target exists."""
        ...

Note that TargetTask is implemented with TargetType as a generic type variable. This means that when you subclass TargetTask, you need to declare the type of target the target() method returns. This is critical for composability of tasks and allows type checkers to verify that chained tasks are compatible in terms of their I/O.

The Typical Target uses a File System

Moreover, the most commonly used Target persists, retrieves and checks existence of one or many files/objects in a file system. To this end Stardag implements the FileSystemTarget hierarchy: FileSystemTarget is the minimal base protocol (providing uri and exists()), with FileTarget for file-oriented targets and DirectoryTarget for directory-oriented targets.

You can, and it is in some cases motivated to, return a target for a certain type of file system with an absolute path/URI

import stardag as sd

class MyTask(sd.TargetTask[sd.LocalFileTarget]):

    def run(self):
        with self.target().open("w") as handle:
            handle.write("result")

    def target(self) -> sd.LocalFileTarget:
        return sd.LocalFileTarget("/absolute/path/to/file.txt")

However, you are strongly encouraged to instead use the function sd.get_file_target:

import stardag as sd

class MyTask(sd.TargetTask[sd.FileTarget]):

    def run(self):
        with self.target().open("w") as handle:
            handle.write("result")

    def target(self) -> sd.FileTarget:
        return sd.get_file_target("path/to/file.txt")

The main benefit here is that you can configure the file system and root directory/URI-prefix centrally and decoupled from your tasks. This means for example that you can trivially jump between experimenting fully locally and running pipelines in production (or staging etc.).

Target Roots

Target roots define the base location for FileSystemTargets obtained by sd.get_file_target() or when using sd.Task or the @sd.task decorator API.

As per the last example above, when implementing target(), you are advised to use sd.get_file_target which takes the arguments:

  • relpath: str (required)
  • target_root_key: str (default value "default")

The core idea is that you define a set of named (the target_root_key) base paths or URIs, to which the target of any task specifies the relative path. E.g

Configuration

Via environment variables:

# The "default" root
export STARDAG_TARGET_ROOTS__DEFAULT="s3://some-bucket/prefix/"

# Additional root named "ingestion"
export STARDAG_TARGET_ROOTS__INGESTION="s3://some-other-bucket/prefix/"

Or equivalently via a single JSON-encoded environment variable:

export STARDAG_TARGET_ROOTS=\
    '{"default": "s3://some-bucket/prefix/", "ingestion": "s3://some-other-bucket/prefix/"}'

You can also instantiate/implement your own TargetFactory and set it to sd.target_factory_provider.set(my_target_factory).


🚧 Work in progress 🚧

This documentation is still taking shape. It should soon cover:

  • Serialization
  • How target roots are configured via the STARDAG_PROFILE/CLI/configuration
  • How target roots are connected to Environments and synced via the stardag Registry.
  • Common patterns and best practices.