step_pipeline package¶
Submodules¶
step_pipeline.batch module¶
This module contains Hail Batch-specific extensions of the Pipeline and Step classes
- class step_pipeline.batch.BatchStepType(value)¶
Bases:
enum.Enum
Constants that represent different Batch Step types.
- PYTHON = 'python'¶
- BASH = 'bash'¶
- class step_pipeline.batch.BatchPipeline(name=None, config_arg_parser=None, backend=Backend.HAIL_BATCH_SERVICE)¶
Bases:
step_pipeline.pipeline.Pipeline
This class contains Hail Batch-specific extensions of the Pipeline class
- property backend¶
Returns either Backend.HAIL_BATCH_SERVICE or Backend.HAIL_BATCH_LOCAL
- new_step(name=None, step_number=None, arg_suffix=None, depends_on=None, image=None, cpu=None, memory=None, storage=None, always_run=False, timeout=None, output_dir=None, reuse_job_from_previous_step=None, localize_by=Localize.COPY, delocalize_by=Delocalize.COPY)¶
Creates a new pipeline Step.
- Parameters
name (str) – A short name for this Step.
step_number (int) – Optional Step number which serves as another alias for this step in addition to name.
arg_suffix (str) – Optional suffix for the command-line args that will be created for forcing or skipping execution of this Step.
depends_on (Step) – Optional upstream Step that this Step depends on.
image (str) – Docker image to use for this Step.
cpu (str, float, int) – CPU requirements. If the value is numeric, it is interpreted as a number of CPU cores.
memory (str, float, int) – Memory requirements. The memory expression must be of the form {number}{suffix} where valid optional suffixes are K, Ki, M, Mi, G, Gi, T, Ti, P, and Pi. Omitting a suffix means the value is in bytes. For the ServiceBackend, the values ‘lowmem’, ‘standard’, and ‘highmem’ are also valid arguments. ‘lowmem’ corresponds to approximately 1 Gi/core, ‘standard’ corresponds to approximately 4 Gi/core, and ‘highmem’ corresponds to approximately 7 Gi/core. The default value is ‘standard’.
storage (str, int) – Disk size. The storage expression must be of the form {number}{suffix} where valid optional suffixes are K, Ki, M, Mi, G, Gi, T, Ti, P, and Pi. Omitting a suffix means the value is in bytes. For the ServiceBackend, jobs requesting one or more cores receive 5 GiB of storage for the root file system /. Jobs requesting a fraction of a core receive the same fraction of 5 GiB of storage. If you need additional storage, you can explicitly request more storage using this method and the extra storage space will be mounted at /io. Batch automatically writes all ResourceFile to /io. The default storage size is 0 Gi. The minimum storage size is 0 Gi and the maximum storage size is 64 Ti. If storage is set to a value between 0 Gi and 10 Gi, the storage request is rounded up to 10 Gi. All values are rounded up to the nearest Gi.
always_run (bool) – Set the Step to always run, even if dependencies fail.
timeout (float, int) – Set the maximum amount of time this job can run for before being killed.
output_dir (str) – Optional default output directory for Step outputs.
reuse_job_from_previous_step (Step) – Optionally, reuse the batch.Job object from this other upstream Step.
localize_by (Localize) – If specified, this will be the default Localize approach used by Step inputs.
delocalize_by (Delocalize) – If specified, this will be the default Delocalize approach used by Step outputs.
- Returns
The new BatchStep object.
- Return type
BatchStep
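For illustration, here is a minimal sketch of creating a Step with explicit resource requests. The import path follows the module layout documented on this page, and the image name and values are placeholders:

from step_pipeline.main import pipeline

bp = pipeline("my pipeline")   # uses Backend.HAIL_BATCH_SERVICE by default, returning a BatchPipeline
s1 = bp.new_step(
    "align reads",
    step_number=1,
    image="my-org/bwa:latest",   # placeholder Docker image
    cpu=4,
    memory="standard",
    storage="20Gi",
)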
- gcloud_project(gcloud_project)¶
Set the requester-pays project.
- Parameters
gcloud_project (str) – The name of the Google Cloud project to be billed when accessing requester-pays buckets.
- cancel_after_n_failures(cancel_after_n_failures)¶
Set the cancel_after_n_failures value.
- Parameters
cancel_after_n_failures (int) – Automatically cancel the batch after N failures have occurred.
- default_image(default_image)¶
Set the default Docker image to use for Steps in this pipeline.
- Parameters
default_image (str) – Default Docker image to use for Bash jobs. This must be the full name of the image, including any repository prefix and tags if desired.
- default_python_image(default_python_image)¶
Set the default image for Python Jobs.
- Parameters
default_python_image (str) – The Docker image to use for Python jobs. The image specified must have the dill package installed. If default_python_image is not specified, then a Docker image will automatically be created for you with the base image hailgenetics/python-dill:[major_version].[minor_version]-slim and the Python packages specified by python_requirements will be installed. The default name of the image is batch-python with a random string for the tag unless python_build_image_name is specified. If the ServiceBackend is the backend, the locally built image will be pushed to the repository specified by image_repository.
- default_memory(default_memory)¶
Set the default memory usage.
- Parameters
default_memory (int, str) – Memory setting to use by default if not specified by a Step. Only applicable if a docker image is specified for the LocalBackend or the ServiceBackend. See Job.memory().
- default_cpu(default_cpu)¶
Set the default cpu requirement.
- Parameters
default_cpu (float, int, str) – CPU setting to use by default if not specified by a job. Only applicable if a docker image is specified for the LocalBackend or the ServiceBackend. See Job.cpu().
- default_storage(default_storage)¶
Set the default storage disk size.
- Parameters
default_storage (str, int) – Storage setting to use by default if not specified by a job. Only applicable for the ServiceBackend. See Job.storage().
- default_timeout(default_timeout)¶
Set the default job timeout duration.
- Parameters
default_timeout – Maximum time in seconds for a job to run before being killed. Only applicable for the ServiceBackend. If None, there is no timeout.
- run()¶
Batch-specific code for submitting the pipeline to the Hail Batch backend
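Putting the setters above together, here is a sketch of configuring pipeline-wide defaults before defining Steps and submitting the pipeline. The image name and values are placeholders:

from step_pipeline.main import pipeline

bp = pipeline("my pipeline")
bp.default_image("ubuntu:22.04")   # placeholder default image for Bash jobs
bp.default_cpu(1)
bp.default_memory("standard")
bp.default_storage("10Gi")
bp.default_timeout(3600)           # seconds
bp.cancel_after_n_failures(3)
# ... define Steps with bp.new_step(..) ...
bp.run()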
- class step_pipeline.batch.BatchStep(pipeline, name=None, step_number=None, arg_suffix=None, image=None, cpu=None, memory=None, storage=None, always_run=False, timeout=None, output_dir=None, reuse_job_from_previous_step=None, localize_by=Localize.COPY, delocalize_by=Delocalize.COPY)¶
Bases:
step_pipeline.pipeline.Step
This class contains Hail Batch-specific extensions of the Step class
- cpu(cpu)¶
Set the CPU requirement for this Step.
- Parameters
cpu (str, float, int) – CPU requirements. If the value is numeric, it is interpreted as a number of CPU cores.
- memory(memory)¶
Set the memory requirement for this Step.
- Parameters
memory (str, float, int) – Memory requirements. The memory expression must be of the form {number}{suffix} where valid optional suffixes are K, Ki, M, Mi, G, Gi, T, Ti, P, and Pi. Omitting a suffix means the value is in bytes. For the ServiceBackend, the values ‘lowmem’, ‘standard’, and ‘highmem’ are also valid arguments. ‘lowmem’ corresponds to approximately 1 Gi/core, ‘standard’ corresponds to approximately 4 Gi/core, and ‘highmem’ corresponds to approximately 7 Gi/core. The default value is ‘standard’.
- storage(storage)¶
Set the disk size for this Step.
- Parameters
storage (str, int) – Disk size. The storage expression must be of the form {number}{suffix} where valid optional suffixes are K, Ki, M, Mi, G, Gi, T, Ti, P, and Pi. Omitting a suffix means the value is in bytes. For the ServiceBackend, jobs requesting one or more cores receive 5 GiB of storage for the root file system /. Jobs requesting a fraction of a core receive the same fraction of 5 GiB of storage. If you need additional storage, you can explicitly request more storage using this method and the extra storage space will be mounted at /io. Batch automatically writes all ResourceFile to /io. The default storage size is 0 Gi. The minimum storage size is 0 Gi and the maximum storage size is 64 Ti. If storage is set to a value between 0 Gi and 10 Gi, the storage request is rounded up to 10 Gi. All values are rounded up to the nearest Gi.
- always_run(always_run)¶
Set the always_run parameter for this Step.
- Parameters
always_run (bool) – Set the Step to always run, even if dependencies fail.
- timeout(timeout)¶
Set the timeout for this Step.
- Parameters
timeout (float, int) – Set the maximum amount of time this job can run for before being killed.
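These setters can also be used to adjust requirements on a Step after it has been created. A sketch, assuming bp is a BatchPipeline as in the earlier examples and using placeholder values:

s = bp.new_step("joint genotyping")
s.cpu(8)
s.memory("highmem")
s.storage("100Gi")
s.timeout(4 * 3600)   # kill the job if it runs for more than 4 hours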
step_pipeline.constants module¶
step_pipeline.io module¶
This module contains classes and methods related to data input & output.
- class step_pipeline.io.Localize(value)¶
Bases:
enum.Enum
Constants that represent different options for how to localize files into the running container. Each 2-tuple consists of a name for the localization approach and a subdirectory in which localized files will be placed.
- COPY = ('copy', 'local_copy')¶
COPY uses the execution backend’s default approach to localizing files
- GSUTIL_COPY = ('gsutil_copy', 'local_copy')¶
GSUTIL_COPY runs ‘gsutil cp’ to localize file(s) from a google bucket path. This requires gsutil to be available inside the execution container.
- HAIL_HADOOP_COPY = ('hail_hadoop_copy', 'local_copy')¶
HAIL_HADOOP_COPY uses the Hail hadoop API to copy file(s) from a google bucket path. This requires python3 and Hail to be installed inside the execution container.
- HAIL_BATCH_GCSFUSE = ('hail_batch_gcsfuse', 'gcsfuse')¶
HAIL_BATCH_GCSFUSE uses the Hail Batch gcsfuse function to mount a google bucket into the execution container as a network drive, without copying the files. The Hail Batch service account must have read access to the bucket.
- HAIL_BATCH_GCSFUSE_VIA_TEMP_BUCKET = ('hail_batch_gcsfuse_via_temp_bucket', 'gcsfuse')¶
HAIL_BATCH_GCSFUSE_VIA_TEMP_BUCKET is useful for situations where you’d like to use gcsfuse to localize files and your personal gcloud account has read access to the source bucket, but the Hail Batch service account cannot be granted read access to that bucket. Since it’s possible to run ‘gsutil cp’ under your personal credentials within the execution container, while Hail Batch gcsfuse always runs under the Hail Batch service account credentials, this workaround 1) runs ‘gsutil cp’ under your personal credentials to copy the source files to a temporary bucket that you control and to which you have granted the Hail Batch service account read access, 2) uses gcsfuse to mount the temporary bucket, 3) performs the computational steps on the mounted data, and 4) deletes the copied files from the temporary bucket when the Batch job completes.
This localization approach may be useful for situations where you need a large number of jobs and each job processes a very small piece of a large data file (eg. a few loci in a cram file).
Copying the large file(s) from the source bucket to a temp bucket in the same region is fast and inexpensive, and only needs to happen once before the jobs run. Each job can then avoid allocating a large disk, and waiting for the large file to be copied into the container. This approach requires gsutil to be available inside the execution container.
- get_subdir_name()¶
Returns the subdirectory name passed to the constructor
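As a sketch of how these constants are used, a Localize value can be passed to step.input(..) (or to new_step(..) as the Step-wide default). The image, path, and command below are placeholders, and local_path is the InputSpec property documented later on this page:

from step_pipeline.io import Localize

s = bp.new_step("count reads", image="my-org/samtools:latest")   # placeholder image
cram = s.input("gs://my-bucket/sample1.cram", localize_by=Localize.GSUTIL_COPY)
s.command(f"samtools view -c {cram.local_path} > sample1.read_count.txt")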
- class step_pipeline.io.Delocalize(value)¶
Bases:
enum.Enum
Constants that represent different options for how to delocalize file(s) from a running container.
- COPY = 'copy'¶
COPY uses the execution backend’s default approach to delocalizing files
- GSUTIL_COPY = 'gsutil_copy'¶
GSUTIL_COPY runs ‘gsutil cp’ to copy the path to a google bucket destination. This requires gsutil to be available inside the execution container.
- HAIL_HADOOP_COPY = 'hail_hadoop_copy'¶
HAIL_HADOOP_COPY uses the hail hadoop API to copy file(s) to a google bucket path. This requires python3 and hail to be installed inside the execution container.
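Similarly, a Delocalize value can be passed to step.output(..) to control how an output file is copied out of the container. A sketch with placeholder paths:

from step_pipeline.io import Delocalize

s.output(
    "sample1.read_count.txt",
    output_dir="gs://my-bucket/results",   # placeholder destination directory
    delocalize_by=Delocalize.GSUTIL_COPY,
)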
- class step_pipeline.io.InputType(value)¶
Bases:
enum.Enum
Constants that represent the type of a step.input_value(..) arg.
- STRING = 'string'¶
- FLOAT = 'float'¶
- INT = 'int'¶
- BOOL = 'boolean'¶
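A sketch of passing a non-file input to a Step along with its type (the value and name are placeholders):

from step_pipeline.io import InputType

af_threshold = s.input_value(0.01, name="af-threshold", input_type=InputType.FLOAT)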
- class step_pipeline.io.InputSpecBase(name=None)¶
Bases:
abc.ABC
This is the InputSpec parent class, with subclasses implementing specific types of input specs which contain metadata about inputs to a Pipeline Step.
- property name¶
- property uuid¶
- class step_pipeline.io.InputValueSpec(value=None, name=None, input_type=InputType.STRING)¶
Bases:
step_pipeline.io.InputSpecBase
An InputValueSpec stores metadata about an input that’s not a file path
- property value¶
- property input_type¶
- class step_pipeline.io.InputSpec(source_path=None, name=None, localize_by=None, localization_root_dir=None)¶
Bases:
step_pipeline.io.InputSpecBase
An InputSpec stores metadata about an input file or directory
- property source_path¶
- property source_bucket¶
- property source_path_without_protocol¶
- property source_dir¶
- property filename¶
- property local_path¶
- property local_dir¶
- property localize_by¶
- class step_pipeline.io.OutputSpec(local_path=None, output_dir=None, output_path=None, name=None, delocalize_by=None)¶
Bases:
object
An OutputSpec stores metadata about an output file or directory from a Step
- property output_path¶
- property output_dir¶
- property filename¶
- property name¶
- property local_path¶
- property local_dir¶
- property delocalize_by¶
step_pipeline.main module¶
This module contains the pipeline(..) function which is the main gateway for users to access the functionality in the step_pipeline library
- step_pipeline.main.pipeline(name=None, backend=Backend.HAIL_BATCH_SERVICE, config_file_path='~/.step_pipeline')¶
Creates a pipeline object.
Usage:
with step_pipeline("my pipeline") as sp:
    s = sp.new_step(..)
    ... step definitions ...

# or alternatively:

sp = step_pipeline("my pipeline")
s = sp.new_step(..)
... step definitions ...
sp.run()
- Parameters
name (str) – Pipeline name.
backend (Backend) – The backend to use for executing the pipeline.
config_file_path (str) – path of a configargparse config file.
- Returns
An object that you can use to create Steps by calling .new_step(..) and then execute the pipeline by calling .run()
- Return type
Pipeline
step_pipeline.pipeline module¶
- class step_pipeline.pipeline.Pipeline(name=None, config_arg_parser=None)¶
Bases:
abc.ABC
Pipeline represents the execution pipeline. This base class contains only generalized code that is not specific to any particular execution backend. It has public methods for creating Steps, as well as some private methods that implement the general aspects of traversing the execution graph (DAG) and transferring all steps to a specific execution backend.
- get_config_arg_parser()¶
Returns the configargparse.ArgumentParser object used by the Pipeline to define command-line args. This is a drop-in replacement for argparse.ArgumentParser with some extra features such as support for config files and environment variables. See https://github.com/bw2/ConfigArgParse for more details. You can use this to add and parse your own command-line arguments the same way you would using argparse. For example:
p = pipeline.get_config_arg_parser()
p.add_argument("--my-arg")
args = pipeline.parse_args()
- parse_args()¶
Parse command line args defined up to this point. This method can be called more than once.
- Returns
argparse args object.
- abstract new_step(name, step_number=None)¶
Creates a new pipeline Step. Subclasses must implement this method.
- Parameters
name (str) – A short name for the step.
step_number (int) – Optional step number.
- gcloud_project(gcloud_project)¶
- cancel_after_n_failures(cancel_after_n_failures)¶
- default_image(default_image)¶
- default_python_image(default_python_image)¶
- default_memory(default_memory)¶
- default_cpu(default_cpu)¶
- default_storage(default_storage)¶
- default_timeout(default_timeout)¶
- default_output_dir(default_output_dir)¶
Set the default output_dir for pipeline Steps.
- Parameters
default_output_dir (str) – Output directory
- abstract run()¶
Submits a pipeline to an execution engine such as Hail Batch. Subclasses must implement this method. They should use this method to perform initialization of the specific execution backend and then call self._transfer_all_steps(..).
- check_input_glob(glob_path)¶
This method is useful for checking the existence of multiple input files and caching the results. The glob_path can use glob syntax (ie. wildcards, as in gs://bucket/**/sample*.cram).
- Parameters
glob_path (str) – local file path or gs:// Google Storage path. The path can contain wildcards (*).
- Returns
List of metadata dicts like:
[
    {
        'path': 'gs://bucket/dir/file.bam.bai',
        'size_bytes': 2784,
        'modification_time': 'Wed May 20 12:52:01 EDT 2020',
    },
]
- Return type
list
- export_pipeline_graph(output_svg_path=None)¶
Renders the pipeline execution graph diagram based on the Steps defined so far.
- Parameters
output_svg_path (str) – Path where to write the SVG image with the execution graph diagram. If not specified, it will be based on the pipeline name.
- class step_pipeline.pipeline.Step(pipeline, name, step_number=None, arg_suffix=None, output_dir=None, localize_by=None, delocalize_by=None, add_force_command_line_args=True, add_skip_command_line_args=True)¶
Bases:
abc.ABC
Represents a set of commands or sub-steps which together produce some output file(s), and which can be skipped if the output files already exist (and are newer than any input files, unless a --force arg is used). A Step’s input and output files must be stored in some persistent location, like a local disk or GCS.
Using Hail Batch as an example, a Step typically corresponds to a single Hail Batch Job. Sometimes a Job can be reused to run multiple steps (for example, where step 1 creates a VCF and step 2 tabixes it).
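A sketch of the VCF/tabix example above, written as two Steps that share a single Hail Batch Job via the reuse_job_from_previous_step arg of BatchPipeline.new_step(..). The image, commands, and paths are placeholders:

s1 = bp.new_step("create vcf", image="my-org/bcftools:latest")   # placeholder image
s1.command("run_variant_calling.sh > variants.vcf.gz")           # placeholder command
s1.output("variants.vcf.gz", output_dir="gs://my-bucket/results")

s2 = bp.new_step("tabix vcf", reuse_job_from_previous_step=s1)
s2.command("tabix variants.vcf.gz")
s2.output("variants.vcf.gz.tbi", output_dir="gs://my-bucket/results")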
- name(name)¶
Set the short name for this Step.
- Parameters
name (str) – Name
- command(command)¶
Add a shell command to this Step.
- Parameters
command (str) – A shell command to execute as part of this Step
- input_glob(glob_path, name=None, localize_by=None)¶
Specify input file(s) to this Step using glob syntax (ie. using wildcards as in gs://bucket/**/sample*.cram)
- Parameters
glob_path (str) – The path of the input file(s) or directory to localize, optionally including wildcards.
name (str) – Optional name for this input.
localize_by (Localize) – How this path should be localized.
- Returns
An object that describes the specified input file or directory.
- Return type
InputSpec
- input_value(value=None, name=None, input_type=None)¶
Specify a Step input that is something other than a file path.
- Parameters
value – The input’s value.
name (str) – Optional name for this input.
input_type (InputType) – The value’s type.
- Returns
An object that contains the input value, name, and type.
- Return type
InputValueSpec
- input(source_path=None, name=None, localize_by=None)¶
Specifies an input file or directory.
- inputs(source_path, *source_paths, name=None, localize_by=None)¶
Specifies one or more input file or directory paths.
- Parameters
source_path (str) – Path of input file or directory to localize.
name (str) – Optional name to apply to all these inputs.
localize_by (Localize) – How these paths should be localized.
- Returns
A list of InputSpec objects that describe these input files or directories. The list will contain one entry for each passed-in source path.
- Return type
list
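For example, a sketch of declaring several inputs at once, with placeholder paths:

from step_pipeline.io import Localize

cram, crai = s.inputs(
    "gs://my-bucket/sample1.cram",
    "gs://my-bucket/sample1.cram.crai",
    localize_by=Localize.COPY,
)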
- use_the_same_inputs_as(other_step, localize_by=None)¶
Copy the inputs of another step, while optionally changing the localize_by approach. This is a utility method to make it easier to specify inputs for a Step that is very similar to a previously-defined step.
- Parameters
other_step (Step) – The other Step whose inputs should be copied.
localize_by (Localize) – Optionally override how the copied inputs should be localized.
- Returns
A list of new InputSpec objects that describe the inputs copied from other_step. The returned list will contain one entry for each input of other_step.
- Return type
list
- use_previous_step_outputs_as_inputs(previous_step, localize_by=None)¶
Define Step inputs to be the output paths of an upstream Step and explicitly mark this Step as downstream of previous_step by calling self.depends_on(previous_step).
- Parameters
previous_step (Step) – The upstream Step whose outputs should be used as inputs to this Step.
localize_by (Localize) – Optionally specify how these inputs should be localized.
- Returns
A list of new InputSpec objects that describe the inputs defined based on the outputs of previous_step. The returned list will contain one entry for each output of previous_step.
- Return type
list
- output_dir(path)¶
Set the default output directory for this Step. If an output path is specified as a relative path, it will be interpreted relative to this directory.
- Parameters
path (str) – Directory path.
- output(local_path, output_path=None, output_dir=None, name=None, delocalize_by=None)¶
Specify a Step output file or directory.
- Parameters
local_path (str) – The file or directory path within the execution container’s file system.
output_path (str) – Optional destination path to which the local_path should be delocalized.
output_dir (str) – Optional destination directory to which the local_path should be delocalized. It is expected that either output_path will be specified, or an output_dir value will be provided as an argument to this method or previously (such as by calling the step.output_dir(..) setter method). If both output_path and output_dir are specified and output_path is a relative path, it is interpreted as being relative to output_dir.
name (str) – Optional name for this output.
delocalize_by (Delocalize) – How this path should be delocalized.
- Returns
An object describing this output.
- Return type
OutputSpec
- outputs(local_path, *local_paths, output_dir=None, name=None, delocalize_by=None)¶
Define one or more outputs.
- Parameters
local_path (str) – The file or directory path within the execution container’s file system.
output_dir (str) – Optional destination directory to which the given local_path(s) should be delocalized.
name (str) – Optional name for the output(s).
delocalize_by (Delocalize) – How the path(s) should be delocalized.
- Returns
A list of OutputSpec objects that describe these outputs. The list will contain one entry for each passed-in path.
- Return type
list
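For example, a sketch of declaring several outputs that share a destination directory, with placeholder paths:

vcf_spec, tbi_spec = s.outputs(
    "variants.vcf.gz",
    "variants.vcf.gz.tbi",
    output_dir="gs://my-bucket/results",
)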
- depends_on(upstream_step)¶
Marks this Step as being downstream of another Step in the pipeline, meaning that this Step can only run after the upstream_step has completed successfully.
- Parameters
upstream_step (Step) – The upstream Step this Step depends on.
- has_upstream_steps()¶
Returns True if this Step has upstream Steps that must run before it runs (ie. that it depends on)
- post_to_slack(message, channel=None, slack_token=None)¶
Posts the given message to Slack. Requires python3 and pip to be installed in the execution environment.
- Parameters
message (str) – The message to post.
channel (str) – The Slack channel to post to.
slack_token (str) – Slack auth token.
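For example (a sketch; the channel name and the environment variable holding the token are placeholders):

import os

s.post_to_slack(
    "variant calling finished",
    channel="#pipelines",                       # placeholder channel
    slack_token=os.environ.get("SLACK_TOKEN"),  # placeholder token source
)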
- switch_gcloud_auth_to_user_account(gcloud_credentials_path=None, gcloud_user_account=None, gcloud_project=None, debug=False)¶
This method adds commands to this Step to switch gcloud auth from the Batch-provided service account to the user’s personal account.
This is useful if subsequent commands need to access google buckets to which the user’s personal account has access but to which the Batch service account cannot be granted access for whatever reason.
For this to work, you must first:
1. Create a google bucket that only you have access to - for example: gs://weisburd-gcloud-secrets/
2. On your local machine, make sure you’re logged in to gcloud by running:
gcloud auth login
3. Copy your local ~/.config directory (which caches your gcloud auth credentials) to the secrets bucket from step 1:
gsutil -m cp -r ~/.config/ gs://weisburd-gcloud-secrets/
4. Grant your default Batch service account read access to your secrets bucket so it can download these credentials into each docker container.
5. Make sure gcloud & gsutil are installed inside the docker images you use for your Batch jobs.
6. Call this method at the beginning of your batch job:
Example:
step.switch_gcloud_auth_to_user_account(
    "gs://weisburd-gcloud-secrets", "weisburd@broadinstitute.org", "seqr-project")
- Parameters
gcloud_credentials_path (str) – Google bucket path that contains your gcloud auth .config folder.
gcloud_user_account (str) – The user account to activate (ie. “weisburd@broadinstitute.org”).
gcloud_project (str) – This will be set as the default gcloud project within the container.
debug (bool) – Whether to add extra “gcloud auth list” commands that are helpful for troubleshooting issues with the auth steps.
- record_memory_cpu_and_disk_usage(output_dir, time_interval=5, export_json=True, export_graphs=False, install_glances=True)¶
Add commands that run the ‘glances’ python tool to record memory, cpu, disk usage and other profiling stats in the background at regular intervals.
- Parameters
output_dir (str) – Profiling data will be written to this directory.
time_interval (int) – How frequently to update the profiling data files.
export_json (bool) – Whether to export a glances.json file to output_dir.
export_graphs (bool) – Whether to export .svg graphs.
install_glances (bool) – If True, a command will be added to first install the ‘glances’ python library inside the execution container.
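For example (a sketch; the output directory and interval are placeholders):

s.record_memory_cpu_and_disk_usage(
    "gs://my-bucket/profiling",   # placeholder directory for the glances output
    time_interval=10,
    export_json=True,
    export_graphs=True,
)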
step_pipeline.utils module¶
This module contains misc. utility functions used by other modules.
- step_pipeline.utils.are_any_inputs_missing(step, verbose=False)¶
Returns True if any of the Step’s inputs don’t exist
- step_pipeline.utils.are_outputs_up_to_date(step, verbose=False)¶
Returns True if all of the Step’s outputs already exist and are newer than all inputs
- exception step_pipeline.utils.GoogleStorageException¶
Bases:
Exception
- step_pipeline.utils.check_gcloud_storage_region(gs_path, expected_regions=('US', 'US-CENTRAL1'), gcloud_project=None, ignore_access_denied_exception=True, verbose=True)¶
Checks whether the given Google Storage path is located in one of the expected_regions. The default is (“US”, “US-CENTRAL1”) since US-CENTRAL1 is the region where the Hail Batch cluster is located. Localizing data from other regions will be slower and result in egress charges.
- Parameters
gs_path (str) – The google storage gs:// path to check. Only the bucket portion of the path matters, so other parts of the path can contain wildcards (*), etc.
expected_regions (tuple) – a set of acceptable storage regions. If gs_path is not in one of these regions, this method will raise a StorageRegionException.
gcloud_project (str) – (optional) if specified, it will be added to the gsutil command with the -u arg.
ignore_access_denied_exception (bool) – if True, this method returns silently if it encounters an AccessDenied error.
verbose (bool) – print more detailed log output
- Raises
StorageRegionException – If the given gs_path is not stored in one the expected_regions.
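For example, a sketch of validating that an input bucket is co-located with the Hail Batch cluster before submitting jobs (the path and project are placeholders):

from step_pipeline.utils import check_gcloud_storage_region

check_gcloud_storage_region(
    "gs://my-bucket/sample1.cram",
    expected_regions=("US", "US-CENTRAL1"),
    gcloud_project="my-gcp-project",   # placeholder requester-pays project
)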
step_pipeline.wdl module¶
This module contains Cromwell/Terra-specific extensions of the Pipeline and Step classes
- class step_pipeline.wdl.WdlPipeline(name=None, config_arg_parser=None, backend=Backend.TERRA)¶
Bases:
step_pipeline.pipeline.Pipeline
This class extends the Pipeline class to add support for generating a WDL and will later add support for running it using Cromwell or Terra.
- property backend¶
Returns either Backend.CROMWELL or Backend.TERRA
- new_step(name=None, step_number=None, depends_on=None, image=None, cpu=None, memory=None, storage=None, localize_by=Localize.COPY, delocalize_by=Delocalize.COPY, **kwargs)¶
Creates a new pipeline Step.
- Parameters
name (str) – A short name for this Step.
step_number (int) – Optional Step number which serves as another alias for this step in addition to name.
depends_on (Step) – Optional upstream Step that this Step depends on.
image (str) – Docker image to use for this Step.
cpu (str, float, int) – CPU requirements. If the value is numeric, it is interpreted as a number of CPU cores.
memory (str, float, int) – Memory requirements. The memory expression must be of the form {number}{suffix} where valid optional suffixes are K, Ki, M, Mi, G, Gi, T, Ti, P, and Pi. Omitting a suffix means the value is in bytes. For the ServiceBackend, the values ‘lowmem’, ‘standard’, and ‘highmem’ are also valid arguments. ‘lowmem’ corresponds to approximately 1 Gi/core, ‘standard’ corresponds to approximately 4 Gi/core, and ‘highmem’ corresponds to approximately 7 Gi/core. The default value is ‘standard’.
storage (str, int) – Disk size. The storage expression must be of the form {number}{suffix} where valid optional suffixes are K, Ki, M, Mi, G, Gi, T, Ti, P, and Pi. Omitting a suffix means the value is in bytes.
localize_by (Localize) – If specified, this will be the default Localize approach used by Step inputs.
delocalize_by (Delocalize) – If specified, this will be the default Delocalize approach used by Step outputs.
**kwargs – other keyword args can be provided, but are ignored.
- Returns
The new WdlStep object.
- Return type
WdlStep
- run_for_each_row(table)¶
Run the pipeline in parallel for each row of the given table
- run()¶
Generate WDL
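A sketch of generating a WDL by selecting the Terra backend when creating the pipeline. The Backend import location is an assumption (adjust it to match where your install exposes Backend), and the image and paths are placeholders:

from step_pipeline.main import pipeline
from step_pipeline import Backend   # assumed top-level export

wp = pipeline("my wdl pipeline", backend=Backend.TERRA)
s = wp.new_step("say hello", image="ubuntu:22.04", cpu=1)
s.command("echo hello > hello.txt")
s.output("hello.txt", output_dir="gs://my-bucket/results")
wp.run()   # generates the WDL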
- class step_pipeline.wdl.WdlStep(pipeline, name=None, step_number=None, image=None, cpu=None, memory=None, storage=None, output_dir=None, localize_by=Localize.COPY, delocalize_by=Delocalize.COPY)¶
Bases:
step_pipeline.pipeline.Step
This class contains WDL-specific extensions of the Step class
- cpu(cpu)¶
Set the CPU requirement for this Step.
- Parameters
cpu (str, float, int) – CPU requirements. If the value is numeric, it is interpreted as a number of CPU cores.
- memory(memory)¶
Set the memory requirement for this Step.
- Parameters
memory (str, float, int) – Memory requirements. The memory expression must be of the form {number}{suffix} where valid optional suffixes are K, Ki, M, Mi, G, Gi, T, Ti, P, and Pi. Omitting a suffix means the value is in bytes. For the ServiceBackend, the values ‘lowmem’, ‘standard’, and ‘highmem’ are also valid arguments. ‘lowmem’ corresponds to approximately 1 Gi/core, ‘standard’ corresponds to approximately 4 Gi/core, and ‘highmem’ corresponds to approximately 7 Gi/core. The default value is ‘standard’.
- storage(storage)¶
Set the disk size for this Step.
- Parameters
storage (str, int) – Disk size. The storage expression must be of the form {number}{suffix} where valid optional suffixes are K, Ki, M, Mi, G, Gi, T, Ti, P, and Pi.