​

apache-beam-python-sdk


Apache Beam is a unified programming model for batch and streaming data processing. This image provides the Apache Beam Python SDK, which offers a simple, powerful API for building batch and streaming data processing pipelines.

Download this Image

The image is available on cgr.dev:

docker pull cgr.dev/chainguard/apache-beam-python-sdk:py3.11-v2.58

Get started with the Beam Python SDK quickstart to set up your Python development environment, get the Beam SDK for Python, and run an example pipeline.

To get you started quickly, the example below uses the DirectRunner, which executes the pipeline locally inside the container. The Apache Beam examples directory contains many more examples. Each can be run locally by passing the required arguments described in its example script.

For example, run wordcount.py with the following command:

```bash
container_id=$(docker run -d \
    --entrypoint bash \
    "cgr.dev/chainguard/apache-beam-python-sdk:py3.11-v2.58" \
    -c "
      echo 'Running Apache Beam wordcount with DirectRunner...';
      python -m apache_beam.examples.wordcount \
        --input=/etc/profile \
        --output=/output/part \
        --runner=DirectRunner;
      echo 'Listing /output inside container:';
      ls -l /output;
    ")
```
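Once the container has exited, copy the output files from the container to the local system:

```bash
docker wait "${container_id}"                          # block until the container exits
docker cp "${container_id}:/output/." "${OUTPUT_DIR}"  # "/." copies the directory's contents
```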

Here ${OUTPUT_DIR} is the directory you want the output copied to. The output file has the form ${OUTPUT_DIR}/part-00000-of-00001. Viewing that file shows each word found in the input file together with the number of times it occurred.
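
As a quick way to inspect the copied result, the file can be sorted by count. This is a minimal sketch, assuming each output line has the form word: count (which is how the wordcount example formats its results). The name top_words is a hypothetical helper introduced here for illustration; it is not part of the image or of Apache Beam.

```bash
# Hypothetical helper: print the N most frequent words from a
# wordcount output file. Assumes lines of the form "word: count".
top_words() {
  local file="$1"
  local n="${2:-10}"   # default to the top 10 entries
  # Split on ":" and sort numerically on the count field, largest first.
  sort -t: -k2,2nr "${file}" | head -n "${n}"
}

# Example usage (OUTPUT_DIR is the directory passed to docker cp):
# top_words "${OUTPUT_DIR}/part-00000-of-00001" 5
```

Sorting on the second colon-separated field keeps the helper independent of the input text, since the example's word pattern does not produce words containing a colon.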

Another example, using the PortableRunner, is documented in our TESTING.md. You can also refer to the official Apache Beam documentation for further experiments.

Licenses

Chainguard Images contain software packages that are direct or transitive dependencies. The following licenses were found in the "latest" version of this image:

  • Apache-2.0

  • BSD-3-Clause

  • FTL

  • GCC-exception-3.1

  • GPL-2.0-only

  • GPL-2.0-or-later

  • GPL-3.0-or-later

For a complete list of licenses, please refer to this Image's SBOM.

© 2024 Chainguard. All Rights Reserved.