Apache Beam is a unified programming model for batch and streaming data processing. This image provides the Python SDK for Apache Beam, which offers a simple, powerful API for building batch and streaming data processing pipelines.
The image is available on cgr.dev:
Get started with the Beam Python SDK quickstart to set up your Python development environment, get the Beam SDK for Python, and run an example pipeline.
To get you started quickly, here is an example that uses DirectRunner. The Apache Beam examples directory contains many more examples, all of which can be run locally by passing the required arguments described in each example script.
For example, run wordcount.py with the following command:
Here ${OUTPUT_DIR} is any directory you want the output copied to. The output file has the form ${OUTPUT_DIR}/part-00000-of-00001. You can list the output directory with ls and inspect that file to see each word and how many times it occurred in the input file.
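For orientation, a word-count pipeline on DirectRunner can be sketched roughly as below. This is a minimal sketch in the spirit of the wordcount example, not the script itself: the helper names `tokenize`, `format_result`, and `run` are hypothetical, and the `word: count` output format is an assumption.

```python
import re


def tokenize(line):
    """Split a line of text into words (word characters and apostrophes)."""
    return re.findall(r"[\w']+", line)


def format_result(word, count):
    """Render one (word, count) pair as a line of output text (assumed format)."""
    return f"{word}: {count}"


def run(input_path, output_prefix):
    """Count words in input_path and write sharded text files under output_prefix."""
    # Imported here so the helpers above stay usable without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # DirectRunner executes the pipeline locally, in the current process.
    options = PipelineOptions(["--runner=DirectRunner"])
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText(input_path)
            | "Split" >> beam.FlatMap(tokenize)
            | "Count" >> beam.combiners.Count.PerElement()
            | "Format" >> beam.MapTuple(format_result)
            | "Write" >> beam.io.WriteToText(output_prefix)
        )
```

Calling `run("input.txt", "out/part")` (both paths are placeholders) would write text files whose names begin with `out/part`, one `word: count` line per distinct word.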
Another example, which uses PortableRunner, is documented in our TESTING.md. You can also refer to the official docs for further experiments.
Chainguard Images contain software packages that are direct or transitive dependencies. The following licenses were found in the "latest" version of this image:
Apache-2.0
BSD-3-Clause
FTL
GCC-exception-3.1
GPL-2.0-only
GPL-2.0-or-later
GPL-3.0-or-later
For a complete list of licenses, please refer to this Image's SBOM.