The Coinbase Blog
Sep 1
Tl;dr: This blog post describes how we developed an efficient, reliable Python ecosystem using Pants, an open source build system, and solved the challenge of managing Python applications at a large scale at Coinbase.
By The Coinbase Compute Platform Team
Python is one of the most frequently used programming languages for data scientists, machine learning practitioners, and blockchain researchers at Coinbase. Over the past few years, we have witnessed a growth of Python applications that aim to solve many challenging problems in the cryptocurrency world like Airflow data pipelines, blockchain analytics tools, machine learning applications, and many others. Based on our internal data, the number of Python applications has almost doubled since Q3, 2022. According to our internal data, today there are approximately 1,500 data processing pipelines and services developed with Python. The total number of builds is around 500 per week at the time of writing. We foresee an even wider application as more Python centric frameworks (such as Ray, Modin, DASK, etc.) are adopted into our data ecosystem.
Engineering success comes largely from choosing the right tools. Building a large-scale Python ecosystem to support our growing engineering requirements could raise some challenges, including using a reliable build system, flexible dependency management, fast software release, and consistent code quality check. However, these challenges can be combated by integrating Pants, a build system developed by Toolchain labs, into the Coinbase build infrastructure. We chose this as the Python build system for the following reasons:
Python is one of the most popular programming languages for machine learning and data science applications. However, prior to adopting the Python-first build system, Pants, our internal investment in the Python ecosystem was low in comparison to that of Golang and Ruby — the primary choice for writing services and web applications at Coinbase.
According to the usage statistics of Coinbase’s monorepo, Python today accounts for only 4% of the usage because of lack of build system support. Before 2021, most of the Python projects were in multiple repositories without a unified build infrastructure — leading to the following issues:
We decided to build PyNest — a new Python “monorepo” for the data organization at Coinbase. It is not our intention for PyNest to be use as a monorepo for the entire company, but rather that the repository is used for projects within the data organization.
Figure 1. Dependency graph for machine learning platform (MLP) projects.
The graph below shows the repository architecture at Coinbase, where the green blocks indicate the new Python ecosystem we have built. Inter-repository operability is achieved by serving layers including the code artifacts and schema registry.
Figure 2. Repository architecture at Coinbase
# third-party dependencies
Figure 3. Pynest repository structure
The following is a list of the major elements of the repository and their explanations.
1. 3rdparty
Third party dependencies are placed under this folder. Pants will parse the requirements.txt files and automatically generate the “python_requirement” target for each of the dependencies. Multiple versions of the same dependency are supported by the multiple lockfiles feature of Pants. This feature makes it possible for projects to have conflicts in either direct or transitive dependencies. Pants generates lockfiles to pin every dependency and ensure a reproducible build. More explanations of the pants multiple lock is in the dependency management section.
2. Lib
Shared libraries accessible to all the projects. Projects within PyNest can directly import the source code. For projects outside PyNest, the libraries can be accessed via pip installing the wheel files from an internal PyPI server.
3. Project folders
Individual projects live in this folder. The folder path is formatted as “{project_name}/{src or test}/python/{namespace}”. The source root is configured as “src/python” or “test/python”, and the underneath namespace is used to isolate the modules.
4. Code owner files
Code owner files (OWNERS) are added to the folders to define the individuals or teams that are responsible for the code in the folder tree. The CI workflow invokes a script to compile all the OWNERS files into a CODEOWNERS file under “.github/”. Code owner approval rule requires all pull requests to have at least one approval from the group of code owners before they can be merged.
5. Tools
Tools folder contains the configuration files for the code quality tools, e.g. flake8, black, isort, mypy, etc. These files are referenced by Pants to configure the linters.
6. Buildkite workflow
Coinbase uses Buildkite as the CI platform. The Buildkite workflow and the hook definitions are defined in this folder. The CI workflow defines the steps such as
7. Dockerfiles
Dockerfiles are defined in this folder. The docker images are built by the CI workflow and deployed by Codeflow — an internal deployment platform at Coinbase.
8. Pants libraries
This folder contains the Pants script and the configuration files (pants.toml,
This article describes how we build PyNest using the Pants build system. In our next blog post, we will explain dependency management and CI/CD.

Learn about working at Coinbase:
Jo Colina
Chatbots Magazine
Neha M
General Tso
Josh Cole
Dev Genius
Susan Were
Michael Corleon
Our mission is to increase economic freedom in the world. Visit us at
Zion Schum
Framework Ventures
Tony Yiu
Alpha Beta Blog
Richard Knight


/ Uncategorized

Leave a Reply

Your email address will not be published. Required fields are marked *