Boosting Data Engineering Workflows with Dev Containers in Visual Studio Code
Maialen | September 21, 2025As a developer, have you ever encountered the infamous “it works on my machine” problem?
This is exactly what Dev Containers solve. A development container is a running Docker container that is fully configured with a specific toolset, runtime, and environment settings required for a project.
For Data Engineers, this concept is particularly vital because our work involves complex stacks:
- Multiple Runtimes: Juggling Python versions alongside the Java Virtual Machine (JVM) needed for Spark.
- Cloud Integration: Needing various CLIs (
awscli,gcloud, etc.) and Infrastructure-as-Code tools (terraform). - Dependency Conflicts: Managing project-specific library versions without polluting your local machine.
By using Visual Studio Code Dev Containers, I’ve managed to create a reproducible, portable, and fully configured environment tailored specifically for my data projects.
In this post, I’ll walk you through how to set up a devcontainer, sharing an example of a configuration—complete with Python, Spark, AWS CLI, and Terraform—that helps me be more productive and avoid dependency hell.
What’s Inside My Dev Container?
A .devcontainer folder can contain multiple files. In this example, I will focus on three key files that bring everything together, ensuring a consistent environment:
1. Dockerfile – Custom Python Environment for Data Engineering
This file defines the base image and installs everything needed for a comprehensive Data Engineering workflow.
- Core Runtimes:
- Python (specific version): Ensures version consistency across team members.
- OpenJDK: Crucial, as distributed processing tools like Spark run on the JVM.
- System Dependencies:
curl,unzip,build-essential, and database connectors. - Spark Integration: I set up the
Dockerfileto download and configure Spark, ensuring environment variables like$JAVA_HOME$v and$SPARK_HOME$are correctly defined and added to the$PATH$. - Python Libraries:
pyspark– for distributed data processing.pandas– for efficient data wrangling.- Others.
- Tools:
awscli– for interacting with AWS services.terraform– for infrastructure as code (IaC).
This complete setup means we have everything from running Spark jobs to deploying infrastructure ready to go in one isolated container.
2. devcontainer.json – VS Code Extensions and Workspace Configuration
This JSON file brings the VS Code experience to life inside the container.
- Workspace Settings: Configures VS Code’s integrated terminal to use Bash as the default shell and adjusts SSL settings for working easily with proxies.
- Mount Points: Defines local directories or secrets to be attached to the container, providing flexibility while maintaining isolation.
- Extensions for Productivity: Visual Studio Extensions are installed directly into the container, ensuring that all developers have the same debugging and development tools. Here is a list of some interesting extensions:
| Category | Extension | Functionality Focus |
|---|---|---|
| Data/Python | ms-python.vscode-pylance |
Fast and precise type checking and advanced IntelliSense. |
| Cloud/AWS | AmazonWebServices.aws-toolkit-vscode |
Explore cloud resources, manage credentials, and invoke services from VS Code. |
| IaC/DevOps | hashicorp.terraform |
Full support for Terraform configuration files, including syntax highlighting and validation. |
| Containers | ms-azuretools.vscode-docker |
Manage Docker containers, images, and networks through the VS Code UI. |
| Configuration | redhat.vscode-yaml |
Schema-based validation and formatting for YAML files. |
| Readability | oderwat.indent-rainbow |
Uses color coding to visually distinguish levels of indentation, enhancing readability. |
Other Useful Extensions: While the table above covers the most impactful extensions for core productivity, countless other extensions can be a game-changer depending on your specific needs, such as:
andyyaldoo.vscode-json(JSON files formatting support),wayou.vscode-todo-highlight(hightlight keywords such asTODO), and others.
3. .bashrc – Interactive and Smart Terminal
A custom .bashrc file supercharges the terminal experience inside the container:
- Persistent command history – Never lose track of useful, project-specific commands.
- Custom prompt – Shows user, host, working directory, and current Git branch.
- Aliases and autocomplete – Faster access to commonly used commands, including AWS CLI.
The result is a terminal that’s not just functional, but a joy to use.
Final Thoughts
Using devcontainers in VS Code has completely changed how I approach project setup. For data engineering work that spans multiple tools, languages, and cloud services, having a well-defined environment is invaluable.It provides:
- Consistency: Every collaborator gets the exact same environment.
- Isolation: Your local machine stays clean and free of conflicting packages.
- Productivity: Everything is pre-configured—no more time wasted on environment setup.
If you’re a data engineer looking for a scalable way to manage your dev environment, I highly recommend giving devcontainers a try.
Got questions or want to share your setup? Reach out to me on my GitHub profile. 📩😊