Boosting Data Engineering Workflows with Dev Containers in Visual Studio Code

As a data engineer, maintaining a consistent development environment across machines and projects can be challenging. That’s where dev containers come in. By using Visual Studio Code Dev Containers, I’ve managed to create a reproducible, portable, and fully configured environment tailored specifically for my data projects.

In this post, I’ll walk you through how I use dev containers in my day-to-day workflow, and how my current setup (Python, Spark, AWS CLI, and Terraform) helps me stay productive and avoid the classic “it works on my machine” syndrome.


🔧 What’s Inside My Dev Container?

My .devcontainer folder contains three key files that bring everything together:

1. Dockerfile – Custom Python Environment for Data Engineering

This file defines the base image and installs everything I need for my workflow: Python, Spark, the AWS CLI, and Terraform.

This setup lets me do everything from running Spark jobs to deploying infrastructure in one container.
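A quick way to confirm that a freshly built container really ships those tools is a small smoke check run from the integrated terminal. The snippet below is a minimal sketch, not part of the actual Dockerfile: the function name `missing_tools` is hypothetical, and it assumes the tools are exposed under their usual command names (`python3`, `java` for Spark's JVM, `spark-submit`, `aws`, `terraform`).

```shell
#!/usr/bin/env bash
# Smoke check: print every command from the argument list that is NOT on PATH.
# The function name and the tool list below are illustrative assumptions.
missing_tools() {
    for tool in "$@"; do
        # `command -v` is POSIX and exits non-zero when the command is not found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the toolchain this post is built around.
missing=$(missing_tools python3 java spark-submit aws terraform)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing
else
    echo "all tools present"
fi
```

Running this right after a rebuild catches a broken install step early, before it surfaces halfway through a Spark job or a Terraform plan.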

2. devcontainer.json – VS Code Extensions and Workspace Configuration

This JSON file brings the VS Code experience to life by adding:

| Extension | Description |
| --- | --- |
| AmazonWebServices.aws-toolkit-vscode | Integrates AWS services into VS Code, allowing exploration of cloud resources, invocation of Lambda functions, and management of credentials. |
| andyyaldoo.vscode-json | Improves the experience of working with JSON files by providing validation, syntax highlighting, and formatting support. |
| editorconfig.editorconfig | Applies consistent coding style rules across different editors and environments using .editorconfig files. |
| hashicorp.terraform | Offers full support for Terraform configuration files, including syntax highlighting, auto-completion, linting, and validation. |
| mark-tucker.aws-cli-configure | Assists in the interactive setup of AWS CLI credentials and profiles within the VS Code interface. |
| ms-azuretools.vscode-docker | Enables the management of Docker containers, images, volumes, and networks through the Visual Studio Code user interface. |
| oderwat.indent-rainbow | Uses color coding to visually distinguish levels of indentation, which enhances readability of nested structures in formats like YAML, JSON, and Python. |
| redhat.vscode-yaml | Provides schema-based validation, auto-completion, and formatting for YAML files, essential for configuration-driven development. |
| ryu1kn.annotator | Supports the addition of custom annotations or visual markers (e.g., highlights and notes) directly in the code. |
| tuxtina.json2yaml | Facilitates the conversion between JSON and YAML formats, useful when working with configuration files in different ecosystems. |
| wayou.vscode-todo-highlight | Highlights tags such as TODO, FIXME, and other keywords, improving the visibility of in-code reminders and pending tasks. |
| ms-python.python | The official extension for Python development, providing script execution, environment management, linting, and Jupyter notebook integration. |
| ms-python.debugpy | A Python debugging engine that supports breakpoints, variable inspection, and step-through code execution. |
| ms-python.vscode-pylance | Delivers fast and precise type checking, intelligent auto-completion, and advanced static code analysis using the Pyright engine. |
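Extensions like these are declared in devcontainer.json under `customizations.vscode.extensions`, so VS Code installs them automatically when the container starts. The sketch below scaffolds a minimal file from the terminal. It is an illustration, not the exact file from this setup: the function name `write_devcontainer_json` and the `"name"` value are hypothetical, and only a subset of the extension IDs above is included.

```shell
#!/usr/bin/env bash
# Write a minimal devcontainer.json into the given directory
# (defaults to .devcontainer). Function name and "name" value are
# illustrative assumptions; extension IDs are a subset of the list above.
write_devcontainer_json() {
    target_dir="${1:-.devcontainer}"
    mkdir -p "$target_dir"
    # Quoted 'EOF' keeps the JSON literal (no shell expansion inside)
    cat > "$target_dir/devcontainer.json" <<'EOF'
{
  "name": "data-engineering",
  "build": { "dockerfile": "Dockerfile" },
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-python.python",
        "ms-python.debugpy",
        "ms-python.vscode-pylance",
        "hashicorp.terraform",
        "AmazonWebServices.aws-toolkit-vscode",
        "redhat.vscode-yaml"
      ]
    }
  }
}
EOF
}
```

Calling `write_devcontainer_json .devcontainer` from the project root creates the file next to the Dockerfile, which is where the `"dockerfile": "Dockerfile"` entry expects to find it.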

3. .bashrc – Interactive and Smart Terminal

I created a custom .bashrc to supercharge the shell.

The result is a terminal that’s not just functional, but a joy to use.
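To give a flavor of what a .bashrc like this can do, here is a minimal sketch geared toward an AWS and Terraform workflow. Every alias, the helper `aws_profile_label`, and the prompt layout are hypothetical choices for illustration, not the contents of the actual file. The only thing taken as given is `AWS_PROFILE`, the environment variable the AWS CLI reads to pick a named profile.

```shell
# ~/.bashrc additions (illustrative; all names and choices are assumptions)

# Keep a longer history; append to the history file instead of overwriting it
HISTSIZE=10000
HISTFILESIZE=20000
if [ -n "${BASH_VERSION:-}" ]; then
    shopt -s histappend   # bash-only builtin, so guard it
fi

# Short aliases for frequent commands
alias ll='ls -alF'
alias tf='terraform'
alias tfp='terraform plan'

# Print the active AWS profile, falling back to "default" when unset
aws_profile_label() {
    printf '%s' "${AWS_PROFILE:-default}"
}

# Prompt layout: [aws:<profile>] <working dir> $
# Single quotes defer $(...) so it is re-evaluated every time the prompt is drawn
PS1='[aws:$(aws_profile_label)] \w \$ '
```

Showing the active profile in the prompt is a cheap guard against running a Terraform command against the wrong AWS account.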


🚀 How This Helps Me Day to Day


✅ Final Thoughts

Using dev containers in VS Code has completely changed how I approach project setup. For data engineering work that spans multiple tools, languages, and cloud services, having a well-defined environment is invaluable.

If you’re a data engineer looking for a reproducible way to manage your dev environment, I highly recommend giving dev containers a try.


Got questions or want to share your setup? Reach out to me on my GitHub profile. 📩😊