• Design, build, and maintain Databricks data pipelines (ETL/ELT) for ingestion, transformation, and orchestration using Spark, Delta Lake, and Databricks Workflows.
• Operationalize machine learning models by building inference pipelines that invoke models authored by data scientists (batch or real-time), ensuring consistency between training and inference environments.
• Ensure data reliability, quality, and observability through robust validation, monitoring, alerting, and automated recovery mechanisms.
• Collaborate closely with data scientists to productionize models, manage model deployment lifecycles, and optimize inference performance and cost.
• Implement best-practice DevOps/MLOps processes such as CI/CD for pipelines, model versioning, environment promotion, and infrastructure-as-code.
• Optimize performance and cost across compute clusters, jobs, and storage layers.
• Implement and manage the enterprise data catalog, including schema design, table ownership, lineage, governance, and documentation using Unity Catalog.
• Some experience with Databricks infrastructure.
• Experience building BI dashboards and visualizations.
• Experience with coding agents and related best practices (such as spec-driven development).
Skills Required
Must Have:
• Databricks platform experience
• Python development for data processing and ETL pipelines
• Unity Catalog knowledge
• AWS services (S3, IAM, VPC; potentially Glue and Lambda)
• Data lake/lakehouse architecture patterns
• Dashboard building experience
Nice to Have:
• RESTful API design and development (Flask, FastAPI, or similar)
• Authentication/authorization patterns (OAuth, API keys, IAM roles)
• Query optimization and performance tuning
• PySpark optimization experience
• ML/AI pipeline experience
• Databricks AI/BI