Please note: this role requires candidates who are physically present in India.
Data Engineer
Company: CODVO.AI
Department: Engineering
Location: Work From Home (India)
Employment Type: Full-Time
Experience Level: Senior (5+ years)
Work Hours: 2:30 PM - 11:30 PM IST
About CODVO.AI
CODVO.AI is a forward-thinking technology company dedicated to building intelligent data solutions that help organizations unlock insights from complex datasets. Our platform combines advanced matching algorithms, automated data standardization, and intuitive reporting to solve real-world data challenges. We're committed to creating a collaborative, inclusive engineering culture where talented professionals can do their best work and grow their careers.
Role Overview
We're seeking an experienced Data Engineer to build and maintain the core automation engine that powers CODVO.AI's data intelligence platform. In this role, you'll own the end-to-end Python codebase responsible for data ingestion, standardization, matching logic, scenario classification, and report generation. You'll be the primary architect of our data pipeline and Excel/PDF output systems, while collaborating closely with stakeholders to translate complex business requirements into robust, scalable solutions.
What You'll Do
- Design and build ETL pipelines that ingest, validate, and standardize data from multiple sources with high accuracy and performance
- Develop matching logic and fuzzy matching algorithms using tools like rapidfuzz to identify and reconcile data discrepancies across datasets
- Create automated report generation systems that produce Excel and PDF outputs with complex formatting, formulas, and dynamic content
- Implement data transformation workflows including date normalization, field mapping, and scenario classification based on business rules
- Write comprehensive unit tests using pytest and maintain code quality through version control (Git) and code review practices
- Translate business requirements into technical specifications, documenting matching rules, reconciliation scenarios, and data workflows directly from stakeholder discussions
- Optimize database queries and data operations using relational databases (PostgreSQL, MySQL) and SQLAlchemy ORM for efficient data access
- Collaborate with cross-functional teams to troubleshoot data issues, validate outputs, and continuously improve pipeline reliability and performance
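To give a flavor of the matching logic described above: rapidfuzz offers fast similarity scoring for exactly this kind of reconciliation, and the same idea can be sketched with the standard library's difflib (all names and thresholds below are illustrative, not CODVO.AI's actual rules):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score between two case/whitespace-normalized strings."""
    return 100 * SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def best_match(name: str, candidates: list[str], threshold: float = 85.0):
    """Return the candidate most similar to `name`, or None if nothing clears the threshold."""
    scored = ((c, similarity(name, c)) for c in candidates)
    match, score = max(scored, key=lambda pair: pair[1])
    return match if score >= threshold else None

vendors = ["Acme Corp", "Globex Inc", "Initech LLC"]
print(best_match("ACME CORP.", vendors))  # matches despite case and punctuation differences
```

In practice rapidfuzz's `fuzz.ratio` and `process.extractOne` replace the hand-rolled pieces here with much faster C-backed equivalents.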
What We're Looking For
Required Qualifications:
- 5+ years of professional Python development experience
- Strong hands-on expertise with pandas, openpyxl, and xlsxwriter for data manipulation and Excel generation
- Proven experience building and maintaining ETL pipelines and data matching logic
- Solid understanding of relational databases (PostgreSQL, MySQL) and SQL query optimization
- Experience with fuzzy matching techniques and date/time normalization
- Proficiency with pytest for unit testing and Git for version control
- Ability to self-manage requirements, prioritize tasks, and work independently in a remote environment
- Excellent communication skills with the ability to document technical solutions and business rules clearly
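As a sketch of what "proficiency with pytest" looks like in a pipeline codebase like this one, here is an illustrative unit test for a hypothetical field-standardization helper (both function names are invented for this example):

```python
# test_standardize.py — example pytest-style unit test for a hypothetical
# amount-standardization helper of the kind used in data cleaning.

def standardize_amount(raw: str) -> float:
    """Strip currency symbols and thousands separators, returning a float amount."""
    return float(raw.replace("₹", "").replace("$", "").replace(",", "").strip())

def test_standardize_amount_handles_symbols_and_commas():
    # Indian-style digit grouping and a leading rupee sign
    assert standardize_amount("₹1,23,456.50") == 123456.50
    # Dollar sign with Western grouping
    assert standardize_amount("$2,000") == 2000.0
```

Running `pytest` discovers `test_*` functions like this automatically; plain `assert` statements are all it needs.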
Nice to Have:
- Experience with SQLAlchemy ORM and advanced database design patterns
- Familiarity with Databricks or other cloud data platforms
- Knowledge of Tableau or other business intelligence tools
- Experience with data quality frameworks and validation methodologies
- Background in business analysis or requirements gathering
- Exposure to CI/CD pipelines and automated testing frameworks
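For context on the date/time normalization experience listed above, a minimal standard-library sketch of the technique (the accepted formats here are assumptions, not the platform's actual spec):

```python
from datetime import datetime

# Candidate input formats, tried in order; extended as new data sources are onboarded.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y", "%d %b %Y"]

def normalize_date(raw: str) -> str:
    """Parse a date string in any known format and return ISO 8601 (YYYY-MM-DD)."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")

print(normalize_date("31/12/2024"))  # → 2024-12-31
```

In production code, `pandas.to_datetime` plays a similar role across whole columns; the try-each-format loop above is the underlying idea.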
What We Offer
- Flexible remote work with a collaborative team environment
- Meaningful impact on data-driven solutions that help organizations make better decisions
- Professional growth opportunities to expand your skills in cloud platforms, advanced analytics, and system architecture
- Competitive compensation commensurate with experience
- Full-time engagement with long-term project stability
- Collaborative culture where your technical expertise and insights are valued
- Opportunity to mentor junior developers and shape engineering best practices