Data Engineer - AI Focused Data Pipeline & Management

Remote
Anywhere
Posted 1 month ago

What We Offer:

Remote job opportunity
Internet allowance
Canteen Subsidy
Night Shift allowance as per process
Health Insurance
Tuition Reimbursement
Work Life Balance Initiatives
Rewards & Recognition
Internal movement through IJP

WHAT YOU’LL BE DOING:

Data Infrastructure Design & Management:

Data Lake Development:

Design, implement, and manage scalable and secure data lakes to store and process large volumes of structured and unstructured data from various sources.
Optimize data lake architectures for performance, cost-effectiveness, and easy data retrieval to support AI/ML workloads.

Database Management:

Manage and maintain SQL and NoSQL databases, including AWS DocumentDB, ensuring optimal performance, reliability, and scalability.
Design and implement efficient database schemas and indexing strategies tailored for AI data requirements.

Data Pipeline Development:

ETL/ELT Processes:

Develop and maintain robust Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) pipelines to process and prepare data for AI model training and evaluation.
Automate data ingestion from diverse sources, including APIs, streaming data, and batch processes.

Workflow Automation:

Utilize tools like Apache Airflow, AWS Glue, or similar technologies to schedule and orchestrate complex data workflows and ensure timely availability of data.
Implement data validation and quality checks within pipelines to ensure accuracy and consistency.

Data Processing & Optimization:

Big Data Processing:

Employ big data technologies such as Apache Spark and Hadoop for large-scale data processing and transformation tasks.
Optimize data processing jobs for performance and resource efficiency, enabling faster AI model training cycles.

Data Storage Optimization:

Implement data partitioning, compression, and other storage optimization techniques to enhance data retrieval speeds and reduce storage costs.
Manage data versioning and lineage to track changes and support reproducibility in AI experiments.

Collaboration & Support:

Cross-functional Collaboration:

Work closely with AI researchers and developers to understand data requirements and provide tailored data solutions that support various AI/ML projects.
Collaborate with MLOps engineers to integrate data pipelines with model deployment and monitoring systems seamlessly.

Technical Support & Troubleshooting:

Provide ongoing support for data-related issues, quickly identifying and resolving bottlenecks and failures in data pipelines and storage systems.
Continuously monitor data systems’ performance and implement improvements as needed.

Data Governance & Security:

Data Quality & Compliance:

Establish and enforce data governance policies and best practices to maintain high data quality and integrity across all datasets.
Ensure compliance with relevant data protection regulations (e.g., GDPR, HIPAA) and implement appropriate data anonymization and encryption techniques.

Security Management:

Implement robust security measures to protect data assets, including access controls, auditing, and monitoring to prevent unauthorized access and data breaches.

Documentation & Knowledge Sharing:

Comprehensive Documentation:

Create and maintain detailed documentation for data architectures, pipelines, and processes to facilitate knowledge sharing and onboarding.
Document data schemas, data dictionaries, and metadata to support effective data usage across the organization.

Best Practices Development:

Develop and promote best practices for data engineering within the AI team and the broader organization, staying updated with the latest industry trends and technologies.

WHAT WE EXPECT YOU TO HAVE:

Programming Languages:

Python: Advanced proficiency for developing data pipelines, scripting, and automation tasks.
SQL: Strong skills in writing complex queries, stored procedures, and optimizing database interactions.
Scala or Java: Experience with Scala or Java for working with big data frameworks like Apache Spark.

Data Storage & Management:

Data Lakes: Expertise in designing and managing data lakes using platforms such as AWS S3 or Azure Data Lake Storage, including understanding of lakehouse architectures.
Databases:
SQL Databases: Proficient in working with relational databases like PostgreSQL, MySQL, or AWS RDS, including database design and optimization.
NoSQL Databases: Experience with NoSQL databases such as AWS DocumentDB (MongoDB compatible), Cassandra, or DynamoDB, including data modeling and performance tuning.

Data Pipeline & Workflow Management:

ETL/ELT Tools: Proficiency in building and managing data pipelines using tools like Apache Airflow, AWS Glue, or similar orchestration frameworks.
Big Data Technologies: Experience with big data processing frameworks such as Apache Spark, Hadoop, or Databricks for large-scale data processing tasks.
Stream Processing: Knowledge of real-time data processing using technologies like Apache Kafka, Kinesis, or Flink.

Cloud Platforms & Services:

AWS Services: Extensive experience with the AWS ecosystem, including services like S3, EC2, EMR, Lambda, Redshift, and IAM for data storage, processing, and security.

Data Modeling & Processing:

Data Modeling: Strong skills in designing efficient and scalable data models tailored for AI/ML use cases.
Data Transformation: Expertise in data cleaning, normalization, aggregation, and transformation techniques to prepare data for machine learning workflows.
Metadata Management: Experience in managing and utilizing metadata to enhance data discoverability and governance.

Experience:

Professional Experience:

Professional experience in data engineering, particularly in environments supporting AI and machine learning projects.
Proven track record of building and maintaining scalable, reliable, and efficient data infrastructures and pipelines.

Project Experience:

Experience in handling large-scale datasets (terabytes or more) and optimizing data processing workflows for performance and cost-effectiveness.
Demonstrated ability to work on complex data integration projects involving multiple data sources and formats.

Collaboration:

Experience collaborating with cross-functional teams, including data scientists, AI researchers, software developers, and DevOps engineers, to understand and meet diverse data needs.
Ability to translate business and analytical requirements into effective data solutions.

Education:

Academic Qualifications: Bachelor’s degree in Computer Science, Data Engineering, Information Systems, or a related field. A Master’s degree or relevant certifications are a plus.
Certifications: Relevant certifications such as AWS Certified Data Analytics, Certified Data Engineer, or similar credentials are advantageous.

Other Skills:

Problem-Solving: Strong analytical and problem-solving skills with the ability to troubleshoot complex data issues and implement effective solutions promptly.
Attention to Detail: High level of accuracy and thoroughness in handling and processing data, ensuring data quality and consistency across all stages.
Communication: Excellent verbal and written communication skills, with the ability to convey complex technical concepts effectively to both technical and non-technical stakeholders.
Adaptability: Ability to adapt to evolving technologies and methodologies in the data engineering and AI landscape, continuously learning and integrating new tools and practices.
Time Management: Proven ability to manage multiple projects and tasks simultaneously, prioritizing effectively to meet deadlines and deliver high-quality results.

To apply for this job, email your details to priya.mittal@etechtexas.com

Job Title: Data Engineer – AI Focused Data Pipeline & Management
Location: Gandhinagar

Copyright © Etech Global Services. All Rights Reserved
