Senior/Principal Data Engineer
Company: SciTec Incorporated
Location: Boulder, Colorado
Posted on: April 1, 2025
Job Description:
SciTec has been awarded multiple government contracts, and we are
growing our creative team! SciTec, Inc. is a dynamic small business
with the mission to deliver advanced sensor data processing
technologies and scientific instrumentation capabilities in support
of National Security and Defense. We support customers throughout
the Department of Defense and U.S. Government in building
innovative new tools to deliver unique world-class data
exploitation capabilities.
Important Notice: SciTec exclusively
works on U.S. government contracts that require U.S. citizenship
for all employees. SciTec cannot sponsor or assume sponsorship of
employee work visas of any type. Further, U.S. citizenship is a
requirement to obtain and keep a security clearance. Applicants
who do not meet these requirements will not be considered.
We are
seeking an experienced Data Engineer to join our Mission Data
Processing program. In this role, you will design, build, and
maintain scalable ETL pipelines for processing terabyte-scale
streaming data and architect databases optimized for machine
learning on on-premises hardware using open-source software. The
ideal candidate will have expertise in data design patterns such as
the Medallion Architecture and data lakehouse technologies to
ensure efficient and reliable data processing. You should be
skilled at handling high-throughput, low-latency data ingestion,
managing data bursts, and implementing features like time-based
partitioning, versioning, auditing, and rollback for historical
data replay and event reproducibility. Additionally, you will bring
DevOps expertise for pipeline automation, Infrastructure as Code
(IaC) skills with tools like Terraform and Ansible, and a strong
understanding of DevSecOps practices for maintaining secure and
compliant data workflows.
Responsibilities:
- Design and optimize ETL pipelines capable of handling
high-throughput, low-latency data ingestion, especially during
large data bursts.
- Implement robust asynchronous processing systems using ZeroMQ
to handle large, serialized Protobuf messages.
- Create systems that efficiently process sudden, large volumes
of data while maintaining performance.
- Design strategies for managing backpressure to prevent system
overload during high data volumes.
- Develop fault-tolerant systems to safeguard data integrity and
maintain reliability.
- Set up monitoring and alerting mechanisms for proactive
response to sudden data load changes.
- Build and sustain high-performance databases on on-premises
infrastructure, leveraging MinIO or similar object storage
solutions for seamless integration with ML workflows.
- Apply and manage data design patterns such as the Medallion
Architecture to organize data into Bronze, Silver, and Gold
layers.
- Deploy Delta Lake solutions to combine the flexibility of data
lakes with data warehouse performance.
- Implement containerization and orchestration solutions using
Docker and Kubernetes, and build CI/CD pipelines for automated ETL
workflows.
- Implement infrastructure provisioning and deployment automation
using Terraform and/or Ansible.
- Uphold data governance and security protocols to ensure data
integrity and compliance with DoD standards, including
vulnerability scans and secure configurations.
- Lead the evaluation and adoption of open-source technologies
that enhance data engineering capabilities.
- Work with subcontractors and DoD organizations across sites,
accommodating hardware limitations and ensuring seamless
integration.
- Maintain comprehensive documentation and train teams on best
practices and tools in data engineering.
- Lead and provide guidance to developers and engineers on
architecture, design, and testing decisions.
- Provide thought-leadership and subject matter expertise for
data engineering and data pipeline orchestration across the
company.
- Regularly communicate with customers, present status, and
engage in program-level meetings and processes.
- Other duties as assigned.
Minimum Qualifications:
- Minimum 8 years of experience building and maintaining data
pipelines/ETL solutions at scale.
- Proficiency in Python, C++, SQL, and RDBMS (PostgreSQL or
similar).
- Experience with object storage (e.g., MinIO), Protocol Buffers,
and ZeroMQ.
- Familiarity with Data Version Control (DVC), Delta Lake, and
the Medallion Architecture.
- Skilled in Docker, Kubernetes, CI/CD pipelines, and
infrastructure automation (Terraform/Ansible).
- Experience with high-throughput, low-latency systems, fault
tolerance, and backpressure handling.
- Knowledge of data governance, versioning, auditing, rollback,
and DevSecOps practices.
- Active DoD Secret Clearance.
- Detail-oriented.
- Good verbal and written communication skills.
Preferred Qualifications:
- Knowledge of Java, Rust, Scala, and NoSQL databases (e.g.,
Apache Cassandra).
- Familiar with Apache Iceberg, Yugabyte, Apache Hudi, Ceph,
OpenStack Swift, Redis, and high-performance alternatives
(DragonflyDB, KeyDB, Apache Ignite).
- Experienced with data processing tools (e.g., Apache Airflow,
Prefect, Dagster, Apache NiFi, Apache Spark, Flink, Beam, Dask) and
data quality tools (e.g., Great Expectations, Soda Core).
- Familiar with performance optimization and observability tools
(e.g., Prometheus, Grafana, Loki).
- Experience with data management, compliance, and security
platforms (e.g., AWS Secrets Manager).
Education:
- Bachelor's or Master's degree in Computer Science, Data
Engineering, or a related field.
- Relevant certifications are a plus.
SciTec offers a highly competitive salary and benefits package,
including:
- Employee Stock Ownership Plan (ESOP).
- 3% Fully Vested Company 401(k) Contribution (no employee
contribution required).
- 100% company paid HSA Medical insurance, with a choice of 2
buy-up options.
- 80% company paid Dental insurance.
- 100% company paid Vision insurance.
- 100% company paid Life insurance.
- 100% company paid Long-term Disability insurance.
- Short-term Disability insurance.
- Annual Profit-Sharing Plan.
- Discretionary Performance Bonus.
- Paid Parental Leave.
- Generous Paid Time Off, including Holiday, Vacation, and Sick
Pay.
- Flexible Work Hours.
The pay range for this position is $141,000-$202,000/year.
SciTec considers several factors when extending an
offer of employment, including but not limited to the role and
associated responsibilities, a candidate's work experience,
education/training, and key skills. This is not a guarantee of
compensation.
SciTec is committed to hiring and retaining a diverse
workforce and is proud to be an Equal Opportunity/Affirmative
Action employer.