
Data QA Engineer

Location: San Diego, California, United States
Job # 12147923
Date Posted: 04-01-2019
Pay Rate: DOE
Direct Hire
Start Date: ASAP
Our client is a leading IT consulting firm headquartered in San Diego, CA. Serving well-known clients, they are building a team of data experts who work directly with those clients. They are currently seeking a Data QA Engineer with key skills in Kafka/Confluent and StreamSets, whose goal will be to make the QA process automated and continuous, so that issues are identified sooner and more often while requiring less effort from the team to run, maintain, and update.

Ideal candidates are experienced data pipeline builders and data wranglers who enjoy optimizing and building data systems from the ground up, moving data from source systems to analytic systems, and modeling data for analytical reporting. They have also learned how to manage customers and understand how internal and external perceptions of their work product affect customer satisfaction. The confidence and knowledge to recommend solutions, and the experience to know what will and won’t work, are important traits for this consultant.
1. Document QA processes for streaming data and for data at rest in multiple data stores.
  • A specific, repeatable workflow for testing Kafka loading from any data source (at minimum Teradata, Snowflake, and Elasticsearch). This should include verifying that the record counts returned by the source query match the counts in the result stream. Any transformation that changes the counts in the result stream by design should be taken into account.
  • Schema testing should be applied to source, transit, and target systems.
  • Data type testing should be applied to source, transit, and target systems to ensure data integrity.
2. Automate QA pipelines so that testing runs as projects move through environments and as data is transformed.
  • Integrate testing into StreamSets Data Collectors so that, as pipelines are promoted from dev to production, tests run automatically and validate production readiness.
  • Identify QA automation opportunities where StreamSets Data Collector is not the processor, and align the process so that proper QA does not depend on StreamSets Data Collector.
3. Report on QA results.
  • QA results should be formatted as HTML and written to S3, where internal TechOps users can review them.
  • When a report contains QA failures, a link to it should be posted to a Slack channel, and the pipeline should not be promoted further.
4. Document QA architecture requirements.
  • All architecture used to meet the above requirements should be fully documented with diagrams, including any Git repos, the reasons for choosing the architecture, and monitoring and alerting specs.
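For illustration only, the count, schema, data type, and reporting steps above could start from a small check harness like the one below. This is a minimal sketch, assuming records from the source query and from the result stream have already been fetched into Python lists of dicts; fetching from Teradata, Snowflake, Elasticsearch, or Kafka, uploading the HTML to S3, and posting the Slack link are deliberately left out. The names `check_counts`, `check_schema`, `render_html`, `should_promote`, and the `expected_delta` parameter are hypothetical, not part of any existing tool.

```python
from dataclasses import dataclass
import html


@dataclass
class CheckResult:
    name: str      # which check ran
    passed: bool   # True if the check succeeded
    detail: str    # human-readable explanation for the report


def check_counts(source_count: int, stream_count: int, expected_delta: int = 0) -> CheckResult:
    """Compare the source-query count with the result-stream count.

    expected_delta is a hypothetical knob for transformations that change the
    count by design (for example, -3 if a filter is known to drop three rows).
    """
    expected = source_count + expected_delta
    return CheckResult(
        "record count",
        stream_count == expected,
        f"expected {expected}, got {stream_count}",
    )


def check_schema(system: str, records: list[dict], expected_schema: dict[str, type]) -> CheckResult:
    """Check field names and exact value types for every record in one system."""
    problems = []
    for i, rec in enumerate(records):
        if set(rec) != set(expected_schema):
            problems.append(f"row {i}: fields {sorted(rec)}")
            continue
        for field_name, expected_type in expected_schema.items():
            # Exact type match, so a bool is not accepted where an int is expected.
            if type(rec[field_name]) is not expected_type:
                problems.append(f"row {i}: {field_name} is {type(rec[field_name]).__name__}")
    return CheckResult(f"schema/types ({system})", not problems, "; ".join(problems) or "ok")


def render_html(results: list[CheckResult]) -> str:
    """Render results as a minimal HTML table, escaping all text."""
    rows = "".join(
        f"<tr><td>{html.escape(r.name)}</td>"
        f"<td>{'PASS' if r.passed else 'FAIL'}</td>"
        f"<td>{html.escape(r.detail)}</td></tr>"
        for r in results
    )
    return f"<table><tr><th>Check</th><th>Result</th><th>Detail</th></tr>{rows}</table>"


def should_promote(results: list[CheckResult]) -> bool:
    """Promotion is allowed only when every check passed."""
    return all(r.passed for r in results)


# Usage with in-memory sample data standing in for a source query and a stream.
schema = {"id": int, "amount": float}
source_rows = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}]
stream_rows = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": "3.0"}]  # type drift in transit

results = [
    check_counts(len(source_rows), len(stream_rows)),
    check_schema("source", source_rows, schema),
    check_schema("target", stream_rows, schema),
]
report = render_html(results)
print(should_promote(results))  # False: the target has a str where a float is expected
```

In a real pipeline, a `False` from `should_promote` would be the point where promotion stops and the Slack notification is sent; `expected_delta` is only a simple stand-in for accounting for transformations that change counts by design.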


Universal Skills
Must possess the following set of fundamental skills:
  • Uses technology creatively and effectively to help develop and achieve customer objectives.
  • Communicates clearly and effectively, with careful consideration of the audience, using terms and a tone appropriate to them.
  • Accepts responsibility for the successful delivery of a superior work product.
  • Gathers requirements and composes estimates in collaboration with the customer.
  • Respects coworkers and has a casual, friendly attitude.
  • Has an interest in and passion for technology. This is not a joke, and yes, it’s a requirement.
  • The primary skill sets are Kafka/Confluent and StreamSets (or something similar to StreamSets, such as Kinesis).
    • Terraform, AWS Lambda, DynamoDB, and Athena architecture
  • Experience with data warehouse tools (Teradata, Oracle, Netezza, SQL, etc.) as well as cloud-based data warehouse tools (Snowflake, Redshift, Google BigQuery).
  • Experience building and optimizing traditional and/or event-driven data pipelines.
  • Advanced working SQL knowledge and experience working with relational databases.
  • Familiarity with data processing tools such as Hadoop, Apache Spark, Hydra, etc.
  • Knowledge of cloud-based or streaming solutions such as Confluent and Kafka, Databricks and Spark Streaming.
  • Experience with ETL/ELT tools such as Matillion, Fivetran, Talend, Informatica, Oracle Data Integrator, or IBM InfoSphere, and an understanding of the pros and cons of transforming data in ETL versus ELT fashion.
  • Good understanding of data warehouse concepts such as schemas, tables, views, materialized views, stored procedures, and roles/security.
  • Adept at building processes to support data transformation, data structures, metadata, dependency and workload management.
  • Experience with BI tools such as Looker, Tableau, Power BI, and MicroStrategy.
  • Familiarity with StreamSets is a plus.
  • Investigate emerging technologies.
  • Research most appropriate technology solution to solve complex and unique business problems.
  • Research and manage important and complex design decisions.
  • Interact directly with the customer on significant matters, often involving coordination among groups.
  • Work on complex issues where analysis of situations or data requires an in-depth evaluation of variable factors.
  • Exercise good judgment in selecting methods, techniques and evaluation criteria for obtaining solutions.
  • Attend sales calls as technical expert and offer advice or qualified recommendations based on clear and relevant information.
  • Research and vet project requirements with customer and technical leadership.
  • Assist in the creation of SOWs, proposals, estimates and technical documentation.
  • Act as a vocal advocate for Fairway and pursue opportunities for continued work from each customer.
  • Determine methods and procedures on new or special assignments.
  • Requires minimal day-to-day supervision from the client management team.
  • Typically requires 5+ years of related experience.
  • Typically requires a BS in computer science or a higher degree.


  • Work from Home
  • Flexible Hours
  • 100% covered employee health insurance
  • 401(k) with employer match
  • Fun team-building events, days, and activities
  • New HQ with adjustable desks