Boosting genetics research
Our client is a genomics services company that focuses on next-generation sequencing (NGS) and microarray research and runs a proprietary technology platform.
The client started a research program to understand how genetic factors and their variations affect the development of SARS-CoV-2 infection. The main focus was on patients' sequencing data and on identifying the genetic factors that may cause or correlate with disease outcomes.
Technically, the company collected whole-genome sequencing (WGS) data and performed viral genome sequencing, which enabled analysis of host susceptibility, disease severity, host-virus interactions, and other factors. The biggest challenges at the start of the project were the huge datasets and the need to test several research hypotheses quickly and in parallel. Cost-effective, secure storage and the ability to run fast queries over the research data were also among the requirements.
Because high load was the biggest challenge, we focused on developing a cloud platform. To enable parallel task processing, we built a workflow orchestrator that automatically schedules, monitors, and scales processing. The results of the analytics queries, which ran on separate VMs, were stored in cloud object storage that scales with the amount of data and protects results from unauthorized access through encryption and access management tools. Data lifecycle policies and automatic workflow orchestration kept the solution cost-effective. An OLAP data warehouse provided fast analytical queries on structured data.
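To illustrate the idea of scheduling and monitoring independent analysis tasks in parallel, here is a minimal sketch in Python. It is not the client's orchestrator: the function name `run_tasks`, the task names, and the use of a local thread pool in place of separate VMs are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_tasks(tasks, max_workers=4):
    """Run independent tasks in parallel and record a status for each one.

    `tasks` maps a hypothetical task name to a zero-argument callable.
    A thread pool stands in for the separate VMs mentioned in the text.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Schedule every task and remember which future belongs to which name.
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        # Monitor tasks as they finish; one failure does not stop the others.
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = ("succeeded", future.result())
            except Exception as exc:
                results[name] = ("failed", repr(exc))
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins for research hypotheses tested in parallel.
    demo = {
        "hypothesis_a": lambda: sum(range(10)),
        "hypothesis_b": lambda: 1 / 0,  # fails on purpose
    }
    for name, (status, value) in sorted(run_tasks(demo).items()):
        print(name, status, value)
```

A production orchestrator would also need retries, autoscaling, and persistent result storage, none of which this sketch attempts.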
- Unstructured WGS datasets were stored in an on-premises environment, where analytics speed became a critical bottleneck as the data volume grew
- Inability to analyze host susceptibility, clinical outcomes, disease severity, and host-virus interactions based on the WGS data
- A secure cloud platform tailored for parallel data processing, running in production and supporting hundreds of simultaneous analytical tasks
- A user-friendly interface for fast analytical queries