World-Class Data Science Firm Builds Data Lake and Pipeline Application to Store & Transform Data

Challenge:

To create unique, cutting-edge predictive solutions for its clients, this world-class AI and analytics data science firm needed:

A robust Data Lake for terabytes of diverse and complex datasets.
A data pipeline application to transform raw (dirty) operational customer data, public data, and third-party data into standardized datasets.

Solution:

Using Apache Airflow on AWS as the data pipeline (ETL) orchestration engine, Converge engineers developed modular Python applications to ingest, cleanse, parse, enrich, and transform raw data and store it in Amazon S3. The transformed datasets conform to the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), a healthcare data standard maintained by OHDSI, and would become a product offered commercially by this client.
To process these large datasets, each 300 GB to 500 GB in size, our engineers used performance-optimized, vectorized Python with pandas and other advanced data analytics libraries for the heavy data science processing.
The first Cloud-based Data Platform (CDP) leverages Databricks (Apache Spark on AWS) to serve transformed data on demand for ad-hoc analysis, hypothesis testing, exploratory data analysis (EDA), derivative dataset generation, and machine learning (ML) models. The Data Lake offers fluid multi-cloud interoperability between the AWS data warehouse and GCP BigQuery, all integrated with Tableau for visualization and Immuta as the data-governance core.
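
As an illustration of the vectorized, chunked cleansing described above, here is a minimal sketch in pandas. It is not the client's actual code: the column names, the sample rows, the `GENDER_MAP` lookup, and the `cleanse` and `run` helpers are all hypothetical, and a tiny in-memory CSV stands in for a 300 GB to 500 GB source file.

```python
import io

import pandas as pd

# Hypothetical raw extract: inconsistent casing, stray whitespace, bad dates.
RAW_CSV = """person_id,gender,birth_date
1, m ,1980-02-30
2,F,1975-07-04
3,female,not recorded
4,M,1990-12-01
"""

# Hypothetical mapping from raw values to standardized labels.
GENDER_MAP = {"m": "MALE", "male": "MALE", "f": "FEMALE", "female": "FEMALE"}


def cleanse(chunk: pd.DataFrame) -> pd.DataFrame:
    """Standardize one chunk using column-wise (vectorized) operations."""
    out = chunk.copy()
    # Normalize the whole column at once instead of looping over rows.
    gender = out["gender"].str.strip().str.lower()
    out["gender"] = gender.map(GENDER_MAP).fillna("UNKNOWN")
    # Invalid or unparseable dates become NaT rather than raising an error.
    out["birth_date"] = pd.to_datetime(
        out["birth_date"], format="%Y-%m-%d", errors="coerce"
    )
    return out


def run(source, chunksize: int = 2) -> pd.DataFrame:
    """Read the source in fixed-size chunks and cleanse each one in turn."""
    chunks = pd.read_csv(source, chunksize=chunksize, dtype=str)
    return pd.concat((cleanse(c) for c in chunks), ignore_index=True)


result = run(io.StringIO(RAW_CSV))
print(result)
```

The sketch concatenates the cleansed chunks only to keep the example small. For files of the size described, each cleansed chunk would more plausibly be written out to storage as it is produced, so that the full dataset never has to fit in memory.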

Results:

This technically capable AI and analytics data science firm found in Converge a partner that could complement its strengths as an equal. Our skilled, experienced consulting team met the stringent requirements of a hedge-fund-backed AI start-up in a fast-paced, emerging, high-value data science marketplace.

