Data Vault Ensemble Modeling

About Data Science and the Dependencies Between Data Scientists and Data Modelers

Abhishek Chhibber, Indalytics Advisors

Storing and querying data for data science poses its own problems. The data can be unstructured or have complex schemas, and flat schemas are of little help. Further, querying data at high volume can strain network bandwidth and infrastructure. Therefore, when creating data models for data science problems, it is important to anticipate future constraints and plan accordingly.

For data science problems such as feature engineering, a data scientist generally needs data in a specific format. As a result, much time is spent converting data from a database into the desired format without losing its essential meaning. This talk will focus on the intersection of data modeling and data science.

Abhishek Chhibber is a data scientist with over a decade of experience aggregating and cleaning data and using it for artificial intelligence applications and business intelligence dashboards. He has worked with a variety of data types, ranging from text and financial data to maps and images. Abhishek manages the data pipeline at Toronto-based Indalytics Advisors. He has hands-on experience with every stage of building a scalable data pipeline, from designing data models to creating databases and querying them through cluster-computing frameworks.