Data Vault Ensemble Modeling

Data Structure Graph Hands-On Workshop

Doug Needham, Kore Wireless

A Data Structure Graph is a measurement tool that applies the concepts of graph theory to data modeling and data architecture. Those familiar with data modeling tools will know the adage about creating new data models: “Try not to cross the lines. If you cross the lines, your ERD will be complex.” This rule of thumb has a basis in graph theory. A planar graph is a graph that can be embedded in the plane, i.e., drawn so that its edges intersect only at their endpoints and no two edges cross each other. This mathematical reason for not crossing the lines is one of the many applications of graph theory to data modeling.
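As a rough illustration of the planarity idea, the sketch below applies a standard necessary condition from graph theory: a simple planar graph with v ≥ 3 vertices has at most 3v − 6 edges. The function name and the sample tables are hypothetical, chosen only for this example. The check can show that a model cannot be drawn without crossing lines; it cannot prove that a model can.

```python
from itertools import combinations


def exceeds_planar_edge_bound(edges):
    """Return True if the graph has too many edges to be planar.

    Uses the necessary condition e <= 3v - 6 for simple planar graphs
    with at least three vertices. A False result does NOT prove planarity.
    """
    # Treat relationships as undirected; drop self-references and duplicates.
    simple = {frozenset(e) for e in edges if e[0] != e[1]}
    nodes = {n for e in simple for n in e}
    v, e = len(nodes), len(simple)
    return v >= 3 and e > 3 * v - 6


# Hypothetical model: five tables, each related to every other (the graph K5).
k5 = list(combinations(["A", "B", "C", "D", "E"], 2))

# Hypothetical star-shaped model: one central table with four neighbours.
star = [("SALE", "CUSTOMER"), ("SALE", "PRODUCT"),
        ("SALE", "STORE"), ("SALE", "CALENDAR")]

# K3,3: non-planar, yet it passes the edge bound (9 edges, 6 vertices).
k33 = [(a, b) for a in ("X1", "X2", "X3") for b in ("Y1", "Y2", "Y3")]

print(exceeds_planar_edge_bound(k5))    # too many edges to be planar
print(exceeds_planar_edge_bound(star))  # within the bound
print(exceeds_planar_edge_bound(k33))   # within the bound, but still non-planar
```

The K3,3 case shows why the bound is only a screening test: a full planarity check needs a dedicated algorithm, which graph libraries typically provide.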

  • Graph Theory and Network Science Overview. We will talk about some of the history of graph theory, how it evolved into network science, and its applications in the “real world”. Many terms will be defined and applied to various problems. Graph clustering and general clustering techniques will get a brief overview.
  • ERD Data Modeling Fundamentals. Here we will create a few entities and relationships. Only the most fundamental types of data modeling will be covered in this class; other sources will be outlined for those who want more detail.
  • Tool Overview. Gephi, Tulip, and R will be discussed; our lab later will be done in Gephi. This part covers tool setup and a short demo of how to use the tool. Statistics, measures, filters, and on-screen graph manipulation will be demonstrated. The terms defined in part 1 will be expanded on and applied to demonstration graphs.
  • Converting an ERD to a Data Structure Graph. Using DBeaver and some Python code that will be handed out, we will reverse engineer an existing data model and create an input file for Gephi to use in the lab. If you have an existing data model you wish to review, this is the portion of the class where we will convert it into a data structure graph that can be analyzed with Gephi, Tulip, R, and Python.
  • Volumetrics. Volumetrics is the study of the space requirements of data models. We will discuss the basic formulas for creating volumetric estimates. Then we will show how to analyze an existing database for growth patterns. These volumetric measurements will assist us in the topographic component of the lab.
  • Lab. During the lab, we will load our data structure graph into Gephi and explore the data model using the tools and techniques learned earlier. The centrality measures will be analyzed along with volumetric and structural data. Some basic clustering techniques will be covered to demonstrate how to analyze the internal structure of a data model, and applications for real-world use will be discussed.
  • Transition from DSG Level 1 to DSG Level 2, and Q&A. What we have been working on during this lab is a Data Structure Graph Level 1. A Data Structure Graph Level 2 is a larger graph that incorporates the transfer of data from one application to another within an enterprise. The same analytical techniques can be applied to both Level 1 and Level 2 Data Structure Graphs, but the interpretation, meaning, and applicability are slightly different. We will discuss these topics and leave room for further questions and answers.
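The conversion and centrality steps outlined above can be sketched in a few lines of Python. This is a minimal illustration, not the code handed out in the workshop: the table names, the `foreign_keys` list, and both helper functions are hypothetical. It assumes each foreign key becomes one edge between a child table and a parent table, and it writes an edge list with `Source` and `Target` columns, which Gephi's spreadsheet importer recognizes. Degree is used here as the simplest centrality measure.

```python
import csv
import io
from collections import Counter

# Hypothetical foreign keys as (child_table, parent_table) pairs, e.g. as
# they might be read from a database catalog after reverse engineering.
foreign_keys = [
    ("ORDER_LINE", "ORDER_HEADER"),
    ("ORDER_LINE", "PRODUCT"),
    ("ORDER_HEADER", "CUSTOMER"),
    ("PRODUCT", "SUPPLIER"),
]


def to_edge_csv(fks):
    """Render foreign keys as CSV text with Source and Target columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(fks)
    return buf.getvalue()


def degree(fks):
    """Count the relationships touching each table (undirected degree)."""
    counts = Counter()
    for child, parent in fks:
        counts[child] += 1
        counts[parent] += 1
    return counts


edge_csv = to_edge_csv(foreign_keys)
degrees = degree(foreign_keys)

print(edge_csv)
for table, count in degrees.most_common():
    print(f"{table}: {count}")
```

Saving `edge_csv` to a file gives an input that can be loaded as an edge table; tables with the highest degree are the ones most connected to the rest of the model.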

Doug started his career as a Marine Database Administrator supporting operational systems that spanned the globe in support of the Marine Corps missions. Since then, Doug has worked as a consultant, data engineer, and data architect for enterprises of all sizes, from 3M and Lockheed Martin to a number of startups. Working in industries like Telecom, Retail, Medical, Industrial, and Education, Doug has worked with data that supports a variety of mission-critical needs. Organizing data to make it easily accessible to the people who need it has been Doug’s main purpose during this time. In working with such a variety of use cases, applications, source systems, and analytical needs, Doug began to understand how to apply Social Network Analysis to the field of data modeling and data architecture. These techniques have been around since the time of Euler, and applying them to the growing needs of our ever-expanding data infrastructure has shone new light on a field defined by Codd, Inmon, Kimball, and others. Doug is excited to share network techniques and their application with anyone who will listen. Doug is always looking to learn new things.