Monday, May 23, 2016

A Data Model with a Meta-Entity Becomes Three-Dimensional


The following information is protected by US and international law. Any use of this publication or any part of it requires permission. Consultation is available - leave your email address as a comment.

Foreword

This post describes some conclusions that appeared recently, after the article “Data Modeling Techniques – Efficiency of Meta-Entities in an ERD” was published in TDAN (http://tdan.com/data-modeling-techniques-efficiency-of-meta-entities-in-an-erd/19452).

Appreciation

First of all, I appreciate the help and support I received from my family, and especially from my children Dima and Lara, without whom my work on the article and this blog would not have been possible.

Let me express my special thanks to David Hay (https://www.linkedin.com/in/davehaytx), who generously gave me a great deal of very important advice that helped me prepare the article and make it ready for publication.

I am very grateful to Robert Seiner (https://www.linkedin.com/in/robert-s-seiner-445313), who published the article - my first in the USA (after a dozen published earlier, including publications in editions of the Ukrainian Academy of Sciences).

What Does a Meta-Entity Bring to an ERD?

The article describes the main practical achievement of using meta-entities in an entity-relationship diagram - efficiency: the reduction in the number of relations makes the entity-relationship diagram more readable and reduces the time needed for its program implementation.

Also, using meta-entities helps to reveal more generic features of business entities - features common to different business entities (some examples are in the article). This makes it possible to identify such features in a business process/model and use them to manage the corresponding business process. This is the reason for publishing this post.

Furthermore, probably the most interesting effect of using meta-entities in an entity-relationship diagram arises because meta-entities not only become an organic part of the entity-relationship diagram, but also become part of the physical implementation of that diagram - they are implemented as database tables (more details of the technique are described in the article). At the same time, meta-entities represent a different level of generalization in a data model.

In this way, meta-entities are an organic part of their data model (entity-relationship diagram), implemented in database tables, and, at the same time, represent (and thereby bring to the data model) a higher level of generalization. Hence, meta-entities become a higher level of the data model, making the data model (entity-relationship diagram) 3-dimensional (Figure 1).

Figure 1



Adding a meta-entity to (and above!) an entity-relationship diagram is a process opposite to the regular top-down data modeling process. This is because adding a meta-entity to an existing entity-relationship diagram is an improvement of the data model, and this improvement uses a bottom-up approach, applied after the process of creating the data model is completed (Figure 2).

Figure 2



Here is an example of such a process (the entities are taken from the article). Initially, a housing authority had two different entities - Private Property Owner and State Property Owner. Each of these entities came from a different business process; the two processes exist separately because the management of these two types of owned properties is based on different requirements. After the Property Owner meta-entity is added to the data model, it becomes not only part of the logical (and then physical) data model (eventually implemented as a database table), but also, along with all other entities, reflects a corresponding business entity. This reflection gives the housing authority's management an opportunity to consider the new meta-entity and to look for new management solutions common to both business entities, thereby unifying the management solutions.
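To make the example concrete, here is a minimal sketch of how the generalization could look in database tables (shown via Python's sqlite3; all table and column names are my own assumptions, not taken from the article):

```python
import sqlite3

# In-memory sketch: the Property Owner meta-entity becomes a real table,
# and the two original entities reference it.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE property_owner (          -- the meta-entity, one level above
    owner_id   INTEGER PRIMARY KEY,
    owner_type TEXT NOT NULL           -- 'PRIVATE' or 'STATE'
);
CREATE TABLE private_property_owner (  -- original entity
    owner_id    INTEGER PRIMARY KEY REFERENCES property_owner(owner_id),
    person_name TEXT
);
CREATE TABLE state_property_owner (    -- original entity
    owner_id    INTEGER PRIMARY KEY REFERENCES property_owner(owner_id),
    agency_name TEXT
);
-- Other entities (e.g. Property) can now relate to property_owner once,
-- instead of relating to each owner entity separately.
CREATE TABLE property (
    property_id INTEGER PRIMARY KEY,
    owner_id    INTEGER REFERENCES property_owner(owner_id)
);
""")
con.execute("INSERT INTO property_owner VALUES (1, 'PRIVATE'), (2, 'STATE')")
con.execute("INSERT INTO private_property_owner VALUES (1, 'J. Smith')")
con.execute("INSERT INTO state_property_owner VALUES (2, 'Housing Dept')")
rows = con.execute(
    "SELECT owner_id, owner_type FROM property_owner ORDER BY owner_id"
).fetchall()
print(rows)  # [(1, 'PRIVATE'), (2, 'STATE')]
```

Both kinds of owners are now reachable through one relation to `property_owner`, which is exactly the reduction of relations the article describes.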

Conclusion
1.      Adding a meta-entity to an entity-relationship diagram is accompanied by creating relations between the entities and the meta-entity that represents them; and since the meta-entity and the entities are on different levels of the data model, these relations make the data model 3-dimensional (Figure 1).

2.      Adding a meta-entity creates a premise for identifying the corresponding entity in the business model (Figure 2). It allows the business management processes for business entities to be unified, which becomes possible because of the relations between the different levels of the data model and the need to keep these levels consistent with each other. The improvement of the business management process after a meta-entity is added to an entity-relationship diagram occurs in the direction opposite (bottom-up) to the regular modeling process, because the addition of a meta-entity is an improvement of the initially created data model.

 

Responses from LinkedIn Discussions

After discussions of the post on LinkedIn, I found that some of my responses could be useful to readers and decided to add them to the post.
The article relates to the process of improving a data model - not creating it, and this is a key point. The improvement begins after the data model has already been created - not during its creation. This process is the opposite of the top-down process of creation: it is a bottom-up process, a generalization of already defined entities - this is what the blog post is about.
The improvement aims at simplifying a data model that has become too complex because of multifunctional business entities that play different roles in different business processes simultaneously. The improvement uses the already created data model as a base and simplifies it by generalizing entities. This generalization is provided not as a simple aggregation of entities, but as a generalization based on entities that may belong to different superclasses. The generalization may replace the relations of a generalized entity in the improved data model not completely, but partially. Here is the major difference between a meta-entity and a superclass - it is mentioned in the article.
The category “meta-entity” exists (for example, a Google search shows numerous results, with IBM among the first; it is also mentioned, for example, in the book “Conceptual Modeling of Information Systems” https://books.google.com/books?id=61mT383WC78C&pg=PA385&lpg=PA385&dq=%22meta-entity%22+definition&source=bl&ots=aXW28JVaKP&sig=5ihMooqZ94jyG0mWRQ2hWHjLtVE&hl=en&sa=X&ved=0ahUKEwj6756986DNAhWHkh4KHecxCGsQ6AEIQTAH#v=onepage&q=%22meta-entity%22%20definition&f=false ). The prefix meta- (in the meaning "after" or "beyond") is used because a meta-entity is created during the ERD improvement - after the ERD is created - and a meta-entity is created above the entities it generalizes.
The model becomes 3-dimensional because the meta-entity not only stays above the entities it generalizes, but at the same time is represented in a database table, becoming part of the physical implementation of the data model.
The number of relations defines the number of possible paths for data entry and search. The number of paths defines the number of SQL constructions implemented in programs - for example, programs for data entry (allocation with decomposition) and programs for calculating reports.
Such a category as “view” is used in the implementation of a data model in a database and should not be confused with data modeling categories like “entity” and “dimension” (and, in our case, “meta-entity”). A meta-entity should not be implemented as a view because it contains data that must be inserted, updated, and deleted - the same as a regular table. Using a view for a meta-entity implementation would just create technical difficulties and restrictions without any gain. For that reason, a meta-entity should be implemented as a table - the same way as a regular entity.
The implementation of the data model with meta-entities was provided using the Data Vault modeling approach, which organically allows presenting meta-entities along with entities: meta-entities were presented as hubs and links, and entities as links and satellites. This allows implementing the case when one entity is covered by more than one meta-entity (a case mentioned in the article): for example, “Private Property Owner” has relations with two meta-entities, implemented as two hubs - “Property Owner” and “Individual”.
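A minimal Data Vault-style sketch of that case might look like this (again via Python's sqlite3; the hub, link, and satellite names are my own illustrative assumptions, not taken from the article):

```python
import sqlite3

# Data Vault sketch: two hubs for the two meta-entities, a link tying them
# together, and a satellite holding the Private Property Owner's attributes.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hub_property_owner (owner_key INTEGER PRIMARY KEY, business_key TEXT);
CREATE TABLE hub_individual     (indiv_key INTEGER PRIMARY KEY, business_key TEXT);
-- The link records that one business entity participates in both hubs,
-- i.e. is covered by both the Property Owner and Individual meta-entities.
CREATE TABLE link_owner_individual (
    owner_key INTEGER REFERENCES hub_property_owner(owner_key),
    indiv_key INTEGER REFERENCES hub_individual(indiv_key)
);
-- Satellite with the descriptive attributes of Private Property Owner.
CREATE TABLE sat_private_property_owner (
    owner_key   INTEGER REFERENCES hub_property_owner(owner_key),
    load_date   TEXT,
    person_name TEXT
);
""")
con.execute("INSERT INTO hub_property_owner VALUES (1, 'OWNER-001')")
con.execute("INSERT INTO hub_individual VALUES (10, 'PERSON-001')")
con.execute("INSERT INTO link_owner_individual VALUES (1, 10)")
con.execute("INSERT INTO sat_private_property_owner VALUES (1, '2016-05-23', 'J. Smith')")
n = con.execute("SELECT COUNT(*) FROM link_owner_individual").fetchone()[0]
print(n)  # 1 - one entity covered by two meta-entities via one link row
```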
As a result, the number of relations was minimized, and the model became simpler than it was initially, and more readable and understandable both by business subject matter experts and by developers.
Working with developers on the right implementation of a data model definitely requires some effort to reach the right result. I have faced this issue and can recommend building SQL statements separately from the other parts of the programs - this allows getting SQL statements that use the data model correctly, based on the data architect's vision rather than a developer's. Then the SQL statements can be embedded into the programs or called from them as functions or stored procedures. But this advice can only be implemented when project management establishes the predominant status of the data architect during the data model implementation - unfortunately, that does not always happen… - it's a topic for another discussion.
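One simple way to keep SQL statements separate from the rest of a program is to author them in one place that the data architect can own, and have programs call them only by name. A hedged sketch of the idea (the statement name and schema here are illustrative assumptions):

```python
import sqlite3

# SQL statements are authored separately, in one dictionary a data architect
# could own and review; programs never build ad-hoc SQL themselves.
SQL = {
    "owner_count_by_type": (
        "SELECT owner_type, COUNT(*) FROM property_owner "
        "GROUP BY owner_type ORDER BY owner_type"
    ),
}

def run_named_query(con, name):
    """Programs call curated statements by name instead of writing SQL inline."""
    return con.execute(SQL[name]).fetchall()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE property_owner (owner_id INTEGER, owner_type TEXT)")
con.executemany("INSERT INTO property_owner VALUES (?, ?)",
                [(1, "PRIVATE"), (2, "STATE"), (3, "PRIVATE")])
result = run_named_query(con, "owner_count_by_type")
print(result)  # [('PRIVATE', 2), ('STATE', 1)]
```

The same separation can be achieved with stored procedures or views owned by the data architect; the point is that the statements reflect the architect's vision of the model, not each developer's.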
The serious answer to "why stop at three dimensions": yes, you can add more dimensions, for example, time and money, but in that case you will be considering not a data model, but a model of a data model implementation (the management of a data model implementation) - a field different from data modeling.

 
Meta-entities, as a generalization layer, preserve the relations of the replaced entities with the other ones, just reducing the number of relations. Meta-entities also contain data that reflect the modeled business process. These relations and data completely represent the initial model, which describes the business relationships.


Particular business data reflect the business process they relate to. For that reason, they carry the traits of that business process.


A normalized data model (for example, in third normal form) can be transformed into a completely denormalized data model (a star schema) or into a multilevel model (for example, Data Vault).


Levels of data are mentioned, for example, by Martin Modell: http://www.martymodell.com/dadmc/dadmc03.html .


The perspectives of data reflect the vectors of measurement - they become dimensions in the star-schema approach. But cross-perspective relations between data from different perspectives (for example, different types of measurement related to the same business event, such as a transaction) allow building data models as we have them (for example, relations of data from different dimensions to the same event fact data in a star-schema data model).


Once the cross-perspective relations are set, data from different perspectives form what is usually considered a transactional row, which has key attributes. These key attributes represent a level (or levels - the plural form relates to Data Vault modeling) higher than the other attributes. In this way, perspectives can be transformed into a multi-level model.

Tuesday, February 9, 2016

Dimensions of Programming


 
This post describes a specific tendency in programming that has appeared over the last years, especially in ETL programming. The analysis was mostly prepared in 2011 and is being finalized now because the tendency of multi-dimensional programming has not disappeared, but has even become dominant, sometimes transforming into really dispersed programming, especially in user interface implementation. Most of the narrative below has not been changed, because the tendency stays strong despite new versions of different tools and applications with some modifications in details.
 

History


1.    Böhm and Jacopini showed in 1966 that any program can be created as a sequence of three basic constructions: a sequence of commands, if-logic, and a loop. They considered a program as a sequence of steps executed one by one - in one dimension. Later, Dahl, Dijkstra, and Hoare developed this approach and built the "without go-to" style of programming - so-called "structured programming". In this way, the "spaghetti" style of programming, where the logic of programs was not even multidimensional but simply scattered, was eliminated. From that moment, computer programming transformed from logical knitting into an industrial technology.
2.    Later, program execution became parallel, and observation of a program became 2-dimensional. To consolidate the consideration (tracking) of such parallel processes, the parallel processes were synchronized. In this way, the relations and dependencies between parallel processes became observable, visible, and readable, because we can see them on a flat surface.
3.    After the appearance and use of multi-featured components with a variety of tunable abilities to transform data (for example, ETL transformations), the details (variations) of each component (transformation) became hidden behind the scenes, in a third dimension. Such 3-dimensional programming (which later became multidimensional) made the tracking of data processing more complicated and, hence, slower.

Example 

To analyze how program parts spread across three dimensions, let us, for example, apply this analysis to a very popular ETL tool, Informatica®. I chose it because it is a very popular and powerful tool with which I have worked for a long time. It gives a lot of advantages and has numerous achievements in the ETL process. To show how to make it better, let us consider it from the perspective of 3-dimensional programming and show how to make it more efficient.
 

It has different types of objects, each of which has different types of links to the others:
  • Mappings and Mapplets 
  • Sessions  
  • Workflows and Worklets 
  • Parameter files
  • Parameters and variables with or without persistent values
  • External objects (for example, files and database tables)
  • External operating system and database processes

 
Each of them has a few levels of sub-objects in depth, spread over a few squares (one inside another) or even over a few other objects. Sometimes, the entry to a lower-level description is attached like a patch (for example, the Set File Properties item in the Mapping tab of the session Task Editor). As a result, tracking a process and keeping the whole big picture interfere with each other. It reminds one of the noodle-style programming that existed before structured programming came to IT, when programmers were creeping over a program trying to catch how it worked. The effort developers make to work with such an ETL program is much bigger than if it were presented compactly, without numerous jumps back and forth from one window to another.
 
To eliminate these difficulties, Informatica added some extra options (for example, tracing Link Path Forward and Backward through fields in a mapping; gathering all Connections on one plate in a separate item of a session's Mapping tab; preparing Compare… and Dependencies… reports for Workflows). The list of ways to mitigate the 3-dimensional programming issues can be extended. These options can help a lot, but they are not organically embedded into the product and do not give the whole, complete picture of a particular ETL program.
We can see similar 3-dimensional programming issues in other ETL tools - and not only in ETL, but also in other types of tools and homemade applications. It is especially notable in Enterprise Architecture, where data processing runs in a heterogeneous environment.


How to Get Rid of Multidimensional Programming?

To get rid of a multidimensional program, it is necessary to transform it to a one-dimensional, or at least a two-dimensional, view. For this purpose, first of all, it is necessary to calculate metrics of how observable the particular program is - in other words, how "flat" it is.
 
To consider the level of flatness, we will call a "plate" any type of description shown on one surface that we can observe visually (for example, the program text presented on a computer monitor or on a sheet of paper). The smallest object on a plate will be called an "item" (for example, a radio button or a check-box).
 
The greatest flatness is when all items are on one plate (all items are visible); the lowest flatness is when each item is on a separate plate (only one item is visible at a time).
 
Based on this terminology, we offer the following symbols to characterize metrics of how easily a program can be verified:
 
            P – number of plates on which the program is located;

            i – number of items on one plate of the program;

            I – total number of items on all plates of the program;

            L – number of program levels in depth;
            V – level of program visibility;
            D – dispersion of items over the program plates.

 
Using these symbols, we can calculate the indexes of flatness and visibility:
 
Flatness of a program (of all plates together):

 
            F = 1 / P,

 
which shows that a program is flat when P = 1 (the whole program is on one plate) and, as a result, F = 1.
 
Average flatness of one plate:

 
f = i / I,

 
which shows that a program is flat when i = I (all items are on one plate) and, as a result, f = 1.

 
Level of program visibility / readability:

 
            V = 1 / L,

 
which shows that a program is flat if L = 1 (only one level of the program - only one plate where all items are allocated).

 
Using these indexes, it is possible not only to measure the visibility of a particular program, but also to compare the visibility of different programs. This means it is possible to compare tools and applications to find which of them allow building flatter (that is, more visible) programs, and in this way find which tool allows building more easily verifiable programs. The latter means that, using the listed indexes, it is possible to evaluate which tool allows building programs that will be more efficient for testing, production support, and enhancement.

 
The measure of dispersion can be calculated as:

 
            D = P*L,

 
which means: the more plates and the more levels a program has, the more its code is dispersed and the more difficult it is to read and verify the program code. Obviously, the most efficient program has D = 1 (one plate and one level of the program).

 
Example of calculation:
The Informatica session description (Edit Task) has 6 tabs (P = 6) and up to 3 levels (L = 3) on some tabs.
Hence, its flatness index is F = 1/6; the level of visibility is V = 1/3; the dispersion is D = 6 × 3 = 18.
 
Definitely, some other metrics can be used, but what is most important is that we can evaluate in numbers how convenient a tool is to use.
 
Based on these metrics, we can compare different tools and see which of them allows saving more time on program development and, especially, on enhancement.


Summary 

Multidimensional programming slows down the productivity of programmers during development, testing, and especially during software support and enhancement. Despite that, the use of multidimensional programming grows. This happens because software producers do not always pay attention to the readability of programs and to the ability of their tool to create the most readable programs. Users of such software do not demand a better level of readability, first of all because of the lack of a tradition of requiring readable software, or a tool that allows building such software.
 
One of the ways to eliminate multidimensional programming is to measure the level of readability of software. The measurement, as a set of metrics, will allow comparing the level of readability of different software to see how efficient each of them is for programming. Paying attention to the readability of software, and to the ability of a tool to create readable software, will help increase the productivity of programming - and especially of supporting and enhancing software. It will help reduce IT expenses, which is important for IT users.