The Entity Family model


The entity family is the most general, yet still meaningful, classification, categorization or group possible within the context of the firm. This broad classification of entities is composed of many smaller more specific, yet similar, groups of entities. A model of an entity family would look somewhat like a hierarchy, in that it would consist of groups dependent upon groups.

An entity family model should be generated reflecting the names of the various entity groups within the family and the various, and sometimes complex, ways in which they are related to each other. That is, the model should depict the derivation of each group. In many respects this is a key-only or identifier-only model, which graphically represents the various characteristic strings which describe and identify the groups within the family (figure 15-1), since the names of the groups are derived from the values of the characteristics in the identifier string.

The entity family model as a hierarchy

The entity family model is usually illustrated as a hierarchy showing the familial or decomposition relationship between the larger group (the family) and the smaller groups which comprise it.

This hierarchic representation of an entity family is correct in that classification represents a general to specific characteristic tree structure. In such a representation, the entity family hierarchy is usually created by adding one characteristic at each level such that each new characteristic forms the basis for the new groups at that level.

The sum of the characteristics of any given group at any given level must include all the characteristics used to form the more general groups in a direct line to the family root or base. The lineage from ancestor to descendant is called the chain (of characteristics). Since each successively smaller group is formed from the members of the larger group which precedes it in the hierarchy, all members must possess the characteristic which formed the parent group as well as those additional characteristics which were used to create its group and those of its siblings. The children of the parent are said to have inherited the characteristic of the parent.
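The chain of characteristics can be made concrete with a small sketch. The `Group` class, its field names and the employee values below are assumptions introduced for illustration; they are not taken from the figures. Each group adds one characteristic value and inherits everything above it in a direct line to the base.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Group:
    """One group in an entity family hierarchy (hypothetical structure)."""
    name: str
    # The single (characteristic, value) pair added at this level; None for the base.
    added: Optional[Tuple[str, str]] = None
    parent: Optional["Group"] = None

    def chain(self) -> list:
        """Return every characteristic from the family base down to this group."""
        inherited = self.parent.chain() if self.parent else []
        return inherited + ([self.added] if self.added else [])


# Illustrative values only.
employee = Group("Employee")
salaried = Group("Salaried employee", ("pay type", "salaried"), employee)
salaried_full_time = Group(
    "Full-time salaried employee", ("schedule", "full-time"), salaried
)

print(salaried_full_time.chain())
```

The child's chain always begins with the chain of its parent, which is one way to read the statement that children inherit the characteristics of the parent.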

The development of the entity family models seeks to identify the relationship between the broad real world entity groups (or families) that populate the internal and external environment of the business, and the narrower groups derived from them.

The entity family as a Polyhierarchy

If the members of a given family or the groups within it were each identified by a single value characteristic, the family model could be represented by a simple hierarchy. In other words, each group in the hierarchy chart would be the child of a group above it (its parent), and the parent of any number of groups below it. No child would have more than one parent, and its parent would have to be within the same family branch.

The data model is much more complex than that, a complexity that is masked if the concentration is on the characteristic as a data element, rather than on the characteristic as a value list. A characteristic is a multi-value element, each value of which identifies a group. Each value dependent group is further decomposed into smaller groups by the values of other dependent groups. For a true entity family representation each first level characteristic should be modeled separately. Under this view the entire population of a family would be decomposed using a single characteristic. Within that model each value of that characteristic should be isolated and treated as a group and head of chain. Each qualifier (characteristic) value is modeled as a separate group derived from the main value group.

There would be as many of these charts as there were first level characteristics. Because each chart maps the entity family population differently, and because each characteristic applies to the entire family, these models when used together depict the concurrent and overlapping nature of the entity groups within a family. Figure 15-2 illustrates a very simple polyhierarchy which is created by the employee characteristics used in our illustrations.

Much of this graphic representation would appear to be hierarchic in nature. When the charts are placed one over another, they form a network of interlocking, overlapping groups. This complex network-like hierarchy is called a polyhierarchy and may be used to represent any type of complex family relationship pattern. Most entity family models reflect this polyhierarchic structure, as do the relationships between the families themselves.
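A minimal sketch of this idea follows, assuming a handful of hypothetical employee records and two first level characteristics (`pay type` and `schedule`, both invented for the example). One chart is built per characteristic; each chart covers the whole family, and groups from different charts share members.

```python
from collections import defaultdict

# Hypothetical employee records; characteristic names and values are illustrative.
employees = [
    {"id": 1, "pay type": "salaried", "schedule": "full-time"},
    {"id": 2, "pay type": "hourly", "schedule": "full-time"},
    {"id": 3, "pay type": "hourly", "schedule": "part-time"},
]


def chart(population, characteristic):
    """Build one hierarchy chart: the family split by the values of one characteristic."""
    groups = defaultdict(set)
    for member in population:
        groups[member[characteristic]].add(member["id"])
    return dict(groups)


charts = {c: chart(employees, c) for c in ("pay type", "schedule")}

# Every chart covers the entire family ...
all_ids = {e["id"] for e in employees}
for groups in charts.values():
    assert set().union(*groups.values()) == all_ids

# ... and groups from different charts overlap, which is what turns the
# combined structure into a polyhierarchy rather than a single tree.
overlap = charts["pay type"]["hourly"] & charts["schedule"]["full-time"]
print(overlap)  # {2}
```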

Entity family model notation

The top of each separate hierarchy is called the Base. The label of a base entity is the label of the entity family and is the only label that can be applied to any member of the family without exception. A base entity cannot be a kind-of another entity. There is only one base entity per family.

More specific, characteristic value dependent, groups within less specific groups (group decomposition) formed by the use of additional sets of characteristic values form a chain of characteristic values from the most general to the most specific (top-to-bottom). In this case each more specific group is said to represent a kind-of or more specifically qualified group of the less specific group from which it was formed. These groups are sometimes called entity types and are shown as being related to the base entity using a relationship named Is-a. While this works for first level characteristics, it becomes confusing and cumbersome when an entire chain must be represented (B is-a A, C is-a B, etc.) and becomes more so when all the values of a characteristic are shown, much less all values from all characteristics. This method of representation cannot show the polyhierarchic nature of the entity family model.

In some cases entity groups from different chains at a given level of specificity from the base may be used to form a group at the same level of specificity. That is, some of the value derived groups within a characteristic model may be grouped across chains for a reason. This composite group is named a grouping. For instance, if the characteristic is country of origin (USA, Canada, England, France, Spain, India, Republic of China, Egypt, ...) and for some purposes we group them as origin (US and foreign), or region (North America, Europe, Asia, ...), then origin and region are groupings of the value dependent groups of the characteristic country of origin and not necessarily new characteristics.
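As a sketch of the country of origin example, the groupings can be expressed as labels computed over the value dependent groups rather than as new stored characteristics. The function names and the `REGION` table are assumptions made for illustration, and only a subset of the countries is listed.

```python
# Value dependent groups of the characteristic "country of origin".
# The region assignments are an illustrative assumption.
REGION = {
    "USA": "North America",
    "Canada": "North America",
    "England": "Europe",
    "France": "Europe",
    "Spain": "Europe",
    "India": "Asia",
}


def origin(country: str) -> str:
    """Grouping 'origin': US versus foreign."""
    return "US" if country == "USA" else "foreign"


def region(country: str) -> str:
    """Grouping 'region': one label covering several country groups."""
    return REGION[country]


for country in ("USA", "France", "India"):
    print(country, origin(country), region(country))
```

Both groupings are derived entirely from the one characteristic, which is why they need not be treated as characteristics in their own right.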

These terms can be used to describe the various component groups of an entity family structure when documenting them in a dictionary (figure 15-3). The components of an entity family model (as illustrated in figure 15-4) are:



Base is a composite description that incorporates every characteristic and attribute of every different group and member of the entity family and every relationship enjoyed by any member of the entity family. The label of a Base entity is the only label that can be applied to any member of the family without exception.

A Base entity cannot be a Kind-of another entity. There is only one Base entity per family.

Kind-of is a distinct entity group within an entity family or within another entity group formed by the use of a characteristic value in addition to the values used to form the parent group. Each kind-of group in a chain is always more specific in membership than the group from which it was formed. A group cannot be a kind-of a group from a different value chain.

Alias is another label for an entity group.

Grouping is a label for a collection of entity groups at a given level of specificity which otherwise have no explicit common parent, but have a common (but not explicitly used) characteristic that yields the grouping name.

Usage is a view of one or more entity groups within the family formed through the use of a set of relationships and characteristics selected from those groups. (This is not a classification term, but it is used extensively in data modeling. This is sometimes called a user (usage) view.)

Entity classification issues

The term entity represents an unclassified fact of being or thing. Everything that exists in reality, or in the perception of reality, is an entity. This group being too large and too general to do anything meaningful with, it is usual to break that too general group (phylum in the taxonomic charts) down into four slightly less general groups (classes in the taxonomic charts). In effect we have selected a characteristic of entity - kind of entity (people, places, things and concepts).

Most firms rarely deal with entities at this abstract level, rather they look at smaller more meaningful groups which are more relevant (of more interest) to the business.

The next level of grouping (orders on the taxonomic charts) then, is usually based upon the business reason for the firm's interest in each group. This determination is made using the rules of the firm to determine the relevant group names.

The "business reasons for interest" usually correspond to the various roles played by or uses of the members of the groups. This reflects the characteristics which complete the sentence "the firm is interested in the entity because it ...." The group names correspond to the labels of those uses or the labels of those roles. However, since each group is also part of a larger group they must obviously have certain characteristics in common. The fact that the firm is not interested in, or is not using the larger group name, is no indication of the firm's interest in the characteristics (and other attributes) inherited from the larger group. In many respects, these characteristics (and other attributes figure 15-5) from the larger group are a critical part of the entity description.

Depending upon how the entities are grouped within the model, and the needs of the firm for data, these common characteristics and attributes may be replicated with each specific group, or the model may maintain the common characteristics in the general group, and maintain role and usage specific characteristics in the role and usage groups. The first issue then is related to classification based upon role or usage.

Entity-based data models by definition are restricted to entities that are of interest to the firm, and still further limited to those entities about whom the firm must collect data and maintain records (families on the taxonomic charts). Because no grouping above the level of family interests the firm or is necessary to the firm, most models ignore them, although data modelers must remember that they are there.

It is easier to view these smaller groups of the larger population by group name than it is by naming the individuals which comprise the group or by naming the tests which are made against the group characteristics. Group names are used because, although each member of the group is different and uniquely identifiable, the group's members are similarly described, act the same way, or are processed in the same manner. The group name is in almost all cases identical to the specific characteristic values used to determine the members of the group.

In the data model, as the number of characteristics in the entity family definition decreases, the number and kinds of entities which can be included under that definition increase. As the list of characteristics in the definition becomes more extensive, the number of entities which can qualify for group membership becomes less extensive. The broadest group of entities used within any entity model, regardless of level, is the entity family. However, the entity family and all similar groupings of entities are still a convenience of the data model, and as such their definition and population content are highly subjective.
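The inverse relation between the length of a definition and the size of the qualifying population can be shown with a short sketch. The entity records and the `members` helper are hypothetical and exist only to illustrate the point.

```python
# Hypothetical entity occurrences described by characteristic values.
entities = [
    {"kind": "person", "role": "employee", "pay type": "salaried"},
    {"kind": "person", "role": "employee", "pay type": "hourly"},
    {"kind": "person", "role": "customer"},
    {"kind": "place", "role": "warehouse"},
]


def members(population, definition):
    """Return the occurrences that satisfy every characteristic in the definition."""
    return [
        e for e in population
        if all(e.get(name) == value for name, value in definition.items())
    ]


# Each definition adds one characteristic to the previous one.
definitions = [
    {"kind": "person"},
    {"kind": "person", "role": "employee"},
    {"kind": "person", "role": "employee", "pay type": "salaried"},
]

for d in definitions:
    print(len(d), "characteristics ->", len(members(entities, d)), "members")
```

With this sample data the membership shrinks from three to two to one as characteristics are added to the definition.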

The entities which are collected under the umbrella of the family possess many individual characteristics. These characteristics are extensional, and while the sum of the characteristics of a family determines the family makeup, individual characteristics may be shared across families. That is, the same characteristic may be included in the definition of multiple entity families (e.g. written agreement). This characteristic sharing results from the families themselves being derived from larger, more general groups, groups which were not of interest to the firm, or which the business rules or requirements of the firm dictated be separated out into smaller more specific groups. This implies that there is a polyhierarchic relationship between families as well. This separation into families by the characteristic business reason for interest (figure 15-6) is dependent upon adherence to the business rules of the firm, which in turn dictate:

  1. entity family roles of interest within the firm
  2. entity family relationships of interest to the firm or to other families.
  3. identifier differences of interest to the firm
  4. data requirements differences of interest to the firm

Because these separations and distinctions are subjective, their treatment is open to interpretation, and debate. Because there are a variety of ways to depict this information within the data model additional areas of judgement and preference are introduced, along with additional areas of debate. In many cases these discussions take on the qualities of religious doctrine with each participant defending their own perspective.

Entity Families determined by Roles

Entity families are almost always developed from the role which the members of the family play in the organization. These roles reflect the reason why the firm is interested in the members of that particular group. At the family level, roles are distinct, and an entity occurrence is assigned a role as it is recognized. That role is usually permanent, as are the records that are collected around the role.

These records are maintained and retained until the firm is no longer interested in that entity occurrence, in that role.

To illustrate:

An entity occurrence within the business world may have many different characteristics. One characteristic or combination of characteristics of an entity can be used to place it in a particular group, while another characteristic or combination of characteristics can be used to place it in another group. A given characteristic may be used in combination with others to define many families.

Thus, depending upon the characteristics selected, and the combinations of characteristics used, a population of entity occurrences may be concurrently grouped into multiple different families by what they do, by the purpose they serve, by how they are used, sometimes by what they look like, and sometimes by what they are.

For example:

The firm's interest in different groups of people is usually determined by business reasons that dictate a need to collect specific kinds of data about a specific group based upon the role they play. This could be determined by testing the characteristic role played (employee, customer, stockholder). Any given person, however, can be an employee, a customer and a stockholder at the same time; that is, the roles overlap to the extent that employees are also customers, or customers are also employees.

Role characteristics can be both extensional and intensional. That is, roles can separate entities into families or they can separate groups within a family. Roles can also be independent or dependent.

Independent roles are those where the mutual exclusivity of the families is dictated by business rules but family membership is not. That is, there is no requirement that an occurrence have one role in order to have any other.

To illustrate:

All independent roles are extensional by definition since they segregate entity occurrences permanently and independently. Role independence does not preclude role overlap (nor by extension dual-family membership) such that an individual can play more than one role.

Dependent roles are those where an entity occurrence role assignment is dependent upon that entity occurrence already playing another role. To illustrate: An employee may have many functional roles (salesperson, engineer, clerk) and many supervisory roles (manager, supervisor, executive) but to play either role an entity occurrence must first qualify as an employee.
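The dependency rule can be sketched as a prerequisite check at the moment a role is assigned. The `PREREQUISITE` table and the `assign_role` function are hypothetical names introduced for this illustration; roles without an entry in the table are treated as independent.

```python
# Hypothetical prerequisite table: dependent role -> role it depends upon.
PREREQUISITE = {
    "salesperson": "employee",
    "engineer": "employee",
    "manager": "employee",
}


def assign_role(current_roles: set, new_role: str) -> set:
    """Return the roles after assignment, enforcing the dependent role rule."""
    required = PREREQUISITE.get(new_role)
    if required is not None and required not in current_roles:
        raise ValueError(f"{new_role!r} requires the role {required!r}")
    return current_roles | {new_role}


roles = assign_role(set(), "employee")  # independent role, no prerequisite
roles = assign_role(roles, "engineer")  # allowed: already an employee
print(sorted(roles))
```

Assigning a dependent role such as manager to an occurrence that is only a customer raises an error, since that occurrence has not first qualified as an employee.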

The role characteristics must be examined with care to ensure that they are mutually exclusive and exhaustive. If the list of roles is not exhaustive, processing conditioned upon role value may be affected, as might data group identification and data content. If the list of roles is not mutually exclusive, then role overlap can occur.
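One way to examine a role list is to test sample assignments against it. In the sketch below, `check_roles` and the sample data are assumptions made for illustration: an occurrence with more than one role shows that the list is not mutually exclusive, and an occurrence with no role, or with a role outside the list, shows that it is not exhaustive.

```python
def check_roles(assignments, allowed_roles):
    """Report occurrences whose roles break exclusivity or exhaustiveness.

    assignments maps an occurrence id to the set of roles it plays.
    """
    overlapping = {k for k, roles in assignments.items() if len(roles) > 1}
    uncovered = {
        k for k, roles in assignments.items()
        if not roles or not roles <= allowed_roles
    }
    return overlapping, uncovered


allowed = {"employee", "customer", "stockholder"}
assignments = {
    "p1": {"employee"},
    "p2": {"employee", "customer"},  # role overlap
    "p3": {"vendor"},                # role missing from the list
}

overlapping, uncovered = check_roles(assignments, allowed)
print(overlapping, uncovered)
```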

Overlapping Entity Family membership

Entity family role characteristics are not as clearly definable as entity group characteristics, which leads to fuzzy, and sometimes overlapping, entity family definitions. Since entity family roles are extensional, and since all members of the family must possess the same extensional characteristics, the list of family selection characteristics (extensional) is usually confined to the narrative definition of the entity family. Entity family extensional characteristics must always be single valued; thus role played (employee, customer, stockholder) would not be valid as an entity family extensional characteristic since it is multi-valued.

Multivalued extensional characteristics lead to fuzzy definitions. This fuzziness, and the overlap it implies, can lead to interpretational differences when creating the data model for the family. Figure 15-7 illustrates the overlapping roles within the entity family people.

To illustrate:

There are at least two possible interpretations of this statement.

  1. the family consisted of some number of entity occurrences from each of the three roles and any given member could be any one of the three but not more than one (following the mutually exclusive rule). This would be like saying the family consisted of three entity groups - employees, customers, stockholders. In this case an intensional characteristic has been used as an extensional characteristic.
  2. the family consisted of occurrences which were employees, customers and stockholders, concurrently. In this case the implication is that there are families of employees, families of stockholders, and families of customers, and other families which embody the various combinations of the three roles.

In either case the statement is fuzzy, which is to say it is ambiguous.

To illustrate further:

If the rule is followed that all extensional characteristics must be single-valued then mutual exclusivity is enforced, and overlap cannot occur. Data about each role is collected, and maintained independently (but redundantly where the same entity occurrence plays more than one role).

The following summarizes the recommended method for modeling role based entity families:

  1. if the roles are mutually exclusive, i.e. if the entities can play one role in the organization or another of the roles, but not both, define separate entity families.
    In this case, there will probably be few attributes in common between roles. There should be no duplication of individual members in another family.
  2. if the roles are distinct, but not mutually exclusive, that is, if there are some entities who play each role exclusively, but there are some which can play both, the roles should be treated as separate families, as if they were mutually exclusive and the data about dual-family members should be stored redundantly.
  3. If the role of an entity occurrence in a family must be explicit and testable, the role characteristic must be used intensionally. If the role of an entity occurrence in a family can be implicit and need not be testable, the role characteristic may be used extensionally.

Because we use the business determinants - statements of mission, charter, goal, objective, strategy, tactic, etc. - and analysis of user interviews and user documentation to identify and label the business entities of the firm and their roles, in practice it is often difficult to determine whether a characteristic value set represents entity families or entity groups within a family. For the same reason, it is difficult to determine with any clarity the characteristics of each family. If we scan all of the documentation collected and generated from the analysis phase, we would probably have a myriad of entity characteristics, and thus many entity groups, but probably only a few entity families. Without definitive rules, we have no consistent way to tell which is which.

If all the rules for characteristics are followed, much of this difficulty should be eliminated and consistency increased.

Entity Families and Relationships

An entity family consists of all individual member entities, and all groups of member entities, which behave the same way in the organization. All entities which have the same role in the organization or which relate to the environment in the same way should belong to the same family.

Entities are grouped into families based upon extensional characteristics. While characteristics determine entity families, characteristics can also indicate relationships. In some cases separate entity families which would normally be created based upon the extensional characteristics used must be combined, and the required family groupings must be accomplished through relationships. This occurs when business rules or other characteristics of the entity population prevent normal family grouping and segregation techniques from being applied.

Some of the most common conditions when alternative methods must be used are:

  1. the business rules of the firm dictate that certain role based entities be aggregated into a single family or group regardless of other factors.
  2. when there is a high degree of data commonality which puts pressure on aggregation into a single group
  3. when a high number of dual-family memberships is anticipated

Relationships as Characteristics

In many models, the designer must deal with the condition where a given occurrence plays multiple roles simultaneously. Since characteristic values (the role indicators) must be mutually exclusive and exhaustive, normal grouping cannot be used. Instead a way must be found to handle roles which is both consistent with the general model and in conformance to the rules. One of the most common methods of treating this problem is by transferring the characteristic to relationships. Figure 15-8 illustrates the use of multiple relationships to replace a non-mutually exclusive characteristic value list.

To illustrate:

Without these business rule and business reason constraints the population of people would have been represented as separate entity families each named for a single value of the characteristic.

However, because of the constraints, the population has been aggregated into the same family, each of whose members may be related to the educational institution in a variety of ways (relationship names - on the faculty of, student of, alumnus of, trustee of, contributor to).

The same solution holds true for the person who may be a depositor of, a lender to, a borrower from, a mortgagee of, etc., a bank.

Each of the roles that would normally be represented by an entity family model is instead represented by a series of relationships to some other entity. These relationships may be direct or through some intermediary entity, such as an account entity.
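A minimal sketch of this approach follows, assuming a hypothetical `Relationship` record and invented names for the people and the institution. There is a single family of people, and the roles are derived from the named relationships instead of from a role characteristic, so one person may hold several at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """A named link from a person to another entity (hypothetical structure)."""
    person: str
    name: str    # e.g. "student of", "on the faculty of"
    target: str


# One family of people; the names are made up for illustration.
relationships = {
    Relationship("Pat", "alumnus of", "State University"),
    Relationship("Pat", "contributor to", "State University"),
    Relationship("Lee", "on the faculty of", "State University"),
}


def roles_of(person: str, target: str) -> set:
    """Derive the roles a person plays toward a target from the relationships."""
    return {
        r.name for r in relationships
        if r.person == person and r.target == target
    }


print(sorted(roles_of("Pat", "State University")))
```

Because the roles live on the relationships, no mutually exclusive value list is needed on the person, and an intermediary entity such as an account could be added as another relationship target.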

When relationships are used instead of family groups, the single family group is defined ignoring the role characteristic as a family grouping indicator, and instead the characteristic is defined in terms of a set of relationships to some other family. In this representation the characteristic, and many of the data groupings which it represents, are transferred to the relationship. The model's narrative descriptions must explain as fully as possible:

  1. each of these characteristic relationships
  2. the reasons (business rules and reasons) why relationships are being used
  3. the conditions under which each relationship exists
  4. how to identify the entity occurrences within each entity family which participate in each relationship.

Since the majority of people-entity processing activity within an organization can be expressed in terms of the role relationships between entity family and entity family, or between entity family and company (also an entity family, by the way), there are usually a large number of inter-family relationships in the data model, and the relationships between entity families will be multiple, conditional, and complex.


Data Analysis, Data Modeling and Classification
Written by Martin E. Modell
Copyright © 2007 Martin E. Modell
All rights reserved. Printed in the United States of America. Except as permitted under United States Copyright Act of 1976, no part of this publication may be reproduced or distributed in any form or by any means, or stored in a data base or retrieval system, without the prior written permission of the author.