SQL by Design: Supertypes and Subtypes
Accounting for all the people
April 30, 1999
We're all people, whether we're employees, supervisors, or customers. Why do so many databases store different categories of people in separate tables? Databases segregate customers, employees, and vendors into separate tables, even though each of these groups has numerous common attributes. Everyone has a first and last name, an address and phone number—or more likely, several—and some unique identifier to make data management and people administration easier.
In a database that suffers from people redundancy, when a vendor becomes a customer, you must re-enter the vendor's name, address, phone, etc., in the customer table. Any change in the person's data means changing both the vendor and customer tables. Or, when a customer becomes an employee, you must transfer data from the customer's table to the employee's table. This transfer operation usually requires you to insert the data into the employee table and remove the data from the customer table. And because the table layouts are rarely identical, you hope that you don't lose anything in the process.
Commonly, an employee is also a customer (who receives employee-purchase discounts). So, if you keep customers in one table and employees in a second table, do you store a person's employee discount in the employee table or the customer table? If you store the employee's discount in the customer table, does that mean you store blank or null space for all nonemployee customers? Database designers are asking such tough questions every day.
I can only speculate why we have redundancies that lead to such potential data management problems, but tradition plays a large part in determining database schemas. The few attributes that might differ from one category of person to another may play a role in retaining such database schemas. Fortunately, a solution exists—in the form of supertypes and subtypes.
The Scenario
Let's examine a university database that supports critical university functions. You have to track many categories of people—students, faculty, academic division directors, administrative and technical staff, and marketing prospects. The diversity of these categories is a database design challenge.
Figure 1 is an entity model of the people portion of this university database. The database designer populated the model with entities, attributes, and relationships. An entity is something about which you want to store data, such as a person (tblPerson). An attribute is a property that describes an entity (SurName and GivenName of tblPerson). A relationship refers to how one entity associates itself with other entities in the model.
In this model, you can see the relationships among various categories of people. Everyone is a person (in this case, a supertype entity). Then, an individual can be any of the following special types of person: faculty, director, staff, student, or prospect (each of which is a subtype entity). A person can be a faculty member and a student (e.g., someone in ongoing education for an advanced degree). A staff member also can be a student (e.g., in a student work-study program). A person can be a prospect (someone who's thinking about taking classes) and a staff member. Or, a person can be an academic program or department director, faculty member, and student of advanced studies.
The other two entities in Figure 1, PersonAddress and PersonComm, contain each person's address information and communications information, respectively. Database managers can record as many addresses in PersonAddress as needed. A person can have as many phone numbers and email addresses or Web sites as needed in PersonComm.
Data Manipulation
This supertype-subtype schema makes data manipulation and management easy and straightforward. When you create the physical model in SQL Server or any other relational database management system (RDBMS), each entity becomes a table. Each entity's attributes become columns in the associated table. You'll relate the tables to one another as follows:
For each one-to-one (1:1) relationship (for example, person to student), the two tables have the same primary key (pkey)—PersonID.
For each one-to-many (1:M) relationship (for example, person to PersonAddress), the pkey of the master table (person) becomes a foreign key column in the detail table (PersonAddress).
Because you store all people in one table, generating mailing labels for Christmas cards or the summer picnic is a simple task. A join of tblPerson and tblPersonAddress produces mailing labels or data for a mail merge letter for all people associated with the university, as you see in Listing 1.
Conversely, when you need mailing labels for students only, the same join statement, extended with a subquery that references only people who are students, generates labels only for students. Listing 2 shows this version of the join statement. You can generate labels or lists for any functional group by simply substituting one subtype table name (such as tblFaculty or tblStaff) for tblStudent.
If you use a surrogate pkey in the person table (tblPerson.PersonID), you want to diminish the possibility of creating duplicate entries in the table for one individual. You can create a unique index on SurName + GivenName. Because people sign their names differently (Jim Smith versus James Smith), you can't totally avoid duplicate entries, but this indexing scheme is the first step toward guarding data integrity. Listing 3 shows how to create an index on first plus last name.
You can write code that filters staff, directors, faculty, students, or prospects, or a combination of any of the preceding. You can save these filtered results as views to help your nonprogramming colleagues easily query the database. Instead of giving them table names and directions about writing joins and subqueries, simply give people the view name and instructions on how to write a simple SELECT query using the view name, as Listing 4 demonstrates.
This style of table architecture has a few drawbacks. Data entry that uses raw tables (as in an Access datasheet connection to SQL Server) or SQL INSERT statements (that you do by using the ISQL/w window) won't be easy. Recording data about an individual requires inserting or updating numerous tables because data that is specific to each person subtype is stored in a separate table. Someone must create multitable forms that let people enter data in an intuitive fashion (i.e., in a way that masks the fact that information about individuals is stored in not one, but several, tables). Screen 1 shows a simple Access data entry form that does just what we've been talking about. The form is based on tblPerson, and its main part displays the tblPerson fields. You can access each subtype table by clicking on the associated command button (or pressing the underlined hot key, 3 through 7, on the keyboard). You can access the associated address (tblPersonAddress) and communications data (tblPersonComm) via hot keys 1 and 2, respectively. At the top of the form, you see a set of option buttons, each attached to a view (or, in this case, an Access query) that filters and presents a specific group of people, such as faculty only, directors only, or all people.
Everyone using the database must honor some use rules to make this kind of architecture work best and provide correct query results. One obvious use rule: Specify an address type of mailing for each individual in the database. If a person doesn't have an address type of mailing, the application can't generate mailing labels for that person, nor will that person show up on any other report that specifies the same mailing address type condition. If the query specification lacks mailing, you'll generate too many mailing labels, because many individuals have more than one address (home and business) in the database. You can generate a mailing label for each address in the PersonAddress table. So, the database user community must agree on certain use rules and adhere to them.
The Entity Structure
An entity model graphically displays a database's informational requirements, as Figure 1 shows. Many people also use an entity model to graphically display the database schema, the architecture of a database—the tables and the columns in a table—because the software that people use to create the model displays this information so clearly. Terms that describe an entity model (e.g., entity, attribute) aren't the same as those describing the physical model (e.g., table, column, field). Listing 1 translates the terms from the entity model to the physical model. For the rest of this discussion, we'll use entity model terms. Remember, an entity is analogous to a table, and an attribute is analogous to a column. An instance of an entity would be like a single row in a table.
The model in Figure 1 contains eight entities. The focus of this model is tblPerson, the supertype entity. The tblPerson attributes are those that most people have in common: a first and last name (GivenName and SurName, respectively); a prefix (Mr., Mrs., Ms., Dr.) and a suffix (PhD, JD, Esq.); a health-care code; a salutation for letter-writing; a notes attribute (PerNotes) for commentary; and a person identifier (PersonID).
Five subtype tblPerson entities further define and describe the categories of people. TblDirector contains attributes that describe academic directors, the organizational units (DeptCode) they manage, their office locations (CampusLoc plus BuildingLoc plus Room-Loc), when they became directors (TermStart), and whether they're actively managing their organizational units (DirActive). TblFaculty contains attributes that describe faculty members and their job classifications (FacStatus). TblStaff contains attributes of administrative and technical staffers; TblStudent contains attributes of students; TblProspect contains attributes of those who might be interested in becoming a university student.
The model contains two other entities, tblPersonAddress and tblPersonComm. TblPersonAddress contains address attributes—what kind of address (mailing, home, business, on-campus, etc.), street, city, state, postal code, and an attention line. TblPersonComm contains attributes about how to contact a person by phone or fax, email, or Web page.
The Relationships
Relationships between entities need to echo the business rules a company operates by. The relationships between tblPerson, the supertype entity, and the five subtypes are nonexclusive, which means a person can be one or many of the subtypes at the same time. A person can be a staff member, student, and faculty member all at the same time. Many students are prospects. A graduate student might be a faculty member and director of an academic department or other organizational unit.
The relationships between tblPerson and the five subtypes aren't all the same because some relationships are 1:1; others are 1:M. For example, as Figure 1 shows, tblPerson to tblStaff is 1:1 (as are tblPerson to tblFaculty and tblPerson to tblStudent). A person can be a staff member, and each instance of tblStaff refers to a single instance of tblPerson.
This supertype-subtype concept, implemented as a set of database tables, solves the nagging problem of how to store data about a multidimensional person. If you use this concept, you can accommodate almost every people situation with ease and flexibility.
About the Author
You May Also Like