Experience of Designing a Relational Materials Database
REFERENCE. Bamkin, R.J. and MacRae, S.C.F., "Experience of Designing a Relational Materials Database." Computerization and Use of Materials Property Data: Third Volume, ASTM STP 1140, Thomas L. Barry and Keith W. Reynard, Eds., American Society for Testing and Materials, Philadelphia, 1992.
ABSTRACT: This paper describes the design of a database for Engineering and Manufacturing Materials data in a tabular (relational) format. The freedom permitted by a relational (rather than hierarchical) data model allows better use of the computer’s ability to handle n-dimensional and complex networks of information.
The following features are described:
It will be noted that these features are based on the study of the structure of the materials information and are believed to be generic solutions to the modeling of Materials Property Data.
KEY WORDS: materials data, data banks, relational databases, normalised, data quality, information integration, design data, material specifications, CASE, document publishing, inheritance, object orientated, Information Engineering, IEF, CALS, STEP, independent variables
*Introduction
*The Methodology
*System Launch
*Planning
*The Analysis
*The High Level Design
*Identification of the Procedures to maintain and view data
*The Production of a Preliminary Database Design
*Design, Construction and Implementation
*Final Comments on IEF
*Data Usage
*The Relational Approach
*Property Storage
*The Structure used for Properties
*Just One Mechanical Property?
*Property Examples
*The Result
*Defaults
*Inheritance
*Magnetic Aluminium or Dealing with Exceptions
*Duplicate Parents
*Inheritance - Summary
*Data Quality
*Meta-Data
*Document Publication
*The CALS Strategy
*Documents from Databases
*Information Integration
*Multi-Mainframe
*Transfer to Other Computers
*Summary of Conclusions
*Acknowledgements
*References
A database has been developed for the use of a large engineering company by Rolls-Royce plc during the last two years. After a detailed analysis of both the business activities (what is done) and the data structure, a design has been developed which incorporates a number of novel features.
It was important to support a distributed environment, including two mainframes, and the ability to share data with external collaborators and workstation technology. The development was based on the relational database software DB2 with implementation in SQL, Fortran (technical mainframe), ORACLE (distributed processing) and a fifth generation programming tool IEF (generates COBOL code on a commercial mainframe) (See "The Methodology").
The previous mainframe database was originally conceived twenty years ago and its skeleton had remained substantially unchanged since then, although it had been modified over the years as the technology and requirements increased. This database structure is now inappropriate for modern materials, but it is relied on by a substantial number of strategic programs and it was therefore important:
This latter constraint was particularly difficult, as the structure had to anticipate the development of faster computers with larger memory capacity, and be compatible with existing technology. The design needed to ensure that no artificial limits were put on the data that could be entered (e.g. "infinite" number of materials and properties). The existence of the International Organization for Standardization (ISO) standard for a database Structured Query Language (SQL, pronounced "Sequel") was particularly valuable.
The design is based on the tabular (relational) model, with some substantial re-modeling of the traditional methods of handling materials data. This database can then be said to be highly normalised (structured for the database, as against structured as it will be used). Although it was not the intention, it is now apparent that COMMIT contains many features of an object oriented database design. Dittrich (Ditt89) describes three definable levels of object orientation, i.e. structural, operational and behavioral. This design is certainly structural and can make some claims to higher levels.
Considerable effort was required to supply a number of essential support mechanisms for this database. However, these problems were not specific to materials data, although in many cases they needed to be more rigorous. Of these, units of measure, null values, issue control and data security require special mention, but the scope of this paper has had to exclude them.
Rolls-Royce plc are using the concept of Information Engineering to develop their Corporate Materials database (COMMIT). The methodology used has been that of the Information Engineering Methodology. The toolset which has been used to capture the data generated by the methodology is known as the Information Engineering Facility (IEF).
The users of COMMIT are varied: metallurgists, finite element modelers, chemists, design engineers, manufacturing engineers, health and safety officers - in fact all the disciplines required in an engineering and manufacturing organisation. The requirements the users have for data have been gathered via many interviews across the whole of the Company.
The IEF system allows the system analyst to record the users' database requirements in a structured and formal way. These requirements are obtained from information volunteered by the users when considering prospective transaction and physical screen designs. The methodology can simply be broken down into four key areas:
Before a project can be funded it is necessary to produce a supporting case that addresses the financial, strategic and technical advantages of what is proposed. The early work used structured systems approaches developed by Yourdon (Your84) and B.I.S. (BIS84) to place a materials database in the context of the company’s business and its associated information flow (4.). The technical problems were novel in that a multi-mainframe database was required. The financial advantages were overwhelming, but it was the formation of a project for all the systems in the company which showed the strategic advantage of laying down an investment which could be exploited by later systems. This provided the final impetus to launching COMMIT.
Planning is the first step which is undertaken by the system planners in Rolls-Royce, who together with the users, define the strategic objectives and goals of future systems. The deliverable from this activity is the scope of a future system: this is what is given to the system analysts who, together with the customer, build a system - a Materials Database in this case.
COMMIT was identified by the planners as a strategic database which would be required by all areas of the company. This need for a strategic database was identified as the "Area of Analysis" within each customer project (Manufacturing and Engineering).
The IEF (5.) is structured in such a way that it demands the production of a model, known as a data model, which shows how one part of the data of the future Materials Database relates to another. This was simply performed by the users and the developers of the database who created objects known as ‘entities’, ‘attributes’, and ‘relationships’.
An entity, with reference to the Materials Database, was defined as any concept about which the users wish to store materials data. Examples of entities used within the Materials Database are:
An attribute was defined as data that describes a known entity, of interest to the ‘Materials Database’. Examples are:
A relationship, with reference to the Materials Database, can only exist between entities (where that entity may be either in the materials database or an external system). Examples of a relationship are: ‘a Material "has-a" form’ and ‘a Material "has-a" property’, where "has-a" is the relationship.
The concept of the relationship can be understood more clearly by referencing the following statements, which were obtained from interviews with users:
"There exist many materials within the Company, against which the business wishes to store one or more properties."
"Properties are clearly defined and can be used by more than one material."
The above statements indicate that a relationship exists between the entities ‘Property’ and ‘Material’, as shown in Figure 1.
It is worth noting that the objective of a relational database is to store the data once. Therefore each data object requires one corporate definition from which all application programs and interfacing systems will read.
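The "has-a" relationship between the entities Material and Property can be sketched as three tables: one per entity and one holding the relationship, so that each material and each property is stored once. This is only an illustration using Python's built-in sqlite3 module rather than DB2 or ORACLE; the table names, column names and sample rows are assumptions made for the example and are not the COMMIT schema.

```python
import sqlite3

# In-memory database; all table, column and row names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE material (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE property (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    -- The relationship: a material "has-a" property (many-to-many).
    CREATE TABLE material_property (
        material_id INTEGER NOT NULL REFERENCES material(id),
        property_id INTEGER NOT NULL REFERENCES property(id),
        PRIMARY KEY (material_id, property_id)
    );
""")
conn.executemany("INSERT INTO material (name) VALUES (?)",
                 [("Alloy X",), ("Alloy Y",)])
conn.executemany("INSERT INTO property (name) VALUES (?)",
                 [("CREEP",), ("ULTIMATE",)])
# Alloy X has both properties; Alloy Y shares the ULTIMATE definition.
conn.executemany("INSERT INTO material_property VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def properties_of(material_name):
    """Return the names of the properties related to one material."""
    rows = conn.execute(
        """SELECT p.name
           FROM property p
           JOIN material_property mp ON mp.property_id = p.id
           JOIN material m ON m.id = mp.material_id
           WHERE m.name = ?
           ORDER BY p.name""",
        (material_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Because the property definition lives in one row, every material that uses it reads the same corporate definition.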
When the prospective users first understood the concept of normalization, i.e. separating out information into entities, such that all attributes relate only to one entity, it was tempting to step outside the scope and model the science and not just that portion that the business requires. Moreover it is often difficult to identify when an attribute is in fact an entity and when it is unimportant.
For example, a form (e.g. bar) is often seen as an attribute of the material as can be seen by its inclusion in the information in most material specifications. However forms are usually constrained to only a few possible values (a finite list). If it is required to change all occurrences of Plate to be Sheet, say because of a new ISO standard, then it is important to have the information stored uniquely. To permit this ‘forms’ were stored as a separate entity, and each material had a relationship (pointers) to the unique text.
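The benefit of holding forms as a separate entity can be illustrated with a small sketch, again using Python's sqlite3 module purely for demonstration. The schema and the materials are hypothetical, and for brevity the sketch assumes each material points to exactly one form.

```python
import sqlite3

# Hypothetical schema: each material points to one row of the 'form' table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE form (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE material (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        form_id INTEGER NOT NULL REFERENCES form(id)
    );
    INSERT INTO form (id, name) VALUES (1, 'Bar'), (2, 'Plate');
    INSERT INTO material (name, form_id) VALUES
        ('Alloy X', 2), ('Alloy Y', 2), ('Alloy Z', 1);
""")

# Renaming the form touches one row, however many materials refer to it.
conn.execute("UPDATE form SET name = 'Sheet' WHERE name = 'Plate'")


def forms_by_material():
    """Return a mapping of material name to the current form text."""
    rows = conn.execute(
        """SELECT m.name, f.name
           FROM material m JOIN form f ON f.id = m.form_id
           ORDER BY m.name"""
    ).fetchall()
    return dict(rows)
```

Had the form text been stored as an attribute on every material, the same change would have required every occurrence to be found and edited.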
The analysis also required the users to specify the business processing which exists for the data shown in the data model. The processing ascertained ‘what the business does with its data and how it creates it’. The systems analysts, via interviews with the users, developed a hierarchy of business processes. The processes at the top of the hierarchy detail a high level view of what the business does. The lowest level processes, known as the elementary processes, define explicitly what the business does with the data without defining how it carries out the processing. Any breakdown of the elementary processing will reveal ‘how the business process is achieved’ - these elementary processes are classed as procedures (transactions), pertinent to the design and construction of the screens for maintaining and viewing the database. Examples of the process hierarchy can be seen in Figure 2. The interviews also identified any dependencies which exist between the processes, i.e. the processes which must be performed before another is enabled.
In the example shown in Figure 2, there is a high level process called ‘Develop Material Data’, which is made up of many lower level processes, three of which are:
The users and the analysts, after compiling this simple hierarchy, needed to establish if there were any dependencies between the processes. Figure 3 shows that the process ‘Define Property for a Material’ can only be performed after the processes ‘Define Material’ and ‘Define Property’ have been performed. This makes sense, because it is not logical to make a relationship from a property to a material if either the property or the material does not exist.
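The dependency between these three processes can be written down as a small graph and checked mechanically. The sketch below uses the standard library module graphlib (Python 3.9 or later); the function names are invented for the illustration and say nothing about how IEF records dependencies.

```python
from graphlib import TopologicalSorter

# Each process maps to the set of processes that must be performed first.
dependencies = {
    "Define Material": set(),
    "Define Property": set(),
    "Define Property for a Material": {"Define Material", "Define Property"},
}


def enabled_order(deps):
    """Return one order in which every process follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


def can_perform(process, completed, deps=dependencies):
    """True if all prerequisites of 'process' are in 'completed'."""
    return deps[process] <= set(completed)


print(enabled_order(dependencies))
```

Any order produced this way places ‘Define Property for a Material’ last, which matches the dependency described above.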
The next stage of the analysis was to provide specifications to define the ‘high level processing logic’ to support the data processing. These specifications detailed which entities were read, created, updated and deleted; which attributes were updated and also which relationships were created and deleted within each elementary process.
For the Materials Database this high level processing logic has been defined for processes which are to run either in a mainframe (DB2) or a distributed ORACLE (UNIX operating system) environment.
These specifications were an excellent means of identifying and recording all the business rules which needed to be built into the processing of the system. This involved collating the rules identified in the business data and process models for each elementary process.
The High Level Design Phase (6.) was where the users and the analysts identified what the future system was going to do in terms of transactions. It was an interim step which planned out what needed to be designed and constructed in the ‘design and construction’ phase of the project - this provided accurate estimates as to which transactions needed to be developed and how they were going to be constructed.
Identification of the Procedures to maintain and view data
This phase identified all the procedures that were required to implement the business processes detailed within the analysis phase. For example, the business elementary process ‘Define Material’ has an on-line transaction to maintain the data for it. This transaction will add and delete new materials as well as change the data which exists for each one, e.g. change the NAME attribute. This phase visited each elementary process and performed a mapping to one or many procedures which were to implement it. In the above example there was a simple mapping of one process to one transaction which has the capability of adding, deleting, updating or viewing the data (N.B. you need to view the data before a change or delete can be performed). The environment on which the procedure is to run was identified, i.e. mainframe or distributed in the case of the Materials Database. The reasons for choosing distributed or mainframe will not be discussed here as the argument is specific to a particular company.
This phase also identified a high level view of what the screens, supporting the transactions, would look like. The screens identified the attributes and entities which will be processed and defined what switching is required between one screen and another, thus allowing the user to follow a natural train of thought.
To illustrate the final system a prototype can be created. This mechanically allows the user to switch from one screen to another and actually view the layout of each screen. In a prototype the actual processing behind the screen which permits the maintenance and viewing of data will not be provided. The advantage of this approach is that it gives the users and the developers an indication of how much can be performed on one screen - maximum benefit needs to be achieved from each screen without over-complicating it (and reducing response time), thus making it ‘user-friendly’.
The Production of a Preliminary Database Design
The Materials Database needed to be designed. This was a complex task and covered the following important areas:
Design, Construction and Implementation
The design phase (6.), which is currently in progress, includes the following tasks:
The IEF, together with its supporting methodology, is a major breakthrough in the construction of business computer systems. Below are some important conclusions on the use of IEF within systems development:
This database was not developed for any particular user, but as a strategic resource that could be drawn on by any conceivable application for the next ten to twenty years. It is useful to note that one large subject was left for a later phase. It was agreed that the storage of mechanical test data would create too large a problem in total and this has been intentionally left for later attention. Having completed this design it is now apparent that this is not a significant problem as many of the relationships are common to both design (generic) and test (particular) property values.
The database was not only multiuser, but also multiuse (4., 7.). Many serious database management systems will enable more than one user to access and edit the data. However a bad design can prevent the data being used in more than one type of application.
Uses of the database will include:
Each of the uses above can be thought of as filters. The database design must reflect the combined requirements of all the uses for which a database can be used. Each user, system or document requires a sub-set of the information and hence the introduction of a passive filter between the user and the complexity of the full database, which prevents the user from being overwhelmed with information. Moreover good documents and systems will retain the relationships between the information to ensure that the information is presented in context. As the database grows, it becomes increasingly important to prevent the user from being given "all the information he or she always wanted". Without specific instructions it is better to give only the information the user needs, with the reassurance that all the information is available when required.
A database is intended to be a resource that is drawn on by more than one application. If integration with CAE and CIM is to proceed then the concept of a single user interface may prove to be unhelpful. Each use has different requirements and the classical implementation of a database with a single function of searching, display and access will not be sufficient. The primary requirement of a number of the users listed above is not to see the data, but to use it as a resource behind their own application programs.
It must be emphasised that COMMIT does not stand alone in this implementation. It has links to other databases which use this information when necessary. Principal among these is the Process database, which is required to accurately define the material by reference to its process history.
The traditional method of presenting materials data is to use annotated tables and graphs drawn on the medium of paper (9.). This method is constrained by the medium and the user's ability to understand information. The former constraint is effectively removed with a database, whereas the access, interpolation, tabulation and graphical presentation software properly handle the latter.
Throughout this paper, a database is defined as the "numbers" and textual values, although in ‘the database design’ it is synonymous with the database structure. The database structure is not normally seen by the user, rather the structure is interpreted by programmers into a particular presentation by a program, a device, an "automatic" document, or a screen.
Before SQL standardized the tabular structure of databases (or at least their appearance), it was usual to assume a hierarchical method of storage. As this is still used by a number of databases, the two types should be compared. Figure 4 shows a hierarchical example which should be familiar to users of materials data.
The relational structure and its associated methodology require the combined skills of both the materials information expert and an information technologist. The skills required to develop a strategic database should not be underestimated.
The major advantages of a relational database design are:
As all structures can be modeled, it is tempting for the analyst to model only the information that is presented without re-analysing the problem within the reduced constraints (and freedoms) of a relational design. Because of this it is still possible to see contemporary designs that resemble hierarchies.
One disadvantage of a relational design is that the information is broken down in an anamorphic way. The information is held in a way that is conceptually difficult to comprehend. Detailed study reveals that all the information is logically there and (usually) arranged correctly, but the casual enquirer is intimidated by the "jargon" and complex diagrams. It may be that given the logical methodology it will be possible to create automated tools to do the normalisation, but this would produce some loss in efficiency. At present the cost of analysis versus the purchase of additional computing power would not justify these losses.
Although the names of properties are not internationally agreed (e.g. Young's Modulus and Modulus of elasticity), the concepts are generally agreed. However in the data analysis, necessary to develop a relational database, a novel approach suggested itself. Although the approach described below is untraditional, it has a number of advantages which we believe outweigh the cost of changing the culture.
Figure 5 shows an example of creep data which is typical of the more complex material properties. Creep strain is generally believed to be a function of Stress, Temperature and Time and there are a number of well known papers that propose empirical and theoretical solutions that combine to produce this relationship. It was therefore important to have a database that could model this four-dimensional variation and to permit any combination of the independent variables to be interpolated.
Further study, however, revealed that although only four variables are traditionally presented there are other ‘constants’ that are usually assumed. These ‘constants’ include atmosphere, pressure, surface finish and surface treatment. In most cases, values can be assumed (see "Defaults") due to either the standardization of testing or common sense. For a given application, however, these ‘assumed’ constants can vary. A component may have a local area coated or it may be subject to an intermittent variation in environment. These "constants" may therefore be variables.
When making an enquiry of the database such as "What is the stress to cause 0.12% total plastic strain at 789.6 degrees Celsius in 123.4 hours in material X", then it can usually be assumed that the enquirer means in a standard environment. However the same question can include "... when coating Y has been applied and when used in a standard marine environment".
These types of questions and the information required to answer them led to a realisation that a property like creep normally has three independent variables, however other variables can also be included.
Rolls-Royce’s previous database had solved this problem by subdividing a property like Creep into Marine Creep, Chrome plated Creep, etc., but this has two disadvantages:
The Structure used for Properties
For Creep, it was decided to use the matrix of data implied by the three independent variables in Figure 5, but to also extend those matrices to include any other variables that may be required. This freedom requires a fresh look at the names traditionally associated with the storage of materials data.
After establishing the ability to have a large number of variables, one could imagine tensile data to be an ‘instantaneous’ creep test or a ‘single cycle’ fatigue test. In some ways this is true, but it causes a lot of unhelpful debate into whether these suppositions are theoretically correct. In order to minimise this debate it was decided to combine properties only to a level where they were normally grouped in discussions. For example, the "creep" properties or the "fatigue" properties are concepts frequently discussed by engineers; if these concepts were combined then the users would find the resultant property difficult to visualize. (Table I gives some hypothetical examples where properties are referred to unspecifically.)
Sargent points out (12.) that the concept of material properties is only a convenient fiction. This underlines the need to be pragmatic in deciding the structure of property storage.
Tensile design data was therefore divided into that which varies with strain (flow or proof stress) and that which is measured by failure (UTS). This division creates two large properties called here PROOF and ULTIMATE. This division initially can appear unnatural, in that the values may well be obtained from the same test, however, the use of these two properties is normally divided between design calculations (Proof) and failure calculations (Ultimate).
ULTIMATE
is a new property concept that describes "simple" failure: it always varies implicitly with Temperature, Basis (i.e. minimum, typical or maximum), and Loading (tensile, compressive, bearing). In addition to these required variables, it can also vary with environment, crystallographic direction, laminate direction, local treatment etc. As the measure of this property is stress and one variable (loading) can have a value of "tensile", it can be seen that this property includes "Ultimate, Tensile, Strength"; however, it now contains much more.
PROOF
is a property that describes the flow stress. It varies in the same way as the property ULTIMATE, but it also varies with Strain as a required independent variable (modifier). It is possible to recreate from this matrix both curves of proof stress with temperature by defining the strain, or curves of stress versus strain by defining the temperature. As no real examples exist of where temperature or strain is constant, more realistic modeling is permitted.
CREEP
(Stress) is modeled by having no less than the three required variables of strain, time and temperature. Real examples exist, however, where another three are required to fully characterize the anisotropic behaviour of a single crystal alloy. Other variables such as "basis" and "loading" are again used as in ULTIMATE and PROOF. When the effects of corrosion and coatings are included then a real n-dimensional property develops. A finite element analysis (FEA) program can, with one instruction, access a typical tensile creep value at (say) 1023 hours, 1047.2 degrees C, 0.043% strain and at an odd angle in three dimensional space. The system will provide a warning to the user if the QUALITY (discussed later) is too low; supply references to the audit trail; warn of an unsupported material and supply the value interpolated either from carpets of numbers or a constitutive equation.
FATIGUE
describes failure as a result of cyclic stress, and an example of how this is stored can be visualised in Figure 10. After the Material (MATRL QXX) and the property (PROPRTY FATIGUE) have been specified, the modifiers (which have been previously defined) are listed in order. Having identified the modifiers, it is necessary to define the relevant methods of interpolation. These will usually be global for a particular modifier but may be defined per material-property instance. Having specified the interpolation, a row of values can be quoted starting with the Quality and proceeding through the other modifiers in order.
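The storage pattern described for these properties (a material, a property, an ordered list of modifiers, an interpolation method per modifier, then rows beginning with the Quality) can be sketched in a few lines. The class below is a hypothetical illustration, not the COMMIT implementation: the numerical values are invented, only one modifier is interpolated (linearly), and it is assumed that a larger Quality number means better data, so the lower of the two bracketing qualities is reported.

```python
from dataclasses import dataclass


@dataclass
class PropertyTable:
    material: str
    prop: str
    modifiers: list       # modifier names, in the order used by each row
    interpolation: dict   # modifier name -> "linear" or "exact"
    rows: list            # each row: (quality, *modifier values, property value)

    def lookup(self, **query):
        """Return (value, quality) for a query quoting every modifier."""
        linear = [m for m in self.modifiers if self.interpolation[m] == "linear"]
        if len(linear) != 1:
            raise ValueError("this sketch interpolates along exactly one modifier")
        axis_name = linear[0]
        axis = self.modifiers.index(axis_name) + 1  # +1 skips the quality column
        x = query[axis_name]
        # Keep only rows whose exact-match modifiers agree with the query.
        matching = sorted(
            (row for row in self.rows
             if all(row[i + 1] == query[name]
                    for i, name in enumerate(self.modifiers) if name != axis_name)),
            key=lambda row: row[axis],
        )
        # Find the pair of rows that brackets the requested value.
        for lo, hi in zip(matching, matching[1:]):
            if lo[axis] <= x <= hi[axis] and lo[axis] != hi[axis]:
                t = (x - lo[axis]) / (hi[axis] - lo[axis])
                return lo[-1] + t * (hi[-1] - lo[-1]), min(lo[0], hi[0])
        raise LookupError("query lies outside the stored data")


# Invented data for material QXX: stress in arbitrary units.
table = PropertyTable(
    material="QXX",
    prop="ULTIMATE",
    modifiers=["basis", "loading", "temperature_c"],
    interpolation={"basis": "exact", "loading": "exact", "temperature_c": "linear"},
    rows=[
        (3, "minimum", "tensile", 20.0, 500.0),
        (2, "minimum", "tensile", 100.0, 460.0),
        (3, "minimum", "compressive", 20.0, 520.0),
        (3, "minimum", "compressive", 100.0, 480.0),
    ],
)

value, quality = table.lookup(basis="minimum", loading="tensile", temperature_c=60.0)
```

Adding a further modifier only adds a column to the rows and an entry to the two lists; the property name itself does not change.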
By using this approach a previous database design which included many hundreds of current property names has been reduced down to approximately fifty, although the number of variables has increased from three to about fifteen. It is not anticipated that these figures will expand significantly within the lifetime of the database design. Moreover, with each new variable, there is an increase in the value of the data. A new property name allows new data, a new variable allows new data and an opportunity for more interpolation (extra data).
This approach has the following advantages:
A stable and understandable property structure.
Within any organisation there should be an ever improving technology. Programs are developed that are based on improved models which require a better description of the material. In parallel is the development of materials understanding which enables new materials to be better characterized under a wider variety of environments (variables). Hopefully the most precise programs are used on the best characterised materials and this enables both investments to be exploited. Where this does not happen then a "naive" program needs the data downgrading, or a "clever" program cannot run using inadequate data. The authors believe that the property storage structure described here will prevent the need for extra work, when either the program or data is inadequate.
N-dimensional properties provide the freedom to define information in a precise and accurate way, using as much complexity as necessary. This freedom however implies that some data will be better characterized because of:
The solution to this problem is defaults. This is illustrated in Figure 6 which shows a hypothetical property which varies principally with moisture content (humidity) and time (of exposure maybe). The property for this material or material category (see the following section on Inheritance) also varies with strain and temperature, but this may not be true for the same property of a functionally similar material. How does a single program use the extra data available, without failing where the data is missing or is unnecessary?
As can be seen in Figure 6, one position in the matrix is known to be special, in this case this is indicated by the value of the property being shown (i.e. 1024). This indicates that the default value for time is 100 hours; for humidity it is 0.1%; for temperature it is room temperature (20 DegC) and for strain it is 0.2%. Certain independent variables are essential (required) to make the property meaningful, but others are optional. When the user (or application program) requests data, all of the required variables (modifiers) must be quoted. Additional variables however can be assumed from the defined defaults. "Clever" programs can assume anisotropic properties for isotropic materials and "naive" programs can assume anisotropic materials are isotropic. Suitable warning messages are required to annotate the answer obtained, but this permits programs to succeed.
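The handling of required and optional modifiers can be sketched as a small function that completes a query from the defaults and records a warning for each assumption. The default values are those quoted for Figure 6; treating humidity and time as the required modifiers, and the names used in the code, are assumptions made for the illustration.

```python
# Assumed split for the hypothetical property of Figure 6:
# humidity and time must be quoted, temperature and strain may be defaulted.
REQUIRED = ("humidity_pct", "time_h")
DEFAULTS = {"temperature_c": 20.0, "strain_pct": 0.2}


def resolve_query(query, required=REQUIRED, defaults=DEFAULTS):
    """Fill optional modifiers from defaults; return (full query, warnings)."""
    missing = [name for name in required if name not in query]
    if missing:
        raise ValueError(f"required modifiers not quoted: {missing}")
    resolved = dict(defaults)
    resolved.update(query)  # values quoted by the enquirer take precedence
    warnings = [
        f"{name} assumed to be the default {defaults[name]}"
        for name in defaults
        if name not in query
    ]
    return resolved, warnings


resolved, warnings = resolve_query({"humidity_pct": 0.1, "time_h": 100})
for message in warnings:
    print("WARNING:", message)
```

A "naive" program that quotes only the required modifiers therefore still obtains an answer, annotated with the assumptions that were made on its behalf.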
This use of defaults is believed to be similar to the way a human expert would answer particular questions. The table below has some non-trivial enquiries and the interpretation which an expert or a COMMIT application program could presume.
| Enquiry | Presumption |
| --- | --- |
| What is the tensile strength of Alloy ‘X’? | Tell the enquirer the room temperature, Ultimate, Tensile, Design value. Make sure it's the minimum value and assume the "part" is not notched; the environment is air; the surface finish is ‘normal’ and no unusual heat treatments or coatings have been applied. |
| What is the creep strength of aluminium alloys? | It can be assumed that we are talking about an air environment and the temperature and life are finite, but unknown. Within a company it could be assumed what the "standard" percentage total plastic strain is. Fortunately inheritance can be used to resolve the unspecific material definition. Either plot the range of data or ask further questions to ascertain the temperature (range?), the life (range?) and whether absolute values are required, a range, or relative to another material class. |
| Tell me the proof stress when the temperature is 417 K; the strain is 0.134%; the load is applied in a direction specified by (2.1, 0.4, 1.0) co-ordinates and Alloy ‘X’ is in a marine environment? | This person knows what is required ... but we haven't got it! Use the defaults to either supply the ‘closest’ and include warnings or, if it's interactive, display a list of choices to the user. Also see the "Discussion". |

Table I. Enquiries v. Presumptions. The table shows examples of how an unspecific enquiry by a user contains implicit information. Caution needs to be used as the question can also contain implicit ignorance. However, the answer can contain sufficient detail to define the information and hopefully educate the user for a more specific enquiry, if necessary.
This paper describes the design of a relational database, where all the data is stored in tables. The power of this data model has been previously described with respect to a hierarchical model. Having established a relational data model, it was found useful to introduce the concept of inheritance. It must be emphasised that a hierarchy does not exist explicitly in the data model. The impression of a hierarchy is created, for the user and the application programmer, using relational tables. This retains the advantages of the relational structure, and adds to it the higher level concept of inheritance making it more understandable.
Inheritance is a term associated with the development of knowledge based and object oriented programs. Inheritance builds on the presumption that members of a given class of objects inherit the properties of the class as a whole. For example, aluminium alloys are usually electrically and thermally conductive, non-magnetic and have a specific gravity between 2 and 3. This assumption of inheritance must be correct, as any members of any meaningful class must have some common attribute.
With reference to Figure 4, if the only ‘property’ known about alloy X is that it is an aluminium alloy then it can be assumed that alloy X is probably non-magnetic, electrically and thermally conductive and has a specific gravity between 2 and 3.
In some ways, this design could be said to possess knowledge about the nature of materials, in that the database itself is able to decide some of the properties for alloy X above, without them being specifically entered against that material.
The advantage of including inheritance in the structure is that properties can be automatically assigned at data entry. This is useful for the following reasons:
To create inheritance within the database it was necessary to create an entity called "Material Category" which had all the relationships of a "simple" material and also had a relationship of ownership to itself and to "simple" materials. This permits any property to be assigned either to the "simple" material or to the "material category" which owns that "simple" material. The software that obtains data from this structure then needs to look for a property of a "simple" material, and if it is not satisfied, then it continues up the material category hierarchy (see Figure 4 on page 12) until it is satisfied.
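The lookup described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the COMMIT implementation: the class name `MaterialCategory`, the function `find_property`, the property names and the proof stress value are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MaterialCategory:
    """A node that is either a "simple" material or a category owning others."""
    name: str
    parent: Optional["MaterialCategory"] = None   # the owning category, if any
    properties: dict = field(default_factory=dict)


def find_property(node, prop):
    """Look on the node itself first, then continue up the owning categories.

    Returns (value, name of the node that supplied it), or (None, None).
    """
    current = node
    while current is not None:
        if prop in current.properties:
            return current.properties[prop], current.name
        current = current.parent
    return None, None


# Illustrative data only: the numeric values are made up for the example.
aluminium = MaterialCategory(
    "Aluminium alloys",
    properties={"magnetic": False, "specific_gravity_range": (2.0, 3.0)},
)
alloy_x = MaterialCategory(
    "Alloy X", parent=aluminium, properties={"proof_stress_MPa": 250.0}
)

print(find_property(alloy_x, "magnetic"))          # supplied by the category
print(find_property(alloy_x, "proof_stress_MPa"))  # supplied by the material
```

Returning the name of the supplying node alongside the value lets the caller tell the user whether a value was entered against the material itself or inherited from a category.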
When designing these hierarchies it is important to ensure they are balanced. As far as possible each parent should have the same number of children (C). If the number of "simple" materials which can be classified with respect to a given theme is "N", then the number of levels required (L) will (on average) be covered by the following relationship.
L = log(N/C)
Using this equation it is possible to get an indication of whether a proposed hierarchy will reach a compromise between having too many children for a given parent (the user sees an overwhelming list of choices) and having too many levels (the user will find places where he is presented with a number of choices between only two (or even one) child categories).
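The trade-off between children per parent and number of levels can be explored numerically. The sketch below is an illustration only: it assumes a perfectly balanced hierarchy in which every parent has exactly C children and counts the branching steps needed to reach N "simple" materials. The relationship printed above does not state the base of the logarithm or which levels are counted, so its result may differ by one level from this count. The function name `levels_needed` is hypothetical.

```python
def levels_needed(n_materials: int, children_per_parent: int) -> int:
    """Count branching levels needed so that a balanced tree has room
    for n_materials leaves when every parent has children_per_parent children."""
    if n_materials < 1 or children_per_parent < 2:
        raise ValueError("need at least one material and two children per parent")
    levels = 0
    capacity = 1  # number of leaves reachable with the levels counted so far
    while capacity < n_materials:
        capacity *= children_per_parent
        levels += 1
    return levels


# Compare a few candidate branching factors for 1000 simple materials.
for c in (2, 10, 100):
    print(c, "children per parent ->", levels_needed(1000, c), "levels")
```

Few children per parent gives a deep hierarchy with trivial choices at each step; many children per parent gives a shallow one with long lists, which is the compromise the text describes.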
Magnetic Aluminium or Dealing with Exceptions
For every rule there is an exception. There are sintered magnetic materials which do have a substantial aluminium content. When these materials are required then the database needs to have a method of encoding this exception unless the inheritance of that aspect of the class of aluminiums is to be withdrawn.
This is provided for by having a pointer (for each property instance) from each material to the class to which it belongs. This pointer is created when the material is assigned to the class. At any time this relationship can be changed when new or better information becomes available on a material by material basis. Figure 7 illustrates this feature.
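One way to picture the per-property pointer is as a small mapping from each material and property to the class from which that property is inherited. The sketch below is an assumption-laden illustration, not the database design itself: the dictionaries, the function names `lookup` and `withdraw_inheritance`, and the material "Sintered magnet M" are all invented for the example.

```python
# Properties held against each class (illustrative values only).
class_properties = {
    "Aluminium alloys": {"magnetic": False, "electrically_conductive": True},
}

# One pointer per (material, property): the class that property is inherited from.
inherit_from = {
    "Alloy X": {
        "magnetic": "Aluminium alloys",
        "electrically_conductive": "Aluminium alloys",
    },
    "Sintered magnet M": {
        "magnetic": "Aluminium alloys",
        "electrically_conductive": "Aluminium alloys",
    },
}

# Properties entered directly against a material.
own_properties = {"Alloy X": {}, "Sintered magnet M": {}}


def lookup(material, prop):
    """Prefer a value entered against the material, else follow its pointer."""
    own = own_properties.get(material, {})
    if prop in own:
        return own[prop]
    cls = inherit_from.get(material, {}).get(prop)
    if cls is None:
        return None  # inheritance withdrawn and nothing entered directly
    return class_properties[cls].get(prop)


def withdraw_inheritance(material, prop, explicit_value):
    """Record an exception: drop the pointer and store the explicit value."""
    inherit_from.get(material, {}).pop(prop, None)
    own_properties.setdefault(material, {})[prop] = explicit_value


# New information arrives: this particular material is magnetic.
withdraw_inheritance("Sintered magnet M", "magnetic", True)
print(lookup("Sintered magnet M", "magnetic"))                 # the exception
print(lookup("Sintered magnet M", "electrically_conductive"))  # still inherited
print(lookup("Alloy X", "magnetic"))                           # class unaffected
```

Because the pointer exists per property, the exception affects only the magnetic behaviour of one material; the rest of the class, and the other inherited properties of that material, are left untouched.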
The flaw with inheritance is that there is no agreed classification for separating materials into classes. Initially you may divide materials by a theme of composition, use or form. A multi-user database which covers a number of disciplines needs to use a number of different hierarchies. Any one material can belong to more than one hierarchy and can be said to have two (or more) parents.
Figure 7 illustrates this with Alloy Z which has a parent in two different classification themes of chemical composition and perhaps generic shape. As either of these parents could potentially have properties which are inheritable it is difficult to decide how to resolve this dilemma. Fortunately this has been investigated by Artificial Intelligence research but even so there is no one really convincing solution. Classification (taxonomy) is a difficult problem and different themes are required to satisfy different users within any discipline.
Inheritance - Summary
The gains in data entry efficiency from using inheritance, i.e. from having some knowledge of the structure of materials, are large. By having an active index (thesaurus?), the synergy between the index and inheritance can be exploited. However, the complexities of exceptions and dual parentage are, to say the least, challenging.
A previous paper (16.) explained the concept of quality as developed by Rolls-Royce. RR’s concept is customer driven and acknowledges that the final user is rarely interested in the associated material data (see meta-data below), although many materials experts would argue this is essential to the correct interpretation of the data.
We argue that the user only wants to know whether the data he has is good enough for the task proposed. The customer is likely to be:
or
There are people inside an organisation who require access to the meta-data, but these are materials experts cognizant with all the variables associated with the information. When the real users require this extra information then the audit trail which is associated with all the design data can be retrieved. Each section of design data points back at the analysis that derived it, which in turn points at the test data which was considered in that analysis. The test data has associated information, which characterises the values specifically.
It has been argued (12.) that more than one empirical index should be used to describe quality. This might include the quality of the material, the quality of the analysis, etc. The flaw is that the meta-data is voluminous and any one datum could completely invalidate the associated property value. If automatic processing is to be used then a system would be required that automatically evaluated every aspect of quality and made a "value judgement". This is not possible with the technology available. Therefore, we argue, the materials expert who analysed and evaluated the design data is the only fit person to make this judgement.
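The chain of pointers that makes up the audit trail can be illustrated with three linked record types. This is a minimal sketch under stated assumptions: the record names (`DesignDatum`, `Analysis`, `MeasuredResult`), their fields and every value shown are hypothetical and are not taken from the COMMIT design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MeasuredResult:
    """One piece of test data with the information that characterises it."""
    result_id: str
    value: float
    conditions: dict


@dataclass
class Analysis:
    """The evaluation that considered a set of test data."""
    analysis_id: str
    analyst: str
    test_data: List[MeasuredResult] = field(default_factory=list)


@dataclass
class DesignDatum:
    """A design value that points back at the analysis that derived it."""
    property_name: str
    value: float
    analysis: Analysis


def audit_trail(datum: DesignDatum) -> List[str]:
    """Walk from design data to analysis to the test data considered."""
    trail = [f"design: {datum.property_name} = {datum.value}"]
    trail.append(f"analysis: {datum.analysis.analysis_id} by {datum.analysis.analyst}")
    for result in datum.analysis.test_data:
        trail.append(f"test: {result.result_id} = {result.value} under {result.conditions}")
    return trail


# Made-up records purely to exercise the structure.
tests = [
    MeasuredResult("T1", 255.0, {"temperature_K": 293}),
    MeasuredResult("T2", 248.0, {"temperature_K": 293}),
]
analysis = Analysis("A1", "materials expert", tests)
datum = DesignDatum("proof_stress_MPa", 250.0, analysis)

for line in audit_trail(datum):
    print(line)
```

The final user normally sees only the design value; the trail is walked only when an expert asks for the supporting evidence.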
Manual systems do need to be introduced to ensure that each material expert is consistent with his colleagues in the assignment of Quality. This price is worth paying to ensure that the need for urgency in the use of the data does not ignore the expertise of the people who derived the property curve. The final users will come to appreciate this empirical measure and can use this as a guide when deciding the necessary quality they require, consistent with the use they intend.
Meta-Data is a concept that has occupied the Materials Database community for many years (19.). Each data value can have over a hundred pieces of supporting evidence (sometimes called meta-data). This supporting evidence defines not only the material tested, but how it was tested, who tested it and to what standards the methods and analysis were performed. The combination of large n-dimensional properties, quality and an effective audit trail removes the need for the final user to be aware of this large volume of information.
Some proponents of meta-data have implied that all of this information should be presented to the user with the information he or she requested. Each piece of meta-data that must be given to the designer (or engineer) represents a statement of failure by the materials discipline. Figure 8 shows a schematic of the design process with each discipline feeding its expertise to the designer who will deliver the completed design. If each discipline fails to come to a decision and gives not conclusions but the meta-data, the designer will be overpowered by the detail. Moreover, if the designer works in the same style then instead of a design being delivered, the deliverable will be merely a collection of meta-data from which a design could be made.
In 1988, the U.S. Under-Secretary of Defense published a memorandum (20.) outlining the plans of the Department of Defense to oblige subcontractors to supply not only the weapon systems but also their documentation in a machine readable form. The initiative is known as CALS (Computer Aided Acquisition and Logistic Support). The plan was to move from the confused state of having to manage constantly changing documentation, by providing the automated exchange of such data. The eventual plan is to move to an Integrated Weapons System Database (IWSDB) which would be the source of all relevant information.
CALS acknowledges a number of standards which it will demand shall be used by its subcontractors to exchange data digitally (21.). CALS divides information into four principal types
Although the USDOD is a powerful catalyst for these standards, it would be a mistake if only the defence industry takes immediate notice of the need for this initiative. It is obvious to most information industry observers that these standards will be pervasive in the whole of the engineering and manufacturing industries. The benefits of reducing document maintenance, reliable access and rapid retrieval are felt to more than pay for the development of such standards. With the USDOD’s encouragement a company has been formed called PDES Inc. which will exploit the STEP standards.
CALS is a good example of the type of initiative that every organisation should have to deal with the exploitation of the technology of EDI (Electronic Data Interchange).
This categorization places the majority of Materials information into the category of Product Data (II.). However the world will require more than just "databases" for a "number of years". The great majority of users still require paper documents, which fall into the field of Text (the SGML standard (22.)).
The availability of terminals and computer literacy requires that information be supplied not only at terminals, but also as printed documents. These documents must still be printed but the information must come from COMMIT, to ensure that there is one single source of data. Our requirement was that the users would produce their final documents straight from the database. The advantage of knowing that the written documents were from the same common source as ‘on-line’ applications, was a substantial product assurance benefit. With each use of the same data, the benefits are not just in reduced keyboard time, but each application audits the same information. Errors found in any use of the data, can be corrected for the benefit of all users.
Alas, CALS has not as yet worked out a solution to the joining of these technologies. There are expert cliques attached to each set of standards but they have only hopes of how these standards will fit together.
Visionaries and early research software illustrate a better horizon where documents can be:
All of these very desirable facilities can be seen in examples of current software, but no package has been found that addresses all of them. Certainly there are no standards for such a piece of software to work to, nor do there appear to be any standards in public development. The continual promise of better software "tomorrow" makes planning difficult, but eventually you realise that you need to proceed with the tools available. ‘Easy editing’, ‘effortless publishing’, and ‘media free document design’ all justify attention. In this design it was realised that of these three, ‘effortless publishing’ gave not only a business return, but also ensured data integrity. The need to show audit trails and product assurance was given priority over technological development.
In order to achieve the automated publication of documents, the model of how SGML operates was used. SGML uses ‘tags’ to describe the content of pieces of information (for instance to say ‘this collection of data is a chemical composition’). The tags only know the content of the information. The style (e.g. font sizes and relative positions) is held separately in a ‘document type definition’ (DTD).
The DTD contains the instructions which describe how the information should be arranged on the page, and there is therefore sufficient knowledge to process the database information into a printable document. The tags allow the same information to be presented in a variety of ways so that once a DTD has been written for a material specification then all of the documents can be printed with the latest modifications included. This facility is obviously limited by release control (i.e. is it approved?) and security.
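The separation of tagged content from presentation can be imitated in plain Python. The sketch below is not SGML and does not use any SGML tooling: the tag names, the two style tables and the composition text are assumptions invented for the illustration. It shows only the principle that one tagged record, taken from the database, can be printed in more than one way.

```python
# Content as it might come from the database: (tag, content) pairs.
# The tags say what each piece of information is, not how it should look.
record = [
    ("title", "Alloy X"),
    ("composition", "Al balance, Cu 4.4, Mg 1.5"),  # made-up composition
    ("note", "Provisional data"),
]

# Presentation is held separately, one table per kind of document.
SPEC_STYLE = {
    "title": "MATERIAL SPECIFICATION: {content}",
    "composition": "  Chemical composition (wt%): {content}",
    "note": "  Note: {content}",
}
SUMMARY_STYLE = {
    "title": "{content}",
    "composition": "{content}",
    # "note" is deliberately absent: that tag is excluded from this document
}


def render(tagged_record, style):
    """Lay out each tagged item using the template the style holds for its tag."""
    lines = []
    for tag, content in tagged_record:
        template = style.get(tag)
        if template is None:
            continue  # this document type does not print the tag
        lines.append(template.format(content=content))
    return "\n".join(lines)


print(render(record, SPEC_STYLE))
print(render(record, SUMMARY_STYLE))
```

Changing a value in `record` changes every document printed from it, which is the single-source benefit the text describes.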
It is interesting to note that there are many conceptual similarities between a document encoded in SGML (prior to formatting) and the physical file proposed for data transfer (e.g. Figure 10). It would be theoretically possible to hold all the information for these documents in an SGML file, and print any one document from the file, by constraining the print by including and excluding the instances of the relevant tags.
Applications like this break down the barrier between bibliographic and factual (26.) databases. If a bibliographic database was designed in a fully normalised way then it would be factual. This division is caused by producing textual documents on word-processors, which are merely machine readable; if the information is made machine evaluable then the facts will no longer be hidden. SGML represents only one initiative in this direction; the move to hypertextual documents represents another trend which will eventually lead to the homogenization of bibliographic and factual databases.
The development of distributed computing and the need to transfer information to other organisations are particularly difficult design constraints. However both of these relate to minimizing the software’s reliance on supporting hardware or systems. A parallel problem was to ensure that the database and its data could be functionally transferred to virtually any other computing machine.
As part of our company’s installation it was necessary to ensure that the information could be written, searched and read by two independent mainframes.
Surprisingly the major problem with two independent mainframes was not the various technical issues. The most difficult problem was that the two mainframes had never previously shared a major database system and they had therefore different methods of security, change control, unit control etc. Each system had equivalent operating procedures, but they varied in style and content. New common procedures had to be developed that catered for the difference in style and ensured functional equivalency.
One further problem was that half the system was to be developed using IEF (as previously described), whilst the other was completed in FORTRAN.
The two mainframes were a technical mainframe used on an open-shop basis for engineering calculations and a commercial mainframe running mainly database management systems. There are various ways of tackling this problem.
One possible solution is to copy data from one system to the other, but this presumes that for any one field there is only one mainframe that holds the master copy. This was inappropriate and the decision was made to use the latest version of DB2 which claimed to support multi-mainframe use. In this way, data could be read from either mainframe, updated from either (and in a few cases both). In reality the need to update a piece of information from both systems is rare. The need to ensure that each piece of information has an owner (and hence audit trail) coupled with users having a preferred machine largely prevents the need for a dual updating facility.
It is worth noting that although these two mainframes had developed independently and were originally purchased for two separate operations of the company, more than two thirds of the resultant shared data was required by both sets of users.
The transfer of materials data in an automated way is a primary objective of a number of bodies who realise the benefits of sharing information. The leader of this initiative is the Standard for the Exchange of Product data (STEP). This work does not yet have a firm solution, but committees in both the U.S. (ASTM E49) and the U.K. (BSI AMT/4/-/6) are working on achieving a standard for a neutral file for materials information (27.). There is also an initiative under VAMAS (Versailles Project on Advanced Materials and Standards), whose technical working area number ten (materials data banks) is interested in co-ordinating work in this area. Their work led to an international workshop in September 1989 at Derby where the problem of Materials Data exchange was addressed specifically.
Transportability of data is not a significant problem; the problem is retaining the data structure and the associated data (metadata) during that transfer (30.). There is a de facto standard for relational database access (SQL, pronounced Sequel), but the code’s transportability and universal application are not certain. This language does not solve the problem of transferring data between alien computers. The highest common factor for transfer is to use ASCII-based files of 80 characters per record. Assuming your programs are written in Fortran, C or some other common standard language, then this neutral file will allow transportation to any processor. This 80 column file is the same as that used by the existing standard for draughting (IGES), and is compatible with STEP.
These files will have to replace the functionality of a relational database. This is done by defining a format in the file which can be translated to a number of related tables (and vice versa). A small sample is shown in Figure 10.
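The translation between related tables and fixed-length records can be sketched as a round trip. The record layout used here (keywords `TABLE`, `COLS`, `ROW`, `END` and a `|` separator) is entirely hypothetical and is not the format of Figure 10; the function names and the sample tables are likewise assumptions made for the illustration.

```python
RECORD_LENGTH = 80  # ASCII characters per record


def pad(text):
    """Return one fixed-length record, refusing anything that will not fit."""
    if len(text) > RECORD_LENGTH:
        raise ValueError("record exceeds 80 characters")
    return text.ljust(RECORD_LENGTH)


def export_tables(tables):
    """Flatten {name: (columns, rows)} into a list of 80-character records."""
    records = []
    for name, (columns, rows) in tables.items():
        records.append(pad("TABLE " + name))
        records.append(pad("COLS " + "|".join(columns)))
        for row in rows:
            records.append(pad("ROW " + "|".join(str(v) for v in row)))
        records.append(pad("END"))
    return records


def import_tables(records):
    """Rebuild the tables from the records; all values come back as text."""
    tables = {}
    name, columns, rows = None, [], []
    for record in records:
        keyword, _, rest = record.rstrip().partition(" ")
        if keyword == "TABLE":
            name, columns, rows = rest, [], []
        elif keyword == "COLS":
            columns = rest.split("|")
        elif keyword == "ROW":
            rows.append(rest.split("|"))
        elif keyword == "END":
            tables[name] = (columns, rows)
    return tables


# Two related tables; the material id links a property to its material.
tables = {
    "MATERIAL": (["id", "name"], [["1", "Alloy X"]]),
    "PROPERTY": (["material_id", "name", "value"], [["1", "density", "2.7"]]),
}
records = export_tables(tables)
print("\n".join(r.rstrip() for r in records))
```

The records can be browsed or corrected in an ordinary editor, but the relationships between tables survive only because both ends agree on the layout.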
The file remains human-readable (but only just). It is not anticipated that anyone would ever ‘author’ such an anamorphic file without the aid of the support program which displays the information in an understandable (unnormalised) way. It is important, however, that such files can be browsed and even modified using a normal editing program. This file is similar in concept to that described by Vinard (28.), Cverna (29.) and hypothetically by McCarthy (17.).
Defining a format such as the one shown in Figure 10 does not solve the problem of data interchange. The mapping of one database onto another of a different design is difficult. To map N databases onto each other is very difficult. This is evidenced by the difficulties encountered within the COMMIT project in launching a database (of the same design) on two mainframes and sundry workstations. The STEP project is aiming to tackle the very difficult task of creating a model that is a superset of all the databases currently in use.
The only alternative is to wait for the chaos caused by "islands of agreement". These will develop under the pressure of market sectors possibly clustering around a standard inappropriate for the task (e.g. EDIFACT). Only a single custom solution will do, and the STEP standard is the heir apparent. Anyone who appreciates this argument should:
• support the necessary standard development (STEP) through their relevant national standards body.
• ensure their database is as generic as possible and not tied to parochial requirements.
Note: In fact STEP uses EXPRESS files, which do not define the position of data inside the file or the number of characters per line.
Program Access
Access by users to the full detail of the data will be by dedicated programs (or database transactions) which will allow the user to browse, search and plot the data. Subject to security the user can also edit and release new versions of the data on either mainframe or on distributed workstations.
Whilst in the editing program, the user will be able to import and export the files shown in Figure 10.
Access by CAE programs will be via FORTRAN subroutines which contain the functionality to find the best data available whenever an exact match is not found. These routines "understand" the structure of the COMMIT database, and can be said to be "closely-coupled" (31.). Only after the program has completed will the user be warned of inappropriate data, unless the program is interactive, in which case the programmer can give specific instructions beforehand.
Once linked, these programs can run either on the same machine as the database or, by using the files described above, on alien mainframes without modification.
Note: Some aspects of access from programs are further described in the "Discussion".
The difficult problem of joining two independent mainframes with a shared database has been overcome. The database was greatly assisted by a rigorous methodology and an associated fifth generation language. The design demonstrates that the standard relational approach can be appropriate for material information systems. (It is interesting to compare this with the paper by McCarthy (17.), who only a few years ago saw a number of problems with a relational approach.)
This paper has attempted to demonstrate that a "factual" database does not exist per se. If a bibliographic database is designed in a fully normalised way, then it would be "factual".
The objective of designing a user-friendly system has been achieved for data input, data searching, access and automated publication. The provision to change data, incorporate knowledge, data transfer and future hardware and software development has been anticipated.
The development of the database design described in this paper has been assisted by various members of the COMMIT project within Rolls-Royce plc. The work was funded by the Computer Aided Engineering and Manufacture project as a strategic development for the Company. Those who have particularly influenced this design in addition to the authors are R.T. Perkin, C.E. Butler, F.J. Selvey, K. Barnett, B.J. Piearcey, P. Coultas, R. Price, J.R. Marjoram, R.H. Fleetwood, D.A. Youngs and R.A. Newley.
The helpful advice of M. Clarke, P.M. Sargent, A. Demaid, B.J. Piearcey, R. Price and M.R. King during the preparation of this paper was much appreciated.
1. Dittrich, K.R., "Object-oriented Database Systems: the Notions and the Issues", Forschungszentrum Informatik (FZI) an der Universität Karlsruhe, Haid-und-Neu-Str. 10-14, D-7500 Karlsruhe.
2. Yourdon Inc., "Structured Systems for Real-Time Systems", Edition 3.0, 1984, Yourdon Inc., New York.
3. BIS Limited, "Data Analysis Workshop", B.I.S. Applied Systems Ltd.
4. Bamkin, R.J. et al., "European Materials Information Technology - Materials Function Analysis Report." Report Ref. SPS5047 Issue 2. The Librarian, Rolls-Royce, P.O. Box 31, Derby, DE2 8BJ, U.K., April 1989.
5. James Martin Associates, "BAA, Business Area Analysis Handbook", 1987, Segrave House, Earlsfort Terr., Dublin 2, Ireland.
6. James Martin Associates, "BSD, Business System Design Handbook", 1987, Segrave House, Earlsfort Terr., Dublin 2, Ireland.
7. Ammersbach, K.I., Fuhr, N., and Knorz, G.E., "Empirically Based Concepts for Material Information Systems", 1988, GMD P4 (IPSI), TH Darmstadt, D-6100 Darmstadt, Germany.
8. Bamkin, R.J., and Piearcey, B.J., "Knowledge-Based Material Selection in Design", Materials and Design 1., April 90, Butterworths, London.
9. Grattidge, W., "Capture of Published Materials Data," Computerization and Networking of Materials Data Bases, ASTM STP 1017, J.S. Glazman and J.R. Rumble, Jr., Eds., American Society for Testing and Materials, Philadelphia, 1989, pp. 151-174.
10. Milman, M., "Trompe-l’oeil Painting - The Illusion of Reality", p. 100, Macmillan, 1983, ISBN 0333 34153-S.
11. Rumble, J.R., Jr., "August 6, 1991 Version of the STEP Materials Model", Communication to BSI AMT/4/-/6 and other STEP material activists, August 1991. NIST, Gaithersburg, Maryland 20899.
12. Sargent, P.M., "Materials Information for CAD/CAM", Butterworth-Heinemann Publ., Oxford, UK, ISBN 0-7506-0277-5.
13. Rumble, J., "Access Paths for Materials Databases: Approaches for Large Databases and Systems", Computerization and Networking of Materials Data Bases: Second Volume, ASTM STP 1106, J.G. Kaufman and J.S. Glazman, Eds., American Society for Testing and Materials, Philadelphia, 1991, p. 133.
14. Stanton, E.L., Meyer, K.J., and Kipp, T.E., Jr., "Computerization of Composites Materials Data and Metadata," Computerization and Networking of Materials Data Bases: Second Volume, ASTM STP 1106, J.G. Kaufman and J.S. Glazman, Eds., American Society for Testing and Materials, Philadelphia, 1991, p. 173.
15. Westbrook, J.H., and Grattidge, W., "The Role of Metadata in the Design and Operation of a Materials Database", Computerization and Networking of Materials Data Bases: Second Volume, ASTM STP 1106, J.G. Kaufman and J.S. Glazman, Eds., American Society for Testing and Materials, Philadelphia, 1991, p. 96.
16. Bamkin, R.J., and Butler, C.E., "CAE - the Integration with Material Data and Information", Presented at the 2nd Symposium on the Computerization of Material Property Data, Orlando, Florida, November 1989. Available from the authors.
17. McCarthy, J.L., "Information Systems Design for Material Properties Data," Computerization and Networking of Materials Data Bases, ASTM STP 1017, J.S. Glazman and J.R. Rumble, Jr., Eds., American Society for Testing and Materials, Philadelphia, 1989, pp. 135-150.
18. Sargent, P.M. et al., "Materials Information and Conceptual Data Modeling", Computerization and Use of Materials Databases: Third Volume, ASTM STP 1140, Thomas I. Barry and Keith W. Reynard, Eds., American Society for Testing and Materials, Philadelphia, 1992.
19. CODATA, "Material Data Systems for Engineering", Proceedings of a CODATA workshop, Schluchsee, 1985. Westbrook, J.H. et al. ISBN 3-88127-100-7.
20. Hansen, R., "How Manufacturing can make the most of CALS", J. of Manufacturing, Winter 1990, Frost and Sullivan.
21. US Department of Defense. MIL-STD-1840A. Covers the exchange of data for CALS using magnetic tape.
22. US Department of Defense. MIL-D-28001. Covers the creation of text files based on the Standard Generalized Markup Language (SGML), ISO standard 8879.
23. US Department of Defense. MIL-D-28000. Covers the creation of Product Definition Data based on the Initial Graphics Exchange Specification (IGES), ANSI standard Y14.26M.
24. US Department of Defense. MIL-D-28003. Covers the creation of two dimensional image files based on Computer Graphics Metafile (CGM). Relevant standards are ISO standard 8632 and ANSI standard X3.122.
25. US Department of Defense. MIL-D-28002. Covers the creation of compressed raster graphics files based either on the CCITT standard Group IV T.6 Facsimile or on the ISO standard 8613/7.
26. Kröckel, H., Reynard, K., Rumble, J., "Factual Materials Databanks - the need for standards", July 1987, CEC, JRC Petten, Postbus 2, 1755 ZG Petten, The Netherlands.
27. Rumble, J.R., Jr., "The STEP model of Materials Information", Computerization and Use of Materials Databases: Third Volume, ASTM STP 1140, Thomas I. Barry and Keith W. Reynard, Eds., American Society for Testing and Materials, Philadelphia, 1992.
28. Vinard, D.R., Pellering, C. and Dereims, M., "Use of Z99-001 as a Neutral Exchange Format for Saint-Gobain’s Materials Databank", Computerization and Use of Materials Databases: Third Volume, ASTM STP 1140, Thomas I. Barry and Keith W. Reynard, Eds., American Society for Testing and Materials, Philadelphia, 1992.
29. Cverna, F.A., Gall, T.L., and Heller, M.E., "An ASCII File Format for Materials Properties Import and Export", Computerization and Use of Materials Databases: Third Volume, ASTM STP 1140, Thomas I. Barry and Keith W. Reynard, Eds., American Society for Testing and Materials, Philadelphia, 1992.
30. Sargent, P.M., "A survey of Materials Data Interchange Technologies", CUED/C-MANUF/TR.1, August 1989, Cambridge University Engineering Dept., Technical Reports, UK.
31. McKay,A., Holdsworth, M., et al, "The Integration of Third Party Software," Advances in Manufacturing Technology. Proceedings of the Sixth National conference on Production Research., Carrie,!., and Simpson, I., Ed.
Discussion
N.Swindell (written discussion) - In the STEP model for materials the starting point is material product, but in your system you have taken the starting point for classification as chemical composition, a view point we rejected in the STEP model. Could you comment on the different views?
R.J.Bamkin (author’s closure) - If a database has but one method of defining a material then the concept of a material being defined by the processes that have created it, is correct. In our paper ("Data Usage") we allude to the existence of other databases, principal of which is our processes database. Like STEP, the relationship to processes defines the simplest material. Higher level concepts like "the aluminium alloys" cannot be defined in this way. Hierarchies are one method of finding information with the added functionality of being able to store information at higher concept levels where the properties are generic. The arrangement of most conventional publications tabulating materials confirms our belief that the most important hierarchy (and the most debated) is that based on chemical composition. I have argued that this concept could be built into STEP, but this does increase the cost to each implementer and is certainly outside the STEP short term scope. Communications I have received since this paper (from A.Demaid) indicate that it can be argued that data exchange should use the simplest type of data structure. A decision on this will be needed in the next phase of STEP.
H.Kröckel (written discussion) - (comment): The result of a highly specified query such as that shown in the third example of Table I... usually is that the database does not have a data point exactly meeting that query. Problems of this type can be solved by designing the database to associate data with knowledge of mechanisms or their mathematical representation i.e. models. This concept (for instance applied in the HTM-DB) enables the computation of values not found, by interpolation and (limited) extrapolation from available data on the basis of mechanistic representation.
R.J.Bamkin (author’s closure) - Your comment pertains to the evaluation process which is one use of a database, but of course, is not the database or its management system. In order to answer your question effectively I need to broaden the scope in order to place my reply in context. The database described excludes (at present) mechanical test data. Validated test data is evaluated using statistics, experience and mathematical models. The resulting design data (or equations) will reside in this database and any interpolation which is theoretically possible will be permitted by the property structure. As the data should have been extrapolated within the evaluation process, when all the meta-data was available, it would be too dangerous to second guess the previous extrapolation. It is possible to argue that there are relationships between properties and between similar materials (which were not previously considered) which enable further extrapolation, but this is an evaluation process which would be done by exception, as there is insufficient time between the data being requested and the data being required (fractions of a microsecond) to perform these calculations "on the fly".
Having maximized all possibilities for extrapolation and modeling there will still be cases where no data exists. The third example was intended to illustrate an enquiry that would normally be satisfied by interpolation (on numeric variables) and exact matching (character or integer variables). However, where all variables are present except say "environment = marine" then the default environment would be used ("environment = air").
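The behaviour described in this closure can be sketched as a small lookup routine: exact matching on the character variable, a default with a warning when that match fails, interpolation on the numeric variable and no extrapolation. This is an illustration under stated assumptions, not the FORTRAN access routines: the function names, the warning texts and the curve values are invented for the example.

```python
from bisect import bisect_left

DEFAULT_ENVIRONMENT = "air"


def interpolate(points, x):
    """Linear interpolation within sorted (x, y) points; None outside the range."""
    xs = [p[0] for p in points]
    if not xs or x < xs[0] or x > xs[-1]:
        return None  # never extrapolate beyond the evaluated data
    i = bisect_left(xs, x)
    if xs[i] == x:
        return points[i][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def best_value(curves, environment, temperature):
    """Return (value, warnings) for one property of one material.

    curves maps an environment name to sorted (temperature, value) points.
    """
    warnings = []
    if environment not in curves:
        warnings.append(
            f"no data for environment '{environment}'; "
            f"using default '{DEFAULT_ENVIRONMENT}'"
        )
        environment = DEFAULT_ENVIRONMENT
    value = interpolate(curves.get(environment, []), temperature)
    if value is None:
        warnings.append("no value within the stored range; none supplied")
    return value, warnings


# Made-up design curve: proof stress (MPa) against temperature (K) in air.
curves = {"air": [(300.0, 400.0), (500.0, 300.0)]}

print(best_value(curves, "marine", 400.0))  # default environment, interpolated
print(best_value(curves, "air", 600.0))     # outside the range: no value
```

A non-interactive program would collect the warnings and report them after the run; an interactive one could present them to the user as choices before continuing.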
P.M.Sargent (written discussion) - How many computer programs do you currently have accessing your database? How often are they running at "2 o’clock in the morning"?
R.J.Bamkin (author’s closure) - The database described here is a design for a replacement to our existing system. The existing database has of the order of thirty programs which require access to some data approximately a thousand times a week. Some of these programs may be accessing particular interpolations of the data several million times within the same program run. Those programs that are not interactive (usually because they are heavy users) are likely to run in the early hours of a morning, and there is therefore no opportunity to ask for human expertise. These calculations either get the data they require or they fail (with a large cost in wasted preparation and computing time).