Basic Data Terminology

Prepared by Dr. Nasir Gharaibeh, Texas A&M University

It’s been said that data is the fuel that makes pavement management systems run.  Data terminology can be confusing because some terms can be defined in different ways.  Here are definitions of 10 basic data-related terms.  I hope you find this post helpful.

  • Data quality:  The principal dimensions of data quality are accuracy (closeness between a data value and the real-world value that it represents), completeness (the extent to which values that exist in the real world are present in the database rather than missing), and timeliness (how current the data are for the task at hand).  (1)
  • Data mining: The process of discovering interesting patterns from data.  As a knowledge discovery process, it involves in-depth analysis such as data classification, clustering, outlier/anomaly detection, and the characterization of changes in data over time. (4)
  • Data warehouse: A repository for long-term storage of data from multiple sources, organized so as to facilitate management decision making. The data are stored under a unified schema.  Data warehouse systems provide data cleaning, data integration, and online analytical processing (OLAP). (4)
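
To make the three data quality dimensions above more concrete, here is a minimal Python sketch that scores a small set of records for completeness, accuracy, and timeliness. Everything in it is an assumption made for illustration: the records, the field names (section_id, iri, inspected), the audit values, the tolerance, and the age cutoff are hypothetical and do not come from the references.

```python
from datetime import date

# Hypothetical pavement condition records; None marks a missing value.
records = [
    {"section_id": "A-001", "iri": 95.0, "inspected": date(2015, 3, 10)},
    {"section_id": "A-002", "iri": None, "inspected": date(2015, 3, 12)},
    {"section_id": "A-003", "iri": 142.0, "inspected": date(2012, 6, 1)},
    {"section_id": "A-004", "iri": 88.0, "inspected": date(2014, 11, 20)},
]

# Hypothetical "real-world" values, e.g. from an independent audit survey.
reference_iri = {"A-001": 97.0, "A-003": 140.0, "A-004": 120.0}


def completeness(rows, field):
    """Fraction of rows in which `field` has a value (is not missing)."""
    present = sum(1 for r in rows if r[field] is not None)
    return present / len(rows)


def accuracy(rows, field, reference, tolerance):
    """Fraction of audited values within `tolerance` of the reference value."""
    checked = [
        r for r in rows
        if r[field] is not None and r["section_id"] in reference
    ]
    if not checked:
        return None  # nothing to compare against
    close = sum(
        1 for r in checked
        if abs(r[field] - reference[r["section_id"]]) <= tolerance
    )
    return close / len(checked)


def timeliness(rows, as_of, max_age_days):
    """Fraction of rows inspected no more than `max_age_days` before `as_of`."""
    fresh = sum(
        1 for r in rows
        if (as_of - r["inspected"]).days <= max_age_days
    )
    return fresh / len(rows)


print("completeness:", completeness(records, "iri"))
print("accuracy:    ", accuracy(records, "iri", reference_iri, tolerance=5.0))
print("timeliness:  ", timeliness(records, date(2015, 6, 1), max_age_days=730))
```

What counts as "close enough" or "current enough" depends on the task at hand, so the tolerance and cutoff are parameters rather than fixed values.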


  • Data mart:  A subset of a data warehouse that supports the requirements of a particular business function.  It is a scaled-down version of a data warehouse. (2)
  • Data schema: A logical schema describes the design (blueprint) of the database.  A physical schema is the collection of actual tables (and other objects) that make up the database.
  • Index: A commonly used method for rapidly retrieving specified rows from a table without having to search the entire table.  Each table can have one or more indexes specified. Each index applies to a particular column or set of columns. For each value of the column(s), the index lists the location(s) of the row(s) in which that value can be found. (3)
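
As a rough illustration of the schema and index entries above, the sketch below uses Python's built-in sqlite3 module. The table and index names (sections, idx_sections_route) and the sample rows are hypothetical. The CREATE statements stand in for the design, the sqlite_master catalog lists the objects that physically exist, and EXPLAIN QUERY PLAN reports whether SQLite uses the index instead of scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The design of the database, written as DDL (hypothetical names).
conn.executescript("""
    CREATE TABLE sections (
        section_id TEXT PRIMARY KEY,
        route      TEXT NOT NULL,
        length_mi  REAL
    );
    -- An index on one column, so lookups by route need not scan every row.
    CREATE INDEX idx_sections_route ON sections (route);
""")

conn.executemany(
    "INSERT INTO sections VALUES (?, ?, ?)",
    [("A-001", "SH-6", 1.2), ("A-002", "SH-6", 0.8), ("B-001", "FM-60", 2.5)],
)

# The tables and indexes that actually exist in this database.
physical_objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()
print(physical_objects)

# Ask SQLite how it would run a lookup by route; the last column of each
# plan row is a human-readable description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sections WHERE route = ?", ("SH-6",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The exact wording of the query plan varies between SQLite versions, but it should name idx_sections_route when the index is used.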


  • Legacy data: Data collected by an information system that has been replaced by a newer system, and which cannot be immediately integrated into the newer system’s database. (2)
  • Metadata: Data that describe data. For example, a data point may consist of the number “150.” The metadata for that data point may be the words “Weight, in pounds.” (2)
  • Relational database: A collection of tables, each of which is assigned a unique name.  Each table consists of a set of attributes (columns) and usually stores a large set of tuples (records or rows).  Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values. (4)
  • Structured Query Language (SQL) (often pronounced “sequel”): A computer language used to retrieve data from a relational database. (3)
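
The last three terms can be seen together in a short sketch, again using Python's built-in sqlite3 module. The table name (measurements), its columns, the sample rows, and the data_dictionary mapping are all hypothetical; the value 150 and the description "Weight, in pounds" echo the metadata example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A relational table: named columns (attributes) and a unique key per row.
conn.execute("""
    CREATE TABLE measurements (
        measurement_id INTEGER PRIMARY KEY,  -- unique key for each tuple
        item           TEXT NOT NULL,
        weight         REAL
    )
""")

# Each inserted row is one tuple (record).
conn.executemany(
    "INSERT INTO measurements (item, weight) VALUES (?, ?)",
    [("sample A", 150.0), ("sample B", 172.5), ("sample C", 98.0)],
)

# SQL retrieves the rows that match a condition.
heavy = conn.execute(
    "SELECT item, weight FROM measurements WHERE weight >= ? ORDER BY weight",
    (150,),
).fetchall()
print(heavy)

# Metadata kept by the database itself: column names and declared types.
columns = [
    (row[1], row[2])
    for row in conn.execute("PRAGMA table_info(measurements)")
]
print(columns)

# Metadata kept alongside the database: what a column's numbers mean.
data_dictionary = {"weight": "Weight, in pounds"}
print(data_dictionary["weight"])
```

Without the data dictionary entry, the stored number 150 says nothing about what was measured or in which unit.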


References:

  1. Data Quality: Concepts, Methodologies and Techniques, by Carlo Batini and Monica Scannapieco, Springer, 2006.
  2. Repurposing Legacy Data: Innovative Case Studies, by Jules J. Berman, Computer Science Reviews and Trends, Elsevier Science, 2015.
  3. Database Design: Know It All, by Toby Teorey et al., 2008.
  4. Data Mining: Concepts and Techniques, by Jiawei Han, Micheline Kamber, and Jian Pei, 3rd Edition, Elsevier Inc., 2012.

 

For more information, please contact Nasir Gharaibeh at: ngharaibeh@civil.tamu.edu
