Introduction to Databases
Database Basics
Data is numeric, textual, visual, or audio information that describes real-world systems.
- varies by:
- scope
- format
- access
Databases
A database is a collection of data in a structured format.
A database management system (DBMS) is software that reads and writes data in a database.
- ensure data is secure, internally consistent, and available at all times
A database application is software that helps business users interact with database systems.
- make it easier for non programmers to interact with data and databases
An information management system is a software application that manages corporate data for a specific business function.
- include database system, user interface, business logic, and more
Database Roles
A database administrator is responsible for securing the database system against unauthorized users.
A database designer determines the format of each data element and the overall database structure.
A database programmer develops computer programs that utilize a database.
A database user is a consumer of data in a database.
Database Systems
A transaction is a group of queries that must be either completed or rejected as a whole.
- when processing transactions, DBMS must:
- ensure transactions are processed completely or not at all
- prevent conflicts between concurrent transactions
- ensure transaction results are never lost
Architecture
Database architecture describes the internal components and the relationships between components.
- components:
- query processor
- interprets queries, creates a plan to modify the database or retrieve data, and returns query results to the application
- performs query optimization to ensure the most efficient instructions are executed
- storage manager
- translates the query processor instructions into low-level file-system commands that modify or retrieve data
- uses indexes to quickly locate data
- log
- is a file containing a complete record of all inserts, updates and deletes processed by the database
- transaction manager writes to log before applying changes to the database
- catalog
- aka data dictionary
- is a directory of tables, columns, indexes, and other database objects
- used to process and execute queries
- query processor
Metadata is data about the database, such as column names and the number of rows in each table.
Products
A relational database stores data in tables, columns, and rows.
- similar to a spreadsheet
- most database systems are relational
- all data in a column has the same format
- all data in a row represents a single object
- all relational database systems support Structured Query Language (SQL)
- uses statements that read and write data, create and delete tables, and administer the database system
- ideal for databases that require an accurate record of every transaction
- not ideal for big data
- massive volumes of poorly structured or incomplete data
- thus non-relational systems were created optimized for big data
- called NoSQL
- db-engines.com ranks systems
Query Languages
A query is a command for a database that typically inserts new data, retrieves data, updates data, or deletes data from a database.
A query language is a computer programming language for writing database queries.
- database system responds to queries written in a query language
Writing Queries with SQL
Structured Query Language (SQL) is the standard query language of relational database systems.
A SQL statement is a complete, executable database command.
Database Design and Programming
A database design is a specification of database objects such as tables, columns, data types, and indexes.
Design Phases
- 3 phases
- Conceptual design
- Logical design
- Physical design
Conceptual design phases specifies database requirements without regard to a specific database system.
- aka analysis, data modeling, requirements definition
- requirements are represented as:
- entities
- relationships
- attributes
- entities, relationships, and attributes are depicted in ER diagrams
Logical design phase implements database requirements in a specific database system.
- in a relational database system,
- converts entities, relationships, and attributes into tables, keys, and columns
- a key is a column used to identify individual rows of a table
- specified in SQL with
CREATE TABLEstatement
- depicted in a table diagram
- more detailed than ER diagram
- called a database schema
Physical design phase adds indexes and specifies how tables are organized on storage media.
- e.g., rows of a table may be sorted on the values of a column and stored in sort order
- logical and physical design affect queries differently
- logical design affects the query result
- physical design affects query processing speed but never the result
Data independence is the principle that physical design never affects the query result.
- aka information independence
- allows database designers to tune query performance without changes to application programs
Programming
- SQL lacks programming features
- most SQL implementations are not object-oriented
- SQL is usually combined with a general purpose programming language
- programming languages use an API to implement SQL operations
- e.g., connecting to database, executing queries, and retrieving results
- programming languages use an API to implement SQL operations
MySQL
MySQL is a leading relational database system, sponsored by Oracle.
- runs on all major operating systems
- multiple editions:
- MySQL Community
- free edition with a complete set of database services and tools
- MySQL Enterprise
- paid edition for managing commercial databases
- MySQL HeatWave
- is a commercial edition with additional capabilities for analytics and machine learning
- MySQL Community
Relational Model
A database model is a conceptual framework for database systems with 3 parts:
- data structures
- prescribe how data is organized
- operations
- manipulate data structures
- rules
- govern valid data
The relational model is a database model based on tabular data structure.
- published in 1970 by E.F. Codd of IBM
- released commercially in 1980
- data structure, operations, and rules are standardized in SQL

Relational Data Structure
- relational data structure is based on set theory
- a set is an unordered collection of elements enclosed in braces
- e.g., {a, b, c} = {c, b, a}
- a tuple is an ordered collection of elements enclosed in parentheses
- e.g., (a, b, c) ≠ (c, b, a)
- organizes data in tables:
- table
- has a name, fixed tuple of columns, and varying set of rows
- column
- has a name and a data type
- row
- is unnamed tuple of values
- each value corresponds to a column and belongs to the column’s data type
- rows in a table have no inherent order
- is unnamed tuple of values
- data type
- is a named set of values, from which column values are drawn
- table
Relational Operations
- based on set theory
- each operation generates a result table from one or two input tables
SELECTselects a subset of (or all) rows of a tablePROJECTselects one or more columns of a tablePRODUCTlists all combinations of rows of two tablesJOINcombines two tables by comparing related columnsUNIONselects rows of two tablesINTERSECTselects rows common to two tablesDIFFERENCEselects rows that appear in one table but not anotherRENAMEchanges a table nameAGGREGATEcomputes functions over multiple table rows, such as sum and count
- these operations are called relational algebra
- result of a SQL query is a table
Relational Rules
Rules are logical constraints that ensure data is valid.
Relational rules are part of the relational model and govern data in every relational database.
- e.g.,
- unique primary key
- unique column names
- no duplicate rows
- relational rules are implemented as SQL constraints
- enforced by the database system
Business rules are based on business policy and specific to a particular database.
- discovered during database design
- often implemented as SQL constraints
- but may need to be enforced by applications running on the database