Introduction to Databases


Database Basics

Data is numeric, textual, visual, or audio information that describes real-world systems.

  • varies by:
    • scope
    • format
    • access

Databases

A database is a collection of data in a structured format.

A database management system (DBMS) is software that reads and writes data in a database.

  • ensure data is secure, internally consistent, and available at all times

A database application is software that helps business users interact with database systems.

  • make it easier for non programmers to interact with data and databases

An information management system is a software application that manages corporate data for a specific business function.

  • include database system, user interface, business logic, and more

Database Roles

A database administrator is responsible for securing the database system against unauthorized users.

A database designer determines the format of each data element and the overall database structure.

A database programmer develops computer programs that utilize a database.

A database user is a consumer of data in a database.

Database Systems

transaction is a group of queries that must be either completed or rejected as a whole.

  • when processing transactions, DBMS must:
    • ensure transactions are processed completely or not at all
    • prevent conflicts between concurrent transactions
    • ensure transaction results are never lost

Architecture

Database architecture describes the internal components and the relationships between components.

  • components:
    • query processor
      • interprets queries, creates a plan to modify the database or retrieve data, and returns query results to the application
      • performs query optimization to ensure the most efficient instructions are executed
    • storage manager
      • translates the query processor instructions into low-level file-system commands that modify or retrieve data
      • uses indexes to quickly locate data
    • log
      • is a file containing a complete record of all inserts, updates and deletes processed by the database
      • transaction manager writes to log before applying changes to the database
    • catalog
      • aka data dictionary
      • is a directory of tables, columns, indexes, and other database objects
      • used to process and execute queries

Metadata is data about the database, such as column names and the number of rows in each table.

Products

A relational database stores data in tables, columns, and rows.

  • similar to a spreadsheet
  • most database systems are relational
  • all data in a column has the same format
  • all data in a row represents a single object
  • all relational database systems support Structured Query Language (SQL)
    • uses statements that read and write data, create and delete tables, and administer the database system
  • ideal for databases that require an accurate record of every transaction
  • not ideal for big data
    • massive volumes of poorly structured or incomplete data
    • thus non-relational systems were created optimized for big data
      • called NoSQL
  • db-engines.com ranks systems

Query Languages

A query is a command for a database that typically inserts new data, retrieves data, updates data, or deletes data from a database.

A query language is a computer programming language for writing database queries.

  • database system responds to queries written in a query language

Writing Queries with SQL

Structured Query Language (SQL) is the standard query language of relational database systems.

A SQL statement is a complete, executable database command.

Database Design and Programming

database design is a specification of database objects such as tables, columns, data types, and indexes.

Design Phases

  • 3 phases
    1. Conceptual design
    2. Logical design
    3. Physical design

Conceptual design phases specifies database requirements without regard to a specific database system.

  • aka analysis, data modeling, requirements definition
  • requirements are represented as:
    • entities
    • relationships
    • attributes
  • entities, relationships, and attributes are depicted in ER diagrams

Logical design phase implements database requirements in a specific database system.

  • in a relational database system,
    • converts entities, relationships, and attributes into tables, keys, and columns
    • a key is a column used to identify individual rows of a table
    • specified in SQL with CREATE TABLE statement
  • depicted in a table diagram
    • more detailed than ER diagram
  • called a database schema

Physical design phase adds indexes and specifies how tables are organized on storage media.

  • e.g., rows of a table may be sorted on the values of a column and stored in sort order
  • logical and physical design affect queries differently
    • logical design affects the query result
    • physical design affects query processing speed but never the result

Data independence is the principle that physical design never affects the query result.

  • aka information independence
  • allows database designers to tune query performance without changes to application programs

Programming

  • SQL lacks programming features
    • most SQL implementations are not object-oriented
  • SQL is usually combined with a general purpose programming language
    • programming languages use an API to implement SQL operations
      • e.g., connecting to database, executing queries, and retrieving results

MySQL

MySQL is a leading relational database system, sponsored by Oracle.

  • runs on all major operating systems
  • multiple editions:
    • MySQL Community
      • free edition with a complete set of database services and tools
    • MySQL Enterprise
      • paid edition for managing commercial databases
    • MySQL HeatWave
      • is a commercial edition with additional capabilities for analytics and machine learning

Relational Model

A database model is a conceptual framework for database systems with 3 parts:

  • data structures
    • prescribe how data is organized
  • operations
    • manipulate data structures
  • rules
    • govern valid data

The relational model is a database model based on tabular data structure.

  • published in 1970 by E.F. Codd of IBM
    • released commercially in 1980
  • data structure, operations, and rules are standardized in SQL

Relational Data Structure

  • relational data structure is based on set theory
  • a set is an unordered collection of elements enclosed in braces
    • e.g., {a, b, c} = {c, b, a}
  • a tuple is an ordered collection of elements enclosed in parentheses
    • e.g., (a, b, c) ≠ (c, b, a)
  • organizes data in tables:
    • table
      • has a name, fixed tuple of columns, and varying set of rows
    • column
      • has a name and a data type
    • row
      • is unnamed tuple of values
        • each value corresponds to a column and belongs to the column’s data type
      • rows in a table have no inherent order
    • data type
      • is a named set of values, from which column values are drawn

Relational Operations

  • based on set theory
  • each operation generates a result table from one or two input tables
    • SELECT selects a subset of (or all) rows of a table
    • PROJECT selects one or more columns of a table
    • PRODUCT lists all combinations of rows of two tables
    • JOIN combines two tables by comparing related columns
    • UNION selects rows of two tables
    • INTERSECT selects rows common to two tables
    • DIFFERENCE selects rows that appear in one table but not another
    • RENAME changes a table name
    • AGGREGATE computes functions over multiple table rows, such as sum and count
  • these operations are called relational algebra
  • result of a SQL query is a table

Relational Rules

Rules are logical constraints that ensure data is valid.

Relational rules are part of the relational model and govern data in every relational database.

  • e.g.,
    • unique primary key
    • unique column names
    • no duplicate rows
  • relational rules are implemented as SQL constraints
    • enforced by the database system

Business rules are based on business policy and specific to a particular database.

  • discovered during database design
  • often implemented as SQL constraints
    • but may need to be enforced by applications running on the database