DBMS: Database Queries and Relational Calculus

The sources provided offer a comprehensive exploration of database concepts, beginning with foundational elements of Entity-Relationship (ER) models, including entities, attributes, and relationships. They distinguish between various types of attributes (derived, multi-valued, composite, descriptive) and keys (super, candidate, primary, foreign), explaining their roles in uniquely identifying and linking data. The text transitions into relational models, detailing how ER constructs are converted into tables and the importance of referential integrity. A significant portion focuses on relational algebra as a procedural query language, breaking down fundamental operators like selection, projection, union, set difference, Cartesian product, and joins (inner and outer), and illustrating their application through practical examples. Finally, the sources touch upon relational calculus (tuple and domain) as non-procedural alternatives and introduce SQL, emphasizing its syntax for data retrieval and modification (insert, delete, update).

Data Modeling: ER and Relational Models Explained

Data modeling is a fundamental concept in database management systems (DBMS) that serves as a blueprint or structure for how data is stored and accessed. It provides conceptual tools to describe various aspects of data:

  • Data itself.
  • Data relationships.
  • Consistency constraints.
  • Data meaning (semantics).

The goal of data modeling is to establish a structured format for storing data to ensure efficient retrieval and management. It is crucial because information derived from processed data is highly valuable for decision-making, which is why companies invest significantly in data.

There are primarily two phases in database design that involve data modeling:

  1. Designing the ER (Entity-Relationship) Model: This is the first, high-level design phase.
  2. Converting the ER Model into a Relational Model: This phase translates the high-level design into a structured format suitable for relational databases.

Let’s delve into the types and key aspects of data models discussed in the sources:

Types of Data Models

The sources categorize data within a database system into two broad types: structured and unstructured.

  • Structured Data: This type of data has a proper format, often tabular. Examples include data from Indian railways or university data. Different patterns for storing structured data include:
  • Key-value pairs: Used for high-speed lookups.
  • Column-oriented databases: Store data column by column instead of row by row.
  • Graph databases: Data is stored in nodes, with relationships depicted by edges (e.g., social media recommendation systems).
  • Document-oriented databases: Used in systems like MongoDB.
  • Object-oriented databases: Store data as objects.
  • Unstructured Data: This data does not have a proper format, such as a mix of videos, text, and images found on a website.

For strictly tabular and structured data, a relational database management system (RDBMS) is considered the best choice. However, for better performance, scalability, or special use cases, other database types can serve as alternatives.

The Entity-Relationship (ER) Model

The ER model is a high-level data model that is easily understandable even by non-technical persons. It is based on the perception of real-world objects and the relationships among them. The ER model acts as a bridge to understand the relational model, allowing for high-level design that can then be implemented in a relational database.

Key constructs in the ER model include:

  • Entities: Represent real-world objects (e.g., student, car). Entities can be:
  • Entity Type: The class blueprint or table definition (e.g., “Student” table).
  • Entity Instance: A specific record or row with filled values (e.g., a specific student’s record).
  • Entity Set: A collection of all entity instances of a particular type.
  • Strong Entity Type: Can exist independently and has its own primary key (also called regular or independent entity type).
  • Weak Entity Type: Depends on the existence of a strong entity type and does not have its own primary key (also called dependent entity type). Its instances are uniquely identified with the help of a discriminator (a unique attribute within the weak entity) and the primary key of the strong entity type it depends on. A weak entity type always has total participation in its identifying relationship.
  • Attributes: These are the properties that describe an entity type (e.g., for a “Fighter” entity, attributes could be ranking, weight, reach, record, age). Each attribute has a domain (set of permissible values), which can be enforced by domain constraints. Attributes can be categorized as:
  • Simple: Atomic, cannot be subdivided (e.g., gender).
  • Composite: Can be subdivided (e.g., address into street, locality).
  • Single-valued: Holds a single value (e.g., role number).
  • Multivalued: Can hold multiple values (e.g., phone number, email).
  • Stored: Cannot be derived from other attributes (e.g., date of birth).
  • Derived: Can be calculated or derived from other stored attributes (e.g., age from date of birth).
  • Descriptive: Attributes of a relationship (e.g., “since” in “employee works in department”).
  • Relationships: Represent an association between instances of different entity types (e.g., “customer borrows loan”). Relationships have a degree (unary, binary, ternary) and cardinality ratios (based on maximum participation like one-to-one, one-to-many, many-to-one, many-to-many, and minimum participation like total or partial).
  • Total Participation: Means every instance of an entity type must participate in the relationship (minimum cardinality of one).
  • Partial Participation: Means instances of an entity type may or may not participate in the relationship (minimum cardinality of zero), which is the default setting.

The ER model is not a complete model on its own because it does not define the storage format or manipulation language (like SQL). However, it is a crucial conceptual tool for designing high-level database structures.

The Relational Model

Developed by E.F. Codd in 1970, the relational model dictates that data will be stored in a tabular format. Its popularity stems from its simplicity, ease of use and understanding, and its strong mathematical foundation.

In the relational model:

  • Tables (relations): Practical forms where data of interest is stored.
  • Rows (tuples, records, instances): Represent individual entries.
  • Columns (attributes, fields): Represent properties of the data.
  • Schema: The blueprint of the database, including attributes, constraints, and relationships.
  • Integrity Constraints: Rules to ensure data correctness and consistency. These include domain constraints, entity integrity (primary key unique and not null), referential integrity (foreign key values are a subset of parent table’s primary key values), null constraints, default value constraints, and uniqueness constraints.

The relational model is considered a complete model because it answers the three fundamental questions of data modeling:

  1. Storage Format: Data is stored in tables.
  2. Manipulation Language: SQL (Structured Query Language) is used for data manipulation.
  3. Integrity Constraints: It defines various integrity rules for data correctness.

When converting an ER model to a relational model, each entity type (strong or weak) is typically converted into a single table. Multivalued attributes usually require a separate table, while composite attributes are flattened into the original table. Relationships are represented either by incorporating foreign keys into existing tables or by creating separate tables for the relationships themselves, depending on the cardinality and participation constraints.

In summary, data modeling is the conceptual process of organizing data and its relationships within a database. The ER model provides a high-level design, serving as a conceptual bridge to the more detailed and mathematically rigorous relational model, which defines how data is physically stored and manipulated in tables using languages like SQL.

Relational Algebra: Operators and Concepts

Relational Algebra is a foundational concept in database management systems, serving as a procedural query language that specifies both what data to retrieve and how to retrieve it. It forms the theoretical foundation for SQL and is considered a cornerstone for understanding database concepts, design, and querying. This mathematical basis is one of the key reasons for the popularity of the relational model.

In relational algebra, operations deal with relations (tables) as inputs and produce new relations as outputs. The process involves three main components: input (one or more relations), output (always exactly one relation), and operators.

Types of Operators

Relational algebra operators are categorized into two main types: Fundamental and Derived. Derived operators are built upon the fundamental ones.

Fundamental Operators

  1. Selection ($\sigma$):
  • Purpose: Used for horizontal selection, meaning it selects rows (tuples) from a relation based on a specified condition (predicate).
  • Nature: It is a unary operator, taking one relation as input and producing one relation as output.
  • Syntax: $\sigma_{condition}(Relation)$.
  • Effect on Schema: The degree (number of columns) of the output relation is equal to the degree of the input relation, as only rows are filtered.
  • Effect on Data: The cardinality (number of rows) of the output relation will be less than or equal to the cardinality of the input relation.
  • Properties: Selection is commutative, meaning the order of applying multiple selection conditions does not change the result. Multiple conditions can also be combined using logical AND ($\land$) operators.
  • Null Handling: Null values are ignored in the selection operator if the condition involving them evaluates to null or false. Only tuples that return true for the condition are included.
  1. Projection ($\pi$):
  • Purpose: Used for vertical selection, meaning it selects columns (attributes) from a relation.
  • Nature: It is a unary operator, taking one relation as input and producing one relation as output.
  • Syntax: $\pi_{Attribute1, Attribute2, …}(Relation)$.
  • Effect on Schema: The degree (number of columns) of the output relation is less than or equal to the degree of the input relation, as only specified columns are projected.
  • Effect on Data: Projection eliminates duplicates in the resulting rows. Therefore, the cardinality of the output relation may be less than or equal to the cardinality of the input relation.
  • Properties: Projection is not swappable with selection if the selection condition relies on an attribute that would be removed by projection.
  • Null Handling: Null values are not ignored in projection; they are returned as part of the projected column.
  1. Union ($\cup$):
  • Purpose: Combines all unique tuples from two compatible relations.
  • Compatibility: Both relations must be union compatible, meaning they have the same degree (number of columns) and corresponding columns have same domains (data types). Column names can be different.
  • Properties: Union is commutative ($A \cup B = B \cup A$) and associative ($A \cup (B \cup C) = (A \cup B) \cup C$).
  • Effect on Schema: The degree remains the same as the input relations.
  • Effect on Data: Eliminates duplicates by default. The cardinality of the result is $Cardinality(R1) + Cardinality(R2)$ minus the number of common tuples.
  • Null Handling: Null values are not ignored; they are treated just like other values.
  1. Set Difference ($-$):
  • Purpose: Returns all tuples that are present in the first relation but not in the second relation. ($A – B$) includes elements in A but not in B.
  • Compatibility: Relations must be union compatible.
  • Properties: Set difference is neither commutative ($A – B \neq B – A$) nor associative.
  • Effect on Schema: The degree remains the same as the input relations.
  • Effect on Data: The cardinality of the result ranges from 0 (if R1 is a subset of R2) to $Cardinality(R1)$ (if R1 and R2 are disjoint).
  • Null Handling: Null values are not ignored.
  1. Cartesian Product ($\times$):
  • Purpose: Combines every tuple from the first relation with every tuple from the second relation, resulting in all possible tuple combinations.
  • Syntax: $R1 \times R2$.
  • Effect on Schema: The degree of the result is the sum of the degrees of the input relations ($Degree(R1) + Degree(R2)$). If columns have the same name, a qualifier (e.g., TableName.ColumnName) is used to differentiate them.
  • Effect on Data: The cardinality of the result is the product of the cardinalities of the input relations ($Cardinality(R1) \times Cardinality(R2)$).
  • Use Case: Often used as a preliminary step before applying a selection condition to filter for meaningful combinations, effectively performing a “join”.
  1. Renaming ($\rho$):
  • Purpose: Used to rename a relation or its attributes. This is useful for self-joins or providing more descriptive names.
  • Syntax: $\rho_{NewName}(Relation)$ or $\rho_{NewName(NewCol1, NewCol2, …)}(Relation)$.

Derived Operators

Derived operators can be expressed using combinations of fundamental operators.

  1. Intersection ($\cap$):
  • Purpose: Returns tuples that are common to both union-compatible relations.
  • Derivation: Can be derived using set difference: $R1 \cap R2 = R1 – (R1 – R2)$.
  • Compatibility: Relations must be union compatible.
  • Effect on Schema: The degree remains the same.
  • Effect on Data: The cardinality of the result ranges from 0 to the minimum of the cardinalities of the input relations.
  • Null Handling: Null values are not ignored.
  1. Join (Various Types): Joins combine tuples from two relations based on a common condition. They are derived from Cartesian product and selection.
  • Theta Join ($\Join_{\theta}$): Performs a Cartesian product followed by a selection based on any comparison condition ($\theta$) (e.g., greater than, less than, equals).
  • Syntax: $R1 \Join_{condition} R2$.
  • Effect on Schema: Sum of degrees.
  • Effect on Data: Ranges from 0 to Cartesian product cardinality.
  • Equijoin ($\Join_{=}$): A special case of Theta Join where the condition is restricted to equality ($=$).
  • Natural Join ($\Join$):
  • Purpose: Equijoins relations on all common attributes, automatically. The common attributes appear only once in the result schema.
  • Properties: Natural join is commutative and associative.
  • Effect on Schema: Degree is sum of degrees minus the count of common attributes.
  • Effect on Data: Cardinality ranges from 0 to the maximum (can be Cartesian product if no common attributes or if all common attributes have same values across all tuples). Tuples that fail to find a match are called dangling tuples.
  • Semi-Join ($\ltimes$):
  • Purpose: Performs a natural join but keeps only the attributes of the left-hand side relation. It effectively filters the left relation to only include tuples that have a match in the right relation.
  • Anti-Join ($\rhd$):
  • Purpose: Performs a natural join but keeps only the attributes of the left-hand side relation for tuples that do not have a match in the right relation [This is an external clarification, source says “keep the attributes of right hand side relation only” for anti-join, which contradicts the common definition of anti-join]. Correction based on source direct statement: “we have to keep the attributes of right hand side relation only” for anti-join. This is a bit unusual compared to standard anti-join (which typically returns tuples from the left that don’t have a match on the right, retaining left attributes). However, sticking to the provided source:
  • Purpose (per source): “keep the attributes of right hand side relation only”.
  • Effect: It implies a filtering operation, but the source’s description for anti-join might be a specific interpretation or a typo compared to conventional anti-join. I’ll highlight the source’s wording.
  1. Outer Join (Left, Right, Full):
  • Purpose: Similar to inner joins, but they also include non-matching (dangling) tuples from one or both relations, padding missing attribute values with null.
  • Left Outer Join ($\Join^{L}$): Includes all matching tuples and all dangling tuples from the left relation.
  • Right Outer Join ($\Join^{R}$): Includes all matching tuples and all dangling tuples from the right relation.
  • Full Outer Join ($\Join^{F}$): Includes all matching tuples and dangling tuples from both left and right relations.
  • Effect on Data: Cardinality of Left Outer Join is at least $Cardinality(R1)$. Cardinality of Right Outer Join is at least $Cardinality(R2)$. Cardinality of Full Outer Join is at least $Cardinality(R1 \cup R2)$ (if treating attributes as sets).
  • Null Handling: Nulls are explicitly used to represent missing values for non-matching tuples.
  1. Division ($\div$):
  • Purpose: Finds tuples in one relation that are “associated with” or “match all” tuples in another relation based on a subset of attributes. Often used for “for all” type queries.
  • Prerequisite: $R1 \div R2$ is only possible if all attributes of $R2$ are present in $R1$, and $R1$ has some extra attributes not present in $R2$.
  • Effect on Schema: The degree of the result is $Degree(R1) – Degree(R2)$ because attributes of $R2$ are removed from $R1$ in the output.
  • Derivation: Division is a derived operator and can be expressed using projection, Cartesian product, and set difference.

Relationship with Relational Calculus and SQL

Relational Algebra is a procedural language, telling the system how to do the retrieval, in contrast to Relational Calculus (Tuple Relational Calculus and Domain Relational Calculus), which are non-procedural and only specify what to retrieve. Relational algebra has the same expressive power as safe relational calculus. This means any query expressible in relational algebra can also be written in safe relational calculus, and vice versa. However, relational calculus (in its full, unsafe form) can express queries that cannot be expressed in relational algebra or SQL.

SQL’s SELECT, FROM, and WHERE clauses directly map to relational algebra’s Projection, Cartesian Product, and Selection operators, respectively. SQL is considered relationally complete, meaning any query expressible in relational algebra can also be written in SQL.

Key Concepts in Relational Algebra

  • Relation vs. Table: A relation is a mathematical set, a subset of a Cartesian product, containing only tuples that satisfy a given condition. A table is the practical form of a relation used in DBMS for storing data of interest. In tables, null and duplicate values are allowed for individual columns, but a whole tuple in a relation (mathematical sense) cannot be duplicated.
  • Degree and Cardinality: Degree refers to the number of columns (attributes) in a relation, while cardinality refers to the number of rows (tuples/records).
  • Null Values: In relational algebra, null signifies an unknown, non-applicable, or non-existing value. It is not treated as zero, empty string, or any specific value. Comparisons involving null (e.g., null > 5, null = null) typically result in null (unknown). This behavior impacts how selection and join operations handle tuples containing nulls, as conditions involving nulls usually do not evaluate to true. Projection, Union, Set Difference, and Intersection, however, do not ignore nulls.
  • Efficiency: When writing complex queries involving Cartesian products, it is generally more efficient to minimize the number of tuples in relations before performing the Cartesian product, as this reduces the size of the intermediate result. This principle is often applied by performing selections (filtering) early.

Relational Calculus: Principles, Types, and Applications

Relational Calculus is a non-procedural query language used in database management systems. Unlike procedural languages such as Relational Algebra, it specifies “what data to retrieve” rather than “how to retrieve” it. This means it focuses on describing the desired result set without outlining the step-by-step process for obtaining it.

Comparison with Relational Algebra and SQL

  • Relational Algebra (Procedural): Relational Algebra is considered a procedural language because it answers both “what to do” and “how to do” when querying a database.
  • Expressive Power:
  • Safe Relational Calculus has the same expressive power as Relational Algebra. This means any query that can be formulated in safe Relational Calculus can also be expressed in Relational Algebra, and vice versa.
  • However, Relational Calculus, in its entirety, has more expressive power than Relational Algebra or SQL. This additional power allows it to express “unsafe queries” – queries whose results include tuples that are not actually present in the database table.
  • Consequently, every query expressible in Relational Algebra or SQL can be represented using Relational Calculus, but there exist some queries in Relational Calculus that cannot be expressed using Relational Algebra.
  • Theoretical Foundation: SQL is theoretically based on both Relational Algebra and Relational Calculus.

Types of Relational Calculus

Relational Calculus is divided into two main parts:

  1. Tuple Relational Calculus (TRC)
  2. Domain Relational Calculus (DRC)

Tuple Relational Calculus (TRC)

Tuple Relational Calculus uses tuple variables to represent an entire row or record within a table.

  • Representation: A TRC query is typically represented as S = {T | P(T)}, where S is the result set, T is a tuple variable, and P is a condition (or predicate) that T must satisfy. The tuple variable T iterates through each tuple, and if the condition P(T) is true, that tuple is included in the result.
  • Attribute Access: Attributes of a tuple T are denoted using dot notation (T.A) or bracket notation (T[A]), where A is the attribute name.
  • Relation Membership: T belonging to a relation R is represented as T ∈ R or R(T).

Quantifiers in TRC: TRC employs logical quantifiers to express conditions:

  • Existential Quantifier (∃): Denoted by ∃ (read as “there exists”).
  • It asserts that there is at least one tuple that satisfies a given condition.
  • Unsafe Queries: Using the existential quantifier with an OR operator can produce unsafe queries. An unsafe query can include tuples in the result that are not actually present in the source table. For example, a query like T | ∃B (B ∈ Book ∧ (T.BookID = B.BookID ∨ T.Year = B.Year)) (where Book is a table) might include arbitrary combinations of BookID and Year that aren’t real entries if either part of the OR condition is met.
  • The EXISTS keyword in SQL is conceptually derived from this quantifier, returning true if a subquery produces a non-empty result.
  • Universal Quantifier (∀): Denoted by ∀ (read as “for all”).
  • It asserts that a condition must hold true for every tuple in a specified set.
  • Using ∀ with an AND operator can be meaningless for direct output projection.
  • It is often used in combination with negation (¬) or implication (→) to express queries like “find departments that do not have any girl students”.

Examples in TRC (from sources):

  • Projection:
  • To project all attributes of the Employee table: {T | Employee(T)}.
  • To project specific attributes (e.g., EName, Salary) of the Employee table: {T.EName, T.Salary | Employee(T)}.
  • Selection:
  • Find details of employees with Salary > 5000: {T | Employee(T) ∧ T.Salary > 5000}.
  • Find Date_of_Birth and Address of employees named “Rohit Sharma”: {T.DOB, T.Address | Employee(T) ∧ T.FirstName = ‘Rohit’ ∧ T.LastName = ‘Sharma’}.
  • Join (referencing multiple tables):
  • Find names of female students in the “Maths” department: {S.Name | Student(S) ∧ S.Sex = ‘Female’ ∧ ∃D (Department(D) ∧ D.DeptID = S.DeptNo ∧ D.DeptName = ‘Maths’)}.
  • Find BookID of all books issued to “Makash”: {T.BookID | ∃U (User(U) ∧ U.Name = ‘Makash’) ∧ ∃B (Borrow(B) ∧ B.CardNo = U.CardNo ∧ T.BookID = B.BookID)}.

Domain Relational Calculus (DRC)

Domain Relational Calculus uses domain variables that represent individual column attributes, rather than entire rows.

  • Representation: A DRC query is typically represented as Output_Table = {A1, A2, …, An | P(A1, A2, …, An)}, where A1, A2, …, An are the column attributes (domain variables) to be projected, and P is the condition they must satisfy.
  • Concept: Instead of iterating through tuples, DRC defines the domains of the attributes being sought.

Examples in DRC (from sources):

  • Projection:
  • Find BookID and Title of all books: {BookID, Title | (BookID, Title) ∈ Book}.
  • Selection:
  • Find BookID of all “DBMS” books: {BookID | (BookID, Title) ∈ Book ∧ Title = ‘DBMS’}.
  • Join:
  • Find title of all books supplied by “Habib”: {Title | ∃BookID, ∃SName ((BookID, Title) ∈ Book ∧ (BookID, SName) ∈ Supplier ∧ SName = ‘Habib’)}.

Safety of Queries

As mentioned, Relational Calculus can express unsafe queries. An unsafe query is one that, when executed, might include results that are not derived from the existing data in the database, potentially leading to an infinite set of results. For instance, a query to “include all those tuples which are not present in the table book” would be unsafe because there are infinitely many tuples not in a finite table.

SQL: Relational Database Querying and Manipulation

SQL (Structured Query Language) queries are the primary means of interacting with and manipulating data in relational database management systems (RDBMS). SQL is a non-procedural language, meaning it specifies what data to retrieve or modify rather than how to do it. This design allows the RDBMS to manage the efficient retrieval of data.

The theoretical foundation of SQL is based on both Relational Algebra (a procedural language) and Relational Calculus (a non-procedural language). SQL is considered a fourth-generation language, making it closer to natural language compared to third-generation languages like C++.

Core Components of SQL Queries

At its most basic level, an SQL query consists of three mandatory keywords for data retrieval: SELECT, FROM, and WHERE.

  • SELECT Clause:
  • Corresponds conceptually to the projection operator in Relational Algebra.
  • By default, SELECT retains duplicate values (projection with duplicacy).
  • To obtain distinct (unique) values, the DISTINCT keyword must be explicitly used (e.g., SELECT DISTINCT Title FROM Book).
  • If the default setting is changed to DISTINCT, ALL can be used to explicitly retain duplicates (e.g., SELECT ALL Title FROM Book).
  • Attributes or columns to be displayed are listed here.
  • FROM Clause:
  • Specifies the tables from which data is to be retrieved.
  • Conceptually, listing multiple tables in the FROM clause (e.g., FROM User, Borrow) implies a Cartesian Product between them.
  • The FROM clause is mandatory for data retrieval.
  • Tables can be renamed using the AS keyword (e.g., User AS U1), which is optional for tables but mandatory for renaming attributes.
  • WHERE Clause:
  • Used to specify conditions that rows must satisfy to be included in the result.
  • Corresponds to the selection operator in Relational Algebra (horizontal row selection).
  • The WHERE clause is optional; if omitted, all rows from the specified tables are returned.
  • Conditions can involve comparison operators (=, >, <, >=, <=, !=, <>), logical operators (AND, OR, NOT).

Advanced Query Operations

SQL queries can become complex using various clauses and operators:

  • Set Operations:
  • UNION: Combines the result sets of two or more SELECT statements. By default, UNION eliminates duplicate rows.
  • UNION ALL: Combines results and retains duplicate rows.
  • INTERSECT: Returns only the rows that are common to both result sets. By default, INTERSECT eliminates duplicates.
  • EXCEPT (or MINUS): Returns rows from the first query that are not present in the second. By default, EXCEPT eliminates duplicates.
  • For all set operations, the participating queries must be union compatible, meaning they have the same number of columns and compatible data types in corresponding columns.
  • Aggregate Functions:
  • Used to perform calculations on a set of rows and return a single summary value. Common functions include:
  • COUNT(): Counts the number of rows or non-null values in a column. COUNT(*) counts all rows, including those with nulls.
  • SUM(): Calculates the total sum of a numeric column.
  • AVG(): Calculates the average value of a numeric column.
  • MIN(): Returns the minimum value in a column.
  • MAX(): Returns the maximum value in a column.
  • All aggregate functions ignore null values, except for COUNT(*).
  • GROUP BY Clause:
  • Used to logically break a table into groups based on the values in one or more columns.
  • Aggregate functions are then applied to each group independently.
  • All attributes in the SELECT clause that are not part of an aggregate function must also be included in the GROUP BY clause.
  • Any attribute not in GROUP BY that needs to be displayed in the SELECT clause must appear inside an aggregate function.
  • HAVING Clause:
  • Used to filter groups created by the GROUP BY clause.
  • Similar to WHERE, but HAVING operates on groups after aggregation, while WHERE filters individual rows before aggregation.
  • Aggregate functions can be used directly in the HAVING clause (e.g., HAVING COUNT(*) > 50), which is not allowed in WHERE.
  • Subqueries (Nested Queries):
  • A query embedded within another SQL query.
  • Used with operators like IN, NOT IN, SOME/ANY, ALL, EXISTS, NOT EXISTS.
  • IN: Returns true if a value matches any value in a list or the result of a subquery.
  • SOME/ANY: Returns true if a comparison is true for any value in the subquery result (e.g., price > SOME (subquery) finds prices greater than at least one price in the subquery).
  • ALL: Returns true if a comparison is true for all values in the subquery result (e.g., price > ALL (subquery) finds prices greater than the maximum price in the subquery).
  • EXISTS: Returns true if the subquery returns at least one row (is non-empty). It’s typically used to check for the existence of related rows.
  • NOT EXISTS: Returns true if the subquery returns no rows (is empty).
  • UNIQUE: Returns true if the subquery returns no duplicate rows.
  • ORDER BY Clause:
  • Used to sort the result set of a query.
  • Sorting can be in ASC (ascending, default) or DESC (descending) order.
  • When sorting by multiple attributes, the first attribute listed is the primary sorting key, and subsequent attributes are secondary keys for tie-breaking within primary groups.
  • Sorting is always done tuple-wise, not column-wise, to avoid creating invalid data.
  • JOIN Operations:
  • Used to combine rows from two or more tables based on a related column between them.
  • INNER JOIN: Returns only the rows where there is a match in both tables. Can be specified with ON (any condition) or USING (specific common columns). INNER keyword is optional.
  • THETA JOIN: An inner join with an arbitrary condition (e.g., R1.C > R2.D).
  • EQUI JOIN: A theta join where the condition is solely an equality (=).
  • NATURAL JOIN: An equi join that automatically joins tables on all columns with the same name and data type, and eliminates duplicate common columns in the result.
  • OUTER JOIN: Includes matching rows and non-matching rows from one or both tables, filling non-matches with NULL values.
  • LEFT OUTER JOIN: Includes all rows from the left table and matching rows from the right table.
  • RIGHT OUTER JOIN: Includes all rows from the right table and matching rows from the left table.
  • FULL OUTER JOIN: Includes all rows from both tables, with NULL where there’s no match.

Database Modification Queries

SQL provides commands to modify the data stored in tables:

  • INSERT:
  • Adds new rows (tuples) to a table.
  • Syntax includes INSERT INTO table_name VALUES (value1, value2, …) or INSERT INTO table_name (column1, column2, …) VALUES (value1, value2, …).
  • DELETE:
  • Removes one or more rows from a table.
  • Syntax is DELETE FROM table_name [WHERE condition].
  • If no WHERE clause is specified, all rows are deleted.
  • TRUNCATE TABLE: A DDL command that quickly removes all rows from a table, similar to DELETE without a WHERE clause, but it is faster as it deletes the whole table in one go (rather than tuple by tuple) and resets identity columns. TRUNCATE cannot use a WHERE clause.
  • UPDATE:
  • Modifies existing data within a row (cell by cell).
  • Syntax is UPDATE table_name SET column1 = value1, … [WHERE condition].

Other Important Concepts Related to Queries

  • Views (Virtual Tables):
  • A virtual table based on the result-set of an SQL query.
  • Views are not physically stored in the database (dynamic views); instead, their definition is stored, and the view is evaluated when queried.
  • Views are primarily used for security (data hiding) and simplifying complex queries.
  • Views can be updatable (allowing INSERT, UPDATE, DELETE on the view, which affects the base tables) or read-only (typically for complex views involving joins or aggregates).
  • Materialized Views are physical copies of a view’s data, stored to improve performance for frequent queries.
  • NULL Values:
  • NULL represents unknown, non-existent, or non-applicable values.
  • NULL is not comparable to any value, including itself (e.g., SID = NULL will not work).
  • Comparison with NULL is done using IS NULL or IS NOT NULL.
  • NULL values are ignored by aggregate functions (except COUNT(*)).
  • In ORDER BY, NULL values are treated as the lowest value by default.
  • In GROUP BY, all NULL values are treated as equal and form a single group.
  • Pattern Matching (LIKE):
  • Used for string matching in WHERE clauses.
  • % (percentage sign): Matches any sequence of zero or more characters.
  • _ (underscore): Matches exactly one character.
  • The ESCAPE keyword can be used to search for the literal % or _ characters.
  • DDL Commands (Data Definition Language):
  • While not strictly queries that retrieve data, DDL commands define and manage the database schema.
  • CREATE TABLE: Defines a new table, including column names, data types, and constraints (like PRIMARY KEY, NOT NULL, FOREIGN KEY, DEFAULT).
  • ALTER TABLE: Modifies an existing table’s structure (e.g., adding/dropping columns, changing data types, adding/deleting constraints).
  • DROP TABLE: Deletes an entire table, including its data and schema.
  • DCL Commands (Data Control Language):
  • Manage permissions and access control for database users.
  • GRANT: Assigns specific privileges (e.g., SELECT, INSERT, UPDATE, DELETE) on database objects to users or roles.
  • REVOKE: Removes previously granted privileges.

SQL: Data Modification, Definition, and Control

SQL (Structured Query Language) provides powerful commands for modifying data stored in relational database management systems (RDBMS). These modifications are distinct from data retrieval queries (like SELECT) and fall under various categories within SQL, primarily Data Manipulation Language (DML) for data content changes and Data Definition Language (DDL) for schema structure changes.

Data Manipulation Commands (DML)

The core DML commands for modifying database content operate on a tuple-by-tuple or cell-by-cell basis.

  1. Deletion (DELETE)
  • Purpose: DELETE is used to remove one or more rows (tuples) from a table.
  • Syntax: The basic syntax is DELETE FROM table_name [WHERE condition].
  • Conditional Deletion: If a WHERE clause is specified, only rows satisfying the condition are deleted. If omitted, all rows are deleted from the table.
  • Relational Algebra Equivalent: In relational algebra, deletion is represented using the set difference operator (R – E), where R is the original relation and E is a relational algebra expression whose output specifies the tuples to be removed. The resulting new relation is then assigned back to the original relation. This requires E to be union compatible with R (same degree and domain for corresponding attributes).
  • Example: To delete all entries from the borrow relation corresponding to card number 101, one would subtract a relation containing all tuples where card_number = 101 from the borrow relation.
  1. Insertion (INSERT)
  • Purpose: INSERT is used to add new rows (tuples) to a table.
  • Syntax:
  • INSERT INTO table_name VALUES (value1, value2, …): Values must be in the order of the table’s columns.
  • INSERT INTO table_name (column1, column2, …) VALUES (value1, value2, …): Allows specifying columns, useful if not inserting values for all fields or if the order is not strictly followed.
  • Null Values: If not all fields are inserted, the remaining fields will by default be set to NULL.
  • Relational Algebra Equivalent: In relational algebra, insertion is performed using the union operator (R UNION E), where R is the original relation and E represents the tuples to be inserted. The new relation is then assigned to the old one. Union compatibility is also required here.
  • Example: To insert an entry into the book table with book_ID B101, year_of_publication 2025, and title A, you would use INSERT INTO book VALUES (‘B101’, ‘A’, 2025) or INSERT INTO book (book_ID, title, year_of_publication) VALUES (‘B101’, ‘A’, 2025).
  1. Update (UPDATE)
  • Purpose: UPDATE is used to modify existing data within rows. Unlike INSERT and DELETE which work tuple-by-tuple, UPDATE works cell-by-cell.
  • Syntax: UPDATE table_name SET column1 = value1, column2 = value2, … [WHERE condition].
  • Conditional Updates: The WHERE clause specifies which rows to update.
  • Calculations: The SET clause can include calculations (e.g., applying a discount).
  • Relational Algebra Equivalent: Conceptually, updating a single cell in relational algebra involves deleting the old tuple and inserting a new tuple with the modified value, while retaining other values.
  • Example: To give a 5% discount on all books supplied by ABC having a price greater than 1,000, you would UPDATE supplier SET price = 0.95 * price WHERE s_name = ‘ABC’ AND price > 1000.

Schema Modification Commands (DDL)

DDL commands are used to define and modify the database schema (structure).

  1. TRUNCATE TABLE
  • Purpose: TRUNCATE TABLE is a DDL command that removes all rows from a table.
  • Key Differences from DELETE:
  • Speed: TRUNCATE is faster than DELETE because it deletes the whole table in one go, rather than row by row.
  • WHERE Clause: TRUNCATE cannot use a WHERE clause; it always removes all rows.
  • Logging/Transactions: TRUNCATE typically involves less logging and cannot be rolled back easily in some systems, while DELETE (being DML) is part of transactions and can be rolled back.
  • Identity Columns: TRUNCATE often resets identity columns (auto-incrementing IDs).
  • DDL vs. DML: TRUNCATE is DDL, DELETE is DML.
  • Schema Preservation: Both DELETE (without WHERE) and TRUNCATE preserve the table’s schema (structure).
  1. DROP TABLE
  • Purpose: DROP TABLE deletes an entire table, including its data and schema (structure). This is a more permanent and impactful operation compared to DELETE or TRUNCATE.
  1. ALTER TABLE
  • Purpose: ALTER TABLE is used to modify the structure of an existing table.
  • Common Operations:
  • Adding/Dropping Columns: You can add new columns with ADD COLUMN column_name data_type or remove existing ones with DROP COLUMN column_name.
  • Modifying Columns: Change the data type or properties of an existing column with MODIFY COLUMN column_name new_data_type.
  • Adding/Dropping Constraints: Constraints (like PRIMARY KEY, FOREIGN KEY, NOT NULL) can be added or removed. Naming constraints with the CONSTRAINT keyword allows for easier modification or deletion later.
  • Infrequent Use: Schema changes are rarely done frequently because they can affect numerous existing tuples and related application programs.
  • RESTRICT vs. CASCADE with DROP COLUMN:
  • RESTRICT: If a column being dropped is referenced by another table (e.g., as a foreign key), RESTRICT will prevent the deletion.
  • CASCADE: If a column being dropped is referenced, CASCADE will force the deletion and also delete the referencing constraints or even the dependent tables/relations.

Data Control Language (DCL)

DCL commands manage permissions and access control for database users.

  1. GRANT
  • Purpose: GRANT is used to assign specific privileges on database objects (like tables, views) to users or roles.
  • Common Privileges:
  • SELECT: Allows users to retrieve data.
  • INSERT: Allows users to add new data.
  • UPDATE: Allows users to modify existing data.
  • DELETE: Allows users to remove data.
  • REFERENCES: Allows users to create foreign key relationships referencing the object.
  • ALL PRIVILEGES: Grants all available permissions.
  • Syntax: GRANT privilege_name ON object_name TO username.
  • Example: GRANT INSERT, UPDATE ON student TO Gora gives Gora permission to insert and update data in the student table.
  1. REVOKE
  • Purpose: REVOKE is used to remove previously granted privileges from users or roles.
  • Syntax: REVOKE privilege_name ON object_name FROM username.
  • Example: REVOKE DELETE ON student FROM Gora removes the delete privilege from Gora on the student table.

GRANT and REVOKE are crucial for database security and controlling who can perform specific actions with the data. Views, which are virtual tables, are often used in conjunction with DCL for security, as permissions can be granted on a view rather than directly on the underlying base tables, allowing for data hiding and simplified interaction.

Relational DBMS Course – Database Concepts, Design & Querying Tutorial

By Amjad Izhar
Contact: amjad.izhar@gmail.com
https://amjadizhar.blog


Discover more from Amjad Izhar Blog

Subscribe to get the latest posts sent to your email.

Comments

Leave a comment