SQL Server Architecture Explained: Named Pipes, Optimizer, Buffer Manager
MS SQL Server is a client-server architecture. MS SQL Server process starts with the client application sending a request. The SQL Server accepts, processes and replies to the request with processed data. Let’s discuss in detail the entire architecture shown below:
As the below Diagram depicts there are three major components in SQL Server Architecture:
1. Protocol Layer
2 .Relational Engine
Protocol Layer – SNI
· Shared Memory
· TCP/IP
· Named Pipes
· What is TDS?
Shared Memory
MS SQL SERVER – Here MS SQL server provides SHARED MEMORY PROTOCOL. Here CLIENT and MS SQL server run on the same machine. Both can communicate via Shared Memory protocol
For connection to local - DB in SQL Management Studio, "Server Name" Option could be below
TCP/IP
MS SQL SERVER provides the capability to interact via TCP/IP protocol, where CLIENT and MS SQL Server are remote to each other and installed on a separate machine.
Named Pipes
MS SQL Server Provides the capability to interact via the named pipe protocol, here the CLIENT and MS SQL Server are in connection via LAN
What is TDS?
Now that we know that there are three types of Client-Server Architecture, lets us have a glance at TDS:
TDS stands for Tabular Data Stream.
All 3 protocol Relational Engine
The Relational Engine is also known as the Query Processor. It has the SQL Server components that determine what exactly a query needs to do and how it can be done best. It is responsible for the execution of user queries by requesting data from the storage engine and processing the results that are returned.
As depicted in the Architectural Diagram there are 3 major components of the Relational Engine. Let’s study the components in detail:
CMD Parser
Data once received from Protocol Layer is then passed to Relational Engine. “CMD Parser” is the first component of Relational Engine to receive the Query data. The principal job of CMD Parser is to check the query for Syntactic and Semantic error. Finally, it generates a Query Tree.
Syntactic check:
Like every other Programming language, MS SQL also has the predefined set of Keywords. Also, SQL Server has its own grammar which SQL server understands.
SELECT, INSERT, UPDATE, and many others belong to MS SQL predefined Keyword lists.
CMD Parser does syntactic check. If users’ input does not follow these language syntax or grammar rules, it returns an error.
Semantic check:
This is performed by Normalizer.
In its simplest form, it checks whether Column name, Table name being queried exist in Schema. And if it exists, bind it to Query. This is also known as Binding.
Complexity increases when user queries contain VIEW. Normalizer performs the replacement with the internally stored view definition and much more.
Create Query Tree:
This step generates different execution tree in which query can be run.
Note that, all the different trees have the same desired output.
Optimizer
The work of the optimizer is to create an execution plan for the user’s query. This is the plan that will determine how the user query will be executed.
Similarly, MS SQL Optimizer works on inbuilt exhaustive/heuristic algorithms. The goal is to minimize query run time. All the Optimizer algorithms are propriety of Microsoft and a secret. Although, below are the high-level steps performed by MS SQL Optimizer. Searches of Optimization follows three phases as shown in the below diagram:
Phase 0: Search for Trivial Plan:
This is also known as Pre-optimization stage.
For some cases, there could be only one practical, workable plan, known as a trivial plan. There is no need for creating an optimized plan. The reason is, searching more would result in finding the same run time execution plan. That too with the extra cost of Searching for optimized Plan which was not required at all.If no Trivial plan found, then 1st Phase starts
Phase 1: Search for Transaction processing plans
This includes the search for Simple and Complex Plan.
Simple Plan Search: Past Data of column and Index involved in Query, will be used for Statistical Analysis. This usually consists but not restricted to one Index Per table. Still, if the simple plan is not found, then more complex Plan is searched. It involves Multiple Index per table..
Phase 2: Parallel Processing and Optimization.
If none of the above strategies work, Optimizer searches for Parallel Processing possibilities. This depends on the Machine’s processing capabilities and configuration If that is still not possible, then the final optimization phase starts. Now, the final optimization aim is finding all other possible options for executing the query in the best way. Final optimization phase Algorithms are Microsoft Propriety.
Storage Engine
The work of the storage Engine is to store data in storage system like disk or san and retrieve the data when needed before we deep dive into storage engine , lets have a look at how data is stored database and type of tiles available
Data File and Extent
Data File, physically stores data in the form of data pages, with each data page having a size of 8KB, forming the smallest storage unit in SQL Server. These data pages are logically grouped to form extents. No object is assigned a page in SQL Server.
The maintenance of the object is done via extents. The page has a section called the Page Header with a size of 96 bytes, carrying the metadata information about the page like the Page Type, Page Number, Size of Used Space, Size of Free Space, and Pointer to the next page and previous page, etc.
Primary file
Every database contains one Primary file.
This store all important data related to tables, views, Triggers, etc.
Extension is .mdf usually but can be of any extension.
Secondary file
Database may or may not contains multiple Secondary files.
This is optional and contain user-specific data.
Extension is .ndf usually but can be of any extension.
Log file
Also known as Write ahead logs.
Extension is .ldf
Used for Transaction Management.
This is used to recover from any unwanted instances. Perform important task of Rollback to uncommitted transactions.
Access Method
It acts as an interface between query executor and Buffer Manager/Transaction Logs.
Access Method itself does not do any execution.
The first action is to determine whether the query is:
Select Statement (DDL)
Non- Select Statement (DDL & DML)
Depending upon the result, the Access Method takes the following steps:
If the query is DDL, SELECT statement, the query is pass to the Buffer Manager for further processing.
And if query if DDL, NON-SELECT statement, the query is pass to Transaction Manager. This mostly includes the UPDATE statement
Buffer Manager
Buffer manager manages core functions for modules below:
Plan Cache
Data Parsing: Buffer cache & Data storage
Dirty Page
We will learn Plan, Buffer and Data cache in this section. We will cover Dirty pages in the Transaction
Plan Cache
Existing Query plan: The buffer manager checks if the execution plan is there in the stored Plan Cache. If Yes, then query plan cache and its associated data cache is used.
First time Cache plan: Where does existing Plan cache come from?
If the first-time query execution plan is being run and is complex, it makes sense to store it in in the Plane cache. This will ensure faster availability when the next time SQL server gets the same query. So, it’s nothing else but the query itself which Plan execution is being stored if it is being run for the first time.