Database query processing


















This iteration produces no output, since no tuples join together in the constructed rectangles. The new rectangles include a2, b4, and all tuples within w of both of them; hence the output tuples of this iteration must be in C.

Since tuple c3 satisfies the join predicate and also lies inside the window, we include the new tuple; duplicate tuples therefore cannot be produced. Our W-join algorithm is presented as a multi-way W-join. One example of such an operator is the window-join, and exploiting shared execution for these queries will significantly improve system scalability. (This work was supported in part by the National Science Foundation.)

However, sharing of the window-join is not straightforward, especially if the queries are interested in different windows over the data streams. In [15] we investigated different approaches to scheduling a shared window-join. We introduced two new scheduling approaches, the shortest-window-first and the maximum-query-throughput, and compared their performance with the largest-window-only scheduling technique. The performance gain of the algorithms with respect to reduced response time is more prominent when the streams possess bursty arrival rates.

Although the algorithms for shared window-joins target general data streams, these problems are extremely important in the video domain. Since video streams are usually encoded in a compressed format, … frames from a video stream. Furthermore, when the outcome of the video query is expected to be streamed as a new video, … Finally, for processing on-line feeds of video streams, …

Video-based applications require strong video database support. A video query processor should support video-based operations for search by content. The VDBMS query capability was designed to support a full range of functionality for video processing, based on the development and integration of video as an abstract database data type. We described two query operators for VDBMS which implement the rank-join and stop-after algorithms. We then expressed video query processing as continuous queries over video data streams. The stream data type SDT was developed from this viewpoint, and video is processed through the new VDBMS query execution engine. We described the implementation of several algorithms for queries over video streams, and we also described the window-join algorithm.

Author email addresses are as follows: Walid Aref (aref@cs.…), Ann Christine Catlin (acc@cs.…), Ahmed Elmagarmid (ake@cs.…), Moustafa Hammad (mhammad@cs.…), Ihab Ilyas, Mirette Marzouk (marzouk@cs.…), and Thanaa Ghanem (ghanemtm@cs.…).

References (recovered only in part; entries remain fragmentary):
[1] Aref, W., et al. A video database management system for advancing video database research. Tempe, Arizona.
[2] Aref, W., Rezgui, A., et al. A distributed server for continuous media. San Jose, California.
[…] Independent quantization: an index compression technique for high-dimensional data spaces. Data Engineering, San Diego, CA.
[…] Quality of service in multimedia digital libraries.
[6] Bertino, E., et al. An access control model for video database systems. Information and Knowledge Management.
[…] An extended authorization model. IEEE Trans.
[…] Bonnet, P., Gehrke, J., and Seshadri, P. Towards sensor database systems. Tucson, Arizona.
[…] Optimal aggregation algorithms for middleware.
[20] Ilyas, I., et al. An extensible index for spatial databases. Statistical and Scientific Databases.
[…] Multiview: multi-level video structure for content representation and retrieval.
[…] High-dimensional nearest neighbor queries.
[…] Madden, S., Franklin, M., Hellerstein, J., et al. The design of an acquisitional query processor.
[…] Optimizing multi-feature queries for image databases. Cairo, Egypt.
[…] Query processing issues in image multimedia databases. Sydney, Australia.
[14] Hammad, M., et al. … Multimedia and Expo. Lausanne, Switzerland.
[24] Natsev, A., et al. Supporting incremental join queries on ranked inputs. Rome, Italy.
[15] Hammad, M., et al. Scheduling for shared window joins over data streams. Very Large Data Bases.
[…] Predator: a resource for database …
[…] … Network databases. San Diego.
[…] Generalized search trees for database systems. Zurich, Switzerland.

Note that the start states V0 and W0 are equal. Consider M running on Derr. At some point, however, there may be some cursor c in block B_{i0,j0}. Therefore, M will go through some successive states V_i, and likewise through some successive states W_i. Hence, in the run of M on Derr, each time a cursor c has just left block B_{i0,j0}, the machine is in state V_{t_c}.

Let d be the last cursor that leaves block B_{i0,j0}. When d has just left this block, M is in state V_{t_d}. After the last cursor has left block B_{i0,j0}, the run of M on Derr finishes exactly as the run of M on D_I after the last cursor has left block B_{i0,j0} (and on D_J, for that matter). Finally, this completes the proof of Theorem 5. In the statement of Theorem 5, … This is interesting in particular because we can use a substantial number of cursors, polynomially related to the input size, to store data elements and still obtain the lower bound result.

If the background provides functions for setting and checking the i-th bit of a bitstring, the query RST is easily computed by an O(n)-FCM. By a variation of the proof of Theorem 5, … Note that Theorems 5.… A natural question arising from Corollary 4.… The answer is affirmative: Proposition 6.… Using an auxiliary cursor over sort_{(2,1),(1,1)}(R), we check this for the first subset in the list.

Then, using two cursors over sort_{(1,2),(1,1)}(R), we check whether the second subset equals the first, the third equals the second, and so on. Note that, using an Ehrenfeucht-game argument, one can indeed prove that the query from Proposition 6.…

Is there a boolean relational algebra query that cannot be computed by any composition of O(1)-FCMs (or even o(n)-FCMs) and sorting operations? Under a plausible assumption from parameterized complexity theory [10, 8], we can answer the O(1)-version of this problem affirmatively for FCMs with a decidable background structure.

There are, however, many queries that are not definable in relational algebra, but computable by FCMs with sorting. By their sequential nature, FCMs can easily compare cardinalities of relations, check whether a directed graph is regular, or do modular counting, and all these tasks are not definable in relational algebra.
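The three tasks just listed can be illustrated with ordinary single-pass code. This is an informal Python sketch, not the paper's formal FCM model: each function reads its input sequentially and keeps only constant extra state (a cursor position or a small counter), in the spirit of a finite cursor machine.

```python
def same_cardinality(r, s):
    """Compare |R| and |S| by advancing one cursor over each relation."""
    it_r, it_s = iter(r), iter(s)
    while True:
        a = next(it_r, None)
        b = next(it_s, None)
        if a is None or b is None:
            return a is None and b is None  # equal iff both exhaust together

def count_mod(r, k):
    """Modular counting with a constant-size counter."""
    c = 0
    for _ in r:
        c = (c + 1) % k
    return c

def is_regular(sorted_edges):
    """Given a directed graph as an edge list sorted by source node, check
    that every node has the same out-degree, using O(1) state."""
    first_deg = None  # out-degree of the first completed node
    cur, run = None, 0
    for u, _ in sorted_edges:
        if u == cur:
            run += 1
        else:
            if cur is not None:
                if first_deg is None:
                    first_deg = run
                elif run != first_deg:
                    return False
            cur, run = u, 1
    if cur is None:
        return True  # empty graph is trivially regular
    return first_deg is None or run == first_deg
```

None of these properties is expressible in relational algebra, yet each needs only one sequential pass over (sorted) input.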

One might be tempted to conjecture, however, that FCMs with sorting cannot go beyond relational algebra with counting and aggregation, but this is false (Proposition 6.…). Since deterministic reachability is a non-local query, it is not expressible in first-order logic with counting and aggregation (see [17]).

References

[1] G. Aggarwal, M. Datar, S. Rajagopalan, and M. Ruhl. On the streaming model augmented with a sorting primitive.
N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. Journal of Computer and System Sciences.
M. Altinel and M. Franklin. Efficient filtering of XML documents for selective dissemination of information.
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems.
Z. Bar-Yossef, M. Fontoura, and V. Josifovski. Buffering in query evaluation over XML streams.
C. Chan, P. Felber, M. Garofalakis, and R. Rastogi. …
R. Downey and M. Fellows. Parameterized Complexity. Springer.
R. Fagin. Degrees of acyclicity for hypergraphs and relational database schemes. Journal of the ACM.
J. Flum and M. Grohe. Parameterized Complexity Theory. Springer.
H. Garcia-Molina, J. Ullman, and J. Widom. Database System Implementation. Prentice Hall.
T. Green, G. Miklau, M. Onizuka, and D. Suciu. Processing XML streams with deterministic automata.
M. Grohe, C. Koch, and N. Schweikardt. Tight lower bounds for query processing on streaming and external memory data.
M. Grohe and N. Schweikardt. Lower bounds for sorting with few random accesses to external memory.
A. Gupta and D. Suciu. Stream processing of XPath queries with predicates.
Y. Gurevich. Evolving algebras: Lipari guide. Oxford University Press.
L. Hella, L. Libkin, J. Nurmonen, and L. Wong. Logics with aggregate operators. Journal of the ACM, 48(4).
M. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. In External Memory Algorithms.
… One-way multihead deterministic finite automata. Acta Informatica.
Y.-N. Law, H. Wang, and C. Zaniolo. Query languages and data models for database sequences and data streams.
D. Leinders and J. Van den Bussche. On the complexity of division and set joins in the relational algebra.
D. Leinders, M. Marx, J. Tyszkiewicz, and J. Van den Bussche. The semijoin algebra and the guarded fragment. Journal of Logic, Language and Information, 14(3).
D. Leinders and J. Van den Bussche. On the expressive power of semijoin queries. Information Processing Letters, 91(2).
L. Libkin. Elements of Finite Model Theory.
S. Muthukrishnan. Data Streams: Algorithms and Applications.

As density decreases, selectivity of a value increases. The SQL Server Query Optimizer is important because it enables the database server to adjust dynamically to changing conditions in the database without requiring input from a programmer or database administrator. This enables programmers to focus on describing the final result of the query.
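The claim above that selectivity rises as density falls can be made concrete. In statistics terms, density is 1 divided by the number of distinct values in a column, so a lower density means each value matches a smaller fraction of rows (a more selective predicate). A small sketch with hypothetical column data:

```python
def density(column):
    """Statistics-style density: 1 / (number of distinct values)."""
    return 1.0 / len(set(column))

def matched_fraction(column, value):
    """Fraction of rows a predicate `col = value` returns; a smaller
    fraction means the predicate is more selective."""
    return column.count(value) / len(column)

low_card  = ["A", "A", "B", "B"]   # 2 distinct values -> density 0.5
high_card = ["A", "B", "C", "D"]   # 4 distinct values -> density 0.25
```

With the lower-density column, looking up "A" returns a quarter of the rows instead of half, i.e. the predicate is more selective.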

They can trust that the SQL Server Query Optimizer will build an efficient execution plan for the state of the database every time the statement is run. SQL Server evaluates some constant expressions early to improve query performance; this is referred to as constant folding. An exception is made for large object types: if the output type of the folding process is a large object type (text, ntext, image, nvarchar(max), varchar(max), varbinary(max), or XML), then SQL Server does not fold the expression.

All other expression types are not foldable. In particular, the following types of expressions are not foldable: … The benefits of constant folding include the following: … On the other hand, if dbo.… For more information on parameterization, see Forced Parameterization later in this article. In addition, some expressions that are not constant folded, but whose arguments are known at compile time (whether the arguments are parameters or constants), are evaluated by the result-set size (cardinality) estimator that is part of the optimizer during optimization.
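A minimal sketch of constant folding (the table and column names here are hypothetical, not from the original article):

```sql
-- The arithmetic below involves only constants, so it is folded once at
-- compile time rather than computed per row:
SELECT OrderID
FROM dbo.Orders             -- hypothetical table
WHERE Quantity > 10 * 12;   -- folded to the single constant 120
```

The plan that is compiled and cached contains the comparison against 120; the multiplication never runs at execution time.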

The following operators are also evaluated at compile time if all their inputs are known: … No other functions or operators are evaluated by the Query Optimizer during cardinality estimation. However, at optimization time, the value of the parameter is known. This allows the Query Optimizer to accurately estimate the size of the result set, which helps it select a good query plan. The process of identifying these rows is the same process used to identify the source rows that contribute to the result set of a SELECT statement.

The relational engine may need to build a worktable to perform a logical operation specified in a Transact-SQL statement. Worktables are internal tables that are used to hold intermediate results. For example, if an ORDER BY clause references columns that are not covered by any indexes, the relational engine may need to generate a worktable to sort the result set into the order requested.

Worktables are also sometimes used as spools that temporarily hold the result of executing a part of a query plan. Worktables are built in tempdb and are dropped automatically when they are no longer needed. The logic used by the SQL Server Query Optimizer to decide when to use an indexed view is similar to the logic used to decide when to use an index on a table. If the data in the indexed view covers all or part of the Transact-SQL statement, and the Query Optimizer determines that an index on the view is the lowest-cost access path, the Query Optimizer will choose the index regardless of whether the view is referenced by name in the query.

When a Transact-SQL statement references a nonindexed view, the parser and Query Optimizer analyze the source of both the Transact-SQL statement and the view, and then resolve them into a single execution plan. There is not one plan for the Transact-SQL statement and a separate plan for the view. Based on such a view, both of these Transact-SQL statements perform the same operations on the base tables and produce the same results:
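The original example did not survive here; the following is a sketch of the idea, assuming AdventureWorks-style tables (the view name and columns are assumptions):

```sql
-- A simple nonindexed view over two base tables:
CREATE VIEW dbo.EmployeeName AS
SELECT h.BusinessEntityID, p.LastName, p.FirstName
FROM HumanResources.Employee AS h
JOIN Person.Person AS p
    ON h.BusinessEntityID = p.BusinessEntityID;
GO
-- Query through the view...
SELECT LastName
FROM dbo.EmployeeName
WHERE LastName LIKE 'Smi%';
-- ...and the equivalent query written directly against the base tables.
-- Both resolve to the same single execution plan:
SELECT p.LastName
FROM HumanResources.Employee AS h
JOIN Person.Person AS p
    ON h.BusinessEntityID = p.BusinessEntityID
WHERE p.LastName LIKE 'Smi%';
```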

Hints that are placed on views in a query may conflict with other hints that are discovered when the view is expanded to access its base tables. When this occurs, the query returns an error. For example, consider the following view that contains a table hint in its definition:.
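A sketch of a view of this kind, consistent with the names used in the next paragraph (AddrState, Person.Address, Person.StateProvince); the column list and the specific NOLOCK/SERIALIZABLE pair are assumptions, not the original example:

```sql
-- A view whose definition contains a table hint:
CREATE VIEW Person.AddrState WITH SCHEMABINDING AS
SELECT a.AddressID, a.AddressLine1, s.StateProvinceCode, s.CountryRegionCode
FROM Person.Address AS a WITH (NOLOCK)   -- hint inside the view definition
JOIN Person.StateProvince AS s
    ON a.StateProvinceID = s.StateProvinceID;
GO
-- A conflicting hint placed on the view in a query; when the view is
-- expanded, the two hints clash and the query returns an error:
SELECT AddressID
FROM Person.AddrState WITH (SERIALIZABLE)
WHERE StateProvinceCode = 'WA';
```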

The hint placed on AddrState in the query is propagated to both tables, Person.Address and Person.StateProvince, in the view when it is expanded.

Hints can propagate through levels of nested views: if a hinted query references view v1, and when v1 is expanded we find that view v2 is part of its definition, the hint reaches v2's underlying tables as well. For example, the following query selects from three tables and a view: … As with any index, SQL Server chooses to use an indexed view in its query plan only if the Query Optimizer determines it is beneficial to do so.
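A sketch of nested-view hint propagation (all table, view, and column names below are hypothetical):

```sql
-- v2 joins two base tables:
CREATE VIEW dbo.v2 AS
SELECT t3.id, t4.val
FROM dbo.t3 JOIN dbo.t4 ON t3.id = t4.id;
GO
-- v1 joins a base table to v2:
CREATE VIEW dbo.v1 AS
SELECT t2.id, v2.val
FROM dbo.t2 JOIN dbo.v2 ON t2.id = v2.id;
GO
-- The NOLOCK hint on v1 propagates through v1 to t2, and through the
-- nested view v2 to t3 and t4:
SELECT t1.id, v1.val
FROM dbo.t1 JOIN dbo.v1 WITH (NOLOCK) ON t1.id = v1.id;
```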

Indexed views can be created in any edition of SQL Server. For clarification, see the documentation for each version. Other than the requirements for the SET options and table hints, these are the same rules that the Query Optimizer uses to determine whether a table index covers a query. Nothing else has to be specified in the query for an indexed view to be used.

A query does not have to explicitly reference an indexed view in the FROM clause for the Query Optimizer to use the indexed view. If the query contains references to columns in the base tables that are also present in the indexed view, and the Query Optimizer estimates that using the indexed view provides the lowest cost access mechanism, the Query Optimizer chooses the indexed view, similar to the way it chooses base table indexes when they are not directly referenced in a query.
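A sketch of this matching behavior (the schema is hypothetical; real indexed views have additional requirements on SET options and on the view definition):

```sql
-- An indexed view pre-aggregating a detail table:
CREATE VIEW dbo.SalesByProduct WITH SCHEMABINDING AS
SELECT ProductID,
       COUNT_BIG(*) AS OrderCount,      -- required in aggregated indexed views
       SUM(LineTotal) AS Total
FROM dbo.SalesDetail
GROUP BY ProductID;
GO
CREATE UNIQUE CLUSTERED INDEX IX_SalesByProduct
    ON dbo.SalesByProduct (ProductID);
GO
-- This query names only the base table, yet the optimizer may answer it
-- from the indexed view if that is the lowest-cost access path:
SELECT ProductID, SUM(LineTotal) AS Total
FROM dbo.SalesDetail
GROUP BY ProductID;
```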

The Query Optimizer may choose the view when it contains columns that are not referenced by the query, as long as the view offers the lowest cost option for covering one or more of the columns specified in the query. The Query Optimizer expands the definition of the view into the query at the start of the optimization process.

Then, indexed view matching is performed. The indexed view may be used in the final execution plan selected by the Query Optimizer, or instead, the plan may materialize necessary data from the view by accessing the base tables referenced by the view.

The Query Optimizer chooses the lowest-cost alternative. However, you should let the Query Optimizer dynamically determine the best access methods to use for each query.

If the query that makes up the view contains any table hints, these hints are propagated to the underlying tables. This process is explained in more detail in View Resolution. As long as the set of hints that exists on the underlying tables of the view are identical to each other, the query is eligible to be matched with an indexed view.

Most of the time, these hints will match each other, because they are being inherited directly from the view. However, if the query references tables instead of views, and the hints applied directly on these tables are not identical, then such a query is not eligible for matching with an indexed view.

Generally, when the Query Optimizer matches an indexed view to a query, any hints specified on the tables or views in the query are applied directly to the indexed view. If the Query Optimizer chooses not to use an indexed view, any hints are propagated directly to the tables referenced in the view. For more information, see View Resolution. This propagation does not apply to join hints. They are applied only in their original position in the query. Join hints are not considered by the Query Optimizer when matching queries to indexed views.

If a query plan uses an indexed view that matches part of a query that contains a join hint, the join hint is not used in the plan. Hints are not allowed in the definitions of indexed views. In compatibility mode 80 and higher, SQL Server ignores hints inside indexed view definitions when maintaining them, or when executing queries that use indexed views. Although using hints in indexed view definitions will not produce a syntax error in 80 compatibility mode, they are ignored.

The SQL Server query processor optimizes the performance of distributed partitioned views. The most important aspect of distributed partitioned view performance is minimizing the amount of data transferred between member servers. SQL Server builds intelligent, dynamic plans that make efficient use of distributed queries to access data from remote member tables:.

For example, consider a system where a customers table is partitioned across Server1 (CustomerID from 1 through …), Server2 (CustomerID from … through …), and Server3 (CustomerID from … through …). The execution plan for this query extracts the rows with CustomerID key values from … through … from the local member table, and issues a distributed query to retrieve the rows with key values from … through … from Server2. For example, consider this stored procedure: … Because the key value cannot be predicted, the query processor also cannot predict which member table will have to be accessed.
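The stored procedure itself was lost from this text; a sketch consistent with the GetCustomer name used below (the parameter name and view name are assumptions):

```sql
-- Looks up one customer through a distributed partitioned view; the
-- member table that must be accessed depends on the runtime parameter:
CREATE PROCEDURE dbo.GetCustomer
    @CustomerIDParameter INT
AS
SELECT *
FROM CompanyData.dbo.Customers    -- assumed name of the partitioned view
WHERE CustomerID = @CustomerIDParameter;
```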

To handle this case, SQL Server builds an execution plan that has conditional logic, referred to as dynamic filters, to control which member table is accessed, based on the input parameter value.

Assuming the GetCustomer stored procedure was executed on Server1, the execution plan logic can be represented as shown in the following:. SQL Server sometimes builds these types of dynamic execution plans even for queries that are not parameterized. The Query Optimizer may parameterize a query so that the execution plan can be reused. If the Query Optimizer parameterizes a query referencing a partitioned view, the Query Optimizer can no longer assume the required rows will come from a specified base table.
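A pseudocode sketch of that conditional logic; the actual partition boundary values are not present in the text, so the ranges are symbolic:

```
IF @CustomerIDParameter is in the range hosted on Server1
    retrieve the row from the local member table
ELSE IF @CustomerIDParameter is in the range hosted on Server2
    retrieve the row via a distributed query to Server2's member table
ELSE IF @CustomerIDParameter is in the range hosted on Server3
    retrieve the row via a distributed query to Server3's member table
```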

It will then have to use dynamic filters in the execution plan. SQL Server stores only the source for stored procedures and triggers. When a stored procedure or trigger is first executed, the source is compiled into an execution plan. If the stored procedure or trigger is again executed before the execution plan is aged from memory, the relational engine detects the existing plan and reuses it. If the plan has aged out of memory, a new plan is built.

Because stored procedures and triggers are identified by name rather than by arbitrary batch text, the relational engine easily matches them with any existing execution plans, so stored procedure and trigger plans are easily reused. The execution plan for stored procedures and triggers is executed separately from the execution plan for the batch calling the stored procedure or firing the trigger. This allows for greater reuse of the stored procedure and trigger execution plans. SQL Server has a pool of memory that is used to store both execution plans and data buffers.

The percentage of the pool allocated to either execution plans or data buffers fluctuates dynamically, depending on the state of the system. The part of the memory pool that is used to store execution plans is referred to as the plan cache.

Compiled plan (query plan): The query plan produced by the compilation process is mostly a re-entrant, read-only data structure that can be used by any number of users. It stores information about the plan's physical operators and the order of these operators, which determines the order in which data is accessed, filtered, and aggregated.

In newer versions of the Database Engine, information about the statistics objects that were used for Cardinality Estimation is also stored. What support objects must be created, such as worktables or workfiles in tempdb.

No user context or runtime information is stored in the query plan. There are never more than one or two copies of the query plan in memory: one copy for all serial executions and another for all parallel executions. The parallel copy covers all parallel executions, regardless of their degree of parallelism.

Execution context: Each user that is currently executing the query has a data structure that holds the data specific to their execution, such as parameter values.

This data structure is referred to as the execution context. The execution context data structures are reused, but their content is not. If another user executes the same query, the data structures are reinitialized with the context for the new user. The Transact-SQL statement qualifies as existing if it literally matches a previously executed Transact-SQL statement with a cached plan, character per character.

If no execution plan exists, SQL Server generates a new execution plan for the query. The execution plans for some Transact-SQL statements are not persisted in the plan cache, such as bulk operation statements running on rowstore or statements containing string literals larger than 8 KB in size.

These plans only exist while the query is being executed. In most systems, the minimal resources that are used by this scan are less than the resources that are saved by being able to reuse existing plans instead of compiling every Transact-SQL statement. The algorithms to match new Transact-SQL statements to existing, unused execution plans in the plan cache require that all object references be fully qualified. While in this example it is not required that the Person table be fully qualified to execute, it means that the second statement is not matched with an existing plan, but the third is matched:
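The example batch referred to above is missing here; a sketch of the idea, assuming an AdventureWorks-style database where Person.Person resolves for the unqualified name as well:

```sql
USE AdventureWorks;
GO
SELECT * FROM Person;          -- unqualified: cached as its own entry
GO
SELECT * FROM Person.Person;   -- fully qualified: compiles a new plan
GO
SELECT * FROM Person.Person;   -- identical text: matches the plan above
GO
```

Because matching is textual, the second statement cannot reuse the first statement's plan, but the third reuses the second's.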

Changing any of the following SET options for a given execution will affect the ability to reuse plans, because the Database Engine performs constant folding and these options affect the results of such expressions: …

Queries and execution plans are uniquely identifiable in the Database Engine, much like a fingerprint:. A compiled plan can be retrieved from the plan cache using a Plan Handle , which is a transient identifier that remains constant only while the plan remains in the cache.

The plan handle is a hash value derived from the compiled plan of the entire batch. The plan handle for a compiled plan remains the same even if one or more statements in the batch get recompiled. If a plan was compiled for a batch instead of a single statement, the plan for individual statements in the batch can be retrieved using the plan handle and statement offsets.

These identifiers are exposed through the sys.dm_exec_* dynamic management views; for more information, see their documentation. The Transact-SQL text for a compiled plan can be retrieved from the sql manager cache using a SQL Handle, which is a transient identifier that remains constant only while at least one plan that references it remains in the plan cache.

The sql handle is a hash value derived from the entire batch text and is guaranteed to be unique for every batch. Like a compiled plan, the Transact-SQL text is stored per batch, including the comments.

Specifically, the sql handle contains the MD5 hash of the entire batch text. There is a 1:N relation between a sql handle and plan handles.
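The idea behind the sql handle can be sketched in a few lines: hashing the entire batch text yields a stable identifier, and any textual difference, even case or whitespace, produces a different handle. This is only an illustration; SQL Server's actual handle layout is internal, and the text encoding chosen below is an assumption:

```python
import hashlib

def sql_handle(batch_text: str) -> str:
    """Illustrative 'sql handle': the MD5 hash of the entire batch text."""
    return hashlib.md5(batch_text.encode("utf-8")).hexdigest()

same_a = sql_handle("SELECT * FROM Person.Person;")
same_b = sql_handle("SELECT * FROM Person.Person;")
diff   = sql_handle("select * from Person.Person;")  # case differs
```

Identical batches map to one handle, while the case-changed batch hashes differently, mirroring the character-per-character matching rule described earlier.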

Such a condition (multiple plan handles for one sql handle) occurs when the cache key for the compiled plans is different. This may occur due to a change in SET options between two executions of the same batch. Now execute the stored procedure with a different parameter, but no other changes to execution context: notice that the usecounts value has increased to 2, which means the same cached plan was reused as-is, because the execution context data structures were reused.
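One way to observe this reuse yourself; the dynamic management views are real, while the procedure-name filter is a hypothetical placeholder:

```sql
-- Shows how many times each cached plan has been reused (usecounts):
SELECT cp.usecounts, cp.objtype, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%GetCustomer%';   -- filter for the batch of interest
```

Running the same procedure twice with different parameters should show a single row whose usecounts has incremented, rather than a second plan entry.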

What this effectively means is that we have two plan entries in the cache corresponding to the same batch. It underscores the importance of making sure that the SET options that affect the plan cache are the same when the same queries are executed repeatedly, in order to optimize for plan reuse and keep the plan cache size to its required minimum. A common pitfall is that different clients may have different default values for the SET options.

Executing the same queries from these two clients will result in multiple plans as described in the example above. Execution plans remain in the plan cache as long as there is enough memory to store them. When memory pressure exists, the SQL Server Database Engine uses a cost-based approach to determine which execution plans to remove from the plan cache.

To make a cost-based decision, the SQL Server Database Engine increases and decreases a current cost variable for each execution plan according to the following factors. When a user process inserts an execution plan into the cache, the user process sets the current cost equal to the original query compile cost; for ad-hoc execution plans, the user process sets the current cost to zero.

Thereafter, each time a user process references an execution plan, it resets the current cost to the original compile cost; for ad-hoc execution plans the user process increases the current cost. For all plans, the maximum value for the current cost is the original compile cost. To determine which plans to remove, the SQL Server Database Engine repeatedly examines the state of each execution plan and removes plans when their current cost is zero. An execution plan with zero current cost is not removed automatically when memory pressure exists; it is removed only when the SQL Server Database Engine examines the plan and the current cost is zero.

When examining an execution plan, the SQL Server Database Engine pushes the current cost towards zero by decreasing the current cost if a query is not currently using the plan.

The SQL Server Database Engine repeatedly examines the execution plans until enough have been removed to satisfy memory requirements. While memory pressure exists, an execution plan may have its cost increased and decreased more than once.
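The cost-based eviction policy described above can be captured in a toy simulation. This is a simplified sketch, not SQL Server's implementation: costs are small integers, "examining" a plan decrements its cost, and a plan is evicted only when it is examined while its cost is zero.

```python
class PlanCache:
    """Toy model of cost-based plan cache eviction."""

    def __init__(self):
        self.plans = {}  # name -> {"current", "compile", "adhoc"}

    def insert(self, name, compile_cost, adhoc=False):
        # ad-hoc plans start at zero cost; others at their compile cost
        self.plans[name] = {"current": 0 if adhoc else compile_cost,
                            "compile": compile_cost, "adhoc": adhoc}

    def reference(self, name):
        p = self.plans[name]
        if p["adhoc"]:
            # ad-hoc plans tick upward on reuse, capped at the compile cost
            p["current"] = min(p["current"] + 1, p["compile"])
        else:
            # other plans reset to their original compile cost
            p["current"] = p["compile"]

    def relieve_pressure(self, need):
        """Repeatedly examine plans, decrementing costs; evict at zero."""
        evicted = []
        while len(evicted) < need and self.plans:
            for name in list(self.plans):
                p = self.plans[name]
                if p["current"] == 0:
                    del self.plans[name]
                    evicted.append(name)
                    if len(evicted) == need:
                        break
                else:
                    p["current"] -= 1
        return evicted

cache = PlanCache()
cache.insert("proc_plan", compile_cost=3)
cache.insert("adhoc_plan", compile_cost=3, adhoc=True)
cache.reference("proc_plan")                  # reset to its compile cost
victims = cache.relieve_pressure(need=1)      # unused ad-hoc plan goes first
```

The recently referenced plan survives with a decremented cost, while the never-referenced ad-hoc plan, sitting at zero, is the first to be removed.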

When memory pressure no longer exists, the SQL Server Database Engine stops decreasing the current cost of unused execution plans and all execution plans remain in the plan cache, even if their cost is zero. The SQL Server Database Engine uses the resource monitor and user worker threads to free memory from the plan cache in response to memory pressure. The resource monitor and user worker threads can examine plans run concurrently to decrease the current cost for each unused execution plan.

The resource monitor removes execution plans from the plan cache when global memory pressure exists. It frees memory to enforce policies for system memory, process memory, resource pool memory, and maximum size for all caches. The maximum size for all caches is a function of the buffer pool size and cannot exceed the maximum server memory. The user worker threads remove execution plans from the plan cache when single cache memory pressure exists. They enforce policies for maximum single cache size and maximum single cache entries.

Starting with SQL Server …, a configuration change like this logs the following informational message in the error log: … Certain changes in a database can cause an execution plan to be either inefficient or invalid, based on the new state of the database.

SQL Server detects the changes that invalidate an execution plan and marks the plan as not valid. A new plan must then be recompiled for the next connection that executes the query. The conditions that invalidate a plan include the following:. Most recompilations are required either for statement correctness or to obtain potentially faster query execution plans.

In SQL Server versions prior to 9.x, whenever a statement within a batch caused recompilation, the entire batch, whether submitted through a stored procedure, trigger, ad-hoc batch, or prepared statement, was recompiled. Starting with SQL Server 9.x, only the statement inside the batch that causes recompilation is recompiled. Also, there are additional types of recompilations in SQL Server 9.x and later. Statement-level recompilation benefits performance because, in most cases, a small number of statements causes recompilations and their associated penalties, in terms of CPU time and locks.

These penalties are therefore avoided for the other statements in the batch that do not have to be recompiled. A corresponding extended event (xEvent) occurs when a statement-level recompilation is required by any kind of batch. This includes stored procedures, triggers, ad hoc batches, and queries.



