Skip to main content

classification of database indexing

  • Dense Index: For every records we are having one entry in the index file.
  • Sparse Index: Only one record for each block we will have it in the index file.
Primary Index (Primary Key + Ordered Data) A primary index is an ordered file whose records are fixed length size with 2 fields. First field is same as primary key and the second field is pointer to the data block. 



  • Here index entry is created for first record of each block, called ‘block anchor’.
  • Number of index entries thus are equal to number of blocks
  • Average number of block accesses are = logB (worst case) + 1 (best case), 
    So on average it will be O(logB)
  • The type of index is called sparse index because index is not created for all the records, only the first records of every block the entry is made into the index file.

Example on Primary Index
Number of Records = 30000, Block Size = 1024 Bytes, Strategy = Unspanned, Record Size = 100 Bytes, Key Size = 6 Bytes, Pointer Size = 9 Bytes, Then find average number of block accesses with or without index.
Data Records that can fit in 1 Block = 1024/100 = 10.24 block but we have to take the whole number so
we have to take only 10 records. 
10 records can put in 1 block,
Total records = 30000, so total number of blocks = 30000/10 = 300 Blocks.
So without index we need ceil [log3000] = 12 blocks to access for finding the record. With Index Size of index = 15 Bytes (6+4)
Index Record / Block = floor [1024/15] = 68 [because strategy is unspanned]
Index Records = Number of data blocks = 3000
So no of index blocks = ceil (3000/68) = 45
So average number of block accesses = ceil [log45] + 1 = 6 + 1 = 7 


Clustered Index

Clustered Index is an ordered file with two fields, non-key and block pointer.
Clustering Index is created on data file whose file records are physically ordered on a non-key field which does not have distinct value for each record, that field is called clustering field.



Explanation
  • Index Entry is created for each distinct value of clustering field.
  • Block pointer points to first block where key is available
  • Type of index is dense and sparse both.
Average Number of block accesses required >= logB + 1


Secondary Index
This a type of dense Index.
  • Secondary Index provides a secondary means of accessing a file for which some primary access already exists.
  • Secondary Index may be a key or a non-key
  • Index entry is created for each record in data file
  • Type of secondary index is Dense

Index fields are key (unordered) and block pointer

Example on Secondary Index
Number of Records = 30000, Block Size = 1024 Bytes, Strategy = Unspanned, Record Size = 100 Bytes, Key Size = 6 Bytes, Pointer Size = 9 Bytes, Then find average number of block accesses with or without index.


Without Index To search we need (30000 + 1) / 2 = 15001 Block accesses. With Index Size of index = 15 Bytes (6+4)
Index Record / Block = floor [1024/15] = 68 [because strategy is unspanned]
Index Records = Total Number of Records = 30000 So no of index blocks = ceil (30000/68) = 442
So average number of block accesses = ceil [log442] + 1 = 


Example 2  (On Multi Level Indexing)

Data Records = 16384, Size of Record = 32 Bytes, Storage Strategy = Unspanned, Block Size = 1024 Bytes, Size of key = 6 Bytes, Block Pointer = 10 Bytes



Answer:
Index Size = key size + block size = 16 Bytes
Index Records per block = 1024 / 16 = 64
Total Block required = Total record / record per block = 16384 / 64 = 256 This was first level indexing.
Second level Index:
Here we will use sparse index. Number of record = Number of blocks of 1st level index = 256
Number of blocks required = 256 / 64 = 4 Blocks
So to search a record how many blocks will be required = log4 + 1 + 1 = 4 Accesses.

Pictorial Representation



Comments

Post a Comment

Popular posts from this blog

Goodness of Fit Test for normal and poisson distribution

Meaning of Goodness of fit test: We find out which distribution fits the sample data the most. And this is achieved using chi-square distribution (Snedecor and Cochran, 1989). How to apply: There are 4 steps to follow: State the hypothesis:  Data follows a distribution or not Criteria to reject null hypothesis: if Χ 2  > Χ 2 (k,1-α) then reject null hypothesis. Analyze sample data: Compute the chi-square value using below formula: ∑(Oi- Ei) 2 /Ei        : Oi is observed frequency and Ei is expected frequency Interpret the results: Declare the results after comparing the values of Χ 2 and Χ 2 (k,1-α), where k is degree of freedom and  α is significance level. Degree of Freedom: It is  =  n - 1 - m m: number of parameter in the distribution. So in case of normal distribution m is 2 ( μ ,α) and in case of poisson dist. m is = 1 ( λ). Example 1: Goodness of fit test for Normal Distribution Year wise d...

Red Black Tree Insertion in Java

Red Black Tree is a binary search tree. With extra property with each node having color either RED or BLACK. It has following properties: Every node is either RED or BLACK Root element will have BLACK color. Every RED node has BLACK children. Every path from root to leaf will have same number of BLACK nodes. We have few theorems over RB tree: Note: Height of RB tree is θ(logn). Insertion: Insertion in RB tree is little tricky as compared to BST, we have following cases to insert a node in RB tree: When tree is null just insert the node and it will be root, color will be BLACK. To insert an element first find its position: if key <= node.key then move to left else move to right Once we find the correct place to insert, we insert the node with color RED. But if the parent of the node just inserted has color RED then it will violate 3 property. So we have to fix this issue. We call it double RED problem, we resolve it using following few cases. DOUBLE R...