Wolmer’s Trust High School for Girls
INFORMATION TECHNOLOGY (GENERAL PROFICIENCY)
Grade: 11 Teacher: Mrs. McCallum-Rodney
File organization and access mode
Introduction
You can organize your files as sequential, line-sequential, indexed, or relative. You should decide on the file organization and access modes when you design your program. The access mode defines how to reads and writes files, but not how files are organized.
Your file-management system handles the input and output requests and record retrieval from the input-output devices.
The following table summarizes file organization and access modes for files.
Table 1 File organization and access mode | |||
File organization |
Order of records |
Records can be deleted or replaced? |
Access mode |
Sequential |
Order in which they were written |
A record cannot be deleted, but its space can be reused for a same-length record. |
Sequential only |
Line-sequential |
Order in which they were written |
No |
Sequential only |
Indexed |
Collating sequence by key field |
Yes |
Sequential, random, or dynamic |
Basic file organization techniques
Given that a file consists, generally speaking, of a collection of records, a key element in file management is the way in which the records themselves are organized inside the file, since this heavily affects system performances as far as record finding and access. Note carefully that by ``organization'' we refer here to the logical arrangement of the records in the file (their ordering or, more generally, the presence of ``closeness'' relations between them based on their content), and not instead to the physical layout of the file as stored on a storage media, To prevent confusion, the latter is referred to by the expression ``record blocking''.
Choosing a file organization is a design decision, hence it must be done having in mind the achievement of good performance with respect to the most likely usage of the file. The criteria usually considered important are:
- Fast access to single record or collection of related records.
- Easy record adding/update/removal, without disrupting (1).
- Storage efficiency.
Needless to say, these requirements are in contrast with each other for all but the most trivial situations, and it's the designer job to find a good compromise among them, yielding and adequate solution to the problem at hand. For example, easiness of adding/etc. is not an issue when defining the data organization of a CD-ROM product, whereas fast access is, given the huge amount of data that this media can store. However, as it will become apparent shortly, fast access techniques are based on the use of additional information about the records, which in turn competes with the high volumes of data to be stored.
Three organization models will be considered:
- Sequential.
- Indexed-sequential.
- Indexed.
Sequential
This is the most common structure for large files that are typically processed in their entirety, and it's at the heart of the more complex schemes. In this scheme, all the records have the same size and the same field format, with the fields having fixed size as well. The records are sorted in the file according to the content of a field of a scalar type, called ``key''. The key must identify uniquely a records, hence different record have different keys. This organization is well suited for batch processing of the entire file, without adding or deleting items: this kind of operation can take advantage of the fixed size of records and file; moreover, this organization is easily stored both on disk and tape.
The key ordering, along with the fixed record size, makes this organization amenable to extensive search. However, adding and deleting records to this kind of file is a tricky process: the logical sequence of records typically matches their physical layout on the media storage, so to ease file navigation, hence adding a record and maintaining the key order requires a reorganization of the whole file.
Indexed sequential
An index file can be used to effectively overcome the above mentioned problem, and to speed up the key search as well. The simplest indexing structure is the single-level one: a file whose records are pairs key-pointer, where the pointer is the position in the data file of the record with the given key. Only a subset of data records, evenly spaced along the data file, are indexed, so to mark intervals of data records.
A key search then proceeds as follows: the search key is compared with the index ones to find the highest index key preceding the search one, and a linear search is performed from the record the index key points onward, until the search key is matched or until the record pointed by the next index entry is reached. In spite of the double file access (index + data) needed by this kind of search, the decrease in access time with respect to a sequential file is significant.
Consider, for example, the case of simple linear search on a file with 1,000 records. With the sequential organization, an average of 500 key comparisons are necessary (assuming uniformly distributed search key among the data ones). However, using and evenly spaced index with 100 entries, the number of comparisons is reduced to 50 in the index file plus 50 in the data file: a 5:1 reduction in the number of operations.
This scheme can obviously be hierarchically extended: an index is a sequential file in itself, amenable to be indexed in turn by a second-level index, and so on, thus exploiting more and more the hierarchical decomposition of the searches to decrease the access time. Obviously, if the layering of indexes is pushed too far, a point is reached when the advantages of indexing are hampered by the increased storage costs, and by the index access times as well.
Indexed
Why using a single index for a certain key field of a data record? Indexes can be obviously built for each field that uniquely identifies a record (or set of records within the file), and whose type is amenable to ordering. Multiple indexes hence provide a high degree of flexibility for accessing the data via search on various attributes; this organization also allows the use of variable length records (containing different fields).
It should be noted that when multiple indexes are used the concept of sequentiality of the records within the file is useless: each attribute (field) used to construct an index typically imposes an ordering of its own. For this very reason is typically not possible to use the ``sparse'' (or ``spaced'') type of indexing previously described. Two types of indexes are usually found in the applications: the exhaustive type, which contains an entry for each record in the main file, in the order given by the indexed key, and the partial type, which contain an entry for all those records that contain the chosen key field (for variable records only).