Module – 2 : File Organisation and Processing:
Fields, Records, Files, Type of files, Serial, Sequential, Index Sequential and
Random files, File Organisations, Batch Processing, Real time processing, Time
sharing, Multi Processing, Multi Programming, Client Server processing.
Fields: A field in Microsoft Access is a piece of information related to a
single person or thing. Related fields are grouped together to form a record.
Records: In a database, a record (sometimes called a row) is a group of fields
within a table that are relevant to a specific entity. For example, in a table
called customer contact information, a row would likely contain fields such as:
ID number, name, street address, city, telephone number and so on.
Files:
A file is a logical collection of information stored on secondary storage such
as a hard disk. Physically, a file is the smallest allotment of secondary
storage on a device such as a disk. Logically, a file is a sequence of logical
records, i.e. a sequence of bits and bytes. Files can be used to contain data
and programs (both source and object programs). Data files can be numeric,
alphabetic, alphanumeric or binary. A file has various attributes such as name,
type, location, size, protection, and time and date of creation.
Type of files:
In order to support different types of files, operating systems support
two-part file names. The two parts are a name and an extension, separated by a
period (dot). For example, a file can be named program.c.
File Type | Extension | Meaning
Executable file | .exe, .com, .bin | Ready-to-run machine language program
Object file | .obj, .o | Compiled machine language, but not linked
Source code file | .c, .cc, .java, .pas, .asm, .a, .ftn | Source code in different languages such as C, Java, Pascal, assembly language or Fortran
Batch file | .bat, .sh | Commands to the command interpreter
Text file | .txt, .doc | Documentation
Library file | .lib, .dll | Libraries of routines for programmers
Backup file | .bak | Used for taking a backup of some program file
Multimedia file | .mpeg, .mov, .rm | Binary file containing audio or audio/video information
Access Methods:
Files are used to store data. The information present in a file can be accessed
by various methods. The way of retrieving data from a file is known as an
access method. Different systems use different access methods. The various
access methods used are:
1. Sequential access
It is the simplest and most commonly used access method. Information in the
file is accessed in the order in which it is stored, i.e. one record after the
other, starting at the beginning and proceeding to the end of the file.
2. Direct access
In the direct access method, the records of a file can be accessed in any
order. Records can be read or written randomly. In this way the records can be
accessed by key, rather than by position.
3. Indexed access
In this method, an index is created for the file. The index contains pointers
to the various blocks of the file, just like the index at the back of a book.
To find a record, the index is searched first and then the pointer found in the
index is used to access the required record.
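The three access methods can be compared on a small in-memory file. This is only a sketch under stated assumptions: the fixed-length record layout (`RECORD`), the sample rows and all function names are hypothetical, and direct access is modelled by seeking to a record number.

```python
import io
import struct

# Hypothetical fixed-length record layout: 4-byte integer key + 12-byte name.
RECORD = struct.Struct("<i12s")

def build_file(rows):
    """Write the rows back to back into an in-memory 'file'."""
    buf = io.BytesIO()
    for key, name in rows:
        buf.write(RECORD.pack(key, name.encode("ascii")))
    buf.seek(0)
    return buf

def read_sequential(f):
    """Sequential access: read every record from the beginning to the end."""
    f.seek(0)
    out = []
    while chunk := f.read(RECORD.size):
        key, name = RECORD.unpack(chunk)
        out.append((key, name.rstrip(b"\0").decode("ascii")))
    return out

def read_direct(f, position):
    """Direct access: jump straight to the record with the given record number."""
    f.seek(position * RECORD.size)
    key, name = RECORD.unpack(f.read(RECORD.size))
    return key, name.rstrip(b"\0").decode("ascii")

def build_index(f):
    """Indexed access: map each key to the position of its record."""
    return {key: pos for pos, (key, _) in enumerate(read_sequential(f))}

def read_indexed(f, index, key):
    """Search the index first, then follow the pointer to the record."""
    return read_direct(f, index[key])

rows = [(30, "Asha"), (10, "Ravi"), (20, "Meena")]
f = build_file(rows)
index = build_index(f)
```

Here `read_sequential(f)` visits all three records in stored order, `read_direct(f, 2)` seeks straight to the third record, and `read_indexed(f, index, 10)` consults the index before touching the file.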
File Organisations
The goal is to determine an efficient file organization for each base relation.
For example, if we want to retrieve student records in alphabetical order of
name, sorting the file by student name is a good file organization. However, if
we want to retrieve all students whose marks are in a certain range, a file
ordered by student name would not be a good file organization. Some file
organizations are efficient for bulk loading data into the database but
inefficient for retrieval and other activities.
Types of File Organization
In order to make an effective selection of file organizations and indexes, the
different types of file organization are described here. These are:
• Heap File Organization: An unordered file, sometimes called a heap file,
is the simplest type of file organization. Records are placed in the file in
the same order as they are inserted. A new record is inserted in the last page
of the file; if there is insufficient space in the last page, a new page is
added to the file. This makes insertion very efficient.
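The insertion rule for a heap file can be sketched in a few lines. The class name `HeapFile` and the page capacity of two records are assumptions chosen only to keep the illustration small.

```python
PAGE_CAPACITY = 2  # hypothetical: records per page, kept tiny for illustration

class HeapFile:
    def __init__(self):
        self.pages = [[]]

    def insert(self, record):
        """Append to the last page; add a new page if the last one is full."""
        if len(self.pages[-1]) >= PAGE_CAPACITY:
            self.pages.append([])
        self.pages[-1].append(record)

    def scan(self):
        """Records come back in insertion order, since nothing is sorted."""
        return [r for page in self.pages for r in page]

heap = HeapFile()
for name in ["Ravi", "Asha", "Meena"]:
    heap.insert(name)
```

Insertion never looks at any page except the last one, which is why it is cheap; the price is that finding a particular record requires a full scan.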
• Hash File Organization: In a hash file, records are not stored sequentially;
instead, a hash function is used to calculate the address of the page in which
the record is to be stored. The field on which the hash function is calculated
is called the hash field, and if that field is also the key of the relation it
is called the hash key.
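A minimal sketch of the idea, assuming an integer hash field and a simple modulo hash function; `NUM_PAGES`, `page_address` and `HashFile` are hypothetical names, and real systems use more careful hash functions and overflow handling.

```python
NUM_PAGES = 4  # hypothetical number of pages (buckets) in the file

def page_address(hash_field_value):
    """Toy hash function: map an integer hash field to a page number."""
    return hash_field_value % NUM_PAGES

class HashFile:
    def __init__(self):
        self.pages = [[] for _ in range(NUM_PAGES)]

    def insert(self, key, record):
        self.pages[page_address(key)].append((key, record))

    def lookup(self, key):
        """Only the one page chosen by the hash function is searched."""
        for k, record in self.pages[page_address(key)]:
            if k == key:
                return record
        return None

hf = HashFile()
for roll_no, name in [(101, "Ravi"), (206, "Asha"), (309, "Meena")]:
    hf.insert(roll_no, name)
```

In this example the keys 101 and 309 hash to the same page, which shows that a page may hold several records and still has to be searched for the exact key.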
• Indexed Sequential Access Method (ISAM) File Organization: In an ISAM system,
data is organized into records which are composed of fixed-length fields.
Records are stored sequentially. A secondary set of tables known as indexes
contains "pointers" into the data, allowing individual records to be retrieved
without having to search the entire data set. An index is a data structure that
allows the DBMS to locate particular records in a file more quickly and thereby
speed up the response to user queries.
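The lookup path through an index can be sketched as follows. This is an illustration only: the block contents are invented, the index is assumed to hold the first key of each block, and `find` is a hypothetical helper rather than part of any real ISAM implementation.

```python
from bisect import bisect_right

# Records stored sequentially in key order, split into hypothetical blocks.
blocks = [
    [(10, "Asha"), (20, "Ravi")],
    [(30, "Meena"), (40, "John")],
    [(50, "Sara"), (60, "Vikram")],
]

# The index holds the first key of each block; its position is the "pointer".
index_keys = [block[0][0] for block in blocks]

def find(key):
    """Search the small index first, then scan only the block it points to."""
    i = bisect_right(index_keys, key) - 1
    if i < 0:
        return None  # key is smaller than every key in the file
    for k, record in blocks[i]:
        if k == key:
            return record
    return None
```

A call such as `find(40)` inspects the three index entries and one block instead of reading all six records.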
• B+-tree File Organization: The B+-tree is a more versatile storage structure
than hashing. It supports retrievals based on exact key match, pattern
matching, range of values, and part key specification. The B+-tree index is
dynamic, growing as the relation grows.
• Cluster File Organization: Some DBMSs, such as Oracle, support clustered and
non-clustered tables. Clusters are groups of one or more tables physically
stored together because they share common columns and are often used together.
Indexed Clusters
In an indexed cluster, records with the same cluster key are stored together.
Oracle suggests using indexed clusters when:
• Queries retrieve records over a range of cluster key values;
• Clustered tables may grow unpredictably.
Clusters can improve the performance of retrieval, depending on the data
distribution and on which SQL operations are most often performed on the data.
Hash Clusters
Hash clusters also cluster table data in a manner similar to indexed clusters.
However, a record is stored in a hash cluster based on the result of applying a
hash function to the record's cluster key value. All records with the same hash
key value are stored together on disk.
Batch Processing: In a batch processing system, the jobs of the various users
are collected in a queue. This process is known as spooling; SPOOL is the short
form of Simultaneous Peripheral Operations On-Line. Users did not interact
directly with the computer system; they prepared a job that consisted of the
program, the data and some control information, usually in the form of punched
cards, and submitted it to a computer operator. When a batch of programs had
been collected, the operator loaded the batch into the computer at one time,
where the programs were executed one after the other. Finally, the operator
retrieved the output of these jobs and returned it to the users concerned. In
this way many different jobs are processed one after the other, without any
interaction from the users during program execution.
• The batch processing operating system was called a monitor. The portion of it that resides in main memory is known as the resident monitor.
• The batch monitor executes batches of jobs at definite intervals of time.
• The batch monitor accepts the commands for initializing, processing and terminating a batch.
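The flow described above (collect jobs, run them one after the other, hand back the output) can be sketched with a simple queue. The job tuples and the function `run_batch` are hypothetical; ordinary Python callables stand in for the users' programs.

```python
from collections import deque

def run_batch(jobs):
    """Run queued jobs one after another with no user interaction."""
    queue = deque(jobs)          # jobs collected before execution starts
    outputs = []
    while queue:
        name, program, data = queue.popleft()   # first submitted, first run
        outputs.append((name, program(data)))   # output is held for the user
    return outputs

# Hypothetical jobs: (user name, program, data).
jobs = [
    ("user_a", sum, [1, 2, 3]),
    ("user_b", max, [7, 4, 9]),
    ("user_c", len, "payroll"),
]
results = run_batch(jobs)
```

No job can influence another while the batch runs, and no user input is read during execution, which mirrors the lack of interaction in a batch system.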
Real Time processing
In a real time operating system, a job must be completed within rigid time
constraints, otherwise the job loses its meaning. A real time system functions
correctly only if it returns the correct result within its time constraints.
Thus, in a real-time system, the correctness of a computation depends not only
on its logical correctness but also on the time at which the result is
produced. A real time system is often used as a control device in a dedicated
application such as fuel-injection systems, robotics, air-traffic control,
medical imaging systems, systems that control scientific experiments,
industrial control systems and weapon systems.
Time sharing
Time sharing refers to the allocation of computer resources in a time-dependent
fashion to several programs simultaneously. A time sharing system has many user
terminals that are connected to the same computer simultaneously. Using these
terminals, different users can work on the system at the same time. In a
timesharing system, the CPU time is divided among all the users on a scheduled
basis. Each user program is allocated a very short period of CPU time in turn,
beginning with the first user program and proceeding to the last one, and then
starting again from the first. This short period of time during which a user
gets the attention of the CPU is known as a time slice, time slot or quantum.
Thus, in timesharing, when the CPU is allocated to a user program, the program
uses the CPU for the period of the time slot. It releases the CPU under any of
the following three conditions:
1. When the allotted time slice expires.
2. When the program needs to perform I/O operations.
3. When the execution of the program is over during the time slice.
Even though it may appear that several users are using the computer system at
the same time, a single-CPU system can only execute one instruction at a time.
Thus, as in multiprogramming, even with a timesharing system only one program
can be in control of the CPU at any given time.
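The rotation of the CPU among user programs can be simulated with a queue. This sketch covers two of the three release conditions (the slice expires, or the program finishes within its slice); I/O waits are not modelled. The function `round_robin`, the program names and the CPU times are hypothetical.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Return the order in which programs get the CPU and when each finishes."""
    ready = deque(burst_times.items())   # (program, remaining CPU time)
    clock = 0
    schedule, finished_at = [], {}
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)    # slice expires or the program ends
        clock += run
        schedule.append((name, run))
        if remaining > run:
            ready.append((name, remaining - run))  # back to the end of the queue
        else:
            finished_at[name] = clock
    return schedule, finished_at

schedule, finished_at = round_robin({"A": 5, "B": 2, "C": 4}, quantum=2)
```

With a quantum of 2, the programs are served in the order A, B, C, A, C, A, and only one of them holds the CPU at any moment.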
Multi Processing
A multiprocessor system is a system that contains two or more processors or
CPUs and has the ability to execute several programs simultaneously. Hence the
name 'multi-processor'. In such a system, multiple processors share the clock,
bus, memory and peripheral devices. A multiprocessor system is also known as a
parallel system.
In such a system, instructions from different and independent programs can be
processed at the same instant of time by different CPUs. The CPUs may also
simultaneously execute different instructions from the same program.
Multi Programming
A multiprogramming operating system allows multiple users to execute multiple
programs concurrently using a single CPU. In multiprogramming, several
processes are kept in main memory and the CPU executes them concurrently:
whenever the running process has to wait, the CPU immediately switches to
another process that is ready to be executed.
The advantages of Multiprogramming are:
1. Increased throughput:
Throughput is increased by utilizing the idle time of the CPU for running other
programs that are already present in main memory.
2. Lowered response time:
Response time is lowered by recognizing the priority of a job as it enters the
system and by processing jobs on a priority basis.
3. Ability to assign priorities to jobs:
Most multiprogramming systems have schemes for setting priorities for rotating
programs. They specify when the CPU will rotate to another program, and which
program it will rotate to.
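One simple way to picture priority-based job selection is a priority queue. The sketch below assumes that a lower number means a higher priority and that ties are broken by arrival order; both rules, the job names and `run_by_priority` are assumptions, not a description of any particular operating system.

```python
import heapq

def run_by_priority(jobs):
    """Run jobs in priority order; a lower number means a higher priority."""
    heap = []
    for arrival, (priority, name) in enumerate(jobs):
        # arrival order breaks ties between jobs of equal priority
        heapq.heappush(heap, (priority, arrival, name))
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order

order = run_by_priority([(3, "report"), (1, "payroll"), (2, "backup"), (1, "billing")])
```

The two priority-1 jobs run first, in the order they arrived, followed by the lower-priority jobs.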
Client Server System
# Distributed System
A distributed system is a collection of processors located in geographically
dispersed physical locations. In this system, the workload is divided between
two or more computers that are linked together by a communication network. That
is, the different processors communicate using communication links, such as
telephone lines and buses. The processors in a distributed system vary in size
and function. They may include small microprocessors, workstations,
microcomputers, mainframe computers and large general purpose computers. The
various processors are also called sites, nodes, hosts or machines. The purpose
of a distributed system is to provide an efficient and convenient environment
for sharing of resources.