This section presents informal definitions and basic concepts that
are used throughout the document. Its purpose is to clarify
the meaning of certain terms that are used inconsistently in
the virus field; it is not intended as a primer on viruses.
Additional background information and an extensive ``Suggested Reading''
list may be found in NIST Special Publication 500-166 [].
A virus is a self-replicating code segment which must be attached to
a host executable.
When the host is executed, the virus
code also executes. If possible, the virus will replicate by
attaching a copy of itself to another executable. The virus may
include an additional ``payload'' that
triggers when specific conditions are met. For example, some viruses
display a message on a particular date.
A Trojan horse is a program that performs a desired task, but
also includes unexpected (and undesirable) functions. In this respect, a
Trojan horse is similar to a virus, except that a Trojan horse does not
replicate. An example of a Trojan horse would be an editing program for a
multi-user system which has been modified to randomly delete one of the
user's files each time that program is used.
The program would perform its normal, expected function (editing),
but the deletions are unexpected and undesired. A host program that has
been infected by a virus is often described as a Trojan horse. However, for
the purposes of this document, the term Trojan horse will exclude
virus-infected programs.
A worm is a self-replicating program. It is self-contained and
does not require a host program. The worm creates a copy of itself and
causes that copy to execute; no user intervention is required. Worms
commonly use network services to propagate to other computer systems.
A variant is a virus that is generated by modifying a known virus.
Examples are modifications that add functionality or evade detection.
The term variant is usually applied only when the modifications are minor
in nature. An example would be changing the trigger date from Friday the 13th
to Thursday the 12th.
An overwriting virus destroys code or data in the host program
by replacing it with the virus code. Most viruses, however,
attempt to retain the original host program's code and functionality after
infection, because the virus is more likely to be detected and deleted if the
program ceases to work. A non-overwriting virus either appends the
virus code to the physical end of the program or moves the original code
to another location.
A self-recognition procedure is a technique whereby a virus
determines whether or not an executable is already infected. The
procedure usually involves searching for a particular value at a known
position in the executable. Self-recognition is required if the virus
is to avoid multiple infections of a single executable. Multiple
infections cause infected executables to grow excessively and consume
correspondingly more storage space, which makes the virus easier to
detect.
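
The marker check described above can be sketched from the point of view of
someone examining a file image. This is a minimal illustration, not a
description of any real virus: the marker value, its offset, and the function
name `has_marker` are all hypothetical, and the code operates only on byte
strings held in memory.

```python
# Hypothetical values chosen for illustration; they correspond to no real virus.
MARKER = b"\xDE\xAD"   # assumed marker value
MARKER_OFFSET = 4      # assumed fixed position within the file image


def has_marker(image: bytes, marker: bytes = MARKER,
               offset: int = MARKER_OFFSET) -> bool:
    """Return True if `marker` appears at exactly `offset` in `image`."""
    # Slicing past the end of `image` yields a shorter (or empty) byte string,
    # so images that are too short simply compare unequal.
    return image[offset:offset + len(marker)] == marker


clean = bytes(16)  # 16 zero bytes standing in for an unmarked file image
marked = clean[:MARKER_OFFSET] + MARKER + clean[MARKER_OFFSET + len(MARKER):]

print(has_marker(clean))   # False
print(has_marker(marked))  # True
```

Because the test is a single comparison at a known position, it is cheap to
perform, and the same value at the same position can also serve an examiner
as an indicator that a file may be infected.
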
A resident virus installs itself as part of the operating
system upon execution of an infected host program. The virus will
remain resident until the system is shut down. Once installed in memory, a
resident virus is available to infect all suitable hosts that are accessed.
A stealth virus is a resident virus that attempts to evade detection
by concealing its presence in infected files. To achieve this, the
virus intercepts system calls which examine the contents or attributes of
infected files. The results of these calls must be altered to correspond
to the file's original state. For example, a stealth virus might
remove the virus code from an executable when it is read (rather than
executed) so that an anti-virus software package will examine the original,
uninfected host program.
An encrypted virus has two parts: a small decryptor and the
encrypted virus body. When the virus
is executed, the decryptor executes first and decrypts the virus body.
The virus body can then execute, replicating or becoming resident.
The virus body includes an encryptor that is applied during replication.
A variably encrypted virus uses different encryption
keys or encryption algorithms for different copies. Encrypted viruses are more
difficult to disassemble and study because the researcher must first decrypt the code.
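
The effect of variable encryption on the stored bytes can be illustrated with
a harmless placeholder string. The sketch below assumes a single-byte XOR
scheme purely for simplicity; the text does not say which schemes real viruses
use, and the keys and the name `xor_bytes` are hypothetical.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


# A harmless placeholder stands in for the virus body.
body = b"harmless placeholder standing in for a virus body"

copy_a = xor_bytes(body, 0x5A)  # hypothetical key used for one copy
copy_b = xor_bytes(body, 0xC3)  # hypothetical key used for another copy

print(copy_a != copy_b)                 # True: the stored bytes differ per key
print(b"placeholder" in copy_a)         # False: plaintext is not visible
print(xor_bytes(copy_a, 0x5A) == body)  # True: decryption recovers the body
```

The same body yields different stored bytes under different keys, and the
plaintext becomes readable only after the decryption step is reproduced.
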
A polymorphic virus creates copies during replication that are
functionally equivalent but have distinctly different byte streams.
To achieve this, the virus may randomly insert superfluous instructions,
interchange the order of independent instructions, or choose from
a number of different encryption schemes. This variable quality makes
the virus difficult to locate, identify, or remove.
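
The notion of copies that are functionally equivalent but differ byte for
byte can be shown with a toy instruction set. Everything in this sketch is
hypothetical: the three operations (`set`, `add`, `nop`), the `run`
interpreter, and the two example programs exist only to illustrate
superfluous instructions and the reordering of independent instructions.

```python
def run(program):
    """Interpret a toy program and return the final register contents."""
    regs = {}
    for instr in program:
        op = instr[0]
        if op == "set":            # ("set", register, value)
            _, reg, value = instr
            regs[reg] = value
        elif op == "add":          # ("add", destination, source)
            _, dst, src = instr
            regs[dst] += regs[src]
        elif op == "nop":          # superfluous instruction, no effect
            continue
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return regs


variant_a = [("set", "a", 2), ("set", "b", 3), ("add", "a", "b")]

# Same computation: the two independent "set" instructions are swapped
# and superfluous "nop" instructions are inserted.
variant_b = [("set", "b", 3), ("nop",), ("set", "a", 2), ("nop",),
             ("add", "a", "b")]

print(variant_a == variant_b)            # False: the encodings differ
print(run(variant_a) == run(variant_b))  # True: the results are identical
```

A comparison of the raw sequences reports a difference even though both
programs compute the same result, which is why a fixed byte pattern is a poor
fit for locating such code.
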
A research virus is one that has been written, but has never been
unleashed on the public. These include the samples that have been sent
to researchers by virus writers. Viruses that have been seen outside the
research community are termed ``in the wild.''
It is difficult to determine how many viruses exist. Polymorphic viruses and minor variants complicate the count, and researchers often cannot agree whether two infected samples contain the same virus or different viruses. We will consider two viruses to be different if they could not have evolved from the same sample without a hardware error or human modification.