Intrusion Detection Working Group D. Curry
draft-ietf-idwg-idmef-xml-01.txt ISS
Expires: January 5, 2001 July 6, 2000
Intrusion Detection Message Exchange Format
Extensible Markup Language (XML) Document Type Definition
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC 2026 [1].
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Distribution of this memo is unlimited.
1. Abstract
The purpose of the Intrusion Detection Message Exchange Format
(IDMEF) is to define data formats and exchange procedures for sharing
information of interest to intrusion detection and response systems,
and to the management systems which may need to interact with them.
The goals and requirements of the IDMEF are described in [2].
This Internet-Draft describes a proposed implementation of the data
format component of the IDMEF, using the Extensible Markup Language
(XML) [3] to represent the class hierarchy defined by Debar, Huang
and Donahoo [4]. The rationale for choosing XML is explained, a
Document Type Definition (DTD) is developed, and examples are
provided.
An earlier version of this implementation was reviewed, along with
other proposed implementations, by the IDWG at its September, 1999
and February, 2000 meetings. At the February meeting, it was decided
that the XML solution was best at fulfilling the IDWG requirements.
The rationale for this decision is presented in [10].
Internet-Draft IDMEF XML DTD July 6, 2000
TABLE OF CONTENTS
1. Abstract ........................................................ 1
2. Conventions used in this document ............................... 5
3. Introduction .................................................... 5
3.1 The Extensible Markup Language ............................. 6
3.2 Rationale for Implementing IDMEF in XML .................... 6
3.3 The Debar/Huang/Donahoo IDMEF Class Hierarchy .............. 7
4. Use of XML in the IDMEF ......................................... 8
4.1 The IDMEF Document Prolog .................................. 8
4.1.1 XML Declaration ...................................... 8
4.1.2 XML Document Type Definition (DTD) ................... 9
4.1.3 IDMEF DTD Formal Public Identifier ................... 10
4.1.4 IDMEF DTD Document Type Declaration .................. 10
4.2 Character Data Processing in XML and IDMEF ................. 11
4.2.1 Character Entity References .......................... 11
4.2.2 Character Code References ............................ 12
4.2.3 White Space Processing ............................... 12
4.3 Languages in XML and IDMEF ................................. 12
4.4 Unrecognized Tags in IDMEF Messages ........................ 13
4.5 Digital Signatures ......................................... 13
5. IDMEF Data Types ................................................ 14
6. Structure of an IDMEF Message ................................... 15
6.1 The IDMEF-Message Root Element ............................. 16
6.2 The Message Type Elements .................................. 16
6.2.1 Alert ................................................ 16
6.2.1.1 CorrelationAlert ............................... 18
6.2.1.2 OverflowAlert .................................. 18
6.2.1.3 ToolAlert ...................................... 19
6.2.2 Heartbeat ............................................ 19
6.3 Time Elements .............................................. 20
6.3.1 Time ................................................. 20
6.3.2 DetectTime ........................................... 21
6.3.3 AnalyzerTime ......................................... 21
6.4 High-Level Entity Identification Elements .................. 21
6.4.1 Analyzer ............................................. 21
6.4.2 Source ............................................... 22
6.4.3 Target ............................................... 22
6.5 Low-Level Entity Identification Elements ................... 23
6.5.1 Address .............................................. 23
6.5.2 Classification ....................................... 24
6.5.3 Node ................................................. 25
6.5.4 Process .............................................. 26
6.5.5 Service .............................................. 27
6.5.5.1 SNMPService .................................... 27
Curry Expires: January 5, 2001 [Page 2]
6.5.5.2 WebService ..................................... 28
6.5.6 User ................................................. 28
6.6 Simple Elements ............................................ 29
6.6.1 address .............................................. 29
6.6.2 alertid .............................................. 29
6.6.3 Arguments ............................................ 30
6.6.3.1 arg ............................................ 30
6.6.4 buffer ............................................... 30
6.6.5 cgi .................................................. 30
6.6.6 command .............................................. 30
6.6.7 community ............................................ 30
6.6.8 date ................................................. 30
6.6.9 dport ................................................ 31
6.6.10 Environment ......................................... 31
6.6.10.1 env ........................................... 31
6.6.11 gid ................................................. 31
6.6.12 group ............................................... 31
6.6.13 location ............................................ 31
6.6.14 method .............................................. 31
6.6.15 name ................................................ 32
6.6.16 netmask ............................................. 32
6.6.17 ntpstamp ............................................ 32
6.6.18 oid ................................................. 32
6.6.19 path ................................................ 32
6.6.20 pid ................................................. 32
6.6.21 portlist ............................................ 32
6.6.22 program ............................................. 32
6.6.23 protocol ............................................ 33
6.6.24 serial .............................................. 33
6.6.25 size ................................................ 33
6.6.26 sport ............................................... 33
6.6.27 time ................................................ 33
6.6.28 url ................................................. 33
6.6.29 uid ................................................. 33
6.7 Providing Additional Information ........................... 33
6.7.1 AdditionalData ....................................... 34
7. Examples ........................................................ 34
7.1 Denial of Service Attacks .................................. 35
7.1.1 The "teardrop" Attack ................................ 35
7.1.2 The "ping of death" Attack ........................... 36
7.2 Port Scanning Attacks ...................................... 37
7.2.1 Connection to a Disallowed Service ................... 37
7.2.2 Simple Port Scanning ................................. 38
7.3 Local Attacks .............................................. 39
7.3.1 The "loadmodule" Attack .............................. 39
7.3.2 The "phf" Attack ..................................... 41
7.4 System Policy Violation .................................... 42
7.5 Correlated Alerts .......................................... 44
7.6 Heartbeat .................................................. 45
8. Extending the IDMEF ............................................. 45
8.1 Extending an Existing Attribute ............................ 46
8.2 Adding an Attribute ........................................ 47
8.3 Adding an Element .......................................... 48
9. The IDMEF Document Type Definition .............................. 48
10. Security Considerations ........................................ 60
11. References ..................................................... 61
12. Acknowledgments ................................................ 62
13. Author's Address ............................................... 62
2. Conventions used in this document
The key words "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT,"
"SHOULD," "SHOULD NOT," "RECOMMENDED," "MAY," and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [5].
An "IDMEF application" is a program or program component that reads
and/or writes messages in the format specified by this memo.
An "IDMEF document" is a message that adheres to the requirements
specified by this memo, and that is exchanged by two or more IDMEF
applications. An "IDMEF message" is another term for an "IDMEF
document."
3. Introduction
The Intrusion Detection Message Exchange Format (IDMEF) [2] is
intended to be a standard data format that automated intrusion
detection systems can use to report alerts about events that they
have deemed suspicious. The development of this standard format will
enable interoperability among commercial, open source, and research
systems, allowing users to mix and match the deployment of these
systems according to their strong and weak points to obtain an
optimal implementation.
The most obvious place to implement the IDMEF is in the data channel
between an intrusion detection "analyzer" (or "sensor") and the
"manager" (or "console") to which it sends alarms. But there are
other places where the IDMEF can be useful:
+ a single database system that could store the results from a
variety of intrusion detection products would make it possible for
data analysis and reporting activities to be performed on "the
whole picture" instead of just a part of it;
+ an event correlation system that could accept alerts from a
variety of intrusion detection products would be capable of
performing more sophisticated cross-correlation and cross-
confirmation calculations than one that is limited to a single
product;
+ a graphical user interface that could display alerts from a
variety of intrusion detection products would enable the user to
monitor all of the products from a single screen, and require him
or her to learn only one interface, instead of several; and
+ a common data exchange format would make it easier for different
organizations (users, vendors, response teams, law enforcement) to
not only exchange data, but also communicate about it.
The diversity of uses for the IDMEF needs to be considered when
selecting its method of implementation.
3.1 The Extensible Markup Language
The Extensible Markup Language (XML) [3] is a simplified version of
the Standard Generalized Markup Language (SGML), a text markup syntax
defined by the ISO 8879 standard. XML is gaining widespread
attention as a language for representing and exchanging documents and
data on the Internet, and as the solution to most of the problems
inherent in HyperText Markup Language (HTML). XML was published as a
recommendation by the World Wide Web Consortium (W3C) on February 10,
1998.
XML is a metalanguage -- a language for describing other languages --
that enables an application to define its own markup. XML allows the
definition of customized markup languages for different types of
documents and different applications. This differs from HTML, in
which there is a fixed set of tags with preset meanings that must be
"adapted" for specialized uses. Both XML and HTML use tags
(identifiers delimited by '<' and '>') and attributes (of the form
"name='value'"). But where "<p>" always means "paragraph" in HTML,
it may mean "paragraph," "person," "price," or "platypus" in XML, or
it might have no meaning at all, depending on the particular
application.
The publication of XML was followed by the publication of a second
recommendation [6] by the World Wide Web Consortium, defining the use
of namespaces in XML documents. An XML namespace is a collection of
names, identified by a Uniform Resource Identifier (URI). It
allows documents of different types, that use tags with the same
names, to be merged with no confusion. When using namespaces, each
tag is identified with the namespace it comes from, allowing tags
from different namespaces with the same names to occur in the same
document. For example, a single document could contain both
"usa:football" and "europe:football" tags, each with different
meanings.
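
As a rough illustration of how a namespace-aware parser keeps
same-named tags apart, the following Python sketch parses a document
that mixes two namespaces. The namespace URIs and the element names
are hypothetical, chosen only to mirror the football example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, bound to the "usa" and "europe" prefixes.
DOC = """\
<sports xmlns:usa="http://example.com/ns/usa"
        xmlns:europe="http://example.com/ns/europe">
  <usa:football>oval ball</usa:football>
  <europe:football>round ball</europe:football>
</sports>
"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed tag to "{namespace-URI}local-name",
# so the two "football" tags remain distinct after parsing.
tags = [child.tag for child in root]
for tag in tags:
    print(tag)
```

The prefixes themselves carry no meaning; only the URI each prefix is
bound to identifies the namespace.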
In anticipation of the widespread use of XML namespaces, this memo
includes the definition of the URI to be used to identify the IDMEF
namespace.
3.2 Rationale for Implementing IDMEF in XML
XML-based applications are being used or developed for a wide variety
of uses, including electronic data interchange in a variety of
fields, financial data interchange, electronic business cards,
calendar and scheduling, enterprise software distribution, web "push"
technology, and markup languages for chemistry, mathematics, music,
molecular dynamics, astronomy, book and periodical publishing, web
publishing, weather observations, real estate transactions, and many
others.
XML's flexibility makes it a good choice for these applications; that
same flexibility makes it a good choice for implementing the IDMEF as
well. Other, more specific reasons for choosing XML to implement the
IDMEF are:
+ XML allows a custom language to be developed specifically for the
purpose of describing intrusion detection alerts. It also defines
a standard way to extend this language, either for later revisions
of this document ("standard" extensions), or for vendor-specific
use ("non-standard" extensions).
+ Software tools for processing XML documents are widely available,
in both commercial and open source forms. A variety of tools and
APIs for parsing and/or validating XML are available in a variety
of languages, including Java, C, C++, Tcl, Perl, Python, and GNU
Emacs Lisp. Widespread access to tools will make adoption of the
IDMEF by product developers easier, and hopefully, faster.
+ XML meets IDMEF Requirement 5.1, that message formats support full
internationalization and localization. The XML standard specifies
support for both the UTF-8 and UTF-16 encodings of ISO 10646
(Unicode), making IDMEF compatible with both one- and two-byte
character sets. XML also provides support for specifying, on a
per-element basis, the language in which the element's content is
written, making IDMEF easy to adapt to "Natural Language Support"
versions of a product.
+ XML meets IDMEF Requirement 5.2, that message formats must support
filtering and aggregation. XML's integration with XSL, a style
language, allows messages to be combined, discarded, and
rearranged.
+ Ongoing XML development projects, in the W3C and elsewhere, will
provide object-oriented extensions, database support, and other
useful features. If implemented in XML, the IDMEF immediately
gains these features as well.
+ XML is free, with no license, no license fees, and no royalties.
3.3 The Debar/Huang/Donahoo IDMEF Class Hierarchy
Debar, Huang and Donahoo [4] have proposed that intrusion detection
alerts in the IDMEF be represented by a class hierarchy. This
representation has several advantages:
+ it is flexible, and capable of describing alerts to arbitrary
levels of complexity;
+ it is compact, and allows applications to specify only the data
they know; and
+ it is easy to extend, and will allow vendors to provide additional
information about particular alerts.
This implementation follows the Debar/Huang/Donahoo model almost
exactly, with the following exceptions and restrictions:
+ XML tags have the names given to the various classes in the model,
with a few minor exceptions where changes were made to deal with
XML scoping rules or to increase consistency with the rest of the
implementation.
+ XML does not support "inheritance;" tags may only be used at the
level at which they are declared. Subclasses are implemented by
making the tags for those classes child tags of the tags for the
parent classes.
+ Some extensions have been made, represented by the following
elements: , , ,
, and .
These changes make little difference in the overall usefulness of the
Debar/Huang/Donahoo model, or XML as an implementation language.
4. Use of XML in the IDMEF
This section describes how some of XML's features and requirements
will impact the IDMEF.
4.1 The IDMEF Document Prolog
The "prolog" of an IDMEF document, the part that precedes everything
else, consists of the XML declaration and the document type
declaration.
4.1.1 XML Declaration
Every XML document (and therefore every IDMEF document) starts with
an XML declaration. The XML declaration specifies the version of XML
being used; it may also specify the character set being used.
The XML declaration looks like:
   <?xml version="1.0"?>
If a character encoding is specified, the declaration looks like:
   <?xml version="1.0" encoding="charset"?>
where "charset" is the name of the character encoding in use (see
section 4.2). If no encoding is specified, UTF-8 is assumed.
IDMEF documents being exchanged between IDMEF applications MUST begin
with an XML declaration, and MUST specify the XML version in use.
Specification of the encoding in use is RECOMMENDED.
IDMEF applications MAY choose to omit the XML declaration internally
to conserve space, adding it only when the message is sent to another
destination (e.g., a web browser). This practice is NOT RECOMMENDED
unless it can be accomplished without loss of each message's version
and encoding information.
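
The requirements above can be sketched in Python as a pair of helper
functions, one that writes an XML declaration and one that reads the
version and encoding back from the start of a message. The function
names and the regular expression are illustrative assumptions, not
part of this memo, and the pattern is deliberately simpler than the
full XML grammar (it ignores the optional "standalone" setting).

```python
import re

# Matches an XML declaration at the very start of a document and captures
# the version and the optional encoding.
XML_DECL = re.compile(
    r"""^<\?xml\s+version=["']([^"']+)["']"""
    r"""(?:\s+encoding=["']([A-Za-z][A-Za-z0-9._-]*)["'])?"""
    r"""[^?]*\?>"""
)


def make_declaration(encoding=None):
    """Return an XML 1.0 declaration, with the encoding if one is given."""
    if encoding is None:
        return '<?xml version="1.0"?>'
    return f'<?xml version="1.0" encoding="{encoding}"?>'


def read_declaration(document):
    """Return (version, encoding), or None if no declaration is present.

    A missing encoding is reported as UTF-8, the default described above.
    """
    match = XML_DECL.match(document)
    if match is None:
        return None
    version, encoding = match.groups()
    return version, encoding or "UTF-8"
```

An application that strips the declaration for internal storage could
keep the tuple returned by read_declaration() alongside the message,
so that the version and encoding are not lost.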
4.1.2 XML Document Type Definition (DTD)
The Document Type Definition (DTD) specifies the exact syntax of an
XML document. It defines the various tags that may be used in the
document, how the tags are related to each other, which tags are
mandatory and which are optional, and so forth.
The IDMEF Document Type Definition is listed in its entirety in
section 9.
It is expected that IDMEF applications will not normally include the
IDMEF DTD itself in their communications. Instead, the DTD will be
referenced in the document type declaration in the document entity
(see below). Such IDMEF documents will be well-formed and valid as
defined in [3].
Other IDMEF documents will be specified that do not include the
document prolog (e.g., entries in an IDMEF-format database). Such
IDMEF documents will be well-formed but not valid.
Generally, well-formedness implies that a document has a single
element that contains everything else (e.g., "<Book>"), and that all
the other elements nest nicely within each other without any
overlapping (e.g., a "chapter" does not start in the middle of
another "chapter").
Validity further implies that not only is the document well-formed,
but it also follows specific rules (contained in the Document Type
Definition) about which elements are "legal" in the document, how
those elements nest within other elements, and so on (e.g., a
"chapter" does not begin in the middle of a "title"). A document
cannot be valid unless it references a DTD (see Section 4.1.4).
XML processors are required to be able to parse any well-formed
document, valid or not. The purpose of validation is to make the
processing of that document (what's done with the data after it's
parsed) easier. Without validation, a document may contain elements
in nonsense order, elements "invented" by the author that the
processing application doesn't understand, and so forth.
IDMEF documents MUST be well-formed. IDMEF documents SHOULD be valid
whenever both possible and practical.
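
A minimal sketch of a well-formedness check follows, using the
non-validating parser in the Python standard library. Because that
parser does not validate against a DTD, the sketch covers only the
MUST above (well-formedness), not the SHOULD (validity). The function
name and the sample documents are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# One root element, with the chapters nested cleanly inside it.
nested = "<book><chapter>one</chapter><chapter>two</chapter></book>"

# The section starts inside the chapter but ends outside it, so the
# two elements overlap instead of nesting.
overlapping = "<book><chapter>one<section>two</chapter></section></book>"

print(is_well_formed(nested), is_well_formed(overlapping))
```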
4.1.3 IDMEF DTD Formal Public Identifier
The formal public identifier (FPI) for the Document Type Definition
described in this memo is:
"-//IETF//DTD RFCxxxx IDMEF v0.1//EN"
NOTE: The "RFCxxxx" text in the FPI value will be replaced
with the actual RFC number, if this memo is published
as an RFC.
This FPI MUST be used in the document type declaration within an XML
document referencing the DTD defined by this memo, as shown in the
following section.
4.1.4 IDMEF DTD Document Type Declaration
The document type declaration for an XML document referencing the DTD
defined by this memo will usually be specified in one of two ways.
In the first, the last component of the document type declaration is
the formal public identifier (FPI) specified in the previous section.
In the second, the last component of the document type declaration is
a URL that points to a copy of the Document Type Definition.
To be valid (see above), an XML document must contain a document type
declaration. However, this represents significant overhead to an
IDMEF application, both in the bandwidth it consumes as well as the
requirements it places on the XML parser (not only to parse the
declaration itself, but also to parse the DTD it references).
Implementors MAY decide, therefore, to have analyzers and managers
agree out-of-band on the particular document type definition they
will be using (the standard one as defined here, or one with
extensions), and then omit the document type declaration from IDMEF
messages. Great care must be taken in doing this, however, as the
manager may have to accept messages from analyzers using DTDs with
different sets of extensions.
4.2 Character Data Processing in XML and IDMEF
The XML standard requires that XML processors support the UTF-8 and
UTF-16 encodings of ISO 10646 (Unicode), making XML compatible with
both one- and two-byte character sets. While many XML processing
applications may support other character sets, only UTF-8 and UTF-16
can be relied upon from a portability viewpoint.
A document's XML declaration (see section 4.1.1) specifies the
character encoding to be used in the document, as follows:
   <?xml version="1.0" encoding="charset"?>
where "charset" is the name of the character set, as registered with
the Internet Assigned Numbers Authority (IANA), see [7].
Consistent with the XML standard, if no encoding is specified for an
IDMEF message, UTF-8 SHALL be assumed.
IDMEF applications MUST NOT use, and IDMEF messages MUST NOT be
encoded in, character encodings other than UTF-8 and UTF-16. Note
that since ASCII is a subset of UTF-8, it MAY be used to encode IDMEF
messages.
Per the XML standard, IDMEF documents encoded in UTF-16 MUST begin
with the Byte Order Mark described by ISO/IEC 10646 Annex E and
Unicode Appendix B (the "ZERO WIDTH NO-BREAK SPACE" character,
#xFEFF).
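
The encoding rules above can be sketched as follows. The helper names
are assumptions made for illustration. The sketch relies on the fact
that Python's "utf-16" codec writes a Byte Order Mark in front of the
encoded data.

```python
import codecs

# The only encodings the text above permits for IDMEF messages.
ALLOWED_ENCODINGS = {"utf-8", "utf-16"}


def encode_message(text, encoding="utf-8"):
    """Encode an IDMEF message, rejecting any other encoding."""
    if encoding.lower() not in ALLOWED_ENCODINGS:
        raise ValueError(f"encoding not permitted for IDMEF: {encoding}")
    return text.encode(encoding)


def starts_with_utf16_bom(data):
    """Return True if the bytes begin with a UTF-16 Byte Order Mark."""
    return data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


utf16_bytes = encode_message("<IDMEF-Message/>", "UTF-16")
utf8_bytes = encode_message("<IDMEF-Message/>")
```

A receiver could apply starts_with_utf16_bom() to the first two bytes
of a message to choose between UTF-16 and the UTF-8 default.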
4.2.1 Character Entity References
Within XML documents, certain characters have special meanings in
some contexts. To include the actual character itself in one of
these contexts, a special escape sequence, called an entity
reference, must be used.
The characters that sometimes need to be escaped, and their entity
references, are:
Character   Entity Reference
---------------------------------
    &           &amp;
    <           &lt;
    >           &gt;
    "           &quot;
    '           &apos;
It is RECOMMENDED that IDMEF applications use the entity reference
form whenever writing these characters in data, to avoid any
possibility of misinterpretation.
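
One way to follow this recommendation is sketched below, using the
escape() function from the Python standard library. That function
replaces "&", "<", and ">" on its own; the two quote characters are
passed in as extra replacements. The helper name escape_data and the
sample string are assumptions made for illustration.

```python
from xml.sax.saxutils import escape

# Extra replacements for the two quote characters.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_data(text):
    """Replace the five special characters with their entity references."""
    return escape(text, QUOTES)


escaped = escape_data('if (a < b && name == "x")')
print(escaped)
```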
4.2.2 Character Code References
Any character defined by the ISO/IEC 10646 standard may be included
in an XML document by the use of a character reference. A character
reference is started with the characters '&' and '#', and ended with
the character ';'. Between these characters, the character code for
the character is inserted.
If the character code is preceded by an 'x', it is interpreted in
hexadecimal (base 16); otherwise, it is interpreted in decimal (base
10). For instance, the ampersand (&) is encoded as &#38; or &#x26;,
and the less-than sign (<) is encoded as &#60; or &#x3C;.
Any one- or two-byte character specified in the Unicode standard can
be included in a document using this technique.
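
The two forms of character reference can be produced with a few lines
of Python, sketched here under the assumption that each call receives
exactly one character. The function names are illustrative only.

```python
import xml.etree.ElementTree as ET


def decimal_reference(char):
    """Return the decimal character reference for one character."""
    return f"&#{ord(char)};"


def hex_reference(char):
    """Return the hexadecimal character reference for one character."""
    return f"&#x{ord(char):X};"


# An XML parser turns either form back into the character itself.
element = ET.fromstring(
    f"<data>{decimal_reference('&')}{hex_reference('<')}</data>"
)
print(decimal_reference("&"), hex_reference("<"), element.text)
```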
4.2.3 White Space Processing
XML preserves white space by default. The XML processor passes all
white space characters to the application unchanged. This is much
different from HTML (and SGML), in which, although the space/no space
distinction is meaningful, the one space/many spaces distinction is
not.
XML allows tags to identify the importance of white space in their
content by using the "xml:space" attribute:
   <tag xml:space="action">
where "action" is either "default" or "preserve."
If "action" is "preserve," the application MUST treat all white space
in the tag's content as significant. If "action" is "default," the
application is free to do whatever it normally would with white space
in the tag's content.
The intent declared with the "xml:space" attribute is considered to
apply to all attributes and content of the element where it is
specified, unless overridden with an instance of "xml:space" on
another element within that content.
All IDMEF tags support the "xml:space" attribute.
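
The following short sketch shows how a parser delivers white space
and the "xml:space" attribute to an application. The element name
"buffer" is used only as a convenient sample. The attribute is
reported under the reserved XML namespace URI, which the parser
supplies without any declaration in the document.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix always maps to this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

element = ET.fromstring(
    '<buffer xml:space="preserve">  two  spaces  </buffer>'
)

# The parser hands the content to the application unchanged.
content = element.text

# The attribute value tells the application how to treat that content.
action = element.get(XML_SPACE)
print(repr(content), action)
```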
4.3 Languages in XML and IDMEF
XML allows tags to identify the language their content is written in
by using the "xml:lang" attribute:
   <tag xml:lang="langcode">
where "langcode" is a language tag as described in RFC 1766 [8].
The intent declared with the "xml:lang" attribute is considered to
apply to all attributes and content of the element where it is
specified, unless overridden with an instance of "xml:lang" on
another element within that content.
IDMEF applications SHOULD specify the language in which their
contents are encoded; in general this can be done by specifying the
"xml:lang" attribute for the top-level tag.
If no language is specified for an IDMEF message, English SHALL be
assumed.
All IDMEF tags support the "xml:lang" attribute.
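
The inheritance and default rules above can be sketched as a
recursive walk that passes the language in effect down to each child.
The function name is an assumption, the sample message is illustrative
rather than a complete IDMEF message, and "en" is used as the RFC 1766
tag for the English default.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix always maps to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
DEFAULT_LANG = "en"  # English is assumed when no language is specified


def languages(element, inherited=DEFAULT_LANG):
    """Yield (tag, language) for an element and all of its descendants."""
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from languages(child, lang)


# Illustrative fragment only; not a complete IDMEF message.
DOC = """\
<IDMEF-Message xml:lang="fr">
  <Alert>
    <name xml:lang="de">Beispiel</name>
  </Alert>
</IDMEF-Message>
"""

result = dict(languages(ET.fromstring(DOC)))
print(result)
```

Here the Alert element inherits "fr" from the top-level tag, while
the name element overrides it with "de".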
4.4 Unrecognized Tags in IDMEF Messages
On occasion, an IDMEF application may receive a well-formed, or even
well-formed and valid, IDMEF message containing tags that it does not
understand. The tags may be either:
+ Recognized as "legitimate" (a valid document), but the application
does not know the semantic meaning of the tag's content; or
+ Not recognized at all.
IDMEF applications MUST continue to process IDMEF messages that
contain unknown tags, provided that such messages meet the
well-formedness requirement of section 4.1.2. It is up to the
individual application to decide how to process any content from the
unknown tag(s).
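
One possible way to meet this requirement is sketched below: the
application handles the tags it knows and sets the others aside
instead of rejecting the message. The set of known tags, the function
name, and the "VendorExtension" tag are assumptions made for
illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: this application understands only these two message types.
KNOWN_TAGS = {"Alert", "Heartbeat"}


def process(message):
    """Split the children of the root into handled and skipped tag names."""
    root = ET.fromstring(message)  # raises ParseError if not well-formed
    handled, skipped = [], []
    for child in root:
        if child.tag in KNOWN_TAGS:
            handled.append(child.tag)
        else:
            skipped.append(child.tag)
    return handled, skipped


handled, skipped = process(
    "<IDMEF-Message><Alert/><VendorExtension/><Heartbeat/></IDMEF-Message>"
)
print(handled, skipped)
```

Only a message that fails the well-formedness check stops processing;
an unknown tag does not.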
4.5 Digital Signatures
The joint IETF/W3C XML Signature Working Group is currently working
to specify XML digital signature processing rules and syntax [12].
XML Signatures provide integrity, message authentication, and/or
signer authentication services for data of any type, whether located
within the XML that includes the signature or elsewhere.
The IDMEF requirements document assigns responsibility for message
integrity and authentication to the communications protocol, not the
message format. However, in situations where IDMEF messages are
exchanged over other, less secure protocols, or in cases where the
digital signatures must be archived for later use, the inclusion of
digital signatures within an IDMEF message itself may be desirable.
Specification of the use of digital signatures within IDMEF
messages is outside the scope of this document. However, use of the
XML Signature standard is RECOMMENDED if such functionality is
needed.
5. IDMEF Data Types
XML is a typeless language; everything is simply a stream of bytes,
and it is left to the application to extract meaning from them.
That being said, this specification makes the following rules to
allow interoperability in interpreting IDMEF documents:
1. Integer data MUST be encoded in either Base 10 or Base 16. Base
10 encoding uses the digits '0' through '9' and an optional sign
('+' or '-'). Base 16 encoding uses the digits '0' through '9'
and 'a' through 'f' (or their upper case equivalents), and is
preceded by the characters "0x". For example, the number one
hundred twenty-three would be encoded as "123" in Base 10, or
"0x7b" in Base 16.
2. Floating-point (real) data MUST be encoded in Base 10. The
encoding is that of the POSIX strtod() function: an optional sign
('+' or '-') followed by a non-empty sequence of digits
optionally containing a radix character, then an optional
exponent part. An exponent part consists of an 'e' or 'E',
followed by an optional sign, followed by one or more decimal
digits.
3. Character and character string data does not require quoting, as
the IDMEF tags provide that functionality.
3. Dates, as used in the <date> element, MUST be encoded as a
four-digit year, two-digit month, and two-digit day, separated by
dashes. The two-digit day and its corresponding dash MAY be
omitted to represent an entire month. For example, March 13,
2000 would be encoded as "2000-03-13", and December, 1999 would
be encoded as "1999-12".
4. Time of day, as used in the