Version 2.1 (draft), October 25, 1996
The Protein Data Bank (PDB) is an archive of experimentally determined three-dimensional structures of biological macromolecules, serving a global community of researchers, educators, and students. The archives contain atomic coordinates, bibliographic citations, primary and secondary structure information, as well as crystallographic structure factors and NMR experimental data.
Entries conforming to this format description have the following remark within them:
REMARK 4 XXXX COMPLIES WITH FORMAT V. 2.1, 25-OCT-1996
Entries released after October 25, 1996 will comply with this format. Conversion of older entries to this format will begin in the fall of 1996.
This Contents Guide was prepared through the efforts of all PDB staff members: J. Callaway, M. Cummings, B. Deroski, P. Esposito, A. Forman, P. Langdon, M. Libeson, J. McCarthy, J. Sikora, D. Xue; and especially E. Abola, F. Bernstein, N. Manning, R. Shea, D. Stampf, and J. Sussman. This document also includes significant contributions from the scientific community, whose members continually send us suggestions and comments regarding the contents and format of PDB entries.
The PDB is supported by a combination of Federal Government Agency funds and user fees. Support is provided by the U.S. National Science Foundation, the U.S. Public Health Service, National Institutes of Health, National Center for Research Resources, National Institutes of General Medical Sciences, National Library of Medicine, and the U.S. Department of Energy under contract DE-AC02-76CH00016.
Preface
Table of Contents
1. Introduction
2. Title Section
3. Primary Structure Section
4. Heterogen Section
5. Secondary Structure Section
6. Connectivity Annotation Section
7. Miscellaneous Features Section
8. Crystallographic and Coordinate Transformation Section
9. Coordinate Section
10. Connectivity Section
11. Bookkeeping Section
Appendix 1: Symmetry Operations
Appendix 2: Coordinate Systems and Transformations
Appendix 3: Atom Names
Appendix 4: Standard Residue Names and Abbreviations
Appendix 5: Formulas and Molecular Weights For Standard Residues
Appendix 6: Field Formats
Appendix 7: Order of Records
The PDB Contents Guide gives a complete and concise description of the contents of PDB coordinate entry files. This document will be helpful to several communities, assisting depositors in preparing their entries for deposition, guiding software and information resource developers, and helping users of PDB to understand the contents of coordinate entries. Finally, this format description is crucial in the effort to produce CIF-compliant data files from PDB entries.
List of changes/enhancements to PDB format as found in Contents Guide Version 2.1.
* MODRES records appear immediately following SEQRES. (The order was incorrectly stated in Version 2.0.)
* REMARK 3 has a new X-PLOR template to reflect the changes introduced by the recent release of X-PLOR 3.843.
* REMARK 3 will use the word NONE (for the attribute in the value-attribute pair) when the attribute is not applicable or when analysis options were chosen such that a value was not calculated. NULL will continue to be used to represent values not supplied by the depositor.
* COMPND and SOURCE have a few additional tokens.
* Some examples are enhanced, a few have been added.
* Language of the text has been improved in some places to help clarify the format.
List of important changes/enhancements to PDB format as found in Contents Guide Version 2.0.
* Columns 71 - 80 now contain data. They previously contained the PDB ID code and record serial number. These items may be generated using scripts available from the PDB.
Changes to ATOM/HETATM Records
* A segment identifier has been added to the coordinate records in columns 73 - 76. This allows unambiguous identification of regions of the chains and the relationship between them by specifying segments of molecules.
* The element symbol and charge now appear in columns 77 - 80 of the coordinate records.
* When temperature factors are provided, the tempFactor field (columns 61 - 66) always contains the isotropic B value, even when ANISOU records are provided.
* Insertion codes (column 27) are now defined as being alphabetic only.
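A minimal Python sketch of reading these trailing columns follows. It slices a coordinate record by 1-based column ranges and assumes the element symbol occupies columns 77 - 78 and the charge columns 79 - 80, since the text above gives only 77 - 80 for the two together; `parse_atom_tail` is a hypothetical name.

```python
def parse_atom_tail(line):
    """Return segment ID, element symbol, and charge from an ATOM/HETATM line.

    Assumes element in columns 77-78 and charge in 79-80 (1-based, inclusive).
    """
    padded = line.rstrip("\r\n").ljust(80)  # short lines are treated as blank-filled
    return {
        "segID": padded[72:76].strip(),    # columns 73-76
        "element": padded[76:78].strip(),  # columns 77-78 (assumed split)
        "charge": padded[78:80].strip(),   # columns 79-80 (assumed split)
    }
```

A line that stops before column 73 simply yields empty strings for all three fields.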
Changes to Other Records
* HELIX records now contain the length of the helix in columns 72 - 76.
* SSBOND records now state the symmetry operation needed to generate one of the residues of the disulfide bond, if necessary.
* Footnotes (FTNOTE) have been dropped.
* In CRYST1 records:
- The full international Hermann-Mauguin symbol is used,
e.g., P 1 21 1 instead of P 21.
- For a rhombohedral space group in the hexagonal setting,
the lattice type symbol used is H.
* A number of record types which previously contained free text have been restructured as follows:
- "Keyword: value" pairs have been introduced in certain records
such as COMPND and SOURCE to allow easier parsing.
- EXPDTA has been expanded and now appears in every PDB coordinate
entry.
- REMARK records have been restructured to allow easier parsing
and to bring more organization to these records.
New Record Types Added
* TITLE
* CAVEAT
* KEYWDS
* MODRES
* DBREF
* SEQADV
* HETNAM
* HETSYN
* LINK
* HYDBND
* SLTBRG
* CISPEP
For details on each of these changes, see the section of the associated record type in this document.
A number of changes are being proposed to the existing data format. We are presenting these changes here for consideration. In accordance with PDB's Format Change Policy, there will be an open sixty-day discussion period during which we will entertain comments and suggestions regarding these changes. Send comments to Enrique Abola (abola1@bnl.gov) or to Nancy Manning (oeder@bnl.gov). Discussion on the PDB Listserver is encouraged as well.
Changes being proposed here, if adopted, will not appear in released entries before March 31, 1997. A public announcement will be made some weeks prior to their appearance in released entries.
Methylene hydrogen atoms will be labeled as 2HX and 3HX where X is the remoteness indicator of the atom. For example, hydrogen atoms attached to C beta of an amino acid will be named 2HB and 3HB. Our current convention is to name these 1HB and 2HB. This change will make PDB more compliant with IUPAC recommendations.
The use of the shortened Hermann-Mauguin symbol for monoclinic crystals will be reinstated. This will be applied to crystals in the standard b-unique cell setting. Thus the space group symbol P 21 will be used instead of P 1 21 1. Crystals using other settings will be designated with the full international Hermann-Mauguin symbol (e.g., P 21 1 1).
Modified nucleic acids will be represented using the same rules that are used by the PDB for representing modified amino acids. We will assign a unique three-letter code for modified residues. For example, we will use BRU for brominated uridine rather than +U. In addition, all atoms belonging to the residue will be grouped together in the coordinate records. Our current practice is to list atoms that modify nucleotides after the TER record.
When a change is made to PDB format, the format version number, as found in the entry and in this Contents Guide, will be incremented to the next whole number. Changes to the format of PDB coordinate entry files will follow the Format Change Policy presented below and will be detailed in this Contents Guide. Beginning January 1997, the format of all PDB entries will be compliant with the current version of this Contents Guide.
Changes to the Contents Guide will be listed at the beginning in the What's New section and denoted by a fractional increase in the document version number. These changes may be of the following kinds:
* Correction of typographical errors.
* Changes to the language for clarity.
* Additions or changes to the examples for better representation of format issues.
* Addition of new rules (these do not change the format but help to clarify the semantics).
* Addition of tokens to specification lists, such as in COMPND and SOURCE records, that are needed to more fully describe the structure and its biological source.
* Enhancements to the refinement and experimental details templates in the REMARK records. These remarks are currently being reviewed by several people in the community, and PDB expects to increase the level of detail archived, such as for NMR studies.
* Addition of new sections that enhance and expand the document (these may include topics such as PDB to mmCIF cross references or insertion of relevant sections from the PDB Deposition Form).
The PDB will use the following protocol in making changes to the way PDB coordinate entries are represented and archived. The purpose of the new policy is to allow ample time for everyone to understand these changes and to assess their impact on existing programs. These modifications are necessary to address the changing needs of our users as well as the changing nature of the data that is archived.
1. Comments and suggestions will be solicited from the community on specific problems and data representation issues as they arise.
2. Proposed format changes will be disseminated through the PDB Listserver (pdb-l@pdb.pdb.bnl.gov) and PDB's Internet sites (WWW, FTP, and Gopher). They will also be summarized in the PDB Quarterly Newsletter.
3. A sixty-day discussion period will follow the announcement of proposed changes. Comments and suggestions must be received within this time period. Major changes which are not upwardly compatible will be allotted up to twice the standard amount of discussion time.
4. This sixty-day discussion period will be followed by a thirty-day period in which the PDB staff, the PDB Advisory Board, and the User Group Chair will evaluate and reconcile all suggestions. The final decision pertaining to the format change, which lies with the Advisory Board Chair, will then be officially announced via the PDB Listserver and PDB's Internet sites (WWW, FTP, and Gopher).
5. Implementation will follow official announcement of the format change. Major changes will not appear in PDB files earlier than sixty days after the announcement, allowing sufficient time to modify files and programs.
6. Changes will be released no more than twice a year, unless extraordinary circumstances require action. This will be done only in consultation with the Advisory Board and following the usual ninety-day discussion and evaluation period.
The PDB format has been in use since the late 1970s. A number of groups, including the mmCIF Committee, have been looking at ways to upgrade both the file content and the interchange format used by PDB. This is clearly needed due to changes in the data that PDB archives, the size of the database itself, and finally, to allow PDB to use more up-to-date methods for representing and storing biological data.
The PDB plans to be prudent and deliberate in making changes to the current PDB files in order to minimize the need to change existing programs. In particular, we will explore ways and means of ensuring that programs which read the current ATOM/HETATM records can continue to do so in the foreseeable future.
The PDB wishes to acknowledge Dr. Gerald Selzer of the National Science Foundation who urged us to formulate this policy.
Character Set
Only non-control ASCII characters, as well as the space and end-of-line indicator, appear in a PDB coordinate entry file. Namely:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
1234567890
` - = [ ] \ ; ' , . / ~ ! @ # $ % ^ & * ( ) _ + { } | : " < > ?
the space, and end-of-line. The end-of-line indicator is system-specific. Unix uses a line feed character; other systems may use a carriage return followed by a line feed.
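The character-set rule can be checked mechanically. This sketch (the name `has_valid_charset` is hypothetical) accepts letters, digits, the punctuation listed above, and the space, and tolerates either LF or CR+LF line ends; any other control character, such as a tab or form feed, is rejected. Python's `string.punctuation` contains exactly the 32 punctuation characters listed above.

```python
import string

# Letters, digits, the 32 listed punctuation characters, and the space.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")


def has_valid_charset(text):
    """True if every line of text uses only the allowed characters."""
    for line in text.split("\n"):
        if line.endswith("\r"):
            line = line[:-1]  # tolerate CR+LF line ends
        if not set(line) <= ALLOWED:
            return False
    return True
```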
Special Characters
Greek letters are spelled out, e.g., alpha, beta, gamma.
Bullets are represented as (DOT).
Right arrow is represented as -->.
Left arrow is represented as <--.
Superscripts are initiated and terminated by double equal signs, e.g., S==2+==.
Subscripts are initiated and terminated by single equal signs, e.g., F=c=.
If "=" is surrounded by at least one space on each side, then it is assumed to be an equal sign, e.g., 2 + 4 = 6.
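The superscript and subscript conventions can be decoded with two regular expressions, handling the double-equals form first so that it is not mistaken for two subscripts. The sketch below is illustrative: `decode_markup` and the `^{...}`/`_{...}` output notation are arbitrary choices. It treats `=` as markup only when a non-space character follows it, so an equal sign with spaces on both sides is left alone.

```python
import re

# ==text== marks a superscript, e.g. S==2+== for S with superscript 2+.
_SUPERSCRIPT = re.compile(r"==([^=\s]+)==")
# =text= marks a subscript, e.g. F=c=. Requiring a non-space character right
# after the opening "=" leaves a spaced equal sign such as "2 + 4 = 6" alone.
_SUBSCRIPT = re.compile(r"=([^=\s]+)=")


def decode_markup(text):
    """Rewrite PDB super/subscript markup as ^{...} and _{...}."""
    text = _SUPERSCRIPT.sub(r"^{\1}", text)  # must run before subscripts
    return _SUBSCRIPT.sub(r"_{\1}", text)
```

For example, `decode_markup("S==2+==")` gives `S^{2+}` and `decode_markup("F=c=")` gives `F_{c}`.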
Commas, colons, and semi-colons are used as list delimiters in records which have one of the following data types:
List
SList
Specification List
Specification
If a comma, colon, or semi-colon is used in any context other than as a delimiting character, then the character must be escaped, i.e., immediately preceded by a backslash, "\". Examples of this use are found in line 4 of each of the following:
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: GLUTATHIONE SYNTHETASE;
COMPND   3 CHAIN: NULL;
COMPND   4 SYNONYM: GAMMA-L-GLUTAMYL-L-CYSTEINE\:GLYCINE LIGASE
COMPND   5 (ADP-FORMING);
COMPND   6 EC: 6.3.2.3;
COMPND   7 ENGINEERED: YES

COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: S-ADENOSYLMETHIONINE SYNTHETASE;
COMPND   3 CHAIN: A, B;
COMPND   4 SYNONYM: MAT, ATP\:L-METHIONINE S-ADENOSYLTRANSFERASE;
COMPND   5 EC: 2.5.1.6;
COMPND   6 ENGINEERED: YES;
COMPND   7 BIOLOGICAL_UNIT: TETRAMER;
COMPND   8 OTHER_DETAILS: TETRAGONAL MODIFICATION
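Because an escaped delimiter must not split a field, a parser has to look at the preceding character. This is a minimal sketch, assuming the COMPND text has already been reassembled into one String: it splits on unescaped semi-colons, splits each Specification on its first unescaped colon, and only then removes the backslashes. The function names are hypothetical. The sketch does not handle a backslash that itself needs escaping, since the rule above does not define one.

```python
import re


def _split_unescaped(text, delim, maxsplit=0):
    """Split text on delim, skipping delimiters preceded by a backslash."""
    return re.split(r"(?<!\\)" + re.escape(delim), text, maxsplit=maxsplit)


def _unescape(text):
    """Remove the backslash from escaped commas, colons, and semi-colons."""
    return re.sub(r"\\([,:;])", r"\1", text)


def parse_spec_list(text):
    """Return (token, value) pairs from a Specification list such as COMPND.

    A list of pairs is used rather than a dict because tokens such as
    MOL_ID repeat when an entry describes several molecules.
    """
    pairs = []
    for spec in _split_unescaped(text, ";"):
        if not spec.strip():
            continue  # tolerate a trailing semi-colon
        parts = _split_unescaped(spec, ":", maxsplit=1)
        token = parts[0].strip()
        value = _unescape(parts[1].strip()) if len(parts) > 1 else ""
        pairs.append((token, value))
    return pairs
```

Applied to the first example, the SYNONYM value comes back as `GAMMA-L-GLUTAMYL-L-CYSTEINE:GLYCINE LIGASE (ADP-FORMING)`, with the escaped colon preserved as text.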
Every PDB file may be broken into a number of lines terminated by an end-of-line indicator. Each line in the PDB entry file consists of 80 columns. The last character in each PDB entry should be an end-of-line indicator.
Each line in the PDB file is self-identifying. The first six columns of every line contain a record name, left-justified and blank-filled. This must be an exact match to one of the stated record names.
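Because the record name always sits in columns 1 - 6, dispatching on it needs only a slice. This sketch (hypothetical name `record_name`) strips the end-of-line indicator, enforces the 80-column limit, and removes the blank fill. Matching the result against the stated record names is left to the caller.

```python
def record_name(line):
    """Return the record name held in columns 1-6 of one line.

    Trailing line-end characters are removed first; lines longer than
    80 columns are rejected.
    """
    line = line.rstrip("\r\n")
    if len(line) > 80:
        raise ValueError(f"line has {len(line)} columns; at most 80 allowed")
    return line[:6].rstrip()  # names are left-justified and blank-filled
```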
The PDB file may also be viewed as a collection of record types. Each record type consists of one or more lines.
Each record type is further divided into fields.
Each record type is detailed in this document. The description of each record type includes the following sections:
* Overview
* Record Format
* Details
* Verification/Validation/Value Authority Control
* Relationship to Other Record Types
* Example
* Known Problems
For records that are fully described in fixed column format, columns not assigned to fields must be left blank.
It is possible to group records into categories based upon how often the record type appears in an entry.
Single
There are records which may only appear one time (without continuations) in a file. Listed alphabetically, these are:
RECORD TYPE DESCRIPTION
------------------------------------------------------------------------------
CRYST1 Unit cell parameters, space group, and Z.
END Last record in the file.
HEADER First line of the entry, contains PDB ID code,
classification, and date of deposition.
MASTER Control record for bookkeeping.
ORIGXn Transformation from orthogonal coordinates to the submitted
coordinates (n = 1, 2, or 3).
SCALEn Transformation from orthogonal coordinates to fractional
crystallographic coordinates (n = 1, 2, or 3).
It is an error for a duplicate of any of these records to appear in an entry.
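The rule that Single records never repeat is easy to check by counting record names. This sketch uses the hypothetical function `duplicated_single_records` and treats ORIGX1, ORIGX2, and ORIGX3 (and likewise SCALEn) as distinct record types, each of which may occur once.

```python
from collections import Counter

# Record types that may appear at most once; each ORIGXn and SCALEn
# (n = 1, 2, 3) is counted separately.
SINGLE_RECORDS = {"CRYST1", "END", "HEADER", "MASTER",
                  "ORIGX1", "ORIGX2", "ORIGX3",
                  "SCALE1", "SCALE2", "SCALE3"}


def duplicated_single_records(lines):
    """Return, sorted, the single-occurrence record types found more than once."""
    counts = Counter(line[:6].rstrip() for line in lines)
    return sorted(name for name in SINGLE_RECORDS if counts[name] > 1)
```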
Single Continued
There are records that conceptually exist only once in an entry, but the information content may exceed the number of columns available. These records are therefore continued on subsequent lines. Listed alphabetically, these are:
RECORD TYPE DESCRIPTION
-------------------------------------------------------------------------------
AUTHOR List of contributors.
CAVEAT Severe error indicator. Entries with this record must be
used with care.
COMPND Description of macromolecular contents of the entry.
EXPDTA Experimental technique used for the structure determination.
KEYWDS List of keywords describing the macromolecule.
OBSLTE Statement that the entry has been removed from distribution
and list of the ID code(s) which replaced it.
SOURCE Biological source of macromolecules in the entry.
SPRSDE List of entries withdrawn from release and replaced by
current entry.
TITLE Description of the experiment represented in the entry.
The second and subsequent lines contain a continuation field which is a right-justified integer. This number increments by one for each additional line of the record, and is followed by a blank character.
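The continuation rule can be verified line by line. The sketch below (the name `continuation_ok` is hypothetical) assumes the continuation field occupies columns 9 - 10, as it does in the OBSLTE, TITLE, and CAVEAT formats in this document, so the required blank falls in column 11.

```python
def continuation_ok(lines):
    """Check that the lines of one continued record are numbered correctly.

    The first line must have a blank continuation field; line k (k >= 2)
    must carry k right-justified in columns 9-10, followed by a blank.
    """
    for k, line in enumerate(lines, start=1):
        padded = line.rstrip("\r\n").ljust(80)
        field = padded[8:10]  # columns 9-10
        if k == 1:
            if field.strip():
                return False
        elif field != str(k).rjust(2) or padded[10] != " ":
            return False
    return True
```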
Multiple
Most record types appear multiple times, often in groups where the information is not logically concatenated but is presented in the form of a list. Many of these record types have a custom serialization that may be used not only to order the records, but also to connect to other record types. Listed alphabetically, these are:
RECORD TYPE DESCRIPTION
--------------------------------------------------------------------------------
ANISOU Anisotropic temperature factors.
ATOM Atomic coordinate records for standard groups.
CISPEP Identification of peptide residues in cis conformation.
CONECT Connectivity records.
DBREF Reference to the entry in the sequence database(s).
HELIX Identification of helical substructures.
HET Identification of non-standard groups or residues (heterogens).
HETSYN Synonymous compound names for heterogens.
HYDBND Identification of hydrogen bonds.
LINK Identification of inter-residue bonds.
MODRES Identification of modifications to standard residues.
MTRIXn Transformations expressing non-crystallographic symmetry
(n = 1, 2, or 3). There may be multiple sets of these records.
REVDAT Revision date and related information.
SEQADV Identification of conflicts between PDB and the named sequence
database.
SEQRES Primary sequence of backbone residues.
SHEET Identification of sheet substructures.
SIGATM Standard deviations of atomic parameters.
SIGUIJ Standard deviations of anisotropic temperature factors.
SITE Identification of groups comprising important sites.
SLTBRG Identification of salt bridges.
SSBOND Identification of disulfide bonds.
TURN Identification of turns.
TVECT Translation vector for infinite covalently connected
structures.
Multiple Continued
There are records that conceptually exist multiple times in an entry, but the information content may exceed the number of columns available. These records are therefore continued on subsequent lines. Listed alphabetically, these are:
RECORD TYPE DESCRIPTION
-------------------------------------------------------------------------------
FORMUL Chemical formula of non-standard groups.
HETATM Atomic coordinate records for heterogens.
HETNAM Compound name of the heterogens.
The second and subsequent lines contain a continuation field which is a right-justified integer. This number increments by one for each additional line of the record, and is followed by a blank character.
Grouping
There are three record types used to group other records. Listed alphabetically, these are:
RECORD TYPE DESCRIPTION
-------------------------------------------------------------------------------
ENDMDL End-of-model record for multiple structures in a single
coordinate entry.
MODEL Specification of model number for multiple structures in a
single coordinate entry.
TER Chain terminator.
The MODEL/ENDMDL records surround groups of ATOM, HETATM, SIGATM, ANISOU, SIGUIJ, and TER records. TER records indicate the end of a chain.
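Because MODEL and ENDMDL bracket each model's coordinate records, a reader can split a multi-model entry in one pass. This is a minimal sketch with the hypothetical name `split_models`. It keeps only the record types named above and treats an entry without MODEL records as a single model.

```python
# Record types that MODEL/ENDMDL may surround.
COORD_RECORDS = {"ATOM", "HETATM", "SIGATM", "ANISOU", "SIGUIJ", "TER"}


def split_models(lines):
    """Group coordinate lines into one list per model."""
    models, current = [], []
    for line in lines:
        name = line[:6].rstrip()
        if name == "MODEL":
            current = []              # start a new model
        elif name == "ENDMDL":
            models.append(current)    # close the current model
            current = []
        elif name in COORD_RECORDS:
            current.append(line)
    if not models and current:       # no MODEL/ENDMDL: one implicit model
        models.append(current)
    return models
```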
Other
The remaining record types have a detailed inner structure. Listed alphabetically, these are:
RECORD TYPE DESCRIPTION
------------------------------------------------------------------------------
JRNL Literature citation that defines the coordinate set.
REMARK General remarks, some are structured and some are free form.
All records in a PDB coordinate entry must appear in a defined order. Mandatory record types are present in all entries. When mandatory data are not provided, the record name must appear in the entry with a NULL indicator. Optional items become mandatory when certain conditions exist. Record order and existence are described in the following table:
RECORD TYPE EXISTENCE CONDITIONS IF OPTIONAL
-------------------------------------------------------------------------------
HEADER Mandatory
OBSLTE Optional Mandatory in withdrawn entries.
TITLE Mandatory
CAVEAT Optional Mandatory if structure is deemed
incorrect by an outside editorial board.
COMPND Mandatory
SOURCE Mandatory
KEYWDS Mandatory
EXPDTA Mandatory
AUTHOR Mandatory
REVDAT Mandatory
SPRSDE Optional Mandatory if a replacement entry.
JRNL Optional Mandatory if a publication describes
the experiment.
REMARK 1 Optional
REMARK 2 Mandatory
REMARK 3 Mandatory
REMARK N Optional Mandatory under certain conditions, as
noted in the remark descriptions.
DBREF Optional Mandatory for each peptide chain with a
length greater than ten (10) residues,
and for nucleic acid entries that exist
in the Nucleic Acid Database (NDB).
SEQADV Optional Mandatory if sequence conflict exists.
SEQRES Optional Mandatory if ATOM records exist.
MODRES Optional Mandatory if modified group exists
within the coordinates.
HET Optional Mandatory if non-standard group other
than water appears in the entry.
HETNAM Optional Mandatory if non-standard group other
than water appears in the entry.
HETSYN Optional
FORMUL Optional Mandatory if non-standard group or
water appears.
HELIX Optional
SHEET Optional
TURN Optional
SSBOND Optional Mandatory if disulfide bond is present.
LINK Optional
HYDBND Optional
SLTBRG Optional
CISPEP Optional
SITE Optional
CRYST1 Mandatory
ORIGX1 ORIGX2 ORIGX3 Mandatory
SCALE1 SCALE2 SCALE3 Mandatory
MTRIX1 MTRIX2 MTRIX3 Optional Mandatory if the complete asymmetric
unit must be generated from the given
coordinates using
non-crystallographic symmetry.
TVECT Optional
MODEL Optional Mandatory if more than one model
is present in the entry.
ATOM Optional Mandatory if standard residues exist.
SIGATM Optional
ANISOU Optional
SIGUIJ Optional
TER Optional Mandatory if ATOM records exist.
HETATM Optional Mandatory if non-standard group appears.
ENDMDL Optional Mandatory if MODEL appears.
CONECT Optional Mandatory if non-standard group
appears.
MASTER Mandatory
END Mandatory
Note that a PDB file existing outside of the PDB official release may contain locally-defined records beginning with "USER". The PDB reserves the right to add new record types (not beginning with "USER"), so programs which read PDB entries should be prepared to read (and ignore) other record types. PDB will follow standard procedures whenever format changes are proposed.
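The order table, together with the advice to skip unrecognized record types, suggests a simple order check. This sketch simplifies in two labeled ways. First, the coordinate-section records (MODEL through ENDMDL) share one rank, because they repeat and interleave (for example, an ANISOU record follows its ATOM record). Second, MTRIX1 - MTRIX3 share one rank so that repeated sets pass. Existence conditions (Mandatory/Optional) are not checked, and `first_out_of_order` is a hypothetical name.

```python
# Record groups in the order given by the table above.
_SEQUENCE = [
    ["HEADER"], ["OBSLTE"], ["TITLE"], ["CAVEAT"], ["COMPND"], ["SOURCE"],
    ["KEYWDS"], ["EXPDTA"], ["AUTHOR"], ["REVDAT"], ["SPRSDE"], ["JRNL"],
    ["REMARK"], ["DBREF"], ["SEQADV"], ["SEQRES"], ["MODRES"], ["HET"],
    ["HETNAM"], ["HETSYN"], ["FORMUL"], ["HELIX"], ["SHEET"], ["TURN"],
    ["SSBOND"], ["LINK"], ["HYDBND"], ["SLTBRG"], ["CISPEP"], ["SITE"],
    ["CRYST1"], ["ORIGX1"], ["ORIGX2"], ["ORIGX3"],
    ["SCALE1"], ["SCALE2"], ["SCALE3"],
    ["MTRIX1", "MTRIX2", "MTRIX3"],  # sets of three may repeat
    ["TVECT"],
    ["MODEL", "ATOM", "SIGATM", "ANISOU", "SIGUIJ", "TER", "HETATM", "ENDMDL"],
    ["CONECT"], ["MASTER"], ["END"],
]
RANK = {name: i for i, group in enumerate(_SEQUENCE) for name in group}


def first_out_of_order(lines):
    """Return the index of the first line that breaks record order, or None.

    Record types not in the table (e.g. USER records or types added
    later) are skipped rather than treated as errors.
    """
    last = -1
    for i, line in enumerate(lines):
        rank = RANK.get(line[:6].rstrip())
        if rank is None:
            continue
        if rank < last:
            return i
        last = rank
    return None
```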
Sections of an Entry
The following table lists the various sections of a PDB coordinate entry and the records comprising them:
SECTION DESCRIPTION RECORD TYPE
--------------------------------------------------------------------------------
Title Summary descriptive remarks HEADER, OBSLTE, TITLE,
CAVEAT, COMPND, SOURCE,
KEYWDS, EXPDTA, AUTHOR,
REVDAT, SPRSDE, JRNL
Remark Bibliography, refinement, REMARKs 1, 2, 3 and others
annotations
Primary structure Peptide and/or nucleotide DBREF, SEQADV, SEQRES, MODRES
sequence and the
relationship between the PDB
sequence and that found in
the sequence database(s)
Heterogen Description of non-standard HET, HETNAM, HETSYN, FORMUL
groups
Secondary structure Description of secondary HELIX, SHEET, TURN
structure
Connectivity Chemical connectivity SSBOND, LINK, HYDBND,
annotation SLTBRG, CISPEP
Miscellaneous Features within the SITE
features macromolecule
Crystallographic Description of the CRYST1
crystallographic cell
Coordinate Coordinate transformation ORIGXn, SCALEn,
transformation operators MTRIXn, TVECT
Coordinate Atomic coordinate data MODEL, ATOM, SIGATM, ANISOU,
SIGUIJ, TER, HETATM, ENDMDL
Connectivity Chemical connectivity CONECT
Bookkeeping Summary information, MASTER, END
end-of-file marker
The above information on Order of Records is repeated as Appendix 7.
Each record type is presented in a table which contains the division of the records into fields by column number, defined data type, field name or a quoted string which must appear in the field, and field definition. Any column not specified must be left blank.
Each field contains an identified data type which can be validated by a program. These are:
DATA TYPE DESCRIPTION
----------------------------------------------------------------------------------
AChar An alphabetic character (A-Z, a-z).
Atom Atom name that follows the naming rules in Appendix 3.
Character Any non-control character in the ASCII character set or a
space.
Continuation A two-character field that is either blank (for the first
record of a set) or contains a two digit number
right-justified and blank-filled which counts continuation
records starting with 2. The continuation number must be
followed by a blank.
Date A 9-character string in the form DD-MMM-YY, where DD is the
day of the month, zero-filled on the left (e.g., 04); MMM is
the common English 3-letter abbreviation of the month; and
YY is a year in the 20th century. This must represent a
valid date.
IDcode A PDB identification code which consists of 4 characters,
the first of which is a digit in the range 0 - 9; the
remaining 3 are alpha-numeric, and letters are upper case
only. Entries with a 0 as the first character do not
contain coordinate data.
Integer Right-justified blank-filled integer value.
Token A sequence of non-space characters followed by a colon and a
space.
List A String that is composed of text separated with commas.
LString A literal string of characters. All spacing is significant
and must be preserved.
LString(n) An LString with exactly n characters.
Real(n,m) Real (floating point) number in the FORTRAN format Fn.m.
Record name The name of the record: 6 characters, left-justified and
blank-filled.
Residue name One of the standard amino acids or nucleic acids, as listed
below, or the non-standard group designation as defined in
the HET dictionary. Field is right-justified.
SList A String that is composed of text separated with semi-colons.
Specification A String composed of a token and its associated value
separated by a colon.
Specification A sequence of Specifications, separated by semi-colons.
list
String A sequence of characters. These characters may have
arbitrary spacing, but should be interpreted as directed
below.
String(n) A String with exactly n characters.
SymOP An integer field of from 4 to 6 digits, right-justified, of
the form nnnMMM where nnn is the symmetry operator number and
MMM is the translation vector. See details in Appendix 1.
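The SymOP layout nnnMMM can be split by position. In this sketch (hypothetical `decode_symop`), the last three digits are the translation and the leading digits are the operator number. It assumes the usual PDB convention that each translation digit is offset by 5, so 555 means no translation; Appendix 1 is the authority on this.

```python
def decode_symop(field):
    """Split a SymOP field (nnnMMM) into operator number and translation.

    Assumes each digit of MMM is offset by 5 (555 = no translation).
    """
    digits = field.strip()  # the field is right-justified
    if not digits.isdigit() or not 4 <= len(digits) <= 6:
        raise ValueError(f"not a SymOP field: {field!r}")
    op = int(digits[:-3])
    translation = tuple(int(d) - 5 for d in digits[-3:])
    return op, translation
```

Under this assumption, `3654` decodes to operator 3 with a translation of (+1, 0, -1) unit cells.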
To interpret a String, concatenate the contents of all continued fields together, collapse all sequences of multiple blanks to a single blank, and remove any leading and trailing blanks. This permits very long strings to be properly reconstructed.
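The String rule translates directly into code. This sketch assumes the caller passes every line of the record and the 1-based, inclusive column range of the String field (11 - 70 for TITLE); `reconstruct_string` is a hypothetical name. Slicing the blank-filled field keeps the padding, so words on adjacent lines stay separated until blanks are collapsed.

```python
def reconstruct_string(lines, first_col, last_col):
    """Rebuild a String field that may be continued over several lines.

    first_col and last_col are 1-based and inclusive, e.g. 11 and 70 for
    TITLE. Each line is blank-filled to 80 columns before slicing.
    """
    pieces = [line.rstrip("\r\n").ljust(80)[first_col - 1:last_col]
              for line in lines]
    joined = "".join(pieces)          # concatenate all continued fields
    return " ".join(joined.split())   # collapse blank runs, trim both ends
```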
The above information about field formats is repeated as Appendix 6.
Residue Names
Standard residue names used in PDB entries:
RESIDUE TYPE RESIDUE NAME
----------------------------------------------------------------------------------
Amino acids ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS,
MET, PHE, PRO, SER, THR, TRP, TYR, VAL, ASX, GLX
Nucleic acids A, C, G, T, U, I, +A, +C, +G, +T, +U, +I
Other UNK (unknown)
See Appendix 4 for more information on the standard residue names and abbreviations, and Appendix 5 for their chemical formulas and molecular weights.
This section contains records used to describe the experiment and the biological macromolecules present in the entry: HEADER, OBSLTE, TITLE, CAVEAT, COMPND, SOURCE, KEYWDS, EXPDTA, AUTHOR, REVDAT, SPRSDE, JRNL, and REMARK records.
Overview
The HEADER record uniquely identifies a PDB entry through the idCode field. This record also provides a classification for the entry. Finally, it contains the date the coordinates were deposited at the PDB.
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
----------------------------------------------------------------------------------
1 - 6 Record name "HEADER"
11 - 50 String(40) classification Classifies the molecule(s)
51 - 59 Date depDate Deposition date. This is the date
the coordinates were received by
the PDB
63 - 66 IDcode idCode This identifier is unique within PDB
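Since every HEADER field has fixed columns, parsing is a matter of slicing. This is a minimal sketch with the hypothetical name `parse_header`. Python slices are 0-based and end-exclusive, so columns 11 - 50 become `[10:50]`.

```python
def parse_header(line):
    """Slice a HEADER record into its fields (1-based inclusive columns)."""
    padded = line.rstrip("\r\n").ljust(80)
    if padded[0:6] != "HEADER":
        raise ValueError("not a HEADER record")
    return {
        "classification": padded[10:50].strip(),  # columns 11-50, left-justified
        "depDate": padded[50:59],                  # columns 51-59
        "idCode": padded[62:66],                   # columns 63-66
    }
```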
Details
* The classification string is left-justified and exactly matches one of a collection of strings. See the class list available from the WWW site. In the case of macromolecular complexes, the classification field must present a class for each macromolecule present. Due to the limited length of the classification field, strings must sometimes be abbreviated. In these cases, the full terms are given in KEYWDS.
* Classification may be based on function, metabolic role, molecule type, cellular location, etc. In the case of a molecule having a dual function, both may be presented here.
Verification/Validation/Value Authority Control
The verification program checks that the deposition date is a legitimate date and that the ID code is well-formed. PDB coordinate entry ID codes do not begin with 0, as this is used to identify the NOC files which are bibliographic only, not structural entries. The status and deposition date of an entry are checked against the PDB SYBASE tables, which provide a definitive list of existing ID codes.
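The two format checks described here can be sketched as follows. The function names are hypothetical. The date check assumes a 19YY year, as the Date data type specifies, and the ID check rejects codes beginning with 0 because those identify NOC files. Checking status against the PDB's own tables is outside the scope of a standalone sketch.

```python
import datetime
import re

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
_DATE = re.compile(r"([0-9]{2})-([A-Z]{3})-([0-9]{2})")
_ENTRY_ID = re.compile(r"[1-9][A-Z0-9]{3}")  # coordinate entries never start with 0


def is_valid_date(text):
    """True if text is DD-MMM-YY and names a real calendar date in 19YY."""
    m = _DATE.fullmatch(text)
    if not m or m.group(2) not in MONTHS:
        return False
    day = int(m.group(1))
    month = MONTHS.index(m.group(2)) + 1
    year = 1900 + int(m.group(3))
    try:
        datetime.date(year, month, day)  # raises ValueError for e.g. 30-FEB
    except ValueError:
        return False
    return True


def is_valid_entry_id(text):
    """True if text is a well-formed coordinate-entry ID code."""
    return _ENTRY_ID.fullmatch(text) is not None
```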
Relationships to Other Record Types
The classification found in HEADER also appears in KEYWDS, unabbreviated and in no strict order.
Example
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
HEADER    MUSCLE PROTEIN                          02-JUN-93   1MYS
HEADER    HYDROLASE (CARBOXYLIC ESTER)            08-APR-93   2PHI
HEADER    COMPLEX (LECTIN/TRANSFERRIN)            07-JAN-94   1LGB
Overview
OBSLTE appears in entries which have been withdrawn from distribution.
This record acts as a flag in an entry which has been withdrawn from the PDB's full release. It indicates which, if any, new entries have replaced the withdrawn entry.
The format allows for the case of multiple new entries replacing one existing entry.
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
----------------------------------------------------------------------------------
1 - 6 Record name "OBSLTE"
9 - 10 Continuation continuation Allows concatenation of multiple
records.
12 - 20 Date repDate Date that this entry was replaced.
22 - 25 IDcode idCode ID code of this entry.
32 - 35 IDcode rIdCode ID code of entry that replaced
this one.
37 - 40 IDcode rIdCode ID code of entry that replaced
this one.
42 - 45 IDcode rIdCode ID code of entry that replaced
this one.
47 - 50 IDcode rIdCode ID code of entry that replaced
this one.
52 - 55 IDcode rIdCode ID code of entry that replaced
this one.
57 - 60 IDcode rIdCode ID code of entry that replaced
this one.
62 - 65 IDcode rIdCode ID code of entry that replaced
this one.
67 - 70 IDcode rIdCode ID code of entry that replaced
this one.
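Because the eight rIdCode fields repeat every five columns, a loop can collect them across any continuation lines. This sketch uses the hypothetical name `parse_obslte` and reads repDate and idCode from the first line only.

```python
def parse_obslte(lines):
    """Return repDate, idCode, and all replacing ID codes from OBSLTE lines."""
    padded = [line.rstrip("\r\n").ljust(80) for line in lines]
    replaced_by = []
    for row in padded:
        # rIdCode fields start at columns 32, 37, ..., 67 (0-based 31, 36, ..., 66).
        for start in range(31, 67, 5):
            code = row[start:start + 4].strip()
            if code:
                replaced_by.append(code)
    return {
        "repDate": padded[0][11:20],  # columns 12-20
        "idCode": padded[0][21:25],   # columns 22-25
        "replacedBy": replaced_by,
    }
```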
Details
* It is PDB policy that only the primary author who submitted an entry has the authority to withdraw it. All withdrawn entries are available for research purposes. PDB should be contacted in cases where the withdrawn data are desired.
Verification/Validation/Value Authority Control
PDB staff add this record at the time an entry is removed from release.
Relationships to Other Record Types
None.
Example
         1         2         3         4         5         6         7
1234567890123456789012345678901234567890123456789012345678901234567890
OBSLTE     31-JAN-94 1MBP      2MBP
Overview
The TITLE record contains a title for the experiment or analysis that is represented in the entry. It should identify an entry in the PDB in the same way that a title identifies a paper.
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
----------------------------------------------------------------------------------
1 - 6 Record name "TITLE "
9 - 10 Continuation continuation Allows concatenation of multiple
records.
11 - 70 String title Title of the experiment.
Details
* The title of the entry is free text and should describe the contents of the entry and any procedures or conditions that distinguish this entry from similar entries. It presents an opportunity for the depositor to emphasize the underlying purpose of this particular experiment.
* Some items that may be included in TITLE are:
- Experiment type.
- Description of the mutation.
- The fact that only alpha carbon coordinates have been provided in the entry.
Verification/Validation/Value Authority Control
This record is free text so no verification of format is required. The title is supplied by the depositor, but PDB staff may exercise editorial judgment in consultation with depositors in assigning the title.
Relationships to Other Record Types
COMPND, SOURCE, EXPDTA, and REMARKs provide information that may also be found in TITLE. You may think of the title as describing the experiment, and the compound record as describing the molecule(s).
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
TITLE RHIZOPUSPEPSIN COMPLEXED WITH REDUCED PEPTIDE INHIBITOR
TITLE BETA-GLUCOSYLTRANSFERASE, ALPHA CARBON COORDINATES ONLY
TITLE NMR STUDY OF OXIDIZED THIOREDOXIN MUTANT (C62A,C69A,C73A)
TITLE 2 MINIMIZED AVERAGE STRUCTURE
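Continued free-text records such as TITLE are read back by taking the text field of each line, discarding trailing blanks, and joining the pieces with a space (the rule spelled out for JRNL TITL later in this guide). The Python sketch below applies that rule to columns 11-70. The function name `join_continued` is hypothetical, and stripping the leading blank that continuation lines carry in column 11 is an assumption of the sketch, made so that no double spaces appear.

```python
def join_continued(lines, start=11, end=70):
    """Concatenate the free-text field of a continued record.

    `start` and `end` are 1-based, inclusive column numbers (11-70 for
    TITLE). Each piece is stripped of surrounding blanks and the pieces are
    joined with a single space.
    """
    pieces = []
    for line in lines:
        text = line[start - 1:end].strip()
        if text:
            pieces.append(text)
    return " ".join(pieces)


# The third TITLE example above, laid out on its fixed columns.
title_lines = [
    "TITLE     NMR STUDY OF OXIDIZED THIOREDOXIN MUTANT (C62A,C69A,C73A)",
    "TITLE    2 MINIMIZED AVERAGE STRUCTURE",
]
print(join_continued(title_lines))
# NMR STUDY OF OXIDIZED THIOREDOXIN MUTANT (C62A,C69A,C73A) MINIMIZED AVERAGE STRUCTURE
```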
Overview
CAVEAT warns of severe errors in an entry. Use caution when using an entry containing this record.
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
----------------------------------------------------------------------------------
1 - 6 Record name "CAVEAT"
9 - 10 Continuation continuation Allows concatenation of multiple
records.
12 - 15 IDcode idCode PDB ID code of this entry.
20 - 70 String comment Free text giving the reason for the
CAVEAT.
Details
* PDB will add this record to incorrect entries that are not withdrawn from the set of released entries. This record will be used sparingly, and only after an external review has been made.
* Please note the CAVEAT will also be included in cases where PDB is unable to verify the transformation back to the crystallographic cell. In these cases, the molecular structure may still be correct.
Verification/Validation/Value Authority Control
CAVEAT will be added by the PDB to entries known to be incorrect.
Relationships to Other Record Types
REMARK 5 repeats the comment field of the CAVEAT record.
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
CAVEAT 1ABC THE CRYSTAL TRANSFORMATION IS IN ERROR BUT IS
CAVEAT 2 1ABC UNCORRECTABLE AT THIS TIME
Overview
The COMPND record describes the macromolecular contents of an entry. Each macromolecule found in the entry is described by a set of token: value pairs, and is referred to as a COMPND record component. Since the concept of a molecule is difficult to specify exactly, PDB staff may exercise editorial judgment in consultation with depositors in assigning these names.
For each macromolecular component, the molecule name, synonyms, number assigned by the Enzyme Commission (EC), and other relevant details are specified.
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
----------------------------------------------------------------------------------
1 - 6 Record name "COMPND"
9 - 10 Continuation continuation Allows concatenation of multiple
records.
11 - 70 Specification compound Description of the molecular
list components.
Details
* The compound record is a Specification list. The specifications, or tokens, that may be used are listed below:
TOKEN VALUE DEFINITION
---------------------------------------------------------------------------------
MOL_ID Numbers each component; also used in SOURCE to associate
the information.
MOLECULE Name of the macromolecule.
CHAIN Comma-separated list of chain identifier(s). "NULL" is
used to indicate a blank chain identifier.
FRAGMENT Specifies a domain or region of the molecule.
SYNONYM Comma-separated list of synonyms for the MOLECULE.
EC The Enzyme Commission number associated with the
molecule. If there is more than one EC number, they
are presented as a comma-separated list.
ENGINEERED Indicates that the molecule was produced using
recombinant technology or by purely chemical synthesis.
MUTATION Describes mutations from the wild type molecule.
BIOLOGICAL_UNIT If the MOLECULE functions as part of a larger
biological unit, the entire functional unit may be
described.
OTHER_DETAILS Additional comments.
* In the general case the PDB tends to reflect the biological/functional view of the molecule. For example, the hetero-tetramer hemoglobin molecule is treated as a discrete component in COMPND.
* In the case of synthetic molecules, e.g., hybrids, the description will be provided by the depositor.
* No specific rules apply to the ordering of the tokens, except that the occurrence of MOL_ID or FRAGMENT indicates that the subsequent tokens are related to that specific molecule or fragment of the molecule.
* Physical layout of these items may be altered by PDB staff to improve human readability of the COMPND record.
* Asterisks in nucleic acid names (in MOLECULE) are for ease of reading.
* When insertion codes are given as part of the residue name, they must be given within square brackets, e.g., H57[A]N. This might occur when listing residues in FRAGMENT, MUTATION, or OTHER_DETAILS.
* For multi-chain molecules, e.g., the hemoglobin tetramer, a comma-separated list of CHAIN identifiers is used.
* When non-blank chain identifiers occur in the entry, they must be specified.
* NULL is used to indicate blank chain identifiers. E.g., CHAIN: NULL, CHAIN: NULL, B, C.
* For enzymes, if no EC number has been assigned, "EC: NOT ASSIGNED" is used.
* ENGINEERED is followed either by "YES" or by a comment.
* For the token MUTATION, the following set of examples illustrates the conventions used by PDB to represent various types of mutations.
MUTATION TYPE        DESCRIPTION                               FORM
------------------------------------------------------------------------------
Simple substitution  His 57 replaced by Asn                    H57N
                     His 57A replaced by Asn, in chain C only  Chain C, H57[A]N
Insertion            His and Pro inserted before Lys 48        INS(HP-K48)
Deletion             Arg 141 of chains A and C deleted, not    Chain A, C, DEL(R141)
                     deleted in chain B
                     His 23 through Arg 26 deleted             DEL(23-26)
                     His 23C and Arg 26 deleted from chain B   Chain B, DEL(H23[C],R26)
                     only
* When there are more than ten mutations:
- All the mutations are listed in the SEQADV record.
- Some mutations may be listed in MUTATION in COMPND to highlight the most important ones, at the depositor's discretion.
* New tokens may be added by the PDB as needed.
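The simple-substitution form in the MUTATION table above (one-letter wild-type residue, sequence number, optional bracketed insertion code, one-letter replacement) is regular enough to decode mechanically. A minimal Python sketch, assuming one-letter residue codes and upper-case letter insertion codes; the function name and the dictionary keys are illustrative only, and the insertion, deletion, and chain-prefixed forms are deliberately left unparsed.

```python
import re

# Simple substitution: one-letter wild-type residue, sequence number,
# optional insertion code in square brackets, one-letter replacement.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)(?:\[([A-Z])\])?([A-Z])$")


def parse_substitution(code):
    """Decode a simple-substitution code such as H57N or H57[A]N.

    Returns None for anything else (INS(...), DEL(...), chain prefixes),
    which this sketch does not attempt to parse.
    """
    match = SUBSTITUTION.match(code.strip())
    if match is None:
        return None
    wild, number, icode, new = match.groups()
    return {"from": wild, "resSeq": int(number), "iCode": icode or "", "to": new}


print(parse_substitution("H57[A]N"))
print(parse_substitution("DEL(23-26)"))
```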
Verification/Validation/Value Authority Control
CHAIN must match the chain identifier(s) of the molecule(s). EC numbers are checked against the Enzyme Data Bank.
Relationships to Other Record Types
Each molecule given a MOL_ID in COMPND must be listed and given the biological source information in SOURCE. In the case of mutations, the SEQADV records present differences from the reference molecule. REMARK records may further describe the contents of the entry. See also Verification above.
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: HEMOGLOBIN;
COMPND 3 CHAIN: A, B, C, D;
COMPND 4 ENGINEERED: YES;
COMPND 5 MUTATION: CHAIN B, D, V1A;
COMPND 6 BIOLOGICAL_UNIT: HEMOGLOBIN EXISTS AS AN A1B1/A2B2
COMPND 7 TETRAMER;
COMPND 8 OTHER_DETAILS: DEOXY FORM
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: COWPEA CHLOROTIC MOTTLE VIRUS;
COMPND 3 CHAIN: A, B, C;
COMPND 4 SYNONYM: CCMV;
COMPND 5 MOL_ID: 2;
COMPND 6 MOLECULE: RNA (5'-(*AP*UP*AP*U)-3');
COMPND 7 CHAIN: D, F;
COMPND 8 ENGINEERED: YES;
COMPND 9 MOL_ID: 3;
COMPND 10 MOLECULE: RNA (5'-(*AP*U)-3');
COMPND 11 CHAIN: E;
COMPND 12 ENGINEERED: YES
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: HEVAMINE A;
COMPND 3 CHAIN: NULL;
COMPND 4 EC: 3.2.1.14, 3.2.1.17;
COMPND 5 OTHER_DETAILS: PLANT ENDOCHITINASE/LYSOZYME
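The token: value layout shown in these examples can be read back into one record per component. The Python sketch below is illustrative, not PDB software: the function name is hypothetical, and it assumes the list occupies columns 11-70, continuation text is joined with a single space, pairs are separated by ";", and no value itself contains ";". It groups only by MOL_ID (FRAGMENT sub-grouping is not handled) and maps "NULL" in CHAIN to a blank chain identifier.

```python
def parse_compnd(lines):
    """Group COMPND token: value pairs into one dictionary per MOL_ID."""
    # Columns 11-70 carry the specification list on every line.
    text = " ".join(line[10:70].strip() for line in lines)
    components = []
    for item in text.split(";"):
        token, sep, value = item.partition(":")
        if not sep:
            continue  # no "token:" in this piece, e.g. empty text after a final ';'
        token, value = token.strip(), value.strip()
        if token == "MOL_ID" or not components:
            components.append({})
        if token == "CHAIN":
            # "NULL" stands for a blank chain identifier.
            value = [" " if c.strip() == "NULL" else c.strip()
                     for c in value.split(",")]
        components[-1][token] = value
    return components


# The second COMPND example above, laid out on fixed columns.
ccmv = [
    "COMPND    MOL_ID: 1;",
    "COMPND   2 MOLECULE: COWPEA CHLOROTIC MOTTLE VIRUS;",
    "COMPND   3 CHAIN: A, B, C;",
    "COMPND   4 SYNONYM: CCMV;",
    "COMPND   5 MOL_ID: 2;",
    "COMPND   6 MOLECULE: RNA (5'-(*AP*UP*AP*U)-3');",
    "COMPND   7 CHAIN: D, F;",
    "COMPND   8 ENGINEERED: YES;",
    "COMPND   9 MOL_ID: 3;",
    "COMPND  10 MOLECULE: RNA (5'-(*AP*U)-3');",
    "COMPND  11 CHAIN: E;",
    "COMPND  12 ENGINEERED: YES",
]
for component in parse_compnd(ccmv):
    print(component["MOL_ID"], component["MOLECULE"], component["CHAIN"])
```

Joining continuation text before splitting is what lets a value such as BIOLOGICAL_UNIT in the first example run across two lines.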
Overview
The SOURCE record specifies the biological and/or chemical source of each biological molecule in the entry. Sources are described by both the common name and the scientific name, e.g., genus and species. Strain and/or cell-line for immortalized cells are given when they help to uniquely identify the biological entity studied.
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
----------------------------------------------------------------------------------
1 - 6 Record name "SOURCE"
9 - 10 Continuation continuation Allows concatenation of multiple
records.
11 - 70 Specification srcName Identifies the source of the
list macromolecule in a token: value
format.
Details
TOKEN VALUE DEFINITION
---------------------------------------------------------------------------------
MOL_ID Numbers each molecule. Same as appears in
COMPND.
SYNTHETIC Indicates a chemically-synthesized source.
FRAGMENT A domain or fragment of the molecule may be
specified.
ORGANISM_SCIENTIFIC Scientific name of the organism.
ORGANISM_COMMON Common name of the organism.
STRAIN Identifies the strain.
VARIANT Identifies the variant.
CELL_LINE The specific line of cells used in the
experiment.
ATCC American Type Culture Collection tissue
culture number.
ORGAN Organized group of tissues that carries on
a specialized function.
TISSUE Organized group of cells with a common
function and structure.
CELL Identifies the particular cell type.
ORGANELLE Organized structure within a cell.
SECRETION Identifies the secretion, such as saliva,
urine, or venom, from which the molecule was
isolated.
CELLULAR_LOCATION Identifies the location inside (or
outside) the cell.
PLASMID Identifies the plasmid containing the gene.
GENE Identifies the gene.
EXPRESSION_SYSTEM System used to express recombinant
macromolecules.
EXPRESSION_SYSTEM_STRAIN Strain of the organism in which the molecule
was expressed.
EXPRESSION_SYSTEM_VARIANT Variant of the organism used as the
expression system.
EXPRESSION_SYSTEM_CELL_LINE The specific line of cells used as the
expression system.
EXPRESSION_SYSTEM_ATCC_NUMBER Identifies the ATCC number of the expression
system
EXPRESSION_SYSTEM_ORGAN Specific organ which expressed the molecule.
EXPRESSION_SYSTEM_TISSUE Specific tissue which expressed the molecule.
EXPRESSION_SYSTEM_CELL Specific cell type which expressed the
molecule.
EXPRESSION_SYSTEM_ORGANELLE Specific organelle which expressed the
molecule.
EXPRESSION_SYSTEM_CELLULAR_LOCATION Identifies the location inside or outside
the cell which expressed the molecule.
EXPRESSION_SYSTEM_VECTOR_TYPE Identifies the type of vector used, i.e.,
plasmid, virus, or cosmid.
EXPRESSION_SYSTEM_VECTOR Identifies the vector used.
EXPRESSION_SYSTEM_PLASMID Plasmid used in the recombinant experiment.
EXPRESSION_SYSTEM_GENE Name of the gene used in recombinant
experiment.
OTHER_DETAILS Used to present information on the source
which is not given elsewhere.
* The srcName is a list of token: value pairs describing each biological component of the entry.
* As in COMPND, the order is not specified except that MOL_ID or FRAGMENT indicates subsequent specifications are related to that molecule or fragment of the molecule.
* Physical layout of these items may be altered by PDB staff to improve human readability of the SOURCE record.
* Only the relevant tokens need to appear in an entry.
* Molecules prepared by purely chemical synthetic methods are described by the specification SYNTHETIC followed by "YES" or an optional value, such as NON-BIOLOGICAL SOURCE or BASED ON THE NATURAL SEQUENCE. ENGINEERED must appear in the COMPND record.
* In the case of a chemically synthesized molecule using a biologically functional sequence (nucleic or amino acid), SOURCE reflects the biological origin of the sequence and COMPND reflects its synthetic nature by inclusion of the token ENGINEERED. The token SYNTHETIC appears in SOURCE.
* If made from a synthetic gene, ENGINEERED appears in COMPND and the expression system is described in SOURCE (SYNTHETIC does NOT appear in SOURCE).
* If the molecule was made using recombinant techniques, ENGINEERED appears in COMPND and the system is described in SOURCE.
* When multiple macromolecules appear in the entry, each MOL_ID, as given in the COMPND record, must be repeated in the SOURCE record along with the source information for the corresponding molecule.
* Hybrid molecules prepared by fusion of genes are treated as multi-molecular systems for the purpose of specifying the source. The token FRAGMENT is used to associate the source with its corresponding fragment.
- When necessary to fully describe hybrid molecules, tokens may appear more than once for a given MOL_ID.
- All relevant token: value pairs that taken together fully describe each fragment are grouped following the appropriate FRAGMENT.
- Descriptors relative to the full system appear before the FRAGMENT (see the third example below).
* ORGANISM_SCIENTIFIC provides the Latin genus and species. Virus names are listed as the scientific name.
* Cellular origin is described by giving cellular compartment, organelle, cell, tissue, organ, or body part from which the molecule was isolated.
* CELLULAR_LOCATION may be used to indicate where in the organism the compound was found. Examples are: extracellular, periplasmic, cytosol.
* Entries containing molecules prepared by recombinant techniques are described as follows:
- The expression system is described.
- The organism and cell location given are for the source of the gene used in the cloning experiment.
- Transgenic organisms, such as mouse producing human proteins, are treated as expression systems.
* For a theoretical modeling experiment, SOURCE describes the modeled compound just as though it were an experimental study.
* New tokens may be added by the PDB.
Verification/Validation/Value Authority Control
The biological source is compared to that found in the sequence database. Common and scientific names are checked against the "Annotated Classification of Source Organisms: PIR-International Protein Sequence Database" compiled by Andrzej Elzanowski and available from the PDB.
Relationships to Other Record Types
Each macromolecule listed in COMPND must have a corresponding source.
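The requirement that every MOL_ID in COMPND reappear in SOURCE lends itself to a mechanical check. A minimal Python sketch, assuming the token: value list occupies columns 11-70 of each line and MOL_ID values are plain integers; the function names and the two-component entry in the usage lines are hypothetical.

```python
import re

MOL_ID_PATTERN = re.compile(r"MOL_ID:\s*(\d+)")


def mol_ids(lines):
    """Collect the integer MOL_ID values from COMPND or SOURCE lines."""
    # Joining with a space lets a value that starts on the next line match.
    text = " ".join(line[10:70].strip() for line in lines)
    return {int(value) for value in MOL_ID_PATTERN.findall(text)}


def missing_sources(compnd_lines, source_lines):
    """Return the MOL_IDs present in COMPND but absent from SOURCE."""
    return sorted(mol_ids(compnd_lines) - mol_ids(source_lines))


# Hypothetical entry whose SOURCE omits MOL_ID 2.
compnd = ["COMPND    MOL_ID: 1;", "COMPND   2 MOLECULE: HEMOGLOBIN;",
          "COMPND   3 MOL_ID: 2;", "COMPND   4 MOLECULE: RNA (5'-(*AP*U)-3')"]
source = ["SOURCE    MOL_ID: 1;", "SOURCE   2 ORGANISM_SCIENTIFIC: GALLUS GALLUS"]
print(missing_sources(compnd, source))  # [2]
```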
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: AVIAN SARCOMA VIRUS;
SOURCE 3 STRAIN: SCHMIDT-RUPPIN B;
SOURCE 4 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE 5 EXPRESSION_SYSTEM_PLASMID: PRC23IN
SOURCE MOL_ID: 1;
SOURCE 2 ORGANISM_SCIENTIFIC: GALLUS GALLUS;
SOURCE 3 ORGANISM_COMMON: CHICKEN;
SOURCE 4 ORGAN: HEART;
SOURCE 5 TISSUE: MUSCLE
SOURCE MOL_ID: 1;
SOURCE 2 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE 3 EXPRESSION_SYSTEM_STRAIN: BE167;
SOURCE 4 FRAGMENT: RESIDUES 1-16;
SOURCE 5 ORGANISM_SCIENTIFIC: BACILLUS AMYLOLIQUEFACIENS;
SOURCE 6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
SOURCE 7 FRAGMENT: RESIDUES 17-214;
SOURCE 8 ORGANISM_SCIENTIFIC: BACILLUS MACERANS
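The grouping rule for hybrid molecules (system-wide descriptors first, then one group per FRAGMENT) can be illustrated on the third example above. The Python sketch below handles a single MOL_ID component and assumes columns 11-70, single-space joining of continuation text, and ";" between pairs; `group_by_fragment` is a hypothetical name.

```python
def group_by_fragment(lines):
    """Split one SOURCE component into system-level and per-fragment pairs.

    Pairs before the first FRAGMENT describe the full system; each FRAGMENT
    opens a new group. Each group is a list of (token, value) tuples, so a
    token repeated for different fragments is kept in both groups.
    """
    text = " ".join(line[10:70].strip() for line in lines)
    system, fragments = [], []
    for item in text.split(";"):
        token, sep, value = item.partition(":")
        if not sep:
            continue  # no "token:" in this piece
        pair = (token.strip(), value.strip())
        if pair[0] == "FRAGMENT":
            fragments.append([pair])
        elif fragments:
            fragments[-1].append(pair)
        else:
            system.append(pair)
    return system, fragments


# The third SOURCE example above, laid out on fixed columns.
hybrid = [
    "SOURCE    MOL_ID: 1;",
    "SOURCE   2 EXPRESSION_SYSTEM: ESCHERICHIA COLI;",
    "SOURCE   3 EXPRESSION_SYSTEM_STRAIN: BE167;",
    "SOURCE   4 FRAGMENT: RESIDUES 1-16;",
    "SOURCE   5 ORGANISM_SCIENTIFIC: BACILLUS AMYLOLIQUEFACIENS;",
    "SOURCE   6 EXPRESSION_SYSTEM: ESCHERICHIA COLI;",
    "SOURCE   7 FRAGMENT: RESIDUES 17-214;",
    "SOURCE   8 ORGANISM_SCIENTIFIC: BACILLUS MACERANS",
]
system, fragments = group_by_fragment(hybrid)
print(system)
for group in fragments:
    print(group)
```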
Overview
The KEYWDS record contains a set of terms relevant to the entry. Terms in the KEYWDS record provide a simple means of categorizing entries and may be used to generate index files. This record addresses some of the limitations found in the classification field of the HEADER record. It provides the opportunity to add further annotation to the entry in a concise and computer-searchable fashion.
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
---------------------------------------------------------------------------------
1 - 6 Record name "KEYWDS"
9 - 10 Continuation continuation Allows concatenation of records if
necessary.
11 - 70 List keywds Comma-separated list of keywords
relevant to the entry.
Details
* The KEYWDS record contains a list of terms relevant to the entry, similar to that found in journal articles. A phrase may be used if it presents a single concept (e.g., reaction center). Terms provided in this record may include those that describe the following:
- Functional classification.
- Metabolic role.
- Known biological or chemical activity.
- Structural classification.
* Other classifying terms may be used. No ordering is required for these terms. A number of PDB entries contain complexes of macromolecules. In these cases, all terms applicable to each molecule should be provided.
* Note that the terms in the KEYWDS record duplicate those found in the classification field of the HEADER record. Terms abbreviated in the HEADER record are unabbreviated in KEYWDS, and the parentheses used in HEADER are optional in KEYWDS.
Verification/Validation/Value Authority Control
Terms used in the KEYWDS record are subject to scientific and editorial review. A list of terms, definitions, and synonyms will be maintained at the PDB. Every attempt will be made to provide some level of consistency with keywords used in other biological databases.
Relationships to Other Record Types
HEADER records contain a classification term which must also appear in KEYWDS. Scientific judgment will dictate when terms used in one entry to describe a molecule should be included in other entries with the same or similar molecules.
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
KEYWDS LYASE, TRICARBOXYLIC ACID CYCLE, MITOCHONDRION, OXIDATIVE
KEYWDS 2 METABOLISM
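Because a term may continue onto the next line (OXIDATIVE / METABOLISM in the example above), the lines should be joined before the list is split on commas. A minimal Python sketch, assuming the list occupies columns 11-70 and continuation text is joined with a single space; `parse_keywords` is a hypothetical name.

```python
def parse_keywords(lines):
    """Return the KEYWDS terms as a list of strings."""
    # Join first so that a term split across lines is kept whole.
    text = " ".join(line[10:70].strip() for line in lines)
    return [term.strip() for term in text.split(",") if term.strip()]


example = [
    "KEYWDS    LYASE, TRICARBOXYLIC ACID CYCLE, MITOCHONDRION, OXIDATIVE",
    "KEYWDS   2 METABOLISM",
]
print(parse_keywords(example))
```

Comparing the result with the HEADER classification would also require expanding HEADER abbreviations, which this sketch does not attempt.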
Overview
The EXPDTA record identifies the experimental technique used. This may refer to the type of radiation and sample, or include the spectroscopic or modeling technique. Permitted values include:
ELECTRON DIFFRACTION
FIBER DIFFRACTION
FLUORESCENCE TRANSFER
NEUTRON DIFFRACTION
NMR
THEORETICAL MODEL
X-RAY DIFFRACTION
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "EXPDTA"
9 - 10 Continuation continuation Allows concatenation of multiple
records.
11 - 70 SList technique The experimental technique(s) with
optional comment describing the
sample or experiment.
Details
* EXPDTA is mandatory and appears in all entries.
* The technique must match one of the permitted values. See above.
* If more than one model appears in the entry, the number of models included must be stated.
* If only one model appears in the entry, its significance must be stated, such as it being a minimized average or regularized mean structure.
* If more than one technique was used for the structure determination and is represented in the entry, EXPDTA presents the techniques as a semicolon-separated list. Each technique may have a comment, which appears before the semicolon.
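These rules suggest a simple parse: split on semicolons, take the text before the first comma as the technique, and treat the remainder as its comment. The comma convention is inferred from the examples below rather than stated in the rules, and `parse_expdta` is a hypothetical name; this Python sketch is not the PDB verification program.

```python
PERMITTED = {
    "ELECTRON DIFFRACTION", "FIBER DIFFRACTION", "FLUORESCENCE TRANSFER",
    "NEUTRON DIFFRACTION", "NMR", "THEORETICAL MODEL", "X-RAY DIFFRACTION",
}


def parse_expdta(text):
    """Split an EXPDTA value into (technique, comment) pairs.

    Techniques are separated by ';'; anything after the first comma of a
    technique is treated as its comment. A technique that is not one of the
    permitted values raises ValueError.
    """
    pairs = []
    for entry in text.split(";"):
        technique, _, comment = entry.partition(",")
        technique = technique.strip()
        if technique not in PERMITTED:
            raise ValueError(f"technique not permitted: {technique!r}")
        pairs.append((technique, comment.strip()))
    return pairs


print(parse_expdta("NEUTRON DIFFRACTION; X-RAY DIFFRACTION"))
print(parse_expdta("FIBER DIFFRACTION, POLYCRYSTALLINE SAMPLE"))
```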
Verification/Validation/Value Authority Control
The verification program checks that the EXPDTA record appears in the entry and that the technique matches one of the allowed values. It also checks that the relevant standard REMARK is added in the case of NMR, fiber, or theoretical modeling studies, and that the correct CRYST1 and SCALE are used in these cases. If an entry contains multiple models, the verification program checks for the correct number of matching MODEL/ENDMDL records.
Relationships to Other Record Types
If the experiment is an NMR, fiber, or theoretical modeling study, this may be stated in the TITLE, and the appropriate EXPDTA and REMARK records should appear. Specific details of the data collection and experiment appear in the REMARKs.
In the case of a polycrystalline fiber diffraction study, CRYST1 and SCALE contain the normal unit cell data.
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
EXPDTA X-RAY DIFFRACTION
EXPDTA NEUTRON DIFFRACTION; X-RAY DIFFRACTION
EXPDTA NMR, 32 STRUCTURES
EXPDTA NMR, REGULARIZED MEAN STRUCTURE
EXPDTA THEORETICAL MODEL
EXPDTA FIBER DIFFRACTION, FIBER
EXPDTA FIBER DIFFRACTION, POLYCRYSTALLINE SAMPLE
Overview
The AUTHOR record contains the names of the people responsible for the contents of the entry.
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
----------------------------------------------------------------------------------
1 - 6 Record name "AUTHOR"
9 - 10 Continuation continuation Allows concatenation of multiple
records.
11 - 70 List authorList List of the author names, separated
by commas.
Details
* The authorList field lists author names separated by commas with no subsequent spaces.
* Representation of personal names:
- First and middle names are indicated by initials, each followed by a period, and precede the surname.
- Only the surname (family or last name) of the author is given in full.
- Hyphens can be used if they are part of the author's name.
- Apostrophes are allowed in surnames.
- The word Junior is not abbreviated.
- Umlauts and other character modifiers are not given.
* Structure of personal names:
- There is no space after any initial and its following period.
- Blank spaces are used in a name only if properly part of the surname (e.g., J.VAN DORN), or between surname and Junior, II, or III.
- Abbreviations that are part of a surname, such as St. or Ste., are followed by a period and a space before the next part of the surname.
* Representation of corporate names:
- Group names used for one or all of the authors should be spelled out in full.
- The name of the larger group comes before the name of a subdivision, e.g., University of Somewhere Department of Chemistry.
* Structure of list:
- Line breaks between multiple lines in the authorList occur only after a comma.
- Personal names are not split across two lines.
* Special cases:
- Names are given in English if there is an accepted English version; otherwise in the native language, transliterated if necessary.
- "ET AL." may be used when not all authors are individually listed.
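Taken together, the list-structure rules (breaks only after commas, no space after a comma) and the name rules (initials with periods, then the surname) allow an authorList to be split back into names. A minimal Python sketch, assuming columns 11-70 and upper-case text as in the examples below; the function name and the initials pattern are illustrative, not a PDB-defined grammar.

```python
import re

# Leading initials: capital letters each followed by a period, possibly
# hyphenated as in "C.-I.".
INITIALS = re.compile(r"^((?:-?[A-Z]\.)+)(.+)$")


def split_author_list(lines):
    """Split AUTHOR lines (columns 11-70) into (initials, surname) tuples.

    Line breaks fall only after commas and no space follows a comma, so the
    line pieces are concatenated without adding a space. Entries without
    leading initials, such as "ET AL." or a group name, get empty initials.
    """
    text = "".join(line[10:70].strip() for line in lines)
    authors = []
    for name in text.split(","):
        name = name.strip()
        if not name:
            continue
        match = INITIALS.match(name)
        if match:
            authors.append((match.group(1), match.group(2)))
        else:
            authors.append(("", name))
    return authors


example = [
    "AUTHOR    M.B.BERRY,B.MEADOR,T.BILDERBACK,P.LIANG,M.GLASER,",
    "AUTHOR   2 G.N.PHILLIPS JUNIOR,T.L.ST. STEVENS",
]
for initials, surname in split_author_list(example):
    print(initials, surname)
```

The greedy initials pattern stops at "ST." in "T.L.ST. STEVENS" because "S" is not followed by a period, so the abbreviation stays in the surname.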
Verification/Validation/Value Authority Control
The verification program checks that the authorList field is correctly formatted. It does not perform any spelling checks or name verification.
Relationships to Other Record Types
The format of the names in the AUTHOR record is the same as in JRNL and REMARK 1 references.
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
AUTHOR M.B.BERRY,B.MEADOR,T.BILDERBACK,P.LIANG,M.GLASER,
AUTHOR 2 G.N.PHILLIPS JUNIOR,T.L.ST. STEVENS
AUTHOR C.-I.BRANDEN,C.J.BIRKETT-CLEWS,L.RIVA DI SANSAVERINO
Overview
REVDAT records contain a history of the modifications made to an entry since its release.
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
----------------------------------------------------------------------------------
1 - 6 Record name "REVDAT"
8 - 10 Integer modNum Modification number.
11 - 12 Continuation continuation Allows concatenation of multiple
records.
14 - 22 Date modDate Date of modification (or release for
new entries). This is not repeated
on continuation lines.
24 - 28 String(5) modId Identifies this particular
modification. It links to the
archive used internally by PDB.
This is not repeated on continuation
lines.
32 Integer modType An integer identifying the type of
modification. In case of revisions
with more than one possible modType,
the highest value applicable will be
assigned.
40 - 45 LString(6) record Name of the modified record.
47 - 52 LString(6) record Name of the modified record.
54 - 59 LString(6) record Name of the modified record.
61 - 66 LString(6) record Name of the modified record.
Details
* Each time revisions are made to the entry, a modification number is assigned; modification numbers increase by 1 with each revision. REVDAT records appear in descending order (the most recent modification appears first). New entries have a REVDAT record with modNum equal to 1 and modType equal to 0. Allowed modTypes are:
0 Initial released entry.
1 Miscellaneous - mostly typographical.
2 Modification of a CONECT record.
3 Modification to coordinates or transformations.
4 - 9 Not defined.
* A single revision may span more than one REVDAT record; the continuation field numbers the records of each revision separately.
Verification/Validation/Value Authority Control
The modType must be one of the defined types, and the given record type must be valid. If modType is 0, the modId must match the entry's ID code in the HEADER record.
Relationships to Other Record Types
REMARK 860 presents the correction or change that is made to an entry. See also Verification above.
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
REVDAT 3 15-OCT-89 1PRCB 1 REMARK
REVDAT 2 19-APR-89 1PRCA 2 CONECT
REVDAT 1 09-JAN-89 1PRC 0
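The ordering and modType rules above can be checked directly on the fixed columns. The Python sketch below is a hypothetical checker, not the PDB verification program. It examines only the first line of each revision (blank continuation field), because modDate and modId are not repeated on continuation lines. It also assumes each of those lines carries modType in column 32.

```python
def col(line, start, end):
    """Return 1-based, inclusive columns start..end of a fixed-width line."""
    return line.ljust(80)[start - 1:end]


def check_revdat(lines, id_code):
    """List problems found in an entry's REVDAT records (sketch only)."""
    problems = []
    first_lines = [line for line in lines if not col(line, 11, 12).strip()]
    nums = [int(col(line, 8, 10)) for line in first_lines]
    # Most recent first, numbered consecutively down to 1.
    if nums != list(range(len(nums), 0, -1)):
        problems.append(f"modNum sequence {nums} is not n, n-1, ..., 1")
    for line in first_lines:
        mod_type = int(col(line, 32, 32))
        mod_id = col(line, 24, 28).strip()
        if mod_type not in (0, 1, 2, 3):
            problems.append(f"undefined modType {mod_type}")
        if mod_type == 0 and mod_id != id_code:
            problems.append(f"modId {mod_id} does not match {id_code}")
    return problems


# The example records above, rebuilt on their fixed columns.
rev3 = "REVDAT" + "   3" + "  " + " 15-OCT-89" + " 1PRCB" + "   1" + " " * 7 + "REMARK"
rev2 = "REVDAT" + "   2" + "  " + " 19-APR-89" + " 1PRCA" + "   2" + " " * 7 + "CONECT"
rev1 = "REVDAT" + "   1" + "  " + " 09-JAN-89" + " 1PRC " + "   0"
print(check_revdat([rev3, rev2, rev1], "1PRC"))  # []
```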
Overview
The SPRSDE records contain a list of the ID codes of entries that were made obsolete by the given coordinate entry and withdrawn from the PDB release set. One entry may replace many. It is PDB policy that only the principal investigator of a structure has the authority to withdraw it.
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
----------------------------------------------------------------------------------
1 - 6 Record name "SPRSDE"
9 - 10 Continuation continuation Allows for multiple ID codes.
12 - 20 Date sprsdeDate Date this entry superseded the
listed entries. This field is not
copied on continuations.
22 - 25 IDcode idCode ID code of this entry. This field
is not copied on continuations.
32 - 35 IDcode sIdCode ID code of a superseded entry.
37 - 40 IDcode sIdCode ID code of a superseded entry.
42 - 45 IDcode sIdCode ID code of a superseded entry.
47 - 50 IDcode sIdCode ID code of a superseded entry.
52 - 55 IDcode sIdCode ID code of a superseded entry.
57 - 60 IDcode sIdCode ID code of a superseded entry.
62 - 65 IDcode sIdCode ID code of a superseded entry.
67 - 70 IDcode sIdCode ID code of a superseded entry.
Details
* The ID code list is terminated by the first blank sIdCode field.
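The column layout and the termination rule can be read with a few slices. A minimal Python sketch; `parse_sprsde` and the `col` helper are hypothetical names. The sketch assumes continuation lines carry further sIdCode fields in the same columns, and it treats the first blank field on any line as the end of the whole list.

```python
def col(line, start, end):
    """Return 1-based, inclusive columns start..end of a fixed-width line."""
    return line.ljust(80)[start - 1:end]


def parse_sprsde(lines):
    """Return (sprsdeDate, idCode, superseded ID codes) from SPRSDE lines.

    sprsdeDate (columns 12-20) and idCode (22-25) are read from the first
    line only, since they are not copied on continuations. The sIdCode
    fields sit at columns 32-35, 37-40, ..., 67-70 on every line, and the
    list ends at the first blank one.
    """
    date = col(lines[0], 12, 20).strip()
    id_code = col(lines[0], 22, 25).strip()
    superseded = []
    for line in lines:
        for start in range(32, 68, 5):
            code = col(line, start, start + 3).strip()
            if not code:
                return date, id_code, superseded
            superseded.append(code)
    return date, id_code, superseded


# The second SPRSDE example below, rebuilt on its fixed columns.
line = "SPRSDE" + " " * 5 + "27-FEB-95" + " " + "1GDJ" + " " * 6 + "1LH4" + " " + "2LH4"
print(parse_sprsde([line]))  # ('27-FEB-95', '1GDJ', ['1LH4', '2LH4'])
```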
Verification/Validation/Value Authority Control
PDB checks that the superseded entries have actually been withdrawn from release.
Relationships to Other Record Types
The sprsdeDate is usually the date the entry is released, and therefore matches the date in the REVDAT 1 record. The ID code found in the idCode field must be the same as one found in the idCode field of the HEADER record.
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
SPRSDE 17-JUL-84 4HHB 1HHB
SPRSDE 27-FEB-95 1GDJ 1LH4 2LH4
Overview
The JRNL record contains the primary literature citation that describes the experiment which resulted in the deposited coordinate set. There is at most one JRNL reference per entry. If there is no primary reference, then there is no JRNL reference. Other references are given in REMARK 1.
PDB is in the process of linking and/or adding all references to CitDB, the literature database used by the Genome Data Base (available at URL http://gdbwww.gdb.org/gdb-bin/genera/genera/citation/Citation).
Record Format
COLUMNS DATA TYPE FIELD DEFINITION
----------------------------------------------------------------------------------
1 - 6 Record name "JRNL "
13 - 70 LString text See Details below.
Details
* The following tables are used to describe the sub-record types of the JRNL record.
* The AUTH sub-record is mandatory in JRNL. This is followed by TITL, EDIT, REF, PUBL, and REFN sub-record types. REF and REFN are also mandatory in JRNL. EDIT and PUBL may appear only if the reference is to a non-journal.
* If the JRNL reference is in the MEDLINE database, the information in the MEDLINE reference will be used to supply information for the sub-record types.
* When a MEDLINE reference is used, the abbreviation of the journal will be converted to the CASSI abbreviation as listed in the coden list used jointly by the Cambridge Crystallographic Data Centre (CCDC) and the PDB.
1. AUTH
* AUTH contains the list of authors associated with the cited article or contribution to a larger work (i.e., AUTH is not used for the editor of a book).
* The author list is formatted similarly to the AUTHOR record. It is a comma-separated list of names. Spaces at the end of a sub-record are not significant; all other spaces are significant. See the AUTHOR record for full details.
* The authorList field of continuation sub-records in JRNL differs from that in AUTHOR by leaving no leading blank in column 20 of any continuation lines.
* One author's name, consisting of the initials and family name, cannot be split across two lines. If there are continuation sub-records, then all but the last sub-record must end in a comma.
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "JRNL "
13 - 16 LString(4) "AUTH" Appears on all continuation records.
17 - 18 Continuation continuation Allows concatenation of multiple
records.
20 - 70 List authorList List of the authors.
2. TITL
* TITL specifies the title of the reference. This is used for the title of a journal article, chapter, or part of a book. The TITL line is omitted if the author(s) listed in authorList wrote the entire book (or other work) listed in REF and no section of the book is being cited.
* If an article is in a language other than English and is printed with an alternate title in English, the English language title is given, followed by a space and then the name of the language (in its English form, in square brackets) in which the article is written.
* If the title of an article is in a non-Roman alphabet the title is transliterated.
* The actual title cited is reconstructed in a manner identical to other continued records, i.e., trailing blanks are discarded and the continuation line is concatenated with a space inserted.
* A line cannot end with a hyphen. A compound term (two elements connected by a hyphen) or chemical names which include a hyphen must appear on a single line, unless they are too long to fit on one line, in which case the split is made at a normally-occurring hyphen. An individual word cannot be hyphenated at the end of a line and put on two lines. An exception is when there is a repeating compound term where the second element is omitted, e.g., "DOUBLE- AND TRIPLE-RESONANCE". In such a case the non-completed word "DOUBLE-" could end a line and not alter reconstruction of the title.
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "JRNL "
13 - 16 LString(4) "TITL" Appears on all continuation lines.
17 - 18 Continuation continuation Permits long titles.
20 - 70 LString title Title of the article.
3. EDIT
* EDIT appears if editors are associated with a non-journal reference. The editor list is formatted and concatenated in the same way that author lists are.
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "JRNL "
13 - 16 LString(4) "EDIT" Appears on all continuation records.
17 - 18 Continuation continuation Allows a long list of editors.
20 - 70 List editorList List of the editors.
4. REF
* REF is a group of fields which contains either the publication status or the name of the publication (and any supplement and/or report information), volume, page, and year. There are two forms of this sub-record group, depending upon the citation's publication status.
4a. If the reference has not yet been published, the sub-record type group has the form:
COLUMNS DATA TYPE FIELD DEFINITION
--------------------------------------------------------------------------------
1 - 6 Record name "JRNL "
13 - 16 LString(3) "REF"
20 - 34 LString(15) "TO BE PUBLISHED"
* At the present time, there is no formal mechanism in place for monitoring the subsequent publication of such referenced papers. PDB relies upon the depositor to provide reference update information since preliminary information can change by the time of actual publication.
4b. If the reference has been published, then the REF sub-record type contains information about the name of the publication, supplement, report, volume, page, and year in the appropriate fields. These fields are detailed below.
* Publication name (first item in pubName field):
- If the publication is a serial (i.e., a journal, an annual, or other non-book or non-monographic item issued in parts and intended to be continued indefinitely), use the abbreviated name of the publication as listed in American Chemical Society (A.C.S.) publications such as CAS Source Index (CASSI) or Chemical Abstracts. (The A.C.S. abbreviation is based on the International Organization for Standardization (ISO) standard ISO 4-1984[E].) If the A.C.S. has not yet established an abbreviation for the publication, the name is given in full.
- If the publication is a book, monograph, or other non-serial item, use its full name according to the Anglo-American Cataloging Rules, 2nd Ed., 1988 revision (AACR2R). (Non-serial items include theses, videos, computer programs, and anything that is complete in one or a finite number of parts.) If there is a sub-title, and the item is verified in an online catalog, it will be included using the same punctuation as in the source of verification. Preference will be given to verification using cataloging of the Library of Congress, the National Library of Medicine, and the British Library, in that order.
- If a book is part of a monographic series: the full name of the book (according to AACR2R) is listed first, followed by the name of the series in which it was published. The series information is given within parentheses and the series name is preceded by "IN:" and a space. If the series has an A.C.S. abbreviation, that abbreviation should be used; otherwise the series name should be listed in full. If applicable, the series name should be followed, after a comma and a space, by a volume (V.) and/or number (NO.) and/or part (PT.) indicator and the relevant characters to indicate its number and/or letter in the series.
* Supplement (follows publication name in pubName field):
- If a reference is in a supplement to the volume listed, or if information about a "part" is needed to distinguish multiple parts with the same page numbering, such information should be put in the REF sub-record.
- A supplement indication should follow the name of the publication and should be preceded by a comma and a space. Supplement should be abbreviated as "SUPPL." If there is a supplement number or letter, it should follow "SUPPL." without an intervening space. A part indication should also follow the name of the publication and be preceded by a comma and a space. A part should be abbreviated as "PT.", and the number or letter should follow without an intervening space.
- If there is both a supplement and a part, their order should reflect the order printed on the work itself.
* Report (follows publication name and any supplement or part information in pubName field):
- If a book has a report designation, the report information should follow the title and precede series information. The name and number of the report is given in parentheses, and the name is preceded by "REPORT:" and a space.
* Reconstruction of publication name:
- The name of the publication is reconstructed by removing any trailing blanks in the pubName field and concatenating the pubName fields from all continuation lines with an intervening space. No space is added after a line whose pubName field ends with a hyphen (-). When the pubName field on a line ends with a period (.), a space is added only if this is the only period in the entire pubName field; no space is added if there are two or more periods throughout the pubName field, excluding any periods after the designations "SUPPL", "V", "NO", or "PT".
* Volume, page, and year (volume, page, year fields respectively):
- The REF sub-record type group also contains information about volume, page, and year when applicable.
- In the case of a monograph with multiple volumes which is also in a numbered series, the number in the volume field represents the volume number of the book, not the series. (The volume number of the series is in parentheses with the name of the series, as described above under publication name.)
COLUMNS DATA TYPE FIELD DEFINITION
--------------------------------------------------------------------------------
1 - 6 Record name "JRNL "
13 - 16 LString(3) "REF"
17 - 18 Continuation continuation Allows long publication names.
20 - 47 LString pubName Name of the publication including
section or series designation. This is
the only field of this sub-record which
may be continued on successive
sub-records.
50 - 51 LString(2) "V." Appears in the first sub-record only,
and only if column 55 is non-blank.
52 - 55 String volume Right-justified blank-filled volume
information; appears in the first
sub-record only.
57 - 61 String page First page of the article; appears in the
first sub-record only.
63 - 66 Integer year Year of publication; first sub-record
only.
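The REF layout above can be read with a short fixed-column parser. The following is a minimal Python sketch, assuming each record is available as a plain string whose 1-based columns match the tables; `field`, `reconstruct_pub_name`, and `parse_ref_group` are hypothetical helper names, not part of any PDB software. The join logic follows the reconstruction rule given above (no space after a hyphen; no space after a period when the field contains two or more periods, not counting periods after SUPPL, V, NO, or PT).

```python
import re
from typing import Dict, List, Optional

# Periods directly after these designations are excluded from the period count.
_DESIGNATION_PERIOD = re.compile(r"\b(?:SUPPL|V|NO|PT)\.")


def field(line: str, start: int, end: int) -> str:
    """Return 1-based, inclusive columns start..end, with blanks stripped."""
    return line.ljust(80)[start - 1:end].strip()


def reconstruct_pub_name(fields: List[str]) -> str:
    """Join pubName fields (columns 20-47) from a REF group into one name."""
    parts = [f.rstrip() for f in fields if f.strip()]
    joined = "".join(parts)
    periods = joined.count(".") - len(_DESIGNATION_PERIOD.findall(joined))
    name = ""
    for part in parts:
        if not name:
            name = part
        elif name.endswith("-"):
            name += part  # hyphen: no intervening space
        elif name.endswith(".") and periods >= 2:
            name += part  # several significant periods: no intervening space
        else:
            name += " " + part
    return name


def parse_ref_group(lines: List[str]) -> Dict[str, object]:
    """Parse a JRNL (or REMARK 1) REF group; volume/page/year are on line 1."""
    first = lines[0]
    if field(first, 13, 16) != "REF":
        raise ValueError("not a REF sub-record")
    year = field(first, 63, 66)
    volume: Optional[str] = field(first, 52, 55) or None
    page: Optional[str] = field(first, 57, 61) or None
    return {
        "pubName": reconstruct_pub_name([line.ljust(80)[19:47] for line in lines]),
        "volume": volume,
        "page": page,
        "year": int(year) if year else None,
    }
```

Because REMARK 1 REF records use the same columns 13 to 66, the same sketch should apply to both record types; the "TO BE PUBLISHED" form simply yields that text as the name with no volume, page, or year.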
5. PUBL
* PUBL contains the name of the publisher and place of publication if the reference is to a book or other non-journal publication. If the non-journal has not yet been published or released, this sub-record is absent.
* The place of publication is listed first, followed by a space, a colon, another space, and then the name of the publisher/issuer. This arrangement is based on the ISBD(M) International Standard Bibliographic Description for Monographic Publications (Rev.Ed., 1987) and AACR2R and is used in public online catalogs in libraries. Details on the contents of PUBL are given below.
* Place of publication:
- Give the place of publication. If the name of the country, state, province, etc. is considered necessary to distinguish the place of publication from others of the same name, or for identification, then follow the city with a comma, a space, and the name of the larger geographic area.
- If there is more than one place of publication, only the first listed will be used. If an online catalog record is used to verify the item, the first place listed there will be used, omitting any brackets. Preference will be given to the cataloging done by the Library of Congress, the National Library of Medicine, and the British Library, in that order.
* Publisher's name (or name of other issuing entity):
- Give the name of the publisher in the shortest form in which it can be understood and identified internationally, according to AACR2R rule 1.4D.
- If there is more than one publisher listed in the publication, only the first will be used in the PDB file. If an online catalog record is used to verify the item, the first place listed there will be used for the name of the publisher. Preference will be given to the cataloging of the Library of Congress, the National Library of Medicine, and the British Library, in that order.
* Ph.D. and other theses:
- Theses are presented in the PUBL record if the degree has been granted and the thesis made available for public consultation by the degree-granting institution.
- The name of the degree-granting institution (the issuing agency) is followed by a space and "(THESIS)".
* Reconstruction of place and publisher:
- The PUBL sub-record type can be reconstructed by removing all trailing blanks in the pub field and concatenating all of the pub fields from the continuation lines with an intervening space. Continued lines do not begin with a space.
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "JRNL "
13 - 16 LString(4) "PUBL"
17 - 18 Continuation continuation Allows long publisher and place names.
20 - 70 LString pub City of publication and name of the
publisher/institution.
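The PUBL reconstruction rule and the "place : publisher" arrangement can be illustrated with a small Python sketch. It assumes records are plain strings with the column layout above; `reconstruct_publ` and `split_publ` are hypothetical names, and splitting on the first " : " is an assumption that holds for the arrangement described here (place, space, colon, space, publisher).

```python
from typing import List, Tuple

THESIS_SUFFIX = " (THESIS)"


def reconstruct_publ(lines: List[str]) -> str:
    """Join the pub field (columns 20-70) of PUBL continuation lines with single spaces."""
    parts = [line.ljust(80)[19:70].rstrip() for line in lines]
    return " ".join(part for part in parts if part)


def split_publ(publ: str) -> Tuple[str, str, bool]:
    """Split "PLACE : PUBLISHER" into (place, publisher, is_thesis)."""
    place, sep, publisher = publ.partition(" : ")
    if not sep:
        raise ValueError("PUBL text lacks the ' : ' separator")
    is_thesis = publisher.endswith(THESIS_SUFFIX)
    if is_thesis:
        # For theses the publisher is the degree-granting institution.
        publisher = publisher[: -len(THESIS_SUFFIX)]
    return place, publisher, is_thesis
```

For instance, the thesis citation in the REMARK 1 example below would split into the place UTRECHT, the institution UNIVERSITY OF UTRECHT, and a thesis flag.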
6. REFN
* REFN is a group of fields which contains encoded references to the citation. No continuation lines are possible. Each piece of coded information has a designated field.
* The American Society for Testing and Materials (ASTM) number is an encoded reference to the journal title. New ASTM codens are assigned by the Chemical Abstracts Service and appear in CASSI and its supplements.
* The country field is blank if the reference was published in more than one country.
* If more than one ISBN is known, select one that matches the individual volume cited (if it happens to be in a set that also has an ISBN for the set). If the reason for multiple ISBNs is that the publication is issued in more than one country, use the ISBN for the country of the first listed place of publication. If there are hardcover and paperback ISBN numbers, use the ISBN for the hardbound version.
* Because some publications do not have an ASTM coden, an ISSN number, or an ISBN number, each publication is assigned a number. This list of numbers, or codens, was established by the Cambridge Crystallographic Data Center (CCDC) and new numbers are assigned by both CCDC and PDB as new publications are added to their respective databases.
* There are two forms of this sub-record type group, depending upon the publication status.
6a. This form of the REFN sub-record type group is used if the citation has not been published.
COLUMNS DATA TYPE FIELD DEFINITION
--------------------------------------------------------------------------------
1 - 6 Record name "JRNL "
13 - 16 LString(4) "REFN"
67 - 70 LString(4) "0353" This is the CCDC/PDB coden for unpublished
works.
6b. This form of the REFN sub-record type group is used if the citation has been published.
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "JRNL "
13 - 16 LString(4) "REFN"
20 - 23 LString(4) "ASTM"
25 - 30 LString(6) astm ASTM devised coden.
33 - 34 LString(2) country Country of publication code as defined
in the OCLC/MARC cataloging format
(optional).
36 - 39 LString(4) "ISBN" or International Standard Book Number or
"ISSN" International Standard Serial Number.
41 - 65 LString isbn ISSN or ISBN number (the final
character may be a letter; the number
may contain one or more dashes).
67 - 70 LString(4) coden Code from CCDC/PDB coden list.
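A minimal Python sketch of splitting a REFN record into its fields follows, assuming the column positions in the two tables above. The names `parse_refn` and `check_digit_ok` are hypothetical. The check-digit test is an addition not required by this format: it applies the standard mod-11 rule for 10-digit ISBNs and 8-digit ISSNs, where a final "X" stands for 10 (which is why the final character may be a letter).

```python
from typing import Dict, Optional


def parse_refn(line: str) -> Dict[str, Optional[str]]:
    """Split a REFN record into its fixed-column fields (blank fields -> None)."""
    padded = line.ljust(80)

    def col(start: int, end: int) -> Optional[str]:
        return padded[start - 1:end].strip() or None

    if col(13, 16) != "REFN":
        raise ValueError("not a REFN sub-record")
    return {
        "astm": col(25, 30),
        "country": col(33, 34),
        "number_type": col(36, 39),  # "ISBN" or "ISSN"
        "number": col(41, 65),
        "coden": col(67, 70),  # CCDC/PDB coden; "0353" marks unpublished works
    }


def check_digit_ok(number_type: str, number: str) -> bool:
    """Verify the mod-11 check digit of a 10-digit ISBN or an 8-digit ISSN."""
    chars = number.replace("-", "").upper()
    length = {"ISBN": 10, "ISSN": 8}[number_type]
    if len(chars) != length or not chars[:-1].isdigit():
        return False
    # Weights run from `length` down to 2 over all but the final character.
    total = sum(int(d) * w for d, w in zip(chars[:-1], range(length, 1, -1)))
    expected = (11 - total % 11) % 11
    last = chars[-1]
    actual = 10 if last == "X" else (int(last) if last.isdigit() else -1)
    return actual == expected
```

Using the ISSN from the JRNL example below, `check_digit_ok("ISSN", "0022-2836")` should return True. Treating blank columns as `None` keeps the unpublished form (coden only) distinguishable from a published one.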
Verification/Validation/Value Authority Control
PDB verifies that this record is correctly formatted.
PDB uses MEDLINE to verify the accuracy of references and to obtain information required for CitDB that is not required by the PDB listing. The process of using MEDLINE requires following the National Library of Medicine rules for the transcription of names and titles. Articles in non-MEDLINE journals are verified through other online databases or with the reprint in hand. Verification of book references is done using online cooperative or individual library catalogs.
Citations appearing in JRNL may not also appear in REMARK 1.
Relationships to Other Record Types
The publication cited as the JRNL record may not be repeated in REMARK 1.
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
JRNL AUTH N.THANKI,J.K.M.RAO,S.I.FOUNDLING,W.J.HOWE,
JRNL AUTH 2 A.G.TOMASSELLI,R.L.HEINRIKSON,S.THAISRIVONGS,
JRNL AUTH 3 A.WLODAWER
JRNL TITL CRYSTAL STRUCTURE OF A COMPLEX OF HIV-1 PROTEASE
JRNL TITL 2 WITH A DIHYDROETHYLENE-CONTAINING INHIBITOR:
JRNL TITL 3 COMPARISONS WITH MOLECULAR MODELING
JRNL REF TO BE PUBLISHED
JRNL REFN 0353
JRNL AUTH G.FERMI,M.F.PERUTZ,B.SHAANAN,R.FOURME
JRNL TITL THE CRYSTAL STRUCTURE OF HUMAN DEOXYHAEMOGLOBIN AT
JRNL TITL 2 1.74 A RESOLUTION
JRNL REF J.MOL.BIOL. V. 175 159 1984
JRNL REFN ASTM JMOBAK UK ISSN 0022-2836 0070
Known Problems
* Interchange of bibliographic information and linking with other databases is hampered by the lack of labels or specific locations for certain types of information or by more than one type of information being in a particular location. This is most likely to occur with books, series, and reports. Some of the points below provide details about the variations and/or blending of information.
* Titles of the publications that require more than 28 characters on the REF line must be continued on subsequent lines. There is some awkwardness due to volume, page, and year appearing on the first REF line, thereby splitting up the title.
* Information about a supplement and its number/letter is presented in the publication's title field (on the REF lines in columns 20 - 47). This sometimes means that the publication's coden has several versions of REF title information.
* When series information for a book is presented, it is added to the REF line. The number of REF lines can become large in some cases because of the 28-column limit for title information in REF.
* There is often an ISBN for a book title and a separate ISSN for the series in which it was published. There is no way to present more than one of these.
* Books that are issued in more than one series are not accommodated.
* Many books are issued in more than one country. The publisher has a separate ISBN number in each country. There is no place to put any additional applicable ISBN numbers, which would be useful in an international database such as the PDB.
* The country code prefix of the ISBN may not match the country of the place of publication that is listed on the PUBL line when a book is published in more than one country.
* Pagination is limited to the beginning page.
* There is no place for listing a reference's accession number in another database.
* MEDLINE truncates the author list after the tenth name.
Overview
REMARK records present experimental details, annotations, comments, and information not included in other records. In a number of cases, REMARKs are used to expand the contents of other record types. A new level of structure is being used for some REMARK records. This is expected to facilitate searching and will assist in the conversion to a relational database.
The very first line of every set of REMARK records is used as a spacer to aid in reading.
COLUMNS DATA TYPE FIELD DEFINITION
---------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
8 - 10 Integer remarkNum Remark number. It is not an error
for remark n to exist in an entry
when remark n-1 does not.
12 - 70 LString empty Left as white space in first line of
each new remark.
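Since remarks are identified by the number in columns 8 to 10 and numbering may have gaps, a reader can group them without assuming consecutive numbers. This is a minimal Python sketch, assuming plain-string records; `group_remarks` is a hypothetical name.

```python
from typing import Dict, List


def group_remarks(lines: List[str]) -> Dict[int, List[str]]:
    """Collect REMARK text (columns 12-70) by remark number (columns 8-10)."""
    groups: Dict[int, List[str]] = {}
    for line in lines:
        padded = line.ljust(80)
        if padded[0:6] != "REMARK":
            continue
        number = int(padded[7:10])
        # The first line of each remark is a blank spacer; it is kept as "".
        groups.setdefault(number, []).append(padded[11:70].rstrip())
    return groups
```

Keying a dictionary by number, rather than indexing a list, reflects the rule that remark n may be present when remark n-1 is not.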
REMARK 1, 2, and 3, detailed below, are specific for references, resolution, and refinement, respectively.
REMARK 1 lists important publications related to the structure presented in the entry. These citations are chosen by the depositor and are listed in reverse-chronological order. Citations are not repeated from the JRNL records. After the first blank record and the REFERENCE sub-record, the sub-record types for REMARK 1 are the same as the JRNL sub-record types. For details, see the JRNL section.
PDB is in the process of linking and/or adding references to CitDB, the literature database of the Genome Data Base (available at URL http://gdbwww.gdb.org/gdb-bin/genera/genera/citation/Citation).
Record Format and Details
As with all other remarks, the first line is empty and is used as a spacer.
The following tables are used to describe the sub-record types of REMARK 1.
1. REFERENCE
Each reference is preceded by a line indicating the reference number in the entry.
COLUMNS DATA TYPE FIELD DEFINITION
--------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "1"
12 - 20 LString(9) "REFERENCE"
22 - 70 Integer refNum Reference number. Starts with
1 and increments by 1.
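Because every REMARK 1 citation begins with a REFERENCE record, the remark can be split into individual citations by watching for that record. The following Python sketch assumes plain-string records laid out as in the tables of this section; `split_references` is a hypothetical name, and it keeps each sub-record line whole so that column-specific parsing (for example of the first REF line) can be applied afterwards.

```python
from typing import Dict, List, Optional


def split_references(lines: List[str]) -> Dict[int, Dict[str, List[str]]]:
    """Group REMARK 1 records by REFERENCE number, then by sub-record type."""
    refs: Dict[int, Dict[str, List[str]]] = {}
    current: Optional[int] = None
    for line in lines:
        padded = line.ljust(80)
        if padded[0:6] != "REMARK" or padded[7:10].strip() != "1":
            continue
        if padded[11:20] == "REFERENCE":
            current = int(padded[21:70])  # refNum, columns 22-70
            refs[current] = {}
        elif current is not None and padded[12:16].strip():
            kind = padded[12:16].strip()  # AUTH, TITL, EDIT, REF, PUBL, REFN
            refs[current].setdefault(kind, []).append(line.rstrip())
    return refs
```

The leading blank spacer record is skipped because it carries neither REFERENCE nor a sub-record type in columns 13 to 16.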
2. AUTH
AUTH contains the list of authors of the reference.
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "1"
13 - 16 LString(4) "AUTH" Appears on all continuation
records.
17 - 18 Continuation continuation Allows a long list of authors.
20 - 70 List authorList List of the authors.
See JRNL AUTH for details.
3. TITL
TITL specifies the title of the reference.
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "1"
13 - 16 LString(4) "TITL" Appears on all continuation
records.
17 - 18 Continuation continuation Permits long titles.
20 - 70 LString title Title of the article.
See JRNL TITL for details.
4. EDIT
EDIT appears if editors are associated with a non-journal reference.
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "1"
13 - 16 LString(4) "EDIT" Appears on all continuation
records.
17 - 18 Continuation continuation Permits long list of editors.
20 - 70 LString editorList List of the editors.
See JRNL EDIT for details.
5. REF
REF is a group of fields which contains the name of the publication.
5a. If it has not yet been published, the REF sub-record type has the form:
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "1"
13 - 16 LString(3) "REF"
20 - 34 LString(15) "TO BE PUBLISHED"
At the present time, there is no formal mechanism in place for monitoring the subsequent publication of referenced papers. PDB relies upon the depositor to provide reference update information since preliminary information can change by the time of actual publication.
5b. If the reference has been published, then the REF sub-record type group contains information about the name of the publication, supplement, report, volume, page, and year, in the appropriate fields.
COLUMNS DATA TYPE FIELD DEFINITION
---------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "1"
13 - 16 LString(3) "REF"
17 - 18 Continuation continuation Permits long publication names.
20 - 47 LString pubName Name of the publication including
section or series designation.
This is the only field of this
record which may be continued on
successive records.
50 - 51 LString(2) "V." Appears in the first record only,
and only if column 55 is filled in.
52 - 55 String volume Right-justified blank-filled volume
information; appears in the first
sub-record only.
57 - 61 String page First page of the article; appears
in the first sub-record only.
63 - 66 Integer year Year of publication, first record
only.
See JRNL REF for details.
6. PUBL
PUBL contains the name of the publisher and place of publication if the reference is to a book or other non-journal publication. If the reference has not yet been published or released, this sub-record is absent.
COLUMNS DATA TYPE FIELD DEFINITION
---------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "1"
13 - 16 LString(4) "PUBL"
17 - 18 Continuation continuation Permits long publisher and city
information.
20 - 70 LString pub Name of the publisher and city of
publication.
See JRNL PUBL for details.
7. REFN
REFN is a group of fields which contains encoded references to the citation.
7a. If the citation has not been published, this form of the REFN sub-record type group is used.
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "1"
13 - 16 LString(4) "REFN"
67 - 70 LString(4) "0353" This is the CCDC/PDB coden for unpublished
works.
7b. If the citation has been published, this form of the REFN sub-record type group is used.
COLUMNS DATA TYPE FIELD DEFINITION
--------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "1"
13 - 16 LString(4) "REFN"
20 - 23 LString(4) "ASTM" Blank if reference is not
serialized.
25 - 30 LString astm Code from the ASTM file.
33 - 34 LString country Two-letter abbreviation for
country of publication.
36 - 39 LString(4) "ISBN" or
"ISSN"
41 - 65 LString isbn ISSN or ISBN number.
67 - 70 LString(4) coden Number from Cambridge
Crystallographic Data Center
coden list, or assigned by the
PDB.
See JRNL REFN for details.
Verification/Validation/Value Authority Control
PDB verifies that this record is correctly formatted.
PDB uses MEDLINE to verify the accuracy of references and to obtain information required for CitDB that is not required by the PDB listing. The process of using MEDLINE requires following the National Library of Medicine rules for the transcription of names and titles. Articles in non-MEDLINE journals are verified through other online databases or with the reprint in hand. Verification of book references is done using online cooperative or individual library catalogs.
Citations appearing in REMARK 1 may not appear in JRNL.
Relationships to Other Record Types
Citations appearing in REMARK 1 may not appear in JRNL.
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
REMARK 1
REMARK 1 REFERENCE 1
REMARK 1 AUTH A.M.BONVIN,J.A.RULLMANN,R.M.LAMERICHS,R.BOELENS,
REMARK 1 AUTH 2 R.KAPTEIN
REMARK 1 TITL "ENSEMBLE" ITERATIVE RELAXATION MATRIX APPROACH:
REMARK 1 TITL 2 A NEW NMR REFINEMENT PROTOCOL APPLIED TO THE
REMARK 1 TITL 3 SOLUTION STRUCTURE OF CRAMBIN
REMARK 1 REF PROTEINS: STRUCT.,FUNCT., V. 15 385 1993
REMARK 1 REF 2 GENET.
REMARK 1 REFN ASTM PSFGEY US ISSN 0887-3585 0867
REMARK 1 REFERENCE 2
REMARK 1 AUTH J.A.C.RULLMANN,A.M.J.J.BONVIN,R.BOELENS,R.KAPTEIN
REMARK 1 TITL STRUCTURE DETERMINATION BY NMR - APPLICATION TO
REMARK 1 TITL 2 CRAMBIN
REMARK 1 EDIT D.M.SOUMPASIS,T.M.JOVIN
REMARK 1 REF COMPUTATION OF BIOMOLECULAR 1 1992
REMARK 1 REF 2 STRUCTURES; ACHIEVEMENTS,
REMARK 1 REF 3 PROBLEMS, AND PERSPECTIVES
REMARK 1 PUBL BERLIN : SPRINGER-VERLAG
REMARK 1 REFN GW ISBN 3540559515 2010
REMARK 1 REFERENCE 3
REMARK 1 AUTH R.M.J.M.LAMERICHS
REMARK 1 REF 2D NMR STUDIES OF 1989
REMARK 1 REF 2 BIOMOLECULES: PROTEIN
REMARK 1 REF 3 STRUCTURE AND PROTEIN-DNA
REMARK 1 REF 4 INTERACTIONS
REMARK 1 PUBL UTRECHT : UNIVERSITY OF UTRECHT (THESIS)
REMARK 1 REFN NE 2011
REMARK 1
REMARK 1 REFERENCE 1
REMARK 1 AUTH G.FERMI,M.F.PERUTZ
REMARK 1 REF HAEMOGLOBIN AND MYOGLOBIN 1981
REMARK 1 REF 2 (IN: ATLAS OF MOLECULAR
REMARK 1 REF 3 STRUCTURES IN BIOLOGY, V.2)
REMARK 1 PUBL OXFORD : CLARENDON PRESS
REMARK 1 REFN ISBN 0-19-854706-4 0986
Known Problems
See JRNL for a listing of problems associated with references.
REMARK 2 states the highest resolution, in Angstroms, that was used in building the model. As with all the remarks, the first REMARK 2 record is empty and is used as a spacer.
Record Format and Details
* The second REMARK 2 record has one of two formats. The first is used for diffraction studies, the second for other types of experiments in which resolution is not relevant, e.g., NMR and theoretical modeling.
* For diffraction experiments:
COLUMNS DATA TYPE FIELD DEFINITION
--------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "2"
12 - 22 LString(11) "RESOLUTION."
23 - 27 Real(5.2) resolution Resolution.
29 - 38 LString(10) "ANGSTROMS."
* For experiments other than diffraction:
COLUMNS DATA TYPE FIELD DEFINITION
--------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "2"
12 - 38 LString(27) "RESOLUTION. NOT APPLICABLE."
41 - 70 String comment Comment.
* Additional explanatory text may be included starting with the third line of the REMARK 2 record. For example, depositors may wish to qualify the resolution value provided due to unusual experimental conditions.
COLUMNS DATA TYPE FIELD DEFINITION
-------------------------------------------------------------------------------
1 - 6 Record name "REMARK"
10 LString(1) "2"
12 - 22 LString(11) "RESOLUTION."
24 - 70 String comment Comment.
Example
1 2 3 4 5 6 7
1234567890123456789012345678901234567890123456789012345678901234567890
REMARK 2
REMARK 2 RESOLUTION. 1.74 ANGSTROMS.
REMARK 2
REMARK 2 RESOLUTION. NOT APPLICABLE.
REMARK 2
REMARK 2 RESOLUTION. NOT APPLICABLE.
REMARK 2 THIS EXPERIMENT WAS CARRIED OUT USING FLUORESCENCE TRANSFER
REMARK 2 AND THEREFORE NO RESOLUTION CAN BE CALCULATED.
Overview
REMARK 3 presents information on refinement program(s) used and the related statistics. For non-diffraction studies, REMARK 3 is used to describe any refinement done, but its format in those cases is mostly free text.
If more than one refinement package was used, they may be named in "OTHER REFINEMENT REMARKS". However, the REMARK 3 statistics are given only for the final refinement run.
Refinement packages are being enhanced to output PDB REMARK 3. A token: value template style facilitates parsing. Spacer REMARK 3 lines are interspersed for visually organizing the information.
The templates below have been adopted in consultation with program authors. PDB is continuing this dialogue with program authors, and expects the library of PDB records output by the programs to greatly increase in the near future.
Instead of providing a Record Format table, each template is given as it appears in PDB entries.
Details
* The value "NULL" is given when there is no data available for a particular token.
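The "token : value" style lends itself to a simple line-by-line reader. This Python sketch is an illustration under stated assumptions: records are plain strings, any non-blank line without a colon is treated as a heading (such as "DATA USED IN REFINEMENT."), and only the most recent heading is kept, so nested headings like "OVERALL ANISOTROPIC B VALUE." replace their parent. `parse_remark3` is a hypothetical name, and values are returned as raw strings.

```python
from typing import Dict, List, Optional, Tuple


def parse_remark3(lines: List[str]) -> Dict[Tuple[str, str], Optional[str]]:
    """Map (most recent heading, token) -> value for REMARK 3 template lines."""
    values: Dict[Tuple[str, str], Optional[str]] = {}
    heading = ""
    for line in lines:
        padded = line.ljust(80)
        if padded[0:6] != "REMARK" or padded[7:10].strip() != "3":
            continue
        text = padded[11:70].strip()
        if not text:
            continue  # spacer line
        token, sep, value = text.partition(":")
        if not sep:
            heading = text  # e.g. "DATA USED IN REFINEMENT."
            continue
        value = value.strip()
        # "NULL" marks a token for which no data are available.
        values[(heading, token.strip())] = None if value == "NULL" else value
    return values
```

Multi-valued entries such as the restraint lines ("RMS ; SIGMA") are left as raw text; splitting them on ";" would be a further, format-specific step.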
This remark will be output by X-PLOR for direct submission to PDB. Structures refined with earlier versions of X-PLOR will contain the same template, but with many of the data items containing "NULL".
Template
REMARK 3
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM : X-PLOR
REMARK 3 AUTHORS : BRUNGER
REMARK 3
REMARK 3 DATA USED IN REFINEMENT.
REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) :
REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) :
REMARK 3 DATA CUTOFF (SIGMA(F)) :
REMARK 3 DATA CUTOFF HIGH (ABS(F)) :
REMARK 3 DATA CUTOFF LOW (ABS(F)) :
REMARK 3 COMPLETENESS (WORKING+TEST) (%) :
REMARK 3 NUMBER OF REFLECTIONS :
REMARK 3
REMARK 3 FIT TO DATA USED IN REFINEMENT.
REMARK 3 CROSS-VALIDATION METHOD :
REMARK 3 FREE R VALUE TEST SET SELECTION :
REMARK 3 R VALUE (WORKING SET) :
REMARK 3 FREE R VALUE :
REMARK 3 FREE R VALUE TEST SET SIZE (%) :
REMARK 3 FREE R VALUE TEST SET COUNT :
REMARK 3 ESTIMATED ERROR OF FREE R VALUE :
REMARK 3
REMARK 3 FIT IN THE HIGHEST RESOLUTION BIN.
REMARK 3 TOTAL NUMBER OF BINS USED :
REMARK 3 BIN RESOLUTION RANGE HIGH (A) :
REMARK 3 BIN RESOLUTION RANGE LOW (A) :
REMARK 3 BIN COMPLETENESS (WORKING+TEST) (%) :
REMARK 3 REFLECTIONS IN BIN (WORKING SET) :
REMARK 3 BIN R VALUE (WORKING SET) :
REMARK 3 BIN FREE R VALUE :
REMARK 3 BIN FREE R VALUE TEST SET SIZE (%) :
REMARK 3 BIN FREE R VALUE TEST SET COUNT :
REMARK 3 ESTIMATED ERROR OF BIN FREE R VALUE :
REMARK 3
REMARK 3 NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT.
REMARK 3 PROTEIN ATOMS :
REMARK 3 NUCLEIC ACID ATOMS :
REMARK 3 HETEROGEN ATOMS :
REMARK 3 SOLVENT ATOMS :
REMARK 3
REMARK 3 B VALUES.
REMARK 3 FROM WILSON PLOT (A**2) :
REMARK 3 MEAN B VALUE (OVERALL, A**2) :
REMARK 3 OVERALL ANISOTROPIC B VALUE.
REMARK 3 B11 (A**2) :
REMARK 3 B22 (A**2) :
REMARK 3 B33 (A**2) :
REMARK 3 B12 (A**2) :
REMARK 3 B13 (A**2) :
REMARK 3 B23 (A**2) :
REMARK 3
REMARK 3 ESTIMATED COORDINATE ERROR.
REMARK 3 ESD FROM LUZZATI PLOT (A) :
REMARK 3 ESD FROM SIGMAA (A) :
REMARK 3 LOW RESOLUTION CUTOFF (A) :
REMARK 3
REMARK 3 CROSS-VALIDATED ESTIMATED COORDINATE ERROR.
REMARK 3 ESD FROM C-V LUZZATI PLOT (A) :
REMARK 3 ESD FROM C-V SIGMAA (A) :
REMARK 3
REMARK 3 RMS DEVIATIONS FROM IDEAL VALUES.
REMARK 3 BOND LENGTHS (A) :
REMARK 3 BOND ANGLES (DEGREES) :
REMARK 3 DIHEDRAL ANGLES (DEGREES) :
REMARK 3 IMPROPER ANGLES (DEGREES) :
REMARK 3
REMARK 3 ISOTROPIC THERMAL MODEL :
REMARK 3
REMARK 3 ISOTROPIC THERMAL FACTOR RESTRAINTS. RMS SIGMA
REMARK 3 MAIN-CHAIN BOND (A**2) : ;
REMARK 3 MAIN-CHAIN ANGLE (A**2) : ;
REMARK 3 SIDE-CHAIN BOND (A**2