diff --git a/docs/universal_types.md b/docs/universal_types.md
new file mode 100644
index 0000000..048a135
--- /dev/null
+++ b/docs/universal_types.md
@@ -0,0 +1,675 @@
+# Universal Types with BER/DER Decoder and DER Encoder
+
+The *asn1crypto* library is a combination of universal type classes that
+implement BER/DER decoding and DER encoding, a PEM encoder and decoder, and a
+number of pre-built cryptographic type classes. This document covers the
+universal type classes.
+
+For a general overview of ASN.1 as used in cryptography, please see
+[A Layman's Guide to a Subset of ASN.1, BER, and DER](http://luca.ntop.org/Teaching/Appunti/asn1.html).
+
+This page contains the following sections:
+
+ - [Universal Types](#universal-types)
+ - [Basic Usage](#basic-usage)
+ - [Sequence](#sequence)
+ - [Set](#set)
+ - [SequenceOf](#sequenceof)
+ - [SetOf](#setof)
+ - [Integer](#integer)
+ - [Enumerated](#enumerated)
+ - [ObjectIdentifier](#objectidentifier)
+ - [BitString](#bitstring)
+ - [Strings](#strings)
+ - [UTCTime](#utctime)
+ - [GeneralizedTime](#generalizedtime)
+ - [Choice](#choice)
+ - [Any](#any)
+ - [Specification via OID](#specification-via-oid)
+ - [Explicit and Implicit Tagging](#explicit-and-implicit-tagging)
+
+## Universal Types
+
+For general purpose ASN.1 parsing, the `asn1crypto.core` module is used. It
+contains the following classes, which parse, represent and serialize all of the
+ASN.1 universal types:
+
+| Class | Native Type | Implementation Notes |
+| ------------------ | -------------------------------------- | ------------------------------------ |
+| `Boolean` | `bool` | |
+| `Integer` | `int` | may be `long` on Python 2 |
+| `BitString` | `tuple` of `int` or `set` of `unicode` | `set` used if `_map` present |
+| `OctetString` | `bytes` (`str`) | |
+| `Null` | `None` | |
+| `ObjectIdentifier` | `str` (`unicode`) | string is dotted integer format |
+| `ObjectDescriptor` | | no native conversion |
+| `InstanceOf` | | no native conversion |
+| `Real` | | no native conversion |
+| `Enumerated` | `str` (`unicode`) | `_map` must be set |
+| `UTF8String` | `str` (`unicode`) | |
+| `RelativeOid` | `str` (`unicode`) | string is dotted integer format |
+| `Sequence` | `OrderedDict` | |
+| `SequenceOf` | `list` | |
+| `Set` | `OrderedDict` | |
+| `SetOf` | `list` | |
+| `EmbeddedPdv` | `OrderedDict` | no named field parsing |
+| `NumericString` | `str` (`unicode`) | no charset limitations |
+| `PrintableString` | `str` (`unicode`) | no charset limitations |
+| `TeletexString` | `str` (`unicode`) | |
+| `VideotexString` | `bytes` (`str`) | no unicode conversion |
+| `IA5String` | `str` (`unicode`) | |
+| `UTCTime` | `datetime.datetime` | |
+| `GeneralizedTime` | `datetime.datetime` | treated as UTC when no timezone |
+| `GraphicString` | `str` (`unicode`) | unicode conversion as latin1 |
+| `VisibleString` | `str` (`unicode`) | no charset limitations |
+| `GeneralString` | `str` (`unicode`) | unicode conversion as latin1 |
+| `UniversalString` | `str` (`unicode`) | |
+| `CharacterString` | `str` (`unicode`) | unicode conversion as latin1 |
+| `BMPString` | `str` (`unicode`) | |
+
+For *Native Type*, the Python 3 type is listed first, with the Python 2 type
+in parentheses.
+
+As noted next to some of the types, value parsing may not be implemented
+for types not currently used in cryptography (such as `ObjectDescriptor`,
+`InstanceOf` and `Real`). Additionally, some of the string classes don't
+enforce character set limitations, and string types that have no single
+defined encoding use latin1 as the default encoding.
+
+In addition, there are a few overridden types where various specifications use
+a `BitString` or `OctetString` type to represent a different type. These
+include:
+
+| Class | Native Type | Implementation Notes |
+| -------------------- | ------------------- | ------------------------------- |
+| `OctetBitString` | `bytes` (`str`) | |
+| `IntegerBitString` | `int` | may be `long` on Python 2 |
+| `IntegerOctetString` | `int` | may be `long` on Python 2 |
+
+For situations where the DER-encoded bytes of one type are embedded in another,
+the `ParsableOctetString` and `ParsableOctetBitString` classes exist. These
+function the same as `OctetString` and `OctetBitString`, but also provide a
+`.parsed` attribute and a `.parse()` method that allow the contents to be
+parsed as ASN.1 structures.
+
+All of these overrides can be used with the `cast()` method to convert between
+them. The only requirement is that the class being cast to has the same tag
+as the original class. No re-encoding is done; the contents are simply
+re-interpreted.
+
+```python
+from asn1crypto.core import BitString, OctetBitString, IntegerBitString
+
+# A tuple (not a set) preserves the order and repetition of the bits
+bit = BitString((
+    0, 0, 0, 0, 0, 0, 0, 1,
+    0, 0, 0, 0, 0, 0, 1, 0,
+))
+
+# Will print (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0)
+print(bit.native)
+
+octet = bit.cast(OctetBitString)
+
+# Will print b'\x01\x02'
+print(octet.native)
+
+i = bit.cast(IntegerBitString)
+
+# Will print 258
+print(i.native)
+```
+
+## Basic Usage
+
+All of the universal types implement four methods: a class method `.load()` and
+the instance methods `.dump()`, `.copy()` and `.debug()`.
+
+`.load()` accepts a byte string of DER or BER encoded data and returns an
+object of the class it was called on. `.dump()` returns the serialization of
+an object into DER encoding.
+
+```python
+from asn1crypto.core import Sequence
+
+parsed = Sequence.load(der_byte_string)
+serialized = parsed.dump()
+```
+
+By default, *asn1crypto* tries to be efficient and caches serialized data for
+better performance. If the input data is possibly BER encoded, but the output
+must be DER encoded, the `force` parameter may be used with `.dump()`.
+
+```python
+from asn1crypto.core import Sequence
+
+parsed = Sequence.load(der_byte_string)
+der_serialized = parsed.dump(force=True)
+```
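+
+As a concrete sketch of the `force` behavior, the block below loads a
+`Sequence` whose length is written in a non-minimal long form, which BER
+permits but DER forbids. The `Wrapper` class and the sample bytes are
+hypothetical, chosen only for illustration. A class with `_fields` is used on
+the assumption that re-encoding needs a field specification to work from.
+
+```python
+from asn1crypto.core import Integer, Sequence
+
+
+class Wrapper(Sequence):
+    # Hypothetical single-field structure used only for this example
+    _fields = [
+        ('value', Integer),
+    ]
+
+
+# SEQUENCE { INTEGER 5 } with the length 3 written as 0x81 0x03 (valid BER, not DER)
+ber_bytes = b'\x30\x81\x03\x02\x01\x05'
+
+parsed = Wrapper.load(ber_bytes)
+
+# Without force, the cached (BER) encoding is returned unchanged
+cached = parsed.dump()
+
+# With force, the structure is re-serialized using DER rules
+der_bytes = parsed.dump(force=True)
+
+print(parsed['value'].native)
+print(der_bytes)
+```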
+
+The `.copy()` method creates a deep copy of an object, allowing child fields to
+be modified without affecting the original.
+
+```python
+from asn1crypto.core import Sequence
+
+seq1 = Sequence.load(der_byte_string)
+seq2 = seq1.copy()
+# Children are universal type objects, so .native is used for the arithmetic
+seq2[0] = seq1[0].native + 1
+if seq1[0].native != seq2[0].native:
+    print('Copies have distinct contents')
+```
+
+The `.debug()` method is available to help in situations where interaction with
+another ASN.1 serializer or parser is not functioning as expected. Calling
+this method will print a tree structure with information about the header bytes,
+class, method, tag, special tagging, content bytes, native Python value, child
+fields and any sub-parsed values.
+
+```python
+from asn1crypto.core import Sequence
+
+parsed = Sequence.load(der_byte_string)
+parsed.debug()
+```
+
+In addition to the available methods, every instance has a `.native` property
+that converts the data into a native Python data type.
+
+```python
+from pprint import pprint
+from asn1crypto.core import Sequence
+
+parsed = Sequence.load(der_byte_string)
+pprint(parsed.native)
+```
+
+## Sequence
+
+One of the core structures when dealing with ASN.1 is the Sequence type. The
+`Sequence` class can handle fields with universal data types; however, in most
+situations the `_fields` property will need to be set with the expected
+definition of each field in the Sequence.
+
+### Configuration
+
+The `_fields` property must be set to a `list` of 2-3 element `tuple`s. The
+first element in the tuple must be a unicode string of the field name. The
+second must be a type class - either a universal type, or a custom type. The
+third, and optional, element is a `dict` with parameters to pass to the type
+class for things like default values, marking the field as optional, or
+implicit/explicit tagging.
+
+```python
+from asn1crypto.core import Sequence, Integer, OctetString, IA5String
+
+class MySequence(Sequence):
+    _fields = [
+        ('field_one', Integer),
+        ('field_two', OctetString),
+        ('field_three', IA5String, {'optional': True}),
+    ]
+```
+
+Implicit and explicit tagging will be covered in more detail later, however
+the following are options that can be set for each field type class:
+
+ - `{'default': 1}` sets the field's default value to `1`, allowing it to be
+   omitted from the serialized form
+ - `{'optional': True}` sets the field to be optional, allowing it to be
+   omitted
+
+### Usage
+
+To access values of the sequence, use dict-like access via `[]` and use the
+name of the field:
+
+```python
+seq = MySequence.load(der_byte_string)
+print(seq['field_two'].native)
+```
+
+The values of fields can be set by assigning via `[]`. If the value assigned is
+of the correct type class, it will be used as-is. If the value is not of the
+correct type class, a new instance of that type class will be created and the
+value will be passed to the constructor.
+
+```python
+from asn1crypto.core import Integer
+
+seq = MySequence.load(der_byte_string)
+# These statements will result in the same state
+seq['field_one'] = Integer(5)
+seq['field_one'] = 5
+```
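+
+The following self-contained sketch ties the configuration and usage together.
+It repeats the `MySequence` definition from above and uses hand-assembled DER
+bytes (an assumption for illustration) in place of `der_byte_string`.
+
+```python
+from asn1crypto.core import IA5String, Integer, OctetString, Sequence
+
+
+class MySequence(Sequence):
+    _fields = [
+        ('field_one', Integer),
+        ('field_two', OctetString),
+        ('field_three', IA5String, {'optional': True}),
+    ]
+
+
+# SEQUENCE { INTEGER 5, OCTET STRING 'ab' }, with the optional field omitted
+der_byte_string = b'\x30\x07\x02\x01\x05\x04\x02ab'
+
+seq = MySequence.load(der_byte_string)
+print(seq['field_one'].native)
+print(seq['field_two'].native)
+
+# Assigning a native int wraps it in an Integer, as described above
+seq['field_one'] = 6
+updated = seq.dump()
+print(updated)
+```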
+
+When fields are complex types such as `Sequence` or `SequenceOf`, there is no
+way to construct the value out of a native Python data type.
+
+### Optional Fields
+
+When a field configured with the `optional` parameter is not present in the
+serialized `Sequence`, accessing it returns the `VOID` object. This object
+serializes to an empty byte string and returns `None` when `.native` is
+accessed.
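+
+A short sketch of an absent optional field, reusing the `MySequence` layout
+and hand-assembled sample bytes (assumed for illustration). Assigning a value
+later makes the field appear in the serialization.
+
+```python
+from asn1crypto.core import IA5String, Integer, OctetString, Sequence
+
+
+class MySequence(Sequence):
+    _fields = [
+        ('field_one', Integer),
+        ('field_two', OctetString),
+        ('field_three', IA5String, {'optional': True}),
+    ]
+
+
+# field_three is not present in these bytes
+seq = MySequence.load(b'\x30\x07\x02\x01\x05\x04\x02ab')
+
+# The missing optional field reads as None
+print(seq['field_three'].native)
+
+# Once assigned, it is included when the Sequence is serialized
+seq['field_three'] = 'hi'
+print(seq.dump())
+```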
+
+## Set
+
+The `Set` class is configured in the same way as `Sequence`, however it allows
+serialized fields to be in any order, per the ASN.1 standard.
+
+```python
+from asn1crypto.core import Set, Integer, OctetString, IA5String
+
+class MySet(Set):
+    _fields = [
+        ('field_one', Integer),
+        ('field_two', OctetString),
+        ('field_three', IA5String, {'optional': True}),
+    ]
+```
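+
+To illustrate the ordering rule, this sketch parses a `Set` whose members
+appear in the reverse of the `_fields` order. The `MySet` definition is
+repeated from above; the sample bytes are assumed, hand-assembled DER.
+
+```python
+from asn1crypto.core import IA5String, Integer, OctetString, Set
+
+
+class MySet(Set):
+    _fields = [
+        ('field_one', Integer),
+        ('field_two', OctetString),
+        ('field_three', IA5String, {'optional': True}),
+    ]
+
+
+# SET { OCTET STRING 'ab', INTEGER 5 }: field_two is encoded before field_one
+encoded = b'\x31\x07\x04\x02ab\x02\x01\x05'
+
+parsed = MySet.load(encoded)
+
+# Fields are matched by their tags, not by their position
+print(parsed['field_one'].native)
+print(parsed['field_two'].native)
+```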
+
+## SequenceOf
+
+The `SequenceOf` class is used to allow for zero or more instances of a type.
+The class uses the `_child_spec` property to define the instance class type.
+
+```python
+from asn1crypto.core import SequenceOf, Integer
+
+class Integers(SequenceOf):
+    _child_spec = Integer
+```
+
+Values in the `SequenceOf` can be accessed via `[]` with an integer key. The
+length of the `SequenceOf` is determined via `len()`.
+
+```python
+values = Integers.load(der_byte_string)
+for i in range(0, len(values)):
+    print(values[i].native)
+```
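+
+A runnable version of the loop above, with assumed sample bytes standing in
+for `der_byte_string`. It also shows that direct iteration and `.native` both
+work on a `SequenceOf`.
+
+```python
+from asn1crypto.core import Integer, SequenceOf
+
+
+class Integers(SequenceOf):
+    _child_spec = Integer
+
+
+# SEQUENCE OF INTEGER { 1, 2, 3 }
+values = Integers.load(b'\x30\x09\x02\x01\x01\x02\x01\x02\x02\x01\x03')
+
+print(len(values))
+
+# Iterating yields the child Integer objects
+for value in values:
+    print(value.native)
+
+# .native converts the whole structure to a list of ints
+print(values.native)
+```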
+
+## SetOf
+
+The `SetOf` class is functionally identical to `SequenceOf`. According to the
+ASN.1 standard, the difference is that a `SequenceOf` is explicitly ordered,
+whereas a `SetOf` may be in any order. This mirrors the difference between a
+Python `list` and `set`.
+
+```python
+from asn1crypto.core import SetOf, Integer
+
+class Integers(SetOf):
+    _child_spec = Integer
+```
+
+## Integer
+
+The `Integer` class allows values to be *named*. An `Integer` with named values
+may contain any integer; however, values that have a name will be represented
+by that name when `.native` is accessed.
+
+Named values are configured via the `_map` property, which must be a `dict`
+with the keys being integers and the values being unicode strings.
+
+```python
+from asn1crypto.core import Integer
+
+class Version(Integer):
+    _map = {
+        1: 'v1',
+        2: 'v2',
+    }
+
+# Will print: "v1"
+print(Version(1).native)
+
+# Will print: 4
+print(Version(4).native)
+```
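+
+Named values also work in the other direction. In this sketch, which repeats
+the `Version` class from above, the constructor is given a name from `_map`
+instead of an integer; the DER encoding is the same either way.
+
+```python
+from asn1crypto.core import Integer
+
+
+class Version(Integer):
+    _map = {
+        1: 'v1',
+        2: 'v2',
+    }
+
+
+by_name = Version('v2')
+by_number = Version(2)
+
+print(by_name.native)
+# INTEGER 2 encodes as tag 0x02, length 0x01, value 0x02
+print(by_name.dump())
+print(by_name.dump() == by_number.dump())
+```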
+
+## Enumerated
+
+The `Enumerated` class is almost identical to `Integer`, however only values in
+the `_map` property are valid.
+
+```python
+from asn1crypto.core import Enumerated
+
+class Version(Enumerated):
+    _map = {
+        1: 'v1',
+        2: 'v2',
+    }
+
+# Will print: "v1"
+print(Version(1).native)
+
+# Will raise a ValueError exception
+print(Version(4).native)
+```
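+
+Because values outside `_map` raise `ValueError`, code that handles untrusted
+input may want to catch it. A minimal sketch, repeating the `Version` mapping
+from above; `version_name()` is a hypothetical helper. Both the construction
+and `.native` happen inside the `try`, so the sketch does not depend on which
+step raises.
+
+```python
+from asn1crypto.core import Enumerated
+
+
+class Version(Enumerated):
+    _map = {
+        1: 'v1',
+        2: 'v2',
+    }
+
+
+def version_name(number):
+    # Hypothetical helper: return the mapped name, or None if not in _map
+    try:
+        return Version(number).native
+    except ValueError:
+        return None
+
+
+print(version_name(2))
+print(version_name(4))
+
+# ENUMERATED uses universal tag 10 (0x0a)
+print(Version(2).dump())
+```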
+
+## ObjectIdentifier
+
+The `ObjectIdentifier` class represents values of the ASN.1 type of the same
+name. `ObjectIdentifier` instances are converted to a unicode string in a
+dotted-integer format when `.native` is accessed.
+
+While this standard conversion is a reasonable baseline, in most situations
+it will be more maintainable to map the OID strings to a unicode string
+containing a description of what the OID represents.
+
+The mapping of OID strings to name strings is configured via the `_map`
+property, which is a `dict` object with keys being unicode OID strings and
+values being unicode name strings.
+
+The `.dotted` attribute will always return a unicode string of the dotted
+integer form of the OID.
+
+The class methods `.map()` and `.unmap()` will convert a dotted integer unicode
+string to the user-friendly name, and vice-versa.
+
+```python
+from asn1crypto.core import ObjectIdentifier
+
+class MyType(ObjectIdentifier):
+    _map = {
+        '1.8.2.1.23': 'value_name',
+        '1.8.2.1.24': 'other_value',
+    }
+
+# Will print: "value_name"
+print(MyType('1.8.2.1.23').native)
+
+# Will print: "1.8.2.1.23"
+print(MyType('1.8.2.1.23').dotted)
+
+# Will print: "1.8.2.1.25"
+print(MyType('1.8.2.1.25').native)
+
+# Will print "value_name"
+print(MyType.map('1.8.2.1.23'))
+
+# Will print "1.8.2.1.23"
+print(MyType.unmap('value_name'))
+```
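+
+The DER form stores only the dotted integers, so the mapping is purely a local
+convenience. This sketch, repeating the example `MyType` mapping, builds a
+value from its mapped name, serializes it, and parses it back.
+
+```python
+from asn1crypto.core import ObjectIdentifier
+
+
+class MyType(ObjectIdentifier):
+    _map = {
+        '1.8.2.1.23': 'value_name',
+        '1.8.2.1.24': 'other_value',
+    }
+
+
+# Construct from the friendly name rather than the dotted form
+oid = MyType('value_name')
+encoded = oid.dump()
+
+# The first two arcs share one byte (40 * 1 + 8 = 48 = 0x30)
+print(encoded)
+
+reparsed = MyType.load(encoded)
+print(reparsed.native)
+print(reparsed.dotted)
+```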
+
+## BitString
+
+When no `_map` is set for a `BitString` class, the native representation is a
+`tuple` of `int`s (being either `1` or `0`).
+
+```python
+from asn1crypto.core import BitString
+
+b1 = BitString((1, 0, 1))
+```
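+
+On the wire, a BitString is padded to a whole number of bytes and prefixed
+with a count of unused bits. A brief sketch of that for the three-bit value
+above:
+
+```python
+from asn1crypto.core import BitString
+
+b1 = BitString((1, 0, 1))
+
+# Tag 0x03, length 0x02, then 5 unused bits and the padded byte 0b10100000
+print(b1.dump())
+
+# Loading the encoding back drops the padding bits again
+print(BitString.load(b1.dump()).native)
+```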
+
+Additionally, it is possible to set the `_map` property to a dict where the
+keys are bit indexes and the values are unicode string names. This allows
+checking the value of a given bit by item access, and the native representation
+becomes a `set` of unicode strings.
+
+```python
+from asn1crypto.core import BitString
+
+class MyFlags(BitString):
+    _map = {
+        0: 'edit',
+        1: 'delete',
+        2: 'manage_users',
+    }
+
+permissions = MyFlags({'edit', 'delete'})
+
+# This will be printed
+if permissions['edit'] and permissions['delete']:
+    print('Can edit and delete')
+
+# This will not
+if 'manage_users' in permissions.native:
+    print('Is admin')
+```
+
+## Strings
+
+ASN.1 contains quite a number of string types:
+
+| Type | Standard Encoding | Implementation Encoding | Notes |
+| ----------------- | --------------------------------- | ----------------------- | ------------------------------------------------------------------------- |
+| `UTF8String` | UTF-8 | UTF-8 | |
+| `NumericString` | ASCII `[0-9 ]` | ISO 8859-1 | The implementation is a superset of supported characters |
+| `PrintableString` | ASCII `[a-zA-Z0-9 '()+,\\-./:=?]` | ISO 8859-1 | The implementation is a superset of supported characters |
+| `TeletexString` | ITU T.61 | Custom | The implementation is based on https://en.wikipedia.org/wiki/ITU_T.61 |
+| `VideotexString` | *?* | *None* | This has no set encoding, and is not used in cryptography |
+| `IA5String` | ITU T.50 (very similar to ASCII) | ISO 8859-1 | The implementation is a superset of supported characters |
+| `GraphicString` | * | ISO 8859-1 | This has no set encoding, but often appears to contain ISO 8859-1 |
+| `VisibleString` | ASCII (printable) | ISO 8859-1 | The implementation is a superset of supported characters |
+| `GeneralString` | * | ISO 8859-1 | This has no set encoding, but often appears to contain ISO 8859-1 |
+| `UniversalString` | UTF-32 | UTF-32 | |
+| `CharacterString` | * | ISO 8859-1 | This has no set encoding, but often appears to contain ISO 8859-1 |
+| `BMPString` | UTF-16 | UTF-16 | |
+
+As noted in the table above, many of the implementations are supersets of the
+supported characters. This simplifies parsing, but puts the onus of using valid
+characters on the developer. In general, `UTF8String`, `BMPString` or
+`UniversalString` should be preferred when a choice is given.
+
+All string types other than `VideotexString` are created from unicode strings.
+
+```python
+from asn1crypto.core import IA5String
+
+print(IA5String('Testing!').native)
+```
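+
+To see the per-type encodings from the table, this sketch dumps a few values.
+The byte strings follow from the standard universal tags (0x16 for
+`IA5String`, 0x0c for `UTF8String`, 0x1e for `BMPString`) and the encodings
+listed above.
+
+```python
+from asn1crypto.core import BMPString, IA5String, UTF8String
+
+ia5 = IA5String('Testing!')
+utf8 = UTF8String('caf\u00e9')
+bmp = BMPString('hi')
+
+# IA5String: one byte per character
+print(ia5.dump())
+# UTF8String: the e-acute takes two bytes (0xc3 0xa9)
+print(utf8.dump())
+# BMPString: two bytes per character, big-endian
+print(bmp.dump())
+
+# Parsing reverses the encoding back to a unicode string
+print(UTF8String.load(utf8.dump()).native)
+```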
+
+## UTCTime
+
+The class `UTCTime` accepts a unicode string in one of the formats:
+
+ - `%y%m%d%H%MZ`
+ - `%y%m%d%H%M%SZ`
+ - `%y%m%d%H%M%z`
+ - `%y%m%d%H%M%S%z`
+
+or a `datetime.datetime` instance. See the
+[Python datetime strptime() reference](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)
+for details of the formats.
+
+When `.native` is accessed, it returns a `datetime.datetime` object with a
+`tzinfo` of `asn1crypto.util.timezone.utc`.
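+
+A small sketch of both input styles. To be safe, it passes a timezone-aware
+`datetime`, built with the standard library's `datetime.timezone.utc`. Aware
+datetimes compare by instant, so the parsed value can be checked against it
+even though the library attaches its own `tzinfo` object.
+
+```python
+from datetime import datetime, timezone
+
+from asn1crypto.core import UTCTime
+
+moment = datetime(2015, 5, 4, 12, 0, 0, tzinfo=timezone.utc)
+
+# Construct from a datetime and serialize; UTCTime has universal tag 23 (0x17)
+from_datetime = UTCTime(moment)
+print(from_datetime.dump())
+
+# Construct from a string in the %y%m%d%H%M%SZ format
+from_string = UTCTime('150504120000Z')
+
+# Both parse to the same instant
+print(from_string.native == moment)
+```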
+
+## GeneralizedTime
+
+The class `GeneralizedTime` accepts a unicode string in one of the formats:
+
+ - `%Y%m%d%H`
+ - `%Y%m%d%H%M`
+ - `%Y%m%d%H%M%S`
+ - `%Y%m%d%H%M%S.%f`
+ - `%Y%m%d%HZ`
+ - `%Y%m%d%H%MZ`
+ - `%Y%m%d%H%M%SZ`
+ - `%Y%m%d%H%M%S.%fZ`
+ - `%Y%m%d%H%z`
+ - `%Y%m%d%H%M%z`
+ - `%Y%m%d%H%M%S%z`
+ - `%Y%m%d%H%M%S.%f%z`
+
+or a `datetime.datetime` instance. See the
+[Python datetime strptime() reference](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior)
+for details of the formats.
+
+When `.native` is accessed, it returns a `datetime.datetime` object with a
+`tzinfo` of `asn1crypto.util.timezone.utc`. For formats where the time has a
+timezone offset is specified (`[+-]\d{4}`), the time is converted to UTC. For
+times without a timezone, the time is assumed to be in UTC.
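+
+A sketch of the offset handling described above. The first value carries a
+`+0200` offset, so it should denote 10:00 UTC. Comparing against an aware UTC
+`datetime` checks the instant regardless of which `tzinfo` object is attached.
+
+```python
+from datetime import datetime, timezone
+
+from asn1crypto.core import GeneralizedTime
+
+# 12:00 at UTC+2, using the %Y%m%d%H%M%S%z format
+local = GeneralizedTime('20150504120000+0200')
+
+# The same instant expressed directly in UTC with a Z suffix
+utc = GeneralizedTime('20150504100000Z')
+
+expected = datetime(2015, 5, 4, 10, 0, 0, tzinfo=timezone.utc)
+print(local.native == expected)
+print(utc.native == expected)
+```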
+
+## Choice
+
+The `Choice` class allows handling ASN.1 Choice structures. The `_alternatives`
+property must be set to a `list` containing 2-3 element `tuple`s. The first
+element in the tuple is the alternative name. The second element is the type
+class for the alternative. The optional third element is a `dict` of
+parameters to pass to the type class constructor. This is used primarily for
+implicit and explicit tagging.
+
+```python
+from asn1crypto.core import Choice, Integer, OctetString, IA5String
+
+class MyChoice(Choice):
+    _alternatives = [
+        ('option_one', Integer),
+        ('option_two', OctetString),
+        ('option_three', IA5String),
+    ]
+```
+
+`Choice` objects have two extra properties, `.name` and `.chosen`. The `.name`
+property contains the name of the chosen alternative. The `.chosen` property
+contains the instance of the chosen type class.
+
+```python
+parsed = MyChoice.load(der_bytes)
+print(parsed.name)
+print(type(parsed.chosen))
+```
+
+The `.native` property and `.dump()` method work as with the universal type
+classes. Under the hood they just proxy the calls to the `.chosen` object.
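+
+A minimal sketch, repeating the `MyChoice` definition from above with assumed
+sample bytes. It constructs a chosen alternative by name, then lets `.load()`
+pick the alternative from the tag of the input.
+
+```python
+from asn1crypto.core import Choice, IA5String, Integer, OctetString
+
+
+class MyChoice(Choice):
+    _alternatives = [
+        ('option_one', Integer),
+        ('option_two', OctetString),
+        ('option_three', IA5String),
+    ]
+
+
+# Select an alternative explicitly; .dump() proxies to the chosen OctetString
+built = MyChoice(name='option_two', value=b'abc')
+print(built.dump())
+
+# An IA5String (tag 0x16) in the input selects option_three
+parsed = MyChoice.load(b'\x16\x02hi')
+print(parsed.name)
+print(type(parsed.chosen).__name__)
+print(parsed.native)
+```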
+
+## Any
+
+The `Any` class implements the ASN.1 Any type, which allows any data type. By
+default objects of this class do not perform any parsing. However, the
+`.parse()` instance method allows parsing the contents of the `Any` object,
+either into a universal type, or into a specification passed in via the `spec`
+parameter.
+
+This type is not used as a top-level structure, but instead allows `Sequence`
+and `Set` objects to accept varying contents, usually based on some sort of
+`ObjectIdentifier`.
+
+```python
+from asn1crypto.core import Sequence, ObjectIdentifier, Any
+
+class MySequence(Sequence):
+    _fields = [
+        ('type', ObjectIdentifier),
+        ('value', Any),
+    ]
+```
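+
+A sketch of sub-parsing an `Any` field. The sample bytes are assumed: a
+`SEQUENCE` holding the OID 1.2.3.4 and an `INTEGER`. Passing `Integer` as the
+spec states the expected type explicitly; calling `.parse()` with no spec would
+fall back to the universal type indicated by the tag.
+
+```python
+from asn1crypto.core import Any, Integer, ObjectIdentifier, Sequence
+
+
+class MySequence(Sequence):
+    _fields = [
+        ('type', ObjectIdentifier),
+        ('value', Any),
+    ]
+
+
+# SEQUENCE { OBJECT IDENTIFIER 1.2.3.4, INTEGER 5 }
+seq = MySequence.load(b'\x30\x08\x06\x03\x2a\x03\x04\x02\x01\x05')
+
+print(seq['type'].native)
+
+# Parse the Any contents using an explicit specification
+value = seq['value'].parse(Integer)
+print(type(value).__name__, value.native)
+```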
+
+## Specification via OID
+
+Throughout the usage of ASN.1 in cryptography, a pattern is present where an
+`ObjectIdentifier` is used to determine what specification should be used to
+interpret another field in a `Sequence`. Usually the other field is an instance
+of `Any`, however occasionally it is an `OctetString` or `OctetBitString`.
+
+*asn1crypto* provides the `_oid_pair` and `_oid_specs` properties of the
+`Sequence` class to allow handling these situations.
+
+The `_oid_pair` is a tuple with two unicode string elements. The first is the
+name of the field that is an `ObjectIdentifier` and the second is the name of
+the field that has a variable specification based on the first field. *In
+situations where the value field should be an `OctetString` or `OctetBitString`,
+`ParsableOctetString` and `ParsableOctetBitString` will need to be used instead
+to allow for the sub-parsing of the contents.*
+
+The `_oid_specs` property is a `dict` object with `ObjectIdentifier` values as
+the keys (either dotted or mapped notation) and a type class as the value. When
+the first field in `_oid_pair` has a value equal to one of the keys in
+`_oid_specs`, then the corresponding type class will be used as the
+specification for the second field of `_oid_pair`.
+
+```python
+from asn1crypto.core import Sequence, ObjectIdentifier, Any, OctetString, Integer
+
+class MyId(ObjectIdentifier):
+    _map = {
+        '1.2.3.4': 'initialization_vector',
+        '1.2.3.5': 'iterations',
+    }
+
+class MySequence(Sequence):
+    _fields = [
+        ('type', MyId),
+        ('value', Any),
+    ]
+
+    _oid_pair = ('type', 'value')
+    _oid_specs = {
+        'initialization_vector': OctetString,
+        'iterations': Integer,
+    }
+```
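+
+A sketch of the dispatch in action with the `MySequence` above. The sample
+bytes (OID 1.2.3.5 followed by the `INTEGER` 2048) are assumed. The `type`
+field maps to `iterations`, which selects `Integer` for the `value` field.
+
+```python
+from asn1crypto.core import Any, Integer, ObjectIdentifier, OctetString, Sequence
+
+
+class MyId(ObjectIdentifier):
+    _map = {
+        '1.2.3.4': 'initialization_vector',
+        '1.2.3.5': 'iterations',
+    }
+
+
+class MySequence(Sequence):
+    _fields = [
+        ('type', MyId),
+        ('value', Any),
+    ]
+
+    _oid_pair = ('type', 'value')
+    _oid_specs = {
+        'initialization_vector': OctetString,
+        'iterations': Integer,
+    }
+
+
+# SEQUENCE { OBJECT IDENTIFIER 1.2.3.5, INTEGER 2048 }
+seq = MySequence.load(b'\x30\x09\x06\x03\x2a\x03\x05\x02\x02\x08\x00')
+
+print(seq['type'].native)
+print(seq.native['value'])
+```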
+
+## Explicit and Implicit Tagging
+
+When working with `Sequence`, `Set` and `Choice` it is often necessary to
+disambiguate between fields because of a number of factors:
+
+ - In `Sequence` the presence of an optional field must be determined by tag number
+ - In `Set`, each field must have a different tag number since they can be in any order
+ - In `Choice`, each alternative must have a different tag number to determine which is present
+
+The universal types all have unique tag numbers. However, if a `Sequence`, `Set`
+or `Choice` has more than one field with the same universal type, tagging allows
+a way to keep the semantics of the original type, but with a different tag
+number.
+
+Implicit tagging simply changes the tag number of a type to a different value.
+Explicit tagging, by contrast, wraps the existing type in another tag with the
+specified tag number.
+
+In general, most situations allow for implicit tagging, with the notable
+exception that a field that is a `Choice` type must always be explicitly tagged.
+Otherwise, using implicit tagging would modify the tag of the chosen
+alternative, breaking the mechanism by which `Choice` works.
+
+Here is an example of implicit and explicit tagging where explicit tagging on
+the `Sequence` allows a `Choice` type field to be optional, and where implicit
+tagging in the `Choice` structure allows disambiguating between two strings of
+the same type.
+
+```python
+from asn1crypto.core import Sequence, Choice, IA5String, UTCTime, ObjectIdentifier
+
+class Person(Choice):
+    _alternatives = [
+        ('name', IA5String),
+        ('email', IA5String, {'implicit': 0}),
+    ]
+
+class Record(Sequence):
+    _fields = [
+        ('id', ObjectIdentifier),
+        ('created', UTCTime),
+        ('creator', Person, {'explicit': 0, 'optional': True}),
+    ]
+```
+
+As is shown above, the keys `implicit` and `explicit` are used for tagging,
+and are passed to a type class constructor via the optional third element of
+a field or alternative tuple. Both parameters may be an integer tag number, or
+a 2-element tuple of string class name and integer tag.
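+
+To make the difference concrete, this sketch applies each style to a
+standalone `IA5String` through the same `implicit` and `explicit` constructor
+parameters. With a bare integer tag number, the context-specific class is used.
+
+```python
+from asn1crypto.core import IA5String
+
+plain = IA5String('hi')
+implicit = IA5String('hi', implicit=0)
+explicit = IA5String('hi', explicit=0)
+
+# Universal tag 22 (0x16), length 2, then the characters
+print(plain.dump())
+
+# Implicit: the tag is replaced by context-specific primitive [0] (0x80)
+print(implicit.dump())
+
+# Explicit: the original TLV is wrapped in a constructed [0] (0xa0)
+print(explicit.dump())
+```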
+
+If a tagged value needs its tagging changed, the `.untag()` method can be used
+to create a copy of the object without explicit/implicit tagging. The `.retag()`
+method can be used to change the tagging. This method accepts one parameter, a
+dict with either or both of the keys `implicit` and `explicit`.
+
+```python
+person = Person(name='email', value='will@wbond.net')
+# The tagging belongs to the chosen alternative, not the Choice itself
+email = person.chosen
+
+# Will display True
+print(email.implicit)
+
+# Will display False
+print(email.untag().implicit)
+
+# Will display 0
+print(email.tag)
+
+# Will display 1
+print(email.retag({'implicit': 1}).tag)
+```