.. title: XML Signatures and Python ElementTree
.. slug: xml-signatures-and-python-elementtree
.. date: 2019-06-16 16:00
.. tags: XML,signature,Python
.. category: 
.. link: 
.. description: 
.. type: text

## I just need to sign XML...

Python has a standard library module for handling XML, and there seems to be exactly one library for the signing part: `signxml`.
So it should be straightforward:

```
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)

XMLVerifier().verify(signed_xml_obj, x509_cert=cert)
```

Simple?

But the receiver tells you that your signature does not verify?

The solution code is at the [end of the article](#tldr). The rest explains what is happening.

An optional parameter of the `XMLSigner` constructor, `method`, specifies the **type of the XML signature**, which
can be *enveloped*, *enveloping*, or *detached*. This article covers only the default
case of an *enveloped* signature.

## ...and send it forward

When you sign something, it is usually so that someone else can verify the signature.
Unless that someone else has access to your Python object, we need to serialize it:

```
data_serialized = ET.tostring(signed_xml_obj)

# Sending the data...

XMLVerifier().verify(data_serialized, x509_cert=cert)
```

And this is no longer simple, because verification of the serialized data fails:

```
Traceback (most recent call last):
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 729, in verify
    verify(signing_cert, raw_signature, signed_info_c14n, signature_digest_method)
  File "/path/to/python3.6/site-packages/OpenSSL/crypto.py", line 2928, in verify
    _raise_current_error()
  File "/path/to/python3.6/site-packages/OpenSSL/_util.py", line 54, in exception_from_error_queue
    raise exception_type(errors)
OpenSSL.crypto.Error: [('rsa routines', 'int_rsa_verify', 'bad signature')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "1.py", line 13, in <module>
    XMLVerifier().verify(data_serialized, x509_cert=cert)
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 735, in verify
    raise InvalidSignature("Signature verification failed: {}".format(reason))
signxml.exceptions.InvalidSignature: Signature verification failed: bad signature
```

So, did the serialization break the signature?
"When nothing helps, read the instructions." The `signxml` documentation
[says](https://signxml.readthedocs.io/en/latest/#signxml.XMLSigner) that
the return value of `sign()` is an `lxml.etree.Element` object, not an `xml.etree.ElementTree.Element` object.

The next attempt is to serialize with `lxml`:

```
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)

# Sending the data...

XMLVerifier().verify(data_serialized, x509_cert=cert)
```

A moment of happiness: this works. Simple?

Good, if this is enough for you. But we are sending the serialized XML somewhere far away, and
the receiver might first deserialize it with ElementTree and only then verify the signature on the
deserialized object. (This is not smart, but it is probably out of your control.)

```
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
```

This fails with the same `InvalidSignature`. Bump. This time, reading the instructions does not help.

## Debugging starts

We can check that the string representation of our **signed** data "looks right" (although with XML you can never be sure).
Comparing it with `ET.tostring(data_parsed)` shows that the two are not identical.
We do not know yet what is inside the ElementTree object, but its serialized string no longer matches the signature:

Sent string:
```
<Test><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11"/><ds:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2006/12/xml-c14n11"/></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/><ds:DigestValue>KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=</ds:DigestValue>....
```

The deserialized ElementTree object, serialized again with `ET.tostring()`:
```
<Test xmlns:ns0="http://www.w3.org/2000/09/xmldsig#"><ns0:Signature><ns0:SignedInfo><ns0:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11" /><ns0:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256" /><ns0:Reference URI=""><ns0:Transforms><ns0:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature" /><ns0:Transform Algorithm="http://www.w3.org/2006/12/xml-c14n11" /></ns0:Transforms><ns0:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256" /><ns0:DigestValue>KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=</ns0:DigestValue>...
```

So far we see two problems in the string produced from the `ElementTree` object:

1. The namespace prefix `ds` has been changed to `ns0`.
2. Its declaration is no longer on the `<Signature>` tag, but on the `<Test>` tag.
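
Both problems can be reproduced with the standard library alone, without any signing. The sketch below uses a trimmed-down, hypothetical document `DOC`: it parses a `ds`-prefixed element, shows that ElementTree keeps only the namespace URI in the tag name, and serializes the tree again. It assumes a fresh interpreter in which nothing has called `register_namespace()` for the XML-DSig URI yet, because that registry is global.

```python
from xml.etree import ElementTree as ET

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"

# A cut-down, hypothetical stand-in for the signed document.
DOC = f'<Test><ds:Signature xmlns:ds="{DSIG_NS}"/></Test>'

root = ET.fromstring(DOC)

# ElementTree stores the namespace URI inside the tag name ("{uri}local");
# the "ds" prefix from the input is not kept anywhere.
signature_tag = root[0].tag
print(signature_tag)

# Serializing again: a prefix ("ns0") is invented, and the declaration is
# written on the root element <Test> instead of on <Signature>.
round_tripped = ET.tostring(root).decode()
print(round_tripped)
```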

## InvalidSignature

The `InvalidSignature` exception we are getting is caused by the first problem.
`signxml` first performs [signature validation](https://tools.ietf.org/html/rfc3275#section-3.2.2):
it checks the signature value (read from the `<SignatureValue>` element of the XML string in our case) against the
`<SignedInfo>` XML element (*canonicalized*, but we can skip this **complicated** issue for now).
`<SignedInfo>` has changed, because `ds` has been replaced by the where-did-it-come-from `ns0`, so the
signature does not match.
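
Why does a mere prefix change matter? Canonical XML keeps namespace prefixes, so `ds:SignedInfo` and `ns0:SignedInfo` canonicalize to different bytes and therefore hash differently. The sketch below illustrates this with `ET.canonicalize()` (Python 3.8+), which implements C14N 2.0 rather than the C14N 1.1 that `signxml` uses here; both keep prefixes, but treat it as an illustration, not as what `signxml` does internally. The `<SignedInfo>` snippets are heavily trimmed, hypothetical examples.

```python
import hashlib
from xml.etree import ElementTree as ET

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"

# The same (heavily trimmed, hypothetical) element, once with the original
# prefix and once with the prefix ElementTree invents on re-serialization.
original = f'<ds:SignedInfo xmlns:ds="{DSIG_NS}"/>'
reserialized = f'<ns0:SignedInfo xmlns:ns0="{DSIG_NS}"/>'

# Canonicalization (C14N 2.0 here) keeps the prefixes, so the outputs differ.
c14n_original = ET.canonicalize(original)
c14n_reserialized = ET.canonicalize(reserialized)
print(c14n_original)
print(c14n_reserialized)

# Different canonical bytes mean different hashes, so a signature computed
# over the first form cannot verify against the second.
digest_original = hashlib.sha256(c14n_original.encode("utf-8")).hexdigest()
digest_reserialized = hashlib.sha256(c14n_reserialized.encode("utf-8")).hexdigest()
print(digest_original == digest_reserialized)  # False
```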
 

The solution is the [register_namespace(prefix, uri)](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.register_namespace)
function of `ElementTree`. Its documentation reads:
_(...) Tags and attributes in this namespace will be serialized with the given prefix, if at all possible._
That's what we need. (`ElementTree` does not take the prefix from our XML string: it discards prefixes when parsing
and picks them again, from a global registry, when serializing.)
This is also why the function must be called before verification, but not necessarily before parsing.
Keep in mind that the registry is global; as the documentation says,
_(...) any existing mapping for either the given prefix or the namespace URI will be removed._
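
Because prefixes are chosen only at serialization time, registering after parsing works, as long as it happens before anything is serialized. A minimal stdlib-only check of that claim, again with a trimmed-down, hypothetical document and assuming a fresh interpreter (the registration is process-wide and affects every later `ET.tostring()` call):

```python
from xml.etree import ElementTree as ET

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"
DOC = f'<Test><ds:Signature xmlns:ds="{DSIG_NS}"/></Test>'

# Parse first, while "ds" is still unknown to ElementTree...
parsed = ET.fromstring(DOC)
before = ET.tostring(parsed).decode()  # prefix "ns0", declared on <Test>

# ...then register the prefix. The parsed object itself is unchanged,
# but from now on every serialization in this process uses "ds".
ET.register_namespace("ds", DSIG_NS)
after = ET.tostring(parsed).decode()  # prefix "ds", still declared on <Test>

print(before)
print(after)
```

Note that the declaration still sits on `<Test>` in both outputs: registering the prefix fixes only the first problem.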


```
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
XMLVerifier().verify(data_parsed, x509_cert=cert)
```

If we now inspect `ET.tostring(data_parsed)`, we see the correct `ds` prefix in use.
The result?

```
Traceback (most recent call last):
  File "1.py", line 15, in <module>
    XMLVerifier().verify(data_parsed, x509_cert=cert)
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 765, in verify
    raise InvalidDigest("Digest mismatch for reference {}".format(len(verify_results)))
signxml.exceptions.InvalidDigest: Digest mismatch for reference 0
```

## InvalidDigest

Now `signxml` has successfully verified `<SignatureValue>` over `<SignedInfo>`
(in *canonicalized* form, but again we may skip this topic for now).
The "digest mismatch" that `signxml` now complains about is between the digest calculated over the
**original XML document** (canonicalized, and with the `<Signature>` element removed by the enveloped-signature transform)
and the digest read from the `<DigestValue>` element inside `<Signature>`. It is caused by problem 2 mentioned above.
The `DigestValue` is `KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=`,
and it was, in fact, computed over the string `<Test></Test>`:

```
$ echo -n '<Test></Test>' | openssl sha256 -binary | base64
KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=
```
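
The same computation can be done from Python with the standard library; `digest_value` below is a hypothetical helper equivalent to the `openssl sha256 -binary | base64` pipeline. The sketch also canonicalizes `<Test/>` with `ET.canonicalize()` (Python 3.8+, C14N 2.0; `signxml` uses C14N 1.1, but both write an empty element as a start/end tag pair), and hashes the string the ElementTree-based receiver ends up with, quoted in the next paragraph.

```python
import base64
import hashlib
from xml.etree import ElementTree as ET


def digest_value(canonical_xml: str) -> str:
    """Base64-encoded SHA-256, like `openssl sha256 -binary | base64`."""
    raw = hashlib.sha256(canonical_xml.encode("utf-8")).digest()
    return base64.b64encode(raw).decode("ascii")


# Canonicalization turns the empty-element shorthand into a start/end tag pair.
canonical = ET.canonicalize("<Test/>")
print(canonical)  # <Test></Test>

# Digest over what was signed vs. digest over what the receiver hashes.
signed_digest = digest_value(canonical)
receiver_digest = digest_value('<Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>')
print(signed_digest)
print(receiver_digest)
print(signed_digest == receiver_digest)  # False
```

The first printed digest should be the same value the `openssl` pipeline above produces, since both hash the same bytes.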

But in the deserialized ElementTree object the XML string became
`<Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>`, which of course produces a different SHA-256 digest.
With a debugger you can see that `verify()` computes the digest over **this** string.

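Such a modification of the original XML looks like the result of some *canonicalization*
transform, but I have not found it in the specs:
the [examples given](https://www.w3.org/TR/xml-c14n11/#Example-SETags) show that
namespace declarations are not moved out of the tag in which they are used.
More likely, the move happens when ElementTree serializes the object: as the `ET.tostring(data_parsed)`
output above shows, ElementTree writes all namespace declarations on the root element.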

If we want the receiver to be able to parse our XML string with ElementTree
and then successfully verify it, we have no option but to sign exactly
such a string. That is, we must pass `sign()` an argument which, when canonicalized,
produces the string `<Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>`.

Understanding how to do this requires knowing how ElementTree stores its XML objects.
What matters here is that it keeps a namespace only as part of the names of the tags in it,
in the form `{uri}localname`, and does not record namespace declarations separately.
Because of this, if we only **declare** a namespace, the declaration will not be preserved.
We must create some element in this namespace **and** call `register_namespace()`.
Let's create an element `ds:foo`. To be valid XML, the string must have a single
root element, so we also add a `<wrapper>` element:


```
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
xml_obj = ET.fromstring("<wrapper><ds:foo xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\"/><Test/></wrapper>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
```

Now the `ds` namespace is present in the ElementTree object passed to `sign()`.
When `sign()` processes it, the namespace declaration for `ds` ends up on the first tag
of the document, just as in the `ET.tostring()` output we saw earlier. Adding the signature,
which brings in this namespace, therefore no longer changes the document that is digested.
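
The ElementTree behaviour this workaround relies on can be checked without `signxml`. In the stdlib-only sketch below (a fresh interpreter is assumed, because `register_namespace()` changes global state), a namespace that is merely declared disappears on re-serialization, while an element in that namespace plus a registered prefix puts `xmlns:ds` on the root element:

```python
from xml.etree import ElementTree as ET

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DSIG_NS)

# Only a declaration, no element uses the namespace: ElementTree drops it,
# because it keeps namespaces only as part of tag names.
declared_only = ET.tostring(ET.fromstring(f'<Test xmlns:ds="{DSIG_NS}"/>')).decode()
print(declared_only)

# With an element in the namespace, the declaration survives and is written
# on the root element (<wrapper>), i.e. it is already in place before signing.
with_wrapper = ET.tostring(
    ET.fromstring(f'<wrapper><ds:foo xmlns:ds="{DSIG_NS}"/><Test/></wrapper>')
).decode()
print(with_wrapper)
```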

## Almost Done

The signature verifies, but we have added content to the XML we are signing, and
signed that content too!
`XMLVerifier().verify(...).signed_data` will return the content including our additions.
Software which signs things (even XML) with **such** side effects
would probably not gain wide acceptance :-)

This is solved, by luck or by design, by a feature of `signxml`
that lets you specify the location of the enveloped signature.
As the [documentation](https://signxml.readthedocs.io/en/latest/#signxml.XMLSigner) says,


> To specify the location of an enveloped signature within data, insert a
> `<ds:Signature Id="placeholder"></ds:Signature>` element in data
> (where “ds” is the “http://www.w3.org/2000/09/xmldsig#” namespace).
> This element will be replaced by the generated signature, and excised when generating the digest.


Such a placeholder introduces the `ds` namespace into `xml_obj`,
but no new elements outside the `<ds:Signature>` element, and that element is excised before the digest is computed.
So we do not add anything to the signed content.
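
What ElementTree makes of the placeholder document can again be checked with the standard library alone. The sketch below (fresh interpreter assumed, since the prefix registration is global) confirms that the `ds` declaration lands on `<Test>` and that the placeholder is the only element in the `ds` namespace:

```python
from xml.etree import ElementTree as ET

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DSIG_NS)

PLACEHOLDER_DOC = (
    f'<Test><ds:Signature xmlns:ds="{DSIG_NS}" Id="placeholder"></ds:Signature></Test>'
)
xml_obj = ET.fromstring(PLACEHOLDER_DOC)

# On serialization the declaration moves to the root element <Test>.
serialized = ET.tostring(xml_obj).decode()
print(serialized)

# The only element in the ds namespace is the placeholder itself, which,
# per the signxml documentation, is replaced by the real signature and
# excised when generating the digest.
ds_prefix = "{" + DSIG_NS + "}"
ds_elements = [el.tag for el in xml_obj.iter() if el.tag.startswith(ds_prefix)]
print(ds_elements)
```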

<a name="tldr"></a>

```
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
xml_obj = ET.fromstring("<Test><ds:Signature xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\" Id=\"placeholder\"></ds:Signature></Test>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
```

The signed and serialized data can also be parsed with `lxml.etree` and successfully verified.

Do not forget that if the receiver uses ElementTree, they must call
`ElementTree.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")`
before calling `XMLVerifier().verify()`.

## Afterword

After reading this post, you should have no doubts that 
[XML signing is broken](http://www.cs.auckland.ac.nz/~pgut001/pubs/xmlsec.txt).
