XML Signatures with Python ElementTree

I just need to sign XML...

Python has a standard library module to handle XML, and there seems to be exactly one library for the signing part: signxml. So it should be straightforward:

from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

data = ET.fromstring("<Test/>")
signed_root = XMLSigner().sign(data, key=key)

XMLVerifier().verify(signed_root, x509_cert=cert)

Simple?

But the receiver says that your signature does not verify?

The solution code is at the end of the article. The rest explains what is happening.

...and send it forward

Well, when you sign something, it is usually so that someone else can verify the signature. Unless that someone else has access to your Python object, we need to serialize it:

data_serialized = ET.tostring(signed_root)

# Sending the data...

XMLVerifier().verify(data_serialized, x509_cert=cert)

And this is not simple any more, because it fails:

Traceback (most recent call last):
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 729, in verify
    verify(signing_cert, raw_signature, signed_info_c14n, signature_digest_method)
  File "/path/to/python3.6/site-packages/OpenSSL/crypto.py", line 2928, in verify
    _raise_current_error()
  File "/path/to/python3.6/site-packages/OpenSSL/_util.py", line 54, in exception_from_error_queue
    raise exception_type(errors)
OpenSSL.crypto.Error: [('rsa routines', 'int_rsa_verify', 'bad signature')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "1.py", line 13, in <module>
    XMLVerifier().verify(data_serialized, x509_cert=cert)
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 735, in verify
    raise InvalidSignature("Signature verification failed: {}".format(reason))
signxml.exceptions.InvalidSignature: Signature verification failed: bad signature

So, did the serialization break the signature? "When nothing helps, read the instructions". The signxml documentation says that sign() returns an lxml.etree.Element object, not an xml.etree.ElementTree one.

Next try:

from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

data = ET.fromstring("<Test/>")
signed_root = XMLSigner().sign(data, key=key)
data_serialized = lxml_ET.tostring(signed_root)

# Sending the data...

XMLVerifier().verify(data_serialized, x509_cert=cert)

A moment of happiness: this works. Simple?

Good if this works for you. But we are sending the serialized XML somewhere far away, and the receiver might first deserialize it and only then verify the signature on the deserialized version. And do it with ElementTree.

from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

data = ET.fromstring("<Test/>")
signed_root = XMLSigner().sign(data, key=key)
data_serialized = lxml_ET.tostring(signed_root)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)

And this fails with the same InvalidSignature. Bump. Now reading the instructions does not help.

Debugging starts

We can check that the string representation of our signed data "looks right" (although with XML you can never be sure). Compare it to ET.tostring(data_parsed) and note that they are not identical. We do not yet know what exactly is inside the ElementTree object, but verification over its serialized string fails:

Sent string:

<Test><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11"/><ds:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2006/12/xml-c14n11"/></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/><ds:DigestValue>KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=</ds:DigestValue>....

De-serialized ElementTree object:

<Test xmlns:ns0="http://www.w3.org/2000/09/xmldsig#"><ns0:Signature><ns0:SignedInfo><ns0:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11" /><ns0:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256" /><ns0:Reference URI=""><ns0:Transforms><ns0:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature" /><ns0:Transform Algorithm="http://www.w3.org/2006/12/xml-c14n11" /></ns0:Transforms><ns0:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256" /><ns0:DigestValue>KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=</ns0:DigestValue>...

So far we see two problems in the string produced from the ElementTree object:

  1. Namespace ds="http://www.w3.org/2000/09/xmldsig#" has been changed to ns0.
  2. Its declaration is not under the <Signature> element, but under the <Test> element.
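Both problems can be reproduced with the standard library alone, without signxml. This is a minimal sketch, assuming a shortened version of the sent string (the real <SignedInfo> has more children, which do not matter here), and assuming a fresh interpreter in which no prefix has been registered for the xmldsig namespace yet:

```python
import xml.etree.ElementTree as ET

# Shortened "sent string" (assumption: the real SignedInfo has children).
SENT = (
    '<Test><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">'
    '<ds:SignedInfo/></ds:Signature></Test>'
)

root = ET.fromstring(SENT)

# ElementTree keeps only the namespace URI inside the tag name;
# the "ds" prefix from the input is not stored anywhere.
print(root[0].tag)
# {http://www.w3.org/2000/09/xmldsig#}Signature

# When serializing, it invents a prefix (ns0) and declares it on the root.
print(ET.tostring(root, encoding="unicode"))
# <Test xmlns:ns0="http://www.w3.org/2000/09/xmldsig#"><ns0:Signature><ns0:SignedInfo /></ns0:Signature></Test>
```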

InvalidSignature

The InvalidSignature exception we are getting is caused by the first problem. signxml first validates the signature: the signature value (read from the <SignatureValue> element of the received document) is calculated over the <SignedInfo> XML element (canonicalized, but we can skip this complicated issue for now). <SignedInfo> has changed: ds has been replaced by a where-did-it-come-from ns0, so the signature does not match.

The solution is the register_namespace(prefix, uri) function of ElementTree. Its documentation reads: (...) Tags and attributes in this namespace will be serialized with the given prefix, if at all possible. That's what we need (it's unclear why ElementTree does not take this from our XML string). The mapping is global and is consulted only when a tree is serialized, so the function must be called before verification, not necessarily before parsing; as the documentation says, (...) any existing mapping for either the given prefix or the namespace URI will be removed.
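The stdlib-only sketch below shows that registering the prefix after parsing is enough, because the prefix is only chosen at serialization time. It also explains why the call matters for signxml at all: as far as we can tell from its source, signxml converts an ElementTree element by serializing it with ET.tostring() and re-parsing the result with lxml. The shortened SENT string is again an assumption:

```python
import xml.etree.ElementTree as ET

DS = "http://www.w3.org/2000/09/xmldsig#"
# Shortened "sent string" (assumption: the real SignedInfo has children).
SENT = (
    '<Test><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">'
    '<ds:SignedInfo/></ds:Signature></Test>'
)

# Parse first...
root = ET.fromstring(SENT)

# ...and register the prefix afterwards: the global mapping is only
# looked up when a tree is serialized.
ET.register_namespace("ds", DS)

print(ET.tostring(root, encoding="unicode"))
# <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:Signature><ds:SignedInfo /></ds:Signature></Test>
```

The prefix is back, but note that the declaration still sits on <Test> rather than on <ds:Signature>: the second problem is not solved by this call.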

from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

data = ET.fromstring("<Test/>")
signed_root = XMLSigner().sign(data, key=key)
data_serialized = lxml_ET.tostring(signed_root)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
XMLVerifier().verify(data_parsed, x509_cert=cert)

Now if we inspect ET.tostring(data_parsed), we'll see the correct ds namespace in use. The result?

Traceback (most recent call last):
  File "1.py", line 15, in <module>
    XMLVerifier().verify(data_parsed, x509_cert=cert)
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 765, in verify
    raise InvalidDigest("Digest mismatch for reference {}".format(len(verify_results)))
signxml.exceptions.InvalidDigest: Digest mismatch for reference 0

InvalidDigest

Now signxml has successfully verified <SignatureValue> over <SignedInfo> (in canonicalized form, but again we may skip this topic for now). The "digest mismatch" signxml is now complaining about is between the digest calculated over the original XML document (in canonicalized form...) and the digest string read from the <DigestValue> element under <Signature>. It is caused by the second problem listed above. DigestValue is KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=, and we can verify that it is made over the string <Test></Test>:

$ echo -n '<Test></Test>' | openssl sha256 -binary | base64
KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=
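The same check can be done in Python. This is a sketch: the helper name digest_value is ours, and ET.canonicalize() (Python 3.8+) implements C14N 2.0 rather than the C14N 1.1 that signxml uses here, but for a bare <Test/> both produce <Test></Test>. The "received" byte string is copied from the text below rather than computed:

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def digest_value(canonical: bytes) -> str:
    """Base64-encoded SHA-256, the same computation as the openssl pipeline."""
    return base64.b64encode(hashlib.sha256(canonical).digest()).decode("ascii")


# C14N 2.0 (not 1.1), but both expand the empty element the same way.
canonical_sent = ET.canonicalize("<Test/>").encode("utf-8")
# Canonical form after the ElementTree round trip, as described in the text.
canonical_received = b'<Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>'

print(canonical_sent)  # b'<Test></Test>'
print(digest_value(canonical_sent))
print(digest_value(canonical_received))  # a different value
```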

But as we see in the deserialized ElementTree object, our XML string, with the signature excised and canonicalized, has now become <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>, which of course produces a different SHA-256 digest.

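This unexpected modification of the original XML comes from two effects combined: ElementTree declares all namespaces on the root element when serializing, and the inclusive canonicalization used here (xml-c14n11) keeps the namespace declarations of the signed element, even unused ones.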

If we want the receiver of our XML string to be able to parse it with ElementTree and then verify it successfully, we have no option but to sign exactly such a string, i.e. we must pass sign() an argument which, when canonicalized, produces the string <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>.

Understanding how to do this requires some knowledge of how ElementTree stores its XML objects. What is important here is that it represents a namespace only as a "{uri}" prefix of the names of the tags (and attributes) in that namespace. Because of this, if we only declare a namespace, the declaration is not saved. We must create some element in this namespace (and call register_namespace() before creating the object). Let's create an element ds:foo:
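A small stdlib-only sketch of this behaviour: a namespace that is only declared disappears on the round trip, while one used by an element survives and is declared on the root. The element names are the ones used in the article; nothing here involves signxml:

```python
import xml.etree.ElementTree as ET

DS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DS)

# Declared but never used: ElementTree keeps no trace of it.
declared_only = ET.fromstring(f'<Test xmlns:ds="{DS}"/>')
print(ET.tostring(declared_only, encoding="unicode"))
# <Test />

# Used by ds:foo: the URI lives in that element's tag, and the
# serializer declares the namespace on the root element.
used = ET.fromstring(f'<wrapper><ds:foo xmlns:ds="{DS}"/><Test/></wrapper>')
print(used[0].tag)
# {http://www.w3.org/2000/09/xmldsig#}foo
print(ET.tostring(used, encoding="unicode"))
# <wrapper xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:foo /><Test /></wrapper>
```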

from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
data = ET.fromstring("<wrapper><ds:foo xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\"/><Test/></wrapper>")
signed_root = XMLSigner().sign(data, key=key)
data_serialized = lxml_ET.tostring(signed_root)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)

Now the ds namespace is present in the ElementTree object passed to sign(). When this object is serialized, the namespace declaration for ds is put on the first (root) tag of the document, and canonicalization keeps it there. Adding the signature, which brings in this namespace, therefore no longer changes anything inside our <Test> element.

Almost Done

The signature verifies, but we have added content to the XML we are signing, and signed it too! verify().signed_data will return the content with our additions. Software which signs things (even XML) with such side effects would probably not gain wide acceptance :-)

This is solved, by luck or by intention, by a feature of signxml that lets us specify the location of the enveloped signature. As the documentation says,

To specify the location of an enveloped signature within data, insert a <ds:Signature Id="placeholder"></ds:Signature> element in data (where “ds” is the “http://www.w3.org/2000/09/xmldsig#” namespace). This element will be replaced by the generated signature, and excised when generating the digest.

Such a placeholder introduces the ds namespace into the data, but no new elements outside the <ds:Signature> element, so we do not add anything to the content which is signed.
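If the data is built programmatically rather than parsed from a string, the placeholder can be added with SubElement. This is a sketch; add_signature_placeholder is a hypothetical helper name, not part of signxml. Its serialization is the same as that of the string used in the code below, and the resulting element is what would be passed to XMLSigner().sign():

```python
import xml.etree.ElementTree as ET

DS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DS)


def add_signature_placeholder(root: ET.Element) -> ET.Element:
    """Hypothetical helper: append the <ds:Signature Id="placeholder"/>
    element that signxml replaces with the generated signature."""
    return ET.SubElement(root, f"{{{DS}}}Signature", Id="placeholder")


data = ET.Element("Test")
add_signature_placeholder(data)
print(ET.tostring(data, encoding="unicode"))
# <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:Signature Id="placeholder" /></Test>
```

The only thing outside <ds:Signature> that changes is the ds declaration on <Test>, which is exactly what the receiver's ElementTree will produce anyway.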

from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
data = ET.fromstring("<Test><ds:Signature xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\" Id=\"placeholder\"></ds:Signature></Test>")
signed_root = XMLSigner().sign(data, key=key)
data_serialized = lxml_ET.tostring(signed_root)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)

Such signed, serialized data can also be parsed by lxml.etree and successfully verified.

Afterword

After reading this post, you should have no doubts that XML security is broken.


Contents © 2019 Konstantin Shemyak - Powered by Nikola