XML Signatures and Python ElementTree

I just need to sign XML...

Python has a standard library module to handle XML, and there seems to be exactly one library for the signing part: signxml. So it should be straightforward:

from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)

XMLVerifier().verify(signed_xml_obj, x509_cert=cert)

Simple?

But what if the receiver tells you that your signature does not verify?

The solution code is at the end of the article. The rest explains what is happening.

An optional parameter of the sign() method specifies the type of the XML signature, which can be enveloped, enveloping, or detached. This article covers only the default case, an enveloped signature.

...and send it forward

When you sign something, it is usually so that someone else can verify the signature. Unless that someone else has access to your Python object, we need to serialize it:

data_serialized = ET.tostring(signed_xml_obj)

# Sending the data...

XMLVerifier().verify(data_serialized, x509_cert=cert)

And this is no longer simple, because verification of the serialized data fails:

Traceback (most recent call last):
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 729, in verify
    verify(signing_cert, raw_signature, signed_info_c14n, signature_digest_method)
  File "/path/to/python3.6/site-packages/OpenSSL/crypto.py", line 2928, in verify
    _raise_current_error()
  File "/path/to/python3.6/site-packages/OpenSSL/_util.py", line 54, in exception_from_error_queue
    raise exception_type(errors)
OpenSSL.crypto.Error: [('rsa routines', 'int_rsa_verify', 'bad signature')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "1.py", line 13, in <module>
    XMLVerifier().verify(data_serialized, x509_cert=cert)
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 735, in verify
    raise InvalidSignature("Signature verification failed: {}".format(reason))
signxml.exceptions.InvalidSignature: Signature verification failed: bad signature

So, did the serialization break the signature? "When nothing helps, read the instructions". The signxml documentation says that sign() returns an lxml.etree.Element object, not an xml.etree.ElementTree.Element object.

The next try is to serialize with lxml:

from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)

# Sending the data...

XMLVerifier().verify(data_serialized, x509_cert=cert)

A moment of happiness: this works. Simple?

Good if this works for you. But we are sending the serialized XML somewhere far away, and the receiver might first deserialize it and only then verify the signature on the deserialized version. (Which is not smart, but is probably out of your control.) And they might do it with ElementTree.

from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)

And this fails with the same InvalidSignature. Bump. This time, reading the instructions does not help.

Debugging starts

We can check that the string representation of our signed data "looks right" (although with XML you can never be sure). Comparing it to ElementTree.tostring(data_parsed) shows that the two are not identical. We do not know yet what is inside the ElementTree object, but over its serialized string the signature will fail:

Sent string:

<Test><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11"/><ds:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2006/12/xml-c14n11"/></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/><ds:DigestValue>KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=</ds:DigestValue>....

De-serialized ElementTree object:

<Test xmlns:ns0="http://www.w3.org/2000/09/xmldsig#"><ns0:Signature><ns0:SignedInfo><ns0:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11" /><ns0:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256" /><ns0:Reference URI=""><ns0:Transforms><ns0:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature" /><ns0:Transform Algorithm="http://www.w3.org/2006/12/xml-c14n11" /></ns0:Transforms><ns0:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256" /><ns0:DigestValue>KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=</ns0:DigestValue>...

So far we see two problems in the string produced from the ElementTree object:

  1. The namespace prefix ds has been changed to ns0.
  2. Its declaration is no longer on the <Signature> tag but on the <Test> tag.
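Both problems can be reproduced with the standard library alone, with no need for signxml or a key. The sketch below uses a heavily shortened, hypothetical stand-in for the signed document. It assumes a fresh interpreter in which no prefix has yet been registered for the xmldsig namespace, because register_namespace() changes a process-wide table. After parsing, ElementTree keeps only the namespace URI of each tag, in "{uri}local" form. When it serializes, it invents ns0, ns1, ... for any URI that has no registered prefix.

```python
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"

# Shortened, hypothetical stand-in for the string produced by lxml on the sender side.
sent = f'<Test><ds:Signature xmlns:ds="{DS_NS}"><ds:SignedInfo/></ds:Signature></Test>'

parsed = ET.fromstring(sent)

# The "ds" prefix is gone after parsing: only the namespace URI survives in the tag.
print(parsed[0].tag)  # {http://www.w3.org/2000/09/xmldsig#}Signature

# Re-serializing invents a prefix (problem 1) and declares it on the root (problem 2).
roundtrip = ET.tostring(parsed, encoding="unicode")
print(roundtrip)
# <Test xmlns:ns0="http://www.w3.org/2000/09/xmldsig#"><ns0:Signature><ns0:SignedInfo /></ns0:Signature></Test>
```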

InvalidSignature

The InvalidSignature exception we are getting is caused by the first problem. signxml first validates the signature itself. The signature value (read from the XML string in our case) was calculated over the <SignedInfo> element (canonicalized, but we can skip this complicated issue for now). <SignedInfo> has changed because ds has been replaced by the where-did-it-come-from ns0, so the signature does not match.
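A mere prefix change is fatal because the RSA signature covers the bytes of the canonicalized <SignedInfo>, and the prefix is part of those bytes. This can be sketched with hashlib under two assumptions. First, the <SignedInfo> content below is abridged and hypothetical. Second, ET.canonicalize() (Python 3.8+) implements C14N 2.0, not the C14N 1.1 named in the signature above. It stands in here only to show that canonicalization keeps prefixes as they are.

```python
import hashlib
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"

# The same (abridged) <SignedInfo>, as sent and after the ElementTree round trip.
as_sent = f'<ds:SignedInfo xmlns:ds="{DS_NS}"><ds:SignatureMethod Algorithm="x"/></ds:SignedInfo>'
round_tripped = f'<ns0:SignedInfo xmlns:ns0="{DS_NS}"><ns0:SignatureMethod Algorithm="x"/></ns0:SignedInfo>'


def c14n_sha256(xml_text: str) -> str:
    """SHA-256 of the canonical form (C14N 2.0 as a stand-in for C14N 1.1)."""
    canonical = ET.canonicalize(xml_data=xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same element as far as ElementTree's data model is concerned ...
same_tag = ET.fromstring(as_sent).tag == ET.fromstring(round_tripped).tag
# ... but different canonical bytes, so a signature over one cannot verify the other.
digests_differ = c14n_sha256(as_sent) != c14n_sha256(round_tripped)
print(same_tag, digests_differ)  # True True
```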

The solution is ElementTree's register_namespace(prefix, uri) function. Its documentation reads: (...) Tags and attributes in this namespace will be serialized with the given prefix, if at all possible. That's exactly what we need. (Why ElementTree does not simply take the prefix from our XML string is something the author still has to understand.) The function must be called before verification, but not necessarily before parsing; as the documentation says, (...) any existing mapping for either the given prefix or the namespace URI will be removed.

from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
XMLVerifier().verify(data_parsed, x509_cert=cert)
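The claim that register_namespace() may come after parsing can be checked with the standard library alone, because the prefix is only chosen at serialization time. The input is the same shortened, hypothetical stand-in as before. Note that the call changes a process-wide table, so it affects every later ElementTree serialization in the program. Note also that it fixes only the prefix: the declaration still ends up on <Test>.

```python
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
# Shortened, hypothetical stand-in for the received string.
sent = f'<Test><ds:Signature xmlns:ds="{DS_NS}"><ds:SignedInfo/></ds:Signature></Test>'

parsed = ET.fromstring(sent)  # parsed before any prefix is registered

# Process-wide: every later serialization maps DS_NS to "ds".
ET.register_namespace("ds", DS_NS)

fixed = ET.tostring(parsed, encoding="unicode")
print(fixed)
# <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:Signature><ds:SignedInfo /></ds:Signature></Test>
```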

Now if we inspect ET.tostring(data_parsed), we'll see the correct ds prefix in use. The result?

Traceback (most recent call last):
  File "1.py", line 15, in <module>
    XMLVerifier().verify(data_parsed, x509_cert=cert)
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 765, in verify
    raise InvalidDigest("Digest mismatch for reference {}".format(len(verify_results)))
signxml.exceptions.InvalidDigest: Digest mismatch for reference 0

InvalidDigest

Now signxml has successfully verified <SignatureValue> over <SignedInfo> (in canonicalized form, but again we may skip this topic for now). The "digest mismatch" signxml now complains about is between two values. One is the digest calculated over the original XML document (in canonicalized form...). The other is the digest string read from the <DigestValue> element under <Signature>. The mismatch is caused by problem 2 above. DigestValue is KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=, and it was, in fact, made over the string <Test></Test>:

$ echo -n '<Test></Test>' | openssl sha256 -binary | base64
KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=

But in the deserialized ElementTree object, the XML string has become <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>, which of course produces a different SHA-256 digest. With a debugger you can see that verify() tries to match the digest of this string.
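The same comparison can be done in Python for readers without openssl at hand. The first printed value should reproduce the <DigestValue> shown above. The second string is what the ElementTree-based receiver ends up digesting, so its value differs. Real verification digests the canonicalized content; for these two tiny documents, that canonical form is assumed to be exactly the strings below.

```python
import base64
import hashlib


def b64_sha256(data: bytes) -> str:
    """Base64-encoded SHA-256, the format used in <DigestValue>."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


signed_by_sender = b"<Test></Test>"
seen_by_receiver = b'<Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>'

print(b64_sha256(signed_by_sender))  # compare with the DigestValue in the signature
print(b64_sha256(seen_by_receiver))  # a different value: hence InvalidDigest
```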

Such a modification of the original XML looks like the result of some canonicalization transform, but I have not found it in the specs. The examples given there show that namespace declarations are not moved out of the tag in which they are used.
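Within the standard library, the move can be traced to ElementTree's own serializer rather than to canonicalization. ET.tostring() writes every namespace declaration it needs on the root element. Canonicalizing the original text, by contrast, keeps the declaration on the element that uses it. As before, ET.canonicalize() implements C14N 2.0 and is used only as an illustration; the signature in this article names C14N 1.1.

```python
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DS_NS)

original = f'<Test><ds:Signature xmlns:ds="{DS_NS}"></ds:Signature></Test>'

# ElementTree's serializer collects every namespace in the tree and declares it on the root.
et_output = ET.tostring(ET.fromstring(original), encoding="unicode")

# Canonicalizing the original text leaves the declaration where it was used.
c14n_output = ET.canonicalize(xml_data=original)

print(et_output)
# <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:Signature /></Test>
print(c14n_output)
# <Test><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></ds:Signature></Test>
```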

If we want the receiver of our XML string to be able to parse it with ElementTree and then verify it successfully, we have no option other than to sign exactly this string. That is, we must pass sign() an argument which, when canonicalized, produces the string <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>.

Understanding how to do this requires knowing how ElementTree stores its XML objects. What matters here is that it represents a namespace as a string prefix on the names of the tags in it ("{uri}tag"). Because of this, if we only declare a namespace, the declaration is not kept. We must create some element in this namespace and call register_namespace(). Let's create an element ds:foo. To be valid XML, the string must have exactly one root element, so we add a <wrapper> element:

from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
xml_obj = ET.fromstring("<wrapper><ds:foo xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\"/><Test/></wrapper>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
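Why the extra ds:foo element is needed, rather than just a declaration, can be seen without signxml. A declaration alone leaves no trace in ElementTree's tree. An element in the namespace carries the URI in its tag, so the declaration comes back on serialization.

```python
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DS_NS)

# Only a declaration: nothing in the tree lives in the ds namespace.
declared_only = ET.fromstring(f'<Test xmlns:ds="{DS_NS}"/>')
# A declaration plus an element in the namespace, as with <wrapper> in the code above.
with_element = ET.fromstring(f'<wrapper><ds:foo xmlns:ds="{DS_NS}"/><Test/></wrapper>')

print(ET.tostring(declared_only, encoding="unicode"))
# <Test />  (the declaration was not kept)
print(ET.tostring(with_element, encoding="unicode"))
# <wrapper xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:foo /><Test /></wrapper>
print([child.tag for child in with_element])
# ['{http://www.w3.org/2000/09/xmldsig#}foo', 'Test']
```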

Now the ds namespace is present in the ElementTree object passed to sign(). When sign() is called, it performs canonicalization, which puts the namespace declaration for ds on the first tag in the document. Adding the signature, which brings in this namespace, then no longer changes anything inside our <Test> element.
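A stdlib-only sketch of why the wrapper helps: what matters is whether the receiver's ElementTree round trip moves the ds declaration. Both "sent" strings are hypothetical, abridged stand-ins. The wrapped one assumes, as described above, that the declaration already sits on the root element when the document is signed.

```python
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DS_NS)


def root_start_tag(xml_text: str) -> str:
    """Return the root element's start tag, where a hoisted declaration would appear."""
    return xml_text[: xml_text.index(">") + 1]


def receiver_roundtrip(xml_text: str) -> str:
    """Parse and re-serialize with ElementTree, as an ElementTree-based receiver would."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


plain_sent = f'<Test><ds:Signature xmlns:ds="{DS_NS}"><ds:SignedInfo/></ds:Signature></Test>'
wrapped_sent = (
    f'<wrapper xmlns:ds="{DS_NS}"><ds:foo/><Test/>'
    "<ds:Signature><ds:SignedInfo/></ds:Signature></wrapper>"
)

for sent in (plain_sent, wrapped_sent):
    before, after = root_start_tag(sent), root_start_tag(receiver_roundtrip(sent))
    print(before == after, after)
# False <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
# True <wrapper xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
```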

Almost Done

The signature verifies, but we have added content to the XML we are signing, and signed that content too! verify().signed_data will return the content with our additions. Software which signs things (even XML) with such side effects would probably not gain wide acceptance :-)

This is solved, by luck or by intention, by a signxml feature that lets you specify the location of the enveloped signature. As the documentation says,

To specify the location of an enveloped signature within data, insert a <ds:Signature Id="placeholder"></ds:Signature> element in data (where “ds” is the “http://www.w3.org/2000/09/xmldsig#” namespace). This element will be replaced by the generated signature, and excised when generating the digest.

Such a placeholder introduces the ds namespace into xml_obj, but no new elements outside the <ds:Signature> element. So we do not add anything to the content that is signed.

from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]

ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
xml_obj = ET.fromstring("<Test><ds:Signature xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\" Id=\"placeholder\"></ds:Signature></Test>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)

# Sending the data...

data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
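What the placeholder does to the input can be inspected with ElementTree alone, using the input string from the code above with the ds prefix registered. The only element in the ds namespace is the placeholder itself, which, per the quoted documentation, is replaced by the signature and excised when the digest is generated. Its presence is still enough to make ElementTree declare ds on <Test>.

```python
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DS_NS)

xml_obj = ET.fromstring(
    f'<Test><ds:Signature xmlns:ds="{DS_NS}" Id="placeholder"></ds:Signature></Test>'
)

# Children of <Test> other than the placeholder: none, so no content is added.
other_children = [child for child in xml_obj if child.tag != f"{{{DS_NS}}}Signature"]
print(other_children)  # []

# The placeholder makes ElementTree declare ds on the root element.
serialized = ET.tostring(xml_obj, encoding="unicode")
print(serialized)
# <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:Signature Id="placeholder" /></Test>
```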

Such signed serialized data can also be parsed by lxml.etree and successfully verified.

Do not forget that if the receiver is using ElementTree, they must call ElementTree.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#") before calling XMLVerifier().verify().
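On the receiving side, this advice boils down to a small helper. The function name parse_signed_xml is hypothetical. It only makes sure the prefix is registered before the parsed element is passed on to XMLVerifier().verify(). That call is left out here because it needs signxml and the sender's certificate. The received bytes are an abridged, hypothetical stand-in.

```python
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def parse_signed_xml(data: bytes) -> ET.Element:
    """Hypothetical receiver helper: register the ds prefix, then parse with ElementTree.

    The returned element would then go to XMLVerifier().verify(...), which is not
    called here because it needs signxml and the sender's certificate.
    """
    ET.register_namespace("ds", DS_NS)  # process-wide; repeating it is harmless
    return ET.fromstring(data)


received = f'<Test xmlns:ds="{DS_NS}"><ds:Signature><ds:SignedInfo/></ds:Signature></Test>'.encode("utf-8")
root = parse_signed_xml(received)
print(ET.tostring(root, encoding="unicode"))
# <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:Signature><ds:SignedInfo /></ds:Signature></Test>
```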

Afterword

After reading this post, you should have no doubts that XML signing is broken.


Contents © 2021 Konstantin Shemyak - Powered by Nikola