XML Signatures with Python ElementTree
I just need to sign XML...
Python has a standard library module to handle XML, and there seems to be exactly one library for the signing part: signxml.
So it should be straightforward:
```python
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
data = ET.fromstring("<Test/>")
signed_root = XMLSigner().sign(data, key=key)
XMLVerifier().verify(signed_root, x509_cert=cert)
```
Simple?
But the receiver tells you that your signature does not verify?
The solution code is at the end of the article. The rest explains what is happening.
...and send it forward
Well, when you sign something, it is usually so that someone else can verify the signature. Unless that someone else has access to your Python object, we need to serialize it:
```python
data_serialized = ET.tostring(signed_root)
# Sending the data...
XMLVerifier().verify(data_serialized, x509_cert=cert)
```
And now it is not simple anymore, because it fails:
```
Traceback (most recent call last):
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 729, in verify
    verify(signing_cert, raw_signature, signed_info_c14n, signature_digest_method)
  File "/path/to/python3.6/site-packages/OpenSSL/crypto.py", line 2928, in verify
    _raise_current_error()
  File "/path/to/python3.6/site-packages/OpenSSL/_util.py", line 54, in exception_from_error_queue
    raise exception_type(errors)
OpenSSL.crypto.Error: [('rsa routines', 'int_rsa_verify', 'bad signature')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "1.py", line 13, in <module>
    XMLVerifier().verify(data_serialized, x509_cert=cert)
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 735, in verify
    raise InvalidSignature("Signature verification failed: {}".format(reason))
signxml.exceptions.InvalidSignature: Signature verification failed: bad signature
```
So, did the serialization break the signature?
"When nothing helps, read the instructions." The signxml documentation says that the return value of sign() is an lxml.etree.Element object, not an xml.etree.ElementTree object.
Next try:
```python
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
data = ET.fromstring("<Test/>")
signed_root = XMLSigner().sign(data, key=key)
data_serialized = lxml_ET.tostring(signed_root)
# Sending the data...
XMLVerifier().verify(data_serialized, x509_cert=cert)
```
A moment of happiness: this works. Simple?
Good if this works for you. But we are sending the serialized XML somewhere far away, and the receiver might first deserialize it and only then verify the signature on the deserialized version. And do it with ElementTree.
```python
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
data = ET.fromstring("<Test/>")
signed_root = XMLSigner().sign(data, key=key)
data_serialized = lxml_ET.tostring(signed_root)
# Sending the data...
data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
```
And this fails with the same InvalidSignature. Bump. Now reading the instructions does not help.
Debugging starts
We can check that the string representation of our signed data "looks right" (although with XML you can never be sure).
Compare it with ET.tostring(data_parsed) and note that they are not identical.
We do not yet know what is inside the ElementTree object, but against its serialized string the signature will fail:
Sent string:
<Test><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11"/><ds:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2006/12/xml-c14n11"/></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/><ds:DigestValue>KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=</ds:DigestValue>....
De-serialized ElementTree object:
<Test xmlns:ns0="http://www.w3.org/2000/09/xmldsig#"><ns0:Signature><ns0:SignedInfo><ns0:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11" /><ns0:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256" /><ns0:Reference URI=""><ns0:Transforms><ns0:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature" /><ns0:Transform Algorithm="http://www.w3.org/2006/12/xml-c14n11" /></ns0:Transforms><ns0:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256" /><ns0:DigestValue>KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=</ns0:DigestValue>...
So far we see two problems in the string produced from the ElementTree object:
- The namespace prefix ds (xmlns:ds="http://www.w3.org/2000/09/xmldsig#") has been changed to ns0.
- Its declaration is no longer on the <Signature> element, but on the <Test> element.
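You can reproduce the round trip without signxml. This sketch uses a shortened stand-in for the sent string that contains only the <Signature> and <SignedInfo> elements. It assumes that no prefix has been registered for the xmldsig namespace in the running interpreter:

```python
from xml.etree import ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
# Shortened stand-in for the "sent string" above.
sent = f'<Test><ds:Signature xmlns:ds="{DS_NS}"><ds:SignedInfo/></ds:Signature></Test>'

parsed = ET.fromstring(sent)
# After parsing, only "{namespace-uri}localname" is kept; the "ds" prefix is gone.
print(parsed[0].tag)  # {http://www.w3.org/2000/09/xmldsig#}Signature

# On serialization ElementTree invents the prefix "ns0" (problem 1)
# and declares it on the root element <Test> (problem 2).
round_trip = ET.tostring(parsed, encoding="unicode")
print(round_trip)
# <Test xmlns:ns0="http://www.w3.org/2000/09/xmldsig#"><ns0:Signature><ns0:SignedInfo /></ns0:Signature></Test>
```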
InvalidSignature
The InvalidSignature exception we are getting is caused by the first problem.
signxml first validates the signature. The signature value (read from the XML string in our case) is computed over the <SignedInfo> XML element (canonicalized, but we can skip this complicated issue for now). <SignedInfo> has changed: ds has been replaced by the where-did-it-come-from ns0, so the signature does not match.
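Here is why a different prefix alone breaks the signature, shown as a small sketch. It assumes Python 3.8+ for ElementTree.canonicalize(). That function implements C14N 2.0, not the C14N 1.1 algorithm named in the signature above, so the sketch only illustrates the principle. Both fragments parse to the same element names, yet their canonical bytes differ, and so does anything hashed or signed over them:

```python
import hashlib
from xml.etree import ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"

# The same <SignedInfo> fragment, once with the original "ds" prefix
# and once with the "ns0" prefix that ElementTree produced.
with_ds = f'<ds:SignedInfo xmlns:ds="{DS_NS}"><ds:Reference URI=""/></ds:SignedInfo>'
with_ns0 = f'<ns0:SignedInfo xmlns:ns0="{DS_NS}"><ns0:Reference URI=""/></ns0:SignedInfo>'

# For ElementTree both are the same element: "{namespace-uri}SignedInfo".
assert ET.fromstring(with_ds).tag == ET.fromstring(with_ns0).tag

# Canonicalization (C14N 2.0 here) keeps the prefixes, so the bytes differ...
c14n_ds = ET.canonicalize(with_ds)
c14n_ns0 = ET.canonicalize(with_ns0)
print(c14n_ds)
print(c14n_ns0)

# ...and so does every hash computed over them.
same = hashlib.sha256(c14n_ds.encode()).digest() == hashlib.sha256(c14n_ns0.encode()).digest()
print(same)  # False
```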
The solution is the register_namespace(prefix, uri) function of ElementTree. Its documentation reads:
(...) Tags and attributes in this namespace will be serialized with the given prefix, if at all possible.
That's what we need (it's unclear why ElementTree is not getting this from our XML string).
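The function must be called before verification, but not necessarily before parsing. The prefix registry is global and only affects serialization, which signxml performs on an ElementTree object it receives. Mind the global scope, though; as the documentation says,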
(...) any existing mapping for either the given prefix or the namespace URI will be removed.
```python
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
data = ET.fromstring("<Test/>")
signed_root = XMLSigner().sign(data, key=key)
data_serialized = lxml_ET.tostring(signed_root)
# Sending the data...
data_parsed = ET.fromstring(data_serialized)
ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
XMLVerifier().verify(data_parsed, x509_cert=cert)
```
Now if we inspect ET.tostring(data_parsed), we'll see the correct ds prefix in use.
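Here is a stdlib-only sketch of this fix, again using a shortened stand-in document. The prefix is registered after parsing, and the serialized form uses ds again. Note that the declaration still sits on <Test>, so the second problem remains:

```python
from xml.etree import ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
parsed = ET.fromstring(f'<Test><ds:Signature xmlns:ds="{DS_NS}"/></Test>')

# Registering after parsing is enough: ElementTree consults its global
# prefix registry only when serializing.
ET.register_namespace("ds", DS_NS)
fixed = ET.tostring(parsed, encoding="unicode")
print(fixed)
# <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:Signature /></Test>
```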
The result?
```
Traceback (most recent call last):
  File "1.py", line 15, in <module>
    XMLVerifier().verify(data_parsed, x509_cert=cert)
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 765, in verify
    raise InvalidDigest("Digest mismatch for reference {}".format(len(verify_results)))
signxml.exceptions.InvalidDigest: Digest mismatch for reference 0
```
InvalidDigest
Now signxml has successfully verified <SignatureValue> over <SignedInfo> (in canonicalized form, but again we may skip this topic for now).
The "digest mismatch" that signxml now complains about is between the digest calculated over the original XML document (in canonicalized form...) and the digest string read from the <DigestValue> element under <Signature>. It is caused by the second problem mentioned above.
The DigestValue is KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=, and we can verify that it is computed over the string <Test></Test>:

```shell
$ echo -n '<Test></Test>' | openssl sha256 -binary | base64
KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=
```
But, as we see in the deserialized ElementTree object, our XML string has now become <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>, which of course produces a different SHA-256 digest.
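You can do the same check in Python. This sketch assumes Python 3.8+. It uses ElementTree.canonicalize() only to show that <Test/> canonicalizes to <Test></Test>. The second string is copied from the text above rather than produced by a canonicalizer. As in the openssl pipeline, a DigestValue is the Base64 encoding of the raw SHA-256 digest:

```python
import base64
import hashlib
from xml.etree import ElementTree as ET

def digest_value(canonical_xml: str) -> str:
    """Base64-encoded SHA-256 of the canonical bytes, the format of <DigestValue>."""
    return base64.b64encode(hashlib.sha256(canonical_xml.encode("utf-8")).digest()).decode("ascii")

# What the signer hashed: the canonical form of <Test/> (empty-element tag expanded).
signed_form = ET.canonicalize("<Test/>")
# What the receiver hashes after the ElementTree round trip (copied from the text above).
received_form = '<Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>'

print(signed_form)                  # <Test></Test>
print(digest_value(signed_form))    # expected to match the DigestValue quoted above
print(digest_value(received_form))  # a different digest, hence InvalidDigest
```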
This unexpected modification of the original XML is an unfortunate result of canonicalization. (TODO: ref to the place in the spec.)
If we want the receiver of our XML string to be able to parse it with ElementTree and then verify it successfully, we have no option but to sign exactly such a string. That is, we must pass sign() an argument which, when canonicalized, produces the string <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>.
Understanding how to do this requires knowing how ElementTree stores its XML objects. What matters here is that it represents a namespace only as a prefix of the tag names of the elements in it, e.g. {http://www.w3.org/2000/09/xmldsig#}Signature. Because of this, if we only declare a namespace, it will not be saved. We must create some element in this namespace (and call register_namespace() before creating the object). Let's create the element ds:foo:
```python
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
data = ET.fromstring("<wrapper><ds:foo xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\"/><Test/></wrapper>")
signed_root = XMLSigner().sign(data, key=key)
data_serialized = lxml_ET.tostring(signed_root)
# Sending the data...
data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
```
Now the ds namespace is available in the ElementTree object passed to sign().
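You can check with ElementTree alone that a bare declaration is not saved. In this sketch, a namespace that is only declared disappears on serialization. A namespace used by an element survives and is declared on the root:

```python
from xml.etree import ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
ET.register_namespace("ds", DS_NS)

# Declared but not used by any element: ElementTree has nowhere to keep it.
declared_only = ET.tostring(ET.fromstring(f'<Test xmlns:ds="{DS_NS}"/>'), encoding="unicode")
print(declared_only)  # <Test />

# Used by <ds:foo>: the namespace lives in that element's tag name
# and is declared on the root element when serializing.
used = ET.tostring(
    ET.fromstring(f'<wrapper><ds:foo xmlns:ds="{DS_NS}"/><Test/></wrapper>'),
    encoding="unicode",
)
print(used)
# <wrapper xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:foo /><Test /></wrapper>
```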
When sign() is called, it performs canonicalization, which puts the namespace declaration for ds on the first tag in the document. Adding the signature, which brings in this namespace, therefore no longer changes anything inside our <Test> element.
Almost Done
The signature verifies, but we have added content to the XML we are signing, and signed it as well! verify().signed_data will return the content including our additions.
Probably, software which signs things (even XML) with such side effects
would not gain wide acceptance :-)
This is solved, by luck or by intention, by the option signxml provides to specify the location of the enveloped signature.
As the documentation says,
To specify the location of an enveloped signature within data, insert a <ds:Signature Id="placeholder"></ds:Signature> element in data (where “ds” is the “http://www.w3.org/2000/09/xmldsig#” namespace). This element will be replaced by the generated signature, and excised when generating the digest.
Such an insert introduces the ds namespace into the data, but no new element outside of the <ds:Signature> element. So we do not add anything to the content that is signed.
```python
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
data = ET.fromstring("<Test><ds:Signature xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\" Id=\"placeholder\"></ds:Signature></Test>")
signed_root = XMLSigner().sign(data, key=key)
data_serialized = lxml_ET.tostring(signed_root)
# Sending the data...
data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
```
Such signed, serialized data can also be parsed with lxml.etree and verified successfully.
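When the document is built in code rather than parsed from a string, you can append the same placeholder with the ElementTree API. The helper below, add_signature_placeholder(), is hypothetical and not part of signxml. It only prepares the input for sign() in the same way as the string literal above, and it assumes the ds prefix is registered first:

```python
from xml.etree import ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
# Register the prefix first, so the tree serializes with "ds" and not "ns0".
ET.register_namespace("ds", DS_NS)

def add_signature_placeholder(root: ET.Element) -> ET.Element:
    """Hypothetical helper: append the <ds:Signature Id="placeholder"> element
    that signxml replaces with the enveloped signature."""
    ET.SubElement(root, f"{{{DS_NS}}}Signature", Id="placeholder")
    return root

data = add_signature_placeholder(ET.fromstring("<Test/>"))
print(ET.tostring(data, encoding="unicode"))
# <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:Signature Id="placeholder" /></Test>
```

You would then pass the resulting data to XMLSigner().sign(data, key=key) as in the code above.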
Afterword
After reading this post, you should have no doubts that XML security is broken.