XML Signatures and Python ElementTree
I just need to sign XML...
Python has a standard library module to handle XML, and there seems to be exactly one library for the signing part: signxml.
So it should be straightforward:
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
XMLVerifier().verify(signed_xml_obj, x509_cert=cert)
Simple?
But the receiver tells you that your signature does not verify?
The solution code is at the end of the article. The rest explains what is happening.
An optional parameter of the sign() method specifies the type of the XML signature, which can be enveloped, enveloping, or detached. This article covers only the default case of an enveloped signature.
...and send it forward
When you sign something, usually it is for the purpose of verifying the signature by someone else. Unless that someone else has access to your Python object, we need to serialize the latter:
data_serialized = ET.tostring(signed_xml_obj)
# Sending the data...
XMLVerifier().verify(data_serialized, x509_cert=cert)
And this is not simple any more, because verification of the serialized data fails:
Traceback (most recent call last):
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 729, in verify
    verify(signing_cert, raw_signature, signed_info_c14n, signature_digest_method)
  File "/path/to/python3.6/site-packages/OpenSSL/crypto.py", line 2928, in verify
    _raise_current_error()
  File "/path/to/python3.6/site-packages/OpenSSL/_util.py", line 54, in exception_from_error_queue
    raise exception_type(errors)
OpenSSL.crypto.Error: [('rsa routines', 'int_rsa_verify', 'bad signature')]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "1.py", line 13, in <module>
    XMLVerifier().verify(data_serialized, x509_cert=cert)
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 735, in verify
    raise InvalidSignature("Signature verification failed: {}".format(reason))
signxml.exceptions.InvalidSignature: Signature verification failed: bad signature
So, did the serialization break the signature?
"When nothing helps, read the instructions". The documentation of signxml says that the return value of sign() is an lxml.etree.Element object, not an xml.etree.ElementTree object.
The next try is serialization with lxml:
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)
# Sending the data...
XMLVerifier().verify(data_serialized, x509_cert=cert)
The moment of happiness, this works. Simple?
Good if this works for you. But we are sending the serialized XML somewhere far away, and the receiver might first deserialize it and only then verify the signature on the deserialized version. (Which is not smart, but is probably out of your control.) And they might do it with ElementTree.
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)
# Sending the data...
data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
And this fails with the same InvalidSignature. Bump. Now reading the instructions does not help.
Debugging starts
We can check that the string representation of our signed data "looks right" (although you can never be sure with XML).
Compare it to ElementTree.tostring(data_parsed) to note that they are not identical.
We do not know yet what is inside the ElementTree object, but verification against its serialized string will fail:
Sent string:
<Test><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11"/><ds:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"/><ds:Reference URI=""><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"/><ds:Transform Algorithm="http://www.w3.org/2006/12/xml-c14n11"/></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256"/><ds:DigestValue>KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=</ds:DigestValue>....
De-serialized ElementTree object:
<Test xmlns:ns0="http://www.w3.org/2000/09/xmldsig#"><ns0:Signature><ns0:SignedInfo><ns0:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11" /><ns0:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256" /><ns0:Reference URI=""><ns0:Transforms><ns0:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature" /><ns0:Transform Algorithm="http://www.w3.org/2006/12/xml-c14n11" /></ns0:Transforms><ns0:DigestMethod Algorithm="http://www.w3.org/2001/04/xmlenc#sha256" /><ns0:DigestValue>KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=</ns0:DigestValue>...
So far we see two problems in the string produced from the ElementTree object:
- The namespace prefix ds has been changed to ns0.
- Its declaration is not under the <Signature> tag, but under the <Test> tag.
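Both effects can be reproduced with the standard library alone, without signing anything. The following is a minimal sketch: the shortened document in sent is only a stand-in for the real signed XML, and the names DS_NS, sent and round_tripped are illustrative.

```python
from xml.etree import ElementTree as ET

# Namespace URI of XML Signature; DS_NS is an illustrative name.
DS_NS = "http://www.w3.org/2000/09/xmldsig#"

# A shortened stand-in for the signed document: the "ds" prefix is
# declared on <ds:Signature>, as in the string that was sent.
sent = (
    '<Test><ds:Signature xmlns:ds="' + DS_NS + '">'
    "<ds:SignedInfo/></ds:Signature></Test>"
)

# Parse and serialize again, using ElementTree only.
round_tripped = ET.tostring(ET.fromstring(sent), encoding="unicode")

print(sent)
print(round_tripped)
```

In a fresh interpreter the second line should show a generated prefix instead of ds, with the namespace declaration sitting on <Test> rather than on the signature element.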
InvalidSignature
The InvalidSignature exception we are getting is caused by the first problem.
signxml first does signature validation, where the signature value (read from the XML string in our case) is checked against the <SignedInfo> XML element (canonicalized, but we can skip this complicated issue for now).
<SignedInfo> has changed, as ds has been replaced by the where-did-it-come-from ns0, so the signature does not match.
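To see why a changed prefix alone is enough, here is a small illustration using canonicalize() from xml.etree.ElementTree (Python 3.8 or later). Note the assumptions: that function implements C14N 2.0, not the C14N 1.1 named in the signature above, and a plain SHA-256 hash stands in for the real RSA signature check. The only point is that canonicalization keeps prefixes, so the bytes being checked differ.

```python
import hashlib
from xml.etree import ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"

# The same (empty) element, once with the prefix used when signing and
# once with the prefix ElementTree generated after re-parsing.
as_signed = '<ds:SignedInfo xmlns:ds="%s"/>' % DS_NS
as_reparsed = '<ns0:SignedInfo xmlns:ns0="%s"/>' % DS_NS

# Assumption: C14N 2.0 from the standard library, used for illustration only.
c14n_signed = ET.canonicalize(as_signed)
c14n_reparsed = ET.canonicalize(as_reparsed)

print(c14n_signed)
print(c14n_reparsed)

# A hash over the canonical form, standing in for the signature check.
same_hash = (
    hashlib.sha256(c14n_signed.encode("utf-8")).digest()
    == hashlib.sha256(c14n_reparsed.encode("utf-8")).digest()
)
print(same_hash)
```

The two canonical forms keep their respective prefixes, so anything computed over them cannot agree.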
The solution is the register_namespace(prefix, uri) function of ElementTree. Its documentation reads:
(...) Tags and attributes in this namespace will be serialized with the given prefix, if at all possible.
That's what we need (the author has yet to understand why ElementTree does not pick this up from our XML string).
The function must be called before verification, not necessarily before the parsing; as the documentation says,
(...) any existing mapping for either the given prefix or the namespace URI will be removed.
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
xml_obj = ET.fromstring("<Test/>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)
# Sending the data...
data_parsed = ET.fromstring(data_serialized)
ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
XMLVerifier().verify(data_parsed, x509_cert=cert)
Now if we inspect ET.tostring(data_parsed), we'll see the correct ds namespace in use.
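The effect of register_namespace() can be checked in isolation with the standard library. This is a sketch on a shortened stand-in document (the names sent, before and after are illustrative); it also shows that registering after parsing is sufficient, because the prefix table is consulted when serializing.

```python
from xml.etree import ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
sent = '<Test><ds:Signature xmlns:ds="%s"/></Test>' % DS_NS

# Parse first, register later.
parsed = ET.fromstring(sent)
before = ET.tostring(parsed, encoding="unicode")

# register_namespace() updates a module-wide prefix table, so it
# affects every later serialization in this process.
ET.register_namespace("ds", DS_NS)
after = ET.tostring(parsed, encoding="unicode")

print(before)
print(after)
```

Note that the prefix is back, but the declaration still sits on the root element.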
The result?
Traceback (most recent call last):
  File "1.py", line 15, in <module>
    XMLVerifier().verify(data_parsed, x509_cert=cert)
  File "/path/to/python3.6/site-packages/signxml/__init__.py", line 765, in verify
    raise InvalidDigest("Digest mismatch for reference {}".format(len(verify_results)))
signxml.exceptions.InvalidDigest: Digest mismatch for reference 0
InvalidDigest
Now signxml has successfully verified <SignatureValue> over the <SignedInfo> (in canonicalized form, but again we may skip this topic for now).
The "digest mismatch" that signxml is complaining about now is between the digest calculated over the original XML document (in canonicalized form...) and the digest string read from the <DigestValue> element under the <Signature>. It is caused by the second problem listed above.
DigestValue is KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=, and it was, in fact, made over the string <Test></Test>:
$ echo -n '<Test></Test>' | openssl sha256 -binary | base64
KP3ncf09YSgkeTt+i4PR+W0AMvUTo7M8gu0z15piPMc=
But in the deserialized ElementTree object the XML string became <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>, which of course produces a different SHA-256 digest.
It is possible to see (with a debugger) that verify() tries to match the digest of this string.
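The same comparison can be done in Python. This is a minimal sketch with hashlib and base64, mirroring the openssl pipeline above; digest_b64 is a hypothetical helper name, not part of signxml.

```python
import base64
import hashlib

DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def digest_b64(xml_text):
    """Return the base64-encoded SHA-256 digest of the UTF-8 bytes of xml_text."""
    raw = hashlib.sha256(xml_text.encode("utf-8")).digest()
    return base64.b64encode(raw).decode("ascii")


# The canonical form that was signed, and the one the verifier sees
# after the ElementTree round trip.
signed_form = "<Test></Test>"
reparsed_form = '<Test xmlns:ds="%s"></Test>' % DS_NS

print(digest_b64(signed_form))
print(digest_b64(reparsed_form))
```

The first value should correspond to the openssl output above; the second one is what the verifier ends up comparing against <DigestValue>.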
Such a modification of the original XML looks like the result of one of the canonicalization transforms, but I have not found it in the specs. The examples given there show that namespace declarations are not moved out of the tag inside which they are used.
If we want the receiver of our XML string to be able to parse it with ElementTree and then verify it successfully, we have no other option but to sign exactly such a string. That is, we must produce an argument for sign() which, when canonicalized, produces the string <Test xmlns:ds="http://www.w3.org/2000/09/xmldsig#"></Test>.
Understanding how to do this requires knowledge about how ElementTree stores its XML objects.
What is important here is that it represents a namespace as a string prefix of the form {uri} on the tags of the elements in it.
Because of this, if we only declare a namespace, it will not be saved.
We must create some elements under this namespace and call register_namespace().
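This storage model is easy to inspect with the standard library. A short sketch (the variable names are illustrative): a namespace that is only declared leaves no trace in the parsed object, while a namespace that is used ends up inside the tag string.

```python
from xml.etree import ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"

# Declared but unused: the declaration is neither a tag prefix nor an
# attribute, so it is lost after parsing.
declared_only = ET.fromstring('<Test xmlns:ds="%s"/>' % DS_NS)
print(declared_only.tag, declared_only.attrib)
print(ET.tostring(declared_only, encoding="unicode"))

# Used by an element: the namespace URI becomes part of the tag string.
used = ET.fromstring('<Test><ds:foo xmlns:ds="%s"/></Test>' % DS_NS)
print(used[0].tag)
```

The original prefix (ds) is not stored anywhere in the element, which is why register_namespace() is needed to get it back on serialization.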
Let's create the element ds:foo. To be valid XML, the string must represent just one root XML element, so add an element <wrapper>:
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
xml_obj = ET.fromstring("<wrapper><ds:foo xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\"/><Test/></wrapper>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)
# Sending the data...
data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
Now the ds namespace is available in the ElementTree object passed to sign().
When sign() is called, it performs canonicalization which puts the namespace declaration for ds under the first tag in the document. Adding the signature, which brings in this namespace, no longer changes anything inside our <Test> element.
Almost Done
The signature verifies, but we have added content to the XML we are signing, and also signed it!
verify().signed_data will contain the content with our additions.
Probably, software which signs things (even XML) with such side effects would not gain wide acceptance :-)
This is solved - by luck or by intention - with the possibility provided by signxml to specify the location of the enveloped signature.
As the documentation says,
To specify the location of an enveloped signature within data, insert a <ds:Signature Id="placeholder"></ds:Signature> element in data (where “ds” is the “http://www.w3.org/2000/09/xmldsig#” namespace). This element will be replaced by the generated signature, and excised when generating the digest.
Such an insert will introduce the ds namespace into xml_obj, but no new elements outside of the <ds:Signature> element.
So we'll not add anything to the content which is signed.
from lxml import etree as lxml_ET
from xml.etree import ElementTree as ET
from signxml import XMLSigner, XMLVerifier

cert, key = [open(f, "rb").read() for f in ("cert.pem", "key.pem")]
ET.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#")
xml_obj = ET.fromstring("<Test><ds:Signature xmlns:ds=\"http://www.w3.org/2000/09/xmldsig#\" Id=\"placeholder\"></ds:Signature></Test>")
signed_xml_obj = XMLSigner().sign(xml_obj, key=key)
data_serialized = lxml_ET.tostring(signed_xml_obj)
# Sending the data...
data_parsed = ET.fromstring(data_serialized)
XMLVerifier().verify(data_parsed, x509_cert=cert)
Signed and serialized data of this kind can also be parsed by lxml.etree and successfully verified.
Do not forget that if the receiver is using ElementTree, they must call ElementTree.register_namespace("ds", "http://www.w3.org/2000/09/xmldsig#") before calling XMLVerifier().verify().
Afterword
After reading this post, you should have no doubts that XML signing is broken.