Theory Definitions
section "Logging-independent message anonymity and Restricted Identification"
theory Definitions
imports Main
begin
subsection "Introduction"
text ‹
\emph{Logging-dependent message anonymity} is the property for a message exchanged or otherwise used
in a cryptographic protocol to remain anonymous although the attacker can log the messages generated
or accepted by legitimate agents, and map any two observable messages contained in any such message
to the same agent. An approach to modeling and verifying this security property according to the
\emph{relational method} for formal protocol verification has been described in \<^cite>‹"Noce20"›,
along with the method itself. This approach makes use of two further type constructors for messages,
@{text IDInfo} and @{text Log}, as well as of two functions, @{text crypts} and @{text key_sets}.
Particularly, @{text IDInfo} is used to model message anonymity, while the remaining constants are
used to formalize the property for a message to be observable by the spy within some logged message.
\emph{Logging-independent message anonymity} rather is the property for a given message to remain
anonymous in spite of the attacker's capability of mapping messages of that sort to agents without
resorting to message logging, namely by means of some intrinsic feature of such messages. From the
above observation, it follows that @{text IDInfo} is the sole anonymity-related constant required if
the only kind of anonymity of interest for a given protocol is the logging-independent one, whereas
@{text Log}, @{text crypts}, and @{text key_sets} are unnecessary and can be left out of the model.
It is also possible to include both kinds of anonymity in the model, in which case some protocol
rule will enable the spy to map messages of some sort to agents only if they are observable within
logged messages, while some other protocol rule will enable the spy to do so independently of this
condition.
This paper illustrates how logging-independent message anonymity can be formalized according to the
relational method by considering a real-world protocol, namely the Restricted Identification one by
the BSI \<^cite>‹"BSI16-2"› \<^cite>‹"BSI16-3"›, whose very purpose is to allow for the exchange of
messages endowed with this security property.
›
subsection "Case study: the Restricted Identification protocol"
text ‹
\label{Protocol}
The Restricted Identification protocol enables \emph{user identification tokens} (e.g. electronic
documents) to generate and output unambiguous \emph{pseudonymous identifiers}, distinct for any
given group of terminals of arbitrary granularity, referred to as a \emph{sector}, and usable to
identify the tokens across different sessions taking place within the same sector. For example, such
identifiers allow for the creation of sector-specific revocation lists, at the same time preserving
the anonymity of the holder of any token included in such a list.
This protocol is based on a \emph{Public Key Infrastructure (PKI)} comprising the token issuer,
which owns a \emph{revocation key pair} $(SK_{Rev}, PK_{Rev})$ and generates a \emph{token key pair}
$(SK_{Tok}, PK_{Tok})$ for each token, and sectors, each one endowed with its own \emph{sector key
pair} $(SK_{Sec}, PK_{Sec})$, where $PK_{Sec} = [SK_{Sec}]PK_{Rev}$. This PKI may use either an
integer finite field or an elliptic curve group, as long as the selected domain parameters are
cryptographically secure. In a real-world PKI, each sector has actually two distinct key pairs,
which enables the tokens to generate as many different pseudonymous identifiers per sector, but this
detail is irrelevant to the anonymity of such identifiers and can then be omitted from the model.
According to \<^cite>‹"BSI16-2"› \<^cite>‹"BSI16-3"›, the Restricted Identification protocol may only be
executed using the session keys established via Chip Authentication version 2/3, after performing
Terminal Authentication version 2 with a terminal certificate containing both sector public keys'
hashes (including domain parameters), whose authenticity is ensured by the certificate's signature.
After requesting to start the protocol, which again is irrelevant and can be left out of the model,
the terminal sends either of its sector public keys $PK_{Sec}$ (including domain parameters) to the
token, which in turn verifies that $PK_{Sec}$'s hash matches the one contained in the certificate
and replies with its pseudonymous identifier $H([SK_{Tok}]PK_{Sec})$, where $H$ is a hash function.
If necessary (for instance, to insert it into a sector-specific revocation list), this identifier
can be recomputed in the external world as $H([SK_{Sec}]([SK_{Rev}]PK_{Tok}))$, with the concurrence
of both the token issuer and the entity responsible for the involved sector.
As a matter of fact, since the only purpose of the protocol model to be developed is to verify the
logging-independent anonymity of token pseudonymous identifiers, without any confidentiality or
authenticity concern, it is sufficient to model a simpler protocol in which the terminal and the
token exchange their messages in plain, without using any session key. Moreover, since both Chip and
Terminal Authentication are out of scope, the Restricted Identification protocol will be modeled as
a stand-alone one. Consequently, although both of them are exchanged during Terminal Authentication
in the real-world protocol, $PK_{Sec}$'s signature will be assumed to be exchanged with $PK_{Sec}$
in the model, while the related verification key will be assumed to be known by the token a priori.
The hash function used to sign $PK_{Sec}$ may differ from the one used to compute token pseudonymous
identifiers, but once more, this is just an omissible functional detail.
A further simplification, admissible for the same reason, is to let the token use domain parameters
known a priori rather than the input ones, whose presence in the input message can then be left out.
Indeed, this prevents the spy from snatching $SK_{Tok}$ by making an element of a smaller group pass
for an authentic sector public key, which could be done by signing it with a compromised signature
generation key. For example, if the PKI used a group of 128-bit order, $SK_{Tok}$ could be disclosed
by first searching the private key $SK'_{Tok}$ associated with a fake identifier ranging in a group
of 64-bit order $n$, and then detecting $SK_{Tok}$ as the unique private key associated with a given
genuine identifier among all those differing from $SK'_{Tok}$ by a multiple of $n$. So, two searches
within as many spaces of $2^{64}$ elements, which is a computationally feasible task nowadays, would
suffice to find $SK_{Tok}$. However, such \emph{small group attacks} can safely be ruled out as long
as the initial state @{text s⇩0} comprises an arbitrary set of compromised token private keys, given
that verifying the conditions under which these keys remain secret is out of scope.
As a result of all the simplifications described above, the protocol that is going to be modeled is
as follows.
▸ Terminal $\rightarrow$ Token: \tabto{10em} $\{PK_{Sec}, \{H(PK_{Sec})\}_{SK_{Sign}}\}$\\
The terminal sends the token a message consisting of its sector public key and a precomputed
signature of this key.\\
▸ Token $\rightarrow$ Terminal: \tabto{10em} $H([SK_{Tok}]PK_{Sec})$\\
The token verifies that the hash of the received public key matches the signed one, and then
replies with the pseudonymous identifier resulting from this key.
Unless it is compromised by means other than attacking the protocol, the anonymity of a given token
pseudonymous identifier $H([SK_{Tok}]PK_{Sec})$ is expected to be vulnerable if and only if either
$SK_{Tok}$ is anonymity-compromised, or $SK_{Sec}$ is compromised and there is another compromised
sector private key $SK'_{Sec}$ such that $H([SK_{Tok}]PK'_{Sec})$ is anonymity-compromised. In fact,
the spy can detect the use of $SK_{Tok}$ in $H([SK_{Tok}]PK_{Sec})$ in the former case, whereas in
the latter one he can map $H([SK_{Tok}]PK_{Sec})$ to the same token as $H([SK_{Tok}]PK'_{Sec})$ by
recognizing that $[SK_{Tok}]PK_{Sec} = [SK_{Sec}\times(SK'_{Sec})^{-1}]([SK_{Tok}]PK'_{Sec})$.
The purpose of the following formal development is precisely to formally prove the correctness of
this expectation. In more detail, the \emph{only if} conditional implied by the previous statement
will be proven as an \emph{anonymity property} in the next section, while the \emph{if} conditional
will be proven in the form of two \emph{possibility properties} in the subsequent one. Since both
relevant attack options leverage the intrinsic features of token pseudonymous identifiers, namely
the private keys used to generate them, logging-independent message anonymity has to be considered
rather than logging-dependent one.
For further information about the formal definitions and proofs contained in this paper, see
Isabelle documentation, particularly \<^cite>‹"Paulson21"›, \<^cite>‹"Nipkow21"›, \<^cite>‹"Krauss21"›,
and \<^cite>‹"Nipkow11"›.
›
subsection "Agents, messages, protocol rules"
text ‹
Agents consist of an infinite population of tokens and sectors, identified through natural numbers,
plus the spy. Actually, the model can safely ignore terminals altogether and assume that tokens are
presented to sectors as a whole, since all the terminal-side messages used in the protocol refer to
sectors rather than to individual terminals. The only possible exceptions are signature key pairs.
In fact, although the entity signing terminal certificates is the same for every terminal in a given
sector (in \<^cite>‹"BSI16-2"› and \<^cite>‹"BSI16-3"›, it is named \emph{Document Verifier}), nothing
prevents that entity to use distinct signature generation keys for different terminals (for example,
if their certificates are issued at different times). Nonetheless, the granularity of signature key
pairs is irrelevant to the anonymity of token pseudonymous identifiers according to the previous
considerations, so these key pairs can be associated with sectors as well.
As opposed to what happens in \<^cite>‹"Noce20"›, there is no correlation here between any two agents
@{text "Token n"} and @{text "Sector n"} marked with the same numeric identifier @{text n}. In fact,
tokens and sectors are independent of each other, and any token may be presented to whatever sector.
\null
›
type_synonym agent_id = nat