Theory M_05_Proofs_Ontologies

(*************************************************************************
 * Copyright (C) 
 *               2019-2022 University of Exeter 
 *               2018-2022 University of Paris-Saclay
 *               2018      The University of Sheffield
 *
 * License:
 *   This program can be redistributed and/or modified under the terms
 *   of the 2-clause BSD-style license.
 *
 *   SPDX-License-Identifier: BSD-2-Clause
 *************************************************************************)

(*<*)
theory 
  "M_05_Proofs_Ontologies"
  imports 
    "M_04_Document_Ontology" 
begin

(*>*)

chapter*[onto_proofs::technical]‹Proofs over Ontologies›

text‹It is a distinguishing feature of dof that it does not directly generate meta-data rather
than generating a ‹theory of meta-data› that can be used in HOL-terms on various levels
of the Isabelle-system and its document generation technology. Meta-data theories can be 
converted into executable code and efficiently used in validations, but also used for theoretic
reasoning over given ontologies. While the full potential of this latter possibility still
needs to be explored, we present in the following sections two applications:

   Verified ontological morphisms, also called ‹ontological mappings› in the literature
    cite"books/daglib/0032976" and "atl" and "BGPP95", ie proofs of invariant preservation
    along translation-functions of all instances of ‹doc_class›-es.
   Verified refinement relations between the structural descriptions of theory documents,
    ie proofs of language inclusions of monitors of global ontology classes.   
›

section*["morphisms"::scholarly_paper.text_section] ‹Proving Properties over Ontologies›

subsection‹Ontology-Morphisms: a Prototypical Example›

text‹We define a small ontology with the following classes:›

doc_class AA = aa :: nat
doc_class BB = bb :: int
doc_class CC = cc :: int

doc_class DD = dd :: int
doc_class EE = ee :: int
doc_class FF = ff :: int

onto_morphism (AA, BB) to CC  and (DD, EE) to FF
  where "convertAA×BBCC σ =  CC.tag_attribute = 1::int, 
                                           CC.cc = int(aa (fst σ)) + bb (snd σ)"
  and   "convertDD×EEFF σ =  FF.tag_attribute = 1::int, 
                                           FF.ff = dd (fst σ) + ee (snd σ) "

text‹Note that the termconvertAA×BBCC-morphism involves a data-format conversion, and that the
resulting transformation of @{doc_class AA}-instances and @{doc_class BB}-instances is surjective 
but not injective. The termCC.tag_attribute is used to potentially differentiate instances with
equal attribute-content and is irrelevant here.›

(*<*) (* Just a test, irrelevant for the document.*)

doc_class A_A = aa :: nat
doc_class BB' = bb :: int
onto_morphism (A_A, BB', CC, DD, EE) to FF
  where "convertA_A×BB'×CC×DD×EEFF σ =  FF.tag_attribute = 1::int, 
                                      FF.ff = int(aa (fst σ)) + bb (fst (snd σ))"

(*>*)

text‹This specification construct  introduces the following constants and definitions:
    @{term [source] convertAA_BB_CC :: AA × BB  CC}
    @{term [source] convertDD_EE_FF :: DD × EE  FF}
   % @{term [source] convertA_A×BB'×CC×DD×EEFF :: A_A × BB' × CC × DD × EE  FF}

and corresponding definitions. ›

subsection‹Proving the Preservation of Ontological Mappings : A Document-Ontology Morphism›

textdof as a system is currently particularly geared towards ‹document›-ontologies, in 
particular for documentations generated from Isabelle theories. We used it meanwhile for the
generation of various conference and journal papers, notably using the
 theoryIsabelle_DOF.scholarly_paper and  theoryIsabelle_DOF.technical_report-ontologies, 
targeting a (small) variety of LaTeX style-files. A particular aspect of these ontologies,
especially when targeting journals from publishers such as ACM, Springer or Elsevier, is the
description of publication meta-data. Publishers tend to have their own styles on what kind 
meta-data should be associated with a journal publication; thus, the depth and 
precise format of affiliations, institution, their relation to authors, and author descriptions
(with photos or without, hair left-combed or right-combed, etcpp...)  varies.

In the following, we present an attempt to generalized ontology with several ontology mappings 
to more specialized ones such as concrete journals and/or the  theoryIsabelle_DOF.scholarly_paper-
ontology which we mostly used for our own publications. 
›


doc_class elsevier_org =
  organization :: string
  address_line :: string
  postcode :: int
  city :: string
  
doc_class acm_org =
  position :: string
  institution :: string
  department :: int
  street_address :: string
  city :: string
  state :: int
  country :: string
  postcode :: int

doc_class lncs_inst =
  institution :: string

doc_class author =
    name   :: string
    email  :: "string"   <= "''''"
    invariant ne_fsnames :: "name σ  ''''"

doc_class elsevierAuthor = "author" +
  affiliations :: "elsevier_org list"
  firstname   :: string
  surname     :: string
  short_author :: string
  url :: string
  footnote :: string
  invariant ne_fsnames :: "firstname σ  ''''  surname σ  ''''"

(*<*)
text*[el1:: "elsevier_org"]‹An example elsevier-journal affiliation.›
term*@{elsevier_org ‹el1›}
text*[el_auth1:: "elsevierAuthor", affiliations = "[@{elsevier_org ‹el1›}]"]‹›
(*>*)
text‹›

doc_class acmAuthor = "author" +
  affiliations :: "acm_org list"
  firstname   :: string
  familyname     :: string
  orcid :: int
  footnote :: string
  invariant ne_fsnames :: "firstname σ  ''''  familyname σ  ''''"

(*<*)
text*[acm1:: "acm"]‹An example acm-style affiliation›
(*>*)
text‹›

doc_class lncs_author = "author" +
  affiliations :: "lncs list"
  orcid :: int
  short_author :: string
  footnote :: string

(*<*)
text*[lncs1:: "lncs"]‹An example lncs-style affiliation›
text*[lncs_auth1:: "lncs_author", affiliations = "[@{lncs ‹lncs1›}]"]‹Another example lncs-style affiliation›
find_theorems elsevier.tag_attribute
(*>*)
text‹›

definition acm_name where "acm_name f s = f @ '' '' @ s"

fun concatWith :: "string  string list  string"
  where "concatWith str [] = ''''"
       |"concatWith str [a] = a"
       |"concatWith str (a#R) = a@str@(concatWith str R)"

lemma concatWith_non_mt : "(S[]  ( sset S. s''''))   (concatWith sep S)  ''''"
proof(induct S)
  case Nil
  then show ?case by simp
next
  case (Cons a S)
  then show ?case by (cases a; cases S; auto)
qed

onto_morphism (acm) to elsevier
  where "convertacmelsevier σ =  
               elsevier.tag_attribute = acm.tag_attribute σ, 
                organization = acm.institution σ,
                address_line = concatWith '','' [acm.street_address σ, acm.city σ], 
                postcode     =  acm.postcode σ , 
                city         =  acm.city σ "

text‹Here is a more basic, but equivalent definition for the other way round:›

definition elsevier_to_acm_morphism :: "elsevier_org  acm_org"
          ("_ ⟨acm⟩elsevier" [1000]999)
          where "σ ⟨acm⟩elsevier =  acm_org.tag_attribute = 1::int,
                                   acm_org.position = ''no position'',
                                   acm_org.institution = organization σ,
                                   acm_org.department = 0,
                                   acm_org.street_address = address_line σ,
                                   acm_org.city = elsevier_org.city σ,
                                   acm_org.state = 0,
                                   acm_org.country = ''no country'',
                                   acm_org.postcode = elsevier_org.postcode σ  "

text‹The following onto-morphism links typelsevierAuthor's and typacmAuthor. Note that
the conversion implies trivial data-conversions (renaming of attributes in the classes), 
string-representation conversions, and conversions of second-staged, referenced instances.›

onto_morphism (elsevierAuthor) to acmAuthor
  where "convertelsevierAuthoracmAuthor σ =  
               author.tag_attribute = undefined, 
                name = concatWith '','' [elsevierAuthor.firstname σ,elsevierAuthor.surname σ], 
                email = author.email σ,
                acmAuthor.affiliations = (elsevierAuthor.affiliations σ)
                                         |> map (λx. x ⟨acm⟩elsevier),
                firstname = elsevierAuthor.firstname σ, 
                familyname = elsevierAuthor.surname σ,
                orcid = 0,  ― ‹la triche ! ! !›
                footnote = elsevierAuthor.footnote σ"


lemma elsevier_inv_preserved :
  "elsevierAuthor.ne_fsnames_inv σ 
    acmAuthor.ne_fsnames_inv (convertelsevierAuthoracmAuthor σ) 
        author.ne_fsnames_inv (convertelsevierAuthoracmAuthor σ)"
  unfolding elsevierAuthor.ne_fsnames_inv_def acmAuthor.ne_fsnames_inv_def
            convertelsevierAuthor_acmAuthor_def author.ne_fsnames_inv_def
  by auto  

text‹The proof is, in order to quote Tony Hoare, ``as simple as it should be''. Note that it involves
the lemmas like @{thm concatWith_non_mt} which in itself require inductions, ie, which are out of
reach of pure ATP proof-techniques. ›

subsection‹Proving the Preservation of Ontological Mappings : A Domain-Ontology Morphism›
text‹The following example is drawn from a domain-specific scenario: For conventional data-models,
be it represented by UML-class diagrams or SQL-like "tables" or Excel-sheet like presentations
of uniform data, we can conceive an ontology (which is equivalent here to a conventional style-sheet)
and annotate textual raw data. This example describes how meta-data can be used to 
calculate and transform this kind of representations in a type-safe and verified way. ›

(*<*)
(* Mapped_PILIB_Ontology example *)
(* rethink examples: should we "morph" providence too ? ? ? Why not ? bu *)

termfold (+) S 0  

definition sum 
  where "sum S = (fold (+) S 0)"
(*>*)

text‹We model some basic enumerations as inductive data-types: ›
datatype Hardware_Type = 
  Motherboard     |  Expansion_Card |  Storage_Device |   Fixed_Media | 
  Removable_Media | Input_Device    | Output_Device

datatype Software_Type =
  Operating_system |  Text_editor | Web_Navigator |  Development_Environment

text‹In the sequel, we model a ''Reference Ontology'', ie a data structure in which we assume
that standards or some de-facto-standard data-base refer to the data in the domain of electronic
devices:›

onto_class Resource =
  name :: string

onto_class Electronic = Resource +
  provider :: string
  manufacturer :: string

onto_class Component = Electronic +
  mass :: int

onto_class Simulation_Model = Electronic +
  simulate :: Component
  composed_of :: "Component list" 
  version :: int

onto_class Informatic = Resource +
  description :: string

onto_class Hardware = Informatic +
  type :: Hardware_Type
  mass :: int
  composed_of :: "Component list" 
  invariant c1 :: "mass σ = sum(map Component.mass (composed_of σ))"

onto_class Software = Informatic +
  type ::  Software_Type
  version :: int

text‹Finally, we present a ‹local ontology›,  ie a data structure used in a local store
in its data-base of cash-system:›

onto_class Item =
  name :: string

onto_class Product = Item +
  serial_number :: int
  provider :: string
  mass :: int

onto_class Electronic_Component = Product +
  serial_number :: int

onto_class Monitor = Product +
  composed_of :: "Electronic_Component list" 
  invariant c2 :: "Product.mass σ = sum(map Product.mass (composed_of σ))"

termProduct.mass σ = sum(map Product.mass (composed_of σ))

onto_class Computer_Software = Item +
  type ::  Software_Type
  version :: int

text‹These two example ontologies were linked via conversion functions called ‹morphisms›.
The hic is that we can prove for the morphisms connecting these ontologies, that the conversions
are guaranteed to preserve the data-invariants, although the data-structures (and, of course,
the presentation of them) is very different. Besides, morphisms functions can be ``forgetful''
(ie surjective), ``embedding'' (ie injective) or even ``one-to-one'' ((ie bijective).›

definition Item_to_Resource_morphism :: "Item  Resource"
          ("_ ⟨Resource⟩Item" [1000]999)
          where "  σ ⟨Resource⟩Item = 
                                    Resource.tag_attribute = 1::int ,
                                    Resource.name = name σ " 

definition Product_to_Resource_morphism :: "Product  Resource"
           ("_ ⟨Resource⟩Product" [1000]999)
           where "  σ ⟨Resource⟩Product = 
                                 Resource.tag_attribute = 2::int ,
                                  Resource.name = name σ " 

definition Computer_Software_to_Software_morphism :: "Computer_Software  Software" 
             ("_ ⟨Software⟩SoftCmp" [1000]999)
             where "σ ⟨Software⟩SoftCmp = 
                                 Resource.tag_attribute = 3::int ,
                                  Resource.name = name σ ,
                                  Informatic.description = ''no description'', 
                                  Software.type  = type  σ ,
                                  Software.version = version σ "

definition Electronic_Component_to_Component_morphism :: "Electronic_Component  Component" 
             ("_ ⟨Component⟩ElecCmp" [1000]999)
             where "σ ⟨Component⟩ElecCmp = 
                                 Resource.tag_attribute = 4::int ,
                                  Resource.name = name σ ,
                                  Electronic.provider = provider  σ  ,
                                  Electronic.manufacturer = ''no manufacturer'' ,
                                  Component.mass = mass  σ "

definition Monitor_to_Hardware_morphism :: "Monitor  Hardware"
           ("_ ⟨Hardware⟩ComputerHardware" [1000]999)
           where "σ ⟨Hardware⟩ComputerHardware =
                    Resource.tag_attribute = 5::int ,
                     Resource.name = name σ ,
                     Informatic.description = ''no description'', 
                     Hardware.type = Output_Device,
                     Hardware.mass = mass σ ,
                     Hardware.composed_of = map  Electronic_Component_to_Component_morphism (composed_of σ)
                   " 

text‹On this basis, we can state the following invariant preservation theorems:›

lemma inv_c2_preserved :
  "c2_inv σ  c1_inv (σ ⟨Hardware⟩ComputerHardware)"
  unfolding c1_inv_def c2_inv_def 
            Monitor_to_Hardware_morphism_def Electronic_Component_to_Component_morphism_def
  by (auto simp: comp_def)

lemma Monitor_to_Hardware_morphism_total :
  "Monitor_to_Hardware_morphism ` ({X::Monitor. c2_inv X})  ({X::Hardware. c1_inv X})"
  using inv_c2_preserved 
  by auto

type_synonym local_ontology = "Item * Electronic_Component * Monitor"
type_synonym reference_ontology = "Resource * Component * Hardware"

fun ontology_mapping :: "local_ontology  reference_ontology" 
  where "ontology_mapping (x, y, z) = (x⟨Resource⟩Item, y⟨Component⟩ElecCmp, z⟨Hardware⟩ComputerHardware)" 

lemma ontology_mapping_total :
        "ontology_mapping ` {X.  c2_inv (snd (snd X))}  {X.  c1_inv (snd (snd X))}"
  using  inv_c2_preserved 
  by auto

text‹Note that in contrast to conventional data-translations, the preservation of a class-invariant
is not just established by a validation of the result, it is proven once and for all for all instances
of the classes.›

subsection‹Proving Monitor-Refinements›

(*<*)
(* switch on regexp syntax *)
notation Star  ("(_)*" [0]100)
notation Plus  (infixr "||" 55)
notation Times (infixr "~~" 60)
notation Atom  ("_" 65)
(*>*)


text‹Monitors are regular-expressions that allow for specifying instances of classes to appear in 
a particular order in a document. They are used to specify some structural aspects of a document.
Based on an AFP theory by Tobias Nipkow on Functional Automata
(ie a characterization of regular automata using functional polymorphic descriptions of transition
functions avoiding any of the ad-hoc finitizations commonly used in automata theory), which 
comprises also functions to generate executable deterministic and non-deterministic automata,
this theory is compiled to SML-code that was integrated in the dof system. The necessary
adaptions of this integration can be found in the theory theoryIsabelle_DOF.RegExpInterface,
which also contains the basic definitions and theorems for the concepts used here.

Recall that the monitor of termscholarly_paper.article is defined by:  vs‹0.5cm›

@{thm [indent=20, margin=70, names_short] scholarly_paper.article_monitor_def}

 vs‹0.5cm› However, it is possible to reason over the language of monitors and prove classical 
refinement notions such as trace-refinement on the monitor-level, so once-and-for-all for all 
instances of validated documents conforming to a particular ontology. The primitive recursive 
operators termRegExpInterface.Lang and termRegExpInterface.Lsub generate the languages of the
regular expression language, where termLsub takes the sub-ordering relation of classes into 
account. 

The proof of : vs‹0.5cm›

  @{thm [indent=20,margin=70,names_short] articles_sub_reports}

 vs‹0.5cm› can be found in theory  theoryIsabelle_DOF.technical_report; 
it is, again, "as simple as it should be" (to cite Tony Hoare). 

The proof of: vs‹0.5cm›

  @{thm [indent=20,margin=70,names_short] articles_Lsub_reports}

 vs‹0.5cm› is slightly more evolved; this is due to the fact that dof does not generate a proof of 
the acyclicity of the graph of the class-hierarchy @{term doc_class_rel} automatically. For a given 
hierarchy, this proof will always succeed (since  dof checks this on the meta-level, of course), 
which permits to deduce the anti-symmetry of the transitive closure of @{term doc_class_rel} 
and therefore to establish that the doc-classes can be organized in an order (ie the
type typdoc_class is an instance of the type-class classorder). On this basis, the proof
of the above language refinement is quasi automatic. This proof is also part of 
 theoryIsabelle_DOF.technical_report.
›

(*<*)

(* switch off regexp syntax *)
no_notation Star  ("(_)*" [0]100)
no_notation Plus  (infixr "||" 55)
no_notation Times (infixr "~~" 60)
no_notation Atom  ("_" 65)

end
(*>*)