Structuring, representing and linking data
Contents
- Local files
- Structuring and representing data
- W3C and related
- Multiple roots
- Mark-up - HTML, XML, XBRL
- Wikipedia - DBPedia, WikiData, ...
- Semantics - RDF - Turtle - SPARQL - OWL - SKOS - DC
- Ontologies
- Overviews
- W3C - general
- W3C - CA and provenance
- Well-known - DC, FOAF, ...
- Ontochain, graphchain, ...
- Upper level ontos - Schema.org, gist, Dolce, BFO, UFO, ...
- Ontology matching
- Financial ontologies - FIBO, GLEIS and GLEIF, ...
- Other well-known ontologies
- Knowledge Graphs
- Education
- Publication
- Privacy
- Pharma and Healthcare
- Trust
- Available data with semantic indices
- Ontology design patterns
- Lexicons/Thesauri
- EC ISA Semic and Publication Office
- Linking data
- Tools
- Sundry
Local files
See also local files (GrayTiger):
Structuring and representing data
W3C and related
W3C - WWW
Security and cryptography
General
- W3C Security Vocabulary (draft) - originated in web payments, 2016, addresses crypto, GraphSignature, LinkedDataSignature
- W3C Web of Things security ontology
- W3C Web CryptoAPI (for use in a browser, as an alternative to an applet)
- W3C WebAuthn API for public key-based credentials by web applications
- Defines an API enabling the creation and use of strong, attested, scoped, public key-based credentials by web applications,
for the purpose of strongly authenticating users. Conceptually, one or more public key credentials,
each scoped to a given WebAuthn Relying Party, are created by and bound to authenticators as requested by the web application.
The user agent mediates access to authenticators and their public key credentials in order to preserve user privacy.
Authenticators are responsible for ensuring that no operation is performed without user consent.
Authenticators provide cryptographic proof of their properties to Relying Parties via attestation.
This specification also describes the functional model for WebAuthn conformant authenticators,
including their signature and attestation functionality.
- W3C WebAuthn Wikipedia
Verifiable Credentials
Multiple roots, graphs, blockchain
Applications
See also local files.
Application development
- LinkedData.org - a home for, or pointers to, resources from across the Linked Data community
Application development
Jena
Other
Mark-up languages
HTML
XML
The W3C - XML Recommendation states: This document specifies a syntax created by subsetting an existing, widely used international text processing standard (Standard Generalized Markup Language, ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web. XML focuses on syntax, not on semantics.
To aid with the interpretation of an XML document, additional data can be provided in an XML schema. This is a description of a type of XML document, in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. XML schemas are expressed through
- the document type definition (DTD) language, which is native to the XML specification, a schema language that is of relatively limited capability
- XML Schema (with a capital S)
- RELAX NG
The XML Schema Definition is commonly referred to as XSD.
XML basics
- XML local files
- Java-XML local info, parsing, Xerces, SAX, DOM, ...
- W3C - XML Recommendation
- XML - Wikipedia
- XML Schema - W3C
- XML Schema - Wikipedia
- XML Schema primer - W3C
- Basic Concepts: how to declare the elements and attributes that appear in XML documents, the distinctions between simple and complex types, defining complex types, the use of simple types for element and attribute values, schema annotation, a simple mechanism for re-using element and attribute definitions, and nil values.
- Advanced Concepts I: Namespaces, Schemas & Qualification, the first advanced section in the primer, explains the basics of how namespaces are used in XML and schema documents.
- Advanced Concepts II: deriving types from existing types, and for controlling these derivations. The section also describes mechanisms for merging together fragments of a schema from multiple sources, and for element substitution
- Advanced Concepts III: a mechanism for specifying uniqueness among attributes and elements, a mechanism for using types across namespaces, a mechanism for extending types based on namespaces, and a description of how documents are checked for conformance.
- Appendices provide reference information on simple types and a regular expression language.
The purpose of a schema is to define a class of XML documents, and so the term "instance document" is often used to describe an XML document that conforms to a particular schema. In fact, neither instances nor schemas need to exist as documents per se -- they may exist as streams of bytes sent between applications, as fields in a database record, or as collections of XML Infoset "Information Items".
An instance is not required to reference a schema.
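The two points above (an instance may be a stream of bytes, and it need not reference a schema) can be sketched in Python with the standard library. The purchase-order document and the helper name schema_hints are illustrative assumptions, not part of any specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance, held as bytes in memory rather than as a file on disk.
INSTANCE = b"""<?xml version="1.0" encoding="UTF-8"?>
<purchaseOrder orderDate="1999-10-20">
  <item partNum="872-AA"><productName>Lawnmower</productName></item>
</purchaseOrder>"""

# Namespace of the attributes an instance may use to point at a schema.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def schema_hints(root):
    """Return the xsi schema-location attributes found on the root element."""
    hints = {}
    for name in ("schemaLocation", "noNamespaceSchemaLocation"):
        value = root.get(f"{{{XSI}}}{name}")
        if value is not None:
            hints[name] = value
    return hints


root = ET.fromstring(INSTANCE)   # parses well-formed XML; no schema is involved
print(root.tag, schema_hints(root))
```

The parser only checks well-formedness here. Validating against an XSD needs a separate library, since the Python standard library does not include one.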
XML further information
XML applications
Process definitions in XML
XBRL - the eXtensible Business Reporting Language
Check out IETF's CNRP (Common Name Resolution Protocol), which can be coded in XML. It can be used e.g. to request a statement of accounts.
XBRL and iXBRL specification
XBRL taxonomies
XBRL usage
XBRL service providers
Wikipedia - DBPedia, WikiData, ...
Local data
Wikipedia
The free encyclopedia, hosted by the Wikimedia Foundation.
Wikimedia
A global movement whose mission is to bring free educational content to the world.
- wikimedia.org
- meta.wikimedia.org - MetaWiki - the global community site for the Wikimedia Foundation's projects
- wikitech.wikimedia.org - technology - cloud, tooling, replicas, reliability engineering
- wikimedia's infrastructure - data centres, networking, routing, mediawiki, searching, performance, engineering, ...
Mediawiki software
The software that powers Wikipedia, among others.
Wikidata
Concepts
Acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.
The Wikidata repository consists mainly of items, each one having a label, a description and any number of aliases.
Statements describe detailed characteristics of an Item and consist of a property and a value.
Furthermore
- Glossary - terminology of items, properties, ...
- wikidata help
- L Lexemes contain lexicographical data, which is data about words or phrases, such as language, etymology, inflections, etc. E.g. L7 cat en, language: English, Lexical category: noun
- Q Items are uniquely identified by a Q followed by a number, such as Douglas Adams (Q42).
- P Properties have a P followed by a number, such as with educated at (P69).
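As a rough sketch of the Wikidata concepts above (items with label, description and aliases; statements as property/value pairs; Q, P and L identifiers), the following Python models them with plain data classes. The class name Item, the helper kind_of and the description and statement value are illustrative assumptions, not the Wikidata data model or API.

```python
import re
from dataclasses import dataclass, field

KINDS = {"Q": "item", "P": "property", "L": "lexeme"}
ID_PATTERN = re.compile(r"([QPL])([1-9][0-9]*)")


def kind_of(identifier):
    """Classify an identifier by its prefix letter, or return None if malformed."""
    m = ID_PATTERN.fullmatch(identifier)
    return KINDS[m.group(1)] if m else None


@dataclass
class Item:
    qid: str
    label: str
    description: str
    aliases: list = field(default_factory=list)
    statements: list = field(default_factory=list)  # (property id, value) pairs


# Illustrative content only; the description and the value are placeholders.
adams = Item("Q42", "Douglas Adams", "illustrative description")
adams.statements.append(("P69", "illustrative value for 'educated at'"))

print(kind_of(adams.qid), kind_of(adams.statements[0][0]), kind_of("L7"))
```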
DBpedia
A crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information resembles an open knowledge graph (OKG).
DBpedia data is served as Linked Data.
The DBpedia RDF Data Set is hosted and published using OpenLink Virtuoso. Access is via a SPARQL endpoint, alongside HTTP support for any Web client’s standard GETs for HTML or RDF representations of DBpedia resources.
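A minimal sketch of how a Web client might address the SPARQL endpoint with a standard HTTP GET, using only the Python standard library. The endpoint URL https://dbpedia.org/sparql and the resource IRI are assumptions about the public service; the request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint

QUERY = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://dbpedia.org/resource/Douglas_Adams> rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 1"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL protocol GET request; the query goes in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(QUERY)
print(req.get_method(), req.full_url[:40])
# To run the query: urllib.request.urlopen(req), then parse the JSON response body.
```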
Semantics - RDF - Turtle - SPARQL - OWL - SKOS - DC
Interesting: cryptographic definition of semantic security:
QUOTE
In cryptography, a semantically secure cryptosystem is one where only negligible information about the plaintext can be feasibly extracted from the ciphertext. Specifically, any probabilistic, polynomial-time algorithm (PPTA) that is given the ciphertext of a certain message m (taken from any distribution of messages), and the message's length, cannot determine any partial information on the message with probability non-negligibly higher than all other PPTA's that only have access to the message length (and not the ciphertext).
This concept is the computational complexity analogue to Shannon's concept of perfect secrecy.
Perfect secrecy means that the ciphertext reveals no information at all about the plaintext,
whereas semantic security implies that any information revealed cannot be feasibly extracted.
Semantic Web - basics
Semantics - Turtle and RDF
RDF
Carroll: An RDF graph is defined as a set of triples (Subject, Predicate, Object). A named graph is an RDF graph which has been assigned a name in the form of a URIref. A named graph is an entity with two functions, name and rdfgraph, defined on it, which determine respectively its name, which is a URI, and the RDF graph that it encodes or represents.
Put otherwise: in an RDF database, a named graph is what we call a subset of our data that has been given a unique label (name). A graph database can contain any number of named graphs alongside its default graph, and each fact can be present in or absent from any graph.
- W3.org - RDF (Resource Description Framework) - RDF primer
- W3.org - RDF 1.1 concepts - dataset, graphs
- Basic RDF graph consists of two nodes (Subject and Object) and an edge connecting them (Predicate) - together referred to as a triple
- An RDF graph is a set of triples
- RDF datasets are used to organize collections of RDF graphs, and comprise a default graph and zero or more named graphs.
- The default graph of the RDF dataset does not have an associated IRI
- All other graphs have an associated IRI or blank node. They are called named graphs, and the IRI or blank node is called the graph name.
- Each named graph is a pair consisting of an IRI or a blank node (the graph name), and an RDF graph. Graph names are unique within an RDF dataset.
- Despite the use of the word “name” in “named graph”, the graph name is not required to denote the graph. It is merely syntactically paired with the graph. RDF does not place any formal restrictions on what resource the graph name may denote, nor on the relationship between that resource and the graph.
- In a SPARQL query, all basic statement patterns are always matched against the default graph in the dataset, except when a GRAPH clause is involved.
- W3.org - RDF-star - extends RDF with a convenient way to make statements about other statements
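The dataset structure described in the list above (one unnamed default graph, zero or more named graphs, patterns matched against the default graph unless a graph is specified) can be sketched as a small Python class. The class and method names and the ex: and foaf: terms are hypothetical; real triple stores may configure the default graph differently.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    default: set = field(default_factory=set)   # the default graph has no name
    named: dict = field(default_factory=dict)   # graph name (IRI) -> set of triples

    def add(self, triple, graph=None):
        target = self.default if graph is None else self.named.setdefault(graph, set())
        target.add(triple)

    def match(self, s=None, p=None, o=None, graph=None):
        """Match one triple pattern (None is a wildcard) against a single graph."""
        triples = self.default if graph is None else self.named.get(graph, set())
        return sorted(t for t in triples
                      if all(q is None or q == v for q, v in zip((s, p, o), t)))


ds = Dataset()
ds.add(("ex:alice", "foaf:knows", "ex:bob"))                   # default graph
ds.add(("ex:bob", "foaf:knows", "ex:carol"), graph="ex:g1")    # named graph

print(ds.match(p="foaf:knows"))                  # default graph only
print(ds.match(p="foaf:knows", graph="ex:g1"))   # comparable to a GRAPH clause
```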
Turtle
- Turtle (Terse RDF Triple Language), a format for expressing RDF data - W3.org
- Is one of the five syntaxes for expressing OWL 2.
- The simplest triple statement is a sequence of (subject, predicate, object) terms, separated by whitespace and terminated by ‘.’ after each triple.
- The token 'a' in the predicate position of a Turtle triple represents the IRI http://www.w3.org/1999/02/22-rdf-syntax-ns#type (rdf:type). In OWL this predicate states that the subject is an instance of a class.
- Turtle - Wikipedia
- W3.org - TriG - the RDF dataset language
- Textual syntax for RDF that allows an RDF dataset to be written in a compact and natural text form, with abbreviations for common usage patterns and datatypes
- An extension of Turtle
- A TriG document consists of a sequence of directives, triple statements and graph statements which contain triple-generating statements
- Graph statements are a pair of an IRI or blank node label and a group of triple statements surrounded by {}. The IRI or blank node label of the graph statement may be used in another graph statement, which implies taking the union of the triples generated by each graph statement.
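To illustrate the simplest Turtle statement form and the 'a' token, here is a deliberately tiny Python reader. It is a sketch under strong assumptions: one statement per line, full IRIs in angle brackets only, no prefixes, literals, blank nodes or TriG graph blocks. It is not a Turtle parser, and the example.org IRIs are made up.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def parse_simple_turtle(text):
    """Read lines of the form '<s> <p> <o> .' and return (s, p, o) tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError(f"statement not terminated by '.': {line!r}")
        terms = line[:-1].split()
        if len(terms) != 3:
            raise ValueError(f"expected three terms: {line!r}")
        s, p, o = terms
        if p == "a":                      # 'a' abbreviates rdf:type
            p = f"<{RDF_TYPE}>"
        triples.append(tuple(t.strip("<>") for t in (s, p, o)))
    return triples


DOC = """
<http://example.org/alice> a <http://example.org/Person> .
<http://example.org/alice> <http://example.org/knows> <http://example.org/bob> .
"""

for triple in parse_simple_turtle(DOC):
    print(triple)
```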
Semantics - SKOS - 2009 - W3C
The Simple Knowledge Organization System (SKOS) is an RDF vocabulary for representing semi-formal knowledge organization systems (KOSs),
such as thesauri, taxonomies, classification schemes and subject heading lists. SKOS is based on RDF. In SKOS, conceptual resources (concepts)
can be identified with URIs, semantically related through hierarchies and association networks, labeled with lexical strings,
documented with notes, and aggregated into concept schemes.
SKOS can be used on its own, or in combination with more-formal languages such as OWL. SKOS can be seen as a bridging technology, providing
the missing link between the rigorous logical formalism of OWL and the informal and weakly-structured world of Web-based collaboration
tools, as exemplified by social tagging applications.
The aim of SKOS is not to replace original conceptual vocabularies in their initial context of use,
but to allow them to be ported to a shared space, based on a simplified model, enabling wider re-use and better interoperability.
Semantics - DAML - superseded by OWL
SPARQL - SPIN - SHACL
The query evaluation mechanism in SPARQL is based on subgraph matching.
Entailment:
- Subgraph matching is also called simple entailment since it can equally be defined in terms of the simple entailment relation between RDF graphs (a set of 'Subject, Predicate, Object'-triples).
- In order to use more elaborate entailment relations, such as those induced by RDF Schema (RDFS) or OWL semantics, SPARQL 1.1 includes several entailment regimes, including RDFS and OWL.
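The difference between simple entailment (plain subgraph matching) and an RDFS-style regime can be sketched in Python. This is only an illustration: it materialises just two RDFS rules (rdfs9 and rdfs11, both about rdfs:subClassOf) before matching, which is one possible way to approximate the behaviour and not how a SPARQL engine is required to work. All ex: terms are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def match(pattern, graph):
    """Return variable bindings for one triple pattern; variables start with '?'."""
    results = []
    for triple in sorted(graph):
        binding = {}
        for q, v in zip(pattern, triple):
            if q.startswith("?"):
                if binding.setdefault(q, v) != v:
                    break                 # same variable bound to two values
            elif q != v:
                break                     # constant does not match
        else:
            results.append(binding)
    return results


def rdfs_subclass_closure(graph):
    """Apply rules rdfs9 and rdfs11 until no new triple appears."""
    closed = set(graph)
    while True:
        new = set()
        for (c, p, d) in closed:
            if p != SUBCLASS:
                continue
            for (x, q, y) in closed:
                if q == TYPE and y == c:
                    new.add((x, TYPE, d))         # rdfs9: instances move up
                if q == SUBCLASS and x == d:
                    new.add((c, SUBCLASS, y))     # rdfs11: transitivity
        if new <= closed:
            return closed
        closed |= new


GRAPH = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:tom", TYPE, "ex:Cat"),
}
PATTERN = ("?x", TYPE, "ex:Animal")

print(match(PATTERN, GRAPH))                          # simple entailment
print(match(PATTERN, rdfs_subclass_closure(GRAPH)))   # with the two RDFS rules
```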
SPARQL
SPARQL - a recursive acronym for SPARQL Protocol and RDF Query Language.
SPIN - SPARQL Inferencing Notation
- SPINrdf.org - the initial informal SPIN working group and the earliest versions of the SPIN specification
- W3C SPIN submission - 2011 (overview, syntax, modeling vocabulary)
- SPIN - Overview and Motivation
- SPIN - SPARQL Syntax
- SPIN - Modeling Vocabulary
SPIN never became a W3C Recommendation: it remained a Member Submission and was succeeded by SHACL, which became a W3C Recommendation in 2017.
SHACL - Shapes Constraint Language
RDF schemas are implicit, and graphs can grow without constraint. Shapes are a way to define a more explicit schema for a graph.
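The idea of a shape can be sketched in Python as a check that every instance of a target class has at least a minimum number of values for a property. The dictionary layout and the function validate are hypothetical and only mimic the SHACL notions of target class and minimum count; this is not the SHACL vocabulary or a SHACL processor.

```python
def validate(graph, shape):
    """Return (focus node, path, message) tuples for minimum-count violations."""
    focus_nodes = sorted(s for (s, p, o) in graph
                         if p == "rdf:type" and o == shape["targetClass"])
    violations = []
    for node in focus_nodes:
        for path, min_count in shape["minCount"].items():
            count = sum(1 for (s, p, _) in graph if s == node and p == path)
            if count < min_count:
                violations.append(
                    (node, path, f"expected at least {min_count}, found {count}"))
    return violations


# Hypothetical shape: every ex:Person needs at least one ex:name.
PERSON_SHAPE = {"targetClass": "ex:Person", "minCount": {"ex:name": 1}}

GRAPH = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),    # no ex:name, so this violates the shape
}

print(validate(GRAPH, PERSON_SHAPE))
```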
SHACL hands-on
Semantics - W3C OWL
- W3.org - Ontologies and vocabularies
- W3C OWLED community
- W3.org - OWL and OWL 2 standards
- OWL is a computational, declarative, logic-based language: knowledge expressed in it can be reasoned over to verify its consistency or to make implicit knowledge explicit
- OWL uses the open world assumption (OWA: a fact that is not stated may still be true), as opposed to the closed world assumption (a fact that is not present in the database is assumed to be false)
- Uses 'object properties' to relate individuals (which are of a class) to one-another, and 'data properties' to relate an individual to a literal
- An OWL ontology maps to a DL knowledge base, consisting of a TBox, an RBox and an ABox
- Three species of OWL:
- OWL Full, the union of OWL syntax and RDF
- OWL DL, restricted to a decidable FOL fragment (corresponds to DAML+OIL)
- OWL Lite, an easier to implement subset of OWL DL
- OWL 2 can be specified in five syntaxes: RDF/XML, OWL/XML, Functional-Style Syntax, Manchester Syntax and Turtle
- W3.org - TR OWL Guide - 2004
- W3.org - TR OWL 2 overview - 2009
- W3.org - TR OWL 2 primer - 2012
- W3.org - OWL DL Query - Manchester Syntax (as used in Protégé)
- OWL Design Patterns - Public Catalog - e.g. N-ary Relations, Closure, ValuePartition, ValueSet,...
- University of Manchester - OWL group
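The open versus closed world assumption mentioned in the list above can be made concrete with a small Python sketch. The function names and the ex: facts are hypothetical; explicit negative facts stand in for what a reasoner would derive as provably false.

```python
def ask_closed_world(facts, fact):
    """Closed world: anything not stated is taken to be false."""
    return fact in facts


def ask_open_world(facts, negated_facts, fact):
    """Open world: True if stated, False if known to be false, else None (unknown)."""
    if fact in facts:
        return True
    if fact in negated_facts:
        return False
    return None


FACTS = {("ex:alice", "ex:knows", "ex:bob")}
NEGATED = {("ex:alice", "ex:knows", "ex:dave")}
QUESTION = ("ex:alice", "ex:knows", "ex:carol")

print(ask_closed_world(FACTS, QUESTION))          # False
print(ask_open_world(FACTS, NEGATED, QUESTION))   # None
```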
Semantics - SHACL
Ontologies and Vocabularies
Overviews
Ontologies and Vocabularies - W3C - general
Ontologies and Vocabularies - W3C - CA and Provenance
- W3C CertOnto - Certification Authority ontology - certificate/public key/private key/...
- W3C Provenance - overview of documents - 4 recommendations
- Provenance Wiki at Semantic Web
- Provenance Wiki (by initial working group, now closed)
- Provenance FAQ
- Provenance Implementations
- Provenance components including
- Provenance examples of the various ontology components
- Provenance examples second and more recent list
- Luc Moreau homepage
- The Provenance Book by Luc Moreau and Paul Groth
- Luc Moreau Provtoolbox is a Java library to create Java representations of PROV-DM, and convert them between RDF, PROV-XML, PROV-N, and PROV-JSON
- Open Provenance (based on Luc Moreau) including toolbox with Java, Python, Prov-N editor etc
- Open Provenance Store for storing provenance documents
- SOTON Provenance tools
- ProvoViz tool - visualisation of a provenance graph expressed in the PROV-O vocabulary as a Sankey diagram (upload Turtle or RDF/XML)
- TheGazette - UK's official public record, The Gazette is published by TSO (The Stationery Office) under the superintendence of Her Majesty's Stationery Office (HMSO), part of The National Archives, content available under the Open Government Licence v3.0
- TheGazette using provenance and digital signature
- TheGazette provenance described by Luc Moreau
- BBC on using the Provenance ontology
- REPRODUCE-ME ontology - for reproduction of experiments, based on PROV-O and P-PLAN, by Sheeba Samuel
- REPRODUCE-ME ontology - github Sheeba Samuel (onto, sparql queries etc)
Ontologies and Vocabularies - well-known
Dublin Core
- dublincore.org - a vocabulary of fifteen properties for use in resource description as metadata terms - also ISO 15836
- the classic 15 metadata terms are referred to as the Dublin Core Metadata Element Set (DC MES)
- DC MES got integrated into the DCMI Metadata Terms, which are regularly updated
Other
FOAF
Upper ontologies
Introduction
DOLCE and related
Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE)
CYC
BFO
Basic Formal Ontology (BFO) is a top-level ontology developed by Barry Smith and his associates for the purposes of promoting interoperability among domain ontologies built in its terms through a process of downward population.
UFO
The Unified Foundational Ontology (UFO) is divided into three incrementally layered compliance sets:
- UFO-A, an ontology of endurants (objects)
- UFO-B, an ontology of events (perdurants)
- UFO-C, an ontology of social entities built on top of UFO-A and UFO-B, which addresses terms related to the spheres of intentional and social things
gist
Yamato
- Yamato - Yet Another More Advanced Top-level Ontology - Riichiro Mizoguchi
Sumo
The Suggested Upper Merged Ontology (SUMO) is an upper ontology intended as a foundation ontology for a variety of computer information processing systems. SUMO defines a hierarchy of classes and related rules and relationships. These are expressed in a version of the language SUO-KIF which has a LISP-like syntax. A mapping from WordNet synsets to SUMO has been defined. Initially, SUMO was focused on meta-level concepts (general entities that do not belong to a specific problem domain), and thereby would lead naturally to a categorization scheme for encyclopedias. It has now been considerably expanded to include a mid-level ontology and dozens of domain ontologies.
Ontology selection
Ontology matching
Aims at finding correspondences between semantically related entities of different ontologies.
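One very naive matching technique is to compare entity labels as strings. The Python sketch below does that with difflib from the standard library. The two toy ontologies, the 0.8 threshold and the function names are assumptions for illustration; real ontology matchers also use structure, instances and background knowledge.

```python
from difflib import SequenceMatcher


def normalise(label):
    """Lower-case a label and drop everything except letters and digits."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def similarity(a, b):
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def match_labels(onto_a, onto_b, threshold=0.8):
    """Pair each entity of onto_a with its most similarly labelled entity in onto_b."""
    correspondences = []
    for iri_a, label_a in sorted(onto_a.items()):
        iri_b, label_b = max(onto_b.items(),
                             key=lambda kv: similarity(label_a, kv[1]))
        score = similarity(label_a, label_b)
        if score >= threshold:
            correspondences.append((iri_a, iri_b, round(score, 2)))
    return correspondences


# Hypothetical ontologies given as {entity IRI: label}.
ONTO_A = {"a:LegalEntity": "Legal Entity", "a:Address": "Address"}
ONTO_B = {"b:legal_entity": "legal-entity", "b:Location": "Location"}

print(match_labels(ONTO_A, ONTO_B))
```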
OMG and related
- OMG SBVR - Semantics Of Business Vocabulary And Rules
- SBVR is aligned with Common Logic – published by ISO as ISO/IEC 24707:2007
- FIBO - Financial Industry Business Ontology
- A joint effort by OMG and the Enterprise Data Management (EDM) Council
- See also EDMC github FIBO
- FIBO Foundations, FIBO Business Entities and FIBO Indices and Indicators
- Specification Documents
- Normative Documents
- Normative Machine Readable Documents, a series of RDF/XML files
- Informative Machine Readable Documents
- Foundations address people, agents, currency, agreements, contracts, dates, goals, objectives, ...
- FIBO - Foundations version 1.1
- FIBO - Foundations version 1.2
- Business Entities address corporations, government entities, functional entities, ...
- FIBO - Business Entities version 1.1
- FIBO - Business Entities version 1.0
- FIRO - Financial Industry Regulatory Ontology,
a series of interlinked Ontologies based on industry standards to capture regulatory imperatives and rules in formal semantics.
- Developed by the Governance, Risk and Compliance Technology Centre (Ireland), Tom Butler
- It also expresses the abstract model followed by the other GRCTC projects:
the parser (Hermes), the Methodology (RIM), the XML Schema (Mercury), the Tool (Ganesha), FiORO
- FIRO consists of
- FIRO-H(ighLevel)- a core legal ontology about regulatory compliance, centred around the concept of Requirement
(Rule Statement) and the concept of Action, and defined in OWL.
- FIRO-Structure (FIRO-S) deals with the structure and the semantics of the source document.
It accounts for legal and non-legal documents alike.
The purpose of FIRO-S is to integrate information from the source text part of Mercury to allow querying,
Regulatory Change Management, and reasoning.
FIRO-S relies on LegalDocML for the representation of the structure and the semantics of the legal document.
FIRO-S is not formalized in OWL.
- FIRO-Domain (FIRO-D) identifies the domain ontologies based on FIRO-H.
Each contains one rulebook and the related vocabulary.
This means that different rulebooks result in different instances of FIRO-Domain, and any common rule (or vocabulary entry)
will be present in all relevant instances. Its possible applications include:
- Extract the rules valid for a particular point in time (exploiting RCM of FIRO-S).
- Classify instances of RegulatoryStatements as exceptions to other RegulatoryStatements.
- Classify BusinessRules as ensuring compliance with LegalRules.
- FIRO-PurposeSpecific (FIRO-PS) is the ontology used for performing reasoning towards a specific application.
It is a specialization of one or more FIRO-Ds. It may contain Factor instances to represent (either real or fictional) data.
Its possible applications include:
- Classify Events (instances of Actions) on the basis of their relation to RegulatoryStatements, as either “relevant”, “complying”, “allowed”, “breaching”, “exempted”.
- FinRegOnt - Financial Regulation Ontology - based on LKIF and FIBO
Privacy
Legal
- CEN Metalex ontology
- Usage of the CEN Metalex ontology
- OASIS LegaldocML - Akoma Ntoso
- The LegalDocumentXML Specifications provide a common legal document standard for the specification of parliamentary,
legislative, and judicial documents, for their interchange between institutions anywhere in the world, and for the creation
of a common data and metadata model that allows experience, expertise, and tools to be shared and extended by all
participating peers, courts, Parliaments, Assemblies, Congresses, and administrative branches of governments.
The standard aims to provide a format for long-term storage of and access to parliamentary, legislative and judicial
documents that allows search, interpretation, and visualization of documents.
- OASIS LegalruleML
- The objective of the LegalRuleML TC is to extend RuleML with formal features specific to legal norms, guidelines,
policies and reasoning; that is, the TC defines a standard (expressed with XML-schema and Relax NG) that is able to represent
the particularities of the legal normative rules with a rich, articulated, and meaningful markup language.
- OASIS LegalXML enotary with use cases
- OASIS LegalXML courtfiling
- OASIS LegalXML econtracts
- EU Publications - Metadata Registry for e.g. OJ
Financial ontologies
FIBO by EDM Council
GLEIS and GLEIF
GLEIS basics
The GLEIS is composed of the ROC, the Global LEI Foundation (GLEIF) and LEI issuers, also known as Local Operating Units (LOUs).
- LEIROC.org established in November 2012 to coordinate and oversee a worldwide framework of legal entity identification, the Global LEI System (GLEIS)
- In October 2020 the ROC expanded its mandate to become the International Governance Body (IGB) of
- the globally harmonised Unique Transaction Identifier (UTI),
- the Unique Product Identifier (UPI) and
- the Critical Data Elements (CDE) for derivatives transactions.
- GLEIF.org established by the Financial Stability Board (FSB) and G20
GLEIF basics
LEI issuers (LOUs) and Registration Agents
The Registration Agent’s role in the Global LEI System is directly connected to the LEI issuing organization. The Registration Agent may choose to partner with one or more LEI issuing organizations to ensure its clients’ needs for LEI services are met.
GLEIF Registration Authorities (RAs)
The RA list contains more than 1,000 business registers and other relevant registration and validation authority sources and assigns a unique code to each. Going forward, Legal Entity Identifier (LEI) issuing organizations will reference this code in their LEI issuance processes and reporting.
GLEIF Validation Agents
Are referred to on the website but are not defined in the STVR. Part of the 'solutions'. 'LEI issuers that have developed Validation Agent roles can simplify the onboarding process and enhance KYC and due diligence checks to reduce fraud and improve the customer experience within the financial sector.'
GLEIF and ISO
- ISO 5009 Financial organisations and roles
- ISO 3166 Country codes
- ISO 20275 ‘Financial Services – Entity Legal Forms (ELF)’
- ISO 17442 LEI specifies the minimum reference data that must be supplied for each LEI. This data is usually referred to as Level 1 Data - Who is who
- The LEI code itself is neutral, with no embedded intelligence
- Data:
- The official name of the legal entity as recorded in the official registers.
- The registered address of that legal entity.
- The country of formation.
- The codes for the representation of names of countries and their subdivisions.
- The date of the first LEI assignment; the date of last update of the LEI information; and the date of expiry, if applicable.
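Although the LEI code carries no meaning about the entity, its format can still be checked: an LEI is 20 alphanumeric characters whose last two are check digits computed with the ISO 7064 MOD 97-10 scheme. The Python sketch below computes and verifies them. The 18-character prefix is made up and does not identify a real entity, and the function names are hypothetical.

```python
def _to_digits(text):
    """MOD 97-10 conversion: digits stay as they are, letters map A=10 ... Z=35."""
    return "".join(str(int(ch, 36)) for ch in text)


def check_digits(prefix18):
    """Compute the two check digits for an 18-character upper-case prefix."""
    return f"{98 - int(_to_digits(prefix18 + '00')) % 97:02d}"


def is_valid_lei(lei):
    """Check length, character set and the MOD 97-10 remainder."""
    lei = lei.upper()
    if len(lei) != 20 or not (lei.isascii() and lei.isalnum()):
        return False
    if not lei[18:].isdigit():
        return False
    return int(_to_digits(lei)) % 97 == 1


PREFIX = "ABCD00EXAMPLE00000"          # hypothetical, 18 characters
lei = PREFIX + check_digits(PREFIX)

print(lei, is_valid_lei(lei))
```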
GLEIF and PKI certificates
GLEIF ontologies
- GLEIF specifies ontologies; however, the basic mechanisms offered for downloading data (concatenated files, golden copy files) are based on XML (specified according to XSDs), not RDF.
- GLEIF in partnership with data.world makes the content of the Golden Copy Files available as linked open data in RDF, at data.world/gleif.
- GLEIF specifies 4 ontologies:
- GLEIF level 1 ontology, 'Who is Who', covers key reference data for a legal entity identifiable with an LEI, builds on ISO 17442
- GLEIF L1 level 1 ontology, 'Who is Who'
- Local copy of GLEIF ontologies
- GLEIF L1 level 1 inferences description
- Corresponds to the Common Data Format version 2.1
- Covers key reference data for a legal entity identifiable with an LEI according to ISO 17442 (set of attributes or LEI reference data that comprises the essential elements of identification).
- Specifies the minimum reference data, which must be supplied for each LEI:
- The official name of the legal entity as recorded in the official registers.
- The registered address of that legal entity.
- The country of formation.
- The codes for the representation of names of countries and their subdivisions.
- The date of the first LEI assignment; the date of last update of the LEI information; and the date of expiry, if applicable.
- Imports following ontologies:
- OMG: Specification Metadata
- Skos core: Skos reference
- GLEIF: Base defines generic concepts for reuse, including generic classes for (legal) Entities and their relationships and statuses; and generic properties for different types of name and address. It makes use of the OMG Languages Countries and Codes (LCC) ontology (based on the ISO 3166 standard) for country and region information.
- Technically speaking:
- Ontology elements are identified by IRIs, as specified in https://www.w3.org/TR/owl2-syntax/#IRIs
- Class registered entity (subclass of 'entity')
- identified by IRI: https://www.gleif.org/ontology/L1/RegisteredEntity
- defined by https://www.gleif.org/ontology/L1/ and by the statement 'LEI-registered entities including, but not limited to, unique parties that are legally or financially responsible for the performance of financial transactions or have the legal right in their jurisdiction to enter independently into legal contracts, regardless of whether they are incorporated or constituted in some other way (e.g. trust, partnership, contractual). It excludes natural persons, but includes governmental organizations and supranationals.'
- has super-class: entity
- which is defined in the 'base ontology': https://www.gleif.org/ontology/v1.0/Base/index-en.html#Entity
- is identified by IRI: https://www.gleif.org/ontology/Base/Entity
- has no superclasses
- has sub-classes: legal person, registration authority
- is in domain of: has address, has entity expiration date, has entity expiration reason, has entity status, has legal jurisdiction, has successor, has successor name
- is in range of: has source, has successor, has target
- Class legal entity
- identified by IRI: https://www.gleif.org/ontology/L1/LegalEntity
- defined by https://www.gleif.org/ontology/L1/
- has super-classes: legal person, registered entity
- has sub-classes: fund family, local operating unit, sole proprietor
- Class legal entity identifier (subclass of 'identifier')
- identified by IRI: https://www.gleif.org/ontology/L1/LegalEntityIdentifier
- defined by https://www.gleif.org/ontology/L1/ and corresponds to the ISO 17442 compatible identifier for the legal entity referenced.
- has super-class: identifier
- which is defined in the 'base ontology': https://www.gleif.org/ontology/v1.0/Base/index-en.html#Identifier as 'Sequence of characters, capable of uniquely identifying that with which it is associated, within a specified context.'
- is identified by IRI: https://www.gleif.org/ontology/Base/Identifier
- has no superclasses
- has sub-class: registry identifier
- GLEIF L2 level 2 ontology, Who Owns Whom, for legal entity parent relationships.
- Entity Legal Form ontology defining concepts for Entity Legal Forms
and their abbreviations by jurisdiction, based on ISO 20275
- Registration Authority ontology defining concepts for Business Registries, including the jurisdictions served
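The class relationships listed above for the level 1 and base ontologies can be written down as a small Python table and queried for all (transitive) super-classes. The table uses the plain class labels from the list; it is a hand-made sketch, not derived from the published OWL files, and the helper name ancestors is hypothetical.

```python
# Direct super-classes, as described in the list above.
SUPERCLASSES = {
    "entity": set(),
    "identifier": set(),
    "legal person": {"entity"},
    "registration authority": {"entity"},
    "registered entity": {"entity"},
    "legal entity": {"legal person", "registered entity"},
    "legal entity identifier": {"identifier"},
}


def ancestors(cls, superclasses=SUPERCLASSES):
    """Return every direct or indirect super-class of cls."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in superclasses.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("legal entity")))
```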
GLEIF data files
GLEIF publishes
- The Global LEI Index, which contains historical and current LEI records including related reference data in one authoritative, central repository
- Daily updated Concatenated Files, that include specific information on an LEI record and related reference data based on the relevant reporting format published by all LEI issuers worldwide. Data is provided in one central file for each reporting format, without changes.
- These include:
- All current Legal Entity Identifiers (LEIs) globally
- All related LEI reference data (LE-RD)
- Golden Copy Files provide a way to explore the information included within the Global LEI Index
- Three sets of Golden Copy Files are issued daily. With each updated version of the Golden Copy Files, four delta files are made available.
The Common Data File (CDF) formats (in XSD) define how LEI issuing organizations report their LEI reference data. A file which does not pass XSD validation cannot be included in the Concatenated Files and the Golden Copy and Delta Files.
- XML schemas
- XML schema for level 1 CDF v3.1
- Code Lists used in level 1 data
- ‘Registration Authorities List’, of business registers and other relevant registration authority sources
- ‘Entity Legal Forms (ELF) Code List’, containing entity legal forms established across individual jurisdictions. Examples of entity legal forms include: Limited liability partnership (LLP), Gesellschaft mit beschränkter Haftung (GmbH) or Société Anonyme (SA). The ELF Code List assigns a unique code to each entity legal form.
- ‘Accepted Legal Jurisdictions Code List’ specifies accepted values for the field ‘LegalJurisdiction’ in the LEI Common Data File format.
Searching GLEIF data files
Options:
- LEI Search 2.0 web interface
- GLEIF API
- Download files
- Download the Concatenated Files
- Download the Golden Copy and Delta Files
- Download data.world copy of Golden Copy
- Query the data.world copy of Golden Copy on-line
- Use an alternative such as Makolab's LEI resolver
LEI Search 2.0
- LEI Search 2.0 web interface
- Any interested party can access and search the complete Legal Entity Identifier (LEI) data pool free of charge and without the need to register, using the GLEIF web-based LEI search tool. The LEI search tool 2.0 provides enhanced functionality including the option to identify corporate ownership structures or pinpoint other identifiers that have been mapped to an LEI.
GLEIF API
Downloads
- Download the Concatenated Files
- Download the Golden Copy and Delta Files
GLEIF and data.world
GLEIF in partnership with data.world makes the content of the Golden Copy Files available as linked open data in RDF, at data.world/gleif.
- GLEIF description of LEI in RDF
- data.world/company/about - 'Public Benefit Company' incorporated in Delaware
- data.world/gleif - following as marc-louis-sel - using Google
- XSL code on Github for transformation into RDF/XML
- Getting started with LEI data in RDF
- The LEI RDF database contains all of the LEI data published as part of the GLEIF "Golden Copy" dataset. The data has been expressed as an RDF dataset, using the vocabulary defined by the GLEIF LEI Ontology.
- The Ontology is defined by these namespaces:
- https://www.gleif.org/ontology/Base/
- https://www.gleif.org/ontology/L1/
- https://www.gleif.org/ontology/L2/
- https://www.gleif.org/ontology/EntityLegalForm/
- https://www.gleif.org/ontology/RegistrationAuthority/
- Data world states: 'You can query the dataset using SPARQL through this interface, or by connecting to the SPARQL endpoint.'
- However, the SPARQL endpoint is somewhat hidden; to reach it:
- Go to https://data.world/gleif/lei-data
- Select ‘launch workspace’
- Register or login e.g. with Google
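Once in the workspace, a first exploratory query might look like the sketch below. It relies only on the namespaces listed above; the prefix labels (`gleif-L1` and so on) are chosen here for illustration, and no class or property names from the ontology are assumed. The query simply lists triples whose predicate lies in one of the GLEIF namespaces.

```python
# Namespaces of the GLEIF LEI Ontology, as listed above.
# The prefix labels are arbitrary choices made for this example.
GLEIF_NAMESPACES = {
    "gleif-base": "https://www.gleif.org/ontology/Base/",
    "gleif-L1": "https://www.gleif.org/ontology/L1/",
    "gleif-L2": "https://www.gleif.org/ontology/L2/",
    "gleif-elf": "https://www.gleif.org/ontology/EntityLegalForm/",
    "gleif-ra": "https://www.gleif.org/ontology/RegistrationAuthority/",
}


def prefix_block(namespaces):
    """Render one SPARQL PREFIX declaration per namespace."""
    return "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in namespaces.items())


def sample_query(prefix="gleif-L1", limit=10):
    """Return a query listing triples whose predicate lies in one namespace."""
    ns = GLEIF_NAMESPACES[prefix]
    return (
        f"{prefix_block(GLEIF_NAMESPACES)}\n"
        "SELECT ?s ?p ?o WHERE {\n"
        "  ?s ?p ?o .\n"
        f'  FILTER(STRSTARTS(STR(?p), "{ns}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(sample_query())
```

The printed text can be pasted into the query editor; inspecting the predicates it returns is one way to discover the actual property names before writing more specific queries.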
Evolution 1: LEI.INFO, GLEIO, MakoLab, semantic blockchain
The LEI.INFO service is based on the transformation of the standard XML-based data model used by the LEI system (called LEI CDF) to the RDF-based Semantic Data Model.
Furthermore, LEI.INFO is implemented according to Linked Open Data principles.
Evolution 2: vLEI and KERI
vLEIs are based on the Trust over IP Authentic Chained Data Container (ACDC) specification, which in turn builds on the Key Event Receipt Infrastructure (KERI) protocol (github.com/WebOfTrust/keri); both are Internet Engineering Task Force (IETF) draft specifications. KERI is a decentralized identity system.
Ontochain
Other
Knowledge Graphs
A knowledge graph is a knowledge base that uses a graph-structured data model or topology to integrate data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the semantics underlying the used terminology.
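To make the definition concrete, here is a minimal sketch of a knowledge graph held as a list of triples, with a small index for describing a node and following a relationship. All identifiers (`ex:AcmeHolding`, `ex:isOwnedBy`, and so on) are invented for the example; a real system would use a triplestore or graph database rather than Python lists.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); all names are invented.
TRIPLES = [
    ("ex:AcmeHolding", "rdf:type", "ex:LegalEntity"),
    ("ex:AcmeBelgium", "rdf:type", "ex:LegalEntity"),
    ("ex:AcmeBelgium", "ex:isOwnedBy", "ex:AcmeHolding"),
    ("ex:AcmeHolding", "ex:registeredIn", "ex:Delaware"),
    # The terminology itself is described in the same graph.
    ("ex:isOwnedBy", "rdfs:label", "is owned by"),
]


def build_index(triples):
    """Index the outgoing edges of every subject."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def describe(index, node):
    """Return the sorted (predicate, object) pairs attached to a node."""
    return sorted(index.get(node, []))


def reachable(index, start, predicate):
    """Follow one predicate transitively from start; return the nodes found."""
    seen, frontier = [], [start]
    while frontier:
        node = frontier.pop()
        for p, o in index.get(node, []):
            if p == predicate and o not in seen:
                seen.append(o)
                frontier.append(o)
    return seen


if __name__ == "__main__":
    idx = build_index(TRIPLES)
    print(describe(idx, "ex:AcmeHolding"))
    print(reachable(idx, "ex:AcmeBelgium", "ex:isOwnedBy"))
```

Note that the label of `ex:isOwnedBy` lives in the same structure as the entity data, which is what "encoding the semantics underlying the used terminology" amounts to in this toy form.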
Education
- UNESCO ISCED and ISCED-F - International Standard Classification of Education for organising education programmes and related qualifications by levels and fields
- ISCED 2011 (levels and fields of education) has been implemented in all EU data collections since 2014
- ISCED-F 2013 (fields of education and training) has been implemented since 2016
- Eurostat - uses ISCED and ISCED-F
- EQF - European Qualifications Framework
Described using
- SDMX - ISO 17369:2013 describes statistical data and metadata, based on XML, has its own validation and transformation language, and has 3 key components:
- Information model (data and metadata)
- Content-Oriented Guidelines for communication
- IT architecture and tools
Publication
- ACM CCS
- the 2012 ACM Computing Classification System has been developed as a poly-hierarchical ontology that can be utilized in semantic web applications
Privacy
- Usable Privacy - project
- resulting in the privacy ontology PRIVONT
- contains the OPP-115 corpus, consisting of 115 privacy policies with 23K fine-grained data practice annotations (manual/ML)
Pharma and Healthcare
Trust
Indices content available with semantic metadata
- MakoLab - web analytics - using schema.org to improve search results
- Sindice
Ontology design patterns
- ODP overview
- Closure
- it is not enough to say that a carnivore eats some meat, as that still allows it to eat other things apart from meat
- a universal restriction stating that it only eats meat is also needed (the existential and the universal restriction need to have the same filler)
- Value Partition (at class level) - referred to as enumeration, if it is built using individuals instead of classes.
- a person can be defined as being short, medium or tall, and the attribute height can just get those values
- height is said to be covered or exhausted by those values; the possible heights are only those three
- see also the W3C Tech Report
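The two patterns above can be illustrated with a small sketch. It is a closed-world check over explicit Python data, not OWL reasoning (which is open-world), so it only shows the logical shape of the restrictions. The food taxonomy and the function names are invented for the example.

```python
from enum import Enum

# Toy taxonomy: each food item is tagged with the classes it belongs to.
FOOD_CLASSES = {
    "steak": {"Meat"},
    "chicken": {"Meat"},
    "grass": {"Plant"},
}


def eats_some(diet, cls):
    """Existential restriction: at least one eaten item belongs to cls."""
    return any(cls in FOOD_CLASSES[item] for item in diet)


def eats_only(diet, cls):
    """Universal restriction: every eaten item belongs to cls.

    Trivially true for an empty diet, which is why it is not enough on its own.
    """
    return all(cls in FOOD_CLASSES[item] for item in diet)


def is_carnivore(diet):
    """Closure: existential and universal restriction with the same filler."""
    return eats_some(diet, "Meat") and eats_only(diet, "Meat")


class Height(Enum):
    """Value partition as an enumeration: height is covered by three values."""
    SHORT = "short"
    MEDIUM = "medium"
    TALL = "tall"


if __name__ == "__main__":
    print(is_carnivore(["steak", "chicken"]))  # meat only
    print(is_carnivore(["steak", "grass"]))    # some meat, but not only meat
    print(is_carnivore([]))                    # only meat (vacuously), but no meat
    print([h.value for h in Height])
```

The mixed diet passes the existential test and the empty diet passes the universal test, yet neither is a carnivore; only the combination of both restrictions gives the intended meaning.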
Lexicons/Thesauri
Global
- UNBIS - United Nations Thesaurus
- ISO OBP - ISO's Online Browsing Platform - preview before you buy
- IEC IEV - IEC's online terminology database of electrotechnical vocabulary
EC ISA, PO, SEMIC and related
Overviews
- Digit ISA2 core vocabularies
- Core Person: captures the fundamental characteristics of a person, e.g. name, gender, date of birth, location, and is referenced by W3C for describing persons
- Registered Organisation: captures the fundamental characteristics of a legal entity (e.g. its identifier, activities) which is created through a formal registration process, typically in a national or regional register
- Core Location: captures the fundamental characteristics of a location, represented as an address, a geographic name or geometry
- Core Public Service: captures the fundamental characteristics of a service offered by a public administration
- Core Criterion and Core Evidence: describe the principles and the means that a private entity must fulfil to become eligible or qualified to perform public services. A Criterion is a rule or a principle that is used to judge, evaluate or test something. An Evidence is a means to prove a Criterion
- Core Public Organisation: describes public organisations in the European Union
- Core Public Event Vocabulary: under development
- Semantic Interoperability Community (SEMIC) page on core vocabularies
Publications Office (PO)
- EU Publications Office (PO) - uses controlled vocabularies, ontologies, models, ...
- EU PO EuroVoc - Multilingual, multidisciplinary thesaurus covering the activities of the EU, the European Parliament in particular. It contains terms in 23 official EU languages, plus three languages of countries which are candidates for EU accession.
- EU PO Metadata specifies how to describe legal information; the PO's Meta Data Register (MDR) provides access to a number of value lists relevant to ELI implementation, including the Named Authority Lists (NAL)
- CELLAR stores all of the PO's content and metadata, which the major portals (EUR-Lex, OP Portal) draw on; its resources are semantically described
by the CDM (Common Data Model), an FRBR-compliant OWL ontology that serves as the basis for ELI. A public SPARQL endpoint is available
- EUR-Lex ELI - the European Legislation Identifier, used to make legislation available online, specifying:
- web identifiers (URIs) for legal information
- a specific language for exchanging legislation in machine-readable formats
PwC's ELI test project
EC Webgate
Linking data
Linked Data concept
Linked Data is a method of publishing RDF data on the Web and of interlinking data between different data sources. Linked Data can be accessed using Semantic Web browsers. However, instead of blindly following nondescript links between HTML pages, Semantic Web browsers enable users to navigate by following self-described RDF links. It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web.
JSON-LD format
JSON-LD is a JSON-based format to serialize Linked Data.
It is designed to be usable directly as JSON, with no knowledge of RDF. It is also designed to be usable as RDF in conjunction with other Linked Data technologies like SPARQL.
Intro
W3C core specifications
Generally speaking, the data model described by a JSON-LD document is a labeled, directed graph.
- W3C JSON-LD specification introducing:
- a universal identifier mechanism for JSON objects via the use of IRIs,
- a way to disambiguate keys shared among different JSON documents by mapping them to IRIs via a context,
- a mechanism in which a value in a JSON object may refer to a resource on a different site on the Web,
- the ability to annotate strings with their language,
- a way to associate datatypes with values such as dates and times, and
- a facility to express one or more directed graphs, such as a social network, in a single document.
- JSON-LD relationship to RDF - W3C
- W3C JSON-LD algorithms and API
- Defines a set of algorithms for programmatic transformations of JSON-LD documents
- Proposes an Application Programming Interface (API) for developers implementing the specified algorithms
- W3C JSON-LD framing
- JSON-LD Framing allows developers to query by example and force a specific tree layout to a JSON-LD document
JSON-LD specifies a number of syntax tokens and keywords that are a core part of the language. A normative description of the keywords is given in § 9.16 Keywords.
- @container: Used to set the default container type for a term.
- @context: Used to define the short-hand names that are used throughout a JSON-LD document. These short-hand names are called terms and help developers to express specific identifiers in a compact manner. The @context keyword is described in detail in § 3.1 The Context.
- @id: Used to uniquely identify node objects that are being described in the document with IRIs or blank node identifiers. This keyword is described in § 3.3 Node Identifiers.
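As an illustration of these keywords, the sketch below builds a small JSON-LD document as a Python dictionary and serializes it. The document IRI under `example.org` and the choice of schema.org terms are assumptions made for the example; the pattern of mapping terms to IRIs in `@context` follows the JSON-LD specification.

```python
import json

# A small JSON-LD document; the example.org IRIs are placeholders.
DOC = {
    "@context": {
        # Simple term definition: a short name mapped to an IRI.
        "name": "http://schema.org/name",
        # Expanded term definitions: values are IRIs or typed literals.
        "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
        "founded": {
            "@id": "http://schema.org/foundingDate",
            "@type": "http://www.w3.org/2001/XMLSchema#date",
        },
    },
    # @id identifies the node being described.
    "@id": "http://example.org/company/acme",
    # A string annotated with its language.
    "name": {"@value": "Acme Holding", "@language": "en"},
    # A value that refers to a resource elsewhere on the Web.
    "homepage": "http://example.org/",
    # A value associated with a datatype through the context.
    "founded": "2001-05-17",
}


def serialize(doc):
    """Serialize the document; the result is ordinary JSON."""
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(serialize(DOC))
```

Because the result is plain JSON, a consumer with no knowledge of RDF can read `name` and `homepage` directly, while a JSON-LD processor can use the context to interpret the same document as a graph.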
Selective terminology
- Context: a set of rules for interpreting a JSON-LD document as described in the 'The Context' section of JSON-LD 1.1, and normatively specified in the Context Definitions section of JSON-LD 1.1.
- Term: a short word defined in a context that may be expanded to an IRI. See the Terms section of JSON-LD 1.1 for a normative description.
- Term definition: an entry in a context, where the key defines a term which may be used within a map as a key, type, or elsewhere that a string is interpreted as a vocabulary item. Its value is either a string (simple term definition), expanding to an IRI, or a map (expanded term definition).
- Frame: a JSON-LD document which describes the form for transforming another JSON-LD document using matching and embedding rules. A frame document allows additional keywords and certain map entries to describe the matching and transforming process.
- Map (or ordered map): a specification type consisting of a finite ordered sequence of key/value pairs, with no key appearing twice. Each key/value pair is called an entry. See https://infra.spec.whatwg.org/#ordered-map
- Vocabulary mapping: set in the context using the @vocab key, whose value must be an IRI, a compact IRI, a term, or null. See the Context Definitions section of JSON-LD 1.1 for a normative description.
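The interplay of terms, term definitions and the vocabulary mapping can be sketched as a heavily simplified expansion function. This is not the IRI expansion algorithm of JSON-LD 1.1 (which also handles nested and scoped contexts, relative IRIs and more); it only shows the basic lookup order, and the sample context is invented.

```python
# Sample context: a vocabulary mapping, a prefix, and an expanded term definition.
CONTEXT = {
    "@vocab": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "homepage": {"@id": "http://schema.org/url", "@type": "@id"},
}


def expand_term(term, context):
    """Expand a term to an IRI using a much simplified lookup order."""
    if term.startswith("@"):
        return term  # keywords are left untouched
    definition = context.get(term)
    if isinstance(definition, str):
        return definition  # simple term definition
    if isinstance(definition, dict) and "@id" in definition:
        return definition["@id"]  # expanded term definition
    prefix, sep, suffix = term.partition(":")
    if sep and suffix.startswith("//"):
        return term  # already an absolute IRI such as http://...
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + suffix  # compact IRI, e.g. foaf:knows
    vocab = context.get("@vocab")
    if vocab is not None:
        return vocab + term  # fall back to the vocabulary mapping
    return None  # the term cannot be expanded


if __name__ == "__main__":
    for t in ("name", "homepage", "foaf:knows", "@id"):
        print(t, "->", expand_term(t, CONTEXT))
```

For real documents a conformant JSON-LD processor should be used; the function above is only meant to make the terminology tangible.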
W3C other
JSON-LD tooling
Protégé was started by Mark Musen in 1987; much of the work was done by Natasha Noy.
- Stanford's Protégé - desktop and web versions (the web version does not support reasoning)
- Protégé Github
- Protégé basics at wiki.opensemanticframework.org
- Desktop Protégé 5.0.0
- WebProtégé - userid required
- WebProtégé terms of use
- Careful: By uploading, emailing, posting, publishing or otherwise transmitting content to any Forum or submitting any content to WebProtégé, you automatically grant (or warrant that the owner of such rights has expressly granted) WebProtégé a perpetual, royalty-free, irrevocable, nonexclusive right and license to use, reproduce, modify, adapt, publish, transmit and distribute such content on in any form, medium, or technology now known or later developed.
In addition, you warrant that all so-called moral rights in the content have been waived.
- Protégé Wiki
- Protégé Wiki ontology library
- Introduction to OWL reasoning
- OntoDebug - plugin
- Protégé Wiki - SWRL
- SWRL
- SWRL plugin for Protégé 5
- OWLViz for Protégé 5 - OWL visualisation
- VOWL plugin for Protégé 5 - OWL visualisation
OntoText
GraphDB
- Ontotext's GraphDB - originally OWLIM (OWL in Memory), but when a transactional, index-based file storage layer was added, the name was no longer appropriate
- tag.ontotext.com - tagging service, semantically enrich your content with annotations such as Person, Organization, Location, and the relationships between them
- GraphDB info on Stackoverflow
Applications using GraphDB
Loading data
Ontop, OntoRefine, OpenRefine (the open source tool on which OntoRefine is based).
Ontop
Ontop is a Virtual Knowledge Graph system. It exposes the content of arbitrary relational databases as knowledge graphs. These graphs are virtual, which means that data remains in the data sources instead of being moved to another database.
Ontop translates SPARQL queries expressed over the knowledge graphs into SQL queries executed by the relational data sources. It relies on R2RML mappings and can take advantage of lightweight ontologies.
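The idea of a virtual graph can be sketched in a few lines. This is not Ontop's implementation and does not use R2RML syntax: the mapping dictionary, the table and the IRIs are hypothetical, and only a single triple pattern with a fixed predicate is handled. It shows how a graph-level request becomes an SQL query while the data stays in the relational source.

```python
import sqlite3

PREDICATE = "http://example.org/vocab#legalName"

# Hypothetical mapping in the spirit of R2RML: predicate IRI -> table and columns.
MAPPINGS = {
    PREDICATE: {
        "table": "company",
        "subject_template": "http://example.org/company/{}",
        "subject_column": "id",
        "object_column": "name",
    },
}


def to_sql(predicate):
    """Translate the pattern '?s <predicate> ?o' into an SQL query."""
    m = MAPPINGS[predicate]
    return (
        f'SELECT {m["subject_column"]}, {m["object_column"]} '
        f'FROM {m["table"]} ORDER BY {m["subject_column"]}'
    )


def virtual_triples(conn, predicate):
    """Answer the pattern from the relational source; no triples are stored."""
    m = MAPPINGS[predicate]
    for key, value in conn.execute(to_sql(predicate)):
        yield (m["subject_template"].format(key), predicate, value)


def demo_connection():
    """Create an in-memory relational source with two rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO company VALUES (?, ?)", [(1, "Acme"), (2, "Beta")])
    return conn


if __name__ == "__main__":
    for triple in virtual_triples(demo_connection(), PREDICATE):
        print(triple)
```

The triples exist only while the query is being answered, which is the sense in which such a graph is "virtual".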
OntoRefine
Allows mapping and transforming any structured data to RDF and loading it into GraphDB.
Until GraphDB 10, OntoRefine was part of the GraphDB Workbench. Since GraphDB 10, OntoRefine has been an independent product, Ontotext Refine.
Supports the formats TSV, CSV, *SV, XLS, XLSX, JSON, XML, RDF as XML, and Google Sheets.
It makes use of GREL (Google Refine Expression Language) and supports, among others, SPIN SPARQL functions.
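To give a feel for what mapping tabular data to RDF involves, here is a plain Python stand-in that turns CSV rows into N-Triples lines. It does not use OntoRefine, GREL or any Ontotext API; the base IRI and the convention of one predicate per column are assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical base IRI


def escape_literal(text):
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, key_column):
    """Map each non-empty cell to one triple: row -> subject, column -> predicate."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}resource/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # skip the key itself and empty cells
            predicate = f"<{BASE}vocab/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    sample = "id,name,country\n1,Acme,BE\n2,Beta,\n"
    print("\n".join(rows_to_ntriples(sample, "id")))
```

A dedicated tool adds what this sketch lacks: interactive cleaning, datatype and language handling, reuse of existing vocabularies, and direct loading into the repository.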
OpenRefine
Formerly Google Refine. Originally designed to take tables as input.
Neo4J
- Neo4J - an ACID-compliant transactional database with native graph storage and processing
- Neo4J - founded in 2000, US, Sweden, UK, ...
Triply
NL, VU University Amsterdam campus.
TriplyDB
RDF4J
Formerly OpenRDF Sesame. Created by the Dutch software company Aduna as part of "On-To-Knowledge", a semantic web project that ran from 1999 to 2002. It contains implementations of an in-memory triplestore and an on-disk triplestore, along with two separate Servlet packages that can be used to manage and provide access to these triplestores, on a permanent server.
In May 2016, Sesame officially forked into an Eclipse project called RDF4J, in recognition of Aduna no longer being involved in its development.
- RDF4J - RDF Java framework plus native in-memory and on-disk repositories
- RDF4J docs
- RDF4J API doc
- open source Java framework for processing RDF data
- includes parsing, storing, inferencing and querying of/over such data
- allows connecting to SPARQL endpoints such as its own native repositories, Ontotext, Stardog, Virtuoso, etc.
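RDF4J itself is a Java framework, but a repository hosted on an RDF4J server can be queried from any language over HTTP. The sketch below, in Python for consistency with the other examples on this page, prepares such a request following the SPARQL protocol. The server URL and the repository id `demo` are placeholders, and the `/repositories/<id>` path is assumed to match the RDF4J server's REST layout; no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def repository_endpoint(server_url, repository_id):
    """Build the SPARQL endpoint URL of a repository (assumed REST layout)."""
    return f"{server_url.rstrip('/')}/repositories/{repository_id}"


def build_select_request(endpoint, query):
    """Prepare a SPARQL protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    endpoint = repository_endpoint("http://localhost:8080/rdf4j-server/", "demo")
    request = build_select_request(endpoint, "SELECT * WHERE { ?s ?p ?o } LIMIT 5")
    print(request.get_method(), request.full_url)
    # With a running server, the request could be sent like this:
    #   import json
    #   from urllib.request import urlopen
    #   with urlopen(request) as response:
    #       bindings = json.load(response)["results"]["bindings"]
```

The same request shape should work against other stores that implement the SPARQL protocol, with only the endpoint URL changing.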
Sundry
Semantics - initiatives