Structuring, representing and linking data

Contents

Local files

See also local files (GrayTiger):

Structuring and representing data

W3C and related

W3C - WWW

Security and cryptography

General

Verifiable Credentials

Multiple roots, graphs, blockchain

Applications

See also local files.

Application development

Application development

Jena

Other

Mark-up languages

HTML

XML

The W3C - XML Recommendation states: This document specifies a syntax created by subsetting an existing, widely used international text processing standard (Standard Generalized Markup Language, ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web. XML focuses on syntax, not on semantics.

To aid with the interpretation of an XML document, additional data can be provided in an XML schema. This is a description of a type of XML document, in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. XML schemas are commonly expressed in the XML Schema Definition language, referred to as XSD.

XML basics

The purpose of a schema is to define a class of XML documents, and so the term "instance document" is often used to describe an XML document that conforms to a particular schema. In fact, neither instances nor schemas need to exist as documents per se -- they may exist as streams of bytes sent between applications, as fields in a database record, or as collections of XML Infoset "Information Items".

An instance is not required to reference a schema.
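
As a minimal sketch of the point above, the Python snippet below parses an instance document that carries no schema reference at all. The `<order>` vocabulary and the `parse_order` helper are hypothetical, and the hand-written checks only stand in for constraints that a schema would normally express declaratively.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; note that it references no schema.
INSTANCE = """<order id="42">
  <item sku="A-1" quantity="2"/>
  <item sku="B-7" quantity="1"/>
</order>"""


def parse_order(xml_text: str) -> dict:
    """Parse the instance and apply a few hand-written structural checks."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if root.tag != "order":
        raise ValueError("root element must be <order>")
    items = root.findall("item")
    if not items:
        raise ValueError("an order needs at least one <item>")
    return {
        "id": root.get("id"),
        "items": [(i.get("sku"), int(i.get("quantity"))) for i in items],
    }


order = parse_order(INSTANCE)
print(order)
```

Well-formedness is checked by the parser itself; everything beyond that (allowed elements, cardinalities, data types) is what a schema language such as XSD is meant to capture.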

XML further information

XML applications

Process definitions in XML

XBRL - the XML Business Reporting Language

Check out IETF's CNRP (Common Name Resolution Protocol), which can be coded in XML. It can be used, e.g., to request a statement of accounts.

XBRL and iXBRL specification

XBRL taxonomies

XBRL usage

XBRL service providers

Wikipedia - DBPedia, WikiData, ...

Local data

Wikipedia

The free encyclopedia, hosted by the Wikimedia Foundation.

Wikimedia

A global movement whose mission is to bring free educational content to the world.

Mediawiki software

The software that powers, among others, Wikipedia.

Wikidata

Concepts

Acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.

The Wikidata repository consists mainly of items, each one having a label, a description and any number of aliases.

Statements describe detailed characteristics of an Item and consist of a property and a value.
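
The item model described above can be sketched as a small data structure. This is a simplification under stated assumptions: the class and field names are hypothetical, values are reduced to plain strings, and the description and alias texts are only illustrative. Q42 (Douglas Adams), P31 (instance of) and Q5 (human) are actual Wikidata identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    property_id: str  # e.g. "P31"
    value: str        # simplified: real values can be items, strings, dates, ...


@dataclass
class Item:
    item_id: str
    label: str
    description: str
    aliases: list[str] = field(default_factory=list)
    statements: list[Statement] = field(default_factory=list)

    def values_of(self, property_id: str) -> list[str]:
        """Return the values of all statements that use the given property."""
        return [s.value for s in self.statements if s.property_id == property_id]


douglas_adams = Item(
    item_id="Q42",
    label="Douglas Adams",
    description="English writer and humorist",  # illustrative text
    aliases=["Douglas Noel Adams"],
    statements=[Statement("P31", "Q5")],  # instance of: human
)
print(douglas_adams.values_of("P31"))
```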

Furthermore

DBpedia

A crowd-sourced community effort to extract structured content from the information created in various Wikimedia projects. This structured information resembles an open knowledge graph (OKG). DBpedia data is served as Linked Data.

The DBpedia RDF Data Set is hosted and published using OpenLink Virtuoso. Access is via a SPARQL endpoint, alongside HTTP support for any Web client’s standard GETs for HTML or RDF representations of DBpedia resources.
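
A minimal sketch of what access via the SPARQL endpoint could look like from Python. It only builds the HTTP GET request and does not send it. The endpoint address `https://dbpedia.org/sparql` and the example resource are assumptions; the `query` parameter and the `Accept` media type follow the SPARQL protocol and its JSON results format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint address

QUERY = """SELECT ?label WHERE {
  <http://dbpedia.org/resource/Berlin>
      <http://www.w3.org/2000/01/rdf-schema#label> ?label .
  FILTER (lang(?label) = "en")
}"""


def build_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def sent_query(request: Request) -> str:
    """Decode the query text back out of the request URL."""
    return parse_qs(urlsplit(request.full_url).query)["query"][0]


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it; that step is left out here so the sketch runs without network access.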

Semantics - RDF - Turtle - SPARQL - OWL - SKOS - DC

Interesting: cryptographic definition of semantic security: QUOTE In cryptography, a semantically secure cryptosystem is one where only negligible information about the plaintext can be feasibly extracted from the ciphertext. Specifically, any probabilistic, polynomial-time algorithm (PPTA) that is given the ciphertext of a certain message m (taken from any distribution of messages), and the message's length, cannot determine any partial information on the message with probability non-negligibly higher than all other PPTA's that only have access to the message length (and not the ciphertext). This concept is the computational complexity analogue to Shannon's concept of perfect secrecy. Perfect secrecy means that the ciphertext reveals no information at all about the plaintext, whereas semantic security implies that any information revealed cannot be feasibly extracted.

Semantic Web - basics

Semantics - Turtle and RDF

RDF

Carroll: An RDF graph is defined as a set of triples (Subject, Predicate, Object). A named graph is an RDF graph which has been assigned a name in the form of a URIref. A named graph is an entity with two functions name and rdfgraph defined on it which determine respectively its name, which is a URI, and the RDF graph that it encodes or represents.

Put otherwise: in an RDF database, a named graph is what we call a subset of our data that has been given a unique label (name). A graph database can contain any number of named graphs alongside its default graph, and each fact can be present in or absent from any graph.
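
The definitions above translate almost literally into Python: a graph is a set of triples, and a dataset maps graph names to such sets next to one default graph. This is only a sketch; the `http://example.org/` names and the helper `graphs_containing` are hypothetical.

```python
Triple = tuple[str, str, str]

EX = "http://example.org/"  # hypothetical namespace

# The default graph: a plain set of (subject, predicate, object) triples.
default_graph: set[Triple] = {
    (EX + "alice", EX + "knows", EX + "bob"),
}

# Named graphs: each name (a URI) maps to its own set of triples.
named_graphs: dict[str, set[Triple]] = {
    EX + "graphs/hr": {
        (EX + "alice", EX + "worksFor", EX + "acme"),
    },
    EX + "graphs/crm": {
        (EX + "alice", EX + "worksFor", EX + "acme"),
        (EX + "bob", EX + "customerOf", EX + "acme"),
    },
}


def graphs_containing(triple: Triple) -> list[str]:
    """Return the names of all named graphs in which the fact is present."""
    return sorted(name for name, graph in named_graphs.items() if triple in graph)


fact = (EX + "alice", EX + "worksFor", EX + "acme")
print(graphs_containing(fact))
print(fact in default_graph)
```

The same fact sits in two named graphs and is absent from the default graph, which is exactly the "present in or absent from any graph" behaviour described above.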

Turtle

Semantics - SKOS - 2009 - W3C

The Simple Knowledge Organization System (SKOS) is an RDF vocabulary for representing semi-formal knowledge organization systems (KOSs), such as thesauri, taxonomies, classification schemes and subject heading lists. SKOS is based on RDF. In SKOS, conceptual resources (concepts) can be identified with URIs, semantically related through hierarchies and association networks, labeled with lexical strings, documented with notes, and aggregated into concept schemes.
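
A small sketch of such a concept hierarchy, held as plain triples in Python. The `skos:prefLabel` and `skos:broader` URIs are the actual SKOS terms; the animal concepts and the helper functions are hypothetical. Note that `skos:broader` itself is not declared transitive, so walking up the hierarchy as done here is a choice made by the application.

```python
from collections import deque

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/animals/"  # hypothetical concept scheme namespace

triples = {
    (EX + "cat", SKOS + "prefLabel", "cat"),
    (EX + "mammal", SKOS + "prefLabel", "mammal"),
    (EX + "animal", SKOS + "prefLabel", "animal"),
    (EX + "cat", SKOS + "broader", EX + "mammal"),
    (EX + "mammal", SKOS + "broader", EX + "animal"),
}


def pref_label(concept: str) -> str:
    """Return the preferred lexical label of a concept."""
    return next(o for s, p, o in triples if s == concept and p == SKOS + "prefLabel")


def broader_ancestors(concept: str) -> list[str]:
    """Collect all concepts reachable via skos:broader, nearest first."""
    found: list[str] = []
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        parents = sorted(
            o for s, p, o in triples if s == current and p == SKOS + "broader"
        )
        for parent in parents:
            if parent != concept and parent not in found:
                found.append(parent)
                queue.append(parent)
    return found


print([pref_label(c) for c in broader_ancestors(EX + "cat")])
```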

SKOS can be used on its own, or in combination with more-formal languages such as OWL. SKOS can be seen as a bridging technology, providing the missing link between the rigorous logical formalism of OWL and the informal and weakly-structured world of Web-based collaboration tools, as exemplified by social tagging applications.

The aim of SKOS is not to replace original conceptual vocabularies in their initial context of use, but to allow them to be ported to a shared space, based on a simplified model, enabling wider re-use and better interoperability.

Semantics - DAML - superseded by OWL

SPARQL - SPIN - SHACL

The query evaluation mechanism in SPARQL is based on subgraph matching.
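
To illustrate subgraph matching, the sketch below evaluates a list of triple patterns against a small graph by extending variable bindings one pattern at a time. It is a naive illustration, not how a SPARQL engine is actually implemented; the function names and the data are hypothetical, and terms starting with `?` are treated as variables.

```python
def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(patterns, graph):
    """Return every binding under which all patterns occur in the graph."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in sorted(graph):
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksFor", "acme"),
}
query = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
print(evaluate(query, graph))
```

The two patterns together describe a small graph shape (a chain of two `knows` edges); each solution is one way of embedding that shape in the data.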

Entailment:

SPARQL

SPARQL - a recursive acronym for SPARQL Protocol and RDF Query Language.

SPIN - SPARQL Inferencing Notation

SHACL - Shapes Constraint Language

RDF schemas are implicit and graphs can grow. Shapes are a way to define a more explicit schema for a graph.

SHACL hands-on
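
The idea behind shapes can be sketched without any SHACL tooling: a shape names a set of target nodes and the constraints they must satisfy, and validation reports the nodes that violate them. The dictionary layout and the `validate` function below are hypothetical and do not use SHACL syntax; the minimum-count constraint is modelled on what SHACL expresses with `sh:minCount`.

```python
graph = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),  # no ex:name given for ex:bob
}

# Hypothetical stand-in for a shape: every ex:Person needs at least one ex:name.
person_shape = {
    "target_class": "ex:Person",
    "min_count": {"ex:name": 1},
}


def validate(graph, shape):
    """Return (node, property, actual count) for every violated constraint."""
    violations = []
    targets = sorted(
        s for s, p, o in graph if p == "rdf:type" and o == shape["target_class"]
    )
    for node in targets:
        for prop, minimum in shape["min_count"].items():
            count = sum(1 for s, p, _ in graph if s == node and p == prop)
            if count < minimum:
                violations.append((node, prop, count))
    return violations


print(validate(graph, person_shape))
```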

Semantics - W3C OWL

Semantics - SHACL

Ontologies and Vocabularies

Overviews

Ontologies and Vocabularies - W3C - general

Ontologies and Vocabularies - W3C - CA and Provenance

Ontologies and Vocabularies - well-known

Dublin Core

Other

FOAF

Upper ontologies

Introduction

DOLCE and related

Descriptive Ontology for Linguistic and Cognitive Engineering (DOLCE)

CYC

BFO

Basic Formal Ontology (BFO) is a top-level ontology developed by Barry Smith and his associates for the purposes of promoting interoperability among domain ontologies built in its terms through a process of downward population.

UFO

The Unified Foundational Ontology (UFO) is divided into three incrementally layered compliance sets:

gist

Yamato

Sumo

The Suggested Upper Merged Ontology (SUMO) is an upper ontology intended as a foundation ontology for a variety of computer information processing systems. SUMO defines a hierarchy of classes and related rules and relationships. These are expressed in a version of the language SUO-KIF which has a LISP-like syntax. A mapping from WordNet synsets to SUMO has been defined. Initially, SUMO was focused on meta-level concepts (general entities that do not belong to a specific problem domain), and thereby would lead naturally to a categorization scheme for encyclopedias. It has now been considerably expanded to include a mid-level ontology and dozens of domain ontologies.

Ontology selection

Ontology matching

Aims at finding correspondences between semantically related entities of different ontologies.
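
A deliberately naive sketch of the idea: propose a correspondence between two entities when their labels are similar enough as strings. Real matchers also use structure, instances and background knowledge. The two toy ontologies, the threshold and the function names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical ontologies, reduced to entity identifier -> label.
ontology_a = {
    "a:Person": "Person",
    "a:Organisation": "Organisation",
    "a:Address": "Postal address",
}
ontology_b = {
    "b:Human": "Human being",
    "b:Organization": "Organization",
    "b:PostalAddress": "Postal Address",
}


def similarity(left: str, right: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()


def match(a: dict, b: dict, threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """For each entity in `a`, keep its best match in `b` if it is similar enough."""
    correspondences = []
    for entity_a, label_a in a.items():
        entity_b, score = max(
            ((eb, similarity(label_a, lb)) for eb, lb in b.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            correspondences.append((entity_a, entity_b, score))
    return correspondences


for entity_a, entity_b, score in match(ontology_a, ontology_b):
    print(entity_a, "<->", entity_b, round(score, 2))
```

The sketch finds the spelling variants but misses `a:Person` and `b:Human`, which are semantically related without sharing a label; that gap is what more elaborate matching techniques try to close.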

OMG and related

Privacy

Legal

Financial ontologies

FIBO by EDM Council
GLEIS and GLEIF
GLEIS basics
The GLEIS is composed of the ROC, the Global LEI Foundation (GLEIF) and LEI issuers, also known as Local Operating Units (LOUs).
GLEIF basics
LEI issuers (LOUs) and Registration Agents

The Registration Agent’s role in the Global LEI System is directly connected to the LEI issuing organization. The Registration Agent may choose to partner with one or more LEI issuing organizations to ensure its clients’ needs for LEI services are met.
GLEIF Registration Authorities (RAs)
The RA list contains more than 1,000 business registers and other relevant registration and validation authority sources and assigns a unique code to each. Going forward, Legal Entity Identifier (LEI) issuing organizations will reference this code in their LEI issuance processes and reporting.
GLEIF Validation Agents
Are referred to on the website but are not defined in the STVR. Part of the 'solutions'. 'LEI issuers that have developed Validation Agent roles can simplify the onboarding process and enhance KYC and due diligence checks to reduce fraud and improve the customer experience within the financial sector.'
GLEIF and ISO
GLEIF and PKI certificates
GLEIF ontologies
GLEIF data files
GLEIF publishes the Common Data File (CDF) formats (in XSD), which define how LEI issuing organizations report their LEI reference data. A file which does not pass XSD validation cannot be included in the Concatenated Files and the Golden Copy and Delta Files.
Searching GLEIF data files
Options: LEI Search 2.0, GLEIF API, Downloads.
GLEIF and data.world
GLEIF in partnership with data.world makes the content of the Golden Copy Files available as linked open data in RDF, at data.world/gleif.
Evolution 1: LEI.INFO, GLEIO, Makolabs, semantic blockchain
The LEI.INFO service is based on the transformation of the standard XML-based data model used by the LEI system (called LEI CDF) to the RDF-based Semantic Data Model. Furthermore, LEI.INFO is implemented according to the rules of Linked Open Data principles.
Evolution 2: vLEI and KERI
vLEIs are based on the Trust over IP Authentic Chained Data Container (ACDC) specification, which in turn is based on the Key Event Receipt Infrastructure (KERI) protocol (github.com/WebOfTrust/keri); both are Internet Engineering Task Force (IETF) draft specifications. KERI is a decentralized identity system.

Ontochain

Other

Knowledge Graphs

A knowledge graph is a knowledge base that uses a graph-structured data model or topology to integrate data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the semantics underlying the used terminology.

Education

Publication

Privacy

Pharma and Healthcare

Trust

Indices content available with semantic metadata

Ontology design patterns

Lexicons/Thesauri

Global

EC ISA, PO, SEMIC and related

Overviews

Publications Office (PO)

EC Webgate

Linking data

Linked Data concept

Linked Data is a method of publishing RDF data on the Web and of interlinking data between different data sources. Linked Data can be accessed using Semantic Web browsers. However, instead of blindly following nondescript links between HTML pages, Semantic Web browsers enable users to navigate by following self-described RDF links. It also allows the robots of Semantic Web search engines to follow these links to crawl the Semantic Web.

JSON-LD format

JSON-LD is a JSON-based format to serialize Linked Data. It is designed to be usable directly as JSON, with no knowledge of RDF. It is also designed to be usable as RDF in conjunction with other Linked Data technologies like SPARQL.

Intro

W3C core specifications

Generally speaking, the data model described by a JSON-LD document is a labeled, directed graph. JSON-LD specifies a number of syntax tokens and keywords that are a core part of the language. A normative description of the keywords is given in § 9.16 Keywords.
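
The sketch below shows both readings of one small document: first as ordinary JSON, then as edges of a labeled, directed graph obtained by looking terms up in `@context`. It handles only this tiny subset and is not a JSON-LD processor; the `to_edges` helper and the example.org identifiers are hypothetical, while the FOAF property URIs are real.

```python
import json

DOCUMENT = """
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": {"@id": "http://xmlns.com/foaf/0.1/knows", "@type": "@id"}
  },
  "@id": "http://example.org/people/alice",
  "name": "Alice",
  "knows": "http://example.org/people/bob"
}
"""

data = json.loads(DOCUMENT)
print(data["name"])  # usable directly as JSON, no RDF knowledge needed


def to_edges(node: dict) -> list[tuple[str, str, str]]:
    """Turn a flat node object into (subject, predicate, object) edges."""
    context = node["@context"]
    subject = node["@id"]
    edges = []
    for key, value in node.items():
        if key.startswith("@"):
            continue  # keywords such as @context and @id are not properties
        definition = context[key]
        predicate = definition["@id"] if isinstance(definition, dict) else definition
        edges.append((subject, predicate, value))
    return edges


edges = to_edges(data)
for edge in edges:
    print(edge)
```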

Selective terminology

W3C other

JSON-LD tooling

Tools

W3C

Protégé

Started by Mark Musen in 1987; much work done by Natasha Noy.

Graph databases

OntoText

GraphDB
Applications using GraphDB
Loading data
Ontop, OntoRefine, OpenRefine (the open-source tool on which OntoRefine is based).
Ontop
Ontop is a Virtual Knowledge Graph system. It exposes the content of arbitrary relational databases as knowledge graphs. These graphs are virtual, which means that data remains in the data sources instead of being moved to another database.

Ontop translates SPARQL queries expressed over the knowledge graphs into SQL queries executed by the relational data sources. It relies on R2RML mappings and can take advantage of lightweight ontologies.
OntoRefine
Allows mapping and transformation of structured data to an RDF schema and loading it into GraphDB. Until GraphDB 10, OntoRefine was part of the GraphDB Workbench. Since GraphDB 10, OntoRefine has been an independent product, Ontotext Refine. Supports the formats TSV, CSV, *SV, XLS, XLSX, JSON, XML, RDF as XML, and Google Sheets. It makes use of GREL (Google Refine Expression Language) and supports, among others, SPIN SPARQL functions.
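
What mapping tabular data to RDF amounts to can be sketched in a few lines: each row becomes a subject and each remaining column a predicate with a literal value, written out as N-Triples. This is not how OntoRefine works internally; the base URI, the column-to-predicate convention and the function names are assumptions.

```python
import csv
import io

CSV_TEXT = "id,name,city\n1,Alice,Ghent\n2,Bob,Leuven\n"
BASE = "http://example.org/"  # hypothetical base URI


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(csv_text: str) -> list[str]:
    """Map each row to one subject and each non-id column to one triple."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}person/{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue  # the id column only feeds the subject URI
            lines.append(f'{subject} <{BASE}{column}> "{escape_literal(value)}" .')
    return lines


ntriples = rows_to_ntriples(CSV_TEXT)
print("\n".join(ntriples))
```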
OpenRefine
Formerly Google Refine. Originally oriented towards taking tables as input.

Neo4J

Triply

NL, VU University Amsterdam campus.
TriplyDB

RDF4J

Formerly OpenRDF Sesame. Created by the Dutch software company Aduna as part of "On-To-Knowledge", a semantic web project that ran from 1999 to 2002. It contains implementations of an in-memory triplestore and an on-disk triplestore, along with two separate Servlet packages that can be used to manage and provide access to these triplestores, on a permanent server.

In May 2016, Sesame officially forked into an Eclipse project called RDF4J, in recognition of Aduna no longer being involved in its development.

Reasoners

Other tools

Sundry

Semantics - initiatives