Using data

Wikidata’s focus is on creating Linked Open (meta)Data to supplement Wikipedia documents
DBpedia’s focus is on generating Linked Open Data from Wikipedia documents

Both projects provide access to their respective Linked Open Data via SPARQL Query Service endpoints.

Wikidata - a free and open knowledge base that can be read and edited by both humans and machines

Its focus is on creating Linked Open (meta)Data to supplement Wikipedia documents. Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.

Wikidata introduction

Data model

Wikidata makes use of namespaces, including main, Property, Lexeme, and EntitySchema.

The Wikidata repository consists mainly of Items, each one having a label, a description and any number of aliases.
Items are uniquely identified by a Q followed by a number, such as Douglas Adams (Q42).
Wikidata Glossary

Statements describe detailed characteristics of an Item and consist of a property and a value.
Sample items:

Properties in Wikidata have a P followed by a number, such as with educated at (P69).

Furthermore:

For a person, you can add a property to specify where they were educated, by specifying a value for a school.
For buildings, you can assign geographic coordinates properties by specifying longitude and latitude values.
Properties can also link to external databases. A property that links an item to an external database, such as an authority control database used by libraries and archives, is called an identifier.
Special Sitelinks connect an item to corresponding content on client wikis, such as Wikipedia, Wikibooks or Wikiquote.
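
The data model above (Items with label, description and aliases, Statements as property/value pairs, sitelinks to client wikis) can be sketched as plain Python data classes. This is only an illustrative in-memory mirror, not the Wikibase API: the class names, field names and the `values_of` helper are assumptions made for this example, and the value is stored as a plain label rather than as an Item ID.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an Item: a property ID paired with a value."""
    property_id: str  # "P" followed by a number, e.g. "P69" (educated at)
    value: str        # simplified: a real value may be an Item, a literal or an identifier


@dataclass
class Item:
    """Hypothetical in-memory mirror of a Wikidata Item."""
    item_id: str  # "Q" followed by a number, e.g. "Q42"
    label: str
    description: str = ""
    aliases: list[str] = field(default_factory=list)
    statements: list[Statement] = field(default_factory=list)
    sitelinks: dict[str, str] = field(default_factory=dict)  # client wiki -> page title

    def values_of(self, property_id: str) -> list[str]:
        """Return every value recorded for the given property."""
        return [s.value for s in self.statements if s.property_id == property_id]


# Douglas Adams (Q42) with one "educated at" (P69) statement and one sitelink.
adams = Item("Q42", "Douglas Adams", description="English writer")
adams.statements.append(Statement("P69", "St John's College"))
adams.sitelinks["enwiki"] = "Douglas Adams"
```

Because an Item can hold any number of Statements for the same property, `values_of` returns a list rather than a single value.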

Wikidata queries

SPARQL endpoint
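
A minimal sketch of preparing a query for the Wikidata SPARQL endpoint, using only the Python standard library. It asks where Douglas Adams (Q42) was educated (P69). The helper names and the User-Agent string are assumptions for this example; the request is only built here, not sent.

```python
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def educated_at_query(item_id: str) -> str:
    """Build a SPARQL query listing the 'educated at' (P69) values of an Item."""
    return f"""
SELECT ?school ?schoolLabel WHERE {{
  wd:{item_id} wdt:P69 ?school .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
""".strip()


def build_request(query: str) -> urllib.request.Request:
    """Wrap the query in a GET request that asks for JSON results."""
    url = WIKIDATA_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": "linked-data-notes-example/0.1",  # hypothetical client name
    })


request = build_request(educated_at_query("Q42"))
# To actually run it (needs network access):
#   import json
#   rows = json.load(urllib.request.urlopen(request))["results"]["bindings"]
```

The `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes are predefined by the Wikidata Query Service, so the query does not declare them.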

DBpedia

Objective: query Wikipedia as you query an SQL database. The structured version of the Wikipedia encyclopedia. Evolution: use of DBpedia KG as entry into other KGs.

Wikipedia infobox => RDF triples
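
The infobox-to-triples idea can be illustrated with a toy converter. This is a heavily simplified sketch and not the DBpedia extraction framework: it assumes the infobox has already been parsed into a key/value dictionary, treats every value as a plain string literal, and uses the infobox key unchanged as the property name.

```python
def infobox_to_ntriples(page_title: str, infobox: dict[str, str]) -> list[str]:
    """Turn parsed infobox key/value pairs into N-Triples lines (toy version)."""
    subject = f"<http://dbpedia.org/resource/{page_title.replace(' ', '_')}>"
    lines = []
    for key, value in infobox.items():
        predicate = f"<http://dbpedia.org/property/{key}>"
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} {predicate} "{escaped}" .')
    return lines


# Hypothetical, hand-written infobox content for illustration.
triples = infobox_to_ntriples("Douglas Adams", {"birth_place": "Cambridge, England"})
```

Each infobox row becomes one triple whose subject is the page, whose predicate is derived from the row key, and whose object is the row value.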

DBpedia Slack login with Google
DBpedia how to extract structured content from the information created in various Wikimedia projects

Is an ongoing project at infai.org in Leipzig
Is based on OpenLink Software's Virtuoso database
Info in DBpedia gitbooks

Regular extraction of data from Wikipedia by DBpedia

DBpedia extraction framework - extracts different kinds of structured information from Wikipedia, written in Scala 2.8

Retrieval of data from DBpedia

Databus - an RDF-based meta data registry

DBpedia releases are available on the DBpedia Databus. The Databus is a data management and release platform that enables automated data publishing and retrieval.
Databus Collections facilitate data retrieval by grouping various data pieces under a public identifier.
The DBpedia releases can be found in the Latest Core Collection.

Databus client
DBpedia Spotlight - performs named entity extraction, including entity detection and name resolution; it can also be used for named entity recognition

DBpedia SPARQL endpoint
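
A short sketch of working with the DBpedia SPARQL endpoint: a query for the English abstract of a resource, plus a helper that pulls values out of a response in the standard SPARQL JSON results format. The helper name is an assumption, and `SAMPLE_RESPONSE` is a hand-written stand-in with placeholder text, not real endpoint output.

```python
import json

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

ABSTRACT_QUERY = """
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?abstract WHERE {
  dbr:Douglas_Adams dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""


def binding_values(results_json: str, variable: str) -> list[str]:
    """Extract the values bound to one variable from a SPARQL JSON result."""
    doc = json.loads(results_json)
    return [row[variable]["value"]
            for row in doc["results"]["bindings"]
            if variable in row]


# Hand-written sample in the SPARQL JSON results layout (placeholder value).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["abstract"]},
    "results": {"bindings": [
        {"abstract": {"type": "literal", "xml:lang": "en",
                      "value": "(placeholder abstract text)"}}
    ]},
})

abstracts = binding_values(SAMPLE_RESPONSE, "abstract")
```

The query would be sent as the `query` parameter of a GET request to `DBPEDIA_ENDPOINT`, with an `Accept: application/sparql-results+json` header.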

DBpedia semantics

DBpedia ontology

Ontology classes
The DBpedia Ontology is a shallow, cross-domain ontology, which has been manually created based on the most commonly used infoboxes within Wikipedia
The ontology currently (2018-10) covers 685 classes which form a subsumption hierarchy and are described by 2,795 different properties
DBpedia 3.2 used an infobox extraction method based on hand-generated mappings of Wikipedia infoboxes to the DBpedia ontology, where the mappings defined rules on how to parse infobox values
DBpedia 3.5 introduced a public wiki for writing infobox mappings, allowing external contributors to define mappings for the infoboxes they are interested in and to extend the existing DBpedia ontology with additional classes and properties
DBpedia 3.7 uses a directed-acyclic graph (not a tree) as ontology, so classes may have multiple superclasses, which was important for the mappings to schema.org. A taxonomy can still be constructed by ignoring all superclasses except the one that is specified first in the list and is considered the most important.
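
The taxonomy construction described in the last item (keep only the first-listed superclass of every class) can be sketched as follows. The class names in the example graph are hypothetical and chosen only to show a class with two superclasses; they are not claimed to match the actual DBpedia ontology.

```python
def to_taxonomy(superclasses: dict[str, list[str]]) -> dict[str, str]:
    """Reduce a DAG (class -> ordered superclasses) to a tree (class -> one parent)."""
    return {cls: parents[0] for cls, parents in superclasses.items() if parents}


def path_to_root(cls: str, taxonomy: dict[str, str]) -> list[str]:
    """Follow the single remaining parent link up to the root class."""
    path = [cls]
    while path[-1] in taxonomy:
        path.append(taxonomy[path[-1]])
    return path


# Hypothetical ontology fragment: "Stadium" has two superclasses.
dag = {
    "Stadium": ["Building", "SportFacility"],
    "Building": ["Place"],
    "SportFacility": ["Place"],
    "Place": ["Thing"],
}

taxonomy = to_taxonomy(dag)
stadium_path = path_to_root("Stadium", taxonomy)
```

Because the input is acyclic, following the first superclass from any class always terminates at a root.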

Wikimedia

Wikimedia is a global movement to bring free educational content to the world
Wikidata

Integration and Knowledge Graphs

Knowledge Graphs book - a.o. Jose Emilio Labra

Linked Data cloud

Linked Open Data cloud

OpenAlex

OpenAlex - a fully open catalog of the global research system

Created by OurResearch, with funding from Arcadia (UK)
Aggregates data sources: MAG, Crossref, ORCID, ROR, DOAJ, Unpaywall, Pubmed, Pubmed Central, The ISSN International Centre, Web crawls, Subject-area and institutional repositories from arXiv to Zenodo and everywhere in between
Uses its own OpenAlex ID

FactForge

FactForge

uses the Financial Industry Business Ontology (FIBO) as an upper-level ontology. Various aspects of the schemata of the different datasets are mapped to the corresponding FIBO classes and relationships. In this way, one can query across different datasets using FIBO. The following two modules of FIBO have been loaded into FactForge:

Foundations, version 14-11-30 (November 2014);
Business Entities, version 15-02-23 (February 2015)

includes more than 1 billion facts from popular datasets such as DBpedia, Geonames, Wordnet, GLEIF, the Panama Papers, etc., as well as ontologies such as the Financial Industry Business Ontology (FIBO)

DBpedia: only the English version of DBpedia is loaded.
Geonames: a worldwide geographical database, which “contains over 10 million geographical names and consists of over 9 million unique features whereof 2.8 million populated places”.
WordNet: a popular semantic dictionary for English. Words “are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations”. It contains 117,000 synsets.
WorldFacts: dataset about countries, languages, currencies and other related information. Developed by the DBpedia association and includes information derived from LEXVO, CIA World FactBook and other datasets.
Linked Leaks: the LOD version of the Panama Papers database released by the International Consortium of Investigative Journalists (ICIJ) in May 2016. Includes “about 200,000 offshore entities that are part of the Panama Papers investigation and about more than 100,000 additional companies that were part of the 2013 ICIJ Offshore Leaks investigation”.
GLEI (Global Legal Entity Identifier): profiles of about 211 000 organizations, derived from the GMEI utility data dump from April 2016. “The Global Markets Entity Identifier (GMEI) utility is DTCC’s legal entity identifier solution offered in collaboration with SWIFT. The GMEI utility is a pre-Local Operating Unit of the Global Legal Entity Identifier System (GLEIS)”
NOW News: article texts and metadata for a stream of general news. The metadata includes annotations that link mentions of entities (e.g., people or organizations) and concepts (e.g., “chocolate” or “recession”) in the news to the corresponding DBpedia and Wikidata concepts.

Other

Applications and reasoning

Logic, Rules and Reasoners

Reasoners - an overview

HermiT
Boris Motik (UK/Oxford) - HermiT development

Pellet - open source version maintained by Stardog

Open source, OWL DL reasoner in Java, originally developed at the University of Maryland’s Mindswap Lab
Originally commercially supported by Clark and Parsia LLC, later by Complexible
Pellet 3.0, a closed source, next-gen version of Pellet, is embedded and available in Stardog, the RDF database

FaCT++

Rascalli - responsive artificial situated cognitive agents that live and learn on the Internet - origin of TRREE reasoner
TRREE reasoner, included in Ontotext GraphDB

Datalog - Mitre manual
John Ramsdell homepage

SWRL as part of Protégé

Application development

Jena

Jena for application development - APIs

Implements APIs for OWL, RDF, SPARQL, reasoners, storage, parsing and writing (XML, N3, Turtle, ...)
Includes the TDB triple store and the Fuseki SPARQL server

Jena Javadoc
Jena at Github.com - including examples
Jena tutorials
Jena with Eclipse

Other

OWL API
ODP - Ontology Design Patterns

DE - thingsTHINKING - semantic processing, natural language processing
US - Semagix
BE - TenForce

Selective applications

SecuPedia (Germany)
Bullipedia.net - El Bulli

Valcri - Visual analytics for sense-making in criminal intelligence analysis

FushatAmal - missing persons register (Lebanon)

LiveJournal

Applications - vendors

Applications - music

Musicbrainz.org - database and tagging service used by a.o. Amarok
dbtune.org - hosts a number of servers, providing access to music-related structured data, in a Linked Data fashion
Jamendo.com - a community collection of music, all freely licensed under Creative Commons licenses

Applications - trust

Rebooting Web of Trust

Decentralized Identity
Verifiable Claims
Linked Data and Signature Suites

BOScoin - trust smart contracts
VIVOWEB.org - an open source semantic web application originally developed and implemented at Cornell - participating in PublishTrust
PublishTrust.org - examines the feasibility of adding trust values to on-line identities for authors of scholarly publications
OIX.org - Open IDentity eXchange (UK/US, involving UK Cabinet and many vendors)

Olaf Hartig - trusted linked data, provenance, ...
FP7 OPTET - operational trustworthiness enabling technologies
OpenGUID (seems stalled since 2008)
Trust Assertion Ontology - A light-weight vocabulary to describe asserted user’s subjective trust values

W3C Trust - user and agent use cases

SOLID - Tim Berners-Lee and Lalana Kagal

Basics

Playground

SolidOS, in full SolidOS Databrowser Frontend

Enables you to create, share and collaborate on data stored within Solid Pods

SolidOS github

Legacy

marcsel.solid.openlinksw.com:8445/ (use Chrome)
marcsel.solid.openlinksw.com pod admin (use Chrome)

Basics

Solid (derived from "social linked data") is a proposed set of conventions and tools for building decentralized social applications based on Linked Data principles. It relies as much as possible on existing W3C standards and protocols. These are:

RDF (by default Turtle, otherwise JSON-LD and RDFa)
WebID 1.0 (Web Identity and Discovery) to provide universal usernames/IDs for Solid apps, and to refer to unique Agents (people, organizations, devices). WebIDs, when accessed, yield WebID Profile documents (in Turtle and other RDF formats).
FOAF vocabulary is used both in WebID profiles, and in specifying Access Control lists.
Authentication (for logins, page personalization and more) is done via the WebID-TLS protocol. WebID-TLS extends WebID Profiles to include references to the subject's public keys in the form of X.509 Certificates, using Cert Ontology 1.0 vocabulary. The authentication sequence is done using the HTTP over TLS protocol.
HTML5 keygen. Unlike normal HTTPS use cases, WebID-TLS is done without referring to Certificate Authority hierarchies, and instead encourages host server-signed (or self-signed) certificates. In Solid, certificate creation is typically done in the browser using the HTML5 keygen element, to provide a one-step creation and certificate publication user experience.
Authorization and access lists are done using Basic Access Control ontology (see also the WebAccessControl wiki page for more details).
Support for WebID-OIDC as another primary authentication mechanism is on its way. It is based on the OAuth2/OpenID Connect protocols, adapted for WebID based decentralized use cases.
The Linked Data Platform (LDP) standard is used for reading and writing generic Linked Data resources through HTTP operations on web resources.
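
To illustrate the last point, reading and writing a Linked Data resource through HTTP operations, here is a minimal sketch that prepares a PUT and a GET for a Turtle document. The pod URL, the profile content and the helper names are hypothetical, and nothing is sent. A real Solid server would additionally require authentication and access-control permissions, which this sketch leaves out.

```python
import urllib.request

# Hypothetical resource location inside a pod.
POD_RESOURCE = "https://pod.example/profile/card.ttl"

# A tiny FOAF profile document in Turtle (illustrative content).
PROFILE_TURTLE = """@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#me> a foaf:Person ;
    foaf:name "Alice" .
"""


def ldp_put(url: str, turtle: str) -> urllib.request.Request:
    """Prepare an HTTP PUT that creates or replaces a Turtle resource."""
    return urllib.request.Request(
        url,
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle"},
    )


def ldp_get(url: str) -> urllib.request.Request:
    """Prepare an HTTP GET that asks for the resource as Turtle."""
    return urllib.request.Request(url, method="GET", headers={"Accept": "text/turtle"})


put_request = ldp_put(POD_RESOURCE, PROFILE_TURTLE)
get_request = ldp_get(POD_RESOURCE)
# Sending either request would be: urllib.request.urlopen(put_request)
```

The point of the sketch is that no Solid-specific client library is needed for basic reads and writes: a resource is addressed by its URL and manipulated with ordinary HTTP verbs and RDF media types.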

Solid project and specifications

SOLID (original website at MIT)

SOLID World video recordings (Vimeo)
SOLID World video recordings - alternate url (Vimeo)

SOLID specification - on github
The SOLID ecosystem - on github - work in progress

Solid and W3C

W3C Solid community
WebID 1.0 - 2014, by Sambra, Story, Berners-Lee

Inrupt company

Inrupt - commercial startup co-founded by CEO John Bruce and CTO Sir Tim Berners-Lee
Inrupt.net - cloud-hosted instance of the open source software Node Solid Server

Created primarily to provide open source application developers with Pods to test against
Products include:

Inrupt Enterprise Solid Server - A production-grade Solid server produced and supported by Inrupt
Node Solid Server - An open source server created by MIT
Javascript Solid Client Libraries
Javascript Solid React SDK

Solid in Belgium

Solidlab.be - Flanders/Crevits
athumi.be - part of Digitaal Vlaanderen

Synergies

Ontochain - Solid-Verif using verifiable credentials in the Solid ecosystem.

SOLID VC

Other software

Ruben Verborgh Solid pod server
Ruben Verborgh Github
Pieter Heyvaert Solid development

SOLID Community Pod - prototype of community server
SOLID Community Solid Server (CSS) an open and modular implementation of the Solid specifications
SOLID platform (archived) - Solid servers (Node Solid Server, NSS) and client libs

SolidFlix - i.e. Netflix in your pod
Dan Barclay’s Cend app - i.e. decentralised send
Tonda Karola’s Inbox application

Digita
Datavillage

Indices content available with semantic metadata

MakoLab - web analytics - using schema.org to improve search results

Sindice

Data quality

AT Lisa Erlinger - Qualle, MeeRKaT - GraphDB, blockchain, ...
Data source ontology

Other stuff

Consulting

BE - Vadis data mining and linking - when trust is unreliable
US - SemanticArts - Michael Uschold
BE - Ontoforce - biomedical
Makolab
RomanticWeb - Relational Object Model for Semantic Web in .net
US - Stardog - supports the RDF graph data model; SPARQL query language; property graph model and Gremlin graph traversal language; OWL 2 and user-defined rules for inference and data analytics; virtual graphs; geospatial query answering; and programmatic interaction via several languages and network interfaces.
AT - Semantic Web Company
FI - Leiki
Semafora - ontology modelling and inferencing (ontobroker - ex Karlsruhe Institute of Technology)

Blockchain and Ontology

Blondie - blockchain ontology
Hedugaro - linked blockchain data

Tools

General tools

See local info on tools

AI-based or related tools

Gnoss - Semantic Framework - AI assistance

Converting to RDF

Convertors to RDF - W3C

Linked Data browsers

Google Chrome OpenLink Data explorer (extension)
Linked Data by Tim Berners-Lee

Callimachus - building linked data with your browser

Graph note keeping

Obsidian - keep your notes in markdown and create graphs from them
Foambubble - using VSCode and Github - from Roam research
RoamResearch - .deb version available, account required

Using data

Contents

Local files

Data sources

Legislation and government data

UK

Belgium

Other

Legal entities

GLEIF

Wikipedia

Wikidata and DBpedia

Wikidata - a free and open knowledge base that can be read and edited by both humans and machines

Wikidata introduction

Data model

Wikidata queries

DBpedia

DBpedia semantics

Wikimedia

Integration and Knowledge Graphs

Linked Data cloud

OpenAlex

FactForge

Other

Applications and reasoning

Logic, Rules and Reasoners

Application development

Jena

Other

Selective applications

Applications - vendors

Applications - music

Applications - trust

SOLID - Tim Berners-Lee and Lalana Kagal

Basics

Playground

Basics

Solid project and specifications

Solid and W3C

Inrupt company

Solid in Belgium

Synergies

Other software

Indices content available with semantic metadata

Data quality

Other stuff

Consulting

Blockchain and Ontology

Tools

General tools

AI-based or related tools

Converting to RDF

Linked Data browsers

Graph note keeping