Using data
Contents
Local files
See also local files (GrayTiger):
Data sources
Legislation and government data
UK
Belgium
Other
Legal entities
GLEIF
GLEIF was established by the Financial Stability Board (FSB) and specifies 4 ontologies.
Wikipedia
Wikidata and DBpedia
Wikidata and DBpedia are related, but different, Linked Data projects, both built around Wikipedia.
- Wikidata’s focus is on creating Linked Open (meta)Data to supplement Wikipedia documents
- DBpedia’s focus is on generating Linked Open Data from Wikipedia documents
Both projects provide access to their respective Linked Open Data via SPARQL Query Service endpoints.
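As a minimal sketch of how either endpoint can be addressed, the Python below builds (but does not send) a SPARQL GET request. The endpoint URLs are the public ones of the two projects; the `User-Agent` string and the helper name `build_sparql_request` are assumptions made for this example only.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Public SPARQL endpoints of the two projects.
ENDPOINTS = {
    "wikidata": "https://query.wikidata.org/sparql",
    "dbpedia": "https://dbpedia.org/sparql",
}


def build_sparql_request(project: str, query: str) -> Request:
    """Build (but do not send) an HTTP GET request carrying a SPARQL query."""
    url = ENDPOINTS[project] + "?" + urlencode({"query": query})
    headers = {
        # Ask for the standard SPARQL JSON results format.
        "Accept": "application/sparql-results+json",
        # Hypothetical client name, chosen for this sketch.
        "User-Agent": "using-data-notes-example/0.1",
    }
    return Request(url, headers=headers)


# A deliberately tiny query that is valid on any SPARQL endpoint.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3"

request = build_sparql_request("dbpedia", QUERY)
print(request.full_url)

# To actually run it (requires network access):
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       data = json.load(response)
```

The same function works for both projects because both speak the standard SPARQL protocol; only the endpoint URL and the vocabulary used inside the query differ.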
Wikidata - a free and open knowledge base that can be read and edited by both humans and machines
Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.
Wikidata introduction
Data model
Wikidata makes use of namespaces, including main, Property, Lexeme, and EntitySchema.
- The Wikidata repository consists mainly of Items, each one having a label, a description and any number of aliases.
- Items are uniquely identified by a Q followed by a number, such as Douglas Adams (Q42).
- Wikidata Glossary
- Statements describe detailed characteristics of an Item and consist of a property and a value.
- Sample items:
- Properties are identified by a P followed by a number, such as educated at (P69).
Furthermore:
- For a person, you can add a property to specify where they were educated, by specifying a value for a school.
- For buildings, you can assign geographic coordinates properties by specifying longitude and latitude values.
- Properties can also link to external databases. A property that links an item to an external database, such as an authority control database used by libraries and archives, is called an identifier.
- Special Sitelinks connect an item to corresponding content on client wikis, such as Wikipedia, Wikibooks or Wikiquote.
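The data model above (items with a label, description and aliases; statements made of a property and a value; sitelinks) can be sketched with a few Python dataclasses. This is an illustration only, not the Wikibase implementation: the class and field names are invented here, the description text is illustrative, and the value ID `Q691283` is assumed to stand for St John's College, Cambridge, and should be checked on Wikidata before being relied on.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

ITEM_ID = re.compile(r"Q[1-9][0-9]*")      # e.g. Q42
PROPERTY_ID = re.compile(r"P[1-9][0-9]*")  # e.g. P69


@dataclass
class Statement:
    """One claim about an item: a property plus a value."""
    property_id: str
    value: str  # an item ID, a literal, or an external identifier

    def __post_init__(self) -> None:
        if not PROPERTY_ID.fullmatch(self.property_id):
            raise ValueError(f"not a property ID: {self.property_id!r}")


@dataclass
class Item:
    """An item with its label, description, aliases, statements and sitelinks."""
    item_id: str
    label: str
    description: str = ""
    aliases: List[str] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)
    sitelinks: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not ITEM_ID.fullmatch(self.item_id):
            raise ValueError(f"not an item ID: {self.item_id!r}")

    def values_of(self, property_id: str) -> List[str]:
        """Return every value stated for the given property."""
        return [s.value for s in self.statements if s.property_id == property_id]


adams = Item(
    item_id="Q42",
    label="Douglas Adams",
    description="English writer",      # illustrative text
    aliases=["Douglas Noel Adams"],
    statements=[Statement("P69", "Q691283")],  # educated at (value ID assumed)
    sitelinks={"enwiki": "Douglas Adams"},
)
print(adams.values_of("P69"))
```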
Wikidata queries
DBpedia
Objective: query Wikipedia as you query an SQL database.
The structured version of the Wikipedia encyclopedia.
Evolution: use of DBpedia KG as entry into other KGs.
Wikipedia infobox => RDF triples
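The idea of turning an infobox into RDF triples can be sketched in Python as follows. This is a heavily simplified illustration, not the DBpedia extraction framework: the infobox contents and the key-to-property mapping are hypothetical, and every value is emitted as a plain literal, whereas a real extraction would distinguish links to other resources, datatypes and units.

```python
from typing import Dict, List

RESOURCE = "http://dbpedia.org/resource/"
ONTOLOGY = "http://dbpedia.org/ontology/"


def infobox_to_triples(
    page_title: str,
    infobox: Dict[str, str],
    mapping: Dict[str, str],
) -> List[str]:
    """Turn infobox key/value pairs into N-Triples-style lines.

    `mapping` says which infobox key corresponds to which ontology
    property; keys without a mapping are skipped in this sketch.
    """
    subject = f"<{RESOURCE}{page_title.replace(' ', '_')}>"
    triples = []
    for key, value in infobox.items():
        prop = mapping.get(key)
        if prop is None:
            continue
        # Escape backslashes and double quotes inside the literal.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{ONTOLOGY}{prop}> "{escaped}" .')
    return triples


# Hypothetical infobox and mapping, for illustration only.
infobox = {"birth_place": "Cambridge", "occupation": "Writer", "image": "a.jpg"}
mapping = {"birth_place": "birthPlace", "occupation": "occupation"}

for line in infobox_to_triples("Douglas Adams", infobox, mapping):
    print(line)
```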
- DBpedia Slack login with Google
- DBpedia how to extract structured content from the information created in various Wikimedia projects
- Regular extraction of data from Wikipedia by DBpedia
- Retrieval of data from DBpedia
- Databus - an RDF-based meta data registry
- DBpedia releases are available on the DBpedia Databus. The Databus is a data management and release platform that enables automated data publishing and retrieval.
- Databus Collections facilitate data retrieval by grouping various data pieces under a public identifier.
- The DBpedia releases can be found in the Latest Core Collection.
- Databus client
- DBpedia Spotlight performs named entity extraction, including entity detection and name resolution; it can also be used for named entity recognition
- DBpedia SPARQL endpoint
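To make the "query it like an SQL database" objective concrete: a SPARQL endpoint can return results in the standard SPARQL JSON results format, which flattens naturally into table-like rows. The sketch below assumes that format; the sample response is invented for illustration and was not fetched from DBpedia.

```python
from typing import Any, Dict, List, Optional


def bindings_to_rows(result: Dict[str, Any]) -> List[Dict[str, Optional[str]]]:
    """Flatten a SPARQL JSON result into one dict per row, like SQL rows.

    Variables that are unbound in a given solution are simply absent from
    its binding, so they are filled in as None here.
    """
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows


# Invented sample response in the SPARQL JSON results format.
SAMPLE = {
    "head": {"vars": ["item", "label"]},
    "results": {
        "bindings": [
            {
                "item": {"type": "uri", "value": "http://dbpedia.org/resource/Example_A"},
                "label": {"type": "literal", "xml:lang": "en", "value": "Example A"},
            },
            {
                "item": {"type": "uri", "value": "http://dbpedia.org/resource/Example_B"},
            },
        ]
    },
}

for row in bindings_to_rows(SAMPLE):
    print(row)
```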
DBpedia semantics
- DBpedia ontology
- Ontology classes
- The DBpedia Ontology is a shallow, cross-domain ontology, which has been manually created based on the most commonly used infoboxes within Wikipedia
- The ontology currently (2018-10) covers 685 classes which form a subsumption hierarchy and are described by 2,795 different properties
- DBpedia 3.2 used an infobox extraction method based on hand-generated mappings of Wikipedia infoboxes to the DBpedia ontology, where the mappings defined rules on how to parse infobox values
- DBpedia 3.5 introduced a public wiki for writing infobox mappings, allowing external contributors to define mappings for the infoboxes they are interested in and to extend the existing DBpedia ontology with additional classes and properties
- DBpedia 3.7 uses a directed acyclic graph (not a tree) as its ontology, so classes may have multiple superclasses, which was important for the mappings to schema.org. A taxonomy can still be constructed by ignoring all superclasses except the one that is specified first in the list and is considered the most important.
- Wikimedia is a global movement to bring free educational content to the world
- Wikidata
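The DBpedia 3.7 note above, about deriving a taxonomy from a directed acyclic graph by keeping only the first-listed superclass, can be sketched in Python. The class names below are invented for illustration and are not copied from the DBpedia ontology.

```python
from typing import Dict, List, Optional


def primary_parents(superclasses: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Reduce a DAG to a tree by keeping only each class's first superclass."""
    return {
        cls: (parents[0] if parents else None)
        for cls, parents in superclasses.items()
    }


def path_to_root(cls: str, parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Follow the single remaining parent link from a class up to the root."""
    path = [cls]
    while parent_of.get(path[-1]) is not None:
        path.append(parent_of[path[-1]])
    return path


# Illustrative hierarchy: each class lists its superclasses, most important first.
SUPERCLASSES = {
    "Thing": [],
    "Place": ["Thing"],
    "Agent": ["Thing"],
    "Organisation": ["Agent"],
    "Building": ["Place"],
    "Museum": ["Building", "Organisation"],  # two superclasses, so a DAG
}

taxonomy = primary_parents(SUPERCLASSES)
print(path_to_root("Museum", taxonomy))
```

In this example "Museum" keeps "Building" as its only parent in the taxonomy, while the second superclass "Organisation" is ignored.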
Integration and Knowledge Graphs
Linked Data cloud
OpenAlex
- OpenAlex - a fully open catalog of the global research system
- Created by OurResearch, with funding from Arcadia (UK)
- Aggregates data sources: MAG, Crossref, ORCID, ROR, DOAJ, Unpaywall, PubMed, PubMed Central, the ISSN International Centre, web crawls, and subject-area and institutional repositories from arXiv to Zenodo and everywhere in between
- Uses its own OpenAlex ID
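As a rough sketch of working with OpenAlex IDs in Python: to my understanding an ID is a URL under `https://openalex.org/` ending in a key made of one letter for the entity type followed by digits. The prefix table below is an assumption and probably incomplete, the function name is invented for this example, and `W2741809807` is used only as an example key.

```python
import re
from typing import Tuple

OPENALEX_BASE = "https://openalex.org/"

# Assumed prefix letters; the real list may contain more entity types.
ENTITY_PREFIXES = {
    "W": "work",
    "A": "author",
    "S": "source",
    "I": "institution",
    "C": "concept",
}

KEY_PATTERN = re.compile(r"([A-Z])([0-9]+)")


def parse_openalex_id(raw: str) -> Tuple[str, str]:
    """Return (entity type, normalized key) for a full-URL or bare-key ID."""
    key = raw.strip()
    if key.lower().startswith(OPENALEX_BASE):
        key = key[len(OPENALEX_BASE):]
    key = key.upper()
    match = KEY_PATTERN.fullmatch(key)
    if match is None or match.group(1) not in ENTITY_PREFIXES:
        raise ValueError(f"not a recognized OpenAlex ID: {raw!r}")
    return ENTITY_PREFIXES[match.group(1)], key


print(parse_openalex_id("https://openalex.org/W2741809807"))
print(parse_openalex_id("w2741809807"))
```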
FactForge
- FactForge
- uses the Financial Industry Business Ontology (FIBO) as an upper-level ontology. Various aspects of the schemata of the different datasets are mapped to the corresponding FIBO classes and relationships. In this way, one can query across different datasets using FIBO. The following two modules of FIBO have been loaded into FactForge:
- Foundations, version 14-11-30 (November 2014);
- Business Entities, version 15-02-23 (February 2015)
- includes more than 1 billion facts from popular datasets such as DBpedia, Geonames, WordNet, GLEIF, the Panama Papers, etc., as well as ontologies such as the Financial Industry Business Ontology (FIBO)
- DBpedia: only the English version of DBpedia is loaded.
- Geonames: a worldwide geographical database, which “contains over 10 million geographical names and consists of over 9 million unique features whereof 2.8 million populated places”.
- WordNet: popular semantic dictionary for English. Words “are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations”. It contains 117,000 synsets.
- WorldFacts: dataset about countries, languages, currencies and other related information. Developed by the DBpedia Association; includes information derived from LEXVO, the CIA World Factbook and other datasets.
- Linked Leaks: the LOD version of the Panama Papers database released by the International Consortium of Investigative Journalists (ICIJ) in May 2016. Includes “about 200,000 offshore entities that are part of the Panama Papers investigation and about more than 100,000 additional companies that were part of the 2013 ICIJ Offshore Leaks investigation”.
- GLEI (Global Legal Entity Identifier): profiles of about 211,000 organizations, derived from the GMEI utility data dump from April 2016. “The Global Markets Entity Identifier (GMEI) utility is DTCC’s legal entity identifier solution offered in collaboration with SWIFT. The GMEI utility is a pre-Local Operating Unit of the Global Legal Entity Identifier System (GLEIS)”
- NOW News: article texts and metadata for a stream of general news. The metadata includes annotations that link mentions of entities (e.g., people or organizations) and concepts (e.g., “chocolate” or “recession”) in the news to the corresponding DBpedia and Wikidata concepts.
Other
Applications and reasoning
Logic, Rules and Reasoners
Application development
Jena
Other
Selective applications
Applications - vendors
Applications - music
- MusicBrainz.org - database and tagging service used by Amarok, among others
- dbtune.org - hosts a number of servers, providing access to music-related structured data, in a Linked Data fashion
- Jamendo.com - a community collection of music, all freely licensed under Creative Commons licenses
Applications - trust
Indexes of content available with semantic metadata
- MakoLab - web analytics - using schema.org to improve search results
- Sindice
Data quality
Other stuff
Consulting
Blockchain and Ontology
- Gnoss - Semantic Framework - AI assistance
Converting to RDF
Linked Data browsers
Graph note keeping
- Obsidian - keep your notes in markdown and create graphs from them
- Foam (foambubble) - uses VS Code and GitHub, inspired by Roam Research
- RoamResearch - .deb version available, account required