Data Science, AI, Machine Learning
Data science
Data sets
Google
Analysis
- Timi.eu - Frank Vanden Berghen (ULB) - Belgian data science and mining (Anatella, R, Python, ...) - telecom, banking
- Endor - predictive analytics
- Caseware - IDEA
IBM
Amazon
- Amazon - machine learning - SageMaker, DeepLens
SAP
Microsoft
Apache
- Apache Spark - a unified analytics engine for large-scale data processing, including map/reduce and machine learning
- Apache Lucene - search and indexing
  - Lucene Core provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities
  - Solr is a high-performance search server built using Lucene Core, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface
  - PyLucene is a Python extension for accessing Lucene Core; it wraps the Java library rather than reimplementing it
- Apache Hadoop - framework for the distributed processing of large data sets across clusters of computers; related Apache projects include Cassandra, Spark and many others
  - Hadoop Common: common utilities that support the other Hadoop modules
  - Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data
  - Hadoop YARN: a framework for job scheduling and cluster resource management
  - Hadoop MapReduce: a YARN-based system for parallel processing of large data sets
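
The map/reduce model mentioned for Spark and Hadoop MapReduce can be illustrated with a word count. The following is a minimal single-process sketch in plain Python, not Hadoop or Spark code; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are assumptions made up for illustration. In a real cluster the framework would run the map and reduce steps in parallel on many machines and handle the grouping step itself.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, one string per "document".
docs = ["big data big clusters", "data across clusters"]
counts = word_count(docs)
print(counts)
```

Because each call to map_phase looks at one document only and each call to reduce_phase looks at one key only, both steps can be distributed without shared state, which is the property the frameworks above rely on.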
Other
Image processing
Natural Language Processing (NLP)
- Sheffield NLP group
- Sheffield GATE - General Architecture for Text Engineering - Open Source - tutorials etc.
- Stanford NLP - Open Source
- NLTK - Open Source
- Apache OpenNLP - Open Source
- GATE - a full-lifecycle open source solution for text processing
- Apache UIMA - Unstructured Information Management Architecture - Open Source
- WordNet - Princeton's lexical database, 117 000 synsets
- FrameNet - Berkeley lexical database of English, human- and machine-readable, based on annotating examples of how words are used in actual texts
PwC
Making AI explainable
- SHAP - 'SHapley Additive exPlanations' is a unified approach to explain the output of any machine learning model
- DALEX - Descriptive mAchine Learning EXplanations
- LIME - Local Interpretable Model-agnostic Explanations, Marco Tulio Ribeiro, Department of Computer Science and Engineering, University of Washington
- Bulletproof.AI - Martin Rehak
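
The Shapley values that SHAP takes its name from can be computed exactly for a very small model by enumerating every coalition of features. The sketch below shows that underlying definition only; it is not the SHAP library's API and does not use its approximations. The toy model, the instance, the all-zero baseline and the names shapley_values and value_fn are assumptions invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions (feasible only for small n)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i when added to this coalition.
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi


# Hypothetical toy model with one interaction term between features 0 and 2.
def model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[0] * x[2]


instance = [1.0, 2.0, 1.0]   # the prediction being explained (assumed values)
baseline = [0.0, 0.0, 0.0]   # reference input standing in for "feature absent"


def value_fn(coalition):
    """Model output when only the features in the coalition take their instance values."""
    x = [instance[j] if j in coalition else baseline[j] for j in range(len(instance))]
    return model(x)


phi = shapley_values(value_fn, len(instance))
print(phi)
print(sum(phi), model(instance) - model(baseline))
```

The attributions add up to the difference between the model output for the instance and for the baseline, which is the "additive" property in the SHAP name. The enumeration grows exponentially with the number of features, so this form is only practical for a handful of features.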