Data Flow of AI

Data Flow is a machine learning pattern that describes the sequence in which data moves through the AI engineering life cycle.

First, data is processed layer by layer, as shown in Fig. 1, to prepare it for storage, training, and other downstream uses.

Then, data passes through processing layers as it is stored, refined, and prepared for use in machine learning models and applications. From a more functional perspective, the data is then used by different machine learning function groups, as shown below:

Details for each layer in the chart above are as follows:

Sources

Data sources include:

  • Company Internal Databases

  • Company Internal Files

  • Websites

  • Public Data

  • Smartphone Apps

  • IoT Devices

  • Commercial Data Aggregators

  • Point of Sale

  • Corporate Internal Processes

  • Social Media

  • Data Streams

Capture

Capture mechanisms include:

  • Website Scraping

  • Website and Smartphone Chat Dialogues

  • Website and Smartphone Form Submissions

  • IoT Device Interfaces

  • Commercial Data Aggregator Feeds

  • Corporate Internal Process Feeds

Pipeline

Pipeline processes include:

  • Data Ingestion

  • Data Temporary Storage

  • Data Subscription

  • Data Publication
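
The four pipeline processes above can be illustrated with a small in-memory sketch. This is only a toy model under stated assumptions: the class name MiniPipeline, the topic names, and the record fields are all hypothetical, and a production pipeline would normally rely on a dedicated message broker rather than a Python deque.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Record = dict  # hypothetical record shape: a plain dictionary


class MiniPipeline:
    """Toy pipeline: ingest -> temporary storage -> publish to subscribers."""

    def __init__(self) -> None:
        # Temporary storage: records wait here until they are published.
        self._buffer: Deque[Tuple[str, Record]] = deque()
        # Subscription registry: topic name -> list of handler callables.
        self._subscribers: Dict[str, List[Callable[[Record], None]]] = {}

    def ingest(self, topic: str, record: Record) -> None:
        """Data ingestion: accept a record and hold it in the buffer."""
        self._buffer.append((topic, record))

    def subscribe(self, topic: str, handler: Callable[[Record], None]) -> None:
        """Data subscription: register a handler for one topic."""
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self) -> int:
        """Data publication: drain the buffer and deliver each record to the
        subscribers of its topic. Returns the number of deliveries made."""
        delivered = 0
        while self._buffer:
            topic, record = self._buffer.popleft()
            for handler in self._subscribers.get(topic, []):
                handler(record)
                delivered += 1
        return delivered


pipeline = MiniPipeline()
received: List[Record] = []
pipeline.subscribe("pos", received.append)

pipeline.ingest("pos", {"sku": "A1", "qty": 2})           # has a subscriber
pipeline.ingest("iot", {"sensor": "t1", "temp_c": 21.5})  # no subscriber
count = pipeline.publish()
```

In this sketch, records on a topic with no subscriber are simply dropped when the buffer is drained; a real system might instead retain them or route them to a fallback store.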

Databases

Databases include:

  • Data Lakes

  • SQL Databases

  • Document Databases

  • Graph Databases

ETLs

ETLs include:

  • Extract Functions: pulling data from selected sources

  • Transform Functions: normalization, regularization, aggregation

  • Load Functions: saving data in formats for use in modeling processes
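
As a rough illustration of the three ETL functions above, the following minimal sketch extracts rows from a CSV source, aggregates and normalizes them, and loads the result as JSON lines. The sample data, the column names (store, amount), and the choice of min-max normalization and JSON lines output are assumptions made for the example, not something prescribed by the pattern.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical source data; a real extract step would read a file or database.
RAW_CSV = """store,amount
north,10
north,30
south,20
"""


def extract(source_text):
    """Extract: pull rows from the selected source (an in-memory CSV here)."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transform: aggregate amounts per store, then min-max normalize totals."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["store"]] += float(row["amount"])
    if not totals:
        return []
    low, high = min(totals.values()), max(totals.values())
    span = (high - low) or 1.0  # avoid division by zero if all totals match
    return [
        {"store": store, "total": total, "total_norm": (total - low) / span}
        for store, total in sorted(totals.items())
    ]


def load(records):
    """Load: serialize records as JSON lines for a later modeling step."""
    return "\n".join(json.dumps(record) for record in records)


records = transform(extract(RAW_CSV))
output = load(records)
```

Keeping each stage as a separate function makes it possible to swap the source or the output format without touching the transformation logic.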

Models

Model-type category examples include:

  • Artificial Neural Networks

  • Decision Trees

  • Probabilistic Graphical Models

  • Cluster Analysis

  • Gaussian Processes

  • Regression Analysis

Applications

Application examples include:

  • Medical Diagnosis

  • Autonomous Vehicles

  • Chatbot Dialog

  • Image Recognition

  • Face Recognition

  • Product Recommendations

  • Churn Prediction

  • Malware Detection

  • Search Refinement