Post

Azure - Data

Introduction

  • In Azure, “data” refers to any information that can be stored, processed, or analyzed within the various Azure services.
  • This encompasses a wide range of types and formats, which can include structured, semi-structured, and unstructured data.

Types of Data

  1. Structured Data:
    • Definition: Data that adheres to a pre-defined data model or schema.
    • Examples: Databases, tables, and spreadsheets.
    • Formats: CSV, SQL, Excel, JSON (in a structured format).
  2. Semi-Structured Data:
    • Definition: Data that does not conform to a rigid structure but contains tags or markers to separate elements.
    • Examples: Logs, JSON, XML.
    • Formats: JSON, XML, YAML.
  3. Unstructured Data:
    • Definition: Data that does not have a predefined data model or schema.
    • Examples: Text documents, images, videos, audio files.
    • Formats: Text files (TXT), PDFs, JPEG, PNG, MP4, MP3.

Common Data Formats in Azure

  1. CSV (Comma-Separated Values):
    • Usage: Commonly used for tabular data and data exchange.
    • Services: Azure Blob Storage, Azure Data Lake Storage, Azure SQL Database.
  2. JSON (JavaScript Object Notation):
    • Usage: Often used for configuration files, data interchange between servers and web applications, and semi-structured data.
    • Services: Azure Cosmos DB, Azure Blob Storage, Azure Data Lake Storage, Azure Functions.
  3. XML (Extensible Markup Language):
    • Usage: Used for documents that contain structured information.
    • Services: Azure SQL Database, Azure Blob Storage, Azure Data Lake Storage.
  4. Parquet:
    • Usage: Columnar storage file format optimized for analytical queries.
    • Services: Azure Data Lake Storage, Azure Synapse Analytics.
  5. Avro:
    • Usage: Row-based storage format suitable for serializing data in Hadoop.
    • Services: Azure Data Lake Storage, Azure Synapse Analytics.
  6. ORC (Optimized Row Columnar):
    • Usage: Columnar storage file format optimized for Hadoop workloads.
    • Services: Azure Data Lake Storage, Azure Synapse Analytics.
  7. Text Files:
    • Usage: Simple storage of unstructured data.
    • Services: Azure Blob Storage, Azure Data Lake Storage.
  8. Binary Files:
    • Usage: Includes any non-text data such as images, videos, and executable files.
    • Services: Azure Blob Storage, Azure Data Lake Storage.
  9. SQL:
    • Usage: Structured data in relational databases.
    • Services: Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics (dedicated SQL pool).

Azure Services for Data Storage and Processing

  1. Azure Blob Storage:
    • Description: A scalable object storage service for storing large amounts of unstructured data such as text or binary data.
    • Use Cases: Suitable for storing images, videos, backups, and logs.
  2. Azure Data Lake Storage:
    • Description: An optimized storage service for big data analytics, integrating with tools like Azure HDInsight, Azure Databricks, and Azure Synapse Analytics.
    • Use Cases: Ideal for storing and analyzing large datasets in formats like Parquet, Avro, and ORC.
  3. Azure SQL Database:
    • Description: A fully managed relational database service with built-in high availability, scalability, and security features.
    • Use Cases: Best for transactional applications, data warehousing, and reporting solutions.
  4. Azure Cosmos DB:
    • Description: A globally distributed, multi-model database service designed for high availability, low latency, and scalability.
    • Use Cases: Suitable for mission-critical applications requiring quick response times and high availability across different geographic regions.
  5. Azure Synapse Analytics:
    • Description: An integrated analytics service that combines big data and data warehousing capabilities, providing a unified experience for data ingestion, preparation, management, and serving.
    • Use Cases: Excellent for complex analytical queries, data integration, and large-scale data processing.
  6. Azure Table Storage:
    • Description: A NoSQL key-value store for semi-structured data, offering fast access and scalability.
    • Use Cases: Ideal for applications requiring flexible data schemas, such as web apps, address books, and user metadata storage.
  7. Azure Files:
    • Description: A managed file share service that uses the SMB protocol, allowing you to mount Azure Files shares concurrently from cloud or on-premises deployments.
    • Use Cases: Useful for migrating on-premises applications reliant on file shares, and for distributed file systems.
This post is licensed under CC BY 4.0 by the author.