MODERN DATA ARCHITECTURES WITH PYTHON: a practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python. (2023)
- Record Type:
- Book
- Title:
- MODERN DATA ARCHITECTURES WITH PYTHON: a practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python. (2023)
- Main Title:
- MODERN DATA ARCHITECTURES WITH PYTHON: a practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python
- Further Information:
- Note: Brian Lipp.
- Authors:
- Lipp, Brian
- Contents:
- Cover -- Title Page -- Copyright and Credits -- Dedications -- Contributors -- Table of Contents -- Preface -- Part 1: Fundamental Data Knowledge -- Chapter 1: Modern Data Processing Architecture -- Technical requirements -- Databases, data warehouses, and data lakes -- OLTP -- OLAP -- Data lakes -- Event stores -- File formats -- Data platform architecture at a high level -- Comparing the Lambda and Kappa architectures -- Lambda architecture -- Kappa architecture -- Lakehouse and Delta architectures -- Lakehouses -- The seven central tenets -- The medallion data pattern and the Delta architecture -- Data mesh theory and practice -- Defining terms -- The four principles of data mesh -- Summary -- Practical lab -- Solution -- Chapter 2: Understanding Data Analytics -- Technical requirements -- Setting up your environment -- Python -- venv -- Graphviz -- Workflow initialization -- Cleaning and preparing your data -- Duplicate values -- Working with nulls -- Using RegEx -- Outlier identification -- Casting columns -- Fixing column names -- Complex data types -- Data documentation -- diagrams -- Data lineage graphs -- Data modeling patterns -- Relational -- Dimensional modeling -- Key terms -- OBT -- Practical lab -- Loading the problem data -- Solution -- Summary -- Part 2: Data Engineering Toolset -- Chapter 3: Apache Spark Deep Dive -- Technical requirements -- Setting up your environment -- Python, AWS, and Databricks -- Databricks CLI -- Cloud data storage -- Object storage -- Relational -- NoSQL -- Spark architecture -- Introduction to Apache Spark -- Key components -- Working with partitions -- Shuffling partitions -- Caching -- Broadcasting -- Job creation pipeline -- Delta Lake -- Transaction log -- Grouping tables with databases -- Table -- Adding speed with Z-ordering -- Bloom filters -- Practical lab -- Problem 1 -- Problem 2 -- Problem 3 -- Solution -- Summary -- Chapter 4: Batch and Stream Data Processing Using PySpark -- Technical requirements -- Setting up your environment -- Python, AWS, and Databricks -- Databricks CLI -- Batch processing -- Partitioning -- Data skew -- Reading data -- Spark schemas -- Making decisions -- Removing unwanted columns -- Working with data in groups -- The UDF -- Stream processing -- Reading from disk -- Debugging -- Writing to disk -- Batch stream hybrid -- Delta streaming -- Batch processing in a stream -- Practical lab -- Setup -- Creating fake data -- Problem 1 -- Problem 2 -- Problem 3 -- Solution -- Solution 1 -- Solution 2 -- Solution 3 -- Summary -- Chapter 5: Streaming Data with Kafka -- Technical requirements -- Setting up your environment -- Python, AWS, and Databricks -- Databricks CLI -- Confluent Kafka -- Signing up -- Kafka architecture -- Topics -- Partitions -- Brokers -- Producers -- Consumers -- Schema Registry -- Kafka Connect -- Spark and Kafka -- Practical lab -- Solution -- Summary …
- Publisher Details:
- Birmingham, UK : Packt Publishing Ltd
- Publication Date:
- 2023
- Extent:
- 1 online resource
- Subjects:
- 005.13/3
Python (Computer program language)
Data structures (Computer science)
Big data
- Languages:
- English
- ISBNs:
- 9781801076418
1801076413
- Related ISBNs:
- 1801070490
9781801070492
- Access Rights:
- Legal Deposit; Only available on premises controlled by the deposit library and to one user at any one time; The Legal Deposit Libraries (Non-Print Works) Regulations (UK).
- Access Usage:
- Restricted: Printing from this resource is governed by The Legal Deposit Libraries (Non-Print Works) Regulations (UK) and UK copyright law currently in force.
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD.DS.806526
- Ingest File:
- 21_018.xml