PySpark SQL Recipes : with HiveQL, Dataframe and Graphframes. (2019)
- Record Type:
- Book
- Title:
- PySpark SQL Recipes : with HiveQL, Dataframe and Graphframes. (2019)
- Main Title:
- PySpark SQL Recipes : with HiveQL, Dataframe and Graphframes
- Further Information:
- Note: Raju Kumar Mishra and Sundar Rajan Raman.
- Authors:
- Mishra, Raju Kumar
- Other Names:
- Raman, Sundar Rajan
- Contents:
- Chapter 1: Introduction to PySparkSQL. Chapter goal: the reader will understand PySpark, PySparkSQL, the Catalyst Optimizer, Project Tungsten, and Hive. No. of pages: 20-30. Sub-topics: 1. PySpark 2. PySparkSQL 3. Hive 4. Catalyst 5. Project Tungsten
Chapter 2: Some time with Installation. Chapter goal: the learner will understand the installation of Spark, Hive, PostgreSQL, MySQL, MongoDB, Cassandra, etc. No. of pages: 30-40. Sub-topics: 1. Installation Spark 2. Installation Hive 3. Installation MySQL 4. Installation MongoDB
Chapter 3: IO in PySparkSQL. Chapter goal: this chapter provides recipes that enable the reader to create PySparkSQL DataFrames from different sources. No. of pages: 40-50. Sub-topics: 1. Creating DataFrame from data 2. Reading CSV file to create DataFrame 3. Reading JSON file to create DataFrame 4. Saving DataFrames to different formats
Chapter 4: Operations on PySparkSQL DataFrames. Chapter goal: the reader will learn about data filtering, data manipulation, descriptive data analysis, dealing with missing values, etc. No. of pages: 40-50. Sub-topics: 1. Data filtering 2. Data manipulation 3. Row and column manipulation
Chapter 5: Data Merging and Data Aggregation using PySparkSQL. Chapter goal: the reader will learn about data merging and aggregation using PySparkSQL. Sub-topics: 1. Data merging 2. Data aggregation
Chapter 6: SQL, NoSQL and PySparkSQL. Chapter goal: the reader will learn to run SQL and HiveQL queries on DataFrames. No. of pages: 30-40. Sub-topics: 1. Running SQL on DataFrame 2. Running HiveQL
Chapter 7: Structured Streaming. Chapter goal: the reader will understand structured streaming. No. of pages: 30-40. Sub-topics: 1. Different types of modes 2. Data aggregation in structured streaming 3. Different types of sources
Chapter 8: Optimizing PySparkSQL. Chapter goal: the reader will learn about optimizing PySparkSQL. No. of pages: 20-30. Sub-topics: Optimizing PySparkSQL
Chapter 9: GraphFrames. Chapter goal: the reader will understand graph data analysis with GraphFrames. No. of pages: 30-40. Sub-topics: 1. GraphFrame Creation 1. Page Rank 2. Breadth First Search. …
- Publisher Details:
- Place of publication not identified : Springer Nature Apress
- Publication Date:
- 2019
- Extent:
- 1 online resource
- Subjects:
- 005.13/3
COMPUTERS / General
Python (Computer program language)
SPARK (Computer program language)
Big data
Electronic books
- Languages:
- English
- ISBNs:
- 9781484243350
1484243358
- Related ISBNs:
- 9781484243343
- Notes:
- Note: Online resource; title from PDF title page (EBSCO, viewed March 26, 2019).
- Access Rights:
- Legal Deposit; Only available on premises controlled by the deposit library and to one user at any one time; The Legal Deposit Libraries (Non-Print Works) Regulations (UK).
- Access Usage:
- Restricted: Printing from this resource is governed by The Legal Deposit Libraries (Non-Print Works) Regulations (UK) and UK copyright law currently in force.
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD.DS.400644