Hands-on big data analytics with PySpark : analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs. (2019)

Record Type:: Book
Title:: Hands-on big data analytics with PySpark : analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs. (2019)
Main Title:: Hands-on big data analytics with PySpark : analyze large datasets and discover techniques for testing, immunizing, and parallelizing Spark jobs
Further Information:: Note: Rudy Lai, Bartłomiej Potaczek.
Authors:: Lai, Rudy
Potaczek, Bartłomiej
Contents:: Cover; Title Page; Copyright and Credits; About Packt; Contributors; Table of Contents; Preface; Chapter 1: Pyspark and Setting up Your Development Environment; An overview of PySpark; Spark SQL; Setting up Spark on Windows and PySpark; Core concepts in Spark and PySpark; SparkContext; Spark shell; SparkConf; Summary; Chapter 2: Getting Your Big Data into the Spark Environment Using RDDs; Loading data on to Spark RDDs; The UCI machine learning repository; Getting the data from the repository to Spark; Getting data into Spark; Parallelization with Spark RDDs; What is parallelization?; Basics of RDD operation; Summary; Chapter 3: Big Data Cleaning and Wrangling with Spark Notebooks; Using Spark Notebooks for quick iteration of ideas; Sampling/filtering RDDs to pick out relevant data points; Splitting datasets and creating some new combinations; Summary; Chapter 4: Aggregating and Summarizing Data into Useful Reports; Calculating averages with map and reduce; Faster average computations with aggregate; Pivot tabling with key-value paired data points; Summary; Chapter 5: Powerful Exploratory Data Analysis with MLlib; Computing summary statistics with MLlib; Using Pearson and Spearman correlations to discover correlations; The Pearson correlation; The Spearman correlation; Computing Pearson and Spearman correlations; Testing our hypotheses on large datasets; Summary; Chapter 6: Putting Structure on Your Big Data with SparkSQL; Manipulating DataFrames with Spark SQL schemas; Using Spark …
Publisher Details:: Birmingham, UK : Packt Publishing
Publication Date:: 2019
Extent:: 1 online resource, illustrations
Subjects:: 004.2
SPARK (Computer program language)
Application software -- Development
Big data
Electronic data processing
Python (Computer program language)
Electronic books
Languages:: English
ISBNs:: 9781838648831
1838648836
Related ISBNs:: 9781838644130
Notes:: Note: Description based on online resource; title from title page (Safari, viewed May 9, 2019).
Access Rights:: Legal Deposit; Only available on premises controlled by the deposit library and to one user at any one time; The Legal Deposit Libraries (Non-Print Works) Regulations (UK).
Access Usage:: Restricted: Printing from this resource is governed by The Legal Deposit Libraries (Non-Print Works) Regulations (UK) and UK copyright law currently in force.
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library HMNTS - ELD.DS.410143
Ingest File:: 02_509.xml