Learning Apache Spark 2. (2017)
- Record Type:
- Book
- Title:
- Learning Apache Spark 2. (2017)
- Main Title:
- Learning Apache Spark 2
- Other Names:
- Abbasi, Muhammad Asif
- Contents:
- Cover; Copyright; Credits; About the Author; About the Reviewers; www.packtpub.com; Customer Feedback; Table of Contents; Preface; Chapter 1: Architecture and Installation; Apache Spark architecture overview; Spark-core; Spark SQL; Spark streaming; MLlib; GraphX; Spark deployment; Installing Apache Spark; Writing your first Spark program; Scala shell examples; Python shell examples; Spark architecture; High level overview; Driver program; Cluster Manager; Worker; Executors; Tasks; SparkContext; Spark Session; Apache Spark cluster manager types; Building standalone applications with Apache Spark; Submitting applications; Deployment strategies; Running Spark examples; Building your own programs; Brain teasers; References; Summary; Chapter 2: Transformations and Actions with Spark RDDs; What is an RDD?; Constructing RDDs; Parallelizing existing collections; Referencing external data source; Operations on RDD; Transformations; Actions; Passing functions to Spark (Scala); Anonymous functions; Static singleton functions; Passing functions to Spark (Java); Passing functions to Spark (Python); Transformations; Map(func); Filter(func); FlatMap(func); Sample(withReplacement, fraction, seed); Set operations in Spark; Distinct(); Intersection(); Union(); Subtract(); Cartesian(); Actions; Reduce(func); Collect(); Count(); Take(n); First(); SaveAsXXFile(); foreach(func); PairRDDs; Creating PairRDDs; PairRDD transformations; reduceByKey(func); GroupByKey(func); reduceByKey vs. groupByKey -- Performance Implications; CombineByKey(func); Transformations on two PairRDDs; Actions available on PairRDDs; Shared variables; Broadcast variables; Accumulators; References; Summary; Chapter 3: ETL with Spark; What is ETL?; Extraction; Loading; Transformation; How is Spark being used?; Commonly Supported File Formats; Text Files; CSV and TSV Files; Writing CSV files; Tab Separated Files; JSON files; Sequence files; Object files; Commonly supported file systems; Working with HDFS; Working with Amazon S3; Structured Data sources and Databases; Working with NoSQL Databases; Working with Cassandra; Obtaining a Cassandra table as an RDD; Saving data to Cassandra; Working with HBase; Bulk Delete example; Map Partition Example; Working with MongoDB; Connection to MongoDB; Writing to MongoDB; Loading data from MongoDB; Working with Apache Solr; Importing the JAR File via Spark-shell; Connecting to Solr via DataFrame API; Connecting to Solr via RDD; References; Summary; Chapter 4: Spark SQL; What is Spark SQL?; What is DataFrame API?; What is DataSet API?; What's new in Spark 2.0?; Under the hood -- catalyst optimizer; Solution 1; Solution 2; The SparkSession; Creating a SparkSession; Creating a DataFrame; Manipulating a DataFrame; Scala DataFrame manipulation -- examples; Python DataFrame manipulation -- examples; R DataFrame manipulation -- examples; Java DataFrame manipulation -- examples …
- Publisher Details:
- Place of publication not identified : Packt Publishing
- Publication Date:
- 2017
- Extent:
- 1 online resource
- Subjects:
- 006.3
COMPUTERS -- Data Processing
COMPUTERS -- Databases -- Data Mining
COMPUTERS -- Desktop Applications -- Databases
Electronic books
- Languages:
- English
- ISBNs:
- 1785889583
9781785889585
- Related ISBNs:
- 1785885138
- Notes:
- Note: Description based on print version record.
- Access Rights:
- Legal Deposit; Only available on premises controlled by the deposit library and to one user at any one time; The Legal Deposit Libraries (Non-Print Works) Regulations (UK).
- Access Usage:
- Restricted: Printing from this resource is governed by The Legal Deposit Libraries (Non-Print Works) Regulations (UK) and UK copyright law currently in force.
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD.DS.134130
- Ingest File:
- 01_001.xml