Big data analytics with Spark : a practitioner's guide to using Spark for large-scale data processing, machine learning, and graph analytics, and high-velocity data stream processing (2015)
- Record Type:
- Book
- Title:
- Big data analytics with Spark : a practitioner's guide to using Spark for large-scale data processing, machine learning, and graph analytics, and high-velocity data stream processing (2015)
- Main Title:
- Big data analytics with Spark : a practitioner's guide to using Spark for large-scale data processing, machine learning, and graph analytics, and high-velocity data stream processing
- Further Information:
- Note: Mohammed Guller.
- Authors:
- Guller, Mohammed
- Contents:
- At a Glance; Contents; About the Author; About the Technical Reviewers; Acknowledgments; Introduction; Chapter 1: Big Data Technology Landscape; Hadoop; HDFS (Hadoop Distributed File System); MapReduce; Hive; Data Serialization; Avro; Thrift; Protocol Buffers; SequenceFile; Columnar Storage; RCFile; ORC; Parquet; Messaging Systems; Kafka; ZeroMQ; NoSQL; Cassandra; HBase; Distributed SQL Query Engine; Impala; Presto; Apache Drill; Summary; Chapter 2: Programming in Scala; Functional Programming (FP); Functions; First-Class; Composable; No Side Effects; Simple; Immutable Data Structures; Everything Is an Expression; Scala Fundamentals; Getting Started; Basic Types; Variables; Functions; Methods; Local Functions; Higher-Order Methods; Function Literals; Closures; Classes; Singletons; Case Classes; Pattern Matching; Operators; Traits; Tuples; Option Type; Collections; Sequences; Array; List; Vector; Sets; Map; Higher-Order Methods on Collection Classes; map; flatMap; filter; foreach; reduce; A Standalone Scala Application; Summary; Chapter 3: Spark Core; Overview; Key Features; Easy to Use; Fast; General Purpose; Scalable;
Fault Tolerant; Ideal Applications; Iterative Algorithms; Interactive Analysis; High-level Architecture; Workers; Cluster Managers; Driver Programs; Executors; Tasks; Application Execution; Terminology; How an Application Works; Data Sources; Application Programming Interface (API); SparkContext; Resilient Distributed Datasets (RDD); Immutable; Partitioned; Fault Tolerant; Interface; Strongly Typed; In Memory; Creating an RDD; parallelize; textFile; wholeTextFiles; sequenceFile; RDD Operations; Transformations; map; filter; flatMap; mapPartitions; union; intersection; subtract; distinct; cartesian; zip; zipWithIndex; groupBy; keyBy; sortBy; pipe; randomSplit; coalesce; repartition; sample; Transformations on RDD of key-value Pairs; keys; values; mapValues; join; leftOuterJoin; rightOuterJoin; fullOuterJoin; sampleByKey; subtractByKey; groupByKey; reduceByKey; Actions; collect; count; countByValue; first; max; min; take; takeOrdered; top; fold; reduce; Actions on RDD of key-value Pairs; countByKey; lookup; Actions on RDD of Numeric Types; mean; stdev; sum; variance; Saving an RDD; saveAsTextFile; saveAsObjectFile; saveAsSequenceFile; Lazy Operations; Action Triggers Computation; Caching; RDD Caching Methods; cache; persist; RDD Caching Is Fault Tolerant; Cache Memory Management; Spark Jobs; Shared Variables; Broadcast Variables; Accumulators; Summary; Chapter 4: Interactive Data Analysis with Spark Shell; Getting Started; Download; Extract; Run; REPL Commands; Using the Spark Shell as a Scala Shell; Number Analysis; Log Analysis; Summary; Chapter 5: Writing a Spark Application; Hello World in Spark; Compiling and Running the Application; sbt (Simple Build Tool); Build Definition File; Directory Structure; …
- Publisher Details:
- Berkeley, CA : Apress
- Publication Date:
- 2015
- Copyright Date:
- 2015
- Extent:
- 1 online resource (xxiii, 277 pages), illustrations
- Subjects:
- 005.7
Computer science
Big data
Data mining
COMPUTERS -- Database Management -- Data Mining
COMPUTERS -- Machine Theory
MATHEMATICS -- General
Computers -- Data Processing
Public administration
Information systems
Computers -- Database Management -- General
Databases
Electronic books
- Languages:
- English
- ISBNs:
- 9781484209646
1484209648
- Related ISBNs:
- 9781484209653
1484209656
- Notes:
- Note: Includes bibliographical references and index.
Note: Online resource; title from PDF title page (SpringerLink, viewed January 8, 2016).
- Access Rights:
- Legal Deposit; Only available on premises controlled by the deposit library and to one user at any one time; The Legal Deposit Libraries (Non-Print Works) Regulations (UK).
- Access Usage:
- Restricted: Printing from this resource is governed by The Legal Deposit Libraries (Non-Print Works) Regulations (UK) and UK copyright law currently in force.
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD.DS.359721
- Ingest File:
- 01_322.xml