Spark for Python developers : a concise guide to implementing Spark big data analytics for Python developers and building a real-time and insightful trend tracker data-intensive app. (2015)
- Record Type:
- Book
- Title:
- Spark for Python developers : a concise guide to implementing Spark big data analytics for Python developers and building a real-time and insightful trend tracker data-intensive app. (2015)
- Main Title:
- Spark for Python developers : a concise guide to implementing Spark big data analytics for Python developers and building a real-time and insightful trend tracker data-intensive app
- Further Information:
- Note: Amit Nandi.
- Authors:
- Nandi, Amit
- Contents:
- Cover; Copyright; Credits; About the Author; Acknowledgment; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Setting Up a Spark Virtual Environment; Understanding the architecture of data-intensive applications; Infrastructure layer; Persistence layer; Integration layer; Analytics layer; Engagement layer; Understanding Spark; Spark libraries; PySpark in action; Resilient Distributed Dataset; Understanding Anaconda; Setting up the Spark powered environment; Setting up an Oracle VirtualBox with Ubuntu; Installing Anaconda with Python 2.7; Installing Java 8; Installing Spark; Enabling IPython Notebook; Building our first app with PySpark; Virtualizing the environment with Vagrant; Moving to the cloud; Deploying apps in Amazon Web Services; Virtualizing the environment with Docker; Summary; Chapter 2: Building Batch and Streaming Apps with Spark; Architecting data-intensive apps; Processing data at rest; Processing data in motion; Exploring data interactively; Connecting to social networks; Getting Twitter data; Getting GitHub data; Getting Meetup data; Analyzing the data; Discovering the anatomy of tweets; Exploring the GitHub world;
Understanding the community through Meetup; Previewing our app; Summary; Chapter 3: Juggling Data with Spark; Revisiting the data-intensive app architecture; Serializing and deserializing data; Harvesting and storing data; Persisting data in CSV; Persisting data in JSON; Setting up MongoDB; Installing the MongoDB server and client; Running the MongoDB server; Running the Mongo client; Installing the PyMongo driver; Creating the Python client for MongoDB; Harvesting data from Twitter; Exploring data using Blaze; Transferring data using Odo; Exploring data using Spark SQL; Understanding Spark dataframes; Understanding the Spark SQL query optimizer; Loading and processing CSV files with Spark SQL; Querying MongoDB from Spark SQL; Summary; Chapter 4: Learning from Data Using Spark; Contextualizing Spark MLlib in the app architecture; Classifying Spark MLlib algorithms; Supervised and unsupervised learning; Additional learning algorithms; Spark MLlib data types; Machine learning workflows and data flows; Supervised machine learning workflows; Unsupervised machine learning workflows; Clustering the Twitter dataset; Applying SciKit-Learn on the Twitter dataset; Preprocessing the dataset; Running the clustering algorithm; Evaluating the model and the results; Building machine learning pipelines; Summary; Chapter 5: Streaming Live Data with Spark; Laying the foundations of streaming architecture; Spark streaming inner working; Going under the hood of Spark Streaming; Building in fault tolerance; Processing live data with TCP sockets; Setting up TCP sockets; Processing live data; Manipulating Twitter data in real time; Processing Tweets in real time from the Twitter firehose; Building a reliable and scalable streaming app; Setting up Kafka; …
- Publisher Details:
- Birmingham : Packt Publishing
- Publication Date:
- 2015
- Extent:
- 1 online resource, illustrations
- Subjects:
- 005.133
COMPUTERS -- Enterprise Applications -- Business Intelligence Tools
Python (Computer program language)
SPARK (Computer program language)
COMPUTERS / Programming Languages / Python
COMPUTERS / Programming Languages / General
COMPUTERS -- Data Processing
Electronic books
- Languages:
- English
- ISBNs:
- 9781784397371
1784397377
1784399698
9781784399696
- Related ISBNs:
- 9781784399696
- Notes:
- Note: Online resource; title from PDF title page (EBSCO, viewed May 3, 2016)
- Access Rights:
- Legal Deposit; Only available on premises controlled by the deposit library and to one user at any one time; The Legal Deposit Libraries (Non-Print Works) Regulations (UK).
- Access Usage:
- Restricted: Printing from this resource is governed by The Legal Deposit Libraries (Non-Print Works) Regulations (UK) and UK copyright law currently in force.
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library HMNTS - ELD.DS.88258
- Ingest File:
- 01_078.xml