Introduction to data science [electronic resource] : a Python approach to concepts, techniques and applications

Material type
E-Book (held)
Personal Author
Igual, Laura. Seguí, Santi.
Title Statement
Introduction to data science [electronic resource] : a Python approach to concepts, techniques and applications / Laura Igual, Santi Seguí.
Publication, Distribution, etc
Cham : Springer, c2017.
Physical Medium
1 online resource (xiv, 218 p.) : ill.
Series Statement
Undergraduate Topics in Computer Science, 1863-7310
ISBN
9783319500164 9783319500171 (e-book)
Summary
This accessible and classroom-tested textbook/reference presents an introduction to the fundamentals of the emerging and interdisciplinary field of data science. The coverage spans key concepts adopted from statistics and machine learning, useful techniques for graph analysis and parallel programming, and the practical application of data science for such tasks as building recommender systems or performing sentiment analysis. Topics and features: provides numerous practical case studies using real-world data throughout the book; supports understanding through hands-on experience of solving data science problems using Python; describes techniques and tools for statistical analysis, machine learning, graph analysis, and parallel programming; reviews a range of applications of data science, including recommender systems and sentiment analysis of text data; provides supplementary code resources and data at an associated website. This practically-focused textbook provides an ideal introduction to the field for upper-tier undergraduate and beginning graduate students from computer science, mathematics, statistics, and other technical disciplines. The work is also eminently suitable for professionals on continuous education short courses, and for researchers following self-study courses. Dr. Laura Igual is an Associate Professor at the Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Spain. Dr. Santi Seguí is an Assistant Professor at the same institution.
General Note
Title from e-Book title page.  
Content Notes
Introduction to Data Science -- Toolboxes for Data Scientists -- Descriptive statistics -- Statistical Inference -- Supervised Learning -- Regression Analysis -- Unsupervised Learning -- Network Analysis -- Recommender Systems -- Statistical Natural Language Processing for Sentiment Analysis -- Parallel Computing.
Bibliography, Etc. Note
Includes bibliographical references and index.
Other Available Formats
Issued also as a book.  
Subject Added Entry-Topical Term
Quantitative research. Python (Computer program language).
000 00000cam u2200205 a 4500
001 000045992210
005 20190805142708
006 m d
007 cr
008 190726s2017 sz a ob 001 0 eng d
020 ▼a 9783319500164
020 ▼a 9783319500171 (e-book)
040 ▼a 211009 ▼c 211009 ▼d 211009
050 4 ▼a QA76.9.D343
082 0 4 ▼a 001.42 ▼2 23
084 ▼a 001.42 ▼2 DDCK
090 ▼a 001.42
100 1 ▼a Igual, Laura.
245 1 0 ▼a Introduction to data science ▼h [electronic resource] : ▼b a Python approach to concepts, techniques and applications / ▼c Laura Igual, Santi Seguí.
260 ▼a Cham : ▼b Springer, ▼c c2017.
300 ▼a 1 online resource (xiv, 218 p.) : ▼b ill.
490 1 ▼a Undergraduate Topics in Computer Science, ▼x 1863-7310
500 ▼a Title from e-Book title page.
504 ▼a Includes bibliographical references and index.
505 0 ▼a Introduction to Data Science -- Toolboxes for Data Scientists -- Descriptive statistics -- Statistical Inference -- Supervised Learning -- Regression Analysis -- Unsupervised Learning -- Network Analysis -- Recommender Systems -- Statistical Natural Language Processing for Sentiment Analysis -- Parallel Computing.
520 ▼a This accessible and classroom-tested textbook/reference presents an introduction to the fundamentals of the emerging and interdisciplinary field of data science. The coverage spans key concepts adopted from statistics and machine learning, useful techniques for graph analysis and parallel programming, and the practical application of data science for such tasks as building recommender systems or performing sentiment analysis. Topics and features: provides numerous practical case studies using real-world data throughout the book; supports understanding through hands-on experience of solving data science problems using Python; describes techniques and tools for statistical analysis, machine learning, graph analysis, and parallel programming; reviews a range of applications of data science, including recommender systems and sentiment analysis of text data; provides supplementary code resources and data at an associated website. This practically-focused textbook provides an ideal introduction to the field for upper-tier undergraduate and beginning graduate students from computer science, mathematics, statistics, and other technical disciplines. The work is also eminently suitable for professionals on continuous education short courses, and for researchers following self-study courses. Dr. Laura Igual is an Associate Professor at the Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Spain. Dr. Santi Seguí is an Assistant Professor at the same institution.
530 ▼a Issued also as a book.
538 ▼a Mode of access: World Wide Web.
650 0 ▼a Quantitative research.
650 0 ▼a Python (Computer program language).
700 1 ▼a Seguí, Santi.
830 0 ▼a Undergraduate Topics in Computer Science.
856 4 0 ▼u https://oca.korea.ac.kr/link.n2s?url=https://doi.org/10.1007/978-3-319-50017-1
945 ▼a KLPA
991 ▼a E-Book (held)

Holdings Information

No.
1
Location
Main Library/e-Book Collection
Call Number
CR 001.42
Accession No.
E14016062
Availability
Not available for loan (reference room)

Contents information

Table of Contents

1 Introduction to Data Science = 1
 1.1 What is Data Science? = 1
 1.2 About This Book = 3
2 Toolboxes for Data Scientists = 5
 2.1 Introduction = 5
 2.2 Why Python? = 6
 2.3 Fundamental Python Libraries for Data Scientists = 6
  2.3.1 Numeric and Scientific Computation : NumPy and SciPy = 7
  2.3.2 SCIKIT-Learn : Machine Learning in Python = 7
  2.3.3 PANDAS : Python Data Analysis Library = 7
 2.4 Data Science Ecosystem Installation = 7
 2.5 Integrated Development Environments (IDE) = 8
  2.5.1 Web Integrated Development Environment (WIDE) : Jupyter = 9
 2.6 Get Started with Python for Data Scientists = 10
  2.6.1 Reading = 14
  2.6.2 Selecting Data = 16
  2.6.3 Filtering Data = 17
  2.6.4 Filtering Missing Values = 17
  2.6.5 Manipulating Data = 18
  2.6.6 Sorting = 22
  2.6.7 Grouping Data = 23
  2.6.8 Rearranging Data = 24
  2.6.9 Ranking Data = 25
  2.6.10 Plotting = 26
 2.7 Conclusions = 28
3 Descriptive Statistics = 29
 3.1 Introduction = 29
 3.2 Data Preparation = 30
  3.2.1 The Adult Example = 30
 3.3 Exploratory Data Analysis = 32
  3.3.1 Summarizing the Data = 32
  3.3.2 Data Distributions = 36
  3.3.3 Outlier Treatment = 38
  3.3.4 Measuring Asymmetry : Skewness and Pearson's Median Skewness Coefficient = 41
  3.3.5 Continuous Distribution = 42
  3.3.6 Kernel Density = 44
 3.4 Estimation = 46
  3.4.1 Sample and Estimated Mean, Variance and Standard Scores = 46
  3.4.2 Covariance, and Pearson's and Spearman's Rank Correlation = 47
 3.5 Conclusions = 50
  References = 50
4 Statistical Inference = 51
 4.1 Introduction = 51
 4.2 Statistical Inference : The Frequentist Approach = 52
 4.3 Measuring the Variability in Estimates = 52
  4.3.1 Point Estimates = 53
  4.3.2 Confidence Intervals = 56
 4.4 Hypothesis Testing = 59
  4.4.1 Testing Hypotheses Using Confidence Intervals = 60
  4.4.2 Testing Hypotheses Using p-Values = 61
 4.5 But Is the Effect E Real? = 64
 4.6 Conclusions = 64
  References = 65
5 Supervised Learning = 67
 5.1 Introduction = 67
 5.2 The Problem = 68
 5.3 First Steps = 69
 5.4 What Is Learning? = 78
 5.5 Learning Curves = 79
 5.6 Training, Validation and Test = 82
 5.7 Two Learning Models = 86
  5.7.1 Generalities Concerning Learning Models = 86
  5.7.2 Support Vector Machines = 87
  5.7.3 Random Forest = 90
 5.8 Ending the Learning Process = 91
 5.9 A Toy Business Case = 92
 5.10 Conclusion = 95
  Reference = 96
6 Regression Analysis = 97
 6.1 Introduction = 97
 6.2 Linear Regression = 98
  6.2.1 Simple Linear Regression = 98
  6.2.2 Multiple Linear Regression and Polynomial Regression = 103
  6.2.3 Sparse Model = 104
 6.3 Logistic Regression = 110
 6.4 Conclusions = 113
  References = 114
7 Unsupervised Learning = 115
 7.1 Introduction = 115
 7.2 Clustering = 116
  7.2.1 Similarity and Distances = 117
  7.2.2 What Constitutes a Good Clustering? Defining Metrics to Measure Clustering Quality = 117
  7.2.3 Taxonomies of Clustering Techniques = 120
 7.3 Case Study = 132
 7.4 Conclusions = 138
  References = 139
8 Network Analysis = 141
 8.1 Introduction = 141
 8.2 Basic Definitions in Graphs = 142
 8.3 Social Network Analysis = 144
  8.3.1 Basics in NetworkX = 144
  8.3.2 Practical Case : Facebook Dataset = 145
 8.4 Centrality = 147
  8.4.1 Drawing Centrality in Graphs = 152
  8.4.2 PageRank = 154
 8.5 Ego-Networks = 157
 8.6 Community Detection = 162
 8.7 Conclusions = 163
  References = 164
9 Recommender Systems = 165
 9.1 Introduction = 165
 9.2 How Do Recommender Systems Work? = 166
  9.2.1 Content-Based Filtering = 166
  9.2.2 Collaborative Filtering = 167
  9.2.3 Hybrid Recommenders = 167
 9.3 Modeling User Preferences = 167
 9.4 Evaluating Recommenders = 168
 9.5 Practical Case = 169
  9.5.1 MovieLens Dataset = 169
  9.5.2 User-Based Collaborative Filtering = 171
 9.6 Conclusions = 179
  References = 179
10 Statistical Natural Language Processing for Sentiment Analysis = 181
 10.1 Introduction = 181
 10.2 Data Cleaning = 182
 10.3 Text Representation = 185
  10.3.1 Bi-Grams and n-Grams = 190
 10.4 Practical Cases = 191
 10.5 Conclusions = 196
  References = 196
11 Parallel Computing = 199
 11.1 Introduction = 199
 11.2 Architecture = 200
  11.2.1 Getting Started = 201
  11.2.2 Connecting to the Cluster (The Engines) = 202
 11.3 Multicore Programming = 203
  11.3.1 Direct View of Engines = 203
  11.3.2 Load-Balanced View of Engines = 206
 11.4 Distributed Computing = 207
 11.5 A Real Application : New York Taxi Trips = 208
  11.5.1 A Direct View Non-Blocking Proposal = 209
  11.5.2 Results = 212
 11.6 Conclusions = 214
  References = 215
Index = 217
