Modern data science with R / 2nd ed

Material type
Monograph
Personal Author
Baumer, Benjamin, author. Kaplan, Daniel, author. Horton, Nicholas J., author.
Title Statement
Modern data science with R / Benjamin S. Baumer, Daniel T. Kaplan, Nicholas J. Horton.
Edition Statement
2nd ed.
Publication, Distribution, etc
Boca Raton : CRC Press, 2021.
Physical Medium
xvii, 631 p. : ill. (some col.) ; 26 cm.
Series Statement
Chapman and Hall/CRC Press texts in statistical science
ISBN
9780367191498 (hardback) 9780429200717 (ebook)
Summary
"Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions. The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice. From a review of the first edition: "Modern Data Science with R ... is rich with examples and is guided by a strong narrative voice. What's more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics" (The American Statistician)"--
Bibliography, Etc. Note
Includes bibliographical references (p.573-588) and indexes.
Subject Added Entry-Topical Term
Data mining. Big data. Mathematical statistics --Data processing. R (Computer program language).
000 00000cam u2200205 a 4500
001 000046082056
005 20210608140654
008 210607s2021 flua b 001 0 eng d
010 ▼a 2020052396
020 ▼a 9780367191498 (hardback)
020 ▼a 9780429200717 (ebook)
035 ▼a (KERIS)REF000019455286
040 ▼a DLC ▼b eng ▼e rda ▼c DLC ▼d 211009
042 ▼a pcc
050 0 0 ▼a QA76.9.D343 ▼b B38 2021
082 0 0 ▼a 006.3/12 ▼2 23
084 ▼a 0016.312 ▼2 DDCK
090 ▼a 006.312 ▼b B347m2
100 1 ▼a Baumer, Benjamin, ▼e author.
245 1 0 ▼a Modern data science with R / ▼c Benjamin S. Baumer, Daniel T. Kaplan, Nicholas J. Horton.
250 ▼a 2nd ed.
260 ▼a Boca Raton : ▼b CRC Press, ▼c 2021.
264 1 ▼a Boca Raton : ▼b CRC Press, ▼c 2021.
300 ▼a xvii, 631 p. : ▼b ill. (some col.) ; ▼c 26 cm.
336 ▼a text ▼b txt ▼2 rdacontent
337 ▼a unmediated ▼b n ▼2 rdamedia
338 ▼a volume ▼b nc ▼2 rdacarrier
490 0 ▼a Chapman and Hall/CRC Press texts in statistical science
504 ▼a Includes bibliographical references (p.573-588) and indexes.
520 ▼a "Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions. The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice. From a review of the first edition: "Modern Data Science with R ... is rich with examples and is guided by a strong narrative voice. What's more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics" (The American Statistician)"-- ▼c Provided by publisher.
650 0 ▼a Data mining.
650 0 ▼a Big data.
650 0 ▼a Mathematical statistics ▼x Data processing.
650 0 ▼a R (Computer program language).
700 1 ▼a Kaplan, Daniel, ▼e author.
700 1 ▼a Horton, Nicholas J., ▼e author.
945 ▼a KLPA

Holdings Information

No.: 1
Location: Main Library/Western Books
Call Number: 006.312 B347m2
Accession No.: 111849703
Availability: Available

Contents information

Table of Contents

Preface


Background and motivation


Intended audience


Key features of this book


Changes in the second edition


Key role of technology


How to use this book


Acknowledgments





Part I: Introduction to Data Science





1. Prologue: Why data science?


What is data science?


Case study: The evolution of sabermetrics


Datasets


Further resources





2. Data visualization


The federal election cycle


Composing data graphics


Importance of data graphics: Challenger


Creating effective presentations


The wider world of data visualization


Further resources


Exercises


Supplementary exercises





3. A grammar for graphics


A grammar for data graphics


Canonical data graphics in R


Extended example: Historical baby names


Further resources


Exercises


Supplementary exercises





4. Data wrangling on one table


A grammar for data wrangling


Extended example: Ben's time with the Mets


Further resources


Exercises


Supplementary exercises





5. Data wrangling on multiple tables


inner_join()


left_join()


Extended example: Manny Ramirez


Further resources


Exercises


Supplementary exercises





6. Tidy data


Tidy data


Reshaping data


Naming conventions


Data intake


Further resources


Exercises


Supplementary exercises





7. Iteration


Vectorized operations


Using across() with dplyr functions


The map() family of functions


Iterating over a one-dimensional vector


Iteration over subgroups


Simulation


Extended example: Factors associated with BMI


Further resources


Exercises


Supplementary exercises





8. Data science ethics


Introduction


Truthful falsehoods


Role of data science in society


Some settings for professional ethics


Some principles to guide ethical action


Algorithmic bias


Data and disclosure


Reproducibility


Ethics, collectively


Professional guidelines for ethical conduct


Further resources


Exercises


Supplementary exercises





Part II: Statistics and Modeling





9. Statistical foundations


Samples and populations


Sample statistics


The bootstrap


Outliers


Statistical models: Explaining variation


Confounding and accounting for other factors


The perils of p-values


Further resources


Exercises


Supplementary exercises





10. Predictive modeling


Predictive modeling


Simple classification models


Evaluating models


Extended example: Who has diabetes?


Further resources


Exercises


Supplementary exercises





11. Supervised learning


Non-regression classifiers


Parameter tuning


Example: Evaluation of income models redux


Extended example: Who has diabetes this time?


Regularization


Further resources


Exercises


Supplementary exercises





12. Unsupervised learning


Clustering


Dimension reduction


Further resources


Exercises


Supplementary exercises





13. Simulation


Reasoning in reverse


Extended example: Grouping cancers


Randomizing functions


Simulating variability


Random networks


Key principles of simulation


Further resources


Exercises


Supplementary exercises





Part III: Topics in Data Science





14. Dynamic and customized data graphics


Rich Web content using D3.js and htmlwidgets


Animation


Flexdashboard


Interactive Web apps with Shiny


Customization of ggplot2 graphics


Extended example: Hot dog eating


Further resources


Exercises


Supplementary exercises





15. Database querying using SQL


From dplyr to SQL


Flat-file databases


The SQL universe


The SQL data manipulation language


Extended example: FiveThirtyEight flights


SQL vs R


Further resources


Exercises


Supplementary exercises





16. Database administration


Constructing efficient SQL databases


Changing SQL data


Extended example: Building a database


Scalability


Further resources


Exercises


Supplementary exercises





17. Working with geospatial data


Motivation: What's so great about geospatial data?


Spatial data structures


Making maps


Extended example: Congressional districts


Effective maps: How (not) to lie


Projecting polygons


Playing well with others


Further resources


Exercises


Supplementary exercises





18. Geospatial computations


Geospatial operations


Geospatial aggregation


Geospatial joins


Extended example: Trail elevations at MacLeish


Further resources


Exercises


Supplementary exercises





19. Text as data


Regular expressions using Macbeth


Extended example: Analyzing textual data from arXiv.org


Ingesting text


Further resources


Exercises


Supplementary exercises





20. Network science


Introduction to network science


Extended example: Six degrees of Kristen Stewart


PageRank


Extended example: men's college basketball


Further resources


Exercises


Supplementary exercises





21. Epilogue: Towards "big data"


Notions of big data


Tools for bigger data


Alternatives to R


Closing thoughts


Further resources





Part IV: Appendices





A Packages used in this book


The mdsr package


Other packages


Further resources


B Introduction to R and RStudio


Installation


Learning R


Fundamental structures and objects


Add-ons: Packages


Further resources


Exercises


Supplementary exercises


C Algorithmic thinking


Introduction


Simple example


Extended example: Law of large numbers


Non-standard evaluation


Debugging and defensive coding


Further resources


Exercises


Supplementary exercises


D Reproducible analysis and workflow


Scriptable statistical computing


Reproducible analysis with R Markdown


Projects and version control


Further resources


Exercises


Supplementary exercises


E Regression modeling


Multiple regression


Inference for regression


Assumptions underlying regression


Logistic regression


Further resources


Exercises


Supplementary exercises


F Setting up a database server


SQLite


MySQL


PostgreSQL


Connecting to SQL
