Textual data science with R

Material type
Personal Author
Bécue-Bertaut, Monica, author.
Title Statement
Textual data science with R / Monica Bécue-Bertaut.
Publication, Distribution, etc
Boca Raton, FL : CRC Press, Taylor & Francis Group, c2018.
Physical Medium
xvii, 194 p. : ill. ; 25 cm.
ISBN
9781138626911 (hardback : alk. paper) 9781315212661 (ebook)
Bibliography, Etc. Note
Includes bibliographical references (p. 189-190) and index.
Subject Added Entry-Topical Term
Data processing. Corpora (Linguistics) -- Data processing.
000 00000cam u2200205 a 4500
001 000046057425
005 20201130103413
008 201126s2018 flua b 001 0 eng d
010 ▼a 2018054020
020 ▼a 9781138626911 (hardback : alk. paper)
020 ▼a 9781315212661 (ebook)
020 ▼z 9781351816359 ▼q ebook
035 ▼a (KERIS)REF000018868189
040 ▼a DLC ▼b eng ▼c DLC ▼e rda ▼d DLC ▼d 211009
042 ▼a pcc
050 0 0 ▼a HF5548.2 ▼b .B36116 2018
082 0 0 ▼a 410.1/8802855133 ▼2 23
084 ▼a 410.1880285 ▼2 DDCK
090 ▼a 410.1880285 ▼b B398t
100 1 ▼a Bécue-Bertaut, Monica, ▼e author.
245 1 0 ▼a Textual data science with R / ▼c Monica Bécue-Bertaut.
260 ▼a Boca Raton, FL : ▼b CRC Press, Taylor & Francis Group, ▼c c2018.
300 ▼a xvii, 194 p. : ▼b ill. ; ▼c 25 cm.
336 ▼a text ▼b txt ▼2 rdacontent
337 ▼a unmediated ▼b n ▼2 rdamedia
338 ▼a volume ▼b nc ▼2 rdacarrier
504 ▼a Includes bibliographical references (p. 189-190) and index.
650 0 ▼a Data processing.
650 0 ▼a Corpora (Linguistics) ▼x Data processing.
945 ▼a KLPA

Holdings Information

No. | Location | Call Number | Accession No. | Availability | Due Date
1 | Science & Engineering Library/Sci-Info(Stacks2) | 410.1880285 B398t | 121255381 | Available |

Contents information

Table of Contents

1. Encoding: from a corpus to statistical tables

Textual and contextual data

Textual data

Contextual data

Documents and aggregate documents

Examples and notation

Choosing textual units

Graphical forms



Repeated segments

In practice


Unique spellings

Partially-automated preprocessing

Word selection

Word and segment indexes

The Life UK corpus: preliminary results

Verbal content through word and repeated segment indexes

Univariate description of contextual variables

A note on the frequency range

Implementation with the Xplortext package

In summary

2. Correspondence analysis of textual data

Data and goals

Correspondence analysis: a tool for linguistic data analysis

Data: a small example


Associations between documents and words

Profile comparisons

Independence of documents and words

The χ² test

Association rates between columns and words

Active row and column clouds

Row and column profile spaces

Distributional equivalence and the χ² distance

Inertia of a cloud

Fitting document and word clouds

Factorial axes

Visualizing rows and columns

Category representation

Word representation

Transition formulas

Superimposed representation of rows and columns

Interpretation aids

Eigenvalues and representation quality of the clouds

Contribution of documents and words to axis inertia

Representation quality of a point

Supplementary rows and columns

Supplementary tables

Supplementary frequency rows and columns

Supplementary quantitative and qualitative variables

Validating the visualization

Interpretation scheme for textual CA results

Implementation with Xplortext

Summary of the CA approach

3. Applications of correspondence analysis

Choosing the level of detail for analyses

Correspondence analysis on aggregate free text answers

Data and objectives

Word selection

CA on the aggregate table

Document representation

Word representation

Simultaneous interpretation of the plots

Supplementary elements

Supplementary words

Supplementary repeated segments

Supplementary categories

Implementation with Xplortext

Direct analysis

Data and objectives

The main features of direct analysis

Direct analysis of the culture question

Implementation with Xplortext

4. Clustering in textual analysis

Clustering documents

Dissimilarity measures between documents

Measuring partition quality

Document clusters in the factorial space

Partition quality

Dissimilarity measures between document clusters

The single-linkage method

The complete-linkage method

Ward's method

Agglomerative hierarchical clustering

Hierarchical tree construction algorithm

Selecting the final partition

Interpreting clusters

Direct partitioning

Combining clustering methods

Consolidating partitions

Direct partitioning followed by AHC

A procedure for combining CA and clustering

Example: joint use of CA and AHC

Data and objectives

Data preprocessing using CA

Constructing the hierarchical tree

Choosing the final partition

Contiguity-constrained hierarchical clustering

Principles and algorithm

AHC of age groups with a chronological constraint

Implementation with Xplortext

Example: clustering free text answers

Data and objectives

Data preprocessing

CA: eigenvalues and total inertia

Interpreting the first axes

AHC: building the tree and choosing the final partition

Describing cluster features

Lexical features of clusters

Describing clusters in terms of characteristic words

Describing clusters in terms of characteristic documents

Describing clusters using contextual variables

Describing clusters using contextual qualitative variables

Describing clusters using quantitative contextual variables

Implementation with Xplortext

Summary of the use of AHC on factorial coordinates coming from CA

5. Lexical characterization of parts of a corpus

Characteristic words

Characteristic words and CA

Characteristic words and clustering

Clustering based on verbal content

Clustering based on contextual variables

Hierarchical words

Characteristic documents

Example: characteristic elements and CA

Characteristic words for the categories

Characteristic words and factorial planes

Documents that characterize categories

Characteristic words in addition to clustering

Implementation with Xplortext

6. Multiple factor analysis for textual analysis

Multiple tables in textual analysis

Data and objectives

Data preprocessing

Problems posed by lemmatization

Description of the corpora data

Indexes of the most frequent words



Introduction to MFACT

The limits of CA on multiple contingency tables

How MFACT works

Integrating contextual variables

Analysis of multilingual free text answers

MFACT: eigenvalues of the global analysis

Representation of documents and words

Superimposed representation of the global and partial configurations

Links between the axes of the global analysis and the separate analyses

Representation of the groups of words

Implementation with Xplortext

Simultaneous analysis of two open-ended questions: impact of lemmatization


Preliminary steps

MFACT on the left and right: lemmatized or nonlemmatized

Implementation with Xplortext

Other applications of MFACT in textual analysis

MFACT summary

7. Applications and analysis workflows

General rules for presenting results

Analyzing bibliographic databases

Introduction to the lupus data

The corpus

Exploratory analysis of the corpus

CA of the documents × words table

The eigenvalues

Meta-keys and doc-keys

Analysis of the year-aggregate table

Eigenvalues and CA of the lexical table

Chronological study of drug names

Implementation with Xplortext

Conclusions from the study

Badinter's speech: a discursive strategy

Methods

Breaking up the corpus into documents

The speech trajectory unveiled by CA


Argument flow

Conclusions on the study of Badinter's speech

Implementation with Xplortext

Political speeches

Data and objectives



Data preprocessing

Lexicometric characteristics of the speeches and lexical table coding

Eigenvalues and Cramér's V

Speech trajectory

Word representation


Hierarchical structure of the corpus


Implementation with Xplortext

Corpus of sensory descriptions



Eight Catalan wines


Verbal categorization

Encoding the data


Statistical methodology

MFACT and constructing the mean configuration

Determining consensual words


Data preprocessing

Some initial results

Individual configurations

MFACT: directions of inertia common to the majority of groups

MFACT: representing words and documents on the first plane

Word contributions

MFACT: group representation

Consensual words

