Detail View

Textual data science with R

Material type
Monograph
Personal Author
Bécue-Bertaut, Monica, author.
Title Statement
Textual data science with R / Monica Bécue-Bertaut.
Publication, Distribution, etc
Boca Raton, FL : CRC Press, Taylor & Francis Group, c2018.
Physical Medium
xvii, 194 p. : ill. ; 25 cm.
ISBN
9781138626911 (hardback : alk. paper) 9781315212661 (ebook)
Bibliography, Etc. Note
Includes bibliographical references (p. 189-190) and index.
Subject Added Entry-Topical Term
Data processing. Corpora (Linguistics) -- Data processing.
000 00000cam u2200205 a 4500
001 000046057425
005 20201130103413
008 201126s2018 flua b 001 0 eng d
010 ▼a 2018054020
020 ▼a 9781138626911 (hardback : alk. paper)
020 ▼a 9781315212661 (ebook)
020 ▼z 9781351816359 ▼q ebook
035 ▼a (KERIS)REF000018868189
040 ▼a DLC ▼b eng ▼c DLC ▼e rda ▼d DLC ▼d 211009
042 ▼a pcc
050 0 0 ▼a HF5548.2 ▼b .B36116 2018
082 0 0 ▼a 410.1/8802855133 ▼2 23
084 ▼a 410.1880285 ▼2 DDCK
090 ▼a 410.1880285 ▼b B398t
100 1 ▼a Bécue-Bertaut, Monica, ▼e author.
245 1 0 ▼a Textual data science with R / ▼c Monica Bécue-Bertaut.
260 ▼a Boca Raton, FL : ▼b CRC Press,Taylor & Francis Group, ▼c c2018.
300 ▼a xvii, 194 p. : ▼b ill. ; ▼c 25 cm.
336 ▼a text ▼b txt ▼2 rdacontent
337 ▼a unmediated ▼b n ▼2 rdamedia
338 ▼a volume ▼b nc ▼2 rdacarrier
504 ▼a Includes bibliographical references (p. 189-190) and index.
650 0 ▼a Data processing.
650 0 ▼a Corpora (Linguistics) ▼x Data processing.
945 ▼a KLPA

Holdings Information

No.: 1
Location: Science & Engineering Library/Sci-Info(Stacks2)/
Call Number: 410.1880285 B398t
Accession No.: 121255381
Availability: Available

Contents information

Table of Contents

1. Encoding: from a corpus to statistical tables


Textual and contextual data


Textual data


Contextual data


Documents and aggregate documents


Examples and notation


Choosing textual units


Graphical forms


Lemmas


Stems


Repeated segments


In practice


Preprocessing


Unique spellings


Partially-automated preprocessing


Word selection


Word and segment indexes


The Life UK corpus: preliminary results


Verbal content through word and repeated segment indexes


Univariate description of contextual variables


A note on the frequency range


Implementation with the Xplortext package


In summary





2. Correspondence analysis of textual data


Data and goals


Correspondence analysis: a tool for linguistic data analysis


Data: a small example


Objectives


Associations between documents and words


Profile comparisons


Independence of documents and words


The χ² test


Association rates between columns and words


Active row and column clouds


Row and column profile spaces


Distributional equivalence and the χ² distance


Inertia of a cloud


Fitting document and word clouds


Factorial axes


Visualizing rows and columns


Category representation


Word representation


Transition formulas


Superimposed representation of rows and columns


Interpretation aids


Eigenvalues and representation quality of the clouds


Contribution of documents and words to axis inertia


Representation quality of a point


Supplementary rows and columns


Supplementary tables


Supplementary frequency rows and columns


Supplementary quantitative and qualitative variables


Validating the visualization


Interpretation scheme for textual CA results


Implementation with Xplortext


Summary of the CA approach





3. Applications of correspondence analysis


Choosing the level of detail for analyses


Correspondence analysis on aggregate free text answers


Data and objectives


Word selection


CA on the aggregate table


Document representation


Word representation


Simultaneous interpretation of the plots


Supplementary elements


Supplementary words


Supplementary repeated segments


Supplementary categories


Implementation with Xplortext


Direct analysis


Data and objectives


The main features of direct analysis


Direct analysis of the culture question


Implementation with Xplortext





4. Clustering in textual analysis


Clustering documents


Dissimilarity measures between documents


Measuring partition quality


Document clusters in the factorial space


Partition quality


Dissimilarity measures between document clusters


The single-linkage method


The complete-linkage method


Ward's method


Agglomerative hierarchical clustering


Hierarchical tree construction algorithm


Selecting the final partition


Interpreting clusters


Direct partitioning


Combining clustering methods


Consolidating partitions


Direct partitioning followed by AHC


A procedure for combining CA and clustering


Example: joint use of CA and AHC


Data and objectives


Data preprocessing using CA


Constructing the hierarchical tree


Choosing the final partition


Contiguity-constrained hierarchical clustering


Principles and algorithm


AHC of age groups with a chronological constraint


Implementation with Xplortext


Example: clustering free text answers


Data and objectives


Data preprocessing


CA: eigenvalues and total inertia


Interpreting the first axes


AHC: building the tree and choosing the final partition


Describing cluster features


Lexical features of clusters


Describing clusters in terms of characteristic words


Describing clusters in terms of characteristic documents


Describing clusters using contextual variables


Describing clusters using contextual qualitative variables


Describing clusters using quantitative contextual variables


Implementation with Xplortext


Summary of the use of AHC on factorial coordinates coming from CA





5. Lexical characterization of parts of a corpus


Characteristic words


Characteristic words and CA


Characteristic words and clustering


Clustering based on verbal content


Clustering based on contextual variables


Hierarchical words


Characteristic documents


Example: characteristic elements and CA


Characteristic words for the categories


Characteristic words and factorial planes


Documents that characterize categories


Characteristic words in addition to clustering


Implementation with Xplortext





6. Multiple factor analysis for textual analysis


Multiple tables in textual analysis


Data and objectives


Data preprocessing


Problems posed by lemmatization


Description of the corpora data


Indexes of the most frequent words


Notation


Objectives


Introduction to MFACT


The limits of CA on multiple contingency tables


How MFACT works


Integrating contextual variables


Analysis of multilingual free text answers


MFACT: eigenvalues of the global analysis


Representation of documents and words


Superimposed representation of the global and partial configurations


Links between the axes of the global analysis and the separate analyses


Representation of the groups of words


Implementation with Xplortext


Simultaneous analysis of two open-ended questions: impact of lemmatization


Objectives


Preliminary steps


MFACT on the left and right: lemmatized or nonlemmatized


Implementation with Xplortext


Other applications of MFACT in textual analysis


MFACT summary





7. Applications and analysis workflows


General rules for presenting results


Analyzing bibliographic databases


Introduction to the lupus data


The corpus


Exploratory analysis of the corpus


CA of the documents × words table


The eigenvalues


Meta-keys and doc-keys


Analysis of the year-aggregate table


Eigenvalues and CA of the lexical table


Chronological study of drug names


Implementation with Xplortext


Conclusions from the study


Badinter's speech: a discursive strategy


Methods


Breaking up the corpus into documents


The speech trajectory unveiled by CA


Results


Argument flow


Conclusions on the study of Badinter's speech


Implementation with Xplortext


Political speeches


Data and objectives


Methodology


Results


Data preprocessing


Lexicometric characteristics of the speeches and lexical table coding


Eigenvalues and Cramer's V


Speech trajectory


Word representation


Remarks


Hierarchical structure of the corpus


Conclusions


Implementation with Xplortext


Corpus of sensory descriptions


Introduction


Data


Eight Catalan wines


Jury


Verbal categorization


Encoding the data


Objectives


Statistical methodology


MFACT and constructing the mean configuration


Determining consensual words


Results


Data preprocessing


Some initial results


Individual configurations


MFACT: directions of inertia common to the majority of groups


MFACT: representing words and documents on the first plane


Word contributions


MFACT: group representation


Consensual words


Conclusion
