Pondicherry Winter School W3 workshop (2023)

Presentation

This page gathers the statistical text analysis tutorials for the W3 workshop of the 2023 Pondicherry Winter School. The workshop will be led by Anupam Das, Aasim Khan and myself. My part focuses on exploratory statistical analysis of text.

Although the workshop uses R, students do not need any prior knowledge of this programming environment to follow along, since we analyse the data with two wonderful R Shiny apps:

  • Radiant, developed by Vincent Nijs, for descriptive statistics (cross tabulations)

  • Rainette, developed by Julien Barnier, for textual analysis based on Max Reinert’s method

I also give a brief introduction to textual data cleaning and preprocessing with the quanteda package (developed by Kenneth Benoit and Kohei Watanabe).

In this workshop, we work on an excerpt of a database web-scraped from an Indian online matrimonial website. We analyse a few characteristics of the advertisements by looking at how spouses-to-be present themselves. In particular, we are interested in how they describe their family and what kind of partner they say they desire.

I draw on work conducted with Jeanne Subtil. You can read our working paper here.
