Text Analysis I: Unlocking the Power of Text Analysis

— by Katherine Choi, Kayla Ng, Terry Chung

In today’s post, we delve into the fascinating world of text analysis. Text analysis is the process of examining and understanding the structure and content of written works. By identifying patterns, themes, and relationships within a text, text analysis provides insights into the author’s intentions, the intended audience, and the cultural context in which the text was created. This helps us gain a deeper understanding of the meaning and significance behind written words. 

Why is Text Analysis Important? 

  1. Uncovering Hidden Meanings: Text analysis reveals biases, assumptions, and hidden meanings, providing new insights into the author’s intentions and the text’s cultural context.
  2. Encouraging Critical Thinking: It pushes readers beyond a surface-level reading, leading to a deeper appreciation and interpretation of the work.
  3. Broad Applicability: Scholars and researchers across many disciplines use text analysis to understand written content in depth.

Key Steps in Text Analysis 

Text analysis involves several key steps: 

  1. Corpus Creation and Selection: Identify relevant data sources and collect appropriate data.
  2. Preprocessing: Clean and prepare text data by removing irrelevant characters, correcting spelling errors, and standardizing formats.
  3. Data Analysis: Apply computational methods like statistical analysis, natural language processing (NLP) techniques, and machine learning algorithms to extract insights and patterns.
  4. Visualization: Present findings using visualization techniques such as charts and graphs for meaningful representation of data.

Key Techniques in Text Analysis 

  • Text Preprocessing: Involving tasks such as tokenization (breaking down the text into individual words or tokens), stopword removal (eliminating very common words that carry little meaning on their own, such as “a”, “an”, “the”), and lemmatization or stemming (reducing words to their base or root form).

Figure 1: Example of Tokenization 
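
To make tokenization and stopword removal concrete, here is a minimal sketch using only the Python standard library. The regular expression, the tiny stopword set, and the function names are our own assumptions for illustration; dedicated NLP libraries ship far more complete tokenizers and stopword lists, and lemmatization or stemming is left out because it normally relies on such a library.

```python
import re

# Tiny illustrative stopword set (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "of", "and", "in", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword set."""
    return [t for t in tokens if t not in stopwords]

sample = "The cat sat on the mat, and the dog barked."
tokens = tokenize(sample)
content_tokens = remove_stopwords(tokens)

print(tokens)
print(content_tokens)
```

Lowercasing before matching is a design choice: it treats “The” and “the” as the same token, which is usually what you want for counting but discards capitalization that may matter for names.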

  • N-gram: Capturing sequences of N consecutive items, typically words, in a text. N-grams are useful for understanding language structure and context. The resulting counts are typically visualized as word clouds or as charts such as line charts.

Figure 2: Example of Word Cloud Visualization 

Figure 3: Example of Line Chart for N-Grams 
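
As a rough sketch of how N-gram counts like those behind such charts could be produced, the snippet below slides a window of size n over a token list and tallies the results with `collections.Counter`. The function name `ngrams` and the sample sentence are assumptions chosen for illustration.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "to be or not to be".split()

# Count how often each bigram (n = 2) occurs in the token list.
bigram_counts = Counter(ngrams(tokens, 2))

for bigram, count in bigram_counts.most_common():
    print(" ".join(bigram), count)
```

In this sample, the bigram “to be” occurs twice and every other bigram once; counts of this kind are the raw numbers a word cloud or line chart would display.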

  • Concordance: Listing each occurrence of a word in a text and presenting it with surrounding words. Concordances visualize data and reveal usage patterns, with “Key Word in Context” (KWIC) being a common example in corpus linguistics (Wynne, 2008). 
  • Collocation: Exploring groups of words that often appear together, regardless of grammar or word order. Collocations include idioms and set phrases in English, and studying them can uncover linguistic or semantic relationships (Sinclair, 1991).
  • Sentiment Analysis: Extracting subjective information from text to determine overall sentiment (positive 😆, negative 😢, or neutral 😐).
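
The concordance and collocation ideas above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions: the function names, the window sizes, and the sample sentence are hypothetical, and the raw co-occurrence counts shown here are only a starting point, since collocation studies usually apply a statistical association measure on top of them.

```python
from collections import Counter

def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

def cooccurrences(tokens, keyword, window=2):
    """Count the words found within `window` tokens of each keyword hit."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

tokens = "strong tea and strong coffee but never powerful tea".split()

# Key Word in Context: one aligned line per occurrence of "tea".
for left, word, right in kwic(tokens, "tea", width=2):
    print(f"{left:>20} [{word}] {right}")

# Words that appear near "tea", most frequent first.
print(cooccurrences(tokens, "tea").most_common(3))
```

Right-aligning the left context keeps the keyword in a single column, which is what makes a KWIC display easy to scan for usage patterns.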

Types of Text Data 

Text data analysis can be applied to various types of textual sources, for example: 

  • Books and Articles 
  • Social Media Texts 
  • Customer Reviews, Surveys, and Feedback 

Real-World Applications 

Text analysis is widely used in various fields: 

  • Marketing: Analyzing customer sentiment and feedback contributes to shaping effective strategies and enhancing customer satisfaction. 
  • Healthcare: Analyzing patient records and research papers helps in identifying trends, improving treatment protocols, and advancing medical research. 
  • Journalism: Investigating large volumes of text for trends and insights aids in uncovering stories, detecting patterns, and informing public discourse. 
  • Academia: Analyzing research publications assists in understanding trends in knowledge creation and evaluating theories. 
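
To give a feel for the customer-sentiment use case mentioned under Marketing, here is a deliberately simple lexicon-based sketch. The word lists, the `sentiment` function, and the sample reviews are all hypothetical; real sentiment lexicons contain thousands of entries, and this approach ignores negation (“not good”) and sarcasm, so treat it as an illustration rather than a usable classifier.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "helpful"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "slow"}

def sentiment(text):
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

reviews = [
    "Great service and helpful staff",
    "Terrible and slow delivery",
    "The parcel arrived on Tuesday",
]

for review in reviews:
    print(sentiment(review), "|", review)
```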

Tools for Text Analysis 

Several tools are available for text analysis, ranging from open-source options like MALLET, Python, and R to commercial software like NVivo and Leximancer. (A list of tools can be found at: https://dss.lib.hku.hk/resources/tools) Among these, Voyant Tools is a popular web-based platform that offers a user-friendly interface and diverse analysis techniques.

Next Steps 

In Part II, we will demonstrate Voyant Tools, showcasing its features and how you can use it to analyze text data effectively. Stay tuned!

If you want to know more about text analysis or digital scholarship, please feel free to visit the webpage of Digital Scholarship Services at HKU Libraries at: https://dss.lib.hku.hk/getting-started/what-is-ds

References

Chung, T., & Ng, K. (2024, April 19). Introduction to Text Analysis and Voyant Tools. 

Sinclair, J. (1991). Corpus, Concordance, Collocation. Oxford University Press. 

Wynne, M. (2008). Searching and concordancing. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics: An international handbook (Vol. 1, pp. 706–737). Berlin: Walter de Gruyter.

(The blog post is based on Research Data Academy, RDA, organized by HKU Libraries in 2024. The RDA is a series of training sessions designed to strengthen participants’ data literacy skills, covering multiple areas in the research data life cycle.)  
