Rohini
Even though 5 million job adverts may contain 500 million “words”, these are not unique. Most of them are used again and again, hundreds or thousands of times.
Through data mining, it is not difficult to compute their “Frequency of Usage”.
These frequencies can then be plotted graphically against any particular time period.
Such graphical representations can be further broken up by:
- City Names
- Company Names
- Industry Names
- Function Names
- Designations (Vacancy Names), etc.
And such graphical analysis can be done not only for “Keywords” but even for “Key Phrases” and “Sentences”!
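As a rough illustration of the idea above, here is a minimal Python sketch that counts how often each keyword or key phrase is used per month, optionally split by a field such as city. The record layout (the "text", "posted" and "city" fields), the function names and the sample adverts are all assumptions made up for this example, not an existing system.

```python
from collections import Counter, defaultdict
from datetime import date
import re


def ngrams(text, n):
    """Return the lower-cased word n-grams of `text` as space-joined strings."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Note: phrases are allowed to run across sentence boundaries here.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def usage_by_period(adverts, n=1, group_by=None):
    """Count n-gram usage per (year, month), optionally split by one field.

    Returns {(group, (year, month)): Counter}; group is None when
    group_by is not given.
    """
    counts = defaultdict(Counter)
    for ad in adverts:
        period = (ad["posted"].year, ad["posted"].month)
        group = ad[group_by] if group_by else None
        counts[(group, period)].update(ngrams(ad["text"], n))
    return counts


# Hypothetical sample records; the field names are illustrative only.
adverts = [
    {"text": "Sales manager wanted. Sales experience required.",
     "posted": date(2013, 6, 3), "city": "Mumbai"},
    {"text": "Senior sales manager for pharma",
     "posted": date(2013, 7, 9), "city": "Pune"},
    {"text": "Java developer, Java and SQL skills",
     "posted": date(2013, 7, 15), "city": "Mumbai"},
]

keywords = usage_by_period(adverts, n=1)                 # single words, all cities
phrases = usage_by_period(adverts, n=2, group_by="city")  # two-word phrases per city
```

Each Counter holds the usage counts for one month (and one city, if requested), so the values for a chosen word or phrase can be read off month by month and handed to any charting tool to draw the frequency-against-time graph.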
Regards
Hcp
A Google database Ngram helps to understand American novels better
By New York Times | 15 Jul, 2013, 05.00AM IST
By examining the changing frequencies of key words in books published in the US, researchers can gain new perspectives on America and its novels.
ET SPECIAL: By Marc Egnal
Can the technologies of Big Data, which are transforming so many areas of life, change our understanding of American novels? After conducting research with Google's Ngram database, which tabulates the frequency of words used in over five million books, I believe the answer is yes.
Consider the question of which themes and books characterise a literary era. The time-honoured approach to this problem has been for a critic or a group of scholars to select and analyse key novels. That methodology has its flaws. No one person or team of readers can do more than dip their toes into the vast sea of literary works. By the 1840s, Americans wrote more than 100 novels annually; by the 1880s, more than 1,000; by the early 21st century, more than 10,000. In addition, there is the threat of subjective bias. Not long ago, for example, critics focused their attention almost exclusively on white male authors.
The Ngram database offers an alternative approach. By examining the changing frequencies of key words in books published in the US, researchers can gain new perspectives on America and its novels. There are important caveats in using this source. The "American English" subset of the Ngram database includes a broad selection of books published in the US — not just fiction or writings by American authors. It excludes the dime novels favoured by the lower class, and so has a middle-class bias. But as a guide to the works that middle-class Americans read, it is a fruitful source of hypotheses and a healthy check on subjective opinion.
In a number of instances, Ngram data suggests challenges to common assumptions.
Word Processing
Take the role of women in mid-19th century American novels. Scholars have argued that domesticity shaped the world of middle-class women. Women were supposed to be submissive, pious, domestic and pure. But Ngram indicates that the use of those words peaked, respectively, in 1807, 1814, 1835 and 1847. All fell off by 1950.
By contrast, striking gains were recorded in the usage of woman's rights. Virtually unknown before the 1840s, the term soared in frequency after the Seneca Falls Convention in 1848. Perhaps we need to invert conventional wisdom and declare as "representative" those mid-century novels criticising domesticity and celebrating independent women, like Fanny Fern's Ruth Hall (1854) and Emma Southworth's Hidden Hand.
Ngram data also provides a new perspective on the novels of the 1930s. These years are traditionally viewed as the heyday of the proletarian novel, a time of gloom and a period when business leaders were despised. John Steinbeck's 1939 novel, The Grapes of Wrath, is considered a quintessential novel of the decade. But according to Ngram data, the use of businessman, a term virtually unknown before 1930, surged during the decade. Of course, you might guess that those citations were negative, but trends in other terms point to a more positive reading.
Mentions of the American dream, a term rarely seen before 1930, also climbed precipitously. So instead of Steinbeck's novel, works highlighting scrappy entrepreneurs may best mark this decade. In Their Eyes Were Watching God (1937), for example, the heroine's first two husbands were successful businessmen who overcame racial prejudice. Similarly, Gone With the Wind (1936) details Scarlett regaining the affluence she once enjoyed.
Our view of postmodern fiction might also need adjusting. Chaos, conspiracy and nihilism are thought to reign in this literary world. Word usage, however, indicates the growing attention paid to children. Among the terms whose frequency escalates after 1960 are caring, nurturing, infant, toddler and childhood.
Perhaps the representative works of this era are novels like Toni Morrison's Beloved, Philip Roth's American Pastoral and Cormac McCarthy's The Road, all of which feature deep parent-child bonds. These hypotheses are only suggestive, but as tools like Ngram improve, they should encourage scholars to revisit longstanding assumptions.
(The writer is a professor of history at York University, Toronto)