Computing Point-of-View: Modeling and Simulating Judgments of Taste

4.3 Perception realm: ‘escada’

Experimental System for Character Affect Dynamics Analysis (ESCADA) (Liu 2005) performs a RATE processing of an individual’s everyday texts, such as a weblog diary, in order to produce a perception model. Of all the realms’ models, this is the most experimental, and its results are the most tenuous. Recall that the space of perception is framed by Jung’s four fundamental psychological functions—sense, intuit, think, feel—taken as orthogonal axes of perception space. Also recall that in the reading schema for perception realm, there are perception-lexemes and perception-classemes. Perception-lexemes are instances of affective communication between the writer, called ‘ego’, and other textual entities, called ‘alters’. For example, the utterance “I laughed at John so hard” is abstracted into an affective transaction—a passing of the valence (-,+,+) associated with the phrasal verb “laugh at” from ego into ‘alters’. This instance is called a perception-lexeme, since affective transaction is hypothesized as a basic unit of perceptual disposition. ESCADA transforms a weblog diary into a bag of perception-lexemes using textual affect sensing, especially invoking the lexical sentiment analyzer. The machine learner, Boostexter (Schapire & Singer 2000), was run over a large corpus of annotated blogs, and a mapping was learned between perception-lexemes and perception-classemes. The rest of this section 1) overviews the Character Affect Dynamics theory; 2) discusses how lexeme-to-classeme mappings were learned from a corpus of weblog diaries; and 3) presents an evaluation of the RATE reader’s performance.
§
CAD theory. Character Affect Dynamics is a theory which posits that latent patterns of affective communication in a narrative betray the time-stable perceptual dispositions of the characters of the narrative. In the case of a first-person narrative such as an everyday text, it is the writer’s affective engagement with other persons and things that is of interest. CAD theory has a cognitive linguistic basis. Talmy’s (1998a) force dynamics theory models linguistic utterances as forces exchanged between agents and objects (e.g. ‘the door could not be opened’). Force dynamics theory in fact proposes applications to social interactions and to internal psychodynamics. Following force dynamics, CAD theory examines the affective forces present between characters in a narrative. CAD suggests that much of this analysis can be modeled as the passing of affective tokens between textual entities—since textual entities approximate agents and objects.
For example, the utterance “I stole Mary’s ice cream” can be interpreted as affective token pushing: “I [negative-act] [positive-object].” In this transaction, the writer does something bad to something valued by Mary. As a result, it could be concluded that the writer has negative affect, that he is aggressive, and that Mary’s ‘ice cream’ henceforth bears the traumatic connotation of something negative (due to emotion’s contagion). Furthermore, if the next utterance is “Mary resented me,” there is confirmation that the previous act was negative, and Mary’s retaliatory nature is disclosed. “Resent” is known in ESCADA’s lexical knowledge base as a passive act; thus, Mary is passive-aggressive.
The above scenario suggests some advanced capabilities of CAD-enabled story understanding. The ESCADA system implements an extended version of this scenario. Deep story understanding is understandably brittle, but CAD theory’s claim does not rely on deep understanding, only on shallow reading: emergent patterns of affect token passing between characters can predict their perceptual dispositions, that is, their tendency toward thinking or feeling, sensing or intuiting. To test this claim, an initial set of these patterns was implemented in ESCADA:

EGO-PAD (main character’s PAD-level)
ALTERS-PAD (other characters’ PAD-level)
INCOMING-PAD (PAD flowing from alters into ego)
OUTGOING-PAD (PAD flowing out from ego into alters)
MENTAL-ACTIVITY (quantity of invocations of mental hypotheticals, e.g. “I thought that”)
INTROVERSION-EXTRAVERSION-RATIO (ratio of passive acts e.g. ‘resent’ to active acts e.g. ‘murder’)

§
Learning lexeme-to-classeme mappings. RATE processing of an individual’s weblog diary results in the identification of many instances of affective communication from the text. These instances are compiled into these fourteen affective statistics by averaging over each blog entry:

EGO-PLEASURE (scale: -1.0 to 1.0)
EGO-AROUSAL (scale: -1.0 to 1.0)
EGO-DOMINANCE (scale: -1.0 to 1.0)
ALTERS-PLEASURE (scale: -1.0 to 1.0)
ALTERS-AROUSAL (scale: -1.0 to 1.0)
ALTERS-DOMINANCE (scale: -1.0 to 1.0)
INCOMING-PLEASURE (scale: -1.0 to 1.0)
INCOMING-AROUSAL (scale: -1.0 to 1.0)
INCOMING-DOMINANCE (scale: -1.0 to 1.0)
OUTGOING-PLEASURE (scale: -1.0 to 1.0)
OUTGOING-AROUSAL (scale: -1.0 to 1.0)
OUTGOING-DOMINANCE (scale: -1.0 to 1.0)
MENTAL-ACTIVITY (scale: nonnegative integer)
INTROVERSION-EXTRAVERSION-RATIO (scale: 0.0+)
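
The compilation step above might be sketched as follows. This is a minimal illustration rather than ESCADA's actual code: the record layout (a channel name paired with a pleasure-arousal-dominance triple), the function name, the neutral default for empty channels, and the passive-over-active form of the ratio are all assumptions made for the example.

```python
from statistics import mean

CHANNELS = ("EGO", "ALTERS", "INCOMING", "OUTGOING")
DIMENSIONS = ("PLEASURE", "AROUSAL", "DOMINANCE")


def affective_statistics(lexemes, mental_count, passive_acts, active_acts):
    """Compile the fourteen statistics for one blog entry.

    lexemes: list of (channel, (pleasure, arousal, dominance)) pairs,
             each PAD value in [-1.0, 1.0] (hypothetical record layout).
    """
    stats = {}
    for channel in CHANNELS:
        pads = [pad for ch, pad in lexemes if ch == channel]
        for i, dim in enumerate(DIMENSIONS):
            # Assumption: a channel with no lexemes is treated as neutral (0.0).
            stats[f"{channel}-{dim}"] = mean(p[i] for p in pads) if pads else 0.0
    stats["MENTAL-ACTIVITY"] = mental_count
    # Assumption: ratio of passive to active acts, with the active count
    # floored at 1 so that an entry without active acts does not divide by zero.
    stats["INTROVERSION-EXTRAVERSION-RATIO"] = passive_acts / max(active_acts, 1)
    return stats


example = affective_statistics(
    lexemes=[
        ("OUTGOING", (-0.5, 0.5, 0.5)),
        ("OUTGOING", (0.5, 0.5, -0.5)),
        ("EGO", (1.0, 0.0, 0.0)),
    ],
    mental_count=3,
    passive_acts=1,
    active_acts=2,
)
```

Twelve of the fourteen values come from the four channels crossed with the three PAD dimensions; the remaining two are the mental activity count and the introversion-extraversion ratio.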

A mapping must be learned from these statistics into the four perception-classemes: thinking, feeling, intuiting, and sensing. More accurately, thinking-feeling and intuiting-sensing are binary oppositions, so exactly one pole of each opposition must be selected. To learn this mapping, a machine learner is fed a corpus of weblog diaries already annotated with the correct classemes. Whence such a corpus? Conveniently, the desired classemes can be found in the Myers-Briggs Type Indicator (MBTI) (Briggs & Myers 1976) inventory of personality. Thinking-feeling and intuiting-sensing make up two of the four MBTI scales. MBTI is derived from Jung’s psychological functions, and is widely used in pop-cultural psychology tests such as Bloginality. Because it is possible to search for all blogs which feature their author’s Bloginality test result, such blogs supply our annotated corpus.

About the MBTI. MBTI has four scales: Extraversion-Introversion, Sensing-iNtuition, Thinking-Feeling, and Judging-Perceiving. The first three scales were found to be independent, while the fourth was found slightly co-dependent on SN with S predicting J (Myers & McCaulley 1985). By combining the four scales, MBTI allows for sixteen Jungian types, e.g. ENFP, ISTJ, ISFP, etc. This evaluation examines the performance of the four individual scales. While in the actual MBTI assessment these scales are continuously-valued, for simplicity this evaluation treats each scale as a dichotomy.
Corpus. A corpus of roughly 3800 blogs was assembled, for which the MBTI of the blogger is known. “Known MBTI” is accepted as having met at least one of the following two conditions:

Blogger has listed an MBTI type in their profile, and has not listed any other competing or conflicting MBTI types there as well.
Blogger has featured in their blog a cut-and-paste entry stating the results of an online MBTI test they took, such as Bloginality (an MBTI clone), and has not listed any other competing or conflicting MBTI test results searchable in their blog.

From the 3800 blogs, 85,000 combined blog entries were mined, averaging 22 entries per blog. The average time spanned by the blog entries from each blog is 8 weeks.
Sanitizing. To further prepare the blog entries, noisy entries had to be identified and discarded. A common practice in blogging is the use of occasional canned entries or favorites lists. For example, a blogger may cut-and-paste the results of various online temperament tests and create a blog entry from that. Or, a blogger could fill in her responses to a ‘20-questions’ type of personality inventory and make a blog entry from that. Canned entries were identified using clone detection (similar language, similar graphics) across all blog entries. Entries with long numbered lists were also discarded. In addition, null entries and entries without the presence of at least the pronouns “I” or “me” were discarded, as these texts are not likely to be egocentric. Finally, the corpus was pruned such that equal numbers of blog entries were available for each of the sixteen MBTI personality types (as this would create equal proportions of E-I, S-N, T-F, and J-P, a necessary testing condition).
Generating MBTI classifier. After RATE processing has computed the fourteen affective statistics for each blog, the statistics, along with the known MBTI labels, are fed into a machine learning algorithm to learn optimal numerical weights on each of the fourteen profile features. Not only is this an unbiased way to learn a heuristic MBTI classifier for blogs, it is also a way to uncover the relative importance and efficacy of our ESCADA statistics. Boostexter is the machine learning system used, configured for 200 rounds of boosting and n-grams up to two. Using the resulting classifier, weblog diaries can be used to roughly locate their authors in the Jungian perception space, though not with fine granularity. Two of the MBTI scales learned by the MBTI classifier (extraversion-introversion and judging-perceiving) are not used to create the person’s perception model.
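
A few of the sanitizing rules can be sketched as a simple filter. This is an illustrative sketch only: clone detection is omitted, the function name `keep_entry` is hypothetical, and the threshold for what counts as a “long” numbered list (more than ten items) is an assumption, since the text does not state one.

```python
import re

# A line that starts with a number followed by "." or ")" counts as a list item.
NUMBERED_ITEM = re.compile(r"^\s*\d+[.)]\s", re.MULTILINE)
# Stand-alone first-person pronouns; case-insensitive because bloggers
# often write a lowercase "i" (an assumption about the matching rule).
FIRST_PERSON = re.compile(r"\b(i|me)\b", re.IGNORECASE)


def keep_entry(text, max_numbered_items=10):
    """Return True if a blog entry survives the sanitizing rules sketched here."""
    if not text.strip():
        return False  # null entry
    if len(NUMBERED_ITEM.findall(text)) > max_numbered_items:
        return False  # long numbered list, likely a canned questionnaire
    # Entries without "I" or "me" are unlikely to be egocentric.
    return FIRST_PERSON.search(text) is not None


questionnaire = "\n".join(f"{n}. I like item {n}" for n in range(1, 13))
```

Here `keep_entry("Today I went home.")` keeps the entry, while an empty string, a text with no first-person pronoun, or the twelve-item `questionnaire` is discarded.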
§
Evaluation method. This evaluation challenges ESCADA to read blogs and classify bloggers into their Jungian personality type, as given by the Myers-Briggs Type Indicator (MBTI). The subset of results which are interesting to perception modeling are those which pertain to just the thinking-feeling and intuiting-sensing scales of MBTI. Notwithstanding, all four scales are presented here, for completeness. To fairly simulate the efficacy of the ESCADA-derived classifier on unseen data, ten-fold cross-validation, holding out one fold at a time, was used over the corpus of 3800 MBTI-annotated weblogs. The whole corpus was randomly divided into ten sections. Taking each section in turn as the testing set, the other nine sections served as the training set. Boostexter was again configured for 200 rounds of boosting and n-grams up to two.
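
The splitting procedure described above can be sketched generically. This is not the evaluation harness actually used; the function name, the fixed random seed, and the round-robin assignment of shuffled items to sections are assumptions made for illustration.

```python
import random


def ten_fold_splits(items, folds=10, seed=0):
    """Yield (training_set, testing_set) pairs, one per held-out section."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # random division of the corpus
    # Deal the shuffled items into `folds` sections of near-equal size.
    sections = [shuffled[i::folds] for i in range(folds)]
    for held_out, testing_set in enumerate(sections):
        training_set = [
            item
            for index, section in enumerate(sections)
            if index != held_out
            for item in section
        ]
        yield training_set, testing_set


splits = list(ten_fold_splits(range(100)))
```

Each item appears in exactly one testing set, and never in the training set paired with it, so every classifier is scored only on blogs it has not seen.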

Figure 4-12. Results of ten-fold cross-validation showing blog-level classification accuracies

Bounds on performance. Performance on each of MBTI’s scales is bounded below by fair chance guessing (50%), and bounded from above by MBTI test-retest reliability statistics. Because there were equal numbers of blog entries of each MBTI type in the corpus, a lower bound on classifier performance is 50%, achieved by a classifier which tosses a fair coin to decide on the value for each of the four scales. Note that the distribution of the sixteen MBTI types in the overall population is quite uneven, and in our experience gathering the online corpus of MBTI blogs, typing was also very uneven.
A loose upper bound on performance is given by the MBTI four-to-five-week test-retest reliability statistics. This bound hints at the underlying (in)stability of the MBTI personality inventory, which nonetheless remains the de facto popular-psychology assessment of personality. Myers and McCaulley (1985) survey continuous score correlations from ten studies for the four-to-five-week test-retest interval. They found reliability coefficients of .77 to .93 for EI, .78 to .92 for SN, .56 to .91 for TF, and .63 to .89 for JP. Assuming a roughly binomial distribution for these scores, we estimate cursory median reliabilities of EI .84, SN .85, TF .73, JP .78. Given that the average blog found in our corpus has entries covering a time-span of 8 weeks, we regard the four-to-five-week test-retest reliabilities as loose upper bounds on performance for each respective scale.

Figure 4-13. Learned feature weightings for single-scale classification.

Results. Following ten-fold cross-validation, MBTI classifiers were trained for each of the ten training sets. The classification accuracies of these classifiers applied to their corresponding validation test sets are given in Figure 4-12. Average accuracies ranged between 0.58 and 0.67 for the four independent scales. Classification of E-I was most successful, at 0.67, while S-N was least successful, at 0.58. The scale classifiers demonstrated that they contain information by outperforming the lower bound of 0.50. On average, the classifiers underperformed their corresponding upper bounds by margins of E-I 0.17, S-N 0.27, T-F 0.11, J-P 0.18. In this context, T-F most closely approached optimal prediction, while classification of S-N was most ineffective using the ESCADA statistics over the blog corpus.
To ascertain the usefulness of the individual ESCADA statistics to each of the four MBTI scales, an analysis of Boostexter’s outputted .SHYP (strong hypothesis) files was undertaken. The .SHYP files contain the rules which constitute each classifier. For each scale, there were ten classifiers learned from the ten training sets. The .SHYP file corresponding to each classifier was parsed, and the numerical weights and feature-names implicated in each of the rules were extracted. Based on the combined weights for each feature, and averaged over ten classifiers for each scale, the relative contribution of each feature was calculated. Results are given in Figure 4-13.
According to Figure 4-13, ego’s affect was most important, followed by the mental activity index, then by alters’ affect. Incoming and outgoing affects were more tenuous, while the introversion-extraversion statistic was not reliable for MBTI classification. For the E-I scale, pleasure and arousal of the ego, as well as pleasure flowing into the ego, were the more useful features. For the T-F scale, the ego-centric features, and in particular the ego’s dominance dimension, were most useful. The S-N and J-P scales appraised usefulness in similar fashion and shared common top features, suggesting some mutual information between those scales. The aggregate of incoming-outgoing features was more useful than the ego features and alters features for the S-N scale, suggesting that Sensing bloggers and iNtuiting bloggers can be distinguished by their different affective postures toward alters. By contrast, the greater utility of ego features in the T-F scale accords with the intuition that T-F can be appraised more solipsistically than the other three scales. The mental activity index, which measures the quantity of vocalizations of mental hypotheticals (e.g. “I thought that”), was a top-three useful feature in S-N, T-F, and J-P, but not in E-I. One could take this result to suggest, counter to the intuition of some, that extraverted and introverted bloggers can hardly be distinguished by how they vocalize their thoughts and opinions. Or, this result could be owed to the nature and culture of blogging, which is arguably a revealing activity, and a venue for dramatic performance (Boyd 2004).
Of pertinence to perception modeling, results were mixed. Location within thinking-feeling most closely approached its upper bound, with an accuracy of 0.62 +/- 0.05. Location within sensing-intuiting fared more poorly, with an accuracy of 0.58 +/- 0.05, far from the upper bound of 0.85 and only slightly better than guessing.
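
The margins quoted above follow from simple subtraction, which can be checked with a few lines of arithmetic. The upper bounds are the estimated median reliabilities from the bounds discussion and the margins are those just stated; the dictionary names are illustrative. The J-P accuracy is not stated in the text and is only implied by this calculation.

```python
# Estimated median test-retest reliabilities (loose upper bounds).
upper_bound = {"E-I": 0.84, "S-N": 0.85, "T-F": 0.73, "J-P": 0.78}
# Margins by which the classifiers underperformed those bounds.
margin = {"E-I": 0.17, "S-N": 0.27, "T-F": 0.11, "J-P": 0.18}

# Accuracy implied for each scale: upper bound minus margin.
implied_accuracy = {
    scale: round(upper_bound[scale] - margin[scale], 2) for scale in upper_bound
}
```

The implied values for E-I (0.67), S-N (0.58), and T-F (0.62) agree with the accuracies reported in the text, and every implied value sits above the 0.50 chance baseline.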
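
One plausible reading of this aggregation is sketched below. It assumes the rules have already been parsed into (feature-name, weight) pairs, so no .SHYP file format is modeled here, and it assumes that weights are combined by summing absolute values and normalizing within each classifier before averaging; the text does not specify how the weights were combined, and the function name is hypothetical.

```python
from collections import defaultdict


def relative_contributions(classifiers):
    """Average each feature's share of total rule weight across classifiers.

    classifiers: list of classifiers, each a list of (feature, weight) pairs
                 (hypothetical, already-parsed representation).
    """
    totals = defaultdict(float)
    for rules in classifiers:
        combined = defaultdict(float)
        for feature, weight in rules:
            # Assumption: the sign of a weight encodes the predicted pole,
            # so magnitude is used as the measure of usefulness.
            combined[feature] += abs(weight)
        norm = sum(combined.values()) or 1.0
        for feature, value in combined.items():
            totals[feature] += value / norm
    count = len(classifiers)
    return {feature: value / count for feature, value in totals.items()}


shares = relative_contributions([
    [("EGO-PLEASURE", 0.6), ("MENTAL-ACTIVITY", -0.2), ("EGO-PLEASURE", 0.2)],
    [("EGO-PLEASURE", 0.5), ("MENTAL-ACTIVITY", 0.5)],
])
```

Normalizing within each classifier before averaging keeps one classifier with unusually large weights from dominating the result; other combination schemes would be equally consistent with the description.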
