Computing Point-of-View: Modeling and Simulating Judgments of Taste


Aperitif

Whither the future battles of humankind? I believe they will increasingly be fought on the aesthetic plane. Media systems and commerce begin to unravel the use function of aesthetics, leading to more systematic productions of poetics. The willful construction of perspective, authenticity, and image ups the ante in the worlds of ideology and marketing. Today’s artificial intelligence personalizes search and recommends books, but tomorrow’s will likely design our lifestyles, and Proustian recommenders will select wines and spirits to release particular memories and desires submerged within each of us.

While the explicit topic of this dissertation is point-of-view, its underlying thematic is certainly aesthetics. A point-of-view, after all, may be recognized as a coherent and comprehensive system of aesthetics, efflorescing from the limitless ecology of aesthetics that is life. What renders point-of-view such a challenging study is precisely its complex etiology: just as each snowflake is constituted idiosyncratically by an unknown mixture of passing clouds, so too are our perspectives shaped by psychological predispositions, past experiences, and cultural embeddedness.
Point-of-view and its aesthetics have long been studied poetically and rhetorically. Here, I embark upon yet another such philosophical investigation of the topic, this time informed by a computational perspective. Over the past four years, I have implemented a cadre of experimental computational systems, which automatically model and simulate particular persons’ judgments of taste in various aesthetical realms. At some point in this process, the versatility of the approach that was being applied across these realms became aware of itself, and informed by semiotics and cognitive science, a methodology began to crystallize. So, these systems and their results are presented in this thesis—now unified under a common theoretical discourse and computational framework.

1 Introduction

Our capacity for aesthetics and affectedness is one of the most celebrated bastions of humanity. Underlying our explicit knowledge and rationality is a faculty for judgment—the impulsion to prefer, to view the world through our individual lenses of taste. An interesting intellectual question is: can a computer model a person’s tastes, attitudes, and aesthetics richly enough to predict their judgments? This thesis explores one answer to the question.

Our investigation flies under the banner of point-of-view for two reasons. Firstly, the term reflects an understanding that individual tastes are seated in, and articulated against a social and cultural fabric. Secondly, ‘point-of-view’ is developed to mean not isolated taste judgments, but rather, a coherent and systematic apparatus that engenders such judgments.

1.1 Thesis summary

Each person has their own tastes, attitudes, and ways of perceiving the world. I believe that these aesthetic dimensions of our selves are revealed in our everyday writings—weblog diaries, commentary-rich papers, social network profiles, instant messenger conversations, personal emails, and so on. I focus on the genre of everyday texts, which offers first-person and self-expressive accounts of everyday happenstance, and I point out that in this Information Age, these texts are par excellence portraits of who we are. Unlike esoteric ‘user models’ of persons acting within the context of particular computer applications, everyday texts portray persons in the most general sense possible—their domain of discourse is social, pragmatic, everyday life. In this thesis, I show that it is possible to build models of persons’ tastes, attitudes, and ways of perceiving the world by reading their everyday texts. The accuracy of these models and their applications will also be addressed. The rest of this thesis summary 1) discusses related work; 2) introduces the approach taken; summarizes the premise, methods, and results of person modeling in the three primary realms of 3) cultural taste, 4) attitudes, and 5) ways of perceiving; and finally, 6) reviews six implemented applications for the produced models. Subsequent chapters will address all of these topics in detail.

§
Before describing the approach taken by this thesis, I will frame the modeling portion of this work within related work in user modeling, natural language processing, and semiotics.
In user modeling and its related literatures of user-adaptive systems and recommender systems, we find a great deal of prior work on predictive models of people. Two important paradigms are category-based models and behavior-based models. Category-based models, such as Elaine Rich’s (1979) groundbreaking book recommender system, GRUNDY, model users by first collecting a profile of attributes describing the user, and second generalizing the user model from these attributes, based on a priori stereotypes. For example, women 26-35 years old could stereotypically prefer “romance novels.” Whereas each attribute activates a different set of stereotypes with varying numerical strengths, the final recommendation is entailed by the strongest stereotype overall. The categorical approach is a sound one, but models built from a priori stereotypes tend to be overly generic descriptions of individuals. The other paradigm of behavior-based models is a posteriori and predominantly data-driven. Software sensors collect users’ behavioral traces through an application and then generalize each user’s traces via statistical inference. Key examples of the behavior-based approach include collaborative filtering (Shardanand & Maes 1995) and Bayesian goal inference models (Horvitz et al. 1998). This approach has seen a great deal of success, and has been applied to a great variety of recommender systems and intelligent tutoring systems. A caveat related to this approach is that behavioral traces are often specific to an application domain, and may not describe a person well in general. These two paradigms are complementary though, and some systems, including Selker’s (1994) COACH system and Orwant’s (1995) DOPPELGANGER system, demonstrate aspects of both. While many of the behavior-based systems are more statistical (e.g. collaborative filtering recommenders), systems such as the COACH intelligent tutoring system are also knowledge-based.
DOPPELGANGER also exhibited aspects of both paradigms. The current behavior of users (e.g. ‘hacking’, ‘writing’, ‘idle’) was predicted with Markov models, thus adopting a behavior-based approach. Dynamic categorical models generated for communities that the user was a member of served to supplement each user’s profile of interests; thus harkening to the category-based approach.
In the computational linguistics literature, we find related work on methods for computing the subjective and affective dimensions of text. Wiebe, Wilson, Cardie and others have taken a corpus-annotation approach to tracking subjectivity in third-person narratives (Wiebe 1994) and to characterizing the degree of subjectivity in textual passages (Cardie et al. 2003; Wiebe et al. 2004). A drawback of the corpus-annotation approach is that it is supervised. Another approach to computing textual affect is to exploit dictionaries annotated with the semantic orientation, or prior polarity, of words and concepts (Turney & Littman 2003). With this method, numeric priors on words become building blocks for statistical estimation of larger pieces of text, and some research has nuanced the application of polarity dictionaries by considering negations and intensifiers (Grefenstette et al. 2004a; Polanyi & Zaenen 2004; Wilson, Wiebe & Hoffman 2005). Liu et al. (2003) suggested a complementary method for appraising text’s event structure using common sense knowledge, in order to also account for event-level connotations, e.g. “be(person, fired).”
In the semiotics literature, we find helpful frameworks for structuring readings of text. Greimas’s (1966) isotopy model of reading describes the reading process convergently, as monosemization. The meanings of words in a textual passage mutually disambiguate one another, converging upon a system of stable themes which represent the overall understanding had from the text. Paralleling Greimas’s bottom-up movement from words to themes, Zholkovsky (1970b) describes a top-down movement from larger theme to specific expression devices which manifest bits of the theme. Finally, in literary stylistics, cf. Todorov’s (1968) introduction, it is thought that the most important evidence for a writer’s style and tastes can be found in the emotive dimensions of a text’s themes—its affective themes, so to speak. Sack’s (1994; 2001) SpinDoctor exemplifies how natural language processing techniques can be employed to operationalize semiotic frameworks. SpinDoctor was able to detect the ideology implied by a news story by reading for ideologically motivated actor-role bindings (Greimas 1987)—e.g. “Oliver North” is an actor who may be bound to various roles, such as “criminal” or “patriot,” by authors with different ideologies.
§
My approach to modeling a person’s tastes, attitudes, and ways of perceiving from their everyday texts relates and contributes to each of these three bodies of literature. Existing textual affect analysis techniques (via dictionaries of word affects and via commonsense-based sensing) are leveraged as a building block technology to advance the greater goal of reading for stable affective themes emergent from a person’s everyday texts—it is hoped that these affective themes will accurately depict a writer’s tastes, attitudes, and ways of perceiving. However, it is recognized that what directly results from such readings will be sparse and disconnected. To generalize a more comprehensive model of a person from fragmentary textual evidence, the person’s evidence is situated in cultural patterns of tastes and attitudes that underlie our society. To acquire these cultural patterns, we engage in comparative readings of hundreds of thousands of people’s everyday texts, finally arriving at a ‘map’ of culture’s topological space. Each person’s textual evidence is located on the map, and can be generalized by activating the neighborhood that surrounds the person’s location. The idea of locating persons in the cultural space inherits from both the behavior-based paradigm and the category-based paradigm. The topology of the cultural space is acquired by statistical inference, but in essence, the knowledge embodied by the topology is still a stereotype, albeit a data-driven one.
This thesis considers models of persons within five aesthetical realms. The three primary realms which are narrated in this thesis summary have already been enounced—attitudes, cultural tastes, and ways of perceiving. Two other realms discussed only in later thesis chapters are taste for food, and sense of humor. These five aesthetical realms are not claimed to be canonical, nor are they claimed to be independent in scope. Rather, realms were chosen opportunistically based on available everyday texts, and my own sense of interesting directions to explore.
A brief description of each realm is now given. An attitude is defined as an affect about some topic, so the realm of attitudes considers a person’s feelings toward every topic under the sun. In the realm of cultural tastes, a person is seen as a pattern of consumption, over a field of consumer interests and identities (e.g. books, music, films, foods). In the realm of perception, a person’s psychological dispositions are modeled as a coordinate location in a four-dimensional Jungian space whose axes are think, feel, sense, and intuit—proposed by Jung (1921) as four basic psychological functions. A perception model addresses the question—is the person a realist who senses and thinks or is she a romantic who intuits and feels? In the realm of food, a person’s taste is modeled as a pattern of liking and disliking over a densely connected semantic fabric of foodstuffs (e.g. flavors, sensations, ingredients, dishes). The realm of humor is motivated by Freud’s (1905) theory of tendentious jokes as outlets for psychic tensions. Thus, a person’s humor model resembles an attitude model—it considers the person’s psychic tension levels toward every topic under the sun.

Figure 1-1. Triptych summarizing the thesis’s approach to modeling persons in aesthetical realms.

The person modeling process, which is applicable to each of the five realms, can be summarized as three steps (Figure 1-1): acquire, generalize, and apply. Each of the five realms has a topological space, and some of these spaces need to be modeled by employing culture mining to analyze cultural corpora; for example, the connectedness of nodes in the semantic fabric of cultural interests is acquired by analyzing pairwise affinities between interests across 100,000 social network profiles. In the first phase, a person’s everyday texts are read for stable affective themes, and this textual evidence is regarded as a location in the topological space. In the second phase, a generalized model of the person is produced. Her location is expanded into a more general surrounding neighborhood, which is implied by the topology of a particular realm. Spreading activation and analogy are two methods used to perform this generalization. In the third phase, the generalized model is employed by a range of applications to simulate the taste, attitude, or perceptual perspective of a person on arbitrary input.
To illustrate and concretize the described approach, the next three subsections narrate the person modeling process for the three realms of cultural taste, attitudes, and ways of perceiving.
§
To model a person’s cultural tastes, I focused on social network profiles as the everyday text most suitable for acquiring person models. Today tens of millions of internet users are members of social networking sites (such as Myspace, Friendster, Orkut, and Facebook) on which they maintain a text profile of their cultural interests (e.g. favorite books, music, films). In modeling cultural tastes, we assume that tastes are not arbitrary: there is an unconscious gestalt that unifies each person’s selections of cultural interests. This coherence assumption is consistent with recent consumerist theory stating that people’s consumptive choices tend to form gestalts; McCracken (1988) termed these “Diderot unities” and Solomon & Assael (1987) called them “consumption constellations.” Proceeding from this premise, cultural taste modeling aims to analyze a corpus of 100,000 social network profiles, and extract from them a topology of cultural tastes, which can be used to generalize a single person’s profile.
An algorithm for ‘culture mining’ from 100,000 social network profiles is now described. Each social networking website, although having a somewhat different design, observes certain conventions in how it elicits and displays users’ profiles. The convention is that a user’s interests are broken down into organizing categories, the most common being ‘books’, ‘music’, and ‘movies’. Within each category, interests are typically given as a token-delimited list. There is typically also an overarching category, called variously ‘passions’ and ‘general interests’; to emphasize their importance, our present modeling maps these descriptions into an ontology of identity descriptors (e.g. ‘fashionista’, ‘book lover’). Figure 1-2a shows a typical social network profile, taken from the Orkut social networking site, in which identity and interest categories are depicted. A first step of processing is to normalize as many of these natural language fragments as possible, using ontologies of interests and identities assembled from folksonomies such as DMOZ and Wikipedia. Figure 1-2b supposes that some subset (the four nodes shown in black) of the profile’s descriptions has been mapped into recognized identity and interest descriptors. In actual processing of the 100,000 profiles, the rate of recognition was 68% of natural language fragments, including 8% false positives. Also shown in Figure 1-2b are red edges and red nodes; these are metadata associated with the black nodes. They are added to each user’s profile to improve the robustness of the profile, but at a discounted strength of 0.5 per link-hop traversed away from a black node.

Figure 1-2. A walkthrough for cultural taste modeling

Next is a machine learning step which operationalizes the aforementioned assumption that each profile has taste coherence. The goal is to learn the numerical strength of affinity between every pairwise combination of interest and identity descriptors. In a parsed profile, each possible pairwise combination of descriptors is recorded as a co-occurrence. Using the pointwise mutual information (PMI) (Church & Hanks 1990) measure of semantic similarity, the aggregate co-occurrence data for the 100,000 profiles is analyzed, and a PMI score representing affinity is calculated for each pair of descriptors. After pruning, what results is a 12,000 by 12,000 correlation matrix of learned affinities—I will call this a semantic fabric of cultural taste (or taste fabric for short) to emphasize the graph’s density and to re-introduce the spatial metaphor.
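The co-occurrence and PMI step described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes each profile has already been normalized into a set of descriptors, uses the standard PMI definition log2(P(a,b) / (P(a) P(b))) with probabilities estimated per profile, and introduces a hypothetical `min_pair_count` parameter to stand in for the pruning step, whose actual criterion is not given here. The toy profiles are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_affinities(profiles, min_pair_count=1):
    """Return {(a, b): PMI score} for descriptor pairs that co-occur.

    profiles: iterable of sets of normalized descriptors (assumed input).
    min_pair_count: hypothetical pruning threshold on raw co-occurrences.
    """
    n_profiles = 0
    single = Counter()  # number of profiles containing each descriptor
    pair = Counter()    # number of profiles containing both descriptors
    for profile in profiles:
        descriptors = sorted(set(profile))
        n_profiles += 1
        single.update(descriptors)
        # Every pairwise combination in a profile counts as one co-occurrence.
        pair.update(combinations(descriptors, 2))

    scores = {}
    for (a, b), count in pair.items():
        if count < min_pair_count:
            continue
        p_ab = count / n_profiles
        p_a = single[a] / n_profiles
        p_b = single[b] / n_profiles
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Invented toy corpus of four already-normalized profiles.
profiles = [
    {"existentialist", "albert camus", "friedrich nietzsche"},
    {"existentialist", "albert camus"},
    {"fashionista", "vogue"},
    {"book lover", "albert camus"},
]
fabric = pmi_affinities(profiles)
```

Pairs are keyed in alphabetical order, so the affinity is stored once per unordered pair; pairs that never co-occur are simply absent, which keeps the resulting structure sparse.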
Using this taste fabric, which has captured cultural patterns of taste, a person’s profile can be generalized. An algorithm for producing a generalized model of a person’s cultural tastes is now described. Suppose the profile shown in Figure 1-2a is input to the model generalizer. First the profile is segmented into natural language fragments, those fragments are normalized via ontology recognition (Figure 1-2b) and metadata is added at a discount, and the normal and metadata nodes are located in the just-acquired taste fabric (Figure 1-2c). Via spreading activation outward from these nodes along the fabric’s edges (e.g. discount = 0.5), a generalized model of the person’s taste is produced (Figure 1-2d). It is well described as an activation cloud, or, to emphasize that it has captured the general interests of the person, we can call this a taste ethos formation. It is useful to illuminate two interpretations of this generalized model. Taking the set intersection between this model and the original profile (assuming that its contents were fully normalizable), Figure 1-2e shows that in addition to the recognized descriptors, the generalized model has rediscovered other descriptors that were present in the profile but failed to be recognized; a soon-to-be-described evaluation treats these rediscovered nodes as verification that the algorithm has worked properly. Finally, Figure 1-2f shows the complementary set of nodes which are not in the profile; we can interpret these as the identities and interests that are suitable to offer up as recommendations.
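One way to picture the generalization step is the small spreading-activation sketch below. It rests on several assumptions that the text does not specify: edge weights are taken to be affinities rescaled to [0, 1], a node keeps the strongest activation that reaches it (rather than a sum), and spreading stops after a hypothetical `max_hops`. Only the per-hop discount of 0.5 comes from the text; the toy fabric and its weights are invented.

```python
def spread_activation(fabric, seeds, discount=0.5, max_hops=2):
    """Spread activation outward from seed nodes over a weighted graph.

    fabric: {node: {neighbor: weight in [0, 1]}} (assumed representation).
    seeds: {node: initial activation}, e.g. the recognized profile nodes.
    Returns {node: activation}, the 'activation cloud'.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_hops):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbor, weight in fabric.get(node, {}).items():
                passed = energy * weight * discount
                # Assumption: keep only the strongest path into each node.
                if passed > activation.get(neighbor, 0.0):
                    activation[neighbor] = passed
                    next_frontier[neighbor] = passed
        frontier = next_frontier
    return activation


# Invented toy fabric with symmetric, rescaled affinities.
taste_fabric = {
    "existentialist": {"albert camus": 1.0, "friedrich nietzsche": 0.8},
    "albert camus": {"existentialist": 1.0, "the stranger": 0.6},
    "friedrich nietzsche": {"existentialist": 0.8},
    "the stranger": {"albert camus": 0.6},
}
seeds = {"existentialist": 1.0}
cloud = spread_activation(taste_fabric, seeds)

# Nodes outside the original profile, ranked by activation, play the role
# of the recommendation set described for Figure 1-2f.
recommendations = sorted(
    (node for node in cloud if node not in seeds),
    key=cloud.get,
    reverse=True,
)
```

Taking the maximum instead of summing incoming activation is a design choice made here to keep the toy example bounded; a summing variant would let hubs and cliques accumulate influence, which may be closer to the behavior the text attributes to them.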
Some interesting results from taste fabric production are its emergent topological features such as cliques (i.e. clusters of nodes with strong mutual connectedness) and hubs (i.e. nodes with strong connections to an unusually high number of other nodes, sometimes called ‘stars’). Figure 1-3 depicts two topological features: the identity hub ‘existentialist’ in the left pane, and two taste cliques straddled by ‘brian eno’ in the right pane. To produce the unexpectedly sparse semantic network shown in this visualization, most of the graph edges in the original 12,000 x 12,000 correlation matrix were thresholded away, until only thousands of the strongest edges remained; edge strength is not shown in the visualization. Identity hubs and taste cliques are interesting because they are disproportionately influential features in the spreading activation process. A telling statistic is that after profiles are normalized into the ontology, each identity descriptor in the ontology occurred in the corpus of 100,000 profiles on average 18 times more frequently than the typical interest descriptor. We hypothesize that in the graph, identities behave as indexicals, serving as hubs around which interests organize; as shown in Figure 1-3’s left pane, ‘existentialist’ is an identity hub which can be seen to unify a variety of interests matching intuition, such as ‘albert camus’, ‘friedrich nietzsche’, ‘death of a salesman’, etc. Likewise, taste cliques shown in the right pane may behave like identity hubs in spreading activation, because as the size of the clique increases, so does its influence, measured as the number of nodes leaving the clique. We may think of taste cliques as unnamed indexicals. That taste cliques and identity hubs seemingly organize the taste fabric is consistent with consumer theorists’ observations that taste is shaped under ‘Diderot unities’ (McCracken 1988) and ‘consumption constellations’ (Solomon & Assael 1987).

Figure 1-3. Topological features in the learned taste fabric: identity hubs (left) and taste cliques (right)

The accuracy of the generalized model of cultural taste was evaluated against three control systems in a five-fold cross-validation experiment over the corpus of 100,000 profiles. Given the task of producing a complete recommendation (a total rank-ordering of interest and identity descriptors), a control system approximating item-item collaborative filtering (Sarwar et al. 2001) yielded an accuracy of 0.74, which was exceeded by our system’s accuracy of 0.86. It was also found that identity nodes improved recommendation accuracy by 0.05, and taste cliques were also beneficial, though less so. These results provide one measure of validation of the described method for generalizing a model of persons’ cultural tastes from their social network profiles.
§
We now shift gears to consider modeling a person’s attitudes from their everyday texts. We begin with a working definition for attitude: an attitude is a discourse topic, imbued with some affective tint; that is to say, attitude is how you feel about some thing, some one, some event. This definition is consistent with Ortony, Clore & Collins’s (1988) position on emotions, namely that they always result from cognitive appraisal about some thing, some one, some event. How is the affective tint quantified? We chose Mehrabian’s (1995b) PAD model of affect, which considers affect in a three-dimensional Cartesian space whose axes are pleasure-displeasure, arousal-nonarousal, and dominance-submissiveness. To model a person’s attitudes, we regarded attitudes as affective themes that emerge from readings of weblog diaries and commentary-rich papers. By strategically choosing everyday texts, which are defined as first-person and self-expressive, we hoped to avoid the difficult problem reported by Wiebe (1994) of tracking character viewpoints in third-person texts. With everyday texts, it is a reasonable assumption to attribute affective themes to the writer’s attitudes.
The algorithm used for Reading for Affective Themes (RATE) is now described. The algorithm is consistent with Greimas’s (1966) isotopy model of reading, which stipulates that in reading, meaning is convergent, and the end product of reading is an isotopy, a system of stable themes (Greimas called these classemes). The input is a personal everyday text, for example, a weblog diary. The diary is segmented and tokenized, and a surface syntactic parse of the text is made with the MontyLingua natural language processor (Liu 2002). The diary’s original space of words is now reduced to keywords, key phrases, and parsed event structures, where event equals verb(subject, object, indirect objects*). Each of these textual entities is polysemous and, taken alone, has numerous mutually incompatible connotations. However, by intersecting all the entities’ connotations, certain threads of consistent meaning emerge, and these include discourse topics. Accompanying each textual entity is also an affective characterization scored in the PAD model; for example, (P:+0.9,A:-0.2,D:+0.5) might correspond with the affect ‘smug’. Just as textual entities are summed up into emergent topics, so each textual entity’s PAD affect can be statistically averaged into the emergent topic’s stable affect. The association of a topic with its PAD affect is called the stabilized attitude. Proceeding along these lines, a system of attitudes emerges from the weblog diary.
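The final averaging step of RATE can be illustrated with the short sketch below. It assumes that parsing, topic identification, and affect scoring have already happened upstream, so the input is simply a list of (topic, PAD triple) observations; the `min_support` cutoff is a hypothetical stand-in for whatever stability criterion the actual system applies, and the numbers are invented.

```python
from collections import defaultdict
from statistics import mean


def stabilize_attitudes(observations, min_support=2):
    """Average per-entity PAD scores into one stable PAD affect per topic.

    observations: iterable of (topic, (pleasure, arousal, dominance)).
    min_support: hypothetical minimum number of observations for a topic
                 to count as a stable theme.
    Returns {topic: (P, A, D)}.
    """
    by_topic = defaultdict(list)
    for topic, pad in observations:
        by_topic[topic].append(pad)

    attitudes = {}
    for topic, pads in by_topic.items():
        if len(pads) < min_support:
            continue  # too little evidence to call the theme stable
        # zip(*pads) regroups the triples into three per-axis sequences.
        attitudes[topic] = tuple(mean(axis) for axis in zip(*pads))
    return attitudes


# Invented observations, as if read from a weblog diary.
observations = [
    ("drugs", (-0.8, 0.6, 0.4)),
    ("drugs", (-0.6, 0.8, 0.6)),
    ("feminism", (0.0, 0.7, 0.0)),
    ("feminism", (0.2, 0.5, -0.2)),
    ("weather", (0.1, 0.0, 0.0)),
]
attitudes = stabilize_attitudes(observations)
```

In this toy run 'drugs' comes out with negative pleasure and positive arousal and dominance, in the direction of the anger example used later in the text, while the single 'weather' mention is dropped as unstable.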
Reading for affective themes in this simple associative way was accomplished with lightweight natural language processing tools, embodying a knowledge-based approach. Natural language parsing tasks were all performed by MontyLingua (Liu 2002). Topic identification was performed using the guess_topic() function in the ConceptNet (Liu & Singh 2004) commonsense reasoning toolkit, augmented with topic hierarchies mined from DMOZ. Eagle et al. (2003) had made similar use of ConceptNet for topic spotting in spoken conversations. Affective analysis of text in terms of the PAD model was achieved by hybridizing a superficial sensing approach with a deep sensing approach. Superficial sensing utilized lexical affect dictionaries which state prior polarities for words—one such dictionary employed was ANEW (Bradley & Lang 1999) and another custom-built dictionary was created out of sentiment headword classes in Roget’s (1911) English Thesaurus. For deeper textual analysis via common sense inferences about event structures in the text, Emotus Ponens (Liu, Lieberman & Selker 2003) was used.
The explicit descriptors extracted from a social network profile were an incomplete model; likewise, the explicit system of attitudes read out of a weblog diary is also incomplete, and requires generalization. An algorithm for producing a generalized model of a person’s attitudes is now described. One way to visually represent a person’s location in the space of possible attitudes is via semantic sheets. As shown in Figure 1-4a, the upper sheet is a grid enumerating all possible topics. A person’s attitudes are captured in the lower sheet, as a grid of PAD affects associated with each possible topic. Walking through generalization, the process begins with the results of RATE being plotted onto these semantic sheets. Figure 1-4b supposes that two stable attitudes resulted from such a reading: general arousal (P:?, A:+, D:?) about ‘feminism’ and anger (P:-,A:+,D:+) about ‘drugs’. A first generalization technique is to propagate these attitudes to topics closely related to ‘feminism’ and ‘drugs’. Spreading activation along topic hierarchy lines (obtained from DMOZ and ConceptNet) achieves this (Figure 1-4c). Note that the certainty of the spread attitudes is discounted. A second technique is structure-mapping analogy (Gentner 1983). Figure 1-4d shows that an attitude about ‘aids’ is inferred by analogy with ‘cancer’, based on the topics’ shared attributes. ConceptNet is used to perform analogy. Certain cases of analogy tend to produce wrong inferences, such as inferring that attitudes about ‘dog’ will translate to the taxonomically and functionally similar ‘cat’. These exceptions are discussed in Chapter 2. Next, Figure 1-4e depicts the idea that your attitudes can be supplemented by introjecting the attitudes of your imprimers, whom Minsky (forthcoming) describes as those whose goals and values you mimic, such as mentors and parents. Currently, the identification of imprimers requires supervision.
An algorithm identifies imprimer candidates by reading for the co-occurrence of self-conscious emotions (e.g. embarrassment, pride) with mention of persons, but as yet, a corpus of texts cannot be automatically assembled for imprimers. Figure 1-4e shows that an imprimer’s explicit attitudes can align with and supplement the model of the one who is imprimed (these also occur at a discounted certainty). Finally, we can see that the result of generalization (Figure 1-4f) has spread the original explicit attitudes into their surrounding semantic neighborhood.

Figure 1-4. A walkthrough for generalizing attitudes

To evaluate the acquired model of attitudes, two experiments were conducted; their results are summarized below. A first experiment was conducted for political culture. Two political corpora were assembled from Democratic Party and Republican Party political speech transcripts and these were used to generate a “Virtual Democrat” and a “Virtual Republican.” On examination, their most positive and most negative attitudes were largely consistent with intuition, though some attitudes were seemingly incorrect. For example, Virtual Democrat held very negative attitudes about ‘god’ and ‘elderly’, and Virtual Republican held a very positive attitude about ‘poor’. These illustrate one drawback of the associative approach to reading for affective themes: Democrats were actually negative about the invocation of ‘god’ in civil affairs, but that nuanced attitude was incorrectly generalized to ‘god’. Using Virtual Democrat and Virtual Republican to define the poles of a political spectrum, the political bias of six major U.S. newspapers was calculated as a function of their degree of alignment with either pole’s attitudes. With some scale normalization, the results of this experiment were found consistent with a previous study which specified the media bias of these same newspapers (Groseclose & Milyo 2004). The only major discrepancy was that the “Wall Street Journal” was found to be conservative-leaning by the thesis system while the prior study found it to be liberal. With some investigation into political theory, the discrepancy was explainable: the thesis system looked at the editorial texts of the “Wall Street Journal”, while Groseclose & Milyo looked at news articles.
A second experiment evaluated the accuracy of attitude prediction using the generalized model against human raters. Measuring the deviation of the model’s PAD reaction to news articles against the actual PAD reaction of the corresponding human raters, it was found that Arousal (A in PAD) prediction was most accurate, with an average deviation of 0.22 (out of 2.0 max). The prediction of Pleasure and Dominance was promising, but variance was large, and their 95% confidence intervals overlapped with one of two baselines. One interpretation of these results is that Arousal is easier to predict than Pleasure or Dominance because it is more amenable to additive calculation. More details of the studies are presented in Chapter 4.
§
We now describe person modeling experiments for a third realm, the realm of perception. Consider that persons are disposed to perceive and engage the world in different ways. For example, realists and romantics seemingly interpret the world in antithetical ways. But what sort of theoretical framework can betray the differences between realists and romantics? With a salutary spirit, Carl Jung’s (1921) theory of psychological type was adapted as the framework to be experimented with. Jung proposed four basic psychological functions (sense, intuit, feel, think) and suggested that each person is disposed to engage these functions to various degrees. The simplest model, then, would be to take each function as an axis in a four-dimensional Cartesian space. Assuming each axis ranges from 0.0 (not engaged) to 1.0 (fully engaged), a four-tuple coordinate would indicate a person’s location in perception space. As Figure 1-5 shows, the 4-d model of perception simulates a person’s interpretive process by acting as a ‘prism’ that refracts meaning. In the figure, the prism is passing over the topic of ‘sunset’. Along each axis gather keywords forming one interpretation of sunset. For example, ‘feel’ interprets ‘sunset’ as ‘beauty’, ‘warmth’, ‘home’. The system predicts that a realist, located at (0.2, 0.8, 0.2, 0.1), would roughly adopt 20% of keywords from ‘feel’, 80% of keywords from ‘sense’, and so on. A person’s way of perceiving is thus taken to be a mixture of the Jungian interpretations.
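The ‘prism’ mixture can be sketched as below, under stated assumptions. The text does not fix the axis order of the four-tuple, so the sketch names the axes explicitly and assumes the realist's remaining two values (0.2 and 0.1) belong to ‘intuit’ and ‘think’ respectively. It also assumes each interpretation is a keyword list ordered by salience, from which the leading share is taken. Apart from ‘beauty’, ‘warmth’, and ‘home’ under ‘feel’, all keywords are invented.

```python
def refract(location, interpretations):
    """Mix Jungian interpretations of a topic according to a person's location.

    location: {function: engagement in [0.0, 1.0]}.
    interpretations: {function: keywords, assumed ordered by salience}.
    Returns {function: the leading share of that function's keywords}.
    """
    reading = {}
    for function, keywords in interpretations.items():
        share = location.get(function, 0.0)
        # Number of keywords adopted; note that Python's round() sends
        # exact halves to the nearest even integer (round(0.5) == 0).
        k = round(share * len(keywords))
        reading[function] = keywords[:k]
    return reading


# Assumed axis assignment for the realist example in the text.
realist = {"feel": 0.2, "sense": 0.8, "intuit": 0.2, "think": 0.1}

# Hypothetical interpretations of 'sunset'; only the first three 'feel'
# keywords come from the text.
sunset = {
    "feel": ["beauty", "warmth", "home", "longing", "calm"],
    "sense": ["orange", "horizon", "dusk", "glow", "cooling"],
    "intuit": ["ending", "cycle", "passage", "mystery", "renewal"],
    "think": ["rotation", "refraction", "scattering", "angle", "time"],
}
reading = refract(realist, sunset)
```

With five keywords per axis, the realist's reading keeps four ‘sense’ keywords, one each from ‘feel’ and ‘intuit’, and none from ‘think’, which matches the rough 80%/20% split described above.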

Figure 1-5. The space of possible ways of perceiving
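The ‘prism’ idea can be sketched in a few lines. This is only an illustration under stated assumptions: the weights are keyed by axis name to avoid guessing the tuple order, the ‘intuit’ and ‘think’ weights of the realist are assigned arbitrarily, and every keyword other than ‘beauty’, ‘warmth’, and ‘home’ is a hypothetical placeholder. The function name `refract` is likewise invented for this sketch.

```python
import math


def refract(weights, interpretations):
    """Take a weight-proportional share of keywords from each Jungian axis.

    `weights` maps axis name -> engagement in [0.0, 1.0].
    `interpretations` maps axis name -> ordered keywords for one topic.
    """
    blended = {}
    for axis, keywords in interpretations.items():
        share = weights.get(axis, 0.0)
        # Round half up so that 80% of five keywords yields four.
        count = math.floor(share * len(keywords) + 0.5)
        blended[axis] = keywords[:count]
    return blended


# Interpretations of 'sunset'; only the first three 'feel' keywords
# come from the text, the rest are hypothetical placeholders.
sunset = {
    "feel": ["beauty", "warmth", "home", "nostalgia", "peace"],
    "sense": ["orange", "horizon", "glow", "dusk", "sky"],
    "intuit": ["ending", "cycle", "mystery", "passage", "omen"],
    "think": ["refraction", "rotation", "scattering", "angle", "time"],
}

# A realist: 20% 'feel' and 80% 'sense' as in the text; the assignment
# of the remaining two weights is an assumption.
realist = {"feel": 0.2, "sense": 0.8, "intuit": 0.2, "think": 0.1}
view = refract(realist, sunset)
```

Here the realist's reading of ‘sunset’ is dominated by ‘sense’ keywords, with a single ‘feel’ keyword mixed in.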

How can such a model be acquired by reading a person’s everyday texts? A first-pass experiment was completed, to see if affective patterns of communication in a weblog could implicate a rough location of the person in the Jungian space. In this proposed reading, the affective themes being read for differ slightly from the reading scheme proposed for attitude modeling. The following affective themes resulted from the proposed reading:

EGO-PAD (writer’s average PAD-level)
ALTERS-PAD (other persons and things’ PAD-level)
INCOMING-PAD (PAD flowing from alters into ego)
OUTGOING-PAD (PAD flowing out from ego into alters)
MENTAL-ACTIVITY (frequency with which mental hypotheticals were invoked, e.g. “I thought that”)
INTROVERSION-EXTRAVERSION-RATIO (ratio of passive acts e.g. ‘resent’ to active acts e.g. ‘murder’)

These themes are further explained in Chapter 4. A corpus of blogs from 3800 bloggers with known MBTI types was analyzed for the above statistics. MBTI is the Myers-Briggs Type Indicator (Briggs & Myers 1976), a popular psychological inventory of personality based on Jung’s theory of psychological types. MBTI performs binary classification of persons on four scales: Extravert-Introvert (E-I), Sense-Intuit (S-N), Feel-Think (F-T), and Judge-Perceive (J-P). For our purposes, only S-N and F-T classification are relevant. A machine learning algorithm called BoosTexter (Schapire & Singer 2000) was fed the affective theme features and, from the 3800 blogs, learned binary classifiers for S-N and F-T. Evaluated with ten-fold cross validation, average classification accuracies were 0.58 for S-N and 0.62 for F-T. These exceeded the 0.5 lower bound corresponding to guessing, but fell well short of the upper bounds of 0.85 for S-N and 0.73 for F-T, estimated from MBTI’s five-week test-retest reliability statistics (Myers & McCaulley 1985), which represent a fundamental limitation on the stability of the MBTI scales. Looking at a decomposition of the classifier’s learned features, for S-N classification, affective exchange (incoming and outgoing PAD) was more important than ego’s and alters’ affects; for F-T classification, ego’s affect was more important than alters’ affect or exchanged affect. This result was nicely consistent with intuition about these Jungian dimensions. The overall result is promising, but this computational reading approach is still some way from being viable for real-world application.
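The evaluation protocol can be illustrated with a small, self-contained sketch of ten-fold cross validation. Note that this does not use BoosTexter: a nearest-centroid classifier is swapped in purely as a stand-in so the example runs with the standard library alone. The feature vectors are synthetic and perfectly separable, so the accuracy here says nothing about the reported 0.58 and 0.62; all function names are hypothetical.

```python
def fit_centroids(rows, labels):
    """Stand-in learner (not BoosTexter): mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest to `row`."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def cross_validate(rows, labels, k=10):
    """Average held-out accuracy over k folds (fold f holds out rows f, f+k, ...)."""
    if len(rows) < k:
        raise ValueError("need at least k rows")
    scores = []
    for fold in range(k):
        held_out = set(range(fold, len(rows), k))
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [l for i, l in enumerate(labels) if i not in held_out]
        model = fit_centroids(train_rows, train_labels)
        hits = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / k


# Synthetic, perfectly separable features for a hypothetical S-N task.
rows, labels = [], []
for i in range(20):
    rows.append([0.10 + 0.01 * i, 0.5])
    labels.append("S")
    rows.append([0.70 + 0.01 * i, 0.5])
    labels.append("N")

accuracy = cross_validate(rows, labels, k=10)
```

Each blog is held out exactly once, so the averaged accuracy estimates performance on unseen writers rather than on the training data.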

§
Up to now we have explored how models of a person’s tastes, attitudes, and ways of perceiving can be acquired and generalized. The results of evaluation demonstrated the general promise of the approach, but also illuminated some weaknesses, which suggest an agenda for further work. Another, more tantalizing ‘result’ is a slew of implemented applications driven by these person models (Figure 1-6); these concretize and motivate the modeling described thus far. What are person models in the aesthetical realms good for? They enable new tools for self-reflection, person learning, and deep customization. The six implemented applications are now introduced. Subsequently, I distill some interaction design principles drawn from reflecting on building these artifacts.
What Would They Think? (WWTT) (Liu & Maes 2004) is a panel of virtual mentors who reside on the computer desktop and offer users just-in-time affective feedback. Figure 1-6a shows WWTT configured to display a panel of mentors from Artificial Intelligence: (from left to right) Rodney Brooks, Seymour Papert, Rosalind Picard, Marvin Minsky, Douglas Lenat. WWTT is envisaged as a novel way to learn about a mentor’s points-of-view. As the user browses web pages and types emails and papers, the virtual mentors continuously observe the user’s read-write textual activity. The user’s present textual context becomes input for these virtual mentors, and according to each mentor’s generalized model of attitudes, a simulated reaction to the input is produced. Reactions, in the form of a PAD value, are graphically rendered according to this visual metaphor: green for pleasure, red for displeasure; brightness for arousal; blurry if submissive, and sharp if dominant. To learn more about the justification for a reaction, the user can double-click a reacting mentor, which brings up a selection of quotes from that mentor’s personal texts that best justify the reaction. WWTT’s suitability as a tool for person-learning was evaluated in a 36-person user study. The task was to answer a multiple-choice test about strangers’ personalities, explicit attitudes whose evidence was located in their weblog diaries, and implicit attitudes whose evidence was not stated in their diaries, but whose correctness was verified by each diary author. Study participants formed three groups: one browsed strangers’ weblogs, another used a text-search version of WWTT, and a third used the full WWTT system. Results of the study showed with statistical significance (95% confidence) that WWTT users outperformed text-search WWTT users on personality questions; WWTT users outperformed both text-search WWTT and weblog users on questions about explicit attitudes. All three groups had equally poor performance on implicit attitude questions, which tested knowledge of attitudes not stated in the weblog diary. These results are strong evidence for the utility of models of personal attitudes presented in the WWTT interface.
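The visual metaphor for reactions lends itself to a short sketch. The mapping below assumes each PAD dimension lies in [-1, 1] and that zero is the boundary between pleasure and displeasure and between dominance and submission; the function name and the returned attribute names are hypothetical, and the actual rendering in WWTT may differ.

```python
def render_reaction(pleasure, arousal, dominance):
    """Map a PAD triple to the display attributes of a mentor's reaction.

    Assumes every dimension lies in [-1, 1].
    """
    for value in (pleasure, arousal, dominance):
        if not -1.0 <= value <= 1.0:
            raise ValueError("PAD values are assumed to lie in [-1, 1]")
    return {
        # Green for pleasure, red for displeasure.
        "hue": "green" if pleasure >= 0 else "red",
        # Brightness grows with arousal, rescaled to [0, 1].
        "brightness": (arousal + 1.0) / 2.0,
        # Sharp if dominant, blurry if submissive.
        "focus": "sharp" if dominance >= 0 else "blurry",
    }


reaction = render_reaction(pleasure=0.6, arousal=0.5, dominance=-0.3)
```

A pleased, fairly aroused, but submissive reaction thus appears as a bright, blurry green.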
The Identity Mirror (Liu & Davenport 2005) shows you your cultural identity as a swarm-of-keywords (Figure 1-6b). It monitors your social network profile and generalizes your taste ethos from that profile, producing a visualization of your cultural identity in the form of an abstract mirror. Basic image recognition and image tracking were implemented to allow a viewer to interact with the reflection. Walking to and fro, the viewer traverses granularities of identity, from broad descriptions (e.g. identity descriptors, music genre, book genre) when the viewer is far away, to specific descriptions (e.g. song titles, book titles, film titles) when the viewer is up close. Finally, a viewer’s reflection is time-variant, representing the understanding that as culture’s priorities and desires change, so does cultural identity, which is always articulated against cultural priorities. An advantage of the taste ethos representation is that it can be biased by activating contextual nodes in the taste fabric to represent the present concerns of culture. A topic parser monitors live news feeds and automatically biases the taste fabric. For example, during Oscars season, nodes relating to film and entertainment are activated in the taste fabric, and as a result, the mirror’s reflection reveals a more glamorous, entertainment-oriented facet of your taste ethos.
The Aesthetiscope (Liu & Maes 2005b; Liu & Maes 2006) is a perspective-driven abstract art bot (Figure 1-6c). In the spirit of Ellsworth Kelly and early twentieth century abstract expressionist artwork that took the form of a semantic color grid, the Aesthetiscope renders inspirational texts (e.g. a word, a poem, song lyrics) into abstract color grid artwork. A model of the viewer’s preferred ways of perceiving and interpreting the inspirational text controls the chosen combination of colors. The viewer’s model is given as four numbers between 0.0 and 1.0 corresponding to the viewer’s disposition on the four scales: think, feel, intuit, and sense. Just as wine may be chosen to pair with foods, the Aesthetiscope generates perspective-specific artwork that pairs with a user’s music playlist and choice of poems. A user study of the Aesthetiscope’s generated artwork supports the claim that artwork specific to an inspirational text has greater aesthetic efficacy than artwork with mismatched text. An interesting implication of the Aesthetiscope, beyond its deep customization, is the ability for viewers to communicate their differing perspectives to each other and to explore the intersection of their perspectives.
The Synesthetic Cookbook (Liu, Hockenberry & Selker 2005) is an interactive recipe browsing interface backed by a mined semantic fabric of food consisting of 60,000 recipes, 5000 ingredient keywords, 1000 sensorial keywords, 400 cooking procedures, and 400 nutritional terms. In the cookbook, virtual tastebuds (Figure 1-6d) simulate persons’ reactions to recipe selections. A tastebud in food space is akin to a cultural taste ethos in cultural taste space. A virtual tastebud is represented as an activation cloud, and is generated by spreading activation across the food fabric from a starting profile of a user’s likes and dislikes. To account for the importance of disliking in taste for food, the food fabric implements negative activations, or inhibitions, in addition to the more standard positive activation.
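A minimal sketch of spreading activation with inhibition may help fix the idea. It assumes a directed, weighted graph, a fixed decay factor per hop, and a small number of propagation steps; none of these parameters comes from the cookbook itself, and the tiny food fabric below is a hypothetical toy rather than mined data.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate signed activation energy outward from seed nodes.

    `graph` maps node -> {neighbor: edge weight}. `seeds` maps node ->
    starting energy: positive for likes, negative for dislikes, so that
    dislikes spread as inhibition.
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = energy * weight * decay
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        # Accumulate newly arrived energy; positive and negative
        # contributions to the same node partially cancel.
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation


# Hypothetical toy food fabric.
food_fabric = {
    "basil": {"tomato": 0.8, "pesto": 0.9},
    "tomato": {"pasta": 0.7},
    "cilantro": {"salsa": 0.9},
    "salsa": {"tomato": 0.5},
}

# The user likes basil and dislikes cilantro.
tastebud = spread_activation(food_fabric, {"basil": 1.0, "cilantro": -1.0})
```

In this toy run ‘salsa’ ends up inhibited, while ‘tomato’ stays positive but is dampened by the inhibition that arrives through ‘salsa’.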
Ambient Semantics is a wearable information system, in the form of a tag-reading wristband, that offers wearers just-in-time feedback on books that they pick up and on people that they meet. Each wearer is backed by their social network profile and a model of their attitudes generated from their corpus of everyday texts. Figure 1-6e is a screenshot of the system’s social introduction faculty. When two strangers meet, the system tells them what they have in common. More than intersecting their keywords, the system explains shared identities and cultural interests that lie in the shared context of their intersecting taste ethoi. By illuminating shared context rather than explicit keywords, the system creates more opportunities to seed ice-breaking conversations and build social bridges.

Figure 1-6. Summary of built applications

Cartharses (Figure 1-6f) is a Freudian joke-teller that observes a user’s read-write activity, monitors his/her psychic tension levels about various topics, and delivers jokes by mapping the user’s pattern of tensions onto the archetypal patterns of tension associated with each niche family of jokes. Rather than inventing jokes, Cartharses selects jokes to tell from a repository of 10,000 jokes. Freud’s (1905) theory of tendentious jokes stated that the function of these jokes was to give catharses to bottled-up psychic tensions. The reason jokes are often grouped into ethnic families is that an ethnos has a shared upbringing and, thus, shared patterns of psychic tension. A person’s tension model is represented using semantic sheets similar to an attitude model, except that a measure of psychic tension supplants the PAD measure of attitude. In Cartharses’ backend (pun intended), a corpus of 10,000 jokes was categorized into humor niches (such as blonde jokes, foreigner jokes, sexual jokes, Bush jokes, Clinton jokes), and archetypal tension patterns were learned for each family of jokes from a corpus of bloggers who appreciated each type of joke.
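The matching of a user’s tensions to joke families can be sketched as a nearest-pattern lookup. Cosine similarity is used here only as one plausible choice; the text does not say which measure Cartharses uses. The niche names, topics, and tension values are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse topic -> tension mappings."""
    dot = sum(value * b.get(topic, 0.0) for topic, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_niche(user_tensions, archetypes):
    """Return the joke niche whose archetypal pattern best fits the user."""
    return max(archetypes, key=lambda niche: cosine(user_tensions, archetypes[niche]))


# Hypothetical archetypal tension patterns for two joke families.
archetypes = {
    "office jokes": {"boss": 0.9, "deadlines": 0.7},
    "political jokes": {"elections": 0.9, "taxes": 0.6},
}

# Hypothetical tension model read from a user's recent textual activity.
user = {"deadlines": 0.8, "boss": 0.5, "taxes": 0.1}
niche = pick_niche(user, archetypes)
```

Once a niche is chosen, a joke would be drawn from that family in the repository; that retrieval step is omitted here.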
The process of conceiving, prototyping, implementing, and refining these artifacts was certainly an evolutionary one. These artifacts make use of rich generalized models of a person’s tastes to simulate not just bits of a person’s preferences, but a systematized and comprehensive account of their preferences: their points-of-view, if you will. Because the artifacts have novel capabilities, they necessitate novel interaction design. Certain ideas worked better than others, and eventually themes emerged. In the spirit of reflexive practice, three design lessons for this class of perspective-based applications were distilled from the application development process.

#1—continuous observation of a user’s textual activities and immediate feedback ensures an interesting ‘walk’ through a perspective, which only becomes apparent through animation.
#2—feedback given just-in-time assures that a user benefits from the perspective when interest is piqued, and feedback given just-in-context provokes the user’s imagination and critical abilities, since a reaction is synthesized for a new textual situation whose attitudes may never have been stated explicitly in the person’s everyday texts.
#3—the perspective should be tinkerable so that users can better grasp the capabilities and limitations of the artifact—e.g. tastebuds can be reprogrammed, perspective in the art bot can be shifted via sliders, and virtual mentors can be prompted to explain their reactions.

In this section, I summarized the thesis’ experiments in person modeling and their results. The next section further distills these thesis results into contributions and a roadmap to the subsequent thesis chapters.
