Regimes in Social-Cultural Events-Driven Activity Sequences

THE BAYESIAN NETWORK MODEL
Variables

The unit of observation is an event reported in the month diaries. For each event, we record the type of event (Event), the preparation activity (PreA), if any, that involved the longest travel time, and the aftermath activity (PostA), if any, that involved the longest travel time. In addition, we record the travel time and transport mode of the trip related to the event itself (EvTt, EvMo), to the preparation activity (PreTt, PreMo) and to the aftermath activity (PostTt, PostMo). Because the frequency distributions of activities across types are skewed in both the before and the after stage (Table 2), we used a coarse classification of preparation and aftermath activities that distinguishes between home-based activities, shopping activities and other out-of-home activities (Other). If no preparation or aftermath activity was observed, this was encoded as a zero value of the variable. Since network-learning algorithms require discrete variables, travel time was discretized into 5 categories for each trip purpose (the event itself, the preparation activity and the aftermath activity): no travel plus 4 equal-frequency intervals of travel time. Transport modes were classified into 4 categories: car as driver, slow (bike or foot), public transport (bus, train, etc.) and car as passenger. As external variables, we included the household- and individual-level attributes covered by the questionnaire. Situational variables, such as accessibility, and temporal variables were left out of consideration, as our purpose here is a first exploration of BN learning methods for estimating a model of this type. Table 3 lists all variables used at the event, activity, individual and household levels. The total number of cases, i.e., the number of events reported in the month diaries after data cleaning, is 2992.
Missing values were imputed using either the modal class or a random value of the variable, depending on the nature and distribution of the variable.
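The discretization and imputation steps can be sketched as follows. This is a minimal illustration, assuming travel times in minutes with 0 meaning that no trip was made; the function names are hypothetical, and the paper does not say how ties at interval boundaries were handled, so with many tied values the classes will only be roughly equal in size.

```python
import random
from collections import Counter


def equal_frequency_cuts(values, n_bins=4):
    """Cut points that split the sorted values into n_bins groups of (roughly) equal size."""
    s = sorted(values)
    n = len(s)
    return [s[(k * n) // n_bins] for k in range(1, n_bins)]


def discretize_travel_time(times, n_bins=4):
    """Code 0 = no travel; codes 1..n_bins = equal-frequency classes of positive travel times."""
    cuts = equal_frequency_cuts([t for t in times if t > 0], n_bins)

    def code(t):
        if t <= 0:
            return 0
        # 1 + number of cut points that t reaches or exceeds
        return 1 + sum(t >= c for c in cuts)

    return [code(t) for t in times]


def impute_mode(values):
    """Replace missing values (None) by the modal class of the observed values."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def impute_random(values, rng):
    """Replace missing values (None) by a random draw from the observed values."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]
```

Because the intervals are defined per trip purpose, `discretize_travel_time` would be called separately for the EvTt, PreTt and PostTt columns, so that each variable gets its own cut points.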

Network-learning method

We used the TPDA^¹ (Three-Phase Dependency Analysis) network-learning algorithm developed by Cheng et al. (2002). This algorithm is well tested and has a strong record of prediction accuracy compared with alternative constraint-based BN learning algorithms. Furthermore, the algorithm is implemented in a software tool called the Bayesian Network PowerConstructor^², which gives users considerable flexibility to impose constraints on the existence of arcs between nodes or to pre-define special cases of the network structure. In general, the purpose of BN structure-learning methods is to identify connections between variables. The TPDA algorithm implements an incremental procedure: at each step, it takes the current set of arcs as given and considers whether arcs should be added or removed. As the name suggests, the procedure involves three phases, referred to as drafting, thickening and thinning. The drafting phase produces an initial set of edges based on pairwise tests of mutual information between nodes, which should provide a good starting point for the next phases. In the subsequent thickening phase, the algorithm adds edges to the current graph based on tests of conditional independence between pairs of nodes. Loosely speaking, two nodes are conditionally independent if their mutual information can be fully explained by indirect relationships between the nodes in the current graph. Only if some mutual information remains after the paths through which information can flow between them have been taken into account is an edge added to connect them. In the third phase, thinning, each edge is examined and removed if the two nodes now appear to be conditionally independent, given the changes made to the graph in the meantime. Finally, the parameters (i.e., the CPTs) of the resulting network are estimated from the same data using the well-known EM learning algorithm, which uses maximum likelihood estimation of conditional probabilities and deals adequately with incomplete data in a sample.
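The drafting phase described above can be sketched as follows. This is a simplified reading of the procedure of Cheng et al., not the PowerConstructor code: pairs whose mutual information exceeds the threshold are visited in descending order of mutual information, and an edge is added only if the two nodes are not yet connected by a path; skipped pairs are kept for the thickening phase. The thickening and thinning phases, which need conditional-independence tests, are not shown, and the mutual-information values in the usage lines are made up for illustration.

```python
from collections import defaultdict, deque


def connected(adj, a, b):
    """True if an undirected path already links a and b in the draft graph (BFS)."""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def draft(pairwise_mi, epsilon=0.01):
    """Simplified drafting phase: visit pairs with MI > epsilon in descending order
    of MI and add an edge only if the pair is not yet connected. Skipped pairs are
    returned so that the thickening phase can re-examine them."""
    adj = defaultdict(set)
    edges, deferred = [], []
    candidates = sorted(
        (pair for pair, mi in pairwise_mi.items() if mi > epsilon),
        key=lambda pair: pairwise_mi[pair],
        reverse=True,
    )
    for a, b in candidates:
        if connected(adj, a, b):
            deferred.append((a, b))
        else:
            adj[a].add(b)
            adj[b].add(a)
            edges.append((a, b))
    return edges, deferred


# Illustrative mutual-information values (bits), not estimates from the diary data.
mi = {
    ("Event", "PreA"): 0.20, ("Event", "EvMo"): 0.15, ("EvMo", "EvTt"): 0.12,
    ("PreA", "EvMo"): 0.05, ("PreA", "PreTt"): 0.005,
}
edges, deferred = draft(mi)
# PreA-EvMo is deferred because the path PreA - Event - EvMo already exists;
# PreA-PreTt falls below epsilon and is never considered.
```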

It follows from the foregoing that mutual information is the statistic on which the tests that decide whether to add or remove edges are based, in the TPDA algorithm as well as in other information-theory-based learning methods. Formally, the mutual information between two nodes A and B is defined as:
$$ I(A, B) = \sum_{a,b} P(a, b) \log_2 \frac{P(a, b)}{P(a)\,P(b)} \qquad (1) $$
where P(a) and P(b) are the unconditional probabilities of A = a and B = b, and P(a, b) is the joint probability of these events. The measure is based on information theory and expresses, in bits, the expected information gained about B after observing the value of A. Put differently, it indicates the extent to which knowing A reduces uncertainty (i.e., entropy) about the value of B. A test of conditional independence involves calculating the mutual information after blocking the paths through which information can flow between the two nodes, that is, the mutual information conditional on a set of nodes C that separates them, I(A, B | C). If I(A, B | C) ≥ ε, the test result is negative (the nodes are not conditionally independent) and an edge is added, where ε is a pre-defined threshold value (e.g., ε = 0.01 bits). As is obvious from the equation, mutual information is symmetric: I(A, B) = I(B, A). This implies that the measure gives no information about the direction of a possible influence. Based on known formal properties of a BN, however, the direction of some of the arcs can be resolved. In the final step of the procedure, TPDA runs a procedure to orient the so-called essential arcs of the learned graph. A basic element of this procedure is the identification of colliders (node Y is a collider in a structure X → Y ← Z where no arc exists between X and Z), exploiting the property of these nodes that they let information pass through them when they are instantiated. There is no guarantee that all arcs can be oriented on the basis of the formal properties of a BN. Arcs that remain unresolved are presented to the user, who decides on their direction based on the substantive meaning of the variables.
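Equation (1) and the threshold test can be estimated from data by plugging in empirical frequencies, as in the following minimal sketch. The function names are hypothetical, and how PowerConstructor chooses the conditioning set C is not shown; the conditioning variable is simply passed in (as tuples when C contains several nodes). The conditional version uses I(A, B | C) = Σ P(a, b, c) log₂ [P(a, b, c) P(c) / (P(a, c) P(b, c))].

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Equation (1) with empirical probabilities: I(X, Y) in bits."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # P(x,y) / (P(x) P(y)) simplifies to count(x,y) * n / (count(x) * count(y))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def conditional_mutual_information(xs, ys, zs):
    """I(X, Y | Z) in bits; zs may hold tuples when Z is a set of nodes."""
    n = len(xs)
    xyz = Counter(zip(xs, ys, zs))
    xz, yz, pz = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    return sum(
        (c / n) * log2(c * pz[z] / (xz[(x, z)] * yz[(y, z)]))
        for (x, y, z), c in xyz.items()
    )


def dependent(xs, ys, zs=None, epsilon=0.01):
    """True means the independence test is negative, i.e. an edge is warranted."""
    if zs is None:
        mi = mutual_information(xs, ys)
    else:
        mi = conditional_mutual_information(xs, ys, zs)
    return mi >= epsilon
```

As a usage example, if X and Y are both exact copies of a binary Z with equal frequencies, `mutual_information(X, Y)` is 1 bit, yet `conditional_mutual_information(X, Y, Z)` is 0: all of their shared information flows through Z, so no direct edge between X and Y would be added.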
Network structure constraints

Since the method does not guarantee that the correct network structure is found, it is important that the user specifies every constraint on the network structure that can be derived from domain knowledge. For our model, we can identify a number of such constraints. A first set of constraints follows from our prediction purpose: what we wish to predict is the choice of events, activities and travel, given attributes of the individual and household. These attributes therefore do not need to be predicted but can be taken as observed. In terms of the network, this means that we do not need to consider links between these exogenous variables. The most natural way to implement this constraint is to exclude the possibility that these variables have any incoming arcs, that is, to mark the exogenous variables as root nodes. However, it is equally possible to implement the constraint by requiring that these variables have no outgoing arcs, that is, by marking them as leaf nodes. Note that the only difference between the two alternatives, marking the variables as root or as leaf nodes, is the direction of the arcs, and for explaining the data the direction of arcs is arbitrary. For reasons of efficiency, however, the choice between the two is not arbitrary. By defining the exogenous variables as leaves rather than roots, we reduce the expected number of parent nodes of the dependent (i.e., behavioral) variables. Reducing the size of parent sets is important because CPTs become very large, and parameter learning less effective, as the number of parent nodes increases. For that reason, we defined the socio-economic variables as leaf nodes^³.
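The efficiency argument can be made concrete with a small parameter count. A CPT has (number of child states − 1) free parameters for every combination of parent states. The state counts below (an 8-state Event node and six 3-state socio-economic variables) are hypothetical and not taken from Table 3; they only illustrate the order of magnitude.

```python
from math import prod


def cpt_free_parameters(child_states, parent_states):
    """Free parameters of a CPT: (child_states - 1) for every parent configuration."""
    return (child_states - 1) * prod(parent_states)


# Hypothetical state counts: an 8-state Event node and six 3-state exogenous variables.
# Exogenous variables as roots: Event's CPT alone is conditioned on all six parents.
as_roots = cpt_free_parameters(8, [3] * 6)  # 7 * 729
# Exogenous variables as leaves: Event has no parents; each leaf has Event as parent.
as_leaves = cpt_free_parameters(8, []) + 6 * cpt_free_parameters(3, [8])  # 7 + 6 * 16
```

Under these assumptions the root specification needs 5103 free parameters for the Event node alone, more than the 2992 cases available, whereas the leaf specification needs 103 in total.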

A second set of constraints follows from the theory outlined earlier in the context of the proposed conceptual model. This theory assumes that events generate activities and travel rather than the other way round. Based on this, we imposed as a constraint a partial ordering among the behavioral variables, in which events precede activities and trips. More formally, the partial ordering is Event ≺ {PreA, PostA, EvTt, EvMo, PreTt, PreMo, PostTt, PostMo}, meaning that no arc may run from any of the variables on the right-hand side to the Event node on the left-hand side.
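Both sets of constraints amount to a filter on candidate arcs, which can be sketched as below. This is an illustration of the logic only, not how PowerConstructor represents constraints internally; the socio-economic set shown is the six variables that appear in the paper's individual example (the full set in Table 3 is larger).

```python
# Socio-economic variables from the paper's example profile; the full set is larger.
EXOGENOUS = {"hComp", "hNcar", "hChild", "Gend", "Driv", "Age"}
# Right-hand side of the partial ordering Event ≺ {...}.
AFTER_EVENT = {"PreA", "PostA", "EvTt", "EvMo", "PreTt", "PreMo", "PostTt", "PostMo"}


def arc_allowed(parent, child):
    """True if the arc parent -> child respects both sets of structural constraints."""
    # First set: socio-economic variables are leaf nodes, so they have no outgoing
    # arcs (this also rules out arcs between two exogenous variables).
    if parent in EXOGENOUS:
        return False
    # Second set: no variable on the right-hand side of the ordering may point
    # back into Event.
    if child == "Event" and parent in AFTER_EVENT:
        return False
    return True
```

Note that the leaf-node rule alone already excludes every link between two exogenous variables, which is why no separate constraint for such links is needed.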

Results

Given these constraints, the TPDA algorithm was able to identify the direction of all arcs in the network (i.e., no interaction with the user was needed to determine the direction of arcs). The threshold for mutual information in the test of conditional independence was set to the default value of 0.01 bits. The resulting network is shown in Figure 2. The special-purpose software Netica (Norsys Software Corp., 1996-2006) was used to compile and display the network. Almost every socio-economic variable is connected to the event node and, to a lesser extent, to the event transport-mode node. This means that for almost none of the socio-economic variables could conditional independence from the event node be established, suggesting that some unique mutual information almost always exists. This does not necessarily mean, however, that all relationships are also relevant. For reasons of parsimony, we therefore decided to select the most relevant socio-economic variables and regenerate the network. Following the approach of Janssens et al. (2003, 2006), we used mutual information (Equation 1) as a measure of relevance. More specifically, we calculated the mutual information between each combination of a socio-economic variable, X, and a behavioral variable, Y, and took for each X the maximum value across all Ys as an indicator of the relevance of X. The values thus obtained ranged from 0.013 to 0.030 bits. In total, 6 socio-economic variables had a value greater than 0.02 bits, and these variables were selected.
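The relevance-based selection reduces to a maximum over behavioral variables followed by a threshold, as in this minimal sketch. The mutual-information values are illustrative only, and "hInc" is a hypothetical variable name, not one from Table 3.

```python
def select_relevant(mi_table, threshold=0.02):
    """mi_table maps (socio_var, behavioral_var) -> mutual information in bits.
    Relevance of X = max over Y of I(X, Y); X is kept if its relevance > threshold."""
    relevance = {}
    for (x, _y), mi in mi_table.items():
        relevance[x] = max(mi, relevance.get(x, float("-inf")))
    selected = sorted(x for x, r in relevance.items() if r > threshold)
    return relevance, selected


# Illustrative numbers only; "hInc" is a hypothetical variable name.
mi_table = {
    ("Age", "Event"): 0.025, ("Age", "EvMo"): 0.010,
    ("Gend", "Event"): 0.030,
    ("hInc", "Event"): 0.013, ("hInc", "EvMo"): 0.015,
}
relevance, selected = select_relevant(mi_table)
```

Here Age and Gend are retained (maximum values of 0.025 and 0.030 bits), while the hypothetical hInc, whose strongest link is 0.015 bits, is dropped even though it is connected to two behavioral variables.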

Figure 3 shows the network re-estimated on the reduced set of socio-economic variables. The bar diagram at each node shows the probability distribution across the possible states of the variable that follows from the estimated conditional probabilities. Since no evidence has been entered in the network, these probabilities represent a-priori beliefs about the states of the variables. Some example findings: Social/recreation events (Event = 7) have the highest a-priori probability of occurrence, 43.6% (in the sample); there is a probability of 20.7% that no trip is involved in an event, and a probability of 44.0% that car as driver is the transport mode used for the event; the probability that no preparation activity is involved in an event is 84.8%, and the probability that a shopping activity precedes an event is 8.7%; the probability of an aftermath activity is even lower, and when an activity does follow an event, the Other category rather than shopping has the highest probability.

Turning to the relationships in the network, we see that socio-economic variables have many direct influences, particularly on event type and transport mode. However, they appear to have no direct relationships with the choice of activities or trips before or after the event. The only exception is a direct influence of age on the choice of preparation activity. This partially confirms our prior expectation that external variables have no direct influence on activities, but only an indirect influence through the event. On the other hand, we did expect direct influences of exogenous variables on trips. It appears, however, that direct influences of external variables on trip variables exist only for trips related to events (transport mode); they are absent for trips related to preparation and aftermath activities. As for the relationships among behavioral variables, our a-priori expectations are largely confirmed. Event type has a direct influence on the travel time and transport mode of trips for the event and on the choice of preparation and aftermath activities. However, event type has no direct influence on travel choices related to preparation and aftermath activities; its influence on trips in these stages is indirect and mediated through the choice of activity. Also, as could be expected, a direct relationship exists between travel time and choice of transport mode of trips in each phase of the chain: the event, preparation and aftermath phases.

The model can be used to predict the behavior of an individual by entering the individual's attribute data as evidence in the network and updating the probabilities of the behavioral nodes. Figure 4 shows an example. The individual in the example is from a single, one-worker household (hComp = 1) with one car (hNcar = 1) and no children (hChild = 0), is male (Gend = 1), has a driver's license (Driv = 1) and is between 25 and 45 years of age (Age = 2). The network shows the updated probabilities of the behavioral variables after this evidence has been entered. Comparing this state of the network with the state before the evidence was entered (Figure 3) reveals the changes in probabilities caused by this attribute profile. For example, the probabilities of Maintenance and Sports events increase, and those of Special day, Church and Health events decrease. Furthermore, the model predicts an increase in the probability of the car-driver mode from 44.0% to 60.8%. As a final example, the predicted probability of shopping as a preparation activity increases somewhat. If discrete outcomes rather than probabilities are to be derived in an application of the model, Monte Carlo simulation should be used. To account for dependencies between behavioral variables, the Monte Carlo draws should be made sequentially: each time a value is drawn for a variable, it is entered as new evidence in the network, and the remaining variables are updated before the next value is drawn, and so on.
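The sequential Monte Carlo procedure can be sketched on a toy network. Everything in the sketch is hypothetical: the three-node structure (Event → Age, Event → EvMo, with Age as a leaf as in the estimated model), the state labels and the probabilities are made up, and belief updating is done by brute-force enumeration rather than by Netica's inference engine. The point is the loop: each drawn value is added to the evidence before the next variable is updated and drawn.

```python
import random
from itertools import product

# Hypothetical toy network (NOT the estimated model): Event -> Age, Event -> EvMo.
DOMAINS = {"Event": ["social", "sports"], "Age": ["young", "old"], "EvMo": ["car", "slow"]}
P_EVENT = {"social": 0.6, "sports": 0.4}
P_AGE = {"social": {"young": 0.7, "old": 0.3}, "sports": {"young": 0.5, "old": 0.5}}
P_MODE = {"social": {"car": 0.8, "slow": 0.2}, "sports": {"car": 0.3, "slow": 0.7}}


def joint(a):
    """Joint probability of a full assignment, as the product of the CPT entries."""
    e = a["Event"]
    return P_EVENT[e] * P_AGE[e][a["Age"]] * P_MODE[e][a["EvMo"]]


def posterior(var, evidence):
    """P(var | evidence) by brute-force enumeration (adequate for a toy network)."""
    names = list(DOMAINS)
    weights = dict.fromkeys(DOMAINS[var], 0.0)
    for values in product(*(DOMAINS[n] for n in names)):
        a = dict(zip(names, values))
        if all(a[k] == v for k, v in evidence.items()):
            weights[a[var]] += joint(a)
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def sample_sequentially(evidence, order, rng):
    """Draw one value per variable; each draw is entered as evidence before the next."""
    state = dict(evidence)
    for var in order:
        dist = posterior(var, state)
        values = list(dist)
        state[var] = rng.choices(values, weights=[dist[v] for v in values])[0]
    return state


rng = random.Random(42)
draws = [sample_sequentially({"Age": "old"}, ["Event", "EvMo"], rng) for _ in range(20000)]
share_car = sum(d["EvMo"] == "car" for d in draws) / len(draws)  # close to P(car | old)
```

Drawing EvMo independently from its updated marginal would reproduce the right share of car trips but could pair a sports event with modes that are unlikely for that event; conditioning each draw on the earlier ones keeps the simulated chains internally consistent.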

Finally, the network can be used to predict the effect of a specific instantiation of one variable on the probability of some other variable value of interest. This is done by entering the evidence in the network and recording the probability of the value of interest after the network has been updated. As an illustration, Table 4 summarizes the results of applying this procedure to measure the effects of different instantiations of the event variable on the probabilities of an (arbitrary) selection of outcome variable values. Event types are listed in the rows and the variable values of interest in the columns. Each cell gives the ratio of the updated probability (after instantiating the event variable) to the probability before instantiation. Thus, a value smaller than 1 indicates that the event decreases the probability and a value larger than 1 that it increases the probability, as predicted by the model.
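For a single pair of variables, each cell of a table like Table 4 reduces to a conditional probability divided by a marginal probability. The sketch below assumes a small hypothetical joint distribution of Event and PreA in place of a compiled network; in the actual model the updated probability would come from belief updating in Netica, and the numbers here are not the paper's.

```python
# Hypothetical joint distribution P(Event, PreA); the numbers are illustrative only.
JOINT = {
    ("Social", "none"): 0.40, ("Social", "shopping"): 0.05,
    ("Maintenance", "none"): 0.20, ("Maintenance", "shopping"): 0.15,
    ("Sports", "none"): 0.18, ("Sports", "shopping"): 0.02,
}


def marginal(joint, position, value):
    """Sum the joint over all entries whose variable at `position` equals `value`."""
    return sum(p for key, p in joint.items() if key[position] == value)


def effect_ratio(joint, event, outcome):
    """Table 4-style cell: P(PreA = outcome | Event = event) / P(PreA = outcome)."""
    updated = joint.get((event, outcome), 0.0) / marginal(joint, 0, event)
    prior = marginal(joint, 1, outcome)
    return updated / prior
```

With these numbers the a-priori probability of shopping as a preparation activity is 0.22; instantiating Event = Maintenance raises it to 0.15 / 0.35 ≈ 0.43 (a ratio of about 1.95), whereas Event = Social lowers it (a ratio below 1).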

The table reveals several notable effects of events. Person/family/relatives and Special day events strongly increase the base probability of shopping as a preparation activity. Church/school events strongly increase the probability that the individual is female. Somewhat surprisingly, Sports events increase the probability that the person is 65 years of age or older. This suggests that, according to the model, elderly people have a relatively high probability of engaging in a passive or active form of sports event, given that they experience an event. Social/recreation events strongly increase the probability of the car-driver mode and of long trips for the event. Maintenance events, in contrast, are characterized by a higher probability of shopping as a preparation activity and a lower probability of a long trip for the event. Finally, events in the Other category increase the probability that the household has 2 or more cars and that a long trip is involved. We emphasize that the model simultaneously predicts behavior on a set of dependent variables, taking both direct and indirect relationships between variables into account. The model is therefore potentially more powerful than models that assume (linear) functions of a single dependent variable at a time.

CONCLUSIONS AND DISCUSSION
As part of our efforts to gradually replace activity-based models of transport demand with event-based models, this paper has reported the formulation and application of a Bayesian network model to identify and analyze regimes in activity sequences triggered by (social) events. In particular, we developed and tested a model to explain and predict activity-travel chains associated with particular events. We assume that an event triggers a (sequenced) set of associated, interdependent activities and travel. Events may involve trips and may require one or more preparation activities preceding the event and one or more aftermath activities following it.

The network that was found depicts the direct and indirect effects among events, activities and travel. Furthermore, it links age and other socio-demographic variables to participation in social events. It is a potentially valuable building block for dynamic models of activity-travel demand. The Bayesian network can be used to simulate the series of activities and associated travel that an event generates. The output of this model can in turn serve as input to models of short-term dynamics and daily activity-scheduling behavior. Evidently, developing a fully operational model requires further testing, and perhaps elaboration, of many elements of the suggested approach. We plan to report on such developments in the near future.

REFERENCES
Andersen, S.K., K.G. Olesen, F.V. Jensen and F. Jensen (1989), Hugin: a shell for building Bayesian belief universes for expert systems. In: Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, Michigan, Aug. 20-25, pp. 1080-1085.

Arentze, T.A. and H.J.P. Timmermans (2000), Albatross, EIRASS, Eindhoven.

Arentze, T.A. and H.J.P. Timmermans (2003), Albatross-2, EIRASS, Eindhoven.

Arentze, T.A. and H.J.P. Timmermans (2005), Representing mental maps and cognitive learning in micro-simulation models of activity-travel choice dynamics, Transportation, 32, 321-340.

Arentze, T.A. and H.J.P. Timmermans (2006a), Multi-agent models of spatial cognition, learning and complex choice behavior in urban environments. In: J. Portugali (ed.), Complex Artificial Environments, Springer Verlag, Berlin, pp. 181-200.

Arentze, T.A. and H.J.P. Timmermans (2006b), A new theory of dynamic activity generation. In: Proceedings of the 85th Annual Meeting of the Transportation Research Board, Washington, D.C. (CD-ROM: 20 pp.).

Arentze, T.A., A.W.J. Borgers, M. Ponje and H.J.P. Timmermans (2006), From activity-based modeling to social event-based modeling: conceptual framework and empirical results. In: Proceedings of the IATBR Conference, Kyoto (CD-ROM: 17 pp.).

Arentze, T.A., C. Pelizaro and H.J.P. Timmermans (2005), Implementation of a model of dynamic activity-travel rescheduling decisions: an agent-based micro-simulation framework. In: Proceedings of the Computers in Urban Planning and Urban Management Conference, London (CD-ROM: 16 pp.).

Axhausen, K. (1998), Can we ever obtain the data we would like to have? In: T. Gärling, T. Laitila and K. Westin (eds.), Theoretical Foundations of Travel Choice Modelling, Pergamon, Oxford, pp. 203-324.

Bhat, C.R., J. Guo, S. Srinivasan, A. Sivakumar, A. Pinjari and N. Eluru (2004), Activity-Based Travel-Demand Modeling for Metropolitan Areas in Texas: Representation and Analysis Frameworks for Population Updating and Land-Use Forecasting, Report 4080-6, prepared for the Texas Department of Transportation.

Cheng, J., R. Greiner, J. Kelly, D. Bell and W. Liu (2002), Learning Bayesian networks from data: an information-theory based approach, Artificial Intelligence, 137, 43-90.

Heckerman, D., A. Mamdani and M.P. Wellman (1995), Real-world applications of Bayesian networks, Communications of the ACM, 38, 24-26.

Hugin Expert A/S (1995-2005), Software program, Aalborg, Denmark.

Janssens, D., G. Wets, T. Brijs, K. Vanhoof, H.J.P. Timmermans and T.A. Arentze (2006), Integrating Bayesian networks and decision trees in a sequential rule-based transportation model, European Journal of Operational Research, 175, 16-34.

Janssens, D., G. Wets, T. Brijs, K. Vanhoof and H.J.P. Timmermans (2003), Identifying behavioral principles underlying activity patterns by means of Bayesian networks. In: Proceedings of the Annual Meeting of the Transportation Research Board, Washington, D.C. (CD-ROM: 20 pp.).

Joh, C.-H., T.A. Arentze and H.J.P. Timmermans (2005), A utility-based analysis of activity time allocation decisions underlying segmented daily activity-travel patterns, Environment and Planning A, 37, 105-126.

Joh, C.-H., T.A. Arentze and H.J.P. Timmermans (2006), Measuring and predicting adaptation behavior in multi-dimensional activity-travel patterns, Transportmetrica, 2, 153-173.

Joh, C.-H., T.A. Arentze and H.J.P. Timmermans (2004), Activity-travel rescheduling decisions: empirical estimation of the Aurora model, Transportation Research Record, 1898, 10-18.

Kitamura, R. and S. Fujii (1998), Two computational process models of activity-travel choice. In: T. Gärling, T. Laitila and K. Westin (eds.), Theoretical Foundations of Travel Choice Modeling, Elsevier, Oxford, pp. 251-279.

Lauritzen, S.L. (1995), The EM algorithm for graphical association models with missing data, Computational Statistics & Data Analysis, 19, 191-201.

Miller, E.J. (2005a), Propositions for modelling household decision-making. In: M. Lee-Gosselin and S. Doherty (eds.), Integrated Land-Use and Transportation Models: Behavioural Foundations, Elsevier, New York.

Miller, E.J. (2005b), An integrated framework for modelling short- and long-run household decision-making. In: H. Timmermans (ed.), Progress in Activity-Based Analysis, Elsevier, New York.

Miller, E.J. (2005c), Project-based activity scheduling for household and person agents. In: H. Mahmassani (ed.), Flow, Dynamics and Human Interaction (Proceedings of the 16th International Symposium on Transportation and Traffic Theory), Elsevier, New York.

Miller, E.J. and M.J. Roorda (2003), A prototype model of household activity/travel scheduling, Transportation Research Record: Journal of the Transportation Research Board, 1831, 114-121.

Miller, E.J., M.J. Roorda and J.A. Carrasco (2005), A tour-based model of travel mode choice, Transportation, 32(4), 399-422.

Norsys Software Corp. (1996-2006), Netica 3.17 for MS Windows, Vancouver, Canada.

Norsys Software Corp. (1997), Netica: application for belief networks and influence diagrams: user's guide, version 1.05 for Windows, Norsys Software Corp., Vancouver, Canada.

Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, San Francisco, CA.

Pendyala, R., R. Kitamura, A. Kikuchi, T. Yamamoto and S. Fujii (2005), Florida Activity Mobility Simulator, Transportation Research Record, 1921, 123-130.

Roorda, M.J., E.J. Miller and K.M.N. Habib (2006), Validation of TASHA: a 24-hour activity scheduling microsimulation model, submitted to Transportation Research A.

Spiegelhalter, D.J., A.P. Dawid, S.L. Lauritzen and R.G. Cowell (1993), Bayesian analysis in expert systems, Statistical Science, 8, 219-283.

Timmermans, H.J.P., T.A. Arentze and C.-H. Joh (2000), Modeling learning and evolutionary adaptation processes in activity settings: theory and numerical simulations, Transportation Research Record, 1718, 27-33.
