To make this thread more useful I'll keep a running ToC on this first post. (Note that four relatively excellent introductory articles immediately follow in the same post.)
Note parallel threads: Surveillance and General Patton's stellar tutorial on the tech tools behind this, WE NSA NOW
Recorded Futures and "Temporal Analytics"
Kevin Slavin & Karl Schroeder
ECHELON: America's Secret Global Surveillance Network / James Bamford on NSA data centers
Public Intelligence posts: CATALYST, ICE datamining systems & contracts
WaPo: Monitoring America
Tim Shorrock: Domestic Spying, Inc.
Hank Asher, NCMEC, data mining to find pedophiles
Critical notes on Palantir
EXCELLENT: Tim Shorrock on how privatized intelligence leads to a flood of classified leaks
Tim Shorrock on US telecom coordination with DHS/NSA/FBI etc
60's vintage SigInt: Dark Gene & IBEX
Lockheed Martin's big investment in monitoring America
Reuters on the NSA's core curriculum and need for trained workers
2012 Congressional inquiry into "Data Sellers"
General Patton on quantum computing and exponential tech effects on data mining / visualization
General Patton on the observable limits of data mining, AGI, machine learning
Overview on World Simulators - also note parallel thread: SEAS world simulators and real time global surveillance
Solid introduction to Data Mining, via today's Atlantic:
http://www.theatlantic.com/technology/a ... sk/255388/
Everything You Wanted to Know About Data Mining but Were Afraid to Ask
Big data is everywhere we look these days. Businesses are falling all over themselves to hire 'data scientists,' privacy advocates are concerned about personal data and control, and technologists and entrepreneurs scramble to find new ways to collect, control and monetize data. We know that data is powerful and valuable. But how?
This article is an attempt to explain how data mining works and why you should care about it. Because when we think about how our data is being used, it is crucial to understand the power of this practice. Without data mining, when you give someone access to information about you, all they know is what you have told them. With data mining, they know what you have told them and can guess a great deal more. Put another way, data mining allows companies and governments to use the information you provide to reveal more than you think.
To most of us data mining goes something like this: tons of data is collected, then quant wizards work their arcane magic, and then they know all of this amazing stuff. But, how? And what types of things can they know? Here is the truth: despite the fact that the specific technical functioning of data mining algorithms is quite complex -- they are a black box unless you are a professional statistician or computer scientist -- the uses and capabilities of these approaches are, in fact, quite comprehensible and intuitive.
For the most part, data mining tells us, about very large and complex data sets, the kinds of information that would be readily apparent about small and simple things. For example, it can tell us that "one of these things is not like the other," a la Sesame Street, or it can show us categories and then sort things into those pre-determined categories. But what's simple with 5 datapoints is not so simple with 5 billion datapoints.
And these days, there's always more data. We gather far more of it than we can digest. Nearly every transaction or interaction leaves a data signature that someone somewhere is capturing and storing. This is, of course, true on the internet; but ubiquitous computing and digitization have made it increasingly true about our lives away from our computers (do we still have those?). The sheer scale of this data has far exceeded human sense-making capabilities. At these scales patterns are often too subtle and relationships too complex or multi-dimensional to observe by simply looking at the data. Data mining is a means of automating part of this process to detect interpretable patterns; it helps us see the forest without getting lost in the trees.
Discovering information from data takes two major forms: description and prediction. At the scale we are talking about, it is hard to know what the data shows. Data mining is used to simplify and summarize the data in a manner that we can understand, and then allow us to infer things about specific cases based on the patterns we have observed. Of course, specific applications of data mining methods are limited by the data and computing power available, and are tailored for specific needs and goals. However, there are several main types of pattern detection that are commonly used. These general forms illustrate what data mining can do.
Anomaly detection: in a large data set it is possible to get a picture of what the data tends to look like in a typical case. Statistics can be used to determine if something is notably different from this pattern. For instance, the IRS could model typical tax returns and use anomaly detection to identify specific returns that differ from this for review and audit.
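As a rough illustration of the idea (not taken from the Atlantic piece), the Python sketch below flags values that sit unusually far from the mean of a data set. The deduction-to-income ratios, the function name `find_anomalies`, and the cutoff of 2 standard deviations are all assumptions made up for the example; real audit-selection models are certainly more elaborate.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return the indices of values whose z-score exceeds `threshold`.

    The z-score is the distance from the mean, measured in sample
    standard deviations. The threshold is an illustrative choice.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical deduction-to-income ratios for ten tax returns.
ratios = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.11, 0.12, 0.10, 0.65]
flagged = find_anomalies(ratios)
print(flagged)  # indices of returns that might be pulled for review
```

With a sample this small, a single extreme value also inflates the standard deviation, which is why the cutoff here is 2 rather than the more common 3.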
Association learning: This is the type of data mining that drives the Amazon recommendation system. For instance, this might reveal that customers who bought a cocktail shaker and a cocktail recipe book also often buy martini glasses. These types of findings are often used for targeting coupons/deals or advertising. Similarly, this form of data mining (albeit a quite complex version) is behind Netflix movie recommendations.
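A minimal sketch of the kind of measurement behind such a finding: the "confidence" of a rule, meaning the share of baskets containing the first items that also contain the second. The baskets and the helper `rule_confidence` are hypothetical, and this is not a description of how Amazon or Netflix actually build recommendations.

```python
def rule_confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing every item in `antecedent`
    that also contain `consequent`."""
    antecedent = set(antecedent)
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    with_both = [b for b in with_antecedent if consequent in b]
    return len(with_both) / len(with_antecedent)

# Hypothetical purchase histories, one set of items per customer.
baskets = [
    {"cocktail shaker", "recipe book", "martini glasses"},
    {"cocktail shaker", "recipe book", "martini glasses", "ice tray"},
    {"cocktail shaker", "recipe book"},
    {"recipe book", "bookmark"},
    {"martini glasses"},
    {"cocktail shaker", "recipe book", "martini glasses"},
]

conf = rule_confidence(
    baskets, {"cocktail shaker", "recipe book"}, "martini glasses"
)
print(conf)  # 3 of the 4 shaker-and-book baskets include glasses
```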
Cluster detection: one type of pattern recognition that is particularly useful is recognizing distinct clusters or sub-categories within the data. Without data mining, an analyst would have to look at the data and decide on a set of categories which they believe captures the relevant distinctions between apparent groups in the data. This would risk missing important categories. With data mining it is possible to let the data itself determine the groups. This is one of the black-box types of algorithm that are hard to understand. But in a simple example - again with purchasing behavior - we can imagine that the purchasing habits of different hobbyists would look quite different from each other: gardeners, fishermen and model airplane enthusiasts would all be quite distinct. Machine learning algorithms can detect all of the different subgroups within a dataset that differ significantly from each other.
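One common way to let the data determine the groups is k-means clustering; the article does not name a specific algorithm, so treat this as one possible technique rather than the method it has in mind. The sketch below assumes made-up monthly spending figures in three categories (garden, fishing, model kits) and uses a simple farthest-point rule to pick starting centroids so the result is deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means: returns (labels, centroids) for a list of tuples."""
    # Farthest-point initialization: start from the first point, then
    # repeatedly add the point farthest from all chosen centroids.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the average of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical spend per customer: (garden, fishing, model kits).
customers = [
    (90, 5, 0), (80, 0, 5), (95, 10, 0),   # mostly garden purchases
    (5, 85, 0), (0, 90, 10), (10, 80, 5),  # mostly fishing purchases
    (0, 5, 95), (5, 0, 85), (10, 10, 90),  # mostly model-kit purchases
]
labels, centroids = kmeans(customers, k=3)
print(labels)
```

The algorithm is never told which customers are gardeners; it only sees the numbers. It still needs the number of groups `k` as input, which is itself a judgment call in practice.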
Classification: If an existing structure is already known, data mining can be used to classify new cases into these pre-determined categories. Learning from a large set of pre-classified examples, algorithms can detect persistent systemic differences between items in each group and apply these rules to new classification problems. Spam filters are a great example of this - large sets of emails that have been identified as spam have enabled filters to notice differences in word usage between legitimate and spam messages, and classify incoming messages according to these rules with a high degree of accuracy.
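A toy version of the spam-filter idea, using a naive Bayes classifier over word counts. This is one classic approach, assumed here for illustration; the article does not say which algorithm any particular filter uses. The six training messages and the function names are invented for the example.

```python
import math
from collections import Counter

def train(examples):
    """Count words per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        counts = word_counts[label]
        # Add-one (Laplace) smoothing so unseen words do not zero the score.
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n / total)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical pre-classified messages.
examples = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting moved to tomorrow", "ham"),
    ("lunch tomorrow with the team", "ham"),
    ("please review the schedule", "ham"),
]
model = train(examples)
print(classify("free prize now", model))
print(classify("team meeting tomorrow", model))
```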
Regression: Data mining can be used to construct predictive models based on many variables. Facebook, for example, might be interested in predicting future engagement for a user based on past behavior. Factors like the amount of personal information shared, number of photos tagged, friend requests initiated or accepted, comments, likes etc. could all be included in such a model. Over time, this model could be honed to include or weight things differently as Facebook compares how the predictions differ from observed behavior. Ultimately these findings could be used to guide design in order to encourage more of the behaviors that seem to lead to increased engagement over time.
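To show what "a predictive model based on many variables" can look like at its simplest, here is a sketch of ordinary least-squares linear regression with two input variables. Everything in it is assumed for illustration: the features (photos tagged, comments posted), the target (sessions the following week), and the numbers, which are synthetic and generated from a known formula so the fit can be checked. Nothing here reflects Facebook's actual models.

```python
def fit_linear(rows, targets):
    """Least-squares fit via the normal equations (X^T X) w = X^T y.

    Returns [intercept, weight_1, weight_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are collinear; no unique fit")
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(weights, row):
    """Apply fitted weights to one row of feature values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))

# Synthetic data: sessions = 2 + 3 * photos_tagged + 0.5 * comments.
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
targets = [6.0, 8.5, 13.5, 15.5, 21.0, 22.0]

weights = fit_linear(rows, targets)
print(weights)                   # close to [2.0, 3.0, 0.5]
print(predict(weights, (7, 2)))  # estimate for an unseen user
```

Real behavioral data is noisy, so the fitted weights would only approximate any underlying relationship; comparing predictions against what users actually do is how such a model gets refined over time.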
The patterns detected and structures revealed by the descriptive data mining are then often applied to predict other aspects of the data. Amazon offers a useful example of how descriptive findings are used for prediction. The (hypothetical) association between cocktail shaker and martini glass purchases, for instance, could be used, along with many other similar associations, as part of a model predicting the likelihood that a particular user will make a particular purchase. This model could match all such associations with a user's purchasing history, and predict which products they are most likely to purchase. Amazon can then serve ads based on what that user is most likely to buy.
Data mining, in this way, can grant immense inferential power. If an algorithm can correctly classify a case into a known category based on limited data, it is possible to estimate a wide range of other information about that case based on the properties of all the other cases in that category. This may sound dry, but it is how most successful Internet companies make their money and from where they draw their power.
THEORY IN ACTION
Via: NYT, "How Companies Learn Your Secrets"
Andrew Pole had just started working as a statistician for Target in 2002, when two colleagues from the marketing department stopped by his desk to ask an odd question: “If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?”
Pole has a master’s degree in statistics and another in economics, and has been obsessed with the intersection of data and human behavior most of his life. His parents were teachers in North Dakota, and while other kids were going to 4-H, Pole was doing algebra and writing computer programs. “The stereotype of a math nerd is true,” he told me when I spoke with him last year. “I kind of like going out and evangelizing analytics.”
As the marketers explained to Pole — and as Pole later explained to me, back when we were still speaking and before Target told him to stop — new parents are a retailer’s holy grail. Most shoppers don’t buy everything they need at one store. Instead, they buy groceries at the grocery store and toys at the toy store, and they visit Target only when they need certain items they associate with Target — cleaning supplies, say, or new socks or a six-month supply of toilet paper. But Target sells everything from milk to stuffed animals to lawn furniture to electronics, so one of the company’s primary goals is convincing customers that the only store they need is Target. But it’s a tough message to get across, even with the most ingenious ad campaigns, because once consumers’ shopping habits are ingrained, it’s incredibly difficult to change them.
There are, however, some brief periods in a person’s life when old routines fall apart and buying habits are suddenly in flux. One of those moments — the moment, really — is right around the birth of a child, when parents are exhausted and overwhelmed and their shopping patterns and brand loyalties are up for grabs. But as Target’s marketers explained to Pole, timing is everything. Because birth records are usually public, the moment a couple have a new baby, they are almost instantaneously barraged with offers and incentives and advertisements from all sorts of companies. Which means that the key is to reach them earlier, before any other retailers know a baby is on the way. Specifically, the marketers said they wanted to send specially designed ads to women in their second trimester, which is when most expectant mothers begin buying all sorts of new things, like prenatal vitamins and maternity clothing. “Can you give us a list?” the marketers asked.
“We knew that if we could identify them in their second trimester, there’s a good chance we could capture them for years,” Pole told me. “As soon as we get them buying diapers from us, they’re going to start buying everything else too. If you’re rushing through the store, looking for bottles, and you pass orange juice, you’ll grab a carton. Oh, and there’s that new DVD I want. Soon, you’ll be buying cereal and paper towels from us, and keep coming back.”
The desire to collect information on customers is not new for Target or any other large retailer, of course. For decades, Target has collected vast amounts of data on every person who regularly walks into one of its stores. Whenever possible, Target assigns each shopper a unique code — known internally as the Guest ID number — that keeps tabs on everything they buy. “If you use a credit card or a coupon, or fill out a survey, or mail in a refund, or call the customer help line, or open an e-mail we’ve sent you or visit our Web site, we’ll record it and link it to your Guest ID,” Pole said. “We want to know everything we can.”
Also linked to your Guest ID is demographic information like your age, whether you are married and have kids, which part of town you live in, how long it takes you to drive to the store, your estimated salary, whether you’ve moved recently, what credit cards you carry in your wallet and what Web sites you visit. Target can buy data about your ethnicity, job history, the magazines you read, if you’ve ever declared bankruptcy or got divorced, the year you bought (or lost) your house, where you went to college, what kinds of topics you talk about online, whether you prefer certain brands of coffee, paper towels, cereal or applesauce, your political leanings, reading habits, charitable giving and the number of cars you own. (In a statement, Target declined to identify what demographic information it collects or purchases.) All that information is meaningless, however, without someone to analyze and make sense of it. That’s where Andrew Pole and the dozens of other members of Target’s Guest Marketing Analytics department come in.
Almost every major retailer, from grocery chains to investment banks to the U.S. Postal Service, has a “predictive analytics” department devoted to understanding not just consumers’ shopping habits but also their personal habits, so as to more efficiently market to them. “But Target has always been one of the smartest at this,” says Eric Siegel, a consultant and the chairman of a conference called Predictive Analytics World. “We’re living through a golden age of behavioral research. It’s amazing how much we can figure out about how people think now.”
The reason Target can snoop on our shopping habits is that, over the past two decades, the science of habit formation has become a major field of research in neurology and psychology departments at hundreds of major medical centers and universities, as well as inside extremely well financed corporate labs. “It’s like an arms race to hire statisticians nowadays,” said Andreas Weigend, the former chief scientist at Amazon.com. “Mathematicians are suddenly sexy.” As the ability to analyze data has grown more and more fine-grained, the push to understand how daily habits influence our decisions has become one of the most exciting topics in clinical research, even though most of us are hardly aware those patterns exist. One study from Duke University estimated that habits, rather than conscious decision-making, shape 45 percent of the choices we make every day, and recent discoveries have begun to change everything from the way we think about dieting to how doctors conceive treatments for anxiety, depression and addictions.
This research is also transforming our understanding of how habits function across organizations and societies. A football coach named Tony Dungy propelled one of the worst teams in the N.F.L. to the Super Bowl by focusing on how his players habitually reacted to on-field cues. Before he became Treasury secretary, Paul O’Neill overhauled a stumbling conglomerate, Alcoa, and turned it into a top performer in the Dow Jones by relentlessly attacking one habit — a specific approach to worker safety — which in turn caused a companywide transformation. The Obama campaign has hired a habit specialist as its “chief scientist” to figure out how to trigger new voting patterns among different constituencies.
Researchers have figured out how to stop people from habitually overeating and biting their nails. They can explain why some of us automatically go for a jog every morning and are more productive at work, while others oversleep and procrastinate. There is a calculus, it turns out, for mastering our subconscious urges. For companies like Target, the exhaustive rendering of our conscious and unconscious patterns into data sets and algorithms has revolutionized what they know about us and, therefore, how precisely they can sell.
Inside the brain-and-cognitive-sciences department of the Massachusetts Institute of Technology are what, to the casual observer, look like dollhouse versions of surgical theaters. There are rooms with tiny scalpels, small drills and miniature saws. Even the operating tables are petite, as if prepared for 7-year-old surgeons. Inside those shrunken O.R.’s, neurologists cut into the skulls of anesthetized rats, implanting tiny sensors that record the smallest changes in the activity of their brains.
An M.I.T. neuroscientist named Ann Graybiel told me that she and her colleagues began exploring habits more than a decade ago by putting their wired rats into a T-shaped maze with chocolate at one end. The maze was structured so that each animal was positioned behind a barrier that opened after a loud click. The first time a rat was placed in the maze, it would usually wander slowly up and down the center aisle after the barrier slid away, sniffing in corners and scratching at walls. It appeared to smell the chocolate but couldn’t figure out how to find it. There was no discernible pattern in the rat’s meanderings and no indication it was working hard to find the treat.
The probes in the rats’ heads, however, told a different story. While each animal wandered through the maze, its brain was working furiously. Every time a rat sniffed the air or scratched a wall, the neurosensors inside the animal’s head exploded with activity. As the scientists repeated the experiment, again and again, the rats eventually stopped sniffing corners and making wrong turns and began to zip through the maze with more and more speed. And within their brains, something unexpected occurred: as each rat learned how to complete the maze more quickly, its mental activity decreased. As the path became more and more automatic — as it became a habit — the rats started thinking less and less.
This process, in which the brain converts a sequence of actions into an automatic routine, is called “chunking.” There are dozens, if not hundreds, of behavioral chunks we rely on every day. Some are simple: you automatically put toothpaste on your toothbrush before sticking it in your mouth. Some, like making the kids’ lunch, are a little more complex. Still others are so complicated that it’s remarkable to realize that a habit could have emerged at all.
Take backing your car out of the driveway. When you first learned to drive, that act required a major dose of concentration, and for good reason: it involves peering into the rearview and side mirrors and checking for obstacles, putting your foot on the brake, moving the gearshift into reverse, removing your foot from the brake, estimating the distance between the garage and the street while keeping the wheels aligned, calculating how images in the mirrors translate into actual distances, all while applying differing amounts of pressure to the gas pedal and brake.
Now, you perform that series of actions every time you pull into the street without thinking very much. Your brain has chunked large parts of it. Left to its own devices, the brain will try to make almost any repeated behavior into a habit, because habits allow our minds to conserve effort. But conserving mental energy is tricky, because if our brains power down at the wrong moment, we might fail to notice something important, like a child riding her bike down the sidewalk or a speeding car coming down the street. So we’ve devised a clever system to determine when to let a habit take over. It’s something that happens whenever a chunk of behavior starts or ends — and it helps to explain why habits are so difficult to change once they’re formed, despite our best intentions.
To understand this a little more clearly, consider again the chocolate-seeking rats. What Graybiel and her colleagues found was that, as the ability to navigate the maze became habitual, there were two spikes in the rats’ brain activity — once at the beginning of the maze, when the rat heard the click right before the barrier slid away, and once at the end, when the rat found the chocolate. Those spikes show when the rats’ brains were fully engaged, and the dip in neural activity between the spikes showed when the habit took over. From behind the partition, the rat wasn’t sure what waited on the other side, until it heard the click, which it had come to associate with the maze. Once it heard that sound, it knew to use the “maze habit,” and its brain activity decreased. Then at the end of the routine, when the reward appeared, the brain shook itself awake again and the chocolate signaled to the rat that this particular habit was worth remembering, and the neurological pathway was carved that much deeper.
The process within our brains that creates habits is a three-step loop. First, there is a cue, a trigger that tells your brain to go into automatic mode and which habit to use. Then there is the routine, which can be physical or mental or emotional. Finally, there is a reward, which helps your brain figure out if this particular loop is worth remembering for the future. Over time, this loop — cue, routine, reward; cue, routine, reward — becomes more and more automatic. The cue and reward become neurologically intertwined until a sense of craving emerges. What’s unique about cues and rewards, however, is how subtle they can be. Neurological studies like the ones in Graybiel’s lab have revealed that some cues span just milliseconds. And rewards can range from the obvious (like the sugar rush that a morning doughnut habit provides) to the infinitesimal (like the barely noticeable — but measurable — sense of relief the brain experiences after successfully navigating the driveway). Most cues and rewards, in fact, happen so quickly and are so slight that we are hardly aware of them at all. But our neural systems notice and use them to build automatic behaviors.
Habits aren’t destiny — they can be ignored, changed or replaced. But it’s also true that once the loop is established and a habit emerges, your brain stops fully participating in decision-making. So unless you deliberately fight a habit — unless you find new cues and rewards — the old pattern will unfold automatically.
“We’ve done experiments where we trained rats to run down a maze until it was a habit, and then we extinguished the habit by changing the placement of the reward,” Graybiel told me. “Then one day, we’ll put the reward in the old place and put in the rat and, by golly, the old habit will re-emerge right away. Habits never really disappear.”
Luckily, simply understanding how habits work makes them easier to control. Take, for instance, a series of studies conducted a few years ago at Columbia University and the University of Alberta. Researchers wanted to understand how exercise habits emerge. In one project, 256 members of a health-insurance plan were invited to classes stressing the importance of exercise. Half the participants received an extra lesson on the theories of habit formation (the structure of the habit loop) and were asked to identify cues and rewards that might help them develop exercise routines.
The results were dramatic. Over the next four months, those participants who deliberately identified cues and rewards spent twice as much time exercising as their peers. Other studies have yielded similar results. According to another recent paper, if you want to start running in the morning, it’s essential that you choose a simple cue (like always putting on your sneakers before breakfast or leaving your running clothes next to your bed) and a clear reward (like a midday treat or even the sense of accomplishment that comes from ritually recording your miles in a log book). After a while, your brain will start anticipating that reward — craving the treat or the feeling of accomplishment — and there will be a measurable neurological impulse to lace up your jogging shoes each morning.
Our relationship to e-mail operates on the same principle. When a computer chimes or a smartphone vibrates with a new message, the brain starts anticipating the neurological “pleasure” (even if we don’t recognize it as such) that clicking on the e-mail and reading it provides. That expectation, if unsatisfied, can build until you find yourself moved to distraction by the thought of an e-mail sitting there unread — even if you know, rationally, it’s most likely not important. On the other hand, once you remove the cue by disabling the buzzing of your phone or the chiming of your computer, the craving is never triggered, and you’ll find, over time, that you’re able to work productively for long stretches without checking your in-box.
Some of the most ambitious habit experiments have been conducted by corporate America. To understand why executives are so entranced by this science, consider how one of the world’s largest companies, Procter & Gamble, used habit insights to turn a failing product into one of its biggest sellers. P.& G. is the corporate behemoth behind a whole range of products, from Downy fabric softener to Bounty paper towels to Duracell batteries and dozens of other household brands. In the mid-1990s, P.& G.’s executives began a secret project to create a new product that could eradicate bad smells. P.& G. spent millions developing a colorless, cheap-to-manufacture liquid that could be sprayed on a smoky blouse, stinky couch, old jacket or stained car interior and make it odorless. In order to market the product — Febreze — the company formed a team that included a former Wall Street mathematician named Drake Stimson and habit specialists, whose job was to make sure the television commercials, which they tested in Phoenix, Salt Lake City and Boise, Idaho, accentuated the product’s cues and rewards just right.
The first ad showed a woman complaining about the smoking section of a restaurant. Whenever she eats there, she says, her jacket smells like smoke. A friend tells her that if she uses Febreze, it will eliminate the odor. The cue in the ad is clear: the harsh smell of cigarette smoke. The reward: odor eliminated from clothes. The second ad featured a woman worrying about her dog, Sophie, who always sits on the couch. “Sophie will always smell like Sophie,” she says, but with Febreze, “now my furniture doesn’t have to.” The ads were put in heavy rotation. Then the marketers sat back, anticipating how they would spend their bonuses. A week passed. Then two. A month. Two months. Sales started small and got smaller. Febreze was a dud.
The panicked marketing team canvassed consumers and conducted in-depth interviews to figure out what was going wrong, Stimson recalled. Their first inkling came when they visited a woman’s home outside Phoenix. The house was clean and organized. She was something of a neat freak, the woman explained. But when P.& G.’s scientists walked into her living room, where her nine cats spent most of their time, the scent was so overpowering that one of them gagged.
According to Stimson, who led the Febreze team, a researcher asked the woman, “What do you do about the cat smell?”
“It’s usually not a problem,” she said.
“Do you smell it now?”
“No,” she said. “Isn’t it wonderful? They hardly smell at all!”
A similar scene played out in dozens of other smelly homes. The reason Febreze wasn’t selling, the marketers realized, was that people couldn’t detect most of the bad smells in their lives. If you live with nine cats, you become desensitized to their scents. If you smoke cigarettes, eventually you don’t smell smoke anymore. Even the strongest odors fade with constant exposure. That’s why Febreze was a failure. The product’s cue — the bad smells that were supposed to trigger daily use — was hidden from the people who needed it the most. And Febreze’s reward (an odorless home) was meaningless to someone who couldn’t smell offensive scents in the first place.
P.& G. employed a Harvard Business School professor to analyze Febreze’s ad campaigns. They collected hours of footage of people cleaning their homes and watched tape after tape, looking for clues that might help them connect Febreze to people’s daily habits. When that didn’t reveal anything, they went into the field and conducted more interviews. A breakthrough came when they visited a woman in a suburb near Scottsdale, Ariz., who was in her 40s with four children. Her house was clean, though not compulsively tidy, and didn’t appear to have any odor problems; there were no pets or smokers. To the surprise of everyone, she loved Febreze.
“I use it every day,” she said.
“What smells are you trying to get rid of?” a researcher asked.
“I don’t really use it for specific smells,” the woman said. “I use it for normal cleaning — a couple of sprays when I’m done in a room.”
The researchers followed her around as she tidied the house. In the bedroom, she made her bed, tightened the sheet’s corners, then sprayed the comforter with Febreze. In the living room, she vacuumed, picked up the children’s shoes, straightened the coffee table, then sprayed Febreze on the freshly cleaned carpet.
“It’s nice, you know?” she said. “Spraying feels like a little minicelebration when I’m done with a room.” At the rate she was going, the team estimated, she would empty a bottle of Febreze every two weeks.
When they got back to P.& G.’s headquarters, the researchers watched their videotapes again. Now they knew what to look for and saw their mistake in scene after scene. Cleaning has its own habit loops that already exist. In one video, when a woman walked into a dirty room (cue), she started sweeping and picking up toys (routine), then she examined the room and smiled when she was done (reward). In another, a woman scowled at her unmade bed (cue), proceeded to straighten the blankets and comforter (routine) and then sighed as she ran her hands over the freshly plumped pillows (reward). P.& G. had been trying to create a whole new habit with Febreze, but what they really needed to do was piggyback on habit loops that were already in place. The marketers needed to position Febreze as something that came at the end of the cleaning ritual, the reward, rather than as a whole new cleaning routine.
The company printed new ads showing open windows and gusts of fresh air. More perfume was added to the Febreze formula, so that instead of merely neutralizing odors, the spray had its own distinct scent. Television commercials were filmed of women, having finished their cleaning routine, using Febreze to spritz freshly made beds and just-laundered clothing. Each ad was designed to appeal to the habit loop: when you see a freshly cleaned room (cue), pull out Febreze (routine) and enjoy a smell that says you’ve done a great job (reward). When you finish making a bed (cue), spritz Febreze (routine) and breathe a sweet, contented sigh (reward). Febreze, the ads implied, was a pleasant treat, not a reminder that your home stinks.
And so Febreze, a product originally conceived as a revolutionary way to destroy odors, became an air freshener used once things are already clean. The Febreze revamp occurred in the summer of 1998. Within two months, sales doubled. A year later, the product brought in $230 million. Since then Febreze has spawned dozens of spinoffs — air fresheners, candles and laundry detergents — that now account for sales of more than $1 billion a year. Eventually, P.& G. began mentioning to customers that, in addition to smelling sweet, Febreze can actually kill bad odors. Today it’s one of the top-selling products in the world.
Andrew Pole was hired by Target to use the same kinds of insights into consumers’ habits to expand Target’s sales. His assignment was to analyze all the cue-routine-reward loops among shoppers and help the company figure out how to exploit them. Much of his department’s work was straightforward: find the customers who have children and send them catalogs that feature toys before Christmas. Look for shoppers who habitually purchase swimsuits in April and send them coupons for sunscreen in July and diet books in December. But Pole’s most important assignment was to identify those unique moments in consumers’ lives when their shopping habits become particularly flexible and the right advertisement or coupon would cause them to begin spending in new ways.
In the 1980s, a team of researchers led by a U.C.L.A. professor named Alan Andreasen undertook a study of people’s most mundane purchases, like soap, toothpaste, trash bags and toilet paper. They learned that most shoppers paid almost no attention to how they bought these products, that the purchases occurred habitually, without any complex decision-making. Which meant it was hard for marketers, despite their displays and coupons and product promotions, to persuade shoppers to change.
But when some customers were going through a major life event, like graduating from college or getting a new job or moving to a new town, their shopping habits became flexible in ways that were both predictable and potential gold mines for retailers. The study found that when someone marries, he or she is more likely to start buying a new type of coffee. When a couple move into a new house, they’re more apt to purchase a different kind of cereal. When they divorce, there’s an increased chance they’ll start buying different brands of beer.
Consumers going through major life events often don’t notice, or care, that their shopping habits have shifted, but retailers notice, and they care quite a bit. At those unique moments, Andreasen wrote, customers are “vulnerable to intervention by marketers.” In other words, a precisely timed advertisement, sent to a recent divorcee or new homebuyer, can change someone’s shopping patterns for years.
And among life events, none are more important than the arrival of a baby. At that moment, new parents’ habits are more flexible than at almost any other time in their adult lives. If companies can identify pregnant shoppers, they can earn millions.
The only problem is that identifying pregnant customers is harder than it sounds. Target has a baby-shower registry, and Pole started there, observing how shopping habits changed as a woman approached her due date, which women on the registry had willingly disclosed. He ran test after test, analyzing the data, and before long some useful patterns emerged. Lotions, for example. Lots of people buy lotion, but one of Pole’s colleagues noticed that women on the baby registry were buying larger quantities of unscented lotion around the beginning of their second trimester. Another analyst noted that sometime in the first 20 weeks, pregnant women loaded up on supplements like calcium, magnesium and zinc. Many shoppers purchase soap and cotton balls, but when someone suddenly starts buying lots of scent-free soap and extra-big bags of cotton balls, in addition to hand sanitizers and washcloths, it signals they could be getting close to their delivery date.
As Pole’s computers crawled through the data, he was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.
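A rough sketch of what a score like this could look like in code. The article does not describe the actual statistics Target used, so everything here is an assumption for illustration: the product names, the weights, the baseline, and the choice of a logistic function to turn a weighted sum into a 0-to-1 score.

```python
import math

# Hypothetical weights: how strongly each purchase pushes the score up.
# These numbers are invented for illustration; they are not Target's.
SIGNAL_WEIGHTS = {
    "unscented_lotion": 1.2,
    "calcium_supplement": 0.8,
    "magnesium_supplement": 0.8,
    "zinc_supplement": 0.8,
    "scent_free_soap": 0.9,
    "cotton_balls_large": 0.7,
}

# Assumed starting point: with no signals, the score should be low.
BASELINE = -3.0


def pregnancy_score(basket):
    """Return a score between 0 and 1 for a collection of product names.

    Each known signal product counts once, however often it was bought.
    Products that are not in SIGNAL_WEIGHTS contribute nothing.
    """
    total = BASELINE + sum(SIGNAL_WEIGHTS.get(item, 0.0) for item in set(basket))
    return 1.0 / (1.0 + math.exp(-total))  # logistic function


if __name__ == "__main__":
    print(pregnancy_score(["bright_blue_rug"]))
    print(pregnancy_score(["unscented_lotion", "zinc_supplement", "magnesium_supplement"]))
    print(pregnancy_score(list(SIGNAL_WEIGHTS)))
```

The point of the sketch is the one the article makes: no single product says much, but several weak signals added together move the score a long way.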
One Target employee I spoke to provided a hypothetical example. Take a fictional Target shopper named Jenny Ward, who is 23, lives in Atlanta and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant and that her delivery date is sometime in late August. What’s more, because of the data attached to her Guest ID number, Target knows how to trigger Jenny’s habits. They know that if she receives a coupon via e-mail, it will most likely cue her to buy online. They know that if she receives an ad in the mail on Friday, she frequently uses it on a weekend trip to the store. And they know that if they reward her with a printed receipt that entitles her to a free cup of Starbucks coffee, she’ll use it when she comes back again.
In the past, that knowledge had limited value. After all, Jenny purchased only cleaning supplies at Target, and there were only so many psychological buttons the company could push. But now that she is pregnant, everything is up for grabs. In addition to triggering Jenny’s habits to buy more cleaning products, they can also start including offers for an array of products, some more obvious than others, that a woman at her stage of pregnancy might need.
Pole applied his program to every regular female shopper in Target’s national database and soon had a list of tens of thousands of women who were most likely pregnant. If they could entice those women or their husbands to visit Target and buy baby-related products, the company’s cue-routine-reward calculators could kick in and start pushing them to buy groceries, bathing suits, toys and clothing, as well. When Pole shared his list with the marketers, he said, they were ecstatic. Soon, Pole was getting invited to meetings above his paygrade. Eventually his paygrade went up.
At which point someone asked an important question: How are women going to react when they figure out how much Target knows?
“If we send someone a catalog and say, ‘Congratulations on your first child!’ and they’ve never told us they’re pregnant, that’s going to make some people uncomfortable,” Pole told me. “We are very conservative about compliance with all privacy laws. But even if you’re following the law, you can do things where people get queasy.”
About a year after Pole created his pregnancy-prediction model, a man walked into a Target outside Minneapolis and demanded to see the manager. He was clutching coupons that had been sent to his daughter, and he was angry, according to an employee who participated in the conversation.
“My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”
The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologized and then called a few days later to apologize again.
On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”
When I approached Target to discuss Pole’s work, its representatives declined to speak with me. “Our mission is to make Target the preferred shopping destination for our guests by delivering outstanding value, continuous innovation and exceptional guest experience,” the company wrote in a statement. “We’ve developed a number of research tools that allow us to gain insights into trends and preferences within different demographic segments of our guest population.” When I sent Target a complete summary of my reporting, the reply was more terse: “Almost all of your statements contain inaccurate information and publishing them would be misleading to the public. We do not intend to address each statement point by point.” The company declined to identify what was inaccurate. They did add, however, that Target “is in compliance with all federal and state laws, including those related to protected health information.”
When I offered to fly to Target’s headquarters to discuss its concerns, a spokeswoman e-mailed that no one would meet me. When I flew out anyway, I was told I was on a list of prohibited visitors. “I’ve been instructed not to give you access and to ask you to leave,” said a very nice security guard named Alex.
Using data to predict a woman’s pregnancy, Target realized soon after Pole perfected his model, could be a public-relations disaster. So the question became: how could they get their advertisements into expectant mothers’ hands without making it appear they were spying on them? How do you take advantage of someone’s habits without letting them know you’re studying their lives?
Before I met Andrew Pole, before I even decided to write a book about the science of habit formation, I had another goal: I wanted to lose weight.
I had got into a bad habit of going to the cafeteria every afternoon and eating a chocolate-chip cookie, which contributed to my gaining a few pounds. Eight, to be precise. I put a Post-it note on my computer reading “NO MORE COOKIES.” But every afternoon, I managed to ignore that note, wander to the cafeteria, buy a cookie and eat it while chatting with colleagues. Tomorrow, I always promised myself, I’ll muster the willpower to resist.
Tomorrow, I ate another cookie.
When I started interviewing experts in habit formation, I concluded each interview by asking what I should do. The first step, they said, was to figure out my habit loop. The routine was simple: every afternoon, I walked to the cafeteria, bought a cookie and ate it while chatting with friends.
Next came some less obvious questions: What was the cue? Hunger? Boredom? Low blood sugar? And what was the reward? The taste of the cookie itself? The temporary distraction from my work? The chance to socialize with colleagues?
Rewards are powerful because they satisfy cravings, but we’re often not conscious of the urges driving our habits in the first place. So one day, when I felt a cookie impulse, I went outside and took a walk instead. The next day, I went to the cafeteria and bought a coffee. The next, I bought an apple and ate it while chatting with friends. You get the idea. I wanted to test different theories regarding what reward I was really craving. Was it hunger? (In which case the apple should have worked.) Was it the desire for a quick burst of energy? (If so, the coffee should suffice.) Or, as turned out to be the answer, was it that after several hours spent focused on work, I wanted to socialize, to make sure I was up to speed on office gossip, and the cookie was just a convenient excuse? When I walked to a colleague’s desk and chatted for a few minutes, it turned out, my cookie urge was gone.
All that was left was identifying the cue.
Deciphering cues is hard, however. Our lives often contain too much information to figure out what is triggering a particular behavior. Do you eat breakfast at a certain time because you’re hungry? Or because the morning news is on? Or because your kids have started eating? Experiments have shown that most cues fit into one of five categories: location, time, emotional state, other people or the immediately preceding action. So to figure out the cue for my cookie habit, I wrote down five things the moment the urge hit:
Where are you? (Sitting at my desk.)
What time is it? (3:36 p.m.)
What’s your emotional state? (Bored.)
Who else is around? (No one.)
What action preceded the urge? (Answered an e-mail.)
The next day I did the same thing. And the next. Pretty soon, the cue was clear: I always felt an urge to snack around 3:30.
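The same bookkeeping can be sketched in a few lines of code: record the five answers each time the urge hits, then look for the category whose answer barely changes. Only the first entry below comes from the article; the other two days, the field names, and the half-hour rounding are invented for illustration.

```python
from collections import Counter

# Day 1 is taken from the article. Days 2 and 3 are made up for this sketch.
LOG = [
    {"location": "desk", "time": "15:36", "emotion": "bored",
     "people": "no one", "preceding_action": "answered an e-mail"},
    {"location": "hallway", "time": "15:18", "emotion": "happy",
     "people": "a colleague", "preceding_action": "left a meeting"},
    {"location": "conference room", "time": "15:41", "emotion": "tired",
     "people": "editors", "preceding_action": "read a draft"},
]


def half_hour_bucket(hhmm):
    """Round an HH:MM string to the nearest half hour."""
    hours, minutes = map(int, hhmm.split(":"))
    total = round((hours * 60 + minutes) / 30) * 30
    return f"{total // 60:02d}:{total % 60:02d}"


def likely_cue(log):
    """Return (category, value, share) for the most consistent category.

    share is the fraction of entries that agree on the most common value.
    """
    best = None
    for category in log[0]:
        values = [
            half_hour_bucket(entry[category]) if category == "time" else entry[category]
            for entry in log
        ]
        value, count = Counter(values).most_common(1)[0]
        share = count / len(log)
        if best is None or share > best[2]:
            best = (category, value, share)
    return best


if __name__ == "__main__":
    print(likely_cue(LOG))
```

With this invented log, location, mood, company and preceding action all vary from day to day, while the time always lands in the same half-hour bucket, so time is reported as the likely cue.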
Once I figured out all the parts of the loop, it seemed fairly easy to change my habit. But the psychologists and neuroscientists warned me that, for my new behavior to stick, I needed to abide by the same principle that guided Procter & Gamble in selling Febreze: To shift the routine — to socialize, rather than eat a cookie — I needed to piggyback on an existing habit. So now, every day around 3:30, I stand up, look around the newsroom for someone to talk to, spend 10 minutes gossiping, then go back to my desk. The cue and reward have stayed the same. Only the routine has shifted. It doesn’t feel like a decision, any more than the M.I.T. rats made a decision to run through the maze. It’s now a habit. I’ve lost 21 pounds since then (12 of them from changing my cookie ritual).
After Andrew Pole built his pregnancy-prediction model, after he identified thousands of female shoppers who were most likely pregnant, after someone pointed out that some of those women might be a little upset if they received an advertisement making it obvious Target was studying their reproductive status, everyone decided to slow things down.
The marketing department conducted a few tests by choosing a small, random sample of women from Pole’s list and mailing them combinations of advertisements to see how they reacted.
“We have the capacity to send every customer an ad booklet, specifically designed for them, that says, ‘Here’s everything you bought last week and a coupon for it,’ ” one Target executive told me. “We do that for grocery products all the time.” But for pregnant women, Target’s goal was selling them baby items they didn’t even know they needed yet.
“With the pregnancy products, though, we learned that some women react badly,” the executive said. “Then we started mixing in all these ads for things we knew pregnant women would never buy, so the baby ads looked random. We’d put an ad for a lawn mower next to diapers. We’d put a coupon for wineglasses next to infant clothes. That way, it looked like all the products were chosen by chance.
“And we found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works.”
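A minimal sketch of the camouflage the executive describes: take the offers the model actually selected and pad them with unrelated items so the mailer looks like a random assortment. The function name, the two-decoys-per-offer ratio, and the item lists are assumptions made for this example; the article does not say how Target chose or proportioned the filler ads.

```python
import random


def build_mailer(targeted, decoys, decoy_ratio=2, seed=None):
    """Mix targeted offers with unrelated decoy offers and shuffle them.

    decoy_ratio is the assumed number of decoys added per targeted offer,
    capped by how many decoys are available.
    """
    rng = random.Random(seed)
    n_decoys = min(len(decoys), decoy_ratio * len(targeted))
    mailer = list(targeted) + rng.sample(decoys, n_decoys)
    rng.shuffle(mailer)  # so the targeted offers do not sit together
    return mailer


if __name__ == "__main__":
    targeted = ["diapers", "infant clothes"]
    decoys = ["lawn mower", "wineglasses", "socks", "cleaning supplies", "garden hose"]
    print(build_mailer(targeted, decoys, seed=1))
```

Every targeted offer still reaches the shopper; the decoys only change how the page looks.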
In other words, if Target piggybacked on existing habits — the same cues and rewards they already knew got customers to buy cleaning supplies or socks — then they could insert a new routine: buying baby products, as well. There’s a cue (“Oh, a coupon for something I need!”) a routine (“Buy! Buy! Buy!”) and a reward (“I can take that off my list”). And once the shopper is inside the store, Target will hit her with cues and rewards to entice her to purchase everything she normally buys somewhere else. As long as Target camouflaged how much it knew, as long as the habit felt familiar, the new behavior took hold.
Soon after the new ad campaign began, Target’s Mom and Baby sales exploded. The company doesn’t break out figures for specific divisions, but between 2002 — when Pole was hired — and 2010, Target’s revenues grew from $44 billion to $67 billion. In 2005, the company’s president, Gregg Steinhafel, boasted to a room of investors about the company’s “heightened focus on items and categories that appeal to specific guest segments such as mom and baby.”
Pole was promoted. He has been invited to speak at conferences. “I never expected this would become such a big deal,” he told me the last time we spoke.
A few weeks before this article went to press, I flew to Minneapolis to try and speak to Andrew Pole one last time. I hadn’t talked to him in more than a year. Back when we were still friendly, I mentioned that my wife was seven months pregnant. We shop at Target, I told him, and had given the company our address so we could start receiving coupons in the mail. As my wife’s pregnancy progressed, I noticed a subtle upswing in the number of advertisements for diapers and baby clothes arriving at our house.
Pole didn’t answer my e-mails or phone calls when I visited Minneapolis. I drove to his large home in a nice suburb, but no one answered the door. On my way back to the hotel, I stopped at a Target to pick up some deodorant, then also bought some T-shirts and a fancy hair gel. On a whim, I threw in some pacifiers, to see how the computers would react. Besides, our baby is now 9 months old. You can’t have too many pacifiers.
When I paid, I didn’t receive any sudden deals on diapers or formula, to my slight disappointment. It made sense, though: I was shopping in a city I never previously visited, at 9:45 p.m. on a weeknight, buying a random assortment of items. I was using a corporate credit card, and besides the pacifiers, hadn’t purchased any of the things that a parent needs. It was clear to Target’s computers that I was on a business trip. Pole’s prediction calculator took one look at me, ran the numbers and decided to bide its time. Back home, the offers would eventually come. As Pole told me the last time we spoke: “Just wait. We’ll be sending you coupons for things you want before you even know you want them.”
DATAMINING AS PANOPTICON
I am also reminded of the Joshua-Michéle Ross series on O'Reilly Radar a few years back... let's review.
Source: http://radar.oreilly.com/2009/05/captiv ... mmons.html
In January 2002 DARPA launched the Information Awareness Office. The mission was to “imagine, develop, apply, integrate, demonstrate and transition information technologies, components and prototype, closed-loop, information systems that will counter asymmetric threats by achieving total information awareness” (emphasis added). The notion of a government agency achieving total information awareness was too Orwellian to ignore. Under criticism that this “awareness” could quickly migrate to a mass surveillance system, the program was defunded.
Fast-forward to last week and my near-purchase of Libbey Duratuff Gibraltar Glasses (the perfect bourbon glass one might speculate). Over the course of the next few days I was peppered with exact-match ads for Libbey Duratuff glassware on several other websites: a small example of information awareness at work.
Personal data is the currency of Web 2.0. Knowing what we watch, buy, click, own, what we think, intend and ultimately do confers competitive advantage. Facebook possesses your social graph, your personal interests and your full profile (age, location, relationship status etc.) not to mention your daily (or hourly) answer to their persistent question, “what’s on your mind?” Reviewing the “25 Surprising Things Google Knows About You” should give anyone pause. And it’s not just the Web 2.0 set. Credit card companies, telcos, insurance, pharma… all are collecting vast stores of personal data. If you watch the trendline, it is moving toward more data and more analytic capability - not less.
So why is it that we seem to have more comfort when the capacity for total information awareness lies with corporations as opposed to government? Experience shows that there is a very thin barrier between the two. To wit, the release of thousands of phone records to the U.S. government - and, conveniently, government immunity for those same corporations after the breach. Google and Yahoo! and Microsoft have all been accused of cooperating with the Chinese government to aid censorship and repression of free speech. What happens if/when we encounter the next version of the Bush administration that sees no problem abrogating civil rights in pursuit of “evildoers”?
What's more, when we deliver our personal information over to corporations we are giving this data over to an institution that is amoral. Companies are not yet structured to deliver moral or ethical results - they are encouraged to grow and deliver “shareholder value” (read money) which is a numb and narrow measure of value. Do I want my data to be managed by an amoral institution?
To be clear - I want the convenience and miracles that modern technology brings. I love the Internet and I am willing to give over lots of data in the trade. But I want two fundamental protections:
First, change the corporation. The structure of the corporation continues to be driven by 20th century hard goals of efficiency and scale - not by more complex measures of environmental sustainability, value creation and the commonweal. These are simply not adequately factored into any structural, organizational, incentive or taxation systems of business today. Profit and profit motive are fine - but hiding social and environmental costs is no longer acceptable. I want to deal with institutions capable of morality. This is no small task - but if we can build the Internet….
Second, we need a right to privacy that matches the 21st century reality. As a friend of mine likes to say, “privacy is now a responsibility - not a right.” While it is pithy (and perhaps true), the reason we grant rights - and laws to enforce those rights in society - is the simple fact that people do not generally have the wherewithal to protect themselves from large, institutional interests. In the same way that regulatory structures are needed to keep a financial system in balance (alas even the Ayn Rand acolyte Greenspan finally agrees with this truism), we need new rights and regulations governing the use of our personal data - and simple sets of controls over who has access to it.
The true work of the 21st century lies not in refining our technology - this we will achieve without any political will. The work lies in re-imagining our institutions.
Of course, his first "solution" is such an obvious category error you can immediately tell he wasn't going through an editor. The challenge of building the internet was electrical engineering and physical logistics -- changing the nature of the corporation is an institutional crusade requiring a completely different skillset and strategy. Strategies, really.
Today on Twitter, all the Big Thinkers are abuzz about the idea of "repurposing" Federal bureaucracy, and I couldn't help but ask for examples of that being done successfully -- so far all the responses have been token "corporate turnaround" stories from the private sector. I can't tell if I'm blinded by my cynicism, or if they're really that naive about the tremendous logistical gap between fixing IBM's management structure and turning around the US Federal Government...
Anyways, the final installment in Ross's series gets more meaty...
Source: http://radar.oreilly.com/2009/05/the-di ... ticon.html
The Digital Panopticon
Bentham was left frustrated in his vision to build the Panopticon. But the concept endured - not just as a literal architecture for controlling physical subjects (there are many Panopticons that now bear Bentham’s stamp) - but as a metaphor for understanding the function of power in modern times. French philosopher Michel Foucault dedicated a whole section of his book Discipline and Punish to the significance of the Panopticon. His take was essentially this: The same mechanism at work in the Panopticon - making subjects totally visible to authority - leads to those subjects internalizing the norms of power. In Foucault’s words, “…the major effect of the Panopticon: to induce in the inmate a state of conscious and permanent visibility that assures the automatic functioning of power. So to arrange things that the surveillance is permanent in its effects, even if it is discontinuous in its action; that the perfection of power should tend to render its actual exercise unnecessary.” In short, under the possibility of total surveillance the inmate becomes self-regulating.
The social technologies we see in use today are fundamentally panoptical - the architecture of participation is inherently an architecture of surveillance.
In the age of social networks we find ourselves coming under a vast grid of surveillance - of permanent visibility. The routine self-reporting of what we are doing, reading, thinking via status updates makes our every action and location visible to the crowd. This visibility has a normative effect on behavior (in other words we conform our behavior and/or our speech about that behavior when we know we are being observed).
In many cases we are opting into automated reporting structures (Google Latitude, Loopt etc.) that detail our location at any given point in time. We are doing this in exchange for small conveniences (finding local sushi more quickly, gaining “ambient intimacy”) without ever considering the bargain that we are striking. In short, we are creating the ultimate Panopticon - with our data centrally housed in the cloud (see previous post on the Captivity of the Commons) - our every movement and up-to-the-minute status is a matter of public record. In the same way that networked communications move us from a one-to-many broadcast model to a many-to-many model, so we are seeing the move to a many-to-many surveillance model: a global community of voyeurs ceaselessly confessing to "What are you doing?" (Twitter) or "What's on your mind?" (Facebook).
Captivity of the Commons focused on the risks of corporate ownership of personal data. This post is concerned with how, as individuals, we have grown comfortable giving our information away; how our sense of privacy is changing under the small conveniences that disclosure brings. How our identity changes as an effect of constant self-disclosure. Many previous comments have rightly noted that privacy is often cultural -- if you don't expect it - there is no such thing as an infringement. Yet it is important to reckon with the changes we see occurring around us and argue what kind of a culture we wish to create (or contribute to).
Jacques Ellul’s book, Propaganda, had a thesis that was at once startling and obvious: Propaganda’s end goal is not to change your mind at any one point in time - but to create a changeable mind. Thus when invoked at the necessary time - humans could be manipulated into action. In the U.S. this language was expressed by catchphrases like, “communism in our backyard,” “enemies of freedom” or the current manufactured hysteria about Obama as a “socialist”.
Similarly the significance of status updates and location based services may not lie in the individual disclosure but in the significance of a culture that has become accustomed to constant disclosure.
Tech guys waking up to social conditioning implications of their own work is a beautiful thing, innit?