Do your own datamining


Postby Searcher08 » Wed Apr 09, 2014 10:41 am

So a serious question for all...

Does anyone know how to datamine a large forum thread?

I have been searching for software that can automatically turn a large forum thread into a wiki, or into some form of Mark Lombardi-style interconnected mind map.

Any ideas / leads?
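
To make the question concrete, here is a minimal sketch of the first step: cutting a thread into individual posts. It assumes the thread has been saved as plain text and that every post starts with a line of the form "Postby NAME » DATE", as a phpBB copy-paste does. The names POST_HEADER, Post and split_posts are made up for illustration and are not part of any forum software.

```python
import re
from dataclasses import dataclass

# Assumption: each post begins with a header line such as
# "Postby Searcher08 » Wed Apr 09, 2014 10:41 am".
POST_HEADER = re.compile(r"^Postby (?P<author>.+?) » (?P<date>.+)$")


@dataclass
class Post:
    author: str
    date: str
    body: str


def split_posts(text: str) -> list[Post]:
    """Split a plain-text thread dump into posts (hypothetical helper)."""
    posts = []
    author = date = None
    lines = []
    for line in text.splitlines():
        match = POST_HEADER.match(line.strip())
        if match:
            # A new header closes the previous post, if there was one.
            if author is not None:
                posts.append(Post(author, date, "\n".join(lines).strip()))
            author, date = match.group("author"), match.group("date")
            lines = []
        elif author is not None:
            # Text before the first header (page title etc.) is ignored.
            lines.append(line)
    if author is not None:
        posts.append(Post(author, date, "\n".join(lines).strip()))
    return posts
```

Each Post body will still contain signatures and profile lines, so some clean-up would be needed before feeding it to a wiki or mapping tool.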

Re: Do your own datamining

Postby Pushkarev » Wed Apr 09, 2014 11:13 am

Maltego used to be the go-to software years ago, but now you have to buy it (at least from memory; things might have changed). There are a bunch of similar tools in Kali Linux (formerly BackTrack Linux), a penetration-testing OS.

Things off the top of my head:

I use archive.org and HTTrack to archive websites. The former makes a public copy of the website, while the latter gives you a private copy.

There is a tool in Kali Linux that harvests emails and names from target websites: https://code.google.com/p/theharvester/

Metagoofil is useful for extracting metadata from documents: http://www.edge-security.com/metagoofil.php

For organizing target information, there is the Dradis framework: http://dradisframework.org/

Re: Do your own datamining

Postby DrEvil » Wed Apr 09, 2014 4:43 pm

Here's a list of open source data mining software from Wikipedia:

http://en.wikipedia.org/wiki/Data_mining#Software

Most of it is decidedly not user friendly though..

One thing that would be nice to have is a sort of reverse dictionary look-up, giving a list of everything that's not a word, e.g. names, companies etc. You could then make that list available to the forum for refinement, flagging and removing nonsense words, linking different spellings of the same name (Usama/Osama) and adding stuff the algorithm missed (companies whose names are also words etc.).
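
A rough sketch of that reverse look-up, assuming you already have a set of lower-case dictionary words (from a word-list file, say). The function name non_dictionary_tokens and the aliases mapping are invented for illustration; the aliases dict stands in for the forum's manual linking of spellings.

```python
import re
from collections import Counter


def non_dictionary_tokens(text, known_words, aliases=None):
    """Count tokens that are not in the dictionary (hypothetical helper).

    known_words: set of lower-case dictionary words (assumed to be supplied).
    aliases: optional map from a variant spelling to its canonical form,
             e.g. {"Usama": "Osama"}, maintained by hand.
    """
    aliases = aliases or {}
    # Crude tokenizer: runs of letters, allowing inner apostrophes and hyphens.
    tokens = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    counts = Counter(
        aliases.get(token, token)
        for token in tokens
        if token.lower() not in known_words
    )
    # Most frequent candidates first, ready for human review.
    return counts.most_common()


known = {"the", "guy", "keeps", "showing", "up", "and"}
text = "Usama and Osama: the guy Osama keeps showing up"
print(non_dictionary_tokens(text, known))
# → [('Osama', 2), ('Usama', 1)]
print(non_dictionary_tokens(text, known, aliases={"Usama": "Osama"}))
# → [('Osama', 3)]
```

It will not catch companies whose names are also ordinary words, which is exactly the part that would need the human refinement.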

The end result should hopefully be a sort of "master key" you could use to map out the forum and various links between actors that weren't obvious before ("Hmm. This guy keeps showing up in the strangest places..").
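
One simple way to get from such a list to links between actors is to count which names turn up in the same post. This is only a sketch under that assumption: the function name cooccurrence is made up, the matching is plain case-insensitive substring search, and posts is assumed to be a list of post bodies as strings.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(posts, names):
    """Count how often each pair of names appears in the same post."""
    pairs = Counter()
    for body in posts:
        lowered = body.lower()
        # Sort so that each pair is always stored in the same order.
        present = sorted(name for name in names if name.lower() in lowered)
        pairs.update(combinations(present, 2))
    return pairs


posts = [
    "Lombardi and Tufte",
    "Tufte alone",
    "Lima cites Lombardi and Tufte",
]
links = cooccurrence(posts, ["Lombardi", "Tufte", "Lima"])
print(links[("Lombardi", "Tufte")])  # → 2
print(links[("Lima", "Tufte")])      # → 1
```

The resulting pair counts could be loaded as weighted edges into whatever graph or mapping tool you prefer. Substring matching will produce false hits on short names, so expect noise.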

Not strictly related, but you might also want to look into GIS (Geographic Information System) software.
With the amount of structured data available for free you can make some very interesting maps (crime vs. weed consumption, pollution vs. Koch owned businesses etc.).

Here's the wiki-page for GIS software (again, not exactly user friendly):

http://en.wikipedia.org/wiki/List_of_ge ... s_software
"I only read American. I want my fantasy pure." - Dave

Re: Do your own datamining

Postby 82_28 » Wed Apr 09, 2014 5:25 pm

No, no, no, no. Lord Balto would not go along with that. Because that is exactly how it is done -- one must go meta and never rely on a genealogy outfit. Takes time and not a mouse click. One must cross-reference. Sorry and thanks, Lord Balto!
There is no me. There is no you. There is all. There is no you. There is no me. And that is all. A profound acceptance of an enormous pageantry. A haunting certainty that the unifying principle of this universe is love. -- Propagandhi

Re: Do your own datamining

Postby Luther Blissett » Thu Apr 10, 2014 11:30 am

Mark Lombardi was an artist. No algorithm can replace that gracefully and augustly without a lot of human intervention.

However, since I one day would like to partially pick up where Lombardi left off, pm me if you want to collaborate on a project (I'm always building on some of my own self-initiated data visualizations). I would be more than excited to work on something.
The Rich and the Corporate remain in their hundred-year fever visions of Bolsheviks taking their stuff - JackRiddler

Re: Do your own datamining

Postby Wombaticus Rex » Thu Apr 10, 2014 11:40 am

Fascinating thought.

I'm assuming any algorithmic approach to organizing the multifarious networks examined in long-form RI threads (the recent Fabian thread would be an ideal nightmare of a test case) would result in mostly noise -- but perhaps, that is the real value of this exercise.

I.e., we're not going to have a machine-assisted learning approach make sense of this for us, but it would be a source of novel connections and strange parallels.

Re: Do your own datamining

Postby smiths » Fri Apr 11, 2014 8:41 am

the question is why, who, why, what, why, when, why and why again?

Re: Do your own datamining

Postby smiths » Fri Apr 11, 2014 8:53 am

ahhh, it would be incredible to mine the data and to develop a brilliant, clear and aesthetically pleasing way to render that information, maps are excellent, animations are also fantastic

connections over time and space

there has been an explosion of visualization forms and tools in the last few years, to connect the 'reality' of the last few years to an incredible visual form and to sit it atop a mountain of data and connections and ... meaning, ahhh, that would be sweet

even something simple like Prezi could be quite handy, in fact i might have a go myself
the question is why, who, why, what, why, when, why and why again?

Re: Do your own datamining

Postby Luther Blissett » Fri Apr 11, 2014 10:59 am

smiths » Fri Apr 11, 2014 7:53 am wrote: ahhh, it would be incredible to mine the data and to develop a brilliant, clear and aesthetically pleasing way to render that information, maps are excellent, animations are also fantastic

connections over time and space

there has been an explosion of visualization forms and tools in the last few years, to connect the 'reality' of the last few years to an incredible visual form and to sit it atop a mountain of data and connections and ... meaning, ahhh, that would be sweet

even something simple like Prezi could be quite handy, in fact i might have a go myself


I have a copy of this book on my desk at all times and constantly refer to it for inspiration: Visual Complexity by Manuel Lima. The examples he collects strive for clarity and function; it's just a shame that many of them are in some service of capital. The majority are related to social sciences. I am trying to apply it to matters of the deep state, crimes against humanity, and high weirdness.

I imagine that a lot of this was made with Python or Processing. I still love manipulating data by hand using regular old vector software and math though.
The Rich and the Corporate remain in their hundred-year fever visions of Bolsheviks taking their stuff - JackRiddler

Re: Do your own datamining

Postby Wombaticus Rex » Fri Apr 11, 2014 1:05 pm

^^Just ordered that, thank you sir.

Re: Do your own datamining

Postby DrEvil » Fri Apr 11, 2014 2:08 pm

While perusing the Visual Complexity link (which is awesome. Thanks!) I came across this link too, which looks interesting. A nice collection of data sets.

http://www.visualizing.org/
http://www.visualizing.org/data/browse

And I just remembered - you can do all kinds of fun stuff to data in a regular spreadsheet program.
Sorting, filtering, graphing etc.

And Notepad! Great for converting stuff to raw text. Paste it in, copy it out, and all formatting except line breaks and tabs (indentation, or whatever it's called) is gone. It struggles with very large data sets, so be careful. :)
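
For data sets too big for that trick, the same idea can be scripted. This is a different technique from the copy-paste one: a minimal sketch that strips HTML tags using Python's standard html.parser module, assuming the source is a saved web page. The names TextOnly and to_plain_text are invented for illustration.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text of an HTML document and drop the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Keep line breaks, roughly as the paste-into-Notepad trick does.
        if tag in ("br", "p"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(html):
    parser = TextOnly()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(to_plain_text("<b>Bold</b> and <i>italic</i><br>next"))
# prints "Bold and italic", then "next" on a second line
```

Note that this keeps the contents of script and style blocks as text, so real pages would need a bit more filtering.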
"I only read American. I want my fantasy pure." - Dave

Re: Do your own datamining

Postby Luther Blissett » Fri Apr 11, 2014 4:54 pm

Awesome, thanks for those links.
The Rich and the Corporate remain in their hundred-year fever visions of Bolsheviks taking their stuff - JackRiddler

Re: Do your own datamining

Postby justdrew » Sat Apr 12, 2014 2:04 am

"The Good Judgment Project is an experiment put together by three well-known psychologists and some people inside the intelligence community. What they aim to prove is that average, ordinary people in large groups and access just to Google search can predict far more accurately events of geopolitical importance than smart intelligence analysts with access to actual classified information. In fact there is a clearly identified top 1 percent of the 3000 predictors group, who have been identified as super-forecasters: people whose predictions are reportedly 30 percent better than intelligence officers."
By 1964 there were 1.5 million mobile phone users in the US

Re: Do your own datamining

Postby Wombaticus Rex » Thu Apr 17, 2014 8:39 am

Wombaticus Rex » Fri Apr 11, 2014 12:05 pm wrote:^^Just ordered that, thank you sir.


Holy shit, what a wise move that turned out to be.

Anyone writing a book about data visualization will be compared to Tufte. This is both a tribute to Tufte's influence and a sad statement on the centimeter-thick referential depth of contemporary critics.

Still, I think the comparison is more than apt in the case of Manuel Lima. The book is a bevy of thought-provoking eye candy, to be sure, but where it really excels is the writing itself, which would merit publication without a single illustration being involved. Lima manages to navigate a ton of history and mines deeply for insights instead of dropping a Markov Chain of names & concepts. While the prose occasionally gets clunky, and many passages smell suspiciously like imported blog content, the thinking is always brilliant.

Lombardi makes an appearance in Chapter 3, too.

Re: Do your own datamining

Postby JackRiddler » Thu Apr 17, 2014 3:51 pm

Here's a high point in the art of presenting data. Note the source:

[Image]
We meet at the borders of our being, we dream something of each others reality. - Harvey of R.I.

To Justice my maker from on high did incline:
I am by virtue of its might divine,
The highest Wisdom and the first Love.

TopSecret WallSt. Iraq & more
