Is this the beginnings of Skynet? The data machines are studying us but they are not self-aware, yet. They mine our personal data, learn our behavioural patterns and predict our moves, all for the sake of network-capitalism (and industrial-capitalism). People should understand how because data equals power. So in the words of Rage Against the Machine: “We gotta take the data back!”
Or, do we head to the underground activity of the deep, dark interwebs…
This is not the place to explain how Internet Infrastructure (or its History, Technical Aspects or Users) enables the collection, analysis, visualisation, application development and sale of data. In this week’s post I’ll present a mini-glossary that looks at some of the advanced practices and niche places in data activity on the internet today.
Deep Web: is several orders of magnitude larger than the everyday internet (also called ‘Surface Web,’ the sites indexed by search engines) and contains secure, context-specific and dynamically created websites/databases that laymen cannot access. It also includes private networks, P2P and BitTorrent clients. There is a small shadowy part of the Deep Web where the anonymous thrive, called DarkNet (requiring a VPN, Tor/I2P client and an invite).
Dark Net (Onessa, 2012): often decentralised F2F networks that host hidden servers. Activities include legal anonymity (private communication, file-sharing and dissident groups under oppressive regimes), but are mostly illegal (including child porn, drug trafficking, weapon sales, coordinated hacking, Botnets and hit-men). Suffice it to say the underground is never safe for the uninitiated, but interestingly, Silk Road (Moses, 2012) (the eBay of drugs) trades in Bitcoin, an emerging digital data-currency.
Data Science: uses the theories and techniques developed in a range of fields in order to extract meaning from large datasets (big data) to make and market data products. Areas include: statistics, pattern recognition, machine learning, advanced computing, visualisation, predictive analytics, natural language processing and data architecture.
Big data: collections of datasets characterised by their volume (terabytes to petabytes of information), velocity (I/O speed) and variety. Because a wide range of input sources catalogue many different sets of information simultaneously, big data is beyond the ability of typical database software tools to capture, store, manage, search and analyse, and requires complex parallel software running on huge groups of servers. Analytical algorithms allow correlations to be found to identify business/consumer trends, collate research data, link databases and determine real-time parameters and conditions.
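To make the "parallel software" idea a little more concrete, here is a minimal sketch of the split-count-merge pattern that big data tools rely on. It is only an illustration: the event log is made up, the function names are hypothetical, and a small thread pool stands in for the huge groups of servers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_events(chunk):
    """'Map' step: count event types within one chunk of the log."""
    return Counter(event for _user, event in chunk)


def split(records, parts):
    """Cut the log into roughly equal chunks, one per worker."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_count(records, workers=3):
    """Count each chunk independently, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_events, split(records, workers)):
            total.update(partial)  # 'reduce' step: combine partial results
    return total


# Hypothetical (user, event) records, invented for this example.
log = [
    ("u1", "click"), ("u2", "search"), ("u1", "click"), ("u3", "purchase"),
    ("u2", "click"), ("u3", "search"), ("u1", "like"),
]

print(parallel_count(log))
```

Because each chunk is counted on its own, the same idea scales from three threads to many machines; real systems add storage, scheduling and fault tolerance on top.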
Smart Data (De Goes, 2013): the latest industry jargon where predictive analytics are used on persistent streams of big data and digital footprints to create user or context specific data modules to generate revenue, such as targeted marketing & advertisements, product recommendations, page personalisations, social selling, etc.
Data Architecture: an ontology of data composed of models, policies, rules or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and in organisations. The graphic below illustrates the general components of everyday data infrastructure.
Machine Learning: part of artificial intelligence, the study of systems that can learn from data, assign/distinguish different values and make predictions based on those known values. It deals with the representation of ‘seen’ data instances to train an algorithm’s function, then presents ‘unseen’ instances and measures how accurately the learner can generalise from previous instances. Integral to Machine Learning are Pattern Recognition (where a label is assigned to a given input value, in an attempt to assign a class to data according to a set of common values) and Natural Language Processing (the linguistic understanding and transcoding of human language to derive meaning/values). More on Deep Learning (Hof, 2013).
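The seen/unseen idea can be sketched in a few lines. This is a toy illustration only: it uses a nearest-centroid classifier (one of the simplest possible learners, chosen here for brevity), and the data, labels and function names are all invented for the example.

```python
from statistics import mean


def fit_centroids(samples):
    """Learn one average point (centroid) per label from 'seen' data."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest to the input."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=squared_distance)


def accuracy(centroids, samples):
    """Share of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, f) == label for f, label in samples)
    return hits / len(samples)


# Hypothetical data: (hours online per day, purchases per month) -> label
seen = [
    ((1.0, 0.0), "casual"), ((1.5, 1.0), "casual"), ((2.0, 1.0), "casual"),
    ((6.0, 8.0), "heavy"), ((7.0, 9.0), "heavy"), ((8.0, 7.0), "heavy"),
]
unseen = [((1.2, 0.5), "casual"), ((6.5, 8.5), "heavy"), ((2.5, 2.0), "casual")]

centroids = fit_centroids(seen)        # learn from 'seen' instances
print(accuracy(centroids, unseen))     # measure on 'unseen' instances -> 1.0
```

The learner never looks at the unseen list while fitting, so the score tells us how well it generalises rather than how well it memorised.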
Data Mining: uses computational processes from machine learning with statistics and data architecture to autonomously analyse and discover new patterns in ‘big’ datasets (including: detecting anomalies, associating variables, classifying and summarising groups). These value structures are then transcoded into an understandable model for further use in predictive analytics and a wide array of Industry Uses. There is an important emerging subset in Reality Mining: the collection and analysis of data from mobile devices (social apps, geolocation, motion sensors, health sensors, etc) to predict behavioural dynamics and social patterns. Bonus: in-depth article on Big Data from Cheap Phones (Talbot, 2013).
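As a small taste of one data mining task named above, detecting anomalies, here is a minimal sketch that flags values sitting far from the average. The z-score rule and the threshold of 2 are assumptions picked for the example, not a standard prescribed by any of the sources, and the login counts are made up.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # all values identical, so nothing stands out
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily login counts for one account.
daily_logins = [12, 14, 13, 15, 11, 13, 14, 12, 95, 13]

print(find_anomalies(daily_logins))  # -> [95]
```

Real mining tools use sturdier methods, since a single extreme value also inflates the mean and standard deviation it is judged against.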
Predictive Analytics: uses techniques from statistics, machine learning and data mining in the analysis of past and present data and variables to make predictions about future behaviours or events. Often used to identify risks, opportunities and conditions to guide decision making or build descriptive models that quantify/classify relationships in datasets to generate ‘Smart Data’ modules.
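In its simplest form, predicting from past data means fitting a trend and extending it. The sketch below fits a straight line by ordinary least squares and projects one step ahead; the weekly purchase figures and the function names are hypothetical, and real predictive models are far richer than a single line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_value(slope, intercept, x):
    """Extend the fitted line to a new x value."""
    return slope * x + intercept


# Hypothetical past data: purchases observed in weeks 1 to 5.
weeks = [1, 2, 3, 4, 5]
purchases = [2.0, 4.1, 5.9, 8.0, 10.0]

slope, intercept = fit_line(weeks, purchases)
print(round(predict_value(slope, intercept, 6), 2))  # forecast for week 6
```

The forecast is only as good as the assumption that the past trend continues, which is exactly the risk decision makers have to weigh.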
Data Visualisation (Friedman, 2007): the graphic representation (or direct display) of data to engage and effectively communicate complex ideas, patterns and statistics in a thematic way. There is a huge array of representational styles, often tailored to the data and its original organisational pattern; see the Venn diagram below and check out the Periodic Table of Visualisations (Lengler, 2007).
Data Journalism (Rogers, 2011): the analysis, filtering and visualisation of open ‘big data’ sets (e.g. Academia, Government (AUS), Wikileaks) and information streams (e.g. newswires, Twitter, etc.) to discover/update stories in real-time, provide context for complex issues and allow readers access to relevant information from online sources. When machines perform this analysis and filtering, it might be called Algorithm Journalism (Marshall, 2013).
Digital Footprint: a personal data trail of the engagement with digital media (e.g. attention, location, time, clicks, searches, likes, purchases, media consumed, comments, etc.) that can be used in data/reality mining, predictive analytics, targeted marketing, and social profiling. There are trends towards Self-Tracking involving in-depth metrics from measuring tools/apps about day-to-day activities, and Personal Data Mining (Rowan, 2011) about reclaiming troves of personal information collected by companies to view your own ‘digital profile’.
Briggs, William ‘Machine Learning, Big Data, Deep Learning, Data Mining, Statistics, Decision & Risk Analysis, Probability, Fuzzy Logic FAQ’ <http://wmbriggs.com/blog/?p=6465>
Cerf, Vinton ‘Computer Networking: Global Infrastructure for the 21st Century’ <http://homes.cs.washington.edu/~lazowska/cra/networks.html>
De Goes, John ‘“Big Data” is dead. What’s next?’ Venture Beat <http://venturebeat.com/2013/02/22/big-data-is-dead-whats-next/>
Eagle & Pentland ‘Reality Mining: Sensing Complex Social Systems’ MIT <http://realitycommons.media.mit.edu/realitymining.html>
Friedman, Vitaly ‘Data Visualization: Modern Approaches’ Smashing Magazine <http://www.smashingmagazine.com/2007/08/02/data-visualization-modern-approaches/>
Hof, Robert ‘Deep Learning’ MIT Technology Review <http://www.technologyreview.com/featuredstory/513696/deep-learning/>
Lengler & Eppler (2007) ‘Towards a Periodic Table of Visualization Methods for Management’ Visual Literacy <http://www.visual-literacy.org/pages/documents.htm>
Marshall, Sarah ‘Robot Reporters: A Look at the Computers Writing the News’ Journalism.co.uk <http://www.journalism.co.uk/news/robot-reporters-how-computers-are-writing-la-times-articles/s2/a552359/>
Moses, Asher ‘“Dark net” drug deals boom on cyber Silk Road’ SMH <http://www.smh.com.au/technology/technology-news/dark-net-drug-deals-boom-on-cyber-silk-road-20120809-23wdj.html>
Onessa ‘DarkNet: Explained & Then Done Right’ <https://whattheserver.me/blog/darknet-explained-then-done-right/>
Quilty-Harper, Conrad ’10 ways data is changing how we live’, The Telegraph <http://www.telegraph.co.uk/technology/7963311/10-ways-data-is-changing-how-we-live.html>
Rogers, Simon ‘Data journalism at the Guardian: what is it and how do we do it?’ The Guardian <http://www.guardian.co.uk/news/datablog/2011/jul/28/data-journalism?INTCMP=SRCH>
Rowan, David ‘Personal data mining to improve your cognitive toolkit’ Wired <http://www.wired.co.uk/news/archive/2011-01/18/edge-question>
Talbot, David ‘Big Data from Cheap Phones’ MIT Technology Review <http://www.technologyreview.com/featuredstory/513721/big-data-from-cheap-phones/>
Tyson, Jeff ‘How Internet Infrastructure Works’ How Stuff Works <http://computer.howstuffworks.com/internet/basics/internet-infrastructure.htm>
Anon ‘Internet Census 2012’ <http://internetcensus2012.bitbucket.org/paper.html>
Car & Bones ‘Machine Learning’ SoundCloud <https://soundcloud.com/becojo/machine-learning>
FFunction ‘What is Data Visualization?’ <http://blog.ffctn.com/what-is-data-visualization#>
Litref, ‘PunchedCard’ Wikimedia Commons <http://commons.wikimedia.org/wiki/File:Punchedcard.jpg>
LRDC ‘Data Management Reference Model’ Wikimedia Commons <http://commons.wikimedia.org/wiki/File:DBARCH_Data_Management_Reference_Model.JPG>