Almost every crime leaves a proverbial “paper trail”, but when the probative information spans a vast volume of confiscated documents, electronic records, e-mail, wiretap transcripts, observation reports, cold cases, intelligence, information from telecom and internet providers, handwritten notes, audio files, pictures, videos and social network posts, it is hard to link and combine all these information sources in order to eventually find “the smoking gun”. Managing massive amounts of information and making sure your team members can collaborate effectively in these large investigations is a daunting task.
The ongoing information explosion is reaching epic proportions and has earned its own name: Big Data. Gartner, and now much of the industry, use the so-called “3Vs” model to classify Big Data: “Big data is high Volume, high Velocity, and/or high Variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” This describes exactly the type of data investigators have to deal with today.
Big Data encompasses both challenges and opportunities. The opportunity, which many investigators focus on, is to use the collective Big Data to identify and recognize patterns of behavior and to collect evidence of criminal behavior. But there is also a dark side to Big Data: investigating and analyzing Big Data collections is an enormous challenge and puts a lot of pressure on investigative teams and resources. New data formats (multimedia, in particular), different languages, cloud and other off-site locations, and the continual increase in regulations and legislation (which may contradict previous protocols) add even more complexity to this puzzle.
Today’s investigators deal with terabytes or even petabytes of digital data in various languages, stored in many different electronic formats and in many different locations. To build a case around one or multiple suspects, to collect relevant data on a location of interest or to investigate a criminal case, these numerous sources of unstructured information have to be accessed, and the information has to be collected, normalized, analyzed, linked, enriched, interpreted, verified and applied to the case. Most agencies rely on homegrown point solutions to access and process these different types of information and use different software tools that do not necessarily integrate well.
During the Enron investigation, the FBI confiscated 12 million pages of paper documents plus terabytes of e-mail and other electronic files, all of which needed to be made completely searchable so that investigators could run sophisticated queries and create evidence annotations. The investigators also had to thoroughly, accurately, and promptly archive all this information and organize and disclose evidence in a logical, complete and auditable manner.
A similarly daunting challenge was faced by the investigators involved with the International Criminal Tribunal for the former Yugoslavia (ICTY). The evidence collection of the tribunal involved the analysis of tens of millions of pages of case documentation, in more than ten different languages, as well as vast amounts of electronic files, e-mails and attachments, third-party databases, and the testimonies of hundreds of suspects, witnesses, and victims. Adding to the complexity, hundreds of attorneys and prosecutors involved in the case requested different sets of data. Needless to say, the question of how best to manage and distribute all the case information was one of the first issues that had to be addressed back in the mid-nineties.
These projects show why search alone is not enough to address the full range of requirements users face when conducting large-scale investigations. Times have changed: today, not only is almost all data created electronically in thousands of formats, it is also stored in many different locations (physically and in the cloud), and a full range of corresponding tools is needed to manage and document the law enforcement process demanded by an increasingly sophisticated user base.
In addition to requiring redeployed search and text-mining technology, large-scale investigations like these need flexible tools for data identification, capture, collection and preservation; support for many different file formats and languages; data normalization and enrichment tools; content analytics; support for multimedia; open long-term data storage; compliance with privacy, privilege, health and other data-protection rules; and flexible disclosure and production tools.
Defensibility, auditing, quality control and the chain of custody are paramount for classification processes during internal and law enforcement investigations. If you cannot explain exactly how the classification and correlation processes were implemented and executed, you will have a hard time in court.
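To make the auditing requirement concrete, here is a minimal sketch of a tamper-evident audit log in Python. It is only an illustration under stated assumptions, not a description of any particular product: the function names (append_entry, verify_log), the field names and the sample document ID are all hypothetical. Each log entry records who did what to which document and includes the hash of the previous entry, so a later change to any entry can be detected.

```python
import hashlib
import json
from datetime import datetime, timezone

# Placeholder "previous hash" for the very first entry (an assumption of this sketch).
GENESIS = "0" * 64


def _digest(body: dict) -> str:
    # Hash a canonical JSON form so the same entry always yields the same digest.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: list, actor: str, action: str, document_id: str, detail: str) -> dict:
    """Append one audit record that is chained to the previous record by its hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "document_id": document_id,
        "detail": detail,
        "prev_hash": prev_hash,
    }
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Return True only if every entry is unmodified and correctly chained."""
    prev_hash = GENESIS
    for entry in log:
        body = {key: value for key, value in entry.items() if key != "hash"}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    audit_log = []
    # Hypothetical actors, actions and document ID, for illustration only.
    append_entry(audit_log, "analyst_a", "classify", "DOC-001", "tagged as privileged")
    append_entry(audit_log, "analyst_b", "review", "DOC-001", "classification confirmed")
    print("log intact:", verify_log(audit_log))
```

A real chain-of-custody system would also need secure storage, access control and trusted timestamps; the sketch only shows how each classification step can be recorded in a way that can be checked afterwards.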
Information protected under privacy and data-protection regulations, or attorney-client privileged information, for example, requires constant attention during investigations. Violations of such regulations and rights are very counterproductive: they cause many problems later in the investigation or during trial and can even lead to lower punishment or dismissals.
At the same time, every investigative team has limited resources and will have to meet very strict deadlines. This requires careful resource planning and constant monitoring of progress.
Several of the above-mentioned requirements are contradictory in nature. In essence, we need computers and advanced algorithms to deal with these contradictory requirements and at the same time battle the data explosion. This is where content analytics comes into play: without the right kind of technology it is virtually impossible to manage today’s projects in law enforcement and internal investigations. Not only are the data volumes too large, the investigation process itself has also become much more complex and demanding, and errors have much larger financial, legal and political consequences.
In the coming years, this problem will only become bigger as the size of electronic data collections, the variety of data types (more multimedia), the number of data locations (cloud, social networks, …) and the complexity of legislation continue to increase.
I can only recommend that everybody in law enforcement and internal security use technology to fight the effects of technology and, at the same time, stay compliant with a very valid legal framework that was created to protect the rights of all parties involved.