Enterprise Search – Then and Now

The following post was originally published on emerj.com and is based on a presentation by Daniel Faggella for Sinequa’s INFORM 2019 client event.

Traditional Search – Then

Older search applications typically searched through structured documents, such as loan application forms. They relied on predictable formats and on matching keywords directly against their occurrences in enterprise documents. At the time, only natively digital text was searchable; scanned print and handwriting were not. It would take some years before scanned documents and other unstructured data types became searchable.

Daniel Faggella speaking at Sinequa’s INFORM 2019 event in Paris.

Before machine learning, “intelligent” search applications could not handle as much metadata as current systems, which made searching for complex topics difficult. In addition, metadata had to be applied to documents manually, a time-consuming process required for any document a company wished to be able to search later. In many organizations, this is still the case.

Intelligent Search – Now
Current search applications can handle all kinds of structured and unstructured content across various file types, with an emphasis on classification for better accessibility. They can also enrich documents with metadata, allowing for concept search and automatic document organization.

Past Difficulties Persist Today
Artificial intelligence and machine learning are not the solution to every search-related business problem. Despite how much search applications have developed over the years, companies still face some of the same difficulties as in the past. The difficulties of adopting an intelligent search application include integration, defining metadata, and determining what data is needed to make the documents a bank or financial institution cares about searchable.

AI startups and other vendors that are new to the intelligent search space often underestimate the difficulties their clients are likely to face with adoption. Overcoming these challenges can be hard work, and we find that many companies that are just starting out with intelligent search do not consider the commitment required to do so.

These companies often market their AI applications as easy to deploy within the enterprise. However, it is likely that they do this because they have not yet been through the full process of bringing an AI application into the enterprise. They may not have run into the common problems with data infrastructure (an ML problem that almost every enterprise data science leader struggles with) or with defining use cases (easier said than done, and requiring lots of business context from subject-matter experts).

What AI and ML Bring to Enterprise Search

The potential influence of artificial intelligence and machine learning on enterprise search can be understood as two important capabilities:

Making more information accessible – Making data digitally accessible using techniques such as optical character recognition and machine vision, scanning documents, and analyzing more data types. An AI application can also accomplish this by automatically adding metadata to backlogs of enterprise data.

Enabling companies to ask deeper questions – Enabling searches for broader concepts as opposed to strict keywords. This is helpful for finding insights on a general topic, rather than simply every document containing a few terms; employees could search for documents and information beyond what directly pertains to a single keyword.

When observing the differences between search applications of the past and those of the present, one can see that artificial intelligence could help broaden a bank’s access to data. At the same time, the technology could transform the way employees search for that data, capitalizing on that access even more.
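
To make the first capability concrete, here is a minimal sketch of OCR-based text extraction using the open-source Tesseract engine via the pytesseract library. The file name is hypothetical, and this is a simplified stand-in for the richer document-ingestion pipelines commercial platforms use:

```python
# Minimal OCR sketch: turn a scanned page into indexable text.
# Assumes the Tesseract binary is installed; the file path is hypothetical.
from PIL import Image
import pytesseract

scanned_page = Image.open("loan_application_scan.png")
extracted_text = pytesseract.image_to_string(scanned_page)

# The extracted text can now be indexed, enriched, and searched
# like any natively digital document.
print(extracted_text[:200])
```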

Use-Case Overview

Enrichment and Classification
One use case of intelligent search for banks and financial institutions is in data enrichment and classification. Documents need to be tagged with metadata, or data that describes the data within those documents. Metadata is what allows employees to search for documents using search queries with keywords and filters.

Traditionally, these documents had to be tagged with metadata manually, ideally when they were uploaded or created. But that doesn’t always happen, and as a result, a bank’s digital ecosystem can end up very disorganized. Employees forget to tag documents or tag them incorrectly, making them difficult to find when needed.

Artificial intelligence could improve this process, but leaders at the bank will still need to decide what kind of metadata they want documents tagged with. For example, leaders at the customer service department may want to tag call center logs with metadata about the kind of problem the customer is facing and the emotional state of the caller.

Once they determine categories of metadata, subject matter experts in the department can start tagging documents accordingly. Once this is complete, they can feed the tagged documents into the machine learning algorithm that will power the intelligent search engine. The bank will then be left with a search application that could automate and improve two parts of the search and discovery process:

Enrichment – When employees upload or create a document, the intelligent search application could automatically tag it with metadata, immediately preparing it for search. The application could also run through older documents and add metadata to them retroactively (a rough sketch of this step follows the list).
Classification – The machine learning algorithm could also cluster the metadata into broader categories. As a result, newly uploaded or created documents could be automatically organized into folders, allowing for easier keyword search.
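
As a rough illustration of the enrichment step, the sketch below trains a generic text classifier on a handful of SME-tagged call logs and uses it to tag a new document automatically. The labels and training texts are invented, and the simple TF-IDF-plus-logistic-regression model stands in for whatever algorithm a real platform would use:

```python
# Sketch of ML-based metadata enrichment; labels and data are invented.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Call logs tagged by subject-matter experts with a problem category.
training_texts = [
    "I can't log in to my account and the password reset never arrives",
    "Why was I charged an overdraft fee twice this month?",
    "I'd like to increase the limit on my credit card",
]
training_labels = ["login_issue", "fee_dispute", "credit_limit"]

tagger = make_pipeline(TfidfVectorizer(), LogisticRegression())
tagger.fit(training_texts, training_labels)

# Enrichment: a newly created document gets a metadata tag automatically.
new_doc = "Customer says the mobile app rejects her password every time"
print(tagger.predict([new_doc])[0])  # likely "login_issue"
```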

Example: Data Confidentiality
Banks and financial institutions could use an intelligent search application to restrict access to enterprise data based on different levels of confidentiality.

These confidentiality levels could serve as access thresholds for documents: the higher an employee’s clearance, the more they can access. The top level would be the most confidential, where nearly no one has access unless it is specifically granted.

The middle level might allow certain categories of people to access certain documents based on what they need to do their jobs. For example, an account executive for financial services may not have access to the bank’s profit and loss information. The bottom level would allow most or all employees, such as customer service agents, to access openly available data.

Once thresholds are decided, the company’s subject matter experts and data scientists can begin to label various documents in the database according to their level of confidentiality. The company can then use that labeled data to train an algorithm to go through the rest of the database and find commonalities between all of the documents labeled under a certain threshold. The algorithm could then determine which other documents fit those patterns or involve similar topics.
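
A minimal sketch of how such confidentiality labels might then gate search results appears below. The three tier names and the clearance model are hypothetical; in practice the labels would come from the trained classifier described above:

```python
# Hypothetical confidentiality tiers, ordered least to most restricted.
LEVELS = {"public": 0, "internal": 1, "restricted": 2}

documents = [
    {"title": "Branch opening hours", "level": "public"},
    {"title": "Call center playbook", "level": "internal"},
    {"title": "Quarterly P&L detail", "level": "restricted"},
]

def visible_documents(user_clearance: str):
    """Return only documents at or below the user's clearance threshold."""
    threshold = LEVELS[user_clearance]
    return [d for d in documents if LEVELS[d["level"]] <= threshold]

# A customer service agent cleared for "public" sees one document;
# a user cleared for "restricted" sees all three.
print([d["title"] for d in visible_documents("public")])
print([d["title"] for d in visible_documents("restricted")])
```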

Unified View of the Customer
Another use-case for intelligent search is gaining what vendors market as a unified view of customers. Customer data is often scattered across various data silos and in structured and unstructured formats, such as a history of transactions or a mortgage application, respectively.

This makes it difficult for company employees, especially those that deal with customers every day, to know whether or not they have all of the information a company has on a customer when dealing with them. A wealth manager, for example, may have trouble finding all of the information about a client they need to make the best decision for their portfolio.

When we studied the vendor landscape of intelligent search applications in the banking industry, we found that 75% of the products in the space included capabilities for customer information retrieval. The unified view seems to be a point of resonance for banks and financial institutions in customer service and wealth management use-cases.

Example: Call Centers
A unified view of a customer may allow a call center agent to not only pull up a customer’s contact record in a CRM, but also their past emails with the company, call logs on their past phone calls with the company, and, in some cases, sentiment analysis information on these conversations.

As a result, the call center agent would have a better idea of how to deal with the customer; they may learn that an angry customer has been calling in frequently about overdraft fees and decide it’s better to refund the customer for those fees than to allow them to keep calling in to the support line and take up agent time.

In the future, this use-case may evolve into automated coaching for call center or live chat employees. Employees would get recommendations for how to best handle the customer and even what to sell them on. Instead of deciding for themselves whether or not to refund the irate customer, the AI software might recommend this to the employee.

Concept and Advanced Entity Search
A third use-case for intelligent search is the capability to search for broader concepts and phrases as opposed to individual words or entities. Employees could search for documents with more contextual natural language phrases, as opposed to just searching for specific keywords.

For example, an employee could type “angry customers with an account login issue between June and August” into the search application, and the software could present a list of call logs for customers fitting those criteria. Such a capability is useful for finding information related to concepts that appear in various documents scattered throughout a database, especially when those concepts are discussed in tangential ways.
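
One way such a query could work is by decomposing the natural language into structured filters (sentiment, topic, date range) applied to enriched call logs. The sketch below is a deliberately simplified illustration with invented data; real insight engines use far deeper language understanding:

```python
# Hypothetical decomposition of "angry customers with an account login
# issue between June and August" into structured filters.
from datetime import date

call_logs = [
    {"text": "furious, locked out of online banking again",
     "sentiment": "angry", "topic": "login_issue", "date": date(2019, 7, 2)},
    {"text": "pleasant call about mortgage rates",
     "sentiment": "neutral", "topic": "mortgage", "date": date(2019, 7, 9)},
    {"text": "upset, password reset loop on the app",
     "sentiment": "angry", "topic": "login_issue", "date": date(2019, 9, 1)},
]

def search(sentiment, topic, start, end):
    """Apply the filters the natural language query was parsed into."""
    return [log for log in call_logs
            if log["sentiment"] == sentiment
            and log["topic"] == topic
            and start <= log["date"] <= end]

results = search("angry", "login_issue", date(2019, 6, 1), date(2019, 8, 31))
print([r["text"] for r in results])  # only the July call matches
```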

Example: Searching For Documents Related to LIBOR
In banking, the 2021 sunset of LIBOR may have compliance departments scrambling to search for contracts that reference it so that they might update or manage them for a post-LIBOR state of affairs. In many cases, it may still be very simple to find all LIBOR-related documents and update them via strict keyword searches.

However, there may be many documents within a database that contain LIBOR-related discussions but don’t specifically mention any keywords one might normally associate with LIBOR. Employees using traditional keyword-based search software might miss these documents.

Intelligent enterprise search software could help employees find these documents. Subject matter experts could first find documents that discuss LIBOR only indirectly and label them.

Data scientists could then run this labeled data through the machine learning algorithm behind the search software, and this would train the software to pick up on the patterns that tend to constitute LIBOR-related discussion within a document. As a result, employees could type “LIBOR” into the search application, and the software would return LIBOR-related documents that compliance officers would want to stay on top of.

This way, employees do not have to manually read through documents or guess which results actually reference LIBOR without mentioning it directly. Instead, they would search for LIBOR as a concept, and the algorithm would search the enterprise database for entities and phrases related to that concept.
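
A stripped-down sketch of that workflow follows, using a generic binary classifier in place of the vendor’s algorithm. The sample contract snippets and labels are invented; the point is that the model can flag a document that never contains the literal keyword:

```python
# Sketch: train on SME-labeled contracts, then surface LIBOR-related
# documents that never use the word "LIBOR". All data is invented.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

labeled_docs = [
    ("interest resets quarterly per the London interbank offered rate", 1),
    ("floating rate tied to the interbank benchmark plus 200 basis points", 1),
    ("fixed coupon of 4.5% payable annually until maturity", 0),
    ("lease agreement for office space, rent due monthly", 0),
]
texts, labels = zip(*labeled_docs)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
model.fit(texts, labels)

# A contract that is LIBOR-related without containing the keyword:
candidate = "margin over the interbank offered rate, reset each quarter"
print(model.predict([candidate])[0])  # 1 flags it for the compliance team
```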


Becoming Information-Driven Begins with Pragmatic AI

Written by guest blogger David Schubmehl, IDC Research Director, Cognitive/Artificial Intelligence Systems. Sponsored by Sinequa.

Over the last several years, I’ve spoken to many organizations that have all asked the same question: How can we most effectively make use of all of the research, documents, email, customer records, and other information that our employees have collected over the years, especially employees who are now retiring?

In the past, organizations had corporate libraries and corporate librarians whose job it was to collect, organize, and disseminate information to employees and staff when and where they needed it. Those departments and positions are long gone from most organizations today. Why? The volume of data and documents (including research papers, contracts, and even emails) has exploded, making the task impossible. But let’s be honest: even before today’s information explosion, no classification system could ever keep up with the fast pace of change in the economy. No one could have foreseen today’s most important questions, which fall into content categories that did not exist until now.

And with baby boomers retiring at an ever-increasing rate, an urgent question must be asked: How do organizations get the most value from the vast amounts of information and knowledge they have accumulated over decades?

IDC has identified the characteristics of organizations that are able to extract more value out of the information and data available to them. Leading organizations use information access and analysis technologies to facilitate information retrieval, location, discovery, and sharing among their employees and other stakeholders. These insight leaders are characterized by:

  • Strategic use of information extracted from both content and data assets
  • Efficient, unified access to information
  • Effective query capabilities (including dashboards)
  • Effective sharing and reuse of information among employees and other stakeholders
  • Access to subject matter experts and to the accumulated expertise of the organization
  • Effective leverage of relationships between information from different content and data sources

So how can artificial intelligence (AI) and machine learning affect information access and retrieval? The types of questions that are best answered by AI-enabled information access and retrieval tools are those that require input from many different data sources and often don’t have simple yes/no answers. In many cases, these types of questions rely on semantic reasoning, where AI makes connections across an aggregated corpus of data and uses reasoning strategies to surface insights about entities and relationships. This is often done by building a broad-based searchable information index covering structured, unstructured, and semi-structured data across a range of topics (commonly called a knowledge base) and then using a knowledge graph that supports the AI-based reasoning.
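
To make the knowledge base / knowledge graph pairing concrete, here is a toy sketch using the networkx library. The entities and relations are invented, and real systems reason over vastly larger graphs:

```python
# Toy knowledge graph: entities as nodes, typed relationships as edges.
import networkx as nx

kg = nx.DiGraph()
kg.add_edge("Acme Corp", "loan_2041", relation="borrower_of")
kg.add_edge("loan_2041", "LIBOR", relation="indexed_to")
kg.add_edge("Jane Doe", "Acme Corp", relation="account_manager_of")

# Simple multi-hop reasoning: which people manage accounts that hold
# LIBOR-indexed instruments?
for person, company, data in kg.edges(data=True):
    if data["relation"] != "account_manager_of":
        continue
    for _, instrument, d2 in kg.edges(company, data=True):
        if d2["relation"] == "borrower_of" and kg.has_edge(instrument, "LIBOR"):
            print(f"{person} manages an account with a LIBOR-linked loan")
```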

AI-enabled search systems facilitate discovery, use, and informed collaboration during analysis and decision making. These technologies use information curation, machine learning, information retrieval, knowledge graphs, relevancy training, anomaly detection, and numerous other components to help workers answer questions, predict future events, surface unseen relationships and trends, provide recommendations, and take actions to fix issues.

Content analytics, natural language processing, and entity and relationship extraction are key components in dealing with enterprise information. According to IDC’s Global DataSphere model developed in 2018, of the 29 ZB of data created, 88% is unstructured content that needs the aforementioned technologies to understand it and extract value from it. In addition, most of this content is stored in dozens, if not hundreds, of individual silos, so repository connectors and content aggregation capabilities are also highly desired.

AI and machine learning provide actionable insights and can enable intelligent automation and decision making. Key technology and process considerations include:

  • Gleaning insights from unstructured data and helping to “connect the dots” between previously unrelated data points
  • Presenting actionable information in context to surface insights, inform decisions, and elevate productivity with an easy-to-use application
  • Utilizing information handling technologies that can be used in large scale deployments in complex, heterogeneous, and data-sensitive environments
  • Enriching content automatically and at scale
  • Improving relevancy continuously over time, based on user actions driven by machine learning
  • Improving understanding by intelligently analyzing unstructured content

IDC believes that the future for AI-based information access and retrieval systems is very bright, because the use of AI and machine learning coupled with next-generation content analysis technologies enable search systems to empower knowledge workers with the right information at the right time.

The bottom line is this: enabled by machine learning–based automation, there will be a massive change in the way data and content are managed and analyzed to provide advisory services and to support or automate decision making across the enterprise. Using information-driven technologies and processes, the scope of knowledge work, advisory services, and decisions that benefit from automation will expand exponentially, based on intelligent AI-driven systems like those that Sinequa is offering.

For more information on using AI to be an information leader, I invite you to read the IDC Infographic, Become Information Driven, sponsored by Sinequa at https://www.sinequa.com/become-information-driven-sinequa/


Keeping Secrets Secret: How to Industrialize Information Privacy

Banks run on trust. At the core of trust is protecting the privacy of client information. Clients expect it. Regulators require it. Though a challenge for any financial institution, the problem is amplified at complex global banks. Traditional approaches rely on human skill and craft rather than on software, which means the average information privacy process isn’t industrialized and doesn’t provide systematic assurance that it’s working.

Click here to download the solution white paper to learn how one of the world’s top 20 banks addressed this challenge.



How Biopharmaceutical Companies Can Fish Relevant Information From A Sea Of Data

This article originally appeared in Bio-IT World

Content and data in the biopharmaceutical industry are complex and growing at an exponential rate. Terabytes from research and development, testing, lab reports, and patients reside in sources such as databases, emails, scientific publications, and medical records. Information that could be crucial to research can be found in emails, videos, recorded patient interviews, and social media.


Extracting usable information from what’s available represents a tremendous opportunity, but the sheer volume presents a challenge as well. Add to that challenge the size of biopharmaceutical companies, with tens of thousands of R&D experts often distributed around the world, and the plethora of regulations that the industry must adhere to—and it’s difficult to see how anyone could bring all of that content and data together to make sense of it.

Information instrumental to developing the next blockbuster drug might be hidden anywhere, buried in a multitude of silos throughout the organization.

Companies that leverage automation to sift through all their content and data, in all its complexity and volume, to find relevant information have an edge in researching and developing new drugs and conducting clinical trials.

This is simply not a task that can be tackled by humans alone—there is just too much to go through. And common keyword searches are not enough: they won’t tell you that a paper is relevant if the search terms don’t appear in it, or that a video has the answer unless the keywords are in the video’s metadata.

Today, companies can get help from insight engines, which leverage a combination of sophisticated indexing, artificial intelligence, and natural language processing for linguistic and semantic analyses to identify what a text is about, find synonyms, and extract related concepts. Gartner notes that insight engines “enable richer indexes, more complex queries, elaborated relevancy methods, and multiple touchpoints for the delivery of data (for machines) and information (for people).” A proper insight engine does this at speed, across languages, and in all kinds of media.

For biopharmaceuticals, this is particularly powerful, allowing them to correlate and share research in all forms over widely distributed research teams. Here are several ways biopharma companies can use insight engines to accelerate their research.

Find A Network Of Experts

Many companies struggle to create the best teams for new projects because expertise is hidden in large, geographically distributed organizations with multiple divisions. A drug repositioning project might require experts on related drugs, molecules, and their mechanisms of action, as well as medical experts, geneticists, and biochemists. Identifying those experts within a vast organization can be challenging. But insight engines can analyze thousands of documents and other digital artifacts to see who has experience with relevant projects.

The technology can go further, identifying which experts’ work is connected. If they appear together in a document, interact within a forum, or even communicate significantly via email, an insight engine can see that connection and deduce that the work is related. Companies can then create an “expert graph” of people whose work intersects to build future teams.
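
A bare-bones sketch of that “expert graph” idea: count how often pairs of experts co-occur in documents and treat frequently co-occurring pairs as connected. The names, author lists, and threshold are all invented for illustration:

```python
# Sketch: infer connected experts from document co-occurrence.
from itertools import combinations
from collections import Counter

doc_authors = [
    ["Dr. Chen", "Dr. Okafor"],              # joint molecule study
    ["Dr. Chen", "Dr. Okafor", "Dr. Ruiz"],  # repositioning report
    ["Dr. Ruiz"],                            # solo patent filing
]

pair_counts = Counter()
for authors in doc_authors:
    for pair in combinations(sorted(authors), 2):
        pair_counts[pair] += 1

# Pairs that co-occur at least twice are treated as connected experts.
expert_graph = [pair for pair, n in pair_counts.items() if n >= 2]
print(expert_graph)  # [('Dr. Chen', 'Dr. Okafor')]
```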

This technique can extend beyond the borders of the company, helping to identify the most promising collaboration partners outside the company in a given field, based on publicly available data, such as trial reports, patent filings and reports from previous collaboration projects.

Generate R&D News Alerts

Biopharma companies can also use insight engines to watch for new developments in drug research and stay on top of the latest trends. These news alerts can go beyond typical media sources to include scientific publications, clinical trial reports, and patent filings.

This capability can be used on SharePoint, Documentum, or other sources within a large company to surface relevant information. An insight engine ensures the right information gets to the right people in the right context, and in a timely way.

Optimize Clinical Trials

Clinical trials that stretch over many years generate millions of datasets for every drug and study, providing a treasure trove of data. Biostatisticians can ensure they get a comprehensive list of patients with certain diseases across trials of a drug, something nearly impossible with traditional methods.

They can also search and analyze across many drugs and studies, across content and data silos. Over time, this allows biopharmaceutical companies’ growing number of clinical trials to become a valuable asset that can be easily leveraged across a growing number of use cases.

All of these uses can lead to biopharma companies developing new drugs more quickly and getting them to market faster—necessary as these companies face tremendous pressure to innovate quickly and develop new promising drugs as patents for older drugs expire. With insight engines, they can make every part of the journey more efficient, from research, to clinical trials, to regulatory processes, presenting incredible opportunities for everyone in this field.

 


Sinequa Featured in IDC Technology Spotlight Dedicated to Financial Services Organizations

With increased regulatory pressure, data silo proliferation, and cognitive drain on analysts, AI-powered platforms have become a key enabler for extracting insights from data.

Today, we announced that Sinequa is featured in a new IDC Technology Spotlight report: Financial Services Organizations: Extracting Powerful Insights with AI-Powered Platforms. The report, written by Steven D’Alfonso, research director, IDC Financial Insights, and David Schubmehl, research director, Cognitive/AI Systems, highlights the ability of AI-powered platforms to extract insights from data, as well as the need for financial services organizations (FSOs) to improve their capabilities for deriving insights from the data they possess.

According to the report, collecting and maintaining increased amounts of data related to their clients and portfolios can provide major opportunities to improve the customer experience and increase revenue while reducing risk. But at the same time, too much data can be a cognitive drain on analysts and knowledge workers. This increasing need to collect data from multiple applications requires FSO stakeholders to organize and provision their data in ways that allow analysts to extract meaningful insights. AI can help FSOs mature from being data-driven to being information-driven.

“Over the years, Sinequa has continued to expand its footprint within leading financial institutions such as Credit Agricole, DZ Bank, LCL, Navy Federal Credit Union, and U.S. Bank as our platform enables them to tackle the challenges highlighted in this report,” said Scott Parker, director of product marketing at Sinequa. “By offering a broad-based AI-powered platform including search, content analytics, semantic understanding and auto categorization technologies, Sinequa provides relevant insights to users in their work environments, while supporting a range of machine learning algorithms and capabilities to improve findability and relevance, allowing FSOs to access the information they need when they need it.”

With the demand for AI technologies that enable intelligent analytics increasing every year, IDC estimates that “by 2022 spending on AI technologies will grow to over $8 billion, up from $2 billion in 2017.” Sinequa has in the past offered a flexible information collection, access and analysis architecture and now provides cognitive capabilities, such as machine learning, natural language processing, improved relevance and better decision support, while offering intuitive user and data interaction capabilities.

To learn more, click here to sign up for the webinar.

