
Introduction to Natural Language Processing

  • NLP is the field of AI concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.

Basics of data:

  • There are two types of data: structured and unstructured.
    1. Structured data
      • Is data that has a predefined format, for example spreadsheets, tables and databases.
    2. Unstructured data
      • Is data that doesn't follow a predefined format, for example emails, chats, blogs, books, audio files, video files and images.
  • We humans communicate predominantly through unstructured data, specifically natural language, which is why unstructured data is commonly estimated to make up 70-90% of the world's data.
  • This data, like all data, holds power: the power to know your customers better, to increase efficiency and to scale.
  • Businesses are in a race to be the first in their industry to unlock the potential of their unstructured data.
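To make the difference concrete, here is a minimal Python sketch. The customer records and the email text are made-up examples, not data from any real system. A structured record can be queried directly by field name, while the same kind of fact inside an email has to be extracted from free text first:

```python
import re

# Structured data: every field has a predefined name and type,
# so answering a question is a simple lookup (hypothetical records).
orders = [
    {"customer": "Alice", "product": "laptop", "amount": 1200},
    {"customer": "Bob", "product": "phone", "amount": 800},
]
big_orders = [o["customer"] for o in orders if o["amount"] > 1000]

# Unstructured data: a similar fact buried in free text (hypothetical email),
# so it must be extracted before it can be used.
email = "Hi, this is Bob. I bought a phone last week for $800 and it stopped working."

# A naive regular expression pulls out a dollar amount written as "$<digits>".
match = re.search(r"\$(\d+)", email)
amount = int(match.group(1)) if match else None

print(big_orders)  # ['Alice']
print(amount)      # 800
```

The single pattern only works because this example email happens to write the price as `$800`; a phrase like "eight hundred dollars" would slip straight through, which is exactly the kind of gap NLP techniques aim to close.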

There are two parts to NLP:

[Figure: Natural Language Processing - Parts]

  1. Natural Language Understanding (NLU)
    • Refers to mapping the given natural language input into a formal representation and analysing it.
    • If the input is audio, Speech Recognition is applied first to convert it into text. Then the hard part starts: interpreting the meaning of the text, since words can have different meanings depending on context.
    • Once the meaning is extracted, the input is categorised and an appropriate action or response is chosen.
  2. Natural Language Generation (NLG)
    • Is the process of producing meaningful phrases and sentences in natural language from some internal representation.
    • NLG is generally considered much easier than NLU: once we know the meaning that needs to be expressed, we can follow the rules of the language, such as syntax and semantics, to create an appropriate sentence.
    • If the text needs to be converted into audio, as in the case of Siri or Alexa, Speech Generation (text-to-speech) comes in, but this is not always required.
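The two parts can be illustrated with a deliberately tiny Python sketch, assuming a customer-support setting. The `Frame` class, the keyword rules and the response templates are all hypothetical; real NLU systems typically rely on statistical or neural models rather than keyword lists, and speech recognition and speech generation are left out entirely. `understand` maps the input to a formal representation (NLU), and `generate` turns that representation back into a sentence (NLG):

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A toy formal representation: an intent plus any extracted details (slots)."""
    intent: str
    slots: dict = field(default_factory=dict)

# Hypothetical keyword rules; checked in order, first match wins.
INTENT_KEYWORDS = {
    "order_status": {"order", "delivery", "shipped"},
    "refund": {"refund", "return", "money"},
}

# Hypothetical response templates used by the NLG step.
TEMPLATES = {
    "order_status": "Let me check the status of order {order_id}.",
    "refund": "I can help you with a refund.",
    "unknown": "Sorry, I didn't understand that. Could you rephrase?",
}

def understand(text: str) -> Frame:
    """NLU: map natural language input to a Frame."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    intent = "unknown"
    for name, keywords in INTENT_KEYWORDS.items():
        if keywords.intersection(words):
            intent = name
            break
    slots = {}
    for w in words:
        # Treat tokens like "#1234" as order numbers.
        if w.startswith("#") and w[1:].isdigit():
            slots["order_id"] = w[1:]
    return Frame(intent, slots)

def generate(frame: Frame) -> str:
    """NLG: turn a Frame back into a natural language sentence."""
    if frame.intent == "order_status" and "order_id" not in frame.slots:
        return "Could you tell me your order number?"
    # str.format ignores slots that the template does not use.
    return TEMPLATES[frame.intent].format(**frame.slots)

frame = understand("Where is my order #1234?")
print(frame)            # Frame(intent='order_status', slots={'order_id': '1234'})
print(generate(frame))  # Let me check the status of order 1234.
```

Once the meaning is pinned down in the frame, `generate` is little more than filling in a template, while all the ambiguity of the input has to be resolved in `understand`. That asymmetry is why NLG is usually described as the easier half.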

Applications of Natural Language Processing

  • Sentiment analysis: to help you understand how users feel about your company or product.
  • Speech recognition: to convert audio into text for further processing.
  • Chatbots: to provide 24/7/365 customer support.
  • Machine translation: to help you reach new markets with minimal investment.
  • Autocompleting text: similar to what Gmail does, to increase your employees' efficiency by drawing on a centralized knowledge base, or to improve how your customers interact with your product or website.
  • Spell checking: to make sure that documents don't contain errors, which is especially relevant in industries with high compliance requirements such as banking, insurance and finance.
  • Keyword search: to locate relevant information faster across all data sources, leading to increased efficiency.
  • Advertisement matching: for example, when you search for a new car on Google or write an email about it to a friend, you may later see car adverts.
  • Information extraction: to extract the essence from large volumes of unstructured data.
  • Spam detection: to keep your email inbox clean.
  • Text generation: to create new text, e.g. client contracts, research documents, training materials and so on.
  • Automatic summarization: to add summaries to existing documents.
  • Question answering: to answer specific questions based on disparate information sources.
  • Image captioning: to annotate images.
  • Video captioning: videos can take a long time to watch, so it is sometimes faster to extract their information as text and then process it further using other NLP techniques such as summarization.
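As an example of the first application, here is a minimal lexicon-based sentiment analysis sketch in Python. The word list and scores are made up for illustration; practical systems use much larger lexicons or trained models, and this version ignores sarcasm, contractions like "isn't" and most other subtleties of real language:

```python
# Hypothetical, tiny sentiment lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {
    "great": 1, "love": 1, "excellent": 1, "fast": 1,
    "bad": -1, "slow": -1, "broken": -1, "terrible": -1,
}
NEGATIONS = {"not", "never", "no"}

def sentiment(text: str) -> str:
    """Classify text as positive, negative or neutral by summing word scores."""
    score = 0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        if word in NEGATIONS:
            # Flip the polarity of the word immediately after a negation.
            negate = True
            continue
        value = LEXICON.get(word, 0)
        score += -value if negate else value
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this product, it is excellent!"))  # positive
print(sentiment("The app is slow and often broken."))      # negative
print(sentiment("The delivery was not fast."))             # negative
print(sentiment("It arrived on Tuesday."))                 # neutral
```

Even this toy version shows why context matters: without the negation rule, "not fast" would be scored as positive.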
This post is licensed under CC BY 4.0 by the author.