Saturday, 9 December 2023

Whales, Generative AI and Enterprise use cases

 



Whales are symbols of vastness, communication and wisdom across the world, and their power has inspired numerous stories, Moby Dick among them. The chase for the elusive 'white whale' has now shifted to the digital ocean, into the world of Generative AI.

This article uses the whale analogy to show how language models are evolving, and why smaller, focused models (such as Orca) should be the focus area for enterprise AI and analytics.

Why Whales and Language Models

Naming language models after whales is not just a coincidence; the metaphor correlates strongly with how the models behave.
Whale-Beluga-Orca
The whale-beluga-orca metaphor highlights the dynamic relationship between large and small language models. Large models, a.k.a. foundational models, provide vast knowledge with sizable processing power, hence the term whale (blue whale). Small models leverage foundational models to become specific and agile for one or more sets of tasks. The metaphor even extends to 'Free Willy' models, built around the idea of smaller models breaking free from the constraints of LLMs, while Microsoft Research's Orca and Orca 2 are built for specific usage with an emphasis on research.

Taxonomy: Whales and Models

An illustrative comparison of why a model shares its name with a whale.

Blue Whale: The largest animal on Earth, known for its deep, haunting vocalizations. Potential LLM association: OpenAI's GPT or AI at Meta's Llama 2, due to their vast size and impressive text generation capabilities.

Fin Whale: Another large whale, known for its speed. Potential LLM association: NVIDIA AI's Megatron-Turing NLG, a massive model boasting performance across various tasks.

Beluga: The 'sea canary', vocal and playful (and able to swim backward). Potential model association: Stability AI's Free Willy 2 and Beluga 1 & 2, for their focus on reasoning and complex question answering, mimicking the intricacies of the beluga.

Orca (Killer Whale): Intelligent and social, known for advanced hunting techniques and vocalizations. Potential model association: Microsoft's Orca models, which are specifically designed to analyze and understand complex behaviors and interactions.

Enterprise Use cases: LLM and SLM

A use case comparison will help illustrate the broad-stroke and the fine-tuned applications of language models; do note that with the advancement of industry-specific GenAI models, there will be further classification within SLMs.

Conclusion:

With multiple SLM and LLM options, it is important to know the use case. Mistral AI's Mistral 7B and Microsoft's Orca 2 can be a good starting point for an enterprise to embark on a 'data-driven AI journey'.
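As a quick, hedged illustration of how an enterprise team might try such an SLM locally, the sketch below loads one of these models with the Hugging Face transformers library; the model IDs, prompt and generation settings are assumptions to be verified against the respective model cards.

```python
# Minimal sketch: running a small language model (SLM) locally for a
# focused enterprise task, using the Hugging Face transformers library.
# Model IDs, prompt and settings are illustrative, not prescriptive.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "microsoft/Orca-2-7b"  # or "mistralai/Mistral-7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,   # half precision to fit on a single GPU
    device_map="auto",           # requires the accelerate package
)

# A narrow, well-scoped task is where SLMs shine compared to frontier LLMs.
prompt = "Summarise the key risks in the following contract clause:\n..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```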


Friday, 8 September 2023

Generative AI Platform for your organization


ChatGPT Enterprise or Custom Platform

Less than two weeks ago OpenAI announced ChatGPT Enterprise, enterprise-grade access to GPT-4 with a larger context window and higher input processing capabilities. Presented here is a point of view (PoV) on whether you should choose ChatGPT Enterprise or a custom platform for your Generative AI needs.

Everyone needs a Generative AI Platform, and they need it now

Generative AI is rapidly consuming the space (and time) owned by business analysts for enterprise decisions. Today, a top priority for CTOs and CDOs is to have data and digital decisions 'validated' against output from Generative AI, whether via GenAI solutions like ChatGPT, Meta's LLMs or custom LLM models trained by institutes and organizations. Here are five reasons why a comprehensive Generative AI platform is a must-have.

  1. Enterprise Productivity: Generative AI usage boosts enterprise productivity across a range of tasks, from ad-hoc chats to automated code and image generation. A platform with the right governance ingredients is essential for an organization to meet demand from all units.
  2. Data Security: Like AI, Generative AI's data lineage is still evolving; this means a strong data security implementation is required to protect people, the brand and the organization.
  3. Cost: Every CTO/CDO has experience managing multiple SaaS costs and would love to do things differently, especially around OpEx; any day, a platform is a better and more cost-effective option than investments in multiple solutions.
  4. Vendor Lock-In: The rate of change in the Generative AI landscape is possibly 100 times faster than the cloud hype curve; with change every day, a platform is essential to cater for the rapid delta. Organizations should avoid vendor lock-in, and a Generative AI platform is the only way to address this challenge.
  5. Data Integrations: 'Data-centric AI' is expected to reach the peak of the Gartner hype curve in the next year or so; this means most organizations will have their data platforms integrated with their Generative AI platform.


ChatGPT Enterprise


With the ChatGPT Enterprise offering available, let's compare its features with those of a custom option.

Security:

  • Customer prompts and company data are not used for training OpenAI models.
  • Data encryption at rest (AES 256) and in transit (TLS 1.2+)
  • Certified SOC 2 compliant

Deployments:

  • Admin console with bulk member management
  • SSO
  • Domain verification

Collaboration and Analytics

  • Analytics dashboard for usage insights
  • Shareable chat templates for your company to collaborate and build common workflows
  • Unlimited access to advanced data analysis

LLM

  • Higher-speed performance for GPT-4 (up to 2x faster)
  • Unlimited access to GPT-4
  • 32k token context windows for 4x longer inputs
  • Free credits for custom solutions
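For comparison, a custom platform would typically reach the same models through the API. A minimal sketch using the OpenAI Python SDK (pre-1.0 style) is shown below; the model name "gpt-4-32k" and 32k-context availability depend on account access and should be treated as assumptions.

```python
# Minimal sketch: calling GPT-4 with a long context from a custom platform,
# using the pre-1.0 OpenAI Python SDK. Model availability (gpt-4-32k) and
# the document variable are illustrative assumptions.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

long_document = "..."  # e.g. a contract or report well beyond 8k tokens

response = openai.ChatCompletion.create(
    model="gpt-4-32k",  # 32k context window, subject to account access
    messages=[
        {"role": "system", "content": "You are an enterprise analyst assistant."},
        {"role": "user", "content": f"Summarise the key decisions in:\n{long_document}"},
    ],
    temperature=0.2,
)
print(response["choices"][0]["message"]["content"])
```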


Custom Generative AI Platform


In House Platform:

  • Your platform, your ideas; it is in your control (a minimal gateway sketch follows this list)

Your data in control

  • Keep your data under your control; PII or non-PII, you have the control

For All or for a few

  • Choose who gets platform access; you manage it

Get reports when you need

  • Reporting and infographics whenever you need them

Research and Labs

  • Create custom views for research members and set up labs

Integrate with systems

  • Integrate platform events and data with existing systems

Auditing and logging

  • Merge logs and audits with enterprise auditing and logging

Change and Deploy

  • Change vendor, broadcast and deploy when you want.
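To make the custom option concrete, here is a minimal, illustrative gateway sketch (FastAPI): one internal endpoint that enforces team-level access, writes an audit log and hides the backing vendor behind a single function. Endpoint, header and function names are assumptions, not a product API.

```python
# Minimal sketch of an in-house Generative AI gateway: access control,
# audit logging and a pluggable backing model. All names are illustrative.
import logging
from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel

app = FastAPI()
audit = logging.getLogger("genai.audit")          # merge with enterprise logging
logging.basicConfig(level=logging.INFO)

ALLOWED_TEAMS = {"research", "labs", "analytics"}  # "for all or for a few"

class PromptRequest(BaseModel):
    team: str
    prompt: str

def call_backing_model(prompt: str) -> str:
    # Swap vendors here (hosted LLM, self-hosted SLM) without changing
    # the interface exposed to the rest of the organisation.
    return "stubbed response"

@app.post("/generate")
def generate(req: PromptRequest, x_api_key: str = Header(...)):
    # Header presence is enforced; key validation is omitted for brevity.
    if req.team not in ALLOWED_TEAMS:
        raise HTTPException(status_code=403, detail="team not authorised")
    audit.info("team=%s prompt_chars=%d", req.team, len(req.prompt))
    return {"completion": call_backing_model(req.prompt)}
```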


Generative AI Platform Selection Process

Saturday, 6 May 2023

Metaverse and Generative AI

In 2019, the metaverse was still a relatively niche concept, mainly discussed among gaming and technology enthusiasts. Since then, major companies such as Facebook, Microsoft, and Nvidia have announced significant investments in metaverse development, which has helped increase its visibility and legitimacy in the mainstream.

Metaverse has seen a resurgence in hype and interest in recent years, especially since the beginning of 2021. While the concept has been around for several decades, it has gained renewed attention due to advancements in technology, such as virtual and augmented reality, as well as the COVID-19 pandemic, which has driven the demand for virtual experiences.

The recent hype surrounding non-fungible tokens (NFTs), which are digital assets that can be bought, sold, and traded like physical assets, has also contributed to renewed interest in the metaverse. NFTs are seen as a potential solution for creating digital scarcity and ownership in virtual worlds, which is a key aspect of the metaverse concept.

Overall, while the metaverse may have experienced ups and downs in its hype cycle over the years, Generative AI is now helping push renewed interest and investment in the metaverse, with many companies exploring its potential for a variety of applications beyond gaming.


Saturday, 27 June 2020

Data Lake : Swamp and DataOps

The Data Lake is growing in popularity in consumer and enterprise data strategy, as it offers a wide variety of ingestion, conformance, analytical, and visualization capabilities.

As interest in and adoption of the Data Lake grow across multiple sectors, best practices, potential pitfalls, and operationalization techniques are blended into solutions and products. More often than not, best practices are followed diligently during the initial days and lose focus in the post-implementation phases.


Although a decade-old term now, "Data Lake" can be quick-referenced against the ecology of a freshwater lake, where water (data) is collected from sources ranging from small streams (e.g. batch logs, weblogs) to large rivers (e.g. unstructured data, images, videos).

Like the littoral zone in any freshwater lake, a data lake has staging zone(s) where specific types of data are analyzed, filtered, and consumed. The photic zone is within eyesight and can host batch and ETL/ELT processes within the data lake infrastructure. The aphotic zone stores archived content and has a possible use case in large-scale data mining.


Unlike a Data Mart or a Database, the storage of a Data Lake is usually a network file-store with compute capability for transformation and visualization of data. A Data Lake with mature infrastructure can offer multiple tiers of storage, from NAND/SSD (fast IO use cases) to tapes/vaults (archives), and compute ranging from low-intensity (IoT) to large-scale GPU farms for streaming analytics (multi-stream computer vision use cases).
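On a cloud object store this tiering is usually expressed as lifecycle rules. A minimal sketch with boto3 follows, assuming an S3-backed lake; the bucket name, prefix and day thresholds are illustrative.

```python
# Minimal sketch: expressing data lake storage tiers as S3 lifecycle rules
# (hot data stays in standard storage, cold/archived data moves to Glacier).
# Bucket name, prefix and day thresholds are illustrative assumptions.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-aphotic-zone",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},        # archives
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},  # long-term vault
                ],
            }
        ]
    },
)
```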


While there is general awareness of data lake implementation and operational procedures, the data office and the CDO often come across challenges after the successful implementation of data transformation programs.


Data Swamp:
Image: https://www.dreamstime.com
Data Lake architecture is one of the strongest solutions for offering accessibility and democratization of data; the biggest hurdle to this vision is the 'Data Swamp'.

A data lake becomes a data swamp when data is accumulated and stored without categorization and there is no process to identify and clear the congestion within the lake. Data swamps eliminate the democratization and accessibility aspects of the CDO vision, and this is the biggest challenge for data office and technology teams.

Often, data swamps are the result of inadequate coordination between data governance and technology implementation teams. As observed, many organizations do not have a clear roadmap, or even a vision, of how data will be consumed internally and externally.
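One practical guard is to refuse data into the lake unless it arrives with minimum metadata, ownership and classification. A small illustrative sketch follows; the field names are assumptions rather than any standard.

```python
# Illustrative sketch: enforce minimum metadata before a dataset is admitted
# to the lake, so nothing is stored without categorisation or ownership.
# Field names are illustrative, not a standard.
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    dataset: str
    owner: str            # accountable data owner
    source_system: str    # where the data came from
    classification: str   # e.g. "PII", "internal", "public"
    retention_days: int   # how long before archive/purge

def admit_to_lake(entry: CatalogEntry) -> bool:
    """Reject datasets that would otherwise silt up the lake."""
    if not entry.owner or not entry.classification:
        raise ValueError(f"{entry.dataset}: missing owner or classification")
    return True

admit_to_lake(CatalogEntry("weblogs_2020_06", "data-office", "nginx", "internal", 180))
```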


DataOps:

Effective utilization of data with appropriate controls needs an organizational vision and roadmap. Technology and Operations have TechOps ways of working, while the Data Group and Operations define Data Governance.

"A data platform built on agreed set of principal and roadmap that helps transform information to actionable insights for organization."

 
The essential components of a DataOps strategy, and the contributing teams, can be as follows (a small data quality sketch follows the table).

Component | Team
Source Management | Tech
Infrastructure as Code | Tech
Access, Monitoring and Control | Data & Tech
Continuous Integration and Delivery | Tech
Machine Learning and AI Development | Data & Tech
MLOps and Deployment Strategy | Data & Tech
Data Quality and Validation Framework | Data & Ops
Workflow Management | Data & Ops
Data Modeling | Data & Ops
Business Continuity | Ops & Tech
 
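As an illustration of the "Data Quality and Validation Framework" component, the sketch below shows a minimal quality gate in pandas; the column names, thresholds and input path are assumptions.

```python
# Minimal sketch of a data quality gate, one building block of the
# "Data Quality and Validation Framework" component of DataOps.
# Column names, thresholds and the input path are illustrative.
import pandas as pd

REQUIRED_COLUMNS = {"customer_id", "event_time", "amount"}

def validate(df: pd.DataFrame) -> list:
    issues = []
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")
    if df.empty:
        issues.append("no rows ingested")
    elif "customer_id" in df.columns and df["customer_id"].isna().mean() > 0.01:
        issues.append("more than 1% null customer_id values")
    return issues

df = pd.read_parquet("s3://example-data-lake/curated/payments/")  # needs s3fs
problems = validate(df)
if problems:
    raise RuntimeError("; ".join(problems))  # fail the pipeline run and alert the data office
```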

A mature cloud implementation often combines DataOps as TechOps + MLOps with a view on data availability and actionable insights; a reference implementation of a DataOps platform (AWS) can be illustrated as below.
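As a thin, illustrative slice of such a platform (not the full reference architecture), the boto3 sketch below refreshes the Glue catalog after new files land in the lake and runs a check query through Athena; the crawler, database, table and bucket names are assumptions.

```python
# Sketch of one slice of an AWS DataOps flow: crawl newly landed files into
# the catalog, then run a validation query through Athena. Crawler, database,
# table and bucket names are illustrative assumptions.
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

# 1. Refresh the catalog after new data lands in the lake.
glue.start_crawler(Name="example-lake-crawler")

# 2. Query the curated zone; results feed dashboards and ML features.
athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM payments WHERE event_date = current_date",
    QueryExecutionContext={"Database": "example_curated"},
    ResultConfiguration={"OutputLocation": "s3://example-data-lake/athena-results/"},
)
```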

Advancements in ML/AI implementation, backed by cheap storage and packaged products, are pushing more organizations to move from operations-driven to data-driven business strategies; on this roadmap, DataOps is certainly one of the most significant milestones to implement and practice.