Data Visualization Using LLMs with LIDA from Microsoft
With recent advances in large language models (LLMs), several LLM-based tools capable of efficient data visualization are now at our disposal. In this article, we discuss how LIDA could shape the future of data visualization, powering business intelligence for dynamic decision making.
What is LIDA
LIDA is a tool from Microsoft that lets users automatically generate grammar-agnostic visualizations and infographics using LLMs. Grammar-agnostic means it is not tied to a single plotting library or visualization grammar.
[Figure: LIDA architecture]
How does LIDA work
The LIDA architecture has four components.
- Summarizer: This module creates a natural language summary of the data from properties such as column names, data types, and statistics (counts, mean, median, mode, etc.). It works in two stages and guards against LLM hallucinations by grounding the summary in the actual data. The goal of the summarizer is to produce an information-dense but compact summary of a given dataset that is useful as grounding context for visualization tasks. A useful context is one that contains the information an analyst would need to understand the dataset and the tasks that can be performed on it.
- Goal explorer: Given the summary produced by the summarizer, this module uses an LLM to generate multiple questions or hypotheses about the data, each with an appropriate visualization and a rationale for it. The questions can be left entirely to the LLM, guided by assigning the LLM a persona, or supplied by the user as custom questions.
- Viz Generator: The Viz Generator module has three parts: a code scaffold constructor, a code generator, and a code executor. The scaffold constructor sets up the imports for the chosen programming language and visualization library, for example, Python and seaborn. The code generator takes a scaffold, a data summary, and a visualization goal and generates n candidate visualizations for each goal. The code executor runs the candidates and applies several filter mechanisms to keep only the code that compiles and executes correctly.
The Viz Generator also supports refining the generated visualizations through natural language instructions. In addition, it uses an LLM to review the visualization code along multiple dimensions and to recommend alternate visualizations.
- Infographics: This module generates stylized graphics from the output of the Viz Generator module. It implements a library of visual styles, described in natural language, that are applied directly to visualization images. The styles are applied using the text-conditioned image-to-image generation capabilities of diffusion models.
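To make the Summarizer's grounding idea more concrete, here is a minimal sketch of a rule-based base summary computed directly from the data, of the kind an LLM could later enrich with descriptions. This is an illustration only and not LIDA's actual implementation: the function names `summarize_column` and `summarize_dataset`, the field names in the output, and the sample rows are all assumptions made for this example.

```python
import json
import statistics
from collections import Counter


def summarize_column(name, values):
    """Build a compact summary for one column (hypothetical helper)."""
    non_null = [v for v in values if v is not None]
    summary = {
        "column": name,
        "count": len(non_null),
        "num_unique": len(set(non_null)),
    }
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if is_numeric:
        # Numeric columns get range and central-tendency statistics.
        summary.update({
            "dtype": "number",
            "min": min(non_null),
            "max": max(non_null),
            "mean": statistics.mean(non_null),
            "median": statistics.median(non_null),
        })
    else:
        # Other columns get the most frequent value and a few samples.
        mode = Counter(non_null).most_common(1)[0][0] if non_null else None
        summary.update({
            "dtype": "string",
            "mode": mode,
            "samples": sorted(set(map(str, non_null)))[:3],
        })
    return summary


def summarize_dataset(name, rows):
    """Summarize every column of a list-of-dicts dataset (hypothetical helper)."""
    columns = list(rows[0].keys()) if rows else []
    fields = [summarize_column(c, [r.get(c) for r in rows]) for c in columns]
    return {"name": name, "fields": fields}


# Made-up sample data, used only to exercise the sketch.
rows = [
    {"region": "East", "sales": 120},
    {"region": "West", "sales": 80},
    {"region": "East", "sales": 100},
]
base_summary = summarize_dataset("sales", rows)
print(json.dumps(base_summary, indent=2))
```

Because every value in this summary is computed from the data rather than generated, it can serve as the ground truth that later LLM prompts are anchored to.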
LIDA Simplified
A dataset taken as input flows through LIDA as follows:
- A natural language description and summary of the data is created
- Custom questions or LLM-generated questions are used to create visualization goals
- Code-generation LLMs produce the visualization code, which is executed to render the charts
- Image generation models stylize the visualizations into infographics
- Natural language input is taken for any refinements or alternate visualizations the user wants
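The code generation step in the flow above produces several candidates, and only those that compile and run are kept. The sketch below illustrates that filtering idea under simplifying assumptions: the candidates are hard-coded strings standing in for LLM output, each is expected to define a `plot(data)` function, and the returned "chart" is a plain dictionary rather than a real plot so the example has no dependencies. The helper name `run_candidate` is hypothetical and is not part of LIDA.

```python
def run_candidate(source, data):
    """Return the candidate's chart, or None if it fails to compile or run.

    Note: exec() on untrusted code is unsafe. A real system would need
    to sandbox this step.
    """
    try:
        compiled = compile(source, "<candidate>", "exec")
        namespace = {}
        exec(compiled, namespace)
        return namespace["plot"](data)
    except Exception:
        # Syntax errors, missing columns, missing plot(), and so on.
        return None


# Made-up data and candidates, standing in for LLM-generated code.
data = {"region": ["East", "West"], "sales": [220, 80]}

candidates = [
    # Fails to compile: unclosed brace.
    "def plot(data):\n    return {'type': 'bar'",
    # Compiles but fails at run time: column does not exist.
    "def plot(data):\n    return {'type': 'bar', 'x': data['missing']}",
    # Compiles and runs.
    "def plot(data):\n"
    "    return {'type': 'bar', 'x': data['region'], 'y': data['sales']}",
]

charts = [run_candidate(src, data) for src in candidates]
valid_charts = [chart for chart in charts if chart is not None]
print(f"{len(valid_charts)} of {len(candidates)} candidates survived")
```

Generating several candidates and discarding the failures trades extra LLM calls for a higher chance that at least one usable visualization comes back.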
LIDA has the potential to reshape data visualization by combining natural language processing with visual analytics. As LLMs continue to evolve, their integration into visualization tools enables a more intuitive, conversational approach to data interaction. Imagine creating and modifying visualizations by simply speaking or typing commands such as “Show me the sales growth over the last three years, segmented by region” or “Filter this chart to highlight only the top 10 performing products.” This lowers the steep learning curve associated with traditional BI tools, making data analysis more accessible to non-technical users.
When combined with robust infrastructures like data lakes or data warehouses, LIDA-powered visualizations can tap into vast and diverse datasets in real time, offering a higher degree of flexibility and depth. This capability allows organizations to generate dynamic insights quickly, enabling data-driven decisions at an unprecedented scale.
The synergy between LIDA, LLMs, and advanced data storage systems holds the potential to transform organizational performance profoundly. By automating complex data workflows and uncovering actionable insights effortlessly, businesses can significantly reduce operational costs, optimize resources, and uncover new revenue opportunities. From streamlining supply chains to personalizing customer experiences, the applications are virtually limitless.
As these technologies continue to mature, the future of data visualization is not just about creating static charts but building interactive, responsive, and predictive environments that empower organizations to thrive in a data-driven world.