How To Prepare Your Data For GenAI
- Nishadil
- January 16, 2024

Chief Product Officer at Nasuni.
The recent events at OpenAI were extreme, but we should expect a little chaos as increasingly advanced AI models are deployed in the wild. There will be more board-level debates and disagreements. Technical leaders will have conflicting visions of how to deploy these models, and we are already starting to see external forces gathering strength.
There will be more and more pressure from a governance perspective as regulatory groups and independent agencies begin to shape and implement controls. Yet this chaos will not continue indefinitely. Generative AI (GenAI) services are popping up all over the place. While no one can say with certainty which particular AI tools will emerge as the ideal options for the enterprise, we do know that all of these solutions need data.
So, the question every business leader should be asking is: What can you do now to prepare your data for GenAI? On that front, I recommend implementing three fundamental strategies:
1. Assess.
The first step is to assess the state and spread of your organization’s data. Think of this project as a kind of census.
You need to figure out how much data you have, its fundamental qualities and characteristics and even where it lives, from the geographic location to the technical storage systems. There are plenty of enterprise tools out there to help with these sorts of assessments. They will map out your data types, locations and relevance.
You’ll want to figure out how frequently different datasets have been accessed, and whether they are critical to your organization’s current work or completely outdated. This is not as simple as deleting or cleaning out older data, however, as decades-old folders could contain hugely valuable project data that could spark insights, shape your current work or help refine your strategy.
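As a rough illustration of the census idea, the sketch below walks a directory tree and tallies file counts, total bytes and "stale" files per file extension. It is a minimal example, not a substitute for an enterprise assessment tool: the function name `census`, the `stale_after_days` threshold and the choice to group by extension are all assumptions made for illustration.

```python
import os
import time
from collections import defaultdict
from pathlib import Path


def census(root, stale_after_days=365, now=None):
    """Summarize files under `root` by extension (hypothetical helper).

    Returns {extension: {"files": int, "bytes": int, "stale": int}}.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400
    summary = defaultdict(lambda: {"files": 0, "bytes": 0, "stale": 0})

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # unreadable or vanished file: skip it

            ext = path.suffix.lower() or "(none)"
            row = summary[ext]
            row["files"] += 1
            row["bytes"] += info.st_size
            # Access times are often disabled or coarse on real file
            # systems, so treat a file as stale only if both its access
            # time and its modification time are older than the cutoff.
            if max(info.st_atime, info.st_mtime) < cutoff:
                row["stale"] += 1

    return dict(summary)
```

A stale count is only a starting point for review. As noted above, old folders can still hold valuable project data, so the output should inform decisions rather than drive automatic deletion.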
This assessment will also help you understand which datasets have specific residency or sovereignty requirements, which is an important aspect of leveraging AI tools.
2. Consolidate.
What you will likely find during the assessment is that much of your unstructured file data is still anchored to specific locations.
These could be offices, data centers, factories, manufacturing plants or creative design centers. The nature of the operations at each location will vary depending on your business, but they all create the same overarching problem. If your entire corpus of data is distributed across multiple sites and stuck in these silos, your organization will not be able to get the most out of AI.
Again, AI services need data, and that data needs to be in one place if they are going to do their best work. So the next step will be consolidating all this data, likely in the cloud, the only storage medium that offers the effortless scale and cost structure to make global consolidation possible. Yet scale is not the only reason to consolidate in the cloud.
The initial AI services from Amazon Web Services (AWS), Google Cloud and Microsoft Azure are all cloud-native tools. They run in the cloud, so you want your data to be in the cloud as well. Consolidating your data in the cloud makes it more usable from the perspective of the AI solution. You will be able to generate more relevant insights because the tools will analyze all of your data, not just a specific subset at a specific location.
Imagine you are operating a large marketing agency, and one of your creative teams wants to pitch a new campaign for a beverage giant. As part of the ideation phase, they might want to leverage an AI tool to generate a new campaign from past campaigns for similar companies. If the associated files from all offices around the world are consolidated in one cloud volume, then one of your creatives could quickly find inspiration from previous work.
Otherwise, those ideas might as well be locked away in a forgotten file cabinet.
3. Secure.
The process of assessing and consolidating your data is a perfect opportunity to strengthen and tighten data security. One of the basic tenets of data security is that you cannot rely on people to do the right thing.
In the age of ransomware, enterprises must rely on built-in security, and this is your chance to switch to such a solution if you have not done so already. You want to make sure you have controls around your datasets and who can leverage them. The people using AI services must have the right and the authority to see and leverage that data in the first place.
Role-based access control is certainly not a new priority in the enterprise data security world, but it is especially important in the age of GenAI. You do not want anyone leaking sensitive data into a large language model, whether they are doing so maliciously or inadvertently. During the assessment phase, you will need to map out whether there are compliance exceptions as well, so that when you apply your security rules, you know if certain datasets have sensitive information.
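To make the access-control point concrete, here is a minimal sketch of a role check applied before any dataset is handed to an AI service. Everything in it is hypothetical: the `Dataset` record, the `datasets_for_ai` function and the role names are illustrative assumptions, and a real deployment would rely on the access controls built into the storage platform rather than application code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical catalog entry produced during the assessment phase."""
    name: str
    allowed_roles: frozenset
    sensitive: bool = False  # flagged as containing regulated data


def datasets_for_ai(user_roles, datasets, allow_sensitive=False):
    """Return only the datasets this user may pass to an AI service."""
    roles = set(user_roles)
    permitted = []
    for ds in datasets:
        if not roles & ds.allowed_roles:
            continue  # user holds none of the roles granted on this dataset
        if ds.sensitive and not allow_sensitive:
            continue  # sensitive data is excluded unless explicitly allowed
        permitted.append(ds)
    return permitted
```

The design choice worth noting is that sensitive datasets are denied by default: a user with the right role still has to opt in explicitly, which guards against inadvertent leaks as well as malicious ones.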
If you feed an AI service a collection of files that contain Social Security or credit card numbers, then you will be liable if anything is generated using that data. Your organization is still the steward of that data, and it would be your fault for not having the right built-in security tools and controls in place to identify and protect that sensitive information.
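One simple way to picture this kind of identification is a pattern scan run over text before it reaches an AI service. The sketch below is an assumption-laden illustration, not a complete detector: it looks only for U.S. Social Security numbers written as `NNN-NN-NNNN` and for 13 to 19 digit card numbers that pass the Luhn checksum, and the function names `luhn_valid` and `scan_text` are made up for this example.

```python
import re

# Illustrative patterns only; real data takes many more formats.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def scan_text(text):
    """Count likely SSNs and card numbers in `text` (hypothetical helper)."""
    ssn_count = len(SSN_PATTERN.findall(text))
    card_count = 0
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):  # the checksum filters out most random digit runs
            card_count += 1
    return {"ssn": ssn_count, "card": card_count}
```

A file with a nonzero count could be quarantined or routed for review instead of being passed along. Pattern matching of this kind produces both false positives and misses, so it complements the built-in controls described above and does not replace them.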
Conclusion
These three steps are a good framework for preparing your enterprise in the months ahead. There is a great deal more to do, but the current chaos and AI madness will be sorted out. The technology is too powerful, the demand too great. These tools will be transformative for the enterprise, but if you hope to leverage them effectively, you must take the necessary steps to prepare your data today.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.
Disclaimer: This article was generated in part using artificial intelligence and may contain errors or omissions. The content is provided for informational purposes only and does not constitute professional advice. We make no representations or warranties regarding its accuracy, completeness, or reliability. Readers are advised to verify the information independently before relying on