“Data is the new source code,” as Ken Grohe, President & Chief Revenue Officer (CRO) at WekaIO, puts it in his trademark succinct manner, a sentiment he shared with me during a recent conversation on my podcast.
The true potential of your business is often hidden in the genetic code of its ability to harness data. The real question is whether your current infrastructure is up to the challenge of processing data at ever-increasing efficiency to drive business insights that can power everything from innovations to market opportunities, improve Return-on-Investment (ROI) and shorten Go-to-Market (GTM) cycles.
For many of us, unfortunately, personal experience with high volume data computing automatically harks back to lingering muscle memories of frustrating latency issues, version control and overall poor computational performance. But the good news is that handling limitless data at higher capacities does not automatically equate to poor performance.
Weka File System
The answer to high-capacity, high-performance computing for modern enterprise workloads under extreme conditions is pretty simple – a robust file system solution. A modern parallel file system is essential to get an ROI out of any data-intensive workload at massive scale, such as artificial intelligence (AI) initiatives. Weka’s “Limitless Data Platform”, built on the Weka File System (WekaFS), is a software-defined architecture which allows organisations to remove hard limits on capacity scaling, and drive industry-leading performance and efficiencies in hyperscale storage.
In my mind, data is fast turning into an asset to be placed on balance sheets, one that should have key performance indicators (KPIs) placed on it and be managed like any other mission-critical asset (such as human resources, plant & equipment, cash & liquid “assets” etc.). To that point, a data-performance enabler like a software-defined file system can provide you with the competitive edge you need to seamlessly consolidate all your workloads across data silos and drive high performance.
As one of the leading solutions in this market segment, the Weka File System (WekaFS) addresses this by leveraging best-in-class networking technologies like NVMe-oF, NVIDIA Mellanox InfiniBand and 100Gb Ethernet, as well as advanced computing technologies like GPU acceleration.
According to Ken, the WekaIO Limitless Data Platform is an agile, flexible, safe, secure solution and can deliver up to 75% reduction in storage cost per unit, as seen in the case study with Genomics England’s massive gene sequencing project to sequence 5 million genomes by 2023.
With all of that in mind, I thought I would circle back and highlight for you my top three takeaways from my conversation with Ken on our recent podcast and how they relate to these very issues from a business, technology and operational point of view.
AI and Big Data initiatives are going to be the drivers for big business results in the next decade. WekaIO enables successful AI and Big Data outcomes at a fraction of the storage and maintenance costs. For instance, Ken mentioned that it takes only half the staff to run WekaIO as it does to run any NAS, SAN or cloud-based solution. From my perspective, however, WekaIO’s enterprise-ready and hybrid-cloud features enable it to transcend AI and Big Data use cases, and lend themselves to backup and DR strategies, IT agility, and lower CapEx.
There is naturally an expectation that WekaFS is a pathway to high performance computing on the public cloud, but the question that arises is: “How do companies go about transitioning from existing NAS or SAN solutions, and what benefits can they expect to see in the short and long term?”
Ken and I dug into this during our recent podcast conversation, but it struck me after we had recorded the show that we would certainly get enquiries from our listeners for more information, and indeed we did. So I reached out to Ken and asked him to expand on this, and here is what he had to say:
“We have found that legacy systems tend to be inefficient and ineffective in supporting today’s demanding data-intensive applications, often forcing customers to make compromises on simplicity, speed, or scale.
Legacy storage technologies tended to be purpose-built to solve different problems, but when implemented as a system, produced islands, or silos of storage. This, in turn rendered the data invisible to the applications which needed it, resulting in a big problem for data centers today.
To address this, Weka’s Limitless Data Platform has been designed to offer a cost-efficient storage system that combines simplicity, speed, and scale – shattering the storage limitations which continue to constrain enterprises from achieving better business outcomes.”, Ken Grohe
Perhaps the greatest technical differentiator I can see with the WekaIO solution, beyond obvious advantages in scalability and flexibility, is its ability to ‘snapshot’ data instantly while allowing for updates, patches, testing or training to run without impacting the active ‘production’ copy of the file system.
Being a skeptic, I couldn’t help but wonder what would happen if the system came under stress, say from the likes of either a cyberattack or extremely high I/O workloads. So, naturally, I again reached out to Ken and asked him if he could give us further insights, beyond what we discussed on the podcast, into the key technical advantages the Weka file system (FS) offered in this context, and his reply was enlightening:
“The WekaIO Limitless Data Platform does not rely on traditional back-up and/or disaster recovery scenarios which can often be compromised by cyberattacks or extreme high I/O workloads, as you mentioned Dez. We work with them, but incrementally we embrace a ‘snap to object’ or ‘snap to cloud’ approach, which bypasses those antiquated dependencies, and adds durability.
As a result, the desired workflow can be achieved instantaneously by leveraging cloud, object, or on-premise NVMe-tier storage, in scenarios such as cyberattacks or performance / unavailability, within the safety and dependability of a unified namespace.”, Ken Grohe
One of the biggest purported benefits of the WekaIO solution has been its ability to integrate seamlessly across on-premise, third-party and cloud storage locations, from potentially small beginnings of tens of terabytes to multi-petabyte scale, and to do so more cost-effectively and with more efficient use of infrastructure and human resources.
Given the difference in speeds inherent to each architecture type, I wondered if there might be potential performance bottlenecks or security issues which needed to be considered from a design, implementation or operational point of view, as organisations attempted to assimilate decades-old ‘frozen’ data on flash and disk with highly dynamic cloud storage.
This could be particularly relevant for data management challenges of large enterprises and government departments or agencies, or even financial institutions. Again I found myself realising the best person to answer this was Ken so once more I put the question to him, and his insights were once again on point:
“Dez to your point, a recent study from Forrester revealed that over 72% of all corporate data created is trapped in data islands. It is often marooned in separate SANs, NAS file shares, object storage, or even public or private / hybrid cloud platforms, in turn creating new islands, mostly never used.
Additionally, we have found that incompatible tools and manual copy procedures often make data sharing and data migration nearly impossible. We believe WekaIO’s enterprise-grade security, both in-flight and at-rest, provides the ideal data platform across any protocol and diverse workloads, while adhering to modern compliance policies.”, Ken Grohe
In a recent study, IDC estimated that approximately 1.2 zettabytes (1.2 trillion gigabytes) of new data was created in 2010, up from 0.8 zettabytes the year before, and it has been estimated that the amount of newly created data in 2020 was in the order of 44 times the 2009 figure, at 35 zettabytes (35 trillion gigabytes).
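As a quick sanity check on those figures, the short Python sketch below works out the growth multiples and the implied compound annual growth rate. It uses only the three numbers quoted above; the function names are my own, and the constant-growth-rate assumption is a simplification for illustration rather than anything taken from the IDC study.

```python
# Figures quoted in the text, in zettabytes (ZB).
ZB_2009 = 0.8
ZB_2010 = 1.2
ZB_2020 = 35.0


def growth_multiple(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, assuming the same rate every year."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"2020 vs 2009: {growth_multiple(ZB_2009, ZB_2020):.1f}x")
    print(f"2020 vs 2010: {growth_multiple(ZB_2010, ZB_2020):.1f}x")
    print(f"Implied annual growth, 2009-2020: {implied_cagr(ZB_2009, ZB_2020, 11):.0%}")
```

Run as a script, this shows that the 44X multiple only works against the 0.8 zettabyte baseline; measured against the 1.2 zettabytes of 2010, the multiple is closer to 29. Either way, the implied growth is in the region of 40% a year, every year, for over a decade.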
With data usage growing at such an eye-wateringly fast pace, and at massive scale, there can be little doubt that it is now time for enterprises of all sizes to start thinking about smart and efficient storage that can serve their data needs not just now, but help them capitalise on their data accrual five years down the line.
If you haven’t already tuned into the conversation I had with Ken recently on my podcast, please do via the link below, and I look forward to continuing this conversation with Ken and the team at WekaIO in further discussions soon.
- My podcast conversation with Ken Grohe: http://bit.ly/conversations-with-dez-podcast-featuring-ken-grohe
The battle for your privacy – is it already lost?
Spoiler: yes, it is – though there are things you can do which I’ll look at in a future blog. Your online activity is tracked in more ways than you know – and not just with cookies on web pages. Your telco service providers track you. Apps on your phone and tablet track you. Search engines, social media, smart speakers, your TV, and credit agencies all track you. That font on the web page? Tracking you. That Facebook logo you used to share a link or news article? It tracked you. So does WhatsApp¹. You don’t use Facebook or Google? They’re still tracking you. Did you get an iPhone because Tim Cook says it’s private? There’s good news – leave it in the box turned off, and it is.
The stark fact is that in the digital world, you are nothing but a product to be sold, quite literally, to the highest bidder².
The physical world isn’t much better
Out there in real life, things are better, right? Wrong. Most of us carry our phones everywhere, and they become personal town criers about our behaviour. Stores have what are known as ‘beacons’ that connect to your phone via apps like Facebook. Ever visited a department store concession and then got loads of ads right after? What a coincidence! Nope, they knew you were there, and they think you’re a hot prospect, even if you looked and hated everything. The tracking continues with ‘free’ in-store Wi-Fi.
But of course, you turned off Wi-Fi, Bluetooth and location services before you left home, right? No, you didn’t? It’s not a surprise, because you’ve been groomed not to.
Does it really matter?
I’m pretty sure that if someone physically followed you everywhere you went and watched what you did, taking detailed notes, you’d get pretty hacked-off with it. As it’s all digital and mostly hidden, we put up with it.
I can see the argument that ‘It’s just a computer running algorithms to serve me ads, so what’s there to worry about?’ It’s true, but there are also nefarious aspects to it. You need to consider the points below, in addition to that embarrassing ad served to your nan when she borrowed your tablet:
- You could end up paying more for goods or services because of your profile. It’s not legal everywhere, and even where it is illegal it’s very difficult to police
- If the company holding data about you is hacked, it could be used for criminal purposes such as the theft of your identity, theft of your property or assets, or even to extort you
- Compulsive spending, gambling addictions and other mental health issues could be fed by ads and content that follow you around the internet
- Government and security agencies can access commercial data, which could lead to more invasive surveillance if your metadata reveals connections to people or groups deemed of interest, even if your own connection to them is innocent or accidental
- Do you tend to get searched every time at the airport? It could be ‘pre-crime’ AI picking you out due to the digital trails you leave³
It’s worth noting that these points don’t offer a complete picture; they’re just the tip of the iceberg.
OK, so what about privacy laws and regulations?
You’re probably thinking about the EU’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA) and other similar laws that have spread around the world in recent years.
The truth is that the laws are there, but privacy violations aren’t policed at all in most cases – it’s up to us to tell regulators (or lawyers) after we’ve approached the offending organisation. Data breaches and well-researched cases brought by experts will get looked at of course⁴, but it depends where in the world you are.
Is it time to give in?
My view is: keep fighting, because there is momentum out there. Privacy awareness is increasing, and even Google is changing – they will end tracking cookies in a few years. Don’t get too excited though; there’s lots of other tracking tech out there, which will only increase. Google are merely shifting position⁵, not stopping what they do. Cloud providers and thousands of SaaS companies already offer more tracking tech and personal data analytics services than you can imagine. And that’s before we get to data brokers, who make data about you their business.
Want to understand more? Check out the links below, and watch out for my next blog, where we’ll look at how you and your data are sold, and dive deeper into our world of creeping surveillance.
1: WhatsApp insist they don’t read your messages, but metadata about your contacts and usage is shared with other companies owned by Meta, Facebook’s parent. Learn about metadata here:
2: UK regulator says real-time bidding violates GDPR, Martech, June 2019
3: Pre-crime Software for Border Guards, Privacy International
4: NOYB (None of Your Business) is a good example of legal expertise used to bring privacy cases with regulators:
5: Google’s cookie ban and FLoC, explained, Wired, May 2021: https://www.wired.co.uk/article/google-cookies-floc
Sustainability: using Data, AI and IoT for good
Data growth is always bad news, isn’t it?
You’d probably think all data growth is evil after my last two blogs¹. I laid out how uncontrolled data growth was bad for your carbon footprint, bad for your risk exposure and bad for your budget. Unrestrained collection of personal data means it’s bad for your privacy, too.
There’s an old saying ‘You can’t see the wood because of the trees’², and this is all too often the case when it comes to data. We have so much of it, we can’t see or find the data that matters. Which, ironically, is a problem we won’t have for much longer with actual forests, given the way we’re working at deforestation.
Controlled and smart data growth can, however, be good for our planet. It already has been – we’d have wrecked the ozone layer without the satellite data collected decades ago that led to an unusually successful global effort. In the future our ability to collect and process even more data will be transformational, and we’ll absolutely need it to help us meet climate goals if we’re to sort this mess out.
The main reason we know where the climate emergency will take us is down to the digital modelling³ of our world. Due to our ability to collect ever more granular data, these models have got better over time. It’s allowed us to shift from a debatable ‘we think’ to a level of certainty where we can now say ‘there’s no doubt’. And digital models are driving change everywhere, in lots of positive ways.
Twins – but not the Schwarzenegger and DeVito kind
If you’ve ever seen the film Twins, where the two actors above played genetically engineered twins, you might think that ‘digital twins’ bear about as much resemblance to their physical counterparts as those two do to each other. You’d be wrong.
Machine learning and AI’s ability to process data has progressed so much in a relatively short time. We can use it to drive engineering efficiencies that improve reliability and extend the working life of all kinds of components. Aircraft engines once had 8-10 sensors, now they have many thousands, and data collected from them leads to all sorts of improvements. In a similar fashion, trains can create multiple terabytes of data in a relatively short space of time. Sensor tech has changed too – it’s not just about temperature, pressure, motion, or speed anymore; it’s now also about what machines can ‘see’.
This allows us to design better stadiums, model more efficient cities and transport systems, and make them smarter. Combining all these sensors with reliable networks means we can understand how events or extremes applied in the digital world, to a twin, will play out in the physical world. And what’s even more exciting is the capacity to use AI to do this in real-time, allowing us to react and avoid dangerous or wasteful situations arising in the first place.
It’s not all about avoiding a disaster or an extreme situation in a big, smart city though. AI running all the time in the background will have an increasing direct benefit on sustainability, pretty much everywhere. Things such as energy efficiency, optimised use of resources and limiting waste production can work in buildings, manufacturing plants, hospitals, universities, or pretty much anywhere. Google famously pointed its own AI tech at datacentre cooling⁴ and saved 40% on its cooling bills, which produced a corresponding reduction in CO2.
IoT and 5G – they’re not just hype
While my angle in this blog centres on data, data relies on many components before it can be collected and used, and the two biggest deals here are the Internet of Things (IoT) and 5G. I can almost hear many of you thinking ‘5G? How many folks really have access to that???’
Right now, the real deal about 5G for many of us is the infrastructure changes to support it – the cables and switches that get the data to and from the 5G masts. It’s not just had an incremental upgrade; it’s all getting a mammoth one. This extra capacity is what’s allowing AI and sensor data to help us radically change things, and it’s happening even if you can’t get (or don’t use) a 5G signal yourself just yet. Many cities are already smarter than you think and 5G will allow them to get smarter. Really smart.
Sustainability doesn’t have to be a cost centre
There’s a lot of negative talk about how much it costs to be sustainable. It will vary by what business you’re in of course – there will be losers. That said, I’m a great believer that every organisation has a chance to change and for many, sustainability will have a cash benefit, not a cost. So, sticking to my data theme, what can you do?
In closing, I’ll say that we’re heading for exciting changes in this area, and while AI, IoT and 5G get all the hype, our old friend data is what’s making it all happen. And the best part? For those of you so inclined, you can play a part too. If you want to experiment with actual data as a Citizen Data Scientist, there are many open-source libraries you can access – often published by higher education establishments or local governments (even smart city data) and by commercial organisations. As a commercial entity, you could even tap into this community yourself⁵.
For those less analytically inclined, there is an ever-growing number of ways to participate in Citizen Science⁶ and play your part as a (really) smart sensor – something your kids can enjoy too. Data isn’t always good or useful, but the good stuff has the possibility to be priceless to us all.
2: Changed slightly for ease of understanding, the actual saying is ‘You can’t see the wood for the trees’ https://www.collinsdictionary.com/dictionary/english/cant-see-the-wood-for-the-trees (link also explains the US variation)
3: Diagnosing Earth: the science behind the IPCC’s upcoming climate report, Aug 2021 https://www.nature.com/articles/d41586-021-02150-0
4: AI for data center cooling: More than a pipe dream, Datacenter Dynamics, April 2021
5: How to Use Citizen Data Scientists to Maximize Your D&A Strategy, Gartner, June 2021
6: Citizen Science Provides Useful Data For Sustainable Development Goals, International Study Shows, Forbes, July 2020, https://www.forbes.com/sites/jeffkart/2020/07/15/citizen-science-provides-useful-data-for-sustainable-development-goals-international-study-shows/
Data sustainability, and fixing the pain you didn’t know you had
The search to make data sustainable
I was recently researching some stats on IT and data sustainability related to the UN’s 17 Sustainable Development Goals (SDGs). While looking, a social post led me to a few sites that focus on SDG 3: Good Health and Wellbeing.
The sites included stories showing the results of operations that are commonplace in the first world but are sadly much less common in many poorer countries. They showed/recounted the moment when the bandages come off, and the person smiles. It’s a truly special smile that only comes when someone’s chronic pain is gone, or a core faculty such as sight or mobility returns.
Because I was searching for sustainability information on IT and data, it got me thinking: many businesses could have a ‘bandage removal moment’ of their own if they properly got to grips with their data.
Now, before I go any further, I want to stress that I’m not comparing an individual’s suffering to a business problem; I really am not. There is a parallel though, and an opportunity to reduce your carbon footprint, which matters to everyone.
Uncontrolled data growth: a sustainability blind spot
ICT, and particularly data, has a huge carbon footprint. If ICT were a country, calculations indicate¹ it would fall somewhere between Germany and Japan² in terms of CO2 output, with the highest estimates even eclipsing the output of Japan. This means your data has an impact on us all, but mostly on poorer nations, as highlighted by representations at COP26.
The data pain that I alluded to is uncontrolled growth – something most midsize to large organisations suffer from – which is a creeping condition that’s rarely dealt with until it becomes critical. Dealing with it early has many benefits, even if you do put your finances before saving the planet.
Why is data painful & what are the symptoms?
If you think about it, much of this is obvious, but it’s become normalised. There’s the high cost of enterprise arrays, and storage upgrades come around all too quickly. Cloud storage costs rise. There’s the increased burden of data governance and compliance. Security issues continue, with the associated risks of a data breach. Then there are the backup and replication costs, an increasing number of which are hidden because they are a tick-box option lumped in with the cost of cloud services (and which often fail in terms of value and meeting retention/recovery needs). Nor can we forget disaster recovery (DR), where costs can be substantial.
Too much data also leads to a lack of visibility. Finding the right data can be hard. Finding quality data is harder still. I can’t tell you the number of times I’ve seen companies put projects in place to collect and store data that they don’t realise they already have. Unsurprisingly, data availability and quality is #2 on the list of reasons analytics and AI projects fail in the finance industry³, and I’m confident you’ll find similar stats whatever sector you work in.
Why does it happen?
In most cases, data ownership is a root cause. It’s either no-one’s, someone else’s, everyone’s or ‘just mine’. Without an organisational owner for all data, nobody is responsible, and that means few real controls. Then there is a lack of training on data management, which allows regular human nature to flourish – hoarding being the worst of many ills. Always remember – data value peaks and troughs, but risk (mostly) remains constant – particularly in our increasingly regulated world.
Once a data growth problem is bedded in, it becomes too big to deal with and generally sits obstinately at the bottom of the ‘too hard’ pile. Until, that is, it becomes unsustainable from a business perspective, and action must be taken.
Business vision restored
Imagine for a moment that you get a grip on your data – what happens? Several good things:
- Reduced risk – a smaller attack surface, tighter security and less chance of compliance failures, fines, and brand damage
- Savings on all the storage challenges mentioned above, with big wins in storage costs/cloud billing, plus faster recovery times and lower DR costs
- Your carbon footprint will be lower
Perhaps the biggest business win, though, is opportunity. With a handle on all your data, transformation plans can become a reality much faster, so you’ll be delighting customers and shareholders alike. Contrary to popular belief, less data, not more, will make your business smarter and more agile. You’ll also get to smile (no bandage required) in the knowledge that you’re helping to save the planet.
What can you do?
Without a sound case or a compelling event, your data problems will stay firmly at the bottom of the pile, so you need to do several things:
- Look at the numbers – start with unstructured data as that is often where the problem lies – it could be up to 80% of your total. Be sure to poke around for hidden costs
- Include every aspect – data management costs, governance costs (inc. risk), other risk factors and importantly, the cost of missed opportunity
- Understand the opportunity of data – it’s transformational (AI/ML etc.) – but only if you can find the data of the right quality. Investigate data project delays/failures in this area
- If you’re talking to the board, reference bottom line savings, ransomware, compliance fines/data breaches, and subsequent reputational damage
If you’re growing, you can avoid a lot of the potential problems by acting early. There’s no magic bullet – it really is a people, process and technology issue. If you’re an enterprise, you must start with discovery. Profile your data to find out how much of a mess you’re in – after that, automation is your friend but be careful… there’s a lot of technology snake oil out there.
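To make the discovery step a little more concrete, here is a minimal Python sketch of what profiling unstructured data on a file share might look like. It is an illustration under my own assumptions, not a recommended tool: the function name `profile_tree`, the one-year staleness threshold and the use of last-modified time as a proxy for ‘unused’ are all hypothetical choices, and a real exercise would also need to cover ownership, duplicates and data held in cloud or object storage.

```python
import os
import time
from collections import defaultdict


def profile_tree(root, stale_days=365, now=None):
    """Walk `root` and total file sizes overall, by extension, and for stale files.

    A file counts as stale if it was last modified more than `stale_days` ago.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_days * 86400  # seconds in a day
    bytes_by_ext = defaultdict(int)
    total_bytes = 0
    stale_bytes = 0

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            ext = os.path.splitext(name)[1].lower() or "(none)"
            bytes_by_ext[ext] += info.st_size
            total_bytes += info.st_size
            if info.st_mtime < cutoff:
                stale_bytes += info.st_size

    return {
        "total_bytes": total_bytes,
        "stale_bytes": stale_bytes,
        "bytes_by_ext": dict(bytes_by_ext),
    }


if __name__ == "__main__":
    report = profile_tree(".")
    print(f"Total: {report['total_bytes']} bytes, stale: {report['stale_bytes']} bytes")
    # Show the five extensions taking up the most space.
    top = sorted(report["bytes_by_ext"].items(), key=lambda kv: kv[1], reverse=True)[:5]
    for ext, size in top:
        print(f"{ext:>10}  {size} bytes")
```

Even a rough report like this, showing how much of a share has not been touched in a year and which file types dominate, gives you numbers to put in front of the board before you spend anything on tooling.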
If you found this blog interesting or useful, then do like I did and wander over to some causes that make a real difference to real lives. As I’ve talked about ‘that special smile’, the Smile Train does exactly that kind of work, and Cure Blindness brings smiles by returning people’s sight. With a focus on carbon emissions recently at COP26, hopefully this has served as a reminder that sustainability⁴ comes in many forms. Wouldn’t it be great if you reduced your emissions and helped these charities? After all, every positive action has an impact!
1: Emissions from computing and ICT could be worse than previously thought, Science Daily, Sept 2021 https://www.sciencedaily.com/releases/2021/09/210910121715.htm
2: Carbon Emissions by Country, Worldometer, 2019 https://www.worldometers.info/co2-emissions/co2-emissions-by-country/
3: 21 Top AI Adoption Challenges for the Finance Industry, Analytics Week https://analyticsweek.com/21-top-ai-adoption-challenges-for-the-finance-industry/
4: SDG 3: Ensure healthy lives and promote well-being for all at all ages, United Nations, 2021 https://unstats.un.org/sdgs/report/2021/goal-03/