Data

Let your organization benefit from the Weka File System

Weka file system

“Data is the new source code,” as Ken Grohe, President & Chief Revenue Officer (CRO) at WekaIO, puts it in his trademark succinct manner, a sentiment he shared with me during a recent conversation on my podcast.

The true potential of your business is often hidden in its ability to harness data. The real question is whether your current infrastructure is up to the challenge of processing data ever more efficiently to drive business insights that can power everything from innovations to market opportunities, improve Return-on-Investment (ROI) and shorten Go-to-Market (GTM) cycles.

For many of us, unfortunately, personal experience with high-volume data computing brings back lingering muscle memories of frustrating latency issues, version-control headaches and poor overall computational performance. The good news is that handling limitless data at higher capacities does not automatically equate to poor performance.

Weka File System

The answer to high-capacity, high-performance computing for modern enterprise workloads under extreme conditions is pretty simple: a robust file system solution. A modern parallel file system is essential to get an ROI out of any data-intensive workload at massive scale, such as artificial intelligence (AI) initiatives. Weka’s “Limitless Data Platform”, built on the Weka File System (WekaFS), is a software-defined architecture which allows organisations to remove hard limits on capacity scaling, and drive industry-leading performance and efficiencies in hyperscale storage.

In my mind, data is fast turning into an asset to be placed on balance sheets. It should have key performance indicators (KPIs) placed on it and be managed like any other mission-critical asset (such as human resources, plant and equipment, or cash and liquid assets). To that point, a data-performance enabler like a software-defined file system can provide you with the competitive edge you need to seamlessly consolidate all your workloads across data silos and drive high performance.

As one of the leading solutions in this market segment, the Weka File System (WekaFS) addresses this by leveraging best-in-class networking technologies such as NVMe-oF, NVIDIA Mellanox InfiniBand and 100Gb Ethernet, as well as advanced computing technologies like GPU acceleration.

According to Ken, the WekaIO Limitless Data Platform is an agile, flexible, safe, secure solution and can deliver up to 75% reduction in storage cost per unit, as seen in the case study with Genomics England’s massive gene sequencing project to sequence 5 million genomes by 2023. 

With all of that in mind, I thought I would circle back and highlight my top three takeaways from my conversation with Ken on our recent podcast, and how they relate to these very issues from a business, technology and operational point of view.

Takeaway #1 

AI and Big Data initiatives are going to be the drivers for big business results in the next decade. WekaIO enables successful AI and Big Data outcomes at a fraction of the storage and maintenance costs. For instance, Ken mentioned that it takes only half as many staff to run WekaIO as it does to run any NAS, SAN or cloud-based solution. From my perspective, however, WekaIO’s enterprise-ready and hybrid-cloud features enable it to transcend AI and Big Data use cases and lend itself to backup and DR strategies, IT agility, and lower CapEx.

There is naturally an expectation that WekaFS is a pathway to high performance computing on the public cloud, but the question that arises is: “How do companies go about transitioning from existing NAS or SAN solutions, and what benefits can they expect to see in the short and long term?”

Ken and I dug into this during our recent podcast conversation, but it struck me after we had recorded the show that we would certainly get enquiries from our listeners for more information, and indeed we did. So I reached out to Ken and asked him to expand on this, and here is what he had to say:

“We have found that legacy systems tend to be inefficient and ineffective in supporting today’s demanding data-intensive applications, often forcing customers to make compromises on simplicity, speed, or scale.

Legacy storage technologies tended to be purpose-built to solve different problems, but when implemented as a system, produced islands, or silos, of storage. This, in turn, rendered the data invisible to the applications which needed it, resulting in a big problem for data centers today.

To address this, Weka’s Limitless Data Platform has been designed to offer a cost-efficient storage system that combines simplicity, speed, and scale – shattering the storage limitations which continue to constrain enterprises from achieving better business outcomes.” – Ken Grohe

Takeaway #2 

Perhaps the greatest technical differentiator I can see with the WekaIO solution, beyond the obvious advantages in scalability and flexibility, is its ability to ‘snapshot’ data instantly while allowing updates, patches, testing or training to run without impacting the active ‘production’ copy of the file system.

Being a skeptic, I couldn’t help but wonder what would happen if the system came under stress, say from a cyberattack or extremely high I/O workloads. So, naturally, I again reached out to Ken and asked him for further insights, beyond what we discussed on the podcast, into the key technical advantages the Weka file system offers in this context. His reply was enlightening:

“The WekaIO Limitless Data Platform does not rely on traditional backup and/or disaster recovery scenarios, which can often be compromised by cyberattacks or extremely high I/O workloads, as you mentioned, Dez. We work with them, but incrementally we embrace a ‘snap to object’ or ‘snap to cloud’ approach, which bypasses those antiquated dependencies and adds durability.

As a result, the desired workflow can be achieved instantaneously by leveraging cloud, object, or on-premise NVMe-tier storage, in scenarios such as cyberattacks or performance / unavailability, within the safety and dependability of a unified namespace.” – Ken Grohe

Takeaway #3 

One of the biggest purported benefits of the WekaIO solution has been its ability to integrate seamlessly across on-premise, third-party and cloud storage locations, from potentially small beginnings of tens of terabytes up to multi-petabyte scale, and to do so more cost-effectively and with more efficient use of infrastructure and human resources.

Given the difference in speeds inherent to each architecture type, I wondered if there might be potential performance bottlenecks or security issues which needed to be considered from a design, implementation or operational point of view, as organisations attempted to assimilate decades-old ‘frozen’ data on flash and disk with highly dynamic cloud storage.

This could be particularly relevant to the data management challenges of large enterprises, government departments and agencies, or even financial institutions. Again I found myself realising the best person to answer this was Ken, so once more I put the question to him, and his insights were once again on point:

“Dez, to your point, a recent study from Forrester revealed that over 72% of all corporate data created is trapped in data islands. It is often marooned in separate SANs, NAS file shares, object storage, or even public or private / hybrid cloud platforms, in turn creating new islands, mostly never used.

Additionally, we have found that incompatible tools and manual copy procedures often make data sharing and data migration nearly impossible. We believe WekaIO’s enterprise-grade security, both in-flight and at-rest, provides the ideal data platform across any protocol and diverse workloads, while adhering to modern compliance policies.” – Ken Grohe

In a recent study, IDC estimated that approximately 1.2 zettabytes (1.2 trillion gigabytes) of new data was created in 2010, up from 0.8 zettabytes the year before, and it has been estimated that the amount of newly created data in 2020 reached around 35 zettabytes (35 trillion gigabytes), in the order of 44X the 2009 figure.
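
As a quick sanity check on those figures, here is a minimal sketch of the arithmetic. It assumes the 44X multiple is measured against the 2009 baseline of 0.8 zettabytes, which is my reading of the numbers rather than something the study wording quoted here confirms, and the variable names are purely illustrative.

```python
# Back-of-the-envelope check of the data-growth figures quoted above.
# Assumption: the "44X" multiple is measured against the 2009 baseline
# (0.8 ZB). That is an interpretation of the quoted numbers, not a
# statement taken from the study itself.

GB_PER_ZB = 10**21 // 10**9  # 1 ZB = 10^21 bytes, 1 GB = 10^9 bytes

data_2009_zb = 0.8   # zettabytes created in 2009 (as quoted)
data_2010_zb = 1.2   # zettabytes created in 2010 (as quoted)
data_2020_zb = 35.0  # zettabytes estimated for 2020 (as quoted)

# Overall multiple between the 2009 baseline and the 2020 estimate.
growth_multiple = data_2020_zb / data_2009_zb

# Compound annual growth rate implied by that multiple over 11 years.
years = 2020 - 2009
implied_annual_growth = growth_multiple ** (1 / years) - 1

print(f"1 ZB = {GB_PER_ZB:,} GB")
print(f"2009 -> 2010 growth: {data_2010_zb / data_2009_zb - 1:.0%}")
print(f"2009 -> 2020 multiple: {growth_multiple:.2f}x")
print(f"Implied compound annual growth: {implied_annual_growth:.1%}")
```

Under that assumption the multiple works out at a little under 44, and the implied growth rate is roughly 41% a year, every year, for over a decade.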

With data usage growing at such an eye-watering fast pace, and massive scale, there can be little doubt that it is now time for enterprises of all sizes to start thinking about smart and efficient storage that can serve their data needs not just now, but help them capitalise on their data accrual five years down the line.

If you haven’t already tuned into the conversation I had with Ken recently on my podcast, please do via the link below, and I look forward to continuing this conversation with Ken and the team at WekaIO in further discussions soon.

Further Reading:

Aerospike Appoints Martin James to Lead EMEA

Martin James has been named vice president of Europe, the Middle East and Africa (EMEA) by Aerospike Inc., the pioneer in real-time data platforms. Martin joins Aerospike with 25 years of experience in the database industry.

Martin came from Percona, where he tripled the company’s revenue across EMEA and APAC. Prior to joining Percona, he managed enterprise sales at DataStax as regional vice president for northern Europe, achieving double-digit growth.

In today’s Right-Now Economy, Martin is in charge of driving regional growth and meeting client demand for Aerospike, and will build regional sales teams to meet the strict SLAs set by today’s data-driven businesses.

“Martin brings decades of experience to the Aerospike team. His broad experience in the UK/EMEA region and proven leadership ability will allow him to evangelise our Real-time Data Platform to businesses looking to modernise their data architectures. Now is an exciting time to lead the EMEA region as we extend our footprint across the UK and Europe,” said Jim Lodestro, CRO at Aerospike.

Enterprises today depend on mission-critical real-time apps to accomplish business goals. James will promote the Aerospike Real-time Data Platform to companies looking to develop massive real-time applications with guaranteed sub-millisecond performance at gigabyte to petabyte scale.

Following the announcement of record sales of its Aerospike Real-time Data Platform in 2021, Aerospike recently also reported a record first half of 2022. The business tripled sales outside of North America, doubled anticipated 2020 growth, and quadrupled Aerospike Cloud Managed Service growth.

Aerospike have debuted two powerful products in the second quarter:

  • Aerospike Database 6, which natively supports JSON and JSONPath queries to help developers build large-scale document-based applications.
  • Aerospike SQL Powered by Starburst, which delivers massively parallel, complex SQL queries on petabyte-scale data stored in Aerospike.

Developers can now also test-drive Aerospike 6 with no setup required in the improved Code Sandbox on the Aerospike Developer Hub, which also offers simple access to interactive tutorials, sample code snippets, and training.

The battle for your privacy – is it already lost?

Spoiler: yes, it is – though there are things you can do, which I’ll look at in a future blog. Your online activity is tracked in more ways than you know – and not just with cookies on web pages. Your telco service providers track you. Apps on your phone and tablet track you. Search engines, social media, smart speakers, your TV, and credit agencies all track you. That font on the web page? Tracking you. That Facebook logo you used to share a link or news article? It tracked you. So does WhatsApp[1]. You don’t use Facebook or Google? They’re still tracking you. Did you get an iPhone because Tim Cook says it’s private? There’s good news – leave it in the box turned off, and it is.

The stark fact is that in the digital world, you are nothing but a product to be sold, quite literally, to the highest bidder[2].

The physical world isn’t much better

Out there in real life, things are better, right? Wrong. Most of us carry our phones with us, and they become personal town criers about our behaviour. Stores have what are known as ‘beacons’ that connect to your phone via apps like Facebook. Ever visited a department store concession and then got loads of ads right after? What a coincidence! Nope, they knew you were there, and they think you’re a hot prospect, even if you looked and hated everything. The tracking continues with ‘free’ in-store Wi-Fi.

But of course, you turned off Wi-Fi, Bluetooth and location services before you left home, right? No, you didn’t? It’s not a surprise, because you’ve been groomed not to.

In addition, facial recognition and gait analysis AI are increasingly used, which are clearly more invasive than security cameras. Once the preserve of national security services, this tech is now well embedded in commercial organisations. In the case of Amazon’s checkout-free stores, they actually watch everything you put in your basket, so you can just walk out and be billed. Convenient, but how is that data used? I’m sure everyone who uses those stores has read the privacy policy and knows already. No need to worry then…

Does it really matter?

I’m pretty sure that if someone physically followed you everywhere you went and watched what you did, taking detailed notes, you’d get pretty hacked-off with it. As it’s all digital and mostly hidden, we put up with it.

I can see the argument that ‘It’s just a computer running algorithms to serve me ads, so what’s there to worry about?’ It’s true, but there are also nefarious aspects to it. You need to consider the points below, in addition to that embarrassing ad served to your nan when she borrowed your tablet:

  • You could end up paying more for goods or services because of your profile. It’s not legal everywhere, and even where it is illegal it’s very difficult to police 
  • If the company holding data about you is hacked, it could be used for criminal purposes such as the theft of your identity, theft of your property or assets, or even to extort you
  • Compulsive spending, gambling addictions and other mental health issues could be fed by ads and content that follow you around the internet
  • Government and security agencies can access commercial data, which could lead to more invasive surveillance if your metadata reveals connections to people or groups deemed of interest, even if your own connection to them is innocent or accidental
  • Do you tend to get searched every time at the airport? It could be ‘pre-crime’ AI picking you out due to the digital trails you leave[3]

It’s worth noting that these points don’t offer a complete picture; they’re just the tip of the iceberg.

OK, so what about privacy laws and regulations?

You’re probably thinking about the EU’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA) and other similar laws that have spread around the world in recent years.

The truth is that the laws are there, but privacy violations aren’t policed at all in most cases – it’s up to us to tell regulators (or lawyers) after we’ve approached the offending organisation. Data breaches and well-researched cases brought by experts will get looked at, of course[4], but it depends where in the world you are.

Is it time to give in?

Keep fighting is my view; there is momentum out there. Privacy awareness is increasing, and even Google is changing: they will end tracking cookies in a few years. Don’t get too excited though, there’s lots of other tracking tech out there, which will only increase. Google are merely shifting position[5], not stopping what they do. Cloud providers and thousands of SaaS companies already offer more tracking tech and personal data analytics services than you can imagine. And that’s before we get to data brokers, who make data about you their business.

Want to understand more? Check out the links below, and watch out for my next blog, where we’ll look at how you and your data are sold, and dive deeper into our world of creeping surveillance. 


Sources:

1: WhatsApp insist they don’t read your messages, but metadata about your contacts and usage is shared with other companies owned by Meta, Facebook’s parent. Learn about metadata here:
https://www.youtube.com/watch?v=xP_e56DsymA

2: UK regulator says real-time bidding violates GDPR, Martech, June 2019

3: Pre-crime Software for Border Guards, Privacy International
https://privacyinternational.org/examples-abuse/1981/pre-crime-software-border-guards  

4: NOYB (None of Your Business) is a good example of legal expertise used to bring privacy cases with regulators:
https://noyb.eu/en 

5: Google’s cookie ban and FLoC, explained, Wired, May 2021: https://www.wired.co.uk/article/google-cookies-floc  

Sustainability: using Data, AI and IoT for good

Data growth is always bad news, isn’t it?

You’d probably think all data growth is evil after my last two blogs[1]. I laid out how uncontrolled data growth was bad for your carbon footprint, bad for your risk exposure and bad for your budget. Unrestrained collection of personal data means it’s bad for your privacy, too.

There’s an old saying, ‘You can’t see the wood because of the trees’[2], and this is all too often the case when it comes to data. We have so much of it, we can’t see or find the data that matters. Which, ironically, is a problem we won’t have for much longer with actual forests, given the way we’re working at deforestation.

Controlled and smart data growth can, however, be good for our planet. It already has been – we’d have wrecked the ozone layer without the satellite data collected decades ago that led to an unusually successful global effort. In the future our ability to collect and process even more data will be transformational, and we’ll absolutely need it to help us meet climate goals if we’re to sort this mess out.

The main reason we know where the climate emergency will take us is down to the digital modelling[3] of our world. Due to our ability to collect ever more granular data, these models have got better over time. It’s allowed us to shift from a debatable ‘we think’ to a level of certainty where we can now say ‘there’s no doubt’. And digital models are driving change everywhere, in lots of positive ways.

Twins – but not the Schwarzenegger and DeVito kind

If you’ve ever seen the film Twins, where the two actors above played genetically engineered twins, you might think that ‘digital twins’ bear as little resemblance to each other as they did. You’d be wrong.

Machine learning and AI’s ability to process data has progressed so much in a relatively short time. We can use it to drive engineering efficiencies that improve reliability and extend the working life of all kinds of components. Aircraft engines once had 8-10 sensors; now they have many thousands, and data collected from them leads to all sorts of improvements. In a similar fashion, trains can create multiple terabytes of data in a relatively short space of time. Sensor tech has changed too – it’s not just about temperature, pressure, motion, or speed anymore, it’s now also about what machines can ‘see’.

This allows us to design better stadiums, model more efficient cities and transport systems, and make them smarter. Combining all these sensors with reliable networks means we can understand how events or extremes applied in the digital world, to a twin, will play out in the physical world. And what’s even more exciting is the capacity to use AI to do this in real-time, allowing us to react and avoid dangerous or wasteful situations arising in the first place.

It’s not all about avoiding a disaster or an extreme situation in a big, smart city though. AI running all the time in the background will have an increasingly direct benefit on sustainability, pretty much everywhere. Things such as energy efficiency, optimised use of resources and limiting waste production can work in buildings, manufacturing plants, hospitals, universities, or pretty much anywhere. Google famously pointed its own AI tech at datacentre cooling[4] and saved 40% on its cooling bills, which produced a corresponding reduction in CO2.

IoT and 5G – they’re not just hype

While my angle in this blog centres on data, data relies on many components before it can be collected and used, and the two biggest deals here are the Internet of Things (IoT) and 5G. I can almost hear many of you thinking ‘5G? How many folks really have access to that???’

Right now, the real deal about 5G for many of us is the infrastructure changes to support it – the cables and switches that get the data to and from the 5G masts. It’s not just had an incremental upgrade; it’s all getting a mammoth one. This extra capacity is what’s allowing AI and sensor data to help us radically change things, and it’s happening even if you can’t get (or don’t use) a 5G signal yourself just yet. Many cities are already smarter than you think and 5G will allow them to get smarter. Really smart.

Sustainability doesn’t have to be a cost centre

There’s a lot of negative talk about how much it costs to be sustainable. It will vary by what business you’re in of course – there will be losers. That said, I’m a great believer that every organisation has a chance to change and for many, sustainability will have a cash benefit, not a cost. So, sticking to my data theme, what can you do?

  • Think about reducing/expiring unnecessary data – it all has an impact and a potentially much bigger cost/risk profile (see here)
  • Some (whom I disagree with) say that data is ‘the new oil’. Sadly, if you don’t think about where you store it, it could be powered by the old black stuff (or another fossil fuel)
  • Think about shifting it to the cloud. The big cloud providers are mostly powered by renewables, and reach a level of efficiency that most orgs can’t get close to themselves

In closing, I’ll say that we’re heading for exciting changes in this area, and while AI, IoT and 5G get all the hype, our old friend data is what’s making it all happen. And the best part? For those of you so inclined, you can play a part too. If you want to experiment with actual data as a Citizen Data Scientist, there are many open-source libraries you can access – often published by higher education establishments or local governments (even smart city data) and by commercial organisations. As a commercial entity, you could even tap into this community yourself[5].

For those less analytically inclined, there is an ever-growing number of ways to participate in Citizen Science[6] and play your part as a (really) smart sensor – something your kids can enjoy too. Data isn’t always good or useful, but the good stuff has the possibility to be priceless to us all.

Sources:

1: 1st blog: https://elnion.com/data-sustainability-and-fixing-the-pain-you-didnt-know-you-had/ , 2nd blog: https://elnion.com/when-ransomware-is-also-leakware-what-can-you-do/

2: Changed slightly for ease of understanding, the actual saying is ‘You can’t see the wood for the trees’ https://www.collinsdictionary.com/dictionary/english/cant-see-the-wood-for-the-trees (link also explains the US variation)

3: Diagnosing Earth: the science behind the IPCC’s upcoming climate report, Aug 2021 https://www.nature.com/articles/d41586-021-02150-0

4: AI for data center cooling: More than a pipe dream, Datacenter Dynamics, April 2021
https://www.datacenterdynamics.com/en/analysis/ai-for-data-center-cooling-more-than-a-pipe-dream/

5: How to Use Citizen Data Scientists to Maximize Your D&A Strategy, Gartner, June 2021
https://www.gartner.com/smarterwithgartner/how-to-use-citizen-data-scientists-to-maximize-your-da-strategy

6: Citizen Science Provides Useful Data For Sustainable Development Goals, International Study Shows, Forbes, July 2020, https://www.forbes.com/sites/jeffkart/2020/07/15/citizen-science-provides-useful-data-for-sustainable-development-goals-international-study-shows/

Copyright © 2021 ELNION ONLINE - All rights reserved.