Strata-Hadoop World pioneer, Martin Hall, giving this year’s keynote
Directly following Martin Hall’s thought-provoking keynote (‘Collaboration and openness drive innovation in artificial intelligence’) at this year’s Strata-Hadoop World NYC, Hall was quickly ushered off to our private press room to elaborate on some of the finer points behind what he’d shared on stage. It was during this session that Hall succinctly described the larger handicaps that hinder progress in big data, and a ground-breaking technology that’s available to overcome them.
“I believe that big data and data analytics are still disruptive, but it is excessively cumbersome to make things happen with them.”
Issues that (still) keep us from greatness
According to Martin Hall, the 3 common handicaps to big data initiatives are:
- The number of challenges and difficulties that must be overcome when deploying big data and data analytics solutions often exceeds what a business is able to take on.
- Valuable data science models and insights are getting siloed within organizations. If the models don’t get seen and used, they can’t generate value, no matter how great their potential.
- People with data skills are also getting siloed within organizations. Without collaboration in an open environment, data professionals often end up repeating work that colleagues at the same company have already completed. This represents a huge loss of time and resources.
“There is a relationship between the value of the data and the value of the analytics that are built off of it. So, you have to be able to get data into your system.”
Enough about problems, let’s focus on overcoming them
With SOOO many big data analytics solutions on the market, just the thought of choosing technologies and platforms is completely overwhelming. The good news is that there is a single open-source (read: free) collaborative platform, available immediately from GitHub, that you can use today to get a good start on overcoming these big data deployment challenges. That platform is called Trusted Analytics Platform (TAP), and here’s how it promises to revitalize your organization’s big data analytics initiatives.
TAP is designed to integrate all of the most popular big data technologies, including Hadoop HDFS, MapReduce, Spark, MongoDB, Cassandra, InfluxDB, CouchDB, PostgreSQL, and many more. Once you have your data technologies configured and integrated into TAP, it’s fast and super simple to move data resources around or pull data from sources as needed. Martin Hall made the brilliant point that “there is a relationship between the value of the data and the value of the analytics that are built off of it.” Raw big data is low value, “so, you have to be able to get data into your system” quickly. TAP has built-in integration for Flume, Sqoop, and Kafka for easy and fast data ingestion.
From a data scientist’s perspective, TAP’s Analytics Toolkit (ATK) makes data science A LOT faster with distributed processing, so that you can run algorithms on many cores simultaneously. Although you can program the ATK with Python, on the back end the code is actually transformed into Scala and run on Spark. 🙂 TAP also enables data science “workflows” that can be reproduced and reused by other data scientists to solve problems in many different verticals. Lastly, the built-in integration between connected technologies on TAP means that you can get what you need from other components without having to go in and work with the systems manually. For example, you can pull data straight from HDFS using Python within a simple Jupyter notebook.
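To make that last point a bit more concrete, here is a minimal sketch of what pulling a file out of HDFS from a notebook cell can look like. A big caveat: this is not TAP’s or the ATK’s own API. It uses Hadoop’s standard WebHDFS REST interface and only the Python standard library, and it assumes WebHDFS is enabled on your cluster. The helper names (`webhdfs_url`, `read_hdfs_text`) are made up for illustration, and the right host and port depend on your own setup.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, port, path, op="OPEN", params=None):
    """Build a WebHDFS REST URL for an absolute HDFS path (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /data/events.csv")
    # 'op' names the WebHDFS operation; extra query parameters are optional.
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def read_hdfs_text(host, port, path, user=None, encoding="utf-8"):
    """Fetch a file's contents over WebHDFS and decode it as text.

    Assumes WebHDFS is reachable from the notebook; urlopen follows the
    redirect that the namenode issues to a datanode.
    """
    params = {"user.name": user} if user else None
    with urlopen(webhdfs_url(host, port, path, params=params)) as resp:
        return resp.read().decode(encoding)
```

In a notebook you would then call something like `read_hdfs_text("your-namenode-host", 9870, "/data/events.csv")`, where the host, port, and path are placeholders for your own cluster’s values.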
From the photo above, you can see that there are plenty of open-source technologies that you can integrate with TAP, but it doesn’t stop there. If you’re working with a big data technology that’s not supported in native TAP, the platform’s simple plug-and-play integration piece allows for custom technology add-ons. In this way, TAP continually evolves and expands as a platform, all while meeting increasingly specific deployment requirements.
Personally, one of my favorite things about TAP is that it’s open-source and it integrates open-source technologies. The way TAP integrates open-source technologies only accelerates the time-to-value of complex big data initiatives. Most of the big data platforms I’ve seen simply don’t play nice with others. TAP creates an environment where otherwise competing solutions can be used to supplement and enhance each other’s effectiveness. TAP is not about replacement. It’s an additive stack.
Just as TAP integrates this diverse set of big data technologies, it’s also provisioned to meet the security requirements of each and every technology component it integrates. In this way, the platform doubles up and triples up on security measures. As for deployment options, it can be run on a private cloud, but for organizations that are still cloud-wary, TAP can also be deployed on-premises.
An open and collaborative platform for data professionals
TAP provides a common platform where data scientists can publish and share their models with application developers and with fellow data scientists. This centrality, ease of access, and sharing of work products cuts down on duplicated effort, so the time and money invested go further.
“The principle behind TAP is to make it easier, faster, and less expensive for organizations to create value from their data.”
For more information or to test drive TAP for yourself, be sure to visit the TAP website: http://trustedanalytics.org/
[Major thanks to TAP for sponsoring my involvement with the TAP project!! Thank you 🙂 ]