By Darren Pulsipher
John Henry (Wikipedia) - Talcott, West Virginia
The tall tale of John Henry is just as endearing now as it was 170 years ago when the story started. Machine vs. man (think Skynet and the Terminator). As we stand at the beginning of another industrial revolution (Industry 4.0), many people are concerned that AI will take their jobs, just as steam power took the jobs of thousands of railroad workers in the 1800s. In this episode, we explore the changes in workforces and effecting change in such an environment.
Story of John Henry
History repeats itself
Links
In this episode, we review the history of data center architecture and application development, along with the application development trends shaping the data center of the future. Find out how containers, serverless, and data mesh architectures are being leveraged to decrease deployment times and increase reliability.
Purpose-Built Hardware-Software Stacks
Virtualization Architectures
Cloud Architectures
Service and Container Architectures
Internet of Things Architectures
Data and Information Management Architectures
Security and Identity Aspects
Intel Optane DC Persistent Memory to the Rescue
Intel's new Persistent Memory technology has three modes of operation. The first, Memory Mode, uses the persistent memory as an extension of your current memory; imagine extending your server to 9 TBytes of memory. The second, AppDirect Mode, lets you use the persistent memory as a persistent segment of memory or as a high-speed SSD. The third, Mixed Mode, dedicates a percentage of the persistent memory to AppDirect and uses the rest to extend your standard DDR4 memory. When exploring this new technology, I realized that I could take the persistent memory and use it as a high-speed SSD. If I did that, could I increase the throughput of my ElasticSearch server? So I set up a test suite to try it out.
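As a hedged sketch (goal syntax as in the AppDirect command used later in these notes; the Mixed Mode percentage is illustrative, not a recommendation), the three modes map to ipmctl provisioning goals roughly like this:
Memory Mode, with all module capacity extending DDR4:
# ipmctl create -goal MemoryMode=100
AppDirect Mode, with all capacity exposed as persistent memory:
# ipmctl create -goal PersistentMemoryType=AppDirect
Mixed Mode, with, for example, 60% volatile memory and the rest AppDirect:
# ipmctl create -goal MemoryMode=60 PersistentMemoryType=AppDirect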
Hardware Setup and Configuration
1. Physically configure the memory
• The 2-2-2 configuration is the fastest configuration for the Apache Pass modules.
2. Upgrade the BIOS with the latest updates
3. Install supported OS
• ESXi6.7+
• Fedora29+
• SUSE15+
• WinRS5+
4. Configure the Persistent Memory for 100% AppDirect Mode to get maximum storage
# ipmctl create -goal MemoryMode=0 PersistentMemoryType=AppDirect
5. Create the filesystem
# mkfs.xfs /dev/ElasticSearch0s
Now you should be able to access the filesystems via /mnt/ElasticSearch0s and /mnt/ElasticSearch1s.
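These notes skip the mount step itself; a minimal sketch, assuming the namespaces were created in fsdax mode with the device names used above, would mount the filesystems with the DAX option:
# mkdir -p /mnt/ElasticSearch0s /mnt/ElasticSearch1s
# mount -o dax /dev/ElasticSearch0s /mnt/ElasticSearch0s
# mount -o dax /dev/ElasticSearch1s /mnt/ElasticSearch1s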
Setup and Configuration
We chose to evaluate the performance and resiliency of ElasticSearch using off-the-shelf tools. One of the most widely used performance test suites for ElasticSearch is ESRally. ESRally was developed by the makers of ElasticSearch and is easy to set up and run. It comes with its own nomenclature, which is easy to pick up.
• Tracks – test cases are stored in this directory
• Cars – configuration files to be used against the distributions
• Races – the data, index, and log files for each ElasticSearch run; each run has a separate directory
• Distributions – ElasticSearch installations
• Data – data used for the tests
• Logs – logs for ESRally, not for ElasticSearch
ESRally can attach to a specific ElasticSearch cluster, or it can be configured to install and run a standard release of ElasticSearch on the fly, stored in the distributions directory. When I was looking at which directories to move between the PMEM drive and the SATA drive, I first looked at the races directory, but I found that I would be limited by the data and log directories as well. So I decided to move the complete .rally directory between the two drives.
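A minimal sketch of that move, assuming the PMEM filesystem from the hardware setup above and a hypothetical SATA mount point /mnt/sata:
# mv ~/.rally /mnt/ElasticSearch0s/rally
# ln -sfn /mnt/ElasticSearch0s/rally ~/.rally
Repointing the symlink at a copy under /mnt/sata/rally reruns the same tests against the slower drive.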
ESRally Pre-requisites
• Python 3.5+, which includes pip3
• Java JDK 1.8+
Software Installation
ESRally can be installed from its GitHub repository. See http://ESRally.readthedocs.io for more information.
To install ESRally, use the pip3 utility. Pip3 is installed when you install Python 3.5.
# pip3 install esrally
Now you want to configure ESRally:
# export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
# esrally configure
ESRally's configuration creates a “.rally” directory in your home directory. This “.rally” directory is used to store all of the tests, data, and indices.
Next, you need to install the eventdata track. https://github.com/elastic/rally-eventdata-track
This track generates about 1.8 TBytes of traffic through ElasticSearch. It is a good test because much of the data is generated rather than read from a drive, which means you are not limited by another drive's performance and can test the raw performance of the different drives with ElasticSearch. Total storage used is about 200 GBytes.
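The track's README describes how to register it with ESRally. A sketch, assuming Rally's custom track-repository mechanism: add this entry under the [tracks] section of ~/.rally/rally.ini, then pass --track-repository=eventdata on the command line:
eventdata.url = https://github.com/elastic/rally-eventdata-track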
Running Tests (Races)
Next, run the tests for the different configurations. First, I ran the following test against the .rally directory on the SATA drive, repeating the same test with an increasing number of test clients. The “track” includes ingestion and search tests.
# esrally --distribution-version=7.3.1 --car=16gheap --track=eventdata \
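The remainder of the command line is truncated in the source. A hedged reconstruction (the challenge name and the client-count track parameter are assumptions taken from the rally-eventdata-track documentation, not the author's exact flags):
# esrally --distribution-version=7.3.1 --car=16gheap --track=eventdata \
  --track-repository=eventdata --challenge=elasticlogs-1bn-load \
  --track-params="bulk_indexing_clients:16"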
PMEM Drive Testing
Now that we have collected the test results from the SATA drive, we need to make sure the DCPMM is set up in AppDirect mode, create a PMEM drive, and mount the filesystem.
Use the ipmctl command to configure the persistent memory in AppDirect mode:
# ipmctl create -goal MemoryMode=0 PersistentMemoryType=AppDirect
Use the ndctl command to create an ElasticSearch device for mounting:
# ndctl create-namespace -r regi...

Developing a Data Strategy can be difficult, especially if you don’t know where your current organization is and where it wants to go. The Information Management Maturity Model helps CDOs and CIOs find out where they currently are in their Information Management journey and their trajectory. This map helps guide organizations as they continuously improve and progress to the ultimate data organization, one that allows them to derive maximum business value from their data.
The model can be seen as a series of phases, from least mature to most mature: Standardized, Managed, Governed, Optimized, and Innovation. Many times an organization exists in multiple phases at the same time. Look for where the majority of your organization operates, then identify the trail-blazers that should be further along in maturity. Use your trail-blazers to pilot or prototype new processes, technologies, or organizational structures.
Standardized Phase
The Standardized phase has three sub-phases: Basic, Centralized, and Simplified. Most organizations find themselves somewhere in this phase of maturity. Look at the behaviors, technology, and processes in your organization to find where you fit.
Basic
Almost every organization fits into this phase, at least partially. Here, data is used only reactively and in an ad hoc manner. Additionally, almost all the data collected is stored for predetermined time frames (often “forever”). Companies in the Basic sub-phase do not erase data for fear of missing out on some critical information in the future. Attributes that best describe this phase are:
Centralized (Data Collection Centralized)
As organizations begin to evaluate data strategy, they first look at centralizing their storage into large Big Data storage solutions. This approach takes the form of cloud storage or on-prem big data appliances. Once the data is collected in a centralized location, Data Warehouse technology can be used to enable basic business analytics for deriving actionable information. Most of the time, this data is used to fix problems with customers, supply chain, product development, or any other area in your organization where data is generated and collected. The attributes that best describe this phase are:
Simplified
As the number of data sources increases in an organization, companies begin to form organizations that focus on data strategy, organization, and governance. This shift begins with a Chief Data Officer’s (CDO) office. There are several debates on where the CDO fits in the company: under the CEO or the CIO. Don’t get hung up on where they sit in the organization. The important thing is to establish a data organization focus and implement a plan for data normalization. Normalization gives the ability to correlate different data sources to gather new insight into what is going on across your entire company. Note that without normalization, the data remains siloed and only partially accessible. Another key attribute of this phase is the need to develop a plan to handle the sheer volume of data being collected. Because of the increase in volume and the cost of storing this data, tiered storage becomes important. Note that in the early stages it is almost impossible to know the optimal way to manage data storage. We recommend using the best information available to develop rational data storage plans, with the caveat that they will need to be reviewed and improved once the data is being used. The attributes that best describe this phase are:
Managed (Standard Data Profiles)
At this phase, organizations have formalized their data organization and segmented the different roles within it: Data Scientists, Data Stewards, and Data Engineers are now on the team with defined roles and responsibilities. Metadata management becomes a key success factor at this phase, and multiple applications can now take advantage of the data in the company. Movement from a Data Warehouse to a Data Lake has taken place to allow for more agility in developing data-centric applications. Data storage has been virtualized to allow for a more efficient and dynamic storage solution. Data analytics can now run on data sets from multiple sources and departments in the company. These attributes best describe this phase:
Governed
The Governed phase is reached when a centralized approach to data management has been achieved and a holistic approach to governing and securing the data has been accomplished. The CDO works closely with the CSO (Chief Security Officer) to guarantee that the data and security strategies work together to protect the company’s valuable data while keeping it accessible for analytics. Data is classified into different buckets based on its criticality, secrecy, or importance. Compliance dictated by regulations is automated and applied to data across the organization. Visibility into data usage and security increases with the joint data and security strategies and tactical plans. Basic Artificial Intelligence is used widely in the organization, and business decisions are informed by data. Data can now be gathered and cataloged from all over the company, including Internet of Things (IoT) devices on the company’s physical assets. These attributes best describe this phase:
Optimized
As the organization’s data collection continues to grow, it needs to find efficiencies through automation and continuous process improvement. Automating data processes is the primary target of the Optimized phase. Specifically, automating the annotation and meta-tagging of data decreases the time to derive value from it. Data has now become too large to move to one centralized place, and a “Distributed Data Lake” architecture emerges as the most optimal way to manage it. Machine Learning is key in this phase, providing information to decision-makers to help optimize business operations and value. Applications and data are deployed on network, storage, and compute infrastructure based on historical information and AI models. These attributes best describe this phase:
Innovation
The ultimate organization is not just driven by data but creates new products, offerings, and service...
Organizations are looking to their vast data stores for nuggets of information that give them a leg up on their competition. “Big Data Analytics” and Artificial Intelligence are the technologies promising to find those gold nuggets. Mining data is accomplished through a “Distributed Data Lake Architecture” that enables cleansing, linking, and analytics of varied distributed data sources.
Ad Hoc Data Management
Data Warehouse Architecture
Data Lake Architecture
Distributed Data Lake (Data Mesh)
Rise of the Stack Developer (ROSD) - DWP
Big Data analytics needs data to be valuable. Collecting data from different machines, IoT devices, or sensors is the first step in deriving value from data. Ingesting that data with Kafka is a common approach. Find out how to use Intel's Optane DC Persistent Memory to decrease ingestion congestion and increase the total throughput of your ingestion solution (a configuration sketch follows the topic list below).
Kafka in real examples
Optane DC Persistent Memory
Improving Ingestion using Persistent Memory
Testing Results
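These notes don't spell out the broker configuration. As a minimal sketch, assuming a PMEM-backed filesystem mounted at the hypothetical path /mnt/pmem0, one way to apply persistent memory to ingestion is to place Kafka's commit logs on it. In each broker's config/server.properties:
log.dirs=/mnt/pmem0/kafka-logs
Kafka throughput is frequently bound by sequential writes to the commit log, which is exactly where the faster drive helps.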
One of the growing areas helping with legacy integration and the automation of integration is the use of automation tools and frameworks. Over the last three years, a significant emphasis on automating workflows across legacy and new cloud-aware applications for information workers has emerged. These tool sets are called Robotic Process Automation (RPA) tools.
Robotic Process Automation (RPA)
What RPAs are NOT
What RPAs are
Current Marketplace – 2019
Places where RPA works well
RPA Modes of Operation
Attended
· Handles tasks for individual employees
· Employees trigger and direct a bot to carry out an activity
· Employees trigger bots to automate tasks as needed at any time
· Increases productivity and customer satisfaction at call centers and other service desk environments
Unattended
· Automates back-office processes at scale
· Provisioned based on rules-based processes
· Bots complete business processes without human intervention per a predetermined schedule
· Frees employees from rote work, lowering costs, improving compliance, and accelerating processes
How to integrate RPA in your Enterprise
Managing Change
Managing Security
Managing RPA tools and bots with SecDevOps Workflows
RPA Bundling
SecDevOps Pipelining
Pitfalls of RPA bots
Tips and Tricks
· Treat RPAs as Complex Services running in your Multi-Hybrid Cloud
· Run your RPA bots through SecDevOps workflows like other applications.
· Inject Security and Auth at runtime into the RPA tool.
· Find ways to reuse RPA bots in different parts of your organization.
· Have a plan to replace your RPA bot with a simplified integration.
· ...
System Admin - 2002
Stack Developer - 2019
System Administration, Configuration Management, Build Engineering, and DevOps share many of the same responsibilities, but over the years the names have changed. Listen as Darren Pulsipher gives a brief history of his journey through software and product development over the last four decades, and how so much has changed while much has remained the same.
Darren’s History