
Want Better Event Correlation? AIOps or Better Monitoring - Where to Start


We all hate those phone calls. “Hey, did you know that the lead flow into Salesforce is down?” No matter how far IT technology has come, we still get calls when our customers find things first. And then the finger-pointing starts. We’ve spent the last 10 years trying to make business run more smoothly by connecting our applications. But when something breaks, we’re left figuring it out by looking at one tool, then another, then another — and then we’re stuck in a room arguing about whose tool is right. You’ve got too many tools, too many events and alerts, and more disjointed pieces of information than you know what to do with.

Enter event correlation, an insightful way to find a needle in a haystack or to find a needle you weren’t otherwise aware of. Wikipedia defines event correlation as “a technique for making sense of a large number of events and pinpointing the few events that are really important in that mass of information - this is accomplished by looking for and analyzing relationships between events.”

This is what AIOps vendors are saying they can do, and they say they can do it with cutting-edge machine learning techniques (but really, today it’s mostly statistical analytics). With analyst firms like Gartner and AIA releasing buyers’ guides for AIOps, it seems tempting to look into their promise of solving it all! Many think that they can just add an AIOps tool on top of their existing mix without actually having to touch their suite of disjointed monitoring tools.

But is this really the way to solve your problems? We think there are actually two ways to do event correlation. Let’s break down and compare the two.

1. The Statistical Event Correlation Approach (The AIOps Way)

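AIOps tools use statistical analysis along with user-defined rules to bubble up and prioritize events. These statistical methods rely on time (e.g., did multiple events occur at the same time?), network proximity (are two outages on the same subnet?) and similarity (does a particular word show up across many events?).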
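To make the three signals concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular AIOps product works: the `Event` fields, the 60-second window, the /24 subnet and the word-overlap test are all hypothetical choices.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, invented for this illustration.
    timestamp: float   # seconds since epoch
    source_ip: str     # address of the device that raised the event
    message: str       # free-text event summary


def close_in_time(a: Event, b: Event, window_s: float = 60.0) -> bool:
    """Time signal: did the two events fire within the same window?"""
    return abs(a.timestamp - b.timestamp) <= window_s


def same_subnet(a: Event, b: Event, prefix: int = 24) -> bool:
    """Network-proximity signal: do both sources sit in the same /prefix?"""
    net = ip_network(f"{a.source_ip}/{prefix}", strict=False)
    return ip_address(b.source_ip) in net


def shared_words(a: Event, b: Event) -> set[str]:
    """Similarity signal: which words appear in both messages?"""
    return set(a.message.lower().split()) & set(b.message.lower().split())


def correlation_score(a: Event, b: Event) -> int:
    """Count how many of the three signals agree (0 to 3)."""
    return (
        int(close_in_time(a, b))
        + int(same_subnet(a, b))
        + int(bool(shared_words(a, b)))
    )
```

Note that the score is purely statistical. Two unrelated devices that happen to share a subnet and alert within the same minute still score 2, because nothing in the code knows what the devices are.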

These tools can be really great at spotting a pattern. Given a set of whitelisted and blacklisted events, they can perform brute-force pattern matching to classify new incoming events as good or bad. They can also detect these patterns themselves, but for this, they need long training times and large sets of data.
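As a rough sketch of what classifying against labeled examples can look like, the snippet below compares a new event message with known-good and known-bad messages using word overlap (Jaccard similarity). The function names, the similarity measure and the sample messages are assumptions made for illustration; real tools use far richer models.

```python
def tokens(message: str) -> set[str]:
    """Lowercase the message and split it into a set of words."""
    return set(message.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Word overlap between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(message: str, known_good: list[str], known_bad: list[str]) -> str:
    """Label a new event by its closest labeled example.

    Hypothetical helper: returns "good", "bad", or "unknown" when the
    message resembles both lists equally (including not at all).
    """
    words = tokens(message)
    best_good = max((jaccard(words, tokens(m)) for m in known_good), default=0.0)
    best_bad = max((jaccard(words, tokens(m)) for m in known_bad), default=0.0)
    if best_good == best_bad:
        return "unknown"
    return "good" if best_good > best_bad else "bad"


# Illustrative labeled examples, not real data.
KNOWN_GOOD = ["backup job completed successfully"]
KNOWN_BAD = ["disk failure detected on volume"]
```

With these lists, `classify("disk failure detected on node", KNOWN_GOOD, KNOWN_BAD)` returns `"bad"`, while a message that resembles nothing seen before, such as `"fan speed warning"`, comes back as `"unknown"`.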

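But this approach has two challenges. The first is that these AIOps tools are not domain-aware and don’t inherently understand the IT elements themselves, which can lead them to surface things that you know, by default, make no sense. For example, you know that a printer and a website have nothing to do with each other, even if they generated events at the same time. But your statistical method may not know that.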

The other challenge: around 70 percent of IT incidents, per analyst firm EMA, are completely new and haven’t occurred in the past. So, relying on past behavior means that you can miss first-time issues that are important to know about. Wouldn’t you want to be sure that you’ll catch more than just 30 percent of IT issues?

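One more thing: because AIOps tools tend to rely on other monitoring tools for events, they look at old data retroactively, and that data is only as good as what the monitoring tools are configured to send. Garbage in, garbage out. Often, the ingested events have already been processed and deduplicated, which throws the statistical analysis further out of whack. Tools with more sophisticated machine learning and AI also demand data science skills, which are often hard to come by.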

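AIOps tools can indeed be very good at analyzing unstructured data (for example, text in events and natural language processing of service desk tickets) to identify high-level correlations that infrastructure monitoring tools would certainly miss. Some even go beyond the IT ops realm and ingest data from other streams, such as social media, enabling companies to really understand when their users or brand are affected.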

2. The Intelligent, Domain-Based Correlation Approach

This approach involves modernizing your infrastructure monitoring solution itself. The secret here is selecting a platform with an ability to perform event correlation based on a native, deep understanding of the IT infrastructure components and dependencies. One that is domain-aware as well as service-aware. One that knows how the infrastructure works in order to determine logical relationships. Understanding how these individual monitored elements support a critical service at any given point in time helps to prioritize the most important issues to investigate and resolve first.

An infrastructure monitoring solution like Zenoss helps you to understand IT service risks in real time and to sift through noise by bubbling up service-impacting events (with prioritized root-cause analysis to ease resolution). Zenoss has inherent domain understanding about the devices themselves and how they work. It knows that a failing fan in a converged infrastructure server will affect its performance and that a printer error is not going to take down infrastructure for an e-commerce website even if it is on the same subnet. It knows that an issue on a backup server will affect the mobile app it supports, even if it isn’t at peak use and customers aren’t yet affected.
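The idea of domain-aware correlation can be sketched as a walk over a dependency map: start at the failing component and follow "depends on me" links until you reach a service. The code below is a minimal illustration, not the Zenoss data model or API; the component names, the `DEPENDENTS` map and the `impacted_services` function are all hypothetical.

```python
from collections import deque

# Hypothetical dependency map: each component lists what directly depends on it.
DEPENDENTS = {
    "fan-01": ["server-01"],
    "server-01": ["ecommerce-website"],
    "backup-server": ["mobile-app"],
    "printer-3f": [],  # on the same subnet as server-01, but nothing depends on it
}

# Hypothetical set of business services we care about.
SERVICES = {"ecommerce-website", "mobile-app"}


def impacted_services(component: str,
                      dependents: dict[str, list[str]] = DEPENDENTS,
                      services: set[str] = SERVICES) -> set[str]:
    """Breadth-first walk from a failing component to every service it supports."""
    seen = {component}
    queue = deque([component])
    impacted: set[str] = set()
    while queue:
        node = queue.popleft()
        if node in services:
            impacted.add(node)
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return impacted
```

Under these assumptions, `impacted_services("fan-01")` reaches the e-commerce website through the server, while `impacted_services("printer-3f")` returns an empty set no matter which subnet the printer sits on. The relationship comes from the modeled dependencies, not from timing or network proximity.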

The benefits of using this approach go even further. By consolidating your monitoring toolset, you can also reduce license and labor costs. By handling a significant amount of event correlation within your monitoring tools, you can prevent event storms from moving to the service desk, where they are more costly to manage.

The Best of Both Worlds

Before you slap yet another tool on top of your monitoring solution, stop and think about what you are solving for. We know companies that have pursued an AIOps solution instead of tackling their monitoring, ultimately realizing that this approach fell short.

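So, our prescription for what ails you? First, tune up your monitoring approach. Your monitoring is your first line of defense, and ensuring you have a unified view with quality insights is key. Then you can unleash an AIOps tool on a broader data set to catch what was missed, complementing and supplementing your monitoring.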

Here are some tips on how you should leverage each type of tool to get the best of both worlds.

Use Zenoss software for:

  • Unified monitoring of infrastructure performance and availability across your hybrid IT environment
  • Amalgamating event data from your other monitoring tools (In other words, your infrastructure monitoring “monitor of monitors”)
  • Generating infrastructure-related insights
  • Automated alerting and ticketing for infrastructure-related, service-impacting events

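Use an AIOps platform for:

  • Fusing data from the broader IT realm to produce high-level business insights
  • Cross-functional collaboration and visualization of important data for internal teams
  • Finding higher-level, seasonal trends beyond your usual monitoring metrics and indicators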

To learn more about how AIOps tools can complement a solution like Zenoss, consider attending the upcoming Galaxz18 conference in Austin, Texas.
