
Medical Applications: Big Data Is Large Enough to Create Far More Individualized "Care Recipes"
Until 1980, clinicians relied heavily on “experience,” “instinct,” and “intangible clues” to determine whether a child with a fever had a minor disease (such as a cold) or something more serious (such as pneumonia or meningitis). In other words, they used “intuition.” In 1980, a group of researchers studied how experienced pediatricians assess these patients. They identified several findings that expert clinicians use as “inputs” to their intuition, but found they were too subjective to be reliably used by doctors with less experience.
In follow-up studies, the researchers honed their system to be more accurate and objective. Using this system, pediatricians in training were able to assess illness severity for children with fever almost as well as experienced pediatricians! In essence, the building blocks of this intuition had been identified and quantified into a form that new doctors could use despite their lack of experience. Today, nearly every doctor treating a febrile child documents these subtle findings.
If we aim for every clinician to provide perfect care every time, then we need more than just intuition and expertise, because nobody’s perfect. Evidence-based Medicine (EBM) helps clinicians provide better care by summarizing clinical studies into care guidelines. However, EBM is generally based on “Small Data” studies – a large EBM study consists of thousands of cases instead of the millions or billions typical of Big Data. With such small sample sizes the data inputs must be well-defined and perfectly formatted, and the resulting all-encompassing guidelines often fail to account for differences between patients. EBM is sometimes derided as “cookbook medicine” where doctors blindly follow “recipes” for care. Chicken and spinach might be a great meal for most people, but what if I’m serving a vegetarian?
Big Data is large enough to create "care recipes" that are far more individualized. With a dataset of 500 million people, you can create one care recipe for overweight, 35-year-old men with high cholesterol on daily aspirin and Lipitor; and another recipe for the same group of men who differ only by being underweight.
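The stratification described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the field names, BMI thresholds, and outcome scores are invented for the example and are not real clinical data.

```python
# Hypothetical sketch: deriving stratum-specific "care recipes" from a large
# patient dataset. Field names, thresholds, and scores are illustrative
# assumptions, not real clinical data.
from collections import defaultdict

patients = [
    {"age": 35, "bmi": 31.0, "on_aspirin_and_lipitor": True, "outcome": 0.72},
    {"age": 35, "bmi": 17.5, "on_aspirin_and_lipitor": True, "outcome": 0.55},
    {"age": 36, "bmi": 30.2, "on_aspirin_and_lipitor": True, "outcome": 0.70},
    {"age": 35, "bmi": 18.0, "on_aspirin_and_lipitor": True, "outcome": 0.58},
]

def weight_class(bmi):
    """Bucket patients by weight using standard BMI cutoffs."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    return "overweight"

# Group otherwise-identical clinical profiles by weight class alone; with
# millions of records, each stratum stays large enough to support its own
# recipe -- the point that Small Data sample sizes cannot reach.
buckets = defaultdict(list)
for p in patients:
    buckets[weight_class(p["bmi"])].append(p["outcome"])

recipes = {k: sum(v) / len(v) for k, v in buckets.items()}
print(recipes)  # one mean outcome per weight stratum
```

The same idea scales to any combination of attributes (age band, medications, lab values): the more people in the dataset, the finer the strata that still contain enough cases to draw conclusions from.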
Big Data also enables analytics that “read between the lines” to find subtle but powerful clues embedded in raw, unprocessed data. Small Data often can’t handle raw input because it can’t tell that “MI” and “myocardial infarction” both refer to the same thing, and there aren’t enough cases in Small Data to draw valid conclusions by using just one of those terms. Small Data is also too small to allow the analytics to “figure out” that “MI” and “myocardial infarction” are equivalent terms. Small Data also isn’t big enough to use subtle clues as inputs because they occur too infrequently in the dataset – valid conclusions cannot be drawn from such small sample sizes.
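The term-equivalence problem above can be illustrated with a toy normalization step. This is a sketch under stated assumptions: the synonym table and the sample records are invented for the example; a real system would learn or load such mappings at much larger scale.

```python
# Hypothetical sketch: collapsing synonymous clinical terms before counting
# cases, so raw records like "MI" and "myocardial infarction" pool into a
# single cohort. The synonym table is an illustrative assumption.
from collections import Counter

SYNONYMS = {
    "mi": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}

def normalize(term):
    """Map a raw term to its canonical form; pass unknown terms through."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)

raw_diagnoses = ["MI", "myocardial infarction", "Heart Attack", "stroke"]
counts = Counter(normalize(d) for d in raw_diagnoses)
print(counts["myocardial infarction"])  # 3: three raw records pooled into one term
```

With only a handful of cases, splitting them across unrecognized synonyms leaves every group too small to analyze; with millions of records, the analytics can both discover such equivalences and exploit them.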
Arguments are raging about whether Big Data is usurping the role of intuition in Medicine. However, Big Data is our best hope for computers to do a better job at mimicking the intuition of human experts so that we no longer need to rely on Small Data EBM. The real problem with Big Data is not its threat to clinical intuition but rather our failure to do it at all. We aren’t using Big Data heavily in Medicine because it requires really big datasets, and medical researchers don’t have really big clinical datasets.
Building, maintaining, de-identifying, and securing clinical datasets has high costs. The penalties for failing to secure the data are severe, while the incentives to create such datasets are nearly non-existent. Even government-supported Health Information Exchanges often don’t actually aggregate data. Rather, they serve as record locators into external systems where data can be retrieved one patient at a time, often only in summary form. Big Data analytics cannot be done with that architecture.
However, the biggest barriers to big medical datasets are the prevailing "best practices" in medical informatics, which continue to lag other industries by ten to twenty years. Medical informaticists still insist upon enforcing the antiquated data curation processes required to support "Small Data" analytics. Only approved, standardized, encoded data can be accepted – no raw data or subtleties allowed here! The resulting dataset is small because the curation process is a resource-intensive bottleneck, and much of the available data is rejected for lack of conformity. Curation creates a homogenized data product devoid of the variety that makes it truly useful, not unlike white bread, an empty carb source stripped of its whole grain nutrient goodness. Google and Amazon do not succeed at Big Data despite their lack of curation; they succeed because of it.
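The curation gate described above can be sketched as a simple filter. This is a hypothetical illustration: the approved-code list and the sample records are invented assumptions, chosen only to show how conformity checks discard most raw input.

```python
# Hypothetical sketch of a "Small Data" curation gate: only records carrying
# an approved structured code survive, so free-text and nonconforming raw
# input is discarded. Codes and records are illustrative assumptions.
APPROVED_CODES = {"I21.9"}  # e.g. one ICD-10 code for myocardial infarction

raw_records = [
    {"dx_code": "I21.9"},                            # conforms: kept
    {"dx_text": "MI"},                               # free text: rejected
    {"dx_text": "heart attack, onset 2h ago"},       # free text: rejected
    {"dx_code": "I21.9", "note": "chest pain"},      # conforms: kept
]

# The gate keeps only pre-encoded data -- the subtle free-text clues that
# Big Data analytics could mine are exactly what gets thrown away.
curated = [r for r in raw_records if r.get("dx_code") in APPROVED_CODES]
print(len(curated), "of", len(raw_records), "records survive curation")
```

Half the input survives in this toy example; in practice the rejection rate for unstructured clinical data is far higher, which is why curated datasets stay small.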
Until every doctor has perfect intuition all the time, computers should help us provide better care. Big Data can play a big role in supporting care, provided that we start creating truly Big Data and stop trying to apply Small Data thinking to the process.
Dr. Jonathan Handler serves as the Chief Medical Information Officer at M*Modal and is a board-certified emergency physician with twenty years of experience in Medical Informatics.