Statistics: Contingency Table
In statistics, a contingency table (also referred to as a cross tabulation or cross tab) is a type of table in a matrix format that displays the (multivariate) frequency distribution of the categorical variables. The term contingency table was first used by Karl Pearson in "On the Theory of Contingency and Its Relation to Association and Normal Correlation",[1] part of the Drapers' Company Research Memoirs Biometric Series I, published in 1904.
A crucial problem of multivariate statistics is finding the (direct) dependence structure underlying the variables contained in high-dimensional contingency tables. If some of the conditional independences are revealed, then even the storage of the data can be done in a smarter way (see Lauritzen (2002)). To do this, one can use information-theoretic concepts, which draw only on the probability distribution; that distribution is easily estimated from the contingency table by the relative frequencies.
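As one illustration of an information-theoretic quantity computed purely from relative frequencies, the sketch below estimates the mutual information between the row and column variables of a two-way table. The choice of mutual information, the function name `mutual_information`, and the two small tables are assumptions made for this example; the text above does not name a specific measure.

```python
import math

def mutual_information(table):
    """Estimate mutual information (in bits) from a two-way table of counts.

    Cell, row, and column probabilities are estimated by relative frequencies.
    """
    total = sum(sum(row) for row in table)
    row_p = [sum(row) / total for row in table]
    col_p = [sum(col) / total for col in zip(*table)]

    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                continue  # an empty cell contributes nothing to the sum
            p = count / total
            mi += p * math.log2(p / (row_p[i] * col_p[j]))
    return mi

# Hypothetical tables, chosen only to show the two extremes.
independent = [[20, 30], [20, 30]]   # rows have identical proportions
dependent = [[50, 0], [0, 50]]       # row fully determines column

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)
```

The estimate is close to zero when the rows share the same proportions and grows as the variables become more strongly associated.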
Suppose that we have two variables, sex (male or female) and handedness (right- or left-handed). Further suppose that 100 individuals are randomly sampled from a very large population as part of a study of sex differences in handedness. A contingency table can be created to display the numbers of individuals who are male and right-handed, male and left-handed, female and right-handed, and female and left-handed. Such a contingency table is shown below.
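A table like this can be tallied directly from raw observations. The following is a minimal sketch using only the Python standard library; the cell counts are assumed for illustration (they sum to 100, matching the sample size described above), and the names `observations`, `table`, and `format_table` are hypothetical.

```python
from collections import Counter

# Assumed raw data: one (sex, handedness) pair per sampled individual.
# The counts are illustrative only.
observations = (
    [("male", "right")] * 43
    + [("male", "left")] * 9
    + [("female", "right")] * 44
    + [("female", "left")] * 4
)

# Count how often each combination of categories occurs.
cell_counts = Counter(observations)

rows = ["male", "female"]
cols = ["right", "left"]

# Arrange the counts as a matrix: one row per sex, one column per handedness.
table = [[cell_counts[(r, c)] for c in cols] for r in rows]

def format_table(table, rows, cols):
    """Render the counts as aligned plain text."""
    header = f"{'':<8}" + "".join(f"{c:>8}" for c in cols)
    lines = [header]
    for name, counts in zip(rows, table):
        lines.append(f"{name:<8}" + "".join(f"{n:>8}" for n in counts))
    return "\n".join(lines)

print(format_table(table, rows, cols))
```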
The numbers of males, females, and right- and left-handed individuals are called marginal totals. The grand total, i.e., the total number of individuals represented in the contingency table, is the number in the bottom right corner.
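The marginal totals are simply row sums and column sums of the cell counts, and the grand total is the sum of either set. A short sketch, again with assumed counts for a 2 x 2 sex-by-handedness table:

```python
# Assumed cell counts: rows are (male, female), columns are (right, left).
table = [[43, 9], [44, 4]]

# Row marginals: total number of individuals of each sex.
row_totals = [sum(row) for row in table]

# Column marginals: total number of right- and left-handed individuals.
col_totals = [sum(col) for col in zip(*table)]

# Grand total: everyone represented in the table.
grand_total = sum(row_totals)

print(row_totals, col_totals, grand_total)
```

Both sets of marginals add up to the same grand total, which is a quick consistency check on any hand-built table.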
The table allows us to see at a glance that the proportion of men who are right-handed is about the same as the proportion of women who are right-handed, although the proportions are not identical. The significance of the difference between the two proportions can be assessed with a variety of statistical tests, including Pearson's chi-squared test, the G-test, Fisher's exact test, and Barnard's test, provided the entries in the table represent individuals randomly sampled from the population about which we want to draw a conclusion. If the proportions of individuals in the different columns vary significantly between rows (or vice versa), we say that there is a contingency between the two variables. In other words, the two variables are not independent. If there is no contingency, we say that the two variables are independent.
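To make the testing step concrete, here is a minimal sketch of Pearson's chi-squared test for a 2 x 2 table, written without external libraries. It assumes no continuity correction and non-zero marginal totals, and it uses the fact that a chi-squared variable with one degree of freedom has survival function erfc(sqrt(x / 2)). The counts and the function name `chi_squared_2x2` are assumptions for illustration.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2 x 2 table of counts.

    No continuity correction is applied; all marginal totals must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # A 2 x 2 table has (2 - 1) * (2 - 1) = 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Assumed counts: rows are (male, female), columns are (right, left).
table = [[43, 9], [44, 4]]
statistic, p_value = chi_squared_2x2(table)
print(f"chi-squared = {statistic:.3f}, p = {p_value:.3f}")
```

With these assumed counts the statistic is roughly 1.78 and the p-value roughly 0.18, so the difference between the two proportions would not be judged significant at the usual 5% level. For small expected counts, an exact test such as Fisher's is generally preferred.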
The example above is the simplest kind of contingency table, a table in which each variable has only two levels; this is called a 2 x 2 contingency table. In principle, any number of rows and columns may be used. There may also be more than two variables, but higher-order contingency tables are difficult to represent on paper. The relation between ordinal variables, or between ordinal and categorical variables, may also be represented in contingency tables, although such a practice is rare.
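Although higher-order tables are awkward on paper, they are easy to hold in code as a mapping from category combinations to counts. The sketch below builds a three-way table and sums out one variable to recover a two-way table; the third variable (an age group), the records, and the helper `marginalize` are all hypothetical.

```python
from collections import Counter

# Hypothetical records with three categorical variables:
# (sex, handedness, age group).
records = [
    ("male", "right", "adult"),
    ("male", "left", "adult"),
    ("female", "right", "child"),
    ("female", "right", "adult"),
    ("male", "right", "child"),
    ("female", "left", "child"),
]

# A three-way contingency table: each key is one cell, each value its count.
three_way = Counter(records)

def marginalize(table, keep):
    """Sum out every variable except those at the index positions in `keep`."""
    reduced = Counter()
    for key, count in table.items():
        reduced[tuple(key[i] for i in keep)] += count
    return reduced

# Collapse over age group to get the two-way sex-by-handedness table.
sex_by_hand = marginalize(three_way, keep=(0, 1))
print(sex_by_hand)
```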