© Tan, Steinbach, Kumar    Introduction to Data Mining    4/18/2004
Attribute Type: Nominal
Description: The values of a nominal attribute are just different names, i.e., nominal attributes provide only enough information to distinguish one object from another. (=, ≠)
Types of Attributes
There are different types of attributes
– Nominal
Ordinal examples: hardness of minerals, {good, better, best}, grades, street numbers
Ordinal operations: median, percentiles, rank correlation, run tests, sign tests
Interval
For interval attributes, the differences between values are meaningful, i.e., a unit of measurement exists. (+, -)
Ratio
For ratio variables, both differences and ratios are meaningful. (*, /)
Types of data sets
Record
– Data Matrix
– Document Data
– Transaction Data
Graph
– World Wide Web
– Molecular Structures
– ID has no limit but age has a maximum and minimum value
Measurement of Length
The way you measure an attribute may not match the attribute's properties.
– Distinctness: = ≠
– Order: < >
– Addition: + -
– Multiplication: * /

– Nominal attribute: distinctness
– Ordinal attribute: distinctness & order
– Interval attribute: distinctness, order & addition
– Ratio attribute: all 4 properties
Attribute Values
Attribute values are numbers or symbols assigned to an attribute
Distinction between attributes and attribute values
– Same attribute can be mapped to different attribute values
What is Data?
Collection of data objects and their attributes
Attributes
An attribute is a property or characteristic of an object
– Examples: eye color of a person, temperature, etc.
Continuous Attribute
– Has real numbers as attribute values
– Examples: temperature, height, or weight.
– Practically, real values can only be measured and represented using a finite number of digits.
– Continuous attributes are typically represented as floating-point variables.
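Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes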
– Nominal examples: ID numbers, eye color, zip codes
– Ordinal examples: rankings (e.g., taste of potato chips on a scale from 1-10), grades, height in {tall, medium, short}
– Interval examples: calendar dates, temperatures in Celsius or Fahrenheit
– Ratio examples: temperature in Kelvin, length, time, counts
Ratio: new_value = a * old_value
Example: length can be measured in meters or feet.
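A minimal sketch of the ratio-scale transformation, using the meters-to-feet case mentioned above. The conversion factor and the function name `meters_to_feet` are assumptions made for illustration. The point is that multiplying by a constant a leaves ratios between values unchanged.

```python
import math

FEET_PER_METER = 3.28084  # approximate conversion factor, playing the role of a


def meters_to_feet(meters: float) -> float:
    """Ratio-scale transformation: new_value = a * old_value."""
    return FEET_PER_METER * meters


short_m, long_m = 2.0, 4.0
ratio_in_meters = long_m / short_m
ratio_in_feet = meters_to_feet(long_m) / meters_to_feet(short_m)

# The longer object is twice as long in either unit.
print(math.isclose(ratio_in_meters, ratio_in_feet))
```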
Discrete and Continuous Attributes
Discrete Attribute
– Has only a finite or countably infinite set of values
– Examples: zip codes, counts, or the set of words in a collection of documents
– Often represented as integer variables.
– Note: binary attributes are a special case of discrete attributes
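The distinction between discrete and continuous attributes can be sketched with a few toy values. The documents and the temperature reading below are made up for illustration. Word counts are discrete and stored as integers, a word-presence flag is a binary attribute, and a temperature is continuous but recorded with a finite number of digits.

```python
from collections import Counter

# Hypothetical two-document collection.
documents = ["data mining", "data objects and attributes"]

# Discrete attribute: word counts are non-negative integers.
word_counts = Counter(word for doc in documents for word in doc.split())

# Binary attribute (special case of discrete): does each document contain "data"?
contains_data = [int("data" in doc.split()) for doc in documents]

# Continuous attribute: a real value, stored as a float with finite precision.
temperature_c = round(36.5678912, 2)

print(word_counts["data"], contains_data, temperature_c)
```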
– Object is also known as record, point, case, sample, entity, or instance
Ordinal: an order-preserving change of values, i.e., new_value = f(old_value), where f is a monotonic function.
Interval: new_value = a * old_value + b, where a and b are constants.
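The ordinal and interval transformations can be checked with a short sketch. The grade labels, the recoding function, and the sample temperatures are assumptions chosen for illustration. A monotonic recoding keeps the ordering of ordinal values, and the linear map a * old_value + b (here Celsius to Fahrenheit) keeps ratios of differences but not ratios of the values themselves.

```python
import math

# Ordinal: any monotonic recoding preserves the order of the values.
grade_rank = {"poor": 1, "fair": 2, "good": 3}
recoded = {grade: rank ** 2 + 10 for grade, rank in grade_rank.items()}
order_before = sorted(grade_rank, key=grade_rank.get)
order_after = sorted(recoded, key=recoded.get)


def celsius_to_fahrenheit(celsius: float) -> float:
    """Interval transformation: new_value = a * old_value + b with a=1.8, b=32."""
    return 1.8 * celsius + 32.0


c1, c2, c3 = 10.0, 20.0, 40.0
f1, f2, f3 = (celsius_to_fahrenheit(c) for c in (c1, c2, c3))

# Ratios of differences are the same on both scales.
diff_ratio_c = (c3 - c2) / (c2 - c1)
diff_ratio_f = (f3 - f2) / (f2 - f1)

# Ratios of the raw values are not preserved, so "twice as hot" is not meaningful.
value_ratio_c = c2 / c1
value_ratio_f = f2 / f1

print(order_before == order_after)
print(math.isclose(diff_ratio_c, diff_ratio_f))
print(math.isclose(value_ratio_c, value_ratio_f))
```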
– Ordinal
– Interval
– Ratio
Properties of Attribute Values
The type of an attribute depends on which of the following properties it possesses:
Ordered
– Spatial Data
– Temporal Data
– Sequential Data
– Genetic Sequence Data
Important Characteristics of Structured Data
Interval examples: calendar dates, temperature in Celsius or Fahrenheit
Interval operations: mean, standard deviation, Pearson's correlation, t and F tests
Ratio operations: geometric mean, harmonic mean, percent variation
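A few of the operations listed for each attribute type can be tried with Python's standard `statistics` module. The sample values are made up for illustration. The median is applied to ordinal ranks, the mean and standard deviation to interval temperatures, and the geometric and harmonic means to ratio-scaled lengths.

```python
import statistics

# Ordinal: ranks carry order only, so the median is a meaningful summary.
ranks = [1, 2, 2, 3, 5]
median_rank = statistics.median(ranks)

# Interval: differences are meaningful, so mean and standard deviation apply.
temperatures_c = [10.0, 20.0, 30.0]
mean_temperature = statistics.mean(temperatures_c)
stdev_temperature = statistics.stdev(temperatures_c)

# Ratio: ratios are meaningful, so geometric and harmonic means apply.
lengths_m = [1.0, 4.0, 16.0]
geometric = statistics.geometric_mean(lengths_m)
harmonic = statistics.harmonic_mean(lengths_m)

print(median_rank, mean_temperature, stdev_temperature, geometric, harmonic)
```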
– Attribute is also known as variable, field, characteristic, or feature
A collection of attributes describes an object
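One way to picture objects and attributes in code is a list of records, where each record is an object and each field is an attribute. The class name `TaxRecord` and its field names are hypothetical. The three rows are taken from the slide's example table as reconstructed from the extracted text, with income stored in thousands.

```python
from dataclasses import dataclass, fields


@dataclass
class TaxRecord:
    """One data object; each field is an attribute of that object."""
    tid: int
    refund: bool
    marital_status: str
    taxable_income_k: int  # taxable income in thousands
    cheat: bool


# A data set is a collection of data objects (records).
records = [
    TaxRecord(1, True, "Single", 125, False),
    TaxRecord(2, False, "Married", 100, False),
    TaxRecord(3, False, "Single", 70, False),
]

attribute_names = [field.name for field in fields(TaxRecord)]
print(attribute_names)
print(records[0].marital_status)
```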