-
1.
Publication number: US20140280239A1
Publication date: 2014-09-18
Application number: US13962103
Filing date: 2013-08-08
Applicant: SAS Institute Inc.
Inventor: James Edward Georges, David Lee Kuhn, Edward Lew Rowe, John Michael Kichak, Karcsi Fritz Lehr
IPC: G06F17/30
CPC classification number: G06F16/2468, G06F21/6254
Abstract: A method of determining a similarity between records in a data set is provided. Data organized into a plurality of records is received. First characters associated with a field and a first record of the plurality of records are selected. The selected first characters are subdivided into a first sliding series of a defined number of characters. Second characters associated with the field and a second record of the plurality of records are selected. The selected second characters are subdivided into a second sliding series of the defined number of characters. A similarity score between the first sliding series and the second sliding series is calculated. Whether or not the first sliding series and the second sliding series are similar is determined based on the calculated similarity score.
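The abstract above describes splitting a field's characters into a sliding series of a fixed width and scoring two series against each other. A minimal sketch of that idea follows. The abstract does not say how the score is computed, so the Dice coefficient, the window width of 2, the 0.5 threshold, and all function names here are assumptions chosen for illustration, not the patented method.

```python
def sliding_series(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping windows of n characters."""
    if len(text) < n:
        # Too short for a full window: keep the text as a single item.
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def dice_score(first: list[str], second: list[str]) -> float:
    """Dice coefficient over the two sets of windows (assumed scoring rule)."""
    set_a, set_b = set(first), set(second)
    if not set_a and not set_b:
        return 1.0  # two empty fields are treated as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


def is_similar(first: str, second: str, n: int = 2, threshold: float = 0.5) -> bool:
    """Decide similarity by comparing the score to a hypothetical threshold."""
    score = dice_score(sliding_series(first, n), sliding_series(second, n))
    return score >= threshold
```

With these assumptions, `is_similar("smith", "smyth")` compares the windows `sm, mi, it, th` with `sm, my, yt, th`; two of the four windows in each series are shared.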
-
2.
Publication number: US09483477B2
Publication date: 2016-11-01
Application number: US14868666
Filing date: 2015-09-29
Applicant: SAS Institute Inc.
Inventor: Leslie Madonna Francis, Brian Oneal Miles, Shrividya Sastry, David Lee Kuhn
IPC: G06F17/30
CPC classification number: G06F17/30076, G06F17/301, G06F17/30106
Abstract: In a system automatically processing data from a first computing device for use on a second computing device, a registry file including a plurality of filename parameters is read. Each filename parameter identifies a matching filename pattern, an extract script indicator, and a read file indicator. The extract script indicator indicates an extract script for a file having a filename that matches the matching filename pattern. The read file indicator indicates how to read the file having the filename that matches the matching filename pattern. One parameter of the plurality of filename parameters is selected by matching a filename of a source file to the matching filename pattern of the one parameter. The associated extract script is selected and used to read data from the source file using the associated read file indicator and the read data is output to a different file and in a different format.
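The registry lookup in the abstract above (each filename parameter carries a matching pattern, an extract script indicator, and a read file indicator) can be pictured with a small sketch. The abstract does not give the registry format or the pattern syntax, so the shell-style wildcards, the field names, and the sample entries below are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class FilenameParameter:
    pattern: str         # matching filename pattern (shell-style wildcard, assumed)
    extract_script: str  # extract script indicator (hypothetical script name)
    read_mode: str       # read file indicator (hypothetical mode label)


# Hypothetical registry contents, standing in for a parsed registry file.
REGISTRY = [
    FilenameParameter("sales_*.csv", "extract_sales", "delimited"),
    FilenameParameter("*.dat", "extract_fixed", "fixed_width"),
]


def select_parameter(
    filename: str, registry: list[FilenameParameter]
) -> Optional[FilenameParameter]:
    """Return the first registry entry whose pattern matches the filename."""
    for param in registry:
        if fnmatchcase(filename, param.pattern):
            return param
    return None  # no entry matches this source file
```

In this sketch the first matching entry wins, which is one possible design choice; the selected entry's `extract_script` and `read_mode` would then drive how the source file is read and converted.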
-
3.
Publication number: US20140280343A1
Publication date: 2014-09-18
Application number: US14016689
Filing date: 2013-09-03
Applicant: SAS Institute Inc.
Inventor: James Edward Georges, David Lee Kuhn, Edward Lew Rowe, John Michael Kichak, Karcsi Fritz Lehr
IPC: G06F17/30
CPC classification number: G06F16/2468, G06F21/6254
Abstract: A method of determining a similarity between records in a data set is provided. Data organized into a plurality of records is received. First characters associated with a field and a first record of the plurality of records are selected. The selected first characters are encoded and subdivided into a first sliding series of a defined number of characters. Second characters associated with the field and a second record of the plurality of records are selected. The selected second characters are encoded and subdivided into a second sliding series of the defined number of characters. Whether or not the first sliding series and the second sliding series are similar is determined by comparing the encoded and subdivided first characters to the encoded and subdivided second characters using a fuzzy matching algorithm.
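The abstract above adds an encoding step before the characters are subdivided, and compares the results with a fuzzy matching algorithm. The sketch below illustrates that ordering only. The abstract names neither the encoding nor the matcher, so the letter-shift encoding, the use of Python's `difflib.SequenceMatcher`, the window width, and the threshold are stand-in assumptions.

```python
from difflib import SequenceMatcher


def encode(text: str, shift: int = 3) -> str:
    """Toy encoding (assumed): shift lowercase letters, drop everything else."""
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    return "".join(chr((ord(c) - ord("a") + shift) % 26 + ord("a")) for c in letters)


def sliding_series(text: str, n: int = 2) -> list[str]:
    """Split text into overlapping windows of n characters."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def fuzzy_ratio(first: str, second: str, n: int = 2) -> float:
    """Encode, subdivide, then compare the two series with SequenceMatcher."""
    series_a = sliding_series(encode(first), n)
    series_b = sliding_series(encode(second), n)
    return SequenceMatcher(None, series_a, series_b).ratio()


def is_similar(first: str, second: str, threshold: float = 0.5) -> bool:
    """Compare the fuzzy ratio to a hypothetical threshold."""
    return fuzzy_ratio(first, second) >= threshold
```

Because the comparison runs on encoded windows rather than on the raw field values, the matcher never sees the original characters, which is the property this sketch is meant to show.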
-
4.
Publication number: US09971779B2
Publication date: 2018-05-15
Application number: US15333333
Filing date: 2016-10-25
Applicant: SAS Institute Inc.
Inventor: Leslie Madonna Francis, Brian Oneal Miles, Shrividya Sastry, David Lee Kuhn
IPC: G06F17/30
CPC classification number: G06F17/30076, G06F17/301, G06F17/30106
Abstract: In a system automatically processing data from a first computing device for use on a second computing device, a registry file including a plurality of filename parameters is read. Each filename parameter identifies a matching filename pattern, an extract script indicator, and a read file indicator. The extract script indicator indicates an extract script for a file having a filename that matches the matching filename pattern. The read file indicator indicates how to read the file having the filename that matches the matching filename pattern. One parameter of the plurality of filename parameters is selected by matching a filename of a source file to the matching filename pattern of the one parameter. The associated extract script is selected and used to read data from the source file using the associated read file indicator and the read data is output to a different file and in a different format.
-
5.
Publication number: US20170039202A1
Publication date: 2017-02-09
Application number: US15333333
Filing date: 2016-10-25
Applicant: SAS Institute Inc.
Inventor: Leslie Madonna Francis, Brian Oneal Miles, Shrividya Sastry, David Lee Kuhn
IPC: G06F17/30
CPC classification number: G06F17/30076, G06F17/301, G06F17/30106
Abstract: In a system automatically processing data from a first computing device for use on a second computing device, a registry file including a plurality of filename parameters is read. Each filename parameter identifies a matching filename pattern, an extract script indicator, and a read file indicator. The extract script indicator indicates an extract script for a file having a filename that matches the matching filename pattern. The read file indicator indicates how to read the file having the filename that matches the matching filename pattern. One parameter of the plurality of filename parameters is selected by matching a filename of a source file to the matching filename pattern of the one parameter. The associated extract script is selected and used to read data from the source file using the associated read file indicator and the read data is output to a different file and in a different format.
-
6.
Publication number: US20160210297A1
Publication date: 2016-07-21
Application number: US14868666
Filing date: 2015-09-29
Applicant: SAS Institute Inc.
Inventor: Leslie Madonna Francis, Brian Oneal Miles, Shrividya Sastry, David Lee Kuhn
IPC: G06F17/30
CPC classification number: G06F17/30076, G06F17/301, G06F17/30106
Abstract: In a system automatically processing data from a first computing device for use on a second computing device, a registry file including a plurality of filename parameters is read. Each filename parameter identifies a matching filename pattern, an extract script indicator, and a read file indicator. The extract script indicator indicates an extract script for a file having a filename that matches the matching filename pattern. The read file indicator indicates how to read the file having the filename that matches the matching filename pattern. One parameter of the plurality of filename parameters is selected by matching a filename of a source file to the matching filename pattern of the one parameter. The associated extract script is selected and used to read data from the source file using the associated read file indicator and the read data is output to a different file and in a different format.