-
Publication number: WO2011133705A3
Publication date: 2012-02-02
Application number: PCT/US2011033306
Filing date: 2011-04-20
Applicant: MICROSOFT CORP
Inventor: FUXMAN ARIEL , NGUYEN HOA , SILVA JULIANA FREIRE DE LIMA E , PAPARIZOS STELIOS , AGRAWAL RAKESH , CHEN ZHIMIN , COLAGIOVANNI LAWRENCE WILLIAM , SIKCHI PRAKASH
CPC classification number: G06F17/30386 , G06Q30/0281 , G06Q30/0603
Abstract: Methods and systems for automatically synthesizing product information from multiple data sources into an on-line catalog are disclosed, and in particular, for automatically synthesizing the product information based on attribute-value pairs. Information for a product may be obtained, via entity extraction, feed ingestion, and other mechanisms, from a plurality of structured and unstructured data sources having different taxonomies and schemas. Product information may additionally or alternatively be obtained or derived based on popularity data. The product information may be cleansed, segmented and normalized. The product information may be clustered so closest products, attribute names and attribute values are associated. A representative value for an attribute name may be determined, and the on-line catalog may be updated so that entries are comprehensive, meaningful and useful to a catalog user. Updates from at least 500 million different data sources may be scheduled to occur as frequently as several times daily.
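Illustrative sketch (not part of the patent record): the abstract describes normalizing attribute-value pairs from sources with different schemas and then determining a representative value for each attribute name. The Python below is one minimal way this could look. The synonym table, the function names, and the majority-vote rule for picking the representative value are all assumptions made for illustration; the abstract does not specify how the representative value is chosen.

```python
from collections import Counter, defaultdict

# Hypothetical synonym table mapping source-specific attribute names
# to one canonical name; a real system would learn or curate this.
ATTRIBUTE_SYNONYMS = {
    "colour": "color",
    "mfr": "brand",
    "manufacturer": "brand",
}


def normalize_name(name):
    """Lowercase, trim, and map an attribute name to its canonical form."""
    key = " ".join(name.lower().split())
    return ATTRIBUTE_SYNONYMS.get(key, key)


def normalize_value(value):
    """Lowercase and collapse whitespace so trivially different values match."""
    return " ".join(value.lower().split())


def synthesize(offers):
    """Merge attribute-value pairs from several sources into one entry.

    `offers` is a list of dicts, one per data source, all assumed to
    describe the same product. The representative value for each
    attribute is chosen by majority vote (an assumption of this sketch).
    """
    votes = defaultdict(Counter)
    for offer in offers:
        for name, value in offer.items():
            votes[normalize_name(name)][normalize_value(value)] += 1
    return {name: counts.most_common(1)[0][0] for name, counts in votes.items()}


# Three hypothetical sources describing the same product.
offers = [
    {"Colour": "Black", "Brand": "Acme"},
    {"color": "black ", "Mfr": "ACME"},
    {"Color": "Jet Black", "manufacturer": "Acme"},
]
entry = synthesize(offers)
print(entry)
```

In this sketch the clustering step is skipped: the offers are assumed to have already been matched to a single product, which is the part of the pipeline the abstract attributes to clustering.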
-
Publication number: AU2011242753B2
Publication date: 2014-05-15
Application number: AU2011242753
Filing date: 2011-04-20
Applicant: MICROSOFT CORP
Inventor: FUXMAN ARIEL , NGUYEN HOA , SILVA JULIANA FREIRE DE LIMA E , PAPARIZOS STELIOS , AGRAWAL RAKESH , CHEN ZHIMIN , COLAGIOVANNI LAWRENCE WILLIAM , SIKCHI PRAKASH
Abstract: Methods and systems for automatically synthesizing product information from multiple data sources into an on-line catalog are disclosed, and in particular, for automatically synthesizing the product information based on attribute-value pairs. Information for a product may be obtained, via entity extraction, feed ingestion, and other mechanisms, from a plurality of structured and unstructured data sources having different taxonomies and schemas. Product information may additionally or alternatively be obtained or derived based on popularity data. The product information may be cleansed, segmented and normalized. The product information may be clustered so closest products, attribute names and attribute values are associated. A representative value for an attribute name may be determined, and the on-line catalog may be updated so that entries are comprehensive, meaningful and useful to a catalog user. Updates from at least 500 million different data sources may be scheduled to occur as frequently as several times daily.
-
Publication number: AU2011242753A1
Publication date: 2012-09-27
Application number: AU2011242753
Filing date: 2011-04-20
Applicant: MICROSOFT CORP
Inventor: FUXMAN ARIEL , NGUYEN HOA , SILVA JULIANA FREIRE DE LIMA E , PAPARIZOS STELIOS , AGRAWAL RAKESH , CHEN ZHIMIN , COLAGIOVANNI LAWRENCE WILLIAM , SIKCHI PRAKASH
Abstract: Methods and systems for automatically synthesizing product information from multiple data sources into an on-line catalog are disclosed, and in particular, for automatically synthesizing the product information based on attribute-value pairs. Information for a product may be obtained, via entity extraction, feed ingestion, and other mechanisms, from a plurality of structured and unstructured data sources having different taxonomies and schemas. Product information may additionally or alternatively be obtained or derived based on popularity data. The product information may be cleansed, segmented and normalized. The product information may be clustered so closest products, attribute names and attribute values are associated. A representative value for an attribute name may be determined, and the on-line catalog may be updated so that entries are comprehensive, meaningful and useful to a catalog user. Updates from at least 500 million different data sources may be scheduled to occur as frequently as several times daily.
-
-