Abstract:
An improved method for handling data sets (12, 14) is disclosed. The method comprises the steps of: providing a first characteristic (20.1) associated with a first data set (12) and at least one of the following: a single data value (12') and a second characteristic (20.2) associated with a second data set (14), the provided characteristics allowing feasible comparison of the first data set (12), the second data set (14) and the single data value (12'); and calculating at least one of the following: similarity of the first data set (12) with the second data set (14) based on the first and second characteristics (20.1, 20.2), similarity of the first data set (12) with the single data value (12') based on the first characteristic (20.1) and the single data value (12'), confidence indicating how well the first characteristic (20.1) reflects properties of the first data set (12) based on the first characteristic (20.1), and confidence indicating how well the similarity of the first data set (12) with the single data value (12') reflects properties of the single data value (12') based on the first characteristic (20.1) and the single data value (12').
Abstract:
Data quality monitoring refers to measuring the data quality of loaded data with respect to a predefined data quality metric. The data quality is measured by applying a logical calculus defined in quality rules to the loaded data. The data quality measurement is performed using at least one of the following: delta changes of the loaded data and delta changes of the quality rules.
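The idea of measuring quality from delta changes of the loaded data and delta changes of the quality rules can be illustrated with a minimal sketch. Everything named here (`QualityMonitor`, `apply_data_delta`, `apply_rule_delta`, `score`, and the choice of "share of rows passing a rule" as the metric) is a hypothetical illustration and is not taken from the abstract.

```python
class QualityMonitor:
    """Hypothetical sketch: keeps per-rule violation sets and updates them from deltas."""

    def __init__(self, rules):
        self.rules = dict(rules)   # rule name -> predicate(row) returning True if the row passes
        self.rows = {}             # row id -> row (the loaded data)
        self.violations = {name: set() for name in self.rules}

    def apply_data_delta(self, upserts, deletes=()):
        """Re-evaluate the rules only for rows that changed."""
        for row_id in deletes:
            self.rows.pop(row_id, None)
            for ids in self.violations.values():
                ids.discard(row_id)
        for row_id, row in upserts.items():
            self.rows[row_id] = row
            for name, rule in self.rules.items():
                if rule(row):
                    self.violations[name].discard(row_id)
                else:
                    self.violations[name].add(row_id)

    def apply_rule_delta(self, name, rule):
        """Evaluate only the added or changed rule against the loaded rows."""
        self.rules[name] = rule
        self.violations[name] = {i for i, r in self.rows.items() if not rule(r)}

    def score(self, name):
        """Assumed metric: share of loaded rows that satisfy the rule."""
        if not self.rows:
            return 1.0
        return 1 - len(self.violations[name]) / len(self.rows)


monitor = QualityMonitor({"age_nonneg": lambda r: r["age"] >= 0})
monitor.apply_data_delta({1: {"age": 30}, 2: {"age": -5}})
first_score = monitor.score("age_nonneg")
monitor.apply_data_delta({2: {"age": 5}})           # data delta: only row 2 is re-checked
second_score = monitor.score("age_nonneg")
monitor.apply_rule_delta("age_adult", lambda r: r["age"] >= 18)  # rule delta
third_score = monitor.score("age_adult")
```

In this sketch a data delta touches only the changed rows and a rule delta touches only the changed rule, so neither forces a full re-evaluation of every rule on every row.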
Abstract:
A data processing system for: receiving an analysis request comprising multiple data analysis commands to generate an analysis report; dividing the commands into private analysis commands and public analysis commands; sending the private analysis commands to a trusted distributed file system; sending a portion of the public analysis commands to a public distributed file system; sending the remainder of the public analysis commands to the trusted distributed file system; and generating the analysis report using public analysis results from the public distributed file system and trusted analysis results from the trusted distributed file system.
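The routing of commands described above might be sketched as follows. The `Command` type, the `private` flag, and the `public_share` parameter that decides how many public commands go to the public system are assumptions made for illustration; the abstract does not say how the portion is chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    private: bool  # hypothetical flag marking a private analysis command


def route(commands, public_share=0.5):
    """Split commands into those sent to the trusted and to the public system."""
    private = [c for c in commands if c.private]
    public = [c for c in commands if not c.private]
    cut = int(len(public) * public_share)   # assumed policy: a fixed share goes public
    to_public = public[:cut]
    to_trusted = private + public[cut:]     # private commands plus the public remainder
    return to_trusted, to_public


commands = [
    Command("salary_sum", private=True),
    Command("page_views", private=False),
    Command("click_count", private=False),
]
to_trusted, to_public = route(commands, public_share=0.5)
```

The report would then be assembled from the results of both lists; that merge step is omitted here because the abstract gives no detail about it.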
Abstract:
A method for a logging process in a data storage system (10C) including a set of storage tiers (115), each storage tier of the set of storage tiers (115) having different performance characteristics (e.g. error rate, communication rate, power consumption, delay time). The set of storage tiers (115) is divided into subsets (115A, 115B, 115C, 115D) using the performance characteristics. The logging process is initialized for creating a separate log file (121A, 121B, 121C, 121D) for each of the subsets of storage tiers (115A, 115B, 115C, 115D) for maintaining a history of data changes in the subset of storage tiers, thereby creating a plurality of log files (121); in response to a change in data stored in at least one storage tier of a subset of storage tiers (115), generating one or more log records comprising information about the change, and writing the one or more log records into the respective log files (121A, 121B, 121C, 121D). Such log files may be used during backup and restoration.
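A minimal sketch of per-subset logging, assuming delay time is the performance characteristic used for grouping and that logs are kept in memory rather than in files. The names `group_tiers`, `TieredLogger`, and `record_change`, as well as the "fast"/"slow" labels and the threshold, are hypothetical.

```python
from collections import defaultdict


def group_tiers(tiers, threshold_ms):
    """Divide tiers into subsets by one performance characteristic (delay time here)."""
    subsets = defaultdict(list)
    for name, delay_ms in tiers.items():
        label = "fast" if delay_ms <= threshold_ms else "slow"
        subsets[label].append(name)
    return dict(subsets)


class TieredLogger:
    """Keeps one separate log per subset of storage tiers."""

    def __init__(self, subsets):
        self.tier_to_subset = {t: s for s, ts in subsets.items() for t in ts}
        self.logs = {s: [] for s in subsets}  # in-memory stand-in for the log files

    def record_change(self, tier, key, old, new):
        """Write a log record about a data change into the log of the tier's subset."""
        subset = self.tier_to_subset[tier]
        self.logs[subset].append({"tier": tier, "key": key, "old": old, "new": new})


subsets = group_tiers({"ssd": 0.1, "hdd": 8, "tape": 5000}, threshold_ms=10)
logger = TieredLogger(subsets)
logger.record_change("tape", "blk7", None, "v1")
logger.record_change("ssd", "blk1", "v1", "v2")
```

Keeping the logs separate means a restore of the slow subset only has to replay the records in its own log.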
Abstract:
A computer-implemented method for detecting one or more multi-column composite key column sets, the method comprising: accessing (102) a plurality of first columns (P1-P3); selecting (104) two or more of the first columns for use as a current set (218) of candidate columns; determining (106), by comparing object identifiers stored in association with parameter values of the candidate columns with each other, if for the current set of candidate columns at least one tuple (219) of parameter values exists whose parameter values are respectively stored in association with two or more shared ones of the object identifiers; in case said at least one tuple does not exist, identifying (110) the current candidate column set as a multi-column composite key column set; otherwise, replacing (112) the second candidate column by another selected one of the first columns or adding said other selected one of the first columns to the candidate column set.
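The key test can be illustrated with a small sketch: a candidate column set is a composite key when no tuple of its values is shared by two or more objects. This is a brute-force row scan, not the object-identifier comparison of the abstract, and the function names, the `max_size` limit, and the sample rows are assumptions.

```python
from itertools import combinations


def has_shared_tuple(rows, columns):
    """True if some tuple of values over `columns` belongs to two or more rows."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return True
        seen.add(key)
    return False


def find_composite_keys(rows, columns, max_size=3):
    """Try candidate sets of growing size and keep the minimal ones that are keys."""
    keys = []
    for size in range(2, max_size + 1):
        for candidate in combinations(columns, size):
            # A superset of a key that was already found adds nothing new.
            if any(set(k) <= set(candidate) for k in keys):
                continue
            if not has_shared_tuple(rows, candidate):
                keys.append(candidate)
    return keys


rows = [
    {"P1": "a", "P2": 1, "P3": "x"},
    {"P1": "a", "P2": 2, "P3": "x"},
    {"P1": "b", "P2": 1, "P3": "y"},
    {"P1": "b", "P2": 2, "P3": "x"},
]
keys = find_composite_keys(rows, ["P1", "P2", "P3"])
print(keys)  # [('P1', 'P2')]
```

Here (P1, P3) and (P2, P3) are rejected because a value tuple repeats, while (P1, P2) has no shared tuple and is reported as a composite key.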
Abstract:
A method for accessing a set of data tables in a source database (117), the method comprising: providing a set of table categories for tables in the source database; providing a set of metrics (such as read access rates, number of records, number of primary keys), each metric comprising a respective characteristic metric for each table category; for each table of the set of data tables, evaluating the set of metrics, analyzing the evaluated set of metrics, and categorizing the table into one of the set of table categories using the result of the analysis; outputting information indicative of the table category of each table of the set of data tables; in response to the outputting, receiving a request to select data tables of the set of data tables according to a part of the table categories for data processing; and selecting a subset of data tables of the set of data tables using the table categories for performing the data processing (e.g. ETL or data mining) on the subset of data tables.
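One way to picture the categorization step is a nearest-profile match: each category has a characteristic value per metric, and a table is assigned to the category whose profile is closest to its evaluated metrics. The category names, the normalized metric values, and the squared-distance rule below are assumptions for illustration; the abstract does not specify how the analysis is done.

```python
# Hypothetical characteristic metric values per table category (normalized to 0..1).
CATEGORY_PROFILES = {
    "master_data": {"read_rate": 0.9, "record_count": 0.2},
    "transactional": {"read_rate": 0.4, "record_count": 0.9},
}


def categorize(metrics, profiles):
    """Assign the category whose characteristic metrics are closest (squared distance)."""
    def distance(profile):
        return sum((metrics[m] - profile[m]) ** 2 for m in profile)
    return min(profiles, key=lambda c: distance(profiles[c]))


def select_tables(table_metrics, profiles, wanted):
    """Categorize every table, then keep the tables in the requested categories."""
    categories = {t: categorize(m, profiles) for t, m in table_metrics.items()}
    return sorted(t for t, c in categories.items() if c in wanted)


table_metrics = {
    "CUSTOMER": {"read_rate": 0.8, "record_count": 0.1},
    "ORDERS": {"read_rate": 0.5, "record_count": 0.95},
}
selected = select_tables(table_metrics, CATEGORY_PROFILES, wanted={"transactional"})
```

The selected subset would then be handed to the downstream processing, for example an ETL job, instead of the full set of tables.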
Abstract:
A method for managing backups comprises the provision of a computer system with main memory and a plurality of logical partitions (LPARs), each assigned a respective first portion of memory, and each with at least one application consuming a fraction of the first memory portion. A second portion of memory is used as global memory, not overlapping with the first portions, and for each LPAR is used to store images of the first memory portion consumed by the application on the logical partition. The application may be a database management program, whilst images may be created by copy-on-write, split-mirror or redirect-on-write. The image may be a complete image of the assigned first memory portion. Memory elements may be dynamically reallocated to resize the global memory and/or a first memory portion; and sub-portions of global memory may be dynamically resized according to requirement predictions.
Abstract:
The invention relates to a computer-implemented method for detecting one or more multi-column composite key column sets, the method comprising: a) accessing (102) a plurality of first columns (P1 to P3); b) selecting (104) two or more of the first columns for use as a current set (218) of candidate columns; c) determining (106), by comparing with each other object identifiers stored in association with parameter values of the candidate columns, whether for the current set of candidate columns at least one tuple (219) of parameter values exists whose parameter values are respectively stored in association with two or more shared ones of the object identifiers; d1) if said at least one tuple does not exist, identifying (110) the current candidate column set as a multi-column composite key column set; d2) otherwise, replacing (112) the second candidate column by another selected one of the first columns or adding said other selected one of the first columns to the candidate column set.