Package unifeat.dataset
Class DatasetInfo
java.lang.Object
unifeat.dataset.DatasetInfo
This Java class stores the input data, splits it into train/test sets, and creates the CSV file format.
- Author:
- Sina Tabakhi
-
Field Summary
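Fields
Modifier and Type      Field
private double[][]     allData
private String         allNameFeatures
private boolean        checkClassLabels
private boolean        checkDataSet
private boolean        checkSamplesClass
private boolean        checkTrainTestSet
private String[]       Classlabel
private String[]       nameFeatures
private int            numClass
private int            numData
private int            numFeature
private int            numTestSet
private int            numTrainSet
private String[]       originalData
private String         pathData
private String         pathLabel
private String         pathTestSet
private Random         rand
private double[][]     testSet
private double[][]     trainSet
-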
Constructor Summary
Constructors
DatasetInfo()
-
Method Summary
Modifier and Type, Method, and Description
createFeatNames(int[] array)
  This method creates a string of the names of the features in the selected feature array.
String[] getClassLabel()
  This is used to return the names of the class labels.
String[] getNameFeatures()
  This is used to return the names of the features.
int getNumClass()
  This is used to return the number of classes in the dataset.
int getNumData()
  This is used to return the number of samples in the dataset (train set + test set).
int getNumFeature()
  This is used to return the number of features in each sample.
int getNumTestSet()
  This is used to return the number of samples in the test set.
int getNumTrainSet()
  This is used to return the number of samples in the train set.
double[][] getTestSet()
  This is used to return the test set values.
double[][] getTrainSet()
  This is used to return the train set values.
private int indexClass(String nameClass)
  This method converts the String name of a class label to its integer value.
private void init(String path1, String path2)
  This method sets the dataset and the class labels of the samples.
private void init(String path1, String path2, String path3)
  This method reads and sets the train/test sets and the class labels of the samples.
boolean isCompatibleTrainTestSet()
  This is used to return the status of the train/test sets.
boolean isCorrectClassLabel()
  This is used to return the status of the class label file.
boolean isCorrectDataset()
  This is used to return the status of the dataset.
boolean isCorrectSamplesClass()
  This is used to return the status of the samples' class.
private boolean isCorrectString(String str)
  This method checks that the input string does not contain the semicolon (;) or tab characters.
void preProcessing(String path1, String path2)
  This is used to read the dataset and class label files, split the dataset, and set their values.
void preProcessing(String path1, String path2, String path3)
  This is used to read the datasets and class labels, split the datasets, and set their values.
private void readExample1()
  This method reads the dataset.
private void readExample2()
  This method reads the train/test sets.
private void readLabel()
  This method reads the class labels of the samples.
private void splitDataSetToTrainAndTest1()
  This method randomly splits the input dataset into the train/test sets. 2/3 of the dataset is used as the train set and 1/3 as the test set.
private void splitDataSetToTrainAndTest2()
  This method sets the train/test sets. The train and test sets have previously been split by the user.
private void splitFeatureWithLabel()
  This method converts the string values of the input data to double values and sets them.
-
Field Details
-
numData
private int numData -
numTrainSet
private int numTrainSet -
numTestSet
private int numTestSet -
numFeature
private int numFeature -
numClass
private int numClass -
checkDataSet
private boolean checkDataSet -
checkTrainTestSet
private boolean checkTrainTestSet -
checkClassLabels
private boolean checkClassLabels -
checkSamplesClass
private boolean checkSamplesClass -
Classlabel
-
allNameFeatures
-
nameFeatures
-
pathData
-
pathTestSet
-
pathLabel
-
originalData
-
allData
private double[][] allData -
trainSet
private double[][] trainSet -
testSet
private double[][] testSet -
rand
-
-
Constructor Details
-
DatasetInfo
public DatasetInfo()
-
-
Method Details
-
isCorrectString
private boolean isCorrectString(String str)
This method checks that the input string does not contain the semicolon (;) or tab characters. It also checks that the input string contains the comma character.
- Parameters:
- str - the input string
- Returns:
- true if the input string is in the correct format
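The check described for isCorrectString can be sketched in a few lines. This is a minimal stand-alone illustration, not the UniFeat source: the class name LineFormatCheck is hypothetical, and the method is made static here only so that it can be compiled on its own.

```java
// Hypothetical stand-alone sketch of the format check; not the UniFeat implementation.
final class LineFormatCheck {
    private LineFormatCheck() {}

    /**
     * Returns true if the string contains no semicolon and no tab character,
     * and contains at least one comma.
     */
    static boolean isCorrectString(String str) {
        boolean hasForbiddenChar = str.indexOf(';') >= 0 || str.indexOf('\t') >= 0;
        boolean hasComma = str.indexOf(',') >= 0;
        return !hasForbiddenChar && hasComma;
    }
}
```

Under these assumptions, a line such as "5.1,3.5,setosa" is accepted, while the same values separated by semicolons, tabs, or spaces are rejected.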
-
init
private void init(String path1, String path2)
This method sets the dataset and the class labels of the samples.
- Parameters:
- path1 - the path of the dataset file
- path2 - the path of the class label file
-
init
private void init(String path1, String path2, String path3)
This method reads and sets the train/test sets and the class labels of the samples.
- Parameters:
- path1 - the path of the train set file
- path2 - the path of the test set file
- path3 - the path of the class labels of the samples
-
readExample1
private void readExample1()
This method reads the dataset.
-
readExample2
private void readExample2()
This method reads the train/test sets.
-
readLabel
private void readLabel()
This method reads the class labels of the samples.
-
indexClass
private int indexClass(String nameClass)
This method converts the String name of a class label to its integer value.
- Parameters:
- nameClass - the name of the class label as a String
- Returns:
- the index of the class
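A label-to-index conversion like the one indexClass describes could look like the sketch below. It rests on assumptions: the class labels are passed in as an array (in DatasetInfo they presumably come from a field), the lookup is a plain linear search, and -1 is returned for an unknown label. The class name ClassIndexSketch is hypothetical.

```java
// Hypothetical sketch of mapping a class label name to its index; not the UniFeat source.
final class ClassIndexSketch {
    private ClassIndexSketch() {}

    /**
     * Returns the position of nameClass in classLabels,
     * or -1 if the label is not present (assumed sentinel value).
     */
    static int indexClass(String[] classLabels, String nameClass) {
        for (int i = 0; i < classLabels.length; i++) {
            if (classLabels[i].equals(nameClass)) {
                return i;
            }
        }
        return -1;
    }
}
```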
-
splitFeatureWithLabel
private void splitFeatureWithLabel()
This method converts the string values of the input data to double values and sets them.
-
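To illustrate the kind of conversion splitFeatureWithLabel describes, the sketch below parses one comma-separated sample into a double array. The row layout is an assumption, not something this page states: the feature values come first and the last field is the class label, which is stored as its index in the label array. The names RowParserSketch, parseRow, and indexOfLabel are hypothetical.

```java
// Hypothetical sketch of converting one text row to double values; not the UniFeat source.
final class RowParserSketch {
    private RowParserSketch() {}

    /**
     * Parses a comma-separated row. All fields except the last are read as doubles;
     * the last field is assumed to be a class label and is replaced by its index.
     */
    static double[] parseRow(String line, String[] classLabels) {
        String[] fields = line.split(",");
        int last = fields.length - 1;
        double[] row = new double[fields.length];
        for (int i = 0; i < last; i++) {
            row[i] = Double.parseDouble(fields[i].trim());
        }
        row[last] = indexOfLabel(classLabels, fields[last].trim());
        return row;
    }

    /** Linear search for the label; throws if the label is unknown. */
    private static int indexOfLabel(String[] classLabels, String name) {
        for (int i = 0; i < classLabels.length; i++) {
            if (classLabels[i].equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Unknown class label: " + name);
    }
}
```

Keeping the label as a numeric index lets features and class share one double[][] matrix, which matches the matrix return types of getTrainSet and getTestSet, although the real storage layout may differ.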
splitDataSetToTrainAndTest1
private void splitDataSetToTrainAndTest1()
This method randomly splits the input dataset into the train/test sets. 2/3 of the dataset is used as the train set and 1/3 as the test set.
-
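The 2/3 and 1/3 random split described for splitDataSetToTrainAndTest1 can be sketched as a shuffle followed by a cut. This is one possible approach under stated assumptions, not the UniFeat implementation: the rows are shuffled with a Fisher-Yates pass, the train size is rounded down, and the real method may select samples differently (for example, per class). The class name TrainTestSplitSketch and the method split are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of a random 2/3 - 1/3 split; not the UniFeat source.
final class TrainTestSplitSketch {
    private TrainTestSplitSketch() {}

    /**
     * Shuffles the rows and returns {trainSet, testSet}.
     * The first floor(2n/3) shuffled rows form the train set, the rest the test set.
     */
    static double[][][] split(double[][] data, Random rand) {
        int n = data.length;
        double[][] shuffled = data.clone(); // shallow copy: the input row order is left untouched

        // Fisher-Yates shuffle over the row references.
        for (int i = n - 1; i > 0; i--) {
            int j = rand.nextInt(i + 1);
            double[] tmp = shuffled[i];
            shuffled[i] = shuffled[j];
            shuffled[j] = tmp;
        }

        int numTrain = (2 * n) / 3; // rounding down is an assumption
        double[][] train = Arrays.copyOfRange(shuffled, 0, numTrain);
        double[][] test = Arrays.copyOfRange(shuffled, numTrain, n);
        return new double[][][] {train, test};
    }
}
```

Passing the Random instance in makes the split reproducible when a fixed seed is used.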
splitDataSetToTrainAndTest2
private void splitDataSetToTrainAndTest2()
This method sets the train/test sets. The train and test sets have previously been split by the user.
-
preProcessing
public void preProcessing(String path1, String path2)
This is used to read the dataset and class label files, split the dataset, and set their values.
- Parameters:
- path1 - the path of the dataset
- path2 - the path of the class labels
-
preProcessing
public void preProcessing(String path1, String path2, String path3)
This is used to read the datasets and class labels, split the datasets, and set their values.
- Parameters:
- path1 - the path of the train set
- path2 - the path of the test set
- path3 - the path of the class labels
-
isCorrectDataset
public boolean isCorrectDataset()
This is used to return the status of the dataset.
- Returns:
- true if the dataset file is in the correct format
-
isCorrectClassLabel
public boolean isCorrectClassLabel()
This is used to return the status of the class label file.
- Returns:
- true if the class labels file is in the correct format
-
isCorrectSamplesClass
public boolean isCorrectSamplesClass()
This is used to return the status of the samples' class.
- Returns:
- true if the class labels of the samples are valid
-
isCompatibleTrainTestSet
public boolean isCompatibleTrainTestSet()
This is used to return the status of the train/test sets.
- Returns:
- true if the train and test sets are compatible
-
getNumData
public int getNumData()
This is used to return the number of samples in the dataset (train set + test set).
- Returns:
- number of samples
-
getNumFeature
public int getNumFeature()
This is used to return the number of features in each sample.
- Returns:
- number of features
-
getNumTrainSet
public int getNumTrainSet()
This is used to return the number of samples in the train set.
- Returns:
- number of samples in the train set
-
getNumTestSet
public int getNumTestSet()
This is used to return the number of samples in the test set.
- Returns:
- number of samples in the test set
-
getNumClass
public int getNumClass()
This is used to return the number of classes in the dataset.
- Returns:
- number of classes in the dataset
-
getClassLabel
public String[] getClassLabel()
This is used to return the names of the class labels.
- Returns:
- the array of class labels' names
-
getTrainSet
public double[][] getTrainSet()
This is used to return the train set values.
- Returns:
- the matrix of train set
-
getTestSet
public double[][] getTestSet()
This is used to return the test set values.
- Returns:
- the matrix of test set
-
getNameFeatures
public String[] getNameFeatures()
This is used to return the names of the features.
- Returns:
- the array of features' names
-
createFeatNames
This method creates a string of the names of the features in the selected feature array.
- Parameters:
- array - the array of indices of the selected features
- Returns:
- a string of the integer array
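As an illustration of what createFeatNames describes, the sketch below turns an array of selected feature indices into one string of feature names. Several details are assumptions, since this page does not state them: the indices are zero-based, the names are joined with a comma and a space, and the feature names are passed in as a parameter rather than read from a field. The class name FeatNamesSketch is hypothetical.

```java
// Hypothetical sketch of building a feature-name string from selected indices; not the UniFeat source.
final class FeatNamesSketch {
    private FeatNamesSketch() {}

    /**
     * Joins the names of the selected features, in the order given by array.
     * The ", " separator and zero-based indexing are assumptions.
     */
    static String createFeatNames(String[] nameFeatures, int[] array) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < array.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(nameFeatures[array[i]]);
        }
        return sb.toString();
    }
}
```

With feature names {"f1", "f2", "f3", "f4"} and selected indices {2, 0}, this sketch returns "f3, f1".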
-