Package unifeat.dataset
Class DatasetInfo
java.lang.Object
unifeat.dataset.DatasetInfo
This Java class holds the input data, splits the input data into train/test
sets, and creates the CSV file format.
- Author:
- Sina Tabakhi
-
Field Summary
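Modifier and Type     Field
private double[][]    allData
private String        allNameFeatures
private boolean       checkClassLabels
private boolean       checkDataSet
private boolean       checkSamplesClass
private boolean       checkTrainTestSet
private String[]      Classlabel
private String[]      nameFeatures
private int           numClass
private int           numData
private int           numFeature
private int           numTestSet
private int           numTrainSet
private String[]      originalData
private String        pathData
private String        pathLabel
private String        pathTestSet
private Random        rand
private double[][]    testSet
private double[][]    trainSet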
-
Constructor Summary
-
Method Summary
Modifier and Type    Method
    Description
String    createFeatNames(int[] array)
    This method creates a string of the names of the features in the selected feature array.
String[]    getClassLabel()
    Returns the names of the class labels.
String[]    getNameFeatures()
    Returns the names of the features.
int    getNumClass()
    Returns the number of classes in the dataset.
int    getNumData()
    Returns the number of samples in the dataset (train set + test set).
int    getNumFeature()
    Returns the number of features in each sample.
int    getNumTestSet()
    Returns the number of samples in the test set.
int    getNumTrainSet()
    Returns the number of samples in the train set.
double[][]    getTestSet()
    Returns the test set values.
double[][]    getTrainSet()
    Returns the train set values.
private int    indexClass(String nameClass)
    This method converts the String name of a class label to its integer value.
private void    init(String path1, String path2)
    This method sets the dataset and the class labels of the samples.
private void    init(String path1, String path2, String path3)
    This method reads and sets the train/test sets and the class labels of the samples.
boolean    isCompatibleTrainTestSet()
    Returns the status of the train/test sets.
boolean    isCorrectClassLabel()
    Returns the status of the class label file.
boolean    isCorrectDataset()
    Returns the status of the dataset.
boolean    isCorrectSamplesClass()
    Returns the status of the samples' class.
private boolean    isCorrectString(String str)
    This method checks that the input string does not contain semicolon (;) or tab characters.
void    preProcessing(String path1, String path2)
    Reads the dataset and class label files, splits the dataset, and sets the resulting values.
void    preProcessing(String path1, String path2, String path3)
    Reads the train/test sets and class labels and sets their values.
private void    readExample1()
    This method reads the dataset.
private void    readExample2()
    This method reads the train/test sets.
private void    readLabel()
    This method reads the class labels of the samples.
private void    splitDataSetToTrainAndTest1()
    This method randomly splits the input dataset into train/test sets: 2/3 of the dataset is used as the train set and 1/3 as the test set.
private void    splitDataSetToTrainAndTest2()
    This method sets the train/test sets when they have already been split by the user.
private void    splitFeatureWithLabel()
    This method converts the string values of the input data to double values and stores them.
-
Field Details
-
numData
private int numData
-
numTrainSet
private int numTrainSet
-
numTestSet
private int numTestSet
-
numFeature
private int numFeature
-
numClass
private int numClass
-
checkDataSet
private boolean checkDataSet
-
checkTrainTestSet
private boolean checkTrainTestSet
-
checkClassLabels
private boolean checkClassLabels
-
checkSamplesClass
private boolean checkSamplesClass
-
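Classlabel
private String[] Classlabel
-
allNameFeatures
private String allNameFeatures
-
nameFeatures
private String[] nameFeatures
-
pathData
private String pathData
-
pathTestSet
private String pathTestSet
-
pathLabel
private String pathLabel
-
originalData
private String[] originalData
-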
allData
private double[][] allData
-
trainSet
private double[][] trainSet
-
testSet
private double[][] testSet
-
rand
private Random rand
-
-
Constructor Details
-
DatasetInfo
public DatasetInfo()
-
-
Method Details
-
isCorrectString
private boolean isCorrectString(String str)
This method checks that the input string does not contain semicolon (;) or tab characters. It also checks that the input string contains the comma character.
- Parameters:
str
- the input string
- Returns:
- true if the input string is in the correct format
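The rule described for isCorrectString can be sketched as follows. This is a minimal illustration written from the description above, not the UniFeat source; the class name CsvLineCheck is hypothetical, and treating a null input as invalid is an assumption.

```java
final class CsvLineCheck {

    // Hypothetical re-implementation of the documented rule:
    // reject semicolons and tabs, and require at least one comma.
    static boolean isCorrectString(String str) {
        if (str == null) {
            return false; // assumption: a missing line counts as invalid
        }
        boolean hasForbidden = str.indexOf(';') >= 0 || str.indexOf('\t') >= 0;
        boolean hasComma = str.indexOf(',') >= 0;
        return !hasForbidden && hasComma;
    }

    public static void main(String[] args) {
        System.out.println(isCorrectString("5.1,3.5,setosa")); // true
        System.out.println(isCorrectString("5.1;3.5;setosa")); // false
    }
}
```

Requiring a comma in addition to rejecting other delimiters means a single-token line such as "abc" is also reported as incorrectly formatted.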
-
init
This method sets the dataset and the class labels of the samples.
- Parameters:
path1
- the path of the dataset file
path2
- the path of the class label file
-
init
This method reads and sets the train/test sets and the class labels of the samples.
- Parameters:
path1
- the path of the train set file
path2
- the path of the test set file
path3
- the path of the class labels of the samples
-
readExample1
private void readExample1()
This method reads the dataset.
-
readExample2
private void readExample2()
This method reads the train/test sets.
-
readLabel
private void readLabel()
This method reads the class labels of the samples.
-
indexClass
private int indexClass(String nameClass)
This method converts the String name of a class label to its integer value.
- Parameters:
nameClass
- the name of the class label as a String
- Returns:
- the index of the class
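A label-to-index lookup of the kind indexClass describes might look like the sketch below. It is illustrative only: the documented method takes just the label name and presumably consults the class's own label array, whereas this standalone version receives the array as a parameter. The class name ClassIndexSketch and the -1 result for an unknown label are assumptions, since the documentation does not say what happens in that case.

```java
final class ClassIndexSketch {

    // Returns the position of nameClass in classLabels.
    // Assumption: -1 signals a label that is not in the array.
    static int indexClass(String[] classLabels, String nameClass) {
        for (int i = 0; i < classLabels.length; i++) {
            if (classLabels[i].equals(nameClass)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] labels = {"negative", "positive"};
        System.out.println(indexClass(labels, "positive")); // 1
    }
}
```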
-
splitFeatureWithLabel
private void splitFeatureWithLabel()
This method converts the string values of the input data to double values and stores them.
-
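The string-to-double conversion that splitFeatureWithLabel describes could be sketched as below. The documentation does not specify the row layout, so this example assumes comma-separated rows whose last column is the class label name, stored as its index in a label array. The class RowParserSketch and the method toNumeric are hypothetical names, not part of UniFeat.

```java
final class RowParserSketch {

    // Converts comma-separated rows into a numeric matrix.
    // Assumption (not stated in the documentation): the last column holds
    // the class label name, which is replaced by its index in classLabels.
    static double[][] toNumeric(String[] rows, String[] classLabels) {
        double[][] data = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            String[] cells = rows[i].split(",");
            int last = cells.length - 1;
            double[] values = new double[cells.length];
            for (int j = 0; j < last; j++) {
                values[j] = Double.parseDouble(cells[j].trim());
            }
            values[last] = indexOf(classLabels, cells[last].trim());
            data[i] = values;
        }
        return data;
    }

    // Linear search; returns -1 for an unknown label (an assumption).
    private static int indexOf(String[] labels, String name) {
        for (int i = 0; i < labels.length; i++) {
            if (labels[i].equals(name)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        double[][] m = toNumeric(
                new String[] {"1.5,2.0,yes", "0.5,3,no"},
                new String[] {"no", "yes"});
        System.out.println(java.util.Arrays.deepToString(m));
    }
}
```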
splitDataSetToTrainAndTest1
private void splitDataSetToTrainAndTest1()
This method randomly splits the input dataset into train/test sets: 2/3 of the dataset is used as the train set and 1/3 as the test set.
-
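A random 2/3 to 1/3 split such as the one described for splitDataSetToTrainAndTest1 can be sketched with a shuffle of row indices. This is a plain, unstratified illustration and may differ from what UniFeat actually does. The class name RandomSplitSketch is hypothetical, and rounding the train size down is an assumption.

```java
import java.util.Random;

final class RandomSplitSketch {

    // Returns a two-element array: {trainSet, testSet}.
    static double[][][] split(double[][] allData, Random rand) {
        int n = allData.length;

        // Fisher-Yates shuffle of the row indices.
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        for (int i = n - 1; i > 0; i--) {
            int j = rand.nextInt(i + 1);
            int tmp = order[i];
            order[i] = order[j];
            order[j] = tmp;
        }

        int numTrain = (2 * n) / 3; // assumption: round down
        double[][] train = new double[numTrain][];
        double[][] test = new double[n - numTrain][];
        for (int i = 0; i < n; i++) {
            if (i < numTrain) {
                train[i] = allData[order[i]];
            } else {
                test[i - numTrain] = allData[order[i]];
            }
        }
        return new double[][][] {train, test};
    }

    public static void main(String[] args) {
        double[][] data = new double[9][];
        for (int i = 0; i < data.length; i++) {
            data[i] = new double[] {i};
        }
        double[][][] parts = split(data, new Random());
        System.out.println(parts[0].length); // 6
        System.out.println(parts[1].length); // 3
    }
}
```

Passing the Random instance in makes the split reproducible when a fixed seed is supplied.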
splitDataSetToTrainAndTest2
private void splitDataSetToTrainAndTest2()
This method sets the train/test sets when they have already been split by the user.
-
preProcessing
Reads the dataset and class label files, splits the dataset, and sets the resulting values.
- Parameters:
path1
- the path of the dataset
path2
- the path of the class labels
-
preProcessing
Reads the train/test sets and class labels and sets their values.
- Parameters:
path1
- the path of the train set
path2
- the path of the test set
path3
- the path of the class labels
-
isCorrectDataset
public boolean isCorrectDataset()
Returns the status of the dataset.
- Returns:
- true if the dataset file is in the correct format
-
isCorrectClassLabel
public boolean isCorrectClassLabel()
Returns the status of the class label file.
- Returns:
- true if the class label file is in the correct format
-
isCorrectSamplesClass
public boolean isCorrectSamplesClass()
Returns the status of the samples' class.
- Returns:
- true if the class labels of the samples are valid
-
isCompatibleTrainTestSet
public boolean isCompatibleTrainTestSet()
Returns the status of the train/test sets.
- Returns:
- true if the train and test sets are compatible
-
getNumData
public int getNumData()
Returns the number of samples in the dataset (train set + test set).
- Returns:
- number of samples
-
getNumFeature
public int getNumFeature()
Returns the number of features in each sample.
- Returns:
- number of features
-
getNumTrainSet
public int getNumTrainSet()
Returns the number of samples in the train set.
- Returns:
- number of samples in the train set
-
getNumTestSet
public int getNumTestSet()
Returns the number of samples in the test set.
- Returns:
- number of samples in the test set
-
getNumClass
public int getNumClass()
Returns the number of classes in the dataset.
- Returns:
- number of classes in the dataset
-
getClassLabel
Returns the names of the class labels.
- Returns:
- the array of class labels' names
-
getTrainSet
public double[][] getTrainSet()
Returns the train set values.
- Returns:
- the matrix of the train set
-
getTestSet
public double[][] getTestSet()
Returns the test set values.
- Returns:
- the matrix of the test set
-
getNameFeatures
Returns the names of the features.
- Returns:
- the array of features' names
-
createFeatNames
This method creates a string of the names of the features in the selected feature array.
- Parameters:
array
- the array of indices of the selected features
- Returns:
- a string of the names of the selected features
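Building a name string from selected feature indices, as createFeatNames describes, might be sketched like this. The documented method takes only the index array and presumably reads the feature names from the class itself, while this standalone version receives them as a parameter. The class name FeatureNameSketch and the comma separator are assumptions, because the documentation does not state the output format.

```java
final class FeatureNameSketch {

    // Joins the names of the selected features in the order given by array.
    // Assumption: names are separated by a comma.
    static String createFeatNames(String[] nameFeatures, int[] array) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < array.length; i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append(nameFeatures[array[i]]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = {"f0", "f1", "f2", "f3"};
        System.out.println(createFeatNames(names, new int[] {2, 0})); // f2,f0
    }
}
```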
-