Archive for the ‘Machine Learning’ Category
Efficient deep learning-based approach for malaria detection using red blood cell smears | Scientific Reports – Nature.com
Posted: June 11, 2024 at 2:48 am
This section describes the proposed methodology and the pre-trained models employed in this study. Figure 1 shows the workflow of the proposed methodology. The details of each step are provided in the subsequent sections.
Workflow of the proposed methodology.
The dataset used in this study was obtained from a public data repository. It contains a total of 27,558 cell images: 13,779 parasitized images and 13,779 uninfected images. These images were obtained from 150 infected patients and 50 healthy patients. Expert slide-readers and pathologists manually annotated the whole dataset. Color variations in the red blood cell images are due to the different stains used during image acquisition. Figure 2 shows samples of parasitized and uninfected cell images.
Samples from the red blood cell image dataset, showing parasitized cell images and uninfected cell images.
Preprocessing is a crucial first step in deep learning image classification tasks. The dataset contains 13,779 images of parasitized cells and 13,779 images of uninfected cells, so the classes are equally balanced. The cell images have varying widths and heights, whereas the deep learning models require fixed-size input, so we resized all images to a common size to ensure compatibility with the models. After resizing, the next important step is to split the cell images into two parts: training and testing. 80% of the data is used to train the deep learning models and 20% is held out for testing model efficacy and performance. Table 1 shows the numbers of parasitized and uninfected images after splitting into training and test sets.
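As a rough illustration (not the authors' exact pipeline), the resizing and 80/20 split could be set up as follows in Keras; the directory layout, the 224x224 target size, and the batch size are assumptions, since the paper does not state them:

```python
# Minimal preprocessing sketch. Assumptions: a local copy of the cell-image
# dataset laid out as cell_images/Parasitized and cell_images/Uninfected,
# a 224x224 target size, and a batch size of 32.
import tensorflow as tf

IMG_SIZE = (224, 224)   # assumed fixed input size after resizing
BATCH_SIZE = 32

# 80/20 train/test split, matching the split described in the text
train_ds = tf.keras.utils.image_dataset_from_directory(
    "cell_images", validation_split=0.2, subset="training",
    seed=42, image_size=IMG_SIZE, batch_size=BATCH_SIZE)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "cell_images", validation_split=0.2, subset="validation",
    seed=42, image_size=IMG_SIZE, batch_size=BATCH_SIZE)

# Rescale pixel values to [0, 1]
normalize = tf.keras.layers.Rescaling(1.0 / 255)
train_ds = train_ds.map(lambda x, y: (normalize(x), y))
test_ds = test_ds.map(lambda x, y: (normalize(x), y))
```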
Deep learning models learn complex patterns in data through their stacked layers. Deep learning has demonstrated effectiveness in many image classification tasks in medical, engineering, and other applications. Deep learning models work well on large datasets; however, they consume a lot of computational resources. Hyperparameter settings, the choice of loss function, and additional layers are used to address these problems by speeding up the training of deep learning models, reducing computational time, reducing the number of layers, and producing efficient deep models30. Transfer learning is a popular technique that leverages pre-trained models trained on large datasets such as ImageNet and produces better results on small datasets (Table 2).
Architecture of proposed deep learning model for malaria detection.
EfficientNet-B2 is a CNN model that is exceptionally accurate and reliable and is mostly used for image classification problems. It is well suited to problems that require fewer parameters and limited processing resources. Using depth-wise separable convolutions (DWSC) and an efficient scaling approach, the model improves classification accuracy. The main reason for using EfficientNet-B2 in disease detection is its efficiency and accuracy, owing to its small model size and minimal computing requirements. Figure 3 shows the architecture of the proposed model. A dropout layer is added on top of the EfficientNet-B2 backbone, which yields an output shape of (5, 5, 1408). A flatten layer then converts this multi-dimensional feature map into a one-dimensional vector. After flattening, we use three dense layers, four batch normalization layers, and three activation layers. The first two dense layers use ReLU activation functions; the Rectified Linear Unit (ReLU) not only captures complex patterns but also lowers the risk of overfitting and generalization error, improving overall model performance. The last dense layer uses the sigmoid activation function for classification, which is appropriate for binary classification tasks. Batch normalization is an essential component of deep learning architectures that improves accuracy while also speeding up the training process.
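A minimal Keras sketch of the head described above is given below. It assumes an EfficientNet-B2 backbone with a 160x160 input (inferred from the stated (5, 5, 1408) feature-map shape) and assumed dense-layer widths of 256 and 128; it illustrates the layer ordering only and is not the authors' exact implementation. The batch normalization layers it contains are discussed next.

```python
# Hedged sketch of the described classification head on an EfficientNet-B2
# backbone. The 160x160 input size and the dense widths (256, 128) are
# assumptions; the paper only states the layer counts and activations.
import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.EfficientNetB2(
    include_top=False, weights="imagenet", input_shape=(160, 160, 3))
base.trainable = False  # transfer learning: freeze the pre-trained backbone

model = models.Sequential([
    base,
    layers.Dropout(0.3),              # dropout added after the backbone
    layers.Flatten(),                 # (5, 5, 1408) feature map -> 1-D vector
    layers.BatchNormalization(),
    layers.Dense(256), layers.BatchNormalization(), layers.Activation("relu"),
    layers.Dense(128), layers.BatchNormalization(), layers.Activation("relu"),
    layers.BatchNormalization(),
    layers.Dense(1, activation="sigmoid"),  # binary: parasitized vs. uninfected
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```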
During training, batch normalization uses a mini-batch of data to calculate the mean and standard deviation of each feature, and these statistics are then used to standardize the input. This approach minimizes internal covariate shift, the change in the distribution of network activations caused by changes in the parameters during training, so that training proceeds more efficiently. By standardizing the inputs, batch normalization also improves the efficiency of optimization techniques: the model can be trained more quickly and is less likely to suffer from vanishing or exploding gradients. Additionally, it acts as a regularizer, reducing the need for other regularization methods.
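For reference, the standard batch-normalization transform applied to each feature over a mini-batch \(B\) is

$$\hat{x}_i=\frac{x_i-\mu_B}{\sqrt{\sigma_B^2+\epsilon}}, \qquad y_i=\gamma\,\hat{x}_i+\beta,$$

where \(\mu_B\) and \(\sigma_B^2\) are the mini-batch mean and variance, \(\epsilon\) is a small constant for numerical stability, and \(\gamma\) and \(\beta\) are learned scale and shift parameters.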
Malaria can be detected by analyzing red blood cell images for signs of infection using deep learning models. The proposed model is trained to identify malaria-related features using a collection of expert-labelled blood cell images. Once adequately trained, the model can evaluate newly obtained blood cell samples and classify them as either infected or uninfected, offering medical personnel useful information and thereby enabling a faster and more precise diagnosis. Utilizing deep learning-based malaria detection models in clinical settings offers several potential advantages. Such models can deliver precise and prompt diagnoses, particularly in regions where skilled microscopists are scarce. They also allow front-line healthcare professionals to identify the infection promptly and initiate medication earlier, which reduces malaria-related incidence and mortality. Moreover, automated analysis can efficiently handle a large volume of samples, alleviating the workload of laboratory personnel, particularly during outbreaks or monitoring initiatives.
This study also employed fine-tuned deep learning models such as CNN, VGG-16, DenseNet-121, DenseNet-169, DenseNet-201, and Inception-V3 for malaria detection. The pre-trained, fine-tuned deep learning models and their trainable parameters are given in Table 3.
A CNN is a type of neural network that consists of numerous layers and aims to identify patterns directly from image pixels, requiring minimal pre-processing31. The convolution layer, the pooling layer, and the fully connected layer are the three essential layers widely considered the foundation of a CNN. We used three convolution blocks, three max-pooling blocks, and three blocks consisting of batch normalization, ReLU activation, and dropout layers. The convolution layer, a fundamental component of a CNN, performs the majority of the computational work. This layer applies the convolution (filtering) operation to the input and passes the response to the subsequent layer. We place pooling layers between successive convolution layers to spatially reduce the input representation and the required processing. Each pooling layer operates on every slice of its input, reducing the computational workload for the subsequent convolution layer. After that, we flatten all feature maps into a single dimension and add two dense layers with batch normalization and ReLU activation. A final fully connected layer with sigmoid activation generates the output, whose size equals the number of classes32. The detailed architecture of the CNN is shown in Fig. 4.
Detailed CNN architecture.
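A hedged Keras sketch of such a CNN, following the block ordering described above with assumed filter counts and dense widths (the text does not specify these hyperparameters), might look as follows:

```python
# Hedged sketch of the custom CNN in Fig. 4, under assumed filter counts
# (32/64/128), dense widths (128, 64), and a 128x128 input size.
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(filters):
    # one of the three convolution blocks: Conv -> BN -> ReLU -> MaxPool -> Dropout
    return [
        layers.Conv2D(filters, (3, 3), padding="same"),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
    ]

cnn = models.Sequential(
    [tf.keras.Input(shape=(128, 128, 3))]          # assumed input size
    + conv_block(32) + conv_block(64) + conv_block(128)
    + [
        layers.Flatten(),
        layers.Dense(128), layers.BatchNormalization(), layers.Activation("relu"),
        layers.Dense(64), layers.BatchNormalization(), layers.Activation("relu"),
        layers.Dense(1, activation="sigmoid"),     # fully connected output layer
    ]
)
cnn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```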
In 2014, VGG16 won the ILSVRC (ImageNet) competition and is still considered one of the most advanced vision models available. The VGG-16 network was trained on the ImageNet database and consists of 16 weighted layers: 13 convolutional layers and 3 fully connected layers. Even with limited image datasets, the VGG-16 network delivers high accuracy thanks to its extensive pre-training. VGG16 is capable of both object detection and classification with 92.7% accuracy, classifying images into 1000 unique categories. It is a widely used image classification model that is easy to apply through transfer learning. By adding new layers and utilizing batch normalization, the training process can be accelerated, making learning easier and the model more robust33.
Inception V3 is a deep CNN architecture introduced in 2015 by Google researchers. It is the third version of the Inception family of models and is designed to be more efficient and accurate than its predecessors, Inception V1 and V2, with a more expansive network. This deep CNN is designed so that it can be trained on low-configuration computers, although training remains a challenging process that can take several days. Transfer learning provides a solution to this issue by retaining the parameters of earlier layers while updating only the last layer for new categories. This approach removes the Inception V3 model's final layer, thereby leveraging the benefits of transfer learning34.
DenseNet121 is a CNN architecture that has been widely used in image classification tasks since its introduction in 2017. The DenseNet121 architecture aims to increase the depth of deep learning networks while improving their training efficiency. This is achieved through short connections between layers: in DenseNet, each layer is connected to every subsequent layer in the network. The number 121 refers to the count of layers with trainable weights, excluding batch normalization layers. The remaining 5 layers consist of the initial 7x7 convolutional layer, 3 transition layers, and a fully connected layer35.
DenseNet169 is a deep CNN architecture that is part of the DenseNet family of models. It was introduced by researchers at Facebook AI Research in 2017 as an improvement over the original DenseNet model. DenseNet169 has 169 layers, which is more than the original DenseNet but less than DenseNet201. Like other DenseNet models, DenseNet169 uses dense connectivity to promote feature reuse and reduce the number of parameters needed to train the network. It also includes bottleneck layers to reduce the computational cost of convolutions. DenseNet169 has achieved state-of-the-art performance on several benchmark datasets, making it a popular choice for image classification tasks requiring high accuracy36.
DenseNet201 is a deep CNN architecture37. It uses a dense connectivity structure, where each layer is connected to every other layer in a feed-forward fashion. This dense connectivity promotes feature reuse and reduces the number of parameters needed to train the network. DenseNet201 also includes bottleneck layers, which reduce the computational cost of convolutions by using 1x1 convolutions to reduce the dimensionality of the input. DenseNet201 has achieved state-of-the-art performance on several benchmark datasets and is widely used in image classification tasks.
ResNet50, a deep learning architecture, was introduced in 2015 by Microsoft researchers. It has found applications in a range of computer vision tasks, including the analysis of medical images. ResNet50 is designed to overcome the challenge of vanishing gradients by introducing shortcut connections that allow the network to learn residual representations. Using ResNet50, researchers have attained strong results in computer vision tasks, including object detection, image classification, and medical image analysis38.
EfficientNet-B1 is a neural network architecture that was proposed by Google researchers in 2019. It is part of the EfficientNet family of models that are designed to achieve high accuracy while minimizing computational resources. It has fewer parameters and floating-point operations (FLOP) than larger models but still achieves competitive performance on various benchmark datasets. EfficientNet-B1 has been used in a range of computer vision tasks, including image classification, object detection, and segmentation39. Its efficient design makes it particularly suitable for mobile and embedded devices.
EfficientNet-B7 is a powerful model that has shown promising results in a variety of computer vision tasks, including medical image analysis. It is the largest model in the EfficientNet family and has significantly more parameters and FLOP than smaller models in the family. EfficientNet-B740 achieves state-of-the-art performance on various benchmark datasets, including ImageNet, with significantly fewer computational resources than previous state-of-the-art models. However, due to its large size, EfficientNet-B7 may not be suitable for mobile and embedded devices with limited computational resources.
MobileNet is a family of neural network architectures that are designed to be efficient on mobile and embedded devices with limited computational resources. It was proposed by Google researchers in 2017 and has since become a popular choice for a range of computer vision tasks. MobileNet achieves its efficiency by using depth-wise separable convolutions, which separate the spatial and channel-wise dimensions of convolutions and reduce the number of parameters and computations. This design allows MobileNet to achieve high accuracy while requiring significantly fewer resources than larger models. MobileNet has been implemented in various frameworks and is widely used in real-world applications41.
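To illustrate the saving, the standard comparison from the MobileNet literature of the computational cost of a depth-wise separable convolution versus a standard convolution (kernel size \(D_K\), feature-map size \(D_F\), \(M\) input channels, \(N\) output channels) is

$$\frac{D_K^2 \cdot M \cdot D_F^2 + M \cdot N \cdot D_F^2}{D_K^2 \cdot M \cdot N \cdot D_F^2} = \frac{1}{N} + \frac{1}{D_K^2},$$

so a 3x3 depth-wise separable convolution uses roughly 8 to 9 times less computation than a standard convolution with the same input and output shapes.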
MobileNetV2 is a follow-up to the original MobileNet architecture, proposed by Google researchers in 2018. It further improves the efficiency and accuracy of the original architecture by introducing several novel features. One of the key improvements is the use of an inverted bottleneck block that expands and then contracts the number of channels, allowing for better feature extraction. MobileNetV2 also uses linear bottlenecks, applying a linear (identity) activation to the final projection of each block so that information is not lost in the low-dimensional representation. These innovations make MobileNetV2 one of the most efficient neural network architectures for mobile and embedded devices, while still achieving high accuracy on a range of computer vision tasks39.
The performance of all models used in this study was evaluated using precision, recall, F1 score, and accuracy. After training, the test set is used to assess each model's efficiency and classification performance. Performance is also evaluated using the confusion matrix, which consists of TP, TN, FP, and FN predictions.
TP: True positives are records from the positive class that the model correctly predicts as positive.
TN: True negatives are the correct negative predictions made by the model among all negative records.
FP: False positives are records from the negative class that the model incorrectly classifies as positive.
FN: False negatives are records that belong to the positive class but are predicted as negative by the model.
Accuracy: The proportion of correctly classified predictions among the total number of predictions made by the model, computed by dividing the TP plus TN predictions by the total number of predictions.
$$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$$
(1)
Precision: The number of true positive predictions out of all positive predictions made by the model, computed by dividing the TP predictions by the TP plus FP predictions.
$$\text{Precision} = \frac{TP}{TP+FP}$$
(2)
Recall: The proportion of actual positive samples that the model correctly predicts as positive, computed by dividing the TP predictions by the TP plus FN predictions.
$$\text{Recall} = \frac{TP}{TP+FN}$$
(3)
F1 score: The F1 score is an evaluation metric that estimates model performance by taking the harmonic mean of precision and recall.
$$F1 = 2\times \frac{\text{Precision}\times \text{Recall}}{\text{Precision}+\text{Recall}}$$
(4)
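A minimal sketch of how these metrics can be computed from a confusion matrix (assuming 0/1 labels for the uninfected/parasitized classes) is:

```python
# Minimal sketch of the evaluation metrics in Eqs. (1)-(4), assuming y_true
# and y_pred are 0/1 arrays for the uninfected/parasitized classes.
import numpy as np
from sklearn.metrics import confusion_matrix

def evaluate(y_true, y_pred):
    # sklearn returns the matrix as [[TN, FP], [FN, TP]] for labels [0, 1]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Example:
# evaluate(np.array([1, 0, 1, 1]), np.array([1, 0, 0, 1]))
```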
Predicting sales and cross-border e-commerce supply chain management using artificial neural networks and the … – Nature.com
Posted: at 2:48 am
This section presents a model for supply chain management in CBEC using artificial intelligence (AI). The approach handles resource provisioning by using a collection of ANNs to forecast future order volumes. Before describing this method in depth, the specifications of the dataset used in this study are given.
The data for this study were obtained by examining the performance of seven active sellers in the sphere of international product trade over the course of a month. All of these sellers operate in the global bulk physical product exchange market, which implies that all goods bought by clients have to be sent by land, air, or sea transportation. To trade their items, each seller in this industry utilizes a minimum of four online sales platforms. The dataset assembled for each vendor consists of 945 records, each of which includes data on the number of orders that consumers have placed with that particular vendor. The bulk product transactions in each record range from a minimum of 3 units to a maximum of 29 units. Every record is defined using a total of twenty-three distinct attributes, including order registration time, date, month, sales method (platform type used), order volume, destination, product type, shipping method, active inventory level, the seller's product shipping delay history over the previous seven transactions, and the product order volume history over the previous seven days. Each of these two history attributes is represented by a single numerical vector.
This section describes a CBEC system that incorporates a tangible product supply chain under the management of numerous retailers and platforms. The primary objective of this study is to enhance the supply chain performance in CBEC through the implementation of machine learning (ML) and Internet of Things (IoT) architectures. This framework comprises four primary components:
Retailers: They are responsible for marketing and selling products.
Common sales platform: Provides a platform for introducing and selling products by retailers.
Product warehouse: It is the place where each retailer stores their products.
Supply center: It is responsible for instantly providing the resources needed by retailers. The CBEC system model comprises N autonomous retailers, all of which are authorized to engage in marketing and distribution of one or more products. Each retailer maintains a minimum of one warehouse for product storage. Additionally, retailers may utilize multiple online sales platforms to market and sell their products.
Consumers place orders via these electronic commerce platforms in order to acquire the products they prefer. Through the platform, the registered orders are transmitted to the product's proprietor. The retailer generates and transmits the sales form to the data center situated within the supply center as soon as it receives the order. The supply center is responsible for delivering the essential resources to each retailer in a timely manner. In traditional applications of the CBEC system, the supply center provides resources in a reactive capacity. This approach contributes to an extended order processing time, which ultimately erodes customer confidence and may result in the dissolution of the relationship. Proactive implementation of this procedure is incorporated into the proposed framework. Machine learning methods are applied to predict the number of orders that will be submitted by each agent at future time intervals. Following this, the allocation of resources in the storage facilities of each agent is ascertained by the results of these forecasts. In accordance with the proposed framework, the agent's warehouse inventory is modified in the data center after the sales form is transmitted to the data center. Additionally, a model based on ensemble learning is employed to forecast the quantity of upcoming orders for the product held by the retailer. The supply center subsequently acquires the required resources for the retailer in light of the forecast's outcome. The likelihood of inventory depletion and the time required to process orders are both substantially reduced through the implementation of this procedure.
As mentioned earlier, the efficacy of the supply chain is enhanced by this framework via the integration of IoT architecture. For this purpose, RFID technology is implemented in supply management. Every individual product included in the proposed framework is assigned a unique RFID identification tag. The integration of passive identifiers into the proposed model reduces the system's ultimate implementation cost. In the proposed paradigm, the RFID tag serves as an automated data carrier for the asset management system. The architecture of this system integrates passive RFID devices that function within the UHF band. In addition, tag reader gateways are installed in the product warehouses of each retailer to facilitate the monitoring of merchandise entering and departing the premises. The proposed model commences the product entry and exit procedure by using the tag reader to extract the unique identifier data contained within the RFID tags. This identifier is subsequently transmitted to the controller node to which the reader is connected. A query containing the product's unique identifier is transmitted by the controller node to the data center in order to acquire product information, including entry/exit authorization. Upon authorization of this procedure, the controller node transmits a storage command to the data center to register the product transfer information. This registration subsequently updates the inventory of the retailer's product warehouse. Therefore, the overall performance of the proposed system can be categorized into the following two overarching phases:
Predicting the number of future orders of each retailer in future time intervals using ML techniques.
Assigning resources to the warehouses of specific agents based on the outcomes of the predictions, and keeping the data center's inventory records for each agent's warehouse up to date. The following sub-sections clarify each of these phases.
The imminent order volume for each vendor is forecasted within this framework through the utilization of a weighted ensemble model. The number of prediction models is directly proportional to the number of retailers participating in the CBEC system. To predict the future volume of customer orders for the affiliated retailer, each ensemble model combines the forecasts produced by its internal learning models. The supplier furnishes the requisite supplies to each agent in adherence to these projections. By proactively alleviating the delay that arises from reactive supply of requested products, this methodology reduces the overall duration of the supply chain's product delivery process. Utilizing a combination of FSFS and ANOVA, the initial step in forecasting sales volume is to identify which attributes have the greatest bearing on the sales volume of particular merchants. Sales projections are then generated using a weighted ensemble model that combines sales volume with the most pertinent features. The proposed weighted ensemble model for forecasting the order volume of a specific retailer trains each of the three ANN models comprising the ensemble on that retailer's order patterns. While ensemble learning can enhance the accuracy of predictions produced by learning systems, two additional factors should be considered in order to optimize its performance further:
Acceptable performance of each learning model: Every learning component in an ensemble system has to perform satisfactorily so that combining their outputs lowers the total prediction error. This calls for the deployment of well-configured learning models, such that every model continues to operate as intended even when handling a variety of data patterns.
Output weighting: In the majority of ensemble-system application scenarios, the learning components comprising the system differ in efficacy: some learning models exhibit a lower error rate in forecasting the target variable, while others display a higher error rate. Consequently, in contrast to the methodology employed in traditional ensemble systems, an identical value cannot be assigned to the output of every predictive component. To address this issue, a weighting strategy can be applied to the outputs of each learning component, thereby generating a weighted ensemble system.
CapSA is utilized in the proposed method to address these two concerns. The operation of the proposed weighted ensemble model for forecasting customer order volumes is illustrated in Fig.1.
Operation of the proposed weighted ensemble model for predicting order volume.
As illustrated in Fig.1, the ensemble model under consideration comprises three predictive components that collaborate to forecast the order volume of a retailer, drawing inspiration from the structure of the ANN. Every individual learning model undergoes training using a distinct subset of sales history data associated with its respective retailer. The proposed method utilizes CapSA to execute the tasks of determining the optimal configuration and modifying the weight vector of each ANN model. It is important to acknowledge that the configuration of every ANN model is distinct from that of the other two models. By employing parallel processing techniques, the configuration and training of each model can be expedited. Every ANN model strives to determine the parameter values in a way that minimizes the mean absolute error criterion during the configuration phase. An optimal configuration set of learning models can be obtained through the utilization of this mechanism, thereby guaranteeing that every component functions at its designated level. After the configuration of each ANN component is complete, the procedure to determine the weight of the output of the predictive component is carried out. In order to accomplish this goal, CapSA is employed. During this phase, CapSA attempts to ascertain the output value of each learning model in relation to its performance.
After employing CapSA to optimize the weight values, the assembled and weighted models can be utilized to predict the volume of orders for new samples. To achieve this, during the testing phase, input features are provided to each of the predictive components ANN1, ANN2, and ANN3. The final output of the proposed model is then computed as the weighted average of the outputs from these components.
The set of features characterizing the sales pattern may contain irrelevant features. Hence, the proposed approach employs one-way ANOVA analysis to determine the significance of the input feature set and identify the features that are associated with the sales pattern. The F-score values of the features are computed using the ANOVA test. Generally speaking, features with greater F-values hold greater significance for the prediction stage and are thus more prominent. Following the ranking of the features, the FSFS method is utilized to select the desired features. The primary function of FSFS is to determine the most appropriate subset of the ranked features. The algorithm builds the optimal subset of features by iteratively selecting features from the input set in accordance with their ranking. As each new feature is incorporated into the feature subset, the learning model's prediction error is assessed. The feature-addition procedure concludes when the performance of the model is negatively impacted by the addition of a new feature; in that case, the optimal subset is the feature subset with the smallest error. Using the resultant feature set, the ensemble system's components are trained to forecast sales volume.
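A hedged scikit-learn sketch of this two-stage selection (ANOVA ranking followed by forward selection around a small ANN) is shown below; the synthetic data, the stopping tolerance, and the wrapped regressor's settings are assumptions, and scikit-learn's selector searches all remaining features at each step rather than strictly following the ANOVA ranking.

```python
# Hedged sketch of the feature-selection stage: rank features with an ANOVA
# F-test, then grow the feature subset greedily until adding a feature stops
# helping. The data here are synthetic stand-ins for the 945 x 23 dataset.
import numpy as np
from sklearn.feature_selection import f_regression, SequentialFeatureSelector
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(945, 23))        # 945 records x 23 attributes (as described)
y = rng.integers(3, 30, size=945)     # order volumes between 3 and 29 units

# 1) ANOVA ranking: larger F-score -> more relevant feature
f_scores, _ = f_regression(X, y)
ranking = np.argsort(f_scores)[::-1]

# 2) Forward sequential selection wrapped around a small ANN regressor
fsfs = SequentialFeatureSelector(
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=500, random_state=0),
    direction="forward", n_features_to_select="auto", tol=1e-3,
    scoring="neg_mean_absolute_error", cv=3)
fsfs.fit(X, y)
selected = np.flatnonzero(fsfs.get_support())   # indices of retained features
```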
CapSA is tasked with the responsibility of identifying the most appropriate neural network topologies and optimal weight values within the proposed method. As previously stated, the ensemble model under consideration comprises three ANNs, with each one tasked with forecasting the forthcoming sales volume for a specific retailer. Using CapSA, the configuration and training processes for each of these ANN models are conducted independently. This section provides an explanation of the procedure involved in determining the optimal configuration and modifying the weight vector for each ANN model. Hence, the subsequent section outlines the steps required to solve the aforementioned optimization problem using CapSA, after which the structure of the solution vector and the objective function are defined. The suggested method's optimization algorithm makes use of the solution vector to determine the topology, network biases, and weights of neuronal connections. As a result, every solution vector in the optimization process consists of two linked parts. The first part of the solution vector specifies the network topology. Next, in the second part, the weights of the neurons and biases (which match the topology given in the first part of the solution vector) are determined. As a result, the defined topology of the neural network determines the variable length of the solution vectors in CapSA. Because a neural network might have an endless number of topological states, it is necessary to include certain restrictions in the solution vector that relate to the topology of the network. The first part of the solution vector is constrained by the following in order to narrow down the search space:
Each neural network contains exactly one hidden layer. As such, the first part of the solution vector consists of a single element whose value represents the number of neurons assigned to the hidden layer of the neural network.
The hidden layer of the neural network has a minimum of 4 and a maximum of 15 neurons.
The number of input features and target outputs, respectively, determine the dimensions of the input and output layers of the neural network. As a result, the first part of the solution vector, which determines the topology, solely specifies the number of neurons in the hidden layer. Given that the length of the second part of the solution vector is determined by the topology specified in the first part, the value in the first part determines the number of neurons in the neural network. For a neural network with I input neurons, H hidden neurons, and P output neurons, the length of the second part of the solution vector in CapSA is equal to \(H\times (I+1)+P\times (H+1)\).
In CapSA, the identification of optimal solutions involves applying a fitness function to each candidate. To achieve this, after the neural network's weights and topology are configured according to the solution vector, the network produces outputs for the training samples, which are then compared to the actual target values. The mean absolute error criterion is then applied to assess the neural network's performance and the optimality of the generated solution. CapSA's fitness function is thus characterized as follows:
$$MAE=\sum_{i=1}^{N}\left|T_{i}-Z_{i}\right|$$
(1)
In this context, N denotes the number of training samples, while \(T_i\) signifies the desired value for the i-th training sample. Furthermore, the output generated by the neural network for the i-th training sample is denoted as \(Z_i\). The proposed method utilizes CapSA to ascertain a neural network structure capable of minimizing Eq. (1). In CapSA, both the initial population and the search bounds for the second portion of the solution vector are established at random within [-1, +1]. Thus, all weight values assigned to the connections between neurons and the biases of the neural network fall within this range. CapSA determines the optimal solution through the following procedures:
Step 1 The initial population of Capuchin agents is randomly valued.
Step 2 The fitness of each solution vector (Capuchin) is calculated based on Eq.(1).
Step 3 The initial speed of each Capuchin agent is set.
Step 4 Half of the Capuchin population is randomly selected as leaders and the rest are designated as follower Capuchins.
Step 5 If the number of algorithm iterations has reached the maximum G, go to step 13, otherwise, repeat the following steps:
Step 6 The CapSA lifespan parameter is calculated as follows27:
$$\tau =\beta_{0}\, e^{\left(-\frac{\beta_{1} g}{G}\right)^{\beta_{2}}}$$
(2)
where g represents the current iteration number, and the parameters \(\beta_{0}\), \(\beta_{1}\), and \(\beta_{2}\) have values of 2, 21, and 2, respectively.
Step 7 Repeat the following steps for each Capuchin agent (leader and follower) like i:
Step 8 If i is a leader Capuchin, update its speed based on Eq. (3)27:
$$v_{j}^{i}=\rho\, v_{j}^{i}+\tau a_{1}\left(x_{best_{j}}^{i}-x_{j}^{i}\right)r_{1}+\tau a_{2}\left(F-x_{j}^{i}\right)r_{2}$$
(3)
where the index j represents the dimensions of the problem and \(v_{j}^{i}\) represents the speed of Capuchin i in dimension j. \(x_{j}^{i}\) indicates the position of Capuchin i for the j-th variable and \(x_{best_{j}}^{i}\) describes the best position of Capuchin i for the j-th variable so far. Also, \(r_{1}\) and \(r_{2}\) are two random numbers in the range [0, 1]. Finally, \(\rho\) is the parameter weighting the previous speed, which is set to 0.7.
Step 9 Update the new position of the leader Capuchins based on their speed and movement pattern.
Step 10 Update the new position of the follower Capuchins based on their speed and the leaders' positions.
Step 11 Calculate the fitness of the population members based on Eq. (1).
Step 12 If the entire population's position has been updated, go to Step 5; otherwise, repeat the algorithm from Step 7.
Step 13 Return the solution with the least fitness as the optimal configuration of the ANN model.
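As an illustration of Steps 2 and 11, the following hedged sketch decodes a candidate solution vector into a single-hidden-layer network and scores it with the absolute-error fitness of Eq. (1); the tanh hidden activation and the clipping of the topology element are assumptions, and CapSA's own update rules are omitted.

```python
# Hedged sketch of the per-candidate fitness evaluation. The vector layout
# follows the text: first element = hidden-neuron count H (4..15), followed
# by H*(I+1) + P*(H+1) weights and biases drawn from [-1, +1].
import numpy as np

def decode_and_score(solution, X, T):
    I, P = X.shape[1], 1                      # input features, single output
    H = int(np.clip(round(solution[0]), 4, 15))
    w = np.asarray(solution[1:1 + H * (I + 1) + P * (H + 1)], dtype=float)

    # split the flat vector into layer weights and biases
    W1 = w[:H * I].reshape(I, H)
    b1 = w[H * I:H * (I + 1)]
    W2 = w[H * (I + 1):H * (I + 1) + H * P].reshape(H, P)
    b2 = w[H * (I + 1) + H * P:]

    hidden = np.tanh(X @ W1 + b1)             # assumed hidden activation
    Z = (hidden @ W2 + b2).ravel()            # network outputs for all samples
    return np.sum(np.abs(T - Z))              # fitness: sum of absolute errors (Eq. 1)
```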
Once each predictive component has been configured and trained, CapSA is utilized once more to assign the most advantageous weights to each of these components. Determining the significance coefficient of the output produced by each of the predictive components ANN1, ANN2, and ANN3 with respect to the final output of the proposed ensemble system is the objective of optimal weight allocation. Therefore, the optimization variables for the three estimation components comprising the proposed ensemble model correspond to the set of optimal coefficients in this specific implementation of CapSA. Therefore, the length of each Capuchin in CapSA is fixed at three in order to determine the ensemble model output, and the weight coefficients are assigned to the outputs of ANN1, ANN2, and ANN3, correspondingly. Each optimization variable's search range is a real number between 0 and 1. After providing an overview of the computational methods employed in CapSA in the preceding section, the sole remaining point in this section is an explanation of the incorporated fitness function. The following describes the fitness function utilized by CapSA to assign weights to the learning components according to the mean absolute error criterion:
$$fitness=\frac{1}{n}\sum_{i=1}^{n}\left|T_{i}-\frac{\sum_{j=1}^{3}w_{j}\times y_{j}^{i}}{\sum_{j=1}^{3}w_{j}}\right|$$
(4)
where \(T_{i}\) represents the actual value of the target variable for the i-th sample. Also, \(y_{j}^{i}\) represents the output estimated by the ANNj model for the i-th training sample, and \(w_{j}\) indicates the weight value assigned to the ANNj model via the solution vector. Finally, n denotes the number of training samples.
A weight coefficient in the interval [0, 1] is allocated to each algorithm, delineating the manner in which that algorithm contributes to the final output of the ensemble model. It is crucial to note that the weighting phase of the learning components is executed only once, after the training and configuration processes have been completed. Once the optimal weight values for each learning component have been determined by CapSA, the prediction of the volume of forthcoming orders is carried out using the trained models and the specified weight values. Once the predictive outputs of all three implemented ANN models have been obtained, the number of forthcoming orders is computed by the proposed weighted ensemble model as follows:
$$output=\frac{\sum_{i=1}^{3}w_{i}\times y_{i}}{\sum_{i=1}^{3}w_{i}}$$
(5)
Within this framework, \(w_{i}\) and \(y_{i}\) denote the weight assigned to the ANNi model and its predicted value, respectively, for the provided input sample. Ultimately, the retailer meets its future obligations in accordance with the prediction produced by this ensemble model.
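A minimal sketch of Eq. (5), computing the final forecast as the CapSA-weighted average of the three ANN outputs, is:

```python
# Minimal sketch of the weighted ensemble output in Eq. (5): a weighted
# average of the three ANN predictions, with weights w1..w3 found by CapSA.
import numpy as np

def ensemble_predict(preds, weights):
    """preds: outputs y1..y3 of ANN1..ANN3; weights: CapSA-optimized w1..w3 in [0, 1]."""
    preds, weights = np.asarray(preds, float), np.asarray(weights, float)
    return float(np.sum(weights * preds) / np.sum(weights))

# Example: three model forecasts of the next-interval order volume
# ensemble_predict([12.0, 15.5, 13.2], [0.9, 0.4, 0.7])
```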
By predicting the product sales volume for each retailer, it becomes possible to procure the requisite resources for each retailer in line with the projected sales volume. By ensuring that the supplier's limited resources are distributed equitably, this mechanism aims to maximize the effectiveness of the sales system. In the following, the sales volume predicted by the model for retailer i is denoted by \(p_i\), the agent's current inventory by \(v_i\), and the supplier's total distribution capacity by \(L\). The supplier then allocates the requisite resources to each retailer as follows:
Sales volume prediction: Applying the model described in the previous part, the upcoming sales volume for each agent in the future time interval (\(p_i\)) is predicted.
Receiving warehouse inventory: The current inventory of every agent (\(v_i\)) is received through supply chain management systems.
Calculating the required resources: The amount of resources required for the warehouse of each retailer is calculated as follows:
$$S_{i} = \max\left(0,\, p_{i} - v_{i}\right)$$
(6)
Calculating each agent's share of allocatable resources: The share of each retailer from the allocatable resources is calculated by Eq. (7), where N represents the number of retailers:
$$R_{i}=\frac{S_{i}}{\sum_{j=1}^{N}S_{j}}$$
(7)
Resource allocation: The supply center sends the resources needed by each agent, according to the allocated share (\(R_i\)), to that agent's warehouse.
Inventory update: The inventory of every agent is updated upon receipt of the new resources.
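The allocation procedure above can be summarized in a short sketch of Eqs. (6) and (7); treating the shipped quantity as the share \(R_i\) multiplied by the capacity \(L\) is an assumption made here for illustration.

```python
# Hedged sketch of the allocation procedure: compute each retailer's shortfall
# S_i = max(0, p_i - v_i) (Eq. 6), its share R_i of the allocatable resources
# (Eq. 7), and the quantity shipped given total supplier capacity L.
import numpy as np

def allocate(p, v, L):
    p, v = np.asarray(p, float), np.asarray(v, float)
    S = np.maximum(0.0, p - v)                          # Eq. (6): required resources
    total = S.sum()
    R = S / total if total > 0 else np.zeros_like(S)    # Eq. (7): shares
    return S, R, R * L                                  # shipped = share * capacity (assumption)

# Example: predicted sales, current inventories, supplier capacity
# allocate(p=[20, 15, 8], v=[5, 12, 10], L=30)
```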
A multi-institutional machine learning algorithm for prognosticating facial nerve injury following microsurgical resection … – Nature.com
Posted: at 2:48 am
Facial nerve injury is a morbid complication of treatment for VS, with downstream effects ranging from social stigmata, patient depression and reduced quality of life16,17, to corneal abrasions and ulcers from incomplete eye closure and loss of corneal sensation18. Other than tumor size, relatively little is understood about factors that may influence facial nerve outcomes in microsurgery for VS. The clinical impact of facial nerve injury and importance of facial nerve preservation is highlighted by the extensive literature exploring predictors of facial nerve injury19,20,21,22,23. We leveraged our multi-institutional experience at two centers with high volumes of VS patients and applied machine learning techniques to identify novel predictors of facial nerve injury in patients treated with microsurgery.
Machine learning technologies have recently undergone a resurgence alongside the development of computational tools for handling and storing the large amounts of data required for their meaningful, broad-scale utilization13,24. The recognition that such tools can be used to glean novel trends from data that are not readily apparent from common descriptive statistical approaches makes their application within the clinical domain a valuable and ongoing endeavor25. Such a phenomenon can be seen in the present study, where tests of association comparing measures of centrality between outcome groups did not identify any factors which significantly differed between patients with and without preserved facial function. In contrast, random forest feature importance analysis discerned four features (BMI, case length, age, and the tumor dimension representing growth towards the brainstem, measurement B) as being relevant in predicting 6-month facial nerve status. While further studies must be carried out to fully characterize the mechanistic role of these factors in facial nerve outcome, this demonstrates the utility of applying novel data science techniques to uncover non-linear interactions between variables which may have real-world, clinical relevance.
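For readers unfamiliar with the technique, a hedged sketch of a random-forest feature-importance analysis of this kind is shown below; the feature names mirror those discussed above, but the synthetic data, model settings, and resulting importance values are purely illustrative and are not the study's actual analysis.

```python
# Hedged, illustrative sketch of random-forest feature importance; the data
# here are synthetic stand-ins, not the multi-institutional cohort.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["BMI", "case_length_min", "age", "measurement_B_mm"]
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=features)  # synthetic features
y = rng.integers(0, 2, size=200)        # 6-month facial nerve status (0 = preserved)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
for name, imp in sorted(zip(features, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")         # impurity-based importance ranking
```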
As previously noted, tumor measurements utilized in our study were selected due to their relationships to surgical corridors, as well as having been shown to correlate well with tumor size by volumetric analysis in previous literature10. We found high ICC for all measurements, which was comparable to other reports in the literature on similar VS measurement tasks26,27. Although historically, an overall larger tumor size has been demonstrated to portend worse facial nerve function after microsurgical resection19,20,28,29,30, results of the present study identified the tumor dimension representing growth within the cerebellopontine angle between the mid-axis of the tumor and the brainstem as most predictive of facial nerve outcome. Our findings are consistent with prior literature, while providing further insight into possible mechanisms by which tumor size may influence facial nerve injury. A relatively larger tumor dimension within the cerebellopontine angle, between the brainstem and porus acusticus is postulated to result in more thinning and splaying of the facial nerve. This causes direct mechanical injury and makes the facial nerve more difficult to distinguish from tumor capsule and surrounding adherent arachnoid, placing the facial nerve at greater risk of iatrogenic injury31. Thus, our study builds on prior literature reporting greater tumor size as a predictor of facial nerve injury following vestibular schwannoma microsurgery, by suggesting that the tumor dimension representing growth within the cerebellopontine angle from the mid-axis of the tumor towards the brainstem has the greatest implication on facial nerve outcome. We did not identify any difference between our facial nerve preservation and facial nerve dysfunction groups when comparing this dimension. It is worth noting that we observed a relatively higher rate of Koos grade III and IV tumors compared to other published series, suggesting that this series may be skewed towards larger tumors overall. This may partially explain our inability to decipher a difference between facial nerve preservation and facial nerve injury groups based on tumor size. We anticipate that future studies including larger cohorts of patients might capture a relationship between facial nerve susceptibility to injury as this tumor dimension increases.
Older patient age has been previously shown to be predictive of facial nerve dysfunction, similar to our own findings20,29, though this remains controversial. While some studies have found no significant relationship between post-operative facial nerve function and age32, our study and others have identified a trend towards increasing age influencing unfavorable facial nerve outcomes following vestibular schwannoma microsurgery33. Others reporting on this finding have hypothesized on the influence of frailty, burden of comorbidities, decreased neurologic reserve resulting in reduced facial nerve rehabilitation potential33, and the confounding influence of age itself on facial nerve grading given that skin laxity and thinning may contribute to worse grading and/or worsened manifestations of facial nerve paralysis in elderly patients34. We further hypothesize that the basis of this relationship might be less favorable tissue dissection planes in patients of advanced age, placing older patients at greater risk of iatrogenic facial nerve injury. Although further detailed analysis of the role of age in facial nerve outcome on patients undergoing vestibular schwannoma microsurgery is beyond the scope of the current study, further study would certainly be valuable to confirm and better characterize the nature of this relationship. Our study further demonstrated additional unique features predictive of facial nerve outcomes which have not been previously identified. Our hypotheses regarding the role of BMI and case length are discussed further below.
Interestingly, our model identified BMI and operative case length as being highly predictive of facial nerve outcome at 6 months post-operatively. To the best of our knowledge, these associations have not been clearly delineated in previous studies. One study examined facial nerve injury in the context of post-operative complications and the need for readmission or re-operation, finding no significant association with BMI35. However, as the authors note, facial nerve injury often occurs without the requirement for reoperation and readmission, and thus is likely underrepresented in their analysis. Another study evaluated the influence of BMI on mean HB score pre-operatively (1.1 non-obese vs. 1.0 obese, p=0.16) and post-operatively (1.9 non-obese vs. 1.7 obese, p=0.32), finding no difference between obese and non-obese groups36. However, the timing of facial nerve function assessment is not clearly specified in this study, and when facial function is modelled as a categorical variable (rather than continuous, summarized with mean HB scores), obese patients were more likely than non-obese patients to have HB scores equal to or greater than III (9.2% non-obese vs. 17.7% obese). The observed association between BMI and facial nerve dysfunction in our study may be seen as hypothesis-generating, and should be explored in future studies. It is possible that difficult surgical ergonomics in high-BMI patients make tumor dissection off the facial nerve more difficult, placing patients at higher risk of dysfunction37,38,39. For example, in higher-BMI patients, the relatively higher mass of the neck and shoulder may further narrow an already small operative working corridor, which, in addition to requiring less ergonomic positioning for tumor access, limits the dissection vectors and angles and reduces range of motion and visibility. The increased utilization of endoscopes40 and exoscopes41 in lateral skull base surgery may eventually mitigate some of these constraints.
Operative duration is identified as a key factor associated with facial nerve outcome in microsurgical resection of vestibular schwannomas in the present study. To our knowledge, this is the first such description of this association; however, it is consistent with previous studies in which prolonged operative duration has been shown to be associated with a higher rate of complications42. Our observed association of increased operative length with a higher likelihood of facial nerve dysfunction may be reflective in part of the known association between tumor size and facial nerve outcomes, as larger tumors have longer average operative durations. However, given that larger overall tumor size and individual tumor measurements in three dimensions (parallel to the posterior petrous bone, between the central axis of the tumor and the porus acusticus, and from the porus acusticus to the distalmost extent of tumor growth within the IAC) were not found to be predictive of facial nerve dysfunction, other factors which may increase case length should be considered and investigated in future studies as the underlying mechanism of this association. Factors such as tumor hypervascularity43, adherence to the facial nerve perineurium, and the direction of facial nerve displacement may be reflected in differences in operative length across patients, and thus contribute to the observed differential risk of facial nerve dysfunction as it relates to case length20. These factors may serve as a surrogate for dissection complexity. Lastly, it is important to recognize that this algorithm, like any machine learning/artificial intelligence tool, is limited by its inputs. As such, there may be other confounding variables that influence facial nerve injury risk which were not captured in our data or analysis. Further study will be critical to better understand the myriad factors which may influence the role of case length on facial nerve outcome in vestibular schwannoma microsurgery.
A major strength of this study is the inclusion of patient cohorts from three hospitals across two health systems, increasing the generalizability of the resulting model. The model demonstrates an expected performance decay from 90.5% to 84% when assessed on unseen data from one of the included institutions. This level of performance decay demonstrates both the low likelihood of overfitting of this model and the relative reliability of the model in a real-world (clinical) context. While the current model demonstrates good accuracy while avoiding overfitting, we recognize that performance will continue to improve in the deployment phase as further data are collected at external sites and through future prospective validation with patient data from the participating institutions (Supplementary Fig. 2). While we appreciate the tremendous benefit of multi-center data collection to enhance reproducibility, generalizability and clinical translation of our algorithm, we also recognize that as we increase the number of participating centers and expand to include institutions outside of our region, hospital-related factors (setting, level of care, equipment, etc.) and surgeon-related factors (patient selection, preferred surgical approach, years of experience, etc.) will need to be considered and evaluated at this stage of deployment44.
A limitation of the present study is the overall small proportion of patients with facial nerve dysfunction, which likely limited the statistical significance of associations that may have clinical relevance, as well as our ability to further stratify patients into different grades of facial function (i.e. HB I-VI). As vestibular schwannoma is a relatively rare disease entity, expanding our database with each currently participating institution will occur at a rate of roughly 30-60 patients per year, thus increasing the time needed to build a dataset robust enough to meaningfully improve the model's metrics and generalizability. However, we aim to overcome this limitation through dissemination of our results and the current iteration of the algorithm; we aim to expand this work to include additional institutions both nationally and internationally with the goals of improving statistical power and further increasing the generalizability of this work. As additional validation is performed, we anticipate that the machine learning lifecycle will re-start, including further iterations of model evaluation and tuning to further improve performance.
As previously noted, the current iteration of this algorithm was developed based on manual tumor measurements that have been shown to have strong reproducibility and correlation with volumetric analysis throughout the vestibular schwannoma literature. However, deployment could be accelerated through automated tumor segmentation; several such promising tools have recently been developed for vestibular schwannoma, although in all cases the authors acknowledge that these will require further validation before implementation45,46,47,48. This approach has shown significant promise in other medical contexts, particularly in developing strategies for automating chest X-ray review during the COVID-19 pandemic49,50, and in the identification of concerning vs. benign gastrointestinal polyps51,52. Lastly, as data science techniques are increasingly applied in medicine, no discussion of their implementation in this context is complete without considering the protection of patient privacy and confidentiality. The algorithm we present here is run locally and completely offline. However, cloud-based automation offers several advantages that must be weighed against the potential for data leakage; strategies for obviating security concerns while maintaining the flexibility, reliability, and accelerated deployment afforded by these tools are under development. A full discussion of such methods is beyond the scope of this paper, but can be further explored in recent works by Mei et al.53 and Wu et al.54, among others.
It is our goal that this algorithm will ultimately be utilized as a clinically valuable tool for stratifying an individual patient's risk of facial nerve injury, aiding in pre-operative counseling about treatment approach (watchful waiting vs. radiosurgery vs. microsurgical resection) and timing. Importantly, the model was evaluated via accuracy, sensitivity and specificity, given the common utilization of these as metrics of test performance in the clinical setting. In this specific context, we interpret the 90% accuracy to be excellent compared to the 85% accuracy that has been referenced as a benchmark of acceptable performance15; we further anticipate improved accuracy and generalizability (less performance decay) with the addition of validation examples during deployment. In addition, the sensitivity and specificity of 90% and 90% indicate that the model performs equally well at predicting which patients are likely to have complete facial nerve preservation as it does at predicting which patients are likely to have facial nerve dysfunction. We anticipate that further validation through collaboration with additional centers that treat high volumes of vestibular schwannomas will continue to improve the model's performance.
Recognizing that clinicians and patients with little to no computer programming background may find it cumbersome to implement the algorithm, we plan to develop a graphical user interface to facilitate ease of use in both exploratory and clinical settings. This concept has been applied in other areas of medicine to facilitate a user-friendly implementation of artificial intelligence in the clinical environment55,56.
What is AI? Everything to know about artificial intelligence – ZDNet
Posted: at 2:48 am
Some of the most impressive advancements in AI are the development and release of GPT 3.5 and, most recently, GPT-4o, in addition to lifelike AI avatars and deepfakes. But there have been many other revolutionary achievements in AI -- too many to include here.
Here are some of the most notable.
ChatGPT is an AI chatbot capable of generating and translating natural language and answering questions. Though it's arguably the most popular AI tool, thanks to its widespread accessibility, OpenAI made significant waves in artificial intelligence by creating GPTs 1, 2, and 3 before releasing ChatGPT.
Also: 6 ways ChatGPT can make your everyday life easier
GPT stands for Generative Pre-trained Transformer, and GPT-3 was the largest language model at its 2020 launch, with 175 billion parameters. Then came GPT-3.5, which powers the free tier of ChatGPT. The largest version, GPT-4, accessible through the free version of ChatGPT, ChatGPT Plus, and Microsoft Copilot, has one trillion parameters.
Though the safety of self-driving cars is a top concern for potential users, the technology continues to advance and improve with breakthroughs in AI. These vehicles use ML algorithms to combine data from sensors and cameras to perceive their surroundings and determine the best course of action.
Also: An autonomous car that wakes up and greets you could be in your future
The autopilot feature in Tesla's electric vehicles is probably what most people think of when considering self-driving cars. But Waymo, from Google's parent company Alphabet, also offers autonomous rides -- as a driverless taxi, for example, or to deliver Uber Eats -- in San Francisco, CA, and Phoenix, AZ.
Cruise is another robotaxi service, and auto companies like Audi, GM, and Ford are also presumably working on self-driving vehicle technology.
The achievements of Boston Dynamics stand out in the area of AI and robotics. Though we're still a long way from creating Terminator-level AI technology, watching Boston Dynamics' hydraulic, humanoid robots use AI to navigate and respond to different terrains is impressive.
Google subsidiary DeepMind is an AI pioneer focusing on AGI. Though not there yet, the company made headlines in 2016 for creating AlphaGo, an AI system that beat the world's best (human) professional Go player.
Since then, DeepMind has created AlphaFold, a system that can predict the complex 3D shapes of proteins. It has also developed programs to diagnose eye diseases as effectively as top doctors.
See original here:
What is AI? Everything to know about artificial intelligence - ZDNet
Learning by machines, for machines: Artificial Intelligence in the world’s largest particle detector – ATLAS Experiment at CERN
Posted: at 2:48 am
In today's age, you can't do much without interfacing with artificial intelligence and machine learning (AI/ML). This technology lets you unlock your phone via face recognition, helps curate your social media feed and powers internet search. In the future, it promises to automate tasks as mundane as driving a car and as cerebral as scientific outreach. Its clear transformative capability has captured our collective attention, sparking dialogue across scientific communities, governments and the general public alike. But long before ChatGPT or DALL-E, the basic statistical principles that underpin the world's most sophisticated ML tools were hard at work in the field of high-energy collider physics. Today, they are enabling unprecedented progress in understanding the nature of our fundamental universe.
High-energy physics (HEP) can trace its relationship with ML back many decades, with the earliest neural networks coming into play in the 1990s. ML algorithms improved Higgs-boson searches at CERN's LEP collider, powered CP-violation measurements at the B factories at KEK and SLAC, and enabled the observation of single top-quark production at Fermilab's Tevatron collider. They were also key for the discovery and study of the Higgs boson, as well as the observation of the ultra-rare two-muon decay of the Bs meson at the LHC.
But it wasn't until the 2010s that modern computational power and methodological innovation enabled deep learning and let AI-based research methods like ML really shine. In many ways, the relationship between particle physics and ML is a natural and symbiotic one. High-energy particle collisions offer a means to study fundamental interactions under conditions similar to the early universe, allowing a window into potential particles or processes that are frozen out in the current universe. Finding optimal and intelligent ways to sift through the trove of data from experiments at CERN's Large Hadron Collider (LHC) is therefore crucial, as it enables researchers to precisely characterise the Standard Model (SM) and probe mysteries like dark matter and matter-antimatter asymmetry that motivate new physics beyond the Standard Model (BSM).
Collider-based experiments led to some of the original big data problems. Experiments like ATLAS at the LHC operate with staggering data rates, producing over 60 terabytes per second, yet only some of these proton collision events may contain processes of interest. What's more, these experiments offer a dataset that is unique in its complexity and statistical power, in which new ML architectures or problems such as systematic biases or hardware optimization can be studied.
The task of operating the ATLAS experiment and extracting results from its vast datasets is a computational labyrinth. From its inception in the 1990s, the ATLAS experiment was designed to process every proton collision digitally. Within seconds of a collision the data from millions of sensors have been filtered through a web of custom electronics and analysed on a computing farm with tens of thousands of CPUs. Collisions of interest are recorded and reanalysed countless times by physicists looking to better understand the nature of the universe. In all cases the goal of this analysis is to discover precisely what happened at the interaction point, where two protons travelling at 99.999999% the speed of light collide in the accelerator, producing a plethora of new particles.
Unfortunately, the physics at the interaction point can be elusive. Many of the most interesting SM or BSM particles produced in the proton collision will decay in less than one trillionth of one trillionth of a second! Physicists can only pick up on hints of SM or BSM physics by looking for the decay products of the most interesting particles, which may themselves decay before reaching any sensors in ATLAS. From a single collision the ATLAS experiment may record thousands of individual particles, and to make matters worse, it typically has to deal with dozens of simultaneous collisions. To understand what happened at the interaction point, physicists must carefully reconstruct, identify and measure each of these particles. These are then used to reconstruct the entire collision event, which are scoured for processes of interest that may lead to a better understanding of known particles, or shed light on the existence of never-before-observed ones.
ML methods are designed to harness large amounts of data in order to infer new information, making them naturally suited to various data processing tasks in ATLAS, from the moment a particle hits the detector all the way to the final published results. The examples that follow give a representative idea of how extensively this technology has pervaded the experiment, but merely scratch the surface of the full picture and potential of ML in HEP.
During regular operation of the ATLAS experiment, the first challenge is what to do with the data that is created. With multiple subsystems, an LHC frequency of 40 million collisions per second and millions of individual channels of data to read out, the ATLAS experiment produces a data rate far greater than can possibly be written to disk. A complex trigger system implements algorithms that rapidly evaluate incoming data events to determine if they are interesting enough to keep, rejecting the overwhelming majority of events produced.
This task requires sophisticated inference that can be done very quickly, introducing the use of fast ML to accelerate ML algorithms traditionally run in software for use in hardware such as field-programmable gate arrays (FPGAs). This process allows for greater intelligence closer to the source of the data, leading to more accurate reconstruction and better trigger decisions. For example, the energy and timing of signals in the ATLAS electromagnetic calorimeter subsystem can now be estimated by convolutional and recurrent ML architectures in real time during LHC operation, outperforming existing signal filters (Figure 1). This capability of ML to perform fast and accurate regression of key physical quantities can also be used for more accurate calibrations of detector signals.
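As a rough illustration of this kind of fast regression (a sketch only: the layer sizes, sample length, and class name are assumptions, not the networks deployed in ATLAS firmware), a tiny one-dimensional convolutional network in PyTorch could map a digitized calorimeter pulse to an energy estimate as follows:

import torch
import torch.nn as nn

class PulseRegressor(nn.Module):
    # Hypothetical model: regress deposited energy from 32 digitized samples.
    def __init__(self, n_samples: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=5, padding=2),  # learn local pulse-shape features
            nn.ReLU(),
            nn.Conv1d(8, 8, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(8 * n_samples, 1),                # single scalar: estimated energy
        )

    def forward(self, x):
        # x shape: (batch, 1, n_samples)
        return self.net(x).squeeze(-1)

model = PulseRegressor()
fake_pulses = torch.randn(4, 1, 32)   # stand-in for digitized pulses
print(model(fake_pulses).shape)       # torch.Size([4])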
AI/ML also comes into play in the reconstruction algorithms that turn detector signals into physics objects. Well before ATLAS recorded its first data, physicists had developed hundreds of algorithms to reconstruct specific particle types based on the signatures they leave in the different ATLAS sub-detectors. Some particles, like b-hadrons, will decay before reaching any of the ATLAS sub-detectors, and are discerned by triangulating the trajectories of the decay products back to a displaced vertex that is separated from the proton collision point by just a few millimetres. ML has proved essential in identifying this distinctive signature. The latest tools to identify b-hadrons in the detector make use of cutting-edge architectures, such as transformers with attention mechanisms that carefully study simulated b-hadron decays and learn to reject vertices from regular light quark processes at the best rate achieved in ATLAS to date. Transformers have also been used to learn the complex signature of a particle decaying to two b-hadrons when the decay is collimated and the b-hadron tracks are overlapping.
Once the data have been recorded and the events are reconstructed, it's time to study the underlying physics mechanism that produced the event. ATLAS physics analyses are predicated on effective solutions to classic signal-to-noise problems. Many processes of interest are incredibly rare and can be challenging to distinguish among the billions of ordinary proton collisions. Here is where ML can shine: its broad ability to exploit subtle features within a complex and high-statistics dataset makes it a primary workhorse for isolating interesting signal processes.
No particle has held more interest in the past decade than the Higgs boson, discovered in 2012 by the ATLAS and CMS experiments and met with great fanfare and excitement. Understanding and characterising the Higgs boson and the underlying mechanism of mass generation remains an essential goal of high-energy physics today, and ML is in use throughout Higgs-boson analyses. The 2018 observation of the Higgs boson in its most common, but trickiest, decay channel, H→bb, made use of a classic boosted decision tree (BDT) architecture to separate the Higgs-boson signal from the overwhelming background of multijet processes, making the observation possible (Figure 2).
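A minimal sketch of this signal-versus-background use of a BDT, with scikit-learn's gradient-boosted trees and synthetic, imbalanced data standing in for real H→bb candidates (feature counts and hyperparameters are illustrative, not those of the ATLAS analysis), is:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic "events": a rare signal class buried in a dominant background.
X, y = make_classification(n_samples=5000, n_features=10, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                 learning_rate=0.1, random_state=0)
bdt.fit(X_train, y_train)

scores = bdt.predict_proba(X_test)[:, 1]   # per-event signal score
print("ROC AUC:", round(roc_auc_score(y_test, scores), 3))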
ML has also enabled unprecedented study of the top quark, the heaviest known particle and one with a particularly interesting connection to the Higgs boson. In 2023, researchers adapted a graph neural network to model collisions in a geometrical way using the particles produced during the collision and their relationship to one another in the detector space. Training this model to separate the rare four-top-quark-production process from SM backgrounds allowed ATLAS to make its first statistically confident observation of such events, along with a measurement of its production rate and constraints on key possible extensions of the SM.
While these examples of ML to isolate a specific signal demonstrate the depth of its effectiveness, another implementation of ML can reveal its breadth. A growing interest in the LHC community in anomaly detection has led to the proliferation of ML methods that can isolate unusual phenomena from a well-known background model. Such an approach lowers the need to rely on a specific signal model, making these search techniques very broad and sensitive to new physics that may have been missed by previous analysis approaches. In recent years, ATLAS published its first use cases of anomaly detection, implemented via ML algorithms without complete labelling information of training inputs, all in the context of searches for new heavy particles decaying to two-object final states (Figure 3). These analyses leveraged the power of data-driven ML training through a mix of conventional and novel architectures to perform model-independent searches for new particles with a variety of mass and decay hypotheses, providing an invaluable new approach for extracting the most from the ATLAS dataset.
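One common flavor of such weakly supervised anomaly detection can be sketched as an autoencoder trained only on background-like events, where events with unusually large reconstruction error are flagged; the network size, feature count, and threshold below are assumptions for illustration and do not reproduce any specific ATLAS analysis:

import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
background = torch.randn(2000, 12)              # stand-in for event features

autoencoder = nn.Sequential(
    nn.Linear(12, 6), nn.ReLU(),
    nn.Linear(6, 3), nn.ReLU(),                 # bottleneck forces a compressed description
    nn.Linear(3, 6), nn.ReLU(),
    nn.Linear(6, 12),
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):                            # short illustrative training loop
    optimizer.zero_grad()
    loss = loss_fn(autoencoder(background), background)
    loss.backward()
    optimizer.step()

# Events whose reconstruction error sits far above the bulk of the background
# distribution are treated as candidate anomalies.
with torch.no_grad():
    errors = ((autoencoder(background) - background) ** 2).mean(dim=1)
threshold = np.quantile(errors.numpy(), 0.99)
print("anomaly threshold (99th percentile):", round(float(threshold), 3))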
Despite these successes, there's no such thing as a flawless solution. While ML can offer incredible benefits to ATLAS throughout all stages of the analysis chain, its usage has to be closely coupled with continuous monitoring. Models can inherit unintended biases in the course of training, leading them to make spurious or, even worse, incorrect inferences. The risk of such biases is so significant that it has spawned a broader subfield of AI alignment and safety, and must be carefully considered when applying ML tools to produce physics results.
Luckily there are many ways ATLAS physicists can tackle this challenge. One potential source of such bias emerges from the use of simulated collision events to develop ML tools. While physicists have invested decades into generating accurate and fast simulations, there are still some known ways in which their predictive capabilities can break down. Furthermore, the development of a tool using a particular selection of data with limited statistical power can often require the intentional decorrelation of the model's learned conclusions from certain sensitive properties that should not be considered. To address these issues, physicists make use of dedicated de-biasing or decorrelation techniques from the HEP-ML research community, such as moment decomposition or distance correlation. The limitations of statistical power in simulation samples used for training can also be mitigated through the use of fast simulation methods, which use ML to circumvent the costly full Monte Carlo simulation chain by making fast estimations of key collision and detector properties.
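As an example of the kind of diagnostic used in such decorrelation studies, the sketch below computes the (biased, V-statistic) sample distance correlation between a hypothetical classifier score and a sensitive variable such as a mass observable; the data and variable names are illustrative, and values near zero indicate the score carries little information about the protected quantity:

import numpy as np

def distance_correlation(x, y):
    # Empirical (biased) distance correlation between two 1-D samples.
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)                                  # pairwise distance matrices
    b = np.abs(y - y.T)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()    # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(0)
mass = rng.normal(100.0, 10.0, 500)            # hypothetical sensitive variable
score = 0.02 * mass + rng.normal(0, 1, 500)    # classifier output, mildly correlated
print("distance correlation:", round(distance_correlation(score, mass), 3))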
On top of it all, developing, training and running these advanced algorithms takes a staggering amount of power. Running and adequately cooling the mainframes and supercomputers of the CERN Data Centre takes about 37 gigawatt-hours per year, about 3% of CERN's total annual electrical consumption when the LHC operates. While this computing covers all CERN operations, including many applications beyond AI/ML development, producing this quantity of electricity has a significant carbon footprint. The growing role of AI/ML, combined with the uptake of larger and larger models, means that associated power consumption will likely increase as well; for context, OpenAI's ChatGPT uses half a gigawatt-hour daily! Greener approaches are being investigated to continue these operations at CERN in an increasingly climate-focused society. Through dedicated sustainability initiatives, CERN is working with experts across areas of research to find environmentally friendly data management solutions and greener ways to run collider experiments.
With this striking history of success, and expectations for computational power to continue its tremendous rise, the future of ML in high-energy physics is bright. ATLAS researchers are collaborative by nature, and much of the work described here wouldn't be possible without close ties to the computer science and AI/ML research communities. Maintaining and expanding these relationships means that physics experimentation will continue to benefit from the latest and greatest in ML algorithms and software capabilities. A recent push across CERN to provide more "open data" recorded by the experiments will further engage researchers outside of HEP who can benefit from the uniquely complex and high-statistics LHC datasets to design and optimise their tools.
Beyond the horizons of ATLAS, AI/ML techniques are similarly impacting the broader landscape in physics. Within theoretical physics, ML offers the promise of dramatically reducing computation cost/time of challenging calculations and simulations, among other things. Further, ML is being studied to perform comprehensive optimizations of future detector designs, which comes at an exciting time for the strategic planning of next-generation colliders.
The long-term future of AI will have an impact on our world that is exciting, transformative and yet unimaginable, and things are no different for particle physics. Through continued collaboration and thoughtful planning for the potential ethical and environmental consequences, researchers can properly harness AI/ML to usher in a new era of precision understanding (and potentially groundbreaking discoveries) in particle physics.
The author would like to thank Katarina Anthony, Dan Guest, Andreas Hoecker, Walter Hopkins, Michael Kagan, Zach Marshall, Benjamin Nachman, and Manuella Vincter for their input and feedback.
Julia Gonski is a Panofsky Fellow (associate staff scientist) working on the energy frontier at SLAC National Accelerator Laboratory. Her research focuses on novel approaches to searching for beyond the Standard Model physics with the ATLAS experiment, particularly incorporating machine learning (ML) and anomaly detection. She also works on fast ML for electronics in advanced trigger and readout systems, and planning for next-generation global collider facilities.
Link:
Comparative Analysis of Classification of Neonatal Bilirubin by Using Various Machine Learning Approaches – Cureus
Posted: at 2:48 am
More:
Machine learning and hydrodynamic proxies for enhanced rapid tsunami vulnerability assessment | Communications … – Nature.com
Posted: at 2:48 am
Synthetic variables for shielding mechanism and debris impact as proxies for water velocity
To comprehensively analyze the individual contributions of the three approaches for accounting for water velocity, we systematically trained different eXtra Trees (XT) models33, each featuring a unique combination of input variables. The reference scenario (ID0) serves as both the initial benchmark and foundational baseline, encompassing the minimum set of variables retained across all subsequent scenarios. This baseline incorporates only basic input variables sourced from the original MLIT database, further enriched with some of the geospatial variables introduced by Di Bacco et al.23 that are characterized by the most straightforward computation. Subsequently, the additional models are generated by iteratively introducing velocity-related (directly or indirectly) features into the model. This stepwise approach allows us to isolate the incremental improvements in predictive accuracy attributed to each individual component under consideration. Table 1 in Methods offers a concise overview of all tested variables, with those included in the reference scenario highlighted in italics.
The core results of the analysis aimed at assessing the predictive performance variability among the various trained models are summarized in Fig. 1, which illustrates the global average accuracy (expressed in terms of hit rate (HR) on the test set) achieved by each model across ten training sessions. In the figure, each column represents a specific combination of input features, with x markers indicating variables excluded during each model training. Insights into the importance of individual input features for the model's predictive performance are provided by the circles, the size of which corresponds to the mean decrease in accuracy (mda) when each single variable is randomly shuffled.
Circle size reflects the mean decrease in accuracy (mda) when individual variables are shuffled and x markers indicate excluded variables in model training.
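A minimal sketch of this evaluation loop, assuming scikit-learn and synthetic data in place of the MLIT-derived features (the feature names below are only a hypothetical subset), trains an Extra Trees classifier on one input combination, reports the test-set hit rate, and estimates each feature's mean decrease in accuracy by permutation:

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

feature_names = ["h", "Distance", "NShArea", "NDIArea", "vsim"]  # hypothetical subset
X, y = make_classification(n_samples=3000, n_features=5, n_informative=4,
                           n_redundant=1, n_classes=4, n_clusters_per_class=1,
                           random_state=0)   # synthetic stand-in for damage classes
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

model = ExtraTreesClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
print("hit rate (test accuracy):", round(model.score(X_test, y_test), 3))

result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                scoring="accuracy", random_state=0)
for name, mda in zip(feature_names, result.importances_mean):
    print(f"{name}: mean decrease in accuracy = {mda:.3f}")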
The pair plot in Fig.2, illustrating the correlations and distributions among considered velocity-related variables as well as Distance across the seven damage classes in the MLIT dataset, has been generated to support the interpretation of the results and enrich the discussion. This graphical representation employs scatter plots to display the relationships between each pair of variables, while the diagonal axis represents kernel density plots for the individual features.
The pie chart summarizes the distribution of the various damage states within the dataset (shades from light pink to violet). The pair plot displays the relationships between each pair of variables, while the diagonal axis represents kernel density plots for the individual features.
The baseline model (ID0), established as a reference due to its exclusion of any velocity information, attains an average accuracy of 0.836. In ID1, the model exclusively incorporates the direct contribution of vsim, resulting in a modest improvement, with accuracy reaching 0.848. The subsequent model, ID2, closely resembling ID1 but replacing vsim with vc, demonstrates a decline in performance, with an accuracy value of 0.828. This decrease is attributed to the redundancy between vc and inundation depth (h), both in their shared importance as variables and in the decrease of h's importance compared to the previous case. Essentially, when both variables are included, the model might become confused because h, which could have been a relevant variable when introduced alone, may now appear less important due to the addition of vc, which basically provides the same information in a different format.
The analysis proceeds with the introduction of buffer-related proxies to account for possible dynamic water effects on damage. Initially, we isolate the effect of the two considered mechanisms: the shielding (ID3) exerted by structures within the buffers (NShArea and NSW) and the debris impact (NDIArea, ID4). In both instances, we observe an enhancement in accuracy, with values reaching 0.877 and 0.865, respectively. Their combined effect is considered in model ID5, yielding only a marginal overall performance improvement (0.878), due to the noticeable correlation between NShArea and NDIArea, especially for the more severe damage levels (Fig.2), with the two variables sharing their overall importance. Combination ID6, with the addition of vc, does not exhibit an increase in accuracy compared to the previous model (0.871), thus confirming the redundant contribution of a variable directly derived from another.
In the subsequent three input feature combinations, we explore the possible improvements in accuracy through the inclusion of vsim in conjunction with the considered proxies. In the case of ID7, where vsim is combined solely with the shielding effect, no enhancement is observed (0.870) compared to the corresponding simple ID3. Similarly, when replacing shielding with the debris proxy (ID8), an overall accuracy of 0.867 is achieved, closely resembling the performance of ID4, which lacks direct velocity input. The highest accuracy (0.889) is instead obtained when all three contributions are included simultaneously. Hence, the inclusion of vsim appears to result only in a marginal enhancement of model performance, with an overall lower importance compared to the two considered proxies. From a physical perspective, albeit without a noticeable correlation between the data points of vsim and NShArea (Fig. 2), this result can be explained by recognizing that flow velocity indirectly encapsulates the shielding effect arising from the presence of buildings, which are typically represented in hydrodynamic models as obstructions to wave propagation or through an increase in bottom friction for urban areas8,34,35,36. Since this alteration induced by the presence of buildings directly influences the hydrodynamic characteristics of the tsunami on land, the resulting values of vsim offer limited additional improvement to the model's predictive ability compared to what is already provided by h and NShArea. Moreover, the very weak correlation of the considered proxies with the primary response variable h (Fig. 2) reinforces their importance in the framework of a machine learning approach, since they provide distinct input information compared to flow velocity, which, instead, is directly related to h, as discussed for vc. Such observations then support the idea of regarding these proxies as suitable variables for capturing dynamic water effects on buildings.
In all previous combinations, observed field values (hMLIT) served as the primary data source for inundation depth information. However, for a more comprehensive analysis, we also introduced feature combination ID10, similar to ID9 but employing simulated inundation depths (hsim) in place of hMLIT. This model achieves accuracy levels comparable to its counterparts and exhibits a consistent feature importance pattern, albeit with a slight increase in the importance of the Distance variable.
For completeness, normalized confusion matrices, describing hit and misclassification rates among the different damage classes, are reported in Supplementary Fig. S1. These matrices reveal uniform error patterns across all models, with Class 5 consistently exhibiting higher misclassification rates as a result of its underrepresentation in the dataset, as illustrated in Fig. 2. Concerning the potential influence of such dataset imbalance on the results, it is worth noting that, for the primary aim of this study, it does not alter the overall outcomes in terms of the relative importance of the various features for damage predictions, since it affects all trained models in the same way.
Delving further into the analysis of the results, the objective shifts toward gaining a thorough understanding of the relationships between the variables influencing the damage mechanisms. Indeed, while we have shown that the inclusion of water velocity components or the adoption of a more comprehensive multi-variable approach enhances tsunami damage predictions, machine learning algorithms have often been criticized for their inherent black-box nature30,31,32.
To address this challenge, we have chosen to embrace the concept of explanation through visualization by illustrating how it remains possible to derive explicit and informative insights from the outcomes derived from a machine learning approach, all while embracing the inherent complexity arising from the multi-variable nature of the problem at hand.
The results of the trained models are then translated into the form of traditional fragility functions, expressing the probability of exceeding a certain damage state as a function of inundation depth, for fixed values of the feature under investigation, distinguished into velocity-related (Fig. 3), site-dependent (Fig. 4) and structural building attributes (Fig. 5). In addition to the central value, the derived functions incorporate the 10th–90th percentile confidence intervals to provide a comprehensive representation of the predictive uncertainty associated with them.
Fragility functions for fixed values of a direct velocity information (vsim), b proxy for shielding effect (NShArea) and c proxy for debris impact (NDIArea). The median fragility function is represented as a solid line, while the shaded area represents the 10th–90th percentile confidence interval.
Fragility functions for fixed values of a coastal typology (CoastType) and b distance from the coastline (Distance). The median fragility function is represented as a solid line, while the shaded area represents the 10th–90th percentile confidence interval.
Fragility functions for fixed values of a structural type (BS) and b number of floors (NF). The median fragility function is represented as a solid line, while the shaded area represents the 10th–90th percentile confidence interval.
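The general recipe for such curves can be sketched as follows (synthetic data and a hypothetical feature ordering, not the authors' code): sweep inundation depth while holding the remaining features fixed, and sum the predicted probabilities of all damage states at or above the one of interest:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic multi-class "damage" model; column 0 plays the role of inundation depth h.
X, y = make_classification(n_samples=3000, n_features=5, n_informative=4,
                           n_redundant=1, n_classes=4, n_clusters_per_class=1,
                           random_state=0)
model = ExtraTreesClassifier(n_estimators=300, random_state=0).fit(X, y)

h_grid = np.linspace(0.0, 3.0, 31)           # swept inundation depth values
fixed = np.array([0.5, -0.2, 1.0, 0.3])      # remaining features held constant
grid = np.column_stack([h_grid, np.tile(fixed, (h_grid.size, 1))])

proba = model.predict_proba(grid)            # shape: (len(h_grid), n_classes)
k = 2                                        # damage state of interest
exceedance = proba[:, k:].sum(axis=1)        # P(DS >= k | h, fixed features)
print(np.round(exceedance[:5], 3))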
Starting with the analysis of the fragility functions obtained for fixed values of velocity-related variables (Fig. 3), it is possible to observe the substantial impact of the hydrodynamic effects, especially in more severe inundation scenarios. Notably, differences in the median fragility functions for the more damaging states (DS5) are only evident when velocity reaches high values (around 10 m/s), while those for 0.1 and 2 m/s are practically overlapping, albeit featuring a wide uncertainty band, demonstrating how the several additional explicative variables included in the model affect the damage process. More pronounced differences in the fragilities become apparent for lower damage states, under shallower water depths (h < 2 m) and slower flow velocities, although a substantial portion of the predictive power in non-structural damage scenarios predominantly relies on the inundation depth8,11,13. The velocity proxy accounting for the shielding effect (NShArea) mirrors the behavior observed for vsim, but with greater variability for DS7.
For instance, the probability of reaching DS7 with an inundation depth of 4 m drops from ~70% for an isolated building (NShArea = 0) to roughly 40% for one located in a densely populated area (NShArea = 0.5). This substantial variation not only highlights the influence of this variable for describing the damage mechanism, but also explains its profound impact on the model's predictive performance shown in Fig. 1. Conversely, for less severe DS, the central values of the three considered fragility functions tend to converge onto a single line, indicating that the shielding mechanism primarily influences the process leading to the total destruction of buildings. Distinct patterns emerge for the velocity proxy related to debris impact (NDIArea), particularly for DS5, emphasizing its crucial role in predicting relevant structural damages.
For example, at an inundation depth of 4 m, the probability of reaching DS7 is 40% when NDIArea = 0 (i.e., no washed-away structures in the buffer area for the considered building), but it rises to ~90% when NDIArea = 0.3 (i.e., 30% of the buffer area with washed-away buildings). Moreover, similarly to NShArea, the width of the uncertainty band generally narrows with decreasing damage state, thus suggesting that inundation depth acts as the main predictor for low-entity damages. These results represent an advancement beyond the work of Reese et al.26, who first attempted to incorporate information on shielding and debris mechanisms into fragility functions based on a limited number of field observations for the 2009 South Pacific tsunami, and Charvet et al.8, who investigated the possible effect of debris impacts (through the use of a binary variable) on damage levels for the 2011 Great East Japan event.
Concerning morphological variables, Fig. 4 well represents the amplification effect induced by ria-type coasts, especially for the higher damage states, consistently with prior literature8,11,13,37,38. However, above 6 m, the median fragility curve for the plain coastal areas exceeds that of the ria-type region, in line with findings by Suppasri et al.37,38, who also described a similar trend pattern. Nevertheless, it is worth observing that the variability introduced by other contributing features muddles the differences between the two coastal types, with the magnitude of the uncertainty band almost eclipsing the noticeable distinctions in the central values. This observation highlights the imperative need to move beyond the use of traditional univariate fragility functions, in favor of multi-variable models, intrinsically capable of taking these complex interactions into account. Distance from the coast has emerged as a pivotal factor in predictive accuracy (Fig. 1), and this is also evident in the corresponding fragility functions computed for Distance values of 170, 950, and 2600 m (Fig. 4). Obviously, a clear negative correlation exists between Distance and inundation depth (Fig. 2), with structures closer to the coast being more susceptible to damage, especially in the case of structural damages. In detail, more pronounced differences in the fragility patterns are observed for DS5 and DS6, where the probability of exceeding these damage states with a 2 m depth is almost null for buildings located beyond a distance of 1 km from the coast, while it increases to over 80% for those in close proximity to the coastline. This mirrors the observations resulting for NDIArea (Fig. 3), where greater distances result in less damage potential from washed-away buildings.
Figure 5 illustrates the fragility functions categorized by structural types (BS) and building characteristics represented in terms of NF. Overall, the observed patterns align with the findings discussed in the preceding figures. When focusing on the median curves, it becomes evident that these features exert minimal influence on the occurrence of non-structural damages, with overlapping curves and relatively narrow uncertainty bands for DS5, owing to the mentioned dominance of inundation depth as the main damage-predictive variable in such cases.
However, for the more severe damage states, distinctions become more marked. Reinforced-concrete (RC) buildings exhibit lower vulnerability, followed by steel, masonry and wood structures, with the latter two showing only minor differences among them. A similar trend is also evident for NF, with taller buildings being less vulnerable than shorter ones under severe damage scenarios. The most relevant differences emerge when transitioning from single or two-story buildings to multi-story dwellings. However, once again, it is worth noting that, beyond these general patterns, also highlighted in previous studies1,5,8,11,26,34,37, the influence of other factors tends to blur the distinctions among the central values of the different typologies, as visible, for instance, for the confidence interval for steel buildings, which encompasses both median fragility functions for wood and masonry structures.
View original post here:
Machine learning-guided realization of full-color high-quantum-yield carbon quantum dots – Nature.com
Posted: at 2:48 am
Workflow of ML-guided synthesis of CQDs
Synthesis parameters have great impacts on the target properties of resulting samples. However, it is intricate to tune various parameters for optimizing multiple desired properties simultaneously. Our ML-integrated MOO strategy tackles this challenge by learning the complex correlations between hydrothermal/solvothermal synthesis parameters and two target properties of CQDs in a unified MOO formulation, thus recommending optimal conditions that enhance both properties simultaneously. The overall workflow for the ML-guided synthesis of CQDs is shown in Fig.1 and Supplementary Fig.1. The workflow primarily consists of four key components: database construction, multi-objective optimization formulation, MOO recommendation, and experimental verification.
It consists of four key components: database construction, multi-objective optimization (MOO) formulation, MOO recommendation, and experimental verification.
Using a representative and comprehensive synthesis descriptor set is of vital importance in achieving the optimization of synthesis conditions36. We carefully selected eight descriptors to comprehensively represent the hydrothermal system, one of the most common methods to prepare CQDs. The descriptor list includes reaction temperature (T), reaction time (t), type of catalyst (C), volume/mass of catalyst (VC), type of solution (S), volume of solution (VS), ramp rate (Rr), and mass of precursor (Mp). To minimize human intervention, the bounds of the synthesis parameters are determined primarily by the constraints of the synthesis methods and equipment used, instead of expert intuition. For instance, in employing the hydrothermal/solvothermal method to prepare CQDs, because the reactor inner pot is made of polytetrafluoroethylene, the usage temperature should not exceed 220 °C. Moreover, the capacity of the reactor inner pot used in the experiment is 25 mL, with the general guidance of not exceeding 2/3 of this volume for reactions. Therefore, in this study, the main considerations of the experimental design are to ensure experimental safety and accommodate the limitations of the equipment. These practical considerations naturally led to a vast parameter space, estimated at 20 million possible combinations, as detailed in Supplementary Table 1. Briefly, the 2,7-naphthalenediol molecule, along with catalysts such as H2SO4, HAc, ethylenediamine (EDA) and urea, was adopted in constructing the carbon skeleton of CQDs during the hydrothermal or solvothermal reaction process (Supplementary Fig. 2). Different reagents (including deionized water, ethanol, N,N-dimethylformamide (DMF), toluene, and formamide) were used to introduce different functional groups into the architectures of CQDs, combined with other synthesis parameters, resulting in tunable PL emission. To establish the initial training dataset, we collected 23 CQDs synthesized under different, randomly selected parameters. Each data sample is labelled with the experimentally verified PL wavelength and PLQY (see Methods).
To account for the varying importance of multiple desired properties, an effective strategy is needed to quantitatively evaluate candidate synthesis conditions in a unified manner. A MOO strategy has thus been developed that prioritizes full-color PL wavelength over PLQY enhancement by assigning an additional reward when the maximum PLQY of a color surpasses the predefined threshold for the first time. Given \(N\) explored experimental conditions \(\{(x_i, y_i^c, y_i^{\gamma}) \mid i = 1, 2, \ldots, N\}\), \(x_i\) indicates the \(i\)-th synthesis condition defined by the 8 synthesis parameters, and \(y_i^c\) and \(y_i^{\gamma}\) indicate the corresponding color label and yield (i.e., PLQY) given \(x_i\); \(y_i^c \in \{c_1, c_2, \ldots, c_M\}\) for \(M\) possible colors, and \(y_i^{\gamma} \in [0, 1]\). The unified objective function is formulated as the sum of the maximum PLQY for each color label, i.e.,
$$\sum\nolimits_{c_j} Y_{c_j}^{\max},$$
(1)
where \(j \in \{1, 2, \ldots, M\}\) and \(Y_{c_j}^{\max}\) is 0 if \(\nexists\, y_i^c = c_j\); otherwise
$$Y_{c_j}^{\max} = \max_i \left[ \Big( y_i^{\gamma} + R \cdot \mathbb{1}\left( y_i^{\gamma} \ge \alpha \right) \Big) \cdot \mathbb{1}\left( y_i^c = c_j \right) \right].$$
(2)
\(\mathbb{1}(\cdot)\) is an indicator function that outputs 1 if its argument is true and 0 otherwise. The term \(R \cdot \mathbb{1}(y_i^{\gamma} \ge \alpha)\) enforces a higher priority on full-color synthesis, where the PLQY for each color shall be at least \(\alpha\) (\(\alpha = 0.5\) in our case) to receive an additional reward of \(R\) (\(R = 10\) in our settings). \(R\) can be any real value larger than 1 (i.e., the maximum possible improvement of PLQY for one synthesis condition), to ensure the higher priority of exploring synthesis conditions for colors whose yield has not yet achieved \(\alpha\). We set \(R\) to 10 such that the tens digit of the unified objective function's value clearly indicates the number of colors with maximum PLQYs exceeding \(\alpha\), and the units digit reflects the sum of the maximum PLQYs (without the additional reward) for all colors. As defined by the ranges of PL wavelength in Supplementary Table 2, the seven primary colors considered in this work are purple (<420 nm), blue (≥420 and <460 nm), cyan (≥460 and <490 nm), green (≥490 and <520 nm), yellow (≥520 and <550 nm), orange (≥550 and <610 nm), and red (≥610 nm), i.e., \(M = 7\). Notably, the proposed MOO formulation unifies the two goals of achieving full color and high PLQY into a single objective function, providing a systematic approach to tune synthesis parameters for the desired properties.
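A minimal Python transcription of Eqs. (1) and (2), with illustrative variable names, is given below; it reproduces the property that the tens digit of the objective counts the colors whose best PLQY clears the threshold, while the units reflect the summed raw maxima:

R = 10.0      # reward added once a color's best PLQY clears the threshold
ALPHA = 0.5   # PLQY threshold
COLORS = ["purple", "blue", "cyan", "green", "yellow", "orange", "red"]

def unified_objective(color_labels, plqys):
    # Sum over colors of the best (possibly reward-boosted) PLQY, per Eqs. (1)-(2).
    total = 0.0
    for c in COLORS:
        vals = [q + R * (q >= ALPHA)
                for lbl, q in zip(color_labels, plqys) if lbl == c]
        if vals:                  # Y_c^max is 0 if the color is still unexplored
            total += max(vals)
    return total

# Toy usage: two colors exceed the threshold, so the tens digit is 2.
labels = ["blue", "blue", "red", "green"]
plqys = [0.62, 0.41, 0.35, 0.55]
print(unified_objective(labels, plqys))   # 21.52 = 20 + (0.62 + 0.35 + 0.55)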
The MOO strategy is premised on the prediction results of ML models. Due to the high-dimensional search space and limited experimental data, it is challenging to build models that generalize well on unseen data, especially considering the nonlinear nature of the condition-property relationship37. To address this issue, we employed a gradient boosting decision tree-based model (XGBoost), which has proven advantageous in handling related material datasets (see Methods and Supplementary Fig. 3)30,38. In addition, its capability to guide hydrothermal synthesis has been proven in our previous work (Supplementary Fig. 4)21. Two regression models, optimized with the best hyperparameters through grid search, were fitted on the given dataset, one for PL wavelength and the other for PLQY. These models were then deployed to predict all unexplored candidate synthesis conditions. The search space for candidate conditions is defined by the Cartesian product of all possible values of the eight synthesis parameters, resulting in ~20 million possible combinations (see Supplementary Table 1). The candidate synthesis conditions, i.e., unexplored regions of the search space, are further ranked by the MOO evaluation strategy using the prediction results.
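The recommendation step can be sketched as follows, assuming the xgboost Python package and synthetic placeholders for the encoded synthesis parameters; the simple ranking heuristic at the end is a stand-in for the full MOO evaluation described above, not the authors' exact code:

import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X_explored = rng.random((23, 8))            # 8 encoded synthesis parameters
y_wavelength = rng.uniform(400, 650, 23)    # measured PL wavelength (nm)
y_plqy = rng.uniform(0.0, 1.0, 23)          # measured PLQY

# One regressor per target property, fitted on the explored conditions.
model_wl = XGBRegressor(n_estimators=300, max_depth=4).fit(X_explored, y_wavelength)
model_qy = XGBRegressor(n_estimators=300, max_depth=4).fit(X_explored, y_plqy)

X_candidates = rng.random((10000, 8))       # stand-in for the ~20M-point grid
pred_wl = model_wl.predict(X_candidates)
pred_qy = model_qy.predict(X_candidates)

# Simplified ranking: prefer high predicted PLQY among candidates whose predicted
# wavelength falls in a color bin that has not yet reached the 50% PLQY target.
target_bin = (pred_wl >= 490) & (pred_wl < 520)   # e.g., green still below target
score = np.where(target_bin, pred_qy, -np.inf)
ranked = np.argsort(-score)
print("top two candidate indices:", ranked[:2])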
Finally, the PL wavelength and PLQY values of the CQDs synthesized under the top two recommended synthesis conditions are verified through experiments and characterization, and the results are then added to the training dataset for the next iteration of the MOO design loop. The iterative design loops continue until the objectives are fulfilled, i.e., when the achieved PLQY for all seven colors surpasses 50%. It is worth noting that in prior studies on CQDs, only a limited number of CQDs with short-wavelength fluorescence (e.g., blue and green) have reached PLQYs above 50%39,40,41. On the other hand, their long-wavelength counterparts, particularly those with orange and red fluorescence, usually demonstrate PLQYs under 20%42,43,44. Underlining the efficacy of our ML-powered MOO strategy, we have set an ambitious goal for all fluorescent CQDs: the attainment of PLQYs exceeding 50%. The capacity to modulate the PL emission of CQDs holds significant promise for various applications, spanning from bioimaging and sensing to optoelectronics. Our four-stage workflow is crafted to forge an ML-integrated MOO strategy that can iteratively guide the hydrothermal synthesis of CQDs towards multiple desired properties, while also constantly improving the models' prediction performance.
To assess the effectiveness of our ML-driven MOO strategy in the hydrothermal synthesis of CQDs, we employed several metrics, specifically chosen to ascertain whether our proposed approach not only meets its dual objectives but also enhances prediction accuracy throughout the iterative process. The unified objective function described above measures how well the two desired objectives have been realized experimentally, and thus can be a quantitative indicator of the effectiveness of our proposed approach in instructing the CQD synthesis. The evaluation output of the unified objective function after a specific ML-guided synthesis loop is termed the objective utility value. The MOO strategy improves the objective utility value by a large margin of 39.27% to 75.44, denoting that the maximum PLQY in all seven colors exceeds the target of 0.5 (Fig. 2a). Specifically, at iterations 7 and 19, the number of color labels with maximum PLQY exceeding 50% increases by one, resulting in an additional reward of 10 each time. Even on the seeming plateau, the two insets illustrate that the maximally achieved PLQY is continuously enhanced. For instance, during iterations 8 to 11, the maximum PLQY for cyan emission escalates from 59% to 94%, and the maximum PLQY for purple emission rises from 52% to 71%. Impressively, our MOO approach successfully fulfilled both objectives within only 20 iterations (i.e., 40 guided experiments).
a MOO's unified objective utility versus design iterations. b Color explored with newly synthesized experimental conditions. Value ranges of colors defined by PL wavelength: purple (PL < 420 nm), blue (420 nm ≤ PL < 460 nm), cyan (460 nm ≤ PL < 490 nm), green (490 nm ≤ PL < 520 nm), yellow (520 nm ≤ PL < 550 nm), orange (550 nm ≤ PL < 610 nm), and red (PL ≥ 610 nm). It shows that while high PLQY had been achieved for red, orange, and blue in the initial dataset, the MOO strategy purposefully enhances the PLQYs for yellow, purple, cyan, and green respectively in subsequent synthesized conditions, in groups of five. c MSE between the predicted and real target properties. d Covariance matrix for the correlation among the 8 synthesis parameters (i.e., reaction temperature T, reaction time t, type of catalyst C, volume/mass of catalyst VC, type of solution S, volume of solution VS, ramp rate Rr, and mass of precursor Mp) and the 2 target properties, i.e., PLQY and PL wavelength. e Two-dimensional t-distributed stochastic neighbor embedding (t-SNE) plot for the whole search space, including unexplored (circular points), training (star-shaped points), and explored (square points) conditions, where the latter two sets are colored by real PL wavelengths.
Figure 2b reveals that the MOO strategy systematically explores the synthesis conditions for each color, addressing those that have not yet achieved the designed PLQY threshold, starting with yellow in the first 5 iterations and ending with green in the last 5 iterations. Notably, within each set of 5 iterations, a single color demonstrates an enhancement in its maximum PLQY. Initially, the PLQY for yellow surges to 65%, which is then followed by a significant rise in purple's maximum PLQY from 44% to 71% during the next set of 5 iterations. This trend continues with cyan and green, where the maximum PLQY escalates to 94% and 83%, respectively. Taking into account both the training set (i.e., the first 23 samples) and the augmented dataset, the peak PLQY for all colors exceeds 60%. Several colors approach 70% (including purple, blue, and red), and some are near 100% (including cyan, green, and orange). This further underscores the effectiveness of our proposed ML technique. A more detailed visualization of the PL wavelength and PLQY along each iteration is provided in Supplementary Fig. 5.
The MOO strategy ranks candidate synthesis conditions based on ML prediction; thus, it is vital to evaluate the ML models' performance. Mean squared error (MSE), commonly used for regression, is employed as the evaluation metric and is computed between the PL wavelength and PLQY predicted by the ML models and the experimentally determined values45. As shown in Fig. 2c, the MSE of PLQY drastically decreases from 0.45 to approximately 0.15 within just four iterations, a notable error reduction of 64.5%. The MSE eventually stabilizes around 0.1 as the iterative loops progress. Meanwhile, the MSE of PL wavelength remains consistently low, always under 0.1. The MSE of PL wavelength is computed after normalizing all values to the range of zero to one for a fair comparison, thus an MSE of 0.1 signifies a favorable deviation within 10% between the ML-predicted values and the experimental verifications. This indicates that the accuracies of our ML models for both PL wavelength and PLQY consistently improve, with predictions closely aligning with actual values after enhanced learning from augmented data. This not only demonstrates the efficacy of our MOO strategy in optimizing multiple desired properties but also in refining the ML models.
To unveil the correlation between synthesis parameters and target properties, we further calculated the covariance matrix. As illustrated in Fig.2d, the eight synthesis parameters generally exhibit low correlation among each other, indicating that each parameter contributes unique and complementary information for the optimization of the CQDs synthesis conditions. In terms of the impact of these synthesis parameters on target properties, factors such as reaction time and temperature are found to influence both PL wavelength and PLQY. This underscores the importance for both experimentalists and data-driven methods to adjust them with higher precision. Besides reaction time and temperature, PL wavelength and PLQY are determined by distinct sets of synthesis parameters with varying relations. For instance, the type of solution affects PLQY with a negative correlation, while solution volume has a stronger positive correlation with PLQY. This reiterates that, given the high-dimensional search space, the complex interplay between synthesis parameters and multiple target properties can hardly be unfolded without capable ML-integrated methods.
To visualize how the MOO strategy has navigated the expansive search space (~20 million) using only 63 data samples, we compressed the initial training, explored, and unexplored space into two dimensions by projecting them into a reduced embedding space using t-distributed stochastic neighbor embedding (t-SNE)46. As shown in Fig. 2e, discerning distinct clustering patterns by color proves challenging, which emphasizes the intricate task of uncovering the relationship between synthesis conditions and target properties. This complexity further underscores the critical role of an ML-driven approach in deciphering the hidden intricacies within the data. The efficacy of ML models is premised on the quality of the training data. Thus, selecting training data that span as large a search space as possible is particularly advantageous to model generalizability37. As observed in Fig. 2e, our developed ML models benefit from the randomly and sparsely distributed training data, which in turn encourages the models to generalize to previously unseen areas in the search space and effectively guide the search for optimal synthesis conditions within this intricate multi-objective optimization landscape.
With the aid of the ML-coupled MOO strategy, we have successfully and rapidly identified the optimal conditions giving rise to full-color CQDs with high PLQY. The ML-recommended synthesis conditions that produced the highest PLQY of each color are detailed in the Methods section. Ten CQDs with the best optical performance were selected for in-depth spectral investigation. The resulting absorption spectra of the CQDs manifest strong excitonic absorption bands, and the normalized PL spectra of the CQDs display PL peaks ranging from 410 nm for purple CQDs (p-CQDs) to 645 nm for red CQDs (r-CQDs), as shown in Fig. 3a and Supplementary Fig. 6. This encompasses a diverse array of CQD types, including p-CQDs, blue CQDs (b-CQDs, 420 nm), cyan CQDs (c-CQDs, 470 nm), dark-cyan CQDs (dc-CQDs, 485 nm), green CQDs (g-CQDs, 490 nm), yellow-green CQDs (yg-CQDs, 530 nm), yellow CQDs (y-CQDs, 540 nm), orange CQDs (o-CQDs, 575 nm), orange-red CQDs (or-CQDs, 605 nm), and r-CQDs. Importantly, the PLQYs of most of these CQDs were above 60% (Supplementary Table 3), exceeding the majority of CQDs reported to date (Supplementary Table 4). Corresponding photographs of full-color fluorescence ranging from purple to red light under UV light irradiation are provided in Fig. 3b. Excellent excitation-independent behaviors of the CQDs have been further revealed by the three-dimensional fluorescence spectra (Supplementary Fig. 7). Furthermore, a comprehensive investigation of the time-resolved PL spectra revealed a notable trend: the monoexponential lifetimes of the CQDs progressively decrease from 8.6 ns (p-CQDs) to 2.3 ns (r-CQDs) (Supplementary Fig. 8). This observation signifies that the lifetimes of the CQDs diminish as their PL wavelength shifts towards the red end of the spectrum47. Moreover, the CQDs also demonstrate long-term photostability (>12 hours), rendering them potential candidates for applications in optoelectronic devices that require stable performance over extended periods of time (Supplementary Fig. 9). All the results together demonstrate the high quality and great potential of our synthesized CQDs.
a Normalized PL spectra of CQDs. b Photographs of CQDs under 365 nm-UV light irradiation. c Dependence of the HOMO and LUMO energy levels of CQDs.
To gain further insights into the properties of the synthesized CQDs, we calculated their bandgap energies using the experimentally obtained absorption band values (Supplementary Fig. 10 and Table 5). It is revealed that the calculated bandgap energies gradually decrease from 3.02 to 1.91 eV from p-CQDs to r-CQDs. In addition, we measured the highest occupied molecular orbital (HOMO) energy levels of the CQDs using ultraviolet photoelectron spectroscopy. As shown in the energy diagram in Fig. 3c, the HOMO values exhibit wave-like variations without any discernible pattern. This result further suggests the robust predictive and optimizing capability of our ML-integrated MOO strategy, which enabled the successful screening of these high-quality CQDs from a vast and complex search space using only 40 sets of experiments.
To uncover the underlying mechanism of the tuneable optical effect of the synthesized CQDs, we have carried out a series of characterizations to comprehensively investigate their morphologies and structures (see Methods). X-ray diffraction (XRD) patterns with a single graphite peak at 26.5° indicate a high degree of graphitization in all CQDs (Supplementary Fig. 11)15. Raman spectra exhibit a stronger signal intensity for the ordered G band at 1585 cm−1 compared to the disordered D band at 1397 cm−1, further confirming the high degree of graphitization (Supplementary Fig. 12)48. Fourier-transform infrared (FT-IR) spectroscopy was then performed to detect the functional groups in the CQDs, clearly revealing the N–H and C–N stretching at 3234 and 1457 cm−1, respectively, indicating the presence of abundant NH2 groups on the surface of the CQDs, except for orange CQDs (o-CQDs) and yellow CQDs (y-CQDs) (Supplementary Fig. 13)49. The C=C aromatic ring stretching at 1510 cm−1 confirms the carbon skeleton, while three oxide-related peaks, i.e., O–H, C=O, and C–O stretching, were observed at 3480, 1580, and 1240 cm−1, respectively, due to the abundant hydroxyl groups of the precursor. The FT-IR spectrum also shows a stretching vibration band of SO3 at 1025 cm−1, confirming the additional functionalization of y-CQDs by SO3H groups.
X-ray photoelectron spectroscopy (XPS) was adopted to further probe the functional groups in the CQDs (Supplementary Figs. 14 to 23). XPS survey spectra analysis reveals three main elements in the CQDs, i.e., C, O, and N, except for o-CQDs and y-CQDs. Specifically, o-CQDs and y-CQDs lack the N element, and y-CQDs contain the S element. The high-resolution C 1s spectrum of the CQDs can be deconvoluted into three peaks, including a dominant C–C/C=C graphitic carbon bond (284.8 eV), C–O/C–N (286 eV), and carboxylic C=O (288 eV), revealing the structures of the CQDs. The N 1s peak at 399.7 eV indicates the presence of C–N bonds, verifying the successful N-doping in the basal plane network structure of the CQDs, except for o-CQDs and y-CQDs. The separated peaks of O 1s at 531.5 and 533 eV indicate the two forms of oxyhydrogen functional groups, C=O and C–O, respectively, consistent with the FT-IR spectra50. The S 2p band of y-CQDs can be decomposed into two peaks at 163.5 and 167.4 eV, representing the S 2p3/2 and S 2p1/2 components of the SO3 groups, respectively47,51. Combining the results of the structural characterization, the excellent fluorescence properties of the CQDs are attributed to the presence of N-doping, which reduces the non-radiative sites of the CQDs and promotes the formation of C=O bonds. The C=O bonds play a crucial role in radiative recombination and can increase the PLQY of the CQDs.
To gain deeper insights into the morphology and microstructures of the CQDs, we then conducted transmission electron microscopy (TEM). The TEM images demonstrate uniformly shaped and monodisperse nanodots, with the average lateral sizes gradually increasing from 1.85 nm for p-CQDs to 2.3 nm for r-CQDs (Fig. 4a and Supplementary Fig. 24), which agrees with the corresponding PL wavelengths, providing further evidence for the quantum size effect of the CQDs (Fig. 4a)47. High-resolution TEM images further reveal the highly crystalline structures of the CQDs with well-resolved lattice fringes (Fig. 4b, c). The measured crystal plane spacing of 0.21 nm corresponds to the (100) graphite plane, further corroborating the XRD data. Our analysis suggests that the synthesized CQDs possess a graphene-like, high-crystallinity characteristic, thereby giving rise to their superior fluorescence performance.
a The lateral size and color of full-color fluorescent CQDs (inset: dependence of the PL wavelength on the lateral size of full-color fluorescent CQDs). Data correspond to mean ± standard deviation, n = 3. b, c High-resolution TEM images and the fast Fourier transform patterns of p-, b-, c-, g-, y-, o- and r-CQDs, respectively. d Boxplots of PL wavelength (left)/PLQY (right) and 7 synthesis parameters of CQDs. VC is excluded here as its value range is dependent on C, whose relationships with other parameters are not directly interpretable. The labels at the bottom indicate the minimum value (inclusive) for the respective bins; the bins on the left are the same as the discretization of colors in Supplementary Table 2, while the bins on the right are uniform. Each box spans vertically from the 25th percentile to the 75th percentile, with the horizontal line marking the median and the triangle indicating the mean value. The upper and lower whiskers extend from the ends of the box to the minimum and maximum data values.
Following the effective utilization of ML in thoroughly exploring the entire search space, we proceeded to conduct a systematic examination of 63 samples using box plots, aiming to elucidate the complex interplay between the various synthesis parameters and the resultant optical properties of CQDs. As depicted in Fig.4d, synthesis under conditions of high reaction temperature, prolonged reaction time, and low-polarity solvents tends to result in CQDs with a larger PL wavelength. These findings are consistent with general observations in the literature, which suggest that the parameters identified above can enhance precursor molecular fusion and nucleation growth, thereby yielding CQDs with increased particle size and longer PL wavelength47,52,53,54,55. Moreover, a comprehensive survey of the existing literature implies that precursor–catalyst combinations involving both electron donation and electron acceptance aid in producing long-wavelength CQDs56,57. Interestingly, diverging from these traditional findings, we successfully synthesized long-wavelength red CQDs under ML guidance, with 2,7-naphthalenediol, which contains electron-donating groups, as the precursor and EDA, known for its electron-donating functionalities, as the catalyst. This breakthrough challenges existing assumptions and offers new insights into the design of long-wavelength CQDs.
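As a hedged illustration of how box plots of this kind can be produced, the short Python sketch below bins a hypothetical table of synthesis runs by one parameter and draws, for each bin, a box spanning the 25th to 75th percentile with a median line, a triangle for the mean, and whiskers reaching the data extrema, mirroring the conventions described in the Fig.4d caption. The column names, bin edges, and mock data are assumptions for illustration only, not the authors' dataset.

```python
# Illustrative sketch only: mock data, hypothetical column names and bin edges.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Mock table of 63 synthesis runs: reaction temperature (°C) and PL wavelength (nm).
df = pd.DataFrame({
    "temperature": rng.uniform(120, 240, 63),
    "pl_wavelength": rng.uniform(400, 650, 63),
})

# Discretize temperature into bins; each label is the inclusive minimum of its bin.
edges = [120, 160, 200, 240]
df["temp_bin"] = pd.cut(df["temperature"], bins=edges,
                        labels=edges[:-1], include_lowest=True)
groups = [g["pl_wavelength"].to_numpy()
          for _, g in df.groupby("temp_bin", observed=True)]

# Box = 25th-75th percentile, line = median, triangle = mean,
# whiskers = minimum and maximum data values (whis=(0, 100)).
plt.boxplot(groups, whis=(0, 100), showmeans=True, meanprops={"marker": "^"})
plt.xticks(range(1, len(groups) + 1), [str(e) for e in edges[:-1]])
plt.xlabel("Reaction temperature bin (minimum value, °C)")
plt.ylabel("PL wavelength (nm)")
plt.show()
```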
Concerning PLQY, we found that catalysts with stronger electron-donating groups (e.g., EDA) led to enhanced PLQY in CQDs, consistent with earlier observations made by our research team16. Remarkably, we uncovered the significant impact of the synthesis parameters on the PLQY of CQDs. In the high-PLQY regime, strong positive correlations were discovered between PLQY and reaction temperature, reaction time, and solvent polarity, which had not previously been reported in the literature58,59,60,61. This insight could be applied to similar systems for PLQY improvement.
Aside from the parameters discussed above, other factors such as the ramp rate, the amount of precursor, and the solvent volume also influence the properties of CQDs. Overall, the emission color and PLQY of CQDs are governed by complex, non-linear trends resulting from the interaction of numerous factors. It is noteworthy that the traditional methods used to adjust CQDs' properties often result in a decrease in PLQY as the PL wavelength redshifts4,47,51,54. However, utilizing AI-assisted synthesis, we have successfully increased the PLQY of the resulting full-color CQDs to over 60%. This achievement highlights the unique advantages offered by ML-guided CQD synthesis and confirms the powerful potential of ML-based methods in effectively navigating the complex relationships among diverse synthesis parameters and multiple target properties within a high-dimensional search space.
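To make the idea of navigating a high-dimensional synthesis space with a surrogate model more concrete, here is a minimal sketch that fits a multi-output regressor to hypothetical (parameter, property) pairs and then screens untried conditions for long PL wavelength at high PLQY. The random forest, feature layout, thresholds, and data are stand-in assumptions; the published study used its own ML workflow, which this sketch does not reproduce.

```python
# Minimal sketch of ML-guided parameter screening; features, data, and model
# are hypothetical stand-ins, not the authors' actual dataset or workflow.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
# Mock training set: rows = completed syntheses, columns = synthesis parameters
# (e.g., temperature, time, solvent polarity, precursor amount, catalyst amount).
X = rng.uniform(0, 1, size=(63, 5))
y = np.column_stack([
    450 + 200 * X[:, 0] * X[:, 1],   # mock PL wavelength (nm)
    0.2 + 0.6 * X[:, 2],             # mock PLQY
])

model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# Score a grid of untried conditions and keep those predicted to give
# long-wavelength emission while holding PLQY above a target threshold.
candidates = rng.uniform(0, 1, size=(10_000, 5))
pred = model.predict(candidates)
mask = pred[:, 1] > 0.6
best = candidates[mask][np.argsort(pred[mask, 0])[::-1][:5]]
print("suggested conditions:\n", best)
```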
View post:
What is the Future of AI-Driven Employee Monitoring? – InformationWeek
Posted: at 2:48 am
How much work are you getting done, and how are you performing it? Artificial intelligence is poised to answer those questions, if it isn't already. Employers such as Walmart, Starbucks, and Delta, among others, are using AI company Aware to monitor employee messages, CNBC reports.
Employers were monitoring workers long before the explosion of AI, but this technology's use in keeping tabs on employees has sparked debate. On one side, AI as an employee monitoring tool joins the ranks of other AI use cases touted as the future of work. On the other side, critics raise questions about the potential missteps and impact on employees.
How can AI be used in employee monitoring, and are there use cases that benefit both employers and employees?
Productivity tracking is at the forefront of the AI and employee monitoring conversation. Are employees working when they are on the clock? Answering this question is particularly top-of-mind for employers with people on remote or hybrid schedules.
"A lot of workers are doing something called productivity theater to show that they're working when they might not be," says Sue Cantrell, vice president of products and workforce strategy at consulting firm Deloitte Consulting.
AI can be used to sift through data to identify work patterns and measure employee performance against productivity metrics. "Fundamentally, the sector is about analytics and being able to process more data and understand patterns more quickly and make intelligent recommendations," Elizabeth Harz tells InformationWeek. She is CEO of Veriato, an insider risk management and employee monitoring company.
Veriato's customers most often use its AI-powered platform for insider risk management and user activity monitoring, according to Harz.
Insider risk is a significant cybersecurity concern. The Cost of Insider Risks Global Report 2023, conducted by the Ponemon Institute, found that 75% of incidents are caused by non-malicious insiders. "We believe using AI to help teams get more predictive instead of reactive in cyber is critical," Harz explains.
Using AI to monitor workers can be about their own safety, as well as that of the company. "People have different dynamics than they did when they went to the office Monday through Friday. It doesn't mean sexual harassment has gone away. It doesn't mean hostile work environments have gone away. It doesn't mean that things that happened previously have just stopped, but we need new tools to evaluate those things," says Harz.
AI also offers employers the opportunity to engage employees on performance quality and improvement. "If you're able to align the information you're getting on how a particular employee is executing the work relative to what you consider to be best practices, you can use that to create personalized coaching tools that employees ultimately do find beneficial or helpful," Stephanie Bell, chief programs and insights officer at the nonprofit coalition Partnership on AI, tells InformationWeek.
AI-driven employee monitoring has plenty of tantalizing benefits for employers. It can tap into the massive quantities of data employers are gathering on their workforce and identify patterns in productivity, performance, and safety. "GenAI really allows for language and sentiment analysis in a way that just really wasn't possible prior to LLMs," says Harz.
Measuring productivity seems like a rock-solid employer use case for AI, but productivity isn't always black and white. "Yes, it's easy to collect data on whether or not workers are online or not," Cantrell points out. "But is that really measuring outcomes or collecting data that can really help improve organizational value or benefits? That's open to question."
A more nuanced approach to measuring performance could be beneficial to both employer and employee. And enterprises are acknowledging the opportunities in moving away from traditional productivity metrics, like hours worked. Research from Deloitte Insights found that 74% of respondents recognize the importance of finding better ways to measure employee performance and value compared to traditional productivity metrics.
AI monitoring potentially has more benefits when it is used in a coaching capacity. "Where we see the real value is around using AI as a coach. When [it] monitors workers, for example, on their work calls and then [provides] coaching in the background or [uses] AI to infer skills from their daily work to suggest improvements for growth or development, or you're using AI to monitor people's movements on a factory floor to [make] suggestions for well-being," says Cantrell.
This kind of coaching tool is less about if an employee is moving their mouse or keeping their webcam on and more about how they are performing their work and ways they could improve.
AI monitoring tools can also be used to make workplaces safer for people. If integrated into video monitoring, for example, they can be used to identify unsafe workplace behaviors. Employers can follow up to make the necessary changes to protect the people working for them.
But like its many other applications, AI-driven employee monitoring requires careful consideration to actually realize its potential benefits. What data is being gathered? How is it being used? Does the use of AI have a business case? "You should have a very clear business rationale for collecting data. Don't just do it because you can," Cantrell cautions.
Realizing the positive outcomes of any technology requires an understanding of its potential pitfalls. Employee monitoring, for one, can have a negative impact on employees. Nearly half of employees (45%) who are monitored using technology report that their workplaces negatively affect their mental health, according to an American Psychological Association (APA) survey.
The perception of being watched at work can decrease people's trust in their employer. "They feel like their employer is spying on them, and it can have punitive consequences," says Cantrell.
Employee monitoring can also have a physical impact on workers. In warehouse settings, workers can be expected to hit high productivity targets in fast-paced, repetitive positions. Amazon, for example, is frequently scrutinized for its worker injury rate. In 2021, employees at Amazon's facilities experienced 34,000 serious injuries, an injury rate more than double that of non-Amazon warehouses, according to a study from the Strategic Organizing Center, a coalition of labor unions.
Amazon has faced fines for its worker injuries from agencies like the Occupational Safety and Health Administration (OSHA) and Washington's Department of Labor and Industries. "The musculoskeletal injuries in these citations have been linked to the surveillance-fueled pace of work in Amazon warehouses by reports from the National Employment Law Project and the Strategic Organizing Center," Gabrielle Rejouis, a distinguished fellow with the Georgetown Law Center on Privacy & Technology and a senior fellow with the Workers' Rights Institute at Georgetown Law, tells InformationWeek in an email interview.
While AI may fuel workplace surveillance systems, it does not bear the sole responsibility for outcomes like this. "It's not like the AI is arbitrarily setting standards," says Bell. "These are managerial decisions that are being made by company leaders, by managers to push employees to this rate. They're using the technology to enable that decision-making."
People are an important part of the equation when looking at how AI employee monitoring is used, particularly if that technology is making suggestions that impact people's jobs.
AI tools could analyze conversations at a call center, monitoring things like emotional tone. Will AI recognize subtleties that a human easily could? Bell offers the hypothetical of a call center employee adopting a comforting tone and spending a little extra time on the phone with a customer who is closing down an account after the death of a spouse. The call is longer, and the emotional tone is not upbeat.
"That's the case where you want that person to take the extra time, and you want that person to match the emotional tone of the person on the other end of the line, not to maintain across-the-board standards," she says.
An AI monitoring system could flag that employee for failing to have an upbeat tone. Is there a person in the loop to recognize that the employee made the right choice, or will the employee be penalized?
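A minimal sketch of such a human-in-the-loop arrangement is shown below; the record fields, thresholds, and scores are hypothetical assumptions rather than any vendor's actual product. The point is only that an AI flag routes a call to a human reviewer instead of triggering an automatic penalty.

```python
# Hedged sketch of a human-in-the-loop review queue for AI-flagged calls.
# The dataclass fields, thresholds, and scores are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CallRecord:
    call_id: str
    duration_s: int
    sentiment: float  # hypothetical model output in [-1, 1]

def flag_for_review(call: CallRecord, min_sentiment: float = 0.2,
                    max_duration_s: int = 600) -> bool:
    """Flag calls that deviate from baseline metrics for HUMAN review only;
    no automated penalty is applied at this stage."""
    return call.sentiment < min_sentiment or call.duration_s > max_duration_s

review_queue = [c for c in [
    CallRecord("a1", duration_s=780, sentiment=-0.4),  # bereavement call: long, sombre
    CallRecord("a2", duration_s=240, sentiment=0.7),
] if flag_for_review(c)]

# A human reviewer can now mark the first call as appropriate handling,
# overriding what the raw metrics alone would suggest.
print([c.call_id for c in review_queue])
```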
Employee monitoring bolstered by AI capabilities also has the potential to impact the way employees interact with one another. "When you have this generalized surveillance, it really chills employee activity in speech, and in the research that I've done that turns up in making it harder for employees to build trusting relationships with each other," Bell shares.
Employers could potentially use AI monitoring tools to quell workers' ability to exercise their rights. "One of the most concerning ways that electronic worker surveillance and automated management benefit employers is that it can obscure management union busting," says Rejouis. "If surveillance can find any and every mistake a worker makes, employers can use this to provide a non-union justification for firing an organizer."
The regulatory landscape for AI's use in the workplace is still forming, but that doesn't mean employers are completely free of legal concerns when implementing AI employment monitoring tools.
Employee privacy is a paramount concern. "We need to make sure that we're complying with a variety of privacy laws," Domenique Camacho Moran, a partner and leader of the employment law practice at New York law firm Farrell Fritz, tells InformationWeek.
Workers generate the data used by monitoring tools. How is their data, much of it personal, being collected? Does that collection happen only in the workplace on work devices? Does it happen on personal devices? How is that data protected?
The Federal Trade Commission is paying attention to how AI management tools are impacting privacy. "As worker surveillance and AI management tools continue to permeate the workplace, the commission has made clear that it will protect Americans from potential harms stemming from these technologies," Benjamin Wiseman, associate director of the Division of Privacy and Identity Protection at the FTC, said in remarks at a Harvard Law School event in February.
With the possibility of legal and regulatory scrutiny, what kind of policies should enterprises be considering?
"Be clear with workers about how the data is being used. Who's going to see it?" says Cantrell. "Involve workers in co-creating data privacy policies to elevate trust."
Bias in AI systems is an ongoing concern, and one that could have legal ramifications for enterprises using this technology in employee monitoring tools. The use of AI in hiring practices, and the potential for bias, is already the focus of legislation. New York, for example, passed a law regarding the use of AI and automated tools in the hiring process in an attempt to combat bias. Thus far, compliance with the law has been low, according to the Society for Human Resource Management (SHRM). But that does not erase the fact that bias in AI systems exists.
"How do we make sure that AI monitoring is non-discriminatory? We know that was the issue with respect to AI being used to filter and sort resumes in an application process. I worry that the same issues are present in the workplace," says Camacho Moran.
Any link between the use of AI and discrimination opens enterprises to legal risk.
Employee monitoring faces pushback on a number of fronts already. The International Brotherhood of Teamsters, the union representing employees of UPS, fought for a ban on driver-facing cameras in the UPS contract, the Washington Post reports.
The federal government is also investigating the use of employee monitoring. In 2022, the National Labor Relations Board (NLRB) released a memo on surveillance and automated management practices.
"[NLRB] General Counsel Jennifer Abruzzo announced her intention to protect employees, to the greatest extent possible, from intrusive or abusive electronic monitoring and automated management practices through vigorously enforcing current law and by urging the board to apply settled labor-law principles in a new framework," according to an NLRB press release.
While conversation about new regulatory and legal frameworks is percolating, it could be quite some time before they come to fruition. "I don't think we understand enough about what it will be used for, for us to have a clear path towards regulation," says Camacho Moran.
Whether it is union pushback, legal action by individual employees, or regulation, challenges to the use of AI in employee monitoring are probable. "It's hard to figure out who's going to step in to say enough is enough, or if anybody will," says Camacho Moran.
That means enterprises looking to mitigate risk will need to focus on existing laws and regulations for the time being. "Start with the law. We know you can't use AI to discriminate. You can't use AI to harass. It's not appropriate to use AI to write your stories. And, so we start with the law that we know," says Camacho Moran.
Employers can tackle this issue by developing internal taskforces to understand the potential business cases for the use of AI in employee monitoring and to create organization-wide policies that align with current regulatory and legal frameworks.
"This is going to be a part of every workplace in the next several years. And so, for most employers, the biggest challenge is if you're not going ahead and looking at this issue, the people in your organization are," says Camacho Moran. "Delay is likely to result in inconsistent usage among your team. And that's where I think the legal risk is."
What exactly is the future of work? You can argue that the proliferation of AI is that future, but the technology is evolving so quickly, and so many of its use cases are still nascent, that it is hard to say what exactly that future will look like years or decades down the road.
If AI-driven employee monitoring is going to be a part of every workplace, what does responsible use look like?
The answer lies in creating a dialogue between employers and employees, according to Bell. "Are employers looking for use cases that employees themselves would seek out?" she asks.
For example, a 2023 Gartner survey found that 34% of digital workers would be open to monitoring if it meant getting training classes and/or career development paths, and 33% were open to monitoring if it would help them access information to do their jobs.
"A big part of this is just recognizing the subject matter expertise of the folks who are doing the work themselves and where they could use support," Bell continues. "Because ultimately that's going to be a contribution back to business outcomes for the employer."
Transparency and privacy are also important facets of responsible use. Do employees know when and how their data is being collected and analyzed by AI?
Consent is an important, if tricky, element of employee monitoring. Can workers opt out of this type of monitoring? If opting out is an option, can employees do so without the threat of losing their jobs?
"Most workplaces are at-will workplaces, which means an employer does not need justification for firing an employee," Rejouis points out. This makes it harder for employees to meaningfully refuse certain changes to the workplace out of fear of losing their jobs.
When allowing employees to opt out isn't possible, say for video monitoring safety on a manufacturing floor, there are still ways to protect workers' privacy. "Data can be anonymized and aggregated so that we're protecting people's privacy," says Cantrell.
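As a rough sketch of the anonymize-and-aggregate approach Cantrell describes, the snippet below drops individual identifiers and reports only group-level statistics; the column names, team grouping, and small-group suppression rule are illustrative assumptions, not a specific product's behavior.

```python
# Illustrative only: hypothetical column names; shows dropping identifiers and
# reporting aggregated, team-level safety metrics instead of per-person records.
import pandas as pd

events = pd.DataFrame({
    "employee_id": ["e01", "e02", "e03", "e04"],
    "team":        ["packing", "packing", "loading", "loading"],
    "unsafe_lift_events": [2, 0, 1, 3],
})

team_report = (events
               .drop(columns=["employee_id"])   # remove direct identifiers
               .groupby("team")
               .agg(events_total=("unsafe_lift_events", "sum"),
                    headcount=("unsafe_lift_events", "size")))

# Optionally suppress small groups where individuals could be re-identified.
team_report = team_report[team_report["headcount"] >= 2]
print(team_report)
```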
While AI can be implemented as a powerful monitoring tool, the technology itself needs regular monitoring. Is it measuring what it is supposed to be measuring? Has any bias been introduced into the system? Is it actually providing benefits for employers and employees? "We always need some human judgment involved and awareness of what the potential downsides and bias that the AI is bringing us could be," says Cantrell.
Most enterprises are not building their own AI systems for worker monitoring. They are working with third-party vendors that offer employee monitoring tools powered by AI. Part of responsible use is understanding how those vendors are managing the potential risk and downsides of their AI systems.
"The way we're approaching it at Veriato is, just as you can imagine, being extremely thoughtful about what features we release in the wild and what we keep with beta customers and customer advisory panels that just really test and run things for a longer period of time than we would with some other releases, to make sure that we have positive experiences for our partners," Harz shares.
Any innovation boom, AI or otherwise, comes with a period of trial and error. Enterprise leadership teams are going to find out what does and does not work.
Bell emphasizes the importance of keeping employees involved in the process. While many organizations are rushing to implement the buzziest tools, they could benefit from slowing down and identifying use cases first. "Start with the problem statement rather than the tool," she says. "Starting with the problem statement, I think, is almost always going to be the fastest way to identify where something is going to deliver anyone value and be embraced by the employees who would be using it."
Cantrell considers the use of AI in employee monitoring a goldmine or a landmine. "It can bring dramatic benefits for both workers and organizations if done right. But if not done right and not done responsibly, workforce trust can really diminish and it can be what I call a landmine," she says.
Read the original post:
What is the Future of AI-Driven Employee Monitoring? - InformationWeek
Scientist advance simulation of metal-organic frameworks with machine learning – Phys.org
Posted: at 2:48 am
Hydrogen storage, heat conduction, gas storage, CO2 and water sequestration: metal-organic frameworks (MOFs) have extraordinary properties due to their unique structure in the form of microporous crystals, which have a very large surface area despite their small size. This makes them extremely interesting for research and practical applications. However, MOFs are very complex systems that have so far required a great deal of time and computing power to simulate accurately.
A team led by Egbert Zojer from the Institute of Solid State Physics at Graz University of Technology (TU Graz) has now significantly improved these simulations using machine learning, which greatly accelerates the development and application of novel MOFs. The researchers have published their method in npj Computational Materials.
"To simulate certain properties of MOFs, it is necessary to simulate huge supercells. This applies, for example, to the calculation of heat conduction in MOFs, which is highly relevant for almost all applications," says Egbert Zojer, describing the challenge that had to be solved.
"The simulated supercells often contain tens of thousands or even hundreds of thousands of atoms. For these huge systems, it is then necessary to solve the equations of motion five to 10 million times. This is far beyond present day computational possibilities using reliable quantum mechanical methods."
Thus, until now, transferable force fields, often parametrized on the basis of experiments, were used for such calculations. However, the results obtained with such force fields generally turned out not to be sufficiently reliable.
This is now fundamentally changed by the use of machine-learned potentials. These are adapted to quantum mechanical simulations by utilizing a newly developed interplay of existing algorithms, including approaches developed at the University of Vienna. For the necessary material-specific machine learning of the potentials, the quantum mechanical simulations need to be carried out only for comparatively few and significantly smaller structures.
As a result, the calculations run many orders of magnitude faster and it is possible to simulate the forces in the huge supercells many millions of times on modern supercomputers. The decisive advantage here is that there is no relevant loss of accuracy compared to doing the simulations using quantum mechanical methods.
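Purely to illustrate the general idea behind machine-learned potentials, namely fitting a cheap surrogate to forces from a modest number of small quantum mechanical reference calculations and then reusing it in much larger cells, the following deliberately simplified Python sketch uses a toy descriptor and kernel ridge regression. The actual work relies on purpose-built machine-learning potential frameworks with far more sophisticated, symmetry-aware descriptors, none of which are reproduced here.

```python
# Deliberately simplified illustration, not the published workflow: toy descriptor,
# mock reference forces, and a generic regressor standing in for a real ML potential.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)

def descriptor(positions):
    """Toy per-atom descriptor: distances to the 4 nearest neighbours."""
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, 1:5]          # skip the zero self-distance

# "Reference data": small cells whose forces would come from quantum mechanics.
small_cells = [rng.uniform(0.0, 10.0, size=(40, 3)) for _ in range(50)]
X = np.vstack([descriptor(p) for p in small_cells])
F = rng.normal(size=(X.shape[0], 3))      # placeholder for reference forces

surrogate = KernelRidge(kernel="rbf", alpha=1e-3).fit(X, F)

# The fitted surrogate now predicts per-atom forces for a much larger supercell
# at a cost far below a fresh quantum mechanical calculation.
big_cell = rng.uniform(0.0, 40.0, size=(2_000, 3))
forces = surrogate.predict(descriptor(big_cell))
print(forces.shape)                       # (2000, 3)
```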
For the example of heat conduction of MOFs, this means that the newly developed simulation strategy will make it possible to simulate the relevant material properties even before the MOFs are synthesized, thus allowing researchers to reliably develop customized structures on the computer.
This represents a major leap forward for research into complex materials, which for heat transport will, for example, allow researchers to optimize the interaction between the metal oxide nodes and the semiconducting organic linkers. Using the new simulation strategy will also make it easier to overcome complex challenges. For example, MOFs must have good or poor thermal conductivity depending on their application.
A hydrogen storage system, for instance, must be able to dissipate heat well, while in thermoelectric applications good electrical conduction should be combined with the lowest possible heat dissipation.
In addition to simulating thermal conductivity, the new machine-learned potentials are also ideal for calculating other dynamic and structural properties of MOFs. These include crystallographic structures, elastic constants, as well as vibrational spectra and phonons, which play a decisive role in the thermal stability of MOFs and their charge transport properties.
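As one hedged example of such a post-processing step (a generic recipe, not the TU Graz group's specific analysis pipeline), a vibrational density of states can be estimated from the Fourier transform of the velocity autocorrelation function of an MD trajectory driven by the machine-learned potential. The trajectory below is random placeholder data standing in for real simulation output.

```python
# Generic sketch: vibrational density of states (VDOS) from the velocity
# autocorrelation function (VACF) of an MD trajectory; velocities are placeholders.
import numpy as np

rng = np.random.default_rng(3)
n_steps, n_atoms, dt_fs = 2048, 100, 1.0
velocities = rng.normal(size=(n_steps, n_atoms, 3))   # placeholder trajectory

# VACF averaged over atoms and Cartesian components, for lags up to half the run.
v = velocities.reshape(n_steps, -1)
vacf = np.array([np.mean(np.sum(v[:n_steps - lag] * v[lag:], axis=1))
                 for lag in range(n_steps // 2)])
vacf /= vacf[0]

# VDOS as the magnitude of the Fourier transform of the VACF; frequencies in THz.
vdos = np.abs(np.fft.rfft(vacf))
freq_thz = np.fft.rfftfreq(len(vacf), d=dt_fs * 1e-3)  # 1 fs = 1e-3 ps, so cycles/ps = THz
print(freq_thz[np.argmax(vdos[1:]) + 1], "THz (dominant placeholder mode)")
```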
"We now have tools that we know are incredibly efficient at providing us with reliable quantitative figures. This enables us to systematically change the structures of the MOFs in the simulations, while at the same time knowing that the simulated properties will be accurate. This will allow us, based on causality, to understand which changes in the atomistic structure generate the desired effects," says Egbert Zojer, who knows that research groups in Munich and Bayreuth have already taken up the new simulation strategy despite its recent publication.
More information: Sandro Wieser et al, Machine learned force-fields for an Ab-initio quality description of metal-organic frameworks, npj Computational Materials (2024). DOI: 10.1038/s41524-024-01205-w
Journal information: npj Computational Materials
Follow this link:
Scientist advance simulation of metal-organic frameworks with machine learning - Phys.org