DeepFacade
MIT 4.453 Creative Machine Learning
A web application that generates 3D facades from textual descriptions by producing relief depth maps with CycleGANs.
The external face or front elevation of a building, commonly known as its façade, stands as a prominent testament to the architectural style and character inherent in its design. Throughout the annals of architectural history, façades have served as vehicles for specific artistic styles, presenting an amalgamation of design elements carefully arranged and exhibited to elicit interpretation from the beholder.
Facade design by Bingyu Guo and Maria Sofia
Relief models of students' work at the TJU workshop. Instructors: Ali Rahim, Bingyu Guo, Miguel Matos
Our initial approach employed the Pix2Pix method to convert input images into normal maps. Pix2Pix is a conditional generative adversarial network (cGAN) that learns a mapping from input images to output images. The model is trained on image pairs, such as building facade labels and the corresponding facade photographs, and then attempts to generate the corresponding output image for any new input.
Figure 1: Training a conditional GAN to map AI-generated images to normal maps
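For reference, the sketch below shows the kind of objective Pix2Pix optimizes: an adversarial term plus an L1 reconstruction term, written here in PyTorch. The `generator` and `discriminator` arguments are placeholder modules standing in for the actual networks, not our exact training code.

```python
import torch
import torch.nn as nn

# Minimal sketch of the Pix2Pix training objective (cGAN + L1).
# `generator` maps an input image to a normal map; `discriminator`
# scores concatenated (input, output) pairs and returns logits.
adversarial_loss = nn.BCEWithLogitsLoss()
l1_loss = nn.L1Loss()
lambda_l1 = 100.0  # reconstruction weight commonly used in the Pix2Pix paper

def generator_step(generator, discriminator, input_img, target_map):
    fake_map = generator(input_img)
    pred_fake = discriminator(torch.cat([input_img, fake_map], dim=1))
    # The generator tries to fool the discriminator and stay close to the target.
    loss_gan = adversarial_loss(pred_fake, torch.ones_like(pred_fake))
    loss_recon = l1_loss(fake_map, target_map)
    return loss_gan + lambda_l1 * loss_recon

def discriminator_step(generator, discriminator, input_img, target_map):
    with torch.no_grad():
        fake_map = generator(input_img)
    pred_real = discriminator(torch.cat([input_img, target_map], dim=1))
    pred_fake = discriminator(torch.cat([input_img, fake_map], dim=1))
    loss_real = adversarial_loss(pred_real, torch.ones_like(pred_real))
    loss_fake = adversarial_loss(pred_fake, torch.zeros_like(pred_fake))
    return 0.5 * (loss_real + loss_fake)
```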
Data A: Greyscale images
Style transfer is becoming a key technique in architectural design, notably through a method developed by an architectural studio at the University of Pennsylvania. This approach uses an image-based style transfer model to create new architectural facade styles. It starts with a content image and integrates various architectural facade elements as the style image. This method allows for a blend of existing structures with innovative concepts, resulting in unique and visually striking facades.
The dataset comprised two sets of paired images, extracted from previous research on relief models, and preprocessed to a resolution of 256 x 256 pixels.
Data B: Normal maps
Left figure: Generated data vs. ground truth
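A minimal sketch of how such a paired set can be prepared is shown below: each greyscale image and its normal map are resized to 256 x 256 and saved side by side, following the aligned-dataset convention used by common Pix2Pix implementations. The directory names are illustrative, not the project's actual paths.

```python
from pathlib import Path
from PIL import Image

# Sketch of paired preprocessing: resize each greyscale image (A) and its
# normal map (B) to 256 x 256 and save them side by side as one image.
# Assumes matching filenames in the two source folders (placeholder paths).
src_a, src_b, dst = Path("data/greyscale"), Path("data/normal_maps"), Path("data/pairs")
dst.mkdir(parents=True, exist_ok=True)

for path_a in sorted(src_a.glob("*.png")):
    path_b = src_b / path_a.name
    if not path_b.exists():
        continue
    img_a = Image.open(path_a).convert("RGB").resize((256, 256), Image.BICUBIC)
    img_b = Image.open(path_b).convert("RGB").resize((256, 256), Image.BICUBIC)
    pair = Image.new("RGB", (512, 256))
    pair.paste(img_a, (0, 0))
    pair.paste(img_b, (256, 0))
    pair.save(dst / path_a.name)
```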
After training the model for 200 epochs, we observed two main drawbacks.
First, details in the cropped images did not align perfectly between the input image and its normal map.
Second, normal maps require precise color transitions to generate smooth surfaces, which proved challenging for the model to learn within its specific architecture.
As a result, we adopted a different model, CycleGAN, as our final approach, which directly translates AI inspiration images into relief depth maps (elevational grayscale images of the 3D models). CycleGAN is designed to handle unpaired image datasets, allowing it to learn the translation between two domains (A and B) without requiring corresponding images in both the source and target domains. The model consists of two parallel GANs, each responsible for learning the mapping from one domain to the other. These two GANs are combined to establish a cycle-consistent mapping between the two domains.
Training CycleGAN to map from one input style to the other.
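The sketch below illustrates the cycle-consistency term that ties the two generators together, assuming placeholder generators `G_AB` (inspiration image to relief depth map) and `G_BA` (depth map back to inspiration image). Each direction additionally carries its own adversarial loss against a domain-specific discriminator; the cycle term is what links the two GANs.

```python
import torch.nn as nn

# Sketch of CycleGAN's cycle-consistency term with two placeholder
# generators: G_AB maps inspiration images (A) to relief depth maps (B),
# G_BA maps depth maps back to inspiration images.
cycle_loss = nn.L1Loss()
lambda_cycle = 10.0  # typical weight for the cycle term

def cycle_consistency(G_AB, G_BA, real_a, real_b):
    fake_b = G_AB(real_a)        # A -> B
    recovered_a = G_BA(fake_b)   # B -> A, should reconstruct the original A
    fake_a = G_BA(real_b)        # B -> A
    recovered_b = G_AB(fake_a)   # A -> B, should reconstruct the original B
    return lambda_cycle * (cycle_loss(recovered_a, real_a) +
                           cycle_loss(recovered_b, real_b))
```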
To enhance our dataset, we supplemented it with MidJourney-generated variations of existing inspiration images and relief depth maps. Both the A and B datasets consisted of 500 images, preprocessed to a resolution of 512 x 384 pixels, and the model was trained for 200 epochs.
Data A: Full Inspiration Images and Data B: Relief model Images
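Because CycleGAN requires no correspondence between the two sets, preparing the data reduces to resizing each set and placing it in its own folder. The sketch below follows the trainA/trainB folder convention common to CycleGAN implementations; the source and destination paths are placeholders, and the 512 x 384 resize is the only project-specific detail.

```python
from pathlib import Path
from PIL import Image

# Sketch of preparing the two unpaired image sets at 512 x 384 using the
# trainA / trainB folder layout expected by common CycleGAN codebases.
sources = {"trainA": Path("raw/inspiration"), "trainB": Path("raw/relief_depth")}
root = Path("datasets/deepfacade")

for split, src in sources.items():
    out_dir = root / split
    out_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.glob("*")):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue
        img = Image.open(path).convert("RGB").resize((512, 384), Image.BICUBIC)
        img.save(out_dir / f"{path.stem}.png")
```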
Results of Our Model
The training process yielded successful results, as evidenced by a significant decrease in training loss within the initial 25 epochs, followed by a gradual reduction throughout the remainder of the training. Upon comparing the relief depth maps generated by the model with the input inspiration images, we observed that the relief depth images effectively captured the primary configuration depicted in the inspiration images, while reducing excessive details as anticipated.
The training loss noticeably drops during the first 25 epochs, then continues to decline slowly for the rest of the training period
Our generated relief depth map compared with Imageamigo and MiDaS based on the same input AI image
Our model’s success shines when compared to existing depth map generators, namely Imageamigo and PyTorch MiDaS; the same inspirational input photo was fed to both models. Imageamigo’s model can generate depth maps and calculate relative distances between objects in an image. Similarly, MiDaS computes the relative inverse depth from an image and encodes the result in colors corresponding to these depth values. The MiDaS repository includes multiple models, ranging from a small, high-speed model to a very large model that provides the highest accuracy. The models have been trained on 10 distinct datasets using multi-objective optimization to ensure high quality on a wide range of inputs.
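For the MiDaS baseline, we relied on the model published through torch.hub; the snippet below follows the usage documented in the intel-isl/MiDaS repository (the small model is shown, and the input file name is a placeholder).

```python
import cv2
import torch

# Run the MiDaS baseline on the same inspiration image via torch.hub.
# "inspiration.png" stands in for the actual input AI image.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").to(device).eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("inspiration.png"), cv2.COLOR_BGR2RGB)
input_batch = transform(img).to(device)

with torch.no_grad():
    prediction = midas(input_batch)
    # Resize the prediction back to the original image resolution.
    prediction = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()

# Relative inverse depth; normalise to 0-255 to view it as a greyscale map.
depth = prediction.cpu().numpy()
depth = (255 * (depth - depth.min()) / (depth.max() - depth.min())).astype("uint8")
cv2.imwrite("midas_depth.png", depth)
```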
The comparison with Imageamigo and MiDaS makes clear that our model performs more successfully: it generates discrete geometric features that can serve as a functional depth map for architectural facade generation.
Detail Surface Reconfiguration
In conclusion, this project has explored the realm of depth map estimation and 2D to 3D generation in the context of architectural design. The investigation has encompassed an in-depth analysis of existing AI models, specifically Pix2Pix and CycleGANs, which have demonstrated their efficacy in generating 3D relief models from 2D prompts.
2D Depth Relief Map Generated by Our Model to 3D Model
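As an illustration of that final 2D-to-3D step, the sketch below treats each pixel of a generated relief depth map as a height and exports a simple heightfield mesh as a Wavefront OBJ file. The file names and the relief depth scale are illustrative, not the exact values used in the project.

```python
import numpy as np
from PIL import Image

# Lift a greyscale relief depth map into a 3D heightfield mesh (OBJ).
# Placeholder file names; large maps may warrant downsampling first.
depth = np.asarray(Image.open("relief_depth_map.png").convert("L"), dtype=np.float32) / 255.0
h, w = depth.shape
relief_depth = 0.1 * max(h, w)  # how far the brightest pixel protrudes

with open("facade_relief.obj", "w") as f:
    # One vertex per pixel; pixel brightness becomes the z (relief) coordinate.
    for y in range(h):
        for x in range(w):
            f.write(f"v {x} {h - 1 - y} {depth[y, x] * relief_depth}\n")
    # One quad face per 2x2 block of neighbouring pixels (OBJ indices are 1-based).
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x + 1
            f.write(f"f {i} {i + 1} {i + w + 1} {i + w}\n")
```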